Model Comparison

Prompt Studio's comparison feature lets you test the same prompt against multiple AI Gateway models simultaneously, helping you choose the best model for your use case.

Getting Started

From the Compare Page

  1. Navigate to Prompt Studio in the sidebar
  2. Click the Compare tab or go directly to /prompt-studio/compare
  3. Click Add Model to add models from your AI Gateway
  4. Write your system prompt and user prompt
  5. Click Run Comparison

From a Prompt

  1. Open any prompt's detail page
  2. Click the Compare button
  3. The prompt content is pre-populated as the system or user prompt
  4. Add models and run the comparison

Configuring Models

You can compare up to 4 models simultaneously. For each model, configure:

| Parameter   | Range           | Default | Description                                                |
|-------------|-----------------|---------|------------------------------------------------------------|
| Temperature | 0.0 - 2.0       | 0.7     | Higher values = more creative; lower = more deterministic  |
| Max Tokens  | 1 - model limit | 1000    | Maximum response length                                    |
| Top P       | 0.0 - 1.0       | 1.0     | Nucleus sampling threshold                                 |

Each model can have different parameters, allowing you to test variations like:

  • Same prompt, different models
  • Same model, different temperatures
  • Different system prompts per model
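The per-model configuration described above can be sketched as a small data structure. This is an illustrative shape only, not the actual Prompt Studio API; the field and model names are assumptions:

```typescript
// Hypothetical per-model configuration; names are illustrative.
interface ModelConfig {
  model: string;       // AI Gateway model identifier (hypothetical id below)
  temperature: number; // 0.0 - 2.0, default 0.7
  maxTokens: number;   // 1 - model limit, default 1000
  topP: number;        // 0.0 - 1.0, default 1.0
}

// One of the variations listed above: same model, different temperatures.
const configs: ModelConfig[] = [
  { model: "model-a", temperature: 0.3, maxTokens: 1000, topP: 1.0 },
  { model: "model-a", temperature: 1.0, maxTokens: 1000, topP: 1.0 },
];
```

Because each entry carries its own parameters, a single comparison run can hold any mix of models and settings, up to the four-model limit.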

Reading Results

After a comparison runs, each model card shows:

Response

The model's generated text output, displayed in a scrollable area.

Metrics

| Metric        | Description                                            |
|---------------|--------------------------------------------------------|
| Response Time | Time in milliseconds from request to complete response |
| Input Tokens  | Number of tokens in the prompt (system + user)         |
| Output Tokens | Number of tokens in the response                       |
| Total Tokens  | Sum of input and output tokens                         |

The best value for each metric is highlighted in green across all models, making it easy to identify the fastest or most efficient model.
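The "best value" highlighting amounts to taking the minimum of each metric across models. A minimal sketch, with assumed field names and made-up sample numbers:

```typescript
// Assumed metrics shape for one model's run; not the real schema.
interface RunMetrics {
  model: string;
  responseTimeMs: number;
  inputTokens: number;
  outputTokens: number;
}

// Total Tokens is the sum of input and output tokens, per the table above.
const totalTokens = (m: RunMetrics): number => m.inputTokens + m.outputTokens;

// Pick the item with the lowest value for a given metric.
function bestBy<T>(items: T[], value: (item: T) => number): T {
  return items.reduce((best, item) => (value(item) < value(best) ? item : best));
}

// Example runs (illustrative numbers only).
const runs: RunMetrics[] = [
  { model: "model-a", responseTimeMs: 820,  inputTokens: 45, outputTokens: 310 },
  { model: "model-b", responseTimeMs: 1140, inputTokens: 45, outputTokens: 190 },
];

const fastest = bestBy(runs, (r) => r.responseTimeMs); // model-a
const leanest = bestBy(runs, (r) => totalTokens(r));   // model-b
```

Note that the fastest model is not necessarily the most token-efficient one, which is exactly why each metric is highlighted independently.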

Saving & Exporting

Export as JSON

Click Export JSON to download the complete comparison data including prompts, parameters, responses, and metrics. Useful for offline analysis or sharing with teammates.
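For offline analysis it helps to know roughly what the export contains. The real schema may differ; this is a plausible shape covering the prompts, parameters, responses, and metrics mentioned above, with all values invented for illustration:

```typescript
// Hypothetical export shape — field names and values are assumptions.
const exported = {
  createdAt: "2024-05-01T12:00:00Z",
  systemPrompt: "You are a helpful assistant.",
  userPrompt: "Summarize this article.",
  results: [
    {
      model: "model-a",
      parameters: { temperature: 0.7, maxTokens: 1000, topP: 1.0 },
      response: "...",
      metrics: {
        responseTimeMs: 820,
        inputTokens: 45,
        outputTokens: 310,
        totalTokens: 355, // inputTokens + outputTokens
      },
    },
  ],
};

// The download is plain JSON, so it round-trips cleanly for scripting.
const parsed = JSON.parse(JSON.stringify(exported));
```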

Save to History

Click Save Comparison to store the results with optional notes. Saved comparisons appear in the History tab and can be reviewed later.

Comparison History

The History tab shows all saved comparisons with:

  • Date and time
  • Models used (as badges)
  • Notes
  • Actions: View details or Delete

Tips

  • Start with temperature 0.7 as a baseline, then test 0.3 (factual) and 1.0 (creative)
  • Compare cost vs quality: A smaller model at lower cost might be sufficient for simple tasks
  • Test edge cases: Try unusual inputs to see which model handles them best
  • Save comparisons before and after prompt changes to track improvements
  • Use the same user prompt across models for fair comparison