Model Comparison

Prompt Studio's comparison feature lets you test the same prompt against multiple AI Gateway models simultaneously, helping you choose the best model for your use case.

Getting Started

From the Compare Page

  1. Navigate to Prompt Studio in the sidebar
  2. Click the Compare tab or go directly to /prompt-studio/compare
  3. Click Add Model to add models from your AI Gateway
  4. Write your system prompt and user prompt
  5. Click Run Comparison

From a Prompt

  1. Open any prompt's detail page
  2. Click the Compare button
  3. The prompt content is pre-populated as the system or user prompt
  4. Add models and run the comparison

Configuring Models

You can compare up to 4 models simultaneously. For each model, configure:

| Parameter   | Range           | Default | Description                                                |
|-------------|-----------------|---------|------------------------------------------------------------|
| Temperature | 0.0 - 2.0       | 0.7     | Higher values = more creative; lower = more deterministic  |
| Max Tokens  | 1 - model limit | 1000    | Maximum response length                                    |
| Top P       | 0.0 - 1.0       | 1.0     | Nucleus sampling threshold                                 |

Each model can have different parameters, allowing you to test variations like:

  • Same prompt, different models
  • Same model, different temperatures
  • Different system prompts per model
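The per-model configuration described above can be sketched as a small data structure. This is an illustrative shape only, not the actual Prompt Studio API; the field and model names are assumptions:

```typescript
// Hypothetical per-model configuration; names are illustrative.
interface ModelConfig {
  model: string;       // AI Gateway model identifier (hypothetical id below)
  temperature: number; // 0.0 - 2.0, default 0.7
  maxTokens: number;   // 1 - model limit, default 1000
  topP: number;        // 0.0 - 1.0, default 1.0
}

// One of the variations listed above: same model, different temperatures.
const configs: ModelConfig[] = [
  { model: "model-a", temperature: 0.3, maxTokens: 1000, topP: 1.0 },
  { model: "model-a", temperature: 1.0, maxTokens: 1000, topP: 1.0 },
];
```

Because each entry carries its own parameters, a single comparison run can hold any mix of models and settings, up to the four-model limit.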

Reading Results

After a comparison runs, each model card shows:

Response

The model's generated text output, displayed in a scrollable area.

Metrics

| Metric        | Description                                            |
|---------------|--------------------------------------------------------|
| Response Time | Time in milliseconds from request to complete response |
| Input Tokens  | Number of tokens in the prompt (system + user)         |
| Output Tokens | Number of tokens in the response                       |
| Total Tokens  | Sum of input and output tokens                         |

The best value for each metric is highlighted in green across all models, making it easy to identify the fastest or most efficient model.
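The "best value" highlighting amounts to taking the minimum of each metric across models. A minimal sketch, with assumed field names and made-up sample numbers:

```typescript
// Assumed metrics shape for one model's run; not the real schema.
interface RunMetrics {
  model: string;
  responseTimeMs: number;
  inputTokens: number;
  outputTokens: number;
}

// Total Tokens is the sum of input and output tokens, per the table above.
const totalTokens = (m: RunMetrics): number => m.inputTokens + m.outputTokens;

// Pick the item with the lowest value for a given metric.
function bestBy<T>(items: T[], value: (item: T) => number): T {
  return items.reduce((best, item) => (value(item) < value(best) ? item : best));
}

// Example runs (illustrative numbers only).
const runs: RunMetrics[] = [
  { model: "model-a", responseTimeMs: 820,  inputTokens: 45, outputTokens: 310 },
  { model: "model-b", responseTimeMs: 1140, inputTokens: 45, outputTokens: 190 },
];

const fastest = bestBy(runs, (r) => r.responseTimeMs); // model-a
const leanest = bestBy(runs, (r) => totalTokens(r));   // model-b
```

Note that the fastest model is not necessarily the most token-efficient one, which is exactly why each metric is highlighted independently.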

Saving & Exporting

Export as JSON

Click Export JSON to download the complete comparison data including prompts, parameters, responses, and metrics. Useful for offline analysis or sharing with teammates.
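For offline analysis it helps to know roughly what the export contains. The real schema may differ; this is a plausible shape covering the prompts, parameters, responses, and metrics mentioned above, with all values invented for illustration:

```typescript
// Hypothetical export shape — field names and values are assumptions.
const exported = {
  createdAt: "2024-05-01T12:00:00Z",
  systemPrompt: "You are a helpful assistant.",
  userPrompt: "Summarize this article.",
  results: [
    {
      model: "model-a",
      parameters: { temperature: 0.7, maxTokens: 1000, topP: 1.0 },
      response: "...",
      metrics: {
        responseTimeMs: 820,
        inputTokens: 45,
        outputTokens: 310,
        totalTokens: 355, // inputTokens + outputTokens
      },
    },
  ],
};

// The download is plain JSON, so it round-trips cleanly for scripting.
const parsed = JSON.parse(JSON.stringify(exported));
```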

Save to History

Click Save Comparison to store the results with optional notes. Saved comparisons appear in the History tab and can be reviewed later.

Comparison History

The History tab shows all saved comparisons with:

  • Date and time
  • Models used (as badges)
  • Notes
  • Actions: View details or Delete

Tips

  • Start with temperature 0.7 as a baseline, then test 0.3 (factual) and 1.0 (creative)
  • Compare cost vs quality: A smaller model at lower cost might be sufficient for simple tasks
  • Test edge cases: Try unusual inputs to see which model handles them best
  • Save comparisons before and after prompt changes to track improvements
  • Use the same user prompt across models for fair comparison