# Model Comparison
Prompt Studio's comparison feature lets you test the same prompt against multiple AI Gateway models simultaneously, helping you choose the best model for your use case.
## Getting Started

### From the Compare Page
- Navigate to Prompt Studio in the sidebar
- Click the Compare tab or go directly to `/prompt-studio/compare`
- Click Add Model to add models from your AI Gateway
- Write your system prompt and user prompt
- Click Run Comparison
### From a Prompt
- Open any prompt's detail page
- Click the Compare button
- The prompt content is pre-populated as the system or user prompt
- Add models and run the comparison
## Configuring Models
You can compare up to 4 models simultaneously. For each model, configure:
| Parameter | Range | Default | Description |
|---|---|---|---|
| Temperature | 0.0 - 2.0 | 0.7 | Higher values = more creative, lower = more deterministic |
| Max Tokens | 1 - model limit | 1000 | Maximum response length |
| Top P | 0.0 - 1.0 | 1.0 | Nucleus sampling threshold |
Each model can have different parameters, allowing you to test variations like:
- Same prompt, different models
- Same model, different temperatures
- Different system prompts per model
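A per-model configuration like the one above can be expressed as plain data. This is an illustrative sketch, not Prompt Studio's actual API: the model IDs and field names are assumptions, but the ranges match the documented limits.

```python
# Hypothetical comparison setup: same prompt, two temperatures of one
# model plus a second model. Model names and keys are illustrative.
comparison = {
    "system_prompt": "You are a helpful assistant.",
    "user_prompt": "Summarize the report in three bullet points.",
    "models": [
        {"model": "gpt-4o",        "temperature": 0.3, "max_tokens": 1000, "top_p": 1.0},
        {"model": "gpt-4o",        "temperature": 1.0, "max_tokens": 1000, "top_p": 1.0},
        {"model": "claude-sonnet", "temperature": 0.7, "max_tokens": 1000, "top_p": 1.0},
    ],
}

# Validate against the documented limits before running.
assert len(comparison["models"]) <= 4
for m in comparison["models"]:
    assert 0.0 <= m["temperature"] <= 2.0
    assert m["max_tokens"] >= 1
    assert 0.0 <= m["top_p"] <= 1.0
```

Keeping every parameter except one fixed (here, temperature) makes it easy to attribute differences in output to that single variable.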
## Reading Results
After a comparison runs, each model card shows:
### Response
The model's generated text output, displayed in a scrollable area.
### Metrics
| Metric | Description |
|---|---|
| Response Time | Time in milliseconds from request to complete response |
| Input Tokens | Number of tokens in the prompt (system + user) |
| Output Tokens | Number of tokens in the response |
| Total Tokens | Sum of input and output tokens |
The best value for each metric is highlighted in green across all models, making it easy to identify the fastest or most efficient model.
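The highlighting rule above can be sketched as follows. The sample result dicts are made-up data; for each metric, lower is better, so the minimum across models wins.

```python
# Sample per-model results from a comparison run (illustrative numbers).
results = [
    {"model": "gpt-4o",        "response_ms": 1840, "input_tokens": 52, "output_tokens": 210},
    {"model": "claude-sonnet", "response_ms": 1320, "input_tokens": 52, "output_tokens": 175},
    {"model": "small-model",   "response_ms": 640,  "input_tokens": 52, "output_tokens": 230},
]

# Total tokens is the sum of input and output tokens, as documented.
for r in results:
    r["total_tokens"] = r["input_tokens"] + r["output_tokens"]

# "Best" = minimum value for each metric; that model gets highlighted.
for metric in ("response_ms", "input_tokens", "output_tokens", "total_tokens"):
    best = min(results, key=lambda r: r[metric])
    print(f"Best {metric}: {best['model']} ({best[metric]})")
```

Note that the fastest model and the most token-efficient model need not be the same one, which is exactly why each metric is highlighted independently.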
## Saving & Exporting

### Export as JSON
Click Export JSON to download the complete comparison data including prompts, parameters, responses, and metrics. Useful for offline analysis or sharing with teammates.
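Offline analysis of an export might look like the following sketch. The field names (`results`, `responseTimeMs`, `totalTokens`) are assumptions about the export shape, not a documented schema; inspect a real export to confirm the keys.

```python
import json

# Parse a (hypothetical) exported comparison file and print a summary.
export = json.loads("""
{
  "results": [
    {"model": "gpt-4o",        "responseTimeMs": 1840, "totalTokens": 262},
    {"model": "claude-sonnet", "responseTimeMs": 1320, "totalTokens": 227}
  ]
}
""")

for r in export["results"]:
    print(f'{r["model"]}: {r["responseTimeMs"]} ms, {r["totalTokens"]} tokens')
```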
### Save to History
Click Save Comparison to store the results with optional notes. Saved comparisons appear in the History tab and can be reviewed later.
## Comparison History
The History tab shows all saved comparisons with:
- Date and time
- Models used (as badges)
- Notes
- Actions: View details or Delete
## Tips
- Start with temperature 0.7 as a baseline, then test 0.3 (factual) and 1.0 (creative)
- Compare cost vs quality: A smaller model at lower cost might be sufficient for simple tasks
- Test edge cases: Try unusual inputs to see which model handles them best
- Save comparisons before and after prompt changes to track improvements
- Use the same user prompt across models for fair comparison
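The cost-vs-quality comparison can be made concrete with the token metrics from a run. This is a back-of-the-envelope sketch; the per-million-token prices below are illustrative placeholders, not real quotes.

```python
# Illustrative (input, output) prices in dollars per million tokens.
PRICES = {
    "big-model":   (2.50, 10.00),
    "small-model": (0.15, 0.60),
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one run: tokens times per-token price."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Compare the two models on the token counts from a comparison run.
print(f"big-model:   ${run_cost('big-model', 52, 210):.6f} per run")
print(f"small-model: ${run_cost('small-model', 52, 230):.6f} per run")
```

If the smaller model's response quality is acceptable for the task, a cost gap of an order of magnitude or more per run quickly adds up at volume.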