A/B Tests

Create, deploy, and analyze A/B tests that split inference traffic across model variants. Supports weighted-random, feature-based, multi-armed-bandit, and canary strategies, plus formal experiments with statistical significance testing.

All endpoints require authentication via the X-API-Key header and the appropriate scope.
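
As a minimal sketch of the authentication requirement, the helper below builds a request that carries the X-API-Key header, using only Python's standard library. The host `https://api.example.com`, the key value, and the `build_request` helper are placeholders for illustration, not part of this reference. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: replace with your deployment's host


def build_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated request for the A/B Tests API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    # Every endpoint expects the API key in the X-API-Key header.
    # urllib stores the name as "X-api-key"; HTTP header names are case-insensitive.
    req.add_header("X-API-Key", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


req = build_request("GET", "/api/v1/ab-tests", api_key="YOUR_API_KEY")
# To send it: urllib.request.urlopen(req)
```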


A/B Test Object

{
  "id": "ab_abc123",
  "name": "Llama vs Mistral routing",
  "description": "Compare Llama-3 70B against Mistral-Large on customer support traffic",
  "strategy": "weighted_random",
  "status": "running",
  "variants": [
    {
      "variantId": "var_control",
      "modelId": "model_llama3_70b",
      "weight": 50,
      "enabled": true,
      "isControl": true
    },
    {
      "variantId": "var_treatment",
      "modelId": "model_mistral_large",
      "weight": 50,
      "enabled": true,
      "isControl": false
    }
  ],
  "tags": ["routing", "production"],
  "featureRules": [],
  "banditConfig": null,
  "canaryConfig": null,
  "workspaceId": "ws_abc123",
  "organizationId": "org_xyz",
  "ownerId": "user_456",
  "createdAt": "2025-01-15T10:30:00Z",
  "updatedAt": "2025-02-01T14:22:00Z"
}
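
To illustrate what `weight` and `enabled` mean under the weighted_random strategy, here is a sketch of how a router could pick a variant in proportion to its weight. It is an assumption about the behavior, not the AI Gateway's actual implementation, and `pick_variant` is a hypothetical helper.

```python
import random


def pick_variant(variants, rng=random):
    """Pick one enabled variant with probability proportional to its weight."""
    enabled = [v for v in variants if v.get("enabled", True)]
    if not enabled:
        raise ValueError("no enabled variants to route to")
    weights = [v["weight"] for v in enabled]
    # random.choices draws according to the relative weights.
    return rng.choices(enabled, weights=weights, k=1)[0]


variants = [
    {"variantId": "var_control", "weight": 50, "enabled": True},
    {"variantId": "var_treatment", "weight": 50, "enabled": True},
]
rng = random.Random(0)  # seeded only to make the sketch reproducible
counts = {"var_control": 0, "var_treatment": 0}
for _ in range(10_000):
    counts[pick_variant(variants, rng)["variantId"]] += 1
```

With equal weights, each variant should receive roughly half of the simulated requests.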

GET /api/v1/ab-tests

List A/B tests accessible to the authenticated user.

Scope: mlops:read

Query Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| status | string | No | Filter by status: draft, active, running, stopped |
| strategy | string | No | Filter by strategy: weighted_random, feature_based, multi_armed_bandit, canary |
| workspaceId | string | No | Filter by workspace ID |

Response 200 OK

{
  "count": 2,
  "abTests": [
    {
      "id": "ab_abc123",
      "name": "Llama vs Mistral routing",
      "strategy": "weighted_random",
      "status": "running",
      "variants": [],
      "createdAt": "2025-01-15T10:30:00Z"
    }
  ]
}
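
The query parameters above are all optional, so a client only needs to send the filters it actually uses. The sketch below builds the list URL accordingly; the host and the `list_ab_tests_url` helper are placeholders for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # assumption: placeholder host


def list_ab_tests_url(status=None, strategy=None, workspace_id=None):
    """Build the list URL, including only the filters that were provided."""
    params = {"status": status, "strategy": strategy, "workspaceId": workspace_id}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/api/v1/ab-tests"
    return f"{url}?{query}" if query else url


url = list_ab_tests_url(status="running", strategy="canary")
```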

POST /api/v1/ab-tests

Create a new A/B test. Requires at least 2 model variants.

Scope: mlops:write

Request Body

{
  "name": "Llama vs Mistral routing",
  "strategy": "weighted_random",
  "variants": [
    { "variantId": "var_control", "modelId": "model_llama3_70b", "weight": 50, "isControl": true },
    { "variantId": "var_treatment", "modelId": "model_mistral_large", "weight": 50 }
  ],
  "description": "Compare Llama-3 70B against Mistral-Large",
  "tags": ["routing"],
  "featureRules": [],
  "banditConfig": null,
  "canaryConfig": null
}

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | A/B test name |
| strategy | string | Yes | Routing strategy: weighted_random, feature_based, multi_armed_bandit, canary |
| variants | array | Yes | Array of { variantId, modelId, weight?, isControl? }; at least 2 required |
| description | string | No | A/B test description |
| tags | array | No | Tags for organization |
| featureRules | array | No | Feature-based routing rules (for feature_based strategy) |
| banditConfig | object | No | Multi-armed bandit config: { explorationRate, rewardMetric, windowSize } |
| canaryConfig | object | No | Canary config: { initialWeight, incrementStep, successThreshold, rollbackThreshold } |

Response 201 Created

{
  "abTestId": "ab_abc123"
}
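
The required fields and the two-variant minimum can be checked on the client before the request is sent. The following is a sketch of such a check based only on the field table for this endpoint; `validate_create_payload` is a hypothetical helper, and the server may enforce further rules that are not documented here.

```python
STRATEGIES = {"weighted_random", "feature_based", "multi_armed_bandit", "canary"}


def validate_create_payload(payload):
    """Return a list of problems found in a create-A/B-test request body."""
    problems = []
    if not payload.get("name"):
        problems.append("name is required")
    if payload.get("strategy") not in STRATEGIES:
        problems.append("strategy must be one of: " + ", ".join(sorted(STRATEGIES)))
    variants = payload.get("variants") or []
    if len(variants) < 2:
        problems.append("at least 2 variants are required")
    for i, variant in enumerate(variants):
        # variantId and modelId are the non-optional keys of each variant.
        for key in ("variantId", "modelId"):
            if not variant.get(key):
                problems.append(f"variants[{i}].{key} is required")
    return problems


payload = {
    "name": "Llama vs Mistral routing",
    "strategy": "weighted_random",
    "variants": [
        {"variantId": "var_control", "modelId": "model_llama3_70b", "weight": 50, "isControl": True},
        {"variantId": "var_treatment", "modelId": "model_mistral_large", "weight": 50},
    ],
}
problems = validate_create_payload(payload)
```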

GET /api/v1/ab-tests/:id

Get full details of an A/B test.

Scope: mlops:read

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | A/B test ID |

Response 200 OK

Returns the full A/B Test object.


PUT /api/v1/ab-tests/:id

Update an A/B test's name, description, or tags.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | A/B test ID |

Request Body

{
  "name": "Updated routing test",
  "description": "Updated description",
  "tags": ["routing", "v2"]
}

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | No | New A/B test name |
| description | string | No | New description |
| tags | array | No | Updated tags |

Response 200 OK

{
  "updated": true
}

DELETE /api/v1/ab-tests/:id

Soft-delete an A/B test.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | A/B test ID |

Response 204 No Content


POST /api/v1/ab-tests/:id/deploy

Deploy an A/B test to production via the AI Gateway. Creates a routing endpoint that distributes inference requests across model variants based on the configured strategy.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | A/B test ID |

Response 200 OK

{
  "deployed": true
}

POST /api/v1/ab-tests/:id/stop

Stop a running A/B test deployment.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | A/B test ID |

Response 200 OK

{
  "stopped": true
}

POST /api/v1/ab-tests/:id/start

Restart a stopped A/B test by re-deploying it.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | A/B test ID |

Response 200 OK

{
  "started": true
}

PUT /api/v1/ab-tests/:id/variants/:sub/weight

Update the traffic weight for a specific variant.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | A/B test ID |
| sub | string | Yes | Variant ID |

Request Body

{
  "weight": 70
}

| Field | Type | Required | Description |
|---|---|---|---|
| weight | number | Yes | New weight (0–100, percentage of traffic) |

Response 200 OK

{
  "updated": true
}
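
As a sketch of a weight update call, the helper below checks the documented 0–100 range and builds the PUT request without sending it. The host and the `build_weight_update` helper are placeholders for illustration.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: placeholder host


def build_weight_update(ab_test_id, variant_id, weight, api_key):
    """Build (but do not send) the PUT request that sets a variant's weight."""
    if not 0 <= weight <= 100:
        raise ValueError("weight must be between 0 and 100")
    url = f"{BASE_URL}/api/v1/ab-tests/{ab_test_id}/variants/{variant_id}/weight"
    body = json.dumps({"weight": weight}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("X-API-Key", api_key)
    req.add_header("Content-Type", "application/json")
    return req


req = build_weight_update("ab_abc123", "var_treatment", 70, api_key="YOUR_API_KEY")
# To send it: urllib.request.urlopen(req)
```

This reference does not say whether the weights of the other variants are rebalanced automatically, so a client may need to update them in separate calls.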

PUT /api/v1/ab-tests/:id/variants/:sub/toggle

Enable or disable a variant. At least 2 variants must remain enabled.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | A/B test ID |
| sub | string | Yes | Variant ID |

Request Body

{
  "enabled": false
}

| Field | Type | Required | Description |
|---|---|---|---|
| enabled | boolean | Yes | Enable (true) or disable (false) the variant |

Response 200 OK

{
  "toggled": true
}
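
Because at least 2 variants must remain enabled, a client can check locally whether a disable request is likely to be accepted. The sketch below does this against a list of variants as returned in the A/B Test object; `can_disable` and the `var_candidate` variant are hypothetical.

```python
def can_disable(variants, variant_id):
    """Return True if disabling variant_id still leaves at least 2 enabled variants."""
    remaining = [
        v for v in variants
        if v.get("enabled", True) and v["variantId"] != variant_id
    ]
    return len(remaining) >= 2


two = [
    {"variantId": "var_control", "enabled": True},
    {"variantId": "var_treatment", "enabled": True},
]
three = two + [{"variantId": "var_candidate", "enabled": True}]

blocked = can_disable(two, "var_treatment")     # only one variant would remain
allowed = can_disable(three, "var_candidate")   # two variants would remain
```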

GET /api/v1/ab-tests/:id/metrics

Get A/B test metrics: total requests, success rate, average latency, and per-variant breakdowns.

Scope: mlops:read

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | A/B test ID |

Query Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| startDate | string | No | Filter start date (ISO 8601) |
| endDate | string | No | Filter end date (ISO 8601) |

Response 200 OK

{
  "totalRequests": 12450,
  "successRate": 0.987,
  "averageLatencyMs": 312,
  "variants": [
    {
      "variantId": "var_control",
      "requests": 6230,
      "successRate": 0.985,
      "averageLatencyMs": 320
    },
    {
      "variantId": "var_treatment",
      "requests": 6220,
      "successRate": 0.989,
      "averageLatencyMs": 304
    }
  ]
}
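
The per-variant breakdown can be turned into traffic shares and compared against the top-level figures. The sketch below assumes the top-level success rate and latency are request-weighted averages of the per-variant values; the sample numbers are consistent with that reading, but this reference does not state it. `summarize` is a hypothetical helper.

```python
metrics = {
    "totalRequests": 12450,
    "successRate": 0.987,
    "averageLatencyMs": 312,
    "variants": [
        {"variantId": "var_control", "requests": 6230, "successRate": 0.985, "averageLatencyMs": 320},
        {"variantId": "var_treatment", "requests": 6220, "successRate": 0.989, "averageLatencyMs": 304},
    ],
}


def summarize(metrics):
    """Compute traffic share per variant and request-weighted overall figures."""
    variants = metrics["variants"]
    total = sum(v["requests"] for v in variants)
    share = {v["variantId"]: v["requests"] / total for v in variants}
    success = sum(v["requests"] * v["successRate"] for v in variants) / total
    latency = sum(v["requests"] * v["averageLatencyMs"] for v in variants) / total
    return share, success, latency


share, success, latency = summarize(metrics)
```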

GET /api/v1/ab-tests/:id/predictions

Get prediction logs routed through this A/B test.

Scope: mlops:read

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | A/B test ID |

Query Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| limit | integer | No | Max results |
| offset | integer | No | Pagination offset |
| variantId | string | No | Filter by specific variant |

Response 200 OK

{
  "count": 50,
  "predictions": [
    {
      "predictionId": "pred_abc001",
      "variantId": "var_control",
      "modelId": "model_llama3_70b",
      "latencyMs": 310,
      "reward": 0.92,
      "createdAt": "2025-02-01T14:22:00Z"
    }
  ]
}
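
The `limit` and `offset` parameters allow a client to walk the prediction log page by page. The sketch below shows one way to do that with an injected `fetch_page` function, so it runs without a network connection. The stopping rule (a page shorter than `limit` ends the walk) is an assumption, since this reference does not describe how the end of the log is signaled.

```python
def iter_predictions(fetch_page, limit=100):
    """Yield every prediction by walking pages with limit/offset.

    fetch_page(limit, offset) must return the parsed JSON response body.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        predictions = page.get("predictions", [])
        yield from predictions
        # Assumption: a page shorter than `limit` means there are no more results.
        if len(predictions) < limit:
            break
        offset += limit


# Stand-in for a real HTTP call, so the sketch runs offline.
_fake_log = [{"predictionId": f"pred_{i:03d}"} for i in range(5)]


def fake_fetch(limit, offset):
    chunk = _fake_log[offset:offset + limit]
    return {"count": len(chunk), "predictions": chunk}


all_ids = [p["predictionId"] for p in iter_predictions(fake_fetch, limit=2)]
```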

POST /api/v1/ab-tests/predictions/:id/feedback

Record feedback or a reward for a routed prediction. Essential for the multi-armed-bandit strategy.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Prediction ID |

Request Body

{
  "reward": 0.92,
  "label": "correct"
}

| Field | Type | Required | Description |
|---|---|---|---|
| reward | number | No | Reward score (0–1 for bandit optimization) |
| label | string | No | Feedback label (e.g. correct/incorrect) |

Response 200 OK

{
  "recorded": true
}
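
As a sketch of preparing a feedback call, the helpers below build the request path and body and check the documented 0–1 reward range. Both fields are optional in the table above; rejecting a body with neither is an assumption of this sketch, not documented server behavior. `feedback_body` and `feedback_path` are hypothetical helpers.

```python
def feedback_body(reward=None, label=None):
    """Build the feedback request body, keeping only the fields that were given."""
    if reward is None and label is None:
        # Assumption: feedback with neither field is not useful to send.
        raise ValueError("provide a reward, a label, or both")
    body = {}
    if reward is not None:
        if not 0 <= reward <= 1:
            raise ValueError("reward must be between 0 and 1")
        body["reward"] = reward
    if label is not None:
        body["label"] = label
    return body


def feedback_path(prediction_id):
    # The path parameter here is the prediction ID, not the A/B test ID.
    return f"/api/v1/ab-tests/predictions/{prediction_id}/feedback"


path = feedback_path("pred_abc001")
body = feedback_body(reward=0.92, label="correct")
```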

POST /api/v1/ab-tests/:id/experiments

Create a formal A/B testing experiment. Defines a control variant (baseline) and treatment variants to compare with statistical significance testing.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | A/B test ID (used as routerId) |

Request Body

{
  "name": "Latency reduction experiment",
  "controlVariantId": "var_control",
  "treatmentVariantIds": ["var_treatment"],
  "description": "Test if Mistral reduces latency vs Llama-3",
  "hypothesis": "Mistral-Large will reduce p95 latency by 15%",
  "primaryMetric": "latency",
  "confidenceLevel": 0.95,
  "minimumDetectableEffect": 0.05,
  "minSamplePerVariant": 1000
}

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Experiment name |
| controlVariantId | string | Yes | Control (baseline) variant ID |
| treatmentVariantIds | array | Yes | Array of treatment variant IDs to compare against control |
| description | string | No | Experiment description |
| hypothesis | string | No | What you expect to happen |
| primaryMetric | string | No | Primary metric to measure (e.g. latency, accuracy, reward) |
| confidenceLevel | number | No | Statistical confidence level (default 0.95) |
| minimumDetectableEffect | number | No | Minimum effect size to detect |
| minSamplePerVariant | number | No | Minimum samples per variant before analysis |

Response 201 Created

{
  "experimentId": "exp_abc001"
}

POST /api/v1/ab-tests/experiments/:id/start

Start collecting data for an A/B testing experiment.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Experiment ID |

Response 200 OK

{
  "experimentId": "exp_abc001",
  "status": "running",
  "startedAt": "2025-02-01T14:22:00Z"
}

POST /api/v1/ab-tests/experiments/:id/analyze

Run statistical analysis on an experiment. Returns p-values, confidence intervals, and whether the treatment outperforms the control.

Scope: mlops:read

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Experiment ID |

Response 200 OK

{
  "experimentId": "exp_abc001",
  "controlMean": 320.1,
  "treatmentMean": 304.5,
  "pValue": 0.013,
  "confidenceInterval": [-22.4, -8.7],
  "significant": true,
  "winner": "var_treatment"
}
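
One way to read this response is to cross-check its fields against each other. The sketch below assumes that `confidenceInterval` describes the difference treatment minus control and that significance corresponds to a p-value below 1 minus the confidence level. The sample numbers are consistent with both readings, but neither is stated in this reference. `interpret` is a hypothetical helper.

```python
def interpret(analysis, confidence_level=0.95):
    """Cross-check an analysis response against its own fields."""
    alpha = 1 - confidence_level
    low, high = analysis["confidenceInterval"]
    return {
        # Negative means the treatment's mean is lower than the control's.
        "difference": round(analysis["treatmentMean"] - analysis["controlMean"], 6),
        "p_below_alpha": analysis["pValue"] < alpha,
        "interval_excludes_zero": low > 0 or high < 0,
    }


analysis = {
    "experimentId": "exp_abc001",
    "controlMean": 320.1,
    "treatmentMean": 304.5,
    "pValue": 0.013,
    "confidenceInterval": [-22.4, -8.7],
    "significant": True,
    "winner": "var_treatment",
}
result = interpret(analysis)
```

For a latency metric, a negative difference favors the treatment, which matches the `winner` field in the sample response.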

POST /api/v1/ab-tests/experiments/:id/stop

Stop a running experiment. Freezes data collection.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Experiment ID |

Response 200 OK

{
  "experimentId": "exp_abc001",
  "status": "stopped",
  "stoppedAt": "2025-02-05T10:00:00Z"
}

DELETE /api/v1/ab-tests/experiments/:id

Delete an A/B testing experiment and its data.

Scope: mlops:write

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Experiment ID |

Response 204 No Content
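
The experiment endpoints use two path shapes: creation is nested under the A/B test, while the remaining calls address the experiment directly. The sketch below lists the calls for one experiment to make that difference explicit. The ordering is one plausible sequence rather than a documented requirement, and `experiment_calls` is a hypothetical helper.

```python
def experiment_calls(ab_test_id, experiment_id):
    """List (method, path) pairs for one experiment, from creation to deletion."""
    base = "/api/v1/ab-tests"
    exp = f"{base}/experiments/{experiment_id}"
    return [
        ("POST", f"{base}/{ab_test_id}/experiments"),  # create under the A/B test
        ("POST", f"{exp}/start"),    # begin collecting data
        ("POST", f"{exp}/analyze"),  # run the statistical analysis
        ("POST", f"{exp}/stop"),     # freeze data collection
        ("DELETE", exp),             # delete the experiment and its data
    ]


calls = experiment_calls("ab_abc123", "exp_abc001")
```

In practice the experiment ID is only known once the create call has returned its `experimentId`.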