A/B Testing Routers
A/B testing routers split traffic between multiple model variants, compare their performance, and enable data-driven decisions about which model to deploy to production. Routers are deployed to Kubernetes via the AI Gateway and serve predictions through a dedicated endpoint.
Why Use A/B Testing?
Machine learning models need continuous validation in production. A/B testing helps you:
- Validate Model Changes -- Test a new model against your current production model with real traffic before fully replacing it
- Measure Impact -- Quantify differences in accuracy, latency, and business metrics between model versions
- Reduce Risk -- Gradually roll out new models rather than replacing everything at once
- Optimize Continuously -- Use multi-armed bandit algorithms to automatically route more traffic to better-performing models
Routing Strategies
Weighted Random
Split traffic randomly based on configured percentages. Weights are proportional -- a variant with weight 70 gets 70% of traffic relative to the total weight of all enabled variants.
If all weights are zero, traffic is distributed equally across all enabled variants.
Best for: Standard A/B tests where you want precise, controlled traffic allocation.
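The weighted random behavior described above (proportional weights, equal split when all weights are zero) can be sketched as follows. The variant fields (variantId, weight, isEnabled) mirror this document; the selection function itself is an illustrative assumption, not the gateway's actual implementation.

```javascript
// Sketch of proportional weighted selection over enabled variants.
function pickWeighted(variants) {
  const enabled = variants.filter((v) => v.isEnabled);
  const total = enabled.reduce((sum, v) => sum + (v.weight || 0), 0);
  // If all weights are zero, fall back to a uniform split.
  if (total === 0) {
    return enabled[Math.floor(Math.random() * enabled.length)];
  }
  let roll = Math.random() * total;
  for (const v of enabled) {
    roll -= v.weight || 0;
    if (roll < 0) return v;
  }
  return enabled[enabled.length - 1]; // guard against float rounding
}
```

A variant with weight 70 alongside one with weight 30 receives roughly 70% of requests; disabled variants are excluded before the total weight is computed.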
Feature-Based
Route requests based on input feature values. Rules are evaluated in priority order (lower number = higher priority). The first matching rule determines which variant handles the request. If no rule matches, the request goes to the control variant.
Supported operators:
| Operator | Description | Example |
|---|---|---|
| equals | Exact match | region equals "us-east" |
| not_equals | Not equal | tier not_equals "free" |
| in | Value is in array | country in ["US", "CA", "UK"] |
| not_in | Value is not in array | status not_in ["blocked", "suspended"] |
| greater_than | Numeric greater than | age greater_than 18 |
| less_than | Numeric less than | score less_than 0.5 |
| contains | String contains | email contains "@company.com" |
| regex | Regular expression match | sku regex "^PREMIUM-.*" |
Each rule specifies:
| Field | Description |
|---|---|
| ruleId | Unique identifier for the rule |
| featureName | Input feature to evaluate |
| operator | Comparison operator |
| value | Value to compare against |
| targetVariantId | Route to this variant if rule matches |
| priority | Evaluation order (lower = higher priority) |
Best for: Segment-specific testing, personalized model selection, routing premium customers to specialized models.
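The evaluation order described above (sort by priority, first match wins, control as fallback) can be sketched as below. The rule fields follow the table in this section; the operator implementations are assumptions consistent with the operator table.

```javascript
// Operator implementations matching the supported-operators table.
const OPS = {
  equals: (a, b) => a === b,
  not_equals: (a, b) => a !== b,
  in: (a, b) => b.includes(a),
  not_in: (a, b) => !b.includes(a),
  greater_than: (a, b) => a > b,
  less_than: (a, b) => a < b,
  contains: (a, b) => String(a).includes(b),
  regex: (a, b) => new RegExp(b).test(String(a)),
};

// Evaluate rules in priority order (lower number = higher priority);
// the first matching rule wins, otherwise fall back to the control.
function routeByFeatures(features, rules, controlVariantId) {
  const ordered = [...rules].sort((x, y) => x.priority - y.priority);
  for (const rule of ordered) {
    const op = OPS[rule.operator];
    if (op && op(features[rule.featureName], rule.value)) {
      return rule.targetVariantId;
    }
  }
  return controlVariantId; // no rule matched
}
```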
Multi-Armed Bandit
Automatically balance exploration (testing variants) and exploitation (routing to the best performer). The algorithm learns which variants perform best and adjusts traffic allocation over time based on reward feedback.
Algorithms available:
| Algorithm | How It Works |
|---|---|
| Epsilon-Greedy | Routes most traffic to the best-performing variant. With probability epsilon (default 0.1), explores a random variant instead. Variants without any stats are explored first. |
| Thompson Sampling | Uses Bayesian probability with a Beta distribution. Samples from each variant's posterior distribution and selects the variant with the highest sample. Starts with an uninformed prior (Beta(1,1) = uniform). |
| UCB1 | Upper Confidence Bound algorithm. Selects the variant with the highest UCB score: avg_reward + c * sqrt(ln(total_pulls) / arm_pulls). Unpulled arms have infinite UCB and are explored first. The exploration bonus c (default 2.0) controls the exploration-exploitation tradeoff. |
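The UCB1 scoring rule from the table (avg_reward + c * sqrt(ln(total_pulls) / arm_pulls), with unpulled arms scored as infinite) can be sketched as follows. The arm fields (pulls, avgReward) match the arm-statistics description later in this document; the selection function is an illustrative assumption.

```javascript
// Sketch of UCB1 selection: pick the arm with the highest UCB score.
function selectUcb1(arms, c = 2.0) {
  const totalPulls = arms.reduce((sum, a) => sum + a.pulls, 0);
  let best = null;
  let bestScore = -Infinity;
  for (const arm of arms) {
    // Unpulled arms have infinite UCB and are explored first.
    const score = arm.pulls === 0
      ? Infinity
      : arm.avgReward + c * Math.sqrt(Math.log(totalPulls) / arm.pulls);
    if (score > bestScore) {
      bestScore = score;
      best = arm;
    }
  }
  return best;
}
```

Note how the exploration bonus favors lightly-sampled arms: an arm with few pulls gets a larger sqrt term, so it keeps being tried until its estimate is trustworthy.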
Bandit configuration:
| Field | Default | Description |
|---|---|---|
| algorithm | epsilon_greedy | Which MAB algorithm to use |
| epsilon | 0.1 | Exploration rate for epsilon-greedy (0-1) |
| explorationBonus | 2.0 | Exploration bonus constant for UCB1 |
| rewardMetric | success_rate | Metric to optimize: latency, success_rate, or custom |
| updateIntervalMinutes | 60 | How often to recalculate arm statistics |
Best for: Continuous optimization when you want to minimize exposure to poor-performing variants while still discovering better ones.
Canary
Gradually roll out a new model variant by routing a configurable percentage of traffic to the canary while the rest goes to the control variant. This strategy is designed for safe, incremental deployments.
Canary configuration:
| Field | Default | Description |
|---|---|---|
| targetVariantId | Second variant | The canary variant receiving incremental traffic |
| controlVariantId | First variant | The stable control variant |
| currentPercentage | 5 | Current percentage of traffic going to canary |
| targetPercentage | 100 | Target percentage for full rollout |
| incrementPercentage | 10 | How much to increase per interval |
| incrementIntervalMinutes | 60 | How often to increment the percentage |
| errorRateThreshold | 0.05 | Maximum acceptable error rate before rollback |
| latencyThresholdMs | 5000 | Maximum acceptable latency before rollback |
Best for: Safe rollouts of new model versions where you want to gradually increase traffic while monitoring for regressions.
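One evaluation tick of the canary loop described above could look like the sketch below. The configuration fields come from the table in this section; the metrics shape and the rollback behavior (dropping the canary to 0%) are assumptions for illustration.

```javascript
// Sketch of a single canary evaluation interval: roll back on a
// threshold breach, otherwise step toward the target percentage.
function evaluateCanary(config, metrics) {
  if (metrics.errorRate > config.errorRateThreshold ||
      metrics.avgLatencyMs > config.latencyThresholdMs) {
    return { ...config, currentPercentage: 0, rolledBack: true };
  }
  const next = Math.min(
    config.currentPercentage + config.incrementPercentage,
    config.targetPercentage
  );
  return { ...config, currentPercentage: next, rolledBack: false };
}
```

With the defaults (5% start, 10% increments every 60 minutes), a healthy canary reaches 100% in roughly ten intervals; a single threshold breach halts the rollout.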
Creating a Router
Via the UI
- Navigate to MLOps then A/B Testing Routers
- Click Create Router
- Enter a name and optional description and tags
- Select a routing strategy
- Add model variants (minimum 2 required):
- Assign variant IDs (e.g., "control", "treatment_a")
- Select models from the Model Registry
- Set traffic weights (for weighted random)
- Mark one variant as the control
- Configure strategy-specific settings:
- Weighted Random -- Set weights per variant (0-100)
- Feature-Based -- Define routing rules with operators and priorities
- Multi-Armed Bandit -- Choose algorithm and parameters
- Canary -- Configure target/control variants and rollout settings
- Click Create Router
Router Methods
routers.create({
name,
description,
tags,
strategy,
variants,
featureRules,
banditConfig,
canaryConfig,
workspaceId
})
Variant structure:
| Field | Required | Description |
|---|---|---|
| variantId | Yes | Unique identifier (e.g., "control", "treatment_a") |
| modelId | Yes | Reference to model in Model Registry |
| weight | No | Traffic weight 0-100 (default: equal distribution) |
| isControl | No | Whether this is the control variant (default: first variant) |
Variants are automatically initialized with isEnabled: true, isShadow: false, and zero metrics.
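A create payload might look like the sketch below. The field names follow the tables above, but the specific strategy string ("weighted_random"), model IDs, and the commented-out client call are assumptions for illustration.

```javascript
// Hypothetical router configuration for a two-variant weighted test.
const routerConfig = {
  name: "churn-model-ab-test",
  description: "Compare current churn model against retrained candidate",
  tags: ["churn", "experiment"],
  strategy: "weighted_random", // assumed strategy identifier
  variants: [
    { variantId: "control", modelId: "churn-v1", weight: 80, isControl: true },
    { variantId: "treatment_a", modelId: "churn-v2", weight: 20 },
  ],
};
// const router = await routers.create(routerConfig);
```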
Managing Variants
Adjusting Traffic Weights
routers.updateVariantWeight(routerId, variantId, weight)
Weight must be between 0 and 100. Changes take effect immediately for subsequent requests. This is only meaningful for the weighted random strategy.
Enabling and Disabling Variants
routers.toggleVariant(routerId, variantId, enabled)
Toggle a variant's enabled status. At least 2 variants must remain enabled at all times. When a variant is disabled, its traffic is redistributed among the remaining enabled variants.
Deployment Operations
Deploying a Router
routers.deploy(routerId)
Deploys the router to Kubernetes via the AI Gateway service:
- Router status changes to deploying
- Configuration is sent to the AI Gateway backend with organization ID for namespace isolation
- On success, status changes to running and the endpoint URL is stored
- On failure, status changes to failed with error details in the deployment logs
The router is deployed with default resources:
- CPU: 500m
- Memory: 512Mi
- Replicas: 1
Stopping a Router
routers.stop(routerId)
Stops the router deployment while preserving all configuration and historical data. Status changes to stopped.
Starting a Stopped Router
routers.start(routerId)
Re-deploys a stopped router. The router must be in stopped status.
Deleting a Router
routers.delete(routerId)
Performs a soft delete -- sets deleted: true and deletedAt timestamp. If the router is currently running, it is stopped first. Deleted routers are excluded from list queries.
Metrics and Predictions
Router Metrics
routers.getMetrics(routerId, { startDate, endDate })
Returns aggregated metrics calculated from the router_predictions collection:
| Metric | Description |
|---|---|
| totalRequests | Total prediction requests processed |
| successCount | Number of successful predictions |
| errorCount | Number of failed predictions |
| successRate | Ratio of successes to total requests |
| avgLatencyMs | Average total latency across all requests |
| variantMetrics | Per-variant breakdown of requests, success rate, and latency |
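The aggregation these metrics imply can be sketched from prediction log entries. The entry shape (metrics.success, metrics.totalLatencyMs) follows the prediction-history table below; the aggregation function itself is an assumption.

```javascript
// Sketch of deriving router-level metrics from prediction log entries.
function aggregateMetrics(predictions) {
  const totalRequests = predictions.length;
  const successCount = predictions.filter((p) => p.metrics.success).length;
  const totalLatency = predictions.reduce(
    (sum, p) => sum + p.metrics.totalLatencyMs, 0);
  return {
    totalRequests,
    successCount,
    errorCount: totalRequests - successCount,
    successRate: totalRequests ? successCount / totalRequests : 0,
    avgLatencyMs: totalRequests ? totalLatency / totalRequests : 0,
  };
}
```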
Prediction History
routers.getPredictions(routerId, { limit, offset, variantId })
Returns prediction log entries sorted by creation date (newest first). Each entry includes:
| Field | Description |
|---|---|
| predictionId | Unique prediction identifier |
| variantId | Which variant handled the request |
| modelId | Which model served the prediction |
| routingDecision | Strategy used and reason for routing (e.g., "epsilon_greedy: exploit") |
| request | Input features and timestamp |
| response | Prediction result, probabilities, and confidence |
| metrics | Routing time, model latency, total latency, success status |
| entityId | Business entity ID for ground truth matching |
| feedback | Reward feedback if provided (for MAB) |
Prediction logs have a 90-day TTL and are automatically cleaned up.
Feedback for Bandit Optimization
For multi-armed bandit routing to optimize traffic allocation, you need to provide reward feedback:
routers.recordFeedback(predictionId, { reward, label })
| Field | Description |
|---|---|
| reward | Numeric reward value (e.g., 1.0 for success, 0.0 for failure) |
| label | Optional label for the feedback (e.g., "correct", "incorrect") |
When reward feedback is recorded:
- The prediction's feedback record is updated with the reward, label, timestamp, and source
- If the router uses a bandit strategy, the arm statistics for the variant are updated:
  - pulls is incremented
  - totalReward accumulates the reward value
  - avgReward is recalculated as totalReward / pulls
This feedback loop drives the bandit algorithm's learning -- variants that receive higher rewards are selected more frequently.
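The arm-statistics update described above can be sketched directly. The arm fields (pulls, totalReward, avgReward) match this section; updating the arm in place is an illustrative choice.

```javascript
// Sketch of the arm update applied when reward feedback is recorded.
function recordReward(arm, reward) {
  arm.pulls += 1;                          // pulls is incremented
  arm.totalReward += reward;               // totalReward accumulates
  arm.avgReward = arm.totalReward / arm.pulls; // avgReward recalculated
  return arm;
}
```

Over many feedback events, avgReward converges toward each variant's true success rate, which is what the bandit's selection step compares.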
Permissions
Updating Access
routers.updatePermissions(routerId, { isPublic, sharedWith })
| Field | Description |
|---|---|
| isPublic | If true, all authenticated users can view the router |
| sharedWith | Array of user IDs who can view the router |
Only the owner and administrators can modify router permissions.
Access Rules
- Admins see all routers (excluding soft-deleted ones)
- Owners have full access to their own routers
- Shared users can view routers shared with them
- Public routers are visible to all authenticated users
- Only owners and admins can modify, deploy, stop, or delete routers
Collections and Indexes
Routers Collection
| Index | Purpose |
|---|---|
| access.owner | Find routers by owner |
| deployment.status | Filter by deployment status |
| workspaceId | Find routers in a workspace |
| createdAt (desc) | Sort by creation date |
| access.sharedWith | Find shared routers |
| deleted | Exclude soft-deleted routers |
Router Predictions Collection
| Index | Purpose |
|---|---|
| routerId, createdAt (desc) | Query predictions for a router |
| predictionId (unique) | Look up predictions by ID |
| entityId, routerId | Match ground truth to predictions |
| variantId, routerId | Filter predictions by variant |
| createdAt (TTL: 90 days) | Automatic cleanup of old predictions |
Router Experiments Collection
| Index | Purpose |
|---|---|
| routerId | Find experiments for a router |
| status | Filter by experiment status |
| access.owner | Find experiments by owner |
| createdAt (desc) | Sort by creation date |
Best Practices
Designing Tests
- Define success metrics before creating a router -- know what you are measuring
- Start with small traffic percentages (5-10%) to the new variant before increasing
- Run tests long enough to achieve statistically significant sample sizes
- Document hypotheses for each test to track what you learned
Choosing a Strategy
- Use weighted random for standard A/B tests with precise traffic control
- Use feature-based when different user segments should see different models
- Use multi-armed bandit when you want to minimize exposure to underperforming variants and automatically optimize
- Use canary for cautious rollouts of new model versions with automatic rollback thresholds
Monitoring
- Watch for increased error rates or latency spikes after deploying a new variant
- Monitor per-variant metrics to compare performance across all dimensions
- For bandit routing, track arm statistics to verify the algorithm is converging on the best variant
- Set up canary thresholds to automatically detect regressions
After Testing
- Once a test concludes, deploy the winning variant as the sole model
- Clean up stopped routers to keep the list manageable
- Archive test results and learnings for future reference