A/B Testing Routers

A/B testing routers split traffic between multiple model variants, compare their performance, and enable data-driven decisions about which model to deploy to production. Routers are deployed to Kubernetes via the AI Gateway and serve predictions through a dedicated endpoint.

Why Use A/B Testing?

Machine learning models need continuous validation in production. A/B testing helps you:

  • Validate Model Changes -- Test a new model against your current production model with real traffic before fully replacing it
  • Measure Impact -- Quantify differences in accuracy, latency, and business metrics between model versions
  • Reduce Risk -- Gradually roll out new models rather than replacing everything at once
  • Optimize Continuously -- Use multi-armed bandit algorithms to automatically route more traffic to better-performing models

Routing Strategies

Weighted Random

Split traffic randomly based on configured percentages. Weights are proportional -- a variant with weight 70 gets 70% of traffic relative to the total weight of all enabled variants.

If all weights are zero, traffic is distributed equally across all enabled variants.

Best for: Standard A/B tests where you want precise, controlled traffic allocation.

Feature-Based

Route requests based on input feature values. Rules are evaluated in priority order (lower number = higher priority). The first matching rule determines which variant handles the request. If no rule matches, the request goes to the control variant.

Supported operators:

| Operator | Description | Example |
| --- | --- | --- |
| equals | Exact match | region equals "us-east" |
| not_equals | Not equal | tier not_equals "free" |
| in | Value is in array | country in ["US", "CA", "UK"] |
| not_in | Value is not in array | status not_in ["blocked", "suspended"] |
| greater_than | Numeric greater than | age greater_than 18 |
| less_than | Numeric less than | score less_than 0.5 |
| contains | String contains | email contains "@company.com" |
| regex | Regular expression match | sku regex "^PREMIUM-.*" |

Each rule specifies:

| Field | Description |
| --- | --- |
| ruleId | Unique identifier for the rule |
| featureName | Input feature to evaluate |
| operator | Comparison operator |
| value | Value to compare against |
| targetVariantId | Route to this variant if rule matches |
| priority | Evaluation order (lower = higher priority) |

Best for: Segment-specific testing, personalized model selection, routing premium customers to specialized models.

Multi-Armed Bandit

Automatically balance exploration (testing variants) and exploitation (routing to the best performer). The algorithm learns which variants perform best and adjusts traffic allocation over time based on reward feedback.

Algorithms available:

| Algorithm | How It Works |
| --- | --- |
| Epsilon-Greedy | Routes most traffic to the best-performing variant. With probability epsilon (default 0.1), explores a random variant instead. Variants without any stats are explored first. |
| Thompson Sampling | Uses Bayesian probability with a Beta distribution. Samples from each variant's posterior distribution and selects the variant with the highest sample. Starts with an uninformed prior (Beta(1,1) = uniform). |
| UCB1 | Upper Confidence Bound algorithm. Selects the variant with the highest UCB score: avg_reward + c * sqrt(ln(total_pulls) / arm_pulls). Unpulled arms have infinite UCB and are explored first. The exploration bonus c (default 2.0) controls the exploration-exploitation tradeoff. |

Bandit configuration:

| Field | Default | Description |
| --- | --- | --- |
| algorithm | epsilon_greedy | Which MAB algorithm to use |
| epsilon | 0.1 | Exploration rate for epsilon-greedy (0-1) |
| explorationBonus | 2.0 | Exploration bonus constant for UCB1 |
| rewardMetric | success_rate | Metric to optimize: latency, success_rate, or custom |
| updateIntervalMinutes | 60 | How often to recalculate arm statistics |

Best for: Continuous optimization when you want to minimize exposure to poor-performing variants while still discovering better ones.

Canary

Gradually roll out a new model variant by routing a configurable percentage of traffic to the canary while the rest goes to the control variant. This strategy is designed for safe, incremental deployments.

Canary configuration:

| Field | Default | Description |
| --- | --- | --- |
| targetVariantId | Second variant | The canary variant receiving incremental traffic |
| controlVariantId | First variant | The stable control variant |
| currentPercentage | 5 | Current percentage of traffic going to canary |
| targetPercentage | 100 | Target percentage for full rollout |
| incrementPercentage | 10 | How much to increase per interval |
| incrementIntervalMinutes | 60 | How often to increment the percentage |
| errorRateThreshold | 0.05 | Maximum acceptable error rate before rollback |
| latencyThresholdMs | 5000 | Maximum acceptable latency before rollback |

Best for: Safe rollouts of new model versions where you want to gradually increase traffic while monitoring for regressions.

Creating a Router

Via the UI

  1. Navigate to MLOps then A/B Testing Routers
  2. Click Create Router
  3. Enter a name, plus an optional description and tags
  4. Select a routing strategy
  5. Add model variants (minimum 2 required):
    • Assign variant IDs (e.g., "control", "treatment_a")
    • Select models from the Model Registry
    • Set traffic weights (for weighted random)
    • Mark one variant as the control
  6. Configure strategy-specific settings:
    • Weighted Random -- Set weights per variant (0-100)
    • Feature-Based -- Define routing rules with operators and priorities
    • Multi-Armed Bandit -- Choose algorithm and parameters
    • Canary -- Configure target/control variants and rollout settings
  7. Click Create Router

Router Methods

routers.create({
  name,
  description,
  tags,
  strategy,
  variants,
  featureRules,
  banditConfig,
  canaryConfig,
  workspaceId
})

Variant structure:

| Field | Required | Description |
| --- | --- | --- |
| variantId | Yes | Unique identifier (e.g., "control", "treatment_a") |
| modelId | Yes | Reference to model in Model Registry |
| weight | No | Traffic weight 0-100 (default: equal distribution) |
| isControl | No | Whether this is the control variant (default: first variant) |

Variants are automatically initialized with isEnabled: true, isShadow: false, and zero metrics.

Managing Variants

Adjusting Traffic Weights

routers.updateVariantWeight(routerId, variantId, weight)

Weight must be between 0 and 100. Changes take effect immediately for subsequent requests. This is only meaningful for the weighted random strategy.

Enabling and Disabling Variants

routers.toggleVariant(routerId, variantId, enabled)

Toggle a variant's enabled status. At least 2 variants must remain enabled at all times. When a variant is disabled, its traffic is redistributed among the remaining enabled variants.

Deployment Operations

Deploying a Router

routers.deploy(routerId)

Deploys the router to Kubernetes via the AI Gateway service:

  1. Router status changes to deploying
  2. Configuration is sent to the AI Gateway backend with organization ID for namespace isolation
  3. On success, status changes to running and the endpoint URL is stored
  4. On failure, status changes to failed with error details in the deployment logs

The router is deployed with default resources:

  • CPU: 500m
  • Memory: 512Mi
  • Replicas: 1

Stopping a Router

routers.stop(routerId)

Stops the router deployment while preserving all configuration and historical data. Status changes to stopped.

Starting a Stopped Router

routers.start(routerId)

Re-deploys a stopped router. The router must be in stopped status.

Deleting a Router

routers.delete(routerId)

Performs a soft delete -- sets deleted: true and deletedAt timestamp. If the router is currently running, it is stopped first. Deleted routers are excluded from list queries.

Metrics and Predictions

Router Metrics

routers.getMetrics(routerId, { startDate, endDate })

Returns aggregated metrics calculated from the router_predictions collection:

| Metric | Description |
| --- | --- |
| totalRequests | Total prediction requests processed |
| successCount | Number of successful predictions |
| errorCount | Number of failed predictions |
| successRate | Ratio of successes to total requests |
| avgLatencyMs | Average total latency across all requests |
| variantMetrics | Per-variant breakdown of requests, success rate, and latency |

Prediction History

routers.getPredictions(routerId, { limit, offset, variantId })

Returns prediction log entries sorted by creation date (newest first). Each entry includes:

| Field | Description |
| --- | --- |
| predictionId | Unique prediction identifier |
| variantId | Which variant handled the request |
| modelId | Which model served the prediction |
| routingDecision | Strategy used and reason for routing (e.g., "epsilon_greedy: exploit") |
| request | Input features and timestamp |
| response | Prediction result, probabilities, and confidence |
| metrics | Routing time, model latency, total latency, success status |
| entityId | Business entity ID for ground truth matching |
| feedback | Reward feedback if provided (for MAB) |

Prediction logs have a 90-day TTL and are automatically cleaned up.

Feedback for Bandit Optimization

For multi-armed bandit routing to optimize traffic allocation, you need to provide reward feedback:

routers.recordFeedback(predictionId, { reward, label })

| Field | Description |
| --- | --- |
| reward | Numeric reward value (e.g., 1.0 for success, 0.0 for failure) |
| label | Optional label for the feedback (e.g., "correct", "incorrect") |

When reward feedback is recorded:

  1. The prediction's feedback record is updated with the reward, label, timestamp, and source
  2. If the router uses a bandit strategy, the arm statistics for the variant are updated:
    • pulls is incremented
    • totalReward accumulates the reward value
    • avgReward is recalculated as totalReward / pulls

This feedback loop drives the bandit algorithm's learning -- variants that receive higher rewards are selected more frequently.

Permissions

Updating Access

routers.updatePermissions(routerId, { isPublic, sharedWith })

| Field | Description |
| --- | --- |
| isPublic | If true, all authenticated users can view the router |
| sharedWith | Array of user IDs who can view the router |

Only the owner and administrators can modify router permissions.

Access Rules

  • Admins see all routers (excluding soft-deleted ones)
  • Owners have full access to their own routers
  • Shared users can view routers shared with them
  • Public routers are visible to all authenticated users
  • Only owners and admins can modify, deploy, stop, or delete routers

Collections and Indexes

Routers Collection

| Index | Purpose |
| --- | --- |
| access.owner | Find routers by owner |
| deployment.status | Filter by deployment status |
| workspaceId | Find routers in a workspace |
| createdAt (desc) | Sort by creation date |
| access.sharedWith | Find shared routers |
| deleted | Exclude soft-deleted routers |

Router Predictions Collection

| Index | Purpose |
| --- | --- |
| routerId, createdAt (desc) | Query predictions for a router |
| predictionId (unique) | Look up predictions by ID |
| entityId, routerId | Match ground truth to predictions |
| variantId, routerId | Filter predictions by variant |
| createdAt (TTL: 90 days) | Automatic cleanup of old predictions |

Router Experiments Collection

| Index | Purpose |
| --- | --- |
| routerId | Find experiments for a router |
| status | Filter by experiment status |
| access.owner | Find experiments by owner |
| createdAt (desc) | Sort by creation date |

Best Practices

Designing Tests

  1. Define success metrics before creating a router -- know what you are measuring
  2. Start with small traffic percentages (5-10%) to the new variant before increasing
  3. Run tests long enough to achieve statistically significant sample sizes
  4. Document hypotheses for each test to track what you learned

Choosing a Strategy

  • Use weighted random for standard A/B tests with precise traffic control
  • Use feature-based when different user segments should see different models
  • Use multi-armed bandit when you want to minimize exposure to underperforming variants and automatically optimize
  • Use canary for cautious rollouts of new model versions with automatic rollback thresholds

Monitoring

  • Watch for increased error rates or latency spikes after deploying a new variant
  • Monitor per-variant metrics to compare performance across all dimensions
  • For bandit routing, track arm statistics to verify the algorithm is converging on the best variant
  • Set up canary thresholds to automatically detect regressions

After Testing

  • Once a test concludes, deploy the winning variant as the sole model
  • Clean up stopped routers to keep the list manageable
  • Archive test results and learnings for future reference