MLOps
The MLOps module provides production machine learning operations: A/B testing routers for splitting traffic between model variants, drift detection for monitoring model performance over time, and ground truth integration for measuring real-world accuracy.
Overview
Core Capabilities
- A/B Testing Routers -- Split traffic between model variants using weighted random, feature-based, multi-armed bandit, or canary routing strategies
- Drift Detection -- Monitor changes in production data distributions compared to training data baselines using PSI, KS test, Chi-Square, and Jensen-Shannon divergence
- Ground Truth Integration -- Upload actual outcomes to track prediction accuracy over time and detect concept drift
- Performance History -- Track model accuracy trends with configurable granularity (hourly, daily, weekly)
- Alert Configuration -- Set drift thresholds and notification preferences per model
Architecture
Backend Services
| Service | Description |
|---|---|
| Model Router | Backend service that implements routing strategies and serves predictions |
| AI Gateway | Handles router deployment and management via Kubernetes |
A/B Testing Routers
Compare model variants in production with traffic splitting across four routing strategies:
Routing Strategies
| Strategy | Description |
|---|---|
| Weighted Random | Split traffic by percentage between variants. A variant with weight 70 gets 70% of traffic relative to total enabled variant weight. |
| Feature-Based | Route requests based on input feature values using priority-ordered rules with operators (equals, not_equals, in, not_in, greater_than, less_than, contains, regex). |
| Multi-Armed Bandit | Automatically balance exploration and exploitation to route more traffic to better-performing variants. Supports epsilon-greedy, Thompson sampling, and UCB1 algorithms. |
| Canary | Gradually roll out a new model variant by routing a configurable percentage of traffic to the canary while the rest goes to the control variant. |
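As an illustration of the weighted random strategy, variant selection can be sketched as follows. This is a minimal sketch, not the router's actual implementation; the variant schema (`name`, `weight`, `enabled`) and function name are hypothetical:

```python
import random

def pick_variant(variants, rng=random):
    """Pick a variant by weighted random selection.

    Weights are normalized over enabled variants only, so a variant
    with weight 70 gets 70% of traffic relative to the total enabled
    variant weight, matching the table above.
    """
    enabled = [v for v in variants if v["enabled"]]
    total = sum(v["weight"] for v in enabled)
    roll = rng.uniform(0, total)
    cumulative = 0.0
    for v in enabled:
        cumulative += v["weight"]
        if roll <= cumulative:
            return v["name"]
    return enabled[-1]["name"]  # guard against floating-point edge cases
```

Disabled variants are excluded before normalization, which is why toggling a variant off automatically redistributes its share of traffic among the remaining enabled variants.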
Router Lifecycle
Routers follow this lifecycle:
```
registered --> deploying --> running --> stopped
                   |
                   v
                failed
```
| Status | Description |
|---|---|
| registered | Router created but not yet deployed |
| deploying | Deployment to Kubernetes is in progress |
| running | Router is live and serving predictions |
| stopped | Router is paused, preserving configuration |
| failed | Deployment failed with errors |
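The lifecycle above can be expressed as a transition table. This is a hypothetical sketch of how a service might validate status changes, not the module's code; the assumption that a failed deployment can be retried is noted inline:

```python
# Allowed router status transitions, following the lifecycle table above.
TRANSITIONS = {
    "registered": {"deploying"},
    "deploying": {"running", "failed"},
    "running": {"stopped"},
    "stopped": {"deploying"},   # routers.start re-deploys a stopped router
    "failed": {"deploying"},    # assumption: a failed deploy can be retried
}

def can_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` is a valid step."""
    return target in TRANSITIONS.get(current, set())
```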
Access Control
- Owner -- The user who created the router (full modify/delete access)
- Public -- Router can be marked as public for all users to view
- Shared -- Router can be shared with specific users by ID
- Admins -- Administrators have full access to all routers
Learn more about A/B Testing -->
Drift Detection
Monitor model performance and detect distribution changes in production data:
Statistical Tests
| Metric | Use Case | Interpretation |
|---|---|---|
| PSI (Population Stability Index) | Overall distribution shift | < 0.1: OK, 0.1-0.2: Warning, > 0.2: Alert |
| Kolmogorov-Smirnov Test | Numerical feature drift | p-value < 0.05 indicates significant drift |
| Chi-Square Test | Categorical feature drift | p-value < 0.05 indicates significant drift |
| Jensen-Shannon Divergence | General distribution comparison | 0-1 scale, higher values indicate more divergence |
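To make the PSI row concrete, here is a sketch of the standard PSI formula over two binned distributions. It is a minimal illustration, not the module's internal implementation:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    `expected` and `actual` are lists of bin proportions (each summing
    to 1), e.g. histograms of a feature in training vs. production data.
    Bins with zero mass are floored at `eps` to avoid log(0).
    PSI = sum((actual_i - expected_i) * ln(actual_i / expected_i)).
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total
```

Identical distributions score 0; the result is then interpreted against the thresholds in the table (for example, a value above 0.2 is alert-level).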
Drift Detection Workflow
- Create a reference baseline from your training data feature distributions
- Log predictions as your model serves requests in production
- Run drift analysis to compare current production data against the baseline
- Review results showing overall drift score and per-feature metrics
- Upload ground truth to measure actual prediction accuracy over time
Drift Status Levels
| Status | PSI Range | Action |
|---|---|---|
| ok | PSI < 0.1 | No action needed. Data distribution is stable. |
| warning | 0.1 <= PSI < 0.2 | Investigate potential causes of drift. |
| alert | PSI >= 0.2 | Significant drift detected. Consider retraining the model. |
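The thresholds above map directly to a small classifier. `drift_status` is a hypothetical helper name, not part of the module's API:

```python
def drift_status(psi_value: float) -> str:
    """Map a PSI value to the drift status levels in the table above."""
    if psi_value < 0.1:
        return "ok"
    if psi_value < 0.2:
        return "warning"
    return "alert"
```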
Learn more about Drift Detection -->
Router Methods
| Method | Description |
|---|---|
| routers.create | Create a new router with strategy and variants |
| routers.get | Get a router by ID (access-controlled) |
| routers.list | List accessible routers with optional filters |
| routers.update | Update router name, description, or tags |
| routers.delete | Soft-delete a router (stops deployment first) |
| routers.deploy | Deploy router to Kubernetes via AI Gateway |
| routers.stop | Stop a running router |
| routers.start | Start a stopped router (re-deploys) |
| routers.updateVariantWeight | Change a variant's traffic weight (0-100) |
| routers.toggleVariant | Enable or disable a variant (minimum 2 must remain enabled) |
| routers.getMetrics | Get aggregated metrics with per-variant breakdown |
| routers.getPredictions | Get prediction history with pagination |
| routers.recordFeedback | Record reward feedback for bandit optimization |
| routers.updatePermissions | Update public/shared access settings |
Drift Detection Methods
| Method | Description |
|---|---|
| drift.logPrediction | Log a prediction with features and output for drift tracking |
| drift.getPredictions | Get prediction logs with date and ground truth filters |
| drift.addGroundTruth | Upload a single ground truth record |
| drift.uploadGroundTruth | Bulk upload ground truth from CSV/JSON |
| drift.getUnmatchedPredictions | Find predictions without ground truth |
| drift.createBaseline | Create a reference baseline from feature data |
| drift.getActiveBaseline | Get the active baseline for a model |
| drift.setActiveBaseline | Set which baseline is active |
| drift.getBaselines | List all baselines for a model |
| drift.analyze | Run drift analysis comparing production data to baseline |
| drift.getLatestResult | Get the most recent drift analysis result |
| drift.getResults | Get drift result history with optional status filter |
| drift.getAlertConfig | Get alert configuration for a model |
| drift.updateAlertConfig | Update alert thresholds and notification settings |
| drift.getPerformanceHistory | Get accuracy history with configurable granularity |
Getting Started
Setting Up A/B Testing
- Register your model variants in the Model Registry
- Create a router with at least 2 variants and a routing strategy
- Deploy the router to get a prediction endpoint
- Route your application's requests through the router endpoint
- Monitor metrics and adjust variant weights based on results
Setting Up Drift Detection
- Export feature distributions from your training dataset
- Create a reference baseline using `drift.createBaseline`
- Configure prediction logging so production predictions are tracked
- Run periodic drift analysis using `drift.analyze`
- Optionally upload ground truth data to track accuracy over time
- Configure alert thresholds to be notified of significant drift
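The baseline-then-analyze loop above can be simulated end-to-end in a few lines. This is a local sketch assuming simple fixed-edge binning and the Jensen-Shannon divergence from the Statistical Tests table; `make_baseline` and `js_divergence` are hypothetical names, not the drift API:

```python
import math
from collections import Counter

def _bin(value, bins):
    """Index of the bin a value falls into, given sorted bin edges."""
    for i, edge in enumerate(bins):
        if value < edge:
            return i
    return len(bins)

def make_baseline(values, bins):
    """Bin feature values into proportions -- the reference baseline."""
    counts = Counter(_bin(v, bins) for v in values)
    n = len(values)
    return [counts.get(i, 0) / n for i in range(len(bins) + 1)]

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (log base 2, so 0-1) between distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > eps)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

In production the same comparison runs per feature, and the resulting scores feed the drift status levels described earlier.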
Best Practices
A/B Testing
- Start with small traffic percentages (5-10%) for new variants before increasing
- Use multi-armed bandit routing when you want to minimize exposure to underperforming variants
- Define success metrics before starting a test
- Ensure statistically significant sample sizes before drawing conclusions
- Use canary routing for cautious rollouts of new model versions
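To see why bandit routing limits exposure to underperforming variants, here is a minimal epsilon-greedy sketch. The function names and `stats` shape are hypothetical (the router also supports Thompson sampling and UCB1, not shown):

```python
import random

def epsilon_greedy(stats, epsilon=0.1, rng=random):
    """Explore a random variant with probability epsilon; otherwise
    exploit the variant with the best observed mean reward.

    `stats` maps variant name -> (total_reward, pull_count).
    """
    if rng.random() < epsilon:
        return rng.choice(list(stats))
    return max(stats, key=lambda v: stats[v][0] / max(stats[v][1], 1))

def record_reward(stats, variant, reward):
    """Fold a reward (e.g. from routers.recordFeedback) into the stats."""
    total, count = stats.get(variant, (0.0, 0))
    stats[variant] = (total + reward, count + 1)
```

Over time the better variant accumulates most of the traffic, while the epsilon fraction keeps sampling the others in case performance shifts.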
Drift Detection
- Create baselines immediately after training while training data is available
- Run drift analysis at regular intervals (daily or weekly) to catch issues early
- Collect ground truth data wherever possible to measure real accuracy, not just distribution shifts
- Investigate warning-level drift before it becomes critical
- When significant drift is confirmed, retrain the model and create a new baseline
- Use A/B testing routers to safely compare retrained models against current production versions