MLOps
The MLOps module provides production machine learning operations: A/B testing routers for splitting traffic between model variants, drift detection for monitoring model performance over time, and ground truth integration for measuring real-world accuracy.
Overview
Core Capabilities
- A/B Testing Routers -- Split traffic between model variants using weighted random, feature-based, multi-armed bandit, or canary routing strategies
- Drift Detection -- Monitor changes in production data distributions compared to training data baselines using PSI, KS test, Chi-Square, and Jensen-Shannon divergence
- Ground Truth Integration -- Upload actual outcomes to track prediction accuracy over time and detect concept drift
- Performance History -- Track model accuracy trends with configurable granularity (hourly, daily, weekly)
- Alert Configuration -- Set drift thresholds and notification preferences per model
Architecture
Backend Services
| Service | Description |
|---|---|
| Model Router | Backend service that implements routing strategies and serves predictions |
| AI Gateway | Handles router deployment and management via Kubernetes |
A/B Testing Routers
Compare model variants in production with traffic splitting across four routing strategies:
Routing Strategies
| Strategy | Description |
|---|---|
| Weighted Random | Split traffic by percentage between variants. A variant with weight 70 gets 70% of traffic relative to total enabled variant weight. |
| Feature-Based | Route requests based on input feature values using priority-ordered rules with operators (equals, not_equals, in, not_in, greater_than, less_than, contains, regex). |
| Multi-Armed Bandit | Automatically balance exploration and exploitation to route more traffic to better-performing variants. Supports epsilon-greedy, Thompson sampling, and UCB1 algorithms. |
| Canary | Gradually roll out a new model variant by routing a configurable percentage of traffic to the canary while the rest goes to the control variant. |
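As an illustration of the weighted random strategy, variant selection can be sketched as follows. This is a minimal sketch, not the router's actual implementation; the variant schema (`name`, `weight`, `enabled`) and function name are hypothetical:

```python
import random

def pick_variant(variants, rng=random):
    """Pick a variant by weighted random selection.

    Weights are normalized over enabled variants only, so a variant
    with weight 70 gets 70% of traffic relative to the total enabled
    variant weight, matching the table above.
    """
    enabled = [v for v in variants if v["enabled"]]
    total = sum(v["weight"] for v in enabled)
    roll = rng.uniform(0, total)
    cumulative = 0.0
    for v in enabled:
        cumulative += v["weight"]
        if roll <= cumulative:
            return v["name"]
    return enabled[-1]["name"]  # guard against floating-point edge cases
```

Disabled variants are excluded before normalization, which is why toggling a variant off automatically redistributes its share of traffic among the remaining enabled variants.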
Router Lifecycle
Routers follow this lifecycle:
```
registered --> deploying --> running --> stopped
                   |
                   v
                failed
```
| Status | Description |
|---|---|
| registered | Router created but not yet deployed |
| deploying | Deployment to Kubernetes is in progress |
| running | Router is live and serving predictions |
| stopped | Router is paused, preserving configuration |
| failed | Deployment failed with errors |
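The lifecycle above can be expressed as a transition table. This is a hypothetical sketch of how a service might validate status changes, not the module's code; the assumption that a failed deployment can be retried is noted inline:

```python
# Allowed router status transitions, following the lifecycle table above.
TRANSITIONS = {
    "registered": {"deploying"},
    "deploying": {"running", "failed"},
    "running": {"stopped"},
    "stopped": {"deploying"},   # routers.start re-deploys a stopped router
    "failed": {"deploying"},    # assumption: a failed deploy can be retried
}

def can_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` is a valid step."""
    return target in TRANSITIONS.get(current, set())
```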
Access Control
- Owner -- The user who created the router (full modify/delete access)
- Public -- Router can be marked as public for all users to view
- Shared -- Router can be shared with specific users by ID
- Admins -- Administrators have full access to all routers
Learn more about A/B Testing -->
Drift Detection
Monitor model performance and detect distribution changes in production data:
Statistical Tests
| Metric | Use Case | Interpretation |
|---|---|---|
| PSI (Population Stability Index) | Overall distribution shift | < 0.1: OK, 0.1-0.2: Warning, > 0.2: Alert |
| Kolmogorov-Smirnov Test | Numerical feature drift | p-value < 0.05 indicates significant drift |
| Chi-Square Test | Categorical feature drift | p-value < 0.05 indicates significant drift |
| Jensen-Shannon Divergence | General distribution comparison | 0-1 scale, higher values indicate more divergence |
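To make the PSI row concrete, here is a sketch of the standard PSI formula over two binned distributions. It is a minimal illustration, not the module's internal implementation:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    `expected` and `actual` are lists of bin proportions (each summing
    to 1), e.g. histograms of a feature in training vs. production data.
    Bins with zero mass are floored at `eps` to avoid log(0).
    PSI = sum((actual_i - expected_i) * ln(actual_i / expected_i)).
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total
```

Identical distributions score 0; the result is then interpreted against the thresholds in the table (for example, a value above 0.2 is alert-level).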
Drift Detection Workflow
- Create a reference baseline from your training data feature distributions
- Log predictions as your model serves requests in production
- Run drift analysis to compare current production data against the baseline
- Review results showing overall drift score and per-feature metrics
- Upload ground truth to measure actual prediction accuracy over time
Drift Status Levels
| Status | PSI Range | Action |
|---|---|---|
| ok | PSI < 0.1 | No action needed. Data distribution is stable. |
| warning | 0.1 <= PSI < 0.2 | Investigate potential causes of drift. |
| alert | PSI >= 0.2 | Significant drift detected. Consider retraining the model. |
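The thresholds above map directly to a small classifier. `drift_status` is a hypothetical helper name, not part of the module's API:

```python
def drift_status(psi_value: float) -> str:
    """Map a PSI value to the drift status levels in the table above."""
    if psi_value < 0.1:
        return "ok"
    if psi_value < 0.2:
        return "warning"
    return "alert"
```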
Learn more about Drift Detection -->
Router Methods
| Method | Description |
|---|---|
| routers.create | Create a new router with strategy and variants |
| routers.get | Get a router by ID (access-controlled) |
| routers.list | List accessible routers with optional filters |
| routers.update | Update router name, description, or tags |
| routers.delete | Soft-delete a router (stops deployment first) |
| routers.deploy | Deploy router to Kubernetes via AI Gateway |
| routers.stop | Stop a running router |
| routers.start | Start a stopped router (re-deploys) |
| routers.updateVariantWeight | Change a variant's traffic weight (0-100) |
| routers.toggleVariant | Enable or disable a variant (minimum 2 must remain enabled) |
| routers.getMetrics | Get aggregated metrics with per-variant breakdown |
| routers.getPredictions | Get prediction history with pagination |
| routers.recordFeedback | Record reward feedback for bandit optimization |
| routers.updatePermissions | Update public/shared access settings |
Drift Detection Methods
| Method | Description |
|---|---|
| drift.logPrediction | Log a prediction with features and output for drift tracking |
| drift.getPredictions | Get prediction logs with date and ground truth filters |
| drift.addGroundTruth | Upload a single ground truth record |
| drift.uploadGroundTruth | Bulk upload ground truth from CSV/JSON |
| drift.getUnmatchedPredictions | Find predictions without ground truth |
| drift.createBaseline | Create a reference baseline from feature data |
| drift.getActiveBaseline | Get the active baseline for a model |
| drift.setActiveBaseline | Set which baseline is active |
| drift.getBaselines | List all baselines for a model |
| drift.analyze | Run drift analysis comparing production data to baseline |
| drift.getLatestResult | Get the most recent drift analysis result |
| drift.getResults | Get drift result history with optional status filter |
| drift.getAlertConfig | Get alert configuration for a model |
| drift.updateAlertConfig | Update alert thresholds and notification settings |
| drift.getPerformanceHistory | Get accuracy history with configurable granularity |
Getting Started
Setting Up A/B Testing
- Register your model variants in the Model Registry
- Create a router with at least 2 variants and a routing strategy
- Deploy the router to get a prediction endpoint
- Route your application's requests through the router endpoint
- Monitor metrics and adjust variant weights based on results
Setting Up Drift Detection
- Export feature distributions from your training dataset
- Create a reference baseline using `drift.createBaseline`
- Configure prediction logging so production predictions are tracked
- Run periodic drift analysis using `drift.analyze`
- Optionally upload ground truth data to track accuracy over time
- Configure alert thresholds to be notified of significant drift
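The baseline-then-analyze loop above can be simulated end-to-end in a few lines. This is a local sketch assuming simple fixed-edge binning and the Jensen-Shannon divergence from the Statistical Tests table; `make_baseline` and `js_divergence` are hypothetical names, not the drift API:

```python
import math
from collections import Counter

def _bin(value, bins):
    """Index of the bin a value falls into, given sorted bin edges."""
    for i, edge in enumerate(bins):
        if value < edge:
            return i
    return len(bins)

def make_baseline(values, bins):
    """Bin feature values into proportions -- the reference baseline."""
    counts = Counter(_bin(v, bins) for v in values)
    n = len(values)
    return [counts.get(i, 0) / n for i in range(len(bins) + 1)]

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (log base 2, so 0-1) between distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > eps)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

In production the same comparison runs per feature, and the resulting scores feed the drift status levels described earlier.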
Best Practices
A/B Testing
- Start with small traffic percentages (5-10%) for new variants before increasing
- Use multi-armed bandit routing when you want to minimize exposure to underperforming variants
- Define success metrics before starting a test
- Ensure statistically significant sample sizes before drawing conclusions
- Use canary routing for cautious rollouts of new model versions
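To see why bandit routing limits exposure to underperforming variants, here is a minimal epsilon-greedy sketch. The function names and `stats` shape are hypothetical (the router also supports Thompson sampling and UCB1, not shown):

```python
import random

def epsilon_greedy(stats, epsilon=0.1, rng=random):
    """Explore a random variant with probability epsilon; otherwise
    exploit the variant with the best observed mean reward.

    `stats` maps variant name -> (total_reward, pull_count).
    """
    if rng.random() < epsilon:
        return rng.choice(list(stats))
    return max(stats, key=lambda v: stats[v][0] / max(stats[v][1], 1))

def record_reward(stats, variant, reward):
    """Fold a reward (e.g. from routers.recordFeedback) into the stats."""
    total, count = stats.get(variant, (0.0, 0))
    stats[variant] = (total + reward, count + 1)
```

Over time the better variant accumulates most of the traffic, while the epsilon fraction keeps sampling the others in case performance shifts.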
Drift Detection
- Create baselines immediately after training while training data is available
- Run drift analysis at regular intervals (daily or weekly) to catch issues early
- Collect ground truth data wherever possible to measure real accuracy, not just distribution shifts
- Investigate warning-level drift before it becomes critical
- When significant drift is confirmed, retrain the model and create a new baseline
- Use A/B testing routers to safely compare retrained models against current production versions