
MLOps

The MLOps module provides production machine-learning operations: A/B testing routers for splitting traffic between model variants, drift detection for monitoring data distributions and model performance over time, and ground truth integration for measuring real-world accuracy.

Overview

Core Capabilities

  • A/B Testing Routers -- Split traffic between model variants using weighted random, feature-based, multi-armed bandit, or canary routing strategies
  • Drift Detection -- Monitor changes in production data distributions compared to training data baselines using PSI, KS test, Chi-Square, and Jensen-Shannon divergence
  • Ground Truth Integration -- Upload actual outcomes to track prediction accuracy over time and detect concept drift
  • Performance History -- Track model accuracy trends with configurable granularity (hourly, daily, weekly)
  • Alert Configuration -- Set drift thresholds and notification preferences per model

Architecture

Backend Services

Service | Description
Model Router | Backend service that implements routing strategies and serves predictions
AI Gateway | Handles router deployment and management via Kubernetes

A/B Testing Routers

Compare model variants in production with traffic splitting across four routing strategies:

Routing Strategies

Strategy | Description
Weighted Random | Split traffic by percentage between variants. A variant with weight 70 gets 70% of traffic relative to total enabled variant weight.
Feature-Based | Route requests based on input feature values using priority-ordered rules with operators (equals, not_equals, in, not_in, greater_than, less_than, contains, regex).
Multi-Armed Bandit | Automatically balance exploration and exploitation to route more traffic to better-performing variants. Supports epsilon-greedy, Thompson sampling, and UCB1 algorithms.
Canary | Gradually roll out a new model variant by routing a configurable percentage of traffic to the canary while the rest goes to the control variant.
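The weighted random and epsilon-greedy strategies above reduce to a few lines of selection logic. The sketch below is a minimal illustration of the routing math, not the Model Router's actual implementation:

```python
import random

def weighted_choice(weights, rng):
    """Pick a variant index in proportion to its weight.

    A variant with weight 70 out of a total of 100 receives ~70% of traffic.
    """
    total = sum(weights)
    r = rng.random() * total
    for i, w in enumerate(weights):
        r -= w
        if r < 0:
            return i
    return len(weights) - 1  # guard against floating-point edge cases

def epsilon_greedy(mean_rewards, epsilon, rng):
    """Epsilon-greedy bandit: explore a uniformly random variant with
    probability epsilon, otherwise exploit the best-observed variant."""
    if rng.random() < epsilon:
        return rng.randrange(len(mean_rewards))
    return max(range(len(mean_rewards)), key=lambda i: mean_rewards[i])
```

With weights [70, 30], roughly 70% of a large batch of calls to `weighted_choice` lands on the first variant; with `epsilon=0`, `epsilon_greedy` always exploits the best-performing variant.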

Router Lifecycle

Routers follow this lifecycle:

registered --> deploying --> running --> stopped
                   |
                   v
                failed

Status | Description
registered | Router created but not yet deployed
deploying | Deployment to Kubernetes is in progress
running | Router is live and serving predictions
stopped | Router is paused, preserving configuration
failed | Deployment failed with errors
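The lifecycle can be expressed as a transition table. This is a hypothetical sketch based only on the states described above; the `stopped -> deploying` edge follows from routers.start re-deploying, and the `failed -> deploying` retry edge is an assumption, not documented behavior:

```python
# Allowed router state transitions, per the lifecycle diagram above.
TRANSITIONS = {
    "registered": {"deploying"},
    "deploying": {"running", "failed"},
    "running": {"stopped"},
    "stopped": {"deploying"},  # routers.start re-deploys a stopped router
    "failed": {"deploying"},   # assumption: a failed deployment can be retried
}

def can_transition(current, target):
    """Return True if the router may move from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())
```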

Access Control

  • Owner -- The user who created the router (full modify/delete access)
  • Public -- Router can be marked as public for all users to view
  • Shared -- Router can be shared with specific users by ID
  • Admins -- Administrators have full access to all routers

Learn more about A/B Testing -->

Drift Detection

Monitor model performance and detect distribution changes in production data:

Statistical Tests

Metric | Use Case | Interpretation
PSI (Population Stability Index) | Overall distribution shift | < 0.1: OK, 0.1-0.2: Warning, > 0.2: Alert
Kolmogorov-Smirnov Test | Numerical feature drift | p-value < 0.05 indicates significant drift
Chi-Square Test | Categorical feature drift | p-value < 0.05 indicates significant drift
Jensen-Shannon Divergence | General distribution comparison | 0-1 scale, higher values indicate more divergence
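For intuition, PSI and Jensen-Shannon divergence can be computed from binned frequency counts as sketched below. This is a simplified illustration; the service's exact binning and smoothing may differ:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index over aligned histogram bins.

    `expected`/`actual` are raw counts per bin; `eps` avoids log(0) on
    empty bins. Identical distributions give a score near 0.
    """
    e_tot, a_tot = sum(expected), sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_tot, eps)
        a_pct = max(a / a_tot, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2, so the result lies in [0, 1]."""
    p_tot, q_tot = sum(p), sum(q)
    p = [x / p_tot for x in p]
    q = [x / q_tot for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A large swap in bin mass, such as counts moving from [80, 20] to [20, 80], pushes PSI well past the 0.2 alert threshold, while identical histograms score 0 on both metrics.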

Drift Detection Workflow

  1. Create a reference baseline from your training data feature distributions
  2. Log predictions as your model serves requests in production
  3. Run drift analysis to compare current production data against the baseline
  4. Review results showing overall drift score and per-feature metrics
  5. Upload ground truth to measure actual prediction accuracy over time

Drift Status Levels

Status | PSI Range | Action
ok | PSI < 0.1 | No action needed. Data distribution is stable.
warning | 0.1 <= PSI < 0.2 | Investigate potential causes of drift.
alert | PSI >= 0.2 | Significant drift detected. Consider retraining the model.
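The thresholds above map directly to a status check, a trivial sketch using the documented cutoffs:

```python
def drift_status(psi_score):
    """Map a PSI score to the documented drift status levels."""
    if psi_score < 0.1:
        return "ok"
    if psi_score < 0.2:
        return "warning"
    return "alert"
```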

Learn more about Drift Detection -->

Router Methods

Method | Description
routers.create | Create a new router with strategy and variants
routers.get | Get a router by ID (access-controlled)
routers.list | List accessible routers with optional filters
routers.update | Update router name, description, or tags
routers.delete | Soft-delete a router (stops deployment first)
routers.deploy | Deploy router to Kubernetes via AI Gateway
routers.stop | Stop a running router
routers.start | Start a stopped router (re-deploys)
routers.updateVariantWeight | Change a variant's traffic weight (0-100)
routers.toggleVariant | Enable or disable a variant (minimum 2 must remain enabled)
routers.getMetrics | Get aggregated metrics with per-variant breakdown
routers.getPredictions | Get prediction history with pagination
routers.recordFeedback | Record reward feedback for bandit optimization
routers.updatePermissions | Update public/shared access settings

Drift Detection Methods

Method | Description
drift.logPrediction | Log a prediction with features and output for drift tracking
drift.getPredictions | Get prediction logs with date and ground truth filters
drift.addGroundTruth | Upload a single ground truth record
drift.uploadGroundTruth | Bulk upload ground truth from CSV/JSON
drift.getUnmatchedPredictions | Find predictions without ground truth
drift.createBaseline | Create a reference baseline from feature data
drift.getActiveBaseline | Get the active baseline for a model
drift.setActiveBaseline | Set which baseline is active
drift.getBaselines | List all baselines for a model
drift.analyze | Run drift analysis comparing production data to baseline
drift.getLatestResult | Get the most recent drift analysis result
drift.getResults | Get drift result history with optional status filter
drift.getAlertConfig | Get alert configuration for a model
drift.updateAlertConfig | Update alert thresholds and notification settings
drift.getPerformanceHistory | Get accuracy history with configurable granularity

Getting Started

Setting Up A/B Testing

  1. Register your model variants in the Model Registry
  2. Create a router with at least 2 variants and a routing strategy
  3. Deploy the router to get a prediction endpoint
  4. Route your application's requests through the router endpoint
  5. Monitor metrics and adjust variant weights based on results

Setting Up Drift Detection

  1. Export feature distributions from your training dataset
  2. Create a reference baseline using drift.createBaseline
  3. Configure prediction logging so production predictions are tracked
  4. Run periodic drift analysis using drift.analyze
  5. Optionally upload ground truth data to track accuracy over time
  6. Configure alert thresholds to be notified of significant drift

Best Practices

A/B Testing

  • Start with small traffic percentages (5-10%) for new variants before increasing
  • Use multi-armed bandit routing when you want to minimize exposure to underperforming variants
  • Define success metrics before starting a test
  • Ensure statistically significant sample sizes before drawing conclusions
  • Use canary routing for cautious rollouts of new model versions
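For the sample-size point above, a standard two-proportion power calculation gives a rough lower bound per variant. This is generic statistics rather than part of the module's API; the default z-values assume a 5% two-sided significance level and 80% power:

```python
import math

def min_samples_per_variant(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Approximate samples needed per variant to detect a change in a
    success rate from p1 to p2 with a two-proportion z-test.

    z_alpha = 1.96 (alpha = 0.05, two-sided); z_beta = 0.8416 (power = 0.80).
    """
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    effect = (p1 - p2) ** 2
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect)
```

Detecting a lift from a 10% to a 12% conversion rate requires a few thousand samples per variant, which is why small traffic slices need to run for a while before results are meaningful.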

Drift Detection

  • Create baselines immediately after training while training data is available
  • Run drift analysis at regular intervals (daily or weekly) to catch issues early
  • Collect ground truth data wherever possible to measure real accuracy, not just distribution shifts
  • Investigate warning-level drift before it becomes critical
  • When significant drift is confirmed, retrain the model and create a new baseline
  • Use A/B testing routers to safely compare retrained models against current production versions