
Drift Detection

Drift detection monitors changes in your model's input data distributions over time and alerts you when production data differs significantly from training data. When combined with ground truth integration, it also detects prediction drift -- when model accuracy degrades compared to baseline performance.

Why Drift Detection Matters

Machine learning models are trained on historical data, but the real world is constantly changing. Over time, the data your model sees in production may differ from the data it was trained on, causing performance degradation. This is known as drift.

Types of Drift

| Type | Description | How Detected |
| --- | --- | --- |
| Data Drift | The statistical distribution of input features changes over time (e.g., customer demographics shift, seasonal patterns change) | PSI, KS test, Chi-Square, Jensen-Shannon divergence |
| Concept Drift | The relationship between inputs and outputs changes (e.g., customer behavior evolves, economic conditions shift) | Ground truth accuracy comparison over time |
| Prediction Drift | Model accuracy degrades compared to baseline performance | Requires ground truth labels to detect |

Reference Baselines

A reference baseline captures the statistical distributions of your training data features. When you run drift analysis, current production data is compared against this baseline to detect changes.

Creating a Baseline

drift.createBaseline({
  modelId,
  version,
  featureData,
  targetData
})
| Parameter | Required | Description |
| --- | --- | --- |
| modelId | Yes | ID of the model this baseline is for |
| version | Yes | Version string for this baseline (e.g., "v1.0") |
| featureData | Yes | Object mapping feature names to arrays of values |
| targetData | No | Target variable distribution (classification or regression) |

The system automatically:

  1. Detects whether each feature is numerical or categorical based on the first value
  2. For numerical features: calculates mean, standard deviation, min, max, percentiles (5th, 25th, 50th, 75th, 95th), and histogram
  3. For categorical features: calculates category frequencies as proportions
  4. For target data: calculates either classification class frequencies or regression distribution
  5. Deactivates any previously active baseline for the model
  6. Sets the new baseline as active
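The per-feature statistics described in the steps above can be sketched as follows. This is an illustrative reconstruction, not the library's actual code: the 10-bin equal-width histogram and the percentile method are assumptions.

```javascript
// Sketch of baseline statistics for one feature (assumed behavior).
// Type is inferred from the first value, as described above.
function buildFeatureBaseline(values) {
  const isNumeric = typeof values[0] === "number";
  if (!isNumeric) {
    // Categorical: category frequencies as proportions.
    const frequencies = {};
    for (const v of values) frequencies[v] = (frequencies[v] || 0) + 1 / values.length;
    return { type: "categorical", frequencies };
  }
  // Numerical: mean, std, min, max, percentiles, and a histogram.
  const sorted = [...values].sort((a, b) => a - b);
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length;
  const pct = (p) => sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
  const [min, max] = [sorted[0], sorted[sorted.length - 1]];
  // 10 equal-width bins between min and max (bin count is an assumption).
  const histogram = new Array(10).fill(0);
  for (const v of values) {
    const i = Math.min(9, Math.floor(((v - min) / (max - min || 1)) * 10));
    histogram[i] += 1 / values.length;
  }
  return {
    type: "numerical", mean, std: Math.sqrt(variance), min, max,
    percentiles: { p5: pct(0.05), p25: pct(0.25), p50: pct(0.5), p75: pct(0.75), p95: pct(0.95) },
    histogram,
  };
}
```

Note that the histogram stores proportions rather than counts, so bins from baseline and current data can be compared directly during PSI calculation.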

Managing Baselines

| Method | Description |
| --- | --- |
| drift.getActiveBaseline(modelId) | Get the currently active baseline |
| drift.setActiveBaseline(modelId, baselineId) | Switch the active baseline |
| drift.getBaselines(modelId) | List all baselines (sorted by creation date, newest first) |

Only one baseline can be active per model at a time. Setting a new active baseline deactivates all others.

Prediction Logging

All predictions must be logged for drift analysis. Each prediction log captures the input features, model output, and metadata needed for later comparison.

Logging a Prediction

drift.logPrediction({
  modelId,
  entityId,
  features,
  prediction,
  probabilities,
  modelVersion,
  latencyMs,
  requestSource
})
| Field | Required | Description |
| --- | --- | --- |
| modelId | Yes | Model that produced the prediction |
| entityId | No | Business entity ID for ground truth matching (defaults to predictionId if not provided) |
| features | Yes | Input features as key-value pairs |
| prediction | Yes | Model output (any type) |
| probabilities | No | Prediction probabilities array (for classification) |
| modelVersion | No | Version of the model used |
| latencyMs | No | Prediction latency in milliseconds |
| requestSource | No | Source of the prediction request |

Returns a unique predictionId.

Querying Predictions

drift.getPredictions(modelId, { limit, offset, startDate, endDate, hasGroundTruth })

Filter predictions by date range and whether they have ground truth attached. Returns predictions sorted by timestamp (newest first).
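The filtering and sorting semantics can be pictured as an in-memory equivalent. This is a sketch of the assumed behavior, not the server-side query itself; queryPredictions is a hypothetical stand-in.

```javascript
// In-memory equivalent of the getPredictions filters (assumed semantics):
// date-range filter, optional ground-truth filter, newest-first sort,
// then offset/limit pagination.
function queryPredictions(logs, { limit = 100, offset = 0, startDate, endDate, hasGroundTruth } = {}) {
  return logs
    .filter((p) =>
      (!startDate || p.timestamp >= startDate) &&
      (!endDate || p.timestamp <= endDate) &&
      (hasGroundTruth === undefined || (p.actualOutcome !== undefined) === hasGroundTruth))
    .sort((a, b) => b.timestamp - a.timestamp) // newest first
    .slice(offset, offset + limit);
}
```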

Data Retention

Prediction logs have a 90-day TTL: they are automatically deleted 90 days after creation. Historical drift analysis results are stored indefinitely in the drift_results collection.

Ground Truth Integration

Ground truth (actual outcomes) allows you to measure prediction drift and model accuracy over time. Ground truth records are matched to predictions via the entityId field.

Single Record Upload

drift.addGroundTruth({
  modelId,
  entityId,
  actualOutcome,
  outcomeTimestamp
})

The system automatically attempts to match the ground truth with an existing prediction that has the same modelId and entityId and has not yet been matched. When matched:

  • The prediction log is updated with the ground truth data (actualOutcome, outcomeTimestamp, matchedAt)
  • The ground truth record is marked as matched: true with the matchedPredictionId

Bulk Upload

drift.uploadGroundTruth({
  modelId,
  records: [
    { entityId, actualOutcome, outcomeTimestamp },
    ...
  ]
})

Processes records sequentially, attempting to match each with its corresponding prediction. Returns a summary:

| Field | Description |
| --- | --- |
| imported | Number of records successfully imported |
| matched | Number of records matched to predictions |
| errors | Array of error messages for failed records |
| batchId | Unique identifier for this upload batch |

Records missing an entityId are skipped with an error.
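The matching loop might look like the following sketch. The logic is assumed from the description above (first unmatched prediction with the same modelId and entityId wins), and the batchId format is hypothetical.

```javascript
// Sketch of the bulk ground-truth matching loop (assumed logic).
function uploadGroundTruth(predictions, { modelId, records }) {
  const batchId = `batch-${Date.now()}`; // hypothetical batch ID format
  const summary = { imported: 0, matched: 0, errors: [], batchId };
  for (const rec of records) {
    if (!rec.entityId) {
      // Records missing an entityId are skipped with an error.
      summary.errors.push("missing entityId");
      continue;
    }
    summary.imported++;
    // Match the first prediction for this model/entity that has no outcome yet.
    const pred = predictions.find(
      (p) => p.modelId === modelId && p.entityId === rec.entityId && p.actualOutcome === undefined
    );
    if (pred) {
      pred.actualOutcome = rec.actualOutcome;
      pred.outcomeTimestamp = rec.outcomeTimestamp;
      pred.matchedAt = new Date().toISOString();
      summary.matched++;
    }
  }
  return summary;
}
```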

Finding Unmatched Predictions

drift.getUnmatchedPredictions(modelId, limit = 100)

Returns predictions that have not yet been matched with ground truth. Use this to identify which predictions still need actual outcomes uploaded.

Statistical Tests

The drift analysis engine uses multiple statistical methods to detect distribution changes:

PSI (Population Stability Index)

The primary metric for drift detection. Measures the overall shift in distribution between baseline and current data.

| PSI Range | Status | Interpretation |
| --- | --- | --- |
| PSI < 0.1 | ok | No significant drift. Data distribution is stable. |
| 0.1 <= PSI < 0.2 | warning | Moderate drift. Investigate potential causes. |
| PSI >= 0.2 | alert | Significant drift. Action required. Consider retraining. |

For numerical features, PSI is calculated by comparing histogram bin frequencies between the baseline and current distributions (aligned to baseline bin edges).

For categorical features, PSI is calculated by aligning category frequencies between baseline and current distributions. Missing categories are assigned a small frequency (0.001) to avoid division by zero.
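A minimal PSI implementation over aligned bin (or category) frequencies looks like this. As a sketch it applies the 0.001 floor to both sides of every bin, which generalizes the missing-category rule described above.

```javascript
// PSI over aligned frequency arrays: sum of (cur - base) * ln(cur / base).
// Zero frequencies are floored at 0.001 to avoid division by zero (assumption:
// the floor is applied symmetrically to baseline and current).
function psi(baselineFreqs, currentFreqs) {
  let total = 0;
  for (let i = 0; i < baselineFreqs.length; i++) {
    const b = Math.max(baselineFreqs[i], 0.001);
    const c = Math.max(currentFreqs[i], 0.001);
    total += (c - b) * Math.log(c / b);
  }
  return total;
}

// Map a PSI value to the documented status bands.
function psiStatus(v) {
  return v < 0.1 ? "ok" : v < 0.2 ? "warning" : "alert";
}
```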

Kolmogorov-Smirnov Test

Used for numerical features only. Compares the cumulative distributions of baseline and current data samples.

  • p-value < 0.05 indicates statistically significant drift
  • The test is performed on up to 1,000 samples (approximated from the baseline distribution when raw baseline samples are not available)
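The KS D statistic itself can be computed with a two-pointer scan over the sorted samples. This sketch shows only the statistic; the p-value approximation the engine uses is not reproduced here.

```javascript
// Two-sample Kolmogorov-Smirnov D statistic: the maximum absolute gap
// between the two empirical CDFs. Ties are advanced through together.
function ksStatistic(sampleA, sampleB) {
  const a = [...sampleA].sort((x, y) => x - y);
  const b = [...sampleB].sort((x, y) => x - y);
  let i = 0, j = 0, d = 0;
  while (i < a.length && j < b.length) {
    const x = Math.min(a[i], b[j]);
    while (i < a.length && a[i] === x) i++;
    while (j < b.length && b[j] === x) j++;
    d = Math.max(d, Math.abs(i / a.length - j / b.length));
  }
  return d;
}
```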

Chi-Square Test

Used for categorical features only. Compares observed versus expected frequencies across categories.

  • p-value < 0.05 indicates statistically significant drift
  • Expected frequencies are derived from the baseline distribution
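The statistic (without the p-value lookup) reduces to the classic observed-versus-expected sum, with expected counts obtained by scaling the baseline proportions to the current sample size:

```javascript
// Chi-square statistic sketch: baselineProps maps categories to baseline
// proportions; observedCounts maps categories to current counts.
function chiSquareStatistic(baselineProps, observedCounts) {
  const n = Object.values(observedCounts).reduce((s, v) => s + v, 0);
  let stat = 0;
  for (const cat of Object.keys(baselineProps)) {
    const expected = baselineProps[cat] * n; // baseline proportion scaled to n
    const observed = observedCounts[cat] || 0;
    if (expected > 0) stat += (observed - expected) ** 2 / expected;
  }
  return stat;
}
```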

Jensen-Shannon Divergence

Used for categorical features as a symmetric measure of distribution similarity.

  • Ranges from 0 (identical distributions) to 1 (completely different)
  • Provides a complementary view to PSI for categorical data
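A compact implementation over two aligned probability vectors, using log base 2 so the divergence is bounded by [0, 1] as stated above:

```javascript
// Jensen-Shannon divergence: average KL divergence of each distribution
// from their midpoint mixture, in bits (log base 2).
function jensenShannon(p, q) {
  const kl = (a, b) =>
    a.reduce((s, ai, i) => s + (ai > 0 ? ai * Math.log2(ai / b[i]) : 0), 0);
  const m = p.map((pi, i) => (pi + q[i]) / 2);
  return 0.5 * kl(p, m) + 0.5 * kl(q, m);
}
```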

Running Drift Analysis

Triggering Analysis

drift.analyze(modelId, { windowDays, windowStart, windowEnd })
| Parameter | Default | Description |
| --- | --- | --- |
| windowDays | 7 | Number of days of recent predictions to analyze |
| windowStart | Calculated from windowDays | Start of analysis window |
| windowEnd | Current time | End of analysis window |

Requirements:

  • An active baseline must exist for the model
  • At least one prediction must exist in the specified time window

Analysis Process

  1. Retrieves the active baseline for the model
  2. Queries all predictions within the time window
  3. Extracts feature values from predictions and groups by feature name
  4. For each feature in the baseline:
    • Numerical features: Calculates PSI from histogram frequencies and KS test from sample comparison
    • Categorical features: Calculates PSI, Chi-Square, and Jensen-Shannon divergence from category frequencies
  5. If predictions with ground truth exist, calculates current accuracy
  6. Computes overall drift score from all PSI values
  7. Determines overall status (ok, warning, or alert)
  8. Stores the result in the drift_results collection
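Steps 6 and 7 above can be sketched as follows. The aggregation is not specified, so this assumes the overall drift score is the worst per-feature PSI (clamped to the documented 0-1 range) and the overall status is the worst per-feature status; the flat { psi } feature shape is a simplification of the real per-feature result.

```javascript
// Hypothetical aggregation of per-feature PSI values into an overall result.
function summarize(featureDrift) {
  const worstPsi = Math.max(...featureDrift.map((f) => f.psi));
  const driftScore = Math.min(1, worstPsi); // clamp to the documented 0-1 range
  const rank = { ok: 0, warning: 1, alert: 2 };
  const overallStatus = featureDrift
    .map((f) => (f.psi >= 0.2 ? "alert" : f.psi >= 0.1 ? "warning" : "ok"))
    .reduce((worst, s) => (rank[s] > rank[worst] ? s : worst), "ok");
  return { driftScore, overallStatus };
}
```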

Analysis Result

The drift result contains:

| Field | Description |
| --- | --- |
| modelId | Model that was analyzed |
| baselineId | Baseline used for comparison |
| windowStart / windowEnd | Time window analyzed |
| sampleSize | Number of predictions analyzed |
| overallStatus | Aggregate status: ok, warning, or alert |
| driftScore | Aggregate drift score (0-1) |
| featureDrift | Array of per-feature drift results |
| predictionDrift | Accuracy comparison (only if ground truth available) |
| calculatedAt | When the analysis was performed |
| calculationDurationMs | How long the analysis took |

Per-Feature Results

Each feature drift result includes:

| Field | Description |
| --- | --- |
| featureName | Name of the feature |
| driftType | Type of drift detected (data) |
| metrics.psi | PSI value, p-value, and status |
| metrics.ksStatistic | KS test result (numerical features) |
| metrics.chiSquare | Chi-Square test result (categorical features) |
| metrics.jensenShannon | Jensen-Shannon divergence (categorical features) |
| currentDistribution | Current distribution statistics |
| baselineDistribution | Baseline distribution for comparison |
| hasDrift | Boolean flag set when significant drift was detected |

Querying Results

| Method | Description |
| --- | --- |
| drift.getLatestResult(modelId) | Get the most recent analysis result |
| drift.getResults(modelId, { limit, status }) | Get result history with optional status filter |

Alert Configuration

Configure automated monitoring thresholds for each model:

drift.updateAlertConfig(modelId, config)

Default Thresholds

| Threshold | Default | Description |
| --- | --- | --- |
| psiWarning | 0.1 | PSI value that triggers a warning |
| psiAlert | 0.2 | PSI value that triggers an alert |
| ksPValueThreshold | 0.05 | KS test p-value significance threshold |
| accuracyDropWarning | 0.05 | A 5% accuracy drop triggers a warning |
| accuracyDropAlert | 0.1 | A 10% accuracy drop triggers an alert |
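Applying these thresholds might look like the sketch below. The threshold field names mirror the defaults table; the result shape (featurePsi array, baseline/current accuracy fields) is a simplification for illustration.

```javascript
// Hypothetical threshold evaluation combining PSI drift and accuracy drop.
function evaluate(result, config) {
  const {
    psiWarning = 0.1, psiAlert = 0.2,
    accuracyDropWarning = 0.05, accuracyDropAlert = 0.1,
  } = config;
  const worstPsi = Math.max(...result.featurePsi);
  // Accuracy drop is only measurable when ground truth accuracy is available.
  const drop = result.baselineAccuracy !== undefined && result.currentAccuracy !== undefined
    ? result.baselineAccuracy - result.currentAccuracy
    : 0;
  if (worstPsi >= psiAlert || drop >= accuracyDropAlert) return "alert";
  if (worstPsi >= psiWarning || drop >= accuracyDropWarning) return "warning";
  return "ok";
}
```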

Notification Settings

| Field | Default | Description |
| --- | --- | --- |
| email | true | Whether to send email notifications |
| recipients | [] | Array of email addresses to notify |

Scheduled Analysis

| Field | Default | Description |
| --- | --- | --- |
| enabled | false | Whether scheduled analysis is enabled |
| frequency | daily | How often to run: hourly, daily, or weekly |
| timeWindowDays | 7 | Days of data to include in each analysis |

Performance History

Track model accuracy over time with configurable granularity:

drift.getPerformanceHistory(modelId, { windowDays, granularity })
| Parameter | Default | Description |
| --- | --- | --- |
| windowDays | 30 | Number of days of history to return |
| granularity | daily | Time bucket size: hourly, daily, or weekly |

Returns an array of data points, each containing:

| Field | Description |
| --- | --- |
| timestamp | Start of the time bucket |
| accuracy | Accuracy within the bucket (requires ground truth) |
| predictionCount | Number of predictions in the bucket |
| groundTruthCount | Number of predictions with ground truth in the bucket |

Accuracy is calculated as the proportion of predictions where prediction === groundTruth.actualOutcome within each time bucket.
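The bucketing can be sketched as below, assuming epoch-millisecond timestamps and fixed-width buckets; accuracy uses only the predictions in a bucket that have ground truth attached, as described above.

```javascript
// Group predictions into fixed-size time buckets and compute per-bucket
// accuracy (correct matched predictions / matched predictions).
function performanceHistory(predictions, bucketMs) {
  const buckets = new Map();
  for (const p of predictions) {
    const start = Math.floor(p.timestamp / bucketMs) * bucketMs;
    if (!buckets.has(start)) {
      buckets.set(start, { timestamp: start, predictionCount: 0, groundTruthCount: 0, correct: 0 });
    }
    const b = buckets.get(start);
    b.predictionCount++;
    if (p.actualOutcome !== undefined) {
      b.groundTruthCount++;
      if (p.prediction === p.actualOutcome) b.correct++;
    }
  }
  return [...buckets.values()]
    .sort((a, b) => a.timestamp - b.timestamp)
    .map(({ correct, ...rest }) => ({
      ...rest,
      // Accuracy is undefined when a bucket has no ground truth.
      accuracy: rest.groundTruthCount ? correct / rest.groundTruthCount : null,
    }));
}
```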

Collections and Indexes

Prediction Logs Collection

| Index | Purpose |
| --- | --- |
| modelId, timestamp (desc) | Query predictions for a model |
| entityId, modelId | Match ground truth to predictions |
| timestamp (TTL: 90 days) | Automatic cleanup of old predictions |

Ground Truth Collection

Stores actual outcome data uploaded for ground truth matching.

Reference Baselines Collection

| Index | Purpose |
| --- | --- |
| modelId, isActive | Find the active baseline for a model |

Drift Results Collection

| Index | Purpose |
| --- | --- |
| modelId, calculatedAt (desc) | Query results for a model |

Alert Config Collection

| Index | Purpose |
| --- | --- |
| modelId (unique) | One config per model |

Interpreting Common Scenarios

Seasonal Drift

Pattern: Drift appears at regular intervals (monthly, quarterly).
Action: Consider building season-aware models or using different models for different periods. This may be expected behavior rather than a problem.

Sudden Drift Spike

Pattern: PSI jumps dramatically in a short time.
Action: Investigate recent changes: new data sources, pipeline bugs, external events, or upstream system changes.

Gradual Increase

Pattern: Drift slowly increases over weeks or months.
Action: Schedule model retraining as part of regular maintenance. Create a new baseline after retraining.

Single Feature Drift

Pattern: One feature shows high drift while others are stable.
Action: Investigate that specific feature. The cause may be an upstream data issue, a feature engineering bug, or a genuine change in the data source.

High Drift but Good Accuracy

Pattern: PSI is in alert range but accuracy (from ground truth) remains stable.
Action: The distribution has shifted but the model's decision boundaries may still be valid. Monitor closely and document the finding. Consider updating the baseline if this becomes the new normal.

Best Practices

Establish Baselines Early

Create reference baselines immediately after training while the training data is readily available. The training data distribution is your ground truth for comparison.

Monitor Continuously

Do not wait for problems to appear. Schedule regular drift analysis (daily or weekly) to catch issues early. The performance history view helps identify gradual degradation that might not be obvious from a single analysis.

Collect Ground Truth

Where possible, collect actual outcomes to measure real model performance:

  • Data drift detection (via PSI, KS, etc.) tells you the input distribution changed
  • Ground truth matching tells you whether the model's predictions are still accurate
  • Both are valuable, but accuracy measurement is the definitive indicator of model health

Investigate Warnings

When drift is detected at the warning level, investigate before it becomes critical:

  • Is this expected (seasonal change, business event, holiday)?
  • Are specific features driving the drift?
  • Is model performance actually affected (check ground truth accuracy)?

Plan for Retraining

When significant drift is confirmed:

  1. Collect recent data with ground truth labels
  2. Retrain the model on updated data
  3. Create a new reference baseline from the new training data
  4. Use an A/B testing router to safely compare the retrained model against the current production version
  5. Gradually shift traffic to the retrained model once performance is validated
Warning: Prediction logs have a 90-day TTL and are automatically deleted after that period. Ensure that drift analysis results and ground truth data are captured before the prediction logs expire if you need long-term records.