CUPED: Variance Reduction Using Pre-Experiment Data
CUPED (Controlled-experiment Using Pre-Experiment Data) is a technique that reduces the variance of experiment metrics by using each user's pre-experiment behavior. The core idea: if users' pre-experiment metric values are correlated with their in-experiment metric values, the pre-experiment metric can serve as a covariate for a linear adjustment of the in-experiment metric, which removes part of the random variation and improves experiment sensitivity. In A/B experiments this typically shows up as narrower confidence intervals, a lower MDE (Minimum Detectable Effect), and shorter experiment durations.
Simple Aggregate Metrics
For mean, proportion, and average-active-days metrics, the current implementation estimates a single adjustment coefficient, theta, shared by both groups, from the in-experiment metric and the pre-experiment metric.
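A minimal sketch of this adjustment for a per-user metric, using only the Python standard library. It assumes the standard CUPED estimator, theta = Cov(Y, X) / Var(X), computed on the pooled sample of both groups and centered on the pooled pre-experiment mean. The function names are hypothetical and are not the library's API.

```python
from statistics import mean, variance


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


def cuped_adjust(y_control, x_control, y_treatment, x_treatment):
    """Adjust per-user values in both groups with one pooled theta.

    Assumption: theta = Cov(Y, X) / Var(X) on the pooled sample, and X is
    centered on the pooled pre-experiment mean.
    """
    y_all = list(y_control) + list(y_treatment)
    x_all = list(x_control) + list(x_treatment)
    theta = sample_cov(y_all, x_all) / variance(x_all)
    x_bar = mean(x_all)
    adj_control = [y - theta * (x - x_bar) for y, x in zip(y_control, x_control)]
    adj_treatment = [y - theta * (x - x_bar) for y, x in zip(y_treatment, x_treatment)]
    return adj_control, adj_treatment, theta
```

Centering on the pooled mean leaves the overall mean of the metric unchanged, so only the spread of the per-user values shrinks.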
Ratio Metrics
The experimental unit for ratio metrics contains both a numerator and a denominator. The current implementation defines:
Y: in-experiment numerator (nume)
N: in-experiment denominator (deno)
X: pre-experiment numerator (pre_nume)
M: pre-experiment denominator (pre_deno)
The in-experiment metric is Y / N and the pre-experiment metric is X / M. The CUPED adjustment coefficient for ratio metrics in the current algorithm library is:
theta = Cov(Y/N, X/M) / Var(X/M)
where Cov(Y/N, X/M) and Var(X/M) are estimated via the Delta Method (see Delta Method under Statistical Methods).
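A minimal sketch of the Delta Method step, assuming per-user rows (y_i, n_i, x_i, m_i) and a first-order linearization of each ratio of means. The function names are hypothetical, and how the library pools the two groups before estimating theta is not specified here; the caller is assumed to pass whichever sample theta should be estimated on.

```python
from statistics import mean, variance


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


def linearize_ratio(num, den):
    """First-order (Delta Method) linearization of mean(num) / mean(den).

    Returns per-user terms (num_i - R * den_i) / mean(den), whose sample
    mean has approximately the same variance as the ratio R itself.
    """
    den_bar = mean(den)
    ratio = mean(num) / den_bar
    return [(a - ratio * b) / den_bar for a, b in zip(num, den)]


def delta_ratio_variance(num, den):
    """Delta Method estimate of Var(mean(num) / mean(den))."""
    return variance(linearize_ratio(num, den)) / len(num)


def ratio_theta(y, n, x, m):
    """theta = Cov(Y/N, X/M) / Var(X/M), both from linearized terms.

    The common 1 / len(y) factor in the covariance and the variance cancels.
    """
    u = linearize_ratio(y, n)
    v = linearize_ratio(x, m)
    return sample_cov(u, v) / variance(v)
```

When every denominator equals 1, the linearized terms reduce to centered per-user values and ratio_theta falls back to the plain Cov(Y, X) / Var(X).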
Both groups share the same pooled theta. Let the pooled pre-experiment ratio across both groups be:
The CUPED-adjusted ratios for the control group and treatment group are, respectively:
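Under the standard CUPED form, these two definitions would read as follows. The notation is introduced here for illustration: subscripts c and t denote group totals for control and treatment, and R_pre is the pooled pre-experiment ratio. The exact expressions used by the library are an assumption.

```latex
\begin{aligned}
R_{\text{pre}} &= \frac{X_c + X_t}{M_c + M_t} \\
R_c^{\text{cuped}} &= \frac{Y_c}{N_c} - \theta\left(\frac{X_c}{M_c} - R_{\text{pre}}\right) \\
R_t^{\text{cuped}} &= \frac{Y_t}{N_t} - \theta\left(\frac{X_t}{M_t} - R_{\text{pre}}\right)
\end{aligned}
```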
The final effect size is calculated from the adjusted ratios of the two groups:
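A minimal sketch of this step, assuming each group is summarized by its totals and that theta has already been estimated. The dictionary keys reuse the field names from the definitions above; the function name, and the choice to report both an absolute and a relative difference, are assumptions rather than the library's documented behavior.

```python
def cuped_ratio_effect(control, treatment, theta):
    """Compute CUPED-adjusted ratios and their difference.

    control / treatment: dicts of group totals with keys
    'nume', 'deno', 'pre_nume', 'pre_deno'.
    theta: pooled adjustment coefficient, estimated elsewhere.
    """
    # Pooled pre-experiment ratio across both groups.
    pooled_pre_ratio = (control["pre_nume"] + treatment["pre_nume"]) / (
        control["pre_deno"] + treatment["pre_deno"]
    )

    def adjusted_ratio(group):
        raw = group["nume"] / group["deno"]
        pre = group["pre_nume"] / group["pre_deno"]
        return raw - theta * (pre - pooled_pre_ratio)

    adj_control = adjusted_ratio(control)
    adj_treatment = adjusted_ratio(treatment)
    return {
        "control": adj_control,
        "treatment": adj_treatment,
        "absolute_diff": adj_treatment - adj_control,
        "relative_diff": (adj_treatment - adj_control) / adj_control,
    }
```

In this form the pooled pre-experiment ratio cancels out of the absolute difference, so it affects only the reported per-group levels and the relative difference.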
The post-CUPED variance uses the following form:
This is used for significance testing and interval estimation.
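For reference, the variance of a linearly adjusted ratio expands by a general identity when theta and the pooled pre-experiment ratio are treated as constants. Whether the library uses exactly this expression is an assumption. The notation is introduced here: g denotes a single group (control or treatment) and R_g^cuped its adjusted ratio.

```latex
\operatorname{Var}\!\left(R_g^{\text{cuped}}\right)
= \operatorname{Var}\!\left(\frac{Y_g}{N_g}\right)
+ \theta^2 \operatorname{Var}\!\left(\frac{X_g}{M_g}\right)
- 2\theta \operatorname{Cov}\!\left(\frac{Y_g}{N_g}, \frac{X_g}{M_g}\right)
```

With independent groups, the variance of the difference between the two adjusted ratios is the sum of the two group variances.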
Scope and Usage Recommendations
CUPED currently supports MEAN, PROPORTION, RATIO, SUM, and ACTIVE_DAYS.
The benefit of CUPED depends on the correlation between the pre-experiment metric and the in-experiment metric: the stronger the correlation, the larger the variance reduction typically is. The benefit shrinks if pre-experiment data is largely missing, if the metric is defined differently in the pre-experiment and in-experiment periods, or if the two are only weakly correlated.
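As a rough guide, with the optimal theta the adjusted metric retains a fraction 1 - rho^2 of the original variance, where rho is the correlation between the pre-experiment and in-experiment metric. This is the standard result for a single linear covariate and is not specific to this library; the function names below are hypothetical.

```python
import math


def cuped_variance_ratio(rho):
    """Fraction of the original variance that remains after CUPED,
    assuming the optimal theta and correlation rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be between -1 and 1")
    return 1.0 - rho ** 2


def cuped_ci_width_ratio(rho):
    """Relative confidence-interval width after CUPED (square root of the
    variance ratio), at a fixed sample size."""
    return math.sqrt(cuped_variance_ratio(rho))
```

For example, a correlation of 0.5 leaves 75% of the variance, while a correlation of 0.2 leaves 96%, which is why weakly correlated pre-experiment data brings little benefit.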