CUPED: Variance Reduction Using Pre-Experiment Data

CUPED (Controlled-experiment Using Pre-Experiment Data) is a technique that reduces the variance of experiment metrics by leveraging users' pre-experiment behavior. The core idea: if users' pre-experiment metric values are correlated with their in-experiment metric values, the pre-experiment metric can serve as a covariate to linearly adjust the in-experiment metric, which removes part of the random variation and improves experiment sensitivity. In A/B experiments this typically shows up as narrower confidence intervals, a lower MDE (Minimum Detectable Effect), and shorter experiment durations.

Simple Aggregate Metrics

For mean, proportion, and average-active-days metrics, the current implementation estimates a single shared adjustment coefficient using the in-experiment metric $Y$ and the pre-experiment metric $X$:

$$\theta = \frac{\mathrm{Cov}(Y, X)}{\mathrm{Var}(X)}$$

$\theta$ is estimated using data pooled from both the control group and the treatment group. The CUPED-adjusted metric value for each group is:

$$Y_{cuped} = Y - \theta \, (X - E[X])$$

Here $E[X]$ is also the pooled pre-experiment mean across both groups. Intuitively: if one group's pre-experiment metric was naturally higher, the part of its in-experiment metric attributable to that historical difference is subtracted; if the pre-experiment value was lower, a corresponding correction is added. This does not change the causal interpretation provided by randomization, but it reduces noise from historical differences unrelated to the treatment.
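The adjustment above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the library's actual code: the function name `cuped_adjust`, its argument names, and the use of sample (ddof=1) covariance and variance are all hypothetical choices made for the example.

```python
import numpy as np

def cuped_adjust(y_c, x_c, y_t, x_t):
    """Return (adjusted control mean, adjusted treatment mean, pooled theta).

    y_* are per-unit in-experiment values, x_* are per-unit pre-experiment
    values. Hypothetical helper for illustration only.
    """
    y_c, x_c, y_t, x_t = (np.asarray(a, dtype=float) for a in (y_c, x_c, y_t, x_t))

    # Pool both groups to estimate a single shared theta = Cov(Y, X) / Var(X).
    y = np.concatenate([y_c, y_t])
    x = np.concatenate([x_c, x_t])
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)

    # E[X] is the pooled pre-experiment mean across both groups.
    x_bar = x.mean()

    # Y_cuped = Y - theta * (X - E[X]), applied to each group's mean.
    mean_c = y_c.mean() - theta * (x_c.mean() - x_bar)
    mean_t = y_t.mean() - theta * (x_t.mean() - x_bar)
    return mean_c, mean_t, theta

# Synthetic usage: in-experiment values correlated with pre-experiment values.
rng = np.random.default_rng(0)
pre_c, pre_t = rng.normal(10, 2, 500), rng.normal(10, 2, 500)
in_c = pre_c + rng.normal(0, 1, 500)
in_t = pre_t + rng.normal(0.2, 1, 500)
print(cuped_adjust(in_c, pre_c, in_t, pre_t))
```

Because both groups are shifted using the same $\theta$ and the same pooled mean, the difference between the adjusted means equals the raw difference minus $\theta$ times the pre-experiment difference between the groups.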

Ratio Metrics

The experimental unit for ratio metrics contains both a numerator and a denominator. The current implementation defines:

Y: in-experiment numerator (nume)
N: in-experiment denominator (deno)
X: pre-experiment numerator (pre_nume)
M: pre-experiment denominator (pre_deno)

The in-experiment metric is Y / N and the pre-experiment metric is X / M. The CUPED adjustment for ratio metrics in the current algorithm library is:

$$\theta = \frac{\mathrm{Cov}(Y/N, \; X/M)}{\mathrm{Var}(X/M)}$$

where Cov(Y/N, X/M) and Var(X/M) are estimated via the Delta Method (see Delta Method under Statistical Methods).
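For orientation, the standard first-order Delta Method approximations for these two quantities are shown below. This is the textbook form, given as an assumption about what the estimate looks like rather than a transcription of the library's code. The symbols $\mu$ (per-unit means), $\sigma$ (per-unit variances and covariances), $n$ (number of experimental units), $R = \mu_Y / \mu_N$ and $R_{pre} = \mu_X / \mu_M$ are introduced here for illustration only.

$$\mathrm{Var}\left(\frac{X}{M}\right) \approx \frac{1}{n \, \mu_M^{2}} \left( \sigma_X^{2} - 2 R_{pre} \, \sigma_{XM} + R_{pre}^{2} \, \sigma_M^{2} \right)$$

$$\mathrm{Cov}\left(\frac{Y}{N}, \frac{X}{M}\right) \approx \frac{1}{n \, \mu_N \mu_M} \left( \sigma_{YX} - R_{pre} \, \sigma_{YM} - R \, \sigma_{NX} + R \, R_{pre} \, \sigma_{NM} \right)$$

Both expressions carry the same $1/n$ factor, so it cancels when they are divided to form $\theta$.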

Both groups share the same pooled $\theta$. Let the pooled pre-experiment ratio across both groups be:

$$E[R_{pre}] = \frac{X_{c} + X_{t}}{M_{c} + M_{t}}$$

The CUPED-adjusted ratios for the control group and treatment group are, respectively:

$$R_{c, cuped} = \frac{Y_{c}}{N_{c}} - \theta \left( \frac{X_{c}}{M_{c}} - E[R_{pre}] \right)$$

$$R_{t, cuped} = \frac{Y_{t}}{N_{t}} - \theta \left( \frac{X_{t}}{M_{t}} - E[R_{pre}] \right)$$

The final effect size is calculated from the adjusted ratios of the two groups:

$$\Delta = R_{t, cuped} - R_{c, cuped}$$

The post-CUPED variance uses the following form:

$$\mathrm{Var}(R_{cuped}) = \mathrm{Var}\left(\frac{Y}{N}\right) + \theta^{2} \, \mathrm{Var}\left(\frac{X}{M}\right) - 2 \theta \, \mathrm{Cov}\left(\frac{Y}{N}, \frac{X}{M}\right)$$

This is used for significance testing and interval estimation.
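The ratio-metric steps above can be put together in a short Python sketch. Several parts are assumptions made for illustration and are not confirmed by the library: the names `_linearize` and `ratio_cuped`, the use of per-unit Delta Method linearization to estimate the variances and covariances, the use of sample (ddof=1) estimators, and combining the two group variances by simple addition to get a standard error (which treats $\theta$ and $E[R_{pre}]$ as fixed).

```python
import numpy as np

def _linearize(num, den):
    """Per-unit first-order (Delta Method) linearization of sum(num)/sum(den).

    Returns the linearized values and the ratio itself. Hypothetical helper.
    """
    ratio = num.sum() / den.sum()
    return (num - ratio * den) / den.mean(), ratio

def ratio_cuped(y_c, n_c, x_c, m_c, y_t, n_t, x_t, m_t):
    """Return (delta, standard error, pooled theta) for a ratio metric."""
    y_c, n_c, x_c, m_c, y_t, n_t, x_t, m_t = (
        np.asarray(a, dtype=float)
        for a in (y_c, n_c, x_c, m_c, y_t, n_t, x_t, m_t)
    )

    # Pooled theta = Cov(Y/N, X/M) / Var(X/M); the 1/n factors cancel.
    lin_in, _ = _linearize(np.concatenate([y_c, y_t]), np.concatenate([n_c, n_t]))
    lin_pre, r_pre = _linearize(np.concatenate([x_c, x_t]), np.concatenate([m_c, m_t]))
    theta = np.cov(lin_in, lin_pre, ddof=1)[0, 1] / np.var(lin_pre, ddof=1)

    def group(y, n, x, m):
        lin_y, r_in = _linearize(y, n)
        lin_x, r_g_pre = _linearize(x, m)
        k = len(y)
        var_in = np.var(lin_y, ddof=1) / k
        var_pre = np.var(lin_x, ddof=1) / k
        cov = np.cov(lin_y, lin_x, ddof=1)[0, 1] / k
        # R_cuped = Y/N - theta * (X/M - E[R_pre])
        adjusted = r_in - theta * (r_g_pre - r_pre)
        # Var(R_cuped) = Var(Y/N) + theta^2 Var(X/M) - 2 theta Cov(Y/N, X/M)
        var_adj = var_in + theta**2 * var_pre - 2 * theta * cov
        return adjusted, var_adj

    r_c, v_c = group(y_c, n_c, x_c, m_c)
    r_t, v_t = group(y_t, n_t, x_t, m_t)
    delta = r_t - r_c
    # Clamp at zero to guard against tiny negative values from rounding.
    se = float(np.sqrt(max(v_c + v_t, 0.0)))
    return delta, se, theta
```

A z-test or confidence interval would then use `delta / se`; how the library itself combines the group variances is not stated here, so treat that last step as one plausible choice.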

Scope and Usage Recommendations

CUPED currently supports the following metric types: MEAN, PROPORTION, RATIO, SUM, and ACTIVE_DAYS.

The benefit of CUPED depends on the correlation between the pre-experiment metric and the in-experiment metric. The stronger the correlation, the more pronounced the variance reduction typically is. If pre-experiment data is largely missing, the definitions are inconsistent, or the pre-experiment metric is weakly correlated with the in-experiment metric, the benefit will be reduced.
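The dependence on correlation can be made concrete for the simple (non-ratio) case. Substituting $\theta = \mathrm{Cov}(Y, X) / \mathrm{Var}(X)$ into the variance of the adjusted metric gives the following standard result, where $\rho$ denotes the correlation between $Y$ and $X$ (a symbol introduced here for illustration):

$$\mathrm{Var}(Y_{cuped}) = \mathrm{Var}(Y) + \theta^{2} \, \mathrm{Var}(X) - 2 \theta \, \mathrm{Cov}(Y, X) = \mathrm{Var}(Y) - \frac{\mathrm{Cov}(Y, X)^{2}}{\mathrm{Var}(X)} = \mathrm{Var}(Y) \, (1 - \rho^{2})$$

For example, $\rho = 0.5$ removes 25% of the variance, while $\rho = 0.9$ removes 81%. Since the sample size required for a given MDE scales with the variance, the same percentages apply approximately to the required sample size.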
