Read Results

This page helps you understand what each item on the Results tab means once your experiment is running. Walk through the page from top to bottom.

Why this matters: Your puzzle game has been running a new reward popup test for a week. The Results page shows green bars, gray bars, and various percentages — what do they mean? After reading this page, you will be able to confidently decide whether to ship, keep running, or end the experiment.

Still configuring the experiment? See "Create and Launch an Experiment" and "Configure the Stats Engine and Validate Samples" first.

Entry point

Click Experiments in the left navigation → select an experiment to open the detail page → switch to the Results tab. The experiment status is always shown in the page header; the Make Decision button also appears here while the experiment is running.

Sharing tip: The page URL encodes all the settings you are currently viewing. Send it to a teammate and they will see exactly the same view you see — no additional explanation needed.

Reading from top to bottom

1. Suggestion banner

A one-line conclusion at the top of the page that tells you the current state of the experiment.

What you see | What it means | Next step
Suggestion to add primary metrics | No primary metric selected | Add one to see comparison charts
Collecting data… | Not enough data yet | Let the experiment keep running
A better group has been found | A significant winner exists | Evaluate whether to Ship
No winning variant found | Ran to completion with no significant difference | Evaluate whether to Archive
Risk warning (red) | A high-risk condition was triggered | Stop reading the headline; investigate first

Game scenario: You launched a new onboarding A/B test yesterday and open Results today to see Collecting data… — this is normal. Most casual games need 7–14 days to accumulate enough signal; do not rush to conclusions.

2. Cumulative Exposures

At the top of the results page is the cumulative exposure trend chart. An exposure is a deduplicated count of experiment units that have entered the experiment: typically users, or devices for device-level experiments. The timeline shows when the experiment started and how many exposures entered each day. Use it to check the rate at which units are assigned to each variant, the total cumulative exposures, and whether the actual traffic split matches the target allocation configured in the experiment setup.

When a Sample Ratio Mismatch (SRM) is detected, the chart marks the affected point in red. An SRM means that the cumulative exposure proportion across variants does not match the traffic allocation configured in the experiment setup.
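
To make the SRM idea concrete, here is a minimal sketch of one common way to test a two-variant split: a chi-square goodness-of-fit test. This page does not say which test the product runs, so the function name `srm_p_value`, the exposure counts, and the 0.001 alert threshold are all illustrative assumptions.

```python
import math

def srm_p_value(control_n: int, treatment_n: int, control_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-variant split (1 degree of freedom).

    Hypothetical helper for illustration; not the product's implementation.
    """
    total = control_n + treatment_n
    expected_control = total * control_share
    expected_treatment = total * (1.0 - control_share)
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

# Made-up cumulative exposure counts for an experiment configured as 50/50.
balanced = srm_p_value(50_120, 49_880)
skewed = srm_p_value(51_500, 48_500)

SRM_THRESHOLD = 0.001  # assumed alert threshold, a common convention
print("balanced flagged:", balanced < SRM_THRESHOLD)
print("skewed flagged:", skewed < SRM_THRESHOLD)
```

A small deviation from the target split is expected by chance; the test only flags deviations that are very unlikely under the configured allocation.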

3. Three analysis views

View | Purpose
Basic Analysis (default) | "Did this experiment win?": a head-to-head comparison of Control versus one Treatment
Explore | View metrics outside the experiment's metric set, or compare multiple Treatments simultaneously (up to 30 metrics × 5 variants)
HTE Analysis | Use when you suspect the effect is concentrated in a specific player segment (e.g., payers vs. non-payers) and want to confirm

Basic Analysis

Displays the relative difference (lift) for every metric you configured when creating the experiment. The experiment "Results" job runs daily: for each metric, it computes the difference between randomly assigned variants (such as Treatment vs. Control) and runs a statistical test on the result.

The metrics table compares Control and Treatment row by row:

Column | Meaning
Metric name | May include a CUPED On / Off indicator
Baseline | The Control variant's value
Comparison | The Treatment variant's value
Relative difference | How much Treatment changed relative to Control, with confidence interval
Trends | A small sparkline showing how the difference has changed over time

Color coding:

  • Green — Treatment is significantly better than Control (won)
  • Red — Treatment is significantly worse than Control (problem)
  • Gray — No significant difference yet
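
The color rules above can be sketched as a small function. This assumes that "significant" means the confidence interval of the relative difference excludes zero, and that each metric has a known desired direction. Both are assumptions about how the product decides, and `classify_row` and `higher_is_better` are hypothetical names.

```python
def classify_row(ci_low: float, ci_high: float, higher_is_better: bool = True) -> str:
    """Map a confidence interval on the relative difference to a row color.

    Hypothetical helper: assumes significance means the interval excludes zero.
    """
    if ci_low > 0:
        direction = 1      # Treatment is significantly higher than Control
    elif ci_high < 0:
        direction = -1     # Treatment is significantly lower than Control
    else:
        return "gray"      # interval contains zero: no significant difference yet
    improved = (direction == 1) if higher_is_better else (direction == -1)
    return "green" if improved else "red"

# Illustrative intervals, expressed as fractions (0.004 = +0.4%).
print(classify_row(0.004, 0.042))                            # retention-style metric
print(classify_row(-0.010, 0.020))                           # interval spans zero
print(classify_row(-0.030, -0.010, higher_is_better=False))  # e.g. a crash rate
```

The `higher_is_better` flag matters for metrics such as crash rate or churn, where a significant decrease is the improvement.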

Game scenario: Testing whether a spin-the-wheel animation improves D7 retention over a static chest. After 10 days, the retention row shows green at +2.3%, and guardrail metrics (ARPU, ad watch rate) are gray — a clean win; the spin wheel can be shipped with confidence.

The relative difference (lift) formula is: Delta(%) = (Treatment - Control) / Control × 100%. The confidence interval is computed at the selected confidence level (default 95%).
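
As a worked illustration of the lift formula, the sketch below computes the relative difference and a normal-approximation confidence interval using the delta method for a ratio of two independent means. This page does not state how the product derives its interval, so the method, the function name `relative_lift_ci`, and the input numbers are assumptions.

```python
import math
from statistics import NormalDist

def relative_lift_ci(control_mean, control_se, treatment_mean, treatment_se,
                     confidence=0.95):
    """Return (lift, ci_low, ci_high) for (Treatment - Control) / Control.

    Delta-method sketch; assumes independent variants and nonzero means.
    """
    ratio = treatment_mean / control_mean
    lift = ratio - 1.0
    # Approximate standard error of the ratio of two independent means.
    se = abs(ratio) * math.sqrt((treatment_se / treatment_mean) ** 2
                                + (control_se / control_mean) ** 2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 at 95%
    return lift, lift - z * se, lift + z * se

# Hypothetical D7 retention: Control 20.00%, Treatment 20.46%,
# each with a standard error of 0.2 percentage points.
lift, low, high = relative_lift_ci(0.200, 0.002, 0.2046, 0.002)
print(f"lift={lift:+.1%}, interval=({low:+.1%}, {high:+.1%})")
```

Multiply the returned fractions by 100 to get the percentage shown in the Relative difference column.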

CUPED indicator

  • CUPED On = variance reduction is active; estimates are more precise, with narrower confidence intervals
  • CUPED Off = this metric type is not supported; hover to see the reason

CUPED typically lets an experiment reach a conclusion 3–7 days earlier. It reduces noise by accounting for natural behavioral variation among players (heavy weekend play vs. occasional weekday sessions).
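
For readers who want to see the mechanics, here is a minimal sketch of the standard CUPED adjustment: each unit's metric is corrected using that unit's pre-experiment value of a covariate. The function name `cuped_adjust` and the sample data are hypothetical, and the product's actual covariate choice is not described on this page.

```python
from statistics import mean, pvariance

def cuped_adjust(metric, pre_metric):
    """Return CUPED-adjusted values: y - theta * (x - mean(x)).

    metric: per-unit values observed during the experiment (y).
    pre_metric: the same units' pre-experiment values (x).
    """
    x_bar, y_bar = mean(pre_metric), mean(metric)
    cov_xy = sum((x - x_bar) * (y - y_bar)
                 for x, y in zip(pre_metric, metric)) / len(metric)
    theta = cov_xy / pvariance(pre_metric)  # regression slope of y on x
    return [y - theta * (x - x_bar) for x, y in zip(pre_metric, metric)]

# Made-up sessions per player: heavy players stay heavy, light players stay light.
pre = [2.0, 9.0, 4.0, 12.0, 1.0, 7.0]
during = [3.0, 10.0, 4.0, 13.0, 2.0, 9.0]
adjusted = cuped_adjust(during, pre)

print("variance before:", round(pvariance(during), 3))
print("variance after: ", round(pvariance(adjusted), 3))
```

The adjustment leaves the mean unchanged and shrinks the variance in proportion to how well pre-experiment behavior predicts in-experiment behavior. In practice, theta is usually estimated once on data pooled across variants so that every variant receives the same correction.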

Explore

Run ad-hoc queries without changing the experiment configuration. Select metrics, select variants, set a date range, and click Query.

When to use it:

  • Compare all Treatments against Control in a single view
  • A teammate asks "did session length change?" — a metric not in the experiment's main metric set
  • Explore an idea on the fly without touching the experiment itself

Game scenario: The primary metric is D7 retention, but a monetization colleague wants to know whether the new popup affects IAP conversion rate. Open Explore, add the IAP conversion metric — no need to modify the experiment setup.

HTE (Heterogeneous Treatment Effects) analysis

Answers the question: "Is the effect the same for all players, or do different segments show different responses?"

Steps:

  1. Select the baseline group at the top (defaults to Control)
  2. Set a date range
  3. To filter a specific segment, click Sample Filtering to add filter conditions
  4. Under Metrics, add the metrics you care about (up to 10)
  5. Under Group By, select a segmentation dimension (player level range, country, spending tier)
  6. Click Query

Query results appear in the Metric Details section below, showing the treatment effect difference across each sub-segment. When there are multiple treatment variants, you can enable Multiple Comparison Correction in the top-right corner to control the false-positive risk.
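
To illustrate why a correction is needed with several treatment variants, the sketch below applies the Bonferroni method, one of the simplest corrections. Which method the Multiple Comparison Correction toggle applies is not stated here, so treat the choice of Bonferroni, the function name, and the p-values as assumptions.

```python
def bonferroni(p_values: dict, alpha: float = 0.05) -> dict:
    """Return which comparisons stay significant after Bonferroni correction.

    Each p-value is compared against alpha divided by the number of comparisons.
    """
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}

# Hypothetical p-values for three treatments, each compared with Control.
p_values = {"Treatment A": 0.030, "Treatment B": 0.004, "Treatment C": 0.200}
corrected = bonferroni(p_values)

for name, significant in corrected.items():
    print(name, "significant" if significant else "not significant")
```

Without correction, Treatment A would pass at the 0.05 level; with three comparisons the per-comparison threshold drops to about 0.0167, so only Treatment B remains significant.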

Game scenario: A new difficulty curve shows a weak positive lift overall. But when segmented by player level, players at level 50+ show +5%, while new players show –1%. HTE surfaces this — if you had shipped to everyone without segmenting, new players would churn faster due to mismatched difficulty, and the "weak positive lift" in overall retention would have been an illusion driven by experienced players.

Experiment Backtrack: a second check on sample balance

Backtrack is a sample balance check that ABC runs automatically on every experiment. It answers: "Are the variants truly comparable over the recent traffic period?"

Results are shown in two places:

  • A small label in the experiment header (Normal / - - placeholder)
  • A standalone backtrack page listing detailed per-metric differences

Practical usage:

  1. See Normal → you can trust the Suggestion and metrics table
  2. See - - → the experiment has not yet accumulated enough data; check back in a few hours
  3. Borderline result and unsure → switch to the 7-Day tab and compare direction and magnitude against the 3-day view
  4. Just added a new metric → click Rerun Backtrack to recompute immediately

Backtrack vs. SRM: both check balance, but with different scopes — SRM covers the entire experiment from start to present; Backtrack covers only the most recent 3 or 7 days. For a long-running experiment that started cleanly but drifted later, SRM may pass while the 3-day Backtrack does not.
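
The difference in scope can be sketched with a simple balance statistic computed over two windows. The example uses the treatment share of exposures purely as a stand-in; Backtrack itself reports per-metric differences, and the daily counts and the `treatment_share` helper are hypothetical.

```python
def treatment_share(daily_counts, last_n_days=None):
    """Share of exposures in Treatment over the whole run or a trailing window.

    daily_counts: list of (control_exposures, treatment_exposures), one per day.
    """
    window = daily_counts if last_n_days is None else daily_counts[-last_n_days:]
    control = sum(c for c, _ in window)
    treatment = sum(t for _, t in window)
    return treatment / (control + treatment)

# Made-up 30-day run: balanced for 27 days, then drifting for the last 3.
daily = [(1000, 1000)] * 27 + [(1000, 1150)] * 3

whole_run = treatment_share(daily)              # cumulative view, like SRM
recent = treatment_share(daily, last_n_days=3)  # trailing view, like 3-day Backtrack

print(f"whole run: {whole_run:.3f}, last 3 days: {recent:.3f}")
```

Because 27 clean days dilute 3 drifting days, the cumulative figure stays close to the target while the trailing window shows the drift clearly.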

Full details in Configure the Stats Engine and Validate Samples.

Advanced Query: force a recompute

The Advanced Query button (next to the date picker) forces a fresh computation. Use it when:

  • You just added a new metric and do not want to wait for tomorrow's automated run
  • The metric definition or data source was updated and the numbers need to be refreshed

Click it, select the metrics to recompute, and click Query. To recompute from scratch across all data, enable the Rerun Data toggle.


Next step: make a decision

Once you have read the results, click the Make Decision button at the top of the experiment to ship the winning variant or archive the experiment. See End an Experiment and Make a Decision.