
Best Practices — Interpreting HTE Results

HTE conclusion overview

[Figure: HTE best practices illustration]

Using the example above, the conclusion area shows:

HTE Conclusion: Overall, the treatment variant is significantly positive vs. the baseline variant and has a significant effect on the following subgroups.

Three steps to interpret:

| Step | What to look at | Conclusion for this example |
| --- | --- | --- |
| 1 | Overall effect direction | Treatment is significantly positive overall (+329%) |
| 2 | Which subgroups are significant | Top Positively Significant Subgroups lists 3 significantly positive subgroups |
| 3 | Which subgroup has the strongest effect | iOS users have the strongest effect (+415.56%) |

Reading the tree diagram

HTE uses a decision tree to automatically segment users into subgroups; each node represents a subgroup.

Tree structure

| Level | Meaning | Example |
| --- | --- | --- |
| Depth 0 | All users (root node) | All crowds, +329% |
| Depth 1 | Split by the first dimension | iOS vs. non-iOS |
| Depth 2+ | Further split by other dimensions | Further split by age group |

Interpreting node information

Each node shows two numbers:

  • Relative lift: the magnitude of improvement in the treatment group relative to the control group
  • Proportion: the share of that subgroup in the total user population (60.30% means iOS users make up 60.3% of all users)
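
The two node numbers can be reproduced by hand. The sketch below assumes the common definition of relative lift, (treatment − control) / control; the source does not state the platform's exact formula, and the function names and input values are hypothetical.

```python
def relative_lift(control_mean: float, treatment_mean: float) -> float:
    """Relative lift of treatment over control, as a fraction (0.25 == +25%)."""
    if control_mean == 0:
        raise ValueError("relative lift is undefined when the control mean is 0")
    return (treatment_mean - control_mean) / control_mean


def subgroup_proportion(subgroup_users: int, total_users: int) -> float:
    """Share of the total user population that falls in the subgroup."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return subgroup_users / total_users


# Hypothetical inputs, chosen only for illustration.
lift = relative_lift(control_mean=2.0, treatment_mean=2.5)
share = subgroup_proportion(subgroup_users=6_030, total_users=10_000)
print(f"{lift:+.2%}, {share:.2%}")
```

With these inputs the node would read +25.00% lift at a 60.30% share.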

Color meaning

  • Green / cyan: positive effect (treatment group outperforms control group)
  • Red / pink: negative effect (treatment group underperforms control group)

Reading the table data

The table below the tree is the detailed HTE data. Key column descriptions:

| Column | Meaning | What to focus on |
| --- | --- | --- |
| Subgroup Definition | Dimension conditions that define the subgroup | Understand the characteristics of this subgroup |
| Subgroup Proportion | Subgroup share of all users | Determine whether the subgroup is large enough to warrant a separate decision |
| Baseline / Treatment | Control group vs. treatment group metric values | Understand absolute magnitudes |
| Relative Difference | Relative lift (with confidence interval) | Core metric: judge effect size |
| P-value | Significance test | A subgroup effect counts as significant only if p < 0.05 |
| MDE | Minimum Detectable Effect | Determine whether the subgroup is adequately powered |
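
As a rough illustration of how these columns work together, the sketch below screens one table row. The `Row` type, the `screen` helper, and the sample values are hypothetical. The 5% share threshold follows the caution in the notes at the end of this page. Treating a non-significant lift below the MDE as "inconclusive" is a common heuristic, not something the platform is confirmed to do.

```python
from dataclasses import dataclass


@dataclass
class Row:
    subgroup: str
    proportion: float  # share of all users, as a fraction
    lift: float        # relative difference, as a fraction
    p_value: float
    mde: float         # minimum detectable effect, as a fraction


def screen(row: Row, alpha: float = 0.05, min_share: float = 0.05) -> list[str]:
    """Return plain-language flags for one row of the HTE table."""
    flags = []
    if row.p_value < alpha:
        flags.append("significant")
    elif abs(row.lift) < row.mde:
        # The test could not reliably detect an effect this small.
        flags.append("inconclusive: observed lift is below the MDE")
    else:
        flags.append("not significant")
    if row.proportion < min_share:
        flags.append("small subgroup: interpret with caution")
    return flags


# Hypothetical rows, not taken from the example on this page.
print(screen(Row("A", proportion=0.40, lift=0.12, p_value=0.01, mde=0.05)))
print(screen(Row("B", proportion=0.03, lift=0.02, p_value=0.30, mde=0.08)))
```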

Key data interpretation for this example

| Subgroup | Proportion | Relative lift | P-value | Interpretation |
| --- | --- | --- | --- | --- |
| All crowds | 100% | +329% | 0 | Overall effect is extremely strong and significant |
| iOS users | 60.30% | +415.56% | 0 | Stronger than overall; main driver |
| iOS + non-under_18 | 50.10% | +368.44% | 0 | Effect remains very strong after excluding underage iOS users |
| iOS + non-under_18 + non-18-24 | 39.90% | +304.25% | 0 | Effect drops slightly but remains very strong after further excluding 18–24 year-olds |
| Non-iOS users | 39.70% | +192.40% | 0 | Also significant, but notably smaller than iOS |

Key findings

  1. iOS is the core driver: iOS users (60.3%) contribute the strongest positive effect (+415.56%) and are the primary driver of the overall significance.
  2. Android/other platforms also benefit: Non-iOS users (39.7%) also see a +192.40% lift, just smaller in magnitude.
  3. Age affects iOS users: As under_18 and 18-24 users are excluded, the relative lift decreases from +415.56% → +368.44% → +304.25%, suggesting that younger iOS users respond more strongly.
  4. All p-values are displayed as 0: Each p-value is below the table's display precision, so the effects of all key subgroups are highly significant.
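
The size of each excluded age band can be recovered from the nested proportions in the example table. This is a small arithmetic sketch that assumes each deeper node is a strict subset of its parent; the variable names are illustrative.

```python
# Proportions (percent of all users) copied from the example table.
shares = {
    "iOS": 60.30,
    "iOS + non-under_18": 50.10,
    "iOS + non-under_18 + non-18-24": 39.90,
    "Non-iOS": 39.70,
}

# Each split removes one age band from the parent node, so the removed
# band's share is parent minus child (assumes the nodes are nested).
ios_under_18 = round(shares["iOS"] - shares["iOS + non-under_18"], 2)
ios_18_24 = round(
    shares["iOS + non-under_18"] - shares["iOS + non-under_18 + non-18-24"], 2
)
# The two depth-1 nodes should cover the whole population.
depth_1_total = round(shares["iOS"] + shares["Non-iOS"], 2)

print(ios_under_18, ios_18_24, depth_1_total)
```

Each excluded band works out to roughly 10.2% of all users. The lift of the excluded bands themselves cannot be derived from the table, because that would require their baseline and treatment metric values.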

Making decisions based on HTE

Decision framework

Step 1: Check whether the overall effect is significant
  → Not significant: stop the analysis; check whether the experiment is adequately powered first
  → Significant: proceed to Step 2

Step 2: Identify which subgroups are significant
  → All subgroups point in the same direction: launch to everyone
  → Some subgroups point in the opposite direction: consider a targeted launch

Step 3: Evaluate whether subgroup differences warrant a segmented decision
  → Large difference + subgroup is targetable → segmented strategy (e.g., launch only to iOS users)
  → Small difference or not targetable → launch to everyone
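
The three steps above can be sketched as a small function. This is an illustrative reading of the framework, not the platform's logic: the `Subgroup` type, the `walk_framework` name, and the `large_gap` threshold are hypothetical, since the source does not define how large a "large difference" is.

```python
from dataclasses import dataclass


@dataclass
class Subgroup:
    name: str
    lift: float        # relative lift as a fraction (4.1556 == +415.56%)
    significant: bool  # e.g. p < 0.05 in the HTE table
    targetable: bool   # can a launch be restricted to this subgroup?


def walk_framework(overall_significant, subgroups, large_gap=1.0):
    """Return one note per step of the framework.

    large_gap is a hypothetical threshold (difference in relative lift,
    as a fraction) for what counts as a "large difference".
    """
    if not overall_significant:
        return ["Step 1: not significant; address power first"]
    notes = ["Step 1: significant"]

    sig = [s for s in subgroups if s.significant]
    same_direction = len({s.lift > 0 for s in sig}) <= 1
    notes.append(
        "Step 2: same direction; launch to everyone"
        if same_direction
        else "Step 2: opposite directions; consider a targeted launch"
    )

    if not sig:
        return notes
    best = max(sig, key=lambda s: s.lift)
    gap = best.lift - min(s.lift for s in sig)
    if gap >= large_gap and best.targetable:
        notes.append(f"Step 3: segmented strategy is an option ({best.name} first)")
    else:
        notes.append("Step 3: launch to everyone")
    return notes


# Lifts taken from the example on this page.
example = [
    Subgroup("iOS users", 4.1556, True, True),
    Subgroup("Non-iOS users", 1.9240, True, True),
]
for note in walk_framework(True, example):
    print(note)
```

With the example's lifts and the assumed threshold, Step 2 points to a full launch and Step 3 flags an iOS-first rollout as an option.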

Decision recommendations for this example

| Option | Analysis | Recommendation |
| --- | --- | --- |
| Full launch | Overall significant (+329%); all subgroups positive | Safe choice |
| iOS-first | iOS effect is strongest (+415.56%); non-iOS also has an effect (+192.40%) | If resources are limited, prioritize ramping up on iOS |
| Exclude specific subgroups | No clearly negative subgroups | No exclusions needed |

⚠️ Notes:

  1. HTE tests many subgroups at once (multiple hypothesis testing), so the chance of at least one false positive grows with the number of subgroups. In principle the results need a correction (e.g., Bonferroni). Results are more trustworthy when the platform's Multiple Comparison Correction toggle (top-right) is enabled.
  2. When a subgroup proportion is too small (e.g., < 5%), interpret the result with caution even if the p-value is significant.
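
For reference, a minimal sketch of the Bonferroni correction mentioned in note 1: with m subgroup tests, each p-value is compared against alpha / m instead of alpha. The p-values below are hypothetical. The source names Bonferroni only as an example, so the platform's toggle may apply a different method.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return, per test, whether it stays significant after Bonferroni."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # stricter per-test threshold
    return [p <= threshold for p in p_values]


# Hypothetical p-values for four subgroups.
p_values = [0.001, 0.02, 0.04, 0.30]
uncorrected = [p < 0.05 for p in p_values]
corrected = bonferroni_significant(p_values)
print(uncorrected)
print(corrected)
```

In this made-up case three subgroups pass the uncorrected 0.05 threshold, but only the first survives the corrected threshold of 0.0125.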