The Three Causal Assumptions

The causal estimators discussed here, whether online or batch, rely on three assumptions to identify treatment effects from observational data. This page explains each assumption and shows how OnlineCML helps you check them.

1. Unconfoundedness

Formal statement:

\[Y_i(0), Y_i(1) \perp W_i \mid X_i\]

In plain English: after conditioning on the observed covariates \(X_i\), the treatment assignment contains no additional information about the potential outcomes.

What can go wrong: unobserved confounders — variables that affect both treatment and outcome but are not in \(X_i\). Examples: unmeasured health status, hidden user intent, genetic factors.
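To make the failure mode concrete, here is a minimal sketch of a toy population with a hidden binary confounder. All numbers and names (`p_u`, `p_treat`, `outcome`) are hypothetical and chosen only for illustration. It compares the naive treated-minus-control difference with the effect obtained when the confounder can be conditioned on.

```python
from fractions import Fraction as F

# Hypothetical population: U is a binary confounder we pretend not to observe.
p_u = {0: F(1, 2), 1: F(1, 2)}        # P(U = u)
p_treat = {0: F(1, 5), 1: F(4, 5)}    # P(W = 1 | U = u)
TRUE_EFFECT = 1                       # causal effect of W on Y


def outcome(w, u):
    """Structural outcome: Y = TRUE_EFFECT * W + 2 * U."""
    return TRUE_EFFECT * w + 2 * u


def mean_outcome_given_w(w):
    """E[Y | W = w], averaging over U with P(U | W = w) from Bayes' rule."""
    joint = {u: p_u[u] * (p_treat[u] if w == 1 else 1 - p_treat[u]) for u in p_u}
    total = sum(joint.values())
    return sum(joint[u] / total * outcome(w, u) for u in p_u)


# Naive contrast ignores U and is biased by the confounder.
naive = mean_outcome_given_w(1) - mean_outcome_given_w(0)

# Stratifying on U (possible only if U were observed) recovers the true effect.
adjusted = sum(p_u[u] * (outcome(1, u) - outcome(0, u)) for u in p_u)

print("naive:", naive, "adjusted:", adjusted)
```

In this toy setup the naive contrast is more than twice the true effect, and no balance check on other covariates would reveal that, because the imbalance sits entirely in the unobserved variable.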

How OnlineCML helps: The OnlineSMD and LiveLovePlot tools check whether the treated and control groups are balanced on observed covariates. Balance on observables is necessary for the adjustment to be credible, but it is not sufficient for unconfoundedness, which concerns unobserved variables and cannot be verified from the data alone.

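from onlinecml.diagnostics import OnlineSMD

smd = OnlineSMD(covariates=["age", "income"])
for x, w, y, _ in stream:
    # ipw_weight: inverse-propensity weight for this unit, 1/ps if w == 1 else 1/(1-ps)
    smd.update(x, treatment=w, weight=ipw_weight)
print(smd.is_balanced())  # True when |SMD| < 0.1 for all covariates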
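For readers who want to see what a streaming balance check computes, the following is a minimal pure-Python sketch of a weighted standardized mean difference updated one observation at a time. `RunningSMD` is a hypothetical stand-in, not OnlineCML's `OnlineSMD`, and the denominator convention (square root of the average of the two weighted group variances) is an assumption; the library may use a different one.

```python
import math


class RunningSMD:
    """Hypothetical illustration of a streaming weighted SMD (not OnlineSMD)."""

    def __init__(self, covariates):
        self.covariates = list(covariates)
        # Per (group, covariate): [sum of weights, weighted sum, weighted sum of squares]
        self._stats = {(g, c): [0.0, 0.0, 0.0] for g in (0, 1) for c in self.covariates}

    def update(self, x, treatment, weight=1.0):
        """Fold one observation into the running sums of its treatment group."""
        for c in self.covariates:
            s = self._stats[(treatment, c)]
            s[0] += weight
            s[1] += weight * x[c]
            s[2] += weight * x[c] ** 2

    def _mean_var(self, group, c):
        w, sx, sxx = self._stats[(group, c)]
        mean = sx / w
        # Clamp at zero to absorb small negative values from rounding.
        return mean, max(sxx / w - mean ** 2, 0.0)

    def smd(self, c):
        """(treated mean - control mean) / pooled standard deviation."""
        m1, v1 = self._mean_var(1, c)
        m0, v0 = self._mean_var(0, c)
        pooled = math.sqrt((v1 + v0) / 2)
        if pooled == 0:
            return 0.0 if m1 == m0 else math.copysign(math.inf, m1 - m0)
        return (m1 - m0) / pooled

    def is_balanced(self, threshold=0.1):
        return all(abs(self.smd(c)) < threshold for c in self.covariates)


# Tiny made-up stream: treated units are older on average than controls.
toy_stream = [({"age": 30.0}, 1), ({"age": 50.0}, 1), ({"age": 20.0}, 0), ({"age": 40.0}, 0)]
toy_smd = RunningSMD(["age"])
for x, w in toy_stream:
    toy_smd.update(x, treatment=w)

print(toy_smd.smd("age"), toy_smd.is_balanced())
```

The sum-of-squares update keeps the sketch short; a production implementation would more likely use a numerically stabler recurrence such as Welford's.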

2. Overlap (Positivity)

Formal statement:

\[0 < e(x) < 1 \quad \forall\, x \in \text{support}(X)\]

where \(e(x) = P(W=1 \mid X=x)\) is the propensity score.

In plain English: every type of unit has a non-zero probability of being either treated or untreated.

What can go wrong: near-positivity violations occur when some subgroups are almost never treated (or almost always treated). IPW weights become extremely large, inflating variance.
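The variance inflation can be illustrated with a small sketch. It compares ten treated units with propensity 0.5 against the same group with one unit replaced by a rarely treated one (propensity 0.01), using Kish's effective sample size \((\sum w)^2 / \sum w^2\) as a rough summary. The helper names and the numbers are hypothetical.

```python
def ipw_weight(ps, treated):
    """Inverse-probability weight: 1/e(x) for treated, 1/(1 - e(x)) for controls."""
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)


def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Ten treated units, all with e(x) = 0.5: every weight equals 2.
well_overlapped = [ipw_weight(0.5, True)] * 10

# Same group, but one unit comes from a subgroup that is almost never treated.
near_violation = [ipw_weight(0.5, True)] * 9 + [ipw_weight(0.01, True)]

ess_good = effective_sample_size(well_overlapped)
ess_bad = effective_sample_size(near_violation)

print("largest weight:", max(near_violation))
print("effective sample size:", ess_good, "vs", round(ess_bad, 2))
```

A single unit with a weight near 100 dominates the weighted average, so the ten observations carry roughly the information of one or two.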

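How OnlineCML helps: OverlapChecker flags observations with propensity scores outside [ps_min, ps_max]. OnlineOverlapWeights uses the overlap weights of Li et al. (2018), which arise from the tilting function \(h(x) = e(x)(1-e(x))\): treated units receive weight \(1 - e(x)\) and control units receive \(e(x)\), so every weight lies in \([0, 1]\) and stays stable even when propensity scores approach 0 or 1.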

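from onlinecml.diagnostics import OverlapChecker

checker = OverlapChecker(ps_min=0.05, ps_max=0.95)
for x, w, y, _ in stream:
    # ps_model: a propensity model fitted elsewhere; predict_one returns e(x)
    checker.update(ps_model.predict_one(x), treatment=w)
print(checker.is_overlap_adequate())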
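As a rough illustration of why overlap weights are better behaved than inverse-probability weights, the sketch below evaluates both for treated units across a range of propensity scores and flags scores outside the [0.05, 0.95] band used above. The functions are hypothetical helpers written for this example, not OnlineCML's implementations.

```python
def inverse_probability_weight(ps, treated):
    """1/e(x) for treated units, 1/(1 - e(x)) for controls."""
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)


def overlap_weight(ps, treated):
    """Li et al. (2018): treated units get 1 - e(x), controls get e(x)."""
    return 1.0 - ps if treated else ps


def outside_range(ps, ps_min=0.05, ps_max=0.95):
    """True when the propensity score falls outside [ps_min, ps_max]."""
    return ps < ps_min or ps > ps_max


scores = [0.01, 0.2, 0.5, 0.8, 0.99]
ipw_values = [inverse_probability_weight(ps, treated=True) for ps in scores]
ow_values = [overlap_weight(ps, treated=True) for ps in scores]
flagged = [ps for ps in scores if outside_range(ps)]

for ps, ipw_w, ow_w in zip(scores, ipw_values, ow_values):
    print(f"e(x)={ps:.2f}  IPW={ipw_w:8.2f}  overlap={ow_w:.2f}")
print("flagged:", flagged)
```

The inverse-probability weight grows without bound as \(e(x)\) approaches 0, while the overlap weight never exceeds 1. The trade-off is that overlap weighting targets the effect in the overlap population rather than the average effect over all units.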

3. SUTVA

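Formal statement: The potential outcome \(Y_i(w)\) depends only on unit \(i\)'s own treatment \(W_i\), not on the treatments of other units, and each treatment level has a single, well-defined version.

In plain English: there are no spillover effects between units, and "treated" means the same thing for every unit.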

What can go wrong: network effects (social contagion), market-level effects (price changes affect all buyers), household effects (one family member's medication affects outcomes for others).

How OnlineCML handles it: OnlineCML assumes SUTVA by default. If your setting has interference, results should be interpreted as direct effects only. Future versions will include NetworkInterferenceStream for SUTVA-violating settings.
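The distinction between a direct effect and a total effect can be seen in a toy household example. The sketch below assumes two-person households, independent coin-flip treatment assignment, and hypothetical effect sizes (`DIRECT`, `SPILLOVER`); it enumerates every assignment rather than sampling, so the numbers are exact.

```python
from itertools import product

DIRECT = 1.0     # hypothetical effect of a unit's own treatment
SPILLOVER = 0.5  # hypothetical effect of the household partner's treatment


def outcome(own_w, partner_w):
    """Outcome under interference: depends on own and partner's treatment."""
    return DIRECT * own_w + SPILLOVER * partner_w


# Enumerate the four equally likely assignments for a two-person household
# and collect each member's outcome by their own treatment status.
treated, control = [], []
for w_a, w_b in product((0, 1), repeat=2):
    for own, partner in ((w_a, w_b), (w_b, w_a)):
        (treated if own == 1 else control).append(outcome(own, partner))

# What a unit-level treated-vs-control comparison recovers in this design.
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)

# What treating everyone versus no one would actually change.
total_effect = outcome(1, 1) - outcome(0, 0)

print("difference in means:", diff_in_means)
print("total effect:", total_effect)
```

Here the unit-level comparison recovers only the direct component because the partner's treatment is assigned independently. When treatment is correlated within households, even the direct-effect interpretation can fail.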

Checking Assumptions in Practice

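from onlinecml.diagnostics import OnlineSMD, OverlapChecker
# OnlineIPW must also be imported; its module path is not shown on this page.

smd     = OnlineSMD(covariates=[...])
checker = OverlapChecker()
ipw     = OnlineIPW()

for x, w, y, _ in stream:
    # Predict before learning so the score is not fitted on the current unit.
    ps = ipw.ps_model.predict_one(x)
    checker.update(ps, treatment=w)
    # Guard against division by zero at ps = 0 or 1 (1e-6 is an illustrative bound).
    ps = min(max(ps, 1e-6), 1 - 1e-6)
    weight = 1/ps if w == 1 else 1/(1-ps)
    smd.update(x, treatment=w, weight=weight)
    ipw.learn_one(x, w, y)

# Report
print("Overlap adequate:", checker.is_overlap_adequate())
print("Balance adequate:", smd.is_balanced())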

References

  • Imbens, G.W. and Rubin, D.B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
  • Li, F., Morgan, K.L. and Zaslavsky, A.M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390–400.