Robinson Transformation (R-Learner)
The R-Learner (Nie & Wager, 2021) is based on Robinson's (1988)
partial linear model decomposition. It is implemented in OnlineCML as
OnlineRLearner.
The Partial Linear Model
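Assume the outcome follows the partially linear model:

\[ Y_i = \mu_0(X_i) + W_i \, \tau(X_i) + \varepsilon_i, \qquad E[\varepsilon_i \mid X_i, W_i] = 0, \]

where \(\mu_0(x) = E[Y \mid X = x, W = 0]\) is the baseline outcome surface and \(\tau(x)\) is the CATE. The conditional mean outcome \(m(x) = E[Y \mid X = x]\), which averages over treatment, is the nuisance function used in the steps that follow.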
The Residual Transformation
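Robinson (1988) showed that partialling out the nuisance functions, that is, subtracting from the outcome model its conditional expectation given \(X_i\), yields:

\[ Y_i - m(X_i) = \bigl(W_i - e(X_i)\bigr) \, \tau(X_i) + \varepsilon_i, \]

where \(e(x) = E[W \mid X = x]\) is the propensity score. This is the Robinson decomposition: regressing the residualised outcome \(\tilde{Y}_i = Y_i - m(X_i)\) on the residualised treatment \(\tilde{W}_i = W_i - e(X_i)\) identifies \(\tau(X_i)\).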
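As a quick sanity check of the decomposition, the sketch below simulates data with a known constant effect and regresses the residualised outcome on the residualised treatment, using the true nuisance functions. The data-generating process, the function names and the value \(\tau = 2\) are illustrative assumptions, not part of OnlineCML.

```python
import math
import random


def simulate(n, tau=2.0, seed=0):
    """Draw rows (x, w, y, e(x), m(x)) from a partially linear model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        e_x = 1.0 / (1.0 + math.exp(-x))      # true propensity e(x)
        w = 1 if rng.random() < e_x else 0    # confounded treatment
        baseline = math.sin(3.0 * x)          # expected outcome when untreated
        y = baseline + tau * w + rng.gauss(0.0, 0.5)
        m_x = baseline + tau * e_x            # true m(x) = E[Y | X = x]
        rows.append((x, w, y, e_x, m_x))
    return rows


def residual_regression(rows):
    """No-intercept least-squares slope of (Y - m(X)) on (W - e(X))."""
    num = sum((w - e_x) * (y - m_x) for _, w, y, e_x, m_x in rows)
    den = sum((w - e_x) ** 2 for _, w, _, e_x, _ in rows)
    return num / den


tau_hat = residual_regression(simulate(20_000))
print(f"estimated tau: {tau_hat:.3f}")
```

With the true nuisance functions plugged in, the slope should land close to the simulated effect even though treatment assignment depends on \(x\). In practice \(m\) and \(e\) are unknown and have to be estimated.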
The R-Learner Loss
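Nie & Wager (2021) propose minimising the squared error of this residual-on-residual regression, with the nuisance functions replaced by their estimates:

\[ \hat{\tau} = \arg\min_{\tau} \frac{1}{n} \sum_{i=1}^{n} \bigl( \tilde{Y}_i - \tilde{W}_i \, \tau(X_i) \bigr)^2, \]

which is equivalent to regressing the pseudo-outcome \(\tilde{Y}_i / \tilde{W}_i\) on \(X_i\) with weight \(\tilde{W}_i^2\), because \(\bigl(\tilde{Y}_i - \tilde{W}_i \tau(X_i)\bigr)^2 = \tilde{W}_i^2 \bigl(\tilde{Y}_i / \tilde{W}_i - \tau(X_i)\bigr)^2\) whenever \(\tilde{W}_i \neq 0\).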
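The equivalence can be checked numerically. The snippet below evaluates both forms of the loss on a few made-up residual pairs and confirms that, for a constant \(\tau\), the weighted mean of the pseudo-outcomes coincides with the closed-form minimiser \(\sum_i \tilde{W}_i \tilde{Y}_i / \sum_i \tilde{W}_i^2\). The residual values and helper names are hypothetical.

```python
import math

# Made-up residual pairs (W_res, Y_res), purely for illustration.
residuals = [(0.4, 0.9), (-0.6, -1.1), (0.5, 1.2), (-0.3, -0.5)]


def r_loss(tau, pairs):
    """R-learner loss: mean of (Y_res - W_res * tau)^2."""
    return sum((y - w * tau) ** 2 for w, y in pairs) / len(pairs)


def weighted_loss(tau, pairs):
    """Squared error of the pseudo-outcome Y_res / W_res, weighted by W_res^2."""
    return sum(w ** 2 * (y / w - tau) ** 2 for w, y in pairs) / len(pairs)


# The two losses agree for every candidate tau.
for tau in (-1.0, 0.0, 0.5, 2.0):
    assert math.isclose(r_loss(tau, residuals), weighted_loss(tau, residuals))

# For a constant tau, the minimiser is the weighted mean of the pseudo-outcomes,
# which simplifies to sum(W_res * Y_res) / sum(W_res^2).
total_weight = sum(w ** 2 for w, _ in residuals)
weighted_mean = sum(w ** 2 * (y / w) for w, y in residuals) / total_weight
closed_form = sum(w * y for w, y in residuals) / total_weight
assert math.isclose(weighted_mean, closed_form)
```

The weighted form is what makes the loss usable with any regressor that accepts per-sample weights.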
Online Approximation in OnlineCML
OnlineRLearner maintains three running River models:
- ps_model: estimates \(e(x) = P(W = 1 \mid X = x)\)
- outcome_model: estimates \(m(x) = E[Y \mid X = x]\)
- cate_model: fits \(\tau(x)\) from residualised targets
At each step:
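W_res = W - ps_model.predict(X)         # treatment residual, predicted before learning
Y_res = Y - outcome_model.predict(X)    # outcome residual, predicted before learning
if abs(W_res) >= min_residual:          # skip near-zero residuals to avoid a blow-up
    pseudo_outcome = Y_res / W_res
    weight = W_res ** 2
    cate_model.learn_one(X, pseudo_outcome, w=weight)
ps_model.learn_one(X, W)                # nuisance models learn only after predicting
outcome_model.learn_one(X, Y)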
The predict-then-learn protocol ensures the nuisance models are not contaminated by the current observation when generating the pseudo-outcome.
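The update order can be sketched end to end without any dependencies. The example below is a toy stand-in, not the OnlineCML implementation: the hypothetical `RunningMean` class replaces the River nuisance models with running averages that ignore the features, the CATE model is a single weighted running mean (a constant \(\tau\)), and `min_residual=0.05` is an assumed value. It only illustrates the ordering: predict with the nuisance models, build the weighted pseudo-outcome, then update the nuisance models.

```python
import random


class RunningMean:
    """Toy online model: predicts the weighted mean of the targets seen so far."""

    def __init__(self):
        self.total = 0.0
        self.weight = 0.0

    def predict_one(self, x):
        # Features are ignored here; a real nuisance model would use them.
        return self.total / self.weight if self.weight > 0 else 0.0

    def learn_one(self, x, y, w=1.0):
        self.total += w * y
        self.weight += w


class ToyOnlineRLearner:
    """Hypothetical sketch of the predict-then-learn R-learner update."""

    def __init__(self, min_residual=0.05):
        self.ps_model = RunningMean()
        self.outcome_model = RunningMean()
        self.cate_model = RunningMean()
        self.min_residual = min_residual

    def learn_one(self, x, w, y):
        # 1. Predict with the nuisance models before they see (x, w, y).
        w_res = w - self.ps_model.predict_one(x)
        y_res = y - self.outcome_model.predict_one(x)
        # 2. Update the CATE model, skipping near-zero treatment residuals.
        if abs(w_res) >= self.min_residual:
            self.cate_model.learn_one(x, y_res / w_res, w=w_res ** 2)
        # 3. Only now let the nuisance models learn from this observation.
        self.ps_model.learn_one(x, w)
        self.outcome_model.learn_one(x, y)

    def predict_one(self, x):
        return self.cate_model.predict_one(x)


rng = random.Random(42)
learner = ToyOnlineRLearner()
for _ in range(20_000):
    x = {"age": rng.uniform(0.0, 1.0)}        # hypothetical feature
    w = 1 if rng.random() < 0.5 else 0        # randomised treatment
    y = 1.0 + 2.0 * w + rng.gauss(0.0, 0.5)   # simulated effect is 2.0
    learner.learn_one(x, w, y)

estimate = learner.predict_one({"age": 0.3})
print(f"estimated CATE: {estimate:.3f}")
```

Because the simulated treatment is randomised and the effect is constant, running means are adequate nuisance models here. With confounded data or heterogeneous effects, feature-aware models would be needed in all three roles.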
References
- Robinson, P.M. (1988). Root-N-consistent semiparametric regression. Econometrica, 56(4), 931–954.
- Nie, X. and Wager, S. (2021). Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2), 299–319.