Matching
onlinecml.matching.online_matching.OnlineMatching
Bases: BaseOnlineEstimator
Online K-nearest-neighbor matching estimator for CATE.
Maintains separate sliding-window buffers for treated and control units. For each new observation, finds the K nearest neighbors in the opposite treatment arm and computes a matched CATE estimate via IPW-corrected neighbor averaging.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| k | int | Number of nearest neighbors to match. Default 1. | 1 |
| buffer_size | int | Maximum number of units to retain in each arm's buffer. Older units are dropped when the buffer is full (FIFO). Default 200. | 200 |
| distance_fn | callable or None | Distance function | None |
Notes
The buffer implements Sliding Window Nearest Neighbor (SWINN) matching.
With finite buffer_size, older observations may be dropped. This
provides implicit adaptation to concept drift at the cost of match
quality early in the stream.
The per-observation CATE estimate is:
.. math::
\hat{\tau}_i = Y_i - \frac{1}{K} \sum_{j \in \mathcal{N}(i)} Y_j
where N(i) is the set of the K nearest neighbors of unit i in the opposite treatment arm.
Predict-then-match: The CATE estimate is computed from the current buffer before the new observation is added.
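The predict-then-match update can be sketched without the library. This is a minimal illustration under stated assumptions, not the OnlineMatching implementation: the class name `SlidingWindowMatcher`, the helper `_euclid`, the running-mean ATE, and the sign convention for control units (neighbor mean minus own outcome) are all hypothetical choices made for the example.

```python
from collections import deque
import math


def _euclid(a, b):
    """Euclidean distance over the union of keys (missing features count as 0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


class SlidingWindowMatcher:
    """Hypothetical K-NN matcher with one FIFO buffer per treatment arm."""

    def __init__(self, k=1, buffer_size=200):
        self.k = k
        # deque(maxlen=...) drops the oldest unit once the buffer is full (FIFO).
        self.buffers = {0: deque(maxlen=buffer_size), 1: deque(maxlen=buffer_size)}
        self.n_matched = 0
        self.ate = 0.0  # running mean of the per-observation effects

    def learn_one(self, x, treatment, outcome):
        opposite = self.buffers[1 - treatment]
        if opposite:
            # Predict first: match against the buffer as it stands now.
            nearest = sorted(opposite, key=lambda unit: _euclid(x, unit[0]))[: self.k]
            neighbor_mean = sum(y for _, y in nearest) / len(nearest)
            # Treated: Y_i - mean(Y_j); control: mean(Y_j) - Y_i (assumed convention).
            tau = outcome - neighbor_mean if treatment == 1 else neighbor_mean - outcome
            self.n_matched += 1
            self.ate += (tau - self.ate) / self.n_matched
        # Only then store the new unit in its own arm's buffer.
        self.buffers[treatment].append((x, outcome))


matcher = SlidingWindowMatcher(k=1, buffer_size=2)
matcher.learn_one({"age": 1.0}, 0, 1.0)  # control, nothing to match against yet
matcher.learn_one({"age": 1.0}, 1, 3.0)  # treated, matched to the control above
print(matcher.ate)  # 2.0
```

Because the new unit is appended only after the estimate is computed, an observation can never be matched to itself.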
Examples:
>>> from onlinecml.datasets import LinearCausalStream
>>> matcher = OnlineMatching(k=3, buffer_size=100)
>>> for x, w, y, _ in LinearCausalStream(n=500, seed=42):
... matcher.learn_one(x, w, y)
>>> isinstance(matcher.predict_ate(), float)
True
learn_one(x, treatment, outcome, propensity=None)
Process one observation and update the matched CATE estimate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | dict | Feature dictionary for this observation. | required |
| treatment | int | Treatment indicator (0 = control, 1 = treated). | required |
| outcome | float | Observed outcome. | required |
| propensity | float or None | Not used; included for API compatibility. | None |
predict_one(x)
Predict the CATE for a single unit via nearest-neighbor matching.
Finds the K nearest treated and control neighbors in the buffer, and returns the difference in their mean outcomes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | dict | Feature dictionary for the unit. | required |
Returns:
| Type | Description |
|---|---|
| float | Estimated CATE. Returns 0.0 if either buffer is empty. |
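A rough sketch of the prediction rule described for predict_one: take the K nearest units in each arm and subtract the control mean from the treated mean. The function `knn_cate` and its list-based `treated` and `control` arguments are hypothetical stand-ins for the internal buffers, and Euclidean distance over the union of feature keys is assumed.

```python
import math


def knn_cate(x, treated, control, k=1):
    """Difference in mean outcomes of the k nearest treated and control units.

    `treated` and `control` are lists of (features, outcome) pairs standing in
    for the two buffers; both names are hypothetical.
    """
    if not treated or not control:
        return 0.0  # mirrors the documented fallback for an empty buffer

    def dist(unit):
        feats = unit[0]
        keys = set(x) | set(feats)
        return math.sqrt(sum((x.get(f, 0.0) - feats.get(f, 0.0)) ** 2 for f in keys))

    def mean_of_nearest(buffer):
        nearest = sorted(buffer, key=dist)[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return mean_of_nearest(treated) - mean_of_nearest(control)


treated = [({"x1": 0.0}, 5.0), ({"x1": 10.0}, 50.0)]
control = [({"x1": 0.5}, 2.0), ({"x1": 9.0}, 20.0)]
print(knn_cate({"x1": 0.2}, treated, control, k=1))  # 3.0
```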
onlinecml.matching.caliper_matching.OnlineCaliperMatching
Bases: BaseOnlineEstimator
Online matching with a maximum distance threshold (caliper).
Extends nearest-neighbor matching by rejecting matches that exceed a maximum distance threshold. Units that cannot be matched within the caliper are tracked separately. Reports the proportion of units in common support.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| caliper | float | Maximum allowable distance for a match. Observations whose nearest neighbor is farther than the caliper are left unmatched. | 1.0 |
| buffer_size | int | Maximum number of units to retain in each arm's buffer. Default 200. | 200 |
| distance_fn | callable or None | Distance function | None |
Notes
The common_support_rate property returns the proportion of
observations that were successfully matched (distance ≤ caliper).
A high unmatched rate suggests poor overlap between the treatment arms,
which may indicate a positivity violation.
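The bookkeeping behind a common support rate can be sketched as follows. This is an illustration, not the library's code: `CaliperTracker` is a hypothetical class, it matches only the single nearest neighbor, and it assumes that observations arriving while the opposite buffer is empty count as unmatched in the denominator.

```python
import math
from collections import deque


class CaliperTracker:
    """Hypothetical 1-NN caliper matcher that tracks the common support rate."""

    def __init__(self, caliper=1.0, buffer_size=200):
        self.caliper = caliper
        self.buffers = {0: deque(maxlen=buffer_size), 1: deque(maxlen=buffer_size)}
        self.n_seen = 0
        self.n_matched = 0

    @staticmethod
    def _distance(a, b):
        keys = set(a) | set(b)
        return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

    def learn_one(self, x, treatment, outcome):
        self.n_seen += 1
        opposite = self.buffers[1 - treatment]
        if opposite:
            nearest = min(self._distance(x, feats) for feats, _ in opposite)
            if nearest <= self.caliper:  # accept only matches inside the caliper
                self.n_matched += 1
        self.buffers[treatment].append((x, outcome))

    @property
    def common_support_rate(self):
        # 0.0 before any observation has been seen.
        return self.n_matched / self.n_seen if self.n_seen else 0.0


tracker = CaliperTracker(caliper=0.5)
tracker.learn_one({"x": 0.0}, 0, 1.0)  # no treated units yet: unmatched
tracker.learn_one({"x": 0.3}, 1, 2.0)  # distance 0.3 <= 0.5: matched
tracker.learn_one({"x": 5.0}, 1, 2.0)  # distance 5.0 > 0.5: unmatched
print(tracker.common_support_rate)     # one match out of three observations
```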
Examples:
>>> from onlinecml.datasets import LinearCausalStream
>>> matcher = OnlineCaliperMatching(caliper=2.0, buffer_size=100)
>>> for x, w, y, _ in LinearCausalStream(n=300, seed=42):
... matcher.learn_one(x, w, y)
>>> isinstance(matcher.common_support_rate, float)
True
common_support_rate
property
Proportion of observations successfully matched within the caliper.
Returns:
| Type | Description |
|---|---|
| float | Value in [0, 1]. Returns 0.0 before any observations are seen. |
learn_one(x, treatment, outcome, propensity=None)
Process one observation with caliper-constrained matching.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | dict | Feature dictionary for this observation. | required |
| treatment | int | Treatment indicator (0 = control, 1 = treated). | required |
| outcome | float | Observed outcome. | required |
| propensity | float or None | Not used; included for API compatibility. | None |
predict_one(x)
Predict CATE for a single unit via caliper-constrained matching.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | dict | Feature dictionary for the unit. | required |
Returns:
| Type | Description |
|---|---|
| float | Estimated CATE. Returns 0.0 if either buffer is empty or if the nearest neighbors exceed the caliper. |
onlinecml.matching.kernel_matching.OnlineKernelMatching
Bases: BaseOnlineEstimator
Online kernel-weighted matching for CATE estimation.
Instead of selecting K discrete neighbors, uses all units in the opposite arm's buffer with weights determined by a kernel function. The CATE for a treated unit i with features x_i is:
.. math::
\hat{\tau}_i = Y_i - \frac{\sum_j K(d(x_i, x_j)) Y_j}{\sum_j K(d(x_i, x_j))}
where the sums run over the units j in the opposite arm's buffer.
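As an illustration of the kernel-weighted average, the sketch below computes the counterfactual outcome for one unit. The Gaussian kernel, the way the bandwidth enters it (distance divided by bandwidth), and the function names `gaussian_kernel` and `kernel_counterfactual` are assumptions for this example; the library's default kernel is not stated here.

```python
import math


def gaussian_kernel(distance, bandwidth=1.0):
    """Assumed kernel for illustration: exp(-0.5 * (d / h)^2)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def kernel_counterfactual(x, opposite_arm, bandwidth=1.0):
    """Kernel-weighted mean outcome of every unit in the opposite arm."""
    weights, weighted_sum = 0.0, 0.0
    for feats, y in opposite_arm:
        keys = set(x) | set(feats)
        d = math.sqrt(sum((x.get(k, 0.0) - feats.get(k, 0.0)) ** 2 for k in keys))
        w = gaussian_kernel(d, bandwidth)
        weights += w
        weighted_sum += w * y
    # Guard against all weights underflowing to zero.
    return weighted_sum / weights if weights > 0 else 0.0


control = [({"x1": 0.0}, 1.0), ({"x1": 2.0}, 3.0)]
# A treated unit halfway between the two controls weights them equally,
# so the counterfactual is close to their plain average.
tau = 4.0 - kernel_counterfactual({"x1": 1.0}, control, bandwidth=1.0)
print(tau)
```

With a smaller bandwidth the weights concentrate on the closest units, which is what makes the matching sharper.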
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bandwidth | float | Bandwidth of the kernel. Smaller values → sharper matching. Default 1.0. | 1.0 |
| buffer_size | int | Maximum number of units to retain per arm. Default 200. | 200 |
| distance_fn | callable or None | Distance function | None |
| kernel_fn | callable or None | Kernel function | None |
Examples:
>>> from onlinecml.datasets import LinearCausalStream
>>> matcher = OnlineKernelMatching(bandwidth=1.5, buffer_size=100)
>>> for x, w, y, _ in LinearCausalStream(n=300, seed=42):
... matcher.learn_one(x, w, y)
>>> isinstance(matcher.predict_ate(), float)
True
learn_one(x, treatment, outcome, propensity=None)
Process one observation and update the CATE estimate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | dict | Feature dictionary for this observation. | required |
| treatment | int | Treatment indicator (0 = control, 1 = treated). | required |
| outcome | float | Observed outcome. | required |
| propensity | float or None | Not used; included for API compatibility. | None |
predict_one(x)
Predict the CATE for a single unit via kernel-weighted matching.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | dict | Feature dictionary for the unit. | required |
Returns:
| Type | Description |
|---|---|
| float | Estimated CATE. Returns 0.0 if either buffer is empty. |
Distance Functions
onlinecml.matching.distance.euclidean_distance(x, y)
Compute the Euclidean distance between two feature dicts.
Only features present in both dicts are used. Missing features are treated as 0.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | dict | First feature dictionary. | required |
| y | dict | Second feature dictionary. | required |
Returns:
| Type | Description |
|---|---|
| float | Euclidean distance. |
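A dict-based Euclidean distance can be sketched in a few lines. The version below follows one reading of the missing-feature rule, namely that a feature absent from one dict is treated as 0 and still contributes to the distance; `euclidean_sketch` is a hypothetical name and the library function may handle non-shared keys differently.

```python
import math


def euclidean_sketch(x, y):
    """Euclidean distance between two feature dicts (assumed: missing -> 0)."""
    keys = set(x) | set(y)  # union of feature names
    return math.sqrt(sum((x.get(k, 0.0) - y.get(k, 0.0)) ** 2 for k in keys))


print(euclidean_sketch({"a": 3.0}, {"b": 4.0}))  # 5.0
```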
onlinecml.matching.distance.ps_distance(p_x, p_y)
Compute the absolute difference in propensity scores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| p_x | float | Propensity score for unit x. | required |
| p_y | float | Propensity score for unit y. | required |
Returns:
| Type | Description |
|---|---|
| float | Absolute propensity score distance, `abs(p_x - p_y)`. |
onlinecml.matching.distance.mahalanobis_distance(x, y, cov_inv=None)
Compute the Mahalanobis distance between two feature dicts.
If no inverse covariance matrix is provided, falls back to Euclidean distance, which is equivalent to assuming an identity covariance (unit variance in every dimension).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | dict | First feature dictionary. | required |
| y | dict | Second feature dictionary. | required |
| cov_inv | dict or None | Inverse covariance matrix as a nested dict | None |
Returns:
| Type | Description |
|---|---|
| float | Mahalanobis distance. |
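The quadratic form behind the Mahalanobis distance can be written directly over dicts. This is a sketch under assumptions: `mahalanobis_sketch` is a hypothetical name, the nested dict is assumed to be indexed as `cov_inv[i][j]`, and entries missing from it are treated as 0.

```python
import math


def mahalanobis_sketch(x, y, cov_inv=None):
    """sqrt(d^T * cov_inv * d) for d = x - y, with dict-based inputs."""
    keys = sorted(set(x) | set(y))
    diff = {k: x.get(k, 0.0) - y.get(k, 0.0) for k in keys}
    if cov_inv is None:
        # Identity covariance: reduces to plain Euclidean distance.
        return math.sqrt(sum(d * d for d in diff.values()))
    total = 0.0
    for i in keys:
        for j in keys:
            # Assumed layout: cov_inv[i][j]; absent entries count as 0.
            total += diff[i] * cov_inv.get(i, {}).get(j, 0.0) * diff[j]
    # Clamp tiny negative values caused by floating-point error.
    return math.sqrt(max(total, 0.0))


x, y = {"a": 2.0, "b": 0.0}, {"a": 0.0, "b": 0.0}
cov_inv = {"a": {"a": 0.25, "b": 0.0}, "b": {"a": 0.0, "b": 1.0}}
print(mahalanobis_sketch(x, y))           # 2.0 (Euclidean fallback)
print(mahalanobis_sketch(x, y, cov_inv))  # 1.0 (feature "a" has variance 4)
```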
onlinecml.matching.distance.combined_distance(x, y, p_x, p_y, ps_weight=0.5)
Compute a weighted combination of Euclidean and PS distance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | dict | First feature dictionary. | required |
| y | dict | Second feature dictionary. | required |
| p_x | float | Propensity score for unit x. | required |
| p_y | float | Propensity score for unit y. | required |
| ps_weight | float | Weight on the PS distance component (0 to 1). Default 0.5. | 0.5 |
Returns:
| Type | Description |
|---|---|
| float | Combined distance. |
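One plausible way to combine the two components is a convex combination, sketched below. The exact formula is an assumption: the reference only says the result is a weighted combination with `ps_weight` on the propensity-score part, so giving the Euclidean part the weight `1 - ps_weight` is a guess, and `combined_sketch` is a hypothetical name.

```python
import math


def combined_sketch(x, y, p_x, p_y, ps_weight=0.5):
    """Weighted mix of Euclidean feature distance and propensity-score gap."""
    keys = set(x) | set(y)
    euclid = math.sqrt(sum((x.get(k, 0.0) - y.get(k, 0.0)) ** 2 for k in keys))
    ps_gap = abs(p_x - p_y)
    # Assumed form: convex combination of the two components.
    return (1.0 - ps_weight) * euclid + ps_weight * ps_gap


# Euclidean part is 2.0 and the propensity gap is 0.5.
print(combined_sketch({"a": 0.0}, {"a": 2.0}, 0.25, 0.75, ps_weight=0.5))  # 1.25
```

Under this form, `ps_weight=0` reduces to pure Euclidean matching and `ps_weight=1` to pure propensity-score matching.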