API

class cem.CEM(data: pandas.core.frame.DataFrame, treatment: str, outcome: str, H: Optional[int] = None, measure: str = 'l1', lower_H: int = 1, upper_H: int = 10)

The CEM class lets users experiment with different coarsening schemas on a single DataFrame. The “imbalance” method returns the multivariate imbalance (before or after matching), and the “match” method returns the individual observation weights after matching.

Parameters:

data (pandas.DataFrame) – A dataframe containing the observations
treatment (str) – Name of column in dataframe containing the treatment variable
outcome (str) – Name of column in dataframe containing the outcome variable
H (int, optional) – The number of bins to use for the continuous variables when calculating imbalance. If None, H is chosen by a heuristic: the integer value between lower_H and upper_H that produces the median L1 imbalance.
measure (str, optional) – Multivariate imbalance measure to use. Only the L1 and L2 imbalance measures are supported.
lower_H (int, optional) – If H is not provided, the lower end of the range for the automatic H search.
upper_H (int, optional) – If H is not provided, the upper end of the range for the automatic H search.
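
The heuristic described for H can be sketched as follows. This is only an illustration of "pick the H that gives the median imbalance", not the library's own code: the helper name pick_median_H, the inclusive treatment of the range, the tie-breaking rule, and the toy imbalance values are all assumptions made for the example.

```python
from typing import Callable, Dict


def pick_median_H(
    imbalance_for_H: Callable[[int], float],
    lower_H: int = 1,
    upper_H: int = 10,
) -> int:
    """Return the H in [lower_H, upper_H] whose imbalance is the median.

    Hypothetical helper: the range is assumed to be inclusive, and with an
    even number of candidates the upper of the two middle values is taken.
    """
    scores: Dict[int, float] = {
        h: imbalance_for_H(h) for h in range(lower_H, upper_H + 1)
    }
    # Rank candidate H values by imbalance (ties broken by H itself).
    ranked = sorted(scores, key=lambda h: (scores[h], h))
    return ranked[len(ranked) // 2]


# Toy stand-in for a real imbalance calculation (made-up numbers).
toy_scores = {1: 0.10, 2: 0.25, 3: 0.40, 4: 0.55, 5: 0.70}
chosen_H = pick_median_H(lambda h: toy_scores[h], lower_H=1, upper_H=5)
print(chosen_H)
```

Passing H explicitly to the constructor skips this search entirely, which may be preferable when the same binning must be reused across runs.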

data

Type:	pandas.DataFrame

treatment

Type:	str

outcome

Type:	str

H

Type:	int

imbalance_schema

Independent coarsening schema used to calculate multivariate imbalance (pre or post matching)

Type:	dict

measure

Multivariate imbalance measure

Type:	str
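
For intuition about what an L1 measure reports, here is a minimal sketch of the standard multivariate L1 imbalance: half the sum of absolute differences between the treated and control relative frequencies over the cells of a multivariate histogram. The function name l1_imbalance and the toy cells are assumptions for illustration; the library's own implementation may differ in detail.

```python
from collections import Counter
from typing import Hashable, Sequence


def l1_imbalance(cells: Sequence[Hashable], treated: Sequence[bool]) -> float:
    """L1 distance between treated and control cell frequencies.

    `cells` holds the histogram cell of each observation (for example a
    tuple of bin labels, one per covariate). Hypothetical helper.
    """
    t_counts = Counter(c for c, t in zip(cells, treated) if t)
    c_counts = Counter(c for c, t in zip(cells, treated) if not t)
    n_t = sum(t_counts.values())
    n_c = sum(c_counts.values())
    if n_t == 0 or n_c == 0:
        raise ValueError("need at least one treated and one control observation")
    all_cells = set(t_counts) | set(c_counts)
    # Counter returns 0 for cells that one group never occupies.
    return 0.5 * sum(
        abs(t_counts[c] / n_t - c_counts[c] / n_c) for c in all_cells
    )


cells = [("young", "low"), ("young", "low"), ("old", "high"), ("old", "high")]
identical = l1_imbalance(cells, [True, False, True, False])  # same distribution
separated = l1_imbalance(cells, [True, True, False, False])  # no shared cells
print(identical, separated)
```

Under this definition the value ranges from 0 (identical distributions) to 1 (no overlap between the groups).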

imbalance(coarsening: Optional[dict] = None) → float

Calculate the multivariate imbalance that remains after matching the data with the given coarsening schema.

Parameters:	coarsening (dict) – Defines the strata. If None, the returned value is the imbalance prior to performing CEM. Keys are the covariate (column) names and values are tuples of (func, kwargs). “func” is the name of the pandas function used to group the covariate (only “cut” and “qcut” are supported). “kwargs” is a dict of arguments passed to that function along with the covariate data.
Returns:	The residual imbalance
Return type:	float
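
The shape of a coarsening schema can be illustrated with plain pandas. In this sketch the column names, the bin choices, and the coarsen helper are all hypothetical; it only shows how a (func, kwargs) tuple could be dispatched to pandas.cut or pandas.qcut, and is not the library's internal code.

```python
import pandas as pd

# Hypothetical covariates; the column names are made up for this example.
df = pd.DataFrame({
    "age": [21, 34, 45, 52, 63, 70],
    "income": [18.0, 25.0, 40.0, 41.0, 75.0, 120.0],
})

# Keys are column names; values are (func, kwargs) tuples.
schema = {
    "age": ("cut", {"bins": [0, 30, 50, 100]}),  # fixed bin edges
    "income": ("qcut", {"q": 2}),                # two quantile-based bins
}


def coarsen(data: pd.DataFrame, schema: dict) -> pd.DataFrame:
    """Apply each (func, kwargs) pair to its column and return bin codes."""
    out = {}
    for column, (func, kwargs) in schema.items():
        if func not in ("cut", "qcut"):
            raise ValueError(f"unsupported function: {func}")
        # Look up pandas.cut or pandas.qcut by name and bin the column.
        binned = getattr(pd, func)(data[column], **kwargs)
        out[column] = binned.cat.codes  # integer label of each bin
    return pd.DataFrame(out, index=data.index)


coarsened = coarsen(df, schema)
print(coarsened)
```

A dict shaped like schema above is what would be passed as the coarsening argument; observations that share the same combination of bin codes fall into the same stratum.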

match(coarsening: Optional[dict] = None) → pandas.core.series.Series

Perform coarsened exact matching with the given coarsening schema and return the weight for each observation.

Parameters:	coarsening (dict) – Defines the strata. Keys are the covariate (column) names and values are tuples of (func, kwargs). “func” is the name of the pandas function used to group the covariate (only “cut” and “qcut” are supported). “kwargs” is a dict of arguments passed to that function along with the covariate data.
Returns:	The weight to use for each observation of the provided data under the given coarsening schema
Return type:	pandas.Series
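
As a rough guide to what the returned weights mean, the sketch below computes the conventional CEM weights: observations in strata that lack either treated or control units get weight 0, matched treated units get weight 1, and matched controls in stratum s get (m_C / m_T) * (m_T^s / m_C^s), where m_C and m_T are the total matched control and treated counts. Whether the library follows this formula exactly is an assumption; the function name cem_weights and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def cem_weights(strata: Sequence[Hashable], treated: Sequence[bool]) -> List[float]:
    """Conventional CEM weights for each observation (hypothetical helper)."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_control, n_treated]
    for s, t in zip(strata, treated):
        counts[s][int(bool(t))] += 1

    # A stratum is matched only if it holds both treated and control units.
    matched = {s for s, (n_c, n_t) in counts.items() if n_c > 0 and n_t > 0}
    m_c = sum(counts[s][0] for s in matched)  # total matched controls
    m_t = sum(counts[s][1] for s in matched)  # total matched treated

    weights = []
    for s, t in zip(strata, treated):
        if s not in matched:
            weights.append(0.0)  # unmatched observations are dropped
        elif t:
            weights.append(1.0)  # matched treated units keep weight 1
        else:
            n_c, n_t = counts[s]
            weights.append((m_c / m_t) * (n_t / n_c))
    return weights


strata = ["a", "a", "a", "b", "b", "c"]
treated = [True, False, False, True, False, True]
weights = cem_weights(strata, treated)
print(weights)
```

With weights of this form, the matched control weights sum to the number of matched controls, so the weights can be passed directly to a weighted regression or a weighted difference in means.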