Estimators Overview

This page provides an overview of how to choose the most appropriate estimator for your workflow.

LinearRegressionEstimator

Recommended use: For continuous numerical outcomes (e.g. the number of people who are vaccinated).

class causal_testing.estimation.linear_regression_estimator.LinearRegressionEstimator(base_test_case: BaseTestCase, treatment_value: float, control_value: float, adjustment_set: set, df: DataFrame = None, effect_modifiers: dict[Variable, Any] = None, formula: str = None, alpha: float = 0.05, query: str = '')

Bases: RegressionEstimator

A Linear Regression Estimator is a parametric estimator which restricts the variables in the data to a linear combination of parameters and functions of the variables (note these functions need not be linear).

estimate_ate() → EffectEstimate

Estimate the average treatment effect of the treatment on the outcome. That is, the change in outcome caused by changing the treatment variable from the control value to the treatment value.

Returns:: The average treatment effect and the Wald confidence intervals.

estimate_ate_calculated(adjustment_config: dict = None) → EffectEstimate

Estimate the ATE of the treatment on the outcome. That is, the change in outcome caused by changing the treatment variable from the control value to the treatment value. Here, we actually calculate the expected outcomes under control and treatment and take the difference between them. This allows for custom terms to be put in such as squares, inverses, products, etc.

Param:: adjustment_config: The configuration of the adjustment set as a dict mapping variable names to their values. N.B. Every variable in the adjustment set MUST have a value in order to estimate the outcome under control and treatment.
Returns:: The average treatment effect and the Wald confidence intervals.

estimate_coefficient() → EffectEstimate

Estimate the unit average treatment effect of the treatment on the outcome. That is, the change in outcome caused by a unit change in treatment.

Returns:: The unit average treatment effect and the Wald confidence intervals.
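For a plain linear model with no interaction terms, the ATE and the unit effect are closely related: the unit effect is the regression slope, and the ATE is that slope scaled by the distance between the control and treatment values. A minimal standalone sketch using only the standard library (the data and the helper ols_slope are made up for illustration and are not part of the library):

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (model includes an intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical noise-free data generated by outcome = 2 * treatment + 1.
treatment = [0.0, 1.0, 2.0, 3.0, 4.0]
outcome = [2.0 * t + 1.0 for t in treatment]

# Unit effect: change in outcome per unit change in treatment.
unit_effect = ols_slope(treatment, outcome)

# ATE: change in outcome when moving from the control to the treatment value.
control_value, treatment_value = 1.0, 4.0
ate = unit_effect * (treatment_value - control_value)
print(unit_effect, ate)
```

Once the formula contains non-linear terms in the treatment, this simple scaling no longer holds, which is where the calculated variants become useful.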

estimate_risk_ratio(adjustment_config: dict = None) → EffectEstimate

Estimate the risk ratio of the treatment on the outcome. That is, the ratio of the expected outcome under the treatment value to the expected outcome under the control value.

Returns:: The risk ratio and its confidence intervals.
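Both estimate_ate_calculated and estimate_risk_ratio start from the expected outcomes under control and treatment, which is why every variable in the adjustment set needs a value. The sketch below shows the idea with hand-picked numbers; the model form, the coefficient values, the variable name age, and the predict helper are all hypothetical and stand in for a fitted model:

```python
def predict(treatment, adjustment_config, coefficients):
    """Expected outcome for: outcome ~ intercept + treatment + treatment^2 + age."""
    return (
        coefficients["intercept"]
        + coefficients["treatment"] * treatment
        + coefficients["treatment^2"] * treatment**2
        + coefficients["age"] * adjustment_config["age"]
    )


# Hypothetical coefficients, as if they had been estimated from data.
coefficients = {"intercept": 1.0, "treatment": 0.5, "treatment^2": 2.0, "age": 0.5}
# Every adjustment variable needs a concrete value to make a prediction.
adjustment_config = {"age": 4.0}

control_outcome = predict(1.0, adjustment_config, coefficients)
treatment_outcome = predict(3.0, adjustment_config, coefficients)

ate = treatment_outcome - control_outcome         # difference of expected outcomes
risk_ratio = treatment_outcome / control_outcome  # ratio of expected outcomes
print(ate, risk_ratio)
```

Because of the squared term, the effect depends on where the control and treatment values sit, not only on the distance between them.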

gp_formula(ngen: int = 100, pop_size: int = 20, num_offspring: int = 10, max_order: int = 0, extra_operators: list = None, sympy_conversions: dict = None, seeds: list = None, seed: int = 0)

Use Genetic Programming (GP) to infer the regression equation from the data.

Parameters:

ngen – The maximum number of GP generations to run for.
pop_size – The GP population size.
num_offspring – The number of offspring per generation.
max_order – The maximum polynomial order to use, e.g. max_order=2 will give polynomials of the form ax^2 + bx + c.
extra_operators – Additional operators for the GP (defaults are +, *, log(x), and 1/x). Operations should be of the form (fun, numArgs), e.g. (add, 2).
sympy_conversions – Dictionary of conversions of extra_operators for sympy, e.g. "mul": lambda \*args_: "Mul({},{})".format(\*args_).
seeds – Seed individuals for the population (e.g. if you think that the relationship between X and Y is probably logarithmic, you can put that in).
seed – Random seed for the GP.
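As a rough illustration of the shapes that extra_operators and sympy_conversions take, the snippet below simply mirrors the forms quoted in the parameter descriptions. It does not call gp_formula, and whether a given operator is accepted by the GP is not something this sketch verifies:

```python
from operator import add, mul

# Each extra operator is a (function, number_of_arguments) pair.
extra_operators = [(add, 2), (mul, 2)]

# Each conversion maps an operator name to a function that renders the
# operator's arguments as a sympy-style expression string.
sympy_conversions = {
    "add": lambda *args_: "Add({},{})".format(*args_),
    "mul": lambda *args_: "Mul({},{})".format(*args_),
}

# Render a product of two symbols named "x" and "y".
converted = sympy_conversions["mul"]("x", "y")
print(converted)
```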

LogisticRegressionEstimator

Recommended use: For binary outcomes (yes/no, true/false, success/failure).

class causal_testing.estimation.logistic_regression_estimator.LogisticRegressionEstimator(base_test_case: BaseTestCase, treatment_value: float, control_value: float, adjustment_set: set, df: DataFrame = None, effect_modifiers: dict[Variable, Any] = None, formula: str = None, alpha: float = 0.05, query: str = '')

Bases: RegressionEstimator

A Logistic Regression Estimator is a parametric estimator which restricts the variables in the data to a linear combination of parameters and functions of the variables (note these functions need not be linear). It is designed for estimating categorical outcomes.

add_modelling_assumptions(): Add modelling assumptions to the estimator. This is a list of strings which list the modelling assumptions that must hold if the resulting causal inference is to be considered valid.

estimate_unit_odds_ratio() → EffectEstimate

Estimate the odds ratio of increasing the treatment by one. In logistic regression, this corresponds to the exponential of the coefficient of the treatment of interest.

Returns:: The odds ratio with confidence intervals.
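The link between a logistic regression coefficient and an odds ratio can be checked by hand in the simplest case of a single binary treatment. The counts below are made up, and the sketch uses only the standard library rather than fitting a model:

```python
import math

# Hypothetical counts of a binary outcome for treatment = 0 and treatment = 1.
successes = {0: 20, 1: 40}
failures = {0: 80, 1: 60}

odds_control = successes[0] / failures[0]
odds_treated = successes[1] / failures[1]
odds_ratio = odds_treated / odds_control

# For a model with an intercept and this single binary treatment, the fitted
# logistic regression coefficient equals the log of the odds ratio, so
# exponentiating the coefficient recovers the odds ratio.
coefficient = math.log(odds_ratio)
recovered = math.exp(coefficient)
print(odds_ratio, coefficient, recovered)
```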

MultinomialRegressionEstimator

Recommended use: For categorical outcomes (e.g. colours: Red, Green, Blue).

class causal_testing.estimation.multinomial_regression_estimator.MultinomialRegressionEstimator(base_test_case: BaseTestCase, treatment_value: float, control_value: float, adjustment_set: set, df: DataFrame = None, effect_modifiers: dict[Variable, Any] = None, formula: str = None, alpha: float = 0.05, query: str = '')

Bases: RegressionEstimator

A Multinomial Regression Estimator is a parametric estimator which restricts the variables in the data to a linear combination of parameters and functions of the variables (note these functions need not be linear). It is designed for estimating categorical outcomes.

add_modelling_assumptions(): Add modelling assumptions to the estimator. This is a list of strings which list the modelling assumptions that must hold if the resulting causal inference is to be considered valid.

estimate_unit_odds_ratio() → EffectEstimate

Estimate the odds ratio of increasing the treatment by one. In logistic regression, this corresponds to the exponential of the coefficient of the treatment of interest.

Returns:: The odds ratio with confidence intervals.

CubicSplineRegressionEstimator

Recommended use: For continuous outcomes with non-linear relationships or changes in behaviour. Useful when the relationship between treatment and outcome cannot be captured by a linear model.

class causal_testing.estimation.cubic_spline_estimator.CubicSplineRegressionEstimator(base_test_case: BaseTestCase, treatment_value: float, control_value: float, adjustment_set: set, basis: int, df: DataFrame = None, effect_modifiers: dict[Variable, Any] = None, formula: str = None, alpha: float = 0.05, expected_relationship=None)

Bases: LinearRegressionEstimator

A Cubic Spline Regression Estimator is a parametric estimator which restricts the variables in the data to a combination of parameters and basis functions of the variables.
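To make "basis functions of the variables" concrete, the sketch below builds one textbook cubic spline basis, the truncated power basis. This is an illustration only: the basis the estimator actually uses is determined by the library and its formula, and the helper name and knot positions here are made up.

```python
def truncated_power_basis(x, knots):
    """Return [1, x, x^2, x^3] plus one term max(x - k, 0)^3 for each knot k."""
    return [1.0, x, x**2, x**3] + [max(x - k, 0.0) ** 3 for k in knots]


knots = [2.0, 4.0]

# The regression is still linear in its parameters: each basis value
# gets its own coefficient, even though the fitted curve is non-linear in x.
row = truncated_power_basis(3.0, knots)
print(row)
```

Each knot adds a term that is zero to the left of the knot, which lets the fitted curve change shape at that point while staying smooth.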

estimate_ate_calculated(adjustment_config: dict = None) → EffectEstimate

Estimate the average treatment effect (ATE) of the treatment on the outcome. That is, the change in outcome caused by changing the treatment variable from the control value to the treatment value. Here, we actually calculate the expected outcomes under control and treatment and take the difference between them. This allows for custom terms to be put in such as squares, inverses, products, etc.

Param:: adjustment_config: The configuration of the adjustment set as a dict mapping variable names to their values. N.B. Every variable in the adjustment set MUST have a value in order to estimate the outcome under control and treatment.
Returns:: The average treatment effect.

fit_model(data=None) → RegressionResultsWrapper

Run linear regression of the treatment and adjustment set against the outcome and return the model.

Returns:: The model after fitting to data.

classmethod regressor(formula, data, subset=None, drop_cols=None, *args, **kwargs)

Create a Model from a formula and dataframe.

Parameters

formula (str or generic Formula object): The formula specifying the model.
data (array_like): The data for the model. See Notes.
subset (array_like): An array-like object of booleans, integers, or index values that indicate the subset of df to use in the model. Assumes df is a pandas.DataFrame.
drop_cols (array_like): Columns to drop from the design matrix. Cannot be used to drop terms involving categoricals.
*args: Additional positional arguments that are passed to the model.
**kwargs: These are passed to the model with one exception. The eval_env keyword is passed to patsy. It can be either a patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a “clean” environment set eval_env=-1.

Returns

model: The model instance.

Notes

data must define __getitem__ with the keys in the formula terms, e.g. a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation.

InstrumentalVariableEstimator

Recommended use: When dealing with unmeasured confounding using instrumental variables.

class causal_testing.estimation.instrumental_variable_estimator.InstrumentalVariableEstimator(base_test_case: BaseTestCase, treatment_value: float, control_value: float, adjustment_set: set, instrument: str, df: DataFrame = None, alpha: float = 0.05, query: str = '')

Bases: Estimator

Carry out estimation using instrumental variable adjustment rather than conventional adjustment. This means we do not need to observe all confounders in order to adjust for them. A key assumption here is linearity.

add_modelling_assumptions(): Add modelling assumptions to the estimator. This is a list of strings which list the modelling assumptions that must hold if the resulting causal inference is to be considered valid.

estimate_coefficient(bootstrap_size=100) → EffectEstimate: Estimate the unit ATE (i.e. coefficient) of the treatment on the outcome.

iv_coefficient(df) → float: Estimate the linear regression coefficient of the treatment on the outcome.
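The general idea behind instrumental variable estimation under linearity can be sketched as a ratio of two regression slopes: the effect of the instrument on the outcome divided by the effect of the instrument on the treatment. The example below is a standalone simulation using only the standard library; it is not the library's implementation, and the data-generating process and the slope helper are invented for illustration.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (model includes an intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


random.seed(0)
n = 5000
instrument = [random.gauss(0, 1) for _ in range(n)]
confounder = [random.gauss(0, 1) for _ in range(n)]  # treated as unmeasured

# The instrument affects the outcome only through the treatment.
treatment = [2 * z + u for z, u in zip(instrument, confounder)]
# The true causal coefficient of treatment on outcome is 3.
outcome = [3 * x + 5 * u + random.gauss(0, 1) for x, u in zip(treatment, confounder)]

# Regressing outcome on treatment directly is biased by the confounder.
naive_estimate = slope(treatment, outcome)
# Dividing the two instrument slopes cancels the instrument-to-treatment effect.
iv_estimate = slope(instrument, outcome) / slope(instrument, treatment)
print(naive_estimate, iv_estimate)
```

In this simulation the naive slope overshoots the true coefficient because the confounder pushes treatment and outcome in the same direction, while the instrumental variable estimate stays close to it without ever using the confounder column.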

IPCWEstimator

Recommended use: For handling missing data or selection bias using inverse probability of censoring weighting (e.g. time-varying data).

class causal_testing.estimation.ipcw_estimator.IPCWEstimator(df: DataFrame, timesteps_per_observation: int, control_strategy: list[tuple[int, str, Any]], treatment_strategy: list[tuple[int, str, Any]], outcome: Variable, status_column: str, fit_bl_switch_formula: str, fit_bltd_switch_formula: str, eligibility=None, alpha: float = 0.05, total_time: float = None)

Bases: Estimator

Class to perform Inverse Probability of Censoring Weighting (IPCW) estimation for sequences of treatments over time-varying data.

Param:: df: Input DataFrame containing time-varying data.
Param:: timesteps_per_observation: Number of timesteps per observation.
Param:: control_strategy: The control strategy, with entries of the form (timestep, variable, value).
Param:: treatment_strategy: The treatment strategy, with entries of the form (timestep, variable, value).
Param:: outcome: Name of the outcome column in the DataFrame.
Param:: status_column: Name of the status column in the DataFrame, which should be True for operating normally, False for a fault.
Param:: fit_bl_switch_formula: Formula for fitting the baseline switch model.
Param:: fit_bltd_switch_formula: Formula for fitting the baseline time-dependent switch model.
Param:: eligibility: Function to determine eligibility for treatment. Defaults to None for “always eligible”.
Param:: alpha: Significance level for hypothesis testing. Defaults to 0.05.
Param:: total_time: Total time for the analysis. Defaults to one plus the length of the strategy (control or treatment) with the most elements multiplied by timesteps_per_observation.
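A small sketch of what the strategy arguments might look like, together with one reading of the total_time default. The variable name "dose" and the timesteps are made up, and the grouping of the default as (longest strategy length + 1) * timesteps_per_observation is an assumption about how the description above should be parsed:

```python
timesteps_per_observation = 5

# Each entry has the form (timestep, variable, value).
control_strategy = [(5, "dose", 0), (10, "dose", 0), (15, "dose", 0)]
treatment_strategy = [(5, "dose", 1), (10, "dose", 1), (15, "dose", 1)]

# Assumed reading of the default: one plus the length of the longest
# strategy, all multiplied by timesteps_per_observation.
longest = max(len(control_strategy), len(treatment_strategy))
total_time = (longest + 1) * timesteps_per_observation
print(total_time)
```

Passing total_time explicitly avoids depending on how the default is computed.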

add_modelling_assumptions(): Add modelling assumptions to the estimator. This is a list of strings which list the modelling assumptions that must hold if the resulting causal inference is to be considered valid.

estimate_hazard_ratio() → EffectEstimate: Estimate the hazard ratio.

preprocess_data(): Set up the treatment-specific columns in the data that are needed to estimate the hazard ratio.

setup_fault_t_do(individual: DataFrame)

Return a binary sequence with each bit representing whether the current index is the time point at which the event of interest (i.e. a fault) occurred.

N.B. This is rounded _up_ to the nearest multiple of self.timesteps_per_observation. That is, if the fault occurs at time 22, and self.timesteps_per_observation == 5, then fault_t_do will be 25.
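The rounding rule can be reproduced in a couple of lines. This is a standalone sketch of the arithmetic only, with a made-up helper name, not the method itself:

```python
import math


def round_up_to_multiple(time, timesteps_per_observation):
    """Round time up to the nearest multiple of timesteps_per_observation."""
    return math.ceil(time / timesteps_per_observation) * timesteps_per_observation


# A fault at time 22 with observations every 5 timesteps is recorded at 25.
fault_t_do = round_up_to_multiple(22, 5)
# A fault that already falls on an observation time is left unchanged.
on_boundary = round_up_to_multiple(20, 5)
print(fault_t_do, on_boundary)
```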

setup_fault_time(individual: DataFrame, perturbation: float = -0.001): Return the time at which the event of interest (i.e. a fault) occurred.

setup_xo_t_do(individual: DataFrame, strategy_assigned: list)

Return a binary sequence with each bit representing whether the current index is the time point at which the individual diverted from the assigned treatment strategy (and thus should be censored).

Parameters:

individual – DataFrame representing the individual.
strategy_assigned – The assigned treatment strategy.

ExperimentalEstimator

Recommended use: For randomised controlled trials or experimental data where treatment assignment is randomised. Directly runs the system under test multiple times with different configurations (e.g. when you need to collect new data by executing your system multiple times).

class causal_testing.estimation.experimental_estimator.ExperimentalEstimator(base_test_case: BaseTestCase, treatment_value: float, control_value: float, adjustment_set: dict[str, Any], effect_modifiers: dict[str, Any] = None, alpha: float = 0.05, repeats: int = 200)

Bases: Estimator

An Experimental Estimator estimates causal effects by directly running the system under test multiple times with the control and treatment configurations, rather than by fitting a model to existing data.

add_modelling_assumptions(): Add modelling assumptions to the estimator. This is a list of strings which list the modelling assumptions that must hold if the resulting causal inference is to be considered valid.

estimate_ate() → EffectEstimate

Estimate the average treatment effect of the treatment on the outcome. That is, the change in outcome caused by changing the treatment variable from the control value to the treatment value.

Returns:: The average treatment effect and the bootstrapped confidence intervals.

estimate_risk_ratio() → tuple[Series, list[Series, Series]]

Estimate the risk ratio of the treatment on the outcome. That is, the ratio of the expected outcome under the treatment value to the expected outcome under the control value.

Returns:: The risk ratio and the bootstrapped confidence intervals.

abstract run_system(configuration: dict) → dict

Runs the system under test with the supplied configuration and supplies the outputs as a dict.

Param:: configuration: The run configuration arguments.
Returns:: The resulting output as a dict.
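Because run_system is abstract, the estimator has to be subclassed before it can be used. The sketch below shows the pattern with a hypothetical stand-in base class, SystemRunner, so that it runs without the library installed. In real code you would subclass ExperimentalEstimator instead, and the averaging helper here is an illustration rather than the library's estimation logic.

```python
from abc import ABC, abstractmethod
from statistics import mean


class SystemRunner(ABC):
    """Hypothetical stand-in for an estimator with an abstract run_system."""

    def __init__(self, repeats: int = 200):
        self.repeats = repeats

    @abstractmethod
    def run_system(self, configuration: dict) -> dict:
        """Run the system under test and return its outputs as a dict."""

    def average_outcome(self, configuration: dict, outcome: str) -> float:
        # Execute the system `repeats` times and average the named output.
        return mean(self.run_system(configuration)[outcome] for _ in range(self.repeats))


class DoublingSystem(SystemRunner):
    """Toy system under test whose output y is always twice its input x."""

    def run_system(self, configuration: dict) -> dict:
        return {"y": 2 * configuration["x"]}


runner = DoublingSystem(repeats=10)
control_mean = runner.average_outcome({"x": 1}, "y")
treatment_mean = runner.average_outcome({"x": 3}, "y")
ate = treatment_mean - control_mean
print(ate)
```

A real system under test would usually be stochastic, which is why the runs are repeated and averaged.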