Estimators Overview
This page provides an overview of how to choose the most appropriate estimator for your workflow.
LinearRegressionEstimator
Recommended use: For continuous numerical outcomes (e.g. the number of people who are vaccinated).
- class causal_testing.estimation.linear_regression_estimator.LinearRegressionEstimator(base_test_case: BaseTestCase, treatment_value: float, control_value: float, adjustment_set: set, df: DataFrame = None, effect_modifiers: dict[Variable, Any] = None, formula: str = None, alpha: float = 0.05, query: str = '')
Bases: RegressionEstimator
A Linear Regression Estimator is a parametric estimator which restricts the variables in the data to a linear combination of parameters and functions of the variables (note that these functions need not be linear).
- estimate_ate() EffectEstimate
Estimate the average treatment effect of the treatment on the outcome. That is, the change in outcome caused by changing the treatment variable from the control value to the treatment value.
- Returns:
The average treatment effect and the 95% Wald confidence intervals.
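As a rough illustration of the idea (not the library's implementation), the sketch below fits a one-variable least-squares line in plain Python and scales the slope by the size of the intervention. The data, the helper `ols_slope_intercept`, and the chosen control and treatment values are all hypothetical.

```python
def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: the outcome rises by 2 for each unit of treatment.
treatment = [0.0, 1.0, 2.0, 3.0, 4.0]
outcome = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = ols_slope_intercept(treatment, outcome)

control_value, treatment_value = 1.0, 4.0
# For a model that is linear in the treatment, the ATE is the slope
# scaled by the difference between the treatment and control values.
ate = slope * (treatment_value - control_value)
print(ate)
```

With an adjustment set, the same reasoning applies to the treatment coefficient of a multiple regression.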
- estimate_ate_calculated(adjustment_config: dict = None) EffectEstimate
Estimate the average treatment effect (ATE) of the treatment on the outcome. That is, the change in outcome caused by changing the treatment variable from the control value to the treatment value. Here, we explicitly calculate the expected outcomes under control and treatment and take the difference between them. This allows custom terms such as squares, inverses, and products to be included.
- Param:
adjustment_config: The configuration of the adjustment set as a dict mapping variable names to their values. N.B. Every variable in the adjustment set MUST have a value in order to estimate the outcome under control and treatment.
- Returns:
The average treatment effect and the 95% Wald confidence intervals.
- estimate_coefficient() EffectEstimate
Estimate the unit average treatment effect of the treatment on the outcome. That is, the change in outcome caused by a unit change in treatment.
- Returns:
The unit average treatment effect and the 95% Wald confidence intervals.
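For readers unfamiliar with the term, a Wald interval is conventionally the estimate plus or minus a critical value times the standard error. The sketch below assumes a standard normal critical value; the library may well use a t-distribution quantile instead, and the function name `wald_interval` and the numbers are hypothetical.

```python
from statistics import NormalDist


def wald_interval(estimate, standard_error, alpha=0.05):
    """Return (lower, upper) bounds of a normal-approximation Wald interval."""
    # Two-sided critical value, roughly 1.96 when alpha is 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical coefficient and standard error.
low, high = wald_interval(estimate=2.0, standard_error=0.5)
print(round(low, 2), round(high, 2))
```

Lowering `alpha` widens the interval, which is why the constructor's `alpha` parameter controls the confidence level.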
- estimate_risk_ratio(adjustment_config: dict = None) EffectEstimate
Estimate the risk ratio of the treatment on the outcome. That is, the ratio of the expected outcome under the treatment value to the expected outcome under the control value.
- Returns:
The risk ratio and the 95% Wald confidence intervals.
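The relationship between the calculated ATE and the risk ratio can be sketched as follows. The fitted model, its coefficients, and the adjustment variable `z` are hypothetical stand-ins; the point is only that both quantities come from the same two expected outcomes, one as a difference and one as a ratio.

```python
def expected_outcome(treatment, adjustment_config):
    """Hypothetical fitted model: y = 1 + 2*t + 0.5*t**2 + 3*z."""
    z = adjustment_config["z"]
    return 1.0 + 2.0 * treatment + 0.5 * treatment**2 + 3.0 * z


# Every variable in the adjustment set needs a value.
adjustment_config = {"z": 1.0}
control_value, treatment_value = 1.0, 2.0

control_outcome = expected_outcome(control_value, adjustment_config)
treatment_outcome = expected_outcome(treatment_value, adjustment_config)

ate = treatment_outcome - control_outcome         # difference of expectations
risk_ratio = treatment_outcome / control_outcome  # ratio of expectations
print(ate, risk_ratio)
```

Because the model contains a squared term, the ATE here depends on where the control and treatment values sit, not just on the distance between them.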
- gp_formula(ngen: int = 100, pop_size: int = 20, num_offspring: int = 10, max_order: int = 0, extra_operators: list = None, sympy_conversions: dict = None, seeds: list = None, seed: int = 0)
Use Genetic Programming (GP) to infer the regression equation from the data.
- Parameters:
ngen – The maximum number of GP generations to run for.
pop_size – The GP population size.
num_offspring – The number of offspring per generation.
max_order – The maximum polynomial order to use, e.g. max_order=2 will give polynomials of the form ax^2 + bx + c.
extra_operators – Additional operators for the GP (defaults are +, *, log(x), and 1/x). Operations should be of the form (fun, numArgs), e.g. (add, 2).
sympy_conversions – Dictionary of conversions of extra_operators for sympy, e.g. "mul": lambda *args_: "Mul({},{})".format(*args_).
seeds – Seed individuals for the population (e.g. if you think that the relationship between X and Y is probably logarithmic, you can put that in).
seed – Random seed for the GP.
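The shapes of extra_operators and sympy_conversions might look like the sketch below, which reuses the "mul" example from the parameter description. Keying the conversion by the operator's function name is an assumption inferred from that example, and nothing here calls the library itself.

```python
import operator

# Each extra operator is a (function, number_of_arguments) pair.
extra_operators = [(operator.mul, 2)]

# Conversions turn an operator application into a sympy-style string.
# Keying by the function name ("mul") is an assumption based on the
# example given in the parameter description.
sympy_conversions = {
    "mul": lambda *args_: "Mul({},{})".format(*args_),
}

print(sympy_conversions["mul"]("x", "y"))
```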
LogisticRegressionEstimator
Recommended use: For binary outcomes (yes/no, true/false, success/failure).
- class causal_testing.estimation.logistic_regression_estimator.LogisticRegressionEstimator(base_test_case: BaseTestCase, treatment_value: float, control_value: float, adjustment_set: set, df: DataFrame = None, effect_modifiers: dict[Variable, Any] = None, formula: str = None, alpha: float = 0.05, query: str = '')
Bases: RegressionEstimator
A Logistic Regression Estimator is a parametric estimator which restricts the variables in the data to a linear combination of parameters and functions of the variables (note that these functions need not be linear). It is designed for estimating categorical outcomes.
- add_modelling_assumptions()
Add modelling assumptions to the estimator. This is a list of strings which list the modelling assumptions that must hold if the resulting causal inference is to be considered valid.
- estimate_unit_odds_ratio() EffectEstimate
Estimate the odds ratio of increasing the treatment by one. In logistic regression, this corresponds to the coefficient of the treatment of interest.
- Returns:
The odds ratio. Confidence intervals are not yet supported.
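To see why the treatment coefficient corresponds to a unit odds ratio, the sketch below compares the exponentiated coefficient with the ratio of odds at two treatment values one unit apart. The intercept and coefficient are hypothetical values, not output from the estimator.

```python
import math


def logistic(intercept, coefficient, x):
    """Probability of the outcome under a one-variable logistic model."""
    return 1 / (1 + math.exp(-(intercept + coefficient * x)))


def odds(probability):
    return probability / (1 - probability)


intercept, coefficient = -1.0, 0.7  # hypothetical fitted values

# Exponentiating the coefficient gives the unit odds ratio.
unit_odds_ratio = math.exp(coefficient)

# The same number falls out of comparing the odds at x + 1 and x.
ratio_from_probabilities = odds(logistic(intercept, coefficient, 3.0)) / odds(
    logistic(intercept, coefficient, 2.0)
)
print(unit_odds_ratio, ratio_from_probabilities)
```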
CubicSplineRegressionEstimator
Recommended use: For continuous outcomes with non-linear relationships or changes in behaviour. Useful when the relationship between treatment and outcome cannot be captured by a linear model.
- class causal_testing.estimation.cubic_spline_estimator.CubicSplineRegressionEstimator(base_test_case: BaseTestCase, treatment_value: float, control_value: float, adjustment_set: set, basis: int, df: DataFrame = None, effect_modifiers: dict[Variable, Any] = None, formula: str = None, alpha: float = 0.05, expected_relationship=None)
Bases: LinearRegressionEstimator
A Cubic Spline Regression Estimator is a parametric estimator which restricts the variables in the data to a combination of parameters and basis functions of the variables.
- estimate_ate_calculated(adjustment_config: dict = None) EffectEstimate
Estimate the average treatment effect (ATE) of the treatment on the outcome. That is, the change in outcome caused by changing the treatment variable from the control value to the treatment value. Here, we explicitly calculate the expected outcomes under control and treatment and take the difference between them. This allows custom terms such as squares, inverses, and products to be included.
- Param:
adjustment_config: The configuration of the adjustment set as a dict mapping variable names to their values. N.B. Every variable in the adjustment set MUST have a value in order to estimate the outcome under control and treatment.
- Returns:
The average treatment effect.
- fit_model(data=None) RegressionResultsWrapper
Run linear regression of the treatment and adjustment set against the outcome and return the model.
- Returns:
The model after fitting to data.
- classmethod regressor(formula, data, subset=None, drop_cols=None, *args, **kwargs)
Create a Model from a formula and dataframe.
- Parameters:
formula (str or generic Formula object) – The formula specifying the model.
data (array_like) – The data for the model. See Notes.
subset (array_like) – An array-like object of booleans, integers, or index values that indicate the subset of df to use in the model. Assumes df is a pandas.DataFrame.
drop_cols (array_like) – Columns to drop from the design matrix. Cannot be used to drop terms involving categoricals.
*args – Additional positional arguments that are passed to the model.
**kwargs – These are passed to the model with one exception. The eval_env keyword is passed to patsy. It can be either a patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a "clean" environment, set eval_env=-1.
- Returns:
model – The model instance.
Notes
data must define __getitem__ with the keys in the formula terms, e.g. a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation.
InstrumentalVariableEstimator
Recommended use: When dealing with unmeasured confounding using instrumental variables.
- class causal_testing.estimation.instrumental_variable_estimator.InstrumentalVariableEstimator(base_test_case: BaseTestCase, treatment_value: float, control_value: float, adjustment_set: set, instrument: str, df: DataFrame = None, alpha: float = 0.05, query: str = '')
Bases: Estimator
Carry out estimation using instrumental variable adjustment rather than conventional adjustment. This means we do not need to observe all confounders in order to adjust for them. A key assumption here is linearity.
- add_modelling_assumptions()
Add modelling assumptions to the estimator. This is a list of strings which list the modelling assumptions that must hold if the resulting causal inference is to be considered valid.
- estimate_coefficient(bootstrap_size=100) EffectEstimate
Estimate the unit average treatment effect (i.e. the coefficient) of the treatment on the outcome.
- iv_coefficient(df) float
Estimate the linear regression coefficient of the treatment on the outcome.
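One standard way to compute an instrumental variable coefficient under linearity is the ratio estimator: the covariance of instrument and outcome divided by the covariance of instrument and treatment. The sketch below uses that formulation on hypothetical data with an unobserved confounder; the source does not state which formulation the library uses, so treat this as an illustration of the technique only.

```python
def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


# Hypothetical data. The confounder is never passed to the estimator
# and is constructed to be uncorrelated with the instrument.
instrument = [0.0, 1.0, 2.0, 3.0]
confounder = [1.0, -1.0, -1.0, 1.0]
treatment = [z + u for z, u in zip(instrument, confounder)]
# True causal coefficient of treatment on outcome is 2.
outcome = [2.0 * x + 3.0 * u for x, u in zip(treatment, confounder)]

# Naive regression slope is biased by the unobserved confounder.
naive_coefficient = covariance(treatment, outcome) / covariance(treatment, treatment)

# Ratio (Wald) IV estimator recovers the causal coefficient.
iv_coefficient = covariance(instrument, outcome) / covariance(instrument, treatment)
print(naive_coefficient, iv_coefficient)
```

The naive slope overshoots because the confounder pushes treatment and outcome in the same direction, while the instrument only affects the outcome through the treatment.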
IPCWEstimator
Recommended use: For handling missing data or selection bias using inverse probability of censoring weighting (e.g. time-varying data).
- class causal_testing.estimation.ipcw_estimator.IPCWEstimator(df: DataFrame, timesteps_per_observation: int, control_strategy: list[tuple[int, str, Any]], treatment_strategy: list[tuple[int, str, Any]], outcome: Variable, status_column: str, fit_bl_switch_formula: str, fit_bltd_switch_formula: str, eligibility=None, alpha: float = 0.05, total_time: float = None)
Bases: Estimator
Class to perform Inverse Probability of Censoring Weighting (IPCW) estimation for sequences of treatments over time-varying data.
- Param:
df: Input DataFrame containing time-varying data.
- Param:
timesteps_per_observation: Number of timesteps per observation.
- Param:
control_strategy: The control strategy, with entries of the form (timestep, variable, value).
- Param:
treatment_strategy: The treatment strategy, with entries of the form (timestep, variable, value).
- Param:
outcome: Name of the outcome column in the DataFrame.
- Param:
status_column: Name of the status column in the DataFrame, which should be True for operating normally, False for a fault.
- Param:
fit_bl_switch_formula: Formula for fitting the baseline switch model.
- Param:
fit_bltd_switch_formula: Formula for fitting the baseline time-dependent switch model.
- Param:
eligibility: Function to determine eligibility for treatment. Defaults to None for “always eligible”.
- Param:
alpha: Significance level for hypothesis testing. Defaults to 0.05.
- Param:
total_time: Total time for the analysis. Defaults to one plus the length of the strategy (control or treatment) with the most elements multiplied by timesteps_per_observation.
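The strategy format and the total_time default can be sketched as below. The variable name "dose" and the timestep values are hypothetical, and the default is computed under one reading of the description, namely (longest strategy length + 1) multiplied by timesteps_per_observation; that reading is an assumption and should be checked against the library source.

```python
timesteps_per_observation = 5

# Each entry has the form (timestep, variable, value).
control_strategy = [(5, "dose", 0), (10, "dose", 0), (15, "dose", 0)]
treatment_strategy = [(5, "dose", 1), (10, "dose", 1), (15, "dose", 1)]

# Assumed reading of the default: one plus the longest strategy length,
# then multiplied by the number of timesteps per observation.
longest = max(len(control_strategy), len(treatment_strategy))
total_time = (longest + 1) * timesteps_per_observation
print(total_time)
```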
- add_modelling_assumptions()
Add modelling assumptions to the estimator. This is a list of strings which list the modelling assumptions that must hold if the resulting causal inference is to be considered valid.
- estimate_hazard_ratio() EffectEstimate
Estimate the hazard ratio.
- preprocess_data()
Set up the treatment-specific columns in the data that are needed to estimate the hazard ratio.
- setup_fault_t_do(individual: DataFrame)
Return a binary sequence with each bit representing whether the current index is the time point at which the event of interest (i.e. a fault) occurred.
N.B. This is rounded _up_ to the nearest multiple of self.timesteps_per_observation. That is, if the fault occurs at time 22, and self.timesteps_per_observation == 5, then fault_t_do will be 25.
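The rounding in the note above can be reproduced with a ceiling division. This is a minimal sketch with a hypothetical helper name; how the library treats a fault time that is already an exact multiple is not stated in the source, so the behaviour for that case is an assumption.

```python
import math


def round_up_to_observation(fault_time, timesteps_per_observation):
    """Round a fault time up to the next multiple of the observation step."""
    # Assumption: an exact multiple is returned unchanged in this sketch.
    return math.ceil(fault_time / timesteps_per_observation) * timesteps_per_observation


print(round_up_to_observation(22, 5))
```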
- setup_fault_time(individual: DataFrame, perturbation: float = -0.001)
Return the time at which the event of interest (i.e. a fault) occurred.
- setup_xo_t_do(individual: DataFrame, strategy_assigned: list)
Return a binary sequence with each bit representing whether the current index is the time point at which the individual diverted from the assigned treatment strategy (and thus should be censored).
- Parameters:
individual – DataFrame representing the individual.
strategy_assigned – The assigned treatment strategy.
ExperimentalEstimator
Recommended use: For randomised controlled trials or experimental data where treatment assignment is randomised. This estimator directly runs the system under test multiple times with different configurations (i.e. you collect new data by executing your system multiple times).
- class causal_testing.estimation.experimental_estimator.ExperimentalEstimator(base_test_case: BaseTestCase, treatment_value: float, control_value: float, adjustment_set: dict[str, Any], effect_modifiers: dict[str, Any] = None, alpha: float = 0.05, repeats: int = 200)
Bases: Estimator
An Experimental Estimator estimates causal effects by running the system under test directly with the control and treatment configurations, rather than by fitting a model to existing data.
- add_modelling_assumptions()
Add modelling assumptions to the estimator. This is a list of strings which list the modelling assumptions that must hold if the resulting causal inference is to be considered valid.
- estimate_ate() EffectEstimate
Estimate the average treatment effect of the treatment on the outcome. That is, the change in outcome caused by changing the treatment variable from the control value to the treatment value.
- Returns:
The average treatment effect and the bootstrapped confidence intervals.
- estimate_risk_ratio() tuple[Series, list[Series, Series]]
Estimate the risk ratio of the treatment on the outcome. That is, the ratio of the outcome under the treatment value to the outcome under the control value.
- Returns:
The risk ratio and the bootstrapped confidence intervals.
- abstract run_system(configuration: dict) dict
Runs the system under test with the supplied configuration and supplies the outputs as a dict.
- Param:
configuration: The run configuration arguments.
- Returns:
The resulting output as a dict.
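Because run_system is abstract, using this estimator means subclassing it and supplying your own system call. The sketch below shows that pattern with a stand-in base class written for illustration; `ExperimentalSketch`, `ToySystem`, and the variable names "x" and "y" are hypothetical and are not the library's classes, and the bootstrapped confidence intervals are omitted.

```python
from abc import ABC, abstractmethod
from statistics import mean


class ExperimentalSketch(ABC):
    """Stand-in for illustration only; not the library's ExperimentalEstimator."""

    def __init__(self, treatment, outcome, treatment_value, control_value, repeats=3):
        self.treatment = treatment
        self.outcome = outcome
        self.treatment_value = treatment_value
        self.control_value = control_value
        self.repeats = repeats

    @abstractmethod
    def run_system(self, configuration: dict) -> dict:
        """Run the system under test and return its outputs as a dict."""

    def _mean_outcome(self, value):
        # Execute the system `repeats` times with the treatment set to `value`.
        runs = [self.run_system({self.treatment: value}) for _ in range(self.repeats)]
        return mean(run[self.outcome] for run in runs)

    def estimate_ate(self):
        # Difference in mean outcome between treatment and control runs.
        return self._mean_outcome(self.treatment_value) - self._mean_outcome(self.control_value)


class ToySystem(ExperimentalSketch):
    def run_system(self, configuration: dict) -> dict:
        # Hypothetical deterministic system under test: y = 2x + 1.
        return {"y": 2 * configuration["x"] + 1}


estimator = ToySystem("x", "y", treatment_value=3, control_value=1)
ate = estimator.estimate_ate()
print(ate)
```

A real system under test would usually be stochastic, which is why the estimator repeats each configuration and reports bootstrapped intervals.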