causal_testing.estimation.genetic_programming_regression_fitter
This module contains a genetic programming implementation to infer the functional form between the adjustment set and the outcome.
Module Contents
Classes
GP | Object to perform genetic programming.
Functions
reciprocal | Return the reciprocal of the input.
mut_insert | NOTE: This is a temporary workaround. This method is copied verbatim from gp.mutInsert.
create_power_function | Creates a power operator and its corresponding sympy conversion.
- causal_testing.estimation.genetic_programming_regression_fitter.reciprocal(x: float) float
Return the reciprocal of the input.
- Parameters:
x – Float to reciprocate.
- Returns:
1/x
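As a minimal sketch of the behaviour described above (the name reciprocal_sketch is hypothetical, and how the module's own function treats x = 0 is not documented here, so this version simply lets Python raise ZeroDivisionError):

```python
def reciprocal_sketch(x: float) -> float:
    """Return 1/x. No guard for x == 0 in this sketch."""
    return 1.0 / x


print(reciprocal_sketch(4.0))  # 0.25
```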
- causal_testing.estimation.genetic_programming_regression_fitter.mut_insert(expression: deap.gp.PrimitiveTree, pset: deap.gp.PrimitiveSet)
NOTE: This is a temporary workaround. This method is copied verbatim from gp.mutInsert in DEAP, which appears to omit the import of isclass from inspect, so the original raises an error saying that “isclass is not defined”. A couple of lines are not covered by tests, but this is acceptable because (1) the workaround is temporary, pending a new DEAP release, and (2) the code is not ours.
Inserts a new branch at a random position in expression. The subtree at the chosen position is used as a child node of the created subtree, so the operation is an insertion rather than a replacement. The original subtree becomes one of the children of the newly inserted primitive, but not necessarily the first: its position is chosen at random if the new primitive has more than one child.
- Parameters:
expression – The normal or typed tree to be mutated.
pset – The pset object defining the variables and constants.
- Returns:
A tuple of one tree.
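The insertion described above can be illustrated without DEAP. The sketch below is not the DEAP code: it assumes a toy tree representation (a terminal is a string, a primitive is a tuple of its name followed by its children), and every function name in it is hypothetical. Filling the remaining argument slots with random terminals is also an assumption of this sketch.

```python
import random


def size(node):
    """Count nodes in a tree given as a terminal string or (name, child, ...)."""
    if isinstance(node, tuple):
        return 1 + sum(size(child) for child in node[1:])
    return 1


def node_paths(node, prefix=()):
    """Yield the index path to every node, root first."""
    yield prefix
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            yield from node_paths(child, prefix + (i,))


def get_subtree(node, path):
    """Follow an index path down from the root."""
    for i in path:
        node = node[i]
    return node


def replace_subtree(node, path, new):
    """Return a copy of the tree with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace_subtree(node[i], path[1:], new),) + node[i + 1:]


def insert_branch(tree, primitives, terminals, rng):
    """Wrap a randomly chosen subtree in a new primitive.

    The original subtree takes a random argument slot of the new primitive;
    the other slots are filled with random terminals (sketch assumption).
    """
    target = rng.choice(list(node_paths(tree)))
    subtree = get_subtree(tree, target)
    name, arity = rng.choice(primitives)
    slot = rng.randrange(arity)
    children = tuple(
        subtree if k == slot else rng.choice(terminals) for k in range(arity)
    )
    return replace_subtree(tree, target, (name,) + children)


rng = random.Random(0)
tree = ("add", "x", "y")
mutated = insert_branch(tree, [("mul", 2)], ["x", "y", "1"], rng)
print(mutated)
```

Because the chosen subtree is kept as a child of the new node, the mutated tree always grows: here by exactly two nodes, the new binary primitive and one new terminal.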
- causal_testing.estimation.genetic_programming_regression_fitter.create_power_function(order: int)
Creates a power operator and its corresponding sympy conversion.
- Parameters:
order – The order of the power, e.g. order=2 will give x^2.
- Returns:
A pair consisting of the power function and the sympy conversion
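One way to picture the returned pair is a closure over order plus a string formatter. This is only a sketch: the names make_power_sketch, power, and to_sympy are hypothetical, and the exact sympy string the module emits is an assumption (Pow is sympy's power constructor, but the module's formatting may differ).

```python
def make_power_sketch(order: int):
    """Return (power function, sympy-string converter) for a fixed order."""

    def power(x):
        return x ** order

    # A distinct name per order keeps the operators distinguishable.
    power.__name__ = f"power_{order}"

    def to_sympy(arg: str) -> str:
        # Assumed output format for this sketch.
        return f"Pow({arg}, {order})"

    return power, to_sympy


square, square_to_sympy = make_power_sketch(2)
print(square(3))             # 9
print(square_to_sympy("x"))  # Pow(x, 2)
```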
- class causal_testing.estimation.genetic_programming_regression_fitter.GP(df: pandas.DataFrame, features: list, outcome: str, max_order: int = 0, extra_operators: list = None, sympy_conversions: dict = None, seed=0)
Object to perform genetic programming.
- split(individual: deap.gp.PrimitiveTree) list
Split an expression into its components, e.g. 2x + 4y - xy -> [2x, 4y, xy].
- Parameters:
individual – The expression to be split.
- Returns:
A list of the equation's components that are linearly combined into the full equation.
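The splitting step can be sketched on a toy tree (a terminal is a string, a primitive is a tuple of its name followed by its children). This is an illustration rather than the method's code: split_terms is a hypothetical name, and treating both add and sub as split points is an assumption based on the example 2x + 4y - xy -> [2x, 4y, xy], where the sign of xy is dropped.

```python
def split_terms(node):
    """Flatten nested add/sub nodes into the list of terms they combine."""
    if isinstance(node, tuple) and node[0] in ("add", "sub"):
        return split_terms(node[1]) + split_terms(node[2])
    return [node]


# 2x + 4y - xy, written as sub(add(mul(2, x), mul(4, y)), mul(x, y))
expression = (
    "sub",
    ("add", ("mul", "2", "x"), ("mul", "4", "y")),
    ("mul", "x", "y"),
)
print(split_terms(expression))
```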
- _convert_prim(prim: deap.gp.Primitive, args: list) str
Convert primitives to sympy format.
- Parameters:
prim – A GP primitive, e.g. add
args – The list of arguments
- Returns:
A sympy compatible string representing the function, e.g. add(x, y) -> Add(x, y).
- _stringify_for_sympy(expression: deap.gp.PrimitiveTree) str
Return the expression in a sympy compatible string.
- Parameters:
expression – The expression to be simplified.
- Returns:
A sympy compatible string representing the equation.
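The two conversion helpers above amount to a recursive walk that rewrites each primitive as a sympy constructor call. The sketch below shows the idea on a toy tree (a terminal is a string, a primitive is a tuple of its name followed by its children). The template table and the name to_sympy_string are hypothetical, and the strings the module actually produces may differ.

```python
# Assumed mapping from primitive names to sympy-style templates.
SYMPY_TEMPLATES = {
    "add": "Add({}, {})",
    "mul": "Mul({}, {})",
    "sub": "Add({}, Mul(-1, {}))",
}


def to_sympy_string(node):
    """Recursively render a tree as a sympy-compatible string."""
    if not isinstance(node, tuple):
        return str(node)  # terminals are emitted unchanged
    name, *children = node
    args = [to_sympy_string(child) for child in children]
    return SYMPY_TEMPLATES[name].format(*args)


print(to_sympy_string(("add", "x", ("mul", "x", "y"))))  # Add(x, Mul(x, y))
```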
- simplify(expression: deap.gp.PrimitiveTree) sympy.core.Expr
Simplify an expression by applying mathematical equivalences.
- Parameters:
expression – The expression to simplify.
- Returns:
The simplified expression as a sympy Expr object.
- repair(expression: deap.gp.PrimitiveTree) deap.gp.PrimitiveTree
Use linear regression to infer the coefficients of the linear components of the expression. Named “repair” since a “repair operator” is quite common in GP.
- Parameters:
expression – The expression to process.
- Returns:
The expression with constant coefficients, or the original expression if that fails.
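The idea behind the repair step is that once an expression is split into its linear components, the best coefficient for each component is an ordinary least squares problem. The sketch below solves the normal equations in pure Python purely for illustration. fit_coefficients is a hypothetical name, there is no intercept term, and the module may well use a statistics library instead.

```python
def fit_coefficients(columns, y):
    """Least squares coefficients for y ~ sum(c_i * columns[i]).

    `columns` holds one list of values per term. Solves (X^T X) c = X^T y
    by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(columns)
    a = [
        [sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
        for i in range(k)
    ]
    b = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("terms are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
                b[r] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(k)]


# Data generated from y = 2x + 4z, so the fit should recover roughly [2, 4].
x = [1.0, 2.0, 3.0, 4.0]
z = [2.0, 1.0, 0.0, 3.0]
y = [2 * xi + 4 * zi for xi, zi in zip(x, z)]
coefficients = fit_coefficients([x, z], y)
print(coefficients)
```

Raising an error on linearly dependent terms mirrors the documented fallback only loosely: the method above is described as returning the original expression if the fit fails.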
- fitness(expression: deap.gp.PrimitiveTree) float
Evaluate the fitness of a candidate expression according to the error between the estimated and observed values. Low values are better.
- Parameters:
expression – The candidate expression to evaluate.
- Returns:
The fitness of the individual.
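The description above does not name the error measure. As an illustration only, the sketch below assumes root mean squared error, which is one common choice for a fitness where lower is better; rmse is a hypothetical helper, not part of the module.

```python
import math


def rmse(estimated, observed):
    """Root mean squared error between two equally long sequences."""
    if len(estimated) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared) / len(squared))


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```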
- make_offspring(population: list, num_offspring: int) list
Create the next generation of individuals.
- Parameters:
population – The current population.
num_offspring – The number of new individuals to generate.
- Returns:
A list of num_offspring new individuals generated through crossover and mutation.
- run_gp(ngen: int, pop_size: int = 20, num_offspring: int = 10, seeds: list = None, repair: bool = True) deap.gp.PrimitiveTree
Execute Genetic Programming to find the best expression using a mu+lambda algorithm.
- Parameters:
ngen – The maximum number of generations.
pop_size – The population size.
num_offspring – The number of new individuals per generation.
seeds – Seed individuals for the initial population.
repair – Whether to run the linear regression repair operator (defaults to True).
- Returns:
The best candidate expression.
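In a mu+lambda scheme, the current population (mu individuals) and the newly created offspring (lambda individuals) compete together, and the best mu survive. The sketch below shows that loop on integers rather than expression trees. All names are hypothetical, pop_size and num_offspring are assumed to play the roles of mu and lambda, and the variation step stands in for the crossover and mutation the class performs.

```python
import random


def mu_plus_lambda(initial, fitness, vary, ngen, num_offspring, rng):
    """Minimise `fitness` with a (mu + lambda) survivor selection."""
    population = sorted(initial, key=fitness)
    mu = len(population)
    for _ in range(ngen):
        parents = [rng.choice(population) for _ in range(num_offspring)]
        offspring = [vary(parent, rng) for parent in parents]
        # Parents and offspring compete; only the best mu survive.
        population = sorted(population + offspring, key=fitness)[:mu]
    return population[0]


def fitness(x):
    return (x - 3) ** 2  # lower is better, minimum at x = 3


def vary(x, rng):
    return x + rng.choice([-1, 1])  # toy stand-in for crossover and mutation


rng = random.Random(0)
initial = [20, -15, 40, 11]
best = mu_plus_lambda(initial, fitness, vary, ngen=50, num_offspring=10, rng=rng)
print(best, fitness(best))
```

Because parents stay in the selection pool, the best fitness can never get worse from one generation to the next.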
- mutate(expression: deap.gp.PrimitiveTree) deap.gp.PrimitiveTree
Mutate individuals to replicate the small changes in DNA that occur in natural reproduction. A node will randomly be inserted, removed, or replaced.
- Parameters:
expression – The expression to mutate.
- Returns:
The mutated expression.