# Causal Inference & Discovery in Python

*The Book*

*From Machine Learning & Pearlian Perspective*

Welcome to Causal Python! I am so glad to see you here!

My name is **Alex**. For the last 6 years I have been helping researchers and entrepreneurs solve their problems using machine learning & statistical techniques. I helped build large-scale machine learning systems for **Fortune 100**, **Fortune 500** and **Inc. 5000** businesses and gave talks and workshops for companies like **Mercedes Benz** or **e:fs**.

When I was **starting with causality** almost 4 years ago I **could not find** a comprehensive book on causality in Python.

Understanding the **potential of causal machine learning** and knowing how much **effort it took** me to build a solid understanding of how to **translate** the main causal ideas **into code**, I decided to **write a book** to make your **journey** into causality **easier** and **faster**.

The book provides a **comprehensive exploration** of the **theory** and **techniques** at the intersection of **modern causality** and **machine learning**.

It covers the basic motivations and fundamental concepts of **Pearlian** causal inference, explains the theory and shows **step-by-step code examples** for traditional and advanced causal inference and discovery techniques using **DoWhy**, **EconML**, **gCastle**, **PyTorch** and more.

The book will be available **mid 2023** (ebook + physical).

**Subscribe** below to get updates and learn about amazing **free content** on **causality** that I'll be sharing over the **next months**!

© Causal Python 2022

## Fundamentals

**Fundamentals** is a set of short articles presenting the basic **causal concepts**, **power tips** and **secrets** to help you **jump-start** your causal journey. We focus on causal inference and causal discovery in **Python**, but many resources are universal. New topics are added every two weeks.

Available topics:

## Welcome to Causal Python!

Check the latest blog post **here:**

**Free content** on causal machine learning & **updates on my causal book**:

## D-Separation

The Pearlian causal inference framework is rooted in **Bayesian networks**. In Bayesian networks (which are probabilistic graphical models) we represent variables as **nodes** and conditional probabilities between variables as directed **edges**.

In **causal inference** nodes also represent variables, but **edges** represent **causal relationships** between the variables rather than just conditional probabilities.

**D-separation** is a set of rules that allows us to *block* a path between two (sets of) variables.

There are three basic graphical structures:

• **forks**

• **chains**

• **colliders**

Knowing the three structures, **d-separation** boils down to the following rules:

For any three disjoint sets of nodes *X*, *Y* and *Z*, a path between *X* and *Y* is blocked by *Z* if:

• There's a fork *i←j→k* or a chain *i→j→k* in this path such that the middle node *j∈Z*, or

• There's a collider *i→j←k* on this path such that neither *j* nor any of its descendants belongs to *Z*.

These rules can now be used to **de-confound** causal relationships in your data or uncover causal structure (e.g. PC algorithm).

To learn more about these structures check this video or this video.

Play with causal structures in the **notebook**
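The blocking rules can be checked on simulated data. The sketch below is only an illustration, not the content of the notebook mentioned above: it assumes linear relationships with Gaussian noise (so a partial correlation close to zero can stand in for conditional independence), and the helper names `simulate` and `partial_corr` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000


def partial_corr(a, b, given):
    """Correlation between a and b after linearly regressing out `given`."""
    design = np.column_stack([np.ones_like(given), given])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]


def simulate(structure):
    """Return samples (i, j, k) for one of the three basic structures."""
    e1, e2, e3 = rng.normal(size=(3, n))
    if structure == "chain":       # i -> j -> k
        i = e1
        j = i + e2
        k = j + e3
    elif structure == "fork":      # i <- j -> k
        j = e1
        i = j + e2
        k = j + e3
    elif structure == "collider":  # i -> j <- k
        i = e1
        k = e2
        j = i + k + e3
    else:
        raise ValueError(f"unknown structure: {structure}")
    return i, j, k


results = {}
for structure in ("chain", "fork", "collider"):
    i, j, k = simulate(structure)
    marginal = np.corrcoef(i, k)[0, 1]      # association between i and k
    conditional = partial_corr(i, k, j)     # association after conditioning on j
    results[structure] = (marginal, conditional)
    print(f"{structure:9s} corr(i, k) = {marginal:+.2f}   "
          f"corr(i, k | j) = {conditional:+.2f}")
```

For the chain and the fork, *i* and *k* are associated until we condition on the middle node *j*. For the collider it is the other way round: *i* and *k* are independent, and conditioning on *j* opens the path.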

## Causal Discovery

Check the latest blog post here

## Defining Causality

People have been thinking about causality **for millennia** and **across cultures**.

**Aristotle**, one of the most prolific philosophers of ancient Greece, claimed that understanding the causal structure of a process is a necessary ingredient of knowledge about this process. Moreover, he argued that being able to answer **"why" questions** is the **essence of scientific explanation**. He distinguished four types of causes (material, formal, efficient and final), an idea that might capture certain interesting aspects of reality as much as it might be counter-intuitive to a contemporary scientist or researcher.

**David Hume**, a famous 18th century Scottish philosopher, proposed a **more unified framework** for cause-effect relationships. Hume starts with the observation that we never experience cause-effect relationships in the world, which leads him to formulate an **association-based definition of causality** (the human mind produces *a sense of cause-effect* relationships when it **gets used to** repeatable **sequences of events**).

[Note that this is a very simplified view of Hume's ideas. To learn more check this article]

We know that pure associations are not enough to accurately describe causal effects (see **confounding**).

In order to address this issue and reduce the complexity of the problem, **Judea Pearl** proposed a very **simple** and **actionable definition of causality**:

**A causes B when B "listens to" A**

This essentially means that if we change something in *A*, we expect to see a change in *B*.

This definition allows us to abstract out ontological complexities and focus on addressing real-world problems.

## Assumptions: Positivity

What is the **positivity assumption**?

**Positivity** (also known as **overlap**) is one of **the most fundamental assumptions** in causal inference.

Simply put, it requires that the probability of treatment, given the control variables, is greater than zero.

Formally:

*P(T=t | Z=z) > 0*

(Greater than 0 == positive; hence the name)

What is the **meaning** of this formula?

For every value of the control variables, the probability of every possible value of treatment should be greater than 0.

Why is this important? Let's see!

Let's assume that in our dataset we have **60 subjects** described by a **single continuous feature Z**.

**Each subject** either receives a **binary treatment T** or not, and each subject has some **continuous outcome Y**.

**Importantly**, we need to **control for Z** in order to compute the **unbiased effect** of the treatment T on the outcome Y.

Imagine that, by accident, the treatment was administered **only** to people whose value of **Z is 5 or more**. People whose value of **Z is less than 5**, on the other hand, **did not get the treatment**.

To compute the **average treatment effect** (**ATE**) of treatment T, we need to compare the values of the outcomes under the treatment to the values under no treatment:

*ATE = E[Y(1) - Y(0)]*

Take a look at the figure below:

Subjects that **got the treatment** are marked in **red**. Subjects that **did not** get the treatment are marked in **blue**.

Is the probability of each value of treatment **greater than zero** for each value of Z?

**Clearly not!**

So, what's the problem here?

In order to **compare the outcomes** of treated subjects with the outcomes of untreated subjects, we need to **estimate the values** for red dots in the blue area (where we have no treatment) and the values of the blue dots in the red area.

Whatever model we use for this purpose, it will need to **extrapolate**.

Take a look at the figure below that shows **possible extrapolation trajectories** (red and blue lines respectively):

How accurate is it in your opinion?

Will it lead to **good estimates** of the treatment effect?

It is **really hard** to answer this question! We can only **guess**!

**Compare this** to the figure below, where the positivity assumption is not violated:

Now, a model has a **much easier job** to do. It only needs to **interpolate**.

We can reasonably estimate the effect from the data, without guessing! That's the **essence of the positivity assumption!**

Naturally, **positivity also applies in higher dimensional cases** (with the caveat that interpolation in high dimensions is a problematic concept in itself; Balestriero et al., 2021).
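A quick way to see the difference between the two scenarios is to check the share of treated subjects on each side of the threshold. The sketch below is a rough illustration under assumed settings: Z is drawn uniformly from 0 to 10, the "healthy" scenario assigns treatment by a coin flip, and the helpers `treated_share_by_stratum` and `positivity_holds` are hypothetical names, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects = 60

# A single continuous feature Z (the range 0..10 is an assumption).
z = rng.uniform(0, 10, size=n_subjects)

# Scenario 1: treatment given only when Z >= 5, so positivity is violated.
t_violated = (z >= 5).astype(int)

# Scenario 2: treatment assigned by a coin flip, independently of Z.
t_random = rng.integers(0, 2, size=n_subjects)


def treated_share_by_stratum(z, t, threshold=5.0):
    """Share of treated subjects below and at-or-above the threshold."""
    low = z < threshold
    return {
        "below": float(t[low].mean()),
        "at_or_above": float(t[~low].mean()),
    }


def positivity_holds(shares):
    """True if every stratum contains both treated and untreated subjects."""
    return all(0.0 < share < 1.0 for share in shares.values())


shares_violated = treated_share_by_stratum(z, t_violated)
shares_random = treated_share_by_stratum(z, t_random)

print("deterministic assignment:", shares_violated,
      "positivity holds:", positivity_holds(shares_violated))
print("random assignment:       ", shares_random,
      "positivity holds:", positivity_holds(shares_random))
```

Two coarse strata are enough here because the violation happens exactly at Z = 5. With real data you would usually inspect finer bins of Z or the distribution of estimated propensity scores.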

Photo: Ahmet Polat

## Structural Causal Models

This model can be described by the following set of **assignments** (traditionally called **equations**):

*A := fa(εa)*

*X := fx(A, εx)*

*Y := fy(A, X, εy)*

where *f*s are some arbitrary functions (*fx* is a function specific to *X* and so on) and *ε*s are *noise terms* (imagine a normal distribution for instance). Note that we've **omitted** *ε*s in the graph for simplicity, but they are an important part of the model.

In the **SCM** nomenclature *A*, *X* and *Y* are called **endogenous variables** while *ε*s are called **exogenous variables** (note that they don't have any *parent* assignments).

Some researchers tend to include the **graphical representation** in the definition of **SCM** and some don't. The definitions are not entirely consistent in the literature.
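The three assignments translate almost literally into code. In the sketch below the functions are assumed to be linear and the noise terms standard normal; these choices, and the coefficients 2.0, 1.5 and -0.5, are arbitrary illustrations, since the text only says that the *f*s are "some arbitrary functions".

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000


# Structural assignments. Linear forms and coefficients are assumptions.
def f_a(eps_a):
    return eps_a


def f_x(a, eps_x):
    return 2.0 * a + eps_x


def f_y(a, x, eps_y):
    return 1.5 * a - 0.5 * x + eps_y


# Exogenous variables: independent noise terms with no parents.
eps_a, eps_x, eps_y = rng.normal(size=(3, n))

# Endogenous variables: each is computed from its parents and its own noise.
a = f_a(eps_a)
x = f_x(a, eps_x)
y = f_y(a, x, eps_y)

# Sanity check: regressing Y on its parents A and X should roughly
# recover the coefficients used in f_y.
design = np.column_stack([np.ones(n), a, x])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
print("estimated coefficients for A and X:", coef[1:].round(2))
```

Note that the order of the assignments follows the graph: a variable can only be computed once all of its parents have been generated.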

## Confounding

**Confounding** is a basic causal concept.

**Confounding variables** influence two or more other variables and produce a problematic spurious association between them. Such an association is visible from a purely statistical point of view (e.g. in correlation analysis), but does not make sense from the causal point of view.

To learn more check this Jupyter Notebook and this blog post.
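A small simulation shows what such a spurious association looks like. This is a sketch under assumed settings (linear relationships, Gaussian noise, made-up coefficients) and is not taken from the notebook mentioned above: Z influences both X and Y, while X has no effect on Y at all.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Z is the confounder: it influences both X and Y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)        # Z -> X
y = 2.0 * z + rng.normal(size=n)  # Z -> Y (note: no X -> Y effect)

# Naive analysis: regress Y on X alone.
naive_slope = np.polyfit(x, y, 1)[0]

# Adjusted analysis: regress Y on X while controlling for Z.
design = np.column_stack([np.ones(n), x, z])
adjusted_slope = np.linalg.lstsq(design, y, rcond=None)[0][1]

print(f"naive slope of Y on X:    {naive_slope:.2f}")
print(f"slope after adjusting Z:  {adjusted_slope:.2f}")
```

The naive slope is clearly different from zero even though X does not cause Y. Once Z is included in the regression, the estimated effect of X drops to roughly zero.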

## Books on Causality

**Causality** as a field is highly heterogeneous.

When I was starting in the field, I was very surprised to learn that the sub-fields of **causal inference** and **causal discovery** were almost entirely separated and studied by virtually non-overlapping groups of researchers.

Additionally, the two main schools of thought in causal inference, one coming from **Donald Rubin** and the other from **Judea Pearl**, tend to use different vocabulary, making the life of a student of causality even more difficult.

My curated list of **six causal books** is here to help you grasp these differences and find the common denominator among the noise.

This list is not meant to be **complete** (whatever that means). It's rather a guided tour to help you feel at home in the **Causal Wonderland**.

The six books are:

- **The Book of Why**

- **Causal Inference in Statistics — A Primer**

- **Elements of Causal Inference**

- **Causality — Models, Reasoning and Inference**

- **Causal Inference — The Mixtape**

- **Causal Inference - What If?**

**Learn more** about these books in **this blog post**.
