Causal Inference &
Discovery in Python

The Book

From Machine Learning & Pearlian Perspective

Hi, my name is Alex.

When I was starting with causality three years ago I could not find a comprehensive book on causality in Python.

I decided to write this book, to make your journey into causality easier and faster.
The book will be available in early 2023 (ebook + physical).

Subscribe below to get updates and learn about amazing free content on causality that I'll be sharing over the next months!

© Causal Python 2022


Fundamentals is a set of short articles presenting the basic causal concepts, power tips and secrets to help you jump-start your causal journey. We focus on causal inference and causal discovery in Python, but many resources are universal.The list of topics will grow with bi-weekly frequency.

Available topics:

Four amazing Causality Resources

What fits you best?

Free content on causal machine learning & updates on my causal book:

Want to learn fundamental concepts on causality?

Welcome to Causal python!

Check the latest blog post here:

Free content on causal machine learning & updates on my causal book:


Pearlian causal inference framework is rooted in Bayesian networks. In Bayesian networks (that are probabilistic graphical models) we represent variables as nodes and conditional probabilities between variables as directed edges.

In causal inference nodes also represent variables, but edges represent causal relationships between the variables rather than just conditional probabilities.D-separation is a set of rules that allows us to block a path between two (sets of) variables.There are three basic graphical structures**:
Knowing the three structures, d-separation boils down to the following rules:For any three disjoint sets of nodes X, Y and Z, a path between X and Y is blocked by Z if:• There’s a fork i←j→k or a chain i→ j→ k in this path such that the middle node j∈Z or• There’s a collider i→j ←k on this path such that neither j nor any of its descendants belongs to Z.These rules can be now used to de-confound causal relationships in your data or uncover causal structure (e.g. PC algorithm)._____________** To learn more about these structures check this video or this video.Play with causal structures in the notebook

Causal Discovery

Check the latest blog post here

Defining Causality

People have been thinking about causality over the millennia and across the cultures.Aristotle – one of the most prolific philosophers of ancient Greece – claimed that understanding causal structure of a process is a necessary ingredient of knowledge about this process. Moreover, he argued that being able to answer “why” questions is the essence of scientific explanation. He distinguished four types of causes (material, formal, efficient and final), an idea that might capture certain interesting aspects of reality as much as it might be counter-intuitive to a contemporary scientist or researcher.David Hume, a famous 18th century Scottish philosopher, proposed a more unified framework for cause-effect relationships. Hume starts with an observation that we never experience cause-effect relationships in the world, which leads him to formulating an association-based definition of causality (human mind produces a sense of cause-effect relationships when it gets used to repeatable sequences of events).[Note that this is a very simplified view on Hume's ideas. To learn more check this article]We know that pure associations are not enough to accurately describe causal effects (vide confounding).In order to address this issue and reduce the complexity of the problem, Judea Pearl proposed a very simple and actionable definition of causality:A causes B when B "listens to" AThis essentially means that if we change something in A, we expect to see a change in B.This definition allows us to abstract out ontological complexities and focus on addressing real-world problems.

Assumptions: Positivity

What is positivity assumption?Positivity (also known as overlap) is one of the most fundamental assumptions in causal inference.Simply put, it requires that the probability of treatment, given control variables is greater than zero.Formally:P(T=t | Z=z) > 0(Greater than 0 == positive; hence the name)What is the meaning of this formula?For every value of the control variables, the probability of every possible value of treatment should be greater then 0.Why is this important? Let's see!Let's assume that in our dataset we have 60 subjects described by a single continuous feature Z.Each subject either receives a binary treatment T or not, and each subject has some continuous outcome Y.Importantly, we need to control for Z in order to compute the unbiased effect of the treatment T on the outcome Y.Imagine that, by accident, the treatment was administered only to people whose value of Z is 5 or more. On the other hand, people whose value of Z is less than 5, did not get the treatment.To compute the average treatment effect (ATE) of treatment T, we need to compare the values of the outcomes under the treatment to the values under no treatment:ATE = E[Y(1) - Y(0)]Take a look at the figure below:

Subjects that got the treatment are marked in red. Subjects that did not get the treatment are marked in blue.Is the probability of each value of treatment greater than zero for each value of Z?Clearly not!So, what’s the problem here?In order to compare the outcomes of treated subjects with the outcomes of untreated subjects, we need to estimate the values for red dots in the blue area (where we have no treatment) and the values of the blue dots in the red area.Whatever model we use for this purpose, it will need to extrapolate.Take a look at the figure below that shows possible extrapolation trajectories (red and blue lines respectively):

How accurate is it in your opinion?Will it lead to good estimates of the treatment effect?It really hard to answer this question! We can only guess!Compare this to the figure below where positivity assumption is not violated:

Now, a model has a much easier job to do. It only needs to interpolate.We can reasonably estimate the effect from, without guessing! That’s the essence of the positivity assumption!Naturally, positivity also works in higher dimensional cases (with a caveat that interpolation in high dimension is a problematic concept in itself; Balestriero et al., 2021).

Photo: Ahmet Polat

Structural Causal Models


This model can be described by the following set of assignments (traditionally called equations):A := fa(εa)
X := fx(A, εx)
Y := fy(A, X, εy)
where fs are some arbitrary functions (fx is a function specific to X and so on) and εs are noise terms (imagine a normal distribution for instance). Note that we've omitted εs in the graph for simplicity, but they are an important part of the model.In the SCM nomenclature A, X and Y are called endogenous variables while εs are called exogenous variables (note that they don't have any parent assignments).Some researchers tend to include the graphical representation in the definition of SCM and some don't. The definitions are not entirely consistent in the literature.


Confounding is a basic causal concept.Confounding variables influence two or more other variables and produce a problematic spurious association between them. Such an association is visible from purely statistical point of view (e.g. in correlation analysis), but does not make sense from the causal point of view.To learn more check this Jupyter Notebook and this blog post.

Books on Causality

Causality as a field is highly heterogenous.When I was starting in the field, I was very surprised to learn that sub-fields of causal inference and causal discovery were almost entirely separated and studied by virtually non-overlapping groups of researchers.Additionally, two main schools of thought in causal inference - one coming from Donald Rubin the other from Judea Pearl - tend to use different vocabulary, making the life of causal student even more difficult.

My curated list of six causal books is here to help you grasp these differences and find the common denominator among the noise.

This list is not meant to be complete (whatever it means). It's rather a guided tour to help you feel at home in the Causal Wonderland.
The six books are:
- The Book of Why
- Causal Inference in Statistics — A Primer
- Elements of Causal Inference
- Causality — Models, Reasoning and Inference
- Causal Inference — The Mixtape
- Causal Inference - What If?
Learn more about these books in this blog post.

Thank you

Please check your inbox to confirm your email.