Jan 3


## CAUSAL DATA SCIENCE

## A regularly updated collection of free resources to approach, explore and master causal inference topics

Since I have been asked many times for suggestions and resources to approach the field of **causal inference**, I decided to collect in this article what I personally think are the best free resources available online to either approach the field or expand your knowledge. I will suggest different types of **resources** so that you can pick what best suits your learning style: video lectures, books, articles, code examples, and more.

A few **disclaimers** before starting. This article is very **subjective**: I will cover only resources that I know and that I have used. My **background** is more frequentist than Bayesian, and I am more familiar with the potential outcomes causal framework than other frameworks, such as directed acyclic graphs. Also, I decided to list only free resources, so a lot of books will be absent (neither Mostly Harmless Econometrics nor The Book of Why is here). If I forgot something, let me know!

I divided the article into 4 main **sections**, by media type: lectures, blogs, books, and others. Within each section, I ranked the content by personal preference, starting from my favorites. Enjoy! 🤗

I start with video lectures since they have always been my favorite medium for learning. I personally think that nothing beats in-person teaching, but lecture recordings are a close second.

**Ph.D. Applied Methods Course**

- **Author**: Paul Goldsmith-Pinkham (Yale University)
- **Year**: 2021
- **Level**: intermediate to advanced
- **Comment**: probably the best resource on causal inference available online: the recordings of Paul’s course at Yale. It mostly covers quasi-experimental methods, providing both intuition and details.

**Ph.D. Econometrics Course**

- **Author**: Chris Conlon (New York University)
- **Year**: 2020
- **Level**: research
- **Comment**: huge resource, covering a wide variety of topics mostly related to the “structural” approach. In particular, it is one of the few resources in this list covering extremum estimators (MLE, GMM).

**Machine Learning and Causal Inference Short Course**

- **Authors**: Susan Athey, Jann Spiess, and Stefan Wager (Stanford University)
- **Year**: 2022
- **Level**: research
- **Comment**: a short course by leading researchers concentrating on the estimation of heterogeneous treatment effects. The content is very advanced, but the exposition is clear and paired with examples.

## Causal Inference Course

- **Author**: Brady Neal (Mila, Quebec AI Institute)
- **Year**: 2020
- **Level**: beginner to intermediate
- **Comment**: a very approachable introduction to causal inference. Covers both the potential outcomes framework and DAGs. Guest lectures from Susan Athey and Alberto Abadie (plus an extra with Yoshua Bengio).

## Causality Boot Camp

- **Author**: various professors at the Simons Institute (Berkeley)
- **Year**: 2022
- **Level**: beginner to intermediate
- **Comment**: unlike most of the other resources, this boot camp focuses on the DAG and do-calculus approach.

**Difference-in-Differences Reading Group**

- **Author**: multiple researchers
- **Year**: 2021
- **Level**: advanced to research
- **Comment**: short collection of lectures on the exploding diff-in-diffs literature, hosted by the authors themselves. The many questions from the audience add extra value.

**Causal Inference with Panel Data Short Course**

- **Author**: Yiqing Xu (Stanford University)
- **Year**: 2021
- **Level**: advanced
- **Comment**: recordings of a 3-day course at Washington University in St. Louis covering quasi-experimental methods with panel data. The level is advanced and points towards frontier research in the field.

Blogs are my second-favorite resource: they combine a more intuitive and informal exposition with code. They lack the direct engagement of lectures, but you can usually play with the code directly.

## Causal Inference for the Brave and True

- **Author**: Matheus Facure
- **Year**: regularly updated
- **Level**: intermediate to advanced
- **Comment**: if you prefer playing with code rather than reading math, this is the resource for you. It covers a wide variety of topics, always translating equations into *extremely readable* code. Personal favorite!

## Causal Inference and Its Applications in Online Industry

- **Author**: Alex Deng (Airbnb)
- **Year**: 2018
- **Level**: intermediate
- **Comment**: unfortunately incomplete, but covers industry applications in detail, concentrating on experimentation and A/B tests.

## Difference-in-Differences Blog

- **Author**: mostly Bret Zeldow and Laura Hatfield (Harvard Med)
- **Year**: 2019
- **Level**: intermediate
- **Comment**: covers difference-in-differences estimators in detail, from the basics to the research frontier, from the medical perspective.

Few books are freely available online, but those that are make excellent resources, and most were published in the last couple of years. Notable absences are Mostly Harmless Econometrics and any advanced econometrics book (I personally really like Econometrics by Bruce Hansen, which was freely available online until recently).

## Causal Inference: The Mixtape

- **Author**: Scott Cunningham (Baylor University)
- **Year**: 2021
- **Level**: intermediate
- **Comment**: extremely readable book, covering the basics of the potential outcomes framework with more words than equations. It also comes with in-text code in R, Python, and Stata. A second edition is on the way.

## The Effect

- **Author**: Nick C. Huntington-Klein (Seattle University)
- **Year**: 2021
- **Level**: intermediate to advanced
- **Comment**: great resource covering a wide variety of quasi-experimental methods in detail. More technical and comprehensive than The Mixtape but still extremely readable, alternating text and code (mostly R).
- **Bonus**: comes with recordings on YouTube!

## Causal Inference: What If

- **Author**: Miguel Hernán (Harvard University)
- **Year**: 2022
- **Level**: intermediate to advanced
- **Comment**: finally a resource that does not come from Economics! Covers mostly DAGs and experiments, with a lot of code in R, Python, and others. More academic but less readable than the previous two.

Lastly, some mixed resources that do not exactly fit the previous categories (but are no less interesting or useful).

## AEA Continuing Education Series

- **Author**: different professors
- **Year**: from 2009 onwards
- **Level**: advanced to research
- **Comment**: a collection of courses from the American Economic Association, taught by leading researchers in the field. There is usually one course per year on causal inference and econometrics.

Recent and relevant courses:

- **Modern Sampling Methods: Design and Inference** (2022), by Keisuke Hirano (Yale University) and Jack Porter (UW Madison)
- **Mastering Mostly Harmless Econometrics** (2020), by Alberto Abadie, Joshua Angrist, and Christopher Walters (MIT)
- **Machine Learning and Econometrics** (2018), by Susan Athey and Guido Imbens (Stanford University)

## NBER Summer Institute Methods Lectures

- **Author**: rotating professors
- **Year**: from 2007 onwards
- **Level**: research
- **Comment**: a collection of lectures on (mostly) econometrics and causal inference, from leading researchers in the field. The series is not structured but contains extremely high-quality material.

Recent and relevant lectures:

- **Empirical Bayes Applications** (2022), by Christopher Walters (MIT)
- **Synthetic Controls: Methods and Practice** (2021), by Alberto Abadie (MIT)
- **Regression Discontinuity Designs: Practice and Topics** (2021), by Matias Cattaneo (Princeton University)

## The Gary Chamberlain Seminar in Econometrics

- **Author**: rotating professors
- **Year**: from 2020 onwards
- **Level**: research
- **Comment**: probably the most advanced (and least digestible) resource on the list. It is a seminar series where researchers present their (usually yet-to-be-published) research in causal inference.

## EconML Documentation Page

- **Author**: researchers at Microsoft
- **Level**: advanced
- **Comment**: the library covers a wide variety of topics, mostly at the intersection of causal inference and machine learning. Everything is very well documented and paired with code examples.

## CausalML Documentation Page

- **Author**: scientists at Uber
- **Level**: code
- **Comment**: documentation of the CausalML package, mostly covering the estimation of heterogeneous treatment effects. Less elaborate than the EconML page, but contains both theory and examples.

## A/B Testing Course

- **Author**: scientists at Google
- **Level**: beginner
- **Comment**: beginner course on A/B testing and randomized experiments. Starts from the basics, taking almost nothing for granted. Great for a first approach (or for preparing for interviews).

**Honorable Mentions**

- Dive into Causal Machine Learning: lots of code from courses at Stanford and MIT, but unfortunately not always well documented.
- PyWhy documentation (formerly DoWhy): documentation for the largest causal inference repository.
- DoubleML documentation: library for double/debiased estimators.

Congrats on making it this far! You are now a causal inference master! 🥳

Jokes aside, I know this is a very long list of resources, but I hope it was useful. I will try to keep it updated, sharing any additions to the list on social media (I am usually active on LinkedIn and Twitter). I also regularly publish on Towards Data Science on causal inference and data science topics.

## FAQs

### What is the best study for causal inference?

**Randomized controlled trials** (RCTs) are the gold standard for measuring causality.

**What is a simple example of causal inference?**

Take, for example, the causal chain **X → Y → Z**. In this chain, information about X gives us information about Y, which in turn provides information about Z. However, if we control for Y (by choosing, for example, a particular value of Y), information about X then provides no new information about Z.
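A small simulation can make the chain concrete. The sketch below is only an illustration under stated assumptions: it assumes linear relationships with Gaussian noise (so that partial correlation can stand in for conditional dependence), and the helper names `corr` and `residuals` are hypothetical, not taken from any library.

```python
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(target, control):
    """Residuals of `target` after a simple linear regression on `control`."""
    n = len(target)
    mean_t, mean_c = sum(target) / n, sum(control) / n
    slope = sum((c - mean_c) * (t - mean_t) for c, t in zip(control, target)) / sum(
        (c - mean_c) ** 2 for c in control
    )
    return [(t - mean_t) - slope * (c - mean_c) for t, c in zip(target, control)]


rng = random.Random(0)
n = 20_000

# Assumed data-generating process for the chain X -> Y -> Z.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]  # Y depends on X
z = [yi + rng.gauss(0, 1) for yi in y]  # Z depends on Y only

# Without controlling for Y, X is informative about Z.
marginal = corr(x, z)

# Controlling for Y: correlate what is left of X and Z after removing Y.
partial = corr(residuals(x, y), residuals(z, y))

print(f"corr(X, Z)     = {marginal:.3f}")
print(f"corr(X, Z | Y) = {partial:.3f}")
```

Under these assumptions the first correlation should be clearly positive, while the second should be close to zero: once Y is held fixed, X carries no further information about Z.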

**Why is causal inference difficult?**

Making valid causal inferences is challenging because **it requires high-quality data and adequate statistical methods**.

**What are the three types of causal inference?**

In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: those about **(1) the effects of potential interventions, (2) probabilities of counterfactuals, and (3) direct and indirect effects** (also known as " ...

**What is the best way to study causality?**

The Ideal Way: **Random Experiments**

The purest way to establish causation is through a randomized controlled experiment (like an A/B test) where you have two groups — one gets the treatment, one doesn't.

**Which research method is the most reliable at inferring causality?**

**Randomized controlled trials**

The RCT is typically regarded as the most robust basis for causal inference and represents the most common approach that uses study design to support the causal inference. Nevertheless, RCTs rest on the critical assumption that the groups are similar except with respect to the intervention.

**What is causal inference for beginners?**

In very simple words, causal inference is **the study of cause-and-effect relationships**. Those who practice causal inference ask questions such as: does X cause Y? What is the effect of changing X on Y? For example, what is the effect of changing the format of a website on user retention?

**What are the four steps of causal inference?**

DoWhy breaks down causal inference into four simple steps: **model, identify, estimate, and refute**.
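To give a feel for what each of the four steps does, here is a hand-rolled sketch in plain Python. It deliberately does not use the DoWhy API. Everything in it is an assumption made for illustration: the simulated data, the causal graph (a single observed binary confounder `W`), the effect size `true_effect`, and the helper names `diff_in_means` and `backdoor_estimate`.

```python
import random

rng = random.Random(7)
n = 40_000
true_effect = 1.0  # hypothetical effect of T on Y, known only because we simulate

# 1. Model: assume the graph W -> T, W -> Y, T -> Y, with W observed.
rows = []
for _ in range(n):
    w = rng.random() < 0.5
    t = rng.random() < (0.8 if w else 0.2)  # W makes treatment more likely
    y = true_effect * t + 3.0 * w + rng.gauss(0, 1)
    rows.append((w, t, y))


def mean(values):
    return sum(values) / len(values)


def diff_in_means(data):
    """Treated-minus-control difference in mean outcome."""
    treated = [y for _, t, y in data if t]
    control = [y for _, t, y in data if not t]
    return mean(treated) - mean(control)


# 2. Identify: under the assumed graph, adjusting for W closes the
#    only backdoor path between T and Y.
# 3. Estimate: compute the effect within each W stratum, then average
#    the strata using their population shares as weights.
def backdoor_estimate(data):
    total = 0.0
    for level in (False, True):
        stratum = [row for row in data if row[0] == level]
        total += diff_in_means(stratum) * len(stratum) / len(data)
    return total


naive = diff_in_means(rows)        # ignores W, so it is confounded
adjusted = backdoor_estimate(rows)

# 4. Refute: replace the treatment with a coin flip unrelated to Y.
#    A sound estimator should now report an effect near zero.
placebo_rows = [(w, rng.random() < 0.5, y) for w, _, y in rows]
placebo = backdoor_estimate(placebo_rows)

print(f"naive = {naive:.2f}, adjusted = {adjusted:.2f}, placebo = {placebo:.2f}")
```

In this setup the naive comparison overstates the effect because treated units are more likely to have `W` true, the adjusted estimate should land near `true_effect`, and the placebo check should come out near zero.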

**What is the formula for causal inference?**

The mean treatment effect or mean causal effect is defined by **θ = E(Y1) − E(Y0) = E(Y | set X = 1) − E(Y | set X = 0)**, where Y1 and Y0 are the outcomes with and without exposure. The parameter θ has the following interpretation: θ is the mean response if we exposed everyone minus the mean response if we exposed no one.
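The definition can be checked numerically in a simulation, where both potential outcomes are visible for every unit (which is never the case with real data). The sketch below is a minimal illustration: the outcome distribution, the constant effect `true_effect`, and the 50/50 random assignment are all assumptions chosen for the example.

```python
import random

rng = random.Random(42)
n = 50_000
true_effect = 2.0  # hypothetical constant treatment effect

# Potential outcomes for every unit: y0 without exposure, y1 with exposure.
y0 = [rng.gauss(10, 3) for _ in range(n)]
y1 = [value + true_effect for value in y0]

# theta computed straight from its definition, E(Y1) - E(Y0).
theta = sum(y1) / n - sum(y0) / n

# In practice only one potential outcome per unit is observed.
# Here exposure X is assigned by a fair coin flip.
x = [rng.random() < 0.5 for _ in range(n)]
observed = [y1[i] if x[i] else y0[i] for i in range(n)]

treated = [y for y, exposed in zip(observed, x) if exposed]
control = [y for y, exposed in zip(observed, x) if not exposed]

# Under random assignment, the difference in observed means estimates theta.
theta_hat = sum(treated) / len(treated) - sum(control) / len(control)

print(f"theta     = {theta:.3f}")
print(f"theta_hat = {theta_hat:.3f}")
```

Because assignment is random, the difference in observed group means should land close to θ even though no unit reveals both of its potential outcomes.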

**What is the biggest threat to causal inference?**

The fundamental challenge of causal inference is that **the counterfactual cannot be observed directly**. In health research, we often compare a group of people who were exposed to the potential cause to a group of people who were not exposed.

### What are three conditions needed for causal inference in research?

There are three conditions for causality: **covariation, temporal precedence, and control for “third variables.”** The latter comprise alternative explanations for the observed causal relationship.

**What is the gold standard of causal inference?**

**Randomized controlled trials** have long been considered the 'gold standard' for causal inference in clinical research.

**What are the two fundamental laws of causal inference?**

Principle 1: “**The law of structural counterfactuals.” Principle 2: “The law of structural independence.”**

**What is the strongest study design to establish causation?**

**Randomized controlled trials (RCTs)** are considered the gold standard for causal inference because they rely on the fewest and weakest assumptions. Under certain conditions, however, quasi-experimental designs that lack random assignment can be as credible as RCTs (Shadish, Cook, & Campbell, 2002).

**What are the three rules of causation?**

To establish causality you need to show three things–**that X came before Y, that the observed relationship between X and Y didn't happen by chance alone, and that there is nothing else that accounts for the X -> Y relationship**.

**What are the four approaches to causality?**

This short paper compiles the big ideas behind some philosophical views, definitions, and examples of causality. This collection spans the realms of the four commonly adopted approaches to causality: **Hume's regularity, counterfactual, manipulation, and mechanisms**.

**What is the most powerful method for determining causal relationships?**

**Longitudinal studies** can be more powerful than cross-sectional studies in determining causal relationships, since they provide several data points for each unit and show the progression over time.

**How does a researcher make a causal inference?**

To make a causal inference, **the independent variable (the reading program in our examples) is manipulated across the different groups, and all other variables that might affect the dependent variable are held constant**.

**What are common causal inference methods?**

Common frameworks for causal inference include the **causal pie model (component-cause), Pearl's structural causal model (causal diagram + do-calculus), structural equation modeling, and Rubin causal model (potential-outcome)**, which are often used in areas such as social sciences and epidemiology.

**What is an example of causal inference in research?**

Causal inference refers to the process of drawing a conclusion that a specific treatment (i.e., intervention) was the “cause” of the effect (or outcome) that was observed. A simple example is concluding that **taking an aspirin caused your headache to go away**.

### Is causal inference inductive or deductive?

Causal reasoning is generally considered a form of **inductive reasoning**.

**What are two common errors to avoid when using causal reasoning?**

The first is the **fallacy of false cause**. The second is assuming that events have only one cause, when some events have several causes.

**What is the philosophy of causal inference?**

Causal inference is merely **a special case of prediction in which one is concerned with predicting outcomes under alternative manipulations**. The conditionality problem illustrates how the introduction of a causal component into a statistical model can resolve previous ambiguities in the choice of a statistical procedure.

**What is an example of false causality?**

This fallacy falsely assumes that one event causes another. Often a reader will mistake a time connection for a cause-effect connection. EXAMPLES: **Every time I wash my car, it rains.** **Our garage sale made lots of money before Joan showed up.**

**What is the fundamental problem with causal inference?**

The challenge for causal inference is that **we are not generally able to observe both of these states**: at the point in time when we are measuring the outcomes, each individual either has had drug exposure or has not.

**What cannot be the purpose of a causal study?**

Causal studies do not typically use methods that involve **studying relationships between variables (correlation)** or **detailed or in-depth descriptions of phenomena**.

**What are the three types of causal logic?**

There are three types of causal reasoning: **deduction, induction, and abduction**.

**What is the Bayesian approach for causal inference?**

The Bayesian Causal Inference model of multisensory perception is **a statistical model that essentially infers the more likely of two causal structures, given sensory inputs and assumptions about the structures**.

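**What does SUTVA mean?**

SUTVA stands for the **Stable Unit Treatment Value Assumption**: each unit's potential outcomes are unaffected by the treatment assigned to other units, and there are no hidden versions of the treatment.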

**What is the difference between causality and causal inference?**

Causality describes ideas about the nature of the relations of cause and effect. A cause is something that produces or occasions an effect. Causal inference is the thought process that tests whether a relationship of cause to effect exists.

### What is the only way to determine a causal relationship between two?

**The use of a controlled study** is the most effective way of establishing causality between variables. In a controlled study, the sample or population is split in two, with both groups being comparable in almost every way. The two groups then receive different treatments, and the outcomes of each group are assessed.

**What are three well-known cause analysis techniques?**

**Fishbone Diagram**. **Scatter Diagram**. **Failure Mode and Effects Analysis (FMEA)**

**What are three levels of confidence in causality?**

When seeking to establish a causal relationship, researchers distinguish among three levels of causation: **Absolute Causality, Conditional Causality, and Contributory Causality**.

**What is nonspuriousness?**

Nonspuriousness is a **relationship between two variables that is not due to variation in a third variable**.

**What is the best design for causal inferences?**

**Randomized controlled trials** are the gold standard for causal inference (Fisher, 1935). In an ideal experiment, the experimental units are randomized into two or more treatment groups and the group averages of the response variable estimate the average causal effects.

**What is the research method for causal inference?**

**Randomized experiments** are the gold standard for drawing causal inferences, but drawing such inferences from observational studies is often necessary and requires special care.

**Which type of research provides support for a causal inference?**

Experiments allow researchers to make causal inferences. Other types of methods include **longitudinal and quasi-experimental designs**.

**Which type of study allows for causal conclusions?**

**Experimental research** involves the manipulation of an independent variable and the measurement of a dependent variable. Random assignment to conditions is normally used to create initial equivalence between the groups, allowing researchers to draw causal conclusions.

**What are the four criteria for causal inference?**

- Plausibility (a reasonable pathway links the outcome to the exposure)
- Consistency (same results if repeated in a different time, place, or person)
- Temporality (exposure precedes outcome)
- Strength (with or without a dose-response relationship)

**What is the most valid study design to determine causation?**

**Randomized experiments** (also known as RCT or randomized control trials) are considered to be the most rigorous approach, or the “gold standard,” to identifying causal effects because they theoretically eliminate all preexisting differences between the treatment and control groups.

### What are the 3 types of causal research?

**Three types of causal research are:**

- Comparative Study.
- Case-Control Study.
- Cohort Study.

**What are the basic methods for establishing causal inference?**

**Despite the difficulties inherent in determining causality in economic systems, several widely employed methods exist throughout those fields.**

- Theoretical methods. ...
- Instrumental variables. ...
- Model specification. ...
- Sensitivity analysis. ...
- Design-based econometrics.

**Is causal inference qualitative or quantitative?**

The logic of causal inference **typically invoked by quantitative methodologists** therefore also applies to qualitative comparative methods: if two or more cases are identical in all relevant dimensions but vary in the treatment, causal inference is internally valid.

**What type of study would be needed to determine a causal relation?**

In a longitudinal study, you would look for causal relations over time by analyzing users over time. For example, you would look at the difference in user behavior over time between those users that purchased a product after a certain email versus those users that didn't.

**Do you need random assignment for causation?**

**Random assignment helps you separate causation from correlation and rule out confounding variables**. As a critical component of the scientific method, experiments typically set up contrasts between a control group and one or more treatment groups.
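The point about confounding can be illustrated with a short simulation. The sketch below rests on invented assumptions: a hidden `trait` that raises the outcome and also makes units more likely to opt into treatment, and a treatment whose true effect is exactly zero. The helper name `diff_in_means` is hypothetical.

```python
import random

rng = random.Random(1)
n = 40_000


def diff_in_means(treated_flags, outcomes):
    """Treated-minus-control difference in mean outcome."""
    treated = [y for t, y in zip(treated_flags, outcomes) if t]
    control = [y for t, y in zip(treated_flags, outcomes) if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


# A hidden trait drives the outcome. The treatment itself does nothing,
# so the outcome is the same whichever group a unit ends up in.
trait = [rng.gauss(0, 1) for _ in range(n)]
outcome = [2.0 * u + rng.gauss(0, 1) for u in trait]

# Self-selection: units with a high trait opt into treatment more often.
self_selected = [u + rng.gauss(0, 1) > 0 for u in trait]

# Random assignment: a coin flip, unrelated to the trait.
randomized = [rng.random() < 0.5 for _ in range(n)]

biased = diff_in_means(self_selected, outcome)
unbiased = diff_in_means(randomized, outcome)

print(f"self-selected groups: {biased:.2f}")
print(f"randomized groups:    {unbiased:.2f}")
```

With self-selection, the comparison picks up the hidden trait and suggests a large effect that does not exist. With random assignment, the trait is balanced across groups and the estimated effect should be close to the true value of zero.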