Skip to main content Link Menu Expand (external link) Document Search Copy Copied

Parametric (model-based) estimators of causal effects

Slides

This page illustrates parametric adjustment: how a model like Ordinary Least Squares can be used to predict potential outcomes as a function of treatment and confounders. Parametric adjustment is useful in settings with many confounders, where nonparametric adjustment is not possible or would have poor statistical properties.

We illustrate parametric adjustment by estimating the causal effect of education on income, where we assume that the causal effect is identified by statistical adjustment for race, parent education, and parent income.

First, prepare the environment and load data we used in the PSID Prediction Challenge.

library(tidyverse)
d <- read_csv("../psid_mobility_challenge/for_students/learning.csv") %>%
  # Optional: Modify g2_educ so that it is an ordered factor
  mutate(g2_educ = fct_relevel(g2_educ, 
                               "Less than high school",
                               "High school",
                               "Some college",
                               "College"))

Parametric adjustment: The big idea

To take subgroups defined by these confounders would be impossible—no two people have exactly the same value of parent income. A model becomes necessary.

To estimate parametrically,

  1. fit a model to predict the outcome as a function of confounders and treatment
  2. create a data frame where treatment takes each value of interest
  3. using the model, predict the potential outcome in each modified data frame

Details with code examples

For example, we might assume an OLS specification where respondent education interacts with each of the confounders.

fit <- lm(g3_log_income ~ g3_educ * (g2_educ + g2_log_income + race),
          data = d)

Then, create data frames where the treatment is modified to take each value of interest: the respondent finishes exactly high school or finishes college.

d_College <- d %>%
  mutate(g3_educ = "College")
d_HighSchool <- d %>%
  mutate(g3_educ = "High school")

Finally, predict outcomes to estimate the conditional average causal effect for each respondent at their observed value of the confounding variables.

conditional_average_effect <- d %>%
  mutate(yhat_College = predict(fit, newdata = d_College),
         yhat_HighSchool = predict(fit, newdata = d_HighSchool),
         effect_of_college = yhat_College - yhat_HighSchool) %>%
  select(g3_id, race, g2_educ, g2_log_income, effect_of_college) %>%
  print()
## # A tibble: 1,365 × 5
##    g3_id race  g2_educ               g2_log_income effect_of_college
##    <dbl> <chr> <fct>                         <dbl>             <dbl>
##  1     1 White Less than high school          11.3             0.498
##  2     4 White High school                    12.1             0.374
##  3     7 White High school                    11.4             0.419
##  4     9 White Less than high school          11.3             0.498
##  5    10 White Less than high school          11.3             0.498
##  6    11 White High school                    12.1             0.374
##  7    12 White High school                    12.1             0.374
##  8    13 White Less than high school          11.6             0.479
##  9    19 White High school                    11.0             0.442
## 10    21 White High school                    11.0             0.442
## # … with 1,355 more rows

This produces a data frame where

  • each row is a respondent
  • each row has its own conditional average causal effect

You can summarize the average causal effect in the whole population by taking the average.

conditional_average_effect %>%
  summarize(effect_of_college = mean(effect_of_college))
## # A tibble: 1 × 1
##   effect_of_college
##               <dbl>
## 1             0.410

You can also summarize by taking an average within any subpopulation, such as those defined by parent education.

conditional_average_effect %>%
  group_by(g2_educ) %>%
  summarize(effect_of_college = mean(effect_of_college))
## # A tibble: 4 × 2
##   g2_educ               effect_of_college
##   <fct>                             <dbl>
## 1 Less than high school             0.563
## 2 High school                       0.444
## 3 Some college                      0.385
## 4 College                           0.204

Closing thoughts

While nonparametric adjustment relies only on our causa assumptions, parametric adjustment requires the additional assumptions of a statistical model. By assuming a model, parametric methods share information across observations with different confounder values and thus can answer causal questions in settings where subgroups defined by confounders are very sparsely populated.

Summary video: What we covered today