Class exercise: Predicting income mobility
In this exercise, we will explore the degree to which survey respondents’ incomes in adulthood can be predicted given information on their education and socioeconomic characteristics of their parents and grandparents.
This document introduces the data and then your task.
Accessing data
Visit the exercise page on OpenICPSR. You will need to register for an account and agree to terms. Download for_students.zip.
About the data
The Panel Study of Income Dynamics (PSID) began in 1968 with a representative sample of U.S. households. Since then, the study has repeatedly interviewed these individuals and their descendants. The PSID is uniquely positioned to help us answer questions about income mobility over 3 generations: grandparents, parents, and respondents. We will refer to these as generations g1, g2, and g3.
Your task
Your task is to predict respondents’ incomes:
- build your model using learning.csv
- make predictions for the cases in holdout_public.csv
You can use any statistical or machine learning approach you want. For ideas, see the next pages!
You can also use any or all of the predictors provided in learning.csv:
- respondent sex
- respondent education
- parent and grandparent education
- parent and grandparent log income
- grandparent’s race
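As a starting point, the workflow above (fit on the learning set, predict for the holdout set) might look like the sketch below. It is only an illustration, not a recommended model. The predictor column names g2_log_income and g1_log_income are assumptions, so check the actual names in learning.csv. The data here are simulated so that the code runs on its own.

```r
# Toy stand-ins for learning.csv and holdout_public.csv.
# Column names other than g3_id and g3_log_income are hypothetical.
set.seed(1)
make_toy <- function(n, id_start) {
  data.frame(
    g3_id = seq(id_start, length.out = n),
    g2_log_income = rnorm(n, mean = 10.8, sd = 0.6),
    g1_log_income = rnorm(n, mean = 10.5, sd = 0.6)
  )
}
learning <- make_toy(200, 1)
learning$g3_log_income <- 5 + 0.4 * learning$g2_log_income +
  0.1 * learning$g1_log_income + rnorm(200, sd = 0.5)
holdout <- make_toy(50, 1001)

# In the exercise itself, you would read the provided files instead:
# learning <- read.csv("learning.csv")
# holdout  <- read.csv("holdout_public.csv")

# Fit a linear regression on the learning set
fit <- lm(g3_log_income ~ g2_log_income + g1_log_income, data = learning)

# Predict for the holdout cases
holdout$g3_log_income <- predict(fit, newdata = holdout)
head(holdout[, c("g3_id", "g3_log_income")])
```

Linear regression is used here only because it is simple; any other approach can be swapped in at the fitting step.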
For the purpose of this exercise, we have constructed each log income variable to be the log of mean income across all surveys conducted with the person at age 30–45.
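To make this construction concrete, here is a small sketch with made-up income values for one person. It assumes the natural logarithm, which the text above does not state explicitly. The point is that the log of the mean is not the same quantity as the mean of the logs.

```r
# Made-up annual incomes for one person across surveys at ages 30-45
incomes <- c(40000, 50000, 60000)

# The construction described above: log of the mean income
log_of_mean <- log(mean(incomes))

# A different quantity, shown for contrast: mean of the log incomes
mean_of_logs <- mean(log(incomes))

# The log of the mean is never smaller than the mean of the logs
c(log_of_mean = log_of_mean, mean_of_logs = mean_of_logs)
```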
The data also contain identifiers:
- g1_id identifies grandparents (multiple rows have the same grandparent)
- g2_id identifies parents (multiple rows have the same parent)
- g3_id identifies respondents (each row is a unique respondent)
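Because several rows can share a grandparent, rows from the same family are not independent. If you set aside part of the learning set to check your model, one option (a suggestion, not a requirement of the exercise) is to hold out whole families by g1_id. The sketch below uses toy data with only the identifier columns.

```r
set.seed(2)

# Toy data: 6 grandparents, each with one or more grandchildren (rows)
toy <- data.frame(
  g1_id = rep(1:6, times = c(3, 2, 4, 1, 2, 3)),
  g3_id = 1:15
)

# Number of rows per grandparent
table(toy$g1_id)

# Hold out every row belonging to two randomly chosen grandparents
valid_g1 <- sample(unique(toy$g1_id), size = 2)
valid <- toy[toy$g1_id %in% valid_g1, ]
train <- toy[!(toy$g1_id %in% valid_g1), ]

# No grandparent appears in both sets
length(intersect(train$g1_id, valid$g1_id))
```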
How to submit
You will submit
- your predictions in a .csv file with two columns:
  - g3_id for each case in holdout_public
  - g3_log_income with your prediction for each case
- your .R code file
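A minimal sketch of writing the predictions file in the two-column format described above. The prediction values and the file name predictions.csv are made up for illustration; in practice the data frame would hold your model's predictions for every case in the holdout file.

```r
# Hypothetical predictions for three holdout cases (made-up values)
predictions <- data.frame(
  g3_id = c(101, 102, 103),
  g3_log_income = c(10.7, 11.1, 10.9)
)

# Write to a temporary folder here; the file name is an assumption
out_file <- file.path(tempdir(), "predictions.csv")
write.csv(predictions, out_file, row.names = FALSE)

# Read the file back to confirm it has exactly the two expected columns
check <- read.csv(out_file)
stopifnot(identical(names(check), c("g3_id", "g3_log_income")))
```

Setting row.names = FALSE keeps R from adding an extra unnamed column of row numbers to the file.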
Submit predictions in this Google form