Skip to main content Link Menu Expand (external link) Document Search Copy Copied

Example with OLS

This page walks through a sample submission for our PSID Income Prediction Challenge.

There are four steps.

  1. Load data: learning.csv and holdout_public.csv
  2. Learn a prediction function in learning
  3. Predict for new cases in holdout_public
  4. Prepare submission

You should carry out these steps in an R script (not an .Rmd) so that you can submit your R script with your predictions.

1. Load data

The first step is to load the data. For data access, see the previous page.

library(tidyverse)
learning <- read_csv("learning.csv")
holdout_public <- read_csv("holdout_public.csv")

2. Learn a prediction function in learning

Example: OLS with past incomes as predictors

fit <- lm(g3_log_income ~ g1_log_income + g2_log_income,
          data = learning)

3. Predict for new cases in holdout_public

Start with holdout_public and mutate to change the g3_log_income column from its current value of NA to the predicted values from your model.

fitted <- holdout_public %>%
  # Predict using the estimated model
  mutate(g3_log_income = predict(fit, newdata = holdout_public))

4. Prepare submission

Create a data frame with two columns:

  • g3_id (the identifier)
  • g3_log_income (the predicted outcome)
for_submission <- fitted %>%
  select(g3_id, g3_log_income)

Save as a csv and upload to the submission site!

write_csv(for_submission,
          file = "example.csv")

When you upload, you will also upload

  • the .R source code for your submission
  • a brief text narrative telling us what strategy you used

If this were our submission, we would upload this R source code and this .csv file of predictions.

Summary video: What we covered today