Example with OLS
This page walks through a sample submission for our PSID Income Prediction Challenge.
There are four steps.
- Load data: learning.csv and holdout_public.csv
- Learn a prediction function in learning
- Predict for new cases in holdout_public
- Prepare submission
You should carry out these steps in an R script (not an .Rmd) so that you can submit your R script with your predictions.
1. Load data
The first step is to load the data. For data access, see the previous page.
library(tidyverse)
learning <- read_csv("learning.csv")
holdout_public <- read_csv("holdout_public.csv")
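Before modeling, it can help to confirm that the loaded data look the way the rest of this walkthrough assumes. The sketch below uses small toy data frames with made-up values as stand-ins for the real files, so it runs without the challenge data. The vector names model_vars and holdout_vars are hypothetical helpers, not part of the challenge materials.

```r
# Toy stand-ins for the real files (made-up values, for illustration only)
learning <- data.frame(
  g1_log_income = c(10.2, 10.8, 11.1, 9.9),
  g2_log_income = c(10.5, 10.9, 11.3, 10.1),
  g3_log_income = c(10.7, 11.0, 11.2, 10.3)
)
holdout_public <- data.frame(
  g3_id = 5:6,
  g1_log_income = c(10.4, 11.0),
  g2_log_income = c(10.6, 11.2),
  g3_log_income = NA_real_
)

# Columns the example model needs in learning (hypothetical helper name)
model_vars <- c("g1_log_income", "g2_log_income", "g3_log_income")
# Columns the later steps need in holdout_public (hypothetical helper name)
holdout_vars <- c("g3_id", model_vars)

# Stop early with an error if an expected column is absent
stopifnot(all(model_vars %in% names(learning)))
stopifnot(all(holdout_vars %in% names(holdout_public)))

# The outcome is expected to be entirely NA in the holdout set
stopifnot(all(is.na(holdout_public$g3_log_income)))
```

With the real data, you would drop the two toy data frames and run the checks on the objects created by read_csv.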
2. Learn a prediction function in learning
Example: OLS with past incomes as predictors
fit <- lm(g3_log_income ~ g1_log_income + g2_log_income,
          data = learning)
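Because the outcome is missing in holdout_public, one way to gauge how well a model might predict is to set aside part of learning and score predictions there. The following is a minimal sketch under stated assumptions: the data are simulated with made-up coefficients, the 80/20 split is an arbitrary choice, and mean squared error is used only for illustration since this page does not state how submissions are scored.

```r
set.seed(1)

# Simulated stand-in for learning.csv (made-up values, not PSID data)
n <- 200
g1 <- rnorm(n, mean = 10.5, sd = 0.8)
g2 <- 0.5 * g1 + rnorm(n, mean = 5.2, sd = 0.6)
g3 <- 0.3 * g1 + 0.4 * g2 + rnorm(n, mean = 3.1, sd = 0.6)
learning <- data.frame(g1_log_income = g1,
                       g2_log_income = g2,
                       g3_log_income = g3)

# Randomly set aside 20% of learning as a validation set
validation_rows <- sample(nrow(learning), size = round(0.2 * nrow(learning)))
train <- learning[-validation_rows, ]
validation <- learning[validation_rows, ]

# Fit the same OLS specification on the training rows only
fit_train <- lm(g3_log_income ~ g1_log_income + g2_log_income,
                data = train)

# Mean squared error on the rows the model did not see
validation_pred <- predict(fit_train, newdata = validation)
validation_mse <- mean((validation$g3_log_income - validation_pred)^2)

# Reference point: predict the training mean for every validation case
baseline_mse <- mean((validation$g3_log_income - mean(train$g3_log_income))^2)
```

Comparing validation_mse with baseline_mse gives a rough sense of how much the predictors help. After choosing a model this way, you could refit it on all of learning before predicting for holdout_public.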
3. Predict for new cases in holdout_public
Start with holdout_public and mutate to change the g3_log_income column from its current value of NA to the predicted values from your model.
fitted <- holdout_public %>%
  # Predict using the estimated model
  mutate(g3_log_income = predict(fit, newdata = holdout_public))
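If a holdout case is missing a predictor, predict() returns NA for that case, since predict.lm passes missing values through by default. Whether the real holdout data contain missing predictors is not stated here, so treat the following as a hedged sketch on toy data with made-up values. Filling gaps with the mean outcome from learning is one simple choice among many, not a recommended strategy.

```r
# Toy stand-ins with made-up values; one holdout case lacks g1_log_income
learning <- data.frame(
  g1_log_income = c(10.2, 10.8, 11.1, 9.9, 10.6),
  g2_log_income = c(10.5, 10.9, 11.3, 10.1, 10.4),
  g3_log_income = c(10.7, 11.0, 11.2, 10.3, 10.8)
)
holdout_public <- data.frame(
  g3_id = 1:3,
  g1_log_income = c(10.4, NA, 11.0),
  g2_log_income = c(10.6, 10.9, 11.2),
  g3_log_income = NA_real_
)

fit <- lm(g3_log_income ~ g1_log_income + g2_log_income,
          data = learning)

# Rows with a missing predictor come back as NA
predicted <- predict(fit, newdata = holdout_public)
n_missing <- sum(is.na(predicted))

# Assumed fallback: the mean outcome in learning
fallback <- mean(learning$g3_log_income, na.rm = TRUE)
predicted[is.na(predicted)] <- fallback
```

Counting n_missing before filling tells you how many cases rely on the fallback rather than on the model.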
4. Prepare submission
Create a data frame with two columns:
- g3_id (the identifier)
- g3_log_income (the predicted outcome)
for_submission <- fitted %>%
  select(g3_id, g3_log_income)
Save as a csv and upload to the submission site!
write_csv(for_submission,
          file = "example.csv")
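Before uploading, it may be worth reading the saved file back in and checking its shape. The sketch below uses a toy data frame with made-up values and a temporary file path so that it runs on its own. The checks for no missing predictions and no repeated identifiers are assumptions about what a clean submission looks like, not rules stated by the submission site.

```r
# Toy stand-in for the submission data frame (made-up values)
for_submission <- data.frame(
  g3_id = c(101, 102, 103),
  g3_log_income = c(10.6, 10.9, 11.2)
)

# Write to a temporary path; in practice this would be "example.csv"
path <- tempfile(fileext = ".csv")
write.csv(for_submission, path, row.names = FALSE)

# Read the file back exactly as it was written to disk
check <- read.csv(path)

# Exactly the two expected columns, in order
stopifnot(identical(names(check), c("g3_id", "g3_log_income")))
# Assumed requirement: every case has a prediction
stopifnot(!anyNA(check$g3_log_income))
# Assumed requirement: each identifier appears once
stopifnot(!anyDuplicated(check$g3_id))
```

Each stopifnot() call raises an error if its condition fails, so a silent run means the file passed all three checks.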
When you upload, you will also upload
- the .R source code for your submission
- a brief text narrative telling us what strategy you used
If this were our submission, we would upload this R source code and this .csv file of predictions.