
Jan 30. Exercise: Income inequality by education

Worked Example

This exercise examines how income inequality has changed over time for American workers with and without college degrees.

Learning goals for today include

  • Create a ggplot with your table group
  • Distinguish trends using color and facet
  • Communicate the result in a few sentences

We will analyze byCollege.csv, which is described in (2) below.

1. Coordinate your group

Our exercise today will involve coding as well as writing interpretations in English, so each group needs one coder and one writer. Use R to assign these roles at random. On one of your computers, first create a vector with your names.

names <- c("Name 1","Name 2","Name 3","Name 4")

Then, use the sample function to assign two names to the object chosen. You can learn about sample by typing ?sample in the console.

chosen <- sample(x = names, size = 2)

Look at the chosen object. The first person will be the coder, and the second person will be the writer.
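The role assignment above can be sketched in one place as follows. The seed value and the roles object are illustrative additions and not part of the exercise; setting a seed is optional and only makes the draw repeatable.

```r
# Placeholder names, as in the text above
names <- c("Name 1", "Name 2", "Name 3", "Name 4")

# Optional: fix the random seed so the draw can be repeated (3370 is arbitrary)
set.seed(3370)

# sample() draws without replacement by default, so the two names differ
chosen <- sample(x = names, size = 2)

# Hypothetical helper object: label the first draw coder and the second writer
roles <- setNames(chosen, c("coder", "writer"))
roles
```

Printing roles shows each chosen name under its role label.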

2. Coding to visualize the data

Everyone should help with these steps, but the coder will be the one typing into a .R source file.

The first step (as always) is to prepare your R environment by loading packages and the data.

library(tidyverse)
byCollege <- read_csv("https://info3370.github.io/sp23/assets/data/byCollege.csv")

Now explore the data a bit.

  • How many variables are there?
  • What class is each variable?
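A few functions that can help answer these questions are sketched below on a small stand-in table. The table toy and its values are made up for illustration; it includes only the column names mentioned in this exercise, so the real file may differ. Swap in byCollege to explore the actual data.

```r
library(tidyverse)

# Stand-in table with made-up values (not the real data)
toy <- tibble(
  year      = c(1962, 2022),
  education = c("College Degree", "Less than College"),
  quantity  = c("50th", "50th"),   # hypothetical labels
  income    = c(1, 2)              # placeholder numbers
)

ncol(toy)           # number of variables (columns)
sapply(toy, class)  # class of each variable
glimpse(toy)        # compact overview: names, classes, first values
```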

Aside. We want to tell you a bit about how we created this data file. You will do things like this in the future.

We used the Current Population Survey Annual Social and Economic Supplement for 1962–2022, accessed through the Integrated Public Use Microdata Series (IPUMS). In each year, we restricted to people employed 50+ weeks in the past year. For each worker, we focused on their reported wage and salary income for the entire year. We summarized those incomes by the 10th, 50th, and 90th percentile of the distribution over all workers (weighted by survey weights).
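For the curious, the steps above might look roughly like the sketch below. This is not the code we actually ran: the table toy_workers, its column names, its values, and the helper weighted_quantile are all hypothetical, and the helper uses one simple definition of a weighted quantile (the smallest value at which the cumulative share of weight reaches p), which may differ from other definitions that interpolate.

```r
library(tidyverse)

# Hypothetical helper: smallest x whose cumulative weight share reaches p
weighted_quantile <- function(x, w, p) {
  ord <- order(x)
  cum_share <- cumsum(w[ord]) / sum(w)
  x[ord][which(cum_share >= p)[1]]
}

# Made-up microdata, one row per worker (hypothetical names and values)
toy_workers <- tibble(
  year         = rep(c(2000, 2001), each = 4),
  education    = "College Degree",
  weeks_worked = c(52, 52, 50, 30, 52, 51, 52, 52),
  income       = c(10, 20, 30, 40, 15, 25, 35, 45),
  weight       = c(1, 1, 2, 5, 1, 1, 1, 1)
)

toy_summary <- toy_workers %>%
  filter(weeks_worked >= 50) %>%       # keep people employed 50+ weeks
  group_by(year, education) %>%        # summarize within each subgroup
  summarise(
    p10 = weighted_quantile(income, weight, 0.10),
    p50 = weighted_quantile(income, weight, 0.50),
    p90 = weighted_quantile(income, weight, 0.90),
    .groups = "drop"
  )

toy_summary
```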

We did those things so you can jump straight to analysis.

Analysis: Visualize the data

Create a ggplot to show how income changed over time.

  • The x variable is year
  • The y variable is income
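As a starting point, this mapping can be sketched on a small stand-in table. The table toy, its quantity labels, and its income values are made up for illustration; replace toy with byCollege to plot the real data.

```r
library(tidyverse)

# Stand-in table: every combination of year, education, and quantity,
# with placeholder income values (not real incomes)
toy <- expand_grid(
  year      = c(1970, 1990, 2010),
  education = c("College Degree", "Less than College"),
  quantity  = c("10th", "50th", "90th")   # hypothetical labels
) %>%
  mutate(income = seq_len(n()))

# Map year to x and income to y, then draw one point per row
p_basic <- ggplot(toy, aes(x = year, y = income)) +
  geom_point()

p_basic
```

With several subgroups in the same table, points from different subgroups all land in a single panel with nothing to tell them apart.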

There are many trends here! In fact, there are 6 trends, one for every subgroup defined by the variables

  • education (College Degree vs Less than College)
  • quantity (10th, 50th, and 90th percentiles)

On Friday, we had only one trend. Today, we will extend our ggplot skills to visualize many trends at once. There are two key tools for doing this.

Color. Add the color aesthetic to your ggplot call. For examples, see R4DS 3.3. You could either use color = education or color = quantity.

Facets. We can also distinguish trends by separating our plot into many panels, called facets. See R4DS 3.5.
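Here is a minimal sketch of both tools together, again on a made-up stand-in table (toy, with hypothetical quantity labels and placeholder incomes). It shows one possible combination, color for quantity and facets for education; your group may prefer a different one.

```r
library(tidyverse)

# Stand-in table with placeholder income values (not real incomes)
toy <- expand_grid(
  year      = c(1970, 1990, 2010),
  education = c("College Degree", "Less than College"),
  quantity  = c("10th", "50th", "90th")   # hypothetical labels
) %>%
  mutate(income = seq_len(n()))

p_trends <- ggplot(toy, aes(x = year, y = income, color = quantity)) +
  geom_line() +               # one line per color within each panel
  facet_wrap(~ education)     # one panel per education group

p_trends
```

Swapping the two variables (color = education with facet_wrap(~ quantity)) shows the same trends but makes different comparisons easy to see.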

Using some combination of color and facets, visualize

  • how the 10th, 50th, and 90th percentiles of the income distribution changed over time
  • for those with and without a college degree

3. Interpretation to communicate the result

Discuss what you see. Where did change occur over time?

Now it is the writer’s turn. As your group discusses, come to a consensus about how you would explain what a reader should take away from your figure. What information is shown to them? Write a few sentences.

Important: Save your code and your writing!

In our next class meeting, we will take what you did in this class and turn it into a more formal and reproducible report.

Summary video: What we covered today