
Exercise: Independent investigation

Data science is a creative endeavor. This class activity is an opportunity to explore and think independently. You will

  • Create your own quantitative empirical question
  • Select data to explore that question
  • Produce a visualization to tell a story
  • Present that story to another group in the class

You’ve already learned tools that will be helpful

  • Accessing data through IPUMS
  • Preparing those data using R
  • Producing a visualization with ggplot()

And we are here to help! Flag any of us down along the way.

We will focus on this class activity for several classes, and you will then present your group’s findings to another group. Do not stress about having the best findings—sometimes the most compelling findings are those you least expect. We know your results will leave some questions unanswered—it takes time to do research.

Where do I start?

Pick a few variables that interest you! A great visualization often involves only a few variables. You can pick them by looking through the IPUMS-CPS documentation.

Not sure where to start? Click here for inspiration!

These reports use IPUMS-CPS data. You could use them as inspiration to find a CPS variable of interest to you.

Click here to see how one of these examples comes from IPUMS-CPS variables

This report uses a couple of different variables:

I picked some variables! What next?

  1. Write down a question you’d like to understand using those variables. For instance,
    • How did the proportion of families with incomes below $30,000 change from 1962 to 2022?
    • Among those employed as computer programmers (see OCC1990), did the proportion female change over time?
    • How does poverty vary across metropolitan vs rural areas?
  2. Get the data
  3. Prepare your data
    • Using the tidyverse!
    • Remember how we did this in class
    • Remember there might be missing value codes to filter() out
  4. Make a visualization
  5. Think about how you’ll present to another group
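
As a rough sketch of steps 3 and 4, here is one way the first example question could look in code. Everything in it is made up for illustration: the toy data, the column names (year, faminc, weight), and the missing-value code 9999999 are assumptions, not real IPUMS-CPS values. With a real extract, check the IPUMS-CPS documentation for the actual variable names, weights, and missing value codes.

```r
library(dplyr)
library(ggplot2)

# Toy data invented for illustration; a real extract would come from IPUMS-CPS.
# Column names here are hypothetical, not actual IPUMS-CPS variable names.
toy <- tibble(
  year   = c(1962, 1962, 1962, 1962, 2022, 2022, 2022, 2022),
  faminc = c(12000, 25000, 45000, 9999999, 20000, 60000, 85000, 9999999),
  weight = c(1, 1, 2, 1, 1, 1, 2, 1)
)

# Hypothetical missing-value code; look up the real one in the documentation.
missing_code <- 9999999

prepared <- toy %>%
  # Drop rows where income holds the missing-value code
  filter(faminc != missing_code) %>%
  # Within each year, compute the weighted proportion with income below $30,000
  group_by(year) %>%
  summarize(
    prop_below_30k = weighted.mean(faminc < 30000, w = weight),
    .groups = "drop"
  )

# One bar per year showing the proportion below $30,000
fig <- ggplot(prepared, aes(x = factor(year), y = prop_below_30k)) +
  geom_col() +
  labs(
    x = "Year",
    y = "Proportion of families with income below $30,000",
    title = "Toy example: families below $30,000 by year"
  )

print(fig)
```

The filter() step comes before summarize() on purpose: if the missing-value code stayed in the data, it would be counted as a very large income and would distort the proportion.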

We are all excited to learn from each other’s findings!

What should we produce?

You should produce a PDF via RMarkdown. One person from each group will upload this PDF to the IPUMS Group Investigation discussion board on Canvas.

Your PDF should contain

  • Your names
  • An informative title
  • Clear and readable code
  • A figure you have produced
  • A few sentences explaining and interpreting the figure.

We will present the findings in pairs of groups. Your group will go to the Canvas discussion board and open your post. Then you will present:

  • Findings: Present the figure and interpretation
  • Code walk-through: Walk through each line of code, saying in English what it does

Your paired group will give feedback, and will also learn from you!

FAQs: What is up with gender, sex, and race?

IPUMS-CPS contains sex, coded Male and Female. But people using these data often write about gender gaps between “men” and “women.” This might seem confusing, or even hurtful. The categorization of race might also seem concerning. Let us explain a bit.

Why doesn’t the CPS keep up with the times?

As a long-running study, the CPS seeks to ask questions the same way over time to allow comparisons across years. But as social science understanding of sex and gender grows, we might realize the variables in the data do not match the constructs we want to study.

What is gender?

Gender is a socially constructed categorization that refers to the social, psychological, cultural, and behavioral aspects of a gender identity. This includes expected norms, roles, and activities. Gender is distinct from sex, is not binary, and varies from society to society (World Health Organization, Canadian Institutes of Health Research). Some common gender categories are woman, man, non-binary, and genderqueer.

What is sex?

Sex is a biological categorization that is assigned at birth based on anatomy, chromosomes, and/or hormones. It is primarily associated with physical and physiological features of humans and animals (Canadian Institutes of Health Research). Sex categories are typically female, male, and intersex, but there is variation in the biological attributes that comprise sex, and they can change with or without medical intervention.

If I study sex, how should I describe my results?

If you use the variable ‘sex’ in your research, avoid the terms “women” and “men” when describing your observations. Instead, use the appropriate labels: female, male, and intersex people.

How could we be more inclusive?

Throughout our research, and when writing about our results, it is important to be attentive to the differences between these two categories and to be critical of our quantitative variables. Reflect on the implications of describing sex as a binary variable. Which parts of the population are not represented in the data? (Read more in Lindqvist, What is gender, anyway). Lastly, IPUMS-CPS, like many other datasets, does not hold information about gender. How might these datasets improve in the future?

How could I explain this to someone else?

Whenever you or a friend have doubts about the difference between gender and sex, or you don’t know how to explain it to someone else, ask The Genderbread Person!
Infographic explaining the difference between gender identity, gender expression, biological sex, sexual attraction, and romantic attraction.

Doesn’t race have a similar problem?

Yep. Race is also a social construct with definitions that vary across societies, over time, and across interactions. The categories available to respondents have changed over time in the Census and the CPS. And race is multifaceted, so that any categorization might miss important aspects. We must be mindful that the way we do science could reinforce this construct. Despite concerns, disparities across categories like sex and race are important. It is worthwhile to use available data to study these disparities, while recognizing the limits of the measured data.

Video intro to this exercise