Applied Data Science Training
In the SPEC Lab we invest a lot of energy training students in applied data science. In particular, we focus on data management and data visualization using R, an open-source language and environment for statistical computing.
data management I
Exploring and Manipulating Data
This module introduces the tidyverse, a collection of R packages, and covers how to subset data, create new variables, group and arrange observations, and summarize information. Functions covered include select(), filter(), mutate(), summarise(), group_by(), and arrange().
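As a rough illustration of how these functions fit together, here is a minimal sketch using dplyr, the tidyverse package that provides them. The `countries` data frame and all of its values are invented for this example and do not come from the module materials.

```r
library(dplyr)

# Hypothetical toy data: one row per country-year (all values are invented).
countries <- tibble(
  country = c("A", "A", "B", "B", "C", "C"),
  region  = c("North", "North", "North", "North", "South", "South"),
  year    = c(2000, 2001, 2000, 2001, 2000, 2001),
  gdp     = c(100, 110, 200, 190, 50, 65),
  pop     = c(10, 10, 20, 20, 5, 5)
)

region_summary <- countries %>%
  select(country, region, year, gdp, pop) %>%  # keep only the listed columns
  filter(year == 2001) %>%                     # subset rows to a single year
  mutate(gdp_pc = gdp / pop) %>%               # create a new variable
  group_by(region) %>%                         # group observations by region
  summarise(
    mean_gdp_pc = mean(gdp_pc),                # one summary row per group
    n_countries = n()
  ) %>%
  arrange(desc(mean_gdp_pc))                   # sort from highest to lowest

print(region_summary)
```

Each step takes a data frame and returns a data frame, which is why the calls chain naturally with the pipe operator.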
organization for collaboration
These trainings are complementary to coursework in statistics and econometrics. While statistics and econometrics courses generally focus on theory and mathematics, we teach the nuts and bolts of statistical computing. We developed these trainings in collaboration with our Pipeline Partners to prepare students not only for academic social science research, but also for data science careers in government, nonprofits, and the private sector.
One of our favorite free resources for learning econometrics is the Econometrics Academy, founded by Dr. Ani Katchova. Gary King has also posted his introductory quantitative methods course (targeted at first-year graduate students). For R resources complementary to these materials, we are big fans of this online textbook and this list of resources by topic. We also like Andrew Heiss's data visualization course.
The modules below are designed to be completed in order, but some skipping around is definitely possible. Each module contains:
1). A module guide.
2). Lecture videos (on YouTube).
3). A walkthrough exercise, which can be completed with an instructor or alone.
4). A group-work exercise, which is designed to be completed in small groups but can be completed alone.
5). A homework assignment designed to allow students to demonstrate individual mastery.
We also post answer keys for the group-work and homework questions, and the R scripts from the lecture videos.
These materials are a constant work in progress, and we welcome feedback at firstname.lastname@example.org. Funding for the creation of these materials was provided by the National Science Foundation, the Dornsife College of Letters, Arts, and Sciences at the University of Southern California, and individual SPEC Lab donors. To support our continued work to improve and expand these materials, please donate here.
data management IIA
This module covers how to use the append_ids function, created by the SPEC Lab, to append Gleditsch-Ward country ID numbers to datasets on the basis of country names. It is narrowly geared toward managing the country-year datasets common in the quantitative study of comparative politics and international relations, and may not be useful to all students.
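The sketch below illustrates the general idea of attaching ID numbers by matching on country names. It does not use append_ids, whose interface is not described here; instead it substitutes a plain dplyr left_join() against a small hypothetical lookup table. The `gw_lookup` and `trade` data frames, the `gwno` column name, and the export values are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical lookup table with three Gleditsch-Ward numbers. A real lookup
# would cover every country and handle alternative spellings.
gw_lookup <- tibble(
  country = c("United States", "Canada", "Mexico"),
  gwno    = c(2, 20, 70)
)

# Hypothetical country-year data (export values are invented).
trade <- tibble(
  country = c("Canada", "Canada", "Mexico", "United States", "Atlantis"),
  year    = c(2000, 2001, 2000, 2000, 2000),
  exports = c(5, 6, 3, 9, 1)
)

# Attach IDs by exact name match; names with no match get NA.
trade_ids <- trade %>%
  left_join(gw_lookup, by = "country")

# List the country names that failed to match so they can be checked by hand.
unmatched <- trade_ids %>%
  filter(is.na(gwno)) %>%
  distinct(country)

print(trade_ids)
print(unmatched)
```

Exact matching breaks as soon as sources spell a country differently ("USA" versus "United States"), which is the kind of problem a purpose-built function for country names is meant to handle.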