A collection of resources and readings for people wanting to get acquainted with computational social science.
Credit goes to Andrew Hall for suggesting at a Data Science Nights@Northwestern event that we collect and share resources that we found useful or interesting.
Contributions and suggestions welcome.
- Perspectives on Computational Analysis Syllabus
- Computational Social Science, syllabus by Nir Grinberg, Ben-Gurion University
- Very high-level view of what makes up "data science": Curriculum Guidelines for Undergraduate Programs in Data Science
-
Learn R in R: the swirl package.
-
Fast lane to learning R by Norman Matloff (professor of Computer Science at UC Davis). The course is quite thorough regarding base R, including graphics (ggplot2 is covered as well). Matloff is a proponent of learning base R before third-party packages, and I tend to agree.
This site is for those who know nothing of R, or maybe even nothing of programming, and seek QUICK, painless entree to the world of R.
-
R for Data Science by Garrett Grolemund and Hadley Wickham. The authors are important originators of and contributors to the so-called "tidyverse", a collection of packages for R. These packages tend to make things easier (especially for automated workflows). However, starting out with the "tidyverse" when learning R is, in my opinion, a bit like learning to run before learning to walk.
-
Starting from zero, Data Carpentry workshop. These resources are intended for in-person workshops but can be used by self-learners.
This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about R syntax, the RStudio interface, and move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.
-
Data Science Course in a Box (Course materials) by Mine Cetinkaya-Rundel for RStudio. Primarily intended for teachers but might be valuable for self-learners too. Self-presentation:
Data Science in a Box contains the materials required to teach (or learn from) an introductory data science course using R, all of which are freely-available and open-source. They include course materials such as slide decks, homework assignments, guided labs, sample exams, a final project assignment, as well as materials for instructors such as pedagogical tips, information on computing infrastructure, technology stack, and course logistics.
See datasciencebox.org for everything you need to know about the project!
-
R for Stata users, for people coming from Stata and wanting to learn R. An earlier draft is available for free. This book is structured somewhat similarly to the O'Reilly Cookbooks, i.e. it is a laundry list of problems or situations for which solutions are given in both Stata and R. If your particular problem is among those covered, great! If not, you won't get around learning the basics of R and translating Stata logic into R logic yourself.
-
Chromebook Data Science project
Chromebook Data Science (CBDS) is an online educational program to help anyone who can read, write, and use a computer to move into data science. It is offered by faculty members in the Johns Hopkins Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health. There are currently 12 courses that are offered in the Chromebook Data Science Curriculum.
-
UK Data Service Data Skills Modules
These introductory level interactive modules are designed for users who want to get to grips with key aspects of survey, longitudinal and aggregate data.
-
The BBC's visual and data journalism cookbook for R, and the blog post announcing and explaining its launch
-
Tutorials on the scientific Python ecosystem: a quick introduction to central tools and techniques. The different chapters each correspond to a 1 to 2 hours course with increasing level of expertise, from beginner to expert.
"How to name files," Jenny Bryan's speaker deck
Version control in R:
"starting R markdown," a YouTube tutorial playlist by Danielle Navarro
Matthew Salganik, Bit by Bit (free version)
Matthew Salganik, Bit by Bit (tree version)
Bernard E. Harcourt, Against Prediction (tree version) Summary: Against Prediction argues that predictive policing models not “crime” but “arrests”, i.e. it models how police react to an unobserved (for the model) behavior instead of modeling the unobserved behavior directly. In other words, it does not model what crimes will happen where, but who will be arrested. Therefore, it will reinforce existing trends in policing instead of “improving” policing.
Bernard E. Harcourt, Against Prediction (working paper)
Bernard E. Harcourt, Against Prediction (review by Cosma Shalizi)
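The feedback loop described in the summary above can be illustrated with a toy model. This is a minimal sketch, not taken from the book: the two districts, their rates, and the reallocation rule are all hypothetical assumptions. Both districts have the same underlying offense rate, but arrests are only recorded where patrols are present, and patrols are then sent wherever arrests were recorded.

```python
# Toy model (all names and numbers are hypothetical assumptions).
# Both districts have the same underlying offense rate.
true_offense_rate = {"north": 0.10, "south": 0.10}

# Historical patrol allocation is unequal (shares sum to 1).
patrol_share = {"north": 0.70, "south": 0.30}


def expected_arrests(patrols, offense_rate):
    """An arrest needs both an offense and a patrol present to record it."""
    return {d: patrols[d] * offense_rate[d] for d in patrols}


def reallocate(arrests):
    """'Predictive' step: assign patrols in proportion to recorded arrests."""
    total = sum(arrests.values())
    return {d: a / total for d, a in arrests.items()}


history = [patrol_share]
for _ in range(5):
    arrests = expected_arrests(history[-1], true_offense_rate)
    history.append(reallocate(arrests))

final = history[-1]
print(final)
```

In this sketch the allocation never moves toward an even split, even though the districts are identical in the behavior the model is supposed to predict. The arrest data only reflect where police already were.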
Thoughts on algorithmic fairness
Algorithmic fairness is an interdisciplinary research field concerned with the various ways that algorithms may perpetuate or reinforce unfair legacies of our history, and how we might modify the algorithms or the systems they are used in to prevent this. For example, if the training data used in a machine learning method contains patterns caused by things like racism, sexism, ableism, or other types of injustice, then the model may learn those patterns and use them to make predictions and decisions that are unfair. There are many ways that technology can have unintended consequences, and this is just one of them.
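To make the example above concrete, here is a minimal sketch using made-up hiring records. The data, the group labels, and the deliberately simple "majority outcome" model are all assumptions for illustration. Both groups are equally qualified, but group "B" was hired less often in the past, and the model reproduces that pattern. The gap in selection rates between groups is one common fairness measure (often called demographic parity).

```python
from collections import Counter, defaultdict

# Hypothetical historical decisions: (group, qualified, hired).
# Both groups are equally qualified, but group "B" was hired less often.
history = (
    [("A", True, True)] * 40 + [("A", True, False)] * 10
    + [("A", False, False)] * 50
    + [("B", True, True)] * 20 + [("B", True, False)] * 30
    + [("B", False, False)] * 50
)


def fit_majority_model(rows):
    """Learn the most common past outcome for each (group, qualified) cell."""
    counts = defaultdict(Counter)
    for group, qualified, hired in rows:
        counts[(group, qualified)][hired] += 1
    return {cell: c.most_common(1)[0][0] for cell, c in counts.items()}


def selection_rate(model, rows, group):
    """Share of people in `group` whom the model would select."""
    members = [(g, q) for g, q, _ in rows if g == group]
    return sum(model[cell] for cell in members) / len(members)


model = fit_majority_model(history)
rate_a = selection_rate(model, history, "A")
rate_b = selection_rate(model, history, "B")
parity_gap = rate_a - rate_b
print(rate_a, rate_b, parity_gap)
```

The model was never told to treat the groups differently. It only learned from past decisions, which is how historical injustice can end up in predictions.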
Claus Wilke, Fundamentals of Data Visualization
Claus Wilke, Fundamentals of Data Visualization (R markdown source)
Kieran J. Healy, Data Visualization - A practical introduction
Garrett Grolemund, Hadley Wickham, R for Data Science
Quartz guide to real world data munging problems
Data cleaning in R needn't be hard - presentation materials by Crystal Lewis
Fairness definitions and their politics: presentation by Arvind Narayanan, video, presentation by Arvind Narayanan, text, article by Verma and Rubin, pdf
Notes from @DynamicWebPaige on Google's ML Fairness course
Inter-university Consortium for Political and Social Research: a data repository for mostly survey data. North America-centric.
Cross-national equivalent file The Cross-National Equivalent File (CNEF) project harmonizes a subset of the data found on seven panel data sets collected in Australia, Canada, China, Germany, Korea, Russia, Switzerland, UK, and US.
Urban Institute Data Catalogue, UIDC announcement and short presentation
Google tool for finding datasets
Cook County Open Data - State's Attorney (e.g. arrest data)
The Wesleyan Media Project tracks and analyzes all broadcast advertisements aired by or on behalf of federal and state election candidates in every media market in the country.
The Stanford Open Policing Project
Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country.
The @unitedstates project Scrapers and parsers for many aspects regarding Congress, e.g. bios of members past and present, data about bills and roll call votes, district shapefiles, and much more.
This tool converts HTML files containing the text of the Congressional Record into structured text data. It is particularly useful for identifying speeches by members of Congress.
List of sociologists on Twitter, by Philip N. Cohen
List of demographers on Twitter, by Conrad Hackett
List of demographers on Twitter, by Cameron Campbell
#rladies
#rstats
The Data Science job market is saturated
Cheatsheet - Neural networks/machine learning
R for the Rest of Us, Resources
R resources collection, NU Research Computing Services
Python resources collection, NU Research Computing Services
Following recent events at DataCamp (see here, here, here, here), this guide prefers to recommend other resources. The course offering is, however, comprehensive, and university students may benefit from special offers.