A guide to computational social science resources

A collection of resources and readings for people wanting to get acquainted with computational social science.

Credit goes to Andrew Hall for suggesting at a Data Science Nights@Northwestern event that we collect and share resources that we found useful or interesting.

Contributions and suggestions welcome.

Syllabi

Perspectives on Computational Analysis Syllabus
Computational Social Science, syllabus by Nir Grinberg, Ben-Gurion University
Very high-level view of what makes up "data science:" Curriculum Guidelines for Undergraduate Programs in Data Science

Training

Learn R in R: the swirl package.
Fast lane to learning R by Norman Matloff (professor of Computer Science at UC Davis). The course is quite thorough regarding base R, including graphics (ggplot2 is covered as well). NM is a proponent of learning base R first before learning third-party packages and I tend to agree.

This site is for those who know nothing of R, or maybe even nothing of programming, and seek QUICK, painless entree to the world of R.
R for Data Science by Garrett Grolemund and Hadley Wickham. The authors are important originators of/contributors to the so-called "tidyverse", a collection of packages for R. These packages tend to make things easier (especially for automated workflows). However, starting out with the "tidyverse" when learning R is, in my opinion, a bit like learning to run before learning to walk.
Starting from zero, Data Carpentry workshop. These resources are intended for in-person workshops but can be used by self-learners.

This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about R syntax, the RStudio interface, and move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.
Data Science Course in a Box (Course materials) by Mine Cetinkaya-Rundel for RStudio. Primarily intended for teachers but might be valuable for self-learners too. Self-presentation:

Data Science in a Box contains the materials required to teach (or learn from) an introductory data science course using R, all of which are freely-available and open-source. They include course materials such as slide decks, homework assignments, guided labs, sample exams, a final project assignment, as well as materials for instructors such as pedagogical tips, information on computing infrastructure, technology stack, and course logistics.

See datasciencebox.org for everything you need to know about the project!
R for Stata users, for people coming from Stata and wanting to learn R. An earlier draft is available for free. This book is structured somewhat similarly to the O'Reilly Cookbooks, i.e. it is a laundry list of problems or situations for which solutions are given in both Stata and R. If your particular problem is among those covered, great! If not, you won't get around learning the basics of R and translating Stata logic into R logic yourself.
Chromebook Data Science project

Chromebook Data Science (CBDS) is an online educational program to help anyone who can read, write, and use a computer to move into data science. It is offered by faculty members in the Johns Hopkins Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health. There are currently 12 courses that are offered in the Chromebook Data Science Curriculum.
UK Data Service Data Skills Modules

These introductory level interactive modules are designed for users who want to get to grips with key aspects of survey, longitudinal and aggregate data.
The BBC's visual and data journalism cookbook for R, Blog post announcing and explaining the launch of the BBC's visual and data journalism cookbook in R
SciPy Lecture Notes

Tutorials on the scientific Python ecosystem: a quick introduction to central tools and techniques. The different chapters each correspond to a 1 to 2 hours course with increasing level of expertise, from beginner to expert.

Readings

Practical advice

"How to name files," Jenny Bryan's speaker deck

Version control in R:

"Starting R Markdown," a YouTube tutorial playlist by Danielle Navarro

General

Matthew Salganik, Bit by Bit (free version)

Matthew Salganik, Bit by Bit (tree version)

Bernard E. Harcourt, Against Prediction (tree version) Summary: Against Prediction argues that predictive policing models not “crime” but “arrests”, i.e. it models how police react to an unobserved (for the model) behavior instead of modeling the unobserved behavior directly. In other words, it does not model what crimes will happen where, but who will be arrested. Therefore, it will reinforce existing trends in policing instead of “improving” policing.

Bernard E. Harcourt, Against Prediction (working paper)

Bernard E. Harcourt, Against Prediction (review by Cosma Shalizi)

Thoughts on algorithmic fairness

Algorithmic fairness is an interdisciplinary research field concerned with the various ways that algorithms may perpetuate or reinforce unfair legacies of our history, and how we might modify the algorithms or the systems they are used in to prevent this. For example, if the training data used in a machine learning method contains patterns caused by things like racism, sexism, ableism, or other types of injustice, then the model may learn those patterns and use them to make predictions and decisions that are unfair. There are many ways that technology can have unintended consequences, and this is just one of them.
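
One simple way to look for the kind of learned pattern described above is to compare how often a model makes a favorable decision for each group. The Python sketch below is only an illustration and is not taken from any of the resources listed here: the function names `selection_rates` and `parity_gap` and the toy data are hypothetical, and a gap in selection rates is just one of many possible fairness measures.

```python
from collections import defaultdict


def selection_rates(groups, decisions):
    """Return the share of favorable decisions (coded 1) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in zip(groups, decisions):
        totals[group] += 1
        positives[group] += decision
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(groups, decisions):
    """Difference between the highest and lowest group selection rates."""
    rates = selection_rates(groups, decisions)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: group membership and a model's 0/1 decisions.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
decisions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, decisions)
gap = parity_gap(groups, decisions)
print(rates, gap)
```

A large gap does not by itself show that a model is unfair, and a small gap does not show that it is fair; it is a starting point for asking where the difference comes from.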

Data sets and sources

Inter-university Consortium for Political and Social Research A data repository for mostly survey data. North America-centric.

Cross-national equivalent file The Cross-National Equivalent File (CNEF) project harmonizes a subset of the data found on seven panel data sets collected in Australia, Canada, China, Germany, Korea, Russia, Switzerland, UK, and US.

Urban Institute Data Catalogue, UIDC announcement and short presentation

Google tool for finding datasets

Cook County Open Data portal

Cook County Open Data - State Attorney (e.g. arrest data)

Wesleyan Media Project

The Wesleyan Media Project tracks and analyzes all broadcast advertisements aired by or on behalf of federal and state election candidates in every media market in the country.

The Stanford Open Policing Project

Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country.

The @unitedstates project Scrapers and parsers for many aspects regarding Congress, e.g. bios of members past and present, data about bills and roll call votes, district shapefiles, and much more.

Congressional record parser

This tool converts HTML files containing the text of the Congressional Record into structured text data. It is particularly useful for identifying speeches by members of Congress.

Pew Research survey data

People, groups, hashtags

List of sociologists on Twitter, by Philip N. Cohen

List of demographers on Twitter, by Conrad Hackett

List of demographers on Twitter, by Cameron Campbell

#rladies

#rstats

R Animated Gifs

The Data Science job market is saturated

Cheatsheet - Neural networks/machine learning

Similar collections

R for the Rest of Us, Resources

R resources collection, NU Research Computing Services

Python resources collection, NU Research Computing Services

OpenIntro, free textbooks

DataCamp

Following recent events at DataCamp (see here, here, here, here), this guide prefers to recommend other resources. The course offer is, however, comprehensive, and university students may benefit from special offers.
