jasonlong/benfords-law

Testing Benford's Law

This is a simple experiment by Jason Long and Bryce Thornton to test how many real-life, publicly available datasets satisfy Benford's Law.
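For reference, Benford's Law predicts that a leading digit d (1 through 9) appears with probability log10(1 + 1/d). That works out to about 30.1% of values starting with 1 and only about 4.6% starting with 9. The short Ruby sketch below tabulates those expected percentages. The constant name `BENFORD_EXPECTED` is our own for illustration and is not part of this project. It requires Ruby 2.6 or later.

```ruby
# Expected share (in percent) of each leading digit under Benford's Law:
# P(d) = log10(1 + 1/d) for d = 1..9, rounded to two decimals.
BENFORD_EXPECTED = (1..9).to_h { |d| [d, (Math.log10(1 + 1.0 / d) * 100).round(2)] }

# Print the expected distribution, one digit per line
BENFORD_EXPECTED.each { |digit, pct| puts format("%d: %5.2f%%", digit, pct) }
```

These expected figures give a baseline for reading the "values" percentages stored in each dataset file.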

Contributing Datasets

If you find this idea interesting, we'd encourage you to help add more datasets to the site. We've intentionally kept the site as simple and lightweight as possible: there is no real backend. The data is crunched in advance, and the results are simply stored in JSON files.

To contribute a new dataset, you'll need to do two things:

Add the dataset name to the JSON index file

The format of js/datasets/index.json is simply a set of key/value pairs. Each key identifies a dataset, and each value is that dataset's human-readable name:

{
  "twitter-users-by-followers-count": "Twitter users by followers count",
  "distance-of-stars-from-earth-in-light-years": "Distance of stars from Earth in light years",
  "loan-amounts-on-kiva-org": "Loan amounts on kiva.org",
  "total-number-of-print-materials-in-us-libraries": "Total number of print materials in US libraries",
  "population-of-spanish-cities": "Population of Spanish cities"
}
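The existing keys appear to be lowercase, hyphen-separated versions of the display names. If you prefer to script this step, here is a minimal Ruby sketch built on that assumption. `dataset_key` and `add_to_index` are hypothetical helpers and are not part of the repository. Editing the file by hand works just as well.

```ruby
require 'json'

# Hypothetical helper: derive an index key from a display name by lowercasing
# it and collapsing every run of non-alphanumeric characters into a hyphen.
def dataset_key(display_name)
  display_name.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
end

# Hypothetical helper: add (or update) one entry in index.json, keeping the
# existing entries and their order, and return the key that was used.
def add_to_index(index_path, display_name)
  index = File.exist?(index_path) ? JSON.parse(File.read(index_path)) : {}
  key = dataset_key(display_name)
  index[key] = display_name
  File.write(index_path, JSON.pretty_generate(index) + "\n")
  key
end

# Example usage (modifies the real index file):
# add_to_index('js/datasets/index.json', 'Population of Spanish cities')
```

The returned key is the name your dataset file needs to match.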

Create a dataset JSON file

Add your new file in the /js/datasets/ directory with a name that matches the key used in step 1. The format looks like this:

{
  "values": {
    "1": 32.62,
    "2": 16.66,
    "3": 11.80,
    "4": 9.26,
    "5": 7.63,
    "6": 6.55,
    "7": 5.76,
    "8": 5.14,
    "9": 4.56
  },
  "num_records": "38,670,514",
  "min_value": "1",
  "max_value": "4,706,631",
  "source": "http://www.infochimps.com/datasets/twitter-census-twitter-users-by-friends-count"
}

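Each entry under "values" is the percentage of records whose leading digit is that number, so the nine entries should add up to roughly 100. The "num_records", "min_value", and "max_value" fields describe the size and range of the dataset and are stored as formatted strings. It's important to include the source of the data so that others can verify and reproduce the results.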
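If you tally the leading digits yourself, a small Ruby sketch like the one below can assemble the results into this structure. It assumes you already have a count of records per leading digit plus the minimum and maximum values. `with_commas` and `dataset_hash` are hypothetical names, and the comma formatting only mirrors the example file above. It requires Ruby 2.6 or later.

```ruby
require 'json'

# Format a number with thousands separators, mirroring strings such as
# "38,670,514" in the example file. Whole-number floats are shown without ".0".
def with_commas(number)
  number = number.to_i if number.is_a?(Float) && number.finite? && number == number.floor
  int, frac = number.to_s.split('.')
  int.gsub(/(\d)(?=(\d{3})+\z)/, '\1,') + (frac ? ".#{frac}" : '')
end

# Build the dataset-file structure from per-digit counts
# (digit 1..9 => number of records whose first significant digit is that digit).
def dataset_hash(digit_counts, min_value, max_value, source)
  total = digit_counts.values.sum
  raise ArgumentError, 'no records counted' if total.zero?

  {
    'values' => (1..9).to_h { |d| [d.to_s, (100.0 * digit_counts.fetch(d, 0) / total).round(2)] },
    'num_records' => with_commas(total),
    'min_value' => with_commas(min_value),
    'max_value' => with_commas(max_value),
    'source' => source
  }
end

# Example usage (the file name and source URL are placeholders):
# data = dataset_hash(counts, 1, 4_706_631, 'https://example.com/your-source')
# File.write('js/datasets/your-dataset-key.json', JSON.pretty_generate(data) + "\n")
```

Percentages are rounded to two decimals, as in the example file, so their sum may differ from 100 by a few hundredths.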

Crunching the data

Generating Benford stats is a fairly straightforward process. We've made a simple Ruby class you can use if you'd like.

First, grab a copy of the class from here: https://gist.github.com/1044174

Second, include the class in your script like so:

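require 'csv'
# Load the BenfordCounter class from the gist, saved next to this script
require_relative 'benford_counter'

counter = BenfordCounter.new

# Pass each value you want to test to the counter; in this example
# the values are in the tenth column (index 9) of spain.txt
CSV.foreach("spain.txt") do |row|
  counter.count(row[9])
end

# Print the tallied results
p counter.results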
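If you'd rather not depend on the gist, or want to see what such a counter does internally, here is a minimal stand-in written for illustration. It was not taken from the gist. It keeps the same `count`/`results` usage as the script above, but the class name `SimpleBenfordCounter` and its output (a percentage for each leading digit) are our own choices and may not match what the gist returns. It requires Ruby 2.6 or later.

```ruby
# Hypothetical stand-in for BenfordCounter: tallies the first significant
# digit of each value and reports the percentage of values per digit.
class SimpleBenfordCounter
  def initialize
    @tally = Hash.new(0)
  end

  # Accepts numbers or numeric strings such as "4,706,631". Blank,
  # non-numeric, and zero values are skipped because they have no leading digit.
  def count(value)
    number = Float(value.to_s.delete(','), exception: false)
    return if number.nil? || !number.finite? || number.zero?

    # First non-zero digit of the absolute value, e.g. 0.0045 -> 4
    digit = number.abs.to_s[/[1-9]/].to_i
    @tally[digit] += 1
  end

  # Percentage of counted values starting with each digit 1..9
  def results
    total = @tally.values.sum
    (1..9).to_h { |d| [d, total.zero? ? 0.0 : (100.0 * @tally[d] / total).round(2)] }
  end
end

counter = SimpleBenfordCounter.new
%w[1 15 2,300 0 abc 9.5].each { |v| counter.count(v) }
p counter.results
```

In this example "0" and "abc" are ignored, so the remaining four values split 50% for digit 1 and 25% each for digits 2 and 9.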

Additional Tools

fweez contributed the Linux filesize dataset and created a Python script for tallying filesizes in a directory.
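We don't reproduce fweez's script here, but the idea is easy to sketch. The Ruby version below is a hypothetical equivalent and is not the contributed Python script. It walks a directory tree with Ruby's standard `Find` module and tallies the leading digit of each non-empty file's size in bytes. The function name `file_size_digit_counts` is our own.

```ruby
require 'find'

# Hypothetical sketch: walk a directory tree and count how many regular,
# non-empty files have a size (in bytes) starting with each digit 1..9.
def file_size_digit_counts(root)
  counts = Hash.new(0)
  Find.find(root) do |path|
    next unless File.file?(path)
    size = File.size(path)
    next if size.zero? # empty files have no leading digit
    counts[size.to_s[0].to_i] += 1
  end
  counts
end

if __FILE__ == $PROGRAM_NAME
  # Usage: ruby file_sizes.rb [directory]   (defaults to the current directory)
  counts = file_size_digit_counts(ARGV.fetch(0, '.'))
  total = counts.values.sum
  (1..9).each do |d|
    pct = total.zero? ? 0.0 : 100.0 * counts[d] / total
    puts format('%d: %6.2f%%', d, pct)
  end
end
```

The counts it returns can be converted into the percentages that a dataset file expects.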

Updating JavaScript and CSS

We're using CoffeeScript for the JavaScript and Sass/Compass for the CSS.

Once CoffeeScript is installed (see the CoffeeScript docs), run this command from the project root to watch for changes and compile them automatically:

coffee --watch -o js/ --compile js/coffee/*.coffee

Note that the only file you should edit is /js/coffee/app.coffee. The compiled /js/app.js file is generated by CoffeeScript, so any changes made to it directly will be overwritten.

To make changes to the CSS, install Sass and Compass (see the Compass docs), then edit /css/sass/screen.scss. To watch for changes and recompile automatically, run this command from the project root:

compass watch

To compile a production-ready compressed version:

compass compile --output-style compressed --force