data-512-a1
│   README.md
│   LICENSE
│   .gitignore
│   Page_views_on_english_wikipedia.png
│   hcds-a1-data-curation.ipynb
│
├───data_clean
│   │   en-wikipedia_traffic_firstyearmonth-lastyearmonth.csv
│
└───data_raw
    │   apiname_accesstype_firstmonth-lastmonth.json
    │   ...
The goal is to acquire, process, and analyze a dataset of monthly traffic on English Wikipedia from January 1st, 2008 through September 1st, 2021.
The project pulls data from Wikimedia's API. We leverage two endpoints:
- Legacy Pagecounts API (documentation, endpoint)
- Pageview API (documentation, endpoint)
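
As a rough sketch of how the two endpoints might be addressed, the helpers below only build request URLs and make no network call. The URL templates follow the public Wikimedia REST API layout and should be checked against the linked documentation. The function names, the default project `en.wikipedia.org`, the `monthly` granularity, and the `user` agent filter are assumptions for illustration, not taken from the notebook.

```python
# Base path of the Wikimedia REST API metrics endpoints (assumed; verify in the docs).
BASE = "https://wikimedia.org/api/rest_v1/metrics"


def pagecounts_url(access_site, start, end,
                   project="en.wikipedia.org", granularity="monthly"):
    """Build a Legacy Pagecounts request URL.

    access_site: e.g. 'desktop-site', 'mobile-site' or 'all-sites'.
    start, end:  timestamps in YYYYMMDDHH form.
    """
    return (f"{BASE}/legacy/pagecounts/aggregate/"
            f"{project}/{access_site}/{granularity}/{start}/{end}")


def pageviews_url(access, start, end, agent="user",
                  project="en.wikipedia.org", granularity="monthly"):
    """Build a Pageview request URL.

    access: e.g. 'desktop', 'mobile-app', 'mobile-web' or 'all-access'.
    agent:  'user' filters out traffic identified as spiders/crawlers.
    """
    return (f"{BASE}/pageviews/aggregate/"
            f"{project}/{access}/{agent}/{granularity}/{start}/{end}")


if __name__ == "__main__":
    print(pagecounts_url("desktop-site", "2008010100", "2021090100"))
    print(pageviews_url("desktop", "2008010100", "2021090100"))
```

The resulting URLs can be passed to any HTTP client; the API documentation asks callers to identify themselves with a descriptive `User-Agent` header.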
Column | Value |
---|---|
year | YYYY |
month | MM |
pagecount_all_views | num_views |
pagecount_desktop_views | num_views |
pagecount_mobile_views | num_views |
pageview_all_views | num_views |
pageview_desktop_views | num_views |
pageview_mobile_views | num_views |
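
A minimal sketch of how per-month counts could be assembled into this schema. The helper names, the `YYYYMM` keys, the choice to fill missing months with `0`, and the sample numbers are all assumptions made for illustration; the notebook may handle these differently.

```python
import csv
import io

# Column order of the cleaned CSV, as listed in the table above.
COLUMNS = [
    "year", "month",
    "pagecount_all_views", "pagecount_desktop_views", "pagecount_mobile_views",
    "pageview_all_views", "pageview_desktop_views", "pageview_mobile_views",
]


def build_rows(series):
    """Merge {column_name: {'YYYYMM': views}} into one row per month.

    Months with no value for a column get 0 (an assumption of this sketch).
    """
    months = sorted({month for counts in series.values() for month in counts})
    rows = []
    for month in months:
        row = {"year": month[:4], "month": month[4:6]}
        for column in COLUMNS[2:]:
            row[column] = series.get(column, {}).get(month, 0)
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with the header in COLUMNS order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up numbers, used only to show the shape of the output.
sample = {
    "pagecount_all_views": {"200801": 100, "200802": 120},
    "pagecount_desktop_views": {"200801": 100, "200802": 120},
}
rows = build_rows(sample)
print(to_csv(rows))
```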
The .JSON naming convention used is apiname_accesstype_firstmonth-lastmonth.json
The .CSV naming convention used is en-wikipedia_traffic_firstyearmonth-lastyearmonth.csv
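
The two naming conventions can be sketched as small helpers. The function names are hypothetical, and the `YYYYMM` form of the month arguments and the example values are assumptions rather than something stated in this README.

```python
def raw_filename(apiname, accesstype, firstmonth, lastmonth):
    """File name for one raw API response, e.g. stored under data_raw/."""
    return f"{apiname}_{accesstype}_{firstmonth}-{lastmonth}.json"


def clean_filename(firstyearmonth, lastyearmonth):
    """File name for the combined CSV, e.g. stored under data_clean/."""
    return f"en-wikipedia_traffic_{firstyearmonth}-{lastyearmonth}.csv"


# Illustrative values only.
print(raw_filename("pagecounts", "desktop-site", "200801", "201607"))
print(clean_filename("200801", "202108"))
```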
Data from the Pageview API excludes spider and crawler traffic and counts only user-agent views; data from the Pagecounts API does not exclude it. No mobile data is available before October 2014; data before that date includes only desktop and main-site views.