Homework 1

General assignment information.{% if id == "columbia" %} Note that this isn't a template notebook, hence there's no 🚀 above. You will create a blank notebook for this one.{% endif %}

Tutorials

Coding

You'll complete this assignment using pandas. Steps:

  1. Find a dataset.
    • It must have:
      • At least one numeric column
      • Between one thousand and one million rows
    • Don't spend too long on this step.
  2. If there's more than one numeric column, pick one.
  3. Create a new notebook.
  4. Read in the data.
  5. Compute:
    • The mean
    • The median
    • The mode
  6. Do a groupby() with an aggregation.
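
As a rough illustration of steps 4 through 6, here is a minimal sketch. The column names (`borough`, `complaint_count`) and the six-row inline CSV are hypothetical stand-ins so the example runs on its own. Your real dataset will be much larger, and you would pass its file path or URL to `pd.read_csv()` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in data; replace with pd.read_csv("your_file.csv").
csv_text = """borough,complaint_count
Bronx,12
Bronx,15
Brooklyn,12
Brooklyn,30
Queens,7
Queens,12
"""

# Step 4: read in the data.
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: pick one numeric column to work with.
column = df["complaint_count"]

# Step 5: compute the mean, median, and mode.
mean_value = column.mean()
median_value = column.median()
# mode() returns a Series because several values can tie for most frequent.
mode_values = column.mode()

# Step 6: group by a categorical column and aggregate the numeric one.
grouped_means = df.groupby("borough")["complaint_count"].mean()

print("mean:", mean_value)
print("median:", median_value)
print("mode(s):", mode_values.tolist())
print(grouped_means)
```

Any aggregation works for step 6; `.mean()` is used here only as an example, and `.sum()`, `.count()`, or `.max()` would satisfy the step equally well.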

Now turn in the assignment.

Tutorials, continued

  1. Read The Joys (and Woes) of the Craft of Software Engineering
    • Note that not everything in it applies to data analysis.
  2. Filtering/indexing DataFrames
  3. Learn about functions
  4. Coding Style Guides - Please skim these; I don't expect you to understand and follow everything in them. The most important guidelines to pay attention to are indentation and keeping each statement on its own line.
  5. Guide to commenting your code
  6. Quartz Guide to Bad Data
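
To preview items 2 and 3 before working through the tutorials, the sketch below combines boolean filtering of a DataFrame with a small function. The data, the column names, and the helper `filter_above` are all made up for illustration; they are not part of the tutorials themselves.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {
        "city": ["A", "B", "C", "D"],
        "temp_f": [65, 72, 80, 70],
    }
)


def filter_above(frame, column, threshold):
    """Return only the rows where `column` is strictly greater than `threshold`."""
    # frame[column] > threshold builds a boolean Series (a "mask");
    # indexing the DataFrame with that mask keeps the rows marked True.
    mask = frame[column] > threshold
    return frame[mask]


warm = filter_above(df, "temp_f", 70)
print(warm)
```

Wrapping the filter in a function means the same logic can be reused with a different column or threshold without copying and editing the code.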

Optional

Participation

Reminder about the between-class participation requirement.