Lecture notes for "Programming for Data Science", "Python for Data Science" and "Python for ML Engineering."

This repository contains lecture notes for classes taught by Shahbaz Chaudhary in the University of Chicago's Master's in Applied Data Science program.

Class setup

Please follow the instructions below to get your computer ready for this class.

Note for Mac users: once the software is downloaded, you may get permission errors if you double-click it to launch it. Instead, right-click the downloaded software, pick "Open" and continue. (macOS does this to protect you from accidentally running malware.)

Install Python (anaconda distribution)

Please install Python from this website: https://www.anaconda.com/download/ (modern computers are 64-bit, so please pick that option)

Mac users: Accept all default prompts

Windows users: Accept all default prompts
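Once Jupyter Notebook is running (see the steps below), you can confirm that you installed the 64-bit build of Python. This is a minimal sketch using only the standard library; the helper name `python_bitness` is our own and is not part of Anaconda.

```python
import platform
import struct
import sys


def python_bitness() -> int:
    """Return 64 for a 64-bit Python build, 32 for a 32-bit one."""
    # struct.calcsize("P") is the size in bytes of a C pointer in this interpreter.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python version:", platform.python_version())
    print("Python build:  ", python_bitness(), "bit")
    print("Machine type:  ", platform.machine())
    # On a 64-bit build, sys.maxsize is 2**63 - 1, so this prints True.
    print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```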

Anaconda's distribution of Python is widely used in the industry, particularly among data scientists. This distribution makes it easy to use many libraries and packages for data analysis, building models, visualization, etc.
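To see which of these libraries your installation already provides, you can ask Python to look them up without importing them. This is a sketch under stated assumptions: the list in `PACKAGES` is our guess at libraries relevant to this class (adjust it as needed), and `check_packages` is a hypothetical helper name. It relies on `importlib.util.find_spec` and `importlib.metadata.version` from the standard library (Python 3.8+).

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

# Assumed list: import name -> installed distribution name.
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "sklearn": "scikit-learn",
    "notebook": "notebook",
    "psutil": "psutil",
}


def check_packages(packages):
    """Map each import name to its installed version, or None if it is missing."""
    report = {}
    for import_name, dist_name in packages.items():
        # find_spec returns None when the module cannot be found on sys.path.
        if importlib.util.find_spec(import_name) is None:
            report[import_name] = None
            continue
        try:
            report[import_name] = version(dist_name)
        except PackageNotFoundError:
            # Importable, but no package metadata is registered under dist_name.
            report[import_name] = "unknown"
    return report


if __name__ == "__main__":
    for name, ver in check_packages(PACKAGES).items():
        print(f"{name:12} {ver or 'NOT INSTALLED'}")
```

A library reported as NOT INSTALLED can usually be added later with conda or pip; the check itself never imports the library, so it stays fast even for large packages.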

Once Anaconda is installed, start Jupyter Notebook and run the code below:

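  1. Start Anaconda Navigator and click Launch on the panel labeled Jupyter Notebook
  2. Create a new notebook from the web interface
  3. Run this code in a cell (it times how long Python takes to add up the first million integers):
```python
%%timeit
sum(range(1_000_000))  # the statement being timed
```
  4. Run this code in a new cell (it reports your computer's memory, disk size and number of CPUs):
```python
import os
from psutil import virtual_memory, disk_usage, cpu_count

bytes_in_gb = 1024**3  # bytes per gigabyte

print("Memory:\t", round(virtual_memory().total / bytes_in_gb, 4), "Gigabytes")
# os.path.abspath(os.sep) is the root of the current drive, e.g. "/" or "C:\"
print("Disk:\t", round(disk_usage(os.path.abspath(os.sep)).total / bytes_in_gb, 4), "Gigabytes")
print("CPUs:\t", cpu_count())
```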
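If you run this outside Anaconda and psutil is not installed, most of the report can be reproduced with the standard library alone. The sketch below is our own; `system_summary` and `time_sum` are hypothetical helper names. Total memory is left out because the standard library has no portable way to read it. Also note that `%%timeit` reports a mean and standard deviation over several runs, while this sketch returns the fastest run, which is the figure the `timeit` module documentation suggests focusing on.

```python
import os
import shutil
import timeit

BYTES_IN_GB = 1024**3


def system_summary():
    """Disk size (GB) and CPU count using only the standard library."""
    root = os.path.abspath(os.sep)  # "/" on Mac/Linux, the current drive on Windows
    return {
        "disk_gb": round(shutil.disk_usage(root).total / BYTES_IN_GB, 4),
        "cpus": os.cpu_count(),  # may be None if the count cannot be determined
    }


def time_sum(n=1_000_000, repeat=5, number=10):
    """Fastest average time in seconds for one call of sum(range(n))."""
    # timeit.repeat runs the statement `number` times per trial, for `repeat` trials.
    timings = timeit.repeat(lambda: sum(range(n)), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    summary = system_summary()
    print("Disk:\t", summary["disk_gb"], "Gigabytes")
    print("CPUs:\t", summary["cpus"])
    print("sum(range(1_000_000)):", round(time_sum() * 1000, 2), "ms per loop")
```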

Clone this repository

  1. Visit this web page: https://github.com/falconair/ProgrammingForAnalytics
  2. Click the green "Code" button (labeled "Clone or download" on older versions of GitHub) and pick the "Download ZIP" option (unless you already have a GitHub account)

The following steps are optional

Install Git and Git Bash

Please install Git, a version control system, from this website: https://git-scm.com/downloads (the default settings are fine)

Note that Git is a command-line tool: once it is installed, you may not see a new icon to click. We will install a desktop client to remedy this.

Although we don't make heavy use of version control, you will be introduced to the concept. On Windows, installing Git also installs "Git Bash," a command-line environment that simulates Unix/Linux. We will do several exercises that require this environment.

Additional steps:
  1. Install a graphical interface to Git from this website: https://desktop.github.com/
  2. [Windows users only] Open Git Bash, then:
     a. type cd (this takes you to your home directory)
     b. type echo cd >> .profile (this makes sure Git Bash starts in your home directory)

Install Visual Studio Code

Please install Visual Studio Code from https://code.visualstudio.com/

Additional steps:

Install the Python extension from https://marketplace.visualstudio.com/items?itemName=ms-python.python (visit that page and click "Install")

========

Table of Contents

| Module | Class | Description |
|---|---|---|
| Intro to consoles | Intro to consoles | This lecture introduces students to the concept of a console, such as the Windows (DOS) cmd prompt or the Mac Terminal |
| Programming vs calculators | Programming vs calculator | Helps novices understand what features need to be added to a calculator to make it a full programming environment |
| First programs | First programs | Several examples of small but complete programs that use all common programming constructs and data structures |
| Intro to Jupyter | Intro to Jupyter - not technical | Provides historical context for Jupyter |
| | Intro to Jupyter - technical | Provides a practitioner-specific intro to Jupyter |
| All of Python | All of Python - faster basics | An overview of Python for computer programmers (multi-week lecture) |
| | All of Python - basics | An overview of Python for novices or non-programmers: teaches programming constructs |
| | All of Python - variables and tuples | Teaches multiple variable assignment |
| | All of Python - basic functions | Introduces functions |
| | All of Python - numbers | Overview of numbers and related operations |
| | All of Python - strings | Overview of strings and related operations |
| | All of Python - Boolean algebra | Dives deeper into comparisons and the and/or/not operators |
| | All of Python - basic plotting | General Matplotlib intro (not recommended for novices) |
| | All of Python - dictionaries | Introduces Python dictionaries (aka maps, associative arrays) |
| | All of Python - lists | Teaches lists |
| | All of Python - comprehensions | Teaches list and dictionary comprehensions (a useful but intermediate feature) |
| | All of Python - basic classes | Introduces classes and the very basics of object-oriented programming |
| | All of Python - loops | Describes while and for loops |
| | All of Python - conditionals and None | Deeper dive into if/else conditions and Python's None type |
| | All of Python - function arguments | Deeper dive into functions, including optional parameters |
| | All of Python - lambda functions | Introduces anonymous functions (aka lambda functions) |
| | All of Python - recursive functions | Introduces functions that call themselves |
| | All of Python - regexes | A very basic intro to regular expressions |
| Intro to Numpy | Numpy quick start | A broad overview of Numpy |
| Intro to Pandas | Pandas - quick start | A broad overview of Pandas |
| | Pandas - Series | A deeper dive into Pandas Series |
| | Pandas - Dataframes | Builds up a dataframe from a collection of Series or a Numpy matrix; shows basic functionality |
| | Pandas - general operations | Introduces additional dataframe operations |
| | Pandas - combining: merge, join, concat | Shows how to combine multiple dataframes, similar to SQL joins |
| | Pandas - groupby | Shows how to break a population into subgroups and find aggregates for those subgroups |
| | Pandas - Index | A deep dive into Pandas indexes, a topic often unfamiliar to casual Pandas users |
| | Pandas - reshape, pivot, melt, stack | Shows how to convert columns to rows and back, similar to Excel's pivot tables or cube rollup analysis |
| | Pandas - operations: str, dt, apply | Shows how to apply string or date functions to Pandas Series |
| Scikit Learn | Scikit Learn - method behind the madness | Describes Scikit Learn's architecture and introduces pipelines |
| | Scikit Learn - Run saved models | Shows how to connect SKLearn models to the web (very basic) |
| Secret lives of text files | Secret lives of text files | Describes encodings (UTF, ASCII), multi-byte characters, special characters such as \n and \t, etc. |
| How to read technical docs | How to read technical docs | |
| Basic computer architecture | Basic computer architecture | Provides a broad overview of a CPU, registers, floating point vs integers, disk vs memory speed differences |

Python for Analytics

| Module | Class | Description |
|---|---|---|
| First programs | First programs | Several examples of small but complete programs that use all common programming constructs and data structures |
| Intro to Jupyter | Intro to Jupyter - not technical | Provides historical context for Jupyter |
| | Intro to Jupyter - technical | Provides a practitioner-specific intro to Jupyter |
| All of Python | All of Python - faster basics | An overview of Python for computer programmers (multi-week lecture) |
| Intro to Numpy | Numpy quick start | A broad overview of Numpy |
| Intro to Pandas | Pandas - quick start | A broad overview of Pandas |
| | Pandas - Series | A deeper dive into Pandas Series |
| | Pandas - Dataframes | Builds up a dataframe from a collection of Series or a Numpy matrix; shows basic functionality |
| | Pandas - general operations | Introduces additional dataframe operations |
| | Pandas - combining: merge, join, concat | Shows how to combine multiple dataframes, similar to SQL joins |
| | Pandas - groupby | Shows how to break a population into subgroups and find aggregates for those subgroups |
| | Pandas - Index | A deep dive into Pandas indexes, a topic often unfamiliar to casual Pandas users |
| | Pandas - reshape, pivot, melt, stack | Shows how to convert columns to rows and back, similar to Excel's pivot tables or cube rollup analysis |
| | Pandas - operations: str, dt, apply | Shows how to apply string or date functions to Pandas Series |
| Scikit Learn | Scikit Learn - method behind the madness | Describes Scikit Learn's architecture and introduces pipelines |
| | Scikit Learn - Run saved models | Shows how to connect SKLearn models to the web (very basic) |

Programming for Analytics

| Module | Class | Description |
|---|---|---|
| Programming vs calculators | Programming vs calculator | Helps novices understand what features need to be added to a calculator to make it a full programming environment |
| First programs | First programs | Several examples of small but complete programs that use all common programming constructs and data structures |
| Intro to Jupyter | Intro to Jupyter - not technical | Provides historical context for Jupyter |
| | Intro to Jupyter - technical | Provides a practitioner-specific intro to Jupyter |
| All of Python | All of Python - basics | An overview of Python for novices or non-programmers: teaches programming constructs |
| Secret lives of text files | Secret lives of text files | Describes encodings (UTF, ASCII), multi-byte characters, special characters such as \n and \t, etc. |
| How to read technical docs | How to read technical docs | |
| Basic computer architecture | Basic computer architecture | Provides a broad overview of a CPU, registers, floating point vs integers, disk vs memory speed differences |
| Intro to Numpy | Numpy quick start | A broad overview of Numpy |
| Intro to Pandas | Pandas - quick start | A broad overview of Pandas |

Lectures on R omitted