This tutorial is prepared for JuliaCon2020 presentation A deep dive into DataFrames.jl indexing.
This version is updated to DataFrames.jl 1.5.0 release and Julia 1.9.0.
Its focus is on discussing all the details of indexing in DataFrames.jl. If you are interested in introductory tutorials about how to use DataFrames.jl, please check out the list here and in general you might want to have a look at my blog as most of the posts there are somehow related to DataFrames.jl.
The tutorial is split in two parts (there is going to be a short break between them during the conference):
- Introductory: "by example" explanation of indexing and broadcasting rules in DataFrames.jl (file indexing_part1_introduction.ipynb if you want to do exercises by yourself or file indexing_part1_introduction_solutions.ipynb if you prefer just to see the solutions immediately).
- Advanced: detailed discussion how indexing in DataFrames.jl is actually implemented, and what are key design challenges (file: indexing_part2_advanced.ipynb).
All the examples are checked under Julia 1.9.0.
Before you start please make sure that you have all the required packages
installed. The simplest way to do it is to perform the following steps
(I assume you have git
and julia
commands available):
- Open a terminal in a directory where you want to have the tutorial to be stored.
- Run
git clone https://github.com/bkamins/JuliaCon2020-DataFrames-Tutorial.git
to download the repository with the turorial (if you to not havegit
installed then you can download a Zip of this repository from GitHub and unpack it locally). - Run
cd JuliaCon2020-DataFrames-Tutorial
to switch a directory to the one where the tutorial has been downloaded. - On Kaggle go to https://www.kaggle.com/qks1lver/amex-nyse-nasdaq-stock-histories?select=fh_5yrs.csv and download the file fh_5yrs.csv (it is simplest to do it manually), after downloading it you will get a 75752_1304789_compressed_fh_5yrs.csv.zip file (the name of the file might change so please check it), which you should unzip (a detailed instruction is OS dependent so I omit it here) and make sure that fh_5yrs.csv is in the same directory as the tutorial (if you followed my instructions the name of this directory is JuliaCon2020-DataFrames-Tutorial). In order to perform this step you need to be logged in to Kaggle. In case you have problem with this you can download the required file from http://bogumilkaminski.pl/files/juliacon2020_fh_5yrs.zip, but Kaggle access is preferred to give an appropriate recognition to the creator of the file (and download speed also should be better).
- Run
julia --project=. -e "using Pkg; Pkg.instantiate(); Pkg.precompile()"
to make sure you have all the required packages automatically downloaded and precompiled (in case they would be missing; this step is optional and is recommended to be run only the first time you want to run the tutorial to make sure that all the packages are correctly set up before we start working with them) - Run
julia --project=. -e "using IJulia; notebook(dir=pwd())"
to start the Jupyter Notebook in the directory where the tutorials are store (note that in the terminal you do not see a prompt as the Julia process is running now). - Now you can work with the tutorials.
- After you are done please make sure to choose "Quit" option in the top right hand side in the Jupyter Notebook. This will automatically terminate the Julia process in the terminal and prompt will reappear.
It is best if you have the above steps performed before joining the JuliaCon2020 presentation to make sure you can smoothly follow the tutorial.