This repository can be used to test and develop changes to LightGBM's Dask integration. It contains the following useful features:
- make recipes for building a local development image with lightgbm installed from a local copy, and Jupyter Lab running for interactive development
- Jupyter notebooks for testing lightgbm.dask against a LocalCluster (multi-worker, single-machine) and a dask_cloudprovider.aws.FargateCluster (multi-worker, multi-machine)
- make recipes for publishing a custom container image to an ECR Public repository, for use with AWS Fargate
Contents
- Getting Started
- Develop in Jupyter
- Test with a LocalCluster
- Test with a FargateCluster
- Run LightGBM unit tests
- Profile LightGBM code
To begin, clone a copy of LightGBM to a folder LightGBM at the root of this repo.
You can do this however you want, for example:
git clone --recursive git@github.com:microsoft/LightGBM.git LightGBM
If you're developing a reproducible example for an issue or you're testing a potential pull request, you probably want to clone LightGBM from your fork, instead of the main repo.
This section describes how to test a version of LightGBM in Jupyter.
Run the following to build an image that includes lightgbm, all its dependencies, and a JupyterLab setup.
make notebook-image
The first time you run this, it will take a few minutes, because this project needs to build a base image with LightGBM's dependencies and compile the LightGBM C++ library.
Every time after that, make notebook-image should run very quickly.
Start up Jupyter Lab!
This command will run Jupyter Lab in a container using the image you built with make notebook-image.
make start-notebook
Navigate to http://127.0.0.1:8888/lab in your web browser.
The command make start-notebook mounts your current working directory into the running container.
That means that even though Jupyter Lab is running inside the container, changes that you make in it will be saved on your local filesystem even after you shut the container down.
So you can edit and create notebooks and other code in there with confidence!
When you're done with the notebook, stop the container by running the following from another shell:
make stop-notebook
To test lightgbm.dask on a LocalCluster, run the steps in "Develop in Jupyter", then try out local.ipynb or your own notebooks.
There are some problems with Dask code which only arise in a truly distributed, multi-machine setup.
To test for these sorts of issues, I like to use dask-cloudprovider.
The steps below describe how to test a local copy of LightGBM on a FargateCluster from dask-cloudprovider.
Build an image that can be used for the scheduler and workers in the Dask cluster you'll create on AWS Fargate. This image will have your local copy of LightGBM installed in it.
make cluster-image
For the rest of the steps in this section, you'll need access to AWS resources. To begin, install the AWS CLI if you don't already have it.
pip install --upgrade awscli
Next, configure your shell to make authenticated requests to AWS. If you've never done this, you can see the AWS CLI docs.
The rest of this section assumes that the shell variables AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID have been set.
I like to set these by keeping them in a file
# file: aws.env
AWS_SECRET_ACCESS_KEY=your-key-here
AWS_ACCESS_KEY_ID=your-access-key-id-here
and then sourcing that file
set -o allexport
source aws.env
set +o allexport
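If you need the same variables inside a Python process that did not inherit them from your shell (for example, a notebook kernel), something like the following may work. This is only a sketch: the helper names load_env_file and export_env_file are hypothetical, and it assumes aws.env contains plain KEY=value lines with no quoting and no "export" prefixes.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse KEY=value lines from ``path`` and return them as a dict."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments such as "# file: aws.env".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line, ignore it
        values[key.strip()] = value.strip()
    return values


def export_env_file(path):
    """Copy every variable found in ``path`` into this process's environment."""
    values = load_env_file(path)
    os.environ.update(values)
    return values
```

Calling export_env_file("aws.env") before creating any AWS clients would then have roughly the same effect as the shell snippet above, but only for the current Python process.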
To use the cluster image in the containers you spin up on Fargate, it has to be available in a container registry. This project uses the free AWS Elastic Container Registry (ECR) Public. For more information on ECR Public, see the AWS docs.
The command below will create a new repository on ECR Public, store the details of that repository in a file ecr-details.json, and push the cluster image to it.
The cluster image will not contain your credentials, notebooks, or other local files.
make push-image
This may take a few minutes to complete.
Follow the steps in "Develop in Jupyter" to get a local Jupyter Lab running.
Open aws.ipynb.
That notebook contains sample code that uses dask-cloudprovider to provision a Dask cluster on AWS Fargate.
You can view the cluster's current state and its logs by navigating to the Elastic Container Service (ECS) section of the AWS console.
As you work on whatever experiment you're doing, you'll probably find yourself wanting to repeat these steps multiple times.
To remove the image you pushed to ECR Public and the repository you created there, run the following
make delete-repo
Then, repeat the steps above to rebuild your images and test again.
This repo makes it easy to run lightgbm's Dask unit tests in a containerized setup.
make lightgbm-unit-tests
Pass variable DASK_VERSION to use a different version of dask / distributed.
make lightgbm-unit-tests \
-e DASK_VERSION=2023.4.0
To try to identify expensive parts of the code path for lightgbm, you can run its examples under cProfile and then visualize those profiling results with snakeviz.
make profile
Then navigate to http://0.0.0.0:8080/snakeviz/%2Fprofiling-output in your web browser.
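If you'd rather inspect cProfile results in a terminal than in snakeviz, the standard library's pstats module can read the same kind of stats file. The sketch below is illustrative only: busy_work is a hypothetical stand-in workload, and example.prof is an assumed file name, not the name of any file written by make profile.

```python
import cProfile
import io
import pstats
import tempfile
from pathlib import Path


def busy_work(n):
    """Stand-in workload (hypothetical); swap in real LightGBM code."""
    return sum(i * i for i in range(n))


def profile_to_file(func, stats_path, *args):
    """Run ``func(*args)`` under cProfile and dump raw stats to ``stats_path``."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    profiler.dump_stats(str(stats_path))
    return result


def summarize(stats_path, top_n=5):
    """Return a text report of the ``top_n`` entries by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(str(stats_path), stream=buffer)
    stats.sort_stats("cumulative").print_stats(top_n)
    return buffer.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        stats_file = Path(tmp_dir) / "example.prof"
        profile_to_file(busy_work, stats_file, 100_000)
        print(summarize(stats_file))
```

Sorting by cumulative time tends to surface the top-level calls that dominate a run, which is often a useful first view before drilling into individual functions.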
To summarize memory allocations in typical uses of LightGBM, and to attribute those memory allocations to particular codepaths, you can run its examples under memray.
make profile-memory-usage
That will generate several HTML files.
View them in your browser by running the following, then navigating to localhost:1234.
python -m http.server \
--directory ./profiling-output/memory-usage \
1234