Optimization
Feluda uses memray for memory profiling. To profile a specific operator, install Feluda's core requirements.txt and then run:
# for a specific operator
$ python3 -m memray run -o vid_vec_rep_resnet.bin vid_vec_rep_resnet.py
# Analyze using flamegraph
$ python3 -m memray flamegraph vid_vec_rep_resnet.bin
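If only a single function needs profiling rather than the whole script, memray also exposes a Tracker context manager in its Python API. A minimal sketch, reusing the operator's run()/file_path convention from the cProfile example further below (the output file name is illustrative):
# track only the allocations made while run() executes
import memray

file_path = {"path": r"/path/to/video/file"}
with memray.Tracker("vid_vec_rep_resnet.bin"):
    run(file_path)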
Feluda uses pyinstrument for CPU profiling (see the known issues with running it inside Docker). To profile a specific operator, install Feluda's core requirements.txt and then run:
# for a specific operator
$ pyinstrument -r speedscope -o speedscope_vid_vec_rep_resnet.json vid_vec_rep_resnet.py
# load the json file here to view flamegraph - https://www.speedscope.app/
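pyinstrument can also be driven from Python instead of the CLI. A minimal sketch, again assuming the operator's run()/file_path interface:
from pyinstrument import Profiler

profiler = Profiler()
profiler.start()
run(file_path)
profiler.stop()
# print a call tree showing where the time was spent
print(profiler.output_text(unicode=True, color=True))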
To profile code using cProfile, make the following changes in the code. In the video vector operator, add an if __name__ == "__main__": block:
# import required libraries
import cProfile
import pstats
from io import StringIO

# if block at the end of the operator
if __name__ == "__main__":
    file_path = {"path": r"/path/to/video/file"}
    initialize(param=None)

    profiler = cProfile.Profile()
    profiler.enable()
    run(file_path)
    profiler.disable()

    result_stream = StringIO()
    stats = pstats.Stats(profiler, stream=result_stream).sort_stats("cumulative")
    stats.print_stats()
    print(result_stream.getvalue())
The output will be the profiling results. They can also be saved to a txt file like this:
python vid_vec_rep_resnet.py > output.txt
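Instead of redirecting stdout, the profile can also be saved in binary form and inspected later. A minimal sketch using pstats (the .prof file name is illustrative):
# inside the __main__ block, after profiler.disable()
profiler.dump_stats("vid_vec_rep_resnet.prof")

# later, in a separate session
import pstats
stats = pstats.Stats("vid_vec_rep_resnet.prof")
stats.sort_stats("cumulative").print_stats(20)  # show the top 20 entries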
To find how long the run function takes, we can also try a simpler approach:
import time

start_time = time.time()
run(file_path)
end_time = time.time()
duration = end_time - start_time
print(f"run() took {duration:.2f} seconds")
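If the same measurement is needed in several places, a small context manager keeps the timing code out of the operator logic. A minimal sketch (the timed helper is ours, not part of Feluda):
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # print how long the wrapped block took, using a monotonic clock
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label} took {time.perf_counter() - start:.2f}s")

# usage
with timed("run"):
    run(file_path)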
The Dockerfiles in src/api-server and src/indexer implement multistage builds to reduce image size (from approximately 5 GB each to 1.6 GB each). Since both the server and the indexer have the same dependencies, it could be useful to push the first stage of their Docker builds to Dockerhub as a separate image, and then pull that image in the Dockerfiles.
Building the first stage and pushing it to a Dockerhub repository -
cd src/api-server
docker build --target builder -t username/repository:tag .
docker push username/repository:tag
And then replacing the following code in both the Dockerfiles -
FROM python:3.7-slim as builder
RUN apt-get update \
&& apt-get -y upgrade \
&& apt-get install -y \
--no-install-recommends gcc build-essential \
--no-install-recommends libgl1-mesa-glx libglib2.0-0 \
# Vim is only for debugging in dev mode. Comment it out in production
vim \
&& apt-get purge -y --auto-remove \
gcc build-essential \
libgl1-mesa-glx libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip
COPY requirements.txt /app/requirements.txt
WORKDIR /app
RUN pip install --user -r requirements.txt
with -
FROM username/repository:tag AS builder
Note that this builder image would need to be rebuilt if there is any change in the dependencies.
- Install docker and docker-buildx from your package manager
- On every reboot, register Arm executables to run on x64 machines
The official binfmt project is now part of linuxkit.
OLD - https://github.com/docker/binfmt
CURRENT - use the latest version from the linuxkit Docker repo:
- https://hub.docker.com/r/linuxkit/binfmt
- https://github.com/linuxkit/linuxkit/tree/master/pkg/binfmt
- https://www.docker.com/blog/introducing-linuxkit-container-os-toolkit/
# Register Arm executables to run on x64 machines
$ docker run --rm --privileged linuxkit/binfmt:68604c81876812ca1c9e2d9f098c28f463713e61-amd64
# To verify the qemu handlers are registered properly, run the following and make
# sure the first line of the output is "enabled". Note that the handler registration
# doesn't survive a reboot, but could be added to the system start-up scripts
# (one option is sketched below).
$ cat /proc/sys/fs/binfmt_misc/qemu-aarch64
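One way to re-register the handlers on boot, assuming a systemd-based distro (the unit name and docker path are ours; adjust as needed):
# /etc/systemd/system/binfmt-arm.service (hypothetical unit)
[Unit]
Description=Register Arm binfmt handlers via linuxkit/binfmt
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker run --rm --privileged linuxkit/binfmt:68604c81876812ca1c9e2d9f098c28f463713e61-amd64

[Install]
WantedBy=multi-user.target
# enable it once with
$ sudo systemctl enable binfmt-arm.service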
- Optimized PyTorch on AWS Graviton (arm64)
  - https://www.youtube.com/watch?v=c1Rl-vCmnT0 (see ~14:40 for the AWS Graviton optimization)
  - https://github.com/aws/aws-graviton-getting-started/blob/main/machinelearning/pytorch.md
- In the Dockerfile
  - Ensure that the Docker base OS is compatible with AWS Graviton
  - OPTIONAL - remove the hash value for the base image; docker buildx still picks up the correct arch
  - Turn on AWS Graviton optimization:
# Graviton3(E) (e.g. c7g, c7gn and Hpc7g instances) supports BF16 format for ML
# acceleration. This can be enabled in oneDNN by setting the below environment variable
grep -q bf16 /proc/cpuinfo && export DNNL_DEFAULT_FPMATH_MODE=BF16

# Enable primitive caching to avoid the redundant primitive allocation
# latency overhead. Please note this caching feature increases the
# memory footprint. Tune this cache capacity to a lower value to
# reduce the additional memory requirement.
export LRU_CACHE_CAPACITY=1024

# Enable Transparent huge page allocations from PyTorch C10 allocator
export THP_MEM_ALLOC_ENABLE=1

# Make sure the openmp threads are distributed across all the processes for
# multi-process applications to avoid over-subscription of the vcpus. For example,
# if there is a single application process, then num_processes should be set to 1
# so that all the vcpus are assigned to it with a one-to-one mapping to omp threads.
num_vcpus=$(getconf _NPROCESSORS_ONLN)
num_processes=<number of processes>
export OMP_NUM_THREADS=$((1 > ($num_vcpus/$num_processes) ? 1 : ($num_vcpus/$num_processes)))
export OMP_PROC_BIND=false
export OMP_PLACES=cores
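The static settings can be baked into the image with ENV instead of being exported at runtime, which is also what the "Verify env vars" inspect step below checks. A sketch (the BF16 check and OMP_NUM_THREADS depend on the runtime host, so they are better left to an entrypoint script):
# in the Dockerfile
ENV LRU_CACHE_CAPACITY=1024
ENV THP_MEM_ALLOC_ENABLE=1
ENV OMP_PROC_BIND=false
ENV OMP_PLACES=cores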
- In requirements.txt
  - Use/replace the find-links URL with https://download.pytorch.org/whl/cpu
  - Remove the torch version pin - TODO - try pinned versioning again with the above find-links. This is required since using pip-compile with the above settings still downloads the GPU version. (A sketch of the resulting file follows.)
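For illustration, the top of such a requirements.txt might look like this (the package list is hypothetical; the find-links line and the unpinned torch are the point):
--find-links https://download.pytorch.org/whl/cpu
torch          # unpinned; pinning plus pip-compile still pulled the GPU build
torchvision    # hypothetical companion package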
- Build
# Build
$ sudo docker buildx build --platform linux/arm64 -t image-operator -f Dockerfile.image_vec_rep_resnet .
# Verify
$ sudo docker inspect image-operator | grep 'Architecture'
# sample output
"Architecture": "arm64",
# Verify env vars have been set
$ sudo docker inspect image-operator --format "{{.Config.Env}}"
- Running multi-arch docker images on x86_64
  - https://stackoverflow.com/questions/68675532/how-to-run-arm64-docker-images-on-amd64-host-platform
  - Install qemu-user-static from your package manager
# register the qemu handlers (an alternative to the linuxkit/binfmt image above)
$ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
# verify that the image reports the arm64 architecture
$ sudo docker run --rm image-operator uname -m
# run the arm64 image with the static qemu binary mounted into the container
$ sudo docker run --platform linux/arm64 -v /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static -it image-operator
- Use the following settings for running the image via a docker compose file
<service-name>:
  container_name: <container-name>
  image: <built-arm-image>
  platform: linux/arm64
  volumes:
    - /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static
Benchmarking Postgres insert throughput with pgbench -
$ sudo -iu postgres
# create a user as superuser with db creation privilege
[postgres]$ createuser -s -d postgres_benchmark
# create a db owned by that user
[postgres]$ createdb insert_benchmark -O postgres_benchmark
[postgres]$ exit
# initialize the benchmark database with the pgbench tables (scale factor 50)
$ pgbench -i -s 50 insert_benchmark -U postgres_benchmark
# run the insert-only workload: 89 clients, 1 worker thread,
# 1000 transactions per client, progress report every 5s
$ pgbench -c 89 -j 1 -t 1000 -P 5 -f <(echo 'INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (1, 2, 3, 4, current_timestamp)') insert_benchmark