Validation Framework
The validation framework is used to smoke test and validate PyTorch and the domain libraries on both CPU and GPU machines. Linux, Windows, and macOS (x86 and Apple Silicon) are supported. The high-level requirements for the validation framework are:
- Support Linux, Windows, and macOS using ephemeral runners with only minimal dependencies installed
- Support CPU and GPU runners, with an older Nvidia GPU driver on the GPU runners to provide backward-compatibility tests (our CI runs on the latest Nvidia drivers)
- Execute on a nightly basis
- Surface the results on HUD
- Cover all released domain libraries
- Follow the same installation instructions as the Get Started page
- Cover the nightly, test, and release channels
- Use the same matrix as PyTorch Core and the Nova project, so that a new build introduced to PyTorch Core becomes available for validation
- Provide a smoke test that covers PyTorch standalone or PyTorch with all domain libraries
The validation framework is used in the following ways:
- Nightly validation of PyTorch, TorchAudio, and TorchVision as one ecosystem, using the same instructions as the Get Started page. These workflows are implemented in validate-binaries.yml and are used for nightly and release validation by the PyTorch Dev Infra team.
- Standalone domain library validation. This is currently implemented for the TorchText and TorchRec domain libraries. It is a fully customized way of using the validation framework and, in theory, can be used to validate any project within the PyTorch organization. See how to onboard to the validation framework if you are interested in using it.
- Automating deployment and validation of the PyTorch Getting Started page.
Onboarding to the validation framework is straightforward. You will need to create the following:
- A new GitHub Actions workflow that runs the validation. This workflow should call the validate-domain-library.yml workflow.
- A new script that installs your package and runs the smoke tests.
- Optionally, a new GitHub Actions workflow that runs the validation on a nightly basis.
The following is the GitHub Actions workflow from the TorchText repo:
    name: Validate binaries

    on:
      workflow_call:
        inputs:
          channel:
            description: "Channel to use (nightly, test, release, all)"
            required: false
            type: string
            default: release
          os:
            description: "Operating system to generate for (linux, windows, macos, macos-arm64)"
            required: true
            type: string
          ref:
            description: "Reference to checkout, defaults to empty"
            default: ""
            required: false
            type: string
      workflow_dispatch:
        inputs:
          channel:
            description: "Channel to use (nightly, test, release, all)"
            required: true
            type: choice
            options:
              - release
              - nightly
              - test
              - all
          os:
            description: "Operating system to generate for (linux, windows, macos)"
            required: true
            type: choice
            default: all
            options:
              - windows
              - linux
              - macos
              - all
          ref:
            description: "Reference to checkout, defaults to empty"
            default: ""
            required: false
            type: string

    jobs:
      validate-binaries:
        uses: pytorch/test-infra/.github/workflows/validate-domain-library.yml@main
        with:
          package_type: "conda,wheel"
          os: ${{ inputs.os }}
          channel: ${{ inputs.channel }}
          repository: "pytorch/text"
          smoke_test: "source ./.github/scripts/validate_binaries.sh"
          install_torch: true
The following inputs are currently supported:
- package_type: The package type that you intend to test. Supported package types: conda, wheel, libtorch, all
- os: The operating system to run the tests on. Supported values: linux, windows, macos, macos-arm64
- channel: The channel to use: nightly, test, or release
- repository: The repository you are calling the validate workflow from
- smoke_test: The script that installs the binary and performs the validation. This is a bash shell script local to your repository.
- with_cuda: enable/disable. Optional parameter that includes CUDA builds
- install_torch: true/false. If set, torch is preinstalled based on the matrix entry
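The TorchText workflow above passes `package_type` as a comma-separated string (`"conda,wheel"`). As a rough illustration of how such a value could be interpreted, here is a minimal Python sketch. It is not the framework's own parsing code: `parse_package_types` is a hypothetical helper, and expanding `all` to every listed package type is an assumption.

```python
# Hypothetical helper, not part of the validation framework.
# The package type names come from the input list above.
SUPPORTED_PACKAGE_TYPES = ("conda", "wheel", "libtorch")


def parse_package_types(value):
    """Split a comma-separated package_type value into a validated list."""
    names = [part.strip() for part in value.split(",") if part.strip()]
    if "all" in names:
        # Assumption: "all" stands for every supported package type.
        return list(SUPPORTED_PACKAGE_TYPES)
    unknown = [name for name in names if name not in SUPPORTED_PACKAGE_TYPES]
    if unknown:
        raise ValueError(f"unsupported package type(s): {unknown}")
    return names


if __name__ == "__main__":
    print(parse_package_types("conda,wheel"))
```

A check like this can catch a misspelled package type before a workflow run is dispatched.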
The following is validate_binaries.sh from the TorchText repo:

    # Install torchtext with the package manager selected by the matrix entry
    if [[ ${MATRIX_PACKAGE_TYPE} = "conda" ]]; then
        conda install -y torchtext -c ${PYTORCH_CONDA_CHANNEL}
    else
        pip install ${PYTORCH_PIP_PREFIX} torchtext --extra-index-url ${PYTORCH_PIP_DOWNLOAD_URL}
    fi

    # Run the smoke tests against the installed package
    python ./test/smoke_tests/smoke_tests.py
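To make the branching in the script easier to reason about, here is a minimal Python sketch that builds the same install command from a dictionary of environment values. `build_install_command` is a hypothetical name, and the example values (the `--pre` prefix and the nightly download URL) are illustrative assumptions, not values the framework is guaranteed to set.

```python
import shlex


def build_install_command(env):
    """Mirror the conda/pip branch of validate_binaries.sh as an argument list."""
    if env.get("MATRIX_PACKAGE_TYPE") == "conda":
        return ["conda", "install", "-y", "torchtext", "-c", env["PYTORCH_CONDA_CHANNEL"]]
    # Assumption: PYTORCH_PIP_PREFIX is either empty or holds extra pip flags,
    # so it is split into words the way the unquoted shell expansion would be.
    prefix = shlex.split(env.get("PYTORCH_PIP_PREFIX", ""))
    return [
        "pip", "install", *prefix, "torchtext",
        "--extra-index-url", env["PYTORCH_PIP_DOWNLOAD_URL"],
    ]


if __name__ == "__main__":
    example_env = {
        "MATRIX_PACKAGE_TYPE": "wheel",
        "PYTORCH_PIP_PREFIX": "--pre",  # illustrative value
        "PYTORCH_PIP_DOWNLOAD_URL": "https://download.pytorch.org/whl/nightly/cu117",
    }
    print(" ".join(build_install_command(example_env)))
```

Returning an argument list rather than a single string keeps the command safe to pass to `subprocess.run` without a shell.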
Finally, if you want to run the workflow on a nightly basis, add a validate-nightly-binaries.yml workflow:
    # Scheduled validation of the nightly binaries
    name: cron

    on:
      schedule:
        # At 5:30 pm UTC (10:30 am PDT)
        - cron: "30 17 * * *"
      # Have the ability to trigger this job manually through the API
      workflow_dispatch:
      push:
        branches:
          - main
        paths:
          - .github/workflows/validate-nightly-binaries.yml
          - .github/workflows/validate-binaries.yml
          - .github/scripts/validate_binaries.sh
      pull_request:
        paths:
          - .github/workflows/validate-nightly-binaries.yml
          - .github/workflows/validate-binaries.yml
          - .github/scripts/validate_binaries.sh

    jobs:
      nightly:
        uses: ./.github/workflows/validate-binaries.yml
        with:
          channel: nightly
          os: all
You can also refer to the TorchRec repo for additional onboarding examples: validate-binaries.yml, validate_binaries.sh, and validate-nightly-binaries.yml.
Operating System | Type | GPUs | GPU Memory (GB) | vCPU | Memory (GiB) | Details |
---|---|---|---|---|---|---|
Linux CPU | c5.2xlarge | NA | NA | 8 | 16 | Docker and CentOS 7 |
Linux GPU | g3.4xlarge | 1 Tesla M60 | 8 | 16 | 122 | Docker and CentOS 7 |
Windows CPU | c5d.4xlarge | NA | NA | 16 | 32 | Windows 2019 |
Windows GPU | p3.2xlarge | 1 Tesla V100 | 16 | 8 | 61 | Windows 2019 |
MacOS x86 | NA | NA | NA | 3 | 14 | MacOS 12.6.2 |
MacOS arm64 | mac2.metal | NA | NA | 8 | 16 | MacOS 12.4 |
The binary build matrix contains the current configuration supported by PyTorch Core and the domain libraries. It is generated by the generate_binary_build_matrix.yml workflow. For additional details, refer to that workflow's documentation.
The following CUDA and Python configurations are currently supported:
CUDA | CUDNN | additional details |
---|---|---|
11.6 | 8.5.0.96 | Stable CUDA Release |
11.7 | 8.5.0.96 | Latest CUDA Release |
11.8 | 8.5.0.96 | CUDA Release Supported on nightly |
Python versions | Package details |
---|---|
3.7-3.10 | Supported on Conda and Pip |
3.11 | Supported on Pip only |
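The two tables above can be combined into a list of candidate configurations. The Python sketch below is an illustration only, not the logic of generate_binary_build_matrix.yml. It assumes that "Pip" corresponds to the `wheel` package type, that CUDA 11.8 is available on the nightly channel only, and that CUDA 11.6 and 11.7 are available on every channel. CPU-only builds and per-OS restrictions are ignored. All names are hypothetical.

```python
# Illustration only: encodes the two tables above under the assumptions
# stated in the lead-in. This is not the real matrix generator.
PYTHON_PACKAGE_TYPES = {
    "3.7": ("conda", "wheel"),
    "3.8": ("conda", "wheel"),
    "3.9": ("conda", "wheel"),
    "3.10": ("conda", "wheel"),
    "3.11": ("wheel",),  # pip only
}

CUDA_CHANNELS = {
    "11.6": ("nightly", "test", "release"),  # assumption: all channels
    "11.7": ("nightly", "test", "release"),  # assumption: all channels
    "11.8": ("nightly",),                    # assumption: nightly only
}


def supported_configs(channel):
    """Return (python_version, cuda_version, package_type) tuples for a channel."""
    configs = []
    for cuda, channels in CUDA_CHANNELS.items():
        if channel not in channels:
            continue
        for python, package_types in PYTHON_PACKAGE_TYPES.items():
            for package_type in package_types:
                configs.append((python, cuda, package_type))
    return configs


if __name__ == "__main__":
    for channel in ("nightly", "release"):
        print(channel, len(supported_configs(channel)))
```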
The output of the generate workflow is a JSON array of entries, each containing the basic information needed to install and test the package. The following is a sample entry:
    {
      "python_version": "3.7",
      "gpu_arch_type": "cuda",
      "gpu_arch_version": "11.7",
      "desired_cuda": "cu117",
      "container_image": "pytorch/manylinux-builder:cuda11.7",
      "package_type": "wheel",
      "build_name": "wheel-py3_7-cuda11_7",
      "validation_runner": "windows.8xlarge.nvidia.gpu",
      "installation": "pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu117",
      "channel": "nightly",
      "upload_to_base_bucket": "no",
      "stable_version": "1.13.1"
    }
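A consumer of this output needs only a few of the fields to schedule a validation job. The following Python sketch shows one way to read them. The `plan_jobs` function and the shape of its return value are hypothetical. The field names and values are taken from the sample entry above, trimmed to the fields the sketch uses.

```python
import json

# One-element matrix built from the sample entry above (trimmed).
MATRIX_JSON = """
[
  {
    "python_version": "3.7",
    "gpu_arch_type": "cuda",
    "desired_cuda": "cu117",
    "package_type": "wheel",
    "build_name": "wheel-py3_7-cuda11_7",
    "validation_runner": "windows.8xlarge.nvidia.gpu",
    "installation": "pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu117",
    "channel": "nightly"
  }
]
"""


def plan_jobs(matrix_json):
    """Extract the fields needed to run one validation job per matrix entry."""
    jobs = []
    for entry in json.loads(matrix_json):
        jobs.append({
            "name": entry["build_name"],
            "runner": entry["validation_runner"],
            "install": entry["installation"],
            "uses_cuda": entry["gpu_arch_type"] == "cuda",
        })
    return jobs


if __name__ == "__main__":
    for job in plan_jobs(MATRIX_JSON):
        print(job["name"], "->", job["runner"])
```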
We have set up workflows that run nightly and release validation on a daily basis. Both of these workflows call the validate binaries workflow, which performs the validation using the full supported binary matrix. The validate binaries workflow can also be triggered manually, on demand, from the GitHub Actions page.
During validation, we use the installation instructions provided by the binary matrix. Validation is executed for PyTorch, TorchVision, and TorchAudio, and the following checks are performed:
- For nightly validation, check that the date on the packages is within the nightly allowed delta (3 days)
- Check that the CUDA version matches across all 3 packages and matches the desired_cuda version from the matrix
- Smoke test all 3 packages
- If CUDA is enabled, run some CUDA smoke tests
- Run the additional check-binary script for Linux packages
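The first two checks can be sketched in a few lines of Python. This is a minimal illustration, not the framework's own smoke test. It assumes the nightly build date is embedded in the version string as `.devYYYYMMDD` (for example `2.0.0.dev20230115+cu117`) and that each package reports its CUDA version in dotted form such as `11.7`. The function names are hypothetical.

```python
import re
from datetime import date, datetime, timedelta

NIGHTLY_ALLOWED_DELTA = timedelta(days=3)


def nightly_build_date(version):
    """Extract the build date from a version such as 2.0.0.dev20230115+cu117."""
    match = re.search(r"\.dev(\d{8})", version)
    if match is None:
        raise ValueError(f"no nightly date found in version {version!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def check_nightly_date(version, today):
    """Return True if the nightly build is within the allowed delta of today."""
    return abs(today - nightly_build_date(version)) <= NIGHTLY_ALLOWED_DELTA


def mismatched_cuda_packages(desired_cuda, package_cuda_versions):
    """Return the packages whose CUDA version differs from desired_cuda (e.g. cu117)."""
    return [
        name
        for name, cuda in package_cuda_versions.items()
        if "cu" + cuda.replace(".", "") != desired_cuda
    ]


if __name__ == "__main__":
    print(check_nightly_date("2.0.0.dev20230115+cu117", date(2023, 1, 17)))
    print(mismatched_cuda_packages(
        "cu117", {"torch": "11.7", "torchvision": "11.7", "torchaudio": "11.6"}
    ))
```

In a real smoke test, the version strings would come from each installed package (for example `torch.__version__` and `torch.version.cuda`) rather than from literals.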
The results of the tests are visualized on the HUD.