OFED driver

Refer to the NVIDIA Network Operator's ofed-driver-ds.yaml and values.yaml.

The pod builds the OFED driver from source and installs some packages online. Once the pod is ready, the OFED driver is installed.

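Release chart

The chart version has the format '{x}.{y}.{custom}', where {x} and {y} are the first two components of the driver image version and {custom} is a chart-specific number.

For example, in chart version 24.04.0, 24.04 is the default OFED driver image version.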
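To make the naming rule concrete, here is a minimal sketch that assumes {x} and {y} are the two numbers before the first "-" in image.driverVersion (24.04 in 24.04-0.6.6.0-0) and that {custom} is picked by whoever cuts the release. The `chart_version` function is hypothetical and not part of the chart.

```shell
# Hypothetical helper: derive a chart version "{x}.{y}.{custom}" from a
# driver version such as "24.04-0.6.6.0-0" plus a chart-specific number.
chart_version() {
  local driver_version="$1" custom="$2"
  local image_version x y
  image_version="${driver_version%%-*}"  # "24.04-0.6.6.0-0" -> "24.04"
  x="${image_version%%.*}"               # "24.04" -> "24"
  y="${image_version#*.}"                # "24.04" -> "04"
  y="${y%%.*}"                           # drop anything after a second dot
  printf '%s.%s.%s\n' "$x" "$y" "$custom"
}

chart_version "24.04-0.6.6.0-0" 0   # prints 24.04.0
```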

Tag the code, and CI releases the chart automatically:

git tag ofed-driver-vXX.YY.ZZ
git push --tags
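A small guard can catch a malformed version before the tag reaches CI. This sketch assumes the tag prefix is `ofed-driver-v` (override `TAG_PREFIX` if the CI workflow matches something else) and only accepts three dot-separated numbers; `release_tag` is a hypothetical helper.

```shell
# Assumed tag prefix; set TAG_PREFIX to whatever the CI tag filter expects.
TAG_PREFIX="${TAG_PREFIX:-ofed-driver-v}"

# Hypothetical helper: print the release tag for a chart version, or fail
# if the version is not of the form XX.YY.ZZ.
release_tag() {
  if ! printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "invalid chart version: $1 (expected XX.YY.ZZ)" >&2
    return 1
  fi
  printf '%s%s\n' "$TAG_PREFIX" "$1"
}

# Create and push the tag only when the version is well formed:
# tag=$(release_tag 24.04.0) && git tag "$tag" && git push --tags
```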

Deploy

Image Tag

The following Helm options determine the image tag, whose format is {driverVersion}-{OSName}{OSVer}-{Arch}:

image.driverVersion="24.04-0.6.6.0-0"

image.OSName="ubuntu"

image.OSVer="22.04"

image.Arch="amd64"
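As a quick illustration of how the four options combine, this sketch joins them into a tag string, which you can compare against the tags NVIDIA actually publishes. The `image_tag` function is hypothetical and only demonstrates the format.

```shell
# Hypothetical helper: join the four Helm options into an image tag using
# the format {driverVersion}-{OSName}{OSVer}-{Arch}.
image_tag() {
  printf '%s-%s%s-%s\n' "$1" "$2" "$3" "$4"
}

image_tag "24.04-0.6.6.0-0" "ubuntu" "22.04" "amd64"
# prints 24.04-0.6.6.0-0-ubuntu22.04-amd64
```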

Refer to the list of image tags available from NVIDIA.

For example:

  • 24.04-0.6.6.0-0-ubuntu20.04-amd64
  • 24.04-0.6.6.0-0-ubuntu22.04-amd64
  • 24.04-0.6.6.0-0-ubuntu24.04-amd64
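Going the other way, a published tag can be split back into the four Helm options. This is a sketch under the assumption that OSName is lowercase letters only, OSVer is digits and dots, and Arch is the last "-"-separated field; `tag_to_helm_sets` is a hypothetical name.

```shell
# Hypothetical helper: turn a tag such as "24.04-0.6.6.0-0-ubuntu22.04-amd64"
# into the matching --set flags. sed prints only when the whole tag matches.
tag_to_helm_sets() {
  local out
  out=$(printf '%s\n' "$1" | sed -nE 's/^(.+)-([a-z]+)([0-9.]+)-([a-z0-9_]+)$/--set image.driverVersion=\1 --set image.OSName=\2 --set image.OSVer=\3 --set image.Arch=\4/p')
  if [ -z "$out" ]; then
    echo "unrecognized tag: $1" >&2
    return 1
  fi
  printf '%s\n' "$out"
}

# tag_to_helm_sets 24.04-0.6.6.0-0-ubuntu22.04-amd64
```

The output can be pasted into the helm install command; the greedy first group is safe here because only one split of the tag satisfies the whole pattern.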

Install

helm repo add spiderchart https://spidernet-io.github.io/charts
helm repo update
helm search repo ofed-driver

# for users in China, add `--set image.registry=nvcr.m.daocloud.io`
helm install ofed-driver spiderchart/ofed-driver -n kube-system \
    --set image.OSName="ubuntu" \
    --set image.OSVer="22.04" \
    --set image.Arch="amd64"
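Since the OS options must match the nodes, one way to fill them in is to read a node's /etc/os-release and `uname -m`. This sketch assumes the image tag uses the os-release ID and VERSION_ID as OSName and OSVer, and that x86_64 maps to amd64 and aarch64 to arm64; `node_image_sets` is hypothetical.

```shell
# Hypothetical helper: derive --set flags for image.OSName/OSVer/Arch from a
# node's os-release file and machine name (defaults: this host).
node_image_sets() {
  local os_release="${1:-/etc/os-release}" machine="${2:-$(uname -m)}"
  local os_id os_ver arch
  # os-release is shell-compatible, so source it in a subshell to read
  # ID and VERSION_ID without leaking its variables into this shell.
  os_id=$(. "$os_release" && printf '%s' "$ID")
  os_ver=$(. "$os_release" && printf '%s' "$VERSION_ID")
  # Assumed mapping from kernel machine name to the tag's Arch suffix.
  case "$machine" in
    x86_64) arch=amd64 ;;
    aarch64) arch=arm64 ;;
    *) arch="$machine" ;;
  esac
  printf '%s\n' "--set image.OSName=$os_id --set image.OSVer=$os_ver --set image.Arch=$arch"
}

# On a representative node:
# helm install ofed-driver spiderchart/ofed-driver -n kube-system $(node_image_sets)
```

The command substitution is left unquoted on purpose so the flags split into separate arguments; this only works if all target nodes run the same OS release.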

Note: the pod runs apt-get to install packages online, so you may need to configure a proxy as follows:

cat<<EOF > values.yaml
image:
  OSName: "ubuntu"
  OSVer: "22.04"
  Arch: "amd64"

extraEnv:
  - name: HTTPS_PROXY 
    value: "http://<example.proxy.com:port>"
  - name: HTTP_PROXY
    value: "http://<example.proxy.com:port>"
  - name: https_proxy
    value: "http://<example.proxy.com:port>"
  - name: http_proxy
    value: "http://<example.proxy.com:port>"
EOF

helm install ofed-driver spiderchart/ofed-driver -n kube-system -f values.yaml
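To avoid typing the proxy URL four times, the values.yaml shown above can be generated from a single URL. This is a sketch; `gen_proxy_values` is a hypothetical helper, and its default image options simply repeat the ubuntu/22.04/amd64 values from the example.

```shell
# Hypothetical generator: print a values.yaml with the image options and the
# four proxy variables (upper- and lower-case) all set to one URL.
gen_proxy_values() {
  local proxy="$1" os_name="${2:-ubuntu}" os_ver="${3:-22.04}" arch="${4:-amd64}"
  local name
  printf 'image:\n  OSName: "%s"\n  OSVer: "%s"\n  Arch: "%s"\n\nextraEnv:\n' \
    "$os_name" "$os_ver" "$arch"
  for name in HTTPS_PROXY HTTP_PROXY https_proxy http_proxy; do
    printf '  - name: %s\n    value: "%s"\n' "$name" "$proxy"
  done
}

# gen_proxy_values "http://proxy.example.com:3128" > values.yaml
```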

# the OFED driver is installed once the pod is ready (READY 1/1); 0/1 means the driver is still being built
kubectl get pod -n kube-system 
    kube-system      mofed-ubuntu-24.04-ds-lsprx                                       0/1     Running            0          3m54s
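Because building the driver can take several minutes, a script may need to test readiness from the `kubectl get pod` output. This sketch assumes the driver pod's name contains "mofed", as in the example above, and locates the READY column as the first field shaped like n/m, so output with or without a NAMESPACE column both work. `mofed_ready` is hypothetical.

```shell
# Hypothetical check: read `kubectl get pod` output on stdin and succeed only
# if at least one pod named *mofed* exists and every such pod is READY n/n.
mofed_ready() {
  awk '
    /mofed/ {
      found = 1
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^[0-9]+\/[0-9]+$/) {
          split($i, r, "/")
          if (r[1] != r[2]) notready = 1
          break
        }
      }
    }
    END { if (found && !notready) exit 0; exit 1 }
  '
}

# kubectl get pod -n kube-system --no-headers | mofed_ready && echo "OFED driver ready"
```

If you know the pod name, `kubectl wait --for=condition=Ready pod/<name> -n kube-system --timeout=30m` blocks until the pod is ready instead of polling.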

Once the driver is ready, the mlx5_core module is loaded on the node:

~# lsmod | grep -i mlx5_core
mlx5_core            2068480  1 mlx5_ib
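For scripted checks, the same information can be read from /proc/modules, the file lsmod itself reads. Matching the first column exactly avoids hits on lines where the name appears only in the "used by" column. `module_loaded` is a hypothetical helper; its optional second argument points at a copy of the file for testing.

```shell
# Hypothetical helper: succeed if the named module appears in the first
# column of /proc/modules (or of the file given as the second argument).
module_loaded() {
  local module="$1" modules_file="${2:-/proc/modules}"
  awk -v m="$module" '$1 == m { found = 1 } END { if (found) exit 0; exit 1 }' "$modules_file"
}

# module_loaded mlx5_core && echo "mlx5_core is loaded"
```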

Refer to the NVIDIA documentation and environment configuration for more details.