Add orchestrator type specific spec for pods created by the operator #201

Merged
merged 7 commits into NVIDIA:main on Nov 4, 2024

Conversation

shivamerla
Collaborator

For example, seccompProfile is required on TKGS, but it is not supported on OCP with the nonroot SCC.

slu2011 previously approved these changes Oct 30, 2024
@shivamerla shivamerla marked this pull request as draft October 31, 2024 17:39
@shivamerla shivamerla marked this pull request as ready for review October 31, 2024 18:50
@shivamerla
Collaborator Author

On TKGS, pod creation fails if the seccomp profile is not set:

Warning ReconcileFailed 2s (x13 over 23s) nimcache-controller NIMCache nimcache1 reconcile failed, msg: pods "nimcache1-pod" is forbidden: violates PodSecurity "restricted:latest": seccompProfile (pod or container "nim-cache" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

However, OCP rejects the pod when a seccomp profile is set:

ReconcileFailed 1s (x9 over 2s) nimcache-controller NIMCache meta-llama3-8b-instruct-a100-tp1 reconcile failed, msg: pods "meta-llama3-8b-instruct-a100-tp1-pod" is forbidden: unable to validate against any security context constraint: [pod.metadata.annotations[seccomp.security.alpha.kubernetes.io/pod]: Forbidden: seccomp may not be set, pod.metadata.annotations[container.seccomp.security.alpha.kubernetes.io/nim-cache-ctr]: Forbidden: seccomp may not be set,
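
The two errors above pull in opposite directions, so the pod spec has to depend on the detected orchestrator. A minimal sketch of that idea follows. It is not the code merged in this PR: `OrchestratorType`, its constants, `seccompProfileFor`, and `podSecurityContextFor` are hypothetical names, the structs only mirror the relevant fields of the Kubernetes `core/v1` types, and leaving the profile unset on every orchestrator other than TKGS is an assumption.

```go
package main

import "fmt"

// OrchestratorType is a hypothetical enum for this sketch; the operator's
// own detection package may use different names and values.
type OrchestratorType string

const (
	TKGS      OrchestratorType = "TKGS"
	OpenShift OrchestratorType = "OpenShift"
	Generic   OrchestratorType = "Kubernetes"
)

// SeccompProfile and PodSecurityContext mirror only the fields of the
// Kubernetes core/v1 types that this sketch needs.
type SeccompProfile struct {
	Type string
}

type PodSecurityContext struct {
	RunAsNonRoot   *bool
	SeccompProfile *SeccompProfile
}

// seccompProfileFor returns the seccomp profile to set for an orchestrator.
// TKGS enforces the "restricted" PodSecurity level, which requires
// RuntimeDefault (or Localhost). OCP with the nonroot SCC forbids setting
// seccomp, so the field stays nil there. Treating every other orchestrator
// like OCP (nil) is an assumption of this sketch.
func seccompProfileFor(o OrchestratorType) *SeccompProfile {
	switch o {
	case TKGS:
		return &SeccompProfile{Type: "RuntimeDefault"}
	default:
		return nil
	}
}

// podSecurityContextFor builds a pod-level security context whose seccomp
// setting depends on the orchestrator.
func podSecurityContextFor(o OrchestratorType) *PodSecurityContext {
	nonRoot := true
	return &PodSecurityContext{
		RunAsNonRoot:   &nonRoot,
		SeccompProfile: seccompProfileFor(o),
	}
}

func main() {
	for _, o := range []OrchestratorType{TKGS, OpenShift, Generic} {
		sc := podSecurityContextFor(o)
		if sc.SeccompProfile == nil {
			fmt.Printf("%s: seccompProfile unset\n", o)
			continue
		}
		fmt.Printf("%s: seccompProfile.type=%s\n", o, sc.SeccompProfile.Type)
	}
}
```

Keeping the decision in one small function means each controller (NIMCache, NIMService, and so on) can call it when building pods instead of repeating the orchestrator check.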

Collaborator

@visheshtanksale visheshtanksale left a comment

Some minor comments

internal/controller/nimcache_controller.go (outdated, resolved)
internal/controller/nimservice_controller.go (outdated, resolved)
internal/controller/nemo_guardrail_controller.go (outdated, resolved)
internal/controller/nimcache_controller.go (outdated, resolved)
slu2011 previously approved these changes Nov 4, 2024
shivamerla and others added 6 commits November 4, 2024 09:49
For e.g. seccompprofile is a must for TKGS while not supported on OCP with the nonroot SCC

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Signed-off-by: Shiva Krishna, Merla <[email protected]>
@shivamerla shivamerla merged commit a0ecace into NVIDIA:main Nov 4, 2024
9 checks passed
visheshtanksale added a commit to visheshtanksale/k8s-nim-operator that referenced this pull request Nov 6, 2024
…VIDIA#201)

* Add orchestrator type specific spec for pods created by the operator

For e.g. seccompprofile is a must for TKGS while not supported on OCP with the nonroot SCC

Signed-off-by: Shiva Krishna, Merla <[email protected]>

---------

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Co-authored-by: Vishesh Tanksale <[email protected]>
visheshtanksale added a commit to visheshtanksale/k8s-nim-operator that referenced this pull request Nov 6, 2024
…VIDIA#201)

* Add orchestrator type specific spec for pods created by the operator

For e.g. seccompprofile is a must for TKGS while not supported on OCP with the nonroot SCC

Signed-off-by: Shiva Krishna, Merla <[email protected]>

---------

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Co-authored-by: Vishesh Tanksale <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>
shivamerla added a commit that referenced this pull request Nov 6, 2024
* Update service type based on the CR spec (#193)

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>

* Allow users to specify runtimeclass name for NIM deployments (#194)

* Allow users to specify runtimeclass name for NIM deployments

Signed-off-by: Shiva Krishna, Merla <[email protected]>

* Avoid runtimeclass name for caching as model can be downloaded on a non-gpu node

Signed-off-by: Shiva Krishna, Merla <[email protected]>

---------

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>

* Remove deprecated rbac proxy container for metrics (#192)

* Deprecate kube-rbac-proxy and use controller-runtimes in-built feature for authn/authz for metrics
Reference: https://book.kubebuilder.io/reference/metrics

Signed-off-by: Shiva Krishna, Merla <[email protected]>

* vendor dependencies

Signed-off-by: Shiva Krishna, Merla <[email protected]>

---------

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>

* Add utility package to detect underlying container orchestrator type (#197)

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>

* Add common metadata required for RedHat certification (#196)

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>

* Refactor custom CA cert injection (#204)

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>

* Allow custom env to be setup for caching jobs (#203)

This is useful in following cases
* To provide proxy env variables https_proxy etc, for caching
* For custom model pullers, these can control the behavior of caching job

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>

* Add orchestrator type specific spec for pods created by the operator (#201)

* Add orchestrator type specific spec for pods created by the operator

For e.g. seccompprofile is a must for TKGS while not supported on OCP with the nonroot SCC

Signed-off-by: Shiva Krishna, Merla <[email protected]>

---------

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Co-authored-by: Vishesh Tanksale <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>

* Add Helm pre-upgrade hook for automatic upgrade to latest CRDs (#208)

* Add Helm pre-upgrade hook for automatic upgrade to latest CRDs

Signed-off-by: Shiva Krishna, Merla <[email protected]>

* Add security context for the upgrade hook

Signed-off-by: Shiva Krishna, Merla <[email protected]>

---------

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>

* Fix entrypoint for the upgrade hook to avoid invoking shell (#209)

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>

* Removing NGC Home path and HF_HOME env (#202) (#211)

Signed-off-by: Vishesh Tanksale <[email protected]>

---------

Signed-off-by: Shiva Krishna, Merla <[email protected]>
Signed-off-by: Vishesh Tanksale <[email protected]>
Co-authored-by: Shiva Krishna Merla <[email protected]>
3 participants