[RayCluster][Feature] add redis username to head pod from GcsFaultToleranceOptions #2760

Merged
merged 3 commits into from
Jan 21, 2025

Conversation

win5923
Contributor

@win5923 win5923 commented Jan 16, 2025

Why are these changes needed?

This PR addresses the following selected part:
[screenshot omitted]

Currently, the example YAML file for GCS FT uses Redis 5.0.9, which does not support ACLs (they were introduced in Redis 6.0). As a result, setting redisUsername causes a CrashLoopBackOff.

Ref: https://redis.io/docs/latest/operate/oss_and_stack/management/security/acl/
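
A minimal sketch of the version constraint described above, assuming plain `major.minor.patch` image tags. The helper name `supports_acl` is hypothetical and is not part of KubeRay or any Redis client; it only encodes the fact that ACL users first appeared in Redis 6.0.

```python
def supports_acl(version: str) -> bool:
    """Return True if a Redis version string is 6.0 or newer.

    Hypothetical helper for illustration; assumes tags such as "7.4.2".
    """
    major = int(version.split(".")[0])
    return major >= 6


if __name__ == "__main__":
    # The old example image (5.0.9) and the one used in the manual test (7.4.2).
    for tag in ("5.0.9", "7.4.2"):
        print(tag, "supports ACL" if supports_acl(tag) else "does not support ACL")
```

A check like this could be run against the image tag before setting redisUsername, so the mismatch shows up before the pod starts crashing.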

Manual Tests

kind: ConfigMap
apiVersion: v1
metadata:
  name: redis-config
  labels:
    app: redis
data:
  redis.conf: |-
    dir /data
    port 6379
    bind 0.0.0.0
    appendonly yes
    protected-mode no
    requirepass 5241590000000000
    pidfile /data/redis-6379.pid

    user username on >5241590000000000 ~* +@all
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    app: redis
spec:
  type: ClusterIP
  ports:
    - name: redis
      port: 6379
  selector:
    app: redis
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  labels:
    app: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.4.2  # Use a version that supports ACL (Redis 6.0 or later).
          command:
            - "sh"
            - "-c"
            - "redis-server /usr/local/etc/redis/redis.conf"
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: config
              mountPath: /usr/local/etc/redis/redis.conf
              subPath: redis.conf
      volumes:
        - name: config
          configMap:
            name: redis-config
---
# Redis username and password
apiVersion: v1
kind: Secret
metadata:
  name: redis-password-secret
type: Opaque
data:
  # echo -n "username" | base64
  # echo -n "5241590000000000" | base64
  username: dXNlcm5hbWU=
  password: NTI0MTU5MDAwMDAwMDAwMA==
---
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-external-redis-3
spec:
  rayVersion: 'nightly'
  gcsFaultToleranceOptions:
    redisAddress: redis:6379
    redisUsername:
      valueFrom:
        secretKeyRef:
          name: redis-password-secret
          key: username
    redisPassword:
      valueFrom:
        secretKeyRef:
          name: redis-password-secret
          key: password
  headGroupSpec:
    rayStartParams:
      num-cpus: "0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:nightly
            resources:
              limits:
                cpu: "1"
              requests:
                cpu: "1"
            ports:
              - containerPort: 6379
                name: redis
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
            volumeMounts:
              - mountPath: /tmp/ray
                name: ray-logs
              - mountPath: /home/ray/samples
                name: ray-example-configmap
        volumes:
          - name: ray-logs
            emptyDir: {}
          - name: ray-example-configmap
            configMap:
              name: ray-example
              defaultMode: 0777
              items:
                - key: detached_actor.py
                  path: detached_actor.py
                - key: increment_counter.py
                  path: increment_counter.py
  workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 10
      groupName: small-group
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:nightly
              volumeMounts:
                - mountPath: /tmp/ray
                  name: ray-logs
              resources:
                limits:
                  cpu: "1"
                requests:
                  cpu: "1"
          volumes:
            - name: ray-logs
              emptyDir: {}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-example
data:
  detached_actor.py: |
    import ray

    @ray.remote(num_cpus=1)
    class Counter:
      def __init__(self):
          self.value = 0

      def increment(self):
          self.value += 1
          return self.value

    ray.init(namespace="default_namespace")
    Counter.options(name="counter_actor", lifetime="detached").remote()
  increment_counter.py: |
    import ray

    ray.init(namespace="default_namespace")
    counter = ray.get_actor("counter_actor")
    print(ray.get(counter.increment.remote()))

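
The manifests above only work if the base64 values in the Secret decode to the same username and password that the `user ...` line in redis.conf defines. The following sketch checks that with values copied from the manifests; the helper names `decode_secret` and `parse_acl_user` are made up for illustration, and the parser only handles the simple rule shape used here.

```python
import base64

# Values copied from the Secret manifest above.
secret_data = {
    "username": "dXNlcm5hbWU=",
    "password": "NTI0MTU5MDAwMDAwMDAwMA==",
}

# The ACL rule copied from redis.conf above.
acl_user_line = "user username on >5241590000000000 ~* +@all"


def decode_secret(data: dict) -> dict:
    """Decode every base64 value of a Secret's data map to text."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


def parse_acl_user(line: str) -> tuple:
    """Extract (username, password) from a simple 'user <name> ... ><password> ...' rule."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "user":
        raise ValueError("not an ACL user rule")
    passwords = [tok[1:] for tok in tokens[2:] if tok.startswith(">")]
    if not passwords:
        raise ValueError("no plaintext password in rule")
    return tokens[1], passwords[0]


if __name__ == "__main__":
    decoded = decode_secret(secret_data)
    name, password = parse_acl_user(acl_user_line)
    print("username matches:", decoded["username"] == name)
    print("password matches:", decoded["password"] == password)
```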

Related issue number

Resolves #2720

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@win5923 win5923 marked this pull request as draft January 16, 2025 14:38
@win5923 win5923 force-pushed the redis/username branch 3 times, most recently from b786001 to 4793752 Compare January 16, 2025 15:07
@win5923 win5923 marked this pull request as ready for review January 16, 2025 16:24
@win5923
Contributor Author

win5923 commented Jan 16, 2025

@rueian PTAL when you are free

@kevin85421
Member

@win5923 would you mind rebasing with the master branch?

@win5923 win5923 force-pushed the redis/username branch 2 times, most recently from 2fceba0 to 833e95f Compare January 17, 2025 14:32
@win5923
Contributor Author

win5923 commented Jan 17, 2025

@kevin85421 Done! PTAL

@kevin85421 kevin85421 self-assigned this Jan 17, 2025
Member

@kevin85421 kevin85421 left a comment

@win5923 would you mind resolving the conflict? Thanks!

@@ -137,6 +148,15 @@ func configureGCSFaultTolerance(podTemplate *corev1.PodTemplateSpec, instance ra
})
}
} else {
// If users directly set the `redis-username` in `rayStartParams` instead of referring
Member

I am considering not adding any specific logic for redis-username (i.e. remove the change from L151 to L159) in the old configurations to avoid maintenance costs and to provide an incentive for users to migrate to the new API, GcsFaultToleranceOptions. HDYT @rueian?

Contributor

ok, then should we make the incentive clear in the validateRayClusterSpec?
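
To make the suggestion concrete, here is one shape such a check could take, written in Python for illustration only. It is not the Go code in KubeRay's validateRayClusterSpec; the function name `validate_redis_username` and the error wording are assumptions.

```python
from typing import Optional


def validate_redis_username(spec: dict) -> Optional[str]:
    """Return an error message if redis-username is set through rayStartParams.

    Hypothetical check: it points users at gcsFaultToleranceOptions instead of
    supporting the older rayStartParams-based configuration.
    """
    params = spec.get("headGroupSpec", {}).get("rayStartParams", {})
    if "redis-username" in params:
        return (
            "redis-username in rayStartParams is not supported; "
            "use gcsFaultToleranceOptions.redisUsername instead"
        )
    return None


if __name__ == "__main__":
    old_style = {"headGroupSpec": {"rayStartParams": {"redis-username": "username"}}}
    new_style = {
        "gcsFaultToleranceOptions": {"redisUsername": {"value": "username"}},
        "headGroupSpec": {"rayStartParams": {"num-cpus": "0"}},
    }
    print(validate_redis_username(old_style))
    print(validate_redis_username(new_style))
```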

ray-operator/controllers/ray/common/pod.go
@@ -334,12 +336,26 @@ func TestConfigureGCSFaultToleranceWithAnnotations(t *testing.T) {
redisPasswordEnv: "test-password",
isHeadPod: true,
},
{
Member

Please add tests in TestConfigureGCSFaultToleranceWithGcsFTOptions too.

Contributor Author

Done.

@win5923 win5923 force-pushed the redis/username branch 5 times, most recently from c72577c to 5cadced Compare January 20, 2025 13:07
@win5923 win5923 force-pushed the redis/username branch 3 times, most recently from 20f602f to f7080fe Compare January 20, 2025 13:15
@win5923 win5923 requested review from kevin85421 and rueian January 20, 2025 13:25
@win5923 win5923 force-pushed the redis/username branch 5 times, most recently from 398357c to b5acc94 Compare January 21, 2025 01:23
Signed-off-by: win5923 <[email protected]>
@win5923 win5923 requested a review from rueian January 21, 2025 01:39
@kevin85421 kevin85421 merged commit 0055bf3 into ray-project:master Jan 21, 2025
24 checks passed
@win5923 win5923 deleted the redis/username branch January 21, 2025 04:42
instance.Spec.HeadGroupSpec.RayStartParams["redis-username"] = "$REDIS_USERNAME"
container.Env = append(container.Env, corev1.EnvVar{
Name: utils.REDIS_USERNAME,
Value: options.RedisUsername.Value,
Collaborator

I think only one of Value or ValueFrom can be set, not both

Collaborator

(same for redis password)
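
The constraint raised in this comment can be sketched as follows. This is a Python illustration using plain dicts shaped like a Kubernetes EnvVar, not the merged Go code; `build_redis_username_env` is a hypothetical helper, the variable name `REDIS_USERNAME` is assumed from the `$REDIS_USERNAME` start param in the snippet above, and rejecting the "both set" case is one possible policy.

```python
def build_redis_username_env(option: dict) -> dict:
    """Build an EnvVar-shaped dict that carries either value or valueFrom, never both.

    `option` mirrors the redisUsername field: {"value": ...} or {"valueFrom": ...}.
    """
    value = option.get("value")
    value_from = option.get("valueFrom")
    if value and value_from:
        # Kubernetes does not allow both fields on a single EnvVar.
        raise ValueError("set only one of value or valueFrom")

    env = {"name": "REDIS_USERNAME"}  # assumed env var name
    if value_from:
        env["valueFrom"] = value_from
    else:
        env["value"] = value or ""
    return env


if __name__ == "__main__":
    from_secret = {
        "valueFrom": {
            "secretKeyRef": {"name": "redis-password-secret", "key": "username"}
        }
    }
    print(build_redis_username_env(from_secret))
    print(build_redis_username_env({"value": "username"}))
```

The same shape would apply to the Redis password option.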

Development

Successfully merging this pull request may close these issues.

[RayCluster][Feature] add GcsFaultToleranceOptions to the RayCluster CRD [2/N]
4 participants