
After applying MeshTcpRoute health checks disappear and traffic starts to be routed to unhealthy service #11237

Open
slonka opened this issue Aug 28, 2024 · 1 comment
Labels
kind/bug A bug triage/accepted The issue was reviewed and is complete enough to start working on it

Comments

@slonka
Contributor

slonka commented Aug 28, 2024

What happened?

Coming from slack: https://kuma-mesh.slack.com/archives/CN2GN4HE1/p1724414616702329

Yes, these are my policies:
type: MeshHealthCheck
name: web-to-redis-check
mesh: default
spec:
  targetRef:
    kind: Mesh
  to:
    - targetRef:
        kind: MeshService
        name: redis_kuma-demo_svc_6379
      default:
        interval: 10s
        timeout: 2s
        unhealthyThreshold: 3
        healthyThreshold: 1
        tcp:
          send: "ping\n"
          receive: [PONG]
        http:
          disabled: true
---
type: MeshTCPRoute
name: tcp-route-1
mesh: default
spec:
  targetRef:
    kind: MeshService
    name: demo-app_kuma-demo_svc_5000
  to:
    - targetRef:
        kind: MeshService
        name: redis_kuma-demo_svc_6379
      rules:
        - default:
            backendRefs:
              - kind: MeshServiceSubset
                name: redis_kuma-demo_svc_6379
                tags:
                  kuma.io/zone: zone-1
                weight: 90
              - kind: MeshServiceSubset
                name: redis_kuma-demo_svc_6379
                tags:
                  kuma.io/zone: zone-2
                weight: 10
---
type: MeshTrafficPermission
name: allow-all
mesh: default
spec:
  targetRef:
    kind: Mesh
  from:
  - targetRef:
      kind: Mesh
    default:
      action: Allow
---
type: Mesh
name: default
mtls:
  enabledBackend: ca-1
  backends:
    - name: ca-1
      type: builtin
      dpCert:
        rotation:
          expiration: 1d
      conf:
        caCert:
          RSAbits: 2048
          expiration: 10y
How to reproduce

Setup

Kuma version 2.8.2
K8s cluster 1.25

  1. The global control plane ran with docker compose; the app and the database were in separate containers but in one compose file
  2. The zone-1 control plane was configured in universal mode on the same host, together with an ingress and a dataplane proxy
  3. The zone-2 control plane ran in one cluster with the test application, installed with kumactl

Set up the global control plane

Dockerfile:

Dockerfile
FROM debian:12

ARG KUMA_VER=2.8.2

RUN apt-get update && apt-get install -y curl \
    && curl -L https://kuma.io/installer.sh | VERSION=$KUMA_VER sh - \
    && mv kuma-${KUMA_VER} /kuma

WORKDIR /kuma

docker-compose.yaml:

---
version: '2'

services:
  postgres:
    image: postgres:16-alpine
    ports:
      - 5432:5432
    volumes:
      - ./apps/postgres:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=kuma_password
      - POSTGRES_USER=kuma_user
      - POSTGRES_DB=kuma_db
  kuma-global-cp:
    depends_on:
      - postgres
    build:
      context: .
      dockerfile: Dockerfile
    command: ["bin/kuma-cp", "run"]
    network_mode: host
    volumes:
      - ./apps/.kuma:/root/.kuma
    ports:
      - 5680:5680
      - 5681:5681
      - 5682:5682
      - 8443:443
      - 5676:5676
      - 5678:5678
      - 5685:5685
    environment:
      - "KUMA_MODE=global"
      - "KUMA_ENVIRONMENT=universal"
      - "KUMA_STORE_TYPE=postgres"
      - "KUMA_STORE_POSTGRES_HOST=127.0.0.1"
      - "KUMA_STORE_POSTGRES_PORT=5432"
      - "KUMA_STORE_POSTGRES_USER=kuma_user"
      - "KUMA_STORE_POSTGRES_PASSWORD=kuma_password"
      - "KUMA_STORE_POSTGRES_DB_NAME=kuma_db"

and run it

$ docker compose up -d 
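
As a quick sanity check, the global control plane's HTTP API should answer on port 5681 (a sketch, assuming the default API port is reachable on the host):

bash
$ curl -s http://127.0.0.1:5681/ | head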

Set up the control plane for zone-1 in universal mode

---
version: '2'
services:
  postgres:
    image: postgres:16-alpine
    ports:
      - 5432:5432
    volumes:
      - ./apps/postgres:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=kuma_password
      - POSTGRES_USER=kuma_user
      - POSTGRES_DB=kuma_db

and run it

$ docker compose up -d

Install Kuma following the documentation:

bash
$ curl -L https://kuma.io/installer.sh | VERSION=2.8.2 sh -
$ cd kuma-2.8.2/bin && export PATH=$(pwd):$PATH

Save the following script to start-zone.sh:

bash 
#!/bin/bash

export KUMA_MODE=zone
export KUMA_MULTIZONE_ZONE_NAME=zone-1
export KUMA_ENVIRONMENT=universal
export KUMA_STORE_TYPE=postgres
export KUMA_STORE_POSTGRES_HOST=127.0.0.1
export KUMA_STORE_POSTGRES_PORT=5432
export KUMA_STORE_POSTGRES_USER=kuma_user
export KUMA_STORE_POSTGRES_PASSWORD=kuma_password
export KUMA_STORE_POSTGRES_DB_NAME=kuma_db
export KUMA_MULTIZONE_ZONE_GLOBAL_ADDRESS=grpcs://<kuma-global-cp>:5685
export KUMA_MULTIZONE_ZONE_KDS_TLS_SKIP_VERIFY=true

kuma-cp migrate up
kuma-cp run

and start it

$ ./start-zone.sh
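
Optionally, verify that zone-1 has registered with the global control plane (a sketch; it assumes kumactl is configured against the global control plane API on port 5681):

bash
$ kumactl get zones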

Start ingress in zone-1

Generate a zone token in the kuma-global-cp container:

bash
$ kumactl generate zone-token --zone=zone-1 --valid-for=87600h --scope egress --scope ingress > /tmp/zone-token

and save it to a file (zone-token) on the instance where the ingress will be started.

Prepare the dataplane proxy configuration (ingress-dp.yaml):

type: ZoneIngress
name: zone-1
networking:
  address: <host ip>
  port: 10000
  advertisedAddress: <host ip> # if you want to use a load balancer in front of the zone ingress proxy, put its address here
  advertisedPort: 10000

Start the dataplane proxy in ingress mode:

bash 
kuma-dp run --proxy-type=ingress \
--cp-address=https://<kuma-zone-1-cp>:5678 \
--dataplane-token-file=zone-token \
--dataplane-file=ingress-dp.yaml

Set up the dataplane proxy for redis (redis-dp.yaml)

Install redis as usual for your distribution and start it on 127.0.0.1:6379; an example install is sketched below.
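
For example, on a Debian-based host (an assumption; any install method works as long as redis listens on 127.0.0.1:6379):

bash
$ sudo apt-get install -y redis-server
$ redis-cli -h 127.0.0.1 -p 6379 ping   # should reply PONG

Then describe it with the following Dataplane resource (redis-dp.yaml):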

type: Dataplane
mesh: default
name: redis-1
networking:
  address: 127.0.0.1
  inbound:
  - port: 9000
    servicePort: 6379
    tags:
      kuma.io/service: redis_kuma-demo_svc_6379
      kuma.io/protocol: tcp
  admin:
    port: 9902

Generate a token for the dataplane on the kuma-global-cp:

bash
$ kumactl generate dataplane-token --name redis-1 --valid-for=87600h > redis-dp-token

start it

bash
./kuma-dp run \
--cp-address=https://<kuma-zone-1-cp>:5678 \
--dataplane-file=redis-dp.yaml \
--dataplane-token-file=redis-dp-token
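
The redis dataplane should now show up as Online (a sketch; it assumes kumactl is configured against the zone-1 or global control plane):

bash
$ kumactl inspect dataplanes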

Set up the control plane for zone-2 in k8s mode

In my case, the k8s cluster was started with minikube:

 bash
$ minikube start --cpus 2 --memory 6144 --kubernetes-version v1.25 -p zone-2

Install Kuma on k8s with kumactl, following the documentation:

bash
./kumactl install control-plane \
  --set "controlPlane.mode=zone" \
  --set "controlPlane.zone=zone-2" \
  --set "ingress.enabled=true" \
  --set "controlPlane.kdsGlobalAddress=grpcs://<kuma-global-cp>:5685" \
  --set "controlPlane.tls.kdsZoneClient.skipVerify=true" \
  | kubectl apply -f -
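
Before deploying the app, it can help to wait until the zone-2 control plane and ingress pods are ready (a plain-kubectl sketch):

bash
$ kubectl wait --for=condition=Ready pods --all -n kuma-system --timeout=120s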

Save to demo-app.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: kuma-demo
  labels:
    kuma.io/sidecar-injection: enabled
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: kuma-demo
spec:
  selector:
    matchLabels:
      app: redis
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: "redis"
          ports:
            - name: tcp
              containerPort: 6379
          lifecycle:
            preStop: # delay shutdown to support graceful mesh leave
              exec:
                command: ["/bin/sleep", "30"]
            postStart:
              exec:
                command: ["/usr/local/bin/redis-cli", "set", "zone", "local"]
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: kuma-demo
spec:
  selector:
    app: redis
  ports:
  - protocol: TCP
    port: 6379
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
  namespace: kuma-demo
spec:
  selector:
    matchLabels:
      app: demo-app
  replicas: 1
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: "kumahq/kuma-demo"
          env:
            - name: REDIS_HOST
              value: "redis_kuma-demo_svc_6379.mesh"
            - name: REDIS_PORT
              value: "80"
            - name: APP_VERSION
              value: "1.0"
            - name: APP_COLOR
              value: "#efefef"
          ports:
            - name: http
              containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: demo-app
  namespace: kuma-demo
spec:
  selector:
    app: demo-app
  ports:
  - protocol: TCP
    appProtocol: http
    port: 5000

install it:

bash
$ kubectl apply -f demo-app.yaml

Set up mesh policies

  1. Cross-zone traffic requires mTLS. Save the following to mesh-builtin-ca.yaml on the kuma-global-cp instance:
type: Mesh
name: default
mtls:
  enabledBackend: ca-1
  backends:
    - name: ca-1
      type: builtin
      dpCert:
        rotation:
          expiration: 1d
      conf:
        caCert:
          RSAbits: 2048
          expiration: 10y

apply it:

$ kumactl apply -f mesh-builtin-ca.yaml
  2. Just for testing, allow all traffic. Save this YAML to allow-all-traffic.yaml:
type: MeshTrafficPermission
name: allow-all
mesh: default
spec:
  targetRef:
    kind: Mesh
  from:
  - targetRef:
      kind: Mesh
    default:
      action: Allow

apply it:

$ kumactl apply -f allow-all-traffic.yaml
  3. To balance redis traffic between zone-1 and zone-2, save the following to meshtcproute.yaml:
type: MeshTCPRoute
name: tcp-route-1
mesh: default
spec:
  targetRef:
    kind: MeshService
    name: demo-app_kuma-demo_svc_5000
  to:
    - targetRef:
        kind: MeshService
        name: redis_kuma-demo_svc_6379
      rules:
        - default:
            backendRefs:
              - kind: MeshServiceSubset
                name: redis_kuma-demo_svc_6379
                tags:
                  kuma.io/zone: zone-1
                weight: 90
              - kind: MeshServiceSubset
                name: redis_kuma-demo_svc_6379
                tags:
                  kuma.io/zone: zone-2
                weight: 10

apply it:

$ kumactl apply -f meshtcproute.yaml
  4. Add a MeshHealthCheck and save it to meshhealthcheck.yaml:
type: MeshHealthCheck
name: web-to-redis-check
mesh: default
spec:
  targetRef:
    kind: Mesh
  to:
    - targetRef:
        kind: MeshService
        name: redis_kuma-demo_svc_6379
      default:
        interval: 10s
        timeout: 2s
        unhealthyThreshold: 3
        healthyThreshold: 1
        tcp:
          send: "ping\n"
          receive: [PONG]
        http:
          disabled: true
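
apply it:

$ kumactl apply -f meshhealthcheck.yaml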

Test app

Forward the app port from minikube to the host:

$ kubectl -n kuma-demo port-forward service/demo-app --address 0.0.0.0 5000

and test it in a browser:

http://127.0.0.1:5000/

Now stop redis in zone-1 behind the dataplane proxy. The application continues to send traffic to the unavailable service.
I expected the unhealthy service to be skipped.
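
One way to see what the proxy actually does (a sketch; it assumes the sidecar's Envoy admin interface listens on its default port 9901) is to dump the demo-app sidecar's clusters and check the health flags of the redis endpoints (endpoints ejected by the active health check carry /failed_active_hc):

bash
$ kubectl -n kuma-demo port-forward deploy/demo-app 9901:9901 &
$ curl -s http://127.0.0.1:9901/clusters | grep redis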

@slonka slonka added triage/pending This issue will be looked at on the next triage meeting kind/bug A bug triage/needs-reproducing Someone else should try to reproduce this labels Aug 28, 2024
@jakubdyszkiewicz jakubdyszkiewicz added triage/accepted The issue was reviewed and is complete enough to start working on it and removed triage/pending This issue will be looked at on the next triage meeting triage/needs-reproducing Someone else should try to reproduce this labels Sep 2, 2024
@lukidzi
Contributor

lukidzi commented Sep 4, 2024

I think the issue arises because when you create a MeshTCPRoute, you end up with two separate clusters. The routing configuration happens at the route level, and both route and cluster selection occur there. While health checking seems to work, the traffic unfortunately still goes to the cluster with the unhealthy endpoint. If all endpoints were in the same cluster, everything should work as expected.
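
For comparison, a route that does not split redis into per-zone subsets keeps all endpoints in a single cluster, where active health checking can eject the unhealthy ones; a sketch (the policy name is illustrative, and cross-zone weighting is then no longer expressed in the route):

type: MeshTCPRoute
name: tcp-route-single-cluster
mesh: default
spec:
  targetRef:
    kind: MeshService
    name: demo-app_kuma-demo_svc_5000
  to:
    - targetRef:
        kind: MeshService
        name: redis_kuma-demo_svc_6379
      rules:
        - default:
            backendRefs:
              - kind: MeshService
                name: redis_kuma-demo_svc_6379
                weight: 100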
