
feat(helm chart): Add metrics-server hardening options #1288

Open · wants to merge 1 commit into base: master
Conversation


@mkilchhofer mkilchhofer commented Jul 14, 2023

What this PR does / why we need it:

It adds options for securing the connection between metrics-server and the kube-apiserver.
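
For reference, the three TLS modes exercised in the test transcript further down can be selected via chart values along these lines. This is a hedged sketch: only tls.type, tls.existingSecret.name, apiService.insecureSkipTLSVerify, and apiService.caBundle appear verbatim in the thread below; the exact spelling of the helm and cert-manager type values is an assumption based on the CI values file names (ci/tls-helm-values.yaml, ci/tls-certManager-values.yaml).

```yaml
# Option 1 (helm): the chart generates a self-signed certificate
# ("helm" as the type value is an assumption based on ci/tls-helm-values.yaml).
tls:
  type: helm

# Option 2 (cert-manager): cert-manager issues the certificate
# ("cert-manager" as the type value is likewise an assumption).
#tls:
#  type: cert-manager

# Option 3 (existing secret): bring your own TLS secret, as shown in the
# test transcript below.
#tls:
#  type: existingSecret
#  existingSecret:
#    name: metrics-server-existing
#apiService:
#  insecureSkipTLSVerify: false
#  caBundle: |
#    -----BEGIN CERTIFICATE-----
#    (..)
#    -----END CERTIFICATE-----
```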

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

related to:

Resolves #1230

Additional info:
The accepted feature request #1230 is also implemented, as I need it to test the hardening feature.

/cc: @serathius


linux-foundation-easycla bot commented Jul 14, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: mkilchhofer / name: Marco Maurer (-Kilchhofer) (8e1c090)

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 14, 2023
@k8s-ci-robot (Contributor)

Welcome @mkilchhofer!

It looks like this is your first PR to kubernetes-sigs/metrics-server 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/metrics-server has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 14, 2023
@k8s-ci-robot (Contributor)

Hi @mkilchhofer. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 14, 2023
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 14, 2023
@serathius (Contributor)

/assign @stevehipwell

@stevehipwell (Contributor) left a comment

Thanks for the PR @mkilchhofer. I've got a couple of high-level review points which it would be good if you could address before I review this in more detail:

  • There is no need to run functions for the APIService if it's not enabled
  • The helm mode would be a better default (see below)
  • You don't need to recreate the cert on each Helm install, you can look up the existing secret (a rough sketch of this pattern follows this list)
  • I don't think the chart should support creating a secret, the secret option should only be existing
  • Please remove the extraObjects value, this isn't a "good" pattern
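
For context on the cert-reuse point, the generate-or-reuse pattern under discussion might look roughly like this in a template. This is a minimal sketch, not the chart's actual code: the secret and service names are illustrative, and it uses Sprig's genSelfSignedCert plus Helm's lookup (which returns an empty map during helm template, the limitation raised below).

```yaml
{{- $secretName := "metrics-server-tls" }}{{- /* illustrative name */}}
{{- $existing := lookup "v1" "Secret" .Release.Namespace $secretName }}
{{- $cert := dict }}
{{- if $existing }}
{{- /* Reuse the cert from a previous install instead of regenerating it. */}}
{{- $cert = dict "Cert" (index $existing.data "tls.crt" | b64dec) "Key" (index $existing.data "tls.key" | b64dec) }}
{{- else }}
{{- /* First install (or `helm template`, where lookup returns an empty map). */}}
{{- $dns := list "metrics-server" (printf "metrics-server.%s.svc" .Release.Namespace) }}
{{- $cert = genSelfSignedCert (first $dns) (list) $dns 365 }}
{{- end }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ $secretName }}
type: kubernetes.io/tls
data:
  tls.crt: {{ $cert.Cert | b64enc }}
  tls.key: {{ $cert.Key | b64enc }}
```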

@mkilchhofer (Author)

  • There is no need to run functions for the APIService if it's not enabled

What do you mean by that? The genSelfSignedCert function?

  • The helm mode would be a better default (see below)

Sure, will do

  • You don't need to recreate the cert on each Helm install, you can look up the existing secret

I think we should not rely on the lookup function. There are GitOps solutions in the wild which only use Helm for rendering (the helm template subcommand). Helm's lookup function returns nothing during template because, by design, it cannot talk to the kube-apiserver; only upgrade and install talk to the apiserver.

  • I don't think the chart should support creating a secret, the secret option should only be existing

Hmm, why not? It is pretty common for Helm charts to support this.

  • Please remove the extraObjects value, this isn't a "good" pattern

How can I test the existing secret method then? I think it is an approved feature request, so contributors can file PRs for this, ref:

@stevehipwell (Contributor)

What do you mean by that? The genSelfSignedCert function?

You moved the check for whether the APIService is enabled below the new code.

I think we should not rely on the lookup function. There are GitOps solutions in the wild which only use Helm for rendering (the helm template subcommand). Helm's lookup function returns nothing during template because, by design, it cannot talk to the kube-apiserver; only upgrade and install talk to the apiserver.

In that case they can use one of the other patterns; GitOps shouldn't define architecture, it should support best practice.

Hmm, why not? It is pretty common for Helm charts to support this.

Because it's unnecessary and leaks data into the chart values.

How can I test the existing secret method then? I think it is an approved feature request, so contributors can file PRs for this, ref:

We could add a test secret as part of the CI initialisation, or it could be generated through the chart as a testing resource.

@mkilchhofer (Author)

mkilchhofer commented Jul 17, 2023

Because it's unnecessary and leaks data into the chart values.

Well, why is that an issue? This is what Helm is designed for; these values are stored as a K8s secret.

@stevehipwell (Contributor)

Well, why is that an issue? This is what Helm is designed for; these values are stored as a K8s secret.

It's the unnecessary part which is the primary concern here; but the Helm implementation and increased surface area for secret information shouldn't be discounted as unimportant.

But as I've said in my other replies, you can easily extend this chart by creating a wrapper chart which depends on it, at which point you can make your own architectural decisions.
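
To illustrate the wrapper-chart suggestion, a minimal Chart.yaml along these lines would pull this chart in as a dependency (the wrapper name, version, and version constraint are made up; the repository URL is the chart's published repo):

```yaml
apiVersion: v2
name: my-metrics-server        # hypothetical wrapper chart
version: 0.1.0
dependencies:
  - name: metrics-server
    version: "3.x.x"           # illustrative version constraint
    repository: https://kubernetes-sigs.github.io/metrics-server/
```

Any extra manifests, such as a pre-created TLS secret, can then live in the wrapper's own templates/ directory rather than in an extraObjects value.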

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 18, 2023
@mkilchhofer mkilchhofer force-pushed the feature/helm-chart_hardening branch 3 times, most recently from 81f71cd to ee37851 Compare July 18, 2023 06:36
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 28, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mkilchhofer (Author)

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Jul 28, 2024
@k8s-ci-robot (Contributor)

@mkilchhofer: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mkilchhofer (Author)

@mkilchhofer could you rebase this PR?

@stevehipwell rebase done.

@mkilchhofer mkilchhofer force-pushed the feature/helm-chart_hardening branch 2 times, most recently from 3abbd87 to 38d8903 Compare July 29, 2024 21:37
@stevehipwell (Contributor) left a comment

@mkilchhofer could you add your change to the chart CHANGELOG?

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 30, 2024
@mkilchhofer (Author)

@mkilchhofer could you add your change to the chart CHANGELOG?

/ok-to-test

@stevehipwell For sure. Done 👍

@stevehipwell (Contributor)

@mkilchhofer thanks for this PR, everything looks in order and I can see the CI automation is working. As we currently only check that the pod started correctly, this doesn't necessarily mean that the changes actually work; that isn't usually an issue, as the actual changes are usually contained in the binary, which has been heavily tested. So could you confirm that you have tested all these configurations locally?

I think that before we release the chart we ought to add a Helm test resource on a hook to confirm that these work and don't regress in the future, but I'm happy to take this PR as is if you've tested locally.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 1, 2024
@mkilchhofer (Author)

So could you confirm that you have tested all these configurations locally?

Yep, I tested it within KinD (kindest/node:v1.29.2):

Bootstrap a kind cluster

$ pwd
/Users/mkilchhofer/git/github/metrics-server/charts/metrics-server

$ kind create cluster --config ../../test/kind-config.yaml
Creating cluster "kind" ...
(..)

$ kubie ctx kind-kind

$ helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --wait \
  --set installCRDs=true \
  --set extraArgs='{--enable-certificate-owner-ref}'
NAME: cert-manager
LAST DEPLOYED: Thu Aug  1 14:22:38 2024
(..)

Test case Option 1 (helm)

$ helm install metrics-server . --namespace kube-system --values ci/tls-helm-values.yaml
NAME: metrics-server
LAST DEPLOYED: Thu Aug  1 14:26:24 2024
(..)

$ kubectl top pods
NAME                                         CPU(cores)   MEMORY(bytes)
coredns-76f75df574-j7bsd                     3m           14Mi
coredns-76f75df574-lfxqp                     3m           14Mi
etcd-kind-control-plane                      44m          44Mi
kindnet-94f7t                                1m           10Mi
kindnet-s7zgz                                1m           9Mi
kindnet-t5tv7                                1m           9Mi
kube-apiserver-kind-control-plane            137m         277Mi
kube-controller-manager-kind-control-plane   47m          52Mi
kube-proxy-2xpwm                             2m           14Mi
kube-proxy-72gdt                             1m           15Mi
kube-proxy-gdhf7                             1m           16Mi
kube-scheduler-kind-control-plane            6m           22Mi
metrics-server-5bc86f74cc-x5q7j              9m           19Mi

$ helm uninstall metrics-server --namespace kube-system
release "metrics-server" uninstalled

Test case Option 2 (cert-manager)

$ helm install metrics-server . --namespace kube-system --values ci/tls-certManager-values.yaml --wait
NAME: metrics-server
LAST DEPLOYED: Thu Aug  1 14:28:33 2024
(..)

$ kubectl top pods
NAME                                         CPU(cores)   MEMORY(bytes)
coredns-76f75df574-j7bsd                     4m           14Mi
coredns-76f75df574-lfxqp                     4m           14Mi
etcd-kind-control-plane                      53m          44Mi
kindnet-94f7t                                1m           10Mi
kindnet-s7zgz                                1m           9Mi
kindnet-t5tv7                                1m           9Mi
kube-apiserver-kind-control-plane            146m         294Mi
kube-controller-manager-kind-control-plane   60m          55Mi
kube-proxy-2xpwm                             1m           14Mi
kube-proxy-72gdt                             1m           15Mi
kube-proxy-gdhf7                             1m           16Mi
kube-scheduler-kind-control-plane            8m           22Mi
metrics-server-5bc86f74cc-k7ggm              9m           19Mi

$ helm uninstall metrics-server --namespace kube-system
release "metrics-server" uninstalled

Test case Option 3 (existing secret)

$ openssl req -x509 -newkey rsa:2048 -sha256 -days 365 \
  -nodes -keyout tls.key -out tls.crt \
  -subj "/CN=metrics-server" \
  -addext "subjectAltName=DNS:metrics-server,DNS:metrics-server.kube-system.svc"
.....+ (..)

$ kubectl -n kube-system create secret generic metrics-server-existing \
  --from-file=tls.key \
  --from-file=tls.crt
secret/metrics-server-existing created

$ cat <<EOF >> ci/tls-existingSecret-values.yaml
apiService:
  insecureSkipTLSVerify: false
  caBundle: |
$(cat tls.crt  | sed -e "s/^/    /g")
EOF

$ cat ci/tls-existingSecret-values.yaml
args:
  - --kubelet-insecure-tls

## Set via GH action (step "Prepare existing secret test scenario")
# apiService:
#   insecureSkipTLSVerify: false
#   caBundle: |

tls:
  type: existingSecret
  existingSecret:
    name: metrics-server-existing
apiService:
  insecureSkipTLSVerify: false
  caBundle: |
    -----BEGIN CERTIFICATE-----
    (..)
    -----END CERTIFICATE-----

$ helm install metrics-server . --namespace kube-system --values ci/tls-existingSecret-values.yaml --wait
NAME: metrics-server
LAST DEPLOYED: Thu Aug  1 14:44:15 2024
(..)

$ kubectl top pods
NAME                                         CPU(cores)   MEMORY(bytes)
coredns-76f75df574-j7bsd                     4m           22Mi
coredns-76f75df574-lfxqp                     4m           21Mi
etcd-kind-control-plane                      50m          53Mi
kindnet-94f7t                                2m           16Mi
kindnet-s7zgz                                1m           15Mi
kindnet-t5tv7                                1m           17Mi
kube-apiserver-kind-control-plane            146m         290Mi
kube-controller-manager-kind-control-plane   51m          53Mi
kube-proxy-2xpwm                             1m           17Mi
kube-proxy-72gdt                             2m           16Mi
kube-proxy-gdhf7                             3m           16Mi
kube-scheduler-kind-control-plane            7m           24Mi
metrics-server-868545df69-r96pf              11m          19Mi

$ helm uninstall metrics-server --namespace kube-system
release "metrics-server" uninstalled

When I have time later, I can investigate whether we can test these scenarios with a Helm test.

@stevehipwell (Contributor)

@mkilchhofer Re testing, I was thinking of using Helm test hooks to install a pod, plus a job to check that the metrics for the pod are being served. This test would work for all CI tests (the existing pattern and the new patterns).
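
As a sketch of that idea, a chart-shipped Helm test hook could look something like the following. This is an assumption-laden illustration, not part of this PR: the resource name, image, and service account are made up, and the job would need RBAC allowing access to the metrics API.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: metrics-server-test                   # hypothetical test resource name
  annotations:
    "helm.sh/hook": test                      # run by `helm test <release>`
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: metrics-server-test  # needs RBAC for metrics.k8s.io
      containers:
        - name: check-metrics
          image: bitnami/kubectl:latest         # any image with kubectl works
          command:
            - sh
            - -c
            # Non-zero exit (test failure) if pod metrics aren't being served.
            - kubectl top pods --namespace {{ .Release.Namespace }}
```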

@stevehipwell (Contributor) left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkilchhofer, stevehipwell
Once this PR has been reviewed and has the lgtm label, please assign logicalhan for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@stevehipwell (Contributor)

@logicalhan could you please approve this?

@mkilchhofer (Author)

@logicalhan friendly ping

@mkilchhofer (Author)

@stevehipwell @logicalhan friendly ping

@stevehipwell (Contributor)

@mkilchhofer I don't have approval permission outside of the Helm chart directory.

CC @dgrisonnet @serathius

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FR] Allow the Chart to create extra manifest
7 participants