Unable to create static mirror pods due to mpod.vpc.k8s.aws Admission Webhook #351

Open
jonathan-innis opened this issue Dec 30, 2023 · 7 comments
Labels
bug Something isn't working

Comments


jonathan-innis commented Dec 30, 2023

Describe the Bug:

I am trying to create a static mirror pod on a node that is running AL2 and connecting to an EKS control plane. When I point the kubelet at the staticPodPath, the kubelet logs the following error on startup:

Dec 30 02:44:48 ip-192-168-81-58.us-west-2.compute.internal kubelet[1495]: E1230 02:44:48.535524    1495 kubelet.go:1899] "Failed creating a mirror pod for" err="admission webhook \"mpod.vpc.k8s.aws\" denied the request: Failed to get Matching SGP for Pods, rejecting event" pod="default/static-web-ip-192-168-81-58.us-west-2.compute.internal"

Digging deeper into why this happens, I see that this error is logged here: https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/webhooks/core/pod_webhook.go#L188. Looking at the GetMatchingSecurityGroupForPods() function, I can see that it returns an error, and the webhook denies the request, when the webhook is unable to find the service account for the pod. Since static pods have no service account, I suspect that the failed lookup of the unspecified service account is what causes pod creation to fail.

From reading through this issue, static pods implicitly don't rely on any API objects since they can't assume that the apiserver even exists when they come up. It seems like the webhook here makes an assumption that these service account names always exist in pods, which seems to be true almost all of the time, except in the case of static pods.

Expected Behavior:

Static pods should be able to create an apiserver representation of themselves without any failure.

How to reproduce it (as minimally and precisely as possible):

  1. Create an EC2 instance running the EKS-optimized AMI on AL2
  2. Use the following userData (or similar) when creating the instance
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"

mkdir -p /etc/kubernetes/manifests/
echo "$(jq '.staticPodPath="/etc/kubernetes/manifests/"' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json    

cat <<EOF >/etc/kubernetes/manifests/static-web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
  namespace: default
spec:
  containers:
    - name: web
      image: nginx
EOF

--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash -xe
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
/etc/eks/bootstrap.sh <cluster-name> --apiserver-endpoint <apiserver-endpoint> --b64-cluster-ca <cluster-ca> \
--dns-cluster-ip '10.100.0.10' \
--use-max-pods false
--//--
  3. Wait for the instance to start and the node to join. Then, after SSM-ing into the node, run journalctl -u kubelet to see the failures creating the mirror pod for the static pod.

Additional Context:

As a workaround for now, I'm disabling the mutating webhook with kubectl delete mutatingwebhookconfiguration vpc-resource-mutating-webhook, which unblocks me from creating static pods.

Environment:

  • Kubernetes version (use kubectl version): v1.28.4-eks-8cb36c9
  • CNI Version: v1.12.5-eksbuild.2
  • OS (Linux/Windows): Linux

jonathan-innis added the bug label on Dec 30, 2023
jonathan-innis changed the title from "Unable to create mirror pods due to VPC Resource Controller Admission Webhook" to "Unable to create static mirror pods due to VPC Resource Controller Admission Webhook" on Dec 30, 2023
jonathan-innis changed the title from "Unable to create static mirror pods due to VPC Resource Controller Admission Webhook" to "Unable to create static mirror pods due to mpod.vpc.k8s.aws Admission Webhook" on Dec 30, 2023

Contributor

haouc commented Jan 1, 2024

@jonathan-innis, yes, the webhook assumes that no pod is a static pod and that every pod has a Service Account assigned. As discussed offline, so far we have not seen a use case that needs to create static pods in production. We will investigate whether the SA check can be ignored from the point of view of the supported feature (Security Groups for Pods), and/or from a more general point of view. I will update later.

Author

jonathan-innis commented Jan 1, 2024

so far we have not seen a use case that needs to create static pods in production

We've seen this ask for Karpenter with Airflow: kubernetes-sigs/karpenter#863. Granted, this is one data point, but it seems like some asks do exist for creating static pods that aren't control plane pods.

will investigate whether the SA check can be ignored from the point of view of the supported feature

Definitely seems like you could just ignore the get of the SA if you don't find one attached to the pod. I would imagine that you should be able to enforce Security Groups for Pods like you would with any other pod since I would expect that the network traffic would be routed to the static pod like any other pod on the cluster.

Contributor

haouc commented Jan 2, 2024

to enforce Security Groups for Pods

If this is about static pods using Security Groups for Pods, that is not a case we have supported or tested. Who sets up the networking for static pods?

you could just ignore the get of the SA if you don't find one attached to the pod

At this moment I am not certain whether the webhook can safely assume that pods with no SA assigned are guaranteed to be static pods. Since the feature supports both pod labels and SA labels, we have to be certain that ignoring the SA is OK in all cases.
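
One way to avoid inferring "static pod" from a missing SA is to check the mirror pod annotation directly: the kubelet sets the kubernetes.io/config.mirror annotation on the mirror pods it creates for static pods. The sketch below assumes the webhook can read the incoming pod's annotations; the function name isMirrorPod and the sample annotation value are hypothetical, and nothing here is taken from the controller's code.

```go
package main

import "fmt"

// mirrorAnnotation is the annotation key the kubelet sets on mirror pods.
const mirrorAnnotation = "kubernetes.io/config.mirror"

// isMirrorPod reports whether a pod's annotations mark it as a mirror pod.
// Only the presence of the key matters; the value is not inspected.
func isMirrorPod(annotations map[string]string) bool {
	_, ok := annotations[mirrorAnnotation]
	return ok
}

func main() {
	// Hypothetical annotation value, for illustration only.
	mirror := map[string]string{mirrorAnnotation: "example-hash"}
	regular := map[string]string{"app": "web"}

	fmt.Println(isMirrorPod(mirror))
	fmt.Println(isMirrorPod(regular))
	fmt.Println(isMirrorPod(nil)) // a nil map is safe to read in Go
}
```

A check like this would identify mirror pods positively, so the decision to skip the SA lookup would not depend on whether an SA name happens to be empty.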

Author

jonathan-innis commented Jan 2, 2024

Who sets up the networking for static pods

This is something I'm not 100% sure on. I'm assuming the CNI, as with every other pod component, but I'll double-check that in the community Slack. I'm working off that assumption only because there's no callout in the static pod docs that mentions otherwise.

At this moment I am not certain whether the webhook can safely assume that pods with no SA assigned are guaranteed to be static pods

I don't even think that you have to guarantee that they are static pods. From what I can understand, you can build a SecurityGroupPolicy off of selectors on either the pods or the service account. Naturally, I would assume that if a pod doesn't reference a service account (for whatever reason) a service account selector just wouldn't apply to it.
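
The selector behavior suggested here can be sketched as follows. This is a simplified model under stated assumptions, not the controller's implementation: the Policy type, the function names, and the security group IDs are hypothetical, the two selectors are treated independently, and a nil saLabels map stands for "the pod references no service account".

```go
package main

import "fmt"

// Policy is a hypothetical, simplified stand-in for a SecurityGroupPolicy.
// A nil selector means that selector is not set on the policy.
type Policy struct {
	Name                   string
	PodSelector            map[string]string
	ServiceAccountSelector map[string]string
	SecurityGroups         []string
}

// matches reports whether every key/value pair in selector is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// securityGroupsFor collects the groups of every policy whose pod selector or
// service account selector matches. A nil saLabels means the pod has no
// service account, so service account selectors never apply to it.
func securityGroupsFor(policies []Policy, podLabels, saLabels map[string]string) []string {
	var groups []string
	for _, p := range policies {
		podMatch := p.PodSelector != nil && matches(p.PodSelector, podLabels)
		saMatch := p.ServiceAccountSelector != nil && saLabels != nil &&
			matches(p.ServiceAccountSelector, saLabels)
		if podMatch || saMatch {
			groups = append(groups, p.SecurityGroups...)
		}
	}
	return groups
}

func main() {
	policies := []Policy{
		{Name: "by-pod", PodSelector: map[string]string{"role": "myrole"}, SecurityGroups: []string{"sg-pod-example"}},
		{Name: "by-sa", ServiceAccountSelector: map[string]string{"team": "web"}, SecurityGroups: []string{"sg-sa-example"}},
	}
	// A static pod: it has labels but no service account.
	fmt.Println(securityGroupsFor(policies, map[string]string{"role": "myrole"}, nil))
}
```

Under this model a pod without a service account is never rejected; it simply cannot be selected by a service account selector, while pod selectors keep working as usual.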

@jonathan-innis
Author

yep, nothing special (just CNI)

Confirmed that it's CNI like any other pod on the cluster: https://kubernetes.slack.com/archives/C09NXKJKA/p1704164272715389

Contributor

haouc commented Jan 2, 2024

Thanks for checking. It makes sense to me that static pods' networking is set up through the same path. I have no problem with removing the forced SA check on pods. I just want to call out that this can be a behavior change, although I think it is unlikely that customers are relying on this check to avoid applying SGP to some of their pods.


guessi commented Feb 8, 2024

Jumping into the thread, as I ran into the same issue recently.

Testing Environment:

  • Amazon EKS 1.29 (fresh new clean cluster)
  • Managed Add-ons:
    • kube-proxy (all defaults, v1.29.0-eksbuild.2)
    • Amazon VPC CNI (all defaults, v1.16.2-eksbuild.1)
    • CoreDNS (all defaults, v1.11.1-eksbuild.6)

I found that if a user follows the guidance for creating a static Pod, the creation fails unexpectedly.

Steps to reproduce the issue (run as root on the EKS node):

mkdir -p /etc/kubernetes/manifests/
cat <<EOF >/etc/kubernetes/manifests/static-web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
  labels:
    role: myrole
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
EOF
echo "$(jq '.staticPodPath="/etc/kubernetes/manifests/"' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json
systemctl restart kubelet
# journalctl -u kubelet | grep 'static-web'
Feb 08 05:18:27 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:27.838978  118710 kubelet.go:2424] "SyncLoop ADD" source="file" pods=["default/static-web-ip-192-168-101-59.ec2.internal"]
Feb 08 05:18:27 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:27.839118  118710 topology_manager.go:215] "Topology Admit Handler" podUID="85f6f142d15130b28f70dbf3308765a8" podNamespace="default" podName="static-web-ip-192-168-101-59.ec2.internal"
Feb 08 05:18:27 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:27.839303  118710 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="default/static-web-ip-192-168-101-59.ec2.internal"
Feb 08 05:18:27 ip-192-168-101-59.ec2.internal kubelet[118710]: E0208 05:18:27.872088  118710 kubelet.go:1930] "Failed creating a mirror pod for" err="admission webhook \"mpod.vpc.k8s.aws\" denied the request: Failed to get Matching SGP for Pods, rejecting event" pod="default/static-web-ip-192-168-101-59.ec2.internal"
Feb 08 05:18:27 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:27.872265  118710 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="default/static-web-ip-192-168-101-59.ec2.internal"
Feb 08 05:18:28 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:28.604175  118710 kubelet.go:2456] "SyncLoop (PLEG): event for pod" pod="default/static-web-ip-192-168-101-59.ec2.internal" event={"ID":"85f6f142d15130b28f70dbf3308765a8","Type":"ContainerStarted","Data":"18c924ab2785f8eb19ba785a780a311fea3fb32653d51bb8310d40285b9d4b92"}
Feb 08 05:18:32 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:32.616077  118710 kubelet.go:2456] "SyncLoop (PLEG): event for pod" pod="default/static-web-ip-192-168-101-59.ec2.internal" event={"ID":"85f6f142d15130b28f70dbf3308765a8","Type":"ContainerStarted","Data":"e982dd5d3c7faa4a34046dfba3411c2c69e8bea5c0f04e5b2cb1d22237172a7c"}
Feb 08 05:18:32 ip-192-168-101-59.ec2.internal kubelet[118710]: E0208 05:18:32.621184  118710 kubelet.go:1930] "Failed creating a mirror pod for" err="admission webhook \"mpod.vpc.k8s.aws\" denied the request: Failed to get Matching SGP for Pods, rejecting event" pod="default/static-web-ip-192-168-101-59.ec2.internal"
Feb 08 05:18:33 ip-192-168-101-59.ec2.internal kubelet[118710]: E0208 05:18:33.623274  118710 kubelet.go:1930] "Failed creating a mirror pod for" err="admission webhook \"mpod.vpc.k8s.aws\" denied the request: Failed to get Matching SGP for Pods, rejecting event" pod="default/static-web-ip-192-168-101-59.ec2.internal"
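
For triage across several nodes, the denial lines above can be picked out of the kubelet journal mechanically. The following is a minimal sketch that assumes the log format stays exactly as shown in this thread; the Denial type and the parseDenial function are hypothetical helpers, not part of any existing tool.

```go
package main

import (
	"fmt"
	"regexp"
)

// denialRE matches the kubelet message shown in this thread and captures the
// webhook name, the denial reason, and the pod. It assumes the webhook name is
// wrapped in backslash-escaped quotes, as in the logs above.
var denialRE = regexp.MustCompile(
	`"Failed creating a mirror pod for" err="admission webhook \\"([^"\\]+)\\" denied the request: ([^"]*)" pod="([^"]+)"`)

// Denial holds the fields extracted from one kubelet log line.
type Denial struct {
	Webhook string
	Reason  string
	Pod     string
}

// parseDenial extracts a Denial from a log line; ok is false if the line
// is not a mirror pod denial.
func parseDenial(line string) (d Denial, ok bool) {
	m := denialRE.FindStringSubmatch(line)
	if m == nil {
		return Denial{}, false
	}
	return Denial{Webhook: m[1], Reason: m[2], Pod: m[3]}, true
}

func main() {
	line := `E0208 05:18:27.872088  118710 kubelet.go:1930] "Failed creating a mirror pod for" err="admission webhook \"mpod.vpc.k8s.aws\" denied the request: Failed to get Matching SGP for Pods, rejecting event" pod="default/static-web-ip-192-168-101-59.ec2.internal"`
	if d, ok := parseDenial(line); ok {
		fmt.Printf("webhook=%s pod=%s reason=%s\n", d.Webhook, d.Pod, d.Reason)
	}
}
```

Lines that do not contain the denial message, such as the SyncLoop entries, are skipped because parseDenial returns ok as false for them.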
