
Shared network for vfkit driver using vmnet-helper #20501

Draft: wants to merge 13 commits into base: master
Conversation


@nirs nirs commented Mar 7, 2025

Add a new network option for vfkit, "vmnet-shared", connecting vfkit to
the vmnet shared network. Clusters using this network can access other
clusters in the same network, similar to socket_vmnet with the QEMU
driver.

If a network is not specified, we default to the "nat" network, keeping
the previous behavior. If the network is "vmnet-shared", the vfkit
driver manages 2 processes: vfkit and vmnet-helper.

Like vfkit, vmnet-helper is started in the background, in a new process
group, so it is not terminated if the minikube process group is
terminated.

Since vmnet-helper requires root to start the vmnet interface, we start
it with sudo, creating 2 child processes. vmnet-helper drops privileges
immediately after starting the vmnet interface, and runs as the user and
group running minikube.

Stopping the cluster will stop sudo, which will stop the vmnet-helper
process. Deleting the cluster kills both sudo and vmnet-helper by
killing the process group.
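The stop and delete flows above can be sketched in shell. This is a
simplified illustration, not the PR's code: `sleep 60` stands in for
`sudo vmnet-helper` (which needs macOS and a sudoers rule), and setsid
plays the role of starting the helper in a new process group:

```shell
# Sketch of the process-group handling described above. "sleep 60"
# stands in for "sudo vmnet-helper"; setsid starts the child in a new
# session (and process group), so killing the minikube process group
# does not take it down.
setsid sleep 60 &
pid=$!
pgid=$(ps -o pgid= -p "$pid" | tr -d ' ')

# Stop flow: signal only the direct child (sudo in the real flow,
# which in turn stops vmnet-helper).
kill -TERM "$pid" 2>/dev/null

# Delete flow: kill the whole process group; the leading "-" in the
# pid argument addresses the group, covering both sudo and
# vmnet-helper in the real flow.
kill -KILL -- "-$pgid" 2>/dev/null || true
```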

This change is not complete, but it is good enough to play with the new
shared network.

Example usage:

  1. Install vmnet-helper:
    https://github.com/nirs/vmnet-helper?tab=readme-ov-file#installation

  2. Set up the vmnet-helper sudoers rule:
    https://github.com/nirs/vmnet-helper?tab=readme-ov-file#granting-permission-to-run-vmnet-helper

  3. Start 2 clusters with the vmnet-shared network:

% minikube start -p c1 --driver vfkit --network vmnet-shared
...

% minikube start -p c2 --driver vfkit --network vmnet-shared
...

% minikube ip -p c1
192.168.105.18

% minikube ip -p c2
192.168.105.19
  4. Both clusters can access each other:
% minikube -p c1 ssh -- ping -c 3 192.168.105.19
PING 192.168.105.19 (192.168.105.19): 56 data bytes
64 bytes from 192.168.105.19: seq=0 ttl=64 time=0.621 ms
64 bytes from 192.168.105.19: seq=1 ttl=64 time=0.989 ms
64 bytes from 192.168.105.19: seq=2 ttl=64 time=0.490 ms

--- 192.168.105.19 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.490/0.700/0.989 ms

% minikube -p c2 ssh -- ping -c 3 192.168.105.18
PING 192.168.105.18 (192.168.105.18): 56 data bytes
64 bytes from 192.168.105.18: seq=0 ttl=64 time=0.289 ms
64 bytes from 192.168.105.18: seq=1 ttl=64 time=0.798 ms
64 bytes from 192.168.105.18: seq=2 ttl=64 time=0.993 ms

--- 192.168.105.18 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.289/0.693/0.993 ms

To complete this work we need:

  • Install vmnet-helper on the CI macOS hosts
  • Add test using --network vmnet-shared

Based on #20506

Fixes #20557
Fixes #20558

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 7, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nirs
Once this PR has been reviewed and has the lgtm label, please assign spowelljr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @nirs. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 7, 2025
@minikube-bot

Can one of the admins verify this patch?

@nirs

nirs commented Mar 7, 2025

@afbjorklund can you review this?

@nirs nirs force-pushed the vmnet-helper branch 3 times, most recently from 86b449f to 6bf1d56 Compare March 9, 2025 22:38
@nirs

nirs commented Mar 9, 2025

Example machine config when vmnet-shared is used:

{
    "ConfigVersion": 3,
    "Driver": {
        "IPAddress": "192.168.105.21",
        "MachineName": "minikube",
        "SSHUser": "docker",
        "SSHPort": 22,
        "SSHKeyPath": "",
        "StorePath": "/Users/nir/.minikube",
        "SwarmMaster": false,
        "SwarmHost": "",
        "SwarmDiscovery": "",
        "Boot2DockerURL": "file:///Users/nir/.minikube/cache/iso/arm64/minikube-v1.35.0-arm64.iso",
        "DiskSize": 20000,
        "CPU": 2,
        "Memory": 6000,
        "Cmdline": "",
        "ExtraDisks": 0,
        "Network": "vmnet-shared",
        "MACAddress": "de:a2:9c:71:3c:f7",
        "VmnetHelper": {
            "MachineDir": "/Users/nir/.minikube/machines/minikube",
            "InterfaceID": "a0c43efb-2dcc-4abb-a310-568782c5dc7a"
        }
    },
    "DriverName": "vfkit",
    "HostOptions": {
        "Driver": "",
        "Memory": 0,
        "Disk": 0,
        "EngineOptions": {
            "ArbitraryFlags": null,
            "Dns": null,
            "GraphDir": "",
            "Env": null,
            "Ipv6": false,
            "InsecureRegistry": [
                "10.96.0.0/12"
            ],
            "Labels": null,
            "LogLevel": "",
            "StorageDriver": "",
            "SelinuxEnabled": false,
            "TlsVerify": false,
            "RegistryMirror": [],
            "InstallURL": "https://get.docker.com"
        },
        "SwarmOptions": {
            "IsSwarm": false,
            "Address": "",
            "Discovery": "",
            "Agent": false,
            "Master": false,
            "Host": "",
            "Image": "",
            "Strategy": "",
            "Heartbeat": 0,
            "Overcommit": 0,
            "ArbitraryFlags": null,
            "ArbitraryJoinFlags": null,
            "Env": null,
            "IsExperimental": false
        },
        "AuthOptions": {
            "CertDir": "/Users/nir/.minikube",
            "CaCertPath": "/Users/nir/.minikube/certs/ca.pem",
            "CaPrivateKeyPath": "/Users/nir/.minikube/certs/ca-key.pem",
            "CaCertRemotePath": "",
            "ServerCertPath": "/Users/nir/.minikube/machines/server.pem",
            "ServerKeyPath": "/Users/nir/.minikube/machines/server-key.pem",
            "ClientKeyPath": "/Users/nir/.minikube/certs/key.pem",
            "ServerCertRemotePath": "",
            "ServerKeyRemotePath": "",
            "ClientCertPath": "/Users/nir/.minikube/certs/cert.pem",
            "ServerCertSANs": null,
            "StorePath": "/Users/nir/.minikube"
        }
    },
    "Name": "minikube"
}

@nirs

nirs commented Mar 9, 2025

Example multi-node cluster

% minikube start --network vmnet-shared --nodes 2 --cni auto
😄  minikube v1.35.0 on Darwin 15.3.1 (arm64)
✨  Using the vfkit (experimental) driver based on user configuration
❗  --network flag is only valid with the docker/podman, KVM and Qemu drivers, it will be ignored
👍  Starting "minikube" primary control-plane node in "minikube" cluster
🔥  Creating vfkit VM (CPUs=2, Memory=4050MB, Disk=20000MB) ...
📦  Preparing Kubernetes v1.32.2 on containerd 1.7.23 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: default-storageclass, storage-provisioner

👍  Starting "minikube-m02" worker node in "minikube" cluster
🔥  Creating vfkit VM (CPUs=2, Memory=4050MB, Disk=20000MB) ...
🌐  Found network options:
    ▪ NO_PROXY=192.168.105.22
📦  Preparing Kubernetes v1.32.2 on containerd 1.7.23 ...
    ▪ env NO_PROXY=192.168.105.22
    > kubelet.sha256:  64 B / 64 B [-------------------------] 100.00% ? p/s 0s
    > kubectl.sha256:  64 B / 64 B [-------------------------] 100.00% ? p/s 0s
    > kubeadm.sha256:  64 B / 64 B [-------------------------] 100.00% ? p/s 0s
    > kubectl:  53.25 MiB / 53.25 MiB [------------] 100.00% 37.13 MiB p/s 1.6s
    > kubelet:  71.75 MiB / 71.75 MiB [------------] 100.00% 12.43 MiB p/s 6.0s
    > kubeadm:  66.81 MiB / 66.81 MiB [------------] 100.00% 10.11 MiB p/s 6.8s
🔎  Verifying Kubernetes components...
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

% kubectl get node                                                           
NAME           STATUS   ROLES           AGE     VERSION
minikube       Ready    control-plane   5m1s    v1.32.2
minikube-m02   Ready    <none>          4m40s   v1.32.2

% kubectl get node -o jsonpath='{.items[*].status.addresses[0].address}{"\n"}'
192.168.105.22 192.168.105.23

Each node gets its own vmnet-helper process:

% ps au | grep vmnet-helper | grep -v grep 
nir  60260   0.0  0.0 410743984   3792 s020  S     1:07AM   0:00.71 /opt/vmnet-helper/bin/vmnet-helper --fd 21 --interface-id c9ae713b-5937-42de-9018-bbb95a425506
root 60259   0.0  0.0 410737168   6448 s020  S     1:07AM   0:00.01 sudo --non-interactive --close-from 22 /opt/vmnet-helper/bin/vmnet-helper --fd 21 --interface-id c9ae713b-5937-42de-9018-bbb95a425506
nir  60254   0.0  0.0 410735792   3728 s020  S     1:07AM   0:00.73 /opt/vmnet-helper/bin/vmnet-helper --fd 13 --interface-id cf62fb28-2533-4c25-8d35-1eef1736f707
root 60253   0.0  0.0 410754576   6992 s020  S     1:07AM   0:00.01 sudo --non-interactive --close-from 14 /opt/vmnet-helper/bin/vmnet-helper --fd 13 --interface-id cf62fb28-2533-4c25-8d35-1eef1736f707

@afbjorklund

@afbjorklund can you review this?

I could take a look at it later perhaps, but one of the minikube maintainers will still need to "take over" the vfkit driver.

Probably should have an issue, as well.

@cfergeau cfergeau left a comment

I also noticed several vment typos in commit logs

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 10, 2025
@nirs nirs mentioned this pull request Mar 14, 2025
nirs added 3 commits March 16, 2025 00:36
Current code contains multiple implementations for managing a process
using pids, with various issues:

- Some are unsafe, terminating a process by pid without validating that
  the pid belongs to the right process.
- Some use unclear terms like checkPid() (what does it mean?)
- Some are missing tests

Let's clean up the mess by introducing a process package. The package
provides:

- process.WritePidfile(): write a pid to a file
- process.ReadPidfile(): read a pid from a file
- process.Exists(): tells if a process matching pid and name exists
- process.Terminate(): terminates a process matching pid and name
- process.Kill(): kills a process matching pid and name

The library is tested on linux, darwin, and windows. On windows we don't
have a standard way to terminate a process gracefully, so
process.Terminate() is the same as process.Kill().

I want to use this package in vfkit and the new vmnet package, and later
we can use it for qemu, hyperkit, and other code managing processes with
pids.
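The real package is Go; a rough shell sketch of the same API (the
function names only mirror the Go ones, and the name check is the key
safety property) looks like:

```shell
# Rough shell sketch of the process package API described above.

write_pidfile() { echo "$2" > "$1"; }       # write_pidfile FILE PID
read_pidfile()  { cat "$1" 2>/dev/null; }   # read_pidfile FILE

# proc_exists PID NAME: true only if PID is alive *and* runs NAME,
# guarding against a reused pid belonging to an unrelated process.
proc_exists() {
    comm=$(ps -o comm= -p "$1" 2>/dev/null) || return 1
    [ "${comm##*/}" = "$2" ]
}

# Terminate (graceful) and kill (forced) only act when the pid still
# matches the expected process name.
terminate_proc() { proc_exists "$1" "$2" && kill -TERM "$1"; }
kill_proc()      { proc_exists "$1" "$2" && kill -KILL "$1"; }
```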
- Simplify GetState() using process.ReadPidfile()
- Simplify Start() using process.WritePidfile()
GetState() had several issues:

- When accessing the vfkit HTTP API, we handled only "running",
  "VirtualMachineStateRunning", "stopped", and
  "VirtualMachineStateStopped", but there are 10 other possible states,
  which we handled as state.None, even when vfkit is running and needs
  to be stopped. This can lead to wrong handling in the caller.

- When handling "stopped" and "VirtualMachineStateStopped" we returned
  state.Stopped, but did not remove the pidfile. This can cause
  termination of an unrelated process, or reporting a wrong status when
  the pid is reused.

- Accessing the HTTP API will fail after we stop or kill vfkit. This
  causes GetState() to fail when the process is actually stopped, which
  can lead to unnecessary retries and long delays (kubernetes#20503).

- When returning state.None during Remove(), we tried to do a graceful
  shutdown, which does not make sense in the minikube delete flow, and
  is not consistent with state.Running handling.

Accessing the vfkit API to check the state does not add much value for
our use case, checking if the vfkit process is running, and it is not
reliable.

Fix all the issues by not using the HTTP API in GetState(), and use only
the process state. We still use the API for stopping and killing vfkit
to do graceful shutdown. This also simplifies Remove(), since we need to
handle only the state.Running state.

With this change we consider vfkit as stopped only when the process does
not exist, which takes about 3 seconds after the state is reported as
"stopped".

Example stop flow:

    I0309 18:15:40.260249   18857 main.go:141] libmachine: Stopping "minikube"...
    I0309 18:15:40.263225   18857 main.go:141] libmachine: set state: {State:Stop}
    I0309 18:15:46.266902   18857 main.go:141] libmachine: Machine "minikube" was stopped.
    I0309 18:15:46.267122   18857 stop.go:75] duration metric: took 6.127761459s to stop

Example delete flow:

    I0309 17:00:49.483078   18127 out.go:177] * Deleting "minikube" in vfkit ...
    I0309 17:00:49.499252   18127 main.go:141] libmachine: set state: {State:HardStop}
    I0309 17:00:49.569938   18127 lock.go:35] WriteFile acquiring /Users/nir/.kube/config: ...
    I0309 17:00:49.573977   18127 out.go:177] * Removed all traces of the "minikube" cluster.
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 16, 2025
@medyagh

medyagh commented Mar 21, 2025

thank you for spending time on this PR, I could take a look at this PR once we merge other vfkit optimization PRs

nirs added 8 commits March 22, 2025 15:13
Previously we did not check the process name when checking a pid from a
pidfile. If the pidfile became stale we would assume that vfkit is
running and try to stop it via the HTTP API, which would never succeed.
Now we detect a stale pidfile and remove it.

If removing the stale pidfile fails, we don't want to fail the operation
since we know that vfkit is not running. We log the failure to aid
debugging of stale pidfiles.
If setting vfkit state to "Stop" fails, we used to return an error.
Retrying the operation may never succeed.

Fix by falling back to terminating vfkit using a signal. This terminates
vfkit immediately similar to HardStop[1].

We can still fail if the pidfile is corrupted but this is unlikely and
requires manual cleanup.

In the case when we are sure the vfkit process does not exist, we remove
the pidfile immediately, avoiding a leftover pidfile if the caller does
not call GetState() after Stop().

[1] crc-org/vfkit#284
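The fallback flow above can be sketched in shell. `api_stop` is a
hypothetical stand-in for vfkit's HTTP state request; here it always
fails, simulating the API being unreachable after vfkit exited:

```shell
# Sketch of the Stop() fallback described above. api_stop stands in
# for the vfkit HTTP "Stop" call; it always fails here, simulating
# the API dying together with vfkit.
api_stop() { return 1; }

# stop_vm PIDFILE: try the graceful API stop first; if it fails, fall
# back to a signal, which terminates the process immediately, similar
# to HardStop.
stop_vm() {
    _pid=$(cat "$1" 2>/dev/null) || return 0   # no pidfile: nothing to stop
    api_stop "$_pid" || kill -TERM "$_pid" 2>/dev/null || true
}
```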
We know that setting the state to `HardStop` typically fails:

    I0309 19:19:42.378591   21795 out.go:177] 🔥  Deleting "minikube" in vfkit ...
    W0309 19:19:42.397472   21795 delete.go:106] remove failed, will retry: kill: Post "http://_/vm/state": EOF

This may lead to unnecessary retries and delays. Fix by falling back to
sending a SIGKILL signal.

Example delete flow when setting vfkit state fails:

    I0309 20:07:41.688259   25540 out.go:177] 🔥  Deleting "minikube" in vfkit ...
    I0309 20:07:41.712017   25540 main.go:141] libmachine: Failed to set vfkit state to 'HardStop': Post "http://_/vm/state": EOF
Remove temporary and unneeded mac variable. It is easier to follow the
code when we use d.MACAddress.
System state changes should be more visible to make debugging easier.
The package manages the vmnet-helper[1] child process, providing a
connection to the vmnet network without running the guest as root.

We will use vmnet-helper for the vfkit driver, which does not have a way
to use a shared network, where guests can access other guests in the
network. We can use it later with the qemu driver as an alternative to
socket_vmnet.

[1] https://github.com/nirs/vmnet-helper
Add new network option for vfkit "vmnet-shared", connecting vfkit to the
vmnet shared network. Clusters using this network can access other
clusters in the same network, similar to socket_vmnet with QEMU driver.

If network is not specified, we default to the "nat" network, keeping
the previous behavior. If network is "vmnet-shared", the vfkit driver
manages 2 processes: vfkit and vmnet-helper.

Like vfkit, vmnet-helper is started in the background, in a new process
group, so it is not terminated if the minikube process group is
terminated.

Since vmnet-helper requires root to start the vmnet interface, we start
it with sudo, creating 2 child processes. vmnet-helper drops privileges
immediately after starting the vmnet interface, and runs as the user and
group running minikube.

Stopping the cluster will stop sudo, which will stop the vmnet-helper
process. Deleting the cluster kills both sudo and vmnet-helper by
killing the process group.

This change is not complete, but it is good enough to play with the new
shared network.

Example usage:

1. Install vmnet-helper:
   https://github.com/nirs/vmnet-helper?tab=readme-ov-file#installation

2. Set up the vmnet-helper sudoers rule:
   https://github.com/nirs/vmnet-helper?tab=readme-ov-file#granting-permission-to-run-vmnet-helper

3. Start 2 clusters with vmnet-shared network:

    % minikube start -p c1 --driver vfkit --network vmnet-shared
    ...

    % minikube start -p c2 --driver vfkit --network vmnet-shared
    ...

    % minikube ip -p c1
    192.168.105.18

    % minikube ip -p c2
    192.168.105.19

4. Both clusters can access each other:

    % minikube -p c1 ssh -- ping -c 3 192.168.105.19
    PING 192.168.105.19 (192.168.105.19): 56 data bytes
    64 bytes from 192.168.105.19: seq=0 ttl=64 time=0.621 ms
    64 bytes from 192.168.105.19: seq=1 ttl=64 time=0.989 ms
    64 bytes from 192.168.105.19: seq=2 ttl=64 time=0.490 ms

    --- 192.168.105.19 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.490/0.700/0.989 ms

    % minikube -p c2 ssh -- ping -c 3 192.168.105.18
    PING 192.168.105.18 (192.168.105.18): 56 data bytes
    64 bytes from 192.168.105.18: seq=0 ttl=64 time=0.289 ms
    64 bytes from 192.168.105.18: seq=1 ttl=64 time=0.798 ms
    64 bytes from 192.168.105.18: seq=2 ttl=64 time=0.993 ms

    --- 192.168.105.18 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.289/0.693/0.993 ms
Trailing whitespace is removed by some editors or displayed as a
warning. Clean up to make it easy to maintain this file.
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 22, 2025
@nirs

nirs commented Mar 22, 2025

Example iperf3 test

Resources

iperf3-server.yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: iperf3
  name: iperf3
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: iperf3
  template:
    metadata:
      labels:
        app: iperf3
    spec:
      containers:
      - image: networkstatic/iperf3
        imagePullPolicy: Always
        name: iperf3
        command: ["iperf3", "-s"]
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: iperf3
  name: iperf3
  namespace: default
spec:
  ports:
  - nodePort: 30201
    port: 5201
    protocol: TCP
  selector:
    app: iperf3
  type: NodePort

iperf3-client.yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: iperf3
  name: iperf3
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: iperf3
  template:
    metadata:
      labels:
        app: iperf3
    spec:
      containers:
      - image: networkstatic/iperf3
        imagePullPolicy: Always
        name: iperf3
        command: ["sleep", "3600"]

Creating the clusters

minikube start -p server --driver vfkit --container-runtime containerd --network vmnet-shared
minikube start -p client --driver vfkit --container-runtime containerd --network vmnet-shared
kubectl apply -f iperf3-server.yaml --context server
kubectl apply -f iperf3-client.yaml --context client

Host to vm benchmark

% iperf3 -c $(minikube ip -p server) --port 30201                              
Connecting to host 192.168.105.37, port 30201
[  5] local 192.168.105.1 port 62342 connected to 192.168.105.37 port 30201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   983 MBytes  8.25 Gbits/sec                  
[  5]   1.00-2.00   sec  1002 MBytes  8.37 Gbits/sec                  
[  5]   2.00-3.00   sec   979 MBytes  8.25 Gbits/sec                  
[  5]   3.00-4.00   sec   981 MBytes  8.21 Gbits/sec                  
[  5]   4.00-5.00   sec   982 MBytes  8.26 Gbits/sec                  
[  5]   5.00-6.00   sec   993 MBytes  8.30 Gbits/sec                  
[  5]   6.00-7.00   sec   986 MBytes  8.29 Gbits/sec                  
[  5]   7.00-8.00   sec   987 MBytes  8.25 Gbits/sec                  
[  5]   8.00-9.00   sec  1002 MBytes  8.44 Gbits/sec                  
[  5]   9.00-10.00  sec  1010 MBytes  8.44 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  9.67 GBytes  8.31 Gbits/sec                  sender
[  5]   0.00-10.01  sec  9.67 GBytes  8.30 Gbits/sec                  receiver

VM to host benchmark

% iperf3 -c $(minikube ip -p server) --port 30201 --reverse
Connecting to host 192.168.105.37, port 30201
Reverse mode, remote host 192.168.105.37 is sending
[  5] local 192.168.105.1 port 62348 connected to 192.168.105.37 port 30201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec  1001 MBytes  8.35 Gbits/sec                  
[  5]   1.01-2.01   sec  1016 MBytes  8.52 Gbits/sec                  
[  5]   2.01-3.01   sec   975 MBytes  8.18 Gbits/sec                  
[  5]   3.01-4.01   sec   949 MBytes  7.96 Gbits/sec                  
[  5]   4.01-5.00   sec  1007 MBytes  8.47 Gbits/sec                  
[  5]   5.00-6.00   sec  1016 MBytes  8.53 Gbits/sec                  
[  5]   6.00-7.00   sec  1008 MBytes  8.47 Gbits/sec                  
[  5]   7.00-8.00   sec  1.00 GBytes  8.57 Gbits/sec                  
[  5]   8.00-9.00   sec  1.00 GBytes  8.61 Gbits/sec                  
[  5]   9.00-10.01  sec  1019 MBytes  8.52 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec  9.81 GBytes  8.42 Gbits/sec  421            sender
[  5]   0.00-10.01  sec  9.80 GBytes  8.42 Gbits/sec                  receiver

VM to VM benchmark

% kubectl exec deploy/iperf3 --context client -- iperf3 -c $(minikube ip -p server) --port 30201 --forceflush
Connecting to host 192.168.105.37, port 30201
[  5] local 10.244.0.7 port 58638 connected to 192.168.105.37 port 30201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.38 GBytes  11.8 Gbits/sec  162   2.60 MBytes       
[  5]   1.00-2.00   sec  1.40 GBytes  12.0 Gbits/sec    0   2.99 MBytes       
[  5]   2.00-3.00   sec  1.40 GBytes  12.0 Gbits/sec    0   3.01 MBytes       
[  5]   3.00-4.00   sec  1.38 GBytes  11.9 Gbits/sec   20   3.01 MBytes       
[  5]   4.00-5.00   sec  1.39 GBytes  11.9 Gbits/sec    0   3.01 MBytes       
[  5]   5.00-6.00   sec  1.40 GBytes  12.0 Gbits/sec    0   3.01 MBytes       
[  5]   6.00-7.00   sec  1.40 GBytes  12.0 Gbits/sec    0   3.01 MBytes       
[  5]   7.00-8.00   sec  1.38 GBytes  11.9 Gbits/sec    0   3.01 MBytes       
[  5]   8.00-9.00   sec  1.39 GBytes  12.0 Gbits/sec    0   3.01 MBytes       
[  5]   9.00-10.00  sec  1.40 GBytes  12.0 Gbits/sec    0   3.01 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  13.9 GBytes  11.9 Gbits/sec  182             sender
[  5]   0.00-10.00  sec  13.9 GBytes  11.9 Gbits/sec                  receiver

@nirs

nirs commented Mar 22, 2025

Comparing to the nat network

The nat network does not support the VM to VM benchmark, so we can test only host to VM and VM to host.

Creating the cluster

minikube start -p nat --driver vfkit --container-runtime containerd --network nat
kubectl apply -f iperf3-server.yaml --context nat

Host to VM benchmark

% iperf3 -c $(minikube ip -p nat) --port 30201                                    
Connecting to host 192.168.106.22, port 30201
[  5] local 192.168.106.1 port 62415 connected to 192.168.106.22 port 30201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec  1.36 GBytes  11.6 Gbits/sec                  
[  5]   1.01-2.00   sec  1.31 GBytes  11.3 Gbits/sec                  
[  5]   2.00-3.00   sec  1.32 GBytes  11.4 Gbits/sec                  
[  5]   3.00-4.00   sec  1.34 GBytes  11.5 Gbits/sec                  
[  5]   4.00-5.01   sec  1.37 GBytes  11.7 Gbits/sec                  
[  5]   5.01-6.00   sec  1.44 GBytes  12.4 Gbits/sec                  
[  5]   6.00-7.00   sec  1.32 GBytes  11.3 Gbits/sec                  
[  5]   7.00-8.01   sec  1.32 GBytes  11.3 Gbits/sec                  
[  5]   8.01-9.01   sec  1.33 GBytes  11.4 Gbits/sec                  
[  5]   9.01-10.01  sec  1.38 GBytes  11.8 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec  13.5 GBytes  11.6 Gbits/sec                  sender
[  5]   0.00-10.01  sec  13.5 GBytes  11.6 Gbits/sec                  receiver

VM to host benchmark

% iperf3 -c $(minikube ip -p nat) --port 30201 --reverse
Connecting to host 192.168.106.22, port 30201
Reverse mode, remote host 192.168.106.22 is sending
[  5] local 192.168.106.1 port 62419 connected to 192.168.106.22 port 30201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  7.09 GBytes  60.9 Gbits/sec                  
[  5]   1.00-2.01   sec  7.28 GBytes  62.3 Gbits/sec                  
[  5]   2.01-3.00   sec  7.27 GBytes  62.7 Gbits/sec                  
[  5]   3.00-4.01   sec  7.00 GBytes  60.0 Gbits/sec                  
[  5]   4.01-5.01   sec  6.77 GBytes  58.2 Gbits/sec                  
[  5]   5.01-6.00   sec  7.16 GBytes  61.7 Gbits/sec                  
[  5]   6.00-7.01   sec  7.24 GBytes  62.0 Gbits/sec                  
[  5]   7.01-8.01   sec  7.05 GBytes  60.5 Gbits/sec                  
[  5]   8.01-9.00   sec  7.42 GBytes  63.9 Gbits/sec                  
[  5]   9.00-10.00  sec  6.30 GBytes  54.1 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  70.6 GBytes  60.6 Gbits/sec  2595            sender
[  5]   0.00-10.00  sec  70.6 GBytes  60.6 Gbits/sec                  receiver

@nirs

nirs commented Mar 22, 2025

Comparing to QEMU driver with socket_vmnet

Creating the clusters

I'm using socket_vmnet installed from a binary as a launchd service:
https://github.com/lima-vm/socket_vmnet?tab=readme-ov-file#from-binary

minikube start -p server --driver qemu --container-runtime containerd --network socket_vmnet
minikube start -p client --driver qemu --container-runtime containerd --network socket_vmnet
kubectl apply -f iperf3-server.yaml --context server
kubectl apply -f iperf3-client.yaml --context client

Host to VM benchmark

% iperf3 -c $(minikube ip -p server) --port 30201          
Connecting to host 192.168.105.39, port 30201
[  5] local 192.168.105.1 port 59464 connected to 192.168.105.39 port 30201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec   360 MBytes  3.00 Gbits/sec                  
[  5]   1.01-2.01   sec   364 MBytes  3.05 Gbits/sec                  
[  5]   2.01-3.00   sec   367 MBytes  3.08 Gbits/sec                  
[  5]   3.00-4.01   sec   374 MBytes  3.14 Gbits/sec                  
[  5]   4.01-5.01   sec   368 MBytes  3.08 Gbits/sec                  
[  5]   5.01-6.01   sec   366 MBytes  3.07 Gbits/sec                  
[  5]   6.01-7.01   sec   364 MBytes  3.06 Gbits/sec                  
[  5]   7.01-8.01   sec   358 MBytes  3.00 Gbits/sec                  
[  5]   8.01-9.01   sec   369 MBytes  3.09 Gbits/sec                  
[  5]   9.01-10.01  sec   362 MBytes  3.04 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec  3.57 GBytes  3.06 Gbits/sec                  sender
[  5]   0.00-10.01  sec  3.56 GBytes  3.06 Gbits/sec                  receiver

VM to host benchmark

% iperf3 -c $(minikube ip -p server) --port 30201 --reverse
Connecting to host 192.168.105.39, port 30201
Reverse mode, remote host 192.168.105.39 is sending
[  5] local 192.168.105.1 port 59470 connected to 192.168.105.39 port 30201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec   206 MBytes  1.72 Gbits/sec                  
[  5]   1.01-2.00   sec   208 MBytes  1.75 Gbits/sec                  
[  5]   2.00-3.00   sec   209 MBytes  1.74 Gbits/sec                  
[  5]   3.00-4.01   sec   209 MBytes  1.75 Gbits/sec                  
[  5]   4.01-5.01   sec   208 MBytes  1.74 Gbits/sec                  
[  5]   5.01-6.01   sec   209 MBytes  1.75 Gbits/sec                  
[  5]   6.01-7.01   sec   210 MBytes  1.76 Gbits/sec                  
[  5]   7.01-8.01   sec   210 MBytes  1.76 Gbits/sec                  
[  5]   8.01-9.01   sec   189 MBytes  1.58 Gbits/sec                  
[  5]   9.01-10.01  sec   181 MBytes  1.52 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec  1.99 GBytes  1.71 Gbits/sec  4585            sender
[  5]   0.00-10.01  sec  1.99 GBytes  1.71 Gbits/sec                  receiver

VM to VM benchmark

% kubectl exec deploy/iperf3 --context client -- iperf3 -c $(minikube ip -p server) --port 30201 --forceflush
Connecting to host 192.168.105.39, port 30201
[  5] local 10.244.0.3 port 33452 connected to 192.168.105.39 port 30201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   278 MBytes  2.33 Gbits/sec  378    942 KBytes       
[  5]   1.00-2.00   sec   281 MBytes  2.36 Gbits/sec  435   1.10 MBytes       
[  5]   2.00-3.00   sec   282 MBytes  2.37 Gbits/sec   25   1.26 MBytes       
[  5]   3.00-4.00   sec   282 MBytes  2.37 Gbits/sec    0   1.36 MBytes       
[  5]   4.00-5.00   sec   282 MBytes  2.37 Gbits/sec    0   1.45 MBytes       
[  5]   5.00-6.00   sec   284 MBytes  2.38 Gbits/sec   40   1.55 MBytes       
[  5]   6.00-7.00   sec   281 MBytes  2.36 Gbits/sec   16   1.63 MBytes       
[  5]   7.00-8.00   sec   282 MBytes  2.37 Gbits/sec   49   1.67 MBytes       
[  5]   8.00-9.00   sec   284 MBytes  2.38 Gbits/sec    0   1.75 MBytes       
[  5]   9.00-10.00  sec   278 MBytes  2.33 Gbits/sec    0   1.81 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.75 GBytes  2.36 Gbits/sec  943             sender
[  5]   0.00-10.00  sec  2.75 GBytes  2.36 Gbits/sec                  receiver

@nirs

nirs commented Mar 22, 2025

Benchmark results

vfkit with the vmnet-shared network is 2.8-5.0 times faster compared with qemu and socket_vmnet.

driver  network       host to vm    vm to host    vm to vm
vfkit   nat           11.6 Gbits/s  60.6 Gbits/s  n/a
vfkit   vmnet-shared  8.3 Gbits/s   8.4 Gbits/s   11.9 Gbits/s
qemu    socket_vmnet  3.0 Gbits/s   1.7 Gbits/s   2.4 Gbits/s

Results are similar to benchmarks with plain VMs.

For more complete benchmarks see:

@nirs nirs requested review from cfergeau and medyagh March 22, 2025 15:48
nirs added 2 commits March 24, 2025 23:53
The vfkit driver now supports the `nat` and `vmnet-shared` network
options. The `nat` option provides the best performance and is always
available, so it is the default network option. The `vmnet-shared`
option provides access between machines, with lower performance compared
to `nat`.

If the `vmnet-shared` option is selected, we verify that vmnet-helper is
available. The check ensures that vmnet-helper is installed and the
sudoers configuration allows the current user to run vmnet-helper
without a password.

If validating vmnet-helper fails, we return a new NotFoundVmnetHelper
reason pointing to the vmnet-helper installation docs or recommending to
use `nat`. This is based on how we treat missing socket_vmnet for the
QEMU driver.
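The availability check could look roughly like this in shell. The
install path and the `--version` probe flag are assumptions for
illustration, not the PR's actual code, which lives in the Go driver:

```shell
# Rough sketch of the vmnet-helper availability check described above.
# VMNET_HELPER path and the "--version" probe are assumptions.
VMNET_HELPER=/opt/vmnet-helper/bin/vmnet-helper

runs_ok() { "$@" >/dev/null 2>&1; }

vmnet_helper_available() {
    [ -x "$1" ] || return 1       # installed?
    # Passwordless sudo: --non-interactive fails instead of prompting
    # when the sudoers rule for the helper is missing.
    runs_ok sudo --non-interactive "$1" --version
}
```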

Successfully merging this pull request may close these issues.

vfkit: unable to access other clusters from the node
vfkit: unable to create multi-node cluster