OCPBUGS-52280, SDN-5330: Add ipsec connect wait service #4854
Conversation
/assign @tssurya
/assign @jcaamano
/assign @huiran0826
Force-pushed from 3d9767a to e3c4fef
Force-pushed from e3c4fef to 19c2998
Force-pushed from 19c2998 to 48c5c1c
Force-pushed from 48c5c1c to 8bd088c
Force-pushed from 158c96f to 07dd5a4
@pperiyasamy: This pull request references SDN-5330 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed from 07dd5a4 to fcf67a9
/retest-required
/retest
/assign @yuqi-zhang |
overall looks good
I just have some silly questions for my own understanding
great commit message and description of the problem made it easy for me to understand..
qq re:
The wait-for-ipsec-connect.service service is added into the base template to avoid two reboots during upgrade if it goes into IPsec machine configs rendered by the network operator.
I am new to MCO.. so is it the case that if you add something to the base template, it kicks in during the main upgrade reboot itself, versus another path (more like day-2) that would cause a second set of reboots? Sorry for the stupid question.
@tssurya good question. If we define this service as part of the base template, then MCO renders it as part of the final rendered machine config for the pool, so the node reboots only once during the upgrade. If we instead added it to the IPsec machine configs in CNO (https://github.com/openshift/cluster-network-operator/blob/master/bindata/network/ovn-kubernetes/common/80-ipsec-master-extensions.yaml), it would cause two node reboots: one for MCO rendering the updated IPsec machine configs on the node, and another for MCO rendering its own final machine configs. That's what we are avoiding here.
thank you makes total sense!
if ! grep -q "auto=start" /etc/ipsec.d/openshift.conf; then
  sed -i '/^.*conn ovn.*$/a\ auto=start' /etc/ipsec.d/openshift.conf
fi
chroot /proc/1/root ipsec restart
so this is what loads up all the connections based on the openshift.conf file above?
question out of ignorance.. but ipsec here is "ipsec": {"NetworkManager-libreswan", "libreswan"}?
The chroot /proc/1/root ipsec restart command restarts the ipsec.service systemd service, which in turn restarts the pluto daemon (this loads all the connections from the /etc/ipsec.d/openshift.conf file).
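For illustration, a minimal sketch of that restart plus a couple of checks one could run afterwards; the commands are standard Libreswan CLI, but the checks themselves are an assumption and not part of this PR:

```bash
# Restart ipsec.service in the host's root filesystem so pluto re-reads
# /etc/ipsec.d/openshift.conf (same command used by the wait script).
chroot /proc/1/root ipsec restart

# Illustrative checks (not part of the PR): loaded connections and
# currently established SAs.
chroot /proc/1/root ipsec status | grep "Total IPsec connections"
chroot /proc/1/root ipsec whack --trafficstatus
```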
ah ipsec == pluto gotcha
# Modify existing IPsec connection entries with "auto=start"
# option and restart ipsec systemd service. This helps to
# establish IKE SAs for the existing IPsec connections with
# peer nodes. This option will be deleted from connections
are these the new changes coming in OVS 3.5, where we plan to run it as a systemd process on the node?
so this fix is kinda temporary?
yes, this is a temporary fix for 4.19. Once the openvswitch-ipsec systemd service is consumed, this can be removed and only the 60s wait logic is needed.
# establish IKE SAs for the existing IPsec connections with
# peer nodes. This option will be deleted from connections
# once ovs-monitor-ipsec process spinned up on the node by
# ovn-ipsec-host pod, but still it won't reestablish IKE SAs
so I assume that is also the case today: when the ipsec-host pod comes up, it sees which connections are already established and does nothing, right?
this is the script you gave me that does that in OVS: https://github.com/openvswitch/ovs/blob/main/ipsec/ovs-monitor-ipsec.in
right, the ovs-monitor-ipsec process just comes up and does nothing for existing peer nodes (except deleting the auto=start parameter from every connection entry in the /etc/ipsec.d/openshift.conf file).
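(Not the actual ovs-monitor-ipsec code, just a sketch of the net effect on the file that the comment above describes:)

```bash
# Sketch only: once ovs-monitor-ipsec manages the connections again, the
# auto=start lines added by the wait service are dropped, so the entries
# fall back to the default auto=route behaviour.
sed -i '/^[[:space:]]*auto=start$/d' /etc/ipsec.d/openshift.conf
```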
auto=route is the default, which means "The connection will be automatically started when traffic matching its policy is detected."
so, given kubelet has not started yet, we won't have IKE SAs established here.. hence we need auto=start.
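To make that concrete, a hypothetical connection entry as it would look after the sed above has run; the connection name and addresses are made up for illustration:

```bash
# Hypothetical excerpt of /etc/ipsec.d/openshift.conf (illustrative only):
#
#   conn ovn-0a1b2c-0-in-1
#       left=10.0.0.1
#       right=10.0.0.2
#       auto=start    # added by the wait service; the default auto=route would
#                     # only bring the tunnel up once matching traffic appears,
#                     # and before kubelet starts there is no such traffic
```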
chroot /proc/1/root ipsec restart

# Wait for upto 60s to get IPsec SAs to establish with peer nodes.
timeout=60
nit: update the commit msg/PR description to say 1 min.. perhaps add the reasoning you mentioned in the review comments for why you chose 1 min?
so does the extra wait of up to 1 min per node (assuming reboots happen serially?) increase overall upgrade time enough that we need to be worried about it?
sure, will do.
Reboots within the control plane nodes are serial.
Reboots within the worker nodes are serial.
Reboots of worker and control plane nodes happen in parallel.
On a 6-node cluster it took up to 2s (worst case 4s); on a 32-node cluster it took about 4s (worst case 14s). So this should not noticeably increase upgrade time, considering we only wait up to 60s in the script.
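For reference, a minimal sketch of what the 60s wait loop could look like; the real script is only partially visible in the diff above, and the way established SAs are counted here is an assumption:

```bash
# Wait up to 60s for IPsec SAs to establish with peer nodes (sketch).
timeout=60
elapsed=0
while [ "$elapsed" -lt "$timeout" ]; do
  # Assumption: compare established SAs (via Libreswan traffic status)
  # against the number of OVN connections defined in openshift.conf.
  established=$(chroot /proc/1/root ipsec whack --trafficstatus | wc -l)
  expected=$(grep -c 'conn ovn' /etc/ipsec.d/openshift.conf)
  if [ "$established" -ge "$expected" ]; then
    break
  fi
  sleep 2
  elapsed=$((elapsed + 2))
done
```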
contents: |
[Unit]
Description=Ensure IKE SA established for existing IPsec connections.
After=ipsec.service
so the ipsec service starts first, then we run this, and then we restart the ipsec service to load the connections?
I'm confused by the before/after semantics here..
so Before=kubelet means this wait service will run before kubelet starts, right?
and the ipsec service will run before the wait service? i.e. the wait service will run after the ipsec service?
yes, right, that is the startup order of the services when the node comes up. While the node is being shut down, the services are stopped in the reverse order.
Because of the auto=start change in the connection entries, we must restart the pluto daemon for it to take effect.
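As an illustration of that ordering, a sketch of the unit written as a heredoc; After=/Before= are the directives discussed above, while Type=, the script path, and the Before=kubelet.service target are assumptions about how such a unit would typically be wired up, not necessarily the exact unit in this PR:

```bash
# Sketch of the wait service's ordering (illustrative only).
cat <<'EOF' > /etc/systemd/system/wait-for-ipsec-connect.service
[Unit]
Description=Ensure IKE SA established for existing IPsec connections.
# Start only after ipsec.service has started pluto...
After=ipsec.service
# ...and finish before kubelet begins scheduling workloads.
Before=kubelet.service

[Service]
Type=oneshot
# Hypothetical path to the wait script shown in the diff.
ExecStart=/usr/local/bin/wait-for-ipsec-connect.sh
EOF
# At shutdown, systemd stops units in the reverse of this start-up order.
```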
gotcha thank you
When a node reboots on an IPsec-enabled cluster, once it comes up and the ovn-ipsec-host pod is deployed, the ovs-monitor-ipsec process started by the pod parses the /etc/ipsec.d/openshift.conf file and makes the pluto daemon establish IKE SAs with peer nodes. But since these are only established after kubelet has started, workload pods scheduled on this node would fail to communicate with pods on other nodes until the IPsec SAs are established.

So this commit adds the wait-for-ipsec-connect.service systemd service, which depends on the ipsecenabler.service created by the IPsec machine config. The new service loads the existing IPsec connections created by OVN/OVS into the pluto daemon with the "auto=start" option and waits up to 60s for the IPsec tunnels to be established. This gives a better chance that IPsec SAs are established even before kubelet is started, and when the ovn-ipsec-host pod comes up later it doesn't have to do anything for the existing IPsec connections.

We derived the total wait time of 60s from testing on 6- and 32-node clusters. On the 6-node cluster it took mostly about 2s (worst case 4s), and on the 32-node cluster mostly about 2s or 4s (worst case 14s) to get IPsec connections up with peer nodes.

The wait-for-ipsec-connect.service is added to the base template to avoid two reboots during upgrade, which would happen if it went into the IPsec machine configs rendered by the network operator.

Signed-off-by: Periyasamy Palanisamy <[email protected]>
We noticed pluto tearing down established IPsec connections in parallel with crio stopping all pod containers, which includes stopping the API server pod container. This happens when a node reboot is initiated to render new machine configs during an OCP upgrade. It creates API connection disruptions in the cluster; these disruptions generate events that are caught by the origin monitor tests and fail the IPsec upgrade CI lane, and it may also cause noticeable temporary pod traffic failure during upgrade on an IPsec-enabled cluster.

Hence this commit adds a Before=crio.service dependency on the ipsec.service so that the pluto daemon is stopped only after crio has shut down and all pod containers are stopped on the node. This gives clients enough room to gracefully move to another control plane node for API connections.

Signed-off-by: Periyasamy Palanisamy <[email protected]>
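For illustration, one way such an ordering dependency can be expressed is a systemd drop-in; the drop-in path and file name here are assumptions, and the PR may instead set the directive directly in the rendered unit:

```bash
# Hypothetical drop-in: ipsec.service starts before crio.service, which also
# means it is stopped only after crio.service at shutdown, so pluto keeps the
# tunnels up while pod containers (including the API server) are stopped.
mkdir -p /etc/systemd/system/ipsec.service.d
cat <<'EOF' > /etc/systemd/system/ipsec.service.d/10-before-crio.conf
[Unit]
Before=crio.service
EOF
systemctl daemon-reload
```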
Force-pushed from 666448f to 1fa5eaa
/lgtm
I'm happy with how it looks and my questions were answered.
I'll let @trozet take another pass for the approve
#!/bin/bash
set -x

if [ ! -e "/etc/ipsec.d/openshift.conf" ]; then
@pperiyasamy on fresh cluster install I assume this will be the case?
@tssurya right, this will be the case for IPsec External and Disabled modes as well.
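In other words, the guard at the top of the script just exits early when there is nothing to wait for; a sketch (the exit code is an assumption):

```bash
#!/bin/bash
set -x

# On a fresh install, or with IPsec in External/Disabled mode, OVS has not
# generated any connections yet, so there is nothing to wait for.
if [ ! -e "/etc/ipsec.d/openshift.conf" ]; then
  exit 0
fi
```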
/retest
/lgtm
@yuqi-zhang can you please approve?
@pperiyasamy: This pull request references Jira Issue OCPBUGS-52280, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. This pull request references SDN-5330 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
logically seems fine, the main thing that has bitten us in the past with similar services was systemd dependency loops, but as I understand it there is strict ordering here
FWIW, the change looks fine to me. The auto=start part is a little dangerous as it increases the chances of crossing streams at scale, so the sooner we can get rid of it (by switching to running openvswitch-ipsec.service on the host) the better, but it should be OK for now with the pinned Libreswan 4.6.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: igsilya, pperiyasamy, trozet, tssurya, yuqi-zhang. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@pperiyasamy: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Merged commit d96d261 into openshift:main
@pperiyasamy: Jira Issue OCPBUGS-52280: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged: These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-52280 has not been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[ART PR BUILD NOTIFIER] Distgit: ose-machine-config-operator
The IPsec upgrade CI job (e2e-aws-ovn-ipsec-upgrade) is never passing: it fails with API connection disruption events and some cluster operators being unavailable for longer than the allowed period. So we are addressing those issues with this PR by fixing two things.
Add ipsec connect wait service
When a node reboots on an IPsec-enabled cluster, once it comes up and the ovn-ipsec-host pod is deployed, the ovs-monitor-ipsec process started by the pod parses the /etc/ipsec.d/openshift.conf file and makes the pluto daemon establish IKE SAs with peer nodes. But since these are only established after kubelet has started, workload pods scheduled on this node would fail to communicate with pods on other nodes until the IPsec SAs are established.
So the commit e96fe31 adds the wait-for-ipsec-connect.service systemd service, which depends on the ipsecenabler.service created by the IPsec machine config. The new service loads the existing IPsec connections created by OVN/OVS into the pluto daemon with the "auto=start" option and waits up to 60 seconds for the IPsec tunnels to be established. This gives a better chance that IPsec SAs are established even before kubelet is started, and when the ovn-ipsec-host pod comes up later it doesn't have to do anything for the existing IPsec connections.
The wait-for-ipsec-connect.service is added to the base template to avoid two reboots during upgrade, which would happen if it went into the IPsec machine configs rendered by the network operator.
Add crio dependency for ipsec.service
We noticed pluto tearing down established IPsec connections in parallel with crio stopping all pod containers, which includes stopping the API server pod container. This happens when a node reboot is initiated to render new machine configs during an OCP upgrade.
This creates API connection disruptions in the cluster; these disruptions generate events that are caught by the origin monitor tests and fail the IPsec upgrade CI lane, and it may also cause noticeable temporary pod traffic failure during upgrade on an IPsec-enabled cluster.
Hence the commit fcf67a9 adds a Before=crio.service dependency on the ipsec.service so that the pluto daemon is stopped only after crio has shut down and all pod containers are stopped on the node. This gives clients enough room to gracefully move to another control plane node for API connections.
Big thanks to @igsilya for helping to troubleshoot and fix this problem.