AIS-Operator: new proxy replicas cannot join the cluster after deleting the original master replica #208
Thanks for opening, will take a look.

Right now a new node starts off trying to connect to proxy-0 and, ideally, if proxy-0 is not primary, it updates the cluster map provided to the new node, including the current primary. But since in your case proxy-0 is not ready, this fails. To address this, we could have the init container query the proxy service to set the correct primary in the initial config.

However, if I understand correctly, this situation comes up because you're asking it to scale up when proxy-0 can't be scheduled onto a running node. There's no real reason proxies need to be a StatefulSet rather than a Deployment, so we could look into updating that and removing any volume bindings that restrict a proxy to a specific node. That way proxy-0 would simply be rescheduled, and by the time proxy-2 comes up, proxy-0 would be ready to receive requests. (Targets are another issue: they are inherently very stateful, so cordoning and setting up new PVs is a riskier, more manual process.)
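A rough sketch of the init-container idea mentioned above: fetch the cluster map from the proxy service and use its primary's URL as the initial `primary_url`. Note this is a sketch only: the endpoint path (`/v1/daemon?what=smap`) and the JSON field names (`proxy_si`, `public_net.direct_url`) are assumptions for illustration and should be checked against the actual AIS API; a hand-written sample map is used here so the parsing step runs anywhere.

```shell
# In a real init container you would fetch the map from the proxy service, e.g.:
#   SMAP_JSON=$(curl -s "http://aistore-proxy:8080/v1/daemon?what=smap")
# Sample cluster map (hypothetical shape) standing in for the live response:
SMAP_JSON='{"version":3,"proxy_si":{"daemon_id":"p1","public_net":{"direct_url":"http://aistore-proxy-1:8080"}}}'

# Extract the current primary's URL from the map.
PRIMARY_URL=$(python3 -c '
import json, sys
smap = json.loads(sys.argv[1])
print(smap["proxy_si"]["public_net"]["direct_url"])
' "$SMAP_JSON")

# This value would then be written into the initial config as primary_url.
echo "primary: $PRIMARY_URL"
```

The point is that the init container would trust whatever the service reports as primary, instead of hard-coding proxy-0.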
Thanks for your reply @aaronnw.
Agreed. That is more reasonable than updating the primary URL in the global configuration.
In fact, I want to mock a node-failure scenario to test the election process of the aistore proxy and the impact of the intermediate state on file reads and writes.
I am curious: is the data synchronized between proxies just the list of AIS nodes (proxies and targets)?
Just a quick reaction to something that was said earlier: there's no reason, real or imaginary. Proxies can run anywhere, with no restrictions or expectations other than low-latency intra-cluster connectivity.
@eahydra
I was referring to the state PVs we use for caching data, which includes several types of metadata, including the configuration and the cluster map. This can all be synced when a proxy first joins the cluster, so there is no need for long-term storage or any StatefulSet. I believe we originally did this for consistency with the target nodes, which do need to be stateful. AIS already supports the idea of a "discovery" URL; we may be able to simply set this to the headless service as a fallback. Looking into it...
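For context on the "headless service as a fallback" idea: a headless Service resolves, via DNS, to all ready proxy pods rather than a single virtual IP, so any ready proxy can answer a joining node. A minimal sketch, where the service name, namespace, labels, and port are assumptions and would need to match the operator's actual manifests:

```yaml
# Sketch of a headless Service covering all proxy pods (names/labels assumed).
apiVersion: v1
kind: Service
metadata:
  name: aistore-proxy
  namespace: ais
spec:
  clusterIP: None          # headless: DNS returns the ready proxy pod IPs
  selector:
    app: aistore
    component: proxy
  ports:
    - name: http
      port: 8080
```

With a discovery URL pointing at such a service, a new node is not pinned to proxy-0 and can reach whichever proxies are currently ready.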
Is there an existing issue for this?
Describe the bug
Hi guys, I created an aistore cluster with ais-operator; the cluster has two proxy replicas. I tried to simulate a cluster failure to determine whether the cluster would keep working, so I performed the following steps:

1. Deleted the original primary replica, `aistore-proxy-0`. The remaining replica, `aistore-proxy-1`, becomes the primary server.
2. Set `spec.proxySpec.size=3` in the AIStore CRD object to try to increase the scale.
3. The new replica `aistore-proxy-2` failed to join the cluster. From the log, it tried to connect to the original primary replica, because the primary url in the global config is still `aistore-proxy-0`.

Expected Behavior

The new replica `aistore-proxy-2` should connect to the new primary replica `aistore-proxy-1` and successfully join the cluster.

Current Behavior

The new replica `aistore-proxy-2` failed to join the cluster.

Steps To Reproduce

As described in "Describe the bug" above.
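The reproduction steps roughly translate to the kubectl sketch below. The namespace, CR name, and `aistore` resource kind are assumptions and must be adjusted to the actual deployment; the kubectl invocations are shown as comments since they require a live cluster, while the patch JSON itself is validated locally.

```shell
# Hypothetical namespace and custom-resource name; adjust to your deployment.
NS=ais
CR=aistore
PATCH='{"spec":{"proxySpec":{"size":3}}}'

# Sanity-check the patch JSON locally before applying it.
python3 -c 'import json, sys; json.loads(sys.argv[1]); print("patch ok")' "$PATCH"

# Step 1: delete the current primary proxy pod to force an election.
#   kubectl -n "$NS" delete pod aistore-proxy-0
# Step 2: scale the proxies from 2 to 3 through the AIStore CR.
#   kubectl -n "$NS" patch aistore "$CR" --type merge -p "$PATCH"
```

After step 2, watching the logs of the new `aistore-proxy-2` pod shows the failed join attempts described above.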
Possible Solution
I think it would be a good idea to update the global config with the latest primary URL on the next reconcile.
Additional Information/Context
No response
AIStore build/version
latest, ais-operator/latest
Environment details (OS name and version, etc.)
Ubuntu 22.04, K8s v1.30