[BUG] Close index leads to temporary red cluster until shard has started #16016

Open · ashking94 (Member) opened this issue Sep 20, 2024 · 0 comments

Labels: bug (Something isn't working), Cluster Manager, untriaged

Describe the bug

As of today, closing an index temporarily turns the cluster red until the shard has started. I can reproduce this issue on both conventional document replication clusters and remote store enabled clusters.

Logs from a remote store enabled cluster

opensearch-master1  | [2024-09-20T08:55:03,100][INFO ][o.o.p.PluginsService     ] [opensearch-master1] PluginService:onIndexModule index:[index1/h71N_-WHQcWNEjqbFctFJQ]
opensearch-master1  | [2024-09-20T08:55:03,138][INFO ][o.o.c.m.MetadataCreateIndexService] [opensearch-master1] [index1] creating index, cause [api], templates [], shards [1]/[0]
opensearch-master1  | [2024-09-20T08:55:03,141][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1  | [2024-09-20T08:55:03,148][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-node1    | [2024-09-20T08:55:03,219][INFO ][o.o.p.PluginsService     ] [opensearch-node1] PluginService:onIndexModule index:[index1/h71N_-WHQcWNEjqbFctFJQ]
opensearch-node1    | [2024-09-20T08:55:03,357][INFO ][o.o.i.t.RemoteFsTranslog ] [opensearch-node1] [index1][0] Downloaded data from remote translog till maxSeqNo = -1
opensearch-node1    | [2024-09-20T08:55:03,381][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-node1    | [2024-09-20T08:55:03,381][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-node1    | [2024-09-20T08:55:03,382][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-node1    | [2024-09-20T08:55:03,382][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-master1  | [2024-09-20T08:55:03,385][INFO ][o.o.c.r.a.AllocationService] [opensearch-master1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index1][0]]]).
opensearch-master1  | [2024-09-20T08:55:03,388][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-node1    | [2024-09-20T08:55:03,457][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=STARTED engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-node1    | [2024-09-20T08:55:03,457][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=STARTED engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-node1    | [2024-09-20T08:55:03,458][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Scheduled retry with didRefresh=true
opensearch-master1  | [2024-09-20T08:55:03,482][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1  | [2024-09-20T08:55:08,198][INFO ][o.o.p.PluginsService     ] [opensearch-master1] PluginService:onIndexModule index:[index1/h71N_-WHQcWNEjqbFctFJQ]
opensearch-master1  | [2024-09-20T08:55:08,229][INFO ][o.o.c.m.MetadataMappingService] [opensearch-master1] [index1/h71N_-WHQcWNEjqbFctFJQ] create_mapping
opensearch-master1  | [2024-09-20T08:55:08,230][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-master1  | [2024-09-20T08:55:22,939][INFO ][o.o.c.m.MetadataIndexStateService] [opensearch-master1] closing indices [index1/h71N_-WHQcWNEjqbFctFJQ]
opensearch-master1  | [2024-09-20T08:55:22,940][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-master1  | [2024-09-20T08:55:23,006][INFO ][o.o.c.m.MetadataIndexStateService] [opensearch-master1] completed closing of indices [index1]
opensearch-master1  | [2024-09-20T08:55:23,007][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1  | [2024-09-20T08:55:23,010][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-master1  | [2024-09-20T08:55:23,073][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1  | [2024-09-20T08:55:23,075][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-node1    | [2024-09-20T08:55:23,140][INFO ][o.o.p.PluginsService     ] [opensearch-node1] PluginService:onIndexModule index:[index1/h71N_-WHQcWNEjqbFctFJQ]
opensearch-node1    | [2024-09-20T08:55:23,183][INFO ][o.o.i.s.IndexShard       ] [opensearch-node1] [index1][0] Downloaded translog and checkpoint files from=8 to=10
opensearch-node1    | [2024-09-20T08:55:23,207][INFO ][o.o.i.t.RemoteFsTranslog ] [opensearch-node1] [index1][0] Downloaded translog and checkpoint files from=8 to=10
opensearch-node1    | [2024-09-20T08:55:23,209][INFO ][o.o.i.t.RemoteFsTranslog ] [opensearch-node1] [index1][0] Downloaded data from remote translog till maxSeqNo = -1
opensearch-master1  | [2024-09-20T08:55:23,231][INFO ][o.o.c.r.a.AllocationService] [opensearch-master1] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[index1][0]]]).

Logs from a document replication cluster

opensearch-master1  | [2024-09-20T09:00:23,777][INFO ][o.o.p.PluginsService     ] [opensearch-master1] PluginService:onIndexModule index:[index1/G9Qow6fDROaCVD65DX-n0w]
opensearch-master1  | [2024-09-20T09:00:23,821][INFO ][o.o.c.m.MetadataCreateIndexService] [opensearch-master1] [index1] creating index, cause [api], templates [], shards [1]/[0]
opensearch-master1  | [2024-09-20T09:00:23,825][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-node1    | [2024-09-20T09:00:23,882][INFO ][o.o.p.PluginsService     ] [opensearch-node1] PluginService:onIndexModule index:[index1/G9Qow6fDROaCVD65DX-n0w]
opensearch-master1  | [2024-09-20T09:00:24,033][INFO ][o.o.c.r.a.AllocationService] [opensearch-master1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index1][0]]]).
opensearch-master1  | [2024-09-20T09:00:24,096][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1  | [2024-09-20T09:00:26,347][INFO ][o.o.p.PluginsService     ] [opensearch-master1] PluginService:onIndexModule index:[index1/G9Qow6fDROaCVD65DX-n0w]
opensearch-master1  | [2024-09-20T09:00:26,378][INFO ][o.o.c.m.MetadataMappingService] [opensearch-master1] [index1/G9Qow6fDROaCVD65DX-n0w] create_mapping
opensearch-master1  | [2024-09-20T09:00:42,889][INFO ][o.o.c.m.MetadataIndexStateService] [opensearch-master1] closing indices [index1/G9Qow6fDROaCVD65DX-n0w]
opensearch-master1  | [2024-09-20T09:00:42,949][INFO ][o.o.c.m.MetadataIndexStateService] [opensearch-master1] completed closing of indices [index1]
opensearch-master1  | [2024-09-20T09:00:42,949][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1  | [2024-09-20T09:00:43,008][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-node1    | [2024-09-20T09:00:43,061][INFO ][o.o.p.PluginsService     ] [opensearch-node1] PluginService:onIndexModule index:[index1/G9Qow6fDROaCVD65DX-n0w]
opensearch-master1  | [2024-09-20T09:00:43,097][INFO ][o.o.c.r.a.AllocationService] [opensearch-master1] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[index1][0]]]).

This problem may be aggravated on remote store enabled clusters due to the existing behaviour where the translog is downloaded from the remote store during recovery. That behaviour, however, is being fixed separately.

Related component

Cluster Manager

To Reproduce

  1. Create an index
  2. Ingest some docs
  3. Close the index
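
The steps above can be sketched with curl roughly as follows. This is illustrative only: the endpoint, index settings, and document body are assumptions, not taken from the report (the logs show a 1-shard/0-replica index named index1, which is mirrored here).

```shell
# Assumes a local OpenSearch cluster on http://localhost:9200 with the
# security plugin disabled; adjust host and auth for your setup.

# 1. Create an index (1 primary, 0 replicas, as in the logs above)
curl -s -XPUT "http://localhost:9200/index1" \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"number_of_shards": 1, "number_of_replicas": 0}}'

# 2. Ingest some docs
curl -s -XPOST "http://localhost:9200/index1/_doc" \
  -H 'Content-Type: application/json' \
  -d '{"field": "value"}'

# 3. Close the index
curl -s -XPOST "http://localhost:9200/index1/_close"

# Poll cluster health immediately after the close; the status may briefly
# report "red" until the closed shard has started.
curl -s "http://localhost:9200/_cluster/health?pretty"
```

Because the red window is short (a few hundred milliseconds in the logs above), polling `_cluster/health` in a tight loop right after the `_close` call is the easiest way to observe it.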

Expected behavior

I am not entirely sure whether the cluster should really turn red here. The red status gives a false impression of an underlying issue causing a red cluster. IMHO the cluster should remain green while the close-index operation is in progress.

Additional Details

NA

@ashking94 ashking94 added bug Something isn't working untriaged labels Sep 20, 2024