Ozone 1.4.0 HA mode on k8s unable to perform leadership election when kill one scm pod #7194

cnmac · 2024-09-13T07:57:11Z

I deployed the HA cluster to a k8s environment using ozone-1.4.0\kubernetes\examples\ozone-ha*.yaml, with the following adjustments:

minDatanode: 2
datanode:
  replicas: 2
  storage: 500Gi
  storageClassName: 'local-path'
om:
  replicas: 1
  storage: 20Gi
  storageClassName: 'local-path'
s3g:
  replicas: 1
  storage: 20Gi
  storageClassName: 'local-path'
scm:
  replicas: 2
  storage: 20Gi
  storageClassName: 'local-path'

The config-configmap.yaml file is adjusted as follows; the value of {{ .Release.Namespace }} is set to ozone:

  OZONE-SITE.XML_ozone.scm.service.ids: scmservice
  OZONE-SITE.XML_ozone.scm.nodes.scmservice: scm0,scm1
  OZONE-SITE.XML_ozone.scm.address.scmservice.scm0: scm-0.scm.{{ .Release.Namespace }}.svc.cluster.local
  OZONE-SITE.XML_ozone.scm.address.scmservice.scm1: scm-1.scm.{{ .Release.Namespace }}.svc.cluster.local
  OZONE-SITE.XML_ozone.scm.ratis.enable: "true"
  OZONE-SITE.XML_ozone.scm.primordial.node.id: scm0

Two SCM nodes are deployed. After the deployment came up normally, I ran a simple verification as follows:

root@k8s-master-01:[/root/hdp/hadoop-3.3.6/bin]./hdfs dfs -ls /
root@k8s-master-01:[/root/hdp/hadoop-3.3.6/bin]./hdfs dfs -mkdir /volume1
2024-09-13 14:54:09,620 INFO rpc.RpcClient: Creating Volume: volume1, with root as owner and space quota set to -1 bytes, counts quota set to -1
root@k8s-master-01:[/root/hdp/hadoop-3.3.6/bin]./hdfs dfs -mkdir /volume1/bucket1
2024-09-13 14:54:16,192 INFO rpc.RpcClient: Creating Bucket: volume1/bucket1, with bucket layout FILE_SYSTEM_OPTIMIZED, root as owner, Versioning false, Storage Type set to DISK and Encryption set to false, Replication Type set to server-side default replication type, Namespace Quota set to -1, Space Quota set to -1 
root@k8s-master-01:[/root/hdp/hadoop-3.3.6/bin]./hdfs dfs -put /etc/hosts /volume1/bucket1/test
put: Unable to allocate a container to the block of size: 268435456, replicationConfig: RATIS/THREE. Unable to find enough nodes that meet the space requirement of 1073741824 bytes for metadata and 5368709120 bytes for data in healthy node set. Required 3. Found 2.

Then, in k9s, I pressed ctrl+d to kill the scm-1 pod. The scm-1 pod could not restart.

│ ozone                   datanode-0                                       ●       1/1        Running                    0        6      1409         n/a         n/a         n/a         n/a 100.64.0.31         k8s-master-01       24m
│ ozone                   datanode-1                                       ●       1/1        Running                    0       16      1149         n/a         n/a         n/a         n/a 100.64.1.59         k8s-worker-02       24m           
│ ozone                   om-0                                             ●       1/1        Running                    0        6      1490         n/a         n/a         n/a         n/a 100.64.0.43         k8s-master-01       24m           
│ ozone                   s3g-0                                            ●       1/1        Running                    0        5      1120         n/a         n/a         n/a         n/a 100.64.0.251        k8s-master-01       24m           
│ ozone                   scm-0                                            ●       1/1        Running                    0       71      1489         n/a         n/a         n/a         n/a 100.64.0.102        k8s-master-01       24m           
│ ozone                   scm-1                                            ●       0/1        Init:1/2                   0        0         0         n/a         n/a         n/a         n/a 100.64.1.112        k8s-worker-02       8m55s

The log of scm-1 pod is as follows:

│ bootstrap , while invoking $Proxy14.send over nodeId=scm0,nodeAddress=scm-0.scm.ozone.svc.cluster.local/100.64.0.102:9863 after 26 failover attempts. Trying to failover after sleeping for 2000ms. Current retry count: 26.                      
│ bootstrap 2024-09-13 07:31:03 INFO  RetryInvocationHandler:422 - com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException): Server:637767f8-f121-46c1-8dda-f0aa313650c8  
│ bootstrap     at org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:110)                                                                                                             
│ bootstrap     at org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:246)                                                                                                                                                  
│ bootstrap     at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:109)                                                                                 
│ bootstrap     at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:14430)                                                             
│ bootstrap     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)                                                                                                                                           
│ bootstrap     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)                                                                                                                             
│ bootstrap     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)                                                                                                                             
│ bootstrap     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)                                                                                                                                                                             
│ bootstrap     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)                                                                                                                                                                       
│ bootstrap     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)                                                                                                                                                                       
│ bootstrap     at java.base/java.security.AccessController.doPrivileged(Native Method)                                                                                                                                                             
│ bootstrap     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)                                                                                                                                                                     
│ bootstrap     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)                                                                                                                                             
│ bootstrap     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)                                                                                                                                                                       
│ bootstrap , while invoking $Proxy14.send over nodeId=scm0,nodeAddress=scm-0.scm.ozone.svc.cluster.local/100.64.0.102:9863 after 27 failover attempts. Trying to failover after sleeping for 2000ms. Current retry count: 27.                      
│ bootstrap 2024-09-13 07:31:05 INFO  RetryInvocationHandler:422 - com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException): Server:637767f8-f121-46c1-8dda-f0aa313650c8  
│ bootstrap     at org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:110)                                                                                                             
│ bootstrap     at org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:246)                                                                                                                                                  
│ bootstrap     at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:109)                                                                                 
│ bootstrap     at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:14430)                                                             
│ bootstrap     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)                                                                                                                                           
│ bootstrap     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)                                                                                                                             
│ bootstrap     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)                                                                                                                             
│ bootstrap     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)                                                                                                                                                                             
│ bootstrap     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)                                                                                                                                                                       
│ bootstrap     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)                                                                                                                                                                       
│ bootstrap     at java.base/java.security.AccessController.doPrivileged(Native Method)                                                                                                                                                             
│ bootstrap     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)                                                                                                                                                                     
│ bootstrap     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)                                                                                                                                             
│ bootstrap     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)                                                                                                                                                                       
│ bootstrap , while invoking $Proxy14.send over nodeId=scm0,nodeAddress=scm-0.scm.ozone.svc.cluster.local/100.64.0.102:9863 after 28 failover attempts. Trying to failover after sleeping for 2000ms. Current retry count: 28.

In fact, I can reach scm-0.scm.ozone.svc.cluster.local/100.64.0.102:9863 with curl from the om-0 pod.

<<K9s-Shell>> Pod: ozone/om-0 | Container: om 
bash-4.2$ curl scm-0.scm.ozone.svc.cluster.local:9863             
It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.
bash-4.2$ curl 100.64.0.102:9863
It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.
bash-4.2$

The scm-0 log shows that scm-1.scm.ozone.svc.cluster.local cannot be resolved. scm-0 keeps falling back to FollowerState and cannot complete a leader election.

│ scm 2024-09-13 07:36:47 INFO  RaftServerConfigKeys:46 - raft.server.leaderelection.pre-vote = true (default)
│ scm 2024-09-13 07:36:47 INFO  RoleInfo:139 - 637767f8-f121-46c1-8dda-f0aa313650c8: start 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection343                                                                                
│ scm 2024-09-13 07:36:47 INFO  LeaderElection:321 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection343 PRE_VOTE round 0: submit vote requests at term 2 for 9: peers:[cb225f15-ea43-43f0-a491-aefff0dedff5|scm-1.scm.ozone  
│ scm 2024-09-13 07:36:47 INFO  LeaderElection:136 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection343 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.S  
│ scm 2024-09-13 07:36:47 INFO  LeaderElection:89 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection343: PRE_VOTE REJECTED received 0 response(s) and 1 exception(s):                                                         
│ scm 2024-09-13 07:36:47 INFO  LeaderElection:136 -   Exception 0: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host scm-1.scm.ozone.svc.cluster.local      
│ scm 2024-09-13 07:36:47 INFO  LeaderElection:323 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection343 PRE_VOTE round 0: result REJECTED                                                                                    
│ scm 2024-09-13 07:36:47 INFO  RaftServer$Division:383 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF: changes role from CANDIDATE to FOLLOWER at term 2 for REJECTED                                                                   
│ scm 2024-09-13 07:36:47 INFO  RoleInfo:130 - 637767f8-f121-46c1-8dda-f0aa313650c8: shutdown 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection343                                                                             
│ scm 2024-09-13 07:36:47 INFO  RoleInfo:139 - 637767f8-f121-46c1-8dda-f0aa313650c8: start 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-FollowerState                                                                                    
│ scm 2024-09-13 07:36:52 INFO  FollowerState:143 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-FollowerState: change to CANDIDATE, lastRpcElapsedTime:5021934801ns, electionTimeout:5021ms                                             
│ scm 2024-09-13 07:36:52 INFO  RoleInfo:110 - 637767f8-f121-46c1-8dda-f0aa313650c8: shutdown 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-FollowerState                                                                                 
│ scm 2024-09-13 07:36:52 INFO  RaftServer$Division:383 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF: changes role from  FOLLOWER to CANDIDATE at term 2 for changeToCandidate                                                         
│ scm 2024-09-13 07:36:52 INFO  RaftServerConfigKeys:46 - raft.server.leaderelection.pre-vote = true (default)                                                                                                                                      
│ scm 2024-09-13 07:36:52 INFO  RoleInfo:139 - 637767f8-f121-46c1-8dda-f0aa313650c8: start 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection344                                                                                
│ scm 2024-09-13 07:36:52 INFO  LeaderElection:321 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection344 PRE_VOTE round 0: submit vote requests at term 2 for 9: peers:[cb225f15-ea43-43f0-a491-aefff0dedff5|scm-1.scm.ozone  
│ scm 2024-09-13 07:36:52 INFO  LeaderElection:136 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection344 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.S  
│ scm 2024-09-13 07:36:52 INFO  LeaderElection:89 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection344: PRE_VOTE REJECTED received 0 response(s) and 1 exception(s):                                                         
│ scm 2024-09-13 07:36:52 INFO  LeaderElection:136 -   Exception 0: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host scm-1.scm.ozone.svc.cluster.local      
│ scm 2024-09-13 07:36:52 INFO  LeaderElection:323 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection344 PRE_VOTE round 0: result REJECTED                                                                                    
│ scm 2024-09-13 07:36:52 INFO  RaftServer$Division:383 - 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF: changes role from CANDIDATE to FOLLOWER at term 2 for REJECTED                                                                   
│ scm 2024-09-13 07:36:52 INFO  RoleInfo:130 - 637767f8-f121-46c1-8dda-f0aa313650c8: shutdown 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-LeaderElection344                                                                             
│ scm 2024-09-13 07:36:52 INFO  RoleInfo:139 - 637767f8-f121-46c1-8dda-f0aa313650c8: start 637767f8-f121-46c1-8dda-f0aa313650c8@group-F8C567D447AF-FollowerState

This renders the entire cluster unusable, and it cannot be recovered.

Answered by errose28

errose28 · 2024-09-13T17:15:48Z

Collaborator

OZONE-SITE.XML_ozone.scm.service.ids: scmservice
OZONE-SITE.XML_ozone.scm.nodes.scmservice: scm0,scm1
OZONE-SITE.XML_ozone.scm.address.scmservice.scm0: scm-0.scm.{{ .Release.Namespace }}.svc.cluster.local
OZONE-SITE.XML_ozone.scm.address.scmservice.scm1: scm-1.scm.{{ .Release.Namespace }}.svc.cluster.local
OZONE-SITE.XML_ozone.scm.ratis.enable: "true"
OZONE-SITE.XML_ozone.scm.primordial.node.id: scm0

OM and SCM use Ratis (a Raft implementation) which requires an odd number of nodes for quorum. Please retry with either 1 or 3 SCM roles.

Two SCM nodes are deployed. After the deployment came up normally, I ran a simple verification as follows:

The first two commands in this verification are metadata-only operations, which go to the OM and do not involve SCM. The final key put fails because it tries to write 3-way replicated data to a cluster with only 2 datanodes. It looks like the two SCMs were able to elect a leader in this case, since they returned a response, but because this config is not supported, it's hard to say exactly what will happen from there, including the restart case.
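
For the three-SCM retry suggested above, the settings from the question could be extended roughly as in the sketch below. This is an untested illustration, not a verified deployment: the scm2 node ID and the scm-2 address are assumptions extrapolated from the scm0/scm1 pattern in the question, and it assumes the StatefulSet names the third pod scm-2.

```yaml
# Values: hypothetical three-SCM variant of the settings in the question.
scm:
  replicas: 3
  storage: 20Gi
  storageClassName: 'local-path'

# config-configmap.yaml entries. The scm2 lines are assumed by analogy
# with the existing scm0/scm1 entries.
OZONE-SITE.XML_ozone.scm.service.ids: scmservice
OZONE-SITE.XML_ozone.scm.nodes.scmservice: scm0,scm1,scm2
OZONE-SITE.XML_ozone.scm.address.scmservice.scm0: scm-0.scm.{{ .Release.Namespace }}.svc.cluster.local
OZONE-SITE.XML_ozone.scm.address.scmservice.scm1: scm-1.scm.{{ .Release.Namespace }}.svc.cluster.local
OZONE-SITE.XML_ozone.scm.address.scmservice.scm2: scm-2.scm.{{ .Release.Namespace }}.svc.cluster.local
OZONE-SITE.XML_ozone.scm.ratis.enable: "true"
OZONE-SITE.XML_ozone.scm.primordial.node.id: scm0
```

With three SCMs a Raft majority is two, so one pod can be down while the other two still elect a leader. With two SCMs the majority is also two, so losing either pod leaves the survivor unable to win an election, which matches the repeated PRE_VOTE REJECTED lines in the scm-0 log.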

cnmac Sep 14, 2024
Author

Yes, I configured 3 SCMs and the problem did not occur. Thank you.
