I deployed the HA cluster to a k8s environment using ozone-1.4.0\kubernetes\examples\ozone-ha*.yaml, with some adjustments as follows:
The config-configmap.yaml file is adjusted as follows; the value of {{ .Release.Namespace }} is set to ozone:
Two SCM nodes are deployed. Once the deployment was healthy, I ran a simple verification as follows:
Then, in k9s, I pressed ctrl+d to stop the scm-1 pod. After that, the scm-1 pod cannot be restarted.
The log of the scm-1 pod is as follows:
In fact, I can access scm-0.scm.ozone.svc.cluster.local/100.64.0.102:9863 with the curl command from the om-0 pod.
The scm-0 log shows that scm-1.scm.ozone.svc.cluster.local cannot be found, but scm-0 has entered FollowerState and cannot perform leader election.
This renders the entire cluster unusable, and the cluster cannot be recovered.
Replies: 1 comment 1 reply
OM and SCM use Ratis (a Raft implementation) which requires an odd number of nodes for quorum. Please retry with either 1 or 3 SCM roles.
The first two commands in this verification are only metadata operations, which go to the OM and do not involve SCM. The final key put fails because it is trying to write 3-way replicated data to a cluster with only 2 datanodes. It looks like the two SCMs were able to elect a leader in this case, since they returned a response, but because this config is not supported it's hard to say exactly what will happen from there, including the restart case.
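
To make the quorum point concrete, here is a minimal sketch of the majority arithmetic Raft relies on. It is generic Raft math, not Ozone or Ratis code, and the function names `raft_quorum` and `tolerated_failures` are made up for illustration.

```python
def raft_quorum(nodes: int) -> int:
    """Smallest majority of a Raft group with `nodes` members."""
    if nodes < 1:
        raise ValueError("a Raft group needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return nodes - raft_quorum(nodes)


# Compare the group sizes discussed in this thread.
for n in (1, 2, 3):
    print(
        f"{n} SCM node(s): quorum={raft_quorum(n)}, "
        f"tolerated failures={tolerated_failures(n)}"
    )
```

With 2 nodes the quorum is also 2, so losing either node leaves the survivor without a majority and it cannot win an election. That would be consistent with scm-0 staying in FollowerState, though I can't say for sure that it is the only factor in the restart failure. With 3 nodes the quorum is still 2, so one node can be down.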