-
There is a test. It starts a MiniOzoneCluster with different numbers of DNs and verifies that:
For now the test passes so that the CI build succeeds. To reproduce the problem, you have to uncomment the following lines (one, two) in P.S.
and
-
@siddhantsangwan said that:
More thoughts about reopening pipelines that don't take part in data writing on cluster startup:
-
@nandakumar131 said that:
How to fix the current issue for now:
-
@Montura if you choose to go with these configurations suggested by Nanda:
All your Datanodes will have enough time to get registered and be part of pipelines. Then, if you have multiple racks, the default
If you don't have multiple racks, Datanodes are picked randomly, which is eventually expected to lead to fair pipeline distribution.
-
There is a design doc saying that:
"For example, with 5 and replication factor of 3, you would end up with 2 datanode not being a part of any pipeline."
This prevents pipelines from being distributed fairly among the datanodes in the cluster, especially when datanodes become ready one by one.
I’m thinking of an approach only for the cluster startup scenario:
OPENED. And there is a time window when some OPENED pipelines exist, but data writing from the client has not started yet.
OPENED (and not yet involved in data writing) pipelines to allow BackgroundPipelineCreator to create pipelines again to include datanodes that started later (I’ve talked about these specific nodes in the previous message).

Some thoughts:
PipelineManagerImpl::scrubPipelines, which collects ALLOCATED and CLOSED pipelines that have stayed in that state for too long, could be tuned or use some option (like OZONE_SCM_PIPELINE_ALLOCATED_TIMEOUT) to also collect OPENED pipelines without containers?

Ex. Cluster of 5 datanodes, first 3 start up fast, 2 nodes start later, OZONE_DATANODE_PIPELINE_LIMIT = 10:
Total pipeline count could be 16 instead of 10. And all 5 datanodes will be utilized as much as possible.
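
The 16-versus-10 figures are consistent with a simple upper bound: each pipeline occupies one slot on each of its member datanodes, so the total is at most (datanodes × per-datanode limit) / replication factor, rounded down. The sketch below only reproduces that arithmetic. The function name is hypothetical and not part of Ozone, and it assumes the per-datanode limit is the only constraint and a replication factor of 3.

```python
def max_pipelines(datanodes: int, pipeline_limit: int, replication_factor: int = 3) -> int:
    """Rough upper bound on the number of pipelines in a cluster.

    Assumption (not taken from Ozone code): each datanode may join at most
    `pipeline_limit` pipelines, and each pipeline needs `replication_factor`
    distinct datanodes.
    """
    if datanodes < replication_factor:
        # Not enough datanodes to form even one pipeline.
        return 0
    # Total datanode "slots" divided by slots consumed per pipeline.
    return (datanodes * pipeline_limit) // replication_factor


# Only the 3 fast datanodes are registered when pipelines are created.
early = max_pipelines(datanodes=3, pipeline_limit=10)
# All 5 datanodes are available for pipeline creation.
full = max_pipelines(datanodes=5, pipeline_limit=10)
print(early, full)  # 10 16
```

With only the first 3 datanodes the bound is already reached at 10 pipelines, which is why the 2 late datanodes stay unused unless some pipelines are recreated.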