-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add minimizeDataMovement to Rebalacne API #15110
base: master
Are you sure you want to change the base?
Add minimizeDataMovement to Rebalacne API #15110
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add a rebalance specific test case as well for the new flag?
OfflineClusterIntegrationTest
had some tests with dry-run, but can probably be enhanced to add one for this. You can also look at PartialUpsertTableRebalanceIntegrationTest
as an example
TableRebalancerClusterStatelessTest
might be a good one too
@@ -54,41 +54,44 @@ public InstanceAssignmentDriver(TableConfig tableConfig) { | |||
} | |||
|
|||
public InstancePartitions assignInstances(InstancePartitionsType instancePartitionsType, | |||
List<InstanceConfig> instanceConfigs, @Nullable InstancePartitions existingInstancePartitions) { | |||
List<InstanceConfig> instanceConfigs, @Nullable InstancePartitions existingInstancePartitions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recommend adding separate method signatures with the added parameter. Have the existing one pass 'forceMinimizeDataMovement' as 'false'
e.g.
public InstancePartitions assignInstances(InstancePartitionsType instancePartitionsType,
List<InstanceConfig> instanceConfigs, @Nullable InstancePartitions existingInstancePartitions) {
return assignInstances(instancePartitionsType, instanceConfigs, existingInstancePartitions, false);
}
That way you only need to modify the Rebalance code to call the newly added method signatures. Remaining code that doesn't need to pass this flag can continue using the original signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String tableNameWithType = _tableConfig.getTableName(); | ||
LOGGER.info("Starting {} instance assignment for table {}", instancePartitionsName, tableNameWithType); | ||
|
||
boolean minimizeDataMovement = instanceAssignmentConfig.isMinimizeDataMovement(); | ||
boolean minimizeDataMovement = forceMinimizeDataMovement || instanceAssignmentConfig.isMinimizeDataMovement(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, there are some types of TableConfigs
for which it is not allowed to set an instance config.
There is a method InstanceAssignmentConfigUtils::allowInstanceAssignment
which checks for this. I think it makes sense to only set forceMinimizeDataMovement
to true
if instance assignment config is even allowed. You can perhaps log a warning if forceMinimizeDataMovement
is set to true
but instance assignment isn't allowed, and just set minimizeDataMovement
to false
in that case
nit: perhaps also add a comment that explains all of the above
cc @Jackie-Jiang @klsince can you folks confirm if my understanding of the above is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in InstanceAssignmentConfigUtils::getInstanceAssignmentConfig
it handles the case when instance config is not set (this covers cases when allowInstanceAssignment
is false) by falling back to look at the minimizeDataMovement
flag in segment validation config.
Can we safely assume that InstanceAssignmentConfigUtils::getInstanceAssignmentConfig
is always called when making InstanceAssignmentConfig
instances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it has a Precondition check for allowInstanceAssignment
!= false
, indicating that it just throws an exception in that case
Preconditions.checkState(allowInstanceAssignment(tableConfig, instancePartitionsType),
"Instance assignment is not allowed for the given table config");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just capturing offline discussion here:
InstanceTagPoolSelector::selectInstances
uses the _minimizeDataMovement
flag:
if (_minimizeDataMovement) {
assert _existingInstancePartitions != null;
Map<Integer, Integer> poolToNumExistingInstancesMap = new TreeMap<>();
...
...
Based on the above, if we blindly pass in _minimizeDataMovement
even for balanced (default) instance assignment, then this assert will fail since _existingInstancePartitions
should be null. So we should only set _minimizeDataMovement
to 'true' if allowInstanceAssignment
!= false
@J-HowHuang to verify this via testing as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public InstanceTagPoolSelector(InstanceTagPoolConfig tagPoolConfig, String tableNameWithType,
boolean minimizeDataMovement, @Nullable InstancePartitions existingInstancePartitions) {
...
_minimizeDataMovement = minimizeDataMovement && existingInstancePartitions != null;
_existingInstancePartitions = existingInstancePartitions;
}
It does the check in the constructor of InstanceTagPoolSelector
so it should not be a problem. Though need to document this to manifest the behavior in the api docs, especially the case that existingInstancePartitions
is set to null
when bootstrap=true
in the rebalance api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good. please just verify that rebalance with default instance assignment and replica group instance assignment both work in that case.
// config will be used to determine whether to run the Minimal Data Movement Algorithm. | ||
@JsonProperty("minimizeDataMovement") | ||
@ApiModelProperty(example = "true") | ||
private boolean _minimizeDataMovement = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought, should we start with defaulting this to false
, test it out in a cluster with a few different scenarios, and then add a new change to default this to true
once we confirm it always does the right thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree. 1bb364c
boolean enableStrictReplicaGroup = tableConfig.getRoutingConfig() != null | ||
&& RoutingConfig.STRICT_REPLICA_GROUP_INSTANCE_SELECTOR_TYPE.equalsIgnoreCase( | ||
tableConfig.getRoutingConfig().getInstanceSelectorType()); | ||
LOGGER.info( | ||
"Start rebalancing table: {} with dryRun: {}, preChecks: {}, reassignInstances: {}, includeConsuming: {}, " | ||
+ "bootstrap: {}, downtime: {}, minReplicasToKeepUpForNoDowntime: {}, enableStrictReplicaGroup: {}, " | ||
+ "lowDiskMode: {}, bestEfforts: {}, externalViewCheckIntervalInMs: {}, " | ||
+ "externalViewStabilizationTimeoutInMs: {}", | ||
+ "externalViewStabilizationTimeoutInMs: {}, forceMinimizeDataMovement: {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in the log just use minimizeDataMovement
as that is the config name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2d7cb55
to
8edc46a
Compare
…et forceMinimizeDataMovement to false
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #15110 +/- ##
============================================
- Coverage 61.75% 56.05% -5.70%
- Complexity 207 803 +596
============================================
Files 2436 2152 -284
Lines 133233 115670 -17563
Branches 20636 18584 -2052
============================================
- Hits 82274 64839 -17435
- Misses 44911 45517 +606
+ Partials 6048 5314 -734
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Add new boolean property
minimizeDataMovement
toorg.apache.pinot.controller.helix.core.rebalance.RebalanceConfig
.This flag enforces minimal data movement algorithm if set to true. Otherwise, fall back to the
minimizeDataMovement
tag set in the table config.