Hubspot cell balancer & normalizer #126

Open
wants to merge 159 commits into
base: hubspot-2.5
Commits
9c62045
HubSpotCellCostFunction
Oct 16, 2024
e4f5a14
Adjust to handle little endian cell encoding
Oct 17, 2024
aad121f
Mark as private
Oct 17, 2024
9b5002b
Revert to big endian, simplify heuristics
Oct 17, 2024
9a954dd
Fix NPE, add logging, run spotless
Oct 29, 2024
6271c26
Clean up
Oct 29, 2024
995b8cb
Add init debug
Oct 30, 2024
d94d862
Clarify expectations via preconditions
Oct 30, 2024
8202674
Update debug and add guard for non default tables
Oct 30, 2024
f83dc2e
Emit setup at info level to ensure we see it
Oct 30, 2024
8c6c48c
Add info state dump on every cost calc call
Oct 30, 2024
ab52ea6
Add some debug so we can see why regionlocation would be null
Oct 31, 2024
0ac73bb
emit if we disable locationfinder
Oct 31, 2024
275ba6b
Ensure the region finder is set if the cell cost function exists
Oct 31, 2024
1f58743
Emit the multiplier
Oct 31, 2024
57205da
Missed one spot
Oct 31, 2024
a9e1547
Fix debug
Oct 31, 2024
b59c17c
skip any that snuck in, emit better logs, and fail more obviously here
Oct 31, 2024
840496d
include count w/o servers
Oct 31, 2024
d7081eb
Make it legible
Oct 31, 2024
856e440
list details of the unknown region
Oct 31, 2024
7df5fc2
Emit which table
Oct 31, 2024
a1d849f
Skip if empty region server mapping, assume it's empty for now
Oct 31, 2024
45c8182
Emit the cells in the region here
Oct 31, 2024
1a67f81
Tell us about which cells this region holds
Oct 31, 2024
5328064
Add emission for region size
Oct 31, 2024
4490e1b
Make clear if we skip any non-empty regions
Oct 31, 2024
580e31c
Include this
Oct 31, 2024
ba831d9
If the first two bytes of start/stop are the same, the region holds e…
Oct 31, 2024
cdd6e77
Correct how we calculate the cells
Oct 31, 2024
426aeca
add a version identifier here
Nov 5, 2024
4a0a7fa
This isn't really an edge case so much as the main case
Nov 5, 2024
867f492
Update logging & calcs
Nov 5, 2024
6120bc8
Use shaded version
Nov 5, 2024
b3a664b
emit the table name and namespace
Nov 5, 2024
a3b9b88
Switch to multidimensional array to reduce allocations
Nov 5, 2024
45ed606
Optimize the balancer eval function
Nov 6, 2024
941d2d2
Deps
Nov 6, 2024
0123942
Add custom step generator for stochastic load to prioritize shuffling…
Nov 6, 2024
43fb3da
Use our own candidate generator for the stochastic balancer
Nov 6, 2024
3d45b4d
Yep that sure is a 5
Nov 6, 2024
cf1064d
It has to be the order of the ordinal of course
Nov 6, 2024
04fecfe
Correct reservoir sampling seed, and use boolean[] instead of set
Nov 7, 2024
35adc46
Cost is invoked 2-3 times per use, memoize it
Nov 7, 2024
3a744ab
Filter out non-default regions
Nov 7, 2024
fe67705
Prevent being out of bounds
Nov 7, 2024
1d9b7ef
Correct off-by-1
Nov 7, 2024
0ac31f5
Add guards here
Nov 7, 2024
06c2ad5
Include the tables
Nov 7, 2024
d25a9ac
Do not emit null actions
Nov 7, 2024
235fe34
only use these balancer tools on objects-3
Nov 7, 2024
3517598
This is a bug - only add the cost of the function if it's needed
Nov 7, 2024
c4c6296
Add a lot of trace logging
Nov 7, 2024
1b1bc44
More logging
Nov 7, 2024
6ac45ba
Trace enough to figure out why cell count is 0
Nov 7, 2024
583aca9
Fix the subtle array access bug
Nov 7, 2024
26b4f21
Undo memoization on cluster state change, and allow to trace the bala…
Nov 8, 2024
bbed188
Also emit the full cost breakdown per step
Nov 8, 2024
dbe3263
Rework the cost function to be the number of cells (over all servers)…
Nov 8, 2024
41a0b41
Update debug to focus on which region/cells are getting picked
Nov 8, 2024
1873181
Tweak down to trace
Nov 8, 2024
adf28a6
Rework how the cost function calculates and updates cost
Nov 8, 2024
06e6b83
Fix edge case for short rowkeys
Nov 8, 2024
e270d6a
Merge pull request #120 from HubSpot/isolate-generator-cost-mismatch
szabowexler Nov 8, 2024
4f61691
Add debug and fix the state error here
Nov 8, 2024
c6a84ec
No noop
Nov 8, 2024
f40e83b
Print which generator we've selected
Nov 8, 2024
7011733
Tweak logs to allow for local run
Nov 18, 2024
bd83602
use shaded version
Nov 18, 2024
5d69edc
Use gson
Nov 18, 2024
d8cef32
Try exposing only specific fields
Nov 18, 2024
e1f1da1
Only emit objects-3, and include the full region info
Nov 18, 2024
eb52446
Mark as exposed
Nov 19, 2024
eca1dfe
Refine when we print, and what
Nov 19, 2024
d68f91a
Update serde for int2int map so we can run the balancer locally
Nov 19, 2024
aad1e1b
Stash partial balancer rework
Nov 20, 2024
5a1dc79
Stash2 -- gets to a balance of 1-6 cells/RS
Nov 20, 2024
9199cd8
First cleanup
Nov 21, 2024
a986195
Disable automatic logging for local runs
Nov 21, 2024
0677ff3
Merge pull request #122 from HubSpot/rework-cell-balancer
szabowexler Nov 21, 2024
258caf9
Fix test
Nov 21, 2024
e95ef44
Stash work
Nov 26, 2024
221a8c4
cost is actually how far we are from having as many cells as possible
Nov 26, 2024
20cbb95
add custom normalizer
Nov 26, 2024
9eedc86
Revert "add custom normalizer"
Nov 26, 2024
3526567
add hubspot normalizer
Nov 26, 2024
83dad7f
Prioritize spreading cells out
Nov 26, 2024
378ad33
Not for inclusion
Nov 26, 2024
faf41c9
Merge pull request #124 from HubSpot/cell-spread-out
szabowexler Nov 27, 2024
c0895ba
Extract static methods, simplify
Dec 2, 2024
5ba25ae
Clean up the candidate generator
Dec 2, 2024
3bd6efb
Elevate to higher package so normalizer can share common cell ops
Dec 2, 2024
61b89c5
Clean up + normalize cost
Dec 2, 2024
66b2adf
Update the normalizer to avoid merging across cell lines
Dec 2, 2024
ddb3017
Mark addition
Dec 2, 2024
bf9fe28
Finish clean up
Dec 2, 2024
6044fc7
Fix import
Dec 2, 2024
edc89ff
Print error if cell id is out of bounds
Dec 2, 2024
6a7511b
Move this up
Dec 2, 2024
ff1ef1f
Improve debug output
Dec 2, 2024
e0695c0
Cap max cells per RS to be 10% of all cells
Dec 3, 2024
e6a9d7c
Do not install unless multiplier is positive
Dec 3, 2024
7b69247
Target a specific capped cell count
Dec 4, 2024
9a41c05
Merge pull request #127 from HubSpot/target-specific-cell-count
szabowexler Dec 4, 2024
6b2df75
Simplify when we fill underloaded
Dec 4, 2024
4cae482
include target
Dec 4, 2024
28b0271
Emit which generator
Dec 4, 2024
f8085d5
randomize the under-/overloaded server picked
Dec 4, 2024
e8361df
Mark if we keep or reject
Dec 4, 2024
ca45b36
Print region counts
Dec 4, 2024
b8ac383
Prioritize balance by region and THEN evening out cell isolation
Dec 4, 2024
ce825c9
Merge pull request #128 from HubSpot/add-unbalance-dominating-factor
szabowexler Dec 4, 2024
5a42bb3
Add guard in case of error computing online cost, and do a deep reset…
Dec 5, 2024
b685173
Clean MutableRegionInfo
Dec 5, 2024
cf40b93
Clean up ServerName
Dec 5, 2024
b7f108a
Clean up TableName
Dec 5, 2024
2a22aea
Clean up Address
Dec 5, 2024
d87742d
Clean up BalancerClusterState
Dec 5, 2024
0f67a71
Clean up RegionLocationFinder
Dec 5, 2024
b425e9a
Clean up StochasticLoadBalancer
Dec 5, 2024
d8e9f9a
More cleanup StochasticLoadBalancer
Dec 5, 2024
2337b4c
Clean up RegionNormalizerFactory
Dec 5, 2024
d5b2b9a
Merge pull request #129 from HubSpot/clean-up-for-merge
szabowexler Dec 5, 2024
6caeb48
clean imports
Dec 5, 2024
e865850
style
Dec 5, 2024
f8d48d9
Style
Dec 5, 2024
70bc20f
Small clusters may not have enough regions/cell to support lower isol…
Dec 6, 2024
36d4fd2
Merge pull request #130 from HubSpot/handle-small-clusters
szabowexler Dec 6, 2024
f0cf9ff
Fix which Ints
Dec 6, 2024
71ecc2b
Emit the cluster state at the end of balance
Dec 16, 2024
da0834e
Revert "Remove all the debugging changes, generally make ready for re…
szabowexler Dec 16, 2024
59b41c2
Merge pull request #133 from HubSpot/revert-129-clean-up-for-merge
szabowexler Dec 16, 2024
25e0cf9
Measure this distance by region count from balanced
Dec 18, 2024
c2fde1d
Set target to 20% of cells
Jan 7, 2025
9719fa7
Split single function into two, and try simple random shuffling
Jan 24, 2025
282bbf2
Start by just testing live
Jan 24, 2025
1f302b1
Start at 50%
Jan 24, 2025
6af20a2
Merge pull request #138 from HubSpot/split-and-simplify-prefix-balance
szabowexler Jan 27, 2025
af8ebd5
Add the ratio and costs
Jan 27, 2025
4722617
Merge pull request #139 from HubSpot/split-and-simplify-prefix-balance
szabowexler Jan 27, 2025
b0d303c
We were computing perf:iso instead of iso:perf
Jan 28, 2025
b19d456
Add a simple debug
Jan 28, 2025
b8056a4
Use dispersion to describe the concept
Jan 28, 2025
8d6bc95
Fix the formula
Jan 28, 2025
a5418a9
Better log
Jan 28, 2025
de9ed00
Allow to run using build, for now
Jan 28, 2025
70ad820
Merge pull request #140 from HubSpot/swap-to-dispersion
szabowexler Jan 28, 2025
7fb24a4
Include table names
Jan 28, 2025
08b6b76
Fix mismatch
Jan 28, 2025
943a423
Format that
Jan 28, 2025
661c8ff
Mark that we only need the cost function if all regions are not syste…
Jan 28, 2025
d0641d8
Only needed if multiplier is nonzero as well
Jan 28, 2025
1b821c1
Emit the initial cost
Jan 28, 2025
5873254
Avoid a divide by zero, just treat this server as balanced
Jan 28, 2025
982eba5
Have to init this
Jan 28, 2025
ce958fe
Make sure it's initialized
Jan 28, 2025
c23cf77
Only do full prep if needed
Jan 28, 2025
7f9b302
Include table names
Jan 28, 2025
6945130
Try setting to 0.5
Jan 28, 2025
@@ -23,6 +23,7 @@
 import org.apache.hadoop.hbase.HConstants;
 import org.apache.hadoop.hbase.TableName;
 import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hbase.thirdparty.com.google.gson.annotations.Expose;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -54,17 +55,17 @@ class MutableRegionInfo implements RegionInfo {
   // zookeeper as of 0.90.0 HBase. And now in DisableTableProcedure, finally we will create bunch
   // of UnassignProcedures and at the last of the procedure we will set the region state to
   // CLOSED, and will not change the offLine flag.
-  private boolean offLine;
-  private boolean split;
-  private final long regionId;
-  private final int replicaId;
-  private final byte[] regionName;
-  private final byte[] startKey;
-  private final byte[] endKey;
-  private final int hashCode;
-  private final String encodedName;
-  private final byte[] encodedNameAsBytes;
-  private final TableName tableName;
+  @Expose private boolean offLine;
+  @Expose private boolean split;
+  @Expose private final long regionId;
+  @Expose private final int replicaId;
+  @Expose private final byte[] regionName;
+  @Expose private final byte[] startKey;
+  @Expose private final byte[] endKey;
+  @Expose private final int hashCode;
+  @Expose private final String encodedName;
+  @Expose private final byte[] encodedNameAsBytes;
+  @Expose private final TableName tableName;
 
   private static int generateHashCode(final TableName tableName, final byte[] startKey,
     final byte[] endKey, final long regionId, final int replicaId, boolean offLine,

@@ -26,6 +26,7 @@
 import org.apache.hadoop.hbase.net.Address;
 import org.apache.hadoop.hbase.util.Addressing;
 import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hbase.thirdparty.com.google.gson.annotations.Expose;
 import org.apache.yetus.audience.InterfaceAudience;
 
 import org.apache.hbase.thirdparty.com.google.common.base.Splitter;
@@ -82,15 +83,15 @@ public class ServerName implements Comparable<ServerName>, Serializable {
    */
   public static final String UNKNOWN_SERVERNAME = "#unknown#";
 
-  private final String serverName;
-  private final long startCode;
+  @Expose private final String serverName;
+  @Expose private final long startCode;
   private transient Address address;
 
   /**
    * Cached versioned bytes of this ServerName instance.
    * @see #getVersionedBytes()
    */
-  private byte[] bytes;
+  @Expose private byte[] bytes;
   public static final List<ServerName> EMPTY_SERVER_LIST = new ArrayList<>(0);
 
   /**

@@ -24,6 +24,7 @@
 import java.util.concurrent.CopyOnWriteArraySet;
 import org.apache.commons.lang3.ArrayUtils;
 import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hbase.thirdparty.com.google.gson.annotations.Expose;
 import org.apache.yetus.audience.InterfaceAudience;
 
 import org.apache.hbase.thirdparty.com.google.common.base.Preconditions;
@@ -93,14 +94,14 @@ public static boolean isMetaTableName(final TableName tn) {
    */
   public static final TableName OLD_META_TABLE_NAME = getADummyTableName(OLD_META_STR);
 
-  private final byte[] name;
-  private final String nameAsString;
-  private final byte[] namespace;
-  private final String namespaceAsString;
-  private final byte[] qualifier;
-  private final String qualifierAsString;
-  private final boolean systemTable;
-  private final int hashCode;
+  @Expose private final byte[] name;
+  @Expose private final String nameAsString;
+  @Expose private final byte[] namespace;
+  @Expose private final String namespaceAsString;
+  @Expose private final byte[] qualifier;
+  @Expose private final String qualifierAsString;
+  @Expose private final boolean systemTable;
+  @Expose private final int hashCode;
 
   /**
    * Check passed byte array, "tableName", is legal user-space table name.

@@ -21,6 +21,7 @@
 import java.util.Iterator;
 import java.util.List;
 import org.apache.commons.lang3.StringUtils;
+import org.apache.hbase.thirdparty.com.google.gson.annotations.Expose;
 import org.apache.yetus.audience.InterfaceAudience;
 
 import org.apache.hbase.thirdparty.com.google.common.base.Splitter;
@@ -37,7 +38,7 @@
  */
 @InterfaceAudience.Public
 public class Address implements Comparable<Address> {
-  private final HostAndPort hostAndPort;
+  @Expose private final HostAndPort hostAndPort;
 
   private Address(HostAndPort hostAndPort) {
     this.hostAndPort = hostAndPort;
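
Note on the `@Expose` additions in these hunks: the annotation only changes output when the serializer is told to skip unmarked fields, which in Gson is normally done with `GsonBuilder.excludeFieldsWithoutExposeAnnotation()`. How this PR builds its Gson instance is not visible in the hunks shown here, so the following is only a sketch of the mechanism. It uses plain reflection and a hypothetical annotation (`ExposeSketchField`) in place of Gson, and `ServerSketch` is a made-up miniature of `ServerName`, not the real class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical stand-in for Gson's @Expose; this is NOT the real annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ExposeSketchField {}

// Hypothetical miniature of ServerName: two marked fields and one unmarked field.
class ServerSketch {
  @ExposeSketchField private final String serverName;
  @ExposeSketchField private final long startCode;
  private transient Object address; // unmarked, so the sketch serializer skips it

  ServerSketch(String serverName, long startCode) {
    this.serverName = serverName;
    this.startCode = startCode;
  }
}

class ExposeSketch {
  // Renders only the marked fields of o as a flat JSON object.
  // Fields are sorted by name so the output is deterministic.
  // Sketch only: no string escaping, no nested objects, no arrays.
  static String toJson(Object o) {
    TreeMap<String, Object> marked = new TreeMap<>();
    for (Field f : o.getClass().getDeclaredFields()) {
      if (!f.isAnnotationPresent(ExposeSketchField.class)) {
        continue; // opt-in model: anything without the marker is left out
      }
      try {
        f.setAccessible(true);
        marked.put(f.getName(), f.get(o));
      } catch (IllegalAccessException e) {
        throw new IllegalStateException("cannot read field " + f.getName(), e);
      }
    }
    StringJoiner json = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, Object> e : marked.entrySet()) {
      Object v = e.getValue();
      String rendered = (v instanceof String) ? "\"" + v + "\"" : String.valueOf(v);
      json.add("\"" + e.getKey() + "\":" + rendered);
    }
    return json.toString();
  }

  public static void main(String[] args) {
    System.out.println(toJson(new ServerSketch("rs1,16020,42", 42L)));
  }
}
```

The opt-in model is why every serialized field in the hunks above carries the marker individually, while `ServerName.address` is left unmarked and stays out of the dump.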