[DocDB] Leaderless tablet when tserver crashes while tablet splitting #25916

Open
1 task done
archit-rastogi opened this issue Feb 6, 2025 · 0 comments
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage

Jira Link: DB-15231

Description

A long-running universe has a leaderless tablet that is not recovering.
The tablet belongs to a geo-partitioned table.

Version: 2.23.1.0-b221

It seems we failed to create two replicas of a newly split tablet and then never recovered.

Tserver logs:

0129 12:04:34.255645 828807 ts_tablet_manager.cc:2962] Update data/wal directory assignment map for table: 00004000000030008000000000004303 and tablet 3f93a5e377254348876f2750ed1a0632
I0129 12:04:34.255689 828807 fs_manager.cc:759] SetTabletPathByDataPath: Tablet 3f93a5e377254348876f2750ed1a0632 metadata path being set to /mnt/d0/yb-data/tserver
I0129 12:04:39.860646 828807 fs_manager.cc:759] SetTabletPathByDataPath: Tablet 3f93a5e377254348876f2750ed1a0632 metadata path being set to /mnt/d0/yb-data/tserver
I0129 12:04:41.365617 828807 tablet_snapshots.cc:609] T feebadab88fe4b4396d6aeb289e7aa7a P cc6bc04aa51e4da0a33113aa7479eaf4: Checkpoint created in /mnt/d0/yb-data/tserver/data/rocksdb/table-00004000000030008000000000004303/tablet-3f93a5e377254348876f2750ed1a0632
I0129 12:04:41.365669 828807 docdb_rocksdb_util.cc:659] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [R]: Write buffer size: 134217728
W0129 12:04:41.378095 828807 column_family.cc:393] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [R]: Column family default does not use any background compaction. Compactions can only be done via CompactFiles
I0129 12:04:41.382022 828807 version_set.cc:2966] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [R]: Recovered from manifest file:/mnt/d0/yb-data/tserver/data/rocksdb/table-00004000000030008000000000004303/tablet-3f93a5e377254348876f2750ed1a0632/MANIFEST-000854 succeeded,manifest_file_number is 854, next_file_number is 875, last_sequence is 1125899939877235, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is 0x00002581a74629c0 -> { op_id: 464.59311506 hybrid_time: { physical: 1738152272828590 } history_cutoff: { cotables cutoff: <invalid>, primary cutoff: { physical: 1738150965473523 } } max_value_level_ttl_expiration_time: <initial> primary_schema_version: 0 cotable_schema_versions: [] global_filter: <invalid> cotables_filter: [] }
I0129 12:04:41.382042 828807 version_set.cc:2974] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [R]: Column family [default] (ID 0), log number is 855
I0129 12:04:41.389885 828807 version_set.cc:2407] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [R]: Creating manifest 878
I0129 12:04:41.428936 828807 db_impl.cc:805] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [R]: Shutting down RocksDB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-00004000000030008000000000004303/tablet-3f93a5e377254348876f2750ed1a0632
I0129 12:04:41.429018 828807 db_impl.cc:1277] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [R]: [JOB 0] Delete /mnt/d0/yb-data/tserver/data/rocksdb/table-00004000000030008000000000004303/tablet-3f93a5e377254348876f2750ed1a0632//MANIFEST-000854 type=4 #854 -- OK
I0129 12:04:41.430073 828807 db_impl.cc:928] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [R]: Shutdown done
I0129 12:04:41.430119 828807 docdb_rocksdb_util.cc:659] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [I]: Write buffer size: 134217728
W0129 12:04:41.433807 828807 column_family.cc:393] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [I]: Column family default does not use any background compaction. Compactions can only be done via CompactFiles
I0129 12:04:41.434293 828807 version_set.cc:2966] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [I]: Recovered from manifest file:/mnt/d0/yb-data/tserver/data/rocksdb/table-00004000000030008000000000004303/tablet-3f93a5e377254348876f2750ed1a0632.intents/MANIFEST-000606 succeeded,manifest_file_number is 606, next_file_number is 621, last_sequence is 1125899972549524, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is 0x00002581a7465980 -> { op_id: 462.59310942 hybrid_time: { physical: 1738152098507292 } history_cutoff: { cotables cutoff: <invalid>, primary cutoff: <invalid> } max_value_level_ttl_expiration_time: <initial> primary_schema_version: <nullopt> cotable_schema_versions: [] global_filter: <invalid> cotables_filter: [] }
I0129 12:04:41.434310 828807 version_set.cc:2974] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [I]: Column family [default] (ID 0), log number is 607
I0129 12:04:41.437371 828807 version_set.cc:2407] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [I]: Creating manifest 624
I0129 12:04:41.441540 828807 db_impl.cc:805] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [I]: Shutting down RocksDB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-00004000000030008000000000004303/tablet-3f93a5e377254348876f2750ed1a0632.intents
I0129 12:04:41.441598 828807 db_impl.cc:1277] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [I]: [JOB 0] Delete /mnt/d0/yb-data/tserver/data/rocksdb/table-00004000000030008000000000004303/tablet-3f93a5e377254348876f2750ed1a0632.intents//MANIFEST-000606 type=4 #606 -- OK
I0129 12:04:41.455629 828807 db_impl.cc:928] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4 [I]: Shutdown done
I0129 12:04:41.455677 828807 ts_tablet_manager.cc:1211] P cc6bc04aa51e4da0a33113aa7479eaf4: Created raft group metadata for table: 00004000000030008000000000004303 tablet: 3f93a5e377254348876f2750ed1a0632

The tserver then appears to crash:

W0129 12:05:19.011183 781400 threadpool.cc:483] Thread pool failed to create thread: Runtime error (yb/util/thread.cc:805): Failed to start thread. Thread start status: IO error (yb/util/thread.cc:772): pthread_create: Resource temporarily unavailable (system error 11), signal mask restore status: OK, num_threads: 3, max_threads: 2147483647
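
The warning above reports `pthread_create` failing with system error 11 (EAGAIN on Linux), which generally means a thread or process limit was hit. As a hedged diagnostic sketch, not something taken from the tserver itself, the snippet below gathers the usual Linux limits for comparison with the current thread count. The function names and the choice of files to read are assumptions made for illustration; the cgroup path in particular assumes cgroup v2 mounted at `/sys/fs/cgroup`, and any file that is missing or non-numeric is reported as `None`.

```python
import errno
import resource
from pathlib import Path
from typing import Dict, Optional, Union


def read_int(path: str) -> Optional[int]:
    """Return the first integer in a file, or None if unreadable or non-numeric."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def thread_limit_report(pid: Union[int, str] = "self") -> Dict[str, object]:
    """Collect limits that can make pthread_create return EAGAIN on Linux."""
    # RLIMIT_NPROC here is the limit of the process running this script,
    # which is only meaningful if it runs as the same user as the tserver.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    try:
        # Each entry under /proc/<pid>/task is one thread of that process.
        current = sum(1 for _ in Path(f"/proc/{pid}/task").iterdir())
    except OSError:
        current = None
    return {
        "errno_11_name": errno.errorcode.get(11),
        "rlimit_nproc_soft": soft,
        "rlimit_nproc_hard": hard,
        "kernel_threads_max": read_int("/proc/sys/kernel/threads-max"),
        "kernel_pid_max": read_int("/proc/sys/kernel/pid_max"),
        "cgroup_pids_max": read_int("/sys/fs/cgroup/pids.max"),
        "threads_in_process": current,
    }


if __name__ == "__main__":
    for key, value in thread_limit_report().items():
        print(f"{key}: {value}")
```

Passing the tserver's PID instead of `"self"` would count that process's threads, assuming the caller has permission to read its `/proc` entry.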

After the tserver restart:

I0129 12:06:08.599387 863125 fs_manager.cc:814] Found tablet 3f93a5e377254348876f2750ed1a0632 metadata at /mnt/d0/yb-data/tserver
I0129 12:06:08.627758 863158 ts_tablet_manager.cc:1928] Loading metadata for tablet 3f93a5e377254348876f2750ed1a0632
W0129 12:06:44.913903 863161 ts_tablet_manager.cc:2859] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4: Tablet Manager startup: Rolling forward tablet deletion of type TABLET_DATA_DELETED
I0129 12:06:44.913919 863161 ts_tablet_manager.cc:3428] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4: Deleting tablet data with delete state TABLET_DATA_DELETED
I0129 12:06:44.913930 863161 docdb_rocksdb_util.cc:659] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4: Write buffer size: 134217728
I0129 12:06:44.913942 863161 tablet_metadata.cc:827] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4: Destroying regular db at: /mnt/d0/yb-data/tserver/data/rocksdb/table-00004000000030008000000000004303/tablet-3f93a5e377254348876f2750ed1a0632
I0129 12:06:44.916256 863161 tablet_metadata.cc:833] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4: Successfully destroyed regular DB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-00004000000030008000000000004303/tablet-3f93a5e377254348876f2750ed1a0632
I0129 12:06:44.916496 863161 tablet_metadata.cc:850] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4: Successfully destroyed provisional records DB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-00004000000030008000000000004303/tablet-3f93a5e377254348876f2750ed1a0632.intents
I0129 12:06:44.925206 863161 ts_tablet_manager.cc:3438] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4: Tablet deleted. Last logged OpId: 0.0
I0129 12:06:44.925251 863161 log.cc:1747] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4: Deleting WAL dir /mnt/d0/yb-data/tserver/wals/table-00004000000030008000000000004303/tablet-3f93a5e377254348876f2750ed1a0632
I0129 12:06:44.925379 863161 consensus_meta.cc:132] T 3f93a5e377254348876f2750ed1a0632 Deleting consensus metadata
I0129 12:06:44.925453 863161 ts_tablet_manager.cc:3512] Deleted transition in progress handle non ready tablet for tablet 3f93a5e377254348876f2750ed1a0632
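
To make the sequence above easier to follow in a full log file, the sketch below filters glog-style lines down to one tablet ID and checks whether startup rolled a deletion forward. It is a rough triage aid under stated assumptions: the regex, the `Event` tuple, and the helper names are illustrative and not part of any YugabyteDB tooling, and the severity letter is treated as optional because the first line quoted above lacks it. The sample lines are copied from the excerpts in this issue.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# glog-style prefix: [severity]mmdd hh:mm:ss.uuuuuu thread_id file:line] message
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])?(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]\s]+:\d+)\] (?P<msg>.*)$"
)


class Event(NamedTuple):
    severity: Optional[str]
    time: str
    thread_id: int
    source: str
    message: str


def tablet_events(lines: Iterable[str], tablet_id: str) -> List[Event]:
    """Return parsed log events whose message mentions the given tablet ID."""
    events = []
    for line in lines:
        m = GLOG_LINE.match(line.strip())
        if m is None or tablet_id not in m["msg"]:
            continue
        events.append(
            Event(m["sev"], m["time"], int(m["tid"]), m["src"], m["msg"])
        )
    return events


def rolled_forward_deletion(events: Iterable[Event]) -> bool:
    """True if any event shows startup rolling a tablet deletion forward."""
    return any("Rolling forward tablet deletion" in e.message for e in events)


TABLET_ID = "3f93a5e377254348876f2750ed1a0632"
SAMPLE_LINES = [
    "I0129 12:04:41.455677 828807 ts_tablet_manager.cc:1211] P cc6bc04aa51e4da0a33113aa7479eaf4: Created raft group metadata for table: 00004000000030008000000000004303 tablet: 3f93a5e377254348876f2750ed1a0632",
    "W0129 12:05:19.011183 781400 threadpool.cc:483] Thread pool failed to create thread: Runtime error (yb/util/thread.cc:805): Failed to start thread. Thread start status: IO error (yb/util/thread.cc:772): pthread_create: Resource temporarily unavailable (system error 11), signal mask restore status: OK, num_threads: 3, max_threads: 2147483647",
    "W0129 12:06:44.913903 863161 ts_tablet_manager.cc:2859] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4: Tablet Manager startup: Rolling forward tablet deletion of type TABLET_DATA_DELETED",
    "I0129 12:06:44.925206 863161 ts_tablet_manager.cc:3438] T 3f93a5e377254348876f2750ed1a0632 P cc6bc04aa51e4da0a33113aa7479eaf4: Tablet deleted. Last logged OpId: 0.0",
]

if __name__ == "__main__":
    for event in tablet_events(SAMPLE_LINES, TABLET_ID):
        print(event.time, event.severity, event.source, event.message[:80])
```

The thread pool warning is dropped by the filter because its message does not mention the tablet ID, so a real investigation would likely want a second pass over lines in the same time window.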

credits: @hulien22

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@archit-rastogi archit-rastogi added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Feb 6, 2025
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Feb 6, 2025