Add zfs_recover_ms parameter #17094

Draft
wants to merge 1 commit into
base: master

@ihoro ihoro commented Feb 25, 2025

Motivation and Context

In some production cases, loading a metaslab triggers a ZFS panic, presumably because of unexpected entries in its spacemap. The assertions in zfs_range_tree_add_impl() and zfs_range_tree_remove_impl() fail because of overlapping segments, missing segments, and similar inconsistencies. A business may want to keep running such pools while the root cause is being investigated.

Description

The idea is to allow such metaslabs to load, accepting a potential space leak as the trade-off rather than risking data loss.

We already have the zfs_recover module parameter to mitigate various issues, including some range tree cases. This patch adds a zfs_recover_ms parameter that localizes the recovery behavior to the metaslab loading process only.
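The scoping described above can be sketched in user-space C. This is a minimal illustration, not the code from the patch: `panic_or_warn()`, `sketch_panic_recover_ms()` and `warnings_logged` are hypothetical names, the two tunables are modeled as plain globals, and the assumption that the global zfs_recover switch also enables the metaslab path is mine.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for the module parameters (plain globals in this sketch). */
static int zfs_recover = 0;	/* existing, global recovery switch */
static int zfs_recover_ms = 0;	/* proposed, metaslab-only switch */

/* Counts downgraded panics so the behavior can be observed. */
static int warnings_logged = 0;

/* Hypothetical helper: warn and continue when recovering, else abort. */
static void
panic_or_warn(bool recover, const char *fmt, va_list ap)
{
	fputs(recover ? "WARNING: " : "PANIC: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	if (!recover)
		abort();
	warnings_logged++;
}

/*
 * Hypothetical analogue of zfs_panic_recover_ms(): the metaslab path
 * recovers if either switch is set (an assumption of this sketch).
 */
static void
sketch_panic_recover_ms(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	panic_or_warn(zfs_recover || zfs_recover_ms, fmt, ap);
	va_end(ap);
}
```

With both switches at 0 the call aborts. With zfs_recover_ms set to 1 the same call logs a warning and returns, which corresponds to the space-leak-instead-of-panic trade-off.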

The following diagrams are expected to help with the details:

[Diagram: zfs_recover_ms]

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@github-actions bot added the "Status: Work in Progress" (Not yet ready for general review) label on Feb 25, 2025
Signed-off-by: Igor Ostapenko <[email protected]>
Comment on lines +3148 to +3149
ZFS_MODULE_PARAM(zfs, zfs_, recover_ms, INT, ZMOD_RW,
"Set to attempt to recover from fatal errors during metaslab loading");
Member

I am not sure this is really about metaslab loading, or even about metaslabs. Doesn't it apply to all spacemap operations, or even to some non-spacemap ones? Also, as I mentioned during the call, I recall it happening more often not during metaslab loading (by which I guess you mean spacemap condensing, or whatever it is called), but when deleting snapshots and moving their lists of blocks to the parent or to free space.

Author

Yes. Currently it focuses on ms_allocatable only, but it does indeed cover all operations on that tree, not just those during loading.

Comment on lines +345 to +347
zfs_panic_recover_ms("zfs: adding segment "
"(offset=%llx size=%llx) overlapping with "
"existing one (offset=%llx size=%llx)",
Member

I wish we could say something about which range tree it was and what we were doing with it; otherwise a message that something on the pool overlapped somewhere does not help with debugging. In case of a panic we should get a stack trace, but I wonder if we could get more, maybe by giving range trees some types, names, etc. for debugging.

Author

Thinking of a more generalized approach, I guess we could cover both comments with something like zfs_recover_range_tree_mask (instead of zfs_recover_ms), which would allow selecting up to 64 of the most interesting range tree classes: ms_allocatable, ms_freeing, ms_freed, and so on. I think 64 bits are enough to cover all the range tree use cases we currently have, even the ones not related to spacemaps. Each instance of the selected tree classes would take the warning path instead of the panic one. In addition, each range tree instance could carry a description giving its actual name, such as ms_allocatable, and probably extra info such as the spa/vdev/metaslab id. Does this seem like one of the options we could consider implementing?
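For illustration only, the mask idea could look roughly like this. The enum `rt_class_t`, its members, and `rt_class_recoverable()` are hypothetical names made up for this sketch; only zfs_recover_range_tree_mask and the ms_* tree names come from the comment above, and the tunable is modeled as a plain global.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range tree classes; one bit each, at most 64 in total. */
typedef enum {
	RT_CLASS_MS_ALLOCATABLE = 0,
	RT_CLASS_MS_FREEING,
	RT_CLASS_MS_FREED,
	RT_CLASS_COUNT
} rt_class_t;

/* Stand-in for the suggested tunable (a plain global in this sketch). */
static uint64_t zfs_recover_range_tree_mask = 0;

/* True if trees of this class should warn instead of panicking. */
static bool
rt_class_recoverable(rt_class_t c)
{
	return (((zfs_recover_range_tree_mask >> (unsigned)c) & 1ULL) != 0);
}
```

Setting bit 0 would select only ms_allocatable trees, leaving every other class on the panic path.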

Contributor

I think what @amotin had in mind was just the one tunable, but with the log output mentioning which range tree is involved and as many details as possible, so we have something to track down when a user reports encountering the problem. Knowing which of the range trees it was would be a good first step, but it would also be nice to show the range that was being added/removed and the full details of the range it overlapped with (do we have a birth time available for each?), and in general to leave as many breadcrumbs as possible to chase this problem down.
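A rough sketch of what such a labeled message might look like follows. The `rt_debug_t` structure, its fields, and both helper functions are assumptions made for illustration and are not part of the patch; only the wording of the overlap message mirrors the one quoted in the review.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* A segment as [start, start + size). */
typedef struct {
	uint64_t start;
	uint64_t size;
} seg_t;

/* Hypothetical per-tree debug label, e.g. "ms_allocatable". */
typedef struct {
	const char *rt_name;
	uint64_t vdev_id;
	uint64_t ms_id;
} rt_debug_t;

/* Half-open interval overlap test (ignores uint64_t wraparound). */
static int
segs_overlap(const seg_t *a, const seg_t *b)
{
	return (a->start < b->start + b->size &&
	    b->start < a->start + a->size);
}

/* Formats the overlap message, prefixed with the tree's identity. */
static int
format_overlap(char *buf, size_t len, const rt_debug_t *rt,
    const seg_t *add, const seg_t *existing)
{
	return (snprintf(buf, len,
	    "zfs: %s (vdev=%" PRIu64 " ms=%" PRIu64 "): adding segment "
	    "(offset=%" PRIx64 " size=%" PRIx64 ") overlapping with "
	    "existing one (offset=%" PRIx64 " size=%" PRIx64 ")",
	    rt->rt_name, rt->vdev_id, rt->ms_id,
	    add->start, add->size, existing->start, existing->size));
}
```

Carrying the label on the tree itself means the message can identify the tree without the caller having to pass extra context at every add or remove site.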

Author

Yes, it seems that a range tree "selector" based on a mask is unnecessary extra complexity.

3 participants