issue-2074: disable IO of broken devices #2417

Open
wants to merge 7 commits into
base: main
from

Conversation

sharpeye
Collaborator

sharpeye commented Nov 4, 2024

#2074

At startup, Disk Agent collects the devices whose serial numbers have changed and suspends IO on them.
After Disk Agent registers with Disk Registry, broken devices are disabled and the remaining devices are resumed.
Disk Registry sends the list of broken devices in TEvRegisterAgentResponse (the DevicesToDisableIO field).
While a device is suspended, read/write/zero requests to it receive an E_REJECTED error.
While a device is disabled, read/write/zero requests to it receive an E_IO error.

The list of suspended devices is stored in the DevicesWithSuspendedIO field in the configuration cache.
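
To make the difference between suspending and disabling concrete, here is a minimal sketch of the state transitions described above. It is not the code from this PR: the class, enum and method names are hypothetical, and only E_REJECTED, E_IO and DevicesToDisableIO are taken from the description. It also assumes that a device not tracked at all is treated as enabled.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative sketch only: the type and method names are hypothetical.
// Only E_REJECTED, E_IO and DevicesToDisableIO come from the description.
enum class EIoState { Enabled, Suspended, Disabled };
enum class EResult { Ok, E_REJECTED, E_IO };

class TDeviceIoStates {
public:
    // At startup: suspend every device whose serial number has changed.
    void SuspendAtStartup(const std::vector<std::string>& changedSerial) {
        for (const auto& uuid : changedSerial) {
            States[uuid] = EIoState::Suspended;
        }
    }

    // After registration: resume suspended devices, then disable the ones
    // that Disk Registry listed as broken (DevicesToDisableIO).
    void OnRegistered(const std::vector<std::string>& devicesToDisableIO) {
        for (auto& entry : States) {
            if (entry.second == EIoState::Suspended) {
                entry.second = EIoState::Enabled;
            }
        }
        for (const auto& uuid : devicesToDisableIO) {
            States[uuid] = EIoState::Disabled;
        }
    }

    // Result that a read/write/zero request to the device would get.
    EResult CheckIo(const std::string& uuid) const {
        const auto it = States.find(uuid);
        if (it == States.end()) {
            return EResult::Ok;  // assumption: untracked devices are enabled
        }
        switch (it->second) {
            case EIoState::Suspended: return EResult::E_REJECTED;
            case EIoState::Disabled: return EResult::E_IO;
            case EIoState::Enabled: break;
        }
        return EResult::Ok;
    }

private:
    std::unordered_map<std::string, EIoState> States;
};
```

In this sketch, a device suspended at startup answers with E_REJECTED until registration completes. After that it either resumes normal IO or, if Disk Registry listed it as broken, answers with E_IO.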

@sharpeye added the blockstore label (Add this label to run only cloud/blockstore build and tests on PR) on Nov 4, 2024
Contributor

github-actions bot commented Nov 4, 2024

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 3346ef3.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
3472 3472 0 0 0 0

@sharpeye force-pushed the users/sharpeye/issue-2074-store-suspended-io-dev-list branch from 669f536 to 1f988e3 on November 5, 2024 21:10
@sharpeye changed the title from "issue-2074: store the list of uuids of devices with suspended IO in the config cache" to "issue-2074: disable IO of broken devices" on Nov 5, 2024
@sharpeye force-pushed the users/sharpeye/issue-2074-store-suspended-io-dev-list branch 2 times, most recently from 32dc4c8 to d023386, on November 5, 2024 21:31
Contributor

github-actions bot commented Nov 5, 2024

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit d023386.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
3475 3475 0 0 0 0

@sharpeye force-pushed the users/sharpeye/issue-2074-store-suspended-io-dev-list branch 3 times, most recently from 379cdf6 to de2c24a, on November 5, 2024 22:33
Contributor

github-actions bot commented Nov 5, 2024

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit de2c24a.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
3475 3475 0 0 0 0

@sharpeye force-pushed the users/sharpeye/issue-2074-store-suspended-io-dev-list branch from de2c24a to 18b0571 on November 6, 2024 12:11
Contributor

github-actions bot commented Nov 6, 2024

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 18b0571.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
3476 3476 0 0 0 0

@sharpeye force-pushed the users/sharpeye/issue-2074-store-suspended-io-dev-list branch 3 times, most recently from 4d051e3 to 0b33f9a, on November 6, 2024 13:09
Contributor

github-actions bot commented Nov 6, 2024

🔴 linux-x86_64-relwithdebinfo: some tests FAILED for commit 0b33f9a.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
3476 3475 0 1 0 0

@sharpeye force-pushed the users/sharpeye/issue-2074-store-suspended-io-dev-list branch from 0b33f9a to 078b1ff on November 6, 2024 14:20
Contributor

github-actions bot commented Nov 6, 2024

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 078b1ff.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
3476 3476 0 0 0 0

@sharpeye force-pushed the users/sharpeye/issue-2074-store-suspended-io-dev-list branch from 078b1ff to 9c7b960 on November 6, 2024 18:02
@sharpeye force-pushed the users/sharpeye/issue-2074-store-suspended-io-dev-list branch from 9c7b960 to 2b6bf5b on November 6, 2024 18:07
Contributor

github-actions bot commented Nov 6, 2024

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 2b6bf5b.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
3477 3477 0 0 0 0

@sharpeye added the large-tests label (Launch large tests for PR) on Nov 6, 2024
@sharpeye force-pushed the users/sharpeye/issue-2074-store-suspended-io-dev-list branch from 1e1a3b6 to 43e40ce on November 6, 2024 20:14
Contributor

github-actions bot commented Nov 6, 2024

🔴 linux-x86_64-relwithdebinfo: some tests FAILED for commit 43e40ce.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
3668 3667 0 1 0 0

@sharpeye force-pushed the users/sharpeye/issue-2074-store-suspended-io-dev-list branch from 43e40ce to 21feaa1 on November 7, 2024 09:52
Contributor

github-actions bot commented Nov 7, 2024

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 21feaa1.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
3668 3668 0 0 0 0

@sharpeye marked this pull request as ready for review on November 8, 2024 14:05
@komarevtsev-d
Collaborator

We do have some guarantee that the RegisterAgentResponse from DR reaches the agent, right? Couldn't it happen that the agent appears to be registered, but its IO is completely stuck?

Contributor

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 70a5cf0.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
3669 3669 0 0 0 0

@sharpeye
Collaborator Author

sharpeye commented Nov 13, 2024

We do have some guarantee that the RegisterAgentResponse from DR reaches the agent, right? Couldn't it happen that the agent appears to be registered, but its IO is completely stuck?

The agent and DR communicate over a pipe: it either guarantees delivery or the connection breaks. Besides, if devices were suspended, they were either broken (DR listed them in its response) or had their serial numbers changed, and in both cases they have to be suspended. Working devices do not get suspended for no reason.

Labels
blockstore: Add this label to run only cloud/blockstore build and tests on PR
large-tests: Launch large tests for PR
3 participants