-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RSDK-9366] Clear poisoned lock in tests #371
[RSDK-9366] Clear poisoned lock in tests #371
Conversation
lock.clear_poison(); | ||
lock_result.into_inner() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there shouldn't be an issue with multiple threads since:
a. we still hold the LockResult which contains the guard
b. we return the guard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason why this is OK is that lock poisoning is mostly about ensuring that the state guarded by the mutex can be trusted to be in a consistent state. In this case, the state is unit, so there is no possibility of it having an inconsistent state.
Honestly this would probably be better written with a binary semaphore, since there isn't really state to be protected, but there doesn't seem to be such a thing in the standard library right now that we can use.
That said, I think there should be a comment here above clear_poison
explaining why we feel justified to clear the poison state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM mod a comment, but I would like a second opinion from another reviewer before this merges.
Meanwhile, have you been able to confirm that once you make this change that you see a failed test sometimes, but that it doesn't propagate like it did before?
lock.clear_poison(); | ||
lock_result.into_inner() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason why this is OK is that lock poisoning is mostly about ensuring that the state guarded by the mutex can be trusted to be in a consistent state. In this case, the state is unit, so there is no possibility of it having an inconsistent state.
Honestly this would probably be better written with a binary semaphore, since there isn't really state to be protected, but there doesn't seem to be such a thing in the standard library right now that we can use.
That said, I think there should be a comment here above clear_poison
explaining why we feel justified to clear the poison state.
I added a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it lead to false positive? The mutex ensure tests are run sequentially if one fails then we should focus on fixing why the test fails. The mutex should never be poisoned no?
ideally it would never be poisoned, but this will at least reduce the poison-lock noise, showing just the relevant errors of the test failing for legitimate reasons. |
The mutex will be poisoned if a test fails due to a panic, and then all of the subsequent tests which attempt to use the lock will also fail, making it hard to see which test actually failed and which were just fallout from the lock poisoning. Since there isn't actually any meaningful state protected by the mutex, I believe it is OK to clear the poisoning, so that the other tests can meaningfully execute and we get a clear indication of which test failed. Which we should then of course fix. Here is a playground link which will show the issue: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6161ddf7b147e9e58714b35ccdf7454e You may need to run it 10 or so times before you get the ordering that shows the issue, but when |
Some tests do share resources even if not explicitly guarded by the Mutex (like sockets). It's fair to clear the poison but later tests may fail in less obvious ways. We should revisit this when we address CI and focus on fixing flaky tests if they happen. Too bad we can't have a |
18e27f1
to
3e0d578
Compare
the
global_network_test_lock
is used to serialize access to network resources to ensure parallelized tests don't trip over one another. We don't actually care to propagate poisons across tests because each test is meant to be self-contained and the underlying network resources are in-fact available.