Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix usages of some async utilities #1610

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Vampire
Copy link
Member

@Vampire Vampire commented Mar 26, 2023

Fixes #1600

@Vampire Vampire requested a review from leonard84 March 26, 2023 18:03
@Vampire Vampire marked this pull request as draft March 26, 2023 18:28
@Vampire
Copy link
Member Author

Vampire commented Mar 26, 2023

Hmmm, the tests failing now were previously flaky at best.

They either were successful because the code behaved properly,
or they were successful because the timeout for the latch was short enough.

E.g. for ParallelSpec#tests can get exclusive access to resources to be meaningful without the assertion I added, you would need a very high timeout value where it practically can never happen that the code misbehaves but the test is green due to the timeout passing fast enough. 100 ms as timeout is surely not matching that constraint.
But if you e. g. configure 10 seconds as timeout, then you actually have to wait those 10 seconds for the test to complete successfully and only a failure would be faster.

@leonard84 do you have any idea how to better properly test this, without depending on "high enough" timeouts?

@Vampire
Copy link
Member Author

Vampire commented Mar 26, 2023

Maybe we should not really test the concurrency but trust that JUnit Platform does the right thing and instead just test that the annotations set the proper exclusive resources on the feature infos?

Comment on lines +65 to +94
while (waitMillis > 0) {
long waitStart = System.nanoTime();
try {
synced = sync.offer(stackTrace, waitMillis, TimeUnit.MILLISECONDS);
} catch (InterruptedException ignored) {
// this is our own thread, so we can ignore the interruption safely and continue the remaining waiting
waitMillis -= TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - waitStart);
continue;
}
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it indeed happens that the watcher thread is interrupted during the initial wait, it should continue the initial wait, shouldn't it?
Or should it really forward the interrupting to the Feature thread, just because it got interrupted for whatever reason?

return;
}
} catch (InterruptedException | ExecutionException e) {
ExecutionResult executionResult = iterationRunner.runIteration(args, maxIterations).join();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using join to make it un-interruptible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not get uninterruptible.
It just does not throw checked exceptions but unchecked ones.
If you use get() you might get InterruptedException and then have to handle it properly.
Just ignoring is not ok at that spot as this is not a self-managed thread, so in most cases you should either rethrow or at least re-interrupt so that code higher in the stack can properly handle the interrupt and so on.
And by using join() here you can simply get around all that, as you don't mess with the interrupted exception yourself.

And ExecutionException would never come here anyway, as the future is never completed exceptionally.

@leonard84
Copy link
Member

They either were successful because the code behaved properly, or they were successful because the timeout for the latch was short enough.

E.g. for ParallelSpec#tests can get exclusive access to resources to be meaningful without the assertion I added, you would need a very high timeout value where it practically can never happen that the code misbehaves but the test is green due to the timeout passing fast enough. 100 ms as timeout is surely not matching that constraint. But if you e. g. configure 10 seconds as timeout, then you actually have to wait those 10 seconds for the test to complete successfully and only a failure would be faster.

@leonard84 do you have any idea how to better properly test this, without depending on "high enough" timeouts?

Just increasing the timeout to 10s doesn't fix the tests, we'd actually need a way to get access to the lock.

Some time ago, I advocated that the JUnit Platform integrates JFR events, to enable low tracing of parallel executions. Unfortunately, that didn't gain any traction with the sentiment being that the listener->jfr events was good enough.

Maybe this would be a good showcase why it would be good to have those low-level events. If the platform would emit events for locking/unlocking of exclusive resources, we could make assertions upon those events instead of not knowing why if it was ok, because the code worked properly or because there was just a single thread for some reason. @marcphilipp WDYT?

@marcphilipp
Copy link
Member

Maybe this would be a good showcase why it would be good to have those low-level events. If the platform would emit events for locking/unlocking of exclusive resources, we could make assertions upon those events instead of not knowing why if it was ok, because the code worked properly or because there was just a single thread for some reason.

I'm not opposed to it in general. The main concern was breaking users' builds. Apparently there are still JDKs that don't include the JFR module (see junit-team/junit5#3279 (comment)) but we might now be able to use jfr-polyfill.

@Vampire
Copy link
Member Author

Vampire commented Aug 16, 2023

Any news on that @marcphilipp?

@Vampire Vampire force-pushed the fix-async-usages branch 5 times, most recently from cd2a9e1 to 3764821 Compare August 28, 2023 10:31
@Vampire Vampire force-pushed the fix-async-usages branch 2 times, most recently from c4dcae4 to 1b4069e Compare September 16, 2023 10:25
@Vampire Vampire force-pushed the fix-async-usages branch 2 times, most recently from eced08a to 012ca2c Compare September 18, 2023 23:38
@Vampire Vampire force-pushed the fix-async-usages branch 3 times, most recently from a536962 to 4a09c7a Compare October 14, 2023 11:34
@Vampire Vampire force-pushed the fix-async-usages branch 3 times, most recently from 73fabaa to d9e0b17 Compare November 7, 2023 00:00
@Vampire Vampire force-pushed the fix-async-usages branch 2 times, most recently from 9ae9fb1 to 5c95dfe Compare December 1, 2023 01:20
@Vampire Vampire force-pushed the fix-async-usages branch 2 times, most recently from 6b2d47e to a16e152 Compare December 18, 2023 12:49
@Vampire
Copy link
Member Author

Vampire commented Mar 6, 2024

@AndreasTu before you try again to fix flaky async tests, especially by introducing insanely high timeouts, please be aware of this PR and the comments.

Maybe you have some additional ideas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants