Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local server stress tests #23701

Merged
merged 71 commits into from
Feb 26, 2025

Conversation

anthony-murphy
Copy link
Contributor

@anthony-murphy anthony-murphy commented Jan 30, 2025

This change adds a new type of stress test. This stress test runs against local server, and leverage's all layers of the Fluid Framework stack. Additionally, this test imports existing models from our DDS fuzz tests, and reuses them to incorporate DDS level testing and validation. With these tests we can randomly test the full intersection of features ensuring broad coverage which isn't easy or feasible to evaluate scenario by scenario.

@github-actions github-actions bot added the area: dds Issues related to distributed data structures label Jan 30, 2025
@github-actions github-actions bot removed area: dds: sharedstring area: dds Issues related to distributed data structures area: runtime Runtime related issues labels Feb 20, 2025
@anthony-murphy anthony-murphy removed the request for review from a team February 20, 2025 19:50
@anthony-murphy anthony-murphy force-pushed the local-sever-stress-tests branch from 3923671 to 5e806d7 Compare February 21, 2025 18:12
@anthony-murphy anthony-murphy force-pushed the local-sever-stress-tests branch from 5e806d7 to 1aef146 Compare February 21, 2025 18:34
tylerbutler pushed a commit to tylerbutler/FluidFramework that referenced this pull request Feb 22, 2025
This change splits out a number of change made in and for microsoft#23701. Theses
refactoring are primarily around reusability.
@@ -0,0 +1,118 @@
{
"name": "@fluid-internal/local-server-stress-tests",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package seems like a better fit for @fluid-private

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought about this, and will change it, but i also wasn't sure if we could run fluid private tests in our long running out stress runs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect issues with that

Copy link
Contributor

@Abe27342 Abe27342 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me. As Matt pointed out, it is pretty complex, but I'd rather not block progress in the meantime. I'll let you judge whether efforts to reason about what's going on warrant simplifications.

},
uploadBlob: async (state, op) =>
// this will hang if we are offline due to disconnect, so we don't wait for blob upload
void state.datastore.uploadBlob(op.tag, state.random.string(state.random.integer(1, 16))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not awaiting the blob upload is reasonable, but I'm mildly concerned about race conditions on test replay. I looked at this a bit and it seems like the strategy you used is to only mark the blob as 'available for use' once it's completed upload (this is via the implementation of uploadBlob called here and the subsequent local state update that happens). That seems like it wouldn't have problems for the intial test. However, on replay it seems like an op referencing a pending blob upload could run before upload has finished, and the reducer logic which looks for the handle with matching tag would then fail.

Maybe we just don't see this b/c it's local server? Or perhaps more likely, I misinterpreted something? If you haven't run into it with local replay, I don't think it's necessary to resolve to check in. It might make sense to track the state of blobs a bit more thoroughly though (e.g. have the implementation of uploadBlob track the pending blob upload locally and give above reducer logic something to await)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i haven't seen a replay failure, but i think you are correct in that it could cause one. maybe local server is fast enough it won't happen in practice, and i'm not sure how to fix it other than by putting a small retry with timeout in the reducer.

@anthony-murphy anthony-murphy force-pushed the local-sever-stress-tests branch from 396e836 to 3e5e188 Compare February 26, 2025 01:08
@anthony-murphy anthony-murphy force-pushed the local-sever-stress-tests branch from 3e5e188 to e4c1efd Compare February 26, 2025 01:17
@anthony-murphy anthony-murphy enabled auto-merge (squash) February 26, 2025 02:18
@anthony-murphy anthony-murphy merged commit dfa566d into microsoft:main Feb 26, 2025
31 checks passed
@anthony-murphy anthony-murphy deleted the local-sever-stress-tests branch February 26, 2025 17:35
chentong7 pushed a commit to chentong7/FluidFramework that referenced this pull request Mar 11, 2025
This change adds a new type of stress test. This stress test runs
against local server, and leverage's all layers of the Fluid Framework
stack. Additionally, this test imports existing models from our DDS fuzz
tests, and reuses them to incorporate DDS level testing and validation.
With these tests we can randomly test the full intersection of features
ensuring broad coverage which isn't easy or feasible to evaluate
scenario by scenario.

---------

Co-authored-by: Abram Sanderson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: tests Tests to add, test infrastructure improvements, etc base: main PRs targeted against main branch dependencies Pull requests that update a dependency file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants