PERF-3060 [WiP] Add post-condition verification for CRUDActor metrics #794

jseyster · 2022-12-12T19:40:16Z

This WiP is mostly to get a first opinion on how well the PostCondition class fits into the reporting architecture. I'd be interested in any feedback, and if the approach looks reasonable, I can proceed with unit testing and documentation. I included the temporary example workload that I have been testing with.

dalyd · 2022-12-16T22:24:43Z

src/workloads/execution/CheckMetricsTest.yml

+          OperationMetricsName: PipelineWithoutOptions
+          OperationCommand:
+            Pipeline: [{$group: {_id: "$int", sum: {$sum: "$int2"}}}]
+          PostCondition:


Is this intended to be part of the production workload, or is this whole phase only check? I'm hoping the second.

I had imagined being able to check the result value as well, but I guess you could write that into the pipeline itself. Is that the intent?

I made sure that the time spent verifying the post-condition doesn't get included in the elapsed time that the operation takes. I'm not sure what you mean by "only check." Are there separate production and verification modes for workloads? Thanks!

I considered having a way to verify query results as well, but after discussing it with other people on the column store index project, we decided that users who want that functionality will probably go with an AssertiveActor and that, for this case, we want to avoid the pitfall of making performance workloads do double duty as rigorous correctness tests.

If you are running one operation in one thread once, then everything is just fine. However, if you are running things in a loop with multiple threads, you are going to stress the system differently for having the post condition check. The client will spend some amount of checking the result before sending the next request, so there will be less apparent load on the system. For example, if you have 20 threads running in parallel, and you have 5% overhead for the check, you will effectively have 19 parallel operations on the server at a time, instead of 20. The server will be less loaded. It's something to be aware of, but not a reason to avoid doing it. It seems like there will be cases in which this would be very useful.

I had imagined a check like this being used as a step after the real test, to verify invariants about the system. For example, if you had a workload that was a model of a banking application, you could verify that the total cash in the system didn't change after many transfers. I think your addition still supports this use case fwiw. Actually I'd encourage including an example use case like this with the other.

We don't have anything that is production versus verification modes today. It's probably not worth adding one either, although it is an interesting idea.

I should have added: the approach looks promising!

Good point. Once this is ready for documentation, I think it will be worthwhile to note that it's appropriate for measuring the performance of a relatively long-running query but not for tests intended to measure performance of a server under heavy load.

I also like your idea that a PostCondition could also be used to examine a collection after benchmarks run to ensure it left them in an expected state. An AssertiveActor can also accomplish that, but it can take some trial and error to get the right query. I included an example in CheckMetricsTest.yml along with an example of how to do a similar comparison with an AssertiveActor to show how the techniques compare.

rtimmons

Sorry for my extreme delay. I think the syntax and overall approach is reasonable for now. We may want to move the PostCondition into the Metrics structure itself, but I like the general PostCondition concept being extensible outside of Metrics, so let's see where that evolves.

I looked through the code and didn't see anything obviously wrong, but I'd feel much better with good automated testing and bit more commentary in places where there's so many longs in a row 😄. So I'm 👍 the approach and syntax. Happy to review the code with a bit more scrutiny whenever you're ready.

…rite volume

jseyster

Speaking of extreme delays 🙃... We ended up taking this work out of the Column Store Index project (which we are wrapping up) and turning into a quick win, which is why it took me a while to address the comments.

Thanks for the helpful feedback! I added an additional example case (which can eventually get moved to the docs) and better comments. I also added automated testing with the goal of getting the patch close to a state where it would be reasonably mergable if we agreed on the functionality and details of the implementation.

Now that the Column Store Index project is wrapped I'll have some time to work on this, but it's still not urgent, so no need to worry about review turnaround time. Thanks again!

jseyster · 2023-01-26T00:09:03Z

src/workloads/execution/CheckMetricsTest.yml

+          OperationMetricsName: PipelineWithoutOptions
+          OperationCommand:
+            Pipeline: [{$group: {_id: "$int", sum: {$sum: "$int2"}}}]
+          PostCondition:


Good point. Once this is ready for documentation, I think it will be worthwhile to note that it's appropriate for measuring the performance of a relatively long-running query but not for tests intended to measure performance of a server under heavy load.

I also like your idea that a PostCondition could also be used to examine a collection after benchmarks run to ensure it left them in an expected state. An AssertiveActor can also accomplish that, but it can take some trial and error to get the right query. I included an example in CheckMetricsTest.yml along with an example of how to do a similar comparison with an AssertiveActor to show how the techniques compare.

dalyd · 2023-02-01T16:33:27Z

With the time this PR has been open, you might benefit from merging dsi master into your branch, and pushing that up here. Github handles merging in master well, and does not handle rebasing well at all.

dalyd

A couple questions. Some of this is above my current C++ level.
Overall, I appreciate the test cases. Let's make sure to include some examples in the docs/ directory, but otherwise this looks good to me.

dalyd · 2023-02-01T16:50:05Z

src/metrics/include/metrics/operation.hpp

+     * instance.
+     */
+    void conditionalSuccess(const PostCondition& postCondition = {}) {
+        auto finished = ClockSource::now();


Nice -- just calling out this is how you don't let the time of the post condition check impact the latency measurement for the desired op.

dalyd · 2023-02-01T16:57:11Z

src/metrics/include/metrics/PostCondition.hpp

+        }
+
+        for (const auto& relation : relations) {
+            if (auto requiredValue = node[relation.key].maybe<long long>()) {


Do you want to check for relation.symbol also? It seems like you might want to use it in this loop. I suppose Yaml might not like those symbols as keys.

dalyd · 2023-02-01T17:02:43Z

src/metrics/include/metrics/PostCondition.hpp

+                        return bytes;
+                }
+                throw std::logic_error("impossible");
+            }();


Does this need to be a lambda? Could you just determine the observedValue and pass it into evaluateFn. I'm not great with Lambdas, so maybe not. But if it can, I think it would make this code more readable.

PERF-3060 [WiP] Add post-condition verification for CRUDActor metrics

778c374

dalyd reviewed Dec 16, 2022

View reviewed changes

rtimmons previously approved these changes Jan 4, 2023

View reviewed changes

jseyster added 3 commits January 25, 2023 23:56

Add unit tests for CrudActor PostCondition

fe54aa7

Add example of using PostCondition for after-the-fact validation of w…

8de8df4

…rite volume

Comments to document the PostCondition class

909105a

jseyster dismissed rtimmons’s stale review via 909105a January 26, 2023 00:01

jseyster commented Jan 26, 2023

View reviewed changes

Changes that accidentally got left out of previous commits

d646114

jseyster requested review from dalyd and rtimmons January 30, 2023 23:09

dalyd reviewed Feb 1, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERF-3060 [WiP] Add post-condition verification for CRUDActor metrics #794

PERF-3060 [WiP] Add post-condition verification for CRUDActor metrics #794

jseyster commented Dec 12, 2022

dalyd Dec 16, 2022

jseyster Dec 19, 2022

dalyd Dec 20, 2022

dalyd Dec 20, 2022

jseyster Jan 26, 2023

rtimmons left a comment

jseyster left a comment

jseyster Jan 26, 2023

dalyd commented Feb 1, 2023

dalyd left a comment

dalyd Feb 1, 2023

dalyd Feb 1, 2023

dalyd Feb 1, 2023

PERF-3060 [WiP] Add post-condition verification for CRUDActor metrics #794

Are you sure you want to change the base?

PERF-3060 [WiP] Add post-condition verification for CRUDActor metrics #794

Conversation

jseyster commented Dec 12, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rtimmons left a comment

Choose a reason for hiding this comment

jseyster left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dalyd commented Feb 1, 2023

dalyd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment