Improved outputs to analyze integration test results #445
return key_diffs

def compare_dict_values(self, dict1, dict2, percent_threshold=10, abs_threshold=1000):
What should the thresholds be when deciding whether to report percent changes in JSON values? Should the thresholds for agg_results differ from those for ecm_results?
Note: the percent threshold means that only differences greater than or equal to it are reported. The absolute threshold means that a difference is reported only if the original values exceed that number, which prevents large percent differences caused by small numbers from being output.
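The thresholding rule described in this comment could be sketched roughly as follows. This is a minimal standalone illustration, not the PR's actual `compare_dict_values` method: the function name `value_diffs`, the `/`-joined key paths, and the choice to treat the first dictionary as the baseline are all assumptions made for the example.

```python
def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def value_diffs(base, new, percent_threshold=10, abs_threshold=1000, path=""):
    """Return {key_path: (base_value, new_value, percent_change)}.

    Hypothetical helper: a leaf is reported only if the baseline value
    exceeds abs_threshold AND the percent change is >= percent_threshold.
    """
    diffs = {}
    # Only keys present in both dicts are compared here
    for key in base.keys() & new.keys():
        key_path = f"{path}/{key}" if path else str(key)
        old_val, new_val = base[key], new[key]
        if isinstance(old_val, dict) and isinstance(new_val, dict):
            # Recurse into nested results
            diffs.update(
                value_diffs(old_val, new_val, percent_threshold,
                            abs_threshold, key_path)
            )
        elif _is_number(old_val) and _is_number(new_val):
            # Skip small baselines (and zero, to avoid dividing by zero)
            if abs(old_val) <= abs_threshold or old_val == 0:
                continue
            pct = 100 * (new_val - old_val) / abs(old_val)
            if abs(pct) >= percent_threshold:
                diffs[key_path] = (old_val, new_val, pct)
    return diffs
```

With the default thresholds, a change from 5 to 50 is ignored even though it is a 900% change, because the baseline does not exceed 1000.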
.gitignore
!tests/integration_testing/results/plots/tech_potential/*.xlsx
!tests/integration_testing/results/plots/max_adopt_potential/*.xlsx
To override the ignore rule for the .xlsx files specified above
Remaining tasks
@jtlangevin this is ready for your review (pending CI). Per your comment, I updated the absolute threshold for reporting to depend on the units being compared: it is 1,000 for cost or energy and 10 for emissions. This means that one or both of the values must be greater than the threshold (not that the difference must be greater). The percent threshold remains the same for all units, 10%. A test case with results that change can be found here: https://github.com/trynthink/scout/actions/runs/13660717679. In the artifacts you will see
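A rough sketch of the unit-dependent rule described in this comment is below. The unit labels (`"cost"`, `"energy"`, `"emissions"`), the function name `should_report`, and the handling of a zero baseline are assumptions for illustration; how the PR actually identifies units is not shown in this thread.

```python
# Hypothetical unit labels mapped to the absolute thresholds from the comment
ABS_THRESHOLDS = {"cost": 1000, "energy": 1000, "emissions": 10}
PERCENT_THRESHOLD = 10  # same for all units


def should_report(old_val, new_val, units):
    """Return True if the pair of values should appear in the diff report."""
    abs_threshold = ABS_THRESHOLDS[units]
    # At least one of the two values must exceed the absolute threshold;
    # the size of the difference itself is not what is checked here.
    if max(abs(old_val), abs(new_val)) <= abs_threshold:
        return False
    if old_val == 0:
        # Percent change is undefined for a zero baseline; this sketch
        # assumes any change away from zero is worth reporting.
        return new_val != 0
    percent_change = 100 * abs(new_val - old_val) / abs(old_val)
    return percent_change >= PERCENT_THRESHOLD
```

For example, `should_report(500, 900, "cost")` is `False` because neither value exceeds 1,000, while `should_report(8, 12, "emissions")` is `True` because 12 exceeds 10 and the change is 50%.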
Thanks, I see the plots in the commit, which is very helpful. In the artifacts it looks like only the aggregate results are being differenced, and not results for individual ECMs – e.g.,
So the artifacts in that dummy PR depend on how the results changed. I just trimmed down the list of ECMs, meaning that there are a lot of differences in the JSON keys for ecm_results (found in In the PR description, the second screenshot under "Example
…f baseline and new results for CI comparisons.
…or some methods; better documentation.
Fixes #415
Introduces a class to compare integration test results on a branch with the results stored on master. A previous PR (#440) added all integration test results to master. This PR provides a way of evaluating the differences between the working branch and master, which include:
- … compare_results.py to:
  - report JSON key differences (output to agg_results_key_diffs.csv and ecm_results_key_diffs.csv)
  - report JSON value differences (output to agg_results_value_diffs.csv and ecm_results_value_diffs.csv)
  - compare Summary_Data-MAP.xlsx and Summary_Data-TP.xlsx (output to Summary_Data-MAP_percent_diffs.csv and Summary_Data-TP_percent_diffs.csv)
- … agg_results.json or ecm_results.json, then:
  - … tests/integration_tests/results_base
  - … tests/integration_tests/compare_results.py
Example Outputs
Example CI artifacts are found at https://github.com/trynthink/scout/actions/runs/13660717679
Example *_results_key_diffs.csv: (screenshot)
Example *_results_value_diffs.csv: (screenshot)
Example Summary_Data-*_percent_diffs.xlsx: Same format as original xlsx files, but values are the percent differences
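The idea of keeping the original layout while replacing values with percent differences can be sketched on plain in-memory tables. This is only an illustration under assumptions: the tables are lists of rows rather than spreadsheet files, the function name `percent_diff_table` is made up, and a zero baseline is shown as `None`; the PR's own handling may differ.

```python
def percent_diff_table(base_rows, new_rows):
    """Return a table shaped like the inputs, with numeric cells replaced
    by the percent difference relative to the baseline table."""
    result = []
    for base_row, new_row in zip(base_rows, new_rows):
        out_row = []
        for old, new in zip(base_row, new_row):
            both_numeric = all(
                isinstance(v, (int, float)) and not isinstance(v, bool)
                for v in (old, new)
            )
            if both_numeric:
                # Percent change is undefined for a zero baseline
                out_row.append(None if old == 0 else 100 * (new - old) / abs(old))
            else:
                # Labels and headers are carried over unchanged
                out_row.append(old)
        result.append(out_row)
    return result


base_table = [["ECM", "2030", "2050"], ["A", 100, 200], ["B", 0, 50]]
new_table = [["ECM", "2030", "2050"], ["A", 110, 150], ["B", 5, 50]]
diff_table = percent_diff_table(base_table, new_table)
```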