Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance optimizations #10

Closed
wants to merge 8 commits into from

Conversation

simonlindholm
Copy link
Contributor

Not sure if this is wanted given the discussed rewrite in #3, but I spent an evening doing some obvious optimizations and got runtime for a naive ctd program

INT(1, 10000000, n)
NEWLINE
REP(n, SPACE)
	INT(1, 1000000)
END
NEWLINE
EOF

down from 4.7 seconds to 1.3 (on 5e6 ints).

Most of the PR is spent avoiding GMP, however I suspect a large part of that perf win can be achieved by just the cachedLong part of the relevant commit. Note the "Read input file as binary" behavior change, which is worth discussing in its own right. And of course, feel free to close this if it's not worth the review/testing burden given the rewrite, I was mostly doing this for my own curiosity.

This cuts runtime down from ~4.7s to ~3s.
No effect on my naive benchmark.
CRLF line endings should not validate, or if we think they should, that
handling should happen in NEWLINE. This also makes it possible to do
faster IO, by measuring the size of the input file before reading it.
~2s -> ~1.4s
@meisterT meisterT requested a review from eldering October 20, 2019 14:35
@meisterT
Copy link
Member

We didn't make progress on #3 in a while, so I think it's fair to merge this PR, but I will leave the review to @eldering

@eldering
Copy link
Member

I'm happy to review and merge it. I've looked at everything except the GMP/bigint commit so far:
These commits look fine, and I agree on using binary files and NEWLINE handling the decision of whether CRLF should be accepted or not.

@simonlindholm
Copy link
Contributor Author

So hm, on second thought re the CRLF change, this might make it super annoying to use ctd on a Windows system, when most methods of generating test data output CRLF line endings, and it could be worth leaving CRLF validation to Unix systems... Convenience vs strictness, hmm. It's not strictly needed for fast IO -- str.erase(str.begin() + inputstream.gcount(), str.end()); would make up for the file size discrepancy.

Either way, it definitely would be good to have NEWLINE do some special-casing of CRLFs, e.g. printing a nicer error message for that case. Line endings are a subtle thing that's easy to be confused by. The C++ validator script we use for local IOI qualifiers has this logic: https://github.com/nordicolympiad/testdata_tools/blob/fbae58a0416f84ca9449b0536ef49c9f27b25711/input_format_validators/validator/validator.h#L345

Also, apparently I need to do something to handle the case when __builtin_saddl_overflow doesn't exist, for CI... (introduced in GCC 5)

@eldering
Copy link
Member

So hm, on second thought re the CRLF change, this might make it super annoying to use ctd on a Windows system, when most methods of generating test data output CRLF line endings, and it could be worth leaving CRLF validation to Unix systems... Convenience vs strictness, hmm. It's not strictly needed for fast IO -- str.erase(str.begin() + inputstream.gcount(), str.end()); would make up for the file size discrepancy.

I don't quite understand how this would make using checktestdata on Windows annoying. Precisely what NEWLINE would do could still be platform-dependent, right? As in: NEWLINE could read the platform default newline character(s) and optionally warn if it sees something that looks like it's coming from a different platform (Windows/Unix/OSX), and all this could be configurable if needed.

Either way, it definitely would be good to have NEWLINE do some special-casing of CRLFs, e.g. printing a nicer error message for that case. Line endings are a subtle thing that's easy to be confused by. The C++ validator script we use for local IOI qualifiers has this logic: https://github.com/nordicolympiad/testdata_tools/blob/fbae58a0416f84ca9449b0536ef49c9f27b25711/input_format_validators/validator/validator.h#L345

Also, apparently I need to do something to handle the case when __builtin_saddl_overflow doesn't exist, for CI... (introduced in GCC 5)

So, I propose to not merge this particular commit for now, but I'm happy to consider it in the future (or even make some changes myself, time permitting).

@simonlindholm
Copy link
Contributor Author

As in: NEWLINE could read the platform default newline character(s) and optionally warn if it sees something that looks like it's coming from a different platform (Windows/Unix/OSX), and all this could be configurable if needed.

Ah yes, this would work.

So, I propose to not merge this particular commit for now, but I'm happy to consider it in the future (or even make some changes myself, time permitting).

Sounds good! Should I update the PR, or do you want to cherry-pick the other commits?

@eldering
Copy link
Member

So, I propose to not merge this particular commit for now, but I'm happy to consider it in the future (or even make some changes myself, time permitting).

Sounds good! Should I update the PR, or do you want to cherry-pick the other commits?

Ok, I've cherry-picked and part of your commits. I've rewritten the compilation with Bisonc++ 6 fix a bit, since on my system it was still breaking because of changes to Flexc++ 2.03.

Feel free to create new PRs for the outstanding issues, which if I'm not mistaken are:

  • Faster/binary IO
  • A bigint wrapper around GMP

@eldering
Copy link
Member

I also reviewed the bigint code and added some tests to be confident it works fine.

In principle it looks fine to me, but it would be nice to change the code a bit, so that it can be tranparently switched on/off depending on compiler support for the __builtin* functions. I've made an attempt to do so, see the last few commits in eldering@f7ecb9e, but it doesn't pass the tests when __builtin* stuff is disabled.

@eldering
Copy link
Member

Merged in via #19

@eldering eldering closed this Jan 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants