-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keep Datafiles Sorted #354
base: dev
Are you sure you want to change the base?
Conversation
Can you specify this a bit/provide an example script? |
The following section shows the test that would fail. In it, Database::addInterval () is called with intervals out of order, but then when trying to iterate over the database, the order isn't sorted, which is implied by the code that does not want to load the entire database. 4cce6b2#diff-de69d8da4e4696c24f8977414a514e29R39-R77 Granted, I cannot currently think of a way for a user of the timew binary to create this condition, but I think a future user of the Database object shouldn't have to commit and reload the database in order to make sure newly added intervals are sorted. I first was wondering about this when writing b43fdf2#diff-41766cdf08b0c4d6c7a51271c51b9f85R117. I thought why can't I just use lower_bound or otherwise stop looking through the datafile if I've passed the point the new interval should be. I think this also could be an issue if we implement the |
e3f7c48
to
e83f9db
Compare
Actually, I was able to conceive of an example that shows the error against the current dev branch. The following script will report the wrong running time. I added an interval with the same tags as the current interval in the past. Because the newly added interval was placed at the end of the collection of the database, when interval summarize is walking backwards to show the totals it shows both the current interval (30 min) and the interval that was added a month ago (1 hour).
Although, still there is an issue with intervalSummarize since even when the DB is in sorted order, it only reports the total for the most recent interval. Here is the output when the same script is run against the wip-sorted-datafiles branch:
I opened #368 to finally eliminate this source of confusion. |
e83f9db
to
444f1bc
Compare
Signed-off-by: Shaun Ruffell <[email protected]>
If the same interval is added, we do not necessarily want to create a transaction anymore so the datafile should report this information. Signed-off-by: Shaun Ruffell <[email protected]>
Signed-off-by: Shaun Ruffell <[email protected]>
It doesn't make any sense for the Datafiles to exist in a non-initialized state. Signed-off-by: Shaun Ruffell <[email protected]>
Since the switch to using iterators to load lines from the database, it is assumed that the lines in the datafile are always in sorted order. While the code has naturally kept them in sorted order previously, this change makes the Datafile responsible for keeping them sorted. Signed-off-by: Shaun Ruffell <[email protected]>
The dump() member is constant, as it should be, but if called before one of the other member that initialize the database, it will fail to return any intervals. This change makes the _files member mutable so that constant member functions are able to initialize it. These functions still maintain logical constness as the database itself is not modified. Signed-off-by: Shaun Ruffell <[email protected]>
Eliminates the naked pointer to the Journal that the Database held which will simplify writing tests for the Database class. A follow-on commit can remove the Journal reference from the command signatures since it is already passed as part of the Database. Signed-off-by: Shaun Ruffell <[email protected]>
Before this change, exists will only report if the real file exists. If an atomic file is created or written, but not finalized, the exists call will return false. Signed-off-by: Shaun Ruffell <[email protected]>
Since the code uses AtomicFiles to write the tags database, it should also use AtomicFiles to check for the tags databases' existence. Signed-off-by: Shaun Ruffell <[email protected]>
Required to use some of the standard algorithms with the database iterators. Signed-off-by: Shaun Ruffell <[email protected]>
There were some paths through the database that could result in the datafiles being improperly initialized. For example, it was possible to call dump() on an initialized database that wouldn't return anything. Since the vast majority of use cases involve loading all the Datafile, I believe it is clearer to always initialize them at construction time. Signed-off-by: Shaun Ruffell <[email protected]>
Like the previous commit to keep the datafiles sorted, this change ensures that the Datafiles themselves are always in sorted order regardless of the order intervals are added. This is needed since the switch to using iterators for the database. Signed-off-by: Shaun Ruffell <[email protected]>
…ructor Primarily to simplify the test code a bit. Signed-off-by: Shaun Ruffell <[email protected]>
Signed-off-by: Shaun Ruffell <[email protected]>
Some conditions are difficult to check via the python scripts. Specfically some of the more odd sort orderings. Signed-off-by: Shaun Ruffell <[email protected]>
This eliminates the 'initialize' member function in the Extension class. The constructor should handle what is needed to initialize the class. Signed-off-by: Shaun Ruffell <[email protected]>
444f1bc
to
70c62e5
Compare
The datafiles and inclusions in the datafiles are not guaranteed to be kept in sorted order in light of adds / removes. This meant that if a user in the future adds an interval, and then attempts to iterate over the database, the order was not guaranteed to be preserved.
This is a draft PR since it is built on the work of the following PRs: