Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep Datafiles Sorted #354

Draft
wants to merge 16 commits into
base: dev
Choose a base branch
from

Conversation

sruffell
Copy link
Contributor

@sruffell sruffell commented Aug 4, 2020

The datafiles and inclusions in the datafiles are not guaranteed to be kept in sorted order in light of adds / removes. This meant that if a user in the future adds an interval, and then attempts to iterate over the database, the order was not guaranteed to be preserved.

This is a draft PR since it is built on the work of the following PRs:

@lauft
Copy link
Member

lauft commented Aug 11, 2020

The datafiles and inclusions in the datafiles are not guaranteed to be kept in sorted order in light of adds / removes. This meant that if a user in the future adds an interval, and then attempts to iterate over the database, the order was not guaranteed to be preserved.

Can you specify this a bit/provide an example script?

@sruffell
Copy link
Contributor Author

The following section shows the test that would fail. In it, Database::addInterval () is called with intervals out of order, but then when trying to iterate over the database, the order isn't sorted, which is implied by the code that does not want to load the entire database.

4cce6b2#diff-de69d8da4e4696c24f8977414a514e29R39-R77

Granted, I cannot currently think of a way for a user of the timew binary to create this condition, but I think a future user of the Database object shouldn't have to commit and reload the database in order to make sure newly added intervals are sorted.

I first was wondering about this when writing b43fdf2#diff-41766cdf08b0c4d6c7a51271c51b9f85R117. I thought why can't I just use lower_bound or otherwise stop looking through the datafile if I've passed the point the new interval should be. I think this also could be an issue if we implement the import command, since I do not believe we should assume the input will be in sorted order relative to entries that may already be in the target database.

@sruffell sruffell force-pushed the wip-sorted-datafiles branch 2 times, most recently from e3f7c48 to e83f9db Compare August 13, 2020 06:58
@sruffell
Copy link
Contributor Author

sruffell commented Aug 14, 2020

The datafiles and inclusions in the datafiles are not guaranteed to be kept in sorted order in light of adds / removes. This meant that if a user in the future adds an interval, and then attempts to iterate over the database, the order was not guaranteed to be preserved.

Can you specify this a bit/provide an example script?

Actually, I was able to conceive of an example that shows the error against the current dev branch.

The following script will report the wrong running time. I added an interval with the same tags as the current interval in the past. Because the newly added interval was placed at the end of the collection of the database, when interval summarize is walking backwards to show the totals it shows both the current interval (30 min) and the interval that was added a month ago (1 hour).

#!/bin/sh
set -x

export TIMEWARRIORDB=/tmp/timewarriordb
rm -rf ${TIMEWARRIORDB}
mkdir -p ${TIMEWARRIORDB}/data
:> ${TIMEWARRIORDB}/timewarrior.cfg

src/timew start :quiet 1h ago proja
src/timew start :quiet 30min ago projb
src/timew track $(date -d '-2 months' +%Y-%m-%dT%H:%M:%S) - $(date -d '-2 months + 1 hour' +%Y-%m-%dT%H:%M:%S) projb
+ export TIMEWARRIORDB=/tmp/timewarriordb
+ rm -rf /tmp/timewarriordb
+ mkdir -p /tmp/timewarriordb/data
+ :
+ src/timew start :quiet 1h ago proja
+ src/timew start :quiet 30min ago projb
+ date -d -2 months +%Y-%m-%dT%H:%M:%S
+ date -d -2 months + 1 hour +%Y-%m-%dT%H:%M:%S
+ src/timew track 2020-06-14T07:04:02 - 2020-06-14T08:04:02 projb
Recorded projb
  Started 2020-06-14T07:04:02
  Ended              08:04:02
  Total               1:30:00

Although, still there is an issue with intervalSummarize since even when the DB is in sorted order, it only reports the total for the most recent interval.

Here is the output when the same script is run against the wip-sorted-datafiles branch:

+ export TIMEWARRIORDB=/tmp/timewarriordb
+ rm -rf /tmp/timewarriordb
+ mkdir -p /tmp/timewarriordb/data
+ :
+ src/timew start :quiet 1h ago proja
+ src/timew start :quiet 30min ago projb
+ date -d -2 months +%Y-%m-%dT%H:%M:%S
+ date -d -2 months + 1 hour +%Y-%m-%dT%H:%M:%S
+ src/timew track 2020-06-14T07:09:31 - 2020-06-14T08:09:31 projb
Recorded projb
  Started 2020-06-14T07:09:31
  Ended              08:09:31
  Total               0:30:00

I opened #368 to finally eliminate this source of confusion.

@sruffell sruffell force-pushed the wip-sorted-datafiles branch from e83f9db to 444f1bc Compare August 14, 2020 21:51
@lauft lauft added the discussion Discussion on a topic label Aug 19, 2020
If the same interval is added, we do not necessarily want to create a
transaction anymore so the datafile should report this information.

Signed-off-by: Shaun Ruffell <[email protected]>
It doesn't make any sense for the Datafiles to exist in a
non-initialized state.

Signed-off-by: Shaun Ruffell <[email protected]>
Since the switch to using iterators to load lines from the database, it is
assumed that the lines in the datafile are always in sorted order. While the
code has naturally kept them in sorted order previously, this change makes the
Datafile responsible for keeping them sorted.

Signed-off-by: Shaun Ruffell <[email protected]>
The dump() member is constant, as it should be, but if called before one
of the other member that initialize the database, it will fail to return
any intervals.

This change makes the _files member mutable so that constant member
functions are able to initialize it.  These functions still maintain
logical constness as the database itself is not modified.

Signed-off-by: Shaun Ruffell <[email protected]>
Eliminates the naked pointer to the Journal that the Database held which
will simplify writing tests for the Database class.

A follow-on commit can remove the Journal reference from the command
signatures since it is already passed as part of the Database.

Signed-off-by: Shaun Ruffell <[email protected]>
Before this change, exists will only report if the real file exists. If
an atomic file is created or written, but not finalized, the exists call
will return false.

Signed-off-by: Shaun Ruffell <[email protected]>
Since the code uses AtomicFiles to write the tags database, it should
also use AtomicFiles to check for the tags databases' existence.

Signed-off-by: Shaun Ruffell <[email protected]>
Required to use some of the standard algorithms with the database
iterators.

Signed-off-by: Shaun Ruffell <[email protected]>
There were some paths through the database that could result in the
datafiles being improperly initialized. For example, it was possible to
call dump() on an initialized database that wouldn't return anything.

Since the vast majority of use cases involve loading all the Datafile, I
believe it is clearer to always initialize them at construction time.

Signed-off-by: Shaun Ruffell <[email protected]>
Like the previous commit to keep the datafiles sorted, this change
ensures that the Datafiles themselves are always in sorted order
regardless of the order intervals are added.

This is needed since the switch to using iterators for the database.

Signed-off-by: Shaun Ruffell <[email protected]>
…ructor

Primarily to simplify the test code a bit.

Signed-off-by: Shaun Ruffell <[email protected]>
Some conditions are difficult to check via the python scripts.
Specfically some of the more odd sort orderings.

Signed-off-by: Shaun Ruffell <[email protected]>
This eliminates the 'initialize' member function in the Extension class.
The constructor should handle what is needed to initialize the class.

Signed-off-by: Shaun Ruffell <[email protected]>
@sruffell sruffell force-pushed the wip-sorted-datafiles branch from 444f1bc to 70c62e5 Compare August 25, 2020 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev-branch discussion Discussion on a topic
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants