Keep Datafiles Sorted #354

sruffell · 2020-08-04T10:57:18Z

The datafiles and inclusions in the datafiles are not guaranteed to be kept in sorted order in light of adds / removes. This meant that if a user in the future adds an interval, and then attempts to iterate over the database, the order was not guaranteed to be preserved.

This is a draft PR since it is built on the work of the following PRs:

lauft · 2020-08-11T08:37:50Z

The datafiles and inclusions in the datafiles are not guaranteed to be kept in sorted order in light of adds / removes. This meant that if a user in the future adds an interval, and then attempts to iterate over the database, the order was not guaranteed to be preserved.

Can you specify this a bit/provide an example script?

sruffell · 2020-08-12T01:24:06Z

The following section shows the test that would fail. In it, Database::addInterval () is called with intervals out of order, but then when trying to iterate over the database, the order isn't sorted, which is implied by the code that does not want to load the entire database.

4cce6b2#diff-de69d8da4e4696c24f8977414a514e29R39-R77

Granted, I cannot currently think of a way for a user of the timew binary to create this condition, but I think a future user of the Database object shouldn't have to commit and reload the database in order to make sure newly added intervals are sorted.

I first was wondering about this when writing b43fdf2#diff-41766cdf08b0c4d6c7a51271c51b9f85R117. I thought why can't I just use lower_bound or otherwise stop looking through the datafile if I've passed the point the new interval should be. I think this also could be an issue if we implement the import command, since I do not believe we should assume the input will be in sorted order relative to entries that may already be in the target database.

sruffell · 2020-08-14T12:06:15Z

The datafiles and inclusions in the datafiles are not guaranteed to be kept in sorted order in light of adds / removes. This meant that if a user in the future adds an interval, and then attempts to iterate over the database, the order was not guaranteed to be preserved.

Can you specify this a bit/provide an example script?

Actually, I was able to conceive of an example that shows the error against the current dev branch.

The following script will report the wrong running time. I added an interval with the same tags as the current interval in the past. Because the newly added interval was placed at the end of the collection of the database, when interval summarize is walking backwards to show the totals it shows both the current interval (30 min) and the interval that was added a month ago (1 hour).

#!/bin/sh
set -x

export TIMEWARRIORDB=/tmp/timewarriordb
rm -rf ${TIMEWARRIORDB}
mkdir -p ${TIMEWARRIORDB}/data
:> ${TIMEWARRIORDB}/timewarrior.cfg

src/timew start :quiet 1h ago proja
src/timew start :quiet 30min ago projb
src/timew track $(date -d '-2 months' +%Y-%m-%dT%H:%M:%S) - $(date -d '-2 months + 1 hour' +%Y-%m-%dT%H:%M:%S) projb

+ export TIMEWARRIORDB=/tmp/timewarriordb
+ rm -rf /tmp/timewarriordb
+ mkdir -p /tmp/timewarriordb/data
+ :
+ src/timew start :quiet 1h ago proja
+ src/timew start :quiet 30min ago projb
+ date -d -2 months +%Y-%m-%dT%H:%M:%S
+ date -d -2 months + 1 hour +%Y-%m-%dT%H:%M:%S
+ src/timew track 2020-06-14T07:04:02 - 2020-06-14T08:04:02 projb
Recorded projb
  Started 2020-06-14T07:04:02
  Ended              08:04:02
  Total               1:30:00

Although, still there is an issue with intervalSummarize since even when the DB is in sorted order, it only reports the total for the most recent interval.

Here is the output when the same script is run against the wip-sorted-datafiles branch:

+ export TIMEWARRIORDB=/tmp/timewarriordb
+ rm -rf /tmp/timewarriordb
+ mkdir -p /tmp/timewarriordb/data
+ :
+ src/timew start :quiet 1h ago proja
+ src/timew start :quiet 30min ago projb
+ date -d -2 months +%Y-%m-%dT%H:%M:%S
+ date -d -2 months + 1 hour +%Y-%m-%dT%H:%M:%S
+ src/timew track 2020-06-14T07:09:31 - 2020-06-14T08:09:31 projb
Recorded projb
  Started 2020-06-14T07:09:31
  Ended              08:09:31
  Total               0:30:00

I opened #368 to finally eliminate this source of confusion.

Signed-off-by: Shaun Ruffell <[email protected]>

If the same interval is added, we do not necessarily want to create a transaction anymore so the datafile should report this information. Signed-off-by: Shaun Ruffell <[email protected]>

Signed-off-by: Shaun Ruffell <[email protected]>

It doesn't make any sense for the Datafiles to exist in a non-initialized state. Signed-off-by: Shaun Ruffell <[email protected]>

Since the switch to using iterators to load lines from the database, it is assumed that the lines in the datafile are always in sorted order. While the code has naturally kept them in sorted order previously, this change makes the Datafile responsible for keeping them sorted. Signed-off-by: Shaun Ruffell <[email protected]>

The dump() member is constant, as it should be, but if called before one of the other member that initialize the database, it will fail to return any intervals. This change makes the _files member mutable so that constant member functions are able to initialize it. These functions still maintain logical constness as the database itself is not modified. Signed-off-by: Shaun Ruffell <[email protected]>

Eliminates the naked pointer to the Journal that the Database held which will simplify writing tests for the Database class. A follow-on commit can remove the Journal reference from the command signatures since it is already passed as part of the Database. Signed-off-by: Shaun Ruffell <[email protected]>

Before this change, exists will only report if the real file exists. If an atomic file is created or written, but not finalized, the exists call will return false. Signed-off-by: Shaun Ruffell <[email protected]>

Since the code uses AtomicFiles to write the tags database, it should also use AtomicFiles to check for the tags databases' existence. Signed-off-by: Shaun Ruffell <[email protected]>

Required to use some of the standard algorithms with the database iterators. Signed-off-by: Shaun Ruffell <[email protected]>

There were some paths through the database that could result in the datafiles being improperly initialized. For example, it was possible to call dump() on an initialized database that wouldn't return anything. Since the vast majority of use cases involve loading all the Datafile, I believe it is clearer to always initialize them at construction time. Signed-off-by: Shaun Ruffell <[email protected]>

Like the previous commit to keep the datafiles sorted, this change ensures that the Datafiles themselves are always in sorted order regardless of the order intervals are added. This is needed since the switch to using iterators for the database. Signed-off-by: Shaun Ruffell <[email protected]>

…ructor Primarily to simplify the test code a bit. Signed-off-by: Shaun Ruffell <[email protected]>

Signed-off-by: Shaun Ruffell <[email protected]>

Some conditions are difficult to check via the python scripts. Specfically some of the more odd sort orderings. Signed-off-by: Shaun Ruffell <[email protected]>

This eliminates the 'initialize' member function in the Extension class. The constructor should handle what is needed to initialize the class. Signed-off-by: Shaun Ruffell <[email protected]>

sruffell force-pushed the wip-sorted-datafiles branch 2 times, most recently from e3f7c48 to e83f9db Compare August 13, 2020 06:58

sruffell force-pushed the wip-sorted-datafiles branch from e83f9db to 444f1bc Compare August 14, 2020 21:51

lauft added the discussion Discussion on a topic label Aug 19, 2020

sruffell added 16 commits August 25, 2020 03:39

Interval: getAnnotation() can return constant reference

8106b79

Signed-off-by: Shaun Ruffell <[email protected]>

Datafile: addInterval () returns true if the datafile is modified

cad7089

If the same interval is added, we do not necessarily want to create a transaction anymore so the datafile should report this information. Signed-off-by: Shaun Ruffell <[email protected]>

Do not add empty transactions to journal

7654e63

Signed-off-by: Shaun Ruffell <[email protected]>

Datafile: Do not allow uninitialized Datafiles

f2c6de5

It doesn't make any sense for the Datafiles to exist in a non-initialized state. Signed-off-by: Shaun Ruffell <[email protected]>

AtomicFile: exists () should work before finalization

747bb41

Before this change, exists will only report if the real file exists. If an atomic file is created or written, but not finalized, the exists call will return false. Signed-off-by: Shaun Ruffell <[email protected]>

Database: Use AtomicFile when checking for existence of tags file.

fe3d985

Since the code uses AtomicFiles to write the tags database, it should also use AtomicFiles to check for the tags databases' existence. Signed-off-by: Shaun Ruffell <[email protected]>

Database: Define iterator traits and remove unimplemented operations

5b1562e

Required to use some of the standard algorithms with the database iterators. Signed-off-by: Shaun Ruffell <[email protected]>

Database: Add constructor that calls initialize and enable move const…

954a072

…ructor Primarily to simplify the test code a bit. Signed-off-by: Shaun Ruffell <[email protected]>

test: Allow TempDir objects to be used as strings

98e313f

Signed-off-by: Shaun Ruffell <[email protected]>

test: Add test for Database object

f872d81

Some conditions are difficult to check via the python scripts. Specfically some of the more odd sort orderings. Signed-off-by: Shaun Ruffell <[email protected]>

Use create(Rules|Extensions) for symmetry with createDatabase

70c62e5

This eliminates the 'initialize' member function in the Extension class. The constructor should handle what is needed to initialize the class. Signed-off-by: Shaun Ruffell <[email protected]>

sruffell force-pushed the wip-sorted-datafiles branch from 444f1bc to 70c62e5 Compare August 25, 2020 08:44

lauft added the dev-branch label Sep 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keep Datafiles Sorted #354

Keep Datafiles Sorted #354

sruffell commented Aug 4, 2020 •

edited

Loading

lauft commented Aug 11, 2020

sruffell commented Aug 12, 2020

sruffell commented Aug 14, 2020 •

edited

Loading

Keep Datafiles Sorted #354

Are you sure you want to change the base?

Keep Datafiles Sorted #354

Conversation

sruffell commented Aug 4, 2020 • edited Loading

lauft commented Aug 11, 2020

sruffell commented Aug 12, 2020

sruffell commented Aug 14, 2020 • edited Loading

sruffell commented Aug 4, 2020 •

edited

Loading

sruffell commented Aug 14, 2020 •

edited

Loading