Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify timestamp.json METAFILES format #90

Merged
merged 10 commits into from
Mar 6, 2020

Conversation

joshuagl
Copy link
Member

@joshuagl joshuagl commented Feb 5, 2020

A read-through of the spec made me question the definition of METAFILES for snapshot.json.
In step 3 of the client application detailed workflow the spec states that the client download

"up to the number of bytes specified in the timestamp metadata file"

yet the description of METAFILES for snapshot.json suggests the object behaves the same as METAFILES in timestamp.json where LENGTH is optional.

As @erickt raised a similar concern in #38 here's a pull request to try and clarify.

tuf-spec.md Outdated

{ METAPATH : {
"version" : VERSION,
("length" : LENGTH, |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
("length" : LENGTH, |
"length" : LENGTH, |

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this different from other "or"-groups in the spec, where we do use parentheses (see #L734-L735 and #L840-L841)?

If not, I suggest we use a consistent notation, with or without parentheses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I understand now. The lack of parentheses indicates that one of the values must be provided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if only one of them is mandatory, then the routine from the client workflow that @joshuagl quotes above , i.e.

"up to the number of bytes specified in the timestamp metadata file"

would fail in the case timestamp only lists hashes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. If we keep the METAPATH object as is with optional fields, then the client should fetch either length or an application specified size, and in the latter case compare to the hash.

Perhaps length should be mandatory here, the client workflow seems to imply that it is.

@joshuagl joshuagl force-pushed the joshuagl/timestamp-length branch from 17e6516 to af91a93 Compare February 5, 2020 21:33
@joshuagl
Copy link
Member Author

joshuagl commented Feb 5, 2020

I've force pushed an update which includes @trishankatdatadog's suggests, thanks for the review!

@joshuagl joshuagl force-pushed the joshuagl/timestamp-length branch from af91a93 to 97312dd Compare February 5, 2020 21:37
Copy link
Member

@trishankatdatadog trishankatdatadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, @joshuagl! Do we need to rephrase the client workflow you quoted, i.e. "up to the number of bytes specified in ... "? (see inline comments)

Besides, it might be worth to briefly re-consider the benefit of listing hash and length of snapshot.json in timestamp.json. Judging from the arguments, brought forward by @trishankatdatadog and @awwad in #38 and #28, the version of snapshot might be enough.

cc @JustinCappos, @mnm678.

@joshuagl
Copy link
Member Author

joshuagl commented Feb 6, 2020

Thanks for the review @lukpueh. You are right, with the change as-is we would need to rephrase the client workflow.

However, having read #28 and re-read #38 it does seem that we might be able to simplify the listing of snapshot.json in timestamp.json.

If we remove the hashes, does it make sense to list length still? Or would it be better to allow the application to "download up to Z number of bytes" as with other metadata file downloads?

@lukpueh
Copy link
Member

lukpueh commented Feb 6, 2020

I can't see how the hashes and lengths fields depend on each other. So I don't think there would be a problem in keeping/mandating only one of them. But I do wonder if we even need to keep/mandate either of them.

I assume the main argument for lengths is to have an exact download upper bound. But as @trishankkarthik points out in #38, timestamp should always have pretty much the same size (modulo a few bytes more or less due to snapshot's version), so a "download up to Z number of bytes"-strategy should be good enough.

hashes OTOH seems to make the connection between timestamp and snapshot more robust. But if there is a unique mapping from timestamp to snapshot via snapshot's version number, hashes shouldn't be needed. Note that snapshot's integrity is guaranteed by snapshot's signature. And the version number should indeed always be unique, in order to prevent rollback attacks (see secure-systems-lab/peps#69 for related discussions).

@lukpueh
Copy link
Member

lukpueh commented Feb 7, 2020

Ping @mnm678, @JustinCappos, @trishankatdatadog. Your opinions on the matter would be appreciated! :)

We should be able to close both #28 and #38 with this PR. This can be done by removing/optionalizing hashes and lengths in timestamp (for the reasons presented above and in the issues), or by making an argument for not doing so.

Either way, clarifying the relevant section in the spec, as @joshuagl does, is a good idea.

@trishankatdatadog
Copy link
Member

I think we might as well make timestamp consistent with snapshot, so if the latter lists only version numbers, then that's what the former should do, too. I can't imagine we lose much security this way, especially since, in practice, timestamp and snapshot share the same online key anyway, and as long as we set an upper bound on the size of any metadata downloaded. So, length and hashes may go back to being optional, just like in snapshot.

@mnm678
Copy link
Collaborator

mnm678 commented Feb 7, 2020

I think that we should remove/make optional the hashes and lengths in timestamp to match the requirement in snapshot. As @trishankatdatadog mentioned, the only benefit gained from the hashes is extra verification of the contents of snapshot, which is redundant as snapshot is alright signed by the snapshot key (and in practice this is often the same as the timestamp key).

As for the question of whether to remove them or make them optional, I would propose we make them optional for backwards compatibility, and possibly remove them in a future major release.

@JustinCappos
Copy link
Member

I do feel more nervous about removing the hash and length from timestamp. The length of snapshot could increase dramatically as new targets are added. It could also decrease after a rotation of the snapshot key and clean up of outdated targets files.

I'll list a (semi-contrived) situation where version number, hash, and length are all important to have.

  1. If missing the length, but with a hash and version number. An attacker can launch an endless data attack more easily without compromising any keys.
  2. If missing the hash, but with a length and version number. An attacker which has stolen the timestamp and snapshot key and who can answer client requests (e.g., a compromised repo), can give different clients the same timestamp but different snapshot metadata. This may allow bifurcation of client requests without doing things like rapidly increasing the version number or giving different timestamp files, both of which would be more noticeable. (This is fairly contrived. I'm not really sure the impact here, but it seems worrying.)
  3. If missing the version number, but with a hash and a length. An attacker that has compromised the timestamp only, may generate a timestamp file for a future time that lists an old snapshot file. A client updating later, but before the future time, would not download the new snapshot file, thus causing a freeze attack without the attacker needing to maintain control of the repo. (The repo could rotate timestamp to fix this.)

My off-the-cuff thoughts are: having a version number is good, having a length is good, having a hash is good, and having all three seems to be the best. I believe it may be possible to drop one or more (but we would want to think very carefully about this), but is there a reason not to list all three?

@lukpueh
Copy link
Member

lukpueh commented Feb 10, 2020

  1. If missing the length, but with a hash and version number. An attacker can launch an endless data attack more easily without compromising any keys.

I realize now that the constant size of timestamp as argument against lengths in timestamp (in my comment above and a comment in #38) is nonsense, since the field specifies snapshot's size and not timestamp's. 🤦‍♂ Sorry for the confusion.

That means an upper bound for snapshot, either via lengths in timestamp, or defined by the client (as we do for other metadata) is necessary to prevent endless data attacks.

  1. If missing the hash, but with a length and version number. An attacker which has stolen the timestamp and snapshot key and who can answer client requests (e.g., a compromised repo), can give different clients the same timestamp but different snapshot metadata. [...]

AFAICS, with the given attacker capabilities, they could achieve the same by serving different timestamp and different snapshot metadata to different clients, even if timestamp includes a hash of snapshot. Or is there an advantage for the attacker to serve only different snapshot metadata?

  1. If missing the version number, but with a hash and a length. An attacker that has compromised the timestamp only, may generate a timestamp file for a future time that lists an old snapshot file. [...]

I think we all agree that snapshot's version number in timestamp.json is important to prevent rollback attacks, despite a compromised timestamp role.


It seems like the removal/optionalization of hashes and lengths was motivated by bandwidth concerns in the Mercury paper (see theupdateframework/rust-tuf#104 and theupdateframework/python-tuf#465). However, the paper does not seem to mention the lengths field (it only talks about hashes vs. version), nor the timestamp role (it only talks about those fields in snapshot).

To summarize my thoughts on those fields:

  • version
    • required to fail early on rollback attacks
    • alternative to hashes for mapping the referenced metadata. The mapping should be 1-1, but could become 1-n, e.g. if version numbers are re-used or in @JustinCappos' bifurcation scenario (see 2. above).
  • hashes
    • more robust mapping of referenced metadata, compared to version (1-1 if no hash collisions)
    • strong impact on bandwidth
  • lengths
    • precise and flexible way of preventing endless-data attacks,
    • a client-defined upper bound could be a bandwidth-effective but less precise and less flexible alternative

And to summarize my thoughts on the affected roles:

  • timestamp:
    • most downloaded role
    • only has one entry, i.e. snapshot.json
  • snapshot:
    • second most downloaded role
    • can have many entries , i.e. targets.json and all .json

Please correct me if I'm wrong or missing something!

@trishankatdatadog
Copy link
Member

My off-the-cuff thoughts are: having a version number is good, having a length is good, having a hash is good, and having all three seems to be the best. I believe it may be possible to drop one or more (but we would want to think very carefully about this), but is there a reason not to list all three?

Yeah, but shouldn't the snapshot also list all three by that logic? Don't we accept some tradeoff in security for b/w performance there? Here there is no such demand, but I find all the scenarios above fairly contrived. Having said that, there's absolutely no reason to drop all 3 in timestamp except for some aesthetic consistency.

@JustinCappos
Copy link
Member

JustinCappos commented Feb 10, 2020 via email

@lukpueh
Copy link
Member

lukpueh commented Feb 11, 2020

Snapshot doesn't have hashes because the size becomes quite large overall and our Mercury work shows that the version number is even more valuable to have than the hash in most cases. For timestamp, it's a single entry so I think doesn't matter in the same way...

Interestingly, the specification currently suggests the opposite. For timestamp.json it says:

Timestamp files will potentially be downloaded very frequently. Unnecessary
information in them will be avoided.

@lukpueh
Copy link
Member

lukpueh commented Feb 11, 2020

Based on the discussions here, I took the liberty to hijack this PR, reverting the commits and instead

  • adding a client-set download limit as fallback if snapshot length is not set in timestamp.json, and
  • generally clarifying METAFILES and its fields for timestamp.json and snapshot.json

@joshuagl, I hope you don't mind. This is just a suggestion. We can easily reset to what it was before.

@trishankatdatadog, @JustinCappos, would you kindly take another look? length and hashes is now again optional in both timestamp.json and snapshot.json. And I added some context, why one would want to omit them (i.e. bandwidth), and under what circumstances they may be omitted (i.e. fallback download limit, no reused version numbers).

We could further add a recommendation for when those fields should be omitted (i.e. for timestamp: 'low expiration' and for snapshot: 'low expiration' x 'many (delegated) targets'). What do you think?

@lukpueh lukpueh force-pushed the joshuagl/timestamp-length branch from c881d30 to 6933985 Compare February 11, 2020 11:43
@joshuagl
Copy link
Member Author

I don't mind at all, this looks like a good set of clarifications to me.

I do like the idea of adding a recommendation on when optional fields could be omitted.

tuf-spec.md Outdated
in the timestamp metadata file. If consistent snapshots are not used (see
**3**. **Download snapshot metadata file**, up to the number of bytes
specified in the timestamp metadata file. If not specified, download up to a
number of bytes set by the authors of the application using TUF. This may be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I purposefully say "a number of bytes" instead of "VAR number of bytes" because we already use X and Y above, and Z below, so I wasn't sure what VAR to use. If others want, we can make the wording consistent. Any thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that, we could shuffle the variables so that we use W, X, Y and Z.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Let's wait till we know in what direction this PR should go. If we decide to make length mandatory, we don't need the client-defined upper bound.

Copy link
Member

@trishankatdatadog trishankatdatadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No brackets in METAPATH now, since the fields are no longer optional, please lose them

@lukpueh
Copy link
Member

lukpueh commented Feb 12, 2020

No brackets in METAPATH now, since the fields are no longer optional, please lose them

Which brackets are you referring to, @trishankatdatadog? The current diff of this PR doesn't change anything about the metadata format. Do you want to make version, length and hashes mandatory for snapshot and timestamp?

How does that fit in with what you say above that "we might as well make timestamp consistent with snapshot, so if the latter lists only version numbers, then that's what the former should do".

Also, the specification currently claims "Timestamp files will potentially be downloaded very frequently. Unnecessary information in them will be avoided." Is that not a reference to the suggestions of the Mercury paper?

@trishankatdatadog
Copy link
Member

@lukpueh Yeah, I think we should make a determination whether we're going to use them or not. Let's discuss in the next community meeting.

P.S. At the time that part of the spec was written, I don't think the Mercury paper was around.

Copy link
Member

@trishankatdatadog trishankatdatadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on community meeting today, let's make hashes and lengths optional in both timestamp and snapshot, and I think we can approve this and move forward from there.

@lukpueh
Copy link
Member

lukpueh commented Mar 3, 2020

Thanks for the comment @trishankatdatadog. What changes do you request? In the current version of this PR the METAFILES data format is defined in section "4.4. File formats: snapshot.json", marking the length and hashes field as optional. In section "4.6. File formats: timestamp.json" , we say that METAFILES is is the same is described for the snapshot.json file, i.e. length and hashes are also flexible.

@joshuagl
Copy link
Member Author

joshuagl commented Mar 3, 2020

Can we update step 3. Download snapshot metadata file to use a variable – i.e. "up to Y number of bytes (because the size is unknown)" – so that it's consistent with the other steps?
(Also shuffling the other steps so that the variables start at W and are ordered alphabetically)

@joshuagl joshuagl closed this Mar 3, 2020
@joshuagl joshuagl reopened this Mar 3, 2020
@lukpueh
Copy link
Member

lukpueh commented Mar 3, 2020

Good call, @joshuagl. Do you want to push a commit with the variables W,X,Y,Z you proposed above?

@joshuagl joshuagl force-pushed the joshuagl/timestamp-length branch from 74dc56f to a32c938 Compare March 3, 2020 11:08
@joshuagl
Copy link
Member Author

joshuagl commented Mar 3, 2020

Good call, @joshuagl. Do you want to push a commit with the variables W,X,Y,Z you proposed above?

Done.

@joshuagl joshuagl force-pushed the joshuagl/timestamp-length branch 2 times, most recently from 969b24f to 37c6be8 Compare March 3, 2020 11:15
mnm678
mnm678 previously approved these changes Mar 3, 2020
file at METAPATH, including their cryptographic hash function. For example:
{ "sha256": HASH, ... }. HASHES is OPTIONAL and can be omitted to reduce
the snapshot metadata file size. In that case the repository MUST guarantee
that VERSION alone unambiguously identifies the metadata at METAPATH.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to note when length and hashes should be included (ie backwards compatibility for both, using length as an exact download size)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backwards compatibility is one reason, yes, but I believe the more important reason is to protect against certain security attacks. Please see Section 5.6 of Mercury.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we are bringing back hashes and length here, we should add the corresponding checks in the download process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to note when length and hashes should be included

@mnm678, @trishankatdatadog: Good idea. Could you push a commit with such a note?

Also, if we are bringing back hashes and length here, we should add the corresponding checks in the download process.

@trishankatdatadog: What's missing from the existing (L1136-1138) and newly proposed (L11160-1163) download checks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukpueh Those are good checks, but:

  • 3.1 doesn't mention that hashes and length are optional. It also doesn't mention checking the length, if any, of the snapshot metadata.
  • When we check the targets role or its delegations, do we mention checking optional hashes and length from the snapshot metadata?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing these out, @trishankatdatadog!

  • 3.1 doesn't mention that hashes and length are optional. It also doesn't mention checking the length, if any, of the snapshot metadata.

Is another length match check in 3.1. necessary? We already say in 3. that the client should download snapshot.json up to the length specified in timestamp.json (if specified).

Regarding optionality of hashes, I agree we could add a "(if any)" note in 3.1, as we do for targets in 4.1.

  • When we check the targets role or its delegations, do we mention checking optional hashes and length from the snapshot metadata?

For top-level targets we do say in 4.1. "The hashes (if any), and version number of the new targets metadata file MUST match the trusted snapshot metadata."
I think the "(if any)" is slightly misplaced, because the hashes it refers are computed by the client. It would be more accurate to say: "The hashes and version number of the new targets metadata file MUST match the hashes (if any) and version number listed in the trusted snapshot". Let me push a commit that clarifies this.

Regarding length, similar to timestamp-snapshot in 3., we say in 4.1. that the client should download targets.json up to the length specified in snapshot.json (if specified). Do you think we need another length match check in 4.1.?

Regarding delegated targets metadata, we currently don't provide details about verification. #86 adds those details and adopts the "(if any)" wording used in 4.1. If we accept my suggestion for 4.1. above, we should adopt this for delegated targets metadata in #86.

Similarly, if we add a length match check for top-level targets metadata, we should adopt this for delegated in #86 too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified optionality of hashes (checks) in 0bdb99c. Let me know if we should add another optional length check...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a second length check isn't necessary as the length is already used when downloading.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we check the targets role or its delegations, do we mention checking optional hashes and length from the snapshot metadata?
FYI, @trishankatdatadog: As promised above, I just pushed a commit to #86 that notes that delegated targets metadata hashes might be missing in snapshot metadata.

joshuagl and others added 9 commits March 5, 2020 13:05
In step 3 of the client application detailed workflow the spec states that the
client download "up to the number of bytes specified in the timestamp metadata
file" which suggests that LENGTH is not optional in the METAFILES entry for
snapshot.json in the timestamp.json file.

Signed-off-by: Joshua Lock <[email protected]>
The listing for snapshot.json in timestamp.json REQUIRES both the
length and the hashes field, unlike snapshot.json where these are
OPTIONAL.

Fixes #38

Signed-off-by: Joshua Lock <[email protected]>
This reverts commits:
- Clarify timestamp.json METAFILES format
97312dd

- LENGTH isn't optional in timestamp.json for snapshot.json
d97da38
Describe how to use a client-defined upper limit for snapshot
metadata if timestamp does not specify its length.
Generally clarify what files are listed under METAFILES in
snapshot.json and timestamp.json and under what circumstances the
LENGTH and HASHES field for a METAFILE can be omitted (and why
it should be omitted).
Update the modified step 3 to use a named variable for the application author
defined number of bytes to download a snapshot metadata file. Re-name prior
variables, starting at W, to ensure an alphabetically consistent list of
variables used in each step.

Signed-off-by: Joshua Lock <[email protected]>
Update the client workflow to clarify that snapshot metadata hashes
can only be checked if timestamp lists the optional hashes, and,
similarly, targets metadata hashes can only be checked if snapshot
lists the optional hashes.
@lukpueh lukpueh force-pushed the joshuagl/timestamp-length branch from 0bdb99c to 081e56a Compare March 5, 2020 12:09
@lukpueh
Copy link
Member

lukpueh commented Mar 6, 2020

Thanks for the spadework, @joshuagl, and for the thorough reviews, @mnm678, @trishankatdatadog and @JustinCappos.

After having bumped last modified date and patch version number, as advised by our new release policy, the checker script passes and we can merge and release v1.0.1. 🎉

@lukpueh lukpueh merged commit 69b8456 into theupdateframework:master Mar 6, 2020
@joshuagl joshuagl deleted the joshuagl/timestamp-length branch March 6, 2020 09:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants