Commit 624bcba

Add initial draft of tap introducing snapshot merkle trees
Signed-off-by: marinamoore <[email protected]>
1 parent bce2f69 commit 624bcba

2 files changed, +244 -0 lines changed

candidate-snapshot-merkle-tap.md

+244
@@ -0,0 +1,244 @@
* TAP:
* Title: Snapshot Merkle Trees
* Version: 0
* Last-Modified: 17/09/2020
* Author: Marina Moore, Justin Cappos
* Type: Standardization
* Status: Draft
* Content-Type: markdown
* Created: 14/09/2020
* +TUF-Version:
* +Post-History:

# Abstract

To optimize the snapshot metadata file size for large registries, registries
can use a snapshot Merkle tree to conceptually store version information about
all images in a single snapshot without needing to distribute this entire
snapshot to all clients. First, the client retrieves only a timestamp file,
which changes according to some period p (such as every day or week). Second,
the snapshot file is itself kept as a Merkle tree, with the root stored in
timestamp metadata. This snapshot file is broken into one file per target,
containing the Merkle tree leaf with information about that target and a path
from that leaf to the root of the Merkle tree. A new snapshot Merkle tree is
generated every time a new timestamp is generated. To prove that the snapshot
Merkle tree has not been rolled back when downloading an image, the client and
third-party auditors download the prior snapshot Merkle trees and check that
the version numbers did not decrease at any point. To keep this scalable as
the number of timestamps grows, the client only downloads version information
signed by the current timestamp key. Thus, rotating the timestamp key enables
the registry to discard old snapshot Merkle tree data.

The feature described in this TAP does not need to be implemented by all TUF
implementations. It is an option for any adopter who is interested in the
benefits provided by this feature, but it may not make sense for
implementations with relatively few target files.

# Motivation

For very large repositories, the snapshot metadata file can get very large.
Because this file must be downloaded on every update cycle, it can
significantly increase the metadata overhead. For example, if a repository
has 50,000,000 targets, the snapshot metadata will be about 380,000,000 bytes
(https://docs.google.com/spreadsheets/d/18iwWnWvAAZ4In33EWJBgdAWVFE720B_z0eQlB4FpjNc/edit?ts=5ed7d6f4#gid=0).
For this reason, a more scalable solution for snapshot metadata is needed,
one that does not significantly weaken the security properties of TUF.

We designed a new approach to snapshot metadata that improves scalability
while achieving security properties similar to those of the existing snapshot
metadata.

# Rationale

Snapshot metadata provides a consistent view of the repository in order to
protect against mix-and-match attacks and rollback attacks. To provide these
protections, snapshot metadata is responsible for keeping track of the version
number of each target file, ensuring that all targets downloaded are from the
same snapshot, and ensuring that no target file decreases its version number
(except in the case of fast forward attack recovery). Any new solution we
develop must provide these same protections.

A snapshot Merkle tree manages version information for each target by
including this information in each leaf node. Because these nodes are stored
in a Merkle tree, a client can cryptographically verify that different targets
are from the same snapshot by checking that their paths lead to the same
Merkle tree root. Due to the properties of secure hash functions, any two
leaves of a Merkle tree with the same root are from the same tree.

In order to prevent rollback attacks between Merkle trees, this proposal
introduces third-party auditors. These auditors are responsible for
downloading all nodes of each Merkle tree to ensure that no version numbers
have decreased between generated trees. This achieves rollback protection
without every client having to store the version information for every
target.

# Specification

This proposal replaces the single snapshot metadata file with a snapshot
Merkle metadata file for each target. The repository generates these snapshot
Merkle metadata files by building a Merkle tree from the version information
of all target files and storing, in each target's snapshot Merkle metadata,
the path from that target's leaf to the root. The root of this Merkle tree is
stored in timestamp metadata to allow for client verification. The client
uses the path stored in the snapshot Merkle metadata for a target, along with
the root of the Merkle tree, to ensure that the metadata is from the given
Merkle tree. The details of these files and procedures are described in this
section.

![Diagram of snapshot Merkle tree](merkletap-1.jpg)

## Merkle tree generation

When the repository generates snapshot metadata, instead of putting the
version information for all targets into a single file, it uses the version
information to generate a Merkle tree. Each target's version information
forms a leaf of the tree, and these leaves are used to build the Merkle tree.
Each internal node of a Merkle tree contains a hash computed from its two
child nodes. The exact algorithm for generating this Merkle tree (i.e., the
order of leaves in the tree and how version information is encoded) is left
to the implementer, but this algorithm should be documented in a POUF so that
implementations can be compatible and correctly verify Merkle tree data.
However, all implementations should meet the following requirements:

* Leaf nodes must be unique. A unique identifier of the target, such as the
  filepath or hash, must be included in the leaf data to ensure that no two
  leaf node hashes are the same.
* The tree must be a Merkle tree. Each internal node must contain a hash that
  includes both of its child nodes.

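A minimal Python sketch of this generation step follows. Every encoding choice in it is a hypothetical stand-in for what a POUF would specify: leaves are canonical JSON of `{target_name: {"version": N}}` sorted by target name, leaf hashes are SHA-256 over a `0x00` prefix and internal nodes over a `0x01` prefix (the leaf/node domain separation used by RFC 6962), and an unpaired node at the end of a level is carried up unchanged. The names `leaf_bytes`, `build_levels`, and `compute_root` are illustrative, not part of TUF.

```python
import hashlib
import json

# Hypothetical encoding (a real repository documents its own in a POUF):
# each leaf is canonical JSON of {target_name: {"version": N}}, hashed with a
# 0x00 prefix; each internal node hashes its two children with a 0x01 prefix,
# the leaf/node domain separation used by RFC 6962.


def leaf_bytes(name: str, version: int) -> bytes:
    """Encode one target's version information; the name keeps leaves unique."""
    return json.dumps({name: {"version": version}}, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")


def hash_leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def hash_node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(versions: dict[str, int]) -> list[list[bytes]]:
    """Return all tree levels, from the leaves (index 0) up to the root."""
    if not versions:
        raise ValueError("cannot build a Merkle tree with no targets")
    # Sort leaves by target name so the repository and auditors agree on order.
    level = [hash_leaf(leaf_bytes(name, versions[name]))
             for name in sorted(versions)]
    levels = [level]
    while len(level) > 1:
        parents = [hash_node(level[i], level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            # An unpaired last node is carried up to the next level unchanged.
            parents.append(level[-1])
        level = parents
        levels.append(level)
    return levels


def compute_root(versions: dict[str, int]) -> str:
    """Hex-encoded root hash, i.e. the ROOT_HASH published in timestamp."""
    return build_levels(versions)[-1][0].hex()


# Example: three targets; the printed root would go into timestamp metadata.
print(compute_root({"a.txt": 1, "b.txt": 2, "c.txt": 3}))
```

With three targets, the first two leaves are hashed together and the result is paired with the third leaf. Using different prefixes for leaves and internal nodes keeps a leaf's bytes from ever being interpreted as an internal node.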
Once the Merkle tree is generated, the repository must create a snapshot
Merkle metadata file for each target. This file must contain the leaf contents
and the path from the leaf to the root of the Merkle tree. This path must
contain the hashes of the sibling nodes needed to reconstruct the tree during
verification (see diagram). In addition, the path should contain direction
information so that the client knows whether each node is a left or right
sibling when reconstructing the tree.

This information will be included in the following metadata format:
```
{ "leaf_contents": {METAFILES},
  "merkle_path": {INDEX:HASH},
  "path_directions": {INDEX:DIR}
}
```

Where `METAFILES` is the version information as defined for snapshot metadata,
`INDEX` provides the ordering of nodes along the path, `HASH` is the hash of
the sibling node, and `DIR` indicates whether the given node is a left or
right sibling.

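The following Python sketch shows how a repository could produce one snapshot Merkle metadata file from the tree. It repeats the hypothetical encoding of the generation sketch so that it runs on its own, and makes two further assumptions: `INDEX` values are the strings `"0"`, `"1"`, ... counted from the leaf upward, and `DIR` is `"L"` or `"R"` for the position of the sibling. The output uses the `leaf_contents`, `merkle_path`, and `path_directions` keys; `snapshot_merkle_metadata` is a hypothetical helper name.

```python
import hashlib
import json

# Same hypothetical encoding as the generation sketch: canonical-JSON leaves
# hashed with a 0x00 prefix, internal nodes hashed with a 0x01 prefix.


def encode(obj) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_levels(versions: dict[str, int]) -> list[list[bytes]]:
    """All tree levels, leaves first; leaves are sorted by target name."""
    level = [hashlib.sha256(b"\x00" + encode({n: {"version": versions[n]}})).digest()
             for n in sorted(versions)]
    levels = [level]
    while len(level) > 1:
        parents = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            parents.append(level[-1])  # unpaired node carried up unchanged
        level = parents
        levels.append(level)
    return levels


def snapshot_merkle_metadata(versions: dict[str, int], name: str) -> dict:
    """Build the snapshot Merkle metadata file for one target."""
    index = sorted(versions).index(name)  # position of this target's leaf
    path, directions = {}, {}
    for level in build_levels(versions)[:-1]:  # every level below the root
        is_right_child = index % 2 == 1
        sibling = index - 1 if is_right_child else index + 1
        if sibling < len(level):  # no entry when the node is carried up
            key = str(len(path))  # INDEX: "0" is the sibling nearest the leaf
            path[key] = level[sibling].hex()
            # DIR records where the sibling sits relative to our node.
            directions[key] = "L" if is_right_child else "R"
        index //= 2
    return {
        "leaf_contents": {name: {"version": versions[name]}},
        "merkle_path": path,
        "path_directions": directions,
    }


print(json.dumps(snapshot_merkle_metadata({"a.txt": 1, "b.txt": 2, "c.txt": 3},
                                          "c.txt"), indent=2))
```

In this sketch a path has at most ceil(log2(n)) entries for n targets, because a node carried up without a sibling adds no entry at that level. For the 50,000,000-target example in the Motivation section that is at most 26 SHA-256 hashes, or 832 bytes before hex encoding, per target.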
In addition, the following optional field will be added to timestamp metadata.
If this field is included, the client should use snapshot Merkle metadata
instead of the snapshot metadata file to verify updates:

```
"merkle_root": ROOT_HASH
```

Where `ROOT_HASH` is the hash of the Merkle tree root.

Note that snapshot Merkle metadata files do not need to be signed by a snapshot
key because the path information will be verified based on the Merkle root
provided in timestamp metadata. Removing these signatures provides additional
space savings for clients.

## Merkle tree verification

If a client sees the `merkle_root` field in timestamp metadata, it will use
the snapshot Merkle metadata to check version information. In that case, the
client downloads the snapshot Merkle metadata file only for the target it is
attempting to update. The client verifies this file by reconstructing the
path to the root of the Merkle tree and comparing the computed root hash to
the hash provided in timestamp metadata. If the hashes do not match, the
snapshot Merkle metadata is invalid. Otherwise, the client uses the version
information in the verified snapshot Merkle metadata to proceed with the
update.

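A client-side sketch of this check in Python, under the same hypothetical encoding as the repository sketches (canonical-JSON leaves, `0x00`/`0x01` hash prefixes, `"L"`/`"R"` directions, string indices counted from the leaf). `verify_snapshot_merkle` is an illustrative name; a real client must use whatever leaf encoding and hashing the repository's POUF documents.

```python
import hashlib
import json


def encode(obj) -> bytes:
    # Must match the leaf encoding documented by the repository (hypothetical).
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_snapshot_merkle(metadata: dict, merkle_root: str) -> bool:
    """Recompute the root from one leaf and its path; compare with timestamp."""
    path = metadata["merkle_path"]
    directions = metadata["path_directions"]
    if path.keys() != directions.keys():
        return False
    node = hashlib.sha256(b"\x00" + encode(metadata["leaf_contents"])).digest()
    # Apply sibling hashes in INDEX order, from the leaf up to the root.
    for key in sorted(path, key=int):
        sibling = bytes.fromhex(path[key])
        if directions[key] == "L":    # sibling is the left child
            node = hashlib.sha256(b"\x01" + sibling + node).digest()
        elif directions[key] == "R":  # sibling is the right child
            node = hashlib.sha256(b"\x01" + node + sibling).digest()
        else:
            return False
    return node.hex() == merkle_root


# Example: a two-leaf tree, checked from the "a.txt" leaf.
leaf_a = hashlib.sha256(b"\x00" + encode({"a.txt": {"version": 1}})).digest()
leaf_b = hashlib.sha256(b"\x00" + encode({"b.txt": {"version": 2}})).digest()
root = hashlib.sha256(b"\x01" + leaf_a + leaf_b).digest().hex()
meta = {"leaf_contents": {"a.txt": {"version": 1}},
        "merkle_path": {"0": leaf_b.hex()}, "path_directions": {"0": "R"}}
print(verify_snapshot_merkle(meta, root))  # True
```

A client would pass the `merkle_root` value from timestamp metadata whose signatures it has already verified, and reject the update when the function returns `False`.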
For additional rollback protection, the client may download previous versions
of the snapshot Merkle metadata for the given target file. After verifying
these files, the client should compare the version information in the previous
Merkle trees to the information in the current Merkle tree to ensure that the
version numbers have never decreased. To allow for fast forward attack
recovery (discussed further in Security Analysis), the client should only
download previous versions whose timestamp metadata was signed with the
current timestamp key.

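The comparison could look like the Python sketch below. It assumes each previous file has already been verified against the `merkle_root` of its own timestamp metadata, and it simplifies "signed with the current timestamp key" to comparing a single key ID, whereas TUF timestamp metadata is signed by a threshold of keys. `VerifiedLeaf` and `versions_never_decrease` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedLeaf:
    """A snapshot Merkle leaf for one target, already checked against the
    merkle_root of the timestamp metadata it came from (hypothetical type)."""
    timestamp_version: int  # orders the snapshots
    timestamp_keyid: str    # simplification: one key ID instead of a threshold
    target_version: int     # version recorded for the target in that leaf


def versions_never_decrease(previous: list[VerifiedLeaf],
                            current: VerifiedLeaf) -> bool:
    """True if the target's version never decreased under the current key."""
    # Ignore leaves vouched for by an older timestamp key, so that a key
    # replacement resets version tracking after a fast forward attack.
    same_key = sorted(
        (leaf for leaf in previous
         if leaf.timestamp_keyid == current.timestamp_keyid),
        key=lambda leaf: leaf.timestamp_version,
    )
    chain = same_key + [current]
    return all(earlier.target_version <= later.target_version
               for earlier, later in zip(chain, chain[1:]))


history = [VerifiedLeaf(1, "key-A", 5), VerifiedLeaf(2, "key-A", 6)]
print(versions_never_decrease(history, VerifiedLeaf(3, "key-A", 4)))  # False
```

Filtering by key is what lets a timestamp key replacement act as a reset: a version inflated under the old key is simply never compared against.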
## Auditing Merkle trees

To ensure the validity of all target version information in the Merkle tree,
third-party auditors should validate the entire tree each time it is updated.
Auditors should download every snapshot Merkle metadata file, verify the
paths, check the root hash against the hash provided in timestamp metadata,
and ensure that the version information has not decreased for any target.
Alternatively, the repository may provide auditors with information about the
contents and ordering of leaf nodes so that the auditors can verify the entire
tree more efficiently.

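For the second option, where the auditor learns every leaf's contents, an audit might look like the following Python sketch. It rebuilds the root with the same hypothetical encoding as the generation sketch, compares it with the published `merkle_root`, and flags any target whose version went down relative to the previous tree. It assumes `previous` comes from the last tree audited under the current timestamp key, and it does not treat a removed target as a problem, since this TAP does not say how removals are handled. `compute_root` and `audit` are illustrative names.

```python
import hashlib
import json


def _leaf_hash(name: str, version: int) -> bytes:
    # Hypothetical leaf encoding, matching the repository's POUF in practice.
    data = json.dumps({name: {"version": version}}, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(b"\x00" + data).digest()


def compute_root(versions: dict[str, int]) -> str:
    """Rebuild the whole tree from every leaf and return the hex root."""
    if not versions:
        raise ValueError("no targets")
    level = [_leaf_hash(name, versions[name]) for name in sorted(versions)]
    while len(level) > 1:
        parents = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            parents.append(level[-1])  # unpaired node carried up unchanged
        level = parents
    return level[0].hex()


def audit(previous: dict[str, int], current: dict[str, int],
          published_root: str) -> list[str]:
    """Return a list of problems; an empty list means the tree passed."""
    problems = []
    if compute_root(current) != published_root:
        problems.append("root hash does not match timestamp metadata")
    for name, old_version in sorted(previous.items()):
        new_version = current.get(name)
        if new_version is not None and new_version < old_version:
            problems.append(
                f"{name}: version decreased from {old_version} to {new_version}")
    return problems


previous = {"a.txt": 1, "b.txt": 2}
current = {"a.txt": 1, "b.txt": 1, "c.txt": 1}
print(audit(previous, current, compute_root(current)))
# ['b.txt: version decreased from 2 to 1']
```

An auditor that gets an empty list back could then vouch for the tree, for example by signing the timestamp metadata that contains its root.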
Auditors may provide an additional signature for timestamp metadata that
indicates that they have verified the contents of the Merkle tree whose root
is in that timestamp file. Using this signature, clients can check whether a
particular third party has approved the Merkle tree.

## Garbage collection

When a threshold of timestamp keys is revoked and replaced, the repository no
longer needs to store snapshot Merkle files whose roots appear in timestamp
metadata signed by the previous timestamp key. Replacing the timestamp key is
an opportunity for fast forward attack recovery, so all version information
from before the replacement is no longer valid. At this point, the repository
may garbage collect all snapshot Merkle metadata files from before the
replacement.

# Security Analysis

This proposal affects snapshot metadata, so this section discusses the
attacks that snapshot metadata mitigates in TUF.

## Rollback attack

If the timestamp key is compromised, an attacker may provide an invalid
Merkle tree that contains a previous version of a target. This attack is
prevented both by the client’s verification and by auditors. When the client
verifies previous versions of the snapshot Merkle metadata for a target, it
ensures that the version number of that target has not decreased. However, if
the attacker controls the timestamp key(s) and the repository, the previous
snapshot Merkle metadata downloaded by the client may also be invalid. To
protect against this case, third-party auditors store the previous version of
all metadata and will detect when a version number decreases in a new Merkle
tree. As long as the client checks for an auditor’s verification (for
example, an auditor’s signature on timestamp metadata), the client will not
install the rolled-back version of the target.

## Fast forward attack

If an attacker is able to compromise the timestamp key, they may arbitrarily
increase the version number of a target in the snapshot Merkle metadata. If
they increase it to a sufficiently large number (for example, the maximum
integer value), the client will not accept any future version of the target,
because its version number will be below the inflated one. To recover from
this attack, auditors and clients should not check version information from
before a timestamp key replacement. This allows a timestamp key replacement to
be used as a reset after a fast forward attack. The existing system handles
fast forward attack recovery in a similar manner, by instructing clients to
delete stored version information after a timestamp key replacement.

## Mix and match attack

A snapshot Merkle tree prevents mix and match attacks by ensuring that all
target files installed come from the same snapshot Merkle tree. If all targets
have version information in the same snapshot Merkle tree, the properties of
secure hash functions ensure that these versions were part of the same
snapshot.

# Backwards Compatibility

This TAP is not backwards compatible: a client that does not support it cannot
use a repository that does. The following table describes compatibility for
clients and repositories.

| Parties that support snapshot Merkle trees | Result |
| ------------------------------------------ | ------ |
| Client and repository support this TAP | Client and repository are compatible. |
| Client supports this TAP, but repository does not | Client and repository are compatible. The timestamp metadata provided by the repository will never contain the `merkle_root` field, so the client will not look for snapshot Merkle metadata. |
| Repository supports this TAP, but client does not | Client and repository are not compatible. If the repository uses snapshot Merkle metadata, the client will not recognise the `merkle_root` field as valid. |
| Neither client nor repository supports this TAP | Client and repository are compatible. |

# Augmented Reference Implementation

https://github.com/theupdateframework/tuf/pull/1113/

TODO: auditor implementation

# Copyright

This document has been placed in the public domain.

merkletap-1.jpg

30.8 KB