|
| 1 | +* TAP: |
| 2 | +* Title: Snapshot Merkle Trees |
| 3 | +* Version: 0 |
| 4 | +* Last-Modified: 17/09/2020 |
| 5 | +* Author: Marina Moore, Justin Cappos |
| 6 | +* Type: Standardization |
| 7 | +* Status: Draft |
| 8 | +* Content-Type: markdown |
| 9 | +* Created: 14/09/2020 |
| 10 | +* +TUF-Version: |
| 11 | +* +Post-History: |
| 12 | + |
| 13 | + # Abstract |
| 14 | + |
| 15 | + To optimize the snapshot metadata file size for large registries, registries |
| 16 | + can use a snapshot Merkle tree to conceptually store version information about |
| 17 | + all images in a single snapshot without needing to distribute this entire |
| 18 | + snapshot to all clients. First, the client retrieves only a timestamp file, |
| 19 | + which changes according to some period p (such as every day or week). Second, |
| 20 | + the snapshot file is itself kept as a Merkle tree, with the root stored in |
| 21 | + timestamp metadata. This snapshot file is broken into a file for each target |
| 22 | + that contains the Merkle tree leaf with information about that target and a |
| 23 | + path to the root of the Merkle tree. A new snapshot Merkle tree is generated |
| 24 | + every time a new timestamp is generated. To prove that there has not been a |
| 25 | + reversion of the snapshot Merkle tree when downloading an image, the client |
| 26 | + and third-party auditors download the prior snapshot Merkle trees and check |
| 27 | + that the version numbers did not decrease at any point. To make this scalable |
| 28 | + as the number of timestamps increases, the client will only download version |
| 29 | + information signed by the current timestamp key. Thus, rotating this key |
| 30 | + enables the registry to discard old snapshot Merkle tree data. |
| 31 | + |
| 32 | +The feature described in this TAP does not need to be implemented by all TUF |
| 33 | +implementations. It is an option for any adopter who is interested in the |
| 34 | +benefits provided by this feature, but may not make sense for implementations |
| 35 | +with fewer target files. |
| 36 | + |
| 37 | +# Motivation |
| 38 | + |
| 39 | +For very large repositories, the snapshot metadata file could get very large. |
| 40 | +This snapshot metadata file must be downloaded on every update cycle, and so |
| 41 | +could significantly impact the metadata overhead. For example, if a repository |
| 42 | +has 50,000,000 targets, the snapshot metadata will be about 380,000,000 bytes |
| 43 | +(https://docs.google.com/spreadsheets/d/18iwWnWvAAZ4In33EWJBgdAWVFE720B_z0eQlB4FpjNc/edit?ts=5ed7d6f4#gid=0). |
| 44 | +For this reason, it is necessary to create a more scalable solution for snapshot |
| 45 | +metadata that does not significantly impact the security properties of TUF. |
| 46 | + |
| 47 | +We designed a new approach to snapshot that improves scalability while |
| 48 | +achieving similar security properties to the existing snapshot metadata. |
| 49 | + |
| 50 | + |
| 51 | +# Rationale |
| 52 | + |
| 53 | +Snapshot metadata provides a consistent view of the repository in order to |
| 54 | +protect against mix-and-match attacks and rollback attacks. In order to provide |
| 55 | +these protections, snapshot metadata is responsible for keeping track of the |
| 56 | +version number of each target file, ensuring that all targets downloaded are |
| 57 | +from the same snapshot, and ensuring that no target file decreases its version |
| 58 | +number (except in the case of fast forward attack recovery). Any new solution |
| 59 | +we develop must provide these same protections. |
| 60 | + |
| 61 | +A snapshot Merkle tree manages version information for each target by including |
| 62 | +this information in each leaf node. By using a Merkle tree to store these nodes, |
| 63 | +this proposal can cryptographically verify that different targets are from the |
| 64 | +same snapshot by ensuring that the Merkle tree roots match. Due to the |
| 65 | +properties of secure hash functions, any two leaves of a Merkle tree with the |
| 66 | +same root are from the same tree. |
| 67 | + |
| 68 | +In order to prevent rollback attacks between Merkle trees, this proposal |
| 69 | +introduces third-party auditors. These auditors are responsible for downloading |
| 70 | +all nodes of each Merkle tree to ensure that no version numbers have decreased |
| 71 | +between generated trees. This achieves rollback protection without every client |
| 72 | +having to store the version information for every target. |
| 73 | + |
| 74 | +# Specification |
| 75 | + |
| 76 | +This proposal replaces the single snapshot metadata file with a snapshot Merkle |
| 77 | +metadata file for each target. The repository generates these snapshot Merkle |
| 78 | +metadata files by building a Merkle tree using all target files and storing the |
| 79 | +path to each target in the snapshot Merkle metadata. The root of this Merkle |
| 80 | +tree is stored in timestamp metadata to allow for client verification. The |
| 81 | +client uses the path stored in the snapshot Merkle metadata for a target, along |
| 82 | +with the root of the Merkle tree, to ensure that metadata is from the given |
| 83 | +Merkle tree. The details of these files and procedures are described in |
| 84 | +this section. |
| 85 | + |
| 86 | + |
| 87 | + |
| 88 | +## Merkle tree generation |
| 89 | + |
| 90 | +When the repository generates snapshot metadata, instead of putting the version |
| 91 | +information for all targets into a single file, it uses the version |
| 92 | +information to generate a Merkle tree. Each target’s version information forms |
| 93 | +a leaf of the tree, and these leaves are used to build a Merkle tree. Each |
| 94 | +internal node of a Merkle tree contains the hash of its child nodes. The exact |
| 95 | +algorithm for generating this Merkle tree (i.e., the order of leaves in the hash, |
| 96 | +how version information is encoded) is left to the implementer, but this |
| 97 | +algorithm should be documented in a POUF so that implementations can be |
| 98 | +compatible and correctly verify Merkle tree data. However, all implementations |
| 99 | +should meet the following requirements: |
| 100 | +* Leaf nodes must be unique. A unique identifier of the target, such as the |
| 101 | +filepath or hash, must be included in the leaf data to ensure that no two leaf |
| 102 | +node hashes are the same. |
| 103 | +* The tree must be a Merkle tree. Each internal node must contain a hash that |
| 104 | +includes both of its child nodes. |
| 105 | + |
| 106 | +Once the Merkle tree is generated, the repository must create a snapshot Merkle |
| 107 | +metadata file for each target. This file must contain the leaf contents and |
| 108 | +the path to the root of the Merkle tree. This path must contain the hashes of |
| 109 | +sibling nodes needed to reconstruct the tree during verification (see diagram). |
| 110 | +In addition, the path should contain direction information so that the client |
| 111 | +will know whether each node is a left or right sibling when reconstructing the |
| 112 | +tree. |
| 113 | + |
| 114 | +This information will be included in the following metadata format: |
| 115 | +``` |
| 116 | +{ "leaf_contents": {METAFILES}, |
| 117 | + "Merkle_path": {INDEX:HASH}, |
| 118 | + "path_directions": {INDEX:DIR} |
| 119 | +} |
| 120 | +``` |
| 121 | + |
| 122 | +Where `METAFILES` is the version information as defined for snapshot metadata, |
| 123 | +`INDEX` provides the ordering of nodes, `HASH` is the hash of the sibling node, |
| 124 | +and `DIR` indicates whether the given node is a left or right sibling. |
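
The generation steps above can be sketched as follows. This is a minimal illustration, not the reference implementation: the leaf encoding (canonical JSON of a hypothetical `name`/`version` pair), the `leaf:`/`node:` prefixes, the sorted leaf order, the promotion of an unpaired node, and the reading of `DIR` as the side on which the sibling sits are all assumptions that a real deployment would fix in its POUF.

```python
import hashlib
import json


def leaf_hash(leaf_contents: dict) -> str:
    # Assumed encoding: canonical JSON with a "leaf:" prefix for domain separation.
    encoded = json.dumps(leaf_contents, sort_keys=True).encode("utf-8")
    return hashlib.sha256(b"leaf:" + encoded).hexdigest()


def node_hash(left: str, right: str) -> str:
    # An internal node hashes both of its children, left child first.
    return hashlib.sha256(b"node:" + left.encode() + right.encode()).hexdigest()


def build_snapshot_tree(versions: dict) -> tuple:
    """Return (root_hash, {target_name: snapshot Merkle metadata})."""
    if not versions:
        raise ValueError("at least one target is required")
    names = sorted(versions)  # assumed leaf order: sorted target names
    # The target name in each leaf keeps leaf hashes unique.
    leaves = [{"name": n, "version": versions[n]} for n in names]
    level = [leaf_hash(leaf) for leaf in leaves]
    # members[j] lists the leaf indices below node j of the current level.
    members = [[i] for i in range(len(names))]
    # paths[i] collects (sibling_hash, sibling_side) pairs from leaf i upward.
    paths = [[] for _ in names]
    while len(level) > 1:
        next_level, next_members = [], []
        for j in range(0, len(level) - 1, 2):
            for i in members[j]:
                paths[i].append((level[j + 1], "right"))
            for i in members[j + 1]:
                paths[i].append((level[j], "left"))
            next_level.append(node_hash(level[j], level[j + 1]))
            next_members.append(members[j] + members[j + 1])
        if len(level) % 2 == 1:
            # An unpaired last node is promoted unchanged (an assumption).
            next_level.append(level[-1])
            next_members.append(members[-1])
        level, members = next_level, next_members
    metadata = {}
    for i, name in enumerate(names):
        metadata[name] = {
            "leaf_contents": leaves[i],
            "Merkle_path": {str(k): h for k, (h, _) in enumerate(paths[i])},
            "path_directions": {str(k): d for k, (_, d) in enumerate(paths[i])},
        }
    return level[0], metadata
```

Calling `build_snapshot_tree({"a": 1, "b": 2, "c": 3})` yields the root hash to place in timestamp metadata and one metadata dictionary per target. The path length grows with the logarithm of the number of targets, which is where the size savings come from.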
| 125 | + |
| 126 | +In addition, the following optional field will be added to timestamp metadata. |
| 127 | +If this field is included, the client should use snapshot Merkle metadata to |
| 128 | +verify updates instead of the single snapshot metadata file: |
| 129 | + |
| 130 | +``` |
| 131 | +("merkle_root": ROOT_HASH) |
| 132 | +``` |
| 133 | + |
| 134 | +Where `ROOT_HASH` is the hash of the Merkle tree root. |
| 135 | + |
| 136 | +Note that snapshot Merkle metadata files do not need to be signed by a snapshot |
| 137 | +key because the path information will be verified based on the Merkle root |
| 138 | +provided in timestamp. Removing these signatures will provide additional space |
| 139 | +savings for clients. |
| 140 | + |
| 141 | +## Merkle tree verification |
| 142 | + |
| 143 | +If a client sees the `merkle_root` field in timestamp metadata, they will use |
| 144 | +the snapshot Merkle metadata to check version information. If this field is |
| 145 | +present, the client will download the snapshot Merkle metadata file only for |
| 146 | +the target the client is attempting to update. The client will verify the |
| 147 | +snapshot Merkle metadata file by recomputing the hashes along its path and comparing |
| 148 | +the computed root hash to the hash provided in timestamp metadata. If the |
| 149 | +hashes do not match, the snapshot Merkle metadata is invalid. Otherwise, the |
| 150 | +client will use the version information in the verified snapshot Merkle |
| 151 | +metadata to proceed with the update. |
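
A client-side check along these lines might look like the sketch below. It assumes the same hypothetical conventions as any matching generator would have to use: SHA-256, canonical JSON leaves with a `leaf:` prefix, a `node:` prefix for internal nodes, and `path_directions` values of `"left"` or `"right"` naming the side on which the sibling hash sits. None of these are fixed by this TAP.

```python
import hashlib
import json


def leaf_hash(leaf_contents: dict) -> str:
    # Assumed leaf encoding; a real deployment documents this in its POUF.
    encoded = json.dumps(leaf_contents, sort_keys=True).encode("utf-8")
    return hashlib.sha256(b"leaf:" + encoded).hexdigest()


def node_hash(left: str, right: str) -> str:
    return hashlib.sha256(b"node:" + left.encode() + right.encode()).hexdigest()


def verify_snapshot_merkle(metadata: dict, merkle_root: str) -> bool:
    """Recompute the root from one leaf and its path, then compare."""
    current = leaf_hash(metadata["leaf_contents"])
    path = metadata["Merkle_path"]
    directions = metadata["path_directions"]
    # INDEX gives the order of nodes from the leaf up to the root.
    for index in sorted(path, key=int):
        sibling = path[index]
        side = directions.get(index)
        if side == "left":
            current = node_hash(sibling, current)
        elif side == "right":
            current = node_hash(current, sibling)
        else:
            return False  # missing or malformed direction information
    return current == merkle_root


# A two-leaf tree built by hand to exercise the check.
leaf_a = {"name": "a", "version": 1}
leaf_b = {"name": "b", "version": 4}
example_root = node_hash(leaf_hash(leaf_a), leaf_hash(leaf_b))
metadata_b = {
    "leaf_contents": leaf_b,
    "Merkle_path": {"0": leaf_hash(leaf_a)},
    "path_directions": {"0": "left"},
}
print(verify_snapshot_merkle(metadata_b, example_root))
```

Only after this function returns true should the client trust the version information in `leaf_contents`; a tampered leaf or path produces a different root and is rejected.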
| 152 | + |
| 153 | +For additional rollback protection, the client may download previous versions |
| 154 | +of the snapshot Merkle metadata for the given target file. After verifying |
| 155 | +these files, the client should compare the version information in the previous |
| 156 | +Merkle trees to the information in the current Merkle tree to ensure that the |
| 157 | +version numbers have never decreased. In order to allow for fast forward attack |
| 158 | +recovery (discussed further in Security Analysis), the client should only |
| 159 | +download previous versions that were signed with the same timestamp key. |
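
The optional history check can be sketched as below. The `VerifiedLeaf` record and its single `timestamp_keyid` field are hypothetical simplifications: a real client would track the full set of timestamp keys and would only build these records from snapshot Merkle metadata that has already passed path verification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedLeaf:
    """Version information taken from an already verified snapshot Merkle file."""
    timestamp_keyid: str  # hypothetical identifier of the signing timestamp key
    version: int


def check_no_rollback(history: list, current: VerifiedLeaf) -> bool:
    """Return True if versions never decrease under the current timestamp key.

    `history` is ordered oldest to newest. Entries signed by a different
    timestamp key are ignored, which is what permits fast forward recovery.
    """
    relevant = [
        leaf.version
        for leaf in history
        if leaf.timestamp_keyid == current.timestamp_keyid
    ]
    sequence = relevant + [current.version]
    return all(earlier <= later for earlier, later in zip(sequence, sequence[1:]))
```

For example, a history containing an inflated version number signed by a replaced key does not block a lower version signed by the new key, while any decrease under the current key causes the check to fail.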
| 160 | + |
| 161 | +## Auditing Merkle trees |
| 162 | + |
| 163 | +In order to ensure the validity of all target version information in the |
| 164 | +Merkle tree, third-party auditors should validate the entire tree each time it |
| 165 | +is updated. Auditors should download every snapshot Merkle file, verify the |
| 166 | +paths, check the root hash against the hash provided in timestamp metadata, |
| 167 | +and ensure that the version information has not decreased for each target. |
| 168 | +Alternatively, the repository may provide auditors with information about the |
| 169 | +contents and ordering of leaf nodes so that the auditors can more efficiently |
| 170 | +verify the entire tree. |
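
When the repository supplies the ordered leaf contents, an auditor's check could be sketched as follows. The hashing conventions, the `name`/`version` leaf fields, and the promotion of an unpaired node are assumptions made for illustration, and the sketch does not decide how targets removed from the tree should be treated.

```python
import hashlib
import json


def leaf_hash(leaf_contents: dict) -> str:
    # Assumed leaf encoding; must match whatever the repository documents.
    encoded = json.dumps(leaf_contents, sort_keys=True).encode("utf-8")
    return hashlib.sha256(b"leaf:" + encoded).hexdigest()


def node_hash(left: str, right: str) -> str:
    return hashlib.sha256(b"node:" + left.encode() + right.encode()).hexdigest()


def root_from_leaves(ordered_leaves: list) -> str:
    """Rebuild the whole tree from its leaves and return the root hash."""
    if not ordered_leaves:
        raise ValueError("at least one leaf is required")
    level = [leaf_hash(leaf) for leaf in ordered_leaves]
    while len(level) > 1:
        next_level = [
            node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # unpaired node promoted unchanged
        level = next_level
    return level[0]


def audit(previous_leaves: list, current_leaves: list, merkle_root: str) -> list:
    """Return a list of problems; an empty list means the tree passes."""
    problems = []
    if root_from_leaves(current_leaves) != merkle_root:
        problems.append("root mismatch")
    old_versions = {leaf["name"]: leaf["version"] for leaf in previous_leaves}
    for leaf in current_leaves:
        # New targets have no earlier version and cannot be rolled back.
        if leaf["version"] < old_versions.get(leaf["name"], leaf["version"]):
            problems.append(f"rollback: {leaf['name']}")
    return problems
```

Rebuilding the tree once from the leaves costs a number of hashes linear in the number of targets, whereas verifying every per-target path separately repeats the shared upper levels of the tree for each target.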
| 171 | + |
| 172 | +Auditors may provide an additional signature for timestamp metadata that |
| 173 | +indicates that they have verified the contents of the Merkle tree whose root |
| 174 | +is in that timestamp file. Using this signature, clients can check whether a |
| 175 | +particular third party has approved the Merkle tree. |
| 176 | + |
| 177 | +## Garbage collection |
| 178 | +When a threshold of timestamp keys are revoked and replaced, the repository no |
| 179 | +longer needs to store snapshot Merkle files signed by the previous timestamp |
| 180 | +key. Replacing the timestamp key is an opportunity for fast forward attack |
| 181 | +recovery, and so all version information from before the replacement is no |
| 182 | +longer valid. At this point, the repository may garbage collect all snapshot |
| 183 | +Merkle metadata files. |
| 184 | + |
| 185 | +# Security Analysis |
| 186 | + |
| 187 | +This proposal impacts the snapshot metadata, so this section will discuss the |
| 188 | +attacks that are mitigated by snapshot metadata in TUF. |
| 189 | + |
| 190 | +## Rollback attack |
| 191 | + |
| 192 | +In the event that the timestamp key is compromised, an attacker may provide an |
| 193 | +invalid Merkle tree that contains a previous version of a target. This attack |
| 194 | +is prevented by both the client’s verification and by auditors. When the client |
| 195 | +verifies previous versions of the snapshot Merkle metadata for a target, they |
| 196 | +ensure that the version number of that target has not decreased. However, if |
| 197 | +the attacker controls the timestamp key(s) and the repository, the previous |
| 198 | +snapshot Merkle metadata downloaded by the client may also be invalid. To |
| 199 | + protect against this case, third-party auditors store the previous version of |
| 200 | +all metadata, and will detect when the version number decreases in a new |
| 201 | +Merkle tree. As long as the client checks for an auditor’s verification, the |
| 202 | +client will not install the rolled-back version of the target. |
| 203 | + |
| 204 | +## Fast forward attack |
| 205 | + |
| 206 | +If an attacker is able to compromise the timestamp key, they may arbitrarily |
| 207 | +increase the version number of a target in the snapshot Merkle metadata. If |
| 208 | +they increase it to a sufficiently large number (say the maximum integer value), |
| 209 | +the client will not accept any future version of the target as the version |
| 210 | +number will be below the previous version. To recover from this attack, |
| 211 | +auditors and clients should not check version information from before a |
| 212 | +timestamp key replacement. This allows a timestamp key replacement to be used |
| 213 | +as a reset after a fast forward attack. The existing system handles fast |
| 214 | +forward attack recovery in a similar manner, by instructing clients to delete |
| 215 | +stored version information after a timestamp key replacement. |
| 216 | + |
| 217 | +## Mix and match attack |
| 218 | + |
| 219 | +A snapshot Merkle tree prevents mix and match attacks by ensuring that all |
| 220 | +target files installed come from the same snapshot Merkle tree. If all targets |
| 221 | +have version information in the same snapshot Merkle tree, the properties of |
| 222 | +secure hash functions ensure that these versions were part of the same snapshot. |
| 223 | + |
| 224 | + |
| 225 | +# Backwards Compatibility |
| 226 | + |
| 227 | +This TAP is not backwards compatible. The following table describes |
| 228 | +compatibility for clients and repositories. |
| 229 | + |
| 230 | +| Parties that support snapshot Merkle trees | Result | |
| 231 | +| ------------------------------------------ | ------ | |
| 232 | +| Client and repository support this TAP | Client and repository are compatible | |
| 233 | +| Client supports this TAP, but repository does not | Client and repository are compatible. The timestamp metadata provided by the repository will never contain the `merkle_root` field, and so the client will not look for snapshot Merkle metadata. | |
| 234 | +| Repository supports this TAP, but client does not | Client and repository are not compatible. If the repository uses snapshot Merkle metadata, the client will not recognise the `merkle_root` field as valid. | |
| 235 | +| Neither client nor repository supports this TAP | Client and repository are compatible | |
| 236 | + |
| 237 | +# Augmented Reference Implementation |
| 238 | + |
| 239 | +https://github.com/theupdateframework/tuf/pull/1113/ |
| 240 | +TODO: auditor implementation |
| 241 | + |
| 242 | +# Copyright |
| 243 | + |
| 244 | +This document has been placed in the public domain. |