exercise sheet 1, advanced database systems, 2024s

author: yahya jabary, 11912007

# exercise 1: disk access in databases
| 6 | + |
| 7 | +*disk specifications:* |
| 8 | + |
| 9 | +- **magenetic disk:** |
| 10 | + - block size: 1 kB |
| 11 | + - rotational speed: 10,000 rpm |
| 12 | + - seek time: 1 ms |
| 13 | + - transfer rate: 500 MB/s |
| 14 | + - track-to-track seek time: 1 ms |
| 15 | + - track size: 1000 kB |
| 16 | +- **ssd:** |
| 17 | + - block size: 10 kB |
| 18 | + - transfer rate: 3000 MB/s |
| 19 | + |
| 20 | +_database files:_ |
| 21 | + |
| 22 | +- $Item$: |
| 23 | + - $n_i$ = 50,000 records (number of records) |
| 24 | + - $R_i$ = 10 kB (record size) |
| 25 | +- $Supplier$: |
| 26 | + - $n_s$ = 200 records |
| 27 | + - $R_s$ = 50 kB |
| 28 | + |
*dbms:*

- block size: 1000 kB → larger than either disk block size, so the dbms needs several disk i/o operations to read or write one full dbms block.
- unspanned → records do not span block boundaries (no indirection)
- contiguous block allocation → blocks of the same file are adjacent on disk
- main memory is empty at the start
- intermediate results and hash values stay in memory

_assignment:_

- query: $Item \bowtie_{Item.supplier = Supplier.id} Supplier$
- calculate the access time of the given postgres query execution plans for each external storage device
## a) hash join

```
Hash Join
  Hash Cond: (i.supplier = s.id)
  -> Seq Scan on item i
  -> Hash
    -> Seq Scan on supplier s
```

*hash join*

assume we want to compute the equi join $R \bowtie_{A=B} S$ with $B$ buffer frames available:

- i. partition phase:
  - find a hash function that maps values of the join columns to a buffer frame index in $[1; B{-}1]$. → the buffer frames we map the rows to are called "buckets" and the 1 remaining buffer frame is used to read new pages in.
  - read each page $p_R$ of $R$ into memory. then hash the join value of each row to find the right bucket to store a pointer in. → if a bucket overflows, write it back to disk.
  - repeat for each page $p_S$ of $S$.
  - total cost: $2 \cdot (b_R + b_S)$ → factor of 2 because every page is read once and the (potentially full) buckets are written back to disk.
- ii. probing phase:
  - assuming $R_i$ and $S_i$ are all rows in the $i$-th bucket (and $R_i$ is the smaller one of the two): read $R_i$ into $B{-}2$ buffer frames. → if it does not fit, either hash recursively or switch to another algorithm. the 2 remaining buffer frames are used to read new $S_i$ pages in and to store the final result.
  - read each page of $S_i$ into memory. then check each row for matches with $R_i$.
  - if a matching row is found, write it into the buffer frame dedicated to results.
  - total cost: $b_R + b_S$ → every partition is read once more.
- **total cost of both phases**: $3 \cdot (b_R + b_S)$ → a minimal sketch of the algorithm follows below.
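
the following is a small in-memory sketch of the two phases (partitioning and probing), assuming toy row tuples and a fixed bucket count instead of real buffer frames and disk pages. table contents and names are made up for illustration, this is not the postgres implementation.

```python
# hypothetical, simplified hash join: partition both inputs by a hash of the
# join key, then probe each partition pair. real systems work on pages/buffer
# frames and spill full buckets to disk; here everything stays in python lists.
NUM_BUCKETS = 4  # stand-in for the B-1 buckets held in buffer frames

def partition(rows, key_index):
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for row in rows:
        buckets[hash(row[key_index]) % NUM_BUCKETS].append(row)
    return buckets

def hash_join(r_rows, r_key, s_rows, s_key):
    result = []
    r_buckets = partition(r_rows, r_key)   # phase i: partition R
    s_buckets = partition(s_rows, s_key)   # phase i: partition S
    for r_bucket, s_bucket in zip(r_buckets, s_buckets):  # phase ii: probe
        lookup = {}
        for r in r_bucket:                 # build a hash table on the smaller side
            lookup.setdefault(r[r_key], []).append(r)
        for s in s_bucket:
            for r in lookup.get(s[s_key], []):
                result.append(r + s)       # matching rows are concatenated
    return result

# toy data: item = (id, supplier), supplier = (id, name)
item = [(1, 10), (2, 20), (3, 10)]
supplier = [(10, "acme"), (20, "globex")]
print(hash_join(item, 1, supplier, 0))  # join on item.supplier = supplier.id
```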

*access time: magnetic disk*

- access time for one block:
  - $t_s$ - seek time: 1ms
  - $t_r$ - rotational delay: 0.5 * (60s / 10,000) = 3ms → we assume that half a rotation is needed for a hit on average.
  - $t_{tr}$ - transfer time: 500MB/s → 1kB per 0.002ms
  - total: 4.002ms
- access time for $n$ blocks:
  - $t_{t2t}$ - track-to-track seek time: 1ms
  - blocks per track: 1000kB track size / 1kB block size = 1000 blocks → for $n$ blocks we need $n/1000$ tracks → we change tracks $(n/1000 - 1)$ times
  - **random access**: $n \cdot 4.002\text{ms}$
  - **sequential access**: $t_s + t_r + n \cdot t_{tr} + \text{track changes} \cdot t_{t2t}$ = 1ms + 3ms + $n$ * 0.002ms + $(n/1000 - 1)$ * 1ms
- i. $Item$
  - total number of blocks: (50,000 records * 10 kB record size) / 1kB block size = 500,000 blocks
  - sequential access of 500,000 blocks: **1503ms**
- ii. $Supplier$
  - total number of blocks: (200 records * 50 kB record size) / 1kB block size = 10,000 blocks
  - sequential access of 10,000 blocks: **33ms**
- iii. total access time for hash join
  - $3 \cdot (b_{Item} + b_{Supplier})$ = 3 * (1503ms + 33ms) = **4608ms** → a small sanity-check script follows below.
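
a quick numeric sanity check of the magnetic-disk formulas above (plain python arithmetic, no dbms involved):

```python
# sequential access time on the magnetic disk, in ms, for n blocks of 1 kB:
# seek + rotational delay + n transfers + track changes
def hdd_sequential_ms(n_blocks, blocks_per_track=1000):
    seek, rotation, transfer_per_block, track_change = 1.0, 3.0, 0.002, 1.0
    track_changes = n_blocks / blocks_per_track - 1
    return seek + rotation + n_blocks * transfer_per_block + track_changes * track_change

b_item = 50_000 * 10 // 1        # 500,000 blocks of 1 kB
b_supplier = 200 * 50 // 1       # 10,000 blocks of 1 kB
item_ms = hdd_sequential_ms(b_item)          # 1503.0
supplier_ms = hdd_sequential_ms(b_supplier)  # 33.0
print(item_ms, supplier_ms, 3 * (item_ms + supplier_ms))  # 1503.0 33.0 4608.0
```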

*access time: ssd*

- access time for one block:
  - $t_{tr}$ - transfer time: 3000MB/s → 10kB per 3333.3ns
- access time for $n$ blocks:
  - **sequential / random access**: $n \cdot 3333.3\text{ns}$
- i. $Item$
  - total number of blocks: (50,000 records * 10 kB record size) / 10kB block size = 50,000 blocks
  - sequential access of 50,000 blocks: **166.665ms**
- ii. $Supplier$
  - total number of blocks: (200 records * 50 kB record size) / 10kB block size = 1000 blocks
  - sequential access of 1000 blocks: **3.3333ms**
- iii. total access time for hash join
  - $3 \cdot (b_{Item} + b_{Supplier})$ = 3 * (166.665ms + 3.3333ms) = **509.9949ms** → verified in the script below.
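
the same check for the ssd, where access time is purely transfer-bound:

```python
# ssd access time in ms: n blocks of 10 kB at 3000 MB/s (3333.3 ns per block)
def ssd_ms(n_blocks, ns_per_block=3333.3):
    return n_blocks * ns_per_block / 1_000_000  # ns -> ms

b_item = 50_000 * 10 // 10     # 50,000 blocks of 10 kB
b_supplier = 200 * 50 // 10    # 1,000 blocks of 10 kB
print(ssd_ms(b_item), ssd_ms(b_supplier))        # 166.665 3.3333
print(3 * (ssd_ms(b_item) + ssd_ms(b_supplier))) # ~509.9949
```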

## b) index nested loops join

```
Nested Loop
  -> Seq Scan on supplier s
  -> Index Scan using record_by_idx on item i
       Index Cond: (supplier = s.id)
```

pseudo algorithm for the naive nested loops join:

```
foreach page p_item of item:
  foreach page p_supplier of supplier:
    foreach tuple i ∈ p_item and s ∈ p_supplier:
      if i.supplier = s.id then Res := Res ∪ {(i, s)}
```

pseudo algorithm for the index nested loops join (a runnable sketch follows below):

```
itemIndex := generateIndex(item.supplier)

foreach page p_supplier of supplier:
  foreach tuple s ∈ p_supplier:
    foreach tuple i ∈ itemIndex.getMatches(s.id):
      Res := Res ∪ {(i, s)}
```
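
a minimal runnable version of the idea, assuming the "index" is just a python dict built on `item.supplier` (the actual index type used by postgres is not specified in the plan, and the rows are made up):

```python
from collections import defaultdict

# toy rows: item = (id, supplier), supplier = (id, name)
item = [(1, 10), (2, 20), (3, 10)]
supplier = [(10, "acme"), (20, "globex")]

# "generateIndex": map each join-key value to the matching item rows
item_index = defaultdict(list)
for i in item:
    item_index[i[1]].append(i)   # key is item.supplier

# index nested loops join: scan supplier once, probe the index per tuple
result = [(i, s) for s in supplier for i in item_index[s[0]]]
print(result)
```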

details:

- every `supplier.id` is looked up in an index on `item`, with the column `item.supplier` as the index key.
- for every match, the record behind the index pointer is read from disk.
  - the result contains 20 records.
  - these reads are not sequential: we do not know anything about the read order.
- we do not know which kind of index is used, and we do not include the disk access cost of index creation.
- **total cost**: $b_{Supplier} + 20 \cdot r_{Item}$ → one sequential scan of `supplier` plus 20 random record fetches from `item`.

*access time: magnetic disk*

- i. $Item$
  - total number of blocks: (20 records * 10 kB record size) / 1kB block size = 200 blocks
  - random access of 200 blocks: **800.4ms**
- ii. $Supplier$
  - sequential access of 10,000 blocks: **33ms** (same as in the previous example)
- iii. total: **833.4ms**

*access time: ssd*

- i. $Item$
  - total number of blocks: (20 records * 10 kB record size) / 10kB block size = 20 blocks
  - random access of 20 blocks: **0.066666ms**
- ii. $Supplier$
  - sequential access of 1000 blocks: **3.3333ms** (same as in the previous example)
- iii. total: **3.399966ms** → both totals are verified in the script below.
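
a short check of the index nested loops numbers, repeating the per-device formulas from part a):

```python
# magnetic disk: random access costs 4.002 ms per 1 kB block,
# sequential access uses seek + rotation + transfers + track changes
def hdd_sequential_ms(n):
    return 1.0 + 3.0 + n * 0.002 + (n / 1000 - 1) * 1.0

hdd_item = 200 * 4.002                      # 20 records * 10 blocks each, random
hdd_supplier = hdd_sequential_ms(10_000)    # full sequential scan of supplier
print(hdd_item + hdd_supplier)              # 833.4 ms

# ssd: every block (random or sequential) takes 3333.3 ns
ssd_item = 20 * 3333.3 / 1_000_000          # 20 blocks of 10 kB, in ms
ssd_supplier = 1000 * 3333.3 / 1_000_000
print(ssd_item + ssd_supplier)              # ~3.399966 ms
```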

# exercise 2: selectivity

## a)

estimate the selectivity:

- `repository.contributors` has 100,000 rows
- equi-depth histogram: 7 buckets of equal size, defined by the 6 dividers {1, 2, 4, 7, 12, 20}
- max value: 255
  - assumption: each boundary value is included in the following bucket
  - buckets: {[-∞;0], [1;1], [2;3], [4;6], [7;11], [12;19], [20;255]}
- assume a uniform distribution within each bucket

*i) predicate: `contributors ≥ 4`*

- because the histogram is equi-depth, we can use the bucket count to calculate the selectivity
- 4 buckets satisfy the predicate: {[4;6], [7;11], [12;19], [20;255]}
- selectivity ≈ 4/7 ≈ 0.5714285714

*ii) predicate: `contributors > 12`*

- the 2 buckets {[12;19], [20;255]} cover all qualifying values; only the [12;19] bucket also contains the excluded value 12.
- since the values are evenly spread within a bucket, $\approx (1 - \frac{1}{19-12}) = \frac{6}{7}$ of the values in the [12;19] bucket satisfy the predicate.
- selectivity ≈ $(1 + \frac{6}{7})/7$ ≈ 0.2653061224 → both estimates are recomputed in the snippet below.
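
a small snippet that recomputes both estimates from the bucket boundaries (the helper function is hypothetical, not part of any dbms api, and mirrors the interpolation used above):

```python
# equi-depth buckets as (low, high) pairs, each holding 1/7 of the rows
buckets = [(float("-inf"), 0), (1, 1), (2, 3), (4, 6), (7, 11), (12, 19), (20, 255)]

def selectivity_greater(value, inclusive):
    total = 0.0
    for low, high in buckets:
        if low > value or (inclusive and low == value):
            total += 1  # the whole bucket qualifies
        elif low <= value <= high:
            # interpolate within the bucket, mirroring the 1 - 1/(high-low) estimate
            total += 1 - (value - low + (0 if inclusive else 1)) / (high - low)
    return total / len(buckets)

print(selectivity_greater(4, inclusive=True))    # ≈ 0.5714 for contributors >= 4
print(selectivity_greater(12, inclusive=False))  # ≈ 0.2653 for contributors > 12
```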

## b)

estimate the selectivity:

- histograms are avoided for this part: they describe ranges of values and are of little use for estimating the selectivity of equality predicates.
- `repository.contributors` has 400 distinct values.
  - (note: combined with the max value of 255 from the prior specification, this means that some values must be negative)

*i) predicate: `contributors == 5`*

- we assume a uniform distribution (in the absence of other information)
- selectivity ≈ $\frac{1}{400}$ = 0.0025

*ii) predicate: `contributors != 5`*

- we assume a uniform distribution (in the absence of other information)
- selectivity ≈ $1 - \frac{1}{400}$ = 0.9975

*limitations of the method*

- **uniform distribution assumption:** this method assumes that all distinct values in the `contributors` column are equally likely, which is very unlikely in reality (e.g. a negative exponential distribution might be a better approximation, but we can only find out through sampling).
- **lack of correlation:** it does not consider potential correlations between the `contributors` column and other data in the table. such correlations could change the actual selectivity of the predicate.

*possible solutions*

- **more detailed histograms:** instead of equi-depth histograms, you could use equi-width histograms or more sophisticated histograms that better capture the distribution of frequent values.
- **sampling:** take a representative sample of the data and examine the distribution of values in the `contributors` column within the sample. this gives a more realistic estimate of the selectivity. → a sketch of this idea follows below.
- **collect statistics:** gather more detailed statistics about the frequency of different values in the `contributors` column (e.g. most-common-value lists). this improves selectivity estimates for equality predicates.
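
a minimal sketch of the sampling approach, assuming the column values are available as a plain python list (the skewed toy data and the sample size are made up):

```python
import random
random.seed(42)  # deterministic toy example

# toy "contributors" column with a skewed (non-uniform) distribution
contributors = [random.randint(1, 5) for _ in range(90_000)] + \
               [random.randint(6, 255) for _ in range(10_000)]

def estimate_selectivity(values, predicate, sample_size=1000):
    """estimate the fraction of rows satisfying `predicate` from a random sample."""
    sample = random.sample(values, min(sample_size, len(values)))
    return sum(1 for v in sample if predicate(v)) / len(sample)

print(estimate_selectivity(contributors, lambda v: v == 5))  # equality predicate
print(estimate_selectivity(contributors, lambda v: v != 5))
```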

## c)

estimate the selectivity:

- `user` has 50,000 rows
- `repository` has 100,000 rows (read from the table in the assignment, not part of this exercise)
- the `user.id` key can be joined on the foreign key `repository.owner`
  - assumption: no null values in `repository.owner`, because it is a foreign key.

*i) `repository ⋈_{owner=id} user`*

- since `user.id` is a key attribute, every `repository` row joins with at most one `user` row, so the join result cannot be larger than `repository` itself.
- selectivity of the join ≈ selectivity of the key `user.id` = 1/50,000 = 0.00002

*ii) `π_owner(repository)`*

- `repository.owner` has between 1 and 50,000 distinct values
  - at least 1 distinct value → one user owning everything
  - at most 50,000 distinct values → there are no more keys to match with in the other table
- we assume a uniform distribution of repository ownership (in the absence of other information) and take the middle of that range: `repository.owner` has ≈ $(1 + 50{,}000)/2 = 25{,}000.5$ distinct values
- selectivity ≈ 25,000.5/100,000 ≈ 0.250005 → both estimates are recomputed below.
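
the same numbers as a short arithmetic check:

```python
n_user, n_repository = 50_000, 100_000

join_selectivity = 1 / n_user                    # key/foreign-key join on user.id
distinct_owners = (1 + n_user) / 2               # midpoint of [1; 50,000]
projection_selectivity = distinct_owners / n_repository

print(join_selectivity, projection_selectivity)  # 2e-05 0.250005
```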

## d, e)

before optimization:

```sql
SELECT * FROM repository rep, user u, release rel
WHERE rep.owner = u.id AND rel.repo = rep.id
  AND (rel.name = 'v2' OR rel.version = 2)
  AND rep.commits > 105
  AND rep.contributors > 11;
```

after optimization:

```sql
SELECT *
FROM release rel
INNER JOIN repository rep ON rel.repo = rep.id
  AND rep.commits > 105
  AND rep.contributors > 11
INNER JOIN user u ON u.id = rep.owner
WHERE rel.version = 2 OR rel.name = 'v2';
```

*rule-based, logical, heuristic optimization*

- simplify the relational algebra, reduce i/o accesses:
  - replace $\times$ followed by $\sigma$ with $\bowtie$
  - apply $\sigma, \pi$ as early as possible, and apply the most selective filters first
  - remove unnecessary attributes early on
- huge search space:
  - there are $\frac{(2n)!}{n!}$ possible ways of joining $n+1$ tables (see the snippet below for how fast this grows)
  - dbmss usually restrict themselves to "left-deep trees" because they allow pipelining and index nested loop joins.
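
to get a feeling for the size of the search space, a tiny calculation of $(2n)!/n!$ for joining $n+1$ tables:

```python
from math import factorial

# number of possible join trees for n+1 tables: (2n)! / n!
for n in range(1, 8):
    print(n + 1, "tables:", factorial(2 * n) // factorial(n))
```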

*optimizations*

- replacing $\times$ and $\sigma$ with $\bowtie$
  - i replaced the selection `WHERE rep.owner = u.id AND rel.repo = rep.id` after the cartesian product with 2 joins.
  - this eliminates rows without a matching foreign key right away.
- ordering joins
  - see: https://www.postgresql.org/docs/current/explicit-joins.html
  - `rep.owner` has 20,000 distinct values while `rel.repo` has 50,000 distinct values. therefore we join `repository` and `release` first.
  - this eliminates even more rows without a matching foreign key.
  - note that modern query optimizers are often sophisticated enough to analyze table statistics and indexes and pick the best join order regardless of how the query is written.
- applying $\sigma, \pi$ as early as possible
  - we filter as many rows as possible before joining with the third table.
- applying the stronger filters first
  - i reversed the order of the predicates in `(rel.name = 'v2' OR rel.version = 2)` so that an integer comparison can short-circuit the `OR` before a potential string comparison.
  - i filter by `rep.commits` before `rep.contributors` because `commits` has more distinct values to filter on, and its predicate `> 105` already lies in the last equi-depth histogram bucket, while `> 11` does not.

# exercise 3, 4: query planning and optimization

the last 2 exercises turned into their own independent github project, because i wanted to use a jupyter notebook and write some code for better benchmarking.

see: https://github.com/sueszli/query-queen