exercise sheet 1, advanced database systems, 2024s

author: yahya jabary, 11912007

# exercise 1: disk access in databases

*disk specifications:*

- **magnetic disk:**
    - block size: 1 kB
    - rotational speed: 10,000 rpm
    - seek time: 1 ms
    - transfer rate: 500 MB/s
    - track-to-track seek time: 1 ms
    - track size: 1000 kB
- **ssd:**
    - block size: 10 kB
    - transfer rate: 3000 MB/s

_database files:_

- $Item$:
    - $n_i$ = 50,000 records (number of records)
    - $R_i$ = 10 kB (record size)
- $Supplier$:
    - $n_s$ = 200 records
    - $R_s$ = 50 kB

*dbms:*

- block size: 1000 kB → larger than both disk block sizes, so the dbms needs multiple disk i/o operations to read or write one full dbms block.
- unspanned → records don't span block boundaries, no indirection
- contiguous block allocation → blocks of the same file are adjacent on disk
- main memory is empty at the start
- intermediate results and hash values stay in memory

_assignment:_

- query: $Item \bowtie_{Item.supplier=Supplier.id} Supplier$
- calculate the access time of the query execution plans generated by postgres, for each external storage device

## a) hash join

```
Hash Join
  Hash Cond: (i.supplier = s.id)
  ->  Seq Scan on item i
  ->  Hash
        ->  Seq Scan on supplier s
```

*hash join*

assume we want to equi-join $R \bowtie_{\text{A}=\text{B}} S$ with $B$ buffer frames available:

- i. partition phase:
    - find a hash function that maps values of the join columns to a buffer frame index in $[1;B\text{-}1]$ → the buffer frames we map the rows to are called "buckets" and the 1 remaining buffer frame is used to read new pages in.
    - read each page $p_R$ of $R$ into memory. then hash the join value of each row to find the right bucket to store the row in → if a bucket overflows, write it back to disk.
    - repeat for each page $p_S$ of $S$.
    - total cost: $2 \cdot (b_R + b_S)$ → the factor of 2 accounts for the initial read and for writing the potentially full buckets back to disk.
- ii. probing phase:
    - let $R_i$ and $S_i$ be all rows in the $i$-th bucket (with $R_i$ the smaller of the two): read $R_i$ into $B\text{-}2$ buffer frames → if it doesn't fit, either hash recursively or fall back to another algorithm. the 2 remaining buffer frames are used to read new $S_i$ pages in and to store the final result.
    - read each page of $S_i$ into memory. then check each row for matches with $R_i$.
    - if a matching row is found, write it into the buffer frame dedicated to results.
    - total cost: $b_R + b_S$ (summed over all buckets)
- **total cost of both phases**: $3 \cdot (b_R + b_S)$ (see the sketch below)

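to make the two phases concrete, here is a minimal in-memory python sketch (my own illustration, not the postgres implementation): pages are lists of dict rows, and a real dbms would spill full buckets to disk instead of keeping them in dictionaries.

```python
from collections import defaultdict

def grace_hash_join(r_pages, s_pages, num_buckets, key_r, key_s):
    # partition phase: hash every row of R and S into one of the buckets.
    # a real dbms writes full buckets back to disk here (cost: 2 * (b_R + b_S)).
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for page in r_pages:
        for row in page:
            r_parts[hash(row[key_r]) % num_buckets].append(row)
    for page in s_pages:
        for row in page:
            s_parts[hash(row[key_s]) % num_buckets].append(row)

    # probing phase: per bucket, build a hash table on the smaller side
    # and probe it with the other side (cost: b_R + b_S).
    result = []
    for i in r_parts.keys() & s_parts.keys():
        build, probe, bkey, pkey = r_parts[i], s_parts[i], key_r, key_s
        if len(probe) < len(build):
            build, probe, bkey, pkey = probe, build, pkey, bkey
        table = defaultdict(list)
        for row in build:
            table[row[bkey]].append(row)
        for row in probe:
            result += [(row, match) for match in table[row[pkey]]]
    return result
```
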
*access time: magnetic disk*

- access time for one block:
    - $t_s$ - seek time: 1ms
    - $t_r$ - rotational delay: 0.5 \* (1 / 10,000) \* 60 s = 3ms → we assume that a hit takes 0.5 rotations on average.
    - $t_{tr}$ - transfer time: at 500 MB/s, one 1 kB block takes 0.002ms
    - total: 4.002ms
- access time for $n$ blocks:
    - $t_{t2t}$ - track-to-track seek time = 1ms
    - blocks per track: 1000 kB track size / 1 kB block size = 1000 blocks → for $n$ blocks we need $n$/1000 tracks → we change tracks ($n$/1000-1) times
    - **random access**: $n$ \* 4.002ms
    - **sequential access**: $t_s$ + $t_r$ + $n \cdot t_{tr}$ + track changes \* $t_{t2t}$ = 1ms + 3ms + $n$ \* 0.002ms + ($n$/1000-1) \* 1ms
- i. $Item$
    - total num of blocks: (50,000 records \* 10 kB record size) / 1 kB block size = 500,000 blocks
    - sequential access of 500,000 blocks: **1503ms**
- ii. $Supplier$
    - total num of blocks: (200 records \* 50 kB record size) / 1 kB block size = 10,000 blocks
    - sequential access of 10,000 blocks: **33ms**
- iii. total access time for hash join
    - $3 \cdot (b_{Item} + b_{Supplier})$ = 3 \* (1503ms + 33ms) = **4608ms**

*access time: ssd*

- access time for one block:
    - $t_{tr}$ - transfer time: at 3000 MB/s, one 10 kB block takes 3333.3ns
- access time for $n$ blocks:
    - **sequential / random access**: $n$ \* 3333.3ns
- i. $Item$
    - total num of blocks: (50,000 records \* 10 kB record size) / 10 kB block size = 50,000 blocks
    - sequential access of 50,000 blocks: **166.665ms**
- ii. $Supplier$
    - total num of blocks: (200 records \* 50 kB record size) / 10 kB block size = 1000 blocks
    - sequential access of 1000 blocks: **3.3333ms**
- iii. total access time for hash join
    - $3 \cdot (b_{Item} + b_{Supplier})$ = 3 \* (166.665ms + 3.3333ms) = **509.9949ms**

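the arithmetic of both subsections can be replayed with a short script (a sanity check using the spec values from this sheet):

```python
def hdd_seq_ms(n_blocks, seek=1.0, rot=3.0, per_block=0.002,
               blocks_per_track=1000, t2t=1.0):
    # one initial seek + half a rotation, then pure transfer, plus one
    # track-to-track seek per track boundary crossed.
    return seek + rot + n_blocks * per_block + (n_blocks / blocks_per_track - 1) * t2t

def ssd_ms(n_blocks, per_block_ns=10_000 / 3):  # 10 kB at 3000 MB/s ≈ 3333.3 ns
    return n_blocks * per_block_ns / 1e6

print(3 * (hdd_seq_ms(500_000) + hdd_seq_ms(10_000)))  # 4608.0 ms
print(3 * (ssd_ms(50_000) + ssd_ms(1_000)))            # 510.0 ms
```

note that the exact ssd total is 510ms – the 509.9949ms above stems from rounding $t_{tr}$ down to 3333.3ns.
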
## b) index nested loops join

```
Nested Loop
  ->  Seq Scan on supplier s
  ->  Index Scan using record_by_idx on item i
        Index Cond: (supplier = s.id)
```

pseudo algorithm for a naive nested loops join:

```
foreach page p_item of item:
    foreach page p_supplier of supplier:
        foreach tuple i ∈ p_item and s ∈ p_supplier:
            if i.supplier = s.id then Res := Res ∪ {(i,s)}
```

pseudo algorithm for an index nested loops join:

```
itemIndex := generateIndex(item.supplier)

foreach page p_supplier of supplier:
    foreach tuple s ∈ p_supplier:
        Res := Res ∪ {(i,s) : i ∈ itemIndex.getMatches(s.id)}
```

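a runnable python version of the index nested loops join pseudocode above, using a plain dict as a stand-in for the `record_by_idx` index (built up front here, since index creation costs are excluded anyway):

```python
def index_nested_loops_join(item_rows, supplier_rows):
    # build a hash index on item.supplier – a real dbms would reuse an
    # existing b-tree or hash index instead of building one per query.
    item_index = {}
    for i in item_rows:
        item_index.setdefault(i["supplier"], []).append(i)

    # scan supplier once and look each s.id up in the index.
    result = []
    for s in supplier_rows:
        for i in item_index.get(s["id"], []):
            result.append((i, s))
    return result
```
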
details:

- every `supplier.id` is looked up in an index of `item`, with the column `item.supplier` as the index key.
- if there is a match, the record behind the index pointer gets read from disk.
- the result contains 20 records.
- this disk access isn't sequential – we don't know anything about the read order.
- we don't know which kind of index was used. we do not include the disk access costs for index creation.
- **total cost**: $b_{Supplier} + 20 \cdot r_{Item}$, where $r_{Item}$ is the number of blocks per $Item$ record.

*access time: magnetic disk*

- i. $Item$
    - total num of blocks: (20 records \* 10 kB record size) / 1 kB block size = 200 blocks
    - random access of 200 blocks: **800.4ms**
- ii. $Supplier$
    - sequential access of 10,000 blocks: **33ms** (same as in the previous example)
- iii. total: **833.4ms**

*access time: ssd*

- i. $Item$
    - total num of blocks: (20 records \* 10 kB record size) / 10 kB block size = 20 blocks
    - random access of 20 blocks: **0.066666ms**
- ii. $Supplier$
    - sequential access of 1000 blocks: **3.3333ms** (same as in the previous example)
- iii. total: **3.399966ms**

# exercise 2: selectivity

## a)

estimate the selectivity:

- `repository.contributors` has 100,000 rows
- equi-depth histogram: 7 buckets of equal size, defined by the 6 dividers {1, 2, 4, 7, 12, 20}
- max value: 255
- assumption: boundary values are included in the following bucket
- buckets: {\[-∞;0], \[1;1], \[2;3], \[4;6], \[7;11], \[12;19], \[20;255]}
- assumption: values are uniformly distributed within each bucket

*i) predicate: `contributors ≥ 4`*

- because the histogram is equi-depth, we can use the bucket count to calculate the selectivity
- 4 buckets fully satisfy the predicate: {\[4;6], \[7;11], \[12;19], \[20;255]}
- selectivity ≈ 4/7 buckets = 0.5714285714

*ii) predicate: `contributors > 12`*

- the bucket \[20;255] fully satisfies the predicate – the bucket \[12;19] would too, if it didn't contain the value 12.
- since the values are evenly spread, $\approx(1-\frac{1}{19-12}) = \frac{6}{7}$ of the values in the \[12;19] bucket satisfy the predicate.
- selectivity ≈ $(1+\frac{6}{7})/7$ buckets = 0.2653061224

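one way to mechanize this estimate (my own sketch – the partial-bucket scaling mirrors the $1-\frac{1}{19-12}$ step above, and the bucket list encodes this sheet's boundary assumption):

```python
# equi-depth buckets from the dividers {1, 2, 4, 7, 12, 20} – boundary
# values belong to the following bucket, max value is 255.
BUCKETS = [(float("-inf"), 0), (1, 1), (2, 3), (4, 6),
           (7, 11), (12, 19), (20, 255)]

def est_selectivity_gt(v, strict=True):
    # estimate the selectivity of `contributors > v` (or `>= v` if
    # strict=False), assuming values are evenly spread within a bucket.
    total = 0.0
    for lo, hi in BUCKETS:
        if (v < lo) if strict else (v <= lo):
            total += 1.0                      # bucket fully qualifies
        elif lo <= v < hi:
            cut = v - lo + 1 if strict else v - lo
            total += 1 - cut / (hi - lo)      # partially qualifying bucket
    return total / len(BUCKETS)

print(est_selectivity_gt(4, strict=False))  # 0.5714... = 4/7
print(est_selectivity_gt(12))               # 0.2653... = 13/49
```
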
## b)

estimate the selectivity:

- avoid histograms for this part: histograms capture ranges of values, so they aren't useful for the selectivity estimation of equality predicates.
- `repository.contributors` has 400 distinct values.
- (note: combined with the prior specification, this means that values can also be negative – the range \[0;255] alone holds only 256 distinct values)

*i) predicate: `contributors == 5`*

- we assume a uniform distribution (in the absence of other information)
- selectivity ≈ $\frac{1}{400}$ = 0.0025

*ii) predicate: `contributors != 5`*

- we assume a uniform distribution (in the absence of other information)
- selectivity ≈ $1-\frac{1}{400}$ = 0.9975

*limitations of the method*

- **uniform distribution assumption:** this method assumes that all distinct values in the `contributors` column are equally likely. this is very unlikely in reality (e.g. a negative exponential distribution might be a better approximation – but we can only find out through sampling).
- **lack of correlation:** it doesn't consider potential correlations between the `contributors` column and other data in the table. these correlations could impact the actual selectivity of the predicate.

*possible solutions*

- **more detailed histograms:** instead of equi-depth histograms, you could use equi-width histograms or more sophisticated histograms that better capture the distribution of frequent values.
- **sampling:** take a representative sample of the data and examine the distribution of values in the `contributors` column within the sample. this gives a more realistic estimate of the selectivity.
- **collect statistics:** gather more detailed statistics about the frequency of individual values in the `contributors` column. this improves selectivity estimates for equality predicates.

## c)

estimate the selectivity:

- `user` has 50,000 rows
- `repository` has 100,000 rows (read from the table in the assignment, not part of this exercise)
- the `user.id` key can be joined on the foreign key `repository.owner`
- assumption: no null values in `repository.owner` because it's a foreign key.

*i) `repository ⋈_{owner=id} user`*

- since `user.id` is a key attribute, each `repository` row joins with at most one `user` row – and with the foreign key constraint, with exactly one. the result therefore has exactly 100,000 rows.
- selectivity of join ≈ 100,000 / (100,000 \* 50,000) = 1/50,000 = 0.00002

*ii) `π_owner(repository)`*

- `repository.owner` has \[1;50,000] distinct values
- at least 1 distinct value → one user owning everything
- at most 50,000 distinct values → because there are no more keys to match with from the other table
- we assume a uniform distribution of repository ownership (in the absence of other information): the expected number of distinct values in `repository.owner` is $\frac{1+50,000}{2}$ = 25,000.5 ≈ 25,001
- selectivity ≈ 25,000.5/100,000 = 0.250005

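the same numbers as a quick script, with the join-size reasoning made explicit:

```python
n_user, n_repo = 50_000, 100_000

# i) each repository row matches exactly one user row (key + foreign key),
# so the join result has exactly n_repo rows.
join_selectivity = n_repo / (n_repo * n_user)  # = 1/n_user = 2e-05

# ii) expected number of distinct owners, uniform over [1; 50,000]
distinct_owners = (1 + n_user) / 2             # 25,000.5 ≈ 25,001
projection_selectivity = distinct_owners / n_repo

print(join_selectivity, projection_selectivity)  # 2e-05 0.250005
```
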
## d, e)

before optimization:

```sql
SELECT * FROM repository rep, user u, release rel
WHERE rep.owner = u.id AND rel.repo = rep.id
    AND (rel.name = 'v2' OR rel.version = 2)
    AND rep.commits > 105
    AND rep.contributors > 11;
```

after optimization:

```sql
SELECT *
FROM release rel
INNER JOIN repository rep ON rel.repo = rep.id
    AND rep.commits > 105
    AND rep.contributors > 11
INNER JOIN user u ON u.id = rep.owner
WHERE rel.version = 2 OR rel.name = 'v2';
```

*rule-based, logical, heuristic optimization*

- simplify the relational algebra, reduce i/o access:
    - replace $\times$ and $\sigma$ with $\bowtie$
    - apply $\sigma, \pi$ as early as possible – and apply the stronger filters first
    - remove unnecessary attributes early on
- huge search space:
    - there are $\frac{(2n)!}{n!}$ possible combinations of joining $n\text{+}1$ tables – see the snippet below
    - dbms usually restrict themselves to "left-deep trees" because they allow pipelining and index nested loops joins.

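evaluating the search-space formula for the 3 tables of this query:

```python
from math import factorial

def num_join_trees(n_tables):
    # (2n)!/n! possible join trees for n+1 tables, i.e. n joins
    n = n_tables - 1
    return factorial(2 * n) // factorial(n)

print(num_join_trees(3))  # 12 trees for repository, user, release
```
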
*optimizations*

- replacing $\times$ and $\sigma$ with $\bowtie$
    - i replaced the selection `WHERE rep.owner = u.id AND rel.repo = rep.id` after the cartesian product with 2 joins.
    - this eliminates rows without a join partner early on.
- ordering joins
    - see: https://www.postgresql.org/docs/current/explicit-joins.html
    - `rep.owner` has 20,000 distinct values while `rel.repo` has 50,000 distinct values. therefore we join `repository` and `release` first.
    - this eliminates even more rows without a join partner.
    - but modern query optimizers are sophisticated enough to analyze table statistics and indexes and determine the best join order regardless of the way you write the query.
- applying $\sigma, \pi$ as early as possible
    - we filter out as many rows as possible before joining with the third table.
- applying the stronger filters first
    - i reversed the order of the predicates `(rel.name = 'v2' OR rel.version = 2)` so that an integer-based short circuit can happen before any potential string comparisons.
    - i filter by `rep.commits` before filtering by `rep.contributors` because `commits` has more distinct values to filter on, and the predicate `> 105` already falls into the last equi-depth histogram bucket, while the predicate `> 11` doesn't.

# exercise 3, 4: query planning and optimization

the last 2 exercises turned into their own independent github project, because i wanted to use jupyter notebooks and write some code for better benchmarking.

see: https://github.com/sueszli/query-queen

> Lamport, Leslie. "Paxos made simple." ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001) (2001): 51-58.

the core of paxos is the **"synod" consensus algorithm**. applying this consensus algorithm to a state machine is what lets us build distributed systems.

## overview

_problem_

- a bunch of processes can propose values
- we want only a single one of the proposed values to be chosen – and shared among all processes

_roles_

- keep in mind: each process can take on one or more roles in an implementation
- proposers
- acceptors
- learners

_message passing_

- based on the non-byzantine model:
    - agents operate at arbitrary speed
    - messages can take arbitrarily long to be delivered, can be duplicated and can be lost
    - messages can never be corrupted
    - agents may fail and restart at any time – so we need redundancy

## step 1: choosing a value

_definitions_

- proposals $(n,v)$
    - a proposal consists of a "**proposal number** $n$" and a "**value** $v$".
- accepting
    - a proposal can be accepted by an acceptor.
- choosing
    - a proposal is chosen once it has been accepted by a majority of acceptors.

_possible solutions_

there are multiple ways to approach this problem:

- one or more proposers send a value to a **single acceptor**.
    - problem: failure of the acceptor leads to failure of the whole system.
    - solution: multiple acceptors for fault tolerance.
- one or more proposers send a value to **multiple acceptors concurrently**.
    - problem: acceptor failures or several concurrent proposals can lead to no choice being made, because no value reaches a majority.
    - solution: requirement (2).

_requirements_

1. acceptors must accept the first proposal they receive.
    - (note: this is necessary so the system functions even with a single proposer)
    - 1a. an acceptor must not accept a proposal numbered $n$ if it has already responded to a `prepare()` request with a number higher than $n$.
2. multiple proposals can be chosen, as long as they all have the same value.
    - 2a. every later proposal that gets accepted must have the same value as the chosen one (implies 2).
    - 2b. every later proposal that gets proposed must have the same value as the chosen one (implies 2a).
    - 2c. for every proposal $(n,v)$ there is a set $S$ consisting of a majority of acceptors, such that at least one of the conditions below applies:
        - $\forall s \in S:$ no proposal with a number smaller than $n$ was ever accepted.
        - $\forall s \in S:$ the accepted proposal with a number smaller than but closest to $n$ has the same value $v$.

_satisfying the requirements_

- `prepare()` request – which satisfies requirement (2c):
    1. the proposer selects a new, unique proposal number $n$.
        - a proposer can make multiple proposals and can also abandon a proposal in the middle of the protocol, but it can't reuse a number for another proposal. proposers remember the highest proposal number they have ever used.
        - no two proposals are ever issued with the same number.
    2. the proposer requests all acceptors not to accept any other proposals with a number smaller than $n$.
        - the acceptors promise not to, if the number they received is the highest one out of all proposals they've received so far.
    3. the proposer requests all acceptors to respond with the proposal they accepted with a number smaller than but closest to $n$.
- `accept()` request:
    1. if the proposer receives responses from a majority, then it can **issue a proposal to be accepted** with $v$ being either:
        - a) the value of the highest-numbered proposal among all responses.
        - b) an arbitrary value (if the responders didn't return any proposal).
    2. the acceptors accept this proposal, unless they have received a `prepare()` request with a higher number in the meantime.

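a minimal python sketch of the two request handlers on the acceptor side, plus the proposer's value-selection rule (my own illustration of the protocol above – a real implementation would persist the acceptor state to stable storage so it survives restarts):

```python
class Acceptor:
    def __init__(self):
        self.promised_n = -1  # highest prepare() number responded to
        self.accepted = None  # (n, v) of the highest-numbered accepted proposal

    def prepare(self, n):
        # promise not to accept proposals numbered below n, and report the
        # highest-numbered proposal accepted so far (requirement 2c).
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", self.accepted)
        return ("reject", None)

    def accept(self, n, v):
        # accept, unless a prepare() with a higher number arrived in the meantime.
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, v)
            return "accepted"
        return "rejected"

def pick_value(promises, own_value):
    # the v of the accept() request: the value of the highest-numbered
    # proposal among the majority responses, or an arbitrary value (here:
    # the proposer's own) if no response reported an accepted proposal.
    reported = [acc for (_, acc) in promises if acc is not None]
    return max(reported)[1] if reported else own_value
```
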
one possible performance optimization: let proposers know as soon as a proposal with a higher number exists, so they can immediately abandon their current proposal and restart.

if proposers keep sending `prepare()` requests with ever-increasing numbers, no progress will ever be made (livelock). this can be avoided by electing a special proposer, the **"distinguished proposer"**, which is the only one that issues proposals – it receives values, selects among them and broadcasts them to the acceptors, similar to a proxy.

## step 2: learning a chosen value

learners learn that a value has been chosen by finding out that it was accepted by a majority of acceptors.

there are many ways to implement this:

- every acceptor broadcasts each value it accepts to all learners.
- learners broadcast a request to learn to all acceptors.
- acceptors broadcast to a subset of learners, the **"distinguished learners"**, which are then responsible for forwarding the value to all remaining learners.

## the state machine

in a simple client-server architecture, the server can be described as a deterministic state machine that performs client instructions by executing a series of steps.

state machines receive an input, update their internal state and return an output.

we want to guarantee that all servers execute the same sequence of state-machine commands, so that they all reach the same final state.

to do so, we run multiple instances of paxos, where the $i$-th instance determines the $i$-th state-machine command of the sequence.

each server plays the same role in all instances of the algorithm (optimization: electing a leader that acts as the distinguished proposer and distinguished learner in all instances).

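a sketch of the resulting replicated log (`chosen` is assumed to be filled by one paxos instance per index, and `execute` is a hypothetical command handler of the server):

```python
def apply_chosen_commands(state_machine, chosen, applied_up_to=0):
    # chosen: dict mapping paxos instance index i -> the i-th chosen command.
    # a command is applied only once all earlier instances are decided, so
    # every server executes the same sequence and reaches the same final state.
    while applied_up_to in chosen:
        state_machine.execute(chosen[applied_up_to])
        applied_up_to += 1
    return applied_up_to
```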

0 commit comments

Comments
 (0)