Commit

update
Aaaaaaron committed May 7, 2021
1 parent 3099113 commit 1b4ad25
Showing 61 changed files with 72 additions and 60 deletions.
@@ -102,11 +102,11 @@ WHERE t2.c = 0

Original tables:

![image-20210414163537859](image-20210414163537859.png)
![image-20210414163537859](Calcite-Not-in-opt/image-20210414163537859.png)

Table after two rounds of joins (USERS.ID=JOBS.AGE):

![image-20210414171920321](image-20210414171920321.png)
![image-20210414171920321](Calcite-Not-in-opt/image-20210414171920321.png)

------

@@ -141,8 +141,8 @@ SQL is a three-valued system (TRUE/FALSE/UNKNOWN), in which comparing NULL with any value is

Here jobs contains an inserted value that does not exist in users, and the selected result excludes the row containing null.

![image-20210414165506690](image-20210414165506690.png)
![image-20210414165506690](Calcite-Not-in-opt/image-20210414165506690.png)

There is one exception: for A NOT IN B, when table B is empty, every row of table A is selected. Note that the null row is selected this time.

![image-20210414165201722](image-20210414165201722.png)
![image-20210414165201722](Calcite-Not-in-opt/image-20210414165201722.png)
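
The NOT IN behavior described above can be sketched in a few lines. This is only an illustrative model, not how Calcite evaluates the predicate: `None` stands in for SQL NULL and for the UNKNOWN truth value, and the helper names `sql_not_in` and `select_not_in` are made up for this example.

```python
from typing import Iterable, List, Optional


def sql_not_in(value: Optional[int], candidates: Iterable[Optional[int]]) -> Optional[bool]:
    """Evaluate `value NOT IN (candidates)` with three-valued logic.

    Returns True, False, or None (UNKNOWN).
    """
    items = list(candidates)
    if not items:
        return True   # empty right side: NOT IN is TRUE, even for a NULL value
    if value is None:
        return None   # NULL compared with anything is UNKNOWN
    if any(c is not None and c == value for c in items):
        return False  # a definite match was found
    if any(c is None for c in items):
        return None   # no match, but a NULL candidate makes the result UNKNOWN
    return True


def select_not_in(rows: List[Optional[int]], candidates: Iterable[Optional[int]]) -> List[Optional[int]]:
    """Keep only rows whose predicate is TRUE; UNKNOWN is filtered out like FALSE."""
    items = list(candidates)
    return [r for r in rows if sql_not_in(r, items) is True]


users = [1, 2, None]
non_empty_result = select_not_in(users, [1, 5])  # 5 does not exist in users
empty_result = select_not_in(users, [])          # right side is an empty table
```

With a non-empty right side the null row is dropped; with an empty right side every row, including the null one, is kept.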
22 changes: 11 additions & 11 deletions Calcite Volcano Planner.md → Calcite-Volcano-Planner.md
@@ -41,7 +41,7 @@ The two most basic concepts in the Memo are the Expression Group (Group for short) and

### Init Memo

![image-20200208221706543](image-20200208221706543.png)
![image-20200208221706543](Calcite-Volcano-Planner/image-20200208221706543.png)

Once the initial plan has been copied into the MEMO structure, the logical operators can be transformed to generate physical operators.

@@ -56,11 +56,11 @@ The two most basic concepts in the Memo are the Expression Group (Group for short) and

Because physical properties differ, some operators in the same group can serve as child nodes while others cannot.

![image-20200208221734549](image-20200208221734549.png)
![image-20200208221734549](Calcite-Volcano-Planner/image-20200208221734549.png)

### Find best plan

![image-20200208221741219](image-20200208221741219.png)
![image-20200208221741219](Calcite-Volcano-Planner/image-20200208221741219.png)
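
As a rough illustration of the Memo idea (groups of equivalent expressions, with the cheapest physical alternative chosen per group), here is a minimal sketch. The `Expr` and `Memo` classes, the operator names, and the cost numbers are all assumptions made for this example; they do not mirror Calcite's actual classes, and physical properties are ignored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Expr:
    op: str                  # operator name, e.g. "HashJoin"
    children: tuple = ()     # ids of child *groups*, not child expressions
    physical: bool = False   # only physical operators can appear in a final plan
    cost: float = 0.0        # local cost of this operator alone


@dataclass
class Memo:
    groups: dict = field(default_factory=dict)  # group id -> list of equivalent Expr

    def add(self, group_id, expr):
        self.groups.setdefault(group_id, []).append(expr)

    def best(self, group_id):
        """Return (total_cost, plan) for the cheapest physical expression of a group."""
        candidates = []
        for e in self.groups[group_id]:
            if not e.physical:
                continue
            child_results = [self.best(c) for c in e.children]
            total = e.cost + sum(cost for cost, _ in child_results)
            candidates.append((total, (e.op, [plan for _, plan in child_results])))
        return min(candidates, key=lambda t: t[0])


memo = Memo()
memo.add(0, Expr("LogicalJoin", (1, 2)))
memo.add(0, Expr("HashJoin", (1, 2), physical=True, cost=5))
memo.add(0, Expr("NestedLoopJoin", (1, 2), physical=True, cost=20))
memo.add(1, Expr("TableScan", physical=True, cost=10))
memo.add(1, Expr("IndexScan", physical=True, cost=3))
memo.add(2, Expr("TableScan", physical=True, cost=7))

best_cost, best_plan = memo.best(0)
```

Because children point at groups rather than at concrete expressions, each group is optimized independently and the cheapest alternative is picked bottom-up. A real optimizer would also cache the result per group.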

**Ref:**

@@ -277,7 +277,7 @@ public static <T> Enumerable<T> asEnumerable(JavaRDD<T> rdd) {
}
```

![image-20200208225453064](image-20200208225453064-20210507205155192.png)
![image-20200208225453064](Calcite-Volcano-Planner/image-20200208225453064-20210507205155192.png)

## Basic flow

@@ -589,7 +589,7 @@ public RelNode changeTraits(final RelNode rel, RelTraitSet toTraits) {

changeTraits creates a new RelSubset (`rel#16:Subset#2.ENUMERABLE.[]`) as the root node (rootRel2), but the old and new roots still belong to the same RelSet, because the logical semantics have not changed.

![image-20200130143902228](image-20200130143902228.png)
![image-20200130143902228](Calcite-Volcano-Planner/image-20200130143902228.png)

rootRel2 is then used to set the root once more: `planner.setRoot(rootRel2);`. Because the root node is a RelSubset this time, registerImpl goes into `registerSubset`, which returns right away.
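
The relationship between a RelSet and its RelSubsets can be pictured with a toy model. The classes below only borrow the names; the string trait keys and the `get_or_create_subset` and `change_traits` helpers are assumptions for illustration and not Calcite's API.

```python
class RelSet:
    """One equivalence class: every member produces the same logical result."""

    def __init__(self, set_id):
        self.set_id = set_id
        self.subsets = {}  # trait key -> RelSubset

    def get_or_create_subset(self, traits):
        if traits not in self.subsets:
            self.subsets[traits] = RelSubset(self, traits)
        return self.subsets[traits]


class RelSubset:
    """The members of a RelSet that share the same physical traits."""

    def __init__(self, rel_set, traits):
        self.rel_set = rel_set
        self.traits = traits


def change_traits(subset, to_traits):
    # Changing traits never changes the logical meaning, so the result
    # stays inside the same RelSet and only the subset differs.
    return subset.rel_set.get_or_create_subset(to_traits)


set2 = RelSet(2)
root = set2.get_or_create_subset("NONE")
root2 = change_traits(root, "ENUMERABLE")
```

Asking for the same traits again returns the existing subset, which loosely matches the observation that registering an already known RelSubset returns immediately.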

@@ -800,7 +800,7 @@ class EnumerableProjectRule extends ConverterRule {

The `transformTo` method registers the newly generated physical operator (EnumerableProject): **volcanoPlanner.ensureRegistered(rel, rels[0], this)**. This time the second argument, equivRel, is no longer null but `rels[0](LogicalProject)`. VolcanoPlanner#register looks up the RelSet that equivRel belongs to, and from there the flow is the same `registerImpl` as above, except that a set is now passed in.

![image-20200130235141251](image-20200130235141251.png)
![image-20200130235141251](Calcite-Volcano-Planner/image-20200130235141251.png)

This registers the newly generated physical operator in the original RelSet tree, which completes the transform.
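
The effect of passing an equivalent node at registration time can be sketched as follows. `ToyPlanner` is a hypothetical stand-in written for this example; it is not VolcanoPlanner, and it only models the bookkeeping of which set a node lands in.

```python
class ToyPlanner:
    def __init__(self):
        self.set_of = {}  # rel name -> set id
        self.sets = {}    # set id -> rel names in that equivalence set

    def register(self, rel, equiv_rel=None):
        """Register rel; with equiv_rel it joins that node's set, otherwise a new set."""
        if rel in self.set_of:
            return self.set_of[rel]           # already registered, nothing to do
        if equiv_rel is None:
            set_id = len(self.sets)           # first time this logical result is seen
            self.sets[set_id] = []
        else:
            set_id = self.set_of[equiv_rel]   # reuse the set of the equivalent node
        self.sets[set_id].append(rel)
        self.set_of[rel] = set_id
        return set_id


planner = ToyPlanner()
logical_set = planner.register("LogicalProject")
physical_set = planner.register("EnumerableProject", equiv_rel="LogicalProject")
```

The logical node opens a new set because nothing equivalent is known yet, while the physical node produced by the rule is added to that same set.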

@@ -812,11 +812,11 @@ OPTIMIZE Rule-match queued: rule [ProjectRemoveRule] rels [rel#19:EnumerableProj

**Before**

![image-20200208225742683](image-20200208225742683.png)
![image-20200208225742683](Calcite-Volcano-Planner/image-20200208225742683.png)

**After**

![image-20200208225746328](image-20200208225746328.png)
![image-20200208225746328](Calcite-Volcano-Planner/image-20200208225746328.png)

#### ProjectRemoveRule

@@ -858,11 +858,11 @@ for (RelSubset otherSubset : otherSet.subsets) {

**before**

![image-20200131214015824](image-20200131214015824.png)
![image-20200131214015824](Calcite-Volcano-Planner/image-20200131214015824.png)

**after**

![image-20200131214114884](image-20200131214114884.png)
![image-20200131214114884](Calcite-Volcano-Planner/image-20200131214114884.png)

```
Root: rel#18:Subset#1.ENUMERABLE.[]
@@ -915,7 +915,7 @@ Set#1, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR GEN
rel#20:BindableTableScan.BINDABLE.[](table=[SALES, EMPS],filters=[=($1, 'John')]), rowcount=100.0, cumulative cost={0.5 rows, 0.505 cpu, 0.0 io}
```

![image-20200131224924660](image-20200131224924660.png)
![image-20200131224924660](Calcite-Volcano-Planner/image-20200131224924660.png)

### buildCheapestPlan

4 changes: 3 additions & 1 deletion Callback与-Coroutine-协程概念说明.md
@@ -1,7 +1,9 @@
---
title: Callback and Coroutine concepts explained
date: 2019-10-09 20:54:46
tags: 高性能
tags:
- 高性能
- System
---
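# A brief note on blocking and non-blocking
Blocking and non-blocking are defined at the granularity of threads and processes, because only those are visible to the kernel. A coroutine is invisible to the kernel; it is implemented by the user, in user space.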
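
To make the "implemented by the user" point concrete, here is a minimal sketch of cooperative scheduling built on Python generators. The `worker` and `run` names are invented for this example; everything happens in one thread, so the kernel sees nothing but a single running thread.

```python
from collections import deque


def worker(name, steps, log):
    """A coroutine: does one unit of work, then voluntarily yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the user-space scheduler


def run(tasks):
    """Round-robin scheduler living entirely in user code."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # resume the coroutine until its next yield
            queue.append(task)  # not finished yet, schedule it again
        except StopIteration:
            pass                # coroutine finished, drop it


log = []
run([worker("a", 2, log), worker("b", 1, log)])
```

Switching between `a` and `b` is just a function returning and another being resumed, which is why no kernel-level blocking is involved.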
20 changes: 10 additions & 10 deletions Kylin 查询的基本流程.md → Kylin-query-process.md
@@ -16,9 +16,9 @@ Calcite is a general-purpose SQL framework whose starting point is the hope of serving different
1. Schema information (passed in through the JDBC Properties)
2. How TableScan reads data (the framework user has to implement this; it comes up again in the code generation part)

![image-20200207232940630](image-20200207232940630-20210507203629200.png)
![image-20200207232940630](Kylin-query-process/image-20200207232940630-20210507203629200.png)

![image-20200207232956828](image-20200207232956828-20210507203634094.png)
![image-20200207232956828](Kylin-query-process/image-20200207232956828-20210507203634094.png)

# 2. Define rules and hook them into Calcite

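@@ -27,21 +27,21 @@ Kylin uses the Rules registered above to take each Calcite logical execution
1. OLAPRel also extends RelNode (Calcite's logical execution plan). Put simply, each OLAP*Rel just wraps a Calcite logical plan node in order to add some abstract methods of its own, which every subclass must implement; details follow later.
2. These abstract methods handle query information collection, Cube selection, rewriting the execution plan, and generating the concrete physical execution plan.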

![image-20200207233022958](image-20200207233022958.png)
![image-20200207233022958](Kylin-query-process/image-20200207233022958.png)

1. When this Rule encounters a LogicalFilter, it converts it into an OLAPFilterRel; both are subclasses of RelNode.

![image-20200207233039360](image-20200207233039360.png)
![image-20200207233039360](Kylin-query-process/image-20200207233039360.png)

1. There is one hack in the Calcite code here: the root node of every generated query plan tree must be an OLAPToEnumerableConverter; if it is not, an error is thrown.

![image-20200207233054038](image-20200207233054038.png)
![image-20200207233054038](Kylin-query-process/image-20200207233054038.png)
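
The two points above (node-by-node conversion and the root check) can be modeled with a small sketch. Only the LogicalFilter to OLAPFilterRel pair is stated in the text; the other entries in `RULES`, the `Node` class, and the `convert` and `check_root` helpers are assumptions for illustration and are not Kylin's code.

```python
from dataclasses import dataclass, field

# Logical node kind -> Kylin node kind. Only the first pair comes from the
# text; the other two are assumed by analogy.
RULES = {
    "LogicalFilter": "OLAPFilterRel",
    "LogicalSort": "OLAPSortRel",
    "LogicalTableScan": "OLAPTableScan",
}


@dataclass
class Node:
    kind: str
    inputs: list = field(default_factory=list)


def convert(node):
    """Apply the conversion rules bottom-up; unknown kinds are left unchanged."""
    return Node(RULES.get(node.kind, node.kind), [convert(i) for i in node.inputs])


def check_root(plan):
    """Reject any plan whose root is not the converter node."""
    if plan.kind != "OLAPToEnumerableConverter":
        raise ValueError(f"unexpected root node: {plan.kind}")
    return plan


logical = Node("LogicalFilter", [Node("LogicalTableScan")])
plan = check_root(Node("OLAPToEnumerableConverter", [convert(logical)]))

try:
    check_root(convert(logical))  # root is OLAPFilterRel, so this is rejected
    rejected = False
except ValueError:
    rejected = True
```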

This is the key point that ties the whole code flow together; it is explained in detail below:

**OLAPToEnumerableConverter.implement**

![image-20200207233114376](image-20200207233114376.png)
![image-20200207233114376](Kylin-query-process/image-20200207233114376.png)

# 3. Splitting OLAPContexts and selecting a Cube

@@ -73,7 +73,7 @@ The OLAPRel interface has a method, implementOLAP; previously converted via Rules, the various O

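Note: although case 2 has two OLAPContexts, the one on the left cannot be matched to a cube; the part a cube can actually accelerate is the red OLAPContext at the lower right. Likewise in case 3, the two white operators at the top have to be computed on the fly.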

![image-20200207233136751](image-20200207233136751.png)
![image-20200207233136751](Kylin-query-process/image-20200207233136751.png)

## 3.2 Selecting a cube

@@ -117,7 +117,7 @@ EnumerableRel enumerable = impl.visitChild((OLAPRel) getInput());

**OLAPSortRel#implementEnumerable**

![image-20200207233214957](image-20200207233214957.png)
![image-20200207233214957](Kylin-query-process/image-20200207233214957.png)

**Note**

@@ -139,7 +139,7 @@ return impl.visitChild(this, 0, enumerable, pref);

**EnumerableSort#implement**

![image-20200207233235160](image-20200207233235160.png)
![image-20200207233235160](Kylin-query-process/image-20200207233235160.png)

The second point noted earlier about using the Calcite framework, how TableScan reads data, is covered next. It is important:

@@ -149,7 +149,7 @@ return impl.visitChild(this, 0, enumerable, pref);

**OLAPTableScan#implement**

![image-20200207233314381](image-20200207233314381.png)
![image-20200207233314381](Kylin-query-process/image-20200207233314381.png)

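We only need to look at execFunction, the function that the generated code eventually runs to fetch data. If a cube is hit, execution goes to executeOLAPQuery, which is defined at OLAPTable#executeOLAPQuery.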

28 changes: 14 additions & 14 deletions SQL CTE 优化.md → SQL-CTE-optimize.md
@@ -52,7 +52,7 @@ b. Inline everything: the index on i_color can be used, but the inlined part is repeated

c. Partial inlining

![image-20210426183446419](image-20210426183446419-20210507210541900.png)
![image-20210426183446419](SQL-CTE-optimize/image-20210426183446419-20210507210541900.png)

### How to optimize based on context

@@ -69,7 +69,7 @@ WITH v AS (SELECT i_brand FROM item WHERE i_color = 'red')
SELECT * FROM v as v1, v as v2 WHERE v1.i_brand = v2.i_brand;
```

![image-20210426205405612](image-20210426205405612-20210507210542053.png)
![image-20210426205405612](SQL-CTE-optimize/image-20210426205405612-20210507210542053.png)

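1. **CTEProducer**: the root node of the tree that defines the CTE; it has a unique ID.
2. **CTEConsumer**: a place in the query where the CTE is referenced; its ID matches that of the corresponding CTEProducer.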
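@@ -82,7 +82,7 @@ SELECT * FROM v as v1, v as v2 WHERE v1.i_brand = v2.i_brand;
- b. No inlining: the CTEAnchor is replaced by a Sequence, with the CTEProducer as its left child and the CTEAnchor's child as its right child
- **Sequence** enforces a specific execution order, so the CTEProducer is guaranteed to run before the CTEConsumer. This mechanism ensures that the generated plan is free of deadlocks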

![image-20210427111008104](image-20210427111008104-20210507210542029.png)
![image-20210427111008104](SQL-CTE-optimize/image-20210427111008104-20210507210542029.png)
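
The two ways of rewriting a CTEAnchor can be sketched on plain tuples. This is a toy model: the tuple layout and the `rewrite_anchor` and `substitute` helpers are invented for this example, and the inlining branch is an assumption about the option not shown in this excerpt (each CTEConsumer replaced by a copy of the CTE definition).

```python
def substitute(tree, cte_id, definition):
    """Replace every ("CTEConsumer", cte_id) leaf with the CTE definition."""
    if not isinstance(tree, tuple) or not tree:
        return tree
    if tree[0] == "CTEConsumer" and tree[1] == cte_id:
        return definition
    return tuple(substitute(part, cte_id, definition) for part in tree)


def rewrite_anchor(anchor, inline):
    """anchor = ("CTEAnchor", cte_id, definition_tree, child_tree)."""
    _, cte_id, definition, child = anchor
    if inline:
        # Assumed inlining case: the anchor disappears, consumers become copies.
        return substitute(child, cte_id, definition)
    # No inlining: Sequence(left=CTEProducer, right=the anchor's child).
    return ("Sequence", ("CTEProducer", cte_id, definition), child)


definition = ("Filter", "i_color = 'red'", ("Scan", "item"))
query = ("Join", ("CTEConsumer", 0), ("CTEConsumer", 0))
anchor = ("CTEAnchor", 0, definition, query)

shared_plan = rewrite_anchor(anchor, inline=False)
inlined_plan = rewrite_anchor(anchor, inline=True)
```

In the shared plan the definition appears once under the producer, while in the inlined plan it is duplicated at every former consumer.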

### Nested CTE example

@@ -94,7 +94,7 @@ SELECT * FROM v AS v3, w AS w1, w AS w2
WHERE v3.p < w1.p + w2.p;
```

![image-20210427142643234](image-20210427142643234-20210507210542091.png)
![image-20210427142643234](SQL-CTE-optimize/image-20210427142643234-20210507210542091.png)

Note that figure b has two CTEAnchors, in the same order as in the WITH definition.

@@ -113,7 +113,7 @@ The two most basic concepts in the Memo are the Expression Group (Group for short) and

#### Init Memo

![image-20200208221706543](image-20200208221706543-20210507210542129.png)
![image-20200208221706543](SQL-CTE-optimize/image-20200208221706543-20210507210542129.png)

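Once the initial plan has been copied into the MEMO structure, the logical operators can be transformed to generate physical operators.
A transformation rule can generate:
@@ -123,19 +123,19 @@ The two most basic concepts in the Memo are the Expression Group (Group for short) and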

#### Apply transformation/implement rule

![image-20200208221734549](image-20200208221734549-20210507210542169.png)
![image-20200208221734549](SQL-CTE-optimize/image-20200208221734549-20210507210542169.png)

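A set of logical operators forms a sub-plan. The root stays in its original group, while the other operators are assigned to other groups, and new groups are created when necessary. For example, in join(A, join(B,C)) -> join(join(A,B), C), the two outermost joins are equivalent, so they share the same root group; the inner joins differ between the two forms, so they belong to different groups.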

#### Find best plan

![image-20200208221741219](image-20200208221741219-20210507210542188.png)
![image-20200208221741219](SQL-CTE-optimize/image-20200208221741219-20210507210542188.png)

### CTE Transformation

Using the Memo to represent the alternatives makes this process cost-based (CBO): within a single query, some CTEs can be inlined while others are not.

![image-20210427144511944](image-20210427144511944-20210507210542294.png)
![image-20210427144511944](SQL-CTE-optimize/image-20210427144511944-20210507210542294.png)

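1. First, an initial memo is generated (F.6)
2. The first rule is applied to the CTEAnchor and generates a Sequence node (group 0). The CTEProducer is expanded as its left child, which produces groups 4/5/6, and its right child is the CTEAnchor's child
@@ -147,7 +147,7 @@ F.8 shows some possible plans, all of which have problems:
- 8(a)/8(b) are invalid: they contain only CTEConsumers and no CTEProducer, so the CTEConsumers can never read the data they need. We avoid generating such plans
- 8(c)/8(d) are clearly not optimal: c inlines every CTE yet still keeps the CTEProducer, and the same goes for d.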

![image-20210427145718543](image-20210427145718543-20210507210542229.png)
![image-20210427145718543](SQL-CTE-optimize/image-20210427145718543-20210507210542229.png)
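
The two kinds of problem plans can be told apart with a simple check on which CTE ids are produced and which are consumed. The sketch below is an assumption-laden illustration: plans are plain tuples, and `collect` and `classify` are hypothetical helpers, not part of any optimizer discussed here.

```python
def collect(tree, kind):
    """Return the set of CTE ids found on nodes of the given kind."""
    found = set()
    if isinstance(tree, tuple) and tree:
        if tree[0] == kind:
            found.add(tree[1])
        for child in tree[1:]:
            found |= collect(child, kind)
    return found


def classify(plan):
    producers = collect(plan, "CTEProducer")
    consumers = collect(plan, "CTEConsumer")
    if consumers - producers:
        return "invalid"   # a consumer has nothing to read from
    if producers - consumers:
        return "wasteful"  # a producer is computed but never read
    return "ok"


scan = ("Scan", "item")
only_consumers = ("Join", ("CTEConsumer", 0), ("CTEConsumer", 0))
unused_producer = ("Sequence", ("CTEProducer", 0, scan), ("Join", scan, scan))
shared = ("Sequence", ("CTEProducer", 0, scan), only_consumers)
```

Under this model `only_consumers` corresponds to the invalid shape, `unused_producer` to the fully inlined plan that still keeps its producer, and `shared` to a well-formed plan.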

### Predicate Pushdown

@@ -162,7 +162,7 @@ WHERE v1.i_brand = v2.i_brand
AND v2.i_color = 'blue';
```

![image-20210427153800222](image-20210427153800222-20210507210542245.png)
![image-20210427153800222](SQL-CTE-optimize/image-20210427153800222-20210507210542245.png)

### Always Inlining Single-use CTEs

@@ -187,7 +187,7 @@ SELECT * FROM item WHERE item.i_color = 'red';

### Enforcing Physical Properties

![image-20210427161134653](image-20210427161134653-20210507210542580.png)
![image-20210427161134653](SQL-CTE-optimize/image-20210427161134653-20210507210542580.png)

### Producer Context

@@ -211,7 +211,7 @@ SELECT COUNT(DISTINCT cs_item_sk), AVG(DISTINCT cs_qty)
FROM catalog_sales WHERE cs_net_profit > 1000
```

![image-20210427200049532](image-20210427200049532-20210507210542308.png)
![image-20210427200049532](SQL-CTE-optimize/image-20210427200049532-20210507210542308.png)

### Common Subexpression Elimination

@@ -225,7 +225,7 @@ SELECT * FROM
WHERE t1.i_brand <> t2.i_brand;
```

![image-20210427202305622](image-20210427202305622-20210507210542337.png)
![image-20210427202305622](SQL-CTE-optimize/image-20210427202305622-20210507210542337.png)

The algorithm is as follows:

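@@ -237,4 +237,4 @@ WHERE t1.i_brand <> t2.i_brand;
- InsertCTEConsumers() replaces the common subexpressions in the expression with the corresponding CTEConsumer
- Finally, a CTEAnchor is inserted at the LCA (lowest common ancestor) of each set of common subexpressions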

![image-20210427202537374](image-20210427202537374-20210507210542586.png)
![image-20210427202537374](SQL-CTE-optimize/image-20210427202537374-20210507210542586.png)
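
The last step, placing the CTEAnchor at the lowest common ancestor of the consumers, can be sketched as follows. The tree is represented as a hypothetical child-to-parent map with made-up node names; this is a minimal illustration rather than the algorithm's actual data structures.

```python
def path_to_root(node, parent):
    """List the node and all of its ancestors, from the node up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(nodes, parent):
    """Return the deepest node that is an ancestor of (or equal to) every node."""
    common = None
    for n in nodes:
        ancestors = path_to_root(n, parent)
        if common is None:
            common = ancestors
        else:
            keep = set(ancestors)
            common = [a for a in common if a in keep]  # order stays deepest-first
    return common[0]


# child -> parent edges of a small plan tree (names are made up)
parent = {
    "join": "root",
    "t1": "join",
    "t2": "join",
    "consumer1": "t1",
    "consumer2": "t2",
}

anchor_position = lowest_common_ancestor(["consumer1", "consumer2"], parent)
```

Anchoring at the lowest common ancestor keeps the CTE scope as small as possible while still covering every consumer.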
3 changes: 1 addition & 2 deletions Spark-Bucketing-Deep-Dive.md
@@ -1,7 +1,6 @@
---
title: Spark bucketing Deep Dive
date: 2019-10-09 17:10:27
top: true
date: 2019-12-31 17:10:27
tags:
- Spark
- BigData
2 changes: 2 additions & 0 deletions Spark-PRC.md
@@ -2,6 +2,8 @@
title: Spark PRC
date: 2018-12-19 10:30:33
tags:
- Spark
- BigData
---

- receive: receives a message and processes it, without needing to reply to the client.
4 changes: 3 additions & 1 deletion Spark-SQL-Join-Deep-Dive.md
@@ -1,7 +1,9 @@
---
title: Spark SQL Join Deep Dive
date: 2019-06-29 18:20:19
date: 2019-03-29 18:20:19
tags:
- Spark
- BigData
---
# Join basics
## Nested Loop Join

0 comments on commit 1b4ad25