Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/12/07 15:06:48 UTC

[GitHub] [doris] morrySnow opened a new issue, #14912: [Feature] (Nereids) lateral view

morrySnow opened a new issue, #14912:
URL: https://github.com/apache/doris/issues/14912

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   # Lateral View
   
   # Overview
   
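   Lateral View works as follows: for each input tuple, it calls a table function that generates k tuples, and then joins those tuples with the original input data using an inner or outer join.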
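   To make these semantics concrete, here is a minimal Java sketch. It is a hypothetical model, not Doris code: `LateralViewSketch.lateralView` is an illustrative name, rows are plain lists, and the table function maps one input row to the list of values it generates. With `outer` set, an input row that generates nothing is kept once with a `null` generated column, mirroring an outer join; otherwise it is dropped, mirroring an inner join.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.Function;

   // Hypothetical model of LATERAL VIEW semantics; not the Doris implementation.
   final class LateralViewSketch {
       private LateralViewSketch() {}

       /**
        * For each input row, call the table function to get k generated values,
        * then join each generated value back onto a copy of the original row.
        * With outer = true, a row whose function yields nothing is kept once with null.
        */
       static List<List<Object>> lateralView(List<List<Object>> input,
                                             Function<List<Object>, List<Object>> tableFunction,
                                             boolean outer) {
           List<List<Object>> output = new ArrayList<>();
           for (List<Object> row : input) {
               List<Object> generated = tableFunction.apply(row);
               if (generated.isEmpty()) {
                   if (outer) {
                       List<Object> padded = new ArrayList<>(row);
                       padded.add(null); // outer join: keep the row, generated column is NULL
                       output.add(padded);
                   }
                   continue; // inner join: the row disappears
               }
               for (Object value : generated) {
                   List<Object> joined = new ArrayList<>(row);
                   joined.add(value); // original columns followed by the generated column
                   output.add(joined);
               }
           }
           return output;
       }
   }
   ```

   For example, a function that splits the string `"a,b"` on commas turns that one input row into two output rows.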
   
   Lateral View is implemented in two parts:
   
   1. The table function (Table Function)
   2. The table function node (TableFunctionNode)
   
   # Reference Implementations
   
   ## Legacy Optimizer
   
   ### TableFunction
   
   **Supported table functions**
   
   - explode_split
   - explode_bitmap
   - explode_json_array_int
   - explode_json_array_double
   - explode_json_array_string
   - explode_numbers
   - explode
   
   as well as their outer variants, which carry the suffix `_outer`.
   
   **Table function lookup**
   
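   The legacy optimizer has no dedicated base class for table functions; they reuse `ScalarFunction` as their base class. To tell table functions apart, FunctionSet keeps an additional data structure that maps each table function name to its function.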
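   The lookup scheme can be sketched as a registry with a side map. Everything here is hypothetical: `FunctionRegistrySketch`, `Fn`, and the method names are illustrative, functions are simplified to one signature per name, and the real `FunctionSet` API differs. Table functions are stored alongside ordinary scalar functions, and only the extra map tells them apart.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical model of the legacy lookup idea; not the real FunctionSet API.
   final class FunctionRegistrySketch {
       /** Stand-in for a function signature (name + return type), one per name for brevity. */
       record Fn(String name, String returnType) {}

       // Every function, table functions included, lives in the ordinary scalar catalog.
       private final Map<String, Fn> scalarFunctions = new HashMap<>();
       // Side map: only table functions are recorded here.
       private final Map<String, Fn> tableFunctions = new HashMap<>();

       void registerScalarFunction(Fn fn) {
           scalarFunctions.put(fn.name(), fn);
       }

       void registerTableFunction(Fn fn) {
           scalarFunctions.put(fn.name(), fn); // same base representation as scalar functions
           tableFunctions.put(fn.name(), fn);  // plus an entry in the marker map
       }

       /** The side map is what distinguishes a table function from a scalar one. */
       boolean isTableFunction(String name) {
           return tableFunctions.containsKey(name);
       }

       Fn lookup(String name) {
           return scalarFunctions.get(name);
       }
   }
   ```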
   
   ### TableFunctionNode
   
   The legacy optimizer's TableFunctionNode is fairly simple. Its main attributes are:
   
   ```java
   ArrayList<Expr> fnCallExprList;
   // The output slot ids of TableFunctionNode
   // Only the slot whose id is in this list will be output by TableFunctionNode
   List<SlotId> outputSlotIds;
   
   // plus the attributes inherited from PlanNode
   ArrayList<TupleId> tupleIds;
   ```
   
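   FE and BE share an implicit dependency: tupleIds must contain the input tuple, followed by the tuples of the generated columns, ordered the same way as the functions.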
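   This implicit ordering rule can be written down as a small helper pair. It is an illustrative sketch only: tuple ids are plain integers, and `TupleOrderSketch` with its methods are hypothetical names, not Doris APIs. The expected order is the input tuple(s) first, then one generated tuple per function, in function order.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical helpers modelling the legacy FE/BE tuple-order dependency.
   final class TupleOrderSketch {
       private TupleOrderSketch() {}

       /** Child (input) tuple ids first, then each function's generated tuple, in function order. */
       static List<Integer> buildTupleIds(List<Integer> childTupleIds,
                                          List<Integer> generatedTupleIdPerFunction) {
           List<Integer> tupleIds = new ArrayList<>(childTupleIds);
           tupleIds.addAll(generatedTupleIdPerFunction);
           return tupleIds;
       }

       /** True only if tupleIds follows exactly the order BE relies on. */
       static boolean matchesContract(List<Integer> tupleIds,
                                      List<Integer> childTupleIds,
                                      List<Integer> generatedTupleIdPerFunction) {
           return tupleIds.equals(buildTupleIds(childTupleIds, generatedTupleIdPerFunction));
       }
   }
   ```

   For instance, child tuple `0` with two functions whose generated tuples are `1` and `2` gives `[0, 1, 2]`; any other order would violate the dependency.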
   
   outputSlotIds marks every column that needs to be output. For columns not in this list, BE does not copy the child's data, which reduces memory overhead.
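   The saving comes from the copy step: each child row is repeated once per generated value, so every child column that is copied gets copied k times. The sketch below models that step in Java for illustration only (the real BE is written in C++, and `OutputSlotSketch.copyChildSlots` is a hypothetical name): columns whose slot id is not in outputSlotIds are left as `null` instead of being copied.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Set;

   // Hypothetical model of the output-slot pruning described above; not BE code.
   final class OutputSlotSketch {
       private OutputSlotSketch() {}

       /**
        * Copies one child row into an output row. childSlotIds.get(i) is the slot id of childRow.get(i).
        * Only slots listed in outputSlotIds are copied; the others stay null, so their data is never duplicated.
        */
       static List<Object> copyChildSlots(List<Object> childRow,
                                          List<Integer> childSlotIds,
                                          Set<Integer> outputSlotIds) {
           List<Object> out = new ArrayList<>(childRow.size());
           for (int i = 0; i < childRow.size(); i++) {
               boolean needed = outputSlotIds.contains(childSlotIds.get(i));
               out.add(needed ? childRow.get(i) : null);
           }
           return out;
       }
   }
   ```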
   
   A single TableFunctionNode can contain the expansions of multiple lateral views, all of which are processed in that one node.
   
   ```text
   +-----------------------------------------------------------------------+
   | Explain String                                                        |
   +-----------------------------------------------------------------------+
   | PLAN FRAGMENT 0                                                       |
   |   OUTPUT EXPRS:                                                       |
   |     `a`                                                               |
   |     `b`                                                               |
   |   PARTITION: HASH_PARTITIONED: `default_cluster:test`.`t1`.`k1`       |
   |                                                                       |
   |   VRESULT SINK                                                        |
   |                                                                       |
   |   1:VTABLE FUNCTION NODE                                              |
   |   |  table function: explode(array(1, 2, 3)) explode(array(1, 2, 3))  |
   |   |  lateral view tuple id: 1 2                                       |
   |   |  output slot id: 0 1                                              |
   |   |  cardinality=2                                                    |
   |   |                                                                   |
   |   0:VOlapScanNode                                                     |
   |      TABLE: default_cluster:test.t1(t1), PREAGGREGATION: ON           |
   |      partitions=1/1, tablets=1/1, tabletList=113826                   |
   |      cardinality=1, avgRowSize=1940.0, numNodes=1                     |
   +-----------------------------------------------------------------------+
   ```
   
   ## Spark
   
   ### Generator(TableFunction)
   
   Spark uses a single base class, Generator, to represent all table functions.
   
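   Spark also supports functions with the `_outer` suffix. They are implemented by wrapping the actual Generator in an OuterGenerator; during analysis, the wrapper is replaced by the inner Generator and the outer flag of the Generate node is set to true.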
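   The wrap-then-unwrap approach can be sketched as follows. Spark itself is written in Scala; these Java types only mirror the roles described above, and `OuterUnwrapSketch`, `Explode`, `GenerateNode`, and `analyze` are simplified, hypothetical stand-ins rather than Spark's actual classes or analyzer API.

   ```java
   // Simplified stand-ins for the Spark roles described above; not Spark's actual classes.
   final class OuterUnwrapSketch {
       private OuterUnwrapSketch() {}

       interface Generator {}

       /** A concrete table generating function, e.g. explode over an array column. */
       record Explode(String argument) implements Generator {}

       /** Marker wrapper produced for the `_outer` variant of a function. */
       record OuterGenerator(Generator child) implements Generator {}

       /** The plan node: the generator to run plus the outer flag. */
       record GenerateNode(Generator generator, boolean outer) {}

       /** Analysis step: strip the OuterGenerator wrapper and record outer-ness on the node instead. */
       static GenerateNode analyze(GenerateNode node) {
           if (node.generator() instanceof OuterGenerator wrapper) {
               return new GenerateNode(wrapper.child(), true);
           }
           return node;
       }
   }
   ```

   Keeping outer-ness on the node rather than on the function means every generator only has to implement the inner behaviour once.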
   
   ### Generate(TableFunctionNode)
   
   Spark uses the Generate node for this task.
   
   Spark translates each lateral view into its own Generate node; multiple lateral views are processed one after another.
   
   ```text
   *(1) Generate explode(array(l_orderkey#362)), [l_orderkey#362, a#378], false, [a2#379]
   +- *(1) Generate explode(array(l_orderkey#362)), [l_orderkey#362], false, [a#378]
      +- ...
   ```
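   Because each lateral view becomes its own Generate node, every node expands the rows produced by the node beneath it. The hypothetical sketch below (`StackedGenerateSketch` and its methods are illustrative names, with inner-join semantics only) applies generators bottom-up; for one input row, stacking explodes over arrays of sizes m and n yields m * n rows.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.Function;

   // Hypothetical model of stacked Generate nodes; not Spark code.
   final class StackedGenerateSketch {
       private StackedGenerateSketch() {}

       /** One Generate node: appends each generated value to a copy of its input row (inner semantics). */
       static List<List<Object>> generate(List<List<Object>> rows,
                                          Function<List<Object>, List<Object>> generator) {
           List<List<Object>> out = new ArrayList<>();
           for (List<Object> row : rows) {
               for (Object value : generator.apply(row)) {
                   List<Object> extended = new ArrayList<>(row);
                   extended.add(value);
                   out.add(extended);
               }
           }
           return out;
       }

       /** Several lateral views: each generator runs on the output of the previous node, bottom-up. */
       static List<List<Object>> generateAll(List<List<Object>> rows,
                                             List<Function<List<Object>, List<Object>>> generators) {
           List<List<Object>> current = rows;
           for (Function<List<Object>, List<Object>> g : generators) {
               current = generate(current, g);
           }
           return current;
       }
   }
   ```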
   
   ### Syntax
   
   Spark supports outer lateral views through an additional OUTER keyword (`LATERAL VIEW OUTER`).
   
   In Spark, a lateral view can follow not only a table but also a fromClause, in which case the table function is applied to the fromClause as a whole.
   
   # New Optimizer (Nereids) Design
   
   ## Naming
   
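   Following the new optimizer's naming convention, nodes that transform data are named with verbs. Since the standard name for a table function is Table Generating Function, `Generate` is the more appropriate node name.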
   
   The function class uses the standard name: `TableGeneratingFunction`.
   
   ## Data Structure
   
   ```java
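   class LogicalGenerate {
     // table generating functions called for each input row
     List<Function> generators;
     // true for an outer lateral view: keep input rows for which nothing is generated
     boolean outer;
     // alias of the lateral view, used to qualify the generated slots
     String qualifier;
     // slots produced by the generators, in generator order
     List<Slot> generatorOutput;
   }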
   ```
   
   ## Parser
   
   A lateral view is parsed directly into a Generate node:
   
   ```java
   // arguments: generators, outer, qualifier, generatorOutput, child
   new LogicalGenerate(unboundFunctions, false, qualifier, slots, child);
   ```
   
   ## Analyzer
   
   1. Bind the functions and check that each one is a `TableGeneratingFunction`.
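   A minimal sketch of that check, using simplified stand-in types: `BoundFunction`, the marker interface `TableGeneratingFunction`, the example records, and `checkGenerators` are hypothetical and do not reproduce the Nereids class hierarchy. After binding, any generator that is not a table generating function is rejected with an error naming it.

   ```java
   import java.util.List;

   // Simplified stand-ins; the real Nereids expression classes are richer.
   final class GeneratorCheckSketch {
       private GeneratorCheckSketch() {}

       interface BoundFunction {
           String name();
       }

       /** Marker for functions that are allowed inside a Generate node. */
       interface TableGeneratingFunction extends BoundFunction {}

       /** Example table generating function. */
       record Explode(String name) implements TableGeneratingFunction {}

       /** Example ordinary (scalar) function. */
       record Abs(String name) implements BoundFunction {}

       /** Throws if any bound generator is not a table generating function. */
       static void checkGenerators(List<BoundFunction> generators) {
           for (BoundFunction fn : generators) {
               if (!(fn instanceof TableGeneratingFunction)) {
                   throw new IllegalArgumentException(
                           "lateral view only supports table generating functions, got: " + fn.name());
               }
           }
       }
   }
   ```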
   
   ## Rewrite
   
   1. Merge consecutive Generate nodes, provided that the inputs of the top node's `TableGeneratingFunction`s do not reference the bottom node's `generatorOutput`.
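   The merge condition can be sketched as follows. This is a hypothetical, simplified model rather than the Nereids rule: slots are strings, each generator records the slots it reads, and `GenerateNode` stands in for `LogicalGenerate` with only `generators` and `generatorOutput` kept (`outer` and `qualifier` are not modelled). Merging is refused when a top generator reads a slot generated by the bottom node, since a single node evaluating all generators side by side could not supply that slot. Putting the bottom node's generators and outputs first is a choice made for this sketch.

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Optional;
   import java.util.Set;

   // Hypothetical model of the "merge consecutive Generate nodes" rewrite.
   final class MergeGenerateSketch {
       private MergeGenerateSketch() {}

       /** A generator call and the slots it reads. */
       record Generator(String name, Set<String> inputSlots) {}

       /** Stand-in for LogicalGenerate: generators plus the slots they produce. */
       record GenerateNode(List<Generator> generators, List<String> generatorOutput) {}

       /** No top generator may read a slot produced by the bottom node. */
       static boolean canMerge(GenerateNode top, GenerateNode bottom) {
           Set<String> bottomOutput = new HashSet<>(bottom.generatorOutput());
           for (Generator g : top.generators()) {
               if (!Collections.disjoint(g.inputSlots(), bottomOutput)) {
                   return false;
               }
           }
           return true;
       }

       /** Merges top(bottom(child)) into one node, or returns empty if the condition fails. */
       static Optional<GenerateNode> merge(GenerateNode top, GenerateNode bottom) {
           if (!canMerge(top, bottom)) {
               return Optional.empty();
           }
           List<Generator> generators = new ArrayList<>(bottom.generators());
           generators.addAll(top.generators());
           List<String> output = new ArrayList<>(bottom.generatorOutput());
           output.addAll(top.generatorOutput());
           return Optional.of(new GenerateNode(generators, output));
       }
   }
   ```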
   
   ## Translator
   
   1. Add a translation rule for `TableGeneratingFunction`.
   2. Pay attention to the generated tuple list: the child's tuples must come first.
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morrySnow closed issue #14912: [Feature] (Nereids) lateral view

Posted by GitBox <gi...@apache.org>.
morrySnow closed issue #14912: [Feature] (Nereids) lateral view
URL: https://github.com/apache/doris/issues/14912

