Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/11/02 06:26:53 UTC

[GitHub] [incubator-doris] Userwhite opened a new issue #6989: [Feature] DorisOnEs support Aggregate push down

Userwhite opened a new issue #6989:
URL: https://github.com/apache/incubator-doris/issues/6989


   ### Search before asking
   
   - [X] I have searched the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   ## DorisOnEs: support aggregate pushdown
   
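   Old behavior: only the filter conditions are pushed down. ES reads the data and returns it to Doris, and Doris's MPP architecture performs the aggregation.
   New behavior: both the filter conditions and the aggregation are pushed down. ES computes an aggregation result for each shard, and Doris's MPP architecture merges those results.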
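
The merge step in the new behavior can be illustrated with a minimal Python sketch. It is not the Doris implementation: the function names and the per-shard result layout (a dict from group key to partial `sum`/`count`/`min`/`max`) are assumptions made for illustration. It shows why `avg` has to travel as a `sum` and a `count` and can only be computed after the merge.

```python
def merge_partials(shard_results):
    """Merge per-shard partial aggregates, keyed by group value.

    Hypothetical layout: each shard result maps
    group key -> {"sum": ..., "count": ..., "min": ..., "max": ...}.
    """
    merged = {}
    for shard in shard_results:
        for key, part in shard.items():
            acc = merged.get(key)
            if acc is None:
                # First time this group is seen: copy the partial state.
                merged[key] = dict(part)
                continue
            acc["sum"] += part["sum"]
            acc["count"] += part["count"]
            acc["min"] = min(acc["min"], part["min"])
            acc["max"] = max(acc["max"], part["max"])
    return merged


def finalize(merged):
    """Derive avg from the merged sum and count (never from per-shard averages)."""
    return {k: {**v, "avg": v["sum"] / v["count"]} for k, v in merged.items()}


# Two shards holding partial results for the same group key.
shard_a = {1: {"sum": 30.0, "count": 2, "min": 10.0, "max": 20.0}}
shard_b = {
    1: {"sum": 5.0, "count": 1, "min": 5.0, "max": 5.0},
    2: {"sum": 8.0, "count": 2, "min": 3.0, "max": 5.0},
}
result = finalize(merge_partials([shard_a, shard_b]))
```

Averaging the two per-shard averages for group `1` (15.0 and 5.0) would give 10.0, while the correct value over the three rows is 35.0 / 3, which is why the partial state carries `sum` and `count` separately.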
   
   Architecture diagram
   <img width="621" alt="F33EF407-3B42-4594-8B70-3E31ED3B0C00" src="https://user-images.githubusercontent.com/49226823/139794336-b1fa865e-6a91-426b-94e5-ec6eb8e4ea3b.png">
   
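   1. The FE generates the query plan and checks whether the conditions for ES aggregate pushdown are met:
   currently only single-table aggregation can be pushed down, the aggregates and GROUP BY must not contain functions (IFNULL.. is ok), IN predicates must not contain null, and so on.
   2. If the conditions are met, the first phase of the two-phase aggregation is merged with the scan phase, meaning that an aggregation request is sent to ES and the per-shard aggregation results are fetched directly.
   3. After receiving the query plan, the BE uses the relevant parameters to generate a DSL that carries the filter conditions and the aggregation attributes. Each BE is assigned multiple shards and starts multiple threads to send the DSL to ES.
   4. The BE fetches the results from ES and parses the response.
   5. From the parsed data, the BE assembles first-phase aggregation tuples and shuffles them to the second aggregation phase for merge and finalize.
   6. The final result is produced.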
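
Step 3 can be sketched as follows. This is a minimal illustration, not the DSL that Doris actually emits: the helper name `build_agg_dsl`, the aggregation naming scheme, and the choice of a `composite` aggregation for GROUP BY are all assumptions. The metric aggregations used (`sum`, `min`, `max`, `value_count`) and the `bool`/`filter` query are standard Elasticsearch. `avg` is requested as `sum` plus `value_count` so that partial results stay mergeable, and `count(*)` and per-shard targeting are left out.

```python
import json


def build_agg_dsl(filters, group_by, aggregates, page_size=1000):
    """Build an ES search body carrying filters and aggregations.

    filters:    list of ES query clauses (predicates already translated)
    group_by:   list of column names
    aggregates: list of (func, column), func in sum/count/min/max/avg
    """
    metrics = {}
    for func, column in aggregates:
        # avg is split into its mergeable parts: sum and count.
        parts = ["sum", "count"] if func == "avg" else [func]
        for part in parts:
            es_agg = "value_count" if part == "count" else part
            metrics[f"{part}_{column}"] = {es_agg: {"field": column}}

    # size=0: only aggregation results are wanted, no document hits.
    body = {"size": 0, "query": {"bool": {"filter": list(filters)}}}
    if group_by:
        sources = [{col: {"terms": {"field": col}}} for col in group_by]
        body["aggs"] = {
            "group_by": {
                "composite": {"size": page_size, "sources": sources},
                "aggs": metrics,
            }
        }
    else:
        body["aggs"] = metrics
    return body


dsl = build_agg_dsl(
    filters=[{"range": {"price": {"gte": 10}}}],
    group_by=["sold"],
    aggregates=[("sum", "price"), ("avg", "price"), ("max", "sold")],
)
print(json.dumps(dsl, indent=2))
```

Because `sum(price)` and `avg(price)` both need the sum of `price`, the sketch requests it once and reuses it, which keeps the request small when several aggregates share a column.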
   
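   Notes:
   1. Data types: only common types are supported (INT, BIGINT, DOUBLE, DATE, DATETIME, VARCHAR, and so on); Decimal is not included.
   2. Aggregate types: only sum, count (without distinct), min, max, and avg.
   Aggregates and GROUP BY must operate directly on columns; expressions such as 2 * sum(a) are not allowed.
   sum/avg(1.0/1) is not handled yet (either the type is decimal or the argument is not a slotref), so it is not pushed down.
   count(1/*) may be pushed down and is handled as a special case; count(2/..) is not pushed down.
   max(string) is not allowed, because ES cannot handle it.
   3. If the predicates cannot all be pushed down to ES, the aggregation is not pushed down either, e.g. when the two sides of an IN predicate have different types.
   4. Time fields are handled strictly in time zone 0 (UTC). ES uses UTC by default when writing, so the times stay consistent.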
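
The rules in these notes can be condensed into a small eligibility check. The Python sketch below is illustrative only: `Agg`, `can_push_down`, and the exact contents of `SUPPORTED_TYPES` are assumptions (the notes end the type list with "and so on", so the real set may be larger), and the actual check lives in the FE planner.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORTED_FUNCS = {"sum", "count", "min", "max", "avg"}
# Assumed set: only the types named in the notes; DECIMAL is deliberately absent.
SUPPORTED_TYPES = {"INT", "BIGINT", "DOUBLE", "DATE", "DATETIME", "VARCHAR"}


@dataclass
class Agg:
    func: str                        # aggregate function name, lowercase
    arg: str                         # column name, or a literal such as "1" or "*"
    arg_type: Optional[str] = None   # column type; None if arg is not a plain column
    distinct: bool = False
    wrapped_in_expr: bool = False    # True for cases like 2 * sum(a)


def can_push_down(aggs, all_predicates_pushed, single_table=True):
    """Return True only if every rule from the notes is satisfied."""
    if not single_table or not all_predicates_pushed:
        return False
    for agg in aggs:
        if agg.func not in SUPPORTED_FUNCS or agg.distinct or agg.wrapped_in_expr:
            return False
        # count(1) and count(*) are the only literal arguments allowed.
        if agg.func == "count" and agg.arg in ("1", "*"):
            continue
        # Anything else must be a plain column of a supported type.
        if agg.arg_type is None or agg.arg_type not in SUPPORTED_TYPES:
            return False
        # ES cannot compute max over a string field.
        if agg.func == "max" and agg.arg_type == "VARCHAR":
            return False
    return True


ok = can_push_down([Agg("sum", "price", "DOUBLE"), Agg("count", "*")], True)
rejected = can_push_down([Agg("max", "name", "VARCHAR")], True)
```

The check is all-or-nothing, matching note 3: a single aggregate or predicate that cannot be pushed down keeps the whole aggregation in Doris.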
   
   
   ### Use case
   
   The feature is turned on and off with the session variable `enable_pushdown_agg_to_es`.
   
   ## Table data
   
   ![image](https://user-images.githubusercontent.com/49226823/139795744-fafafa36-67a6-45ab-8fdc-7b8f02d1f45e.png)
   
   ## Basic functional verification
   
   ```sql
   select/*+set_var(enable_pushdown_agg_to_es=true)*/ sum(price),avg(price),max(price),max(sold),min(sold),sold from doe group by sold order by sold;
   select/*+set_var(enable_pushdown_agg_to_es=false)*/ sum(price),avg(price),max(price),max(sold),min(sold),sold from doe group by sold order by sold;
   ```
   ![image](https://user-images.githubusercontent.com/49226823/139796127-b26bb656-c1c5-4c5c-a438-2763d9c17508.png)
   
   ## Checking in the EXPLAIN plan whether the aggregation is pushed down
   
   ![image](https://user-images.githubusercontent.com/49226823/139796368-1b462ba0-6d5d-4894-a846-572dcfe4512e.png)
   ![image](https://user-images.githubusercontent.com/49226823/139796431-1919c83d-aef4-418a-980a-fe71cf8e7d24.png)
   
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


