Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/04/26 06:27:11 UTC

[GitHub] [incubator-doris] zenoyang opened a new issue, #9229: [Enhancement] Enable PreAggregation on vectorized exec

zenoyang opened a new issue, #9229:
URL: https://github.com/apache/incubator-doris/issues/9229

   ### Search before asking
   
   - [X] I have searched the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   In the current vectorized execution engine, PreAggregation is always turned off, even when the `EXPLAIN` output shows `PREAGGREGATION: ON`.
   When the amount of data to aggregate is relatively large, enabling PreAggregation improves the concurrency of the aggregation. In the queries below, it made vectorized execution roughly 1.5 to 2.5 times faster.
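
To illustrate why a local pre-aggregation step can help a `count(distinct ...)` query, here is a minimal Python sketch of two-phase aggregation. It is not Doris code: the function names (`local_preaggregate`, `merge_distinct_counts`) and the sample rows are hypothetical, and it assumes that pre-aggregation means each scan instance deduplicates its own rows before they reach the final aggregation.

```python
from collections import defaultdict

def local_preaggregate(rows):
    """Phase 1, run per instance: collapse duplicate (key, value) pairs.
    None values are skipped, matching how count(distinct) ignores NULL."""
    partial = defaultdict(set)
    for key, value in rows:
        if value is not None:
            partial[key].add(value)
    return partial

def merge_distinct_counts(partials):
    """Phase 2, run once: union the per-instance sets, then count."""
    merged = defaultdict(set)
    for partial in partials:
        for key, values in partial.items():
            merged[key] |= values
    return {key: len(values) for key, values in merged.items()}

# Hypothetical rows of (k1, k3) split across two instances.
instance_a = [("a", 1), ("a", 1), ("a", 2), ("b", None)]
instance_b = [("a", 2), ("b", 7), ("b", 7)]

partials = [local_preaggregate(instance_a), local_preaggregate(instance_b)]
result = merge_distinct_counts(partials)
print(result)
```

In this toy input, seven rows shrink to four (key, value) pairs before the merge, and the phase 1 work runs independently on each instance.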
   
   Query cases from our production environment:
   Version: master 2d83167e5
   The table's row count and the cardinality of the deduplicated column are both in the tens of millions.
   5 BEs; tbl1 has 100 buckets.
   
   | ID   | SQL                                                          | vectorized (s) | vectorized+PreAgg (s) | non-vectorized (s) |
   | ---- | ------------------------------------------------------------ | -------------- | --------------------- | ------------------ |
   | Q1   | select k1,<br/>       count(distinct if(k2 ='xxx',k3,NULL)) as uv<br/>from tbl1<br/>where dt = 'xxx'<br/>group by k1 | 358.66         | 176.59                | 184.95             |
   | Q2   | select k4,count(distinct if(k5=1,k6,NULL))<br/>from tbl1<br/>where dt='xxx' and k1='xxx' and k2='xxx' and k3='xxx'<br/>group by k4 | 14.13          | 5.74                  | 5.90               |
   | Q3   | SELECT count(distinct k2)<br/>from tbl1<br/>where dt='xxx'<br/>and k1 in ('xxx', ...) | 108.05         | 73.07                 | 75.27              |
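
The speedups implied by the table above can be checked with a few lines of Python. The timings are copied from the table; the `speedup` helper and the variable names are only illustrative.

```python
# Timings in seconds, copied from the table above:
# (vectorized, vectorized+PreAgg, non-vectorized)
timings = {
    "Q1": (358.66, 176.59, 184.95),
    "Q2": (14.13, 5.74, 5.90),
    "Q3": (108.05, 73.07, 75.27),
}

def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return before / after

# Compare plain vectorized execution against vectorized+PreAgg.
speedups = {
    query: round(speedup(vectorized, preagg), 2)
    for query, (vectorized, preagg, _non_vectorized) in timings.items()
}

for query, ratio in speedups.items():
    print(f"{query}: vectorized+PreAgg is {ratio}x faster than vectorized")
```

This gives roughly 2.03x for Q1, 2.46x for Q2, and 1.48x for Q3. In all three queries, vectorized+PreAgg is also slightly faster than non-vectorized execution.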
   
   
   ### Solution
   
   Enable PreAggregation in vectorized execution, consistent with non-vectorized execution. In the future, the query optimizer could decide whether to enable PreAggregation based on factors such as the amount of data remaining after filtering and whether the query contains a count distinct.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] zenoyang closed issue #9229: [Enhancement] Enable PreAggregation when has StreamingAgg on vectorized exec

Posted by GitBox <gi...@apache.org>.
zenoyang closed issue #9229: [Enhancement] Enable PreAggregation when has StreamingAgg on vectorized exec
URL: https://github.com/apache/incubator-doris/issues/9229

