Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/07/05 02:19:32 UTC

[GitHub] [doris] zhengshengjun opened a new issue, #10603: [Enhancement] [vectorized] serialize aggregate data to actual column type in aggregation

zhengshengjun opened a new issue, #10603:
URL: https://github.com/apache/doris/issues/10603

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   In a two-phase aggregate query, I found that the merge aggregation phase is several times slower than the first aggregation phase. Profiling with perf tools showed that the main causes are the following extra steps:
   1. Aggregate data from the first phase is serialized to a StringColumn, then deserialized back to the corresponding data types in the merge aggregation
   2. Aggregate states are frequently allocated and destroyed
   ![image](https://user-images.githubusercontent.com/74281684/177235236-4c199674-e60d-413c-8e69-fa82620594af.png)
   
   We then tried two changes: serializing the first-phase aggregation result directly to its actual data type, which avoids deserializing from StringColumn to the actual data type, and merging with the add_batch function, which avoids frequently allocating and destroying aggregate states.
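   
   To make the difference concrete, here is a minimal standalone sketch of the two merge paths for a SUM over 64-bit integers. It is an illustration only and does not use Doris code: `SumState`, `serialize_to_strings`, `merge_from_strings` and `merge_typed` are hypothetical names, and the `add_batch` shown here is a simplified stand-in with an assumed signature, not the actual Doris function.
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <cstring>
   #include <memory>
   #include <string>
   #include <vector>
   
   // Hypothetical aggregate state for SUM(int64); not the actual Doris class.
   struct SumState {
       int64_t sum = 0;
       void add(int64_t v) { sum += v; }
       void merge(const SumState& other) { sum += other.sum; }
   };
   
   // Phase 1, slow path: each partial result is written as raw bytes into a string cell.
   std::vector<std::string> serialize_to_strings(const std::vector<int64_t>& partial_sums) {
       std::vector<std::string> cells;
       cells.reserve(partial_sums.size());
       for (int64_t v : partial_sums) {
           cells.emplace_back(reinterpret_cast<const char*>(&v), sizeof(v));
       }
       return cells;
   }
   
   // Merge, slow path: for every row, allocate a temporary state, deserialize
   // the string cell into it, merge it, then destroy it.
   int64_t merge_from_strings(const std::vector<std::string>& cells) {
       SumState total;
       for (const std::string& cell : cells) {
           assert(cell.size() == sizeof(int64_t));
           auto tmp = std::make_unique<SumState>();  // per-row allocation
           std::memcpy(&tmp->sum, cell.data(), sizeof(int64_t));
           total.merge(*tmp);
       }  // tmp is destroyed at the end of each iteration
       return total.sum;
   }
   
   // Simplified stand-in for a batch add: fold a whole typed column into one state.
   void add_batch(SumState& state, const std::vector<int64_t>& column) {
       for (int64_t v : column) {
           state.add(v);
       }
   }
   
   // Merge, fast path: partial results stay in their actual type, so there is
   // no string round trip and no temporary state per row.
   int64_t merge_typed(const std::vector<int64_t>& partial_sums) {
       SumState total;
       add_batch(total, partial_sums);
       return total.sum;
   }
   ```
   
   Both paths produce the same result for the same partial sums; the second one simply skips the string round trip and the per-row state allocation that the profile above points at.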
   
   The enhancement was tested:
   
   Before optimization:
   ![wVow1EXuEZ](https://user-images.githubusercontent.com/74281684/177236123-06559a8e-7eae-48ce-904e-9dfdb80072e4.png)
   ![image](https://user-images.githubusercontent.com/74281684/177236336-d2f78422-fac7-411a-a659-efa62870ccba.png)
   
   
   After optimization:
   ![image](https://user-images.githubusercontent.com/74281684/177235591-86463044-595e-4f24-a8e5-79ec7a0a5892.png)
   ![image](https://user-images.githubusercontent.com/74281684/177235775-51e002ec-9c22-4adc-ac89-d65a6aa20fcd.png)
   
   
   
   
   
   ### Solution
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhengshengjun commented on issue #10603: [Enhancement] [vectorized] serialize aggregate data to actual column type in aggregation

Posted by GitBox <gi...@apache.org>.
zhengshengjun commented on issue #10603:
URL: https://github.com/apache/doris/issues/10603#issuecomment-1174814604

   pr: https://github.com/apache/doris/pull/10618/files

