Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/07/05 02:19:32 UTC
[GitHub] [doris] zhengshengjun opened a new issue, #10603: [Enhancement] [vectorized] serialize aggregate data to actual column type in aggregation
zhengshengjun opened a new issue, #10603:
URL: https://github.com/apache/doris/issues/10603
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
### Description
In a two-phase aggregate query, I found that the merge aggregation phase is several times slower than the first aggregation phase. Profiling with perf tools shows that the main causes are the following extra steps:
1. Aggregate data from the first phase is serialized to a StringColumn, then deserialized back to the corresponding data types in the merge aggregation.
2. Aggregate states are frequently allocated and destroyed.
![image](https://user-images.githubusercontent.com/74281684/177235236-4c199674-e60d-413c-8e69-fa82620594af.png)
We then tried serializing the first aggregation phase's result directly to its actual data type, which avoids deserializing from StringColumn back to the actual data type, and merging with the add_batch function, which avoids frequently allocating and destroying aggregate states.
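To illustrate the idea, here is a minimal C++ sketch that contrasts the two merge paths for a SUM aggregate. It is not Doris code: `SumState`, `serialize_to_strings`, `merge_from_strings`, `serialize_to_typed`, and `merge_batch` are hypothetical names chosen for this example, and `merge_batch` only stands in for the batched merge that add_batch enables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for an aggregate state (here: a running SUM).
struct SumState {
    int64_t sum = 0;
    void merge(const SumState& other) { sum += other.sum; }
};

// Path A, step 1: each partial state is written into a string column
// as raw bytes.
std::vector<std::string> serialize_to_strings(const std::vector<SumState>& states) {
    std::vector<std::string> out;
    out.reserve(states.size());
    for (const SumState& s : states) {
        out.emplace_back(reinterpret_cast<const char*>(&s.sum), sizeof(s.sum));
    }
    return out;
}

// Path A, step 2: the merge phase rebuilds a temporary state for every row
// before merging it into the target group.
void merge_from_strings(const std::vector<std::string>& column,
                        const std::vector<size_t>& group_of_row,
                        std::vector<SumState>& groups) {
    assert(column.size() == group_of_row.size());
    for (size_t row = 0; row < column.size(); ++row) {
        SumState tmp;  // created and destroyed once per row
        assert(column[row].size() == sizeof(tmp.sum));
        std::memcpy(&tmp.sum, column[row].data(), sizeof(tmp.sum));
        groups[group_of_row[row]].merge(tmp);
    }
}

// Path B, step 1: partial states are written into a column of the
// aggregate's actual type (int64_t for this SUM).
std::vector<int64_t> serialize_to_typed(const std::vector<SumState>& states) {
    std::vector<int64_t> out;
    out.reserve(states.size());
    for (const SumState& s : states) {
        out.push_back(s.sum);
    }
    return out;
}

// Path B, step 2: the merge phase folds the typed column into the groups
// in one batch loop, with no per-row decoding and no temporary states.
void merge_batch(const std::vector<int64_t>& column,
                 const std::vector<size_t>& group_of_row,
                 std::vector<SumState>& groups) {
    assert(column.size() == group_of_row.size());
    for (size_t row = 0; row < column.size(); ++row) {
        groups[group_of_row[row]].sum += column[row];
    }
}
```

Both paths produce the same merged sums for the same input; the second one simply skips the byte-level round trip and the per-row temporary state, which are the two costs listed above.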
The enhancement was tested:
Before optimization:
![wVow1EXuEZ](https://user-images.githubusercontent.com/74281684/177236123-06559a8e-7eae-48ce-904e-9dfdb80072e4.png)
![image](https://user-images.githubusercontent.com/74281684/177236336-d2f78422-fac7-411a-a659-efa62870ccba.png)
After optimization:
![image](https://user-images.githubusercontent.com/74281684/177235591-86463044-595e-4f24-a8e5-79ec7a0a5892.png)
![image](https://user-images.githubusercontent.com/74281684/177235775-51e002ec-9c22-4adc-ac89-d65a6aa20fcd.png)
### Solution
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org
[GitHub] [doris] zhengshengjun commented on issue #10603: [Enhancement] [vectorized] serialize aggregate data to actual column type in aggregation
Posted by GitBox <gi...@apache.org>.
zhengshengjun commented on issue #10603:
URL: https://github.com/apache/doris/issues/10603#issuecomment-1174814604
pr: https://github.com/apache/doris/pull/10618/files