Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/30 02:50:39 UTC

[GitHub] [incubator-doris] starocean999 opened a new issue, #9838: [Enhancement] global dict optimization for low cardinality string column

starocean999 opened a new issue, #9838:
URL: https://github.com/apache/incubator-doris/issues/9838

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   The group by operator is time-consuming, especially on string columns. However, some string columns have low cardinality and can be dict encoded. The group by operator can then work on the dict-encoded integer values instead of the original strings, which should give much better performance.
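
A minimal sketch of the idea in Python, not Doris code: the function names `dict_encode` and `group_by_sum` and the sample `city`/`amount` columns are hypothetical and only illustrate why grouping on dict codes avoids string hashing and comparison.

```python
def dict_encode(values):
    """Map each distinct string to a small integer code, in order of first appearance."""
    dictionary = {}  # string -> code
    codes = []
    for v in values:
        # setdefault returns the existing code, or assigns the next free one
        codes.append(dictionary.setdefault(v, len(dictionary)))
    # Reverse lookup table: code -> string
    decode = [None] * len(dictionary)
    for s, c in dictionary.items():
        decode[c] = s
    return codes, decode


def group_by_sum(codes, amounts, num_codes):
    """Aggregate with the integer code as a direct array index (no string hashing)."""
    sums = [0] * num_codes
    for code, amount in zip(codes, amounts):
        sums[code] += amount
    return sums


# Hypothetical low cardinality string column and a numeric column to aggregate
city = ["beijing", "shanghai", "beijing", "shenzhen", "shanghai", "beijing"]
amount = [10, 20, 30, 40, 50, 60]

codes, decode = dict_encode(city)
sums = group_by_sum(codes, amount, len(decode))

# Strings are only touched once per group, when decoding the final result
result = {decode[code]: total for code, total in enumerate(sums)}
print(result)
```

The strings are read once to build the dictionary and once per group to decode the output; the per-row aggregation loop only handles integers.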
   
   ### Solution
   
   Add a low_cardinality keyword. Use "alter table xxx modify column yyy low_cardinality true;" to mark column yyy as low cardinality. The column would then be dict encoded, and the group by operator can use the integer dict codes instead of the strings.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org