Posted to github@arrow.apache.org by "mingmwang (via GitHub)" <gi...@apache.org> on 2023/03/29 16:37:09 UTC

[GitHub] [arrow-datafusion] mingmwang commented on issue #5547: Improve the performance of COUNT DISTINCT queries for high cardinality groups

mingmwang commented on issue #5547:
URL: https://github.com/apache/arrow-datafusion/issues/5547#issuecomment-1488935772

   > t aggregation as grouping columns).
   > 
   > I will paste JavaDoc for Spark's RewriteDistinctAggregates below be
   
   I can work on this. SparkSQL uses the Expand operator to achieve this. DataFusion does not have an Expand operator, but we can use GROUP BY with GROUPING SETS to get the same effect (this requires the GROUPING_ID() function combined with CASE WHEN ... THEN expressions).
   
   The following PR needs to be merged first, so that DataFusion supports the GROUPING() and GROUPING_ID() functions:
   https://github.com/apache/arrow-datafusion/pull/5749
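
A rough sketch of the rewrite idea, in plain Python rather than DataFusion code. It emulates a query with two COUNT(DISTINCT ...) aggregates as two ordinary aggregation phases: the first groups by grouping sets and de-duplicates, the second counts per key using a grouping-set tag. The table layout, the column names (`key`, `a`, `b`), and the tag values 0 and 1 are assumptions made for illustration only; they are not the actual values GROUPING_ID() would return, and this is not the planned optimizer rule.

```python
from collections import defaultdict

# Hypothetical input rows: (key, a, b). None stands in for SQL NULL.
ROWS = [
    ("x", 1, "p"),
    ("x", 1, "q"),
    ("x", 2, "q"),
    ("y", 3, "p"),
    ("y", None, "p"),
]


def count_distinct_via_grouping_sets(rows):
    """Emulate, for an assumed table t(key, a, b):

        SELECT key, COUNT(DISTINCT a), COUNT(DISTINCT b) FROM t GROUP BY key

    as two plain aggregation phases.
    """
    # Phase 1: roughly GROUP BY GROUPING SETS ((key, a), (key, b)).
    # Every row feeds both grouping sets. The column outside the set is
    # nulled out, and `gid` (an illustrative tag, playing the role of
    # GROUPING_ID()) records which set produced the output row.
    # Using a set models the de-duplication done by the GROUP BY.
    phase1 = set()
    for key, a, b in rows:
        phase1.add((key, a, None, 0))  # grouping set (key, a)
        phase1.add((key, None, b, 1))  # grouping set (key, b)

    # Phase 2: GROUP BY key with non-distinct counts, roughly
    #   COUNT(CASE WHEN gid = 0 THEN a END),
    #   COUNT(CASE WHEN gid = 1 THEN b END)
    # COUNT skips NULLs, so None values are not counted.
    counts = defaultdict(lambda: [0, 0])
    for key, a, b, gid in phase1:
        entry = counts[key]  # make sure every key appears in the result
        if gid == 0 and a is not None:
            entry[0] += 1
        elif gid == 1 and b is not None:
            entry[1] += 1
    return {key: tuple(pair) for key, pair in counts.items()}


def count_distinct_direct(rows):
    """Reference implementation that keeps one distinct set per group."""
    seen = defaultdict(lambda: (set(), set()))
    for key, a, b in rows:
        if a is not None:
            seen[key][0].add(a)
        if b is not None:
            seen[key][1].add(b)
    return {key: (len(sa), len(sb)) for key, (sa, sb) in seen.items()}


if __name__ == "__main__":
    print(count_distinct_via_grouping_sets(ROWS))
    print(count_distinct_direct(ROWS))
```

Both functions should agree on the same input. The point of the rewrite is that the second phase is a regular, non-distinct aggregation, so it does not need to hold a separate distinct set for every high-cardinality group.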
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org