Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/26 01:42:28 UTC

[GitHub] [arrow-datafusion] jychen7 commented on a change in pull request #1841: Implement bitmap_distinct function using roaring bitmap

jychen7 commented on a change in pull request #1841:
URL: https://github.com/apache/arrow-datafusion/pull/1841#discussion_r835693924



##########
File path: datafusion-physical-expr/src/expressions/bitmap_distinct.rs
##########
@@ -0,0 +1,229 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       [brainstorming] How about we combine this into `ApproxDistinct` and use `BitmapDistinctCountAccumulator` for int8, int16 and int32 if the feature is available?
   
   And use `NumericHLLAccumulator` for int64 and other non-int types. This way, users just need to declare `approx_distinct` and can rely on DataFusion to automatically select the best approximation algorithm.
   
   https://github.com/apache/arrow-datafusion/blob/81592947e8814327ebdbd1fbc3d4a090796e37a3/datafusion-physical-expr/src/expressions/approx_distinct.rs#L91-L98
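   
   A rough sketch of the selection I have in mind, to make the idea concrete. Everything here is hypothetical: `DataType` is a simplified stand-in for the Arrow type, `AccumulatorChoice` and `choose_accumulator` are made-up names, and the `bitmap_feature_enabled` flag stands in for however the optional feature would actually be detected. It is not the existing `approx_distinct.rs` code.
   
   ```rust
   // Hypothetical, simplified stand-in for the Arrow data type enum.
   #[derive(Debug, Clone, Copy, PartialEq)]
   enum DataType {
       Int8,
       Int16,
       Int32,
       Int64,
       Utf8,
   }
   
   // Hypothetical marker for which accumulator `approx_distinct` would build.
   #[derive(Debug, PartialEq)]
   enum AccumulatorChoice {
       // Would map to `BitmapDistinctCountAccumulator`.
       Bitmap,
       // Would map to `NumericHLLAccumulator` (or another HLL-based accumulator).
       Hll,
   }
   
   // Pick the bitmap accumulator for int8/int16/int32 when the feature is
   // available; fall back to HLL for int64, non-int types, or when it is not.
   fn choose_accumulator(data_type: DataType, bitmap_feature_enabled: bool) -> AccumulatorChoice {
       match data_type {
           DataType::Int8 | DataType::Int16 | DataType::Int32 if bitmap_feature_enabled => {
               AccumulatorChoice::Bitmap
           }
           _ => AccumulatorChoice::Hll,
       }
   }
   ```
   
   With this shape, `choose_accumulator(DataType::Int32, true)` picks the bitmap path, while `DataType::Int64` or a disabled feature falls back to HLL, so the user-facing function stays a single `approx_distinct`.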
   
   ---
   
   Unrelated note: as a user, I do want to keep `count(distinct)` as an exact count and `approx_distinct` as an approximation.



