Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/14 08:35:27 UTC

[GitHub] [arrow-datafusion] liukun4515 commented on issue #1823: implement bitmap_distinct function using bitmap

liukun4515 commented on issue #1823:
URL: https://github.com/apache/arrow-datafusion/issues/1823#issuecomment-1038800123


   > I have implemented an initial version and got the results below on 1million_rows_10thousand_distinct.parquet:
   > 
   > ```
   > 1. count distinct
   > +----------------------------+
   > | COUNT(DISTINCT test.value) |
   > +----------------------------+
   > | 10000                      |
   > +----------------------------+
   > 1 row in set. Query took 0.237 seconds.
   > 
   > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   > 2. bitmap distinct (roaring-rs)
   > 
   > +---------------------------------+
   > | BITMAPCOUNTDISTINCT(test.value) |
   > +---------------------------------+
   > | 10000                           |
   > +---------------------------------+
   > 1 row in set. Query took 0.052 seconds
   > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   > 3. approx distinct (hll)
   > 
   > +----------------------------+
   > | APPROXDISTINCT(test.value) |
   > +----------------------------+
   > | 9943                       |
   > +----------------------------+
   > 1 row in set. Query took 0.047 seconds.
   > ```
   > 
   > The bitmap used is [roaring-rs](https://github.com/RoaringBitmap/roaring-rs). I have checked that influx_iox uses [croaring-rs](https://github.com/saulius/croaring-rs). @alamb Sorry to bother you 😂, could you share some info on why it uses croaring-rs? If you have a benchmark result, that would be fantastic 👍!
   
   Could you please open a draft pull request?
   @Ted-Jiang 
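
For readers following along, here is a rough sketch of why a bitmap can beat a hash set for exact distinct counts. It is not the implementation discussed above: it uses a plain uncompressed bitset from the standard library as a stand-in for roaring-rs, assumes the values are `u32` (the column type in the benchmark is not stated), and the function names are hypothetical.

```rust
use std::collections::HashSet;

/// Exact distinct count using a hash set, as a baseline for comparison.
fn hash_count_distinct(values: &[u32]) -> usize {
    values.iter().copied().collect::<HashSet<u32>>().len()
}

/// Exact distinct count using an uncompressed bitset: one bit per possible
/// value up to the largest value seen. A roaring bitmap compresses this
/// layout; this sketch does not, so memory grows with the maximum value.
fn bitmap_count_distinct(values: &[u32]) -> usize {
    let max = match values.iter().max() {
        Some(&m) => m,
        None => return 0, // empty input has no distinct values
    };
    // 64 bits per word, enough words to hold bit index `max`.
    let mut words = vec![0u64; (max as usize) / 64 + 1];
    for &v in values {
        // Setting an already-set bit is a no-op, which deduplicates for free.
        words[(v / 64) as usize] |= 1u64 << (v % 64);
    }
    // The distinct count is the number of set bits.
    words.iter().map(|w| w.count_ones() as usize).sum()
}
```

Calling either function on one million values drawn from 10,000 distinct integers should return the same count. The bitset version avoids hashing and touches a small, dense block of memory, which is one plausible reason for a gap like the one in the timings above.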

