Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/12/21 08:40:29 UTC

[GitHub] elloooooo opened a new pull request #6768: extension for exactly distinct count for single long type dimension:accurate-cardinality

URL: https://github.com/apache/incubator-druid/pull/6768
 
 
   Currently, Druid supports exact distinct counts through a nested group by query.
   The logic can be described as follows:
   
   For a SQL query like:
   ```sql
   select count(distinct pid) from DATASOURCE where col = 'val'
   ```
   the exact query is equivalent to:
   ```sql
   select count(*) from (
     select pid from (
       select pid from DATASOURCE_segments_in_historical1 where col = 'val' group by pid
       UNION ALL
       select pid from DATASOURCE_segments_in_historical2 where col = 'val' group by pid
       UNION ALL
       select pid from DATASOURCE_segments_in_historical3 where col = 'val' group by pid
       ...
     ) group by pid
   )
   ```
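
The plan above can be simulated outside of Druid to show where the cost comes from. The following is only a minimal sketch, not Druid code: the class `NestedGroupBySketch` and its methods are hypothetical names, and plain Java collections stand in for segments and query results. Each "historical" returns its distinct `pid` values, and the "broker" has to receive all of them before it can deduplicate and count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: simulates the nested group by plan with plain collections.
final class NestedGroupBySketch {

    // "Historical" step: group the matching rows by pid, i.e. emit each distinct pid once.
    static List<Long> distinctPids(List<Long> matchingRows) {
        return new ArrayList<>(new HashSet<>(matchingRows));
    }

    // "Broker" step: concatenate the partial results (UNION ALL),
    // group by pid again to drop cross-historical duplicates, then count.
    static long exactDistinctCount(List<List<Long>> partialResults) {
        Set<Long> merged = new HashSet<>();
        for (List<Long> partial : partialResults) {
            merged.addAll(partial);
        }
        return merged.size();
    }

    public static void main(String[] args) {
        List<Long> h1 = distinctPids(Arrays.asList(1L, 2L, 2L, 3L));
        List<Long> h2 = distinctPids(Arrays.asList(3L, 4L));
        // pid 3 appears on both historicals but is counted once.
        System.out.println(exactDistinctCount(Arrays.asList(h1, h2))); // prints 4
    }
}
```

Note that every distinct `pid` is shipped to the broker as an individual value, so the transferred data grows with the cardinality on each historical.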
   
   For high-cardinality dimensions, the result transferred from each historical node to the broker node can be very large, which leads to poor performance.
   This extension therefore uses a bitmap (64-bit RoaringBitmap) as the container for the result data returned by the historicals.
   Performance can be 10 times better than with the nested group by method.
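
The bitmap idea can be illustrated with a small sketch. This is not the extension's implementation: `java.util.BitSet` is used here as a stand-in for a 64-bit RoaringBitmap so that the example has no dependencies, which means it only handles non-negative ids that fit in an `int` and does not compress. The class `BitmapDistinctSketch` and its method names are hypothetical. Each "historical" sets one bit per `pid`, and the "broker" ORs the bitmaps together and reads off the cardinality.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: BitSet stands in for a 64-bit RoaringBitmap.
final class BitmapDistinctSketch {

    // "Historical" step: set one bit per pid; duplicates collapse automatically.
    static BitSet toBitmap(long[] pids) {
        BitSet bitmap = new BitSet();
        for (long pid : pids) {
            // Stand-in limitation: BitSet indexes must be non-negative ints.
            bitmap.set(Math.toIntExact(pid));
        }
        return bitmap;
    }

    // "Broker" step: OR all partial bitmaps, then count the set bits.
    static long exactDistinctCount(List<BitSet> partialBitmaps) {
        BitSet merged = new BitSet();
        for (BitSet partial : partialBitmaps) {
            merged.or(partial);
        }
        return merged.cardinality();
    }

    public static void main(String[] args) {
        BitSet h1 = toBitmap(new long[] {1L, 2L, 2L, 3L});
        BitSet h2 = toBitmap(new long[] {3L, 4L});
        System.out.println(exactDistinctCount(Arrays.asList(h1, h2))); // prints 4
    }
}
```

Merging is a bitwise OR rather than a second group by, and each historical sends a single bitmap instead of a row per distinct value.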
   
   
