Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/03/25 07:01:53 UTC

[GitHub] [incubator-druid] pdeva opened a new issue #7337: DataSketches HLL is not a replacement for Cardinality aggregator

URL: https://github.com/apache/incubator-druid/issues/7337
 
 
   ### Description
   The `Cardinality` aggregator is deprecated in 0.14 in favor of `DataSketches HLL`.
   
   However, `DataSketches HLL` requires you to add that aggregation at ingestion time, which severely limits its usage.
   
   What if, after you have been ingesting data for a while, you decide you want to calculate the cardinality of some columns? You can do that with the `Cardinality` aggregator.
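
For illustration, here is a minimal sketch of the kind of query-time request this refers to: a native timeseries query that uses the `cardinality` aggregator on a column that was ingested as an ordinary dimension. The datasource name `events`, the column `user_id`, the output name `distinct_count`, the interval, and the helper `cardinality_query` are all hypothetical; only the aggregator fields (`type`, `name`, `fields`, `byRow`) are taken from the documented aggregator shape.

```python
import json


def cardinality_query(datasource, columns, interval, by_row=False):
    """Build a Druid native timeseries query that estimates distinct values.

    Hypothetical helper: it only assembles the JSON body and does not
    contact a Druid cluster.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [interval],
        "aggregations": [
            {
                # Query-time cardinality estimate over plain dimensions.
                "type": "cardinality",
                "name": "distinct_count",
                "fields": list(columns),
                # False: count distinct values; True: distinct combinations per row.
                "byRow": by_row,
            }
        ],
    }


# Assumed datasource, column, and interval, chosen only for the example.
query = cardinality_query("events", ["user_id"], "2019-03-01/2019-03-25")
print(json.dumps(query, indent=2))
```

The resulting JSON would be POSTed to the Broker's native query endpoint; nothing about `user_id` needs to have been declared as a sketch at ingestion time.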
    
   ### Motivation
   
   `DataSketches HLL` requires you to specify, at ingestion time, the columns to calculate cardinality over. This limits its usefulness if you later decide to calculate cardinality over a different column in your application.
   
   Since the `Cardinality` aggregator has no such limitation, it should _not_ be deprecated and should instead be kept alongside `DataSketches HLL`. Those who want ultra-fast cardinality queries over specific columns can pre-specify them during ingestion. For others, the `Cardinality` aggregator would provide a good fallback.
   
