Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/03/29 06:03:57 UTC

[GitHub] [incubator-druid] pdeva edited a comment on issue #7337: DataSketches HLL is not a replacement for Cardinality aggregator

URL: https://github.com/apache/incubator-druid/issues/7337#issuecomment-477879520
 
 
   @jon-wei OK, using `HLLSketchBuild` does work, though the documentation for it could definitely be improved.
   
   That said, a couple of `Cardinality` features are still missing:
   
   1. `Cardinality` can take multiple fields; `HLLSketchBuild` can only take one.
   2. The `byRow` calculation is missing, as pointed out by @gianm.
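To make the gap concrete, here is a minimal Python sketch of what the two `Cardinality` modes count. It uses exact sets instead of HyperLogLog and is not Druid's implementation; the function names and sample rows are hypothetical. `byRow: false` (the default) counts distinct values pooled across all listed fields, while `byRow: true` counts distinct per-row combinations of those fields.

```python
from typing import Iterable, Mapping, Sequence

Row = Mapping[str, str]


def distinct_union(rows: Iterable[Row], fields: Sequence[str]) -> int:
    """Exact analogue of byRow=false: distinct values pooled across all fields."""
    return len({row[field] for row in rows for field in fields})


def distinct_by_row(rows: Iterable[Row], fields: Sequence[str]) -> int:
    """Exact analogue of byRow=true: distinct per-row combinations of the fields."""
    return len({tuple(row[field] for field in fields) for row in rows})


# Hypothetical sample data.
rows = [
    {"first_name": "Ann", "last_name": "Lee", "country_of_origin": "US", "country_of_residence": "CA"},
    {"first_name": "Ann", "last_name": "Kim", "country_of_origin": "CA", "country_of_residence": "CA"},
    {"first_name": "Bob", "last_name": "Lee", "country_of_origin": "FR", "country_of_residence": "US"},
]

print(distinct_union(rows, ["country_of_origin", "country_of_residence"]))  # 3
print(distinct_union(rows, ["first_name", "last_name"]))  # 4
print(distinct_by_row(rows, ["first_name", "last_name"]))  # 3
```

With these rows, pooling `first_name` and `last_name` yields 4 distinct strings (Ann, Bob, Lee, Kim), while counting by row yields 3 distinct people. A single-field `HLLSketchBuild` covers neither case directly.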
   
   As mentioned in the [Druid 0.13 docs for `Cardinality`](http://druid.io/docs/latest/querying/aggregations), I wonder how the following three examples would translate to `HLLSketchBuild`.
   
   **Examples for Cardinality**
   
   1. Determine the number of distinct countries people are living in or have come from.
   
   ```json
   {
     "type": "cardinality",
     "name": "distinct_countries",
     "fields": [ "country_of_origin", "country_of_residence" ]
   }
   ```
   
   2. Determine the number of distinct people (i.e. combinations of first and last name).
   
   ```json
   {
     "type": "cardinality",
     "name": "distinct_people",
     "fields": [ "first_name", "last_name" ],
     "byRow" : true
   }
   ```
   
   3. Determine the number of distinct starting characters of last names.
   
   ```json
   {
     "type": "cardinality",
     "name": "distinct_last_name_first_char",
     "fields": [
       {
         "type" : "extraction",
         "dimension" : "last_name",
         "outputName" : "last_name_first_char",
         "extractionFn" : { "type" : "substring", "index" : 0, "length" : 1 }
       }
     ],
     "byRow" : true
   }
   ```
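Example 3 applies an extraction function before counting. A similar exact-count sketch in Python, assuming the `substring` extraction with `index` 0 and `length` 1 simply keeps the first character of each value (the helper name and sample rows are hypothetical, and HyperLogLog estimation is replaced by an exact set):

```python
from typing import Iterable, Mapping


def distinct_first_chars(rows: Iterable[Mapping[str, str]], dimension: str) -> int:
    """Count distinct first characters of `dimension` exactly.

    Mimics a substring extractionFn (index=0, length=1) applied before counting.
    """
    return len({row[dimension][0:1] for row in rows})


# Hypothetical sample data.
rows = [{"last_name": "Lee"}, {"last_name": "Kim"}, {"last_name": "Lopez"}]

print(distinct_first_chars(rows, "last_name"))  # 2
```

With only one field, `byRow` makes no difference to the result; the extraction step is the part that would need an equivalent on the `HLLSketchBuild` side.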
   
   <hr/>
   
   If you can provide the answers, I am happy to update the HLLSketch documentation with equivalents for these examples.
   
   
