Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/03/19 14:34:11 UTC

[GitHub] [incubator-druid] quenlang opened a new issue #7297: thetaSketch aggregator handle null or "" into unexpected value at ingesting

quenlang opened a new issue #7297: thetaSketch aggregator handle null or "" into unexpected value at ingesting
URL: https://github.com/apache/incubator-druid/issues/7297
 
 
   @AlexanderSaydakov @gianm 
   I found that the thetaSketch aggregator turns ```null``` or ```""``` into an unexpected value at ingestion time in our use case, but I'm not sure why it happens.
   
   I defined a thetaSketch aggregator in our metrics like this:
   ```
    {
           "name": "prefix_success_business_no",
           "fieldName": "prefix_success_business_no",
           "type": "thetaSketch"
    }
   ```
   Before ingestion, the value of the prefix_success_business_no column in the raw data may be `null`, an `empty string`, or a `normal string` such as "quenlang@126.com". We found that even when every value of prefix_success_business_no in the raw data is null, "", or a mix of the two, the distinct count of prefix_success_business_no after ingestion is neither null nor zero when I run a thetaSketch aggregator query like this:
   ```
   ...
   {
         "type": "thetaSketch",
         "name": "prefixSuccessBusinessNo",
         "fieldName": "prefix_success_business_no",
         "size": 16384,
         "shouldFinalize": true,
         "isInputThetaSketch": false,
         "errorBoundsStdDev": null
   }
   ...
   ```
   The result looks like this:
   ```
   [ {
     "timestamp" : "2019-03-18T13:30:00.000Z",
     "result" : {
       "prefixSuccessBusinessNo" : 16.0
     }
   } ]
   ```
   All the original values of prefix_success_business_no are null in this query, but Druid returns a distinct count of 16 for prefix_success_business_no. I have no idea what causes this. Does thetaSketch not fully handle null or ""? My Druid version is 0.13.0.
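
To make the expectation above concrete, here is a minimal sketch in plain Python. It is not Druid code and does not use any sketch library; the helper name `expected_distinct_count` is hypothetical and exists only for illustration. It assumes that null and empty-string values should contribute nothing to the distinct count, which is the behavior this report expects from the aggregator.

```python
from typing import Iterable, Optional


def expected_distinct_count(values: Iterable[Optional[str]]) -> int:
    """Exact distinct count that skips None and empty strings.

    Hypothetical helper: it only illustrates the expected semantics,
    not how thetaSketch is implemented.
    """
    return len({v for v in values if v is not None and v != ""})


# Rows where the column is only null or "" should yield zero.
only_missing = [None, "", None, "", ""]
print(expected_distinct_count(only_missing))

# Adding one real value (repeated) should yield exactly one distinct value.
with_value = only_missing + ["quenlang@126.com", "quenlang@126.com"]
print(expected_distinct_count(with_value))
```

Under that assumption, a column containing only null or "" would produce 0 rather than 16.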
   
   There is a new aggregator in 0.13.0 called HLLSketch. Can you tell me more about the differences between HLLSketch and thetaSketch in terms of space, speed, and accuracy? Which one should I use in my case?
   
   
   Best wishes!
   
   
   
   
   
   
