Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/03/14 01:09:05 UTC

[GitHub] [incubator-pinot] xuanzih opened a new issue #5153: different result on fasthll and distinctcounthll

xuanzih opened a new issue #5153: different result on fasthll and distinctcounthll
URL: https://github.com/apache/incubator-pinot/issues/5153
 
 
   Hi all, we are trying to switch from `fasthll` to `distinctcounthll`.
   Our code builds the sketch with `com.clearspring.analytics.stream.cardinality.HyperLogLog` and uses `org.apache.pinot.core.startree.hll.HllUtil` to serialize it to a string.
   With the same filter condition, the two functions return results that differ by roughly 1000x.
   Example:
   ```
   SELECT fasthll(my_hll), distinctcounthll(my_hll)
   FROM counts_table WHERE timestamp >= 1500768000
   ```
   I get results:
   ```
   "aggregationResults": [
       {
           "function": "fastHLL_my_hll",
           "value": "68685244"
       }, {
           "function": "distinctCountHLL_my_hll",
           "value": "50535"
       }]
   ```
   Could anyone explain why the two results differ so much?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] fx19880617 commented on issue #5153: different result on fasthll and distinctcounthll

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #5153: different result on fasthll and distinctcounthll
URL: https://github.com/apache/incubator-pinot/issues/5153#issuecomment-598991778
 
 
   `fasthll` deserializes each string into a HyperLogLog object, which may represent thousands of unique values. `distinctcounthll` treats a string as a plain value, not as a HyperLogLog object, so it returns the approximate number of unique serialized HyperLogLog strings. That value should be close to the total number of rows scanned.
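
The difference described in this comment can be illustrated with a minimal sketch. It is not Pinot code: exact `java.util.Set` values stand in for HyperLogLog sketches, a comma-separated list stands in for the serialized string, and the class and method names (`HllColumnDemo`, `countMergedMembers`, `countDistinctValues`) are hypothetical. It only shows why merging deserialized values and counting distinct strings give very different numbers on the same column.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HllColumnDemo {

    // Stand-in for deserializing a sketch. Assumption: the "serialized" form is a
    // readable comma-separated list of member ids. A real HyperLogLog string is an
    // opaque encoding and only approximates the members it has seen.
    static Set<String> deserialize(String serialized) {
        return new HashSet<>(Arrays.asList(serialized.split(",")));
    }

    // Behaviour attributed to fasthll: deserialize every value in the column,
    // merge the results, and report the size of the union.
    static int countMergedMembers(List<String> column) {
        Set<String> union = new HashSet<>();
        for (String value : column) {
            union.addAll(deserialize(value));
        }
        return union.size();
    }

    // Behaviour attributed to distinctcounthll on a STRING column: every
    // serialized value is one opaque value, so only distinct strings are counted.
    static int countDistinctValues(List<String> column) {
        return new HashSet<>(column).size();
    }

    public static void main(String[] args) {
        // Three rows, each holding one "serialized sketch".
        List<String> column = Arrays.asList("u1,u2,u3", "u3,u4", "u5,u6,u7,u8");
        System.out.println("merged members:  " + countMergedMembers(column));
        System.out.println("distinct values: " + countDistinctValues(column));
    }
}
```

With three rows, the first count is the number of members across all rows (8 here) and the second is at most the number of rows (3 here), which mirrors the gap between `68685244` and `50535` reported in the issue.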



[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #5153: different result on fasthll and distinctcounthll

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #5153: different result on fasthll and distinctcounthll
URL: https://github.com/apache/incubator-pinot/issues/5153#issuecomment-599022898
 
 
   `fasthll` is deprecated because deserializing the strings is slow. Instead, you can store the serialized `HyperLogLog` in a BYTES column, generating the bytes with `org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog)`, and query that column with `distinctcounthll`.



[GitHub] [incubator-pinot] Jackie-Jiang closed issue #5153: different result on fasthll and distinctcounthll

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang closed issue #5153: different result on fasthll and distinctcounthll
URL: https://github.com/apache/incubator-pinot/issues/5153
 
 
   



[GitHub] [incubator-pinot] fx19880617 commented on issue #5153: different result on fasthll and distinctcounthll

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #5153: different result on fasthll and distinctcounthll
URL: https://github.com/apache/incubator-pinot/issues/5153#issuecomment-598991898
 
 
   `distinctcounthll` interprets a column as serialized HyperLogLog objects only when the column is of BYTES type.
