Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/03/14 01:09:05 UTC
[GitHub] [incubator-pinot] xuanzih opened a new issue #5153: different result on fasthll and distinctcounthll
xuanzih opened a new issue #5153: different result on fasthll and distinctcounthll
URL: https://github.com/apache/incubator-pinot/issues/5153
Hi all, we are trying to switch from `fasthll` to `distinctcounthll`.
In our code we build the HLL with `com.clearspring.analytics.stream.cardinality.HyperLogLog` and serialize it to a string with `org.apache.pinot.core.startree.hll.HllUtil`.
With the same filter condition, the two functions return results that differ by more than 1000x.
Example:
```
SELECT fasthll(my_hll), distinctcounthll(my_hll)
FROM counts_table WHERE timestamp >= 1500768000
```
I get these results:
```
"aggregationResults": [
  {
    "function": "fastHLL_my_hll",
    "value": "68685244"
  },
  {
    "function": "distinctCountHLL_my_hll",
    "value": "50535"
  }
]
```
Could anyone explain why the two results differ so much?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 commented on issue #5153: different result on fasthll and distinctcounthll
Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #5153: different result on fasthll and distinctcounthll
URL: https://github.com/apache/incubator-pinot/issues/5153#issuecomment-598991778
`fasthll` deserializes each string into a HyperLogLog object, and one such object may represent thousands of unique values. `distinctcounthll` treats each string as a plain value, not as a HyperLogLog object, so it returns an approximate count of the unique serialized HyperLogLog strings. That value should be close to the total number of rows you scanned.
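The difference between the two semantics can be illustrated with a small, self-contained Java model. This is a sketch under loud assumptions: exact `Set<Integer>`s and a hypothetical comma-joined string format stand in for HyperLogLog sketches and their serialized strings (a real HLL only estimates these counts), and the class and method names are made up for illustration. It is not Pinot's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model (hypothetical): each row stores a "sketch" as a sorted, comma-joined list of IDs.
// An exact set stands in for a HyperLogLog sketch; a real HLL would only estimate the counts.
public class HllSemanticsDemo {

    // Hypothetical "serialize": turn a set of IDs into a canonical string.
    static String serialize(Set<Integer> ids) {
        return ids.stream().sorted().map(String::valueOf).collect(Collectors.joining(","));
    }

    // Hypothetical "deserialize": parse the string back into a set of IDs.
    static Set<Integer> deserialize(String s) {
        Set<Integer> ids = new HashSet<>();
        if (s.isEmpty()) {
            return ids;
        }
        for (String part : s.split(",")) {
            ids.add(Integer.parseInt(part));
        }
        return ids;
    }

    // fasthll-like semantics: deserialize every row and count the union of all IDs.
    static int countUnionOfSketches(List<String> rows) {
        Set<Integer> union = new HashSet<>();
        for (String row : rows) {
            union.addAll(deserialize(row));
        }
        return union.size();
    }

    // distinctcounthll-on-a-string-column semantics: each serialized string is one opaque value.
    static int countDistinctStrings(List<String> rows) {
        return new HashSet<>(rows).size();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                serialize(new HashSet<>(Arrays.asList(1, 2, 3, 4))),
                serialize(new HashSet<>(Arrays.asList(3, 4, 5, 6, 7))),
                serialize(new HashSet<>(Arrays.asList(8, 9, 10))));
        System.out.println(countUnionOfSketches(rows)); // 10
        System.out.println(countDistinctStrings(rows)); // 3
    }
}
```

With three rows, merging the sketches counts 10 distinct IDs, while counting distinct serialized strings yields 3, roughly the number of rows. That mirrors the small `distinctcounthll` result in the question.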
[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #5153: different result on fasthll and distinctcounthll
Jackie-Jiang commented on issue #5153: different result on fasthll and distinctcounthll
URL: https://github.com/apache/incubator-pinot/issues/5153#issuecomment-599022898
`fasthll` is deprecated because deserializing the strings is slow. Instead, you can store the serialized `HyperLogLog` in a BYTES column, generating the bytes with `org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog)`, and query that column with `distinctcounthll`.
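If the serialized bytes travel through a text ingestion format such as JSON or CSV, they have to be written as text. The sketch below assumes, as an assumption not stated in this thread, that your pipeline feeds the BYTES column from hex-encoded strings. The `byte[]` here is only a stand-in for the output of `ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog)`, and the `HexBytes` helper is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: hex-encode serialized sketch bytes for a text ingestion format,
// assuming the BYTES column is populated from hex strings.
public class HexBytes {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Encode each byte as two lowercase hex digits.
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xff; // treat the byte as unsigned
            out[2 * i] = HEX[v >>> 4];
            out[2 * i + 1] = HEX[v & 0x0f];
        }
        return new String(out);
    }

    // Decode a hex string (either case) back into bytes, rejecting malformed input.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] serialized = {0x00, 0x0e, (byte) 0xff}; // stand-in for real sketch bytes
        String hex = toHex(serialized);
        System.out.println(hex); // 000eff
        System.out.println(Arrays.equals(serialized, fromHex(hex))); // true
    }
}
```

The round trip in `main` checks that decoding the hex string restores the original bytes exactly, which is what the query side needs in order to deserialize the sketch.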
[GitHub] [incubator-pinot] Jackie-Jiang closed issue #5153: different result on fasthll and distinctcounthll
Jackie-Jiang closed issue #5153: different result on fasthll and distinctcounthll
URL: https://github.com/apache/incubator-pinot/issues/5153
[GitHub] [incubator-pinot] fx19880617 commented on issue #5153: different result on fasthll and distinctcounthll
fx19880617 commented on issue #5153: different result on fasthll and distinctcounthll
URL: https://github.com/apache/incubator-pinot/issues/5153#issuecomment-598991898
`distinctcounthll` only interprets values from a BYTES column as serialized HyperLogLog objects.