Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/06/03 07:55:45 UTC
[GitHub] [incubator-pinot] lakshmanan-v opened a new issue #7014: Add HLL++ support for better accuracy and possibly lower memory cost
lakshmanan-v opened a new issue #7014:
URL: https://github.com/apache/incubator-pinot/issues/7014
The accuracy and memory footprint of DISTINCTCOUNTHLL can be improved by using newer HLL algorithms. We can either replace the existing implementation with a better one, or leave the existing DISTINCTCOUNTHLL as an implementation of the original HLL and add separate functions (e.g. DISTINCTCOUNTHLLPLUSPLUS).
[Google's HLL++](http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf), a popular algorithm in the community, offers a lot of improvements over the original HLL. There are multiple Java implementations of HLL++, and they vary in performance because of register size and other implementation choices. Clearspring [stream-lib](https://github.com/addthis/stream-lib), which is used for the current HyperLogLog function, implements HLL++ as [HyperLogLogPlus](https://github.com/addthis/stream-lib/commits/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] lakshmanan-v commented on issue #7014: Add HLL++ support for better accuracy and possibly lower memory cost
Posted by GitBox <gi...@apache.org>.
lakshmanan-v commented on issue #7014:
URL: https://github.com/apache/incubator-pinot/issues/7014#issuecomment-854110461
Good point @Jackie-Jiang. We could add this as a separate function DISTINCTCOUNTHLLSKETCH (similar to DISTINCTCOUNTTHETASKETCH). It would be nice to document the parameters, accuracy and space requirements of each of the HLL implementations, as we already have a handful of them. Users can then choose the right one based on their needs.
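As a rough illustration of the accuracy and space trade-off such documentation would cover, here is a minimal sketch. It assumes the textbook HyperLogLog relative standard error of about 1.04/sqrt(m) for m = 2^p registers, and a dense array of 6-bit registers as described in the HLL++ paper (the original HLL used 5-bit registers with 32-bit hashes). The function names are hypothetical, and real libraries (stream-lib, DataSketches) add headers, sparse modes and their own register packing, so their actual sizes and errors will differ.

```python
import math


def hll_relative_error(p: int) -> float:
    """Approximate relative standard error of HyperLogLog with 2**p registers.

    Uses the textbook estimate 1.04 / sqrt(m); specific implementations may
    do better or worse, especially at small cardinalities.
    """
    return 1.04 / math.sqrt(2 ** p)


def hll_dense_bytes(p: int, bits_per_register: int = 6) -> int:
    """Bytes for a dense register array only (no headers, no sparse mode).

    bits_per_register=6 is an assumption matching HLL++ with 64-bit hashes.
    """
    return math.ceil((2 ** p) * bits_per_register / 8)


if __name__ == "__main__":
    # Print a small comparison table for a few precision values.
    for p in (8, 12, 14):
        print(
            f"p={p:2d}  registers={2 ** p:6d}  "
            f"error~{hll_relative_error(p):.4f}  "
            f"dense_bytes={hll_dense_bytes(p)}"
        )
```

Each extra bit of precision doubles the register count, so memory doubles while the error shrinks only by a factor of about 1.41. That is why the per-function parameters are worth documenting side by side.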
[GitHub] [incubator-pinot] lakshmanan-v edited a comment on issue #7014: Add HLL++ support for better accuracy and possibly lower memory cost
Posted by GitBox <gi...@apache.org>.
lakshmanan-v edited a comment on issue #7014:
URL: https://github.com/apache/incubator-pinot/issues/7014#issuecomment-854110461
Good point @Jackie-Jiang. We could add this as a separate function DISTINCTCOUNDATASKETCHHLL (similar to DISTINCTCOUNTTHETASKETCH). It would be nice to document the parameters, accuracy and space requirements of each of the HLL implementations, as we already have a handful of them. Users can then choose the right one based on their needs.
[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #7014: Add HLL++ support for better accuracy and possibly lower memory cost
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7014:
URL: https://github.com/apache/incubator-pinot/issues/7014#issuecomment-854037997
Another alternative is the DataSketches HLL: https://datasketches.apache.org/docs/HLL/HLL.html
They claim better performance in this comparison: https://datasketches.apache.org/docs/HLL/Hll_vs_CS_Hllpp.html