Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/06/03 07:55:45 UTC

[GitHub] [incubator-pinot] lakshmanan-v opened a new issue #7014: Add HLL++ support for better accuracy and possibly lower memory cost

lakshmanan-v opened a new issue #7014:
URL: https://github.com/apache/incubator-pinot/issues/7014


   The accuracy and memory footprint of DISTINCTCOUNTHLL can be improved by adopting more recent HLL algorithms. We can either replace the existing implementation with a better one, or leave the existing DISTINCTCOUNTHLL as an implementation of the original HLL and add separate functions (e.g. DISTINCTCOUNTHLLPLUSPLUS).
   
   [Google's HLL++](http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf), a popular algorithm in the community, offers a lot of improvements over the original HLL. There are multiple Java implementations of HLL++, and they vary in performance because of register size and other implementation choices. Clearspring's [stream-lib](https://github.com/addthis/stream-lib), which backs the current HyperLogLog function, implements HLL++ as [HyperLogLogPlus](https://github.com/addthis/stream-lib/commits/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java).
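
To make the discussion concrete, here is a minimal sketch of the original HLL estimator that HLL++ builds on. It is not the code of stream-lib, Pinot, or any other library: the class name `MiniHll` and the `mix64` stand-in hash are hypothetical, and the sparse representation and empirical bias correction that HLL++ adds are deliberately left out.

```java
/** Illustrative HyperLogLog with 2^p one-byte registers (hypothetical class, not a library API). */
final class MiniHll {
    private final int p;
    private final byte[] registers;

    MiniHll(int p) {
        // The alpha constant used in estimate() is only valid for m >= 128, i.e. p >= 7.
        if (p < 7 || p > 16) {
            throw new IllegalArgumentException("p must be in [7, 16]");
        }
        this.p = p;
        this.registers = new byte[1 << p];
    }

    /** SplitMix64-style mixer, used here as a stand-in for a real 64-bit hash function. */
    static long mix64(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void add(long value) {
        long h = mix64(value);
        int idx = (int) (h >>> (64 - p));   // top p bits select the register
        long rest = h << p;                 // remaining 64 - p bits, left-aligned
        // Rank = position of the first 1 bit in the remaining bits (1-based).
        int rank = (rest == 0) ? (64 - p + 1) : Long.numberOfLeadingZeros(rest) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    double estimate() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r);     // 2^(-register value)
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double raw = alpha * m * (double) m / sum;
        // Small-range correction from the original HLL paper: fall back to linear counting.
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros);
        }
        return raw;
    }

    public static void main(String[] args) {
        MiniHll hll = new MiniHll(14);
        for (long i = 0; i < 100_000; i++) {
            hll.add(i);
        }
        System.out.println("estimate for 100000 distinct values: " + hll.estimate());
    }
}
```

The register count (2^p) is the main knob: more registers cost more memory and give a tighter estimate, which is the kind of implementation choice that makes the available libraries behave differently.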
   
    
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] lakshmanan-v commented on issue #7014: Add HLL++ support for better accuracy and possibly lower memory cost

Posted by GitBox <gi...@apache.org>.
lakshmanan-v commented on issue #7014:
URL: https://github.com/apache/incubator-pinot/issues/7014#issuecomment-854110461


   Good point @Jackie-Jiang. We could add this as a separate function DISTINCTCOUNTHLLSKETCH (similar to DISTINCTCOUNTTHETASKETCH). It would be nice to document the parameters, accuracy, and space requirements of each of the HLL implementations, as we already have a handful of them. Users can then choose the right one based on their needs.
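
As a rough starting point for such documentation, the theoretical trade-off can be tabulated. The sketch below assumes the textbook HLL figures: a relative standard error of about 1.04 / sqrt(m) for m = 2^p registers, and 6 bits per register as in the HLL++ paper (5 bits in the original HLL). The class and method names are hypothetical, and the actual footprint of any given library depends on its own layout and object overhead.

```java
import java.util.Locale;

/** Back-of-the-envelope sizing for a dense HLL sketch (hypothetical helper, not a library API). */
final class HllSizing {
    /** Theoretical relative standard error with m = 2^p registers: about 1.04 / sqrt(m). */
    static double relativeStandardError(int p) {
        int m = 1 << p;
        return 1.04 / Math.sqrt(m);
    }

    /** Bytes for 2^p packed registers of the given width, ignoring headers and object overhead. */
    static long denseRegisterBytes(int p, int bitsPerRegister) {
        long m = 1L << p;
        return (m * bitsPerRegister + 7) / 8;
    }

    public static void main(String[] args) {
        // Assumption: 6-bit registers, as described for HLL++ with a 64-bit hash.
        for (int p = 8; p <= 16; p += 2) {
            System.out.printf(Locale.ROOT, "p=%d registers=%d error=%.2f%% bytes=%d%n",
                    p, 1 << p, 100.0 * relativeStandardError(p), denseRegisterBytes(p, 6));
        }
    }
}
```

Measured accuracy and size per implementation would still need to be benchmarked; this only gives the theoretical baseline against which each one can be compared.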


[GitHub] [incubator-pinot] lakshmanan-v edited a comment on issue #7014: Add HLL++ support for better accuracy and possibly lower memory cost

Posted by GitBox <gi...@apache.org>.
lakshmanan-v edited a comment on issue #7014:
URL: https://github.com/apache/incubator-pinot/issues/7014#issuecomment-854110461


   Good point @Jackie-Jiang. We could add this as a separate function DISTINCTCOUNDATASKETCHHLL (similar to DISTINCTCOUNTTHETASKETCH). It would be nice to document the parameters, accuracy, and space requirements of each of the HLL implementations, as we already have a handful of them. Users can then choose the right one based on their needs.


[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #7014: Add HLL++ support for better accuracy and possibly lower memory cost

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7014:
URL: https://github.com/apache/incubator-pinot/issues/7014#issuecomment-854037997


   Another alternative is the DataSketches HLL: https://datasketches.apache.org/docs/HLL/HLL.html
   They claim better performance in this comparison: https://datasketches.apache.org/docs/HLL/Hll_vs_CS_Hllpp.html

