Posted to commits@pinot.apache.org by "cbalci (via GitHub)" <gi...@apache.org> on 2023/02/03 22:55:32 UTC

[GitHub] [pinot] cbalci commented on issue #6420: [discuss]integrating Apache DataSketches library

cbalci commented on issue #6420:
URL: https://github.com/apache/pinot/issues/6420#issuecomment-1416502419

   I'd like to revive this thread by highlighting another advantage of the DataSketches library that I came across recently.
   
   Besides being well maintained, the library keeps its binary sketch representation compatible across implementations, which makes it interoperable with external systems. For example, a Spark pipeline could generate intermediate data sketches and write them into Pinot. Pinot could then serve queries with complex filters and merge or intersect these sketches to produce estimates.
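
To make the interoperability idea concrete, here is a rough, self-contained sketch of the pattern: one side builds and serializes sketches, the other side only sees bytes and merges them. It is a toy K-Minimum-Values sketch written for illustration, not the DataSketches algorithm, API, or binary format. The names `KmvSketch` and `hash64` and the byte layout are made up for this example, and only union is shown, not intersection.

```python
import hashlib
import heapq
import struct


def hash64(item: str) -> int:
    """Map an item to a uniform 64-bit integer (SHA-256 is just a convenient stand-in)."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")


class KmvSketch:
    """Toy K-Minimum-Values sketch (hypothetical): keeps the k smallest distinct hashes."""

    def __init__(self, k, hashes=()):
        self.k = k
        # nsmallest returns the retained hashes sorted in ascending order.
        self.hashes = heapq.nsmallest(k, set(hashes))

    @classmethod
    def from_items(cls, items, k=1024):
        return cls(k, (hash64(x) for x in items))

    def merge(self, other):
        """Union: the k smallest hashes of both inputs form the sketch of the combined set."""
        return KmvSketch(min(self.k, other.k), self.hashes + other.hashes)

    def estimate(self):
        if len(self.hashes) < self.k:
            return float(len(self.hashes))  # fewer than k distinct items: count is exact
        kth_fraction = self.hashes[-1] / 2**64  # k-th smallest hash scaled into (0, 1)
        return (self.k - 1) / kth_fraction

    def to_bytes(self):
        """Made-up big-endian layout: k, count, then the hashes as unsigned 64-bit ints."""
        n = len(self.hashes)
        return struct.pack(f">II{n}Q", self.k, n, *self.hashes)

    @classmethod
    def from_bytes(cls, data):
        k, n = struct.unpack_from(">II", data, 0)
        return cls(k, struct.unpack_from(f">{n}Q", data, 8))


# "Producer" side (e.g. two pipeline partitions) builds and serializes sketches.
part_a = KmvSketch.from_items(f"user-{i}" for i in range(0, 30_000)).to_bytes()
part_b = KmvSketch.from_items(f"user-{i}" for i in range(20_000, 50_000)).to_bytes()

# "Consumer" side only sees bytes, yet can merge them and estimate the union.
merged = KmvSketch.from_bytes(part_a).merge(KmvSketch.from_bytes(part_b))
print(round(merged.estimate()))  # an estimate of the 50,000 distinct users, not an exact count
```

The point of the example is that the merge works purely on the serialized form, so whichever system produced the bytes does not matter as long as both sides agree on the format. That agreement is what a shared library with a stable binary representation would provide.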
   
   This can already be achieved in Pinot for cardinality estimation using the ThetaSketch functions 👍.
   I think a quantile sketch such as KLL and a Frequent Items sketch from this library would be great additions to complete the picture.
   
   I'd be happy to lend a hand if we can reach consensus on including them. @mayankshriv please let me know what you think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-help@pinot.apache.org