Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/01/07 07:20:43 UTC

[GitHub] [incubator-pinot] tangyong opened a new issue #6420: [discuss]integrating Apache DataSketches library

tangyong opened a new issue #6420:
URL: https://github.com/apache/incubator-pinot/issues/6420


   The following is the discussion with Mayank on slack:
   
   Mark: Hi Team, I have seen that in 0.4.0, Pinot implemented the initial version of a theta-sketch based distinct count aggregation function, using the Apache DataSketches library. The latest Druid release also includes a DataSketches extension (Theta sketch, Tuple sketch, Quantiles sketch, HLL sketch). Does Pinot have any plan to implement sketches other than the Theta sketch? Thanks.
   
   Mayank: Pinot already supports HLL and TDigest based percentiles. If there's a specific case where you would find DataSketches-based implementations more useful, we can definitely explore that. If so, I would recommend filing an issue for that.
   
   Mayank: For HLL we use com.clearspring.analytics.stream.cardinality.HyperLogLog, and for TDigest we use com.tdunning.math.stats.TDigest.
   
   Mark: We may need to pay attention to KLL sketch vs. t-digest (the Pinot implementation); see the following comparison by DataSketches: https://datasketches.apache.org/docs/Quantiles/KllSketchVsTDigest.html
   
   Mayank: Thanks for sharing @Mark.Tang. We can definitely explore adding these if needed.
   
   Mark: Appendix (https://github.com/apache/datasketches-website/blob/master/docs/pdf/DataSketches_deck.pdf): HLL
   ![pinot1](https://user-images.githubusercontent.com/187414/103863413-c65e1500-50fb-11eb-9c6a-b1b9677b69a7.png)
   
   Also note that DataSketches includes the newer CPC sketch, which estimates stream cardinalities more efficiently than the famous HLL sketch; it comes from https://arxiv.org/pdf/1708.06839.pdf

