Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/01/07 02:00:13 UTC

Apache Pinot Daily Email Digest (2021-01-06)

### _#general_

  
 **@tangyonga:** Hi Team, I have seen that in 0.4.0, Pinot implemented the
initial version of a theta-sketch based distinct count aggregation function,
utilizing the DataSketches library. Druid's latest release also includes a
DataSketches extension. Does Pinot have any plan to implement sketches other
than the Theta sketch? Thanks.  
**@mayanks:** Pinot already supports HLL and TDigest based percentiles. If
there's a specific case where you would find DataSketch based implementations
more useful, we can definitely explore that. If so, would recommend filing an
issue for that.  
**@mayanks:** For HLL we use
`com.clearspring.analytics.stream.cardinality.HyperLogLog`  
**@mayanks:** And for TDigest, we use `com.tdunning.math.stats.TDigest`  
**@tangyonga:** Thanks for quick reply!  
**@mayanks:** :+1:  
**@tangyonga:** @mayanks we may need to pay attention to KLL sketch vs
t-digest (Pinot's implementation); see the comparison published by
DataSketches.  
**@mayanks:** Thanks for sharing @tangyonga. We can definitely explore adding
these if needed.  
**@tangyonga:** appendix(): HLL @mayanks  
**@tangyonga:** Also noting that DataSketches includes a newer sketch for
estimating stream cardinalities more efficiently than the famous HLL sketch.  
**@mayanks:** If you could open an issue and add all this there, it would help
us track this request @tangyonga  
**@tangyonga:** I will try to open an issue to discuss sketches family
@mayanks  
**@mayanks:** Thanks @tangyonga.  
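
For readers unfamiliar with the theta-sketch idea discussed in this thread, here is a minimal pure-Python sketch of a K-minimum-values style distinct count estimator. It is an illustration only and is not the DataSketches or Pinot implementation; the class name `KmvSketch`, the helper `_hash01`, and the default `k` are all hypothetical choices made for this example.

```python
import hashlib


def _hash01(value: str) -> float:
    """Map a value to a pseudo-uniform float in [0, 1) (illustrative hash choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KmvSketch:
    """Toy distinct-count sketch: keep at most k hash values below a threshold theta."""

    def __init__(self, k: int = 1024):
        self.k = k
        self.hashes = set()  # retained hash values, all strictly below theta
        self.theta = 1.0     # sampling threshold; shrinks once the sketch is full

    def update(self, value: str) -> None:
        h = _hash01(value)
        if h >= self.theta or h in self.hashes:
            return  # above the threshold, or a duplicate already counted
        self.hashes.add(h)
        if len(self.hashes) > self.k:
            # Evict the largest retained hash and lower theta to it.
            largest = max(self.hashes)
            self.hashes.remove(largest)
            self.theta = largest

    def estimate(self) -> float:
        # Exact while theta is still 1.0; otherwise scale the retained count by 1/theta.
        return len(self.hashes) / self.theta


sketch = KmvSketch(k=1024)
for i in range(20000):
    sketch.update(f"user-{i}")
print(round(sketch.estimate()))  # an approximation of 20000
```

The memory used is bounded by `k` regardless of how many distinct values arrive, and the relative error shrinks roughly as `1/sqrt(k)`, which is the trade-off the sketch families mentioned above tune in different ways.
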
 **@sosyalmedya.oguzhan:** Hello, does Pinot support upsert for offline
tables, or only for realtime tables? For example, when late data arrives
after the real-time segment is flushed, can Pinot update it?  
**@mayanks:** @sosyalmedya.oguzhan At the moment the support is for real-time
only. However, Pinot segments don’t need to be time partitioned, so late
arriving data is not an issue cc: @yupeng  
**@yupeng:** Yes, upsert is for realtime only; for offline tables you can do
the compaction in the segment creation job.  
**@yupeng:** But offline upsert support is on the roadmap for the upcoming
months.  
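
The compaction suggested above for offline tables could look roughly like the following before the records are handed to a segment creation job. This is a sketch under stated assumptions, not Pinot's API: the function `compact_latest` and the field names `id` and `ts` are hypothetical, and it assumes "compaction" means keeping only the latest record per primary key.

```python
from typing import Any, Dict, Iterable, List


def compact_latest(
    records: Iterable[Dict[str, Any]],
    key_field: str = "id",   # hypothetical primary key column
    time_field: str = "ts",  # hypothetical event-time column
) -> List[Dict[str, Any]]:
    """Keep one record per key: the one with the greatest time value.

    On equal timestamps, the record that appears later in the input wins.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        if current is None or rec[time_field] >= current[time_field]:
            latest[key] = rec
    return list(latest.values())


rows = [
    {"id": 1, "ts": 10, "v": "a"},
    {"id": 2, "ts": 5, "v": "b"},
    {"id": 1, "ts": 20, "v": "c"},         # newer version of id 1
    {"id": 2, "ts": 3, "v": "late-old"},   # late-arriving but older version of id 2
]
print(compact_latest(rows))
# [{'id': 1, 'ts': 20, 'v': 'c'}, {'id': 2, 'ts': 5, 'v': 'b'}]
```
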
 **@john:** @john has joined the channel  
 **@egala:** @egala has joined the channel  
 **@myeole:** Hello, do we have any Pinot DB benchmarks we can refer to?  
**@g.kishore:** We have some we did at LinkedIn, and recently Confluera
published some numbers.  
**@myeole:** Thanks  
**@g.kishore:** we always suggest doing the benchmark for your use case and
with your data  
**@g.kishore:** you can see the indexing techniques in Pinot and some
performance numbers on 5 years of GitHub Data  
**@myeole:** Sure Thanks  
 **@zjinwei:** Hi, is it possible to monitor Pinot DB metrics with Wavefront
instead of Prometheus and Grafana? Are there any docs I can refer to? Thanks  
**@g.kishore:** All metrics are emitted via JMX. I am not familiar with
Wavefront... Does Wavefront have a JMX exporter?  
**@zjinwei:** Hi Kishore, thanks for replying. I found something in Wavefront
about JMX integration. I think it might work. Just curious, is there any way
we can achieve native integration using Dropwizard?  
