Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/01/07 02:00:13 UTC
Apache Pinot Daily Email Digest (2021-01-06)
### _#general_
**@tangyonga:** • Hi Team, I see that in 0.4.0 Pinot implemented the initial
version of a Theta-sketch-based distinct count aggregation function, using the
DataSketches library. Druid's latest release also includes a DataSketches
extension. Does Pinot have any plans to implement sketches other than the
Theta sketch? Thanks.
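Theta sketches build on the k-minimum-values (KMV) idea. Each item is hashed to a roughly uniform value in [0, 1), only the k smallest distinct hashes are kept, and the distinct count is estimated as (k - 1) / theta, where theta is the k-th smallest hash. The following is a minimal Python sketch of that estimator. The SHA-1-based hash, the class name `KMVSketch`, and the default `k` are illustrative assumptions. This is not Pinot's or DataSketches' implementation.

```python
import hashlib
import heapq


def _unit_hash(item: str) -> float:
    """Map an item to a pseudo-uniform value in [0, 1) (SHA-1 is an illustrative choice)."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned 64-bit integer and scale to [0, 1).
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Toy k-minimum-values sketch (hypothetical helper, not a library API)."""

    def __init__(self, k: int = 256):
        self.k = k
        self._heap = []        # negated hashes: a max-heap over the k smallest hashes
        self._members = set()  # hashes currently retained, used to skip duplicates

    def update(self, item: str) -> None:
        h = _unit_hash(item)
        if h in self._members:
            return  # duplicate item, already counted
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the current theta: evict the largest retained hash.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen, so the count is exact.
            return float(len(self._heap))
        theta = -self._heap[0]  # k-th smallest hash
        return (self.k - 1) / theta


if __name__ == "__main__":
    sketch = KMVSketch(k=1024)
    for i in range(20_000):
        sketch.update(f"user-{i % 5_000}")  # 5,000 distinct users, each seen 4 times
    print(round(sketch.estimate()))
```

With k = 1024, the relative standard error is roughly 1/sqrt(k - 2), about 3%. The printed value should therefore land close to 5,000. Memory stays bounded by k no matter how many items are seen, which is the property that makes this family of sketches attractive for distinct counts.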
**@mayanks:** Pinot already supports HLL and TDigest based percentiles. If
there's a specific case where you would find DataSketch based implementations
more useful, we can definitely explore that. If so, would recommend filing an
issue for that.
**@mayanks:** For HLL we use
`com.clearspring.analytics.stream.cardinality.HyperLogLog`
**@mayanks:** And for TDigest, we use `com.tdunning.math.stats.TDigest`
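The core HyperLogLog estimator can be sketched in a few lines of Python for comparison with the theta approach. It covers the register update, the harmonic-mean estimate, and the small-range linear-counting correction from the original HLL paper. This is illustrative only and is not the clearspring `HyperLogLog` class. The SHA-1 hash, the name `ToyHyperLogLog`, and the default precision `p = 12` are assumptions.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Minimal HyperLogLog with 2**p registers (hypothetical helper, not clearspring's API)."""

    def __init__(self, p: int = 12):
        if p < 7:
            raise ValueError("this sketch assumes p >= 7 (m >= 128) for its alpha constant")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        x = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        width = 64 - self.p
        idx = x >> width                  # first p bits select the register
        rest = x & ((1 << width) - 1)     # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest` (width + 1 if rest == 0).
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: use linear counting over empty registers.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = ToyHyperLogLog(p=12)
    for i in range(50_000):
        hll.add(f"id-{i}")
    print(round(hll.estimate()))
```

With p = 12 (4,096 registers), the expected relative error is about 1.04 / sqrt(4096), roughly 1.6%. Unlike the KMV sketch, HLL keeps only small per-register counters, which is why it is so compact.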
**@tangyonga:** Thanks for quick reply!
**@mayanks:** :+1:
**@tangyonga:** @mayanks we may need to pay attention to KLL sketch vs.
t-digest (Pinot's implementation); see the following comparison by
DataSketches:
**@mayanks:** Thanks for sharing @tangyonga. We can definitely explore adding
these if needed.
**@tangyonga:** Appendix: HLL @mayanks
**@tangyonga:** Also note that DataSketches includes a newer sketch, described
as "Estimating Stream Cardinalities more efficiently than the famous HLL
sketch", which is from
**@mayanks:** If you could open an issue and add all this there, it would help
us track this request @tangyonga
**@tangyonga:** I will try to open an issue to discuss the sketch family
@mayanks
**@mayanks:** Thanks @tangyonga.
**@sosyalmedya.oguzhan:** Hello, does Pinot support upsert for offline tables,
or only for real-time tables? For example, when late data arrives after the
real-time segment is flushed, can Pinot update it?
**@mayanks:** @sosyalmedya.oguzhan At the moment the support is for real-time
only. However, Pinot segments don’t need to be time partitioned, so late
arriving data is not an issue cc: @yupeng
**@yupeng:** Yes, upsert is for real-time only. For offline tables, you can do
the compaction in the segment creation job.
**@yupeng:** But offline upsert support is on the roadmap for the coming
months.
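Compaction in the segment creation job amounts to deduplicating rows by primary key and keeping the latest version of each key before segments are built. Here is a minimal Python sketch of that dedup step. The function name `compact_latest` and the field names `id` (primary key) and `ts` (event time) are hypothetical. This is not a Pinot API; it only illustrates the logic that would run before segment generation.

```python
from typing import Any, Dict, Iterable, List


def compact_latest(rows: Iterable[Dict[str, Any]],
                   key_field: str = "id",
                   time_field: str = "ts") -> List[Dict[str, Any]]:
    """Keep only the newest row per primary key (hypothetical helper and field names).

    On equal timestamps the row that appears later in the input wins,
    i.e. "last write wins" for late-arriving updates.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        key = row[key_field]
        current = latest.get(key)
        if current is None or row[time_field] >= current[time_field]:
            latest[key] = row
    # Sort by key so the resulting segment contents are deterministic.
    return sorted(latest.values(), key=lambda r: r[key_field])


if __name__ == "__main__":
    rows = [
        {"id": "a", "ts": 1, "value": 10},
        {"id": "b", "ts": 2, "value": 20},
        {"id": "a", "ts": 5, "value": 11},  # late-arriving update for "a"
        {"id": "b", "ts": 1, "value": 19},  # stale record for "b"
    ]
    print(compact_latest(rows))
    # [{'id': 'a', 'ts': 5, 'value': 11}, {'id': 'b', 'ts': 2, 'value': 20}]
```

Comparing on the event timestamp rather than arrival order means a stale record that arrives late (like the second "b" row) does not overwrite a newer one.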
**@john:** @john has joined the channel
**@egala:** @egala has joined the channel
**@myeole:** Hello, do we have any Pinot benchmarks we can refer to?
**@g.kishore:** We have some that we did at LinkedIn, and Confluera recently
published some numbers.
**@myeole:** Thanks
**@g.kishore:** we always suggest doing the benchmark for your use case and
with your data
**@g.kishore:** You can also see the indexing techniques in Pinot and some
performance numbers on 5 years of GitHub data.
**@myeole:** Sure Thanks
**@zjinwei:** Hi, is it possible to monitor Pinot DB metrics with Wavefront
instead of Prometheus and Grafana? Are there any docs I can refer to? Thanks
**@g.kishore:** All metrics are emitted via JMX. I am not familiar with
Wavefront... Does Wavefront have a JMX exporter?
**@zjinwei:** Hi Kishore, thanks for replying. I found something in Wavefront
about JMX integration; I think it might work. Just curious: is there any way
we can achieve a native integration using Dropwizard?
### _#random_
**@john:** @john has joined the channel
**@egala:** @egala has joined the channel
### _#troubleshooting_
**@john:** @john has joined the channel
**@egala:** @egala has joined the channel