Posted to dev@druid.apache.org by Samarth Jain <sa...@apache.org> on 2019/03/19 22:54:36 UTC

T-Digest backed sketch aggregator

Hi,

The T-Digest data structure (https://github.com/tdunning/t-digest) is another
way of computing sketches, rank-based statistics, and trimmed means over
numeric data. At my day job, we have been using a T-Digest-backed Druid
aggregator module, which has generally been working well for the use
cases of the respective teams. I think it would be valuable to have
T-Digest-backed aggregators in Druid alongside other sketch algorithms such
as moments and Yahoo quantile sketches.
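
As a point of reference, here is a minimal Java sketch of the exact
statistics that a T-Digest approximates: a quantile and a trimmed mean
computed over fully sorted data. It does not use the t-digest library, and
the names (RankStats, quantile, trimmedMean) are made up for illustration.
A sketch exists to estimate these values without keeping and sorting every
input value.

```java
import java.util.Arrays;

public class RankStats {
    // Exact quantile for q in [0, 1], interpolating linearly between
    // the two nearest order statistics of an already sorted array.
    public static double quantile(double[] sorted, double q) {
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    // Mean after dropping the `trim` smallest and `trim` largest values.
    public static double trimmedMean(double[] sorted, int trim) {
        return Arrays.stream(sorted, trim, sorted.length - trim)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Hypothetical sample with one large outlier.
        double[] values = {12, 3, 7, 1000, 5, 9, 4, 8, 6, 10};
        double[] sorted = values.clone();
        Arrays.sort(sorted);

        System.out.println("median       = " + quantile(sorted, 0.5));
        System.out.println("trimmed mean = " + trimmedMean(sorted, 1));
    }
}
```

The outlier pulls the plain mean far from the bulk of the data, while the
median and the trimmed mean stay close to it, which is why rank-based
statistics are attractive for metrics such as latencies.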

T-Digest has also been adopted by other projects including:

Elasticsearch -
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-percentile-aggregation.html#search-aggregations-metrics-percentile-aggregation-approximation

stream-lib -
https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/TDigest.java

Apache Mahout -
https://archive.cloudera.com/cdh5/cdh/5/mahout/mahout-math/org/apache/mahout/math/stats/TDigest.html

I have been working on cleaning up the module and improving its performance,
and would like to contribute it. I would like to hear what the community
thinks about it.

Thanks,
Samarth

Re: T-Digest backed sketch aggregator

Posted by Samarth Jain <sa...@apache.org>.
Thanks for the replies, Jihoon and Gian. I have created a proposal -
https://github.com/apache/incubator-druid/issues/7303.

On Tue, Mar 19, 2019 at 5:29 PM Gian Merlino <gi...@apache.org> wrote:

> (The template is on
> https://github.com/apache/incubator-druid/issues/new/choose)
>
> It sounds cool to me too!
>
> On Tue, Mar 19, 2019 at 5:19 PM Jihoon Son <gh...@gmail.com> wrote:
>
> > Sounds great!
> > Would you mind writing a proposal about this?
> >
> > Jihoon

Re: T-Digest backed sketch aggregator

Posted by Gian Merlino <gi...@apache.org>.
(The template is on
https://github.com/apache/incubator-druid/issues/new/choose)

It sounds cool to me too!

On Tue, Mar 19, 2019 at 5:19 PM Jihoon Son <gh...@gmail.com> wrote:

> Sounds great!
> Would you mind writing a proposal about this?
>
> Jihoon

Re: T-Digest backed sketch aggregator

Posted by Jihoon Son <gh...@gmail.com>.
Sounds great!
Would you mind writing a proposal about this?

Jihoon
