Posted to dev@druid.apache.org by Charles Allen <ch...@snap.com.INVALID> on 2019/02/25 17:56:44 UTC

Datasketches

There are a lot of scattered discussions on how to handle sketching /
HLL / histograms / other stats, and it is getting hard to keep track of
them all.

In addition, it looks like DataSketches is at the incubation proposal stage
for Apache:
http://mail-archives.apache.org/mod_mbox/incubator-general/201902.mbox/%3CCA%2BUaPnt%3DUvbLr_v-4%2BYbAmHsAM-GqQG%2Bb%3DgOw3BL3Cemj%2BOwSA%40mail.gmail.com%3E


I think it is important enough and widespread enough to merit top-level
consideration within the Druid project: either a label or a "GitHub
project" or something similar, so that things can be tracked more easily.

Anyone have any opinions or desires here?

Thanks,
Charles Allen

Re: Datasketches

Posted by Julian Hyde <jh...@apache.org>.
I don’t know how a project can formally track another project, but individuals certainly can.

If any Druid committers are ASF members, they could volunteer to help as mentors of the DataSketches podling.

If any Druid committers are past or current contributors to DataSketches, they could ask to be put on the initial contributors list.

And any of us could join the podling’s dev list, monitor it to see if anything is of interest to Druid, and report back on this list. The same works in the other direction: telling the DataSketches community what’s going on in Druid.

Lastly, if you find DataSketches interesting, just offer to help. I’m sure there’s plenty to do, and they would love your help. Your experience with Druid and the ASF incubation process will be useful to them.

Julian






Re: Datasketches

Posted by Roman Leventov <le...@gmail.com>.
There is also an important sub-project in DataSketches, Memory (currently
https://github.com/DataSketches/memory), which originated from this issue:
https://github.com/apache/incubator-druid/issues/3892. There is a plan to
eventually move Druid from ByteBuffer to Memory, at least in some parts.

On Mon, 25 Feb 2019 at 18:04, Charles Allen <ch...@snap.com.invalid>
wrote:

> Basically there are a LOT of issues and PRs that show up when searching for
> datasketches in the druid PR list:
>
> https://github.com/apache/incubator-druid/pulls?utf8=%E2%9C%93&q=datasketches
>
>
> Maybe just have a label called
>
> Area - Sketches
>
> ?
>

Re: Datasketches

Posted by Charles Allen <ch...@snap.com.INVALID>.
Basically, there are a LOT of issues and PRs that show up when searching
for "datasketches" in the Druid PR list:
https://github.com/apache/incubator-druid/pulls?utf8=%E2%9C%93&q=datasketches


Maybe just have a label called

Area - Sketches

?


On Mon, Feb 25, 2019 at 11:01 AM Gian Merlino <gi...@apache.org> wrote:

> What scope would you suggest for the label or github project?
>
> There seem to be discussions going on around making DataSketches HLL and/or
> Quantiles more 'default' options for their respective areas -- are you
> thinking that kind of thing?

Re: Datasketches

Posted by Gian Merlino <gi...@apache.org>.
What scope would you suggest for the label or GitHub project?

There seem to be discussions going on around making DataSketches HLL and/or
Quantiles more 'default' options for their respective areas -- are you
thinking of that kind of thing?
