Posted to users@datasketches.apache.org by liupeng_wx <li...@qq.com> on 2022/01/07 08:04:38 UTC

Frequent Distinct Tuples Sketch

Hi all,

I have a question about the Frequent Distinct Tuples (FDT) sketch. Given a
multiset of tuples with N dimensions {d1, d2, d3, …, dN}, the FDT sketch can
group by any subset of the dimensions and approximate the distinct count of
the remaining dimensions taken together. For example: select approximate
group by (d1, d2), count distinct {d3, ..., dN} from sketches group by
(d1, d2).

Is there a way to group by any subset of the dimensions and count distinct
over any subset of the remaining dimensions? For example: select approximate
group by (d1, d2), count distinct d3 from sketches group by (d1, d2).
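
To make the two queries concrete, here is a small exact (non-sketch)
reference in Python. It only illustrates the intended semantics. The function
name `count_distinct_by_group` and the sample rows are made up for this
example and are not part of the DataSketches API, and nothing here is
approximate.

```python
from collections import defaultdict


def count_distinct_by_group(rows, group_idx, distinct_idx):
    """Exact reference: for each combination of the group_idx dimensions,
    count the distinct combinations of the distinct_idx dimensions."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[i] for i in group_idx)
        seen[key].add(tuple(row[i] for i in distinct_idx))
    return {key: len(values) for key, values in seen.items()}


# Hypothetical 4-dimensional tuples (d1, d2, d3, d4).
rows = [
    ("a", "x", "1", "p"),
    ("a", "x", "1", "q"),
    ("a", "x", "2", "p"),
    ("b", "y", "1", "p"),
    ("b", "y", "1", "p"),  # duplicate tuple, contributes only once
]

# Query 1: group by (d1, d2), count distinct over all remaining dimensions.
all_remaining = count_distinct_by_group(rows, (0, 1), (2, 3))

# Query 2: group by (d1, d2), count distinct over d3 only.
d3_only = count_distinct_by_group(rows, (0, 1), (2,))
```

On this data the two queries differ for the group (a, x): it has three
distinct (d3, d4) pairs but only two distinct d3 values.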

Re: Frequent Distinct Tuples Sketch

Posted by Ben Krug <be...@imply.io>.
Apologies - I mixed up which user group I was reading.  I thought I was in
the Druid users group, and distracted attention from the original question.

Apologies again,

Ben

On Wed, Jan 12, 2022 at 11:09 PM leerho <le...@gmail.com> wrote:

> I'd have to think about it more.  But the FDT sketch was put in the
> library as an example.  With tuple sketches you would have to write the
> code that encapsulates the tuple summary cells to do what you want and then
> extend the summary aggregator to do the proper merge operations.   So in a
> sense the generic Tuple sketch provides the underlying machinery to handle
> the approximate sampling and then you fill in the generic tuples with the
> proper aggregator (perhaps another sketch) and set the union aggregator
> function to properly feed those functions.  So many such ideas are
> potentially feasible, it just might require some serious thought about what
> exactly you want it to do, and then code it and try it.
> Cheers,
> Lee.

Re: Frequent Distinct Tuples Sketch

Posted by leerho <le...@gmail.com>.
I'd have to think about it more, but the FDT sketch was put in the library
as an example.  With Tuple sketches you would have to write the code that
encapsulates the tuple summary cells to do what you want, and then extend
the summary aggregator to do the proper merge operations.  In a sense, the
generic Tuple sketch provides the underlying machinery to handle the
approximate sampling; you then fill in the generic tuples with the proper
aggregator (perhaps another sketch) and set the union aggregator function to
properly feed those functions.  Many such ideas are potentially feasible; it
just might require some serious thought about what exactly you want it to
do, and then coding it and trying it.
Cheers,
Lee.
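
The structure described above can be pictured with a toy model in Python:
each retained key, here (d1, d2), owns a summary cell, and a union merges the
cells of matching keys. This is only a sketch under stated assumptions. The
class names `DistinctSummary` and `ToyTupleSketch` are hypothetical and are
not the DataSketches API, an exact set stands in for the nested sketch, and
every key is kept rather than a bounded sample.

```python
class DistinctSummary:
    """Per-key summary cell. A real cell might wrap another sketch;
    an exact set stands in for it here (assumption)."""

    def __init__(self, values=()):
        self.values = set(values)

    def update(self, value):
        self.values.add(value)

    def merge(self, other):
        # Return a new cell so neither input is modified.
        return DistinctSummary(self.values | other.values)

    def estimate(self):
        return len(self.values)


class ToyTupleSketch:
    """Stand-in for a generic tuple sketch: maps each key to a summary.
    Unlike a real sketch, it keeps every key instead of a sample."""

    def __init__(self):
        self.cells = {}

    def update(self, key, value):
        self.cells.setdefault(key, DistinctSummary()).update(value)

    def union(self, other):
        result = ToyTupleSketch()
        for sketch in (self, other):
            for key, summary in sketch.cells.items():
                existing = result.cells.get(key, DistinctSummary())
                result.cells[key] = existing.merge(summary)
        return result

    def distinct_counts(self):
        return {key: cell.estimate() for key, cell in self.cells.items()}


# Two partial inputs keyed by (d1, d2), each tracking distinct d3 values.
left, right = ToyTupleSketch(), ToyTupleSketch()
for d1, d2, d3 in [("a", "x", "1"), ("a", "x", "2")]:
    left.update((d1, d2), d3)
for d1, d2, d3 in [("a", "x", "2"), ("a", "x", "3"), ("b", "y", "1")]:
    right.update((d1, d2), d3)

counts = left.union(right).distinct_counts()
```

The point of the design is that the merge logic lives in the summary cell, so
replacing the exact set with a mergeable distinct-count sketch would leave
the outer structure unchanged.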

Re: Frequent Distinct Tuples Sketch

Posted by leerho <le...@gmail.com>.
Not directly.  But the FDT sketch is really pretty simple to code yourself,
and is in the library primarily as an example.
Nonetheless, one of the reasons that only a few of our sketches have been
adapted for Druid is that Druid requires that all sketches be capable of
operating off-heap, which is much more complex, especially for generic
sketches.  Although the FDT sketch is simple in concept, making it work in
the Druid environment will be a challenge.
I'd be interested in hearing any thoughts you might have.
Lee.

On Fri, Jan 7, 2022 at 12:54 PM Ben Krug <be...@imply.io> wrote:

> Does druid support FDT sketches?  The datasketch module docs don't list
> it.
> https://druid.apache.org/docs/latest/development/extensions-core/datasketches-extension.html

Re: Frequent Distinct Tuples Sketch

Posted by Ben Krug <be...@imply.io>.
Does Druid support FDT sketches?  The DataSketches extension docs don't list
it.
https://druid.apache.org/docs/latest/development/extensions-core/datasketches-extension.html
