Posted to user@drill.apache.org by Ted Dunning <te...@gmail.com> on 2016/05/02 02:22:11 UTC

Re: Probabilistic data structures in Drill

Drill doesn't use any such data structures in itself. The emphasis has been
on being correct first with the option of introducing approximations later.

That said, you can definitely define aggregators yourself. Last I checked,
however, user-defined aggregators are single-level. That means that
everything that gets aggregated has to go through a single function, which
definitely limits scalability. This was several months ago, though, so
things may have improved by now.

Perhaps somebody can comment on whether multi-level user-defined
aggregators are possible?
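
To illustrate the single-level versus multi-level distinction discussed
above, here is a minimal Python sketch. It is not Drill's UDAF API; the
function names and the sample data are hypothetical and exist only for
illustration. The single-level version pushes every row through one
function, while the two-level version computes a partial state per
fragment and then merges the small partial states.

```python
from functools import reduce


def single_level_sum_count(rows):
    """Single-level: one function consumes every row."""
    total, count = 0, 0
    for value in rows:
        total += value
        count += 1
    return total, count


def partial_sum_count(fragment):
    """Level 1: runs independently on each fragment of the data."""
    return sum(fragment), len(fragment)


def merge_sum_count(left, right):
    """Level 2: combines two partial states without touching raw rows."""
    return left[0] + right[0], left[1] + right[1]


def two_level_sum_count(fragments):
    # Each partial state could be computed on a different node.
    partials = [partial_sum_count(f) for f in fragments]
    return reduce(merge_sum_count, partials, (0, 0))


# Hypothetical data, split across three fragments.
fragments = [[1, 2, 3], [4, 5], [6]]
all_rows = [v for f in fragments for v in f]

single = single_level_sum_count(all_rows)
two_level = two_level_sum_count(fragments)
print(single, two_level)
```

Both paths give the same (sum, count) pair. The difference is that only
the merge step in the two-level version has to run in one place, and it
sees one small state per fragment rather than every row.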

On Sat, Apr 30, 2016 at 8:32 AM, Edmon Begoli <eb...@gmail.com> wrote:

> Is Drill using any of the probabilistic data structures [1], and if so -
> which ones and how?
>
> Thank you,
> Edmon
>
> 1. Probabilistic Data Structures -
> https://en.m.wikipedia.org/wiki/Category:Probabilistic_data_structures
>

Re: Probabilistic data structures in Drill

Posted by Sudheesh Katkam <sk...@maprtech.com>.

There is a pending pull request [1] to support table statistics. Among other things, it uses HyperLogLog to estimate the number of distinct values. I do not know further details.

Thank you,
Sudheesh

[1] https://github.com/apache/drill/pull/425
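
For readers unfamiliar with HyperLogLog, the following is a minimal Python
sketch of the general technique for estimating the number of distinct
values. It is not the code from the pull request, whose details are not
described in this thread. The class name, the choice of SHA-1 as the hash,
and the register count (2^12) are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog; assumes p >= 7 so the alpha formula applies."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p              # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash value
        index = h >> (64 - self.p)                  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank is the position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Combine two sketches built with the same p: register-wise max."""
        self.registers = [max(a, b) for a, b in
                          zip(self.registers, other.registers)]

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


sketch = HyperLogLog()
for i in range(10000):
    sketch.add(i)
    sketch.add(i)  # duplicates leave the registers unchanged
estimate = sketch.estimate()

# Two sketches over disjoint halves merge into the same state.
left, right = HyperLogLog(), HyperLogLog()
for i in range(5000):
    left.add(i)
for i in range(5000, 10000):
    right.add(i)
left.merge(right)
print(round(estimate))
```

With 2^12 registers the expected relative error is on the order of a few
percent. Because sketches merge by taking the register-wise maximum, the
structure fits a partial-then-merge aggregation naturally.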

> On May 1, 2016, at 7:26 PM, Edmon Begoli <eb...@gmail.com> wrote:
> 
> Yes, I am preparing a research seminar, and I am doing a survey of the uses
> of probabilistic and synopsis data structures in post-Hadoop "Big Data"
> technologies.
> 
> On Sun, May 1, 2016 at 8:34 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> Drill also makes use of hash tables and hash partitioning.
>> 
>> I’m not sure what was the purpose of your question. Are you carrying out a
>> survey?
>> 
>> Julian
>> 
>> 
>>> On May 1, 2016, at 5:22 PM, Ted Dunning <te...@gmail.com> wrote:
>>> 
>>> Drill doesn't use any such data structures in itself. The emphasis has been
>>> on being correct first with the option of introducing approximations later.
>>> 
>>> That said, you can definitely define aggregators yourself. Last I checked,
>>> however, user defined aggregators are single level ... that means that
>>> everything that gets aggregated has to go through a single function which
>>> definitely limits scalability. This was several months ago, though, so
>>> things may have improved by now.
>>> 
>>> Perhaps somebody can comment on whether multi-level user-defined
>>> aggregators are possible?
>>> 
>>> 
>>> 
>>> On Sat, Apr 30, 2016 at 8:32 AM, Edmon Begoli <eb...@gmail.com> wrote:
>>> 
>>>> Is Drill using any of the probabilistic data structures [1], and if so -
>>>> which ones and how?
>>>> 
>>>> Thank you,
>>>> Edmon
>>>> 
>>>> 1. Probabilistic Data Structures -
>>>> https://en.m.wikipedia.org/wiki/Category:Probabilistic_data_structures
>>>> 
>> 
>>

Re: Probabilistic data structures in Drill

Posted by Edmon Begoli <eb...@gmail.com>.

Yes, I am preparing a research seminar, and I am doing a survey of the uses
of probabilistic and synopsis data structures in post-Hadoop "Big Data"
technologies.

On Sun, May 1, 2016 at 8:34 PM, Julian Hyde <jh...@apache.org> wrote:

> Drill also makes use of hash tables and hash partitioning.
>
> I’m not sure what was the purpose of your question. Are you carrying out a
> survey?
>
> Julian
>
>
> > On May 1, 2016, at 5:22 PM, Ted Dunning <te...@gmail.com> wrote:
> >
> > Drill doesn't use any such data structures in itself. The emphasis has been
> > on being correct first with the option of introducing approximations later.
> >
> > That said, you can definitely define aggregators yourself. Last I checked,
> > however, user defined aggregators are single level ... that means that
> > everything that gets aggregated has to go through a single function which
> > definitely limits scalability. This was several months ago, though, so
> > things may have improved by now.
> >
> > Perhaps somebody can comment on whether multi-level user-defined
> > aggregators are possible?
> >
> >
> >
> > On Sat, Apr 30, 2016 at 8:32 AM, Edmon Begoli <eb...@gmail.com> wrote:
> >
> >> Is Drill using any of the probabilistic data structures [1], and if so -
> >> which ones and how?
> >>
> >> Thank you,
> >> Edmon
> >>
> >> 1. Probabilistic Data Structures -
> >> https://en.m.wikipedia.org/wiki/Category:Probabilistic_data_structures
> >>
>
>

Re: Probabilistic data structures in Drill

Posted by Julian Hyde <jh...@apache.org>.

Drill also makes use of hash tables and hash partitioning.

I’m not sure what the purpose of your question was. Are you carrying out a survey?

Julian


> On May 1, 2016, at 5:22 PM, Ted Dunning <te...@gmail.com> wrote:
> 
> Drill doesn't use any such data structures in itself. The emphasis has been
> on being correct first with the option of introducing approximations later.
> 
> That said, you can definitely define aggregators yourself. Last I checked,
> however, user defined aggregators are single level ... that means that
> everything that gets aggregated has to go through a single function which
> definitely limits scalability. This was several months ago, though, so
> things may have improved by now.
> 
> Perhaps somebody can comment on whether multi-level user-defined
> aggregators are possible?
> 
> 
> 
> On Sat, Apr 30, 2016 at 8:32 AM, Edmon Begoli <eb...@gmail.com> wrote:
> 
>> Is Drill using any of the probabilistic data structures [1], and if so -
>> which ones and how?
>> 
>> Thank you,
>> Edmon
>> 
>> 1. Probabilistic Data Structures -
>> https://en.m.wikipedia.org/wiki/Category:Probabilistic_data_structures
>>