Posted to dev@drill.apache.org by Michael Hausenblas <mi...@gmail.com> on 2012/09/05 09:40:35 UTC

Market watch/design input

All,

Just stumbled upon 'Processing a Trillion Cells per Mouse Click' [1] by the Google research folks (ignoring the name, PowerDrill, for now): though in-memory dependent, this might be interesting for us?

Cheers,
	   Michael

[1] http://vldb.org/pvldb/vol5/p1436_alexanderhall_vldb2012.pdf

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/


Re: Market watch/design input

Posted by Tomer Shiran <ts...@maprtech.com>.
Yep, I agree. We should be able to support the PowerDrill format by
introducing a "PowerDrill scan operator".

On Wed, Sep 5, 2012 at 12:54 AM, karthik tunga <ka...@gmail.com> wrote:

> Hi,
>
> I think this was already mentioned as part of the initial reading links.
> It's quite an interesting paper.
> The storage format (dictionary encoding) differs from Dremel's. The
> distributed execution is similar to Dremel's.
>
> Cheers,
> Karthik

-- 
Tomer Shiran
Director of Product Management | MapR Technologies | 650-804-8657

Re: Market watch/design input

Posted by Jason Frantz <jf...@maprtech.com>.
Right. The difference between the two papers is that Dremel has an efficient
encoding for a disk-resident dataset, whereas PowerDrill has an efficient
encoding for a memory-resident dataset.

On Thu, Sep 6, 2012 at 2:22 PM, karthik tunga <ka...@gmail.com> wrote:

> If we assume the data fits in memory, couldn't Dremel do the same thing?

Re: Market watch/design input

Posted by karthik tunga <ka...@gmail.com>.
If we assume the data fits in memory, couldn't Dremel do the same thing?

On 5 September 2012 12:58, Ted Dunning <te...@gmail.com> wrote:

> Excellent idea.  As Tomer indicates, with a flexible scan operator, this
> should be quite doable.

Re: Market watch/design input

Posted by Ted Dunning <te...@gmail.com>.
Excellent idea.  As Tomer indicates, with a flexible scan operator, this
should be quite doable.

On Wed, Sep 5, 2012 at 8:32 AM, Michael Hausenblas <
michael.hausenblas@gmail.com> wrote:

> > Yes it is easy.  It indicates what you can do if you have stuff in memory
> > and have a good index.
>
> Thanks, understood. However, I'd like to come back to my initial message
> and make a more explicit proposal: 'Include PowerDrill design and insights
> as an input into our design'.
>

Re: Market watch/design input

Posted by Michael Hausenblas <mi...@gmail.com>.
Ted,

> Yes it is easy.  It indicates what you can do if you have stuff in memory
> and have a good index.

Thanks, understood. However, I'd like to come back to my initial message and make a more explicit proposal: 'Include PowerDrill design and insights as an input into our design'.

Since you and the other initial committers are currently the only ones who *can* push to the Git repo, I'd also like to learn how the process works: is it correct to assume that one comes up with a proposal here on the list and, after discussion (and a positive resolution), one of the committers updates the respective resources in the repo?

Cheers,
	   Michael

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/


Re: Market watch/design input

Posted by Ted Dunning <te...@gmail.com>.
Yes it is easy.  It indicates what you can do if you have stuff in memory
and have a good index.

On Wed, Sep 5, 2012 at 1:18 AM, Michael Hausenblas <
michael.hausenblas@gmail.com> wrote:

>
> > I think this was already mentioned as part of the initial reading links.
> > It's quite an interesting paper.
>
>
> Hmmm. If you're referring to the 'Drill reading links' thread [1], I
> didn't find it listed there.
>
> FYI: I stumbled upon it via G+ today [2], and the earliest mention I was
> able to track down is a Wired post from late August [3].
>
> Cheers,
>            Michael
>
> [1]
> http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201208.mbox/%3CCAGzKHweziabN9yhsCFUMVaAn-GL3dGauCXh6zb6kg_Xwg3xNnw%40mail.gmail.com%3E
> [2] https://plus.google.com/u/0/+ResearchatGoogle/posts/UaDPdYu2q1u
> [3]
> http://www.wired.com/wiredenterprise/2012/08/google-trillion-pieces-of-data/
>
> --
> Michael Hausenblas
> Ireland, Europe
> http://mhausenblas.info/

Re: Market watch/design input

Posted by Michael Hausenblas <mi...@gmail.com>.
> I think this was already mentioned as part of the initial reading links.
> It's quite an interesting paper.


Hmmm. If you're referring to the 'Drill reading links' thread [1], I didn't find it listed there.

FYI: I stumbled upon it via G+ today [2], and the earliest mention I was able to track down is a Wired post from late August [3].

Cheers,
	   Michael

[1] http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201208.mbox/%3CCAGzKHweziabN9yhsCFUMVaAn-GL3dGauCXh6zb6kg_Xwg3xNnw%40mail.gmail.com%3E
[2] https://plus.google.com/u/0/+ResearchatGoogle/posts/UaDPdYu2q1u
[3] http://www.wired.com/wiredenterprise/2012/08/google-trillion-pieces-of-data/

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/

On 5 Sep 2012, at 08:54, karthik tunga wrote:

> Hi,
> 
> I think this was already mentioned as part of the initial reading links.
> It's quite an interesting paper.
> The storage format (dictionary encoding) differs from Dremel's. The
> distributed execution is similar to Dremel's.
> 
> Cheers,
> Karthik

Re: Market watch/design input

Posted by karthik tunga <ka...@gmail.com>.
Hi,

I think this was already mentioned as part of the initial reading links.
It's quite an interesting paper.
The storage format (dictionary encoding) differs from Dremel's. The
distributed execution is similar to Dremel's.
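To make "dictionary encoding" concrete, here is a minimal sketch in Python, assuming one sorted dictionary per column. Each distinct value is stored once and every row holds only a small integer id, so an operation such as a group-by count can run over a dense array indexed by id instead of comparing strings. This is a simplification for illustration, not the exact layout described in the PowerDrill paper, and the function names (dictionary_encode, decode, count_by_value) are hypothetical, not part of Drill.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


def dictionary_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Encode a column as (sorted dictionary of distinct values, per-row ids)."""
    dictionary = sorted(set(values))
    # Because the dictionary is sorted, a value's id is its position,
    # found here by binary search.
    ids = [bisect_left(dictionary, v) for v in values]
    return dictionary, ids


def decode(dictionary: Sequence[str], ids: Sequence[int]) -> List[str]:
    """Recover the original column values from their ids."""
    return [dictionary[i] for i in ids]


def count_by_value(dictionary: Sequence[str], ids: Sequence[int]) -> Dict[str, int]:
    """GROUP BY value, COUNT(*): count ids in a dense array, then map back to values."""
    counts = [0] * len(dictionary)
    for i in ids:
        counts[i] += 1
    return {dictionary[i]: c for i, c in enumerate(counts)}


if __name__ == "__main__":
    column = ["de", "us", "de", "fr", "us", "de"]
    dictionary, ids = dictionary_encode(column)
    print(dictionary)                       # ['de', 'fr', 'us']
    print(ids)                              # [0, 2, 0, 1, 2, 0]
    print(count_by_value(dictionary, ids))  # {'de': 3, 'fr': 1, 'us': 2}
```

Keeping the dictionary sorted is a deliberate choice in this sketch: ids then preserve value order, so a range predicate on values could be rewritten as a range on ids.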

Cheers,
Karthik

On 5 September 2012 03:40, Michael Hausenblas
<mi...@gmail.com> wrote:

>
> All,
>
> Just stumbled upon 'Processing a Trillion Cells per Mouse Click' [1] by
> the Google research folks (ignoring the name, PowerDrill, for now): though
> in-memory dependent, this might be interesting for us?
>
> Cheers,
>            Michael
>
> [1] http://vldb.org/pvldb/vol5/p1436_alexanderhall_vldb2012.pdf
>
> --
> Michael Hausenblas
> Ireland, Europe
> http://mhausenblas.info/
>
>