Posted to dev@drill.apache.org by David Alves <da...@gmail.com> on 2013/04/06 00:31:35 UTC

timeline for dist execution

Hi All
	
	I was wondering if there is a timeline for when we might get a sketch of the dist execution engine.
	As I mentioned before, I have a little over a month to get something working, and I'm starting to get a bit worried.
	I've been working on the parallel per-region HBase scanner, and soon I'll have something usable.
	I can definitely put in a few hours working/helping on it if that helps, but as previously suggested I'd rather not reinvent the wheel.
	Right now I was thinking that something that plugs in to the reference implementation (i.e. would not require a stable SE iface) would be a nice start.
	What do you think?

Best
David

Re: timeline for dist execution

Posted by Lisen Mu <im...@gmail.com>.

+1, great news, many thanks for the effort!




On Sun, Apr 7, 2013 at 1:45 PM, Timothy Chen <tn...@gmail.com> wrote:

> Sounds good, would love to join the session to help things move forward as
> well.
>
> Tim
>
>
> On Sat, Apr 6, 2013 at 9:36 PM, Jacques Nadeau <jacques.drill@gmail.com
> >wrote:
>
> > I'll try to drop some of my work and thoughts on the list this week.
> > As always with these things, everything takes longer than one would
> > like...
> >
> > I am also thinking that it might be good to do a google hangout
> > brainstorming session soon around some of this stuff to help move
> > things along.
> >
> > J
> >
> > On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
> >
> > > +1
> > >
> > > It would be nice to see what's the current status and future plan on
> > in-mem
> > > data representation in the dist exec engine.
> > >
> > > I was previously going to do something about DataValue in exec/ref.
> > However
> > > after some reading into previous discussions in the maillist and some
> > links
> > > in 'useful research' wiki page
> > > (vldb09-tutorial6.pdf<http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
> >
> > > abadisigmod06.pdf<
> > http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
> > > etc.)
> > >
> > > I found it non-trivial and crucial building block to build in-mem data
> > > structure. Incremental optimisation based on current DataValue seems a
> > bad
> > > idea.
> > >
> > > So what's your thought on this? If we could get a sketch, I would very
> > much
> > > like to do something on this issue.
> > >
> > >
> > > On Sat, Apr 6, 2013 at 6:31 AM, David Alves <da...@gmail.com>
> > wrote:
> > >
> > >> Hi All
> > >>
> > >>        I was wondering if there is a timeline on when we might get a
> > >> sketch of the dist execution engine.
> > >>        As I mentioned before I have a little over a month to get
> > >> something working and I'm starting to get a bit worried.
> > >>        I've been working in the parallel per region hbase scanner and
> > >> soon I'll have something usable.
> > >>        I can definitely put in a few hours working/helping on it if
> that
> > >> helps, but as previously suggested I'd rather not reinvent the wheel.
> > >>        Right now I was thinking that something that plugs-in to the
> > >> reference implementation (i.e. would not require a stable SE iface)
> > would
> > >> be a nice start.
> > >>        What do you think?
> > >>
> > >> Best
> > >> David
> > >>
> > >>
> > >>
> > >>
> >
>

Re: timeline for dist execution

Posted by Timothy Chen <tn...@gmail.com>.

Sounds good, would love to join the session to help things move forward as
well.

Tim


On Sat, Apr 6, 2013 at 9:36 PM, Jacques Nadeau <ja...@gmail.com> wrote:

> I'll try to drop some of my work and thoughts on the list this week.
> As always with these things, everything takes longer than one would
> like...
>
> I am also thinking that it might be good to do a google hangout
> brainstorming session soon around some of this stuff to help move
> things along.
>
> J
>
> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
>
> > +1
> >
> > It would be nice to see what's the current status and future plan on
> in-mem
> > data representation in the dist exec engine.
> >
> > I was previously going to do something about DataValue in exec/ref.
> However
> > after some reading into previous discussions in the maillist and some
> links
> > in 'useful research' wiki page
> > (vldb09-tutorial6.pdf<http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf>
> > abadisigmod06.pdf<
> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
> > etc.)
> >
> > I found it non-trivial and crucial building block to build in-mem data
> > structure. Incremental optimisation based on current DataValue seems a
> bad
> > idea.
> >
> > So what's your thought on this? If we could get a sketch, I would very
> much
> > like to do something on this issue.
> >
> >
> > On Sat, Apr 6, 2013 at 6:31 AM, David Alves <da...@gmail.com>
> wrote:
> >
> >> Hi All
> >>
> >>        I was wondering if there is a timeline on when we might get a
> >> sketch of the dist execution engine.
> >>        As I mentioned before I have a little over a month to get
> >> something working and I'm starting to get a bit worried.
> >>        I've been working in the parallel per region hbase scanner and
> >> soon I'll have something usable.
> >>        I can definitely put in a few hours working/helping on it if that
> >> helps, but as previously suggested I'd rather not reinvent the wheel.
> >>        Right now I was thinking that something that plugs-in to the
> >> reference implementation (i.e. would not require a stable SE iface)
> would
> >> be a nice start.
> >>        What do you think?
> >>
> >> Best
> >> David
> >>
> >>
> >>
> >>
>

Re: timeline for dist execution

Posted by Timothy Chen <tn...@gmail.com>.

I've used Guice in the past as well. I found it to be pretty fast, and it makes injection fairly simple. One thing I like a lot is that you can define a custom injection scope: for the lifetime of that scope, every injection of a given type resolves to the same object.

I wouldn't say it has a small learning curve, though, as the setup can be somewhat complex IMHO, but that might be just me.
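
To make the scope idea concrete, here is a rough hand-rolled sketch of "one instance per type for the lifetime of a scope". It deliberately does not use Guice's own Scope API; SimpleScope and QueryContext are hypothetical names invented for illustration, and a real Guice setup would look different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustration only: a hand-rolled stand-in for a DI scope, NOT Guice's API.
public class ScopeSketch {

    // Hypothetical scope: caches one instance per type until exit() is called.
    static final class SimpleScope {
        private final Map<Class<?>, Object> instances = new HashMap<>();

        // Returns the cached instance for this type, creating it on first use.
        <T> T get(Class<T> type, Supplier<T> factory) {
            Object existing = instances.get(type);
            if (existing == null) {
                existing = factory.get();
                instances.put(type, existing);
            }
            return type.cast(existing);
        }

        // Ends the scope: later lookups create fresh instances.
        void exit() {
            instances.clear();
        }
    }

    // Hypothetical object that should be shared within one scope.
    static final class QueryContext {}

    public static void main(String[] args) {
        SimpleScope scope = new SimpleScope();
        QueryContext a = scope.get(QueryContext.class, QueryContext::new);
        QueryContext b = scope.get(QueryContext.class, QueryContext::new);
        System.out.println(a == b); // same scope, same object

        scope.exit();
        QueryContext c = scope.get(QueryContext.class, QueryContext::new);
        System.out.println(a == c); // scope was exited, so a new object
    }
}
```

The setup cost I mentioned is mostly in deciding where a scope is entered and exited; the lookup itself is this simple.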

Tim

On Apr 13, 2013, at 8:07 PM, Jacques Nadeau <ja...@apache.org> wrote:

> I've not worked with Guice before.  However, I've found some DI systems
> finicky, hard to debug and slow in the past.  Can you pick a place where
> you think it make life easier/better and do a little example to share with
> the list?
> 
> Let's discuss the other items on our hangout on Tuesday.  I've tried to
> invite the list so anybody should be able to join if they'd like.
> 
> I've also pushed my WIP stuff to the Apache repo so that it is more
> visible.  It is in a development branched called execwork:
> https://github.com/apache/incubator-drill/tree/execwork
> 
> Thanks,
> Jacques
> 
> On Sat, Apr 13, 2013 at 4:28 PM, David Alves <da...@gmail.com> wrote:
> 
>> Hi Jacques
>> 
>>        Thank you for posting your wip and for what seems like a huge
>> effort.
>>        Some thoughts:
>> 
>>        I think we might finally be at a place where we could
>> divide-and-conquer and break that into a series of reviewable and testable
>> patches .
>>        I can see the following, at least:
>>        - rpc/serialization stuff
>>        - server runtime and boot/shutdown scripts
>>        - in memory data structures
>>        - cluster mgmt
>>        - distributed and local physical ops
>>        - schema related stuff
>>        - query distribution and coordination
>> 
>>        I'm happy to start the break down/test effort and to update/create
>> jira's as necessary.
>>        Also this might be a good time to start a discussion on
>> modularization.
>>        It seems that currently drill is using a mix of programmatic
>> implementation loading and ad-hoc classpath scanning (which might become an
>> issue in security managed jvm's if drill is used as library instead of as a
>> runtime).
>>        Because drill will have a lot of pluggable components at all sorts
>> of levels (SE's, query executors, data formats etc), IMO we could consider
>> moving to an externally maintained module system.
>>        I've used guice in huge modularized projects with relative
>> success. I like it due to its small dependency footprint, no-xml'ness, use
>> of javax.inject classes (meaning usually no need to reference guice
>> directly in the classes) and relatively small learning curve.
>>        I really like the config format that you've chosen and we could
>> use that to load configure modules.
>>        What do you think?
>> 
>> Best
>> David
>> 
>> On Apr 13, 2013, at 5:18 AM, Jacques Nadeau <ja...@apache.org> wrote:
>> 
>>> You can check out some of what I've been working on my GitHub at
>>> https://github.com/jacques-n/incubator-drill/tree/execwork
>>> 
>>> Key concepts are:
>>> 1) The primary in-memory data structure is a RecordBatch that contains
>> one
>>> or more fields.  Each of these fields holds a vector of values with the
>>> goal that each batch fits within a single core's L2 cache.  The
>> VectorValue
>>> structures are envisioned to be language agnostic and are backed by
>>> Netty4's ByteBuf abstraction.  These Vector formats will be strongly
>>> documented and not java centric so that moving back and forth between the
>>> native layer is reasonable.  The thinking is that there will be two
>>> additional direct compression interfaces for RLE and Dict for specialized
>>> operators who don't need fully decompressed data.  This provides a
>>> compromise between excess overhead due to compression-aware operators and
>>> losing out on any compression-aware benefits.  As you can see,
>> ValueVectors
>>> include Required (subclasses of FixedValueVector and VariableVector),
>>> nullable (a.k.a optional) and I'll be adding a Dremel-esque nested
>> repeated
>>> value set of vectors.
>>> 
>>> 2) The ByteBuf interface is also used for the protobuf based Bit2Bit and
>>> User2Bit communication. The key being that these are a push/pull combined
>>> interface to allow streaming responses and also allow direct transfer of
>>> ByteBuf's without serialization and deserialization or excessive copies.
>>> (And JNI interchange with minimal overhead.)
>>> 
>>> 3) As mentioned previously on the list, the initial ClusterCoordinator is
>>> utilizing Zk/Curator.  I've also added a quick integration with Hazelcast
>>> to manage things like the per-node queue depth for distributed scheduling
>>> purposes.  This may be a bit heavy but should get us to functional
>> faster.
>>> 
>>> This is heavily WIP so many things are staged but not connected yet.
>>> Things are broken.  And there are no tests.  But hopefully it will give
>>> you a sense of the direction I've been headed.
>>> 
>>> I'm hoping to add some more things to this over the weekend and then we
>> can
>>> go through things on Tuesday.
>>> 
>>> Thanks,
>>> Jacques
>>> 
>>> 
>>> 
>>> On Fri, Apr 12, 2013 at 12:56 PM, David Alves <da...@gmail.com>
>> wrote:
>>> 
>>>> Hi Jacques
>>>> 
>>>>       sounds good!
>>>>       will you still be able to post a link to your wip dist exec stuff
>>>> before the weekend?
>>>>       really anxious to tinker with it.
>>>> 
>>>> Best
>>>> David
>>>> 
>>>> On Apr 12, 2013, at 12:24 PM, Jacques Nadeau <ja...@apache.org>
>> wrote:
>>>> 
>>>>> Looks like most people can meet at 9am PST on Tuesday.   Let's meet
>> then.
>>>>> 
>>>>> J
>>>>> 
>>>>> On Mon, Apr 8, 2013 at 2:17 PM, Ted Dunning <te...@gmail.com>
>>>> wrote:
>>>>> 
>>>>>> Great idea.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Apr 8, 2013 at 2:14 PM, David Alves <da...@gmail.com>
>>>> wrote:
>>>>>> 
>>>>>>> Hi All
>>>>>>> 
>>>>>>>      I took the liberty of creating a doodle for the hangout to
>>>>>>> (hopefully) make it easier to select a time suitable for everyone.
>>>>>>>      The link is: http://www.doodle.com/t9b5n455utkpebi3
>>>>>>> 
>>>>>>> Best
>>>>>>> David Alves
>>>>>>> 
>>>>>>> On Apr 8, 2013, at 1:13 PM, Timothy Chen <tn...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.
>>>>>>>> 
>>>>>>>> Tim
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <ja...@apache.org>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Given David's request to have everybody review whatever I share,
>>>> let's
>>>>>>> do
>>>>>>>>> M/T/W of next week..  What times are people available?
>>>>>>>>> 
>>>>>>>>> J
>>>>>>>>> 
>>>>>>>>> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com>
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> I'm open 2pm pst, see when Jacques is open.
>>>>>>>>>> 
>>>>>>>>>> Tim
>>>>>>>>>> 
>>>>>>>>>> Sent from my iPad
>>>>>>>>>> 
>>>>>>>>>> On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com>
>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Jacques
>>>>>>>>>>> 
>>>>>>>>>>>> I'll try to drop some of my work and thoughts on the list this
>>>>>> week.
>>>>>>>>>>> 
>>>>>>>>>>> That is great news!
>>>>>>>>>>> 
>>>>>>>>>>>> As always with these things, everything takes longer than one
>>>> would
>>>>>>>>>>>> like…
>>>>>>>>>>> Hopefully we can help and take of the workload.
>>>>>>>>>>> 
>>>>>>>>>>>> I am also thinking that it might be good to do a google hangout
>>>>>>>>>>>> brainstorming session soon around some of this stuff to help
>> move
>>>>>>>>>>>> things along.
>>>>>>>>>>> 
>>>>>>>>>>> A google hangout is a good idea.
>>>>>>>>>>> Wednesday would be a good day for me, say 2PM PST, how about for
>>>>>>>>>> other people?
>>>>>>>>>>> I do think we should do it after we get a chance to take a look
>> at
>>>>>>>>>> what you already have so that we're all in the same page.
>>>>>>>>>>> 
>>>>>>>>>>> Best
>>>>>>>>>>> David
>>>>>>>>>>> 
>>>>>>>>>>> On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <
>>>>>> jacques.drill@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I'll try to drop some of my work and thoughts on the list this
>>>>>> week.
>>>>>>>>>>>> As always with these things, everything takes longer than one
>>>> would
>>>>>>>>>>>> like...
>>>>>>>>>>>> 
>>>>>>>>>>>> I am also thinking that it might be good to do a google hangout
>>>>>>>>>>>> brainstorming session soon around some of this stuff to help
>> move
>>>>>>>>>>>> things along.
>>>>>>>>>>>> 
>>>>>>>>>>>> J
>>>>>>>>>>>> 
>>>>>>>>>>>> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> +1
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It would be nice to see what's the current status and future
>> plan
>>>>>> on
>>>>>>>>>> in-mem
>>>>>>>>>>>>> data representation in the dist exec engine.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I was previously going to do something about DataValue in
>>>>>> exec/ref.
>>>>>>>>>> However
>>>>>>>>>>>>> after some reading into previous discussions in the maillist
>> and
>>>>>>> some
>>>>>>>>>> links
>>>>>>>>>>>>> in 'useful research' wiki page
>>>>>>>>>>>>> (vldb09-tutorial6.pdf<
>>>>>>>>> http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
>>>>>>>>>>> 
>>>>>>>>>>>>> abadisigmod06.pdf<
>>>>>>>>>> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
>>>>>>>>>>>>> etc.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I found it non-trivial and crucial building block to build
>> in-mem
>>>>>>>>> data
>>>>>>>>>>>>> structure. Incremental optimisation based on current DataValue
>>>>>> seems
>>>>>>>>> a
>>>>>>>>>> bad
>>>>>>>>>>>>> idea.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So what's your thought on this? If we could get a sketch, I
>> would
>>>>>>>>> very
>>>>>>>>>> much
>>>>>>>>>>>>> like to do something on this issue.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <
>>>>>> davidralves@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi All
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>  I was wondering if there is a timeline on when we might get
>> a
>>>>>>>>>>>>>> sketch of the dist execution engine.
>>>>>>>>>>>>>>  As I mentioned before I have a little over a month to get
>>>>>>>>>>>>>> something working and I'm starting to get a bit worried.
>>>>>>>>>>>>>>  I've been working in the parallel per region hbase scanner
>>>>>> and
>>>>>>>>>>>>>> soon I'll have something usable.
>>>>>>>>>>>>>>  I can definitely put in a few hours working/helping on it if
>>>>>>>>> that
>>>>>>>>>>>>>> helps, but as previously suggested I'd rather not reinvent the
>>>>>>>>> wheel.
>>>>>>>>>>>>>>  Right now I was thinking that something that plugs-in to the
>>>>>>>>>>>>>> reference implementation (i.e. would not require a stable SE
>>>>>> iface)
>>>>>>>>>> would
>>>>>>>>>>>>>> be a nice start.
>>>>>>>>>>>>>>  What do you think?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>> David
>> 
>>

Re: timeline for dist execution

Posted by Jacques Nadeau <ja...@apache.org>.

I've not worked with Guice before.  However, I've found some DI systems
finicky, hard to debug and slow in the past.  Can you pick a place where
you think it makes life easier/better and do a little example to share with
the list?

Let's discuss the other items on our hangout on Tuesday.  I've tried to
invite the list so anybody should be able to join if they'd like.

I've also pushed my WIP stuff to the Apache repo so that it is more
visible.  It is in a development branch called execwork:
https://github.com/apache/incubator-drill/tree/execwork

Thanks,
Jacques

On Sat, Apr 13, 2013 at 4:28 PM, David Alves <da...@gmail.com> wrote:

> Hi Jacques
>
>         Thank you for posting your wip and for what seems like a huge
> effort.
>         Some thoughts:
>
>         I think we might finally be at a place where we could
> divide-and-conquer and break that into a series of reviewable and testable
> patches .
>         I can see the following, at least:
>         - rpc/serialization stuff
>         - server runtime and boot/shutdown scripts
>         - in memory data structures
>         - cluster mgmt
>         - distributed and local physical ops
>         - schema related stuff
>         - query distribution and coordination
>
>         I'm happy to start the break down/test effort and to update/create
> jira's as necessary.
>         Also this might be a good time to start a discussion on
> modularization.
>         It seems that currently drill is using a mix of programmatic
> implementation loading and ad-hoc classpath scanning (which might become an
> issue in security managed jvm's if drill is used as library instead of as a
> runtime).
>         Because drill will have a lot of pluggable components at all sorts
> of levels (SE's, query executors, data formats etc), IMO we could consider
> moving to an externally maintained module system.
>         I've used guice in huge modularized projects with relative
> success. I like it due to its small dependency footprint, no-xml'ness, use
> of javax.inject classes (meaning usually no need to reference guice
> directly in the classes) and relatively small learning curve.
>         I really like the config format that you've chosen and we could
> use that to load configure modules.
>         What do you think?
>
> Best
> David
>
> On Apr 13, 2013, at 5:18 AM, Jacques Nadeau <ja...@apache.org> wrote:
>
> > You can check out some of what I've been working on my GitHub at
> > https://github.com/jacques-n/incubator-drill/tree/execwork
> >
> > Key concepts are:
> > 1) The primary in-memory data structure is a RecordBatch that contains
> one
> > or more fields.  Each of these fields holds a vector of values with the
> > goal that each batch fits within a single core's L2 cache.  The
> VectorValue
> > structures are envisioned to be language agnostic and are backed by
> > Netty4's ByteBuf abstraction.  These Vector formats will be strongly
> > documented and not java centric so that moving back and forth between the
> > native layer is reasonable.  The thinking is that there will be two
> > additional direct compression interfaces for RLE and Dict for specialized
> > operators who don't need fully decompressed data.  This provides a
> > compromise between excess overhead due to compression-aware operators and
> > losing out on any compression-aware benefits.  As you can see,
> ValueVectors
> > include Required (subclasses of FixedValueVector and VariableVector),
> > nullable (a.k.a optional) and I'll be adding a Dremel-esque nested
> repeated
> > value set of vectors.
> >
> > 2) The ByteBuf interface is also used for the protobuf based Bit2Bit and
> > User2Bit communication. The key being that these are a push/pull combined
> > interface to allow streaming responses and also allow direct transfer of
> > ByteBuf's without serialization and deserialization or excessive copies.
> > (And JNI interchange with minimal overhead.)
> >
> > 3) As mentioned previously on the list, the initial ClusterCoordinator is
> > utilizing Zk/Curator.  I've also added a quick integration with Hazelcast
> > to manage things like the per-node queue depth for distributed scheduling
> > purposes.  This may be a bit heavy but should get us to functional
> faster.
> >
> > This is heavily WIP so many things are staged but not connected yet.
> > Things are broken.  And there are no tests.  But hopefully it will give
> > you a sense of the direction I've been headed.
> >
> > I'm hoping to add some more things to this over the weekend and then we
> can
> > go through things on Tuesday.
> >
> > Thanks,
> > Jacques
> >
> >
> >
> > On Fri, Apr 12, 2013 at 12:56 PM, David Alves <da...@gmail.com>
> wrote:
> >
> >> Hi Jacques
> >>
> >>        sounds good!
> >>        will you still be able to post a link to your wip dist exec stuff
> >> before the weekend?
> >>        really anxious to tinker with it.
> >>
> >> Best
> >> David
> >>
> >> On Apr 12, 2013, at 12:24 PM, Jacques Nadeau <ja...@apache.org>
> wrote:
> >>
> >>> Looks like most people can meet at 9am PST on Tuesday.   Let's meet
> then.
> >>>
> >>> J
> >>>
> >>> On Mon, Apr 8, 2013 at 2:17 PM, Ted Dunning <te...@gmail.com>
> >> wrote:
> >>>
> >>>> Great idea.
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Apr 8, 2013 at 2:14 PM, David Alves <da...@gmail.com>
> >> wrote:
> >>>>
> >>>>> Hi All
> >>>>>
> >>>>>       I took the liberty of creating a doodle for the hangout to
> >>>>> (hopefully) make it easier to select a time suitable for everyone.
> >>>>>       The link is: http://www.doodle.com/t9b5n455utkpebi3
> >>>>>
> >>>>> Best
> >>>>> David Alves
> >>>>>
> >>>>> On Apr 8, 2013, at 1:13 PM, Timothy Chen <tn...@gmail.com> wrote:
> >>>>>
> >>>>>> I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.
> >>>>>>
> >>>>>> Tim
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <ja...@apache.org>
> >>>>> wrote:
> >>>>>>
> >>>>>>> Given David's request to have everybody review whatever I share,
> >> let's
> >>>>> do
> >>>>>>> M/T/W of next week..  What times are people available?
> >>>>>>>
> >>>>>>> J
> >>>>>>>
> >>>>>>> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>>> I'm open 2pm pst, see when Jacques is open.
> >>>>>>>>
> >>>>>>>> Tim
> >>>>>>>>
> >>>>>>>> Sent from my iPad
> >>>>>>>>
> >>>>>>>> On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Jacques
> >>>>>>>>>
> >>>>>>>>>> I'll try to drop some of my work and thoughts on the list this
> >>>> week.
> >>>>>>>>>
> >>>>>>>>> That is great news!
> >>>>>>>>>
> >>>>>>>>>> As always with these things, everything takes longer than one
> >> would
> >>>>>>>>>> like…
> >>>>>>>>> Hopefully we can help and take of the workload.
> >>>>>>>>>
> >>>>>>>>>> I am also thinking that it might be good to do a google hangout
> >>>>>>>>>> brainstorming session soon around some of this stuff to help
> move
> >>>>>>>>>> things along.
> >>>>>>>>>
> >>>>>>>>> A google hangout is a good idea.
> >>>>>>>>> Wednesday would be a good day for me, say 2PM PST, how about for
> >>>>>>>> other people?
> >>>>>>>>> I do think we should do it after we get a chance to take a look
> at
> >>>>>>>> what you already have so that we're all in the same page.
> >>>>>>>>>
> >>>>>>>>> Best
> >>>>>>>>> David
> >>>>>>>>>
> >>>>>>>>> On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <
> >>>> jacques.drill@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> I'll try to drop some of my work and thoughts on the list this
> >>>> week.
> >>>>>>>>>> As always with these things, everything takes longer than one
> >> would
> >>>>>>>>>> like...
> >>>>>>>>>>
> >>>>>>>>>> I am also thinking that it might be good to do a google hangout
> >>>>>>>>>> brainstorming session soon around some of this stuff to help
> move
> >>>>>>>>>> things along.
> >>>>>>>>>>
> >>>>>>>>>> J
> >>>>>>>>>>
> >>>>>>>>>> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> +1
> >>>>>>>>>>>
> >>>>>>>>>>> It would be nice to see what's the current status and future
> plan
> >>>> on
> >>>>>>>> in-mem
> >>>>>>>>>>> data representation in the dist exec engine.
> >>>>>>>>>>>
> >>>>>>>>>>> I was previously going to do something about DataValue in
> >>>> exec/ref.
> >>>>>>>> However
> >>>>>>>>>>> after some reading into previous discussions in the maillist
> and
> >>>>> some
> >>>>>>>> links
> >>>>>>>>>>> in 'useful research' wiki page
> >>>>>>>>>>> (vldb09-tutorial6.pdf<
> >>>>>>> http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
> >>>>>>>>>
> >>>>>>>>>>> abadisigmod06.pdf<
> >>>>>>>> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
> >>>>>>>>>>> etc.)
> >>>>>>>>>>>
> >>>>>>>>>>> I found it non-trivial and crucial building block to build
> in-mem
> >>>>>>> data
> >>>>>>>>>>> structure. Incremental optimisation based on current DataValue
> >>>> seems
> >>>>>>> a
> >>>>>>>> bad
> >>>>>>>>>>> idea.
> >>>>>>>>>>>
> >>>>>>>>>>> So what's your thought on this? If we could get a sketch, I
> would
> >>>>>>> very
> >>>>>>>> much
> >>>>>>>>>>> like to do something on this issue.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <
> >>>> davidralves@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi All
> >>>>>>>>>>>>
> >>>>>>>>>>>>   I was wondering if there is a timeline on when we might get
> a
> >>>>>>>>>>>> sketch of the dist execution engine.
> >>>>>>>>>>>>   As I mentioned before I have a little over a month to get
> >>>>>>>>>>>> something working and I'm starting to get a bit worried.
> >>>>>>>>>>>>   I've been working in the parallel per region hbase scanner
> >>>> and
> >>>>>>>>>>>> soon I'll have something usable.
> >>>>>>>>>>>>   I can definitely put in a few hours working/helping on it if
> >>>>>>> that
> >>>>>>>>>>>> helps, but as previously suggested I'd rather not reinvent the
> >>>>>>> wheel.
> >>>>>>>>>>>>   Right now I was thinking that something that plugs-in to the
> >>>>>>>>>>>> reference implementation (i.e. would not require a stable SE
> >>>> iface)
> >>>>>>>> would
> >>>>>>>>>>>> be a nice start.
> >>>>>>>>>>>>   What do you think?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best
> >>>>>>>>>>>> David
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >>
>
>

Re: timeline for dist execution

Posted by David Alves <da...@gmail.com>.

Hi Jacques

	Thank you for posting your wip and for what seems like a huge effort.
	Some thoughts:	

	I think we might finally be at a place where we could divide and conquer, breaking that into a series of reviewable and testable patches.
	I can see the following, at least:
	- rpc/serialization stuff
	- server runtime and boot/shutdown scripts
	- in memory data structures
	- cluster mgmt
	- distributed and local physical ops
	- schema related stuff
	- query distribution and coordination
	
	I'm happy to start the breakdown/test effort and to update/create JIRAs as necessary.
	Also, this might be a good time to start a discussion on modularization.
	It seems that Drill currently uses a mix of programmatic implementation loading and ad-hoc classpath scanning (which might become an issue in security-managed JVMs if Drill is used as a library instead of as a runtime).
	Because Drill will have a lot of pluggable components at all sorts of levels (SEs, query executors, data formats, etc.), IMO we could consider moving to an externally maintained module system.
	I've used Guice in huge modularized projects with relative success. I like it for its small dependency footprint, its lack of XML, its use of javax.inject classes (meaning there is usually no need to reference Guice directly in the classes), and its relatively small learning curve.
	I really like the config format that you've chosen, and we could use that to load and configure modules.
	What do you think?
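
	As a rough illustration of the idea (not Guice itself, and not code from the execwork branch), the sketch below shows a component that depends only on an interface, with the implementation chosen in one place. All names here (StorageEngine, HBaseEngine, ScanOperator, Bindings) are hypothetical; with Guice, the Bindings class would be replaced by a module and the constructor would carry a javax.inject annotation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustration only: a tiny hand-rolled binding table standing in for a DI module.
public class ModuleSketch {

    // Hypothetical pluggable component type.
    interface StorageEngine {
        String name();
    }

    // Hypothetical implementation that a module would select.
    static final class HBaseEngine implements StorageEngine {
        public String name() {
            return "hbase";
        }
    }

    // Depends on the interface only; the implementation arrives via the constructor.
    static final class ScanOperator {
        private final StorageEngine engine;

        ScanOperator(StorageEngine engine) {
            this.engine = engine;
        }

        String describe() {
            return "scan over " + engine.name();
        }
    }

    // Maps an interface to a provider of its implementation.
    static final class Bindings {
        private final Map<Class<?>, Supplier<?>> providers = new HashMap<>();

        <T> Bindings bind(Class<T> type, Supplier<? extends T> provider) {
            providers.put(type, provider);
            return this;
        }

        <T> T get(Class<T> type) {
            Supplier<?> provider = providers.get(type);
            if (provider == null) {
                throw new IllegalStateException("no binding for " + type.getName());
            }
            return type.cast(provider.get());
        }
    }

    public static void main(String[] args) {
        // The only place that names the concrete implementation.
        Bindings bindings = new Bindings().bind(StorageEngine.class, HBaseEngine::new);
        ScanOperator op = new ScanOperator(bindings.get(StorageEngine.class));
        System.out.println(op.describe()); // prints "scan over hbase"
    }
}
```

	The point is that swapping the storage engine touches only the binding line, which is the part that could be driven from the config file.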

Best
David	
	
On Apr 13, 2013, at 5:18 AM, Jacques Nadeau <ja...@apache.org> wrote:

> You can check out some of what I've been working on my GitHub at
> https://github.com/jacques-n/incubator-drill/tree/execwork
> 
> Key concepts are:
> 1) The primary in-memory data structure is a RecordBatch that contains one
> or more fields.  Each of these fields holds a vector of values with the
> goal that each batch fits within a single core's L2 cache.  The VectorValue
> structures are envisioned to be language agnostic and are backed by
> Netty4's ByteBuf abstraction.  These Vector formats will be strongly
> documented and not java centric so that moving back and forth between the
> native layer is reasonable.  The thinking is that there will be two
> additional direct compression interfaces for RLE and Dict for specialized
> operators who don't need fully decompressed data.  This provides a
> compromise between excess overhead due to compression-aware operators and
> losing out on any compression-aware benefits.  As you can see, ValueVectors
> include Required (subclasses of FixedValueVector and VariableVector),
> nullable (a.k.a optional) and I'll be adding a Dremel-esque nested repeated
> value set of vectors.
> 
> 2) The ByteBuf interface is also used for the protobuf based Bit2Bit and
> User2Bit communication. The key being that these are a push/pull combined
> interface to allow streaming responses and also allow direct transfer of
> ByteBuf's without serialization and deserialization or excessive copies.
> (And JNI interchange with minimal overhead.)
> 
> 3) As mentioned previously on the list, the initial ClusterCoordinator is
> utilizing Zk/Curator.  I've also added a quick integration with Hazelcast
> to manage things like the per-node queue depth for distributed scheduling
> purposes.  This may be a bit heavy but should get us to functional faster.
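
To check my reading of point 1, here is a minimal sketch of a nullable fixed-width vector: a flat value buffer plus a validity bitmap, both off-heap. It is an assumption-laden illustration, not the execwork code: it uses java.nio.ByteBuffer as a stand-in for Netty's ByteBuf, and the class name NullableIntVector and its layout (4 bytes per value, 1 validity bit per slot) are invented for this example.

```java
import java.nio.ByteBuffer;

// Illustration only: java.nio.ByteBuffer stands in for Netty's ByteBuf here.
public class NullableIntVectorSketch {

    // Hypothetical nullable vector of 32-bit ints with a validity bitmap.
    static final class NullableIntVector {
        private final int capacity;
        private final ByteBuffer values;   // 4 bytes per slot
        private final ByteBuffer validity; // 1 bit per slot, 1 = value present

        NullableIntVector(int capacity) {
            this.capacity = capacity;
            // Direct buffers are zero-filled, so every slot starts out null.
            this.values = ByteBuffer.allocateDirect(capacity * 4);
            this.validity = ByteBuffer.allocateDirect((capacity + 7) / 8);
        }

        void set(int index, int value) {
            checkIndex(index);
            values.putInt(index * 4, value);
            int byteIndex = index >>> 3;
            int bit = 1 << (index & 7);
            validity.put(byteIndex, (byte) (validity.get(byteIndex) | bit));
        }

        boolean isNull(int index) {
            checkIndex(index);
            return (validity.get(index >>> 3) & (1 << (index & 7))) == 0;
        }

        int get(int index) {
            if (isNull(index)) {
                throw new IllegalStateException("value at " + index + " is null");
            }
            return values.getInt(index * 4);
        }

        private void checkIndex(int index) {
            if (index < 0 || index >= capacity) {
                throw new IndexOutOfBoundsException("index " + index);
            }
        }
    }

    public static void main(String[] args) {
        NullableIntVector vector = new NullableIntVector(8);
        vector.set(0, 42);
        vector.set(2, 7);
        System.out.println(vector.isNull(1)); // slot 1 was never set
        System.out.println(vector.get(2));
    }
}
```

Because both buffers are plain bytes with a fixed layout, the same memory could in principle be read from native code, which I take to be the motivation for keeping the format language agnostic.
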
> 
> This is heavily WIP so many things are staged but not connected yet.
> Things are broken.  And there are no tests.  But hopefully it will give
> you a sense of the direction I've been headed.
> 
> I'm hoping to add some more things to this over the weekend and then we can
> go through things on Tuesday.
> 
> Thanks,
> Jacques
> 
> 
> 
> On Fri, Apr 12, 2013 at 12:56 PM, David Alves <da...@gmail.com> wrote:
> 
>> Hi Jacques
>> 
>>        sounds good!
>>        will you still be able to post a link to your wip dist exec stuff
>> before the weekend?
>>        really anxious to tinker with it.
>> 
>> Best
>> David
>> 
>> On Apr 12, 2013, at 12:24 PM, Jacques Nadeau <ja...@apache.org> wrote:
>> 
>>> Looks like most people can meet at 9am PST on Tuesday.   Let's meet then.
>>> 
>>> J
>>> 
>>> On Mon, Apr 8, 2013 at 2:17 PM, Ted Dunning <te...@gmail.com>
>> wrote:
>>> 
>>>> Great idea.
>>>> 
>>>> 
>>>> 
>>>> On Mon, Apr 8, 2013 at 2:14 PM, David Alves <da...@gmail.com>
>> wrote:
>>>> 
>>>>> Hi All
>>>>> 
>>>>>       I took the liberty of creating a doodle for the hangout to
>>>>> (hopefully) make it easier to select a time suitable for everyone.
>>>>>       The link is: http://www.doodle.com/t9b5n455utkpebi3
>>>>> 
>>>>> Best
>>>>> David Alves
>>>>> 
>>>>> On Apr 8, 2013, at 1:13 PM, Timothy Chen <tn...@gmail.com> wrote:
>>>>> 
>>>>>> I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.
>>>>>> 
>>>>>> Tim
>>>>>> 
>>>>>> 
>>>>>> On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <ja...@apache.org>
>>>>> wrote:
>>>>>> 
>>>>>>> Given David's request to have everybody review whatever I share,
>> let's
>>>>> do
>>>>>>> M/T/W of next week..  What times are people available?
>>>>>>> 
>>>>>>> J
>>>>>>> 
>>>>>>> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com>
>>>>> wrote:
>>>>>>> 
>>>>>>>> I'm open 2pm pst, see when Jacques is open.
>>>>>>>> 
>>>>>>>> Tim
>>>>>>>> 
>>>>>>>> Sent from my iPad
>>>>>>>> 
>>>>>>>> On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com>
>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Jacques
>>>>>>>>> 
>>>>>>>>>> I'll try to drop some of my work and thoughts on the list this
>>>> week.
>>>>>>>>> 
>>>>>>>>> That is great news!
>>>>>>>>> 
>>>>>>>>>> As always with these things, everything takes longer than one
>> would
>>>>>>>>>> like…
>>>>>>>>> Hopefully we can help and take of the workload.
>>>>>>>>> 
>>>>>>>>>> I am also thinking that it might be good to do a google hangout
>>>>>>>>>> brainstorming session soon around some of this stuff to help move
>>>>>>>>>> things along.
>>>>>>>>> 
>>>>>>>>> A google hangout is a good idea.
>>>>>>>>> Wednesday would be a good day for me, say 2PM PST, how about for
>>>>>>>> other people?
>>>>>>>>> I do think we should do it after we get a chance to take a look at
>>>>>>>> what you already have so that we're all in the same page.
>>>>>>>>> 
>>>>>>>>> Best
>>>>>>>>> David
>>>>>>>>> 
>>>>>>>>> On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <
>>>> jacques.drill@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> I'll try to drop some of my work and thoughts on the list this
>>>> week.
>>>>>>>>>> As always with these things, everything takes longer than one
>> would
>>>>>>>>>> like...
>>>>>>>>>> 
>>>>>>>>>> I am also thinking that it might be good to do a google hangout
>>>>>>>>>> brainstorming session soon around some of this stuff to help move
>>>>>>>>>> things along.
>>>>>>>>>> 
>>>>>>>>>> J
>>>>>>>>>> 
>>>>>>>>>> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> +1
>>>>>>>>>>> 
>>>>>>>>>>> It would be nice to see what's the current status and future plan
>>>> on
>>>>>>>> in-mem
>>>>>>>>>>> data representation in the dist exec engine.
>>>>>>>>>>> 
>>>>>>>>>>> I was previously going to do something about DataValue in
>>>> exec/ref.
>>>>>>>> However
>>>>>>>>>>> after some reading into previous discussions in the maillist and
>>>>> some
>>>>>>>> links
>>>>>>>>>>> in 'useful research' wiki page
>>>>>>>>>>> (vldb09-tutorial6.pdf<
>>>>>>> http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
>>>>>>>>> 
>>>>>>>>>>> abadisigmod06.pdf<
>>>>>>>> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
>>>>>>>>>>> etc.)
>>>>>>>>>>> 
>>>>>>>>>>> I found it non-trivial and crucial building block to build in-mem
>>>>>>> data
>>>>>>>>>>> structure. Incremental optimisation based on current DataValue
>>>> seems
>>>>>>> a
>>>>>>>> bad
>>>>>>>>>>> idea.
>>>>>>>>>>> 
>>>>>>>>>>> So what's your thought on this? If we could get a sketch, I would
>>>>>>> very
>>>>>>>> much
>>>>>>>>>>> like to do something on this issue.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <
>>>> davidralves@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi All
>>>>>>>>>>>> 
>>>>>>>>>>>>   I was wondering if there is a timeline on when we might get a
>>>>>>>>>>>> sketch of the dist execution engine.
>>>>>>>>>>>>   As I mentioned before I have a little over a month to get
>>>>>>>>>>>> something working and I'm starting to get a bit worried.
>>>>>>>>>>>>   I've been working in the parallel per region hbase scanner
>>>> and
>>>>>>>>>>>> soon I'll have something usable.
>>>>>>>>>>>>   I can definitely put in a few hours working/helping on it if
>>>>>>> that
>>>>>>>>>>>> helps, but as previously suggested I'd rather not reinvent the
>>>>>>> wheel.
>>>>>>>>>>>>   Right now I was thinking that something that plugs-in to the
>>>>>>>>>>>> reference implementation (i.e. would not require a stable SE
>>>> iface)
>>>>>>>> would
>>>>>>>>>>>> be a nice start.
>>>>>>>>>>>>   What do you think?
>>>>>>>>>>>> 
>>>>>>>>>>>> Best
>>>>>>>>>>>> David
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>>

Re: timeline for dist execution

Posted by Lisen Mu <im...@gmail.com>.

Again, it's a really great effort. Really looking forward to your next push.


Thanks,

Lisen


On Mon, Apr 15, 2013 at 6:22 PM, Lisen Mu <im...@gmail.com> wrote:

> Jaques,
>
> That's really great effort!
>
> > about jni
> * Is there some kind of architecture map showing which part might be
> replaced by native implementation and how?
> * According to the name 'java-exec' I guess it will be fully functional
> with/without native code, is that correct?
>
> > about vectors
> * Is each ValueVector implementation corresponding to a data type in the
> data schema? If so, will there be 2 compressed version for each ValueVector
> implementation?
> Or, is each ValueVector implementation corresponding to a kind of internal
> representation of data? If so, I can for example use BitVector for vector
> encoding for string/int type fields, or I can use it in invisible join/semi
> join?
> * RSE would produce compressed data directly, is that correct? If so, does
> RSE need to advertise compression capability, or does RecordBatch
> self-describe compression info?
> * Not so importantly, is there any chance to make compression algorithms
> pluggable? If someone wish to add some kind of compression, it's his/hers
> duty to add new tests to verify all OPs functional.
>
> > about local exec engine (Drillbit? )
> * It will use LOP and POP in prototype/common, right? so POP
> implementations in prototype/common will have dependency on java-exec?
> * How will LOP/POP translation differ from exec/ref?
> * Is there any chance to make  pluggable? some thing like
>
>   LogicalPlan LogicalOptimizer.optimize(LogicalPlan lp);
>
> Thus I can use some scenario-oriented optimizations in external optimizer
> before every bit of Drill is well optimized first, and make it into POP if
> it can be used in a generalized way. In this way we can run/test drill
> based system on our production data set earlier.
>
> > about maven
> * Changes on dependency is expected due to heavy WIP. which is the easiest
> way to keep dependencies consistant with pom.xml? Parquet for example,
> maybe ORC in the future? We solve this by our LAN apache archiva server.
> But maybe a public available archiva server?
>
>
> On Sat, Apr 13, 2013 at 6:18 PM, Jacques Nadeau <ja...@apache.org>wrote:
>
>> You can check out some of what I've been working on my GitHub at
>> https://github.com/jacques-n/incubator-drill/tree/execwork
>>
>> Key concepts are:
>> 1) The primary in-memory data structure is a RecordBatch that contains one
>> or more fields.  Each of these fields holds a vector of values with the
>> goal that each batch fits within a single core's L2 cache.  The
>> VectorValue
>> structures are envisioned to be language agnostic and are backed by
>> Netty4's ByteBuf abstraction.  These Vector formats will be strongly
>> documented and not java centric so that moving back and forth between the
>> native layer is reasonable.  The thinking is that there will be two
>> additional direct compression interfaces for RLE and Dict for specialized
>> operators who don't need fully decompressed data.  This provides a
>> compromise between excess overhead due to compression-aware operators and
>> losing out on any compression-aware benefits.  As you can see,
>> ValueVectors
>> include Required (subclasses of FixedValueVector and VariableVector),
>> nullable (a.k.a optional) and I'll be adding a Dremel-esque nested
>> repeated
>> value set of vectors.
>>
>> 2) The ByteBuf interface is also used for the protobuf based Bit2Bit and
>> User2Bit communication. The key being that these are a push/pull combined
>> interface to allow streaming responses and also allow direct transfer of
>> ByteBuf's without serialization and deserialization or excessive copies.
>>  (And JNI interchange with minimal overhead.)
>>
>> 3) As mentioned previously on the list, the initial ClusterCoordinator is
>> utilizing Zk/Curator.  I've also added a quick integration with Hazelcast
>> to manage things like the per-node queue depth for distributed scheduling
>> purposes.  This may be a bit heavy but should get us to functional faster.
>>
>> This is heavily WIP so many things are staged but not connected yet.
>>  Things are broken.  And there are no tests.  But hopefully it will give
>> you a sense of the direction I've been headed.
>>
>> I'm hoping to add some more things to this over the weekend and then we
>> can
>> go through things on Tuesday.
>>
>> Thanks,
>> Jacques
>>
>>
>>
>> On Fri, Apr 12, 2013 at 12:56 PM, David Alves <da...@gmail.com>
>> wrote:
>>
>> > Hi Jacques
>> >
>> >         sounds good!
>> >         will you still be able to post a link to your wip dist exec
>> stuff
>> > before the weekend?
>> >         really anxious to tinker with it.
>> >
>> > Best
>> > David
>> >
>> > On Apr 12, 2013, at 12:24 PM, Jacques Nadeau <ja...@apache.org>
>> wrote:
>> >
>> > > Looks like most people can meet at 9am PST on Tuesday.   Let's meet
>> then.
>> > >
>> > > J
>> > >
>> > > On Mon, Apr 8, 2013 at 2:17 PM, Ted Dunning <te...@gmail.com>
>> > wrote:
>> > >
>> > >> Great idea.
>> > >>
>> > >>
>> > >>
>> > >> On Mon, Apr 8, 2013 at 2:14 PM, David Alves <da...@gmail.com>
>> > wrote:
>> > >>
>> > >>> Hi All
>> > >>>
>> > >>>        I took the liberty of creating a doodle for the hangout to
>> > >>> (hopefully) make it easier to select a time suitable for everyone.
>> > >>>        The link is: http://www.doodle.com/t9b5n455utkpebi3
>> > >>>
>> > >>> Best
>> > >>> David Alves
>> > >>>
>> > >>> On Apr 8, 2013, at 1:13 PM, Timothy Chen <tn...@gmail.com> wrote:
>> > >>>
>> > >>>> I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.
>> > >>>>
>> > >>>> Tim
>> > >>>>
>> > >>>>
>> > >>>> On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <jacques@apache.org
>> >
>> > >>> wrote:
>> > >>>>
>> > >>>>> Given David's request to have everybody review whatever I share,
>> > let's
>> > >>> do
>> > >>>>> M/T/W of next week..  What times are people available?
>> > >>>>>
>> > >>>>> J
>> > >>>>>
>> > >>>>> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com>
>> > >>> wrote:
>> > >>>>>
>> > >>>>>> I'm open 2pm pst, see when Jacques is open.
>> > >>>>>>
>> > >>>>>> Tim
>> > >>>>>>
>> > >>>>>> Sent from my iPad
>> > >>>>>>
>> > >>>>>> On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com>
>> > >> wrote:
>> > >>>>>>
>> > >>>>>>> Hi Jacques
>> > >>>>>>>
>> > >>>>>>>> I'll try to drop some of my work and thoughts on the list this
>> > >> week.
>> > >>>>>>>
>> > >>>>>>>  That is great news!
>> > >>>>>>>
>> > >>>>>>>> As always with these things, everything takes longer than one
>> > would
>> > >>>>>>>> like…
>> > >>>>>>>  Hopefully we can help and take of the workload.
>> > >>>>>>>
>> > >>>>>>>> I am also thinking that it might be good to do a google hangout
>> > >>>>>>>> brainstorming session soon around some of this stuff to help
>> move
>> > >>>>>>>> things along.
>> > >>>>>>>
>> > >>>>>>>  A google hangout is a good idea.
>> > >>>>>>>  Wednesday would be a good day for me, say 2PM PST, how about
>> for
>> > >>>>>> other people?
>> > >>>>>>>  I do think we should do it after we get a chance to take a
>> look at
>> > >>>>>> what you already have so that we're all in the same page.
>> > >>>>>>>
>> > >>>>>>> Best
>> > >>>>>>> David
>> > >>>>>>>
>> > >>>>>>> On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <
>> > >> jacques.drill@gmail.com>
>> > >>>>>> wrote:
>> > >>>>>>>
>> > >>>>>>>> I'll try to drop some of my work and thoughts on the list this
>> > >> week.
>> > >>>>>>>> As always with these things, everything takes longer than one
>> > would
>> > >>>>>>>> like...
>> > >>>>>>>>
>> > >>>>>>>> I am also thinking that it might be good to do a google hangout
>> > >>>>>>>> brainstorming session soon around some of this stuff to help
>> move
>> > >>>>>>>> things along.
>> > >>>>>>>>
>> > >>>>>>>> J
>> > >>>>>>>>
>> > >>>>>>>> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> +1
>> > >>>>>>>>>
>> > >>>>>>>>> It would be nice to see what's the current status and future
>> plan
>> > >> on
>> > >>>>>> in-mem
>> > >>>>>>>>> data representation in the dist exec engine.
>> > >>>>>>>>>
>> > >>>>>>>>> I was previously going to do something about DataValue in
>> > >> exec/ref.
>> > >>>>>> However
>> > >>>>>>>>> after some reading into previous discussions in the maillist
>> and
>> > >>> some
>> > >>>>>> links
>> > >>>>>>>>> in 'useful research' wiki page
>> > >>>>>>>>> (vldb09-tutorial6.pdf<
>> > >>>>> http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
>> > >>>>>>>
>> > >>>>>>>>> abadisigmod06.pdf<
>> > >>>>>> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
>> > >>>>>>>>> etc.)
>> > >>>>>>>>>
>> > >>>>>>>>> I found it non-trivial and crucial building block to build
>> in-mem
>> > >>>>> data
>> > >>>>>>>>> structure. Incremental optimisation based on current DataValue
>> > >> seems
>> > >>>>> a
>> > >>>>>> bad
>> > >>>>>>>>> idea.
>> > >>>>>>>>>
>> > >>>>>>>>> So what's your thought on this? If we could get a sketch, I
>> would
>> > >>>>> very
>> > >>>>>> much
>> > >>>>>>>>> like to do something on this issue.
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <
>> > >> davidralves@gmail.com>
>> > >>>>>> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> Hi All
>> > >>>>>>>>>>
>> > >>>>>>>>>>    I was wondering if there is a timeline on when we might
>> get a
>> > >>>>>>>>>> sketch of the dist execution engine.
>> > >>>>>>>>>>    As I mentioned before I have a little over a month to get
>> > >>>>>>>>>> something working and I'm starting to get a bit worried.
>> > >>>>>>>>>>    I've been working in the parallel per region hbase scanner
>> > >> and
>> > >>>>>>>>>> soon I'll have something usable.
>> > >>>>>>>>>>    I can definitely put in a few hours working/helping on it
>> if
>> > >>>>> that
>> > >>>>>>>>>> helps, but as previously suggested I'd rather not reinvent
>> the
>> > >>>>> wheel.
>> > >>>>>>>>>>    Right now I was thinking that something that plugs-in to
>> the
>> > >>>>>>>>>> reference implementation (i.e. would not require a stable SE
>> > >> iface)
>> > >>>>>> would
>> > >>>>>>>>>> be a nice start.
>> > >>>>>>>>>>    What do you think?
>> > >>>>>>>>>>
>> > >>>>>>>>>> Best
>> > >>>>>>>>>> David
>> > >>>>>>>
>> > >>>>>>
>> > >>>>>
>> > >>>
>> > >>>
>> > >>
>> >
>> >
>>
>
>

Re: timeline for dist execution

Posted by Lisen Mu <im...@gmail.com>.

Jacques,

That's a really great effort!

> about jni
* Is there some kind of architecture map showing which parts might be
replaced by a native implementation, and how?
* Judging by the name 'java-exec', I guess it will be fully functional
with or without native code. Is that correct?

> about vectors
* Does each ValueVector implementation correspond to a data type in the
data schema? If so, will there be two compressed versions of each
ValueVector implementation?
Or does each ValueVector implementation correspond to a kind of internal
representation of the data? If so, could I, for example, use BitVector for
vector encoding of string/int fields, or use it in an invisible join/semi
join?
* The RSE would produce compressed data directly, is that correct? If so,
does the RSE need to advertise its compression capability, or does a
RecordBatch self-describe its compression info?
* Less importantly, is there any chance of making compression algorithms
pluggable? If someone wishes to add some kind of compression, it's his/her
duty to add new tests verifying that all OPs still work.
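
To make the compression questions above more concrete, here is a minimal
sketch of what a compression-aware operator could look like on run-length
encoded data. Everything in it is an assumption for illustration: the class
name RleSketch, the flat [value, count, ...] layout, and the methods are
hypothetical and are not taken from the execwork branch.

```java
import java.util.Arrays;

// Hypothetical sketch, not Drill code: run-length encoding of an int column,
// plus an aggregate that works on the runs without decompressing them.
final class RleSketch {

    // Encodes values as flat pairs: [value0, count0, value1, count1, ...].
    static int[] encode(int[] values) {
        int[] runs = new int[values.length * 2]; // worst case: no repeats
        int n = 0;
        for (int i = 0; i < values.length; ) {
            int j = i;
            while (j < values.length && values[j] == values[i]) {
                j++;
            }
            runs[n++] = values[i]; // the repeated value
            runs[n++] = j - i;     // how many times it repeats
            i = j;
        }
        return Arrays.copyOf(runs, n);
    }

    // Expands the runs back into the original values.
    static int[] decode(int[] runs) {
        int total = 0;
        for (int i = 1; i < runs.length; i += 2) {
            total += runs[i];
        }
        int[] out = new int[total];
        int pos = 0;
        for (int i = 0; i < runs.length; i += 2) {
            Arrays.fill(out, pos, pos + runs[i + 1], runs[i]);
            pos += runs[i + 1];
        }
        return out;
    }

    // Compression-aware SUM: one multiply per run instead of one add per value.
    static long sum(int[] runs) {
        long total = 0;
        for (int i = 0; i < runs.length; i += 2) {
            total += (long) runs[i] * runs[i + 1];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] runs = encode(new int[] {5, 5, 5, 2, 2, 9});
        System.out.println(Arrays.toString(runs) + " sum=" + sum(runs));
    }
}
```

An operator that only needs sum() never calls decode(), which is the kind
of saving a direct RLE interface would be meant to expose.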

> about local exec engine (Drillbit?)
* It will use the LOP and POP in prototype/common, right? So will the POP
implementations in prototype/common depend on java-exec?
* How will LOP/POP translation differ from exec/ref?
* Is there any chance of making the optimizer pluggable? Something like

  LogicalPlan LogicalOptimizer.optimize(LogicalPlan lp);

That way I could apply some scenario-oriented optimizations in an external
optimizer before every bit of Drill is well optimized, and move them into
POP if they turn out to be useful in a generalized way. This would let us
run/test a Drill-based system on our production data set earlier.
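
A minimal sketch of the pluggable-optimizer idea, under stated assumptions:
the interface and class names below are hypothetical, and a plain list of
operator names stands in for Drill's LogicalPlan so that the example
compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plug-in point: anything that rewrites a plan of type P.
interface PlanOptimizerSketch<P> {
    P optimize(P plan);
}

// Applies registered optimizers in order; each one sees the previous output.
final class OptimizerChainSketch<P> implements PlanOptimizerSketch<P> {
    private final List<PlanOptimizerSketch<P>> rules = new ArrayList<>();

    OptimizerChainSketch<P> register(PlanOptimizerSketch<P> rule) {
        rules.add(rule);
        return this;
    }

    @Override
    public P optimize(P plan) {
        P current = plan;
        for (PlanOptimizerSketch<P> rule : rules) {
            current = rule.optimize(current);
        }
        return current;
    }
}

// Example external rule: drops operators named "noop" from the stand-in plan.
final class NoOpRuleSketch {
    static List<String> dropNoOps(List<String> plan) {
        List<String> out = new ArrayList<>();
        for (String op : plan) {
            if (!"noop".equals(op)) {
                out.add(op);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        OptimizerChainSketch<List<String>> chain =
            new OptimizerChainSketch<List<String>>().register(NoOpRuleSketch::dropNoOps);
        System.out.println(chain.optimize(Arrays.asList("scan", "noop", "filter")));
    }
}
```

With a hook of this shape, a scenario-specific rule lives outside the
engine and can be dropped once an equivalent general rule exists.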

> about maven
* Dependency changes are expected given the heavy WIP. What is the easiest
way to keep dependencies consistent with pom.xml? Parquet, for example, and
maybe ORC in the future? We solve this with an Apache Archiva server on our
LAN. But maybe there could be a publicly available Archiva server?


On Sat, Apr 13, 2013 at 6:18 PM, Jacques Nadeau <ja...@apache.org> wrote:

> You can check out some of what I've been working on my GitHub at
> https://github.com/jacques-n/incubator-drill/tree/execwork
>
> Key concepts are:
> 1) The primary in-memory data structure is a RecordBatch that contains one
> or more fields.  Each of these fields holds a vector of values with the
> goal that each batch fits within a single core's L2 cache.  The VectorValue
> structures are envisioned to be language agnostic and are backed by
> Netty4's ByteBuf abstraction.  These Vector formats will be strongly
> documented and not java centric so that moving back and forth between the
> native layer is reasonable.  The thinking is that there will be two
> additional direct compression interfaces for RLE and Dict for specialized
> operators who don't need fully decompressed data.  This provides a
> compromise between excess overhead due to compression-aware operators and
> losing out on any compression-aware benefits.  As you can see, ValueVectors
> include Required (subclasses of FixedValueVector and VariableVector),
> nullable (a.k.a optional) and I'll be adding a Dremel-esque nested repeated
> value set of vectors.
>
> 2) The ByteBuf interface is also used for the protobuf based Bit2Bit and
> User2Bit communication. The key being that these are a push/pull combined
> interface to allow streaming responses and also allow direct transfer of
> ByteBuf's without serialization and deserialization or excessive copies.
>  (And JNI interchange with minimal overhead.)
>
> 3) As mentioned previously on the list, the initial ClusterCoordinator is
> utilizing Zk/Curator.  I've also added a quick integration with Hazelcast
> to manage things like the per-node queue depth for distributed scheduling
> purposes.  This may be a bit heavy but should get us to functional faster.
>
> This is heavily WIP so many things are staged but not connected yet.
>  Things are broken.  And there are no tests.  But hopefully it will give
> you a sense of the direction I've been headed.
>
> I'm hoping to add some more things to this over the weekend and then we can
> go through things on Tuesday.
>
> Thanks,
> Jacques
>
>
>
> On Fri, Apr 12, 2013 at 12:56 PM, David Alves <da...@gmail.com>
> wrote:
>
> > Hi Jacques
> >
> >         sounds good!
> >         will you still be able to post a link to your wip dist exec stuff
> > before the weekend?
> >         really anxious to tinker with it.
> >
> > Best
> > David
> >
> > On Apr 12, 2013, at 12:24 PM, Jacques Nadeau <ja...@apache.org> wrote:
> >
> > > Looks like most people can meet at 9am PST on Tuesday.   Let's meet
> then.
> > >
> > > J
> > >
> > > On Mon, Apr 8, 2013 at 2:17 PM, Ted Dunning <te...@gmail.com>
> > wrote:
> > >
> > >> Great idea.
> > >>
> > >>
> > >>
> > >> On Mon, Apr 8, 2013 at 2:14 PM, David Alves <da...@gmail.com>
> > wrote:
> > >>
> > >>> Hi All
> > >>>
> > >>>        I took the liberty of creating a doodle for the hangout to
> > >>> (hopefully) make it easier to select a time suitable for everyone.
> > >>>        The link is: http://www.doodle.com/t9b5n455utkpebi3
> > >>>
> > >>> Best
> > >>> David Alves
> > >>>
> > >>> On Apr 8, 2013, at 1:13 PM, Timothy Chen <tn...@gmail.com> wrote:
> > >>>
> > >>>> I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.
> > >>>>
> > >>>> Tim
> > >>>>
> > >>>>
> > >>>> On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <ja...@apache.org>
> > >>> wrote:
> > >>>>
> > >>>>> Given David's request to have everybody review whatever I share,
> > let's
> > >>> do
> > >>>>> M/T/W of next week..  What times are people available?
> > >>>>>
> > >>>>> J
> > >>>>>
> > >>>>> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com>
> > >>> wrote:
> > >>>>>
> > >>>>>> I'm open 2pm pst, see when Jacques is open.
> > >>>>>>
> > >>>>>> Tim
> > >>>>>>
> > >>>>>> Sent from my iPad
> > >>>>>>
> > >>>>>> On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com>
> > >> wrote:
> > >>>>>>
> > >>>>>>> Hi Jacques
> > >>>>>>>
> > >>>>>>>> I'll try to drop some of my work and thoughts on the list this
> > >> week.
> > >>>>>>>
> > >>>>>>>  That is great news!
> > >>>>>>>
> > >>>>>>>> As always with these things, everything takes longer than one
> > would
> > >>>>>>>> like…
> > >>>>>>>  Hopefully we can help and take of the workload.
> > >>>>>>>
> > >>>>>>>> I am also thinking that it might be good to do a google hangout
> > >>>>>>>> brainstorming session soon around some of this stuff to help
> move
> > >>>>>>>> things along.
> > >>>>>>>
> > >>>>>>>  A google hangout is a good idea.
> > >>>>>>>  Wednesday would be a good day for me, say 2PM PST, how about for
> > >>>>>> other people?
> > >>>>>>>  I do think we should do it after we get a chance to take a look
> at
> > >>>>>> what you already have so that we're all in the same page.
> > >>>>>>>
> > >>>>>>> Best
> > >>>>>>> David
> > >>>>>>>
> > >>>>>>> On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <
> > >> jacques.drill@gmail.com>
> > >>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> I'll try to drop some of my work and thoughts on the list this
> > >> week.
> > >>>>>>>> As always with these things, everything takes longer than one
> > would
> > >>>>>>>> like...
> > >>>>>>>>
> > >>>>>>>> I am also thinking that it might be good to do a google hangout
> > >>>>>>>> brainstorming session soon around some of this stuff to help
> move
> > >>>>>>>> things along.
> > >>>>>>>>
> > >>>>>>>> J
> > >>>>>>>>
> > >>>>>>>> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> +1
> > >>>>>>>>>
> > >>>>>>>>> It would be nice to see what's the current status and future
> plan
> > >> on
> > >>>>>> in-mem
> > >>>>>>>>> data representation in the dist exec engine.
> > >>>>>>>>>
> > >>>>>>>>> I was previously going to do something about DataValue in
> > >> exec/ref.
> > >>>>>> However
> > >>>>>>>>> after some reading into previous discussions in the maillist
> and
> > >>> some
> > >>>>>> links
> > >>>>>>>>> in 'useful research' wiki page
> > >>>>>>>>> (vldb09-tutorial6.pdf<
> > >>>>> http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
> > >>>>>>>
> > >>>>>>>>> abadisigmod06.pdf<
> > >>>>>> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
> > >>>>>>>>> etc.)
> > >>>>>>>>>
> > >>>>>>>>> I found it non-trivial and crucial building block to build
> in-mem
> > >>>>> data
> > >>>>>>>>> structure. Incremental optimisation based on current DataValue
> > >> seems
> > >>>>> a
> > >>>>>> bad
> > >>>>>>>>> idea.
> > >>>>>>>>>
> > >>>>>>>>> So what's your thought on this? If we could get a sketch, I
> would
> > >>>>> very
> > >>>>>> much
> > >>>>>>>>> like to do something on this issue.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <
> > >> davidralves@gmail.com>
> > >>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi All
> > >>>>>>>>>>
> > >>>>>>>>>>    I was wondering if there is a timeline on when we might
> get a
> > >>>>>>>>>> sketch of the dist execution engine.
> > >>>>>>>>>>    As I mentioned before I have a little over a month to get
> > >>>>>>>>>> something working and I'm starting to get a bit worried.
> > >>>>>>>>>>    I've been working in the parallel per region hbase scanner
> > >> and
> > >>>>>>>>>> soon I'll have something usable.
> > >>>>>>>>>>    I can definitely put in a few hours working/helping on it
> if
> > >>>>> that
> > >>>>>>>>>> helps, but as previously suggested I'd rather not reinvent the
> > >>>>> wheel.
> > >>>>>>>>>>    Right now I was thinking that something that plugs-in to
> the
> > >>>>>>>>>> reference implementation (i.e. would not require a stable SE
> > >> iface)
> > >>>>>> would
> > >>>>>>>>>> be a nice start.
> > >>>>>>>>>>    What do you think?
> > >>>>>>>>>>
> > >>>>>>>>>> Best
> > >>>>>>>>>> David
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>
> > >>>
> > >>
> >
> >
>

Re: timeline for dist execution

Posted by Jacques Nadeau <ja...@apache.org>.

You can check out some of what I've been working on in my GitHub repo at
https://github.com/jacques-n/incubator-drill/tree/execwork

Key concepts are:
1) The primary in-memory data structure is a RecordBatch that contains one
or more fields.  Each of these fields holds a vector of values, with the
goal that each batch fits within a single core's L2 cache.  The ValueVector
structures are envisioned to be language-agnostic and are backed by
Netty4's ByteBuf abstraction.  These vector formats will be strongly
documented and not Java-centric, so that moving back and forth between Java
and the native layer is reasonable.  The thinking is that there will be two
additional direct compression interfaces, for RLE and Dict, for specialized
operators that don't need fully decompressed data.  This provides a
compromise between excess overhead due to compression-aware operators and
losing out on any compression-aware benefits.  As you can see, ValueVectors
include required (subclasses of FixedValueVector and VariableVector) and
nullable (a.k.a. optional) variants, and I'll be adding a Dremel-esque
nested repeated value set of vectors.
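
As a rough illustration of the required and nullable vector shapes described
in point 1, here is a minimal sketch. It rests on several assumptions:
java.nio.ByteBuffer stands in for Netty's ByteBuf so the example has no
external dependencies, and the class names and the one-validity-bit-per-value
layout are hypothetical rather than the formats used in the branch.

```java
import java.nio.ByteBuffer;

// Hypothetical required (non-null) vector: fixed-width ints, stored contiguously.
final class IntVectorSketch {
    private final ByteBuffer data; // off-heap, 4 bytes per value
    private final int capacity;

    IntVectorSketch(int capacity) {
        this.capacity = capacity;
        this.data = ByteBuffer.allocateDirect(capacity * Integer.BYTES);
    }

    int capacity() {
        return capacity;
    }

    void set(int index, int value) {
        data.putInt(checked(index) * Integer.BYTES, value);
    }

    int get(int index) {
        return data.getInt(checked(index) * Integer.BYTES);
    }

    private int checked(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index;
    }
}

// Hypothetical nullable vector: the same values plus one validity bit per value.
final class NullableIntVectorSketch {
    private final IntVectorSketch values;
    private final ByteBuffer validity; // bit set = value present; starts all null

    NullableIntVectorSketch(int capacity) {
        this.values = new IntVectorSketch(capacity);
        this.validity = ByteBuffer.allocateDirect((capacity + 7) / 8);
    }

    void set(int index, int value) {
        values.set(index, value); // also bounds-checks index
        int slot = index >>> 3;
        validity.put(slot, (byte) (validity.get(slot) | (1 << (index & 7))));
    }

    void setNull(int index) {
        values.set(index, 0); // bounds-checks index and clears the stale value
        int slot = index >>> 3;
        validity.put(slot, (byte) (validity.get(slot) & ~(1 << (index & 7))));
    }

    boolean isNull(int index) {
        if (index < 0 || index >= values.capacity()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (validity.get(index >>> 3) & (1 << (index & 7))) == 0;
    }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("value at " + index + " is null");
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        NullableIntVectorSketch v = new NullableIntVectorSketch(10);
        v.set(3, 42);
        System.out.println("isNull(3)=" + v.isNull(3) + " isNull(4)=" + v.isNull(4));
    }
}
```

Because both buffers are flat byte ranges with a fixed layout, native code
could read them at the same offsets without any Java object traversal.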

2) The ByteBuf interface is also used for the protobuf-based Bit2Bit and
User2Bit communication.  The key is that these are a combined push/pull
interface that allows streaming responses and also allows direct transfer
of ByteBufs without serialization and deserialization or excessive copies
(and JNI interchange with minimal overhead).
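
The "without excessive copies" point in item 2 can be illustrated with a
buffer view. This is only a sketch under assumptions: it uses
java.nio.ByteBuffer as a stand-in for Netty's ByteBuf, and the class and
method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: hand out one field's bytes from a batch without copying.
final class ZeroCopyViewSketch {

    // Returns a view over batch[offset, offset + length) that shares memory
    // with the batch; no bytes are copied.
    static ByteBuffer view(ByteBuffer batch, int offset, int length) {
        ByteBuffer dup = batch.duplicate(); // same content, independent position/limit
        dup.clear();                        // position = 0, limit = capacity; data untouched
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice();
    }

    public static void main(String[] args) {
        ByteBuffer batch = ByteBuffer.allocateDirect(16);
        batch.putInt(8, 123);
        ByteBuffer field = view(batch, 8, 4);
        batch.putInt(8, 456); // a later write is visible through the view
        System.out.println(field.getInt(0));
    }
}
```

The view and the batch address the same memory, so handing the view to a
transport or to native code moves a reference rather than the data.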

3) As mentioned previously on the list, the initial ClusterCoordinator is
utilizing Zk/Curator.  I've also added a quick integration with Hazelcast
to manage things like the per-node queue depth for distributed scheduling
purposes.  This may be a bit heavy but should get us to functional faster.
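
To show how a shared per-node queue depth could feed scheduling decisions,
here is a minimal sketch. It is an assumption-laden illustration: a local
ConcurrentHashMap stands in for whatever distributed map the cluster
provides, the class and method names are hypothetical, and the
pick-then-increment step is best effort rather than atomic across nodes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: assign work to the node with the shallowest queue.
final class QueueDepthSchedulerSketch {
    private final ConcurrentMap<String, Integer> queueDepth;

    QueueDepthSchedulerSketch(ConcurrentMap<String, Integer> queueDepth) {
        this.queueDepth = queueDepth;
    }

    void register(String node) {
        queueDepth.putIfAbsent(node, 0);
    }

    // Picks the least-loaded node (ties broken by name) and bumps its depth.
    String assign() {
        String best = null;
        int bestDepth = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : queueDepth.entrySet()) {
            int depth = e.getValue();
            boolean better = best == null
                || depth < bestDepth
                || (depth == bestDepth && e.getKey().compareTo(best) < 0);
            if (better) {
                best = e.getKey();
                bestDepth = depth;
            }
        }
        if (best == null) {
            throw new IllegalStateException("no nodes registered");
        }
        queueDepth.merge(best, 1, Integer::sum); // atomic per-key increment
        return best;
    }

    // Called when a node finishes a unit of work.
    void complete(String node) {
        queueDepth.computeIfPresent(node, (k, d) -> Math.max(0, d - 1));
    }

    int depth(String node) {
        return queueDepth.getOrDefault(node, 0);
    }

    public static void main(String[] args) {
        QueueDepthSchedulerSketch s =
            new QueueDepthSchedulerSketch(new ConcurrentHashMap<String, Integer>());
        s.register("bit-a");
        s.register("bit-b");
        System.out.println(s.assign() + " " + s.assign() + " " + s.assign());
    }
}
```

The scheduler depends only on the ConcurrentMap interface, so the backing
map could be swapped for a distributed implementation without changing the
selection logic.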

This is heavily WIP, so many things are staged but not connected yet.
Things are broken.  And there are no tests.  But hopefully it will give
you a sense of the direction I've been headed.

I'm hoping to add some more things to this over the weekend and then we can
go through things on Tuesday.

Thanks,
Jacques



On Fri, Apr 12, 2013 at 12:56 PM, David Alves <da...@gmail.com> wrote:

> Hi Jacques
>
>         sounds good!
>         will you still be able to post a link to your wip dist exec stuff
> before the weekend?
>         really anxious to tinker with it.
>
> Best
> David
>
> On Apr 12, 2013, at 12:24 PM, Jacques Nadeau <ja...@apache.org> wrote:
>
> > Looks like most people can meet at 9am PST on Tuesday.   Let's meet then.
> >
> > J
> >
> > On Mon, Apr 8, 2013 at 2:17 PM, Ted Dunning <te...@gmail.com>
> wrote:
> >
> >> Great idea.
> >>
> >>
> >>
> >> On Mon, Apr 8, 2013 at 2:14 PM, David Alves <da...@gmail.com>
> wrote:
> >>
> >>> Hi All
> >>>
> >>>        I took the liberty of creating a doodle for the hangout to
> >>> (hopefully) make it easier to select a time suitable for everyone.
> >>>        The link is: http://www.doodle.com/t9b5n455utkpebi3
> >>>
> >>> Best
> >>> David Alves
> >>>
> >>> On Apr 8, 2013, at 1:13 PM, Timothy Chen <tn...@gmail.com> wrote:
> >>>
> >>>> I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.
> >>>>
> >>>> Tim
> >>>>
> >>>>
> >>>> On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <ja...@apache.org>
> >>> wrote:
> >>>>
> >>>>> Given David's request to have everybody review whatever I share,
> let's
> >>> do
> >>>>> M/T/W of next week..  What times are people available?
> >>>>>
> >>>>> J
> >>>>>
> >>>>> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com>
> >>> wrote:
> >>>>>
> >>>>>> I'm open 2pm pst, see when Jacques is open.
> >>>>>>
> >>>>>> Tim
> >>>>>>
> >>>>>> Sent from my iPad
> >>>>>>
> >>>>>> On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>>> Hi Jacques
> >>>>>>>
> >>>>>>>> I'll try to drop some of my work and thoughts on the list this
> >> week.
> >>>>>>>
> >>>>>>>  That is great news!
> >>>>>>>
> >>>>>>>> As always with these things, everything takes longer than one
> would
> >>>>>>>> like…
> >>>>>>>  Hopefully we can help and take of the workload.
> >>>>>>>
> >>>>>>>> I am also thinking that it might be good to do a google hangout
> >>>>>>>> brainstorming session soon around some of this stuff to help move
> >>>>>>>> things along.
> >>>>>>>
> >>>>>>>  A google hangout is a good idea.
> >>>>>>>  Wednesday would be a good day for me, say 2PM PST, how about for
> >>>>>> other people?
> >>>>>>>  I do think we should do it after we get a chance to take a look at
> >>>>>> what you already have so that we're all in the same page.
> >>>>>>>
> >>>>>>> Best
> >>>>>>> David
> >>>>>>>
> >>>>>>> On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <
> >> jacques.drill@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> I'll try to drop some of my work and thoughts on the list this
> >> week.
> >>>>>>>> As always with these things, everything takes longer than one
> would
> >>>>>>>> like...
> >>>>>>>>
> >>>>>>>> I am also thinking that it might be good to do a google hangout
> >>>>>>>> brainstorming session soon around some of this stuff to help move
> >>>>>>>> things along.
> >>>>>>>>
> >>>>>>>> J
> >>>>>>>>
> >>>>>>>> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> +1
> >>>>>>>>>
> >>>>>>>>> It would be nice to see what's the current status and future plan
> >> on
> >>>>>> in-mem
> >>>>>>>>> data representation in the dist exec engine.
> >>>>>>>>>
> >>>>>>>>> I was previously going to do something about DataValue in
> >> exec/ref.
> >>>>>> However
> >>>>>>>>> after some reading into previous discussions in the maillist and
> >>> some
> >>>>>> links
> >>>>>>>>> in 'useful research' wiki page
> >>>>>>>>> (vldb09-tutorial6.pdf<
> >>>>> http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
> >>>>>>>
> >>>>>>>>> abadisigmod06.pdf<
> >>>>>> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
> >>>>>>>>> etc.)
> >>>>>>>>>
> >>>>>>>>> I found it non-trivial and crucial building block to build in-mem
> >>>>> data
> >>>>>>>>> structure. Incremental optimisation based on current DataValue
> >> seems
> >>>>> a
> >>>>>> bad
> >>>>>>>>> idea.
> >>>>>>>>>
> >>>>>>>>> So what's your thought on this? If we could get a sketch, I would
> >>>>> very
> >>>>>> much
> >>>>>>>>> like to do something on this issue.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <
> >> davidralves@gmail.com>
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi All
> >>>>>>>>>>
> >>>>>>>>>>    I was wondering if there is a timeline on when we might get a
> >>>>>>>>>> sketch of the dist execution engine.
> >>>>>>>>>>    As I mentioned before I have a little over a month to get
> >>>>>>>>>> something working and I'm starting to get a bit worried.
> >>>>>>>>>>    I've been working in the parallel per region hbase scanner
> >> and
> >>>>>>>>>> soon I'll have something usable.
> >>>>>>>>>>    I can definitely put in a few hours working/helping on it if
> >>>>> that
> >>>>>>>>>> helps, but as previously suggested I'd rather not reinvent the
> >>>>> wheel.
> >>>>>>>>>>    Right now I was thinking that something that plugs-in to the
> >>>>>>>>>> reference implementation (i.e. would not require a stable SE
> >> iface)
> >>>>>> would
> >>>>>>>>>> be a nice start.
> >>>>>>>>>>    What do you think?
> >>>>>>>>>>
> >>>>>>>>>> Best
> >>>>>>>>>> David
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>>
> >>
>
>

Re: timeline for dist execution

Posted by David Alves <da...@gmail.com>.

Hi Jacques

	Sounds good!
	Will you still be able to post a link to your WIP dist exec stuff before the weekend?
	Really anxious to tinker with it.

Best
David

On Apr 12, 2013, at 12:24 PM, Jacques Nadeau <ja...@apache.org> wrote:

> Looks like most people can meet at 9am PST on Tuesday.   Let's meet then.
> 
> J
> 
> On Mon, Apr 8, 2013 at 2:17 PM, Ted Dunning <te...@gmail.com> wrote:
> 
>> Great idea.
>> 
>> 
>> 
>> On Mon, Apr 8, 2013 at 2:14 PM, David Alves <da...@gmail.com> wrote:
>> 
>>> Hi All
>>> 
>>>        I took the liberty of creating a doodle for the hangout to
>>> (hopefully) make it easier to select a time suitable for everyone.
>>>        The link is: http://www.doodle.com/t9b5n455utkpebi3
>>> 
>>> Best
>>> David Alves
>>> 
>>> On Apr 8, 2013, at 1:13 PM, Timothy Chen <tn...@gmail.com> wrote:
>>> 
>>>> I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.
>>>> 
>>>> Tim
>>>> 
>>>> 
>>>> On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <ja...@apache.org>
>>> wrote:
>>>> 
>>>>> Given David's request to have everybody review whatever I share, let's
>>> do
>>>>> M/T/W of next week..  What times are people available?
>>>>> 
>>>>> J
>>>>> 
>>>>> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com>
>>> wrote:
>>>>> 
>>>>>> I'm open 2pm pst, see when Jacques is open.
>>>>>> 
>>>>>> Tim
>>>>>> 
>>>>>> Sent from my iPad
>>>>>> 
>>>>>> On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com>
>> wrote:
>>>>>> 
>>>>>>> Hi Jacques
>>>>>>> 
>>>>>>>> I'll try to drop some of my work and thoughts on the list this
>> week.
>>>>>>> 
>>>>>>>  That is great news!
>>>>>>> 
>>>>>>>> As always with these things, everything takes longer than one would
>>>>>>>> like…
>>>>>>>  Hopefully we can help and take of the workload.
>>>>>>> 
>>>>>>>> I am also thinking that it might be good to do a google hangout
>>>>>>>> brainstorming session soon around some of this stuff to help move
>>>>>>>> things along.
>>>>>>> 
>>>>>>>  A google hangout is a good idea.
>>>>>>>  Wednesday would be a good day for me, say 2PM PST, how about for
>>>>>> other people?
>>>>>>>  I do think we should do it after we get a chance to take a look at
>>>>>> what you already have so that we're all in the same page.
>>>>>>> 
>>>>>>> Best
>>>>>>> David
>>>>>>> 
>>>>>>> On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <
>> jacques.drill@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> I'll try to drop some of my work and thoughts on the list this
>> week.
>>>>>>>> As always with these things, everything takes longer than one would
>>>>>>>> like...
>>>>>>>> 
>>>>>>>> I am also thinking that it might be good to do a google hangout
>>>>>>>> brainstorming session soon around some of this stuff to help move
>>>>>>>> things along.
>>>>>>>> 
>>>>>>>> J
>>>>>>>> 
>>>>>>>> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> +1
>>>>>>>>> 
>>>>>>>>> It would be nice to see what's the current status and future plan
>> on
>>>>>> in-mem
>>>>>>>>> data representation in the dist exec engine.
>>>>>>>>> 
>>>>>>>>> I was previously going to do something about DataValue in
>> exec/ref.
>>>>>> However
>>>>>>>>> after some reading into previous discussions in the maillist and
>>> some
>>>>>> links
>>>>>>>>> in 'useful research' wiki page
>>>>>>>>> (vldb09-tutorial6.pdf<
>>>>> http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
>>>>>>> 
>>>>>>>>> abadisigmod06.pdf<
>>>>>> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
>>>>>>>>> etc.)
>>>>>>>>> 
>>>>>>>>> I found it non-trivial and crucial building block to build in-mem
>>>>> data
>>>>>>>>> structure. Incremental optimisation based on current DataValue
>> seems
>>>>> a
>>>>>> bad
>>>>>>>>> idea.
>>>>>>>>> 
>>>>>>>>> So what's your thought on this? If we could get a sketch, I would
>>>>> very
>>>>>> much
>>>>>>>>> like to do something on this issue.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <
>> davidralves@gmail.com>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi All
>>>>>>>>>> 
>>>>>>>>>>    I was wondering if there is a timeline on when we might get a
>>>>>>>>>> sketch of the dist execution engine.
>>>>>>>>>>    As I mentioned before I have a little over a month to get
>>>>>>>>>> something working and I'm starting to get a bit worried.
>>>>>>>>>>    I've been working in the parallel per region hbase scanner
>> and
>>>>>>>>>> soon I'll have something usable.
>>>>>>>>>>    I can definitely put in a few hours working/helping on it if
>>>>> that
>>>>>>>>>> helps, but as previously suggested I'd rather not reinvent the
>>>>> wheel.
>>>>>>>>>>    Right now I was thinking that something that plugs-in to the
>>>>>>>>>> reference implementation (i.e. would not require a stable SE
>> iface)
>>>>>> would
>>>>>>>>>> be a nice start.
>>>>>>>>>>    What do you think?
>>>>>>>>>> 
>>>>>>>>>> Best
>>>>>>>>>> David
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>>> 
>>

Re: timeline for dist execution

Posted by Jacques Nadeau <ja...@apache.org>.

Looks like most people can meet at 9am PST on Tuesday. Let's meet then.

J

On Mon, Apr 8, 2013 at 2:17 PM, Ted Dunning <te...@gmail.com> wrote:

> Great idea.
>
>
>
> On Mon, Apr 8, 2013 at 2:14 PM, David Alves <da...@gmail.com> wrote:
>
> > Hi All
> >
> >         I took the liberty of creating a doodle for the hangout to
> > (hopefully) make it easier to select a time suitable for everyone.
> >         The link is: http://www.doodle.com/t9b5n455utkpebi3
> >
> > Best
> > David Alves
> >
> > On Apr 8, 2013, at 1:13 PM, Timothy Chen <tn...@gmail.com> wrote:
> >
> > > I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.
> > >
> > > Tim
> > >
> > >
> > > On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <ja...@apache.org>
> > wrote:
> > >
> > >> Given David's request to have everybody review whatever I share, let's
> > do
> > >> M/T/W of next week..  What times are people available?
> > >>
> > >> J
> > >>
> > >> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com>
> > wrote:
> > >>
> > >>> I'm open 2pm pst, see when Jacques is open.
> > >>>
> > >>> Tim
> > >>>
> > >>> Sent from my iPad
> > >>>
> > >>> On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com>
> wrote:
> > >>>
> > >>>> Hi Jacques
> > >>>>
> > >>>>> I'll try to drop some of my work and thoughts on the list this
> week.
> > >>>>
> > >>>>   That is great news!
> > >>>>
> > >>>>> As always with these things, everything takes longer than one would
> > >>>>> like…
> > >>>>   Hopefully we can help and take of the workload.
> > >>>>
> > >>>>> I am also thinking that it might be good to do a google hangout
> > >>>>> brainstorming session soon around some of this stuff to help move
> > >>>>> things along.
> > >>>>
> > >>>>   A google hangout is a good idea.
> > >>>>   Wednesday would be a good day for me, say 2PM PST, how about for
> > >>> other people?
> > >>>>   I do think we should do it after we get a chance to take a look at
> > >>> what you already have so that we're all in the same page.
> > >>>>
> > >>>> Best
> > >>>> David
> > >>>>
> > >>>> On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <
> jacques.drill@gmail.com>
> > >>> wrote:
> > >>>>
> > >>>>> I'll try to drop some of my work and thoughts on the list this
> week.
> > >>>>> As always with these things, everything takes longer than one would
> > >>>>> like...
> > >>>>>
> > >>>>> I am also thinking that it might be good to do a google hangout
> > >>>>> brainstorming session soon around some of this stuff to help move
> > >>>>> things along.
> > >>>>>
> > >>>>> J
> > >>>>>
> > >>>>> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
> > >>>>>
> > >>>>>> +1
> > >>>>>>
> > >>>>>> It would be nice to see what's the current status and future plan
> on
> > >>> in-mem
> > >>>>>> data representation in the dist exec engine.
> > >>>>>>
> > >>>>>> I was previously going to do something about DataValue in
> exec/ref.
> > >>> However
> > >>>>>> after some reading into previous discussions in the maillist and
> > some
> > >>> links
> > >>>>>> in 'useful research' wiki page
> > >>>>>> (vldb09-tutorial6.pdf<
> > >> http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
> > >>>>
> > >>>>>> abadisigmod06.pdf<
> > >>> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
> > >>>>>> etc.)
> > >>>>>>
> > >>>>>> I found it non-trivial and crucial building block to build in-mem
> > >> data
> > >>>>>> structure. Incremental optimisation based on current DataValue
> seems
> > >> a
> > >>> bad
> > >>>>>> idea.
> > >>>>>>
> > >>>>>> So what's your thought on this? If we could get a sketch, I would
> > >> very
> > >>> much
> > >>>>>> like to do something on this issue.
> > >>>>>>
> > >>>>>>
> > >>>>>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <
> davidralves@gmail.com>
> > >>> wrote:
> > >>>>>>
> > >>>>>>> Hi All
> > >>>>>>>
> > >>>>>>>     I was wondering if there is a timeline on when we might get a
> > >>>>>>> sketch of the dist execution engine.
> > >>>>>>>     As I mentioned before I have a little over a month to get
> > >>>>>>> something working and I'm starting to get a bit worried.
> > >>>>>>>     I've been working in the parallel per region hbase scanner
> and
> > >>>>>>> soon I'll have something usable.
> > >>>>>>>     I can definitely put in a few hours working/helping on it if
> > >> that
> > >>>>>>> helps, but as previously suggested I'd rather not reinvent the
> > >> wheel.
> > >>>>>>>     Right now I was thinking that something that plugs-in to the
> > >>>>>>> reference implementation (i.e. would not require a stable SE
> iface)
> > >>> would
> > >>>>>>> be a nice start.
> > >>>>>>>     What do you think?
> > >>>>>>>
> > >>>>>>> Best
> > >>>>>>> David
> > >>>>
> > >>>
> > >>
> >
> >
>

Re: timeline for dist execution

Posted by Ted Dunning <te...@gmail.com>.

Great idea.



On Mon, Apr 8, 2013 at 2:14 PM, David Alves <da...@gmail.com> wrote:

> Hi All
>
>         I took the liberty of creating a doodle for the hangout to
> (hopefully) make it easier to select a time suitable for everyone.
>         The link is: http://www.doodle.com/t9b5n455utkpebi3
>
> Best
> David Alves
>
> On Apr 8, 2013, at 1:13 PM, Timothy Chen <tn...@gmail.com> wrote:
>
> > I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.
> >
> > Tim
> >
> >
> > On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <ja...@apache.org>
> wrote:
> >
> >> Given David's request to have everybody review whatever I share, let's
> do
> >> M/T/W of next week..  What times are people available?
> >>
> >> J
> >>
> >> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com>
> wrote:
> >>
> >>> I'm open 2pm pst, see when Jacques is open.
> >>>
> >>> Tim
> >>>
> >>> Sent from my iPad
> >>>
> >>> On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com> wrote:
> >>>
> >>>> Hi Jacques
> >>>>
> >>>>> I'll try to drop some of my work and thoughts on the list this week.
> >>>>
> >>>>   That is great news!
> >>>>
> >>>>> As always with these things, everything takes longer than one would
> >>>>> like…
> >>>>   Hopefully we can help and take of the workload.
> >>>>
> >>>>> I am also thinking that it might be good to do a google hangout
> >>>>> brainstorming session soon around some of this stuff to help move
> >>>>> things along.
> >>>>
> >>>>   A google hangout is a good idea.
> >>>>   Wednesday would be a good day for me, say 2PM PST, how about for
> >>> other people?
> >>>>   I do think we should do it after we get a chance to take a look at
> >>> what you already have so that we're all in the same page.
> >>>>
> >>>> Best
> >>>> David
> >>>>
> >>>> On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <ja...@gmail.com>
> >>> wrote:
> >>>>
> >>>>> I'll try to drop some of my work and thoughts on the list this week.
> >>>>> As always with these things, everything takes longer than one would
> >>>>> like...
> >>>>>
> >>>>> I am also thinking that it might be good to do a google hangout
> >>>>> brainstorming session soon around some of this stuff to help move
> >>>>> things along.
> >>>>>
> >>>>> J
> >>>>>
> >>>>> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
> >>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> It would be nice to see what's the current status and future plan on
> >>> in-mem
> >>>>>> data representation in the dist exec engine.
> >>>>>>
> >>>>>> I was previously going to do something about DataValue in exec/ref.
> >>> However
> >>>>>> after some reading into previous discussions in the maillist and
> some
> >>> links
> >>>>>> in 'useful research' wiki page
> >>>>>> (vldb09-tutorial6.pdf<
> >> http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
> >>>>
> >>>>>> abadisigmod06.pdf<
> >>> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
> >>>>>> etc.)
> >>>>>>
> >>>>>> I found it non-trivial and crucial building block to build in-mem
> >> data
> >>>>>> structure. Incremental optimisation based on current DataValue seems
> >> a
> >>> bad
> >>>>>> idea.
> >>>>>>
> >>>>>> So what's your thought on this? If we could get a sketch, I would
> >> very
> >>> much
> >>>>>> like to do something on this issue.
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <da...@gmail.com>
> >>> wrote:
> >>>>>>
> >>>>>>> Hi All
> >>>>>>>
> >>>>>>>     I was wondering if there is a timeline on when we might get a
> >>>>>>> sketch of the dist execution engine.
> >>>>>>>     As I mentioned before I have a little over a month to get
> >>>>>>> something working and I'm starting to get a bit worried.
> >>>>>>>     I've been working in the parallel per region hbase scanner and
> >>>>>>> soon I'll have something usable.
> >>>>>>>     I can definitely put in a few hours working/helping on it if
> >> that
> >>>>>>> helps, but as previously suggested I'd rather not reinvent the
> >> wheel.
> >>>>>>>     Right now I was thinking that something that plugs-in to the
> >>>>>>> reference implementation (i.e. would not require a stable SE iface)
> >>> would
> >>>>>>> be a nice start.
> >>>>>>>     What do you think?
> >>>>>>>
> >>>>>>> Best
> >>>>>>> David
> >>>>
> >>>
> >>
>
>

Re: timeline for dist execution

Posted by David Alves <da...@gmail.com>.

Hi All
	
	I took the liberty of creating a doodle for the hangout to (hopefully) make it easier to select a time suitable for everyone.
	The link is: http://www.doodle.com/t9b5n455utkpebi3

Best
David Alves
	
On Apr 8, 2013, at 1:13 PM, Timothy Chen <tn...@gmail.com> wrote:

> I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.
> 
> Tim
> 
> 
> On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <ja...@apache.org> wrote:
> 
>> Given David's request to have everybody review whatever I share, let's do
>> M/T/W of next week..  What times are people available?
>> 
>> J
>> 
>> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com> wrote:
>> 
>>> I'm open 2pm pst, see when Jacques is open.
>>> 
>>> Tim
>>> 
>>> Sent from my iPad
>>> 
>>> On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com> wrote:
>>> 
>>>> Hi Jacques
>>>> 
>>>>> I'll try to drop some of my work and thoughts on the list this week.
>>>> 
>>>>   That is great news!
>>>> 
>>>>> As always with these things, everything takes longer than one would
>>>>> like…
>>>>   Hopefully we can help and take of the workload.
>>>> 
>>>>> I am also thinking that it might be good to do a google hangout
>>>>> brainstorming session soon around some of this stuff to help move
>>>>> things along.
>>>> 
>>>>   A google hangout is a good idea.
>>>>   Wednesday would be a good day for me, say 2PM PST, how about for
>>> other people?
>>>>   I do think we should do it after we get a chance to take a look at
>>> what you already have so that we're all in the same page.
>>>> 
>>>> Best
>>>> David
>>>> 
>>>> On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <ja...@gmail.com>
>>> wrote:
>>>> 
>>>>> I'll try to drop some of my work and thoughts on the list this week.
>>>>> As always with these things, everything takes longer than one would
>>>>> like...
>>>>> 
>>>>> I am also thinking that it might be good to do a google hangout
>>>>> brainstorming session soon around some of this stuff to help move
>>>>> things along.
>>>>> 
>>>>> J
>>>>> 
>>>>> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
>>>>> 
>>>>>> +1
>>>>>> 
>>>>>> It would be nice to see what's the current status and future plan on
>>> in-mem
>>>>>> data representation in the dist exec engine.
>>>>>> 
>>>>>> I was previously going to do something about DataValue in exec/ref.
>>> However
>>>>>> after some reading into previous discussions in the maillist and some
>>> links
>>>>>> in 'useful research' wiki page
>>>>>> (vldb09-tutorial6.pdf<
>> http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
>>>> 
>>>>>> abadisigmod06.pdf<
>>> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
>>>>>> etc.)
>>>>>> 
>>>>>> I found it non-trivial and crucial building block to build in-mem
>> data
>>>>>> structure. Incremental optimisation based on current DataValue seems
>> a
>>> bad
>>>>>> idea.
>>>>>> 
>>>>>> So what's your thought on this? If we could get a sketch, I would
>> very
>>> much
>>>>>> like to do something on this issue.
>>>>>> 
>>>>>> 
>>>>>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <da...@gmail.com>
>>> wrote:
>>>>>> 
>>>>>>> Hi All
>>>>>>> 
>>>>>>>     I was wondering if there is a timeline on when we might get a
>>>>>>> sketch of the dist execution engine.
>>>>>>>     As I mentioned before I have a little over a month to get
>>>>>>> something working and I'm starting to get a bit worried.
>>>>>>>     I've been working in the parallel per region hbase scanner and
>>>>>>> soon I'll have something usable.
>>>>>>>     I can definitely put in a few hours working/helping on it if
>> that
>>>>>>> helps, but as previously suggested I'd rather not reinvent the
>> wheel.
>>>>>>>     Right now I was thinking that something that plugs-in to the
>>>>>>> reference implementation (i.e. would not require a stable SE iface)
>>> would
>>>>>>> be a nice start.
>>>>>>>     What do you think?
>>>>>>> 
>>>>>>> Best
>>>>>>> David
>>>> 
>>> 
>>

Re: timeline for dist execution

Posted by Timothy Chen <tn...@gmail.com>.

I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.

Tim


On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <ja...@apache.org> wrote:

> Given David's request to have everybody review whatever I share, let's do
> M/T/W of next week..  What times are people available?
>
> J
>
> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com> wrote:
>
> > I'm open 2pm pst, see when Jacques is open.
> >
> > Tim
> >
> > Sent from my iPad
> >
> > On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com> wrote:
> >
> > > Hi Jacques
> > >
> > >> I'll try to drop some of my work and thoughts on the list this week.
> > >
> > >    That is great news!
> > >
> > >> As always with these things, everything takes longer than one would
> > >> like…
> > >    Hopefully we can help and take of the workload.
> > >
> > >> I am also thinking that it might be good to do a google hangout
> > >> brainstorming session soon around some of this stuff to help move
> > >> things along.
> > >
> > >    A google hangout is a good idea.
> > >    Wednesday would be a good day for me, say 2PM PST, how about for
> > other people?
> > >    I do think we should do it after we get a chance to take a look at
> > what you already have so that we're all in the same page.
> > >
> > > Best
> > > David
> > >
> > > On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <ja...@gmail.com>
> > wrote:
> > >
> > >> I'll try to drop some of my work and thoughts on the list this week.
> > >> As always with these things, everything takes longer than one would
> > >> like...
> > >>
> > >> I am also thinking that it might be good to do a google hangout
> > >> brainstorming session soon around some of this stuff to help move
> > >> things along.
> > >>
> > >> J
> > >>
> > >> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
> > >>
> > >>> +1
> > >>>
> > >>> It would be nice to see what's the current status and future plan on
> > in-mem
> > >>> data representation in the dist exec engine.
> > >>>
> > >>> I was previously going to do something about DataValue in exec/ref.
> > However
> > >>> after some reading into previous discussions in the maillist and some
> > links
> > >>> in 'useful research' wiki page
> > >>> (vldb09-tutorial6.pdf<
> http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
> > >
> > >>> abadisigmod06.pdf<
> > http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
> > >>> etc.)
> > >>>
> > >>> I found it non-trivial and crucial building block to build in-mem
> data
> > >>> structure. Incremental optimisation based on current DataValue seems
> a
> > bad
> > >>> idea.
> > >>>
> > >>> So what's your thought on this? If we could get a sketch, I would
> very
> > much
> > >>> like to do something on this issue.
> > >>>
> > >>>
> > >>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <da...@gmail.com>
> > wrote:
> > >>>
> > >>>> Hi All
> > >>>>
> > >>>>      I was wondering if there is a timeline on when we might get a
> > >>>> sketch of the dist execution engine.
> > >>>>      As I mentioned before I have a little over a month to get
> > >>>> something working and I'm starting to get a bit worried.
> > >>>>      I've been working in the parallel per region hbase scanner and
> > >>>> soon I'll have something usable.
> > >>>>      I can definitely put in a few hours working/helping on it if
> that
> > >>>> helps, but as previously suggested I'd rather not reinvent the
> wheel.
> > >>>>      Right now I was thinking that something that plugs-in to the
> > >>>> reference implementation (i.e. would not require a stable SE iface)
> > would
> > >>>> be a nice start.
> > >>>>      What do you think?
> > >>>>
> > >>>> Best
> > >>>> David
> > >
> >
>

Re: timeline for dist execution

Posted by Jacques Nadeau <ja...@apache.org>.

Given David's request to have everybody review whatever I share, let's do
M/T/W of next week. What times are people available?

J

On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <tn...@gmail.com> wrote:

> I'm open 2pm pst, see when Jacques is open.
>
> Tim
>
> Sent from my iPad
>
> On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com> wrote:
>
> > Hi Jacques
> >
> >> I'll try to drop some of my work and thoughts on the list this week.
> >
> >    That is great news!
> >
> >> As always with these things, everything takes longer than one would
> >> like…
> >    Hopefully we can help and take of the workload.
> >
> >> I am also thinking that it might be good to do a google hangout
> >> brainstorming session soon around some of this stuff to help move
> >> things along.
> >
> >    A google hangout is a good idea.
> >    Wednesday would be a good day for me, say 2PM PST, how about for
> other people?
> >    I do think we should do it after we get a chance to take a look at
> what you already have so that we're all in the same page.
> >
> > Best
> > David
> >
> > On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <ja...@gmail.com>
> wrote:
> >
> >> I'll try to drop some of my work and thoughts on the list this week.
> >> As always with these things, everything takes longer than one would
> >> like...
> >>
> >> I am also thinking that it might be good to do a google hangout
> >> brainstorming session soon around some of this stuff to help move
> >> things along.
> >>
> >> J
> >>
> >> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
> >>
> >>> +1
> >>>
> >>> It would be nice to see what's the current status and future plan on
> in-mem
> >>> data representation in the dist exec engine.
> >>>
> >>> I was previously going to do something about DataValue in exec/ref.
> However
> >>> after some reading into previous discussions in the maillist and some
> links
> >>> in 'useful research' wiki page
> >>> (vldb09-tutorial6.pdf<http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf
> >
> >>> abadisigmod06.pdf<
> http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
> >>> etc.)
> >>>
> >>> I found it non-trivial and crucial building block to build in-mem data
> >>> structure. Incremental optimisation based on current DataValue seems a
> bad
> >>> idea.
> >>>
> >>> So what's your thought on this? If we could get a sketch, I would very
> much
> >>> like to do something on this issue.
> >>>
> >>>
> >>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <da...@gmail.com>
> wrote:
> >>>
> >>>> Hi All
> >>>>
> >>>>      I was wondering if there is a timeline on when we might get a
> >>>> sketch of the dist execution engine.
> >>>>      As I mentioned before I have a little over a month to get
> >>>> something working and I'm starting to get a bit worried.
> >>>>      I've been working in the parallel per region hbase scanner and
> >>>> soon I'll have something usable.
> >>>>      I can definitely put in a few hours working/helping on it if that
> >>>> helps, but as previously suggested I'd rather not reinvent the wheel.
> >>>>      Right now I was thinking that something that plugs-in to the
> >>>> reference implementation (i.e. would not require a stable SE iface)
> would
> >>>> be a nice start.
> >>>>      What do you think?
> >>>>
> >>>> Best
> >>>> David
> >
>

Re: timeline for dist execution

Posted by Timothy Chen <tn...@gmail.com>.

I'm open at 2pm PST; let's see when Jacques is open.

Tim

Sent from my iPad

On Apr 7, 2013, at 6:01 PM, David Alves <da...@gmail.com> wrote:

> Hi Jacques 
> 
>> I'll try to drop some of my work and thoughts on the list this week.
> 
>    That is great news!
> 
>> As always with these things, everything takes longer than one would
>> like…
>    Hopefully we can help and take of the workload.
> 
>> I am also thinking that it might be good to do a google hangout
>> brainstorming session soon around some of this stuff to help move
>> things along.
> 
>    A google hangout is a good idea.
>    Wednesday would be a good day for me, say 2PM PST, how about for other people?
>    I do think we should do it after we get a chance to take a look at what you already have so that we're all in the same page.
> 
> Best
> David
> 
> On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <ja...@gmail.com> wrote:
> 
>> I'll try to drop some of my work and thoughts on the list this week.
>> As always with these things, everything takes longer than one would
>> like...
>> 
>> I am also thinking that it might be good to do a google hangout
>> brainstorming session soon around some of this stuff to help move
>> things along.
>> 
>> J
>> 
>> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
>> 
>>> +1
>>> 
>>> It would be nice to see what's the current status and future plan on in-mem
>>> data representation in the dist exec engine.
>>> 
>>> I was previously going to do something about DataValue in exec/ref. However
>>> after some reading into previous discussions in the maillist and some links
>>> in 'useful research' wiki page
>>> (vldb09-tutorial6.pdf<http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf>
>>> abadisigmod06.pdf<http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
>>> etc.)
>>> 
>>> I found it non-trivial and crucial building block to build in-mem data
>>> structure. Incremental optimisation based on current DataValue seems a bad
>>> idea.
>>> 
>>> So what's your thought on this? If we could get a sketch, I would very much
>>> like to do something on this issue.
>>> 
>>> 
>>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <da...@gmail.com> wrote:
>>> 
>>>> Hi All
>>>> 
>>>>      I was wondering if there is a timeline on when we might get a
>>>> sketch of the dist execution engine.
>>>>      As I mentioned before I have a little over a month to get
>>>> something working and I'm starting to get a bit worried.
>>>>      I've been working in the parallel per region hbase scanner and
>>>> soon I'll have something usable.
>>>>      I can definitely put in a few hours working/helping on it if that
>>>> helps, but as previously suggested I'd rather not reinvent the wheel.
>>>>      Right now I was thinking that something that plugs-in to the
>>>> reference implementation (i.e. would not require a stable SE iface) would
>>>> be a nice start.
>>>>      What do you think?
>>>> 
>>>> Best
>>>> David
>

Re: timeline for dist execution

Posted by David Alves <da...@gmail.com>.

Hi Jacques 

> I'll try to drop some of my work and thoughts on the list this week.

	That is great news!

> As always with these things, everything takes longer than one would
> like…
	Hopefully we can help and take some of the workload.

> I am also thinking that it might be good to do a google hangout
> brainstorming session soon around some of this stuff to help move
> things along.

	A google hangout is a good idea.
	Wednesday would be a good day for me, say 2PM PST. How about for other people?
	I do think we should do it after we get a chance to take a look at what you already have so that we're all on the same page.

Best
David

On Apr 6, 2013, at 11:36 PM, Jacques Nadeau <ja...@gmail.com> wrote:

> I'll try to drop some of my work and thoughts on the list this week.
> As always with these things, everything takes longer than one would
> like...
> 
> I am also thinking that it might be good to do a google hangout
> brainstorming session soon around some of this stuff to help move
> things along.
> 
> J
> 
> On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:
> 
>> +1
>> 
>> It would be nice to see what's the current status and future plan on in-mem
>> data representation in the dist exec engine.
>> 
>> I was previously going to do something about DataValue in exec/ref. However
>> after some reading into previous discussions in the maillist and some links
>> in 'useful research' wiki page
>> (vldb09-tutorial6.pdf<http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf>
>> abadisigmod06.pdf<http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
>> etc.)
>> 
>> I found it non-trivial and crucial building block to build in-mem data
>> structure. Incremental optimisation based on current DataValue seems a bad
>> idea.
>> 
>> So what's your thought on this? If we could get a sketch, I would very much
>> like to do something on this issue.
>> 
>> 
>> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <da...@gmail.com> wrote:
>> 
>>> Hi All
>>> 
>>>       I was wondering if there is a timeline on when we might get a
>>> sketch of the dist execution engine.
>>>       As I mentioned before I have a little over a month to get
>>> something working and I'm starting to get a bit worried.
>>>       I've been working in the parallel per region hbase scanner and
>>> soon I'll have something usable.
>>>       I can definitely put in a few hours working/helping on it if that
>>> helps, but as previously suggested I'd rather not reinvent the wheel.
>>>       Right now I was thinking that something that plugs-in to the
>>> reference implementation (i.e. would not require a stable SE iface) would
>>> be a nice start.
>>>       What do you think?
>>> 
>>> Best
>>> David
>>> 
>>> 
>>> 
>>>

Re: timeline for dist execution

Posted by Jacques Nadeau <ja...@gmail.com>.

I'll try to drop some of my work and thoughts on the list this week.
As always with these things, everything takes longer than one would
like...

I am also thinking that it might be good to do a google hangout
brainstorming session soon around some of this stuff to help move
things along.

J

On Apr 6, 2013, at 8:39 PM, Lisen Mu <im...@gmail.com> wrote:

> +1
>
> It would be nice to see what's the current status and future plan on in-mem
> data representation in the dist exec engine.
>
> I was previously going to do something about DataValue in exec/ref. However
> after some reading into previous discussions in the maillist and some links
> in 'useful research' wiki page
> (vldb09-tutorial6.pdf<http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf>
> abadisigmod06.pdf<http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
> etc.)
>
> I found it non-trivial and crucial building block to build in-mem data
> structure. Incremental optimisation based on current DataValue seems a bad
> idea.
>
> So what's your thought on this? If we could get a sketch, I would very much
> like to do something on this issue.
>
>
> On Sat, Apr 6, 2013 at 6:31 AM, David Alves <da...@gmail.com> wrote:
>
>> Hi All
>>
>>        I was wondering if there is a timeline on when we might get a
>> sketch of the dist execution engine.
>>        As I mentioned before I have a little over a month to get
>> something working and I'm starting to get a bit worried.
>>        I've been working on the parallel per-region HBase scanner and
>> soon I'll have something usable.
>>        I can definitely put in a few hours working/helping on it if that
>> helps, but as previously suggested I'd rather not reinvent the wheel.
>>        Right now I was thinking that something that plugs into the
>> reference implementation (i.e. would not require a stable SE iface) would
>> be a nice start.
>>        What do you think?
>>
>> Best
>> David
>>
>>
>>
>>

Re: timeline for dist execution

Posted by Lisen Mu <im...@gmail.com>.

+1

It would be nice to see the current status and future plans for the
in-memory data representation in the dist exec engine.

I was previously going to do something about DataValue in exec/ref. However,
after reading previous discussions on the mailing list and some of the links
on the 'useful research' wiki page
(vldb09-tutorial6.pdf<http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf>
 abadisigmod06.pdf<http://cs-www.cs.yale.edu/homes/dna/papers/abadisigmod06.pdf>
 etc.),
I found that the in-memory data structure is a non-trivial and crucial
building block. Incremental optimisation based on the current DataValue
seems like a bad idea.

So what are your thoughts on this? If we could get a sketch, I would very
much like to work on this issue.

On Sat, Apr 6, 2013 at 6:31 AM, David Alves <da...@gmail.com> wrote:

> Hi All
>
>         I was wondering if there is a timeline on when we might get a
> sketch of the dist execution engine.
>         As I mentioned before I have a little over a month to get
> something working and I'm starting to get a bit worried.
>         I've been working on the parallel per-region HBase scanner and
> soon I'll have something usable.
>         I can definitely put in a few hours working/helping on it if that
> helps, but as previously suggested I'd rather not reinvent the wheel.
>         Right now I was thinking that something that plugs into the
> reference implementation (i.e. would not require a stable SE iface) would
> be a nice start.
>         What do you think?
>
> Best
> David
>
>
>
>

Re: timeline for dist execution

Posted by Michael Hausenblas <mi...@gmail.com>.

> What do you think?

+1

Cheers,
		Michael

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/

On 5 Apr 2013, at 23:31, David Alves <da...@gmail.com> wrote:

> Hi All
> 	
> 	I was wondering if there is a timeline on when we might get a sketch of the dist execution engine.
> 	As I mentioned before I have a little over a month to get something working and I'm starting to get a bit worried.
> 	I've been working on the parallel per-region HBase scanner and soon I'll have something usable.
> 	I can definitely put in a few hours working/helping on it if that helps, but as previously suggested I'd rather not reinvent the wheel.
> 	Right now I was thinking that something that plugs into the reference implementation (i.e. would not require a stable SE iface) would be a nice start.
> 	What do you think?
> 
> Best
> David
> 
> 
>