Posted to user@drill.apache.org by Ted Dunning <te...@gmail.com> on 2013/02/01 00:33:48 UTC

Re: High-level architecture

I hear you.  Deployment complexity is an evil thing.

And your comment about being willing to trade some performance for
flexibility is also interesting.

A big mismatch here, however, is that every query is going to cause
different desired communication patterns.  One way to handle that is to
build a new topology for every query.  That isn't going to fly due to long
topology deployment times.  Essentially Nimbus becomes the out-of-band
communication mechanism.

The other option would be to use Storm to move query components around.
 The communication patterns are much simpler in this case, but bolts
suddenly need the ability to communicate to arbitrary other bolts to
implement the data flow.  This makes Storm handle the out-of-band
communication and leaves us with implementation of the data transform
outside of Storm.  Since the out-of-band comms are tiny, this is perverse
and doesn't use Storm for what it should be doing.

So I really think that the takeaway here is that we need to be able to pop
up workers very quickly and easily.  That is the lesson learned from Storm
here and it really needs to happen.  This also impacts features like
elasticity (where Drill might soak up excess capability in a cluster, but
not hurt batch performance).
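
To make that concrete, here is the sort of thing I mean: a worker whose
entire startup cost is announcing itself, for example with an ephemeral node
in ZooKeeper, so a coordinator can hand it work immediately and it drops off
the roster the instant it dies.  This is a very rough sketch only; the paths
and the port number are invented:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class DrillWorker {
      public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, new Watcher() {
          public void process(WatchedEvent event) { /* nothing needed in this sketch */ }
        });
        String me = java.net.InetAddress.getLocalHost().getHostName() + ":5000";
        // Ephemeral + sequential: cheap to create, and it disappears on its own
        // the moment the worker process dies.  Assumes /drill/workers exists.
        zk.create("/drill/workers/worker-", me.getBytes("UTF-8"),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        // ... start accepting query fragments here ...
        Thread.sleep(Long.MAX_VALUE);
      }
    }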


On Thu, Jan 31, 2013 at 12:43 PM, Brian O'Neill <bo...@alumni.brown.edu> wrote:

> Great points. Thanks Ted.
>
> I'm not sure if it is possible, but if there were a Storm topology
> deployment option, I think there might be appetite for that since it would
> reduce the operations/admin complexity significantly for consumers that
> already have Storm deployed.  (IMHO) I would be willing to sacrifice some
> performance to maintain only one set of distributed processing
> infrastructure.
>
> With respect to locality information, I think Storm will eventually need
> to add out-of-band information to optimize the tuple routing.  We developed
> the storm-cassandra bolt, and I'm eager to get to the point where we can
> supply ring/token information to Storm so it can route the tuples to the
> nodes that contain the data.
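>
> Very roughly, I am picturing a custom grouping along these lines.  The
> Cassandra-specific pieces are stubbed out and every name here is made up,
> so treat it as a sketch of the idea rather than anything storm-cassandra
> actually ships today:
>
>     import java.util.ArrayList;
>     import java.util.List;
>     import backtype.storm.generated.GlobalStreamId;
>     import backtype.storm.grouping.CustomStreamGrouping;
>     import backtype.storm.task.WorkerTopologyContext;
>
>     public class TokenAwareGrouping implements CustomStreamGrouping {
>       private List<Integer> targetTasks;
>
>       public void prepare(WorkerTopologyContext context, GlobalStreamId stream,
>                           List<Integer> targetTasks) {
>         this.targetTasks = targetTasks;
>         // A real version would load the ring/token map here and learn which
>         // task is co-located with which Cassandra node.
>       }
>
>       public List<Integer> chooseTasks(int taskId, List<Object> values) {
>         // Stand-in for the token lookup: hash the row key and pick the task
>         // that would, ideally, sit on the node owning that token range.
>         Object rowKey = values.get(0);
>         int i = (rowKey.hashCode() & Integer.MAX_VALUE) % targetTasks.size();
>         List<Integer> chosen = new ArrayList<Integer>();
>         chosen.add(targetTasks.get(i));
>         return chosen;
>       }
>     }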
>
> (Maybe it gets carried around in the tuple and leveraged by the underlying
> infrastructure -- much like Nathan did with transaction id for Trident?)
>
> But I fully appreciate your points. (especially regarding java-centricity,
> serialization, kryo, etc.)
>
> -brian
>
> --
> Brian O'Neill
> Lead Architect, Software Development
> Health Market Science
> The Science of Better Results
> 2700 Horizon Drive • King of Prussia, PA • 19406
> M: 215.588.6024 • @boneill42  • healthmarketscience.com
>
> On Jan 30, 2013, at 3:16 PM, Ted Dunning wrote:
>
> > On Wed, Jan 30, 2013 at 11:53 AM, Brian O'Neill <bone@alumni.brown.edu
> >wrote:
> >
> >> ...
> >> How do we intend to distribute the execution engine across a set of
> >> machines?
> >>
> >
> > There are a variety of thoughts.  These include:
> >
> > - custom built execution controller similar to Storm's Nimbus
> >
> > - use Storm's Nimbus
> >
> > - use the custom built controller via Yarn.  Or Mesos.  Or the MapR
> warden
> >
> > - start them by hand.
> >
> > Obviously the last option will be the one that is used in initial
> testing.
> >
> > Any thought to deploying the engine as a Storm topology?
> >>
> >
> > Using Storm probably limits the performance that we can get.  Storm's
> > performance is creditable but not super awesomely impressive.
> >
> > Some of the performance issues with Storm include:
> >
> > - limited to Java.  This may or may not make a difference in the end in
> > terms of performance, but we definitely want flexibility here.  Java can
> be
> > awesomely fast (see LMAX and Disruptor), but C++ may be more predictable.
> > We definitely *don't* want to decide for all time right now which option
> > we take and we definitely *do* want to have the C++ option in our
> > hip-pocket later regardless of how we build execution engines now.  Part
> of
> > Storm's limitations here have to do with the use of Kryo instead of a
> > portable serializer like protobufs.
> >
> > - the designs I have seen or heard batted around tend to deal with
> batches
> > of records represented in an ephemeral column oriented design.  It will
> > also be important for records to be kept in unmaterialized, virtual form
> to
> > minimize copying, especially when flattening arrays and such.  Storm
> allows
> > tuples to be kept in memory when bolts are on the same machine, but
> insists
> > on serializing and deserializing them at the frontier.  To control this,
> we
> > would have to nest serializations which seems a bit like incipient
> insanity.
> >
> > Other issues include:
> >
> > - Drill execution engines will need access to a considerable amount of
> > out-of-band information such as schemas and statistics.  How do we manage
> > that in a restrictive paradigm like Storm?
> >
> > - Storm hides location from Bolts.  Drill needs to make decisions based
> on
> > location of execution engines and data.
>
>

Re: High-level architecture

Posted by Ted Dunning <te...@gmail.com>.
Brian,

In the short run, what I would like to see is a simple server that will
accept a query in JSON form and send the resulting rows to another
networked server.  The JSON form should allow for networked sources.

Associated with this could be an operator that sends a JSON sub-query to a
remote server.

All of this can be accomplished pretty straightforwardly using protobuf
rpc.  See http://code.google.com/p/protobuf-rpc-pro/ for a good library for
that and see https://github.com/tdunning/mapr-spout for an example of using
this with zookeeper.
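
Just to give the shape of it, here is a deliberately dumb sketch of such a
server.  It uses plain line-oriented sockets rather than protobuf rpc so that
it stays short and self-contained; the framing, the port number and the
execute() placeholder are all invented for the example:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.Collections;

    public class JsonQueryServer {
      public static void main(String[] args) throws IOException {
        ServerSocket listener = new ServerSocket(4000);
        while (true) {
          Socket client = listener.accept();
          BufferedReader in = new BufferedReader(
              new InputStreamReader(client.getInputStream(), "UTF-8"));
          // Line 1: the query in JSON form.  Line 2: host:port to send rows to.
          String queryJson = in.readLine();
          String[] sink = in.readLine().split(":");
          Socket out = new Socket(sink[0], Integer.parseInt(sink[1]));
          PrintWriter rows = new PrintWriter(
              new OutputStreamWriter(out.getOutputStream(), "UTF-8"), true);
          for (String row : execute(queryJson)) {
            rows.println(row);            // one JSON row per line
          }
          rows.close();
          out.close();
          in.close();
          client.close();
        }
      }

      // Placeholder: a real implementation would interpret the JSON plan,
      // including sub-queries that get shipped to remote servers.
      static Iterable<String> execute(String queryJson) {
        return Collections.singletonList("{\"example\":\"row\"}");
      }
    }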

On Fri, Feb 1, 2013 at 10:19 AM, Brian O'Neill <bo...@alumni.brown.edu> wrote:

>
> Excellent points Ted. (again)
> You've got me thinking.
>
> I haven't delved into the algorithm enough to understand how widely the
> communication patterns vary.  I'm sure you're right, but it'd be really cool
> if we could find a static topology construct that can accommodate the
> different communication patterns.
> (even if we have to sacrifice locality for now)
>
> If I get some time, I may have a look.
>
> thanks,
> -brian
>
>
> ---
> Brian O'Neill
> Lead Architect, Software Development
> Health Market Science
> The Science of Better Results
> 2700 Horizon Drive • King of Prussia, PA • 19406
> M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
> healthmarketscience.com

Re: High-level architecture

Posted by Brian O'Neill <bo...@alumni.brown.edu>.
Excellent points Ted. (again)
You've got me thinking.

I haven't delved into the algorithm enough to understand how widely the
communication patterns vary.  I'm sure you're right, but it'd be really cool
if we could find a static topology construct that can accommodate the
different communication patterns.
(even if we have to sacrifice locality for now)
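
For what it's worth, the shape I'm imagining is a fixed chain of identical,
generic operator stages, where the query plan rides along with the data and
each stage either applies whatever operator the plan assigns it or just
passes the batch through.  A very hand-wavy sketch, with every class and
field name below made up:

    import java.util.Map;
    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class StaticQueryTopology {

      // Stub spout standing in for wherever plans and source batches enter.
      public static class QueryPlanSpout extends BaseRichSpout {
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) { }
        public void nextTuple() { }
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
          declarer.declare(new Fields("plan", "batch"));
        }
      }

      // Generic stage: a real version would look at the plan and scan, filter,
      // aggregate, or simply forward the batch unchanged.
      public static class GenericOperatorBolt extends BaseBasicBolt {
        public void execute(Tuple input, BasicOutputCollector collector) {
          collector.emit(new Values(input.getValue(0), input.getValue(1)));
        }
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
          declarer.declare(new Fields("plan", "batch"));
        }
      }

      public static TopologyBuilder build() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("plans", new QueryPlanSpout(), 1);
        builder.setBolt("stage-1", new GenericOperatorBolt(), 16).shuffleGrouping("plans");
        builder.setBolt("stage-2", new GenericOperatorBolt(), 16).shuffleGrouping("stage-1");
        builder.setBolt("stage-3", new GenericOperatorBolt(), 16).shuffleGrouping("stage-2");
        return builder;
      }
    }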

If I get some time, I may have a look.

thanks,
-brian


---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com
