Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2015/07/16 14:23:28 UTC

Drill on Mesos - A Story

I have been working on getting various frameworks working on my MapR
cluster that is also running Mesos. Basically, while I know there is a
package from MapR (for Drill), I am trying to find a way to better separate
the storage layer from the compute layer.

This isn't a dig on MapR, or any of the Hadoop distributions; it's only that I
want the flexibility to try things, to have an R&D team working with the data
in an environment where they can try out new frameworks, etc.  This combination
has been very good to me (maybe not to MapR support, who have received lots of
quirky questions from me. They have been helpful in furthering my
understanding of this space!)

The next project I wanted to play with was Drill. I found
https://github.com/mhausenblas/dromedar (Thanks Michael!) as a basic start
to a Drill on Mesos approach. I read through the code and I understand it, but
I wanted to see it at a more basic level.

So I just figured out how to run drillbits in Marathon (manually for
now).  Basically, for anyone wanting to play along at home, this actually
works VERY well.  I used MapR FS to host my Drill package, and I set up a
conf directory (multiple conf directories, actually; I set it up so I
could launch different "sized" drillbits).  I have been able to get things
running and performing well on my small test cluster.

For those who may be interested, here are some of my notes.

- I compiled Drill 1.2.0-SNAPSHOT from source. I ran into some compile
issues that Jacques was able to help me through. Basically, Java 1.8 isn't
supported for building yet (it fails some tests), but there is a workaround
for that.
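
For reference, the build itself is plain Maven, and skipping the failing
tests is one way around it (a sketch; not necessarily the exact workaround
Jacques suggested):

cd drill   # your Drill source checkout
mvn clean install -DskipTests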

- I took the built package and placed it in MapR FS.  I have every
node mounting MapRFS at the same NFS location.  I could be using an HDFS
(MapRFS) based tarball, but I haven't done that yet. I am just playing
around, and the NFS mounting of MapRFS sure is handy in this regard.
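
Roughly, the layout looks like this (a sketch; distribution/target is where
I remember the Maven build dropping the tarball, and brewpot is my
cluster's NFS mount, so adjust paths to your environment):

cp distribution/target/apache-drill-1.2.0-SNAPSHOT.tar.gz /mapr/brewpot/mesos/drill/
cd /mapr/brewpot/mesos/drill
tar xzf apache-drill-1.2.0-SNAPSHOT.tar.gz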

- At first I created a single size of drillbit; the Marathon JSON looks like
this:

{
  "cmd": "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf",
  "cpus": 2.0,
  "mem": 6144,
  "id": "drillpot",
  "instances": 1,
  "constraints": [["hostname", "UNIQUE"]]
}


Let me walk you through this.  The first field is the command, obviously.  I
use runbit instead of drillbit.sh start because I want this process to stay
running (from Marathon's perspective).  If I used drillbit.sh, it would nohup
and background the process, and Mesos/Marathon would think it died and try to
start another.

cpus: obvious, maybe a bit small, but I have a small cluster.

mem: I set mem to 6144 (6 GB). In my drill-env.sh, I set max direct
memory to 6 GB and max heap to 3 GB.  I wasn't sure if I needed to set my
Marathon memory to 9 GB or if the heap was counted inside the direct memory.  I
could use some pointers here.
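
For concreteness, these are the two drill-env.sh settings I am talking
about (a sketch; the variable names are the ones in the stock drill-env.sh,
if I recall correctly):

# conf/drill-env.sh
DRILL_MAX_DIRECT_MEMORY="6G"
DRILL_HEAP="3G"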

id: This matches the id of my cluster in drill-override.conf. I did this so
HAProxy would let me connect to the cluster via drillpot.marathon.mesos,
and it worked pretty well!
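
For reference, the matching piece of my drill-override.conf looks roughly
like this (the ZooKeeper hosts here are placeholders):

drill.exec: {
  cluster-id: "drillpot",
  zk.connect: "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
}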

instances: I started with one, but could scale up with Marathon.

constraints: I only wanted one drillbit per node because of port
conflicts.  If I want to be multi-tenant and have more than one drillbit
per node, I would need to figure out how to abstract the ports. This is
something I could potentially do in a framework for Mesos. But at the
same time, I wonder if, when a drillbit registers with a cluster, it
could just "report" its ports in the ZooKeeper information. This is
intriguing, because if it did that, we could let it pull random ports
offered to it from Mesos, register the information, and away we go.
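
For anyone experimenting with this, I believe these are the ports that
would need abstracting, shown here with their defaults in
drill-override.conf syntax (key names as I understand them from
drill-override-example.conf; verify against your build):

drill.exec: {
  rpc.user.server.port: 31010,
  rpc.bit.server.port: 31011,
  http.port: 8047
}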


Once I posted this to Marathon, all was good; bits started, queries were
had by all!  It worked well.
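
Posting it is just a REST call to Marathon's apps endpoint; a sketch (the
Marathon host is a placeholder, and I am assuming the JSON above is saved
as drillpot.json):

curl -X POST http://marathon.example.com:8080/v2/apps \
  -H "Content-Type: application/json" \
  -d @drillpot.json

Some challenges: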


1.  Ports: as mentioned above, I am not managing those, so port conflicts
could occur.

2. I should use a tarball for Marathon; this would allow Drill to work on
Mesos without the MapR requirement (see the first sketch after this list).

3. Logging. I have the default logback.xml in the conf directory, and I am
getting file-not-found issues in my stderr on the Mesos tasks. This isn't
killing Drill, and it still works, but I should organize my logging better
(see the second sketch after this list).
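
On the tarball point (challenge 2), Marathon can fetch and extract one
itself via its uris field; a hypothetical sketch (the URL is a placeholder,
and the cmd becomes relative to the Mesos sandbox):

{
  "cmd": "cd apache-drill-1.2.0-SNAPSHOT && bin/runbit --config conf",
  "uris": ["http://some-webserver/apache-drill-1.2.0-SNAPSHOT.tar.gz"],
  "cpus": 2.0,
  "mem": 6144,
  "id": "drillpot",
  "instances": 1,
  "constraints": [["hostname", "UNIQUE"]]
}

On the logging point (challenge 3), one way to keep everything in the
Mesos sandbox is to log to stdout instead of a file; a minimal logback.xml
sketch (not Drill's stock config):

<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{ISO8601} %-5level [%thread] %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>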


Hopeful for the future:

1. It would be neat to have a framework that did the actual running of the
bits, perhaps something that could scale up and down based on query usage.
I played around with some smaller drillbits (similar to how Myriad defines
profiles) so I could have a Drill cluster of 2 large bits and 2 small bits
on my 5-node cluster.   That worked, but it was a lot of manual work. A
framework would be handy for managing that (see the sketch after this list).

2. Other?
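
Here is a sketch of what one of those smaller profiles looked like in
Marathon (conf-small is a hypothetical conf directory with smaller
drill-env.sh settings; it keeps the same cluster-id in drill-override.conf
so the bits join the same cluster):

{
  "cmd": "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf-small",
  "cpus": 1.0,
  "mem": 3072,
  "id": "drillpot-small",
  "instances": 2,
  "constraints": [["hostname", "UNIQUE"]]
}

One caveat: the UNIQUE constraint applies within each app, so without the
port abstraction above a large and a small bit can still land on the same
node and collide.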


I know this isn't a production thing, but I could see going from
this to something a subset of production users could use on MapR/Mesos (or
just Mesos).  I just wanted to share some of my thought processes and show
a way that various tools can integrate.  Always happy to talk shop with
folks on this stuff if anyone has any questions.


John

Re: Drill on Mesos - A Story

Posted by Timothy Chen <tn...@gmail.com>.
Hi John,

The fetcher cache is going to be in 0.23, so that's something you can leverage.

You'll find more docs in the docs folder about it once we have 0.23 released.
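
Once it's wired up, requesting the cache from Marathon should look roughly
like this (a sketch; field names may still change before everything lands):

{
  "fetch": [
    {
      "uri": "http://some-webserver/apache-drill-1.2.0-SNAPSHOT.tar.gz",
      "cache": true,
      "extract": true
    }
  ]
}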

Tim


Re: Drill on Mesos - A Story

Posted by Jim Scott <js...@maprtech.com>.
John,

That is a great write-up. If Drill were running from Docker with the Docker
instance referencing a local path (Mount a host directory as a data volume
| https://docs.docker.com/userguide/dockervolumes/ ), I would expect the
same performance with all the flexibility you are seeking. In case it is
helpful to you or others, there is a Docker image with Drill at this
location: https://registry.hub.docker.com/u/mkieboom/apache-drill-docker/

The nice thing about the approach you are taking, plus a Docker deployment
of something like Drill, is that you really don't care where those Docker
instances land in your cluster: you can build your configuration into your
Docker image, you are off and running, and you should have no problem
dynamically spinning up a few more instances whenever you want. It should
hopefully simplify administration.
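
For example, something roughly like this (a sketch; the host mount path and
network mode are assumptions on my part, not something taken from that
image's docs):

# mount the host's MapR NFS mount into the container and share the host network
docker run -d --net=host -v /mapr:/mapr mkieboom/apache-drill-docker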

Jim

On Thu, Jul 16, 2015 at 2:08 PM, John Omernik <jo...@omernik.com> wrote:

> Timothy -
>
> I played with that, and the performance I was getting in Docker was about
> half that I was getting native. I think that for me, that was occurring
> because if I ran it in Docker, I needed to install the MapR Client in the
> container too, whereas when I run it in marathon, it's using the node's
> access to the disk.  I am comfortable in places where performance stuff
> like this occurs, to not docker all the things, and allow for the tar ball
> method.  Perhaps Mesos could find a way to cache locally?  (Note, putting
> it in MapR FS still has it load pretty quick)
>
> John
>
>
> On Thu, Jul 16, 2015 at 11:44 AM, Timothy Chen <tn...@gmail.com> wrote:
>
> > Also will be nice to launch Drill with a docker image so no tar ball is
> > needed, and much easier be cached on each slave.
> >
> > Tim
> >
> >
> > > On Jul 16, 2015, at 9:37 AM, John Omernik <jo...@omernik.com> wrote:
> > >
> > > Awesome thanks for the update on memory!
> > >
> > > On Thu, Jul 16, 2015 at 10:48 AM, Andries Engelbrecht <
> > > aengelbrecht@maprtech.com> wrote:
> > >
> > >> Great write up and information! Will be interesting to see how this
> > >> evolves.
> > >>
> > >> A quick note, memory allocation is additive so you have to allocate
> for
> > >> direct plus heap memory. Drill uses direct memory for data
> > >> structures/operations and this is the one that will grow with larger
> > data
> > >> sets, etc.
> > >>
> > >> —Andries
> > >>
> > >>> On Jul 16, 2015, at 5:23 AM, John Omernik <jo...@omernik.com> wrote:
> > >>>
> > >>> I have been working on getting various frameworks working on my MapR
> > >>> Cluster that is also running Mesos. Basically, while I know that
> there
> > >> is a
> > >>> package from MapR (for Drill) I am trying to find a way to better
> > >> separate
> > >>> the storage layer from the computer layer.
> > >>>
> > >>> This isn't a dig on MapR, or any of the Hadoop distributions, it's
> > only I
> > >>> want flexibility to try things, to have an R&D team working with the
> > data
> > >>> in an environment that can try out new frameworks etc.  This
> > combination
> > >>> has been very good to me (maybe not to MapR support who received lots
> > of
> > >>> quirky questions from me.   They have been helpful in furthering my
> > >>> understanding of this space!)
> > >>>
> > >>> My next project I wanted to play with was Drill. I found
> > >>> https://github.com/mhausenblas/dromedar (Thanks Michael!) as a basic
> > >> start
> > >>> to a Drill on Mesos approach. I read through the code, I understand
> it,
> > >> but
> > >>> I wanted to see it at a more basic level.
> > >>>
> > >>> So I just figured out how to run Drill bits in Marathon (manually for
> > >>> now).  Basically, for anyone wanting to play along at home, This
> > actually
> > >>> works VERY well.  I used MapR FS to host my package from Drill, I
> set a
> > >>> conf directory.  (Multiple conf directories actually, I set it up so
> I
> > >>> could launch different "sized" drillbits).  I have been able to get
> > >> things
> > >>> running, and be performant on my small test cluster.
> > >>>
> > >>> For those who may be interested here are some of my notes.
> > >>>
> > >>> - I compiled Drill 1.2.0-SNAPSHOT from source. I ran into some
> > compiling
> > >>> issues that Jacques was able to help me through. Basically, Java 1.8
> > >> isn't
> > >>> support for building yet (fails some tests) but there is a work
> around
> > to
> > >>> that.
> > >>>
> > >>> - I took the built package and placed it in MapR FS.  Now, I have
> every
> > >>> node mounting MapRFS to same NFS location.  I could be using a hdfs
> > >>> (maprfs) based tarball but I haven't done that yet. I am just playing
> > >>> around and the NFS mounting of MapRFS sure is handy in this regard.
> > >>>
> > >>> - At first I created a single sized Drill bit, the Marathon JSON is
> > like
> > >>> this:
> > >>>
> > >>> {
> > >>>
> > >>> "cmd":
> > "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit
> > >>> --config /mapr/brewpot/mesos/drill/conf",
> > >>>
> > >>> "cpus": 2.0,
> > >>>
> > >>> "mem": 6144,
> > >>>
> > >>> "id": "drillpot",
> > >>>
> > >>> "instances": 1,
> > >>>
> > >>> "constraints": [["hostname", "UNIQUE"]]
> > >>>
> > >>> }
> > >>>
> > >>>
> > >>> So I can walk you through this.  The first is the command obviously.
> >  I
> > >>> use runbit instead of drillbit.sh start because I want this process
> to
> > >> stay
> > >>> running (from Marathon's perspective).  If I used the drillbit.sh, it
> > >> uses
> > >>> nohup and backgrounds it, Mesos/Marathon thinks it died and tries to
> > >> start
> > >>> another.
> > >>>
> > >>> cpus: obvious, maybe a bit small, but I have a small cluster.
> > >>>
> > >>> mem: When I set mem to 6144 (6GB) in my drill-env.sh, I set max
> direct
> > >>> memory to 6GB and max heap to 3GB.  I wasn't sure if I needed to set
> my
> > >>> marathon memory to 9GB or if the heap was used inside the direct
> > >> memory.  I
> > >>> could use some pointers here.
> > >>>
> > >>> id: This is the id of my cluster in the drill-overides.conf. I did
> this
> > >> so
> > >>> HA proxy would let me connect to the cluster via
> > drillpot.marathon.mesos
> > >>> and it worked pretty well!
> > >>>
> > >>> instances: I started with one, but could scale up with marathon
> > >>>
> > >>> constrains; I only wanted one drill bit per node because of port
> > >>> conflicts.  If I want to be multi tenant  and have more than one
> drill
> > >> bit
> > >>> per node, I would need to figure out how to abstract the ports. This
> is
> > >>> something that I could potentially do in a frame work for Mesos. But
> at
> > >> the
> > >>> same time, I wonder if if when a drill bit registers with a cluster,
> it
> > >>> could just "report" it ports in the zookeeper information.. This is
> > >>> intriguing because if it did this, we could allow it to pull random
> > ports
> > >>> offered to it from Mesos, registers the information, and away we go.
> > It
> > >>> would be intriguing.
> > >>>
> > >>>
> > >>> Once I posted this to marathon, all was good, bits started, queries
> > were
> > >>> had by all!  It worked well. Some challenges:
> > >>>
> > >>>
> > >>> 1.  Ports (as mentioned above) I am not managing those, so port
> > conflicts
> > >>> could occur.
> > >>>
> > >>> 2. I should use a tarball for Marathon, this would allow drill to
> work
> > on
> > >>> Mesos without the MapR requirement.
> > >>>
> > >>> 3. Logging. I have the default logback.xml in the conf directory and
> I
> > am
> > >>> getting file not found issues in my stderr on the Mesos tasks. This
> > isn't
> > >>> kill drill, and it still works, but I should organize my logging
> > better.
> > >>>
> > >>>
> > >>> Hopeful for the future:
> > >>>
> > >>> 1. It would be neat to have a frame work that did the actual running
> of
> > >> the
> > >>> bits.  Perhaps something that could scale up and down based on query
> > >> usage.
> > >>> I played around with some smaller drillbits (similar to how myriad
> > >> defines
> > >>> profiles) so I could have a drill cluster of 2 large bits, and 2
> small
> > >> bits
> > >>> on my 5 node cluster.   That worked, but lots of manual work. A
> > framework
> > >>> would be handy for managing that.
> > >>>
> > >>> 2. Other?
> > >>>
> > >>>
> > >>> I know this isn't a production thing, but I could see being able to
> go
> > >> from
> > >>> this to something a subset of production users could use in
> MapR/Mesos
> > >> (or
> > >>> just Mesos)   I just wanted to share some of my thought processes and
> > >> show
> > >>> a way that various tools can integrate.  Always happy to talk to shop
> > >> with
> > >>> folks on this stuff if anyone has any questions.
> > >>>
> > >>>
> > >>> John
> > >>
> > >>
> >
>



-- 
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281


Re: Drill on Mesos - A Story

Posted by John Omernik <jo...@omernik.com>.
Timothy -

I played with that, and the performance I was getting in Docker was about
half what I was getting natively. I think that, for me, that was occurring
because if I ran it in Docker, I needed to install the MapR client in the
container too, whereas when I run it via Marathon, it's using the node's
access to the disk.  In places where performance suffers like this, I am
comfortable not Dockerizing all the things and allowing for the tarball
method instead.  Perhaps Mesos could find a way to cache locally?  (Note,
putting it in MapR FS still has it loading pretty quickly.)

John



Re: Drill on Mesos - A Story

Posted by Timothy Chen <tn...@gmail.com>.
Also, it would be nice to launch Drill with a Docker image so no tarball is needed, and it's much easier to cache on each slave.

Tim



Re: Drill on Mesos - A Story

Posted by John Omernik <jo...@omernik.com>.
Awesome, thanks for the update on memory!


Re: Drill on Mesos - A Story

Posted by Andries Engelbrecht <ae...@maprtech.com>.
Great write-up and information! It will be interesting to see how this evolves.

A quick note: memory allocation is additive, so you have to allocate for direct plus heap memory. Drill uses direct memory for data structures/operations, and this is the one that will grow with larger data sets, etc.

—Andries
