Posted to user@mesos.apache.org by Tom Arnfeld <to...@duedil.com> on 2014/02/04 13:31:26 UTC

System dependencies with Mesos

I’m investigating the possibility of using Mesos to solve the problem of resource allocation between a Hadoop cluster and a set of Jenkins slaves (and I like the possibility of being able to easily deploy other frameworks). One of the biggest outstanding questions I can’t seem to find an answer to is how to manage system dependencies across a wide variety of frameworks, and across the jobs running within those frameworks.

I came across this thread (http://www.mail-archive.com/user@mesos.apache.org/msg00301.html) and caching executor files seems to be the preferred solution, though it isn’t implemented yet. I too would really like to avoid shipping system dependencies (C dependencies for Python packages, for example) along with every single job, and I’m especially unsure how this would interact with the Hadoop/Jenkins Mesos schedulers (as each Hadoop job may require its own system dependencies).
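
For reference, this is roughly how the per-job shipping works with the stock Python bindings today: the task’s CommandInfo lists URIs which the slave downloads into the sandbox before launch (re-fetched for every task, since there’s no caching yet). The HDFS URL and script name below are hypothetical, and required fields such as slave_id and resources are omitted:

    # Abbreviated sketch: attach a dependency archive to a task so the
    # Mesos fetcher downloads it into the sandbox before the command runs.
    import mesos_pb2

    task = mesos_pb2.TaskInfo()
    task.name = "example-job"
    task.task_id.value = "example-job-1"
    task.command.value = "./run-job.sh"

    # Each URI is fetched into the task sandbox; archives such as
    # .tar.gz are extracted there automatically.
    uri = task.command.uris.add()
    uri.value = "hdfs://namenode/deps/python-cdeps.tar.gz"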

More importantly, the architecture of the machine submitting the job is often different from that of the slaves, so we can’t simply build the dependencies locally and ship them with the task.

We’re solving this problem at the moment for Hadoop by installing every dependency we require on every Hadoop TaskTracker node, which is far from ideal. For Jenkins, we’re using Docker to isolate the execution of different types of jobs, baking all the system dependencies for a suite of jobs into Docker images.

I like the idea of continuing down the path of Docker for process isolation and system dependency management, but I don’t see any easy way for this to interact with the existing Hadoop/Jenkins/etc. schedulers. I guess it would require us to build our own schedulers/executors that wrap each process in a Docker container, as sketched below.
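
To sketch what I mean (a hypothetical wrapper, not a full executor; the image name and command are made up):

    # Minimal sketch of an executor-side wrapper that runs a task's
    # command inside a Docker container, so the task only sees the
    # dependencies baked into the image.
    import subprocess

    def launch_in_container(image, command):
        # --rm removes the container once the task exits; the return
        # value is the exit status of the containerised command.
        return subprocess.call([
            "docker", "run", "--rm", image,
            "/bin/sh", "-c", command,
        ])

    status = launch_in_container("example/hadoop-deps", "./run-job.sh")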

I’d love to hear how others are solving this problem… and/or whether Docker seems like the wrong way to go.

--

Tom Arnfeld
Developer // DueDil

Re: System dependencies with Mesos

Posted by Tobias Knaup <to...@knaup.me>.
Sorry to drop off the thread here. Airbnb is using mesos/hadoop. Not sure
if I understand the question; mesos/hadoop manages the TaskTracker
lifecycle itself, so there is no need for Marathon here.
With the upcoming Docker integration you'll be able to isolate them in a
container:
https://github.com/mesosphere/medea
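
As a rough illustration, an app posted to Marathon's REST API could carry a Docker image so that its dependencies travel with it. The endpoint and container syntax below follow the early Deimos-era format and vary between Marathon versions, so treat this as a sketch rather than a reference:

    import json
    import requests

    # Hypothetical app definition: a service pinned to a Docker image.
    app = {
        "id": "dep-isolated-service",
        "cmd": "./run-service.sh",
        "cpus": 1.0,
        "mem": 512.0,
        "instances": 1,
        "container": {"image": "docker:///example/service-deps"},
    }

    resp = requests.post(
        "http://marathon.example.com:8080/v2/apps",
        data=json.dumps(app),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()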


Re: System dependencies with Mesos

Posted by Tom Arnfeld <to...@duedil.com>.
Hey Tobi,

That’s my thinking too. Having taken a closer look at Marathon, I now realise at what level it sits (I previously thought it was a framework itself).

Do you know of anyone currently running Hadoop task trackers using Marathon? If so, do you think it would be possible to implement a scheduler similar to the one provided by https://github.com/mesos/hadoop, if there isn’t one already? Or is the best approach simply to launch long-running TaskTrackers?

My point being, I’d like to be able to isolate the Hadoop TaskTrackers (and even Chronos tasks, for example) within Docker containers, so that Hadoop tasks can use the dependencies built into the Docker image. Something like the hypothetical Marathon app definition below is what I have in mind.
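
(The image name, instance count and resource sizes here are made up, and unlike mesos/hadoop this wouldn’t scale trackers with job demand):

    # Hypothetical: N long-running TaskTrackers, each inside a Docker
    # image that carries the Hadoop job dependencies.
    tasktracker_app = {
        "id": "hadoop-tasktracker",
        "cmd": "hadoop tasktracker",
        "cpus": 4.0,
        "mem": 8192.0,
        "instances": 10,  # one TaskTracker per app instance
        "container": {"image": "docker:///example/hadoop-deps"},
    }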

Thanks.

--

Tom Arnfeld
Developer // DueDil


Re: System dependencies with Mesos

Posted by Tobias Knaup <to...@knaup.me>.
Hi Tom,

Docker is definitely a good option for this. Marathon already has basic
support for Docker, and there has been some work recently to integrate it
more tightly with Mesos.

Cheers,

Tobi

