Posted to user@mesos.apache.org by Tom Arnfeld <to...@duedil.com> on 2015/03/28 14:52:08 UTC

Mesos Hadoop Framework 0.1.0

Hey everyone,


I thought it best to send an email to the list before merging and tagging a 0.1.0 release for the Hadoop on Mesos framework. This release is for a new feature we've been working on for quite some time, which allows Hadoop TaskTrackers to be semi-terminated when they are idle, without destroying any map output they may need to retain for running reduce tasks.


Essentially this means that over the lifetime of a job (one with more map/reduce tasks than the size of the cluster) the ratio of map and reduce slots can change, resulting in significantly better resource utilization, because the map slots can be freed up after they have finished doing work.
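
As a rough illustration of the effect described above, here is a toy model (not code from the pull request). It assumes every task takes exactly one scheduling round, that reduces only start once all maps have finished, and that a freed map slot can be reused as a reduce slot. The function names are hypothetical and exist only for this sketch.

```python
import math


def static_rounds(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Rounds needed when the map/reduce slot split is fixed for the whole job.

    Map slots sit idle during the reduce phase in this toy model.
    """
    map_rounds = math.ceil(map_tasks / map_slots)
    reduce_rounds = math.ceil(reduce_tasks / reduce_slots)
    return map_rounds + reduce_rounds


def dynamic_rounds(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Rounds needed when idle map slots are released after the map phase.

    Assumption: the released capacity is handed back and reused as extra
    reduce slots, so the reduce phase runs on map_slots + reduce_slots.
    """
    map_rounds = math.ceil(map_tasks / map_slots)
    reduce_rounds = math.ceil(reduce_tasks / (map_slots + reduce_slots))
    return map_rounds + reduce_rounds


if __name__ == "__main__":
    # 40 maps and 40 reduces on a cluster with 8 map slots and 2 reduce slots.
    print("static split: ", static_rounds(40, 40, 8, 2), "rounds")
    print("dynamic split:", dynamic_rounds(40, 40, 8, 2), "rounds")
```

Real jobs overlap the map and reduce phases and tasks vary in length, so the numbers above only show the direction of the effect, not the size of any gain you should expect.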


If anyone is running Hadoop on Mesos or would be kind enough to contribute to reviewing the code in the diff, or giving the branch a go on their cluster, that would be very much appreciated! We've been running the patch in production for several months and have seen some quite significant performance gains with our type of workload.


The pull request is here https://github.com/mesos/hadoop/pull/33.


Feel free to get in touch if you have any questions! Thanks!


--

Tom Arnfeld
Developer // DueDil

Re: Mesos Hadoop Framework 0.1.0

Posted by Tom Arnfeld <to...@duedil.com>.
That's precisely it.




The framework has actually been around for quite some time, probably one of the very early Mesos PoC framework implementations. It's certainly very stable and gets the job done for us, and allows us to focus on progressing with more appropriate technologies like Spark (or YARN for Hadoop fans)!




We'll probably be pushing a few more improvements in the short/medium term, again focussed around ensuring Hadoop MRv1 doesn't create shared cluster woes. FairScheduler job pools are next on the list.



--


Tom Arnfeld

Developer // DueDil






On Saturday, Mar 28, 2015 at 2:51 pm, Jeff Schroeder <je...@computer.org>, wrote:

Gotcha, so this is just a better Mesos framework for Hadoop and has nothing to do with the Myriad / Yarn stuff, which similarly prevents you from having to statically set up a Hadoop cluster.




https://mesosphere.com/2015/02/11/yarn-on-mesos-big-data/




Nice stuff

On Saturday, March 28, 2015, Tom Arnfeld <to...@duedil.com> wrote:

To follow up, this is also a decent solution to a nasty problem in the current framework detailed here, https://github.com/mesos/hadoop/issues/32.




--


Tom Arnfeld

Developer // DueDil






On Sat, Mar 28, 2015 at 2:40 PM, Jeff Schroeder <je...@computer.org> wrote:



Does this have any pros / cons over Myriad, which runs Yarn on Mesos? Other than not requiring Yarn :)

On Saturday, March 28, 2015, Tom Arnfeld <to...@duedil.com> wrote:





Hey everyone,




I thought it best to send an email to the list before merging and tagging a 0.1.0 release for the Hadoop on Mesos framework. This release is for a new feature we've been working on for quite some time, which allows Hadoop TaskTrackers to be semi-terminated when they are idle, without destroying any map output they may need to retain for running reduce tasks.




Essentially this means that over the lifetime of a job (one with more map/reduce tasks than the size of the cluster) the ratio of map and reduce slots can change, resulting in significantly better resource utilization, because the map slots can be freed up after they have finished doing work.




If anyone is running Hadoop on Mesos or would be kind enough to contribute to reviewing the code in the diff, or giving the branch a go on their cluster, that would be very much appreciated! We've been running the patch in production for several months and have seen some quite significant performance gains with our type of workload.




The pull request is here https://github.com/mesos/hadoop/pull/33.




Feel free to get in touch if you have any questions! Thanks!





--


Tom Arnfeld

Developer // DueDil









-- 
Text by Jeff, typos by iPhone










-- 
Text by Jeff, typos by iPhone

Re: Mesos Hadoop Framework 0.1.0

Posted by Jeff Schroeder <je...@computer.org>.
Gotcha, so this is just a better Mesos framework for Hadoop and has nothing
to do with the Myriad / Yarn stuff, which similarly prevents you from
having to statically set up a Hadoop cluster.

https://mesosphere.com/2015/02/11/yarn-on-mesos-big-data/

Nice stuff

On Saturday, March 28, 2015, Tom Arnfeld <to...@duedil.com> wrote:

> To follow up, this is also a decent solution to a nasty problem in the
> current framework detailed here, https://github.com/mesos/hadoop/issues/32.
>
>
> --
>
> Tom Arnfeld
> Developer // DueDil
>
>
> On Sat, Mar 28, 2015 at 2:40 PM, Jeff Schroeder
> <jeffschroeder@computer.org> wrote:
>
>> Does this have any pros / cons over Myriad, which runs Yarn on Mesos?
>> Other than not requiring Yarn :)
>>
>> On Saturday, March 28, 2015, Tom Arnfeld <tom@duedil.com> wrote:
>>
>>>  Hey everyone,
>>>
>>> I thought it best to send an email to the list before merging and
>>> tagging a 0.1.0 release for the Hadoop on Mesos framework. This release is
>>> for a new feature we've been working on for quite some time, which allows
>>> Hadoop TaskTrackers to be semi-terminated when they are idle, without
>>> destroying any map output they may need to retain for running reduce tasks.
>>>
>>> Essentially this means that over the lifetime of a job (one with more
>>> map/reduce tasks than the size of the cluster) the ratio of map and reduce
>>> slots can change, resulting in significantly better resource utilization,
>>> because the map slots can be freed up after they have finished doing work.
>>>
>>> If anyone is running Hadoop on Mesos or would be kind enough to
>>> contribute to reviewing the code in the diff, or giving the branch a go on
>>> their cluster, that would be very much appreciated! We've been running the
>>> patch in production for several months and have seen some quite significant
>>> performance gains with our type of workload.
>>>
>>> The pull request is here https://github.com/mesos/hadoop/pull/33.
>>>
>>> Feel free to get in touch if you have any questions! Thanks!
>>>
>>>  --
>>>
>>> Tom Arnfeld
>>> Developer // DueDil
>>>
>>
>>
>> --
>> Text by Jeff, typos by iPhone
>>
>
>

-- 
Text by Jeff, typos by iPhone

Re: Mesos Hadoop Framework 0.1.0

Posted by Tom Arnfeld <to...@duedil.com>.
To follow up, this is also a decent solution to a nasty problem in the current framework detailed here, https://github.com/mesos/hadoop/issues/32.




--


Tom Arnfeld

Developer // DueDil

On Sat, Mar 28, 2015 at 2:40 PM, Jeff Schroeder
<je...@computer.org> wrote:

> Does this have any pros / cons over Myriad, which runs Yarn on Mesos? Other
> than not requiring Yarn :)
> On Saturday, March 28, 2015, Tom Arnfeld <to...@duedil.com> wrote:
>>  Hey everyone,
>>
>> I thought it best to send an email to the list before merging and tagging
>> a 0.1.0 release for the Hadoop on Mesos framework. This release is for a
>> new feature we've been working on for quite some time, which allows Hadoop
>> TaskTrackers to be semi-terminated when they are idle, without destroying
>> any map output they may need to retain for running reduce tasks.
>>
>> Essentially this means that over the lifetime of a job (one with more
>> map/reduce tasks than the size of the cluster) the ratio of map and reduce
>> slots can change, resulting in significantly better resource utilization,
>> because the map slots can be freed up after they have finished doing work.
>>
>> If anyone is running Hadoop on Mesos or would be kind enough to contribute
>> to reviewing the code in the diff, or giving the branch a go on their
>> cluster, that would be very much appreciated! We've been running the patch
>> in production for several months and have seen some quite significant
>> performance gains with our type of workload.
>>
>> The pull request is here https://github.com/mesos/hadoop/pull/33.
>>
>> Feel free to get in touch if you have any questions! Thanks!
>>
>>  --
>>
>> Tom Arnfeld
>> Developer // DueDil
>>
> -- 
> Text by Jeff, typos by iPhone

Re: Mesos Hadoop Framework 0.1.0

Posted by Tom Arnfeld <to...@duedil.com>.
We're running the framework to support our legacy jobs written in Hadoop MRv1. Essentially this is a feature that moves further towards getting Hadoop to play nicely on a shared cluster.


The Hadoop on Mesos framework is pretty greedy at the moment, and it can be quite problematic if you're trying to pack a multi-tenant cluster to the max.



--


Tom Arnfeld

Developer // DueDil






On Saturday, Mar 28, 2015 at 2:40 pm, Jeff Schroeder <je...@computer.org>, wrote:

Does this have any pros / cons over Myriad, which runs Yarn on Mesos? Other than not requiring Yarn :)

On Saturday, March 28, 2015, Tom Arnfeld <to...@duedil.com> wrote:





Hey everyone,




I thought it best to send an email to the list before merging and tagging a 0.1.0 release for the Hadoop on Mesos framework. This release is for a new feature we've been working on for quite some time, which allows Hadoop TaskTrackers to be semi-terminated when they are idle, without destroying any map output they may need to retain for running reduce tasks.




Essentially this means that over the lifetime of a job (one with more map/reduce tasks than the size of the cluster) the ratio of map and reduce slots can change, resulting in significantly better resource utilization, because the map slots can be freed up after they have finished doing work.




If anyone is running Hadoop on Mesos or would be kind enough to contribute to reviewing the code in the diff, or giving the branch a go on their cluster, that would be very much appreciated! We've been running the patch in production for several months and have seen some quite significant performance gains with our type of workload.




The pull request is here https://github.com/mesos/hadoop/pull/33.




Feel free to get in touch if you have any questions! Thanks!





--


Tom Arnfeld

Developer // DueDil









-- 
Text by Jeff, typos by iPhone

Re: Mesos Hadoop Framework 0.1.0

Posted by Jeff Schroeder <je...@computer.org>.
Does this have any pros / cons over Myriad, which runs Yarn on Mesos? Other
than not requiring Yarn :)

On Saturday, March 28, 2015, Tom Arnfeld <to...@duedil.com> wrote:

>  Hey everyone,
>
> I thought it best to send an email to the list before merging and tagging
> a 0.1.0 release for the Hadoop on Mesos framework. This release is for a
> new feature we've been working on for quite some time, which allows Hadoop
> TaskTrackers to be semi-terminated when they are idle, without destroying
> any map output they may need to retain for running reduce tasks.
>
> Essentially this means that over the lifetime of a job (one with more
> map/reduce tasks than the size of the cluster) the ratio of map and reduce
> slots can change, resulting in significantly better resource utilization,
> because the map slots can be freed up after they have finished doing work.
>
> If anyone is running Hadoop on Mesos or would be kind enough to contribute
> to reviewing the code in the diff, or giving the branch a go on their
> cluster, that would be very much appreciated! We've been running the patch
> in production for several months and have seen some quite significant
> performance gains with our type of workload.
>
> The pull request is here https://github.com/mesos/hadoop/pull/33.
>
> Feel free to get in touch if you have any questions! Thanks!
>
>  --
>
> Tom Arnfeld
> Developer // DueDil
>


-- 
Text by Jeff, typos by iPhone

Re: Mesos Hadoop Framework 0.1.0

Posted by Elizabeth Lingg <el...@mesosphere.io>.
+1 Sounds great, thanks for making the improvements.

-Elizabeth

On Saturday, March 28, 2015, Brenden Matthews <br...@diddyinc.com> wrote:

> This is great, thanks Tom.
>
> On Sat, Mar 28, 2015 at 6:52 AM, Tom Arnfeld <tom@duedil.com> wrote:
>
> > Hey everyone,
> >
> >
> > I thought it best to send an email to the list before merging and tagging
> > a 0.1.0 release for the Hadoop on Mesos framework. This release is for a
> > new feature we've been working on for quite some time, which allows Hadoop
> > TaskTrackers to be semi-terminated when they are idle, without destroying
> > any map output they may need to retain for running reduce tasks.
> >
> >
> > Essentially this means that over the lifetime of a job (one with more
> > map/reduce tasks than the size of the cluster) the ratio of map and reduce
> > slots can change, resulting in significantly better resource utilization,
> > because the map slots can be freed up after they have finished doing work.
> >
> >
> > If anyone is running Hadoop on Mesos or would be kind enough to contribute
> > to reviewing the code in the diff, or giving the branch a go on their
> > cluster, that would be very much appreciated! We've been running the patch
> > in production for several months and have seen some quite significant
> > performance gains with our type of workload.
> >
> >
> > The pull request is here https://github.com/mesos/hadoop/pull/33.
> >
> >
> > Feel free to get in touch if you have any questions! Thanks!
> >
> >
> > --
> >
> > Tom Arnfeld
> > Developer // DueDil
>

Re: Mesos Hadoop Framework 0.1.0

Posted by Brenden Matthews <br...@diddyinc.com>.
This is great, thanks Tom.

On Sat, Mar 28, 2015 at 6:52 AM, Tom Arnfeld <to...@duedil.com> wrote:

> Hey everyone,
>
>
> I thought it best to send an email to the list before merging and tagging
> a 0.1.0 release for the Hadoop on Mesos framework. This release is for a
> new feature we've been working on for quite some time, which allows Hadoop
> TaskTrackers to be semi-terminated when they are idle, without destroying
> any map output they may need to retain for running reduce tasks.
>
>
> Essentially this means that over the lifetime of a job (one with more
> map/reduce tasks than the size of the cluster) the ratio of map and reduce
> slots can change, resulting in significantly better resource utilization,
> because the map slots can be freed up after they have finished doing work.
>
>
> If anyone is running Hadoop on Mesos or would be kind enough to contribute
> to reviewing the code in the diff, or giving the branch a go on their
> cluster, that would be very much appreciated! We've been running the patch
> in production for several months and have seen some quite significant
> performance gains with our type of workload.
>
>
> The pull request is here https://github.com/mesos/hadoop/pull/33.
>
>
> Feel free to get in touch if you have any questions! Thanks!
>
>
> --
>
> Tom Arnfeld
> Developer // DueDil
