Posted to user@giraph.apache.org by Eli Reisman <ap...@gmail.com> on 2014/12/19 21:09:50 UTC

Re: YARN vs. MR1: is YARN a good idea?

Giraph on YARN thus far doesn't break any compatibility with the MapReduce
version. When I was working on it more actively, it had a slightly faster
job startup but otherwise behaved similarly to the MapReduce version.

Design-wise, there are a number of things that could make the YARN profile
substantially better (in theory), but they would require a fork or bigger
design changes/agreements about the MapReduce profiles. These include
spawning the Giraph master task in the Application Master itself, and many
other things along those lines.

There are also a number of smaller things that would probably make a
difference, like exposing YARN's per-task resource configuration features in
a more flexible way.

I haven't had much time to hack on Giraph this past year. At some point
last summer, some folks like Mohammad Islam from LinkedIn did some great
work to update the YARN profile to run on Hadoop 2.2.0 or newer versions,
but since then it hasn't gotten much love.

I noticed there is still a note in the master POM from the original Giraph
on YARN implementation that says it's compatible only with Hadoop
2.0.3-alpha. I thought that was removed with Mohammad's Hadoop 2.2.0
patches, but apparently it wasn't. We should remove that; it's no longer
accurate and seems to be misleading people trying to build the YARN profile.
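
For anyone trying the build in the meantime, here is a minimal sketch of
the invocation. It assumes the profile is named hadoop_yarn (as it is
referred to in this thread) and that the POM reads a hadoop.version
property; both are assumptions here, so check the POM in your own checkout
before relying on them.

```shell
# Sketch only: build Giraph against a Hadoop 2.2.0 cluster using the YARN profile.
# ASSUMPTIONS: the profile id "hadoop_yarn" and the "hadoop.version" property
# are taken on trust; verify both against the pom.xml in your checkout.
HADOOP_VERSION=2.2.0

# Run from the root of the Giraph source tree. -DskipTests shortens the build.
mvn -Phadoop_yarn -Dhadoop.version="$HADOOP_VERSION" clean package -DskipTests
```

If the stale 2.0.3-alpha note is still in the POM, it can be ignored for
this purpose; the version passed on the command line is what matters.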



On Fri, Oct 10, 2014 at 11:15 AM, Tripti Singh <tr...@yahoo-inc.com> wrote:
>
> Hi Matthew,
> I would have been thrilled to give you numbers on this one, but for me the
> application is not scaling without the out-of-core option (which isn't
> working the way it did in the previous version).
> I'm still figuring it out and can get back once it's resolved. I have
> patched a few things and will share them for people who might face similar
> issues. If you have a fix for scalability, do let me know.
>
> Thanks,
> Tripti
>
> Sent from my iPhone
>
> > On 06-Oct-2014, at 9:22 pm, "Matthew Cornell" <ma...@matthewcornell.org>
> wrote:
> >
> > Hi Folks. I don't think I paid enough attention to YARN vs. MR1 when I
> > built Giraph 1.0.0 for our system. How much better is Giraph on YARN?
> > Thank you.
> >
> > --
> > Matthew Cornell | matt@matthewcornell.org
>

Re: YARN vs. MR1: is YARN a good idea?

Posted by Eli Reisman <ap...@gmail.com>.
Excellent! Hope to help out with this a bit more as time permits. I bet if
we add the missing munge symbols to the hadoop_yarn profile, that other
error people have mentioned will go away. I know Mohammad added support for
the SASL stuff in his 2.2.0 patch, and I assume it still works?
I'm in the middle of a large Hadoop 2.5 upgrade, so maybe I can play with it
for reals soon!

Thanks,
Eli
On Dec 19, 2014 1:53 PM, "Roman Shaposhnik" <ro...@shaposhnik.org> wrote:

> Perfect summary! Thanks for writing it.
>
> Thanks,
> Roman.

Re: YARN vs. MR1: is YARN a good idea?

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
Perfect summary! Thanks for writing it.

Thanks,
Roman.
