Posted to dev@giraph.apache.org by Vinod Kumar Vavilapalli <vi...@hortonworks.com> on 2011/09/18 13:37:57 UTC

Giraph-13: Porting Giraph to YARN

Hi all,

I finished an excursion into Giraph's code and now I kinda know what it
takes to port Giraph over to run on top of YARN.

When the base Hadoop clusters are replaced by YARN clusters, Giraph will
have two options:
 - *Giraph still works over mapreduce APIs*: Even after moving to YARN
clusters, Giraph can still run over MapreduceV2+YARN, without any code
changes at all.
 - *Giraph works natively on YARN*: This can be done in such a way that in
the medium term, Giraph can continue to work on both a Hadoop Mapreduce
cluster as well as a YARN cluster. Two visible effects that I can think of
once this effort gets underway:
    -- There will be some moving around of classes/interfaces to separate
APIs from implementation details and a bit of reorganisation of code to help
support both GiraphV1 and GiraphV2.
    -- The other thing the port will probably affect is a fork in the
community's attention (depending on how much of the community's eyeballs the
new world grabs as opposed to the stabilization/feature work on GiraphV1).

Now here's the thing. Avery indicated on the other thread (about Giraph over
HAMA) that most of the users of Giraph need to work on top of a hadoop
mapreduce cluster for quite some time. Which I completely agree with, being
a long-time maintainer/supporting-dev of Hadoop clusters myself.

Given that concern, before embarking on the port, I thought I'd get opinions
from the community.

Thanks,
+Vinod

Re: Giraph-13: Porting Giraph to YARN

Posted by Jakob Homan <jg...@gmail.com>.
>  - *Giraph still works over mapreduce APIs*: Even after moving to YARN
> clusters, Giraph can still run over MapreduceV2+YARN. Without any code
> changes at all.
Giraph will continue to work with the MR1 APIs.

>  - *Giraph works natively on YARN*: This can be done in such a way that in
> the medium term, Giraph can continue to work on both a Hadoop Mapreduce
> cluster as well as a YARN cluster. Two visible effects when this effort goes
> underway, that I can think of:
As described in the JIRA, this is the approach I am taking as I do the work now.

>    -- There will be some moving around of classes/interface to separate
> APIs from implementation details and a bit of reorganisation of code to help
> support both GiraphV1 and GiraphV2.
Yes.

>    -- The other thing the port will probably affect is a fork in the
> community's attention (depending on how much of the community's eyeballs the
> new world grabs as opposed to the stabilization/feature work on GiraphV1).
Not really.  Assuming the refactoring is done in a clean way, it'll be
relatively painless to support both.

>
> Now here's the thing. Avery indicated on the other thread (about Giraph over
> HAMA) that most of the users of Giraph need to work on top of a hadoop
> mapreduce cluster for quite some time. Which I completely agree with, being
> a long time maintainer/supporting-dev of Hadoop clusters myself.
>
> Given that concern, before embarking on the port, I thought I'd get opinions
> from the community.
I am also a Hadoop committer/dev and rest assured, Vinod, we'll ensure
that Giraph plays nice with MR1 for the foreseeable future.  The
issue's assigned to me and I'll be working on it over the next few
weeks.

Re: Giraph-13: Porting Giraph to YARN

Posted by Avery Ching <ac...@apache.org>.
Hi Vinod,

Thank you for your thoughts.  It would be great if your comments were 
put on GIRAPH-13 so they aren't lost.  You and Jakob should sync up to 
see how to proceed on this.

Avery
