Posted to dev@giraph.apache.org by Vinod Kumar Vavilapalli <vi...@hortonworks.com> on 2011/09/11 16:22:35 UTC

Port to YARN: GIRAPH and HAMA

Crosspost to hama-dev and giraph-dev.

Just this morning I was looking at HAMA-431, the port of Hama to YARN, and
one of the tweets reminded me of JIRA issue GIRAPH-13, which is about
porting Giraph to YARN.

I was also looking at the Giraph proposal for entry into the Apache Incubator.
There is an interesting section there:
{quote}
Relationships with Other Apache Products

Giraph has some overlapping functionality with Apache Hama. However, there
are some significant differences. Giraph focuses on graph-based bulk
synchronous parallel (BSP) computing, while Apache Hama is more for general
purposed BSP computing. Giraph runs on the Hadoop infrastructure, while
Apache Hama uses its own computing framework.
{quote}

I agree with the point about Hama being a general-purpose BSP framework and
Giraph being completely graph oriented. But the latter point, about the
infrastructure, is going to be moot once both Giraph and Hama are ported
over to YARN.

So here's my billion dollar question: Is it possible to implement Giraph's
graph-based APIs over Hama's BSP APIs, with both running over a single
Apache BSP implementation on YARN?

I also do see the email thread regarding Hama and Giraph's future
collaboration when Hadoop NextGen aka YARN comes in:
http://s.apache.org/HamaVsGiraph. So are we ready for this yet?

Disclaimer: I come from the Hadoop world and have no idea of Giraph's APIs or
internals, except that I see a bsp package in Giraph's source tree. I do know
a tiny bit about Hama's APIs and internals, but my expertise is only two days
old.

Thanks,
+Vinod
(An elephant maintainer trying to see if a Giraffe can be made to ride over
a hippopotamus riding over an elephant)

Re: Port to YARN: GIRAPH and HAMA

Posted by Avery Ching <ac...@apache.org>.

Vinod, thanks for your comments.  I've replied inline.

Avery

On 9/14/11 11:09 AM, Vinod Kumar Vavilapalli wrote:
> Avery,
>
> Some replies inline to the issues you outlined.
>
>> 1)  Giraph runs completely as a MapReduce job on Hadoop today.  This needs
> to be maintained to support our current users, who will not likely move to
> MRv2 for at least a year.
> I think what you need is to support Giraph's graph API for your users, but
> no, not the underlying implementation. (Or are you leaking MapReduce APIs to
> your users?) Sure, you are restricted to the under implementation(Hadoop
> MRV1 or MRV2 whenever it gets used) at any point of time, but what we are
> discussing is _that_ future when the underlying implementation itself also
> moves to MRV2.
I think the takeaway should be that our clients (at Yahoo! and
elsewhere) are currently using Giraph on MRv1.  While the Giraph API
does not expose the underlying infrastructure APIs (i.e. MRv1 and MRv2), we
still need to support the MRv1 implementation even while we
begin/complete the port to MRv2.  I imagine that we will need to support
both MRv1 and MRv2 for a fairly long period of time, as the transition to
MRv2 for a company (e.g. Yahoo!) could take a very long time (anywhere
from 8 months to multiple years).  For example, some of our internal
clusters at Yahoo! are still running 0.20.1 today.
>> 2)  The internals of Giraph are implemented differently than Hama..
> Sure, but only at present. My original question is - given a BSP
> implementation on a YARN cluster, can GiraphV2(BSP based) be simply
> implemented over that or not. If today, GiraphV1 uses (its own) BSP
> implementation over mapreduce APIs on Hadoop MRV1 cluster, I can clearly see
> how GiraphV2 can be using (HAMA's) BSP implemented over YARN APIs.
>
In theory this is true.  However, as mentioned previously, we still have
users on MRv1 and will need to support it for a long time (at least
a year, probably more).  Also, I'm fairly certain that during the next
year we will have non-BSP-based graph processing computing models in
place as well.  For these reasons, it may not make sense to try to put
Giraph on top of HAMA even when we are both on MRv2.  It's hard to say
now, as it is early.  Let's revisit this at a later time.

>> 3)  If we have various graph processing computing models (BSP based,
> streams or asynchronous, or a combination), then being on Hama brings little
> value for Giraph.
> That future isn't there yet. In any case, I'd bet when you get there, lot of
> what you have now also wouldn't be an out-of-the-box fit.
>
>  From my perspective (a third person POV), this is what I can conclude.
> Giraph's velocity on Hadoop MapReduce may be real the impedence for thinking
> about a possible sharing of the bsp based implementation with HAMAV2. Sure,
> Giraph has other ideas regarding the computation model itself, but that is a
> future that isn't here yet.
>
> I just hope the same velocity isn't an impedance for thinking about the
> next-gen version on top of YARN :) The way I see it, porting Giraph to YARN
> is also a revolution in itself; most, if not all, of the implementation will
> change yet with the API level compatibility. I am still eagerly looking
> forward to the port of Giraph to YARN. May be more digging into Giraph
> internals may help my cause too.
Giraph does appear to be moving quickly at the moment, but we
have a clear intention to run on top of MRv2.  Please see
https://issues.apache.org/jira/browse/GIRAPH-13.  The MRv2
changes are much better suited to Giraph, and we look forward to the day
when nearly all Hadoop instances are running MRv2.
> If nothing, this discussion atleast helped sharing of some of the ideas
> between the two communities.
>
> Thanks all for putting down in your thoughts.
> +Vinod
>
>
> On Wed, Sep 14, 2011 at 11:46 AM, Thomas Jungblut<
> thomas.jungblut@googlemail.com>  wrote:
>
>>   We are also thinking about other underlying computing models (i.e.
>>> streaming (asynchronous) graph processing - see
>>
>> That is a really cool idea. But I don't think we are going to focus solely
>> on graph computing. We want to enable an interface which can be used for it
>> (straight forward as described in the Pregel Paper), but I think you are
>> really graph experts- so we don't want to compete with each other :D
>> Our asynchronous processing (in my opinion) will just enable the sending of
>> messages within the computation phase. So the BarrierSync is just a little
>> transition to make sure every task is ready and every message has been send.
>> Your Vertex locking is a graph-only feature, this won't be effecting us
>> anyways.
>>
>>
>> Giraph runs completely as a MapReduce job on Hadoop today.
>> Allright.
>>
>> I think our result is the following:
>> We (Apache Hama) are focussing on the YARN implementation of the BSP
>> paradigm.
>> If you want to run Giraph on a real BSP engine later, feel free to put your
>> stuff on top of that.
>> As far as I have seen, there is a 100% backward compatibility of YARN, so
>> your current solution will run on YARN either.
>>
>> Best Regards,
>>
>> Thomas
>>

Re: Port to YARN: GIRAPH and HAMA

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Avery,

Some replies inline to the issues you outlined.

> 1)  Giraph runs completely as a MapReduce job on Hadoop today.  This needs
> to be maintained to support our current users, who will not likely move to
> MRv2 for at least a year.
I think what you need is to support Giraph's graph API for your users, but
not the underlying implementation. (Or are you leaking MapReduce APIs to
your users?) Sure, you are restricted to the underlying implementation (Hadoop
MRV1 or MRV2, whichever is in use) at any point in time, but what we are
discussing is _that_ future when the underlying implementation itself also
moves to MRV2.

> 2)  The internals of Giraph are implemented differently than Hama.
Sure, but only at present. My original question is: given a BSP
implementation on a YARN cluster, can GiraphV2 (BSP based) simply be
implemented over that or not? If today GiraphV1 uses (its own) BSP
implementation over MapReduce APIs on a Hadoop MRV1 cluster, I can clearly see
how GiraphV2 could use (HAMA's) BSP implemented over YARN APIs.

> 3)  If we have various graph processing computing models (BSP based,
> streams or asynchronous, or a combination), then being on Hama brings little
> value for Giraph.
That future isn't there yet. In any case, I'd bet that when you get there, a
lot of what you have now also wouldn't be an out-of-the-box fit.

From my perspective (a third-person POV), this is what I can conclude:
Giraph's velocity on Hadoop MapReduce may be the real impediment to thinking
about a possible sharing of the BSP-based implementation with HAMAV2. Sure,
Giraph has other ideas regarding the computation model itself, but that is a
future that isn't here yet.

I just hope the same velocity isn't an impediment to thinking about the
next-gen version on top of YARN :) The way I see it, porting Giraph to YARN
is also a revolution in itself; most, if not all, of the implementation will
change while keeping API-level compatibility. I am still eagerly looking
forward to the port of Giraph to YARN. Maybe more digging into Giraph
internals will help my cause too.

If nothing else, this discussion at least helped share some ideas
between the two communities.

Thanks, all, for putting down your thoughts.
+Vinod


On Wed, Sep 14, 2011 at 11:46 AM, Thomas Jungblut <
thomas.jungblut@googlemail.com> wrote:

>  We are also thinking about other underlying computing models (i.e.
>> streaming (asynchronous) graph processing - see
>
>
> That is a really cool idea. But I don't think we are going to focus solely
> on graph computing. We want to enable an interface which can be used for it
> (straight forward as described in the Pregel Paper), but I think you are
> really graph experts- so we don't want to compete with each other :D
> Our asynchronous processing (in my opinion) will just enable the sending of
> messages within the computation phase. So the BarrierSync is just a little
> transition to make sure every task is ready and every message has been send.
> Your Vertex locking is a graph-only feature, this won't be effecting us
> anyways.
>
>
> Giraph runs completely as a MapReduce job on Hadoop today.
>>
>
> Allright.
>
> I think our result is the following:
> We (Apache Hama) are focussing on the YARN implementation of the BSP
> paradigm.
> If you want to run Giraph on a real BSP engine later, feel free to put your
> stuff on top of that.
> As far as I have seen, there is a 100% backward compatibility of YARN, so
> your current solution will run on YARN either.
>
> Best Regards,
>
> Thomas
>

Re: Port to YARN: GIRAPH and HAMA

Posted by Thomas Jungblut <th...@googlemail.com>.

>
>  We are also thinking about other underlying computing models (i.e.
> streaming (asynchronous) graph processing - see


That is a really cool idea. But I don't think we are going to focus solely
on graph computing. We want to enable an interface which can be used for it
(straightforward, as described in the Pregel paper), but I think you really
are the graph experts, so we don't want to compete with each other :D
Our asynchronous processing (in my opinion) will just enable the sending of
messages within the computation phase. So the BarrierSync is just a little
transition to make sure every task is ready and every message has been sent.
Your vertex locking is a graph-only feature, so this won't affect us
anyway.

> Giraph runs completely as a MapReduce job on Hadoop today.

All right.

I think our result is the following:
We (Apache Hama) are focussing on the YARN implementation of the BSP
paradigm.
If you want to run Giraph on a real BSP engine later, feel free to put your
stuff on top of that.
As far as I have seen, YARN is 100% backward compatible, so
your current solution will run on YARN as well.

Best Regards,

Thomas
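
Illustrative sketch (not part of the original message, and not Hama's
implementation): the idea above, that tasks may send messages during the
computation phase while a barrier only confirms that every task is done and
every message has been sent, can be mimicked on a single machine. The class
name BarrierSketch, the double-buffered inboxes, and the use of
java.util.concurrent.CyclicBarrier as a stand-in for a distributed barrier
sync are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

/** Hypothetical single-JVM illustration of "send during compute, then barrier". */
public class BarrierSketch {

    /** Runs `tasks` threads for `supersteps` rounds; returns messages consumed per task. */
    public static int[] run(int tasks, int supersteps) {
        // inbox.get(parity).get(task): messages sent in step s are read in step s + 1.
        List<List<Queue<Integer>>> inbox = new ArrayList<>();
        for (int parity = 0; parity < 2; parity++) {
            List<Queue<Integer>> row = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                row.add(new ConcurrentLinkedQueue<>());
            }
            inbox.add(row);
        }
        CyclicBarrier barrier = new CyclicBarrier(tasks);
        int[] received = new int[tasks];
        Thread[] threads = new Thread[tasks];

        for (int t = 0; t < tasks; t++) {
            final int self = t;
            threads[t] = new Thread(() -> {
                try {
                    for (int step = 0; step < supersteps; step++) {
                        // Computation phase: consume what arrived in the previous step...
                        Queue<Integer> mine = inbox.get(step % 2).get(self);
                        while (mine.poll() != null) {
                            received[self]++;
                        }
                        // ...and send immediately, without waiting for the barrier.
                        for (int other = 0; other < tasks; other++) {
                            if (other != self) {
                                inbox.get((step + 1) % 2).get(other).add(step);
                            }
                        }
                        // Barrier: every task is done and every message has been sent.
                        barrier.await();
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        int[] counts = run(4, 3);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("task " + i + " consumed " + counts[i] + " messages");
        }
    }
}
```

With 4 tasks and 3 supersteps, each task consumes the 3 messages sent to it
in step 0 and the 3 sent in step 1; messages sent in the final step are never
read in this toy version.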

Re: Port to YARN: GIRAPH and HAMA

Posted by Avery Ching <ac...@apache.org>.

Maybe it's possible; it's hard to say what will happen in a year.  At the
same time, porting an application from any one of the projects to
another shouldn't be too difficult, since the Pregel API is
relatively simple.  However, as I mentioned in my original post, I
imagine that Giraph will support non-BSP graph computing models as well
in the future (which would be less portable).

Avery

On 9/13/11 12:51 PM, Dan Brickley wrote:
> On 13 September 2011 21:43, Dmitriy Ryaboy<dm...@twitter.com>  wrote:
>> Dan,
>> Given how fast we are currently iterating on the API in Giraph, I think
>> agreeing on a common API across 3 projects is a bit premature at this stage,
>> unfortunately..
> Current velocity aside, ... could such an interface be plausible? e.g.
> this time next year?
>
> Dan

Re: Port to YARN: GIRAPH and HAMA

Posted by Dan Brickley <da...@danbri.org>.

On 13 September 2011 21:43, Dmitriy Ryaboy <dm...@twitter.com> wrote:
> Dan,
> Given how fast we are currently iterating on the API in Giraph, I think
> agreeing on a common API across 3 projects is a bit premature at this stage,
> unfortunately..

Current velocity aside, ... could such an interface be plausible? e.g.
this time next year?

Dan

Re: Port to YARN: GIRAPH and HAMA

Posted by Dmitriy Ryaboy <dm...@twitter.com>.

Dan,
Given how fast we are currently iterating on the API in Giraph, I think
agreeing on a common API across 3 projects is a bit premature at this stage,
unfortunately.

D

On Tue, Sep 13, 2011 at 11:20 AM, Dan Brickley <da...@danbri.org> wrote:

> On 13 September 2011 19:47, Avery Ching <ac...@apache.org> wrote:
>
> > Perhaps more practically, I wonder if it would be possible for someone
> from
> > the Hama team to refactor our code a bit to support Hama-style BSP in
> > Giraph?  Certainly would be a pretty cool project...
>
> Maybe this is crazy, but: I was wondering...  Pregel's basic API
> approach is pretty straightforward, gloriously simple even. Could we
> have platform-neutral APIs that allowed portability of applications
> between  Pregel-based platforms? At least for Java...
>
> Right now, those of us who are more 'application people' than platform
> developers, are left searching around on 'pregel opensource' and have
> to try to guess which of the various Pregel-eseque platforms is
> looking most healthy. For example, my summer vacation project was
> checking out GoldenOrbOS. Yet by the time I get back, the Mahout list
> was buzzing with discussion of Giraph, so I took a look at that (and
> was pleasantly suprised).
>
> There is clearly a lot of energy and creativity right now going into
> this kind of distributed graph processing platform. That suggests to
> me that *finalising* cross-platform APIs would be premature. But it is
> also a time when platforms have a certain amount of flexibility that
> they will loose as they get adopted and embedded within products and
> processes. Could a Pregel-like Java API be agreed between platforms
> (e.g. let's consider Giraph, Hama, GoldenOrbOS), so that those of us
> investigating applications could proceed with some hope of later
> portability. This might be cheaper than trying to persuade Giraph to
> rebuild on top of Hama, or suchlike. Anyone care to make a first pass
> at suggesting some common interfaces?
>
> cheers,
>
> Dan
>



-- 
Dmitriy V Ryaboy
Twitter Analytics
http://twitter.com/squarecog

Re: Port to YARN: GIRAPH and HAMA

Posted by Dan Brickley <da...@danbri.org>.

On 13 September 2011 19:47, Avery Ching <ac...@apache.org> wrote:

> Perhaps more practically, I wonder if it would be possible for someone from
> the Hama team to refactor our code a bit to support Hama-style BSP in
> Giraph?  Certainly would be a pretty cool project...

Maybe this is crazy, but: I was wondering...  Pregel's basic API
approach is pretty straightforward, gloriously simple even. Could we
have platform-neutral APIs that allowed portability of applications
between  Pregel-based platforms? At least for Java...

Right now, those of us who are more 'application people' than platform
developers are left searching around on 'pregel opensource' and have
to try to guess which of the various Pregel-esque platforms is
looking most healthy. For example, my summer vacation project was
checking out GoldenOrbOS. Yet by the time I got back, the Mahout list
was buzzing with discussion of Giraph, so I took a look at that (and
was pleasantly surprised).

There is clearly a lot of energy and creativity right now going into
this kind of distributed graph processing platform. That suggests to
me that *finalising* cross-platform APIs would be premature. But it is
also a time when platforms have a certain amount of flexibility that
they will lose as they get adopted and embedded within products and
processes. Could a Pregel-like Java API be agreed between platforms
(e.g. let's consider Giraph, Hama, GoldenOrbOS), so that those of us
investigating applications could proceed with some hope of later
portability? This might be cheaper than trying to persuade Giraph to
rebuild on top of Hama, or suchlike. Anyone care to make a first pass
at suggesting some common interfaces?

cheers,

Dan
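
Illustrative sketch (not part of the original message): one possible shape
for the kind of platform-neutral, Pregel-like Java interface asked about
above. Every name here (PregelSketch, VertexProgram, Context, run, MAX_VALUE)
is hypothetical and does not come from Giraph, Hama, or GoldenOrb. The toy
single-process runner exists only so the example can be executed, and it
assumes integer vertex ids and out-edges only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical platform-neutral Pregel-style API with a toy local runner. */
public class PregelSketch {

    /** What a vertex may do during one superstep. */
    public interface Context<M> {
        long superstep();
        void sendMessage(int targetVertexId, M message);
        void voteToHalt();
    }

    /** The portable part: one compute() call per active vertex per superstep. */
    public interface VertexProgram<V, M> {
        V compute(int vertexId, V value, List<Integer> neighbors,
                  List<M> messages, Context<M> ctx);
    }

    /** Toy runner: a platform would supply its own distributed equivalent. */
    public static <V, M> Map<Integer, V> run(Map<Integer, List<Integer>> graph,
                                             Map<Integer, V> initial,
                                             VertexProgram<V, M> program,
                                             int maxSupersteps) {
        Map<Integer, V> values = new HashMap<>(initial);
        Map<Integer, List<M>> inbox = new HashMap<>();
        Set<Integer> halted = new HashSet<>();
        for (long step = 0; step < maxSupersteps; step++) {
            final long current = step;
            Map<Integer, List<M>> outbox = new HashMap<>();
            boolean anyActive = false;
            for (Integer id : graph.keySet()) {
                List<M> messages = inbox.getOrDefault(id, Collections.<M>emptyList());
                if (halted.contains(id) && messages.isEmpty()) {
                    continue; // halted vertices sleep until a message arrives
                }
                halted.remove(id);
                anyActive = true;
                Context<M> ctx = new Context<M>() {
                    public long superstep() { return current; }
                    public void sendMessage(int target, M message) {
                        outbox.computeIfAbsent(target, k -> new ArrayList<>()).add(message);
                    }
                    public void voteToHalt() { halted.add(id); }
                };
                values.put(id, program.compute(id, values.get(id), graph.get(id), messages, ctx));
            }
            if (!anyActive) {
                break;
            }
            inbox = outbox; // the superstep barrier: messages become visible next step
        }
        return values;
    }

    /** Example program: every vertex converges to the largest value that can reach it. */
    public static final VertexProgram<Integer, Integer> MAX_VALUE =
        (id, value, neighbors, messages, ctx) -> {
            int best = value;
            for (int m : messages) {
                best = Math.max(best, m);
            }
            if (ctx.superstep() == 0 || best > value) {
                for (int n : neighbors) {
                    ctx.sendMessage(n, best);
                }
            } else {
                ctx.voteToHalt();
            }
            return best;
        };

    /** Directed ring 1 -> 2 -> 3 -> 1 with initial values 3, 6, 2. */
    public static Map<Integer, Integer> demo() {
        Map<Integer, List<Integer>> ring = new HashMap<>();
        ring.put(1, Arrays.asList(2));
        ring.put(2, Arrays.asList(3));
        ring.put(3, Arrays.asList(1));
        Map<Integer, Integer> values = new HashMap<>();
        values.put(1, 3);
        values.put(2, 6);
        values.put(3, 2);
        return run(ring, values, MAX_VALUE, 10);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo, all three vertices end with the value 6. An application written
only against VertexProgram and Context would not need to know which platform
supplies the runner, which is the portability being asked about.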

Re: Port to YARN: GIRAPH and HAMA

Posted by "Edward J. Yoon" <ed...@apache.org>.

Interesting. In our community, someone is also thinking about
asynchronous message processing for more efficient iteration[1].

As I mentioned to you before, the projects differ in slogan but not in
kind. The technical issues are nothing, Avery.

Anyway, ...

It would be nice if we could keep talking, in a spirit of
collaborative competition. http://s.apache.org/HamaVsGiraph

1. http://markmail.org/thread/nrrevdrb5qc7ic5c

On Wed, Sep 14, 2011 at 2:47 AM, Avery Ching <ac...@apache.org> wrote:
> Hi Vinod,
>
> Edward and I have chatted about this at times.  It sounds better in theory
> (both BSP based and adding support for MRv2) than in practice I think
> (underlying implementations are quite different).  Actually, I also believe
> that in the future, Giraph is not going to solely be BSP-based graph
> computing.  We are also thinking about other underlying computing models
> (i.e. streaming (asynchronous) graph processing - see
>
> http://mail-archives.apache.org/mod_mbox/incubator-giraph-user/201109.mbox/%3CCAEVHzWC8b-7RiBjkDiQKjiu-rVBz9=ogEOajXHbCLCR5n3+QVg@mail.gmail.com%3E
>
> But I think today, the issues are the following:
>
> 1)  Giraph runs completely as a MapReduce job on Hadoop today.  This needs
> to be maintained to support our current users, who will not likely move to
> MRv2 for at least a year.
> 2)  The internals of Giraph are implemented differently than Hama and would
> take some time to port to.
> 3)  If we have various graph processing computing models (BSP based, streams
> or asynchronous, or a combination), then being on Hama brings little value
> for Giraph.
>
> Perhaps more practically, I wonder if it would be possible for someone from
> the Hama team to refactor our code a bit to support Hama-style BSP in
> Giraph?  Certainly would be a pretty cool project...
>
> Avery
>
> On 9/13/11 4:49 AM, Edward J. Yoon wrote:
>>
>> Quite a while ago, I implemented a clone of Google Pregel simply using
>> BSPLib[1] and decided to focus on BSP computing engine.
>>
>> The Hama and Giraph projects differ in slogan but not in kind.
>>
>> If we made some collaboration, Giraph should be implemented on top of
>> Hama BSP computing engine.
>>
>> Otherwise, we will be back to square one again.
>>
>> 1. http://markmail.org/thread/4czcgtjupjvpqcqi
>>
>> On Sun, Sep 11, 2011 at 11:22 PM, Vinod Kumar Vavilapalli
>> <vi...@hortonworks.com>  wrote:
>>>
>>> Crosspost to hama-dev and giraph-dev.
>>>
>>> It was only in my morning time that I was looking at HAMA-431, the port
>>> of
>>> Hama to YARN. And one of the tweets reminded me of JIRA issue GIRAPH-13
>>> which is about porting Giraph to YARN.
>>>
>>> I was also looking at the Giraph proposal for entry into Apache
>>> Incubator.
>>> There is an interesting section there:
>>> {quote}
>>> Relationships with Other Apache Products
>>>
>>> Giraph has some overlapping functionality with Apache Hama. However,
>>> there
>>> are some significant differences. Giraph focuses on graph-based bulk
>>> synchronous parallel (BSP) computing, while Apache Hama is more for
>>> general
>>> purposed BSP computing. Giraph runs on the Hadoop infrastructure, while
>>> Apache Hama uses its own computing framework.
>>> {quote}
>>>
>>> I agree with the point about Hama being a general purposed BSP and Giraph
>>> being completely graph oriented. But the latter one about the
>>> infrastructure
>>> is going to be moot with both Giraph and Hama trying to be ported over to
>>> YARN.
>>>
>>> So here's my billion dollar question: Is it possible to implement
>>> Giraph's
>>> graph based APIs over the Hama's bsp APIs which both run over a single
>>> Apache BSP implementation over YARN?
>>>
>>> I also do see the email thread regarding Hama and Giraph's future
>>> collaboration when Hadoop NextGen aka YARN comes in:
>>> http://s.apache.org/HamaVsGiraph. So are we ready for this yet?
>>>
>>> Disclaimer: I come from the Hadoop world, have no idea of Giraph's APIs
>>> or
>>> internals except that I see a bsp package in Giraph's source tree. I do
>>> know
>>> a tiny bit about Hama's APIs and internal but my expertise is only two
>>> days.
>>>
>>> Thanks,
>>> +Vinod
>>> (An elephant maintainer trying to see if a Giraffe can be made to ride
>>> over
>>> a hippopotamus riding over an elephant)
>>>
>>
>>
>
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Port to YARN: GIRAPH and HAMA

Posted by Owen O'Malley <ow...@hortonworks.com>.

On Tue, Sep 13, 2011 at 10:47 AM, Avery Ching <ac...@apache.org> wrote:
> 1)  Giraph runs completely as a MapReduce job on Hadoop today.  This needs
> to be maintained to support our current users, who will not likely move to
> MRv2 for at least a year.

Giraph already has ifdefs to deal with the 0.20 and 0.20.2xx API
changes, so it shouldn't be hard to deal with MRv2 the same way.

-- Owen

Re: Port to YARN: GIRAPH and HAMA

Posted by Avery Ching <ac...@apache.org>.

Hi Vinod,

Edward and I have chatted about this at times.  It sounds better in 
theory (both BSP based and adding support for MRv2) than in practice I 
think (underlying implementations are quite different).  Actually, I 
also believe that in the future, Giraph is not going to solely be 
BSP-based graph computing.  We are also thinking about other underlying 
computing models, e.g. streaming (asynchronous) graph processing; see:

http://mail-archives.apache.org/mod_mbox/incubator-giraph-user/201109.mbox/%3CCAEVHzWC8b-7RiBjkDiQKjiu-rVBz9=ogEOajXHbCLCR5n3+QVg@mail.gmail.com%3E

But I think today, the issues are the following:

1)  Giraph runs completely as a MapReduce job on Hadoop today.  This 
needs to be maintained to support our current users, who will not likely 
move to MRv2 for at least a year.
2)  The internals of Giraph are implemented differently from Hama's and 
would take some time to port.
3)  If we have various graph processing computing models (BSP based, 
streams or asynchronous, or a combination), then being on Hama brings 
little value for Giraph.

Perhaps more practically, I wonder if it would be possible for someone 
from the Hama team to refactor our code a bit to support Hama-style BSP 
in Giraph?  Certainly would be a pretty cool project...

Avery

On 9/13/11 4:49 AM, Edward J. Yoon wrote:
> Quite a while ago, I implemented a clone of Google Pregel simply using
> BSPLib[1] and decided to focus on BSP computing engine.
>
> The Hama and Giraph projects differ in slogan but not in kind.
>
> If we made some collaboration, Giraph should be implemented on top of
> Hama BSP computing engine.
>
> Otherwise, we will be back to square one again.
>
> 1. http://markmail.org/thread/4czcgtjupjvpqcqi
>
> On Sun, Sep 11, 2011 at 11:22 PM, Vinod Kumar Vavilapalli
> <vi...@hortonworks.com>  wrote:
>> Crosspost to hama-dev and giraph-dev.
>>
>> It was only in my morning time that I was looking at HAMA-431, the port of
>> Hama to YARN. And one of the tweets reminded me of JIRA issue GIRAPH-13
>> which is about porting Giraph to YARN.
>>
>> I was also looking at the Giraph proposal for entry into Apache Incubator.
>> There is an interesting section there:
>> {quote}
>> Relationships with Other Apache Products
>>
>> Giraph has some overlapping functionality with Apache Hama. However, there
>> are some significant differences. Giraph focuses on graph-based bulk
>> synchronous parallel (BSP) computing, while Apache Hama is more for general
>> purposed BSP computing. Giraph runs on the Hadoop infrastructure, while
>> Apache Hama uses its own computing framework.
>> {quote}
>>
>> I agree with the point about Hama being a general purposed BSP and Giraph
>> being completely graph oriented. But the latter one about the infrastructure
>> is going to be moot with both Giraph and Hama trying to be ported over to
>> YARN.
>>
>> So here's my billion dollar question: Is it possible to implement Giraph's
>> graph based APIs over the Hama's bsp APIs which both run over a single
>> Apache BSP implementation over YARN?
>>
>> I also do see the email thread regarding Hama and Giraph's future
>> collaboration when Hadoop NextGen aka YARN comes in:
>> http://s.apache.org/HamaVsGiraph. So are we ready for this yet?
>>
>> Disclaimer: I come from the Hadoop world, have no idea of Giraph's APIs or
>> internals except that I see a bsp package in Giraph's source tree. I do know
>> a tiny bit about Hama's APIs and internal but my expertise is only two days.
>>
>> Thanks,
>> +Vinod
>> (An elephant maintainer trying to see if a Giraffe can be made to ride over
>> a hippopotamus riding over an elephant)
>>
>
>

Re: Port to YARN: GIRAPH and HAMA

Posted by "Edward J. Yoon" <ed...@apache.org>.

Quite a while ago, I implemented a clone of Google Pregel simply using
BSPLib[1] and decided to focus on the BSP computing engine.

The Hama and Giraph projects differ in slogan but not in kind.

If we are to collaborate, Giraph should be implemented on top of the
Hama BSP computing engine.

Otherwise, we will be back to square one again.

1. http://markmail.org/thread/4czcgtjupjvpqcqi

On Sun, Sep 11, 2011 at 11:22 PM, Vinod Kumar Vavilapalli
<vi...@hortonworks.com> wrote:
> Crosspost to hama-dev and giraph-dev.
>
> It was only in my morning time that I was looking at HAMA-431, the port of
> Hama to YARN. And one of the tweets reminded me of JIRA issue GIRAPH-13
> which is about porting Giraph to YARN.
>
> I was also looking at the Giraph proposal for entry into Apache Incubator.
> There is an interesting section there:
> {quote}
> Relationships with Other Apache Products
>
> Giraph has some overlapping functionality with Apache Hama. However, there
> are some significant differences. Giraph focuses on graph-based bulk
> synchronous parallel (BSP) computing, while Apache Hama is more for general
> purposed BSP computing. Giraph runs on the Hadoop infrastructure, while
> Apache Hama uses its own computing framework.
> {quote}
>
> I agree with the point about Hama being a general purposed BSP and Giraph
> being completely graph oriented. But the latter one about the infrastructure
> is going to be moot with both Giraph and Hama trying to be ported over to
> YARN.
>
> So here's my billion dollar question: Is it possible to implement Giraph's
> graph based APIs over the Hama's bsp APIs which both run over a single
> Apache BSP implementation over YARN?
>
> I also do see the email thread regarding Hama and Giraph's future
> collaboration when Hadoop NextGen aka YARN comes in:
> http://s.apache.org/HamaVsGiraph. So are we ready for this yet?
>
> Disclaimer: I come from the Hadoop world, have no idea of Giraph's APIs or
> internals except that I see a bsp package in Giraph's source tree. I do know
> a tiny bit about Hama's APIs and internal but my expertise is only two days.
>
> Thanks,
> +Vinod
> (An elephant maintainer trying to see if a Giraffe can be made to ride over
> a hippopotamus riding over an elephant)
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
