Posted to user@giraph.apache.org by Deepak Nettem <de...@gmail.com> on 2012/01/22 20:30:10 UTC

Changing Graph Size

Hi Folks,

I have a graph processing problem where, after each iteration, some
vertices vanish. That is, they get merged into their neighbouring nodes
based on certain conditions, and the graph keeps getting simplified after
each iteration.

I was wondering if Giraph is worth trying for this.

I was going through the documentation and it says that the input data has
to be sorted. Why is this necessary?

Also, how does the so-called 'master' divide vertices into ranges? Does it
use some kind of range partitioner? If there is range partitioning,
that's a problem for me, because vanishing vertices will cause load
imbalance.

Best,
Deepak

Re: Changing Graph Size

Posted by Avery Ching <ac...@apache.org>.
Not that updated I guess, but I have some presentations that are kind of 
recent (but still prior to the vertex range changes in GIRAPH-11 
unfortunately).  Here is my most recent one from October 2011 
(http://www.slideshare.net/averyching/20111014hortonworks).  There are 
some folks working on presentations for FOSDEM (Claudio) and Berlin 
Buzzwords (Jakob).  Maybe they have some up-to-date material?

Avery


On 1/22/12 7:23 PM, Deepak Nettem wrote:
> Awesome!
>
> I can't wait to try this out. Are there any other resources / updated
> documentation where I can get insight into how Giraph works internally?
>
> I am specifically interested in understanding how Mapper-only makes 
> this entire thing possible, what are the key-value pairs (since it's 
> built on top of Hadoop), and where Zookeeper fits in.
>
> Best,
> Deepak


Re: Changing Graph Size

Posted by Deepak Nettem <de...@gmail.com>.
Awesome!

I can't wait to try this out. Are there any other resources / updated
documentation where I can get insight into how Giraph works internally?

I am specifically interested in understanding how Mapper-only makes this
entire thing possible, what are the key-value pairs (since it's built on
top of Hadoop), and where Zookeeper fits in.

Best,
Deepak


Re: Changing Graph Size

Posted by Avery Ching <ac...@apache.org>.
Hi Deepak,

Answers inline.

Happy Sunday!

Avery

On 1/22/12 11:30 AM, Deepak Nettem wrote:
> Hi Folks,
>
> I have a graph processing problem where, after each iteration, some
> vertices vanish. That is, they get merged into their neighbouring
> nodes based on certain conditions, and the graph keeps getting
> simplified after each iteration.
>
> I was wondering if Giraph is worth trying for this.
>
Giraph supports this kind of graph mutation at any iteration.  See
https://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java;
the method removeVertexRequest() will remove vertices for you between
iterations.
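
To make "remove between iterations" concrete, here is a minimal sketch in
plain Java. It is not Giraph code and does not use the Giraph API: the class
DeferredRemovalSketch and its methods are hypothetical names chosen for
illustration. It only models the idea that a removal is requested during an
iteration and takes effect once that iteration is over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy single-process model of "request now, apply between iterations".
// Hypothetical names throughout; this is not how Giraph is implemented.
final class DeferredRemovalSketch {
    // Undirected graph stored as adjacency sets keyed by vertex id.
    private final Map<Integer, Set<Integer>> adjacency = new TreeMap<>();
    // Removal requests collected while an iteration is running.
    private final List<Integer> pendingRemovals = new ArrayList<>();

    void addEdge(int a, int b) {
        adjacency.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // Called during an iteration: only records the request.
    void requestRemoveVertex(int id) {
        pendingRemovals.add(id);
    }

    // Called between iterations: applies every queued request.
    void applyMutations() {
        for (int id : pendingRemovals) {
            Set<Integer> neighbours = adjacency.remove(id);
            if (neighbours == null) {
                continue; // vertex was already removed
            }
            for (int n : neighbours) {
                Set<Integer> edges = adjacency.get(n);
                if (edges != null) {
                    edges.remove(id); // drop the dangling edge
                }
            }
        }
        pendingRemovals.clear();
    }

    Set<Integer> vertexIds() {
        return adjacency.keySet();
    }

    public static void main(String[] args) {
        DeferredRemovalSketch g = new DeferredRemovalSketch();
        g.addEdge(1, 2);
        g.addEdge(2, 3);
        g.requestRemoveVertex(2);
        System.out.println(g.vertexIds()); // prints [1, 2, 3]
        g.applyMutations();
        System.out.println(g.vertexIds()); // prints [1, 3]
    }
}
```

A merge step for your use case would, under the same assumption, copy the
vanishing vertex's edges to the surviving neighbour before requesting the
removal.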

> I was going through the documentation and it says that the input data
> has to be sorted. Why is this necessary?
>

Ouch, this used to be a requirement, but no longer.  You can load 
vertices however you like.  The workers will forward them to the 
appropriate partition.
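
As a rough illustration of why input order stops mattering once vertices are
forwarded to their partition, here is a small sketch in plain Java. The
routing rule (hash of the vertex id modulo the partition count) and the names
PartitionRoutingSketch, partitionFor and route are assumptions made for this
example, not Giraph's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical routing model: each vertex id maps to exactly one partition,
// so the order in which vertices are read does not affect where they end up.
final class PartitionRoutingSketch {
    // Assumed rule: non-negative hash of the id modulo the partition count.
    static int partitionFor(long vertexId, int partitionCount) {
        return Math.floorMod(Long.hashCode(vertexId), partitionCount);
    }

    // Groups the given ids by their target partition.
    static Map<Integer, List<Long>> route(List<Long> vertexIds, int partitionCount) {
        Map<Integer, List<Long>> partitions = new TreeMap<>();
        for (long id : vertexIds) {
            int target = partitionFor(id, partitionCount);
            partitions.computeIfAbsent(target, k -> new ArrayList<>()).add(id);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Deliberately unsorted input.
        List<Long> unsorted = Arrays.asList(42L, 7L, 19L, 3L, 100L, 8L);
        System.out.println(route(unsorted, 3));
    }
}
```

Whatever order the ids arrive in, each one lands in the same partition,
which is the property that makes pre-sorting unnecessary in this model.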

> Also, how does the so-called 'master' divide vertices into ranges? Does
> it use some kind of range partitioner? If there is range partitioning,
> that's a problem for me, because vanishing vertices will cause load
> imbalance.
>
Again, out-of-date documentation.  Please see
https://issues.apache.org/jira/browse/GIRAPH-11 for the relevant
change.  Let us know if you have any other questions.
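
To illustrate the load-imbalance concern, here is a contrived comparison in
plain Java of a range rule and a hash rule after half of the vertices have
vanished. Everything in it is an assumption for illustration: the class
BalanceSketch, both partitioning rules, and the scenario (ids 0..999 with the
lower half merged away). See GIRAPH-11 for what Giraph actually does.

```java
import java.util.Arrays;

// Contrived comparison of two hypothetical partitioning rules.
final class BalanceSketch {
    // Range rule: contiguous blocks of ids per partition, ids in [0, maxId).
    static int rangePartition(int id, int maxId, int partitions) {
        return (int) ((long) id * partitions / maxId);
    }

    // Hash rule: non-negative hash of the id modulo the partition count.
    static int hashPartition(int id, int partitions) {
        return Math.floorMod(Integer.hashCode(id), partitions);
    }

    // Counts surviving vertices per partition; survivors are ids in
    // [firstAlive, maxId).
    static int[] sizes(int firstAlive, int maxId, int partitions, boolean useHash) {
        int[] counts = new int[partitions];
        for (int id = firstAlive; id < maxId; id++) {
            int p = useHash
                    ? hashPartition(id, partitions)
                    : rangePartition(id, maxId, partitions);
            counts[p]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Ids 0..499 have been merged away; 500..999 survive.
        System.out.println("range: " + Arrays.toString(sizes(500, 1000, 4, false)));
        // prints range: [0, 0, 250, 250]
        System.out.println("hash:  " + Arrays.toString(sizes(500, 1000, 4, true)));
        // prints hash:  [125, 125, 125, 125]
    }
}
```

In this scenario the range rule leaves two partitions empty, while the hash
rule keeps all four the same size. Real workloads will be less tidy, since
which vertices vanish depends on the data.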

> Best,
> Deepak