Posted to user@giraph.apache.org by Matthieu Labour <ma...@gmail.com> on 2014/03/28 17:26:53 UTC

Help on clustering connected components with Giraph

Hi

I am looking for tips on how to leverage Giraph for the use case below:

I have a list of Nodes.
A Node is a collection of Key-Value pairs.
2 Nodes are related (have an edge) if they share a Key-Value pair.

Until now I have been running a Depth First Search algorithm to cluster the
Nodes into Connected Components.
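
For reference, the single-machine approach described above could look roughly like the sketch below. It is only an illustration: the node layout (a dict of node id to key-value pairs), the function name connected_components, and the sample data are all assumptions, not the actual code in use.

```python
from collections import defaultdict


def connected_components(nodes):
    """Cluster nodes that share at least one key-value pair.

    nodes: dict mapping node id -> dict of key-value pairs
    (a hypothetical layout; values must be hashable).
    """
    # Inverted index: (key, value) -> ids of the nodes carrying that pair.
    by_pair = defaultdict(list)
    for node_id, pairs in nodes.items():
        for pair in pairs.items():
            by_pair[pair].append(node_id)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited node.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node_id = stack.pop()
            component.append(node_id)
            for pair in nodes[node_id].items():
                for neighbor in by_pair[pair]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
        components.append(sorted(component))
    return components


# Made-up sample data: "a" and "b" share an email, "c" and "d" share a phone.
nodes = {
    "a": {"email": "x@example.com", "phone": "111"},
    "b": {"email": "x@example.com"},
    "c": {"phone": "222"},
    "d": {"phone": "222", "city": "Paris"},
}
components = connected_components(nodes)
```

The whole index and the visited set live in memory on one machine, which is the part that stops scaling as the data set grows.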

However, my data set has grown significantly and I need to scale, which is
what brought me to Giraph.

I have gone through the Connected Component example in Giraph but need a
bit of help to get started. Specifically I wonder how I can change it to
accommodate the use case described above.

I would greatly appreciate any help.
Thank you in advance.
-matt

Re: Help on clustering connected components with Giraph

Posted by Matthieu Labour <ma...@gmail.com>.
Pankaj
Thanks a lot for this great idea. We will give it a try.
Cheers.


On Fri, Mar 28, 2014 at 3:25 PM, Pankaj Malhotra <pa...@gmail.com> wrote:

> Maybe it would be better to run a MapReduce job first, such that in the map
> phase each key-value pair at a node is the key and the node is the value. This
> way you get the first level of connections at the reduce keys. You can then use
> the output of the reduce phase as the adjacency list for the graph to be
> processed using Giraph.
> Cheers
> Pankaj

Re: Help on clustering connected components with Giraph

Posted by Pankaj Malhotra <pa...@gmail.com>.
Maybe it would be better to run a MapReduce job first, such that in the map
phase each key-value pair at a node is the key and the node is the value. This
way you get the first level of connections at the reduce keys. You can then use
the output of the reduce phase as the adjacency list for the graph to be
processed using Giraph.
Cheers
Pankaj