Posted to user@spark.apache.org by Harsha HN <99...@gmail.com> on 2014/09/17 15:43:22 UTC

Adjacency List representation in Spark

Hello

We are building an adjacency list to represent a graph. The vertices, edges,
and weights have been extracted from HDFS files by a Spark job. We expect
the size of the adjacency list (a HashMap) to grow beyond 20 GB.
How can we represent this as an RDD so that it is distributed in nature?

Basically we are trying to fit a HashMap (the adjacency list) into a Spark
RDD. Is there any way to do this other than GraphX?

Thanks and Regards,
Harsha
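The representation described above can be sketched with plain Java collections; the grouping below is the single-JVM equivalent of what Spark's JavaPairRDD.groupByKey() would do across a cluster. The vertex IDs, weights, and the parse step mentioned in the comment are made-up placeholders, not the poster's actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AdjacencyList {

    /**
     * Groups (src, dst, weight) edge triples by source vertex -- the local
     * equivalent of JavaPairRDD.groupByKey() in Spark. Distributed, the same
     * shape would be built roughly as:
     *   sc.textFile("hdfs://...")      // one edge per line
     *     .mapToPair(parseEdge)        // hypothetical parser: (src, (dst, weight))
     *     .groupByKey();               // JavaPairRDD of (src, neighbours)
     * so that no single JVM ever holds the whole 20 GB map.
     */
    static Map<Long, List<double[]>> build(double[][] edges) {
        Map<Long, List<double[]>> adjacency = new HashMap<>();
        for (double[] e : edges) {
            long src = (long) e[0];
            // Each entry is a {dst, weight} pair for one outgoing edge.
            adjacency.computeIfAbsent(src, k -> new ArrayList<>())
                     .add(new double[] { e[1], e[2] });
        }
        return adjacency;
    }

    public static void main(String[] args) {
        // Hypothetical parsed edges: {src, dst, weight}.
        double[][] edges = { {1, 2, 0.5}, {1, 3, 1.5}, {2, 3, 2.0} };
        Map<Long, List<double[]>> adjacency = build(edges);
        System.out.println("vertex 1 has " + adjacency.get(1L).size() + " neighbours");
    }
}
```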

Re: Adjacency List representation in Spark

Posted by Koert Kuipers <ko...@tresata.com>.
We build our own adjacency lists as well. The main motivation for us was
that GraphX assumes everything fits in memory (it has .cache statements all
over the place). However, if my understanding is wrong and GraphX can
handle graphs that do not fit in memory, I would be interested to know how
to use it that way.


Re: Adjacency List representation in Spark

Posted by Harsha HN <99...@gmail.com>.
Hi Andrew,

The only reason I avoided the GraphX approach is that I didn't see any
Java-oriented explanation or Java API documentation for it. Do you have
any sample code that uses the GraphX API from Java?

Thanks,
Harsha


Re: Adjacency List representation in Spark

Posted by Andrew Ash <an...@andrewash.com>.
Hi Harsha,

You could look through the GraphX source to see the approach taken there
for ideas for your own implementation. I'd recommend starting at
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/Graph.scala#L385
to see the storage technique.

Why do you want to avoid using GraphX?

Good luck!
Andrew
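As a rough illustration of the kind of storage technique that link points at: GraphX splits edges across partitions via a PartitionStrategy, so the adjacency data never sits in one map. The sketch below (hypothetical class and names, not GraphX's actual code) mimics the hash-by-key assignment Spark's HashPartitioner uses for a pair RDD keyed by source vertex:

```java
import java.util.ArrayList;
import java.util.List;

public class EdgePartitioning {

    /**
     * Mimics Spark's HashPartitioner: a non-negative key.hashCode() modulo
     * the partition count decides which partition an edge lives on.
     */
    static int partitionFor(long srcVertex, int numPartitions) {
        int mod = Long.hashCode(srcVertex) % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    /** Splits {src, dst} edges into per-partition lists by source vertex. */
    static List<List<long[]>> partition(long[][] edges, int numPartitions) {
        List<List<long[]>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (long[] edge : edges) {
            // All edges with the same source land on the same partition,
            // so a vertex's whole neighbour list can be built locally there.
            partitions.get(partitionFor(edge[0], numPartitions)).add(edge);
        }
        return partitions;
    }

    public static void main(String[] args) {
        long[][] edges = { {1, 2}, {2, 3}, {3, 4}, {4, 1} };
        List<List<long[]>> parts = partition(edges, 2);
        System.out.println(parts.get(0).size() + " + " + parts.get(1).size() + " edges");
    }
}
```

Partitioning by source vertex keeps each vertex's whole neighbour list on one executor, which is what lets a groupByKey over the edge RDD build adjacency lists without shipping everything to the driver.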
