Posted to user@spark.apache.org by James <al...@gmail.com> on 2014/11/02 06:57:26 UTC

How to correctly estimate the number of partitions of a graph in GraphX

Hello,

I am trying to run the Connected Components algorithm on a very large graph.
In practice, I found that a small number of partitions leads to OOM errors,
while a large number causes various timeout exceptions. How should I
estimate the number of partitions for a graph in GraphX?

Alcaid

Re: How to correctly estimate the number of partitions of a graph in GraphX

Posted by James <al...@gmail.com>.
Hello,

We have a graph with 100B edges, nearly 800GB in gz format.
We have 80 machines, each with 60GB of memory.
I have never seen the program run to completion.

Alcaid
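
For scale, the figures in this message can be turned into a rough per-edge
memory budget. The sketch below uses only the numbers quoted above (100B
edges, 80 machines, 60GB each) and assumes, optimistically, that all machine
memory is available to Spark; in practice the usable share is smaller.

```python
# Rough per-edge memory budget from the figures quoted in this message.
EDGES = 100_000_000_000      # 100B edges
MACHINES = 80
MEM_PER_MACHINE_GB = 60


def bytes_per_edge_budget(edges, machines, mem_per_machine_gb):
    """Total cluster memory divided by edge count, in bytes (1 GB = 10**9 bytes)."""
    total_bytes = machines * mem_per_machine_gb * 10**9
    return total_bytes / edges


total_gb = MACHINES * MEM_PER_MACHINE_GB
budget = bytes_per_edge_budget(EDGES, MACHINES, MEM_PER_MACHINE_GB)
print(f"total memory: {total_gb} GB, budget: {budget:.0f} bytes per edge")
```

A GraphX edge carries two 64-bit vertex IDs (16 bytes) before any attributes,
indexes, or JVM overhead, so a budget of about 48 bytes per edge leaves little
headroom. That is consistent with the OOM errors reported in this thread.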

2014-11-02 14:06 GMT+08:00 Ankur Dave <an...@gmail.com>:

> How large is your graph, and how much memory does your cluster have?
>
> We don't have a good way to determine the *optimal* number of partitions
> aside from trial and error, but to get the job to at least run to
> completion, it might help to use the MEMORY_AND_DISK storage level and a
> large number of partitions.
>
> Ankur <http://www.ankurdave.com/>

Re: How to correctly estimate the number of partitions of a graph in GraphX

Posted by Ankur Dave <an...@gmail.com>.
How large is your graph, and how much memory does your cluster have?

We don't have a good way to determine the *optimal* number of partitions
aside from trial and error, but to get the job to at least run to
completion, it might help to use the MEMORY_AND_DISK storage level and a
large number of partitions.

Ankur <http://www.ankurdave.com/>
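
Since the advice above is trial and error with a large number of partitions,
a starting point for the trials can be sketched with simple arithmetic. This
is not a GraphX rule and not the author's method. It assumes 16 bytes per edge
(two 64-bit vertex IDs, ignoring attributes and JVM overhead) and a
hypothetical target of 256 MiB per edge partition; the edge count is the 100B
figure reported elsewhere in this thread.

```python
import math


def starting_partitions(edges, bytes_per_edge, target_partition_bytes):
    """Smallest partition count that keeps each edge partition under the target size."""
    return math.ceil(edges * bytes_per_edge / target_partition_bytes)


# Assumptions (not from the thread, except the edge count):
EDGES = 100_000_000_000          # 100B edges, as reported in this thread
BYTES_PER_EDGE = 16              # two 64-bit vertex IDs, no attributes or overhead
TARGET_BYTES = 256 * 1024**2     # hypothetical 256 MiB target per partition

n = starting_partitions(EDGES, BYTES_PER_EDGE, TARGET_BYTES)

# Candidate partition counts to try around the starting point: half, same, double.
candidates = [max(1, n // 2), n, n * 2]
print(n, candidates)
```

The real in-memory footprint per edge is larger than 16 bytes, so the result
is a lower bound to begin experimenting from, ideally together with the
MEMORY_AND_DISK storage level suggested above.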