Posted to user@giraph.apache.org by jerome richard <je...@msn.com> on 2013/07/26 14:02:25 UTC
Scaling Problem
Hi,
I encountered a critical scaling problem using Giraph. I wrote a very simple algorithm to test Giraph on large graphs: a connectivity test. It works on a relatively large graph (3,072,441 nodes and 117,185,083 edges) but not on a very large one (52,000,000 nodes and 2,000,000,000 edges). During processing of the largest graph, the Giraph core seems to fail after superstep 14 (15 on some jobs). The input graph is 30 GB stored as text, and the output is also stored as text. 9 workers are used to compute the graph.
Here is the stack trace of the jobs (it is the same for all 9):
java.lang.IllegalStateException: run: Caught an unrecoverable exception exists: Failed to check /_hadoopBsp/job_201307260439_0006/_applicationAttemptsDir/0/_superstepDir/97/_addressesAndPartitions after 3 tries!
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.IllegalStateException: exists: Failed to check /_hadoopBsp/job_201307260439_0006/_applicationAttemptsDir/0/_superstepDir/97/_addressesAndPartitions after 3 tries!
    at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
    at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:678)
    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:248)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
    ... 7 more
Could you help me solve this problem? If you need the code of the program, I can post it here (the code is relatively small).
Thanks, Jérôme.
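The connectivity-test code is not shown in the thread, but in the Pregel/BSP model such a test is usually min-label propagation: every vertex starts with its own id as a label and repeatedly adopts the smallest label seen among its neighbors, so two vertices end up with the same label iff they are connected. A plain-Java sketch of that idea (independent of the Giraph API; this is an illustration, not Jérôme's actual code):

```java
import java.util.*;

// Min-label propagation: one while-loop iteration plays the role of
// one Giraph superstep, and reading a neighbor's label plays the role
// of receiving a message from it.
public class ConnectivitySketch {
    /** adjacency: vertex id -> neighbor ids; returns the final label of each vertex */
    static long[] minLabelPropagation(int[][] adj) {
        int n = adj.length;
        long[] label = new long[n];
        for (int v = 0; v < n; v++) label[v] = v;    // superstep 0: label = own id
        boolean changed = true;
        while (changed) {                             // one iteration ~ one superstep
            changed = false;
            long[] next = label.clone();
            for (int v = 0; v < n; v++)
                for (int u : adj[v])                  // "messages" from neighbors
                    if (label[u] < next[v]) { next[v] = label[u]; changed = true; }
            label = next;
        }
        return label;
    }

    public static void main(String[] args) {
        // two components: {0, 1, 2} and {3, 4}
        int[][] adj = { {1}, {0, 2}, {1}, {4}, {3} };
        System.out.println(Arrays.toString(minLabelPropagation(adj))); // [0, 0, 0, 3, 3]
    }
}
```

The number of supersteps this needs grows with the graph's diameter, which is one reason a job like this can still be running at superstep 97 on a 52M-vertex graph.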
RE: Scaling Problem
Posted by jerome richard <je...@msn.com>.
"Can you paste your cluster information ?"
What kind of information do you need, and how can I get it?
"What are your message types?"
The message type is just LongWritable. I don't use collections during the graph processing; I only use collections to load the input graph, and that seems to work perfectly. Is it possible to avoid allocating primitive writables (like LongWritable) in order to improve performance and use less memory?
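On the allocation question: Hadoop Writables are mutable precisely so that a single instance can be refilled with set(...) instead of allocating one object per value (and since messages are typically serialized when sent, the sender can generally reuse its message object). A plain-Java sketch of the reuse pattern, with a hypothetical MutableLong standing in for LongWritable:

```java
// Reuse pattern: allocate one mutable holder outside the loop and
// overwrite it on each step, instead of constructing a fresh object
// per value. MutableLong is a stand-in for Hadoop's LongWritable.
public class WritableReuse {
    static final class MutableLong {
        private long value;
        void set(long v) { value = v; }
        long get() { return value; }
    }

    /** sums 0..n-1 while allocating only a single holder object */
    static long sumWithOneHolder(int n) {
        MutableLong msg = new MutableLong();   // allocated once
        long total = 0;
        for (int i = 0; i < n; i++) {
            msg.set(i);                        // reuse: overwrite, don't new
            total += msg.get();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumWithOneHolder(1000000)); // 499999500000
    }
}
```

The same pattern applies on the receive side: iterate over incoming values and copy out the primitive immediately rather than storing the Writable objects themselves.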
"How you invoke the job?"
Here is the command typed in my terminal to start the job:
hadoop jar hadoop/jars/test-connexity.jar \
    lifo.giraph.test.Main \
    /test-connexity \
    /test-connexity-output \
    10
The first Giraph argument is the input path, the second is the output path, and the last is the number of workers. Please find attached the code of my Giraph application: Main.java configures and starts the job, VertexComputation.java performs the computation, and the two last files define how to load the input graph and save the output graph.
PS: English is not my first language, so I apologize for any mistakes.
Thanks for your help.
Date: Fri, 26 Jul 2013 08:13:22 -0700
From: aching@apache.org
To: user@giraph.apache.org
Subject: Re: Scaling Problem
Re: Scaling Problem
Posted by Avery Ching <ac...@apache.org>.
Hi guys,
At some point, we do need to help with a guide for conserving memory,
but this is a generic Java problem. You can work around it by avoiding
objects as much as possible by using primitives directly. If you need
primitive collections, see fastutil, Trove, etc. Combiners also save a
lot of memory for messages.
What are your message types?
Avery
On 7/26/13 6:53 AM, Puneet Jain wrote:
> Can you paste your cluster information? I am also struggling to make
> it work on 75M vertices and hundreds of millions of edges.
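To make the combiner suggestion concrete: with a min-label algorithm, all messages destined for the same vertex can be collapsed into their minimum as they are buffered, so per-vertex message memory drops from O(#messages) to O(1). A plain-Java sketch of the idea (the method names here are illustrative; Giraph's real hook is its combiner API):

```java
// A "min combiner": because min is associative and commutative, the
// framework may fold pending messages for a vertex pairwise in any
// order without changing the result the vertex eventually sees.
public class MinCombinerSketch {
    /** combine two pending messages destined for the same vertex */
    static long combine(long a, long b) { return Math.min(a, b); }

    /** fold a whole buffered batch down to a single surviving message */
    static long combineAll(long[] messages) {
        long acc = messages[0];
        for (int i = 1; i < messages.length; i++) acc = combine(acc, messages[i]);
        return acc;
    }

    public static void main(String[] args) {
        long[] pending = { 42, 7, 19, 7, 3 };     // 5 buffered messages
        System.out.println(combineAll(pending));  // 3 -- only one message survives
    }
}
```

For a high-degree vertex in a 2-billion-edge graph, this is the difference between buffering millions of LongWritable messages and buffering one.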
Re: Scaling Problem
Posted by Puneet Jain <pu...@gmail.com>.
Can you paste your cluster information? I am also struggling to make it
work on 75M vertices and hundreds of millions of edges.
--
--Puneet
Re: Scaling Problem
Posted by Han JU <ju...@gmail.com>.
What's your cluster configuration? How do you invoke the job?
--
JU Han
Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel
+33 0619608888