Posted to user@hama.apache.org by Suraj Menon <su...@apache.org> on 2013/02/02 23:13:38 UTC

Re: Error when executing BSPPeer process

Hi Francisco, by any chance is your aggregation runner getting messages
from all vertices, with each message being that vertex's whole vector?
That would be 35000 * 65537 * 4 bytes (assuming integers), roughly 9 GB.
Apologies if that is not the case; assuming you have 15 tasks for the job,
the per-task memory usage for the vertices comes to around 600 MB. I am
looking into limiting the memory used during Vertex#compute. Apologies for
the delay.
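
A back-of-the-envelope version of that estimate in plain Java; every
figure is an assumption taken from this thread, not a measurement:

    public class MemoryEstimate {
        public static void main(String[] args) {
            long vertices = 35000L;      // graph size Francisco reports
            long vectorLength = 65537L;  // distance-vector slots per vertex
            long bytesPerInt = 4L;       // assuming 4-byte integers
            long totalBytes = vertices * vectorLength * bytesPerInt;
            long perTaskBytes = totalBytes / 15L;  // assuming 15 tasks
            System.out.printf("total    ~ %.1f GB%n", totalBytes / 1e9);
            System.out.printf("per task ~ %.0f MB%n", perTaskBytes / 1e6);
        }
    }

This prints roughly 9.2 GB in total and about 612 MB per task, which is
where the numbers above come from.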

On Wed, Jan 30, 2013 at 2:01 AM, Edward J. Yoon <ed...@apache.org>wrote:

> Have you tried to increase the number of tasks?
>
> On Sat, Jan 26, 2013 at 2:34 AM, Francisco Sanches
> <sa...@gmail.com> wrote:
> > Hi
> >
> > Graphs with up to 15,000 vertices can run. I believe it is a memory
> > issue: a quick look at memory usage showed a lot of swap being used.
> >
> > 2013/1/24 Edward J. Yoon <ed...@apache.org>:
> >> Have you checked your algorithm with a small input? I'm not sure it is
> >> a memory issue.
> >>
> >> On Thu, Jan 24, 2013 at 7:53 PM, Francisco Sanches
> >> <sa...@gmail.com> wrote:
> >>> Hi thanks for the reply,
> >>>
> >>> My input is a graph of 35,000 vertices, but my application uses a
> >>> vector of 65,537 positions to store the distances. Each vertex holds a
> >>> vector of that size (all-pairs).
> >>>
> >>> 2013/1/24 Edward J. Yoon <ed...@apache.org>:
> >>>> Hi,
> >>>>
> >>>> How large is your input data?
> >>>>
> >>>> On Tue, Jan 22, 2013 at 11:21 PM, Francisco Sanches
> >>>> <sa...@gmail.com> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> When I try to run a larger graph in the application that I developed,
> >>>>> I first get this message:
> >>>>>
> >>>>> hduser@node28:/usr/local/hama-trunk$ time bin/hama jar
> aplic19jan.jar
> >>>>> 65537 /hdfs/disco/dados/internet_a /hdfs/disco/outinternet_a 42 30
> >>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to
> process : 1
> >>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to
> process : 1
> >>>>> 13/01/22 10:59:17 INFO bsp.BSPJobClient: Running job:
> job_201301221058_0002
> >>>>> 13/01/22 10:59:20 INFO bsp.BSPJobClient: Current supersteps number: 0
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Current supersteps number: 1
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: The total number of
> supersteps: 1
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Counters: 6
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
> >>>>> org.apache.hama.bsp.JobInProgress$JobCounter
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEPS=1
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
> >>>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=1
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=70
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     IO_BYTES_READ=3467587
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=42089
> >>>>> 13/01/22 10:59:32 INFO bsp.FileInputFormat: Total input paths to
> process : 42
> >>>>> 13/01/22 10:59:35 INFO bsp.BSPJobClient: Running job:
> job_201301221058_0001
> >>>>> 13/01/22 10:59:38 INFO bsp.BSPJobClient: Current supersteps number: 0
> >>>>> 13/01/22 11:00:38 INFO bsp.BSPJobClient: Current supersteps number: 1
> >>>>> attempt_201301221058_0001_000001_0: SLF4J: Class path contains
> >>>>> multiple SLF4J bindings.
> >>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
> >>>>>
> [jar:file:/usr/local/hama-trunk/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
> >>>>>
> [file:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/org/slf4j/impl/StaticLoggerBinder.class]
> >>>>> attempt_201301221058_0001_000001_0: SLF4J: See
> >>>>> http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
> >>>>> sync.ZKSyncClient: Initializing ZK Sync Client
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
> >>>>> sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At
> >>>>> node30/10.16.2.30:61003
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> ipc.Server:
> >>>>> Starting SocketReader
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> ipc.Server:
> >>>>> IPC Server Responder: starting
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> ipc.Server:
> >>>>> IPC Server listener on 61003: starting
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> >>>>> message.HadoopMessageManagerImpl:  BSPPeer address:node30 port:61003
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> ipc.Server:
> >>>>> IPC Server handler 0 on 61003: starting
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:34 INFO
> >>>>> graph.GraphJobRunner: Loading finished at 1 steps.
> >>>>> 13/01/22 11:04:46 INFO bsp.BSPJobClient: Job failed.
> >>>>>
> >>>>> real 5m34.523s
> >>>>> user 0m4.832s
> >>>>> sys 0m0.960s
> >>>>>
> >>>>> Analyzing the logs, the task that fails has this log:
> >>>>>
> >>>>> /************************************************************
> >>>>> STARTUP_MSG: Starting GroomServer
> >>>>> STARTUP_MSG:   host = node30/10.16.2.30
> >>>>> STARTUP_MSG:   args = [node30]
> >>>>> STARTUP_MSG:   version = 1.0.0
> >>>>> STARTUP_MSG:   build =
> >>>>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0-r
> >>>>> 1214675; compiled by 'hortonfo' on Fri Dec 16 20:01:27 UTC 2011
> >>>>> ************************************************************/
> >>>>> 2013-01-22 10:58:51,118 INFO org.apache.hama.bsp.GroomServer: groom
> start
> >>>>> 2013-01-22 10:58:51,420 INFO org.apache.hama.bsp.GroomServer:
> >>>>> /tmp/hama-hduser/bsp/local
> >>>>> 2013-01-22 10:58:55,417 INFO org.apache.hadoop.ipc.Server: Starting
> SocketReader
> >>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> listener on 50000: starting
> >>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> Responder: starting
> >>>>> 2013-01-22 10:58:55,455 INFO org.apache.hama.bsp.GroomServer: Worker
> >>>>> rpc server --> node30:50000
> >>>>> 2013-01-22 10:58:55,455 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 0 on 50000: starting
> >>>>> 2013-01-22 10:58:55,560 INFO org.mortbay.log: Logging to
> >>>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> >>>>> org.mortbay.log.Slf4jLog
> >>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.bsp.GroomServer:
> starting
> >>>>> webserver: node30
> >>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.http.HttpServer: Port
> >>>>> returned by webServer.getConnectors()[0].getLocalPort() before open()
> >>>>> is -1. Opening the listener on 40015
> >>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer:
> >>>>> listener.getLocalPort() returned 40015
> >>>>> webServer.getConnectors()[0].getLocalPort() returned 40015
> >>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer: Jetty
> >>>>> bound to port 40015
> >>>>> 2013-01-22 10:58:55,646 INFO org.mortbay.log: jetty-6.1.14
> >>>>> 2013-01-22 10:58:56,252 INFO org.mortbay.log: Started
> >>>>> SelectChannelConnector@node30:40015
> >>>>> 2013-01-22 10:59:01,574 INFO org.apache.hadoop.ipc.Server: Starting
> SocketReader
> >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> Responder: starting
> >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> listener on 59102: starting
> >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 0 on 59102: starting
> >>>>> 2013-01-22 10:59:01,580 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 1 on 59102: starting
> >>>>> 2013-01-22 10:59:01,577 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 2 on 59102: starting
> >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 4 on 59102: starting
> >>>>> 2013-01-22 10:59:01,583 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 3 on 59102: starting
> >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 5 on 59102: starting
> >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 6 on 59102: starting
> >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 7 on 59102: starting
> >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 8 on 59102: starting
> >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 9 on 59102: starting
> >>>>> 2013-01-22 10:59:06,899 INFO org.apache.hama.bsp.GroomServer:
> >>>>> GroomServer up at: 127.0.0.1/127.0.0.1:59102
> >>>>> 2013-01-22 10:59:06,900 INFO org.apache.hama.bsp.GroomServer:
> Starting
> >>>>> groom: node30:50000
> >>>>> 2013-01-22 10:59:18,083 INFO org.apache.hama.bsp.GroomServer: Launch
> 1 tasks.
> >>>>> 2013-01-22 10:59:18,283 WARN org.apache.hadoop.util.NativeCodeLoader:
> >>>>> Unable to load native-hadoop library for your platform... using
> >>>>> builtin-java classes where applicable
> >>>>> 2013-01-22 10:59:23,057 INFO org.apache.hama.bsp.GroomServer: Task
> >>>>> 'attempt_201301221058_0002_000000_0' has started.
> >>>>> 2013-01-22 10:59:31,403 INFO org.apache.hama.bsp.GroomServer: Task
> >>>>> attempt_201301221058_0002_000000_0 is done.
> >>>>> 2013-01-22 10:59:35,757 INFO org.apache.hama.bsp.GroomServer: Launch
> 3 tasks.
> >>>>> 2013-01-22 10:59:56,486 INFO org.apache.hama.bsp.GroomServer: Task
> >>>>> 'attempt_201301221058_0001_000000_0' has started.
> >>>>> 2013-01-22 10:59:56,487 INFO org.apache.hama.bsp.GroomServer: Launch
> 3 tasks.
> >>>>> 2013-01-22 11:00:06,973 INFO org.apache.hama.bsp.GroomServer: Task
> >>>>> 'attempt_201301221058_0001_000002_0' has started.
> >>>>> 2013-01-22 11:00:06,974 INFO org.apache.hama.bsp.GroomServer: Launch
> 3 tasks.
> >>>>> 2013-01-22 11:00:12,927 INFO org.apache.hama.bsp.GroomServer: Task
> >>>>> 'attempt_201301221058_0001_000001_0' has started.
> >>>>> 2013-01-22 11:03:50,330 ERROR org.apache.hama.bsp.TaskRunner: Error
> >>>>> when executing BSPPeer process.
> >>>>> java.io.IOException: BSP task process exit with nonzero status of
> 137.
> >>>>> command = [/usr/lib/jvm/jdk1.7.0_11/jre/bin/java, -Xmx6144m,
> >>>>> -classpath,
> /usr/local/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle//lib/tools.jar:/usr/local/hama-trunk/bin/../core/target/classes:/usr/local/hama-trunk/bin/../hama-**.jar:/usr/local/hama-trunk/bin/../lib/ant-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/avro-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/commons-cli-1.2.jar:/usr/local/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/usr/local/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hama-trunk/bin/../lib/commons-lang-2.6.jar:/usr/local/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/usr/local/hama-trunk/bin/../lib/commons-math3-3.0.jar:/usr/local/hama-trunk/bin/../lib/guava-10.0.1.jar:/usr/local/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jetty-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/junit-4.8.1.jar:/usr/local/hama-trunk/bin/../lib/log4j-1.2.16.jar:/usr/local/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/usr/local/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/usr/local/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/usr/local/hama-trunk/bin/../lib/zookeeper-3.4.5.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work,
> >>>>> org.apache.hama.bsp.GroomServer$BSPPeerChild, 127.0.0.1, 59102,
> >>>>> attempt_201301221058_0001_000001_0, node30, -1, RUNNING]
> >>>>>         at
> org.apache.hama.bsp.TaskRunner$BspChildRunner.call(TaskRunner.java:126)
> >>>>>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>>>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> >>>>>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> >>>>>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>>>>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>>>>         at java.lang.Thread.run(Thread.java:722)
> >>>>> 2013-01-22 11:04:39,789 INFO org.apache.hama.bsp.GroomServer: adding
> >>>>> purge task: attempt_201301221058_0001_000001_0
> >>>>> 2013-01-22 11:04:39,884 INFO org.apache.hama.bsp.GroomServer: About
> to
> >>>>> purge task: attempt_201301221058_0001_000001_0
> >>>>> 2013-01-22 11:04:41,170 INFO org.apache.hama.bsp.GroomServer: Kill 1
> tasks.
> >>>>> 2013-01-22 11:04:41,219 INFO org.apache.hama.bsp.GroomServer: Kill 1
> tasks.
> >>>>> 2013-01-22 11:04:41,589 INFO org.apache.hama.bsp.GroomServer: Kill 1
> tasks.
> >>>>>
> >>>>>
> >>>>> What can it be? Some configuration issue? To avoid the Java heap
> >>>>> space message, I configured:
> >>>>>
> >>>>>   <name>bsp.child.java.opts</name>
> >>>>>   <value>-Xmx6144m</value>
> >>>>>
> >>>>> I have only 4 GB of memory on each GroomServer; could that be the
> >>>>> problem? I am using the trunk version of Hama.
> >>>>>
> >>>>> --
> >>>>> Francisco Sanches
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best Regards, Edward J. Yoon
> >>>> @eddieyoon
> >>>
> >>>
> >>>
> >>> --
> >>> Francisco Sanches
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >
> >
> >
> > --
> > Francisco Sanches
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
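
One more note on the configuration quoted above: in hama-site.xml that
setting would normally be a complete property stanza, something like this
(a sketch, not Francisco's actual file):

    <property>
      <name>bsp.child.java.opts</name>
      <value>-Xmx6144m</value>
    </property>

Also worth flagging: -Xmx6144m lets each child JVM grow a 6 GB heap on
machines with only 4 GB of physical RAM. That invites heavy swapping and,
once memory runs out, a kill by the kernel OOM killer; exit status 137 is
128 + SIGKILL(9), which matches the failure in the log above.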

Re: Error when executing BSPPeer process

Posted by Francisco Sanches <sa...@gmail.com>.
Hi Suraj,

I made some changes to my application: it now packs the vertex ID and the
distance into a single Text, and at the end of each superstep it sends that
Text to its adjacent vertices. But I think the problem is as you observed,
Suraj: my application simply uses too much memory. I have observed that the
RAM on every node of the cluster is fully utilized, and swap is being used
as well. On a cluster of 14 computers with 4 GB of RAM each, the largest
graph that could run had 15,000 vertices. I need to be able to run one of
at least 35,000 vertices. If I increase the memory to 8 GB, will I be able
to run a bigger graph? I'll try to arrange the memory change here; it may
take a few days, and when I am done I will post the result here.
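
For concreteness, here is a minimal sketch of the packing I described; the
":" delimiter, the int fields, and the class name are only illustrative,
not my exact code:

    import org.apache.hadoop.io.Text;

    // Packs a (vertexID, distance) pair into one Text message and unpacks
    // it on the receiving side. Delimiter and field types are assumptions.
    public class PackedMessage {
        static Text pack(int vertexId, int distance) {
            return new Text(vertexId + ":" + distance);
        }

        static int[] unpack(Text msg) {
            String[] parts = msg.toString().split(":", 2);
            return new int[] {
                Integer.parseInt(parts[0]), Integer.parseInt(parts[1])
            };
        }

        public static void main(String[] args) {
            int[] fields = unpack(pack(42, 30));
            System.out.println("id=" + fields[0] + ", dist=" + fields[1]);
        }
    }

Even packed this way, each Text carries object overhead on top of its
payload, so by the estimate earlier in the thread (~9 GB of vector data
over 14 nodes, roughly 650 MB per node) going to 8 GB of RAM should give
some headroom, as long as the message buffers do not multiply that
footprint.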

2013/2/2 Suraj Menon <su...@apache.org>

> Hi Francisco, by any chance is your aggregation runner getting messages
> from all vertices, with each message being that vertex's whole vector?
> That would be 35000 * 65537 * 4 bytes (assuming integers), roughly 9 GB.
> Apologies if that is not the case; assuming you have 15 tasks for the job,
> the per-task memory usage for the vertices comes to around 600 MB. I am
> looking into limiting the memory used during Vertex#compute. Apologies
> for the delay.



-- 
Francisco Sanches