You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hama.apache.org by Francisco Sanches <sa...@gmail.com> on 2013/01/22 15:21:22 UTC

Error when executing BSPPeer process

Hi,

When I try to run a larger graph in the application that I developed
here first get this message:

hduser@node28:/usr/local/hama-trunk$ time bin/hama jar aplic19jan.jar
65537 /hdfs/disco/dados/internet_a /hdfs/disco/outinternet_a 42 30
13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
13/01/22 10:59:17 INFO bsp.BSPJobClient: Running job: job_201301221058_0002
13/01/22 10:59:20 INFO bsp.BSPJobClient: Current supersteps number: 0
13/01/22 10:59:32 INFO bsp.BSPJobClient: Current supersteps number: 1
13/01/22 10:59:32 INFO bsp.BSPJobClient: The total number of supersteps: 1
13/01/22 10:59:32 INFO bsp.BSPJobClient: Counters: 6
13/01/22 10:59:32 INFO bsp.BSPJobClient:
org.apache.hama.bsp.JobInProgress$JobCounter
13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEPS=1
13/01/22 10:59:32 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
13/01/22 10:59:32 INFO bsp.BSPJobClient:
org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=1
13/01/22 10:59:32 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=70
13/01/22 10:59:32 INFO bsp.BSPJobClient:     IO_BYTES_READ=3467587
13/01/22 10:59:32 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=42089
13/01/22 10:59:32 INFO bsp.FileInputFormat: Total input paths to process : 42
13/01/22 10:59:35 INFO bsp.BSPJobClient: Running job: job_201301221058_0001
13/01/22 10:59:38 INFO bsp.BSPJobClient: Current supersteps number: 0
13/01/22 11:00:38 INFO bsp.BSPJobClient: Current supersteps number: 1
attempt_201301221058_0001_000001_0: SLF4J: Class path contains
multiple SLF4J bindings.
attempt_201301221058_0001_000001_0: SLF4J: Found binding in
[jar:file:/usr/local/hama-trunk/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201301221058_0001_000001_0: SLF4J: Found binding in
[file:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201301221058_0001_000001_0: SLF4J: See
http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
sync.ZKSyncClient: Initializing ZK Sync Client
attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At
node30/10.16.2.30:61003
attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
Starting SocketReader
attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
IPC Server Responder: starting
attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
IPC Server listener on 61003: starting
attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
message.HadoopMessageManagerImpl:  BSPPeer address:node30 port:61003
attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
IPC Server handler 0 on 61003: starting
attempt_201301221058_0001_000001_0: 13/01/22 11:00:34 INFO
graph.GraphJobRunner: Loading finished at 1 steps.
13/01/22 11:04:46 INFO bsp.BSPJobClient: Job failed.

real 5m34.523s
user 0m4.832s
sys 0m0.960s

Analyzing logs the one that fails has this log:

/************************************************************
STARTUP_MSG: Starting GroomServer
STARTUP_MSG:   host = node30/10.16.2.30
STARTUP_MSG:   args = [node30]
STARTUP_MSG:   version = 1.0.0
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
1214675; compiled by 'hortonfo' on Fri Dec 16 20:01:27 UTC 2011
************************************************************/
2013-01-22 10:58:51,118 INFO org.apache.hama.bsp.GroomServer: groom start
2013-01-22 10:58:51,420 INFO org.apache.hama.bsp.GroomServer:
/tmp/hama-hduser/bsp/local
2013-01-22 10:58:55,417 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 50000: starting
2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2013-01-22 10:58:55,455 INFO org.apache.hama.bsp.GroomServer: Worker
rpc server --> node30:50000
2013-01-22 10:58:55,455 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 50000: starting
2013-01-22 10:58:55,560 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2013-01-22 10:58:55,645 INFO org.apache.hama.bsp.GroomServer: starting
webserver: node30
2013-01-22 10:58:55,645 INFO org.apache.hama.http.HttpServer: Port
returned by webServer.getConnectors()[0].getLocalPort() before open()
is -1. Opening the listener on 40015
2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer:
listener.getLocalPort() returned 40015
webServer.getConnectors()[0].getLocalPort() returned 40015
2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer: Jetty
bound to port 40015
2013-01-22 10:58:55,646 INFO org.mortbay.log: jetty-6.1.14
2013-01-22 10:58:56,252 INFO org.mortbay.log: Started
SelectChannelConnector@node30:40015
2013-01-22 10:59:01,574 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 59102: starting
2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 59102: starting
2013-01-22 10:59:01,580 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 59102: starting
2013-01-22 10:59:01,577 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 59102: starting
2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 59102: starting
2013-01-22 10:59:01,583 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 59102: starting
2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 59102: starting
2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 59102: starting
2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 59102: starting
2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 59102: starting
2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 59102: starting
2013-01-22 10:59:06,899 INFO org.apache.hama.bsp.GroomServer:
GroomServer up at: 127.0.0.1/127.0.0.1:59102
2013-01-22 10:59:06,900 INFO org.apache.hama.bsp.GroomServer: Starting
groom: node30:50000
2013-01-22 10:59:18,083 INFO org.apache.hama.bsp.GroomServer: Launch 1 tasks.
2013-01-22 10:59:18,283 WARN org.apache.hadoop.util.NativeCodeLoader:
Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
2013-01-22 10:59:23,057 INFO org.apache.hama.bsp.GroomServer: Task
'attempt_201301221058_0002_000000_0' has started.
2013-01-22 10:59:31,403 INFO org.apache.hama.bsp.GroomServer: Task
attempt_201301221058_0002_000000_0 is done.
2013-01-22 10:59:35,757 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
2013-01-22 10:59:56,486 INFO org.apache.hama.bsp.GroomServer: Task
'attempt_201301221058_0001_000000_0' has started.
2013-01-22 10:59:56,487 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
2013-01-22 11:00:06,973 INFO org.apache.hama.bsp.GroomServer: Task
'attempt_201301221058_0001_000002_0' has started.
2013-01-22 11:00:06,974 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
2013-01-22 11:00:12,927 INFO org.apache.hama.bsp.GroomServer: Task
'attempt_201301221058_0001_000001_0' has started.
2013-01-22 11:03:50,330 ERROR org.apache.hama.bsp.TaskRunner: Error
when executing BSPPeer process.
java.io.IOException: BSP task process exit with nonzero status of 137.
command = [/usr/lib/jvm/jdk1.7.0_11/jre/bin/java, -Xmx6144m,
-classpath, /usr/local/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle//lib/tools.jar:/usr/local/hama-trunk/bin/../core/target/classes:/usr/local/hama-trunk/bin/../hama-**.jar:/usr/local/hama-trunk/bin/../lib/ant-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/avro-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/commons-cli-1.2.jar:/usr/local/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/usr/local/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hama-trunk/bin/../lib/commons-lang-2.6.jar:/usr/local/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/usr/local/hama-trunk/bin/../lib/commons-math3-3.0.jar:/usr/local/hama-trunk/bin/../lib/guava-10.0.1.jar:/usr/local/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jetty-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/junit-4.8.1.jar:/usr/local/hama-trunk/bin/../lib/log4j-1.2.16.jar:/usr/local/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/usr/local/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/usr/local/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/usr/local/hama-trunk/bin/../lib/zookeeper-3.4.5.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work,
org.apache.hama.bsp.GroomServer$BSPPeerChild, 127.0.0.1, 59102,
attempt_201301221058_0001_000001_0, node30, -1, RUNNING]
	at org.apache.hama.bsp.TaskRunner$BspChildRunner.call(TaskRunner.java:126)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
2013-01-22 11:04:39,789 INFO org.apache.hama.bsp.GroomServer: adding
purge task: attempt_201301221058_0001_000001_0
2013-01-22 11:04:39,884 INFO org.apache.hama.bsp.GroomServer: About to
purge task: attempt_201301221058_0001_000001_0
2013-01-22 11:04:41,170 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
2013-01-22 11:04:41,219 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
2013-01-22 11:04:41,589 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.


What can it be? Some configuration? To not get the message of Java
Heap Space configured:

  <name> bsp.child.java.opts </ name>
     <value>-Xmx6144m </ value>

I have only 4 GB of memory in each Groomserver, can it be that? I am
using the trunk version of Hama.

-- 
Francisco Sanches

Re: Error when executing BSPPeer process

Posted by Francisco Sanches <sa...@gmail.com>.
Hi Suraj,

I made some changes in my application, now it packs in a Text information
vertexid and distance from the end of the superstep it sends to its
adjacent Text. But I think the problem is as you observed Suraj, my
application uses too much memory even. I have observed that the ram on all
nodes of the cluster has been fully utilized. The swap memory has also been
used. In a cluster with 14 computers with 4 gb of ram, the largest graph
that could run was a 15 thousand vertices. I need to be able to run one of
at least 35 000 vertices. Would if I change the memory to 8 gb, I can run a
bigger graph? I'll try to organize here to change this setting memory, take
a few days when I finish post the result here.

2013/2/2 Suraj Menon <su...@apache.org>

> Hi Francisco, By any chance, is your aggregation runner getting messages
> from all vertices, with each message being the vector from each vertex?
> That is 35000*65537*4(assuming integer) = roughly 9GB. Apologies if you are
> not, because assuming you have 15 tasks for the purpose, the memory usage
> for vertex is coming to be around 600 MB. I am looking into limiting the
> memory used during Vertex#compute. Apologies for the delay.
>
> On Wed, Jan 30, 2013 at 2:01 AM, Edward J. Yoon <edwardyoon@apache.org
> >wrote:
>
> > Have you tried to increase the number of tasks?
> >
> > On Sat, Jan 26, 2013 at 2:34 AM, Francisco Sanches
> > <sa...@gmail.com> wrote:
> > > Hi
> > >
> > > With graphs with up to 15,000 vertices can run. I believe it is a
> > > quick look at memory saw this being used enough memory swap.
> > >
> > > 2013/1/24 Edward J. Yoon <ed...@apache.org>:
> > >> Have you checked your algorithm with small input? I'm not sure that is
> > >> a memory issue.
> > >>
> > >> On Thu, Jan 24, 2013 at 7:53 PM, Francisco Sanches
> > >> <sa...@gmail.com> wrote:
> > >>> Hi thanks for the reply,
> > >>>
> > >>> My input is a graph of 35 000 vertices, but my application uses a
> > >>> vector of 65537 to store the distance. Each node has a vector with
> the
> > >>> number of positions (All-pair).
> > >>>
> > >>> 2013/1/24 Edward J. Yoon <ed...@apache.org>:
> > >>>> Hi,
> > >>>>
> > >>>> How large your input data is?
> > >>>>
> > >>>> On Tue, Jan 22, 2013 at 11:21 PM, Francisco Sanches
> > >>>> <sa...@gmail.com> wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> When I try to run a larger graph in the application that I
> developed
> > >>>>> here first get this message:
> > >>>>>
> > >>>>> hduser@node28:/usr/local/hama-trunk$ time bin/hama jar
> > aplic19jan.jar
> > >>>>> 65537 /hdfs/disco/dados/internet_a /hdfs/disco/outinternet_a 42 30
> > >>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to
> > process : 1
> > >>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to
> > process : 1
> > >>>>> 13/01/22 10:59:17 INFO bsp.BSPJobClient: Running job:
> > job_201301221058_0002
> > >>>>> 13/01/22 10:59:20 INFO bsp.BSPJobClient: Current supersteps
> number: 0
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Current supersteps
> number: 1
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: The total number of
> > supersteps: 1
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Counters: 6
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
> > >>>>> org.apache.hama.bsp.JobInProgress$JobCounter
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEPS=1
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
> > >>>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=1
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=70
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     IO_BYTES_READ=3467587
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
> TASK_INPUT_RECORDS=42089
> > >>>>> 13/01/22 10:59:32 INFO bsp.FileInputFormat: Total input paths to
> > process : 42
> > >>>>> 13/01/22 10:59:35 INFO bsp.BSPJobClient: Running job:
> > job_201301221058_0001
> > >>>>> 13/01/22 10:59:38 INFO bsp.BSPJobClient: Current supersteps
> number: 0
> > >>>>> 13/01/22 11:00:38 INFO bsp.BSPJobClient: Current supersteps
> number: 1
> > >>>>> attempt_201301221058_0001_000001_0: SLF4J: Class path contains
> > >>>>> multiple SLF4J bindings.
> > >>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
> > >>>>>
> >
> [jar:file:/usr/local/hama-trunk/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > >>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
> > >>>>>
> >
> [file:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/org/slf4j/impl/StaticLoggerBinder.class]
> > >>>>> attempt_201301221058_0001_000001_0: SLF4J: See
> > >>>>> http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
> > >>>>> sync.ZKSyncClient: Initializing ZK Sync Client
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
> > >>>>> sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At
> > >>>>> node30/10.16.2.30:61003
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> > ipc.Server:
> > >>>>> Starting SocketReader
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> > ipc.Server:
> > >>>>> IPC Server Responder: starting
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> > ipc.Server:
> > >>>>> IPC Server listener on 61003: starting
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> > >>>>> message.HadoopMessageManagerImpl:  BSPPeer address:node30
> port:61003
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> > ipc.Server:
> > >>>>> IPC Server handler 0 on 61003: starting
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:34 INFO
> > >>>>> graph.GraphJobRunner: Loading finished at 1 steps.
> > >>>>> 13/01/22 11:04:46 INFO bsp.BSPJobClient: Job failed.
> > >>>>>
> > >>>>> real 5m34.523s
> > >>>>> user 0m4.832s
> > >>>>> sys 0m0.960s
> > >>>>>
> > >>>>> Analyzing logs the one that fails has this log:
> > >>>>>
> > >>>>> /************************************************************
> > >>>>> STARTUP_MSG: Starting GroomServer
> > >>>>> STARTUP_MSG:   host = node30/10.16.2.30
> > >>>>> STARTUP_MSG:   args = [node30]
> > >>>>> STARTUP_MSG:   version = 1.0.0
> > >>>>> STARTUP_MSG:   build =
> > >>>>>
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0-r
> > >>>>> 1214675; compiled by 'hortonfo' on Fri Dec 16 20:01:27 UTC 2011
> > >>>>> ************************************************************/
> > >>>>> 2013-01-22 10:58:51,118 INFO org.apache.hama.bsp.GroomServer: groom
> > start
> > >>>>> 2013-01-22 10:58:51,420 INFO org.apache.hama.bsp.GroomServer:
> > >>>>> /tmp/hama-hduser/bsp/local
> > >>>>> 2013-01-22 10:58:55,417 INFO org.apache.hadoop.ipc.Server: Starting
> > SocketReader
> > >>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> listener on 50000: starting
> > >>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> Responder: starting
> > >>>>> 2013-01-22 10:58:55,455 INFO org.apache.hama.bsp.GroomServer:
> Worker
> > >>>>> rpc server --> node30:50000
> > >>>>> 2013-01-22 10:58:55,455 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> handler 0 on 50000: starting
> > >>>>> 2013-01-22 10:58:55,560 INFO org.mortbay.log: Logging to
> > >>>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> > >>>>> org.mortbay.log.Slf4jLog
> > >>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.bsp.GroomServer:
> > starting
> > >>>>> webserver: node30
> > >>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.http.HttpServer: Port
> > >>>>> returned by webServer.getConnectors()[0].getLocalPort() before
> open()
> > >>>>> is -1. Opening the listener on 40015
> > >>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer:
> > >>>>> listener.getLocalPort() returned 40015
> > >>>>> webServer.getConnectors()[0].getLocalPort() returned 40015
> > >>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer: Jetty
> > >>>>> bound to port 40015
> > >>>>> 2013-01-22 10:58:55,646 INFO org.mortbay.log: jetty-6.1.14
> > >>>>> 2013-01-22 10:58:56,252 INFO org.mortbay.log: Started
> > >>>>> SelectChannelConnector@node30:40015
> > >>>>> 2013-01-22 10:59:01,574 INFO org.apache.hadoop.ipc.Server: Starting
> > SocketReader
> > >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> Responder: starting
> > >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> listener on 59102: starting
> > >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> handler 0 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,580 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> handler 1 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,577 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> handler 2 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> handler 4 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,583 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> handler 3 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> handler 5 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> handler 6 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> handler 7 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> handler 8 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC
> Server
> > >>>>> handler 9 on 59102: starting
> > >>>>> 2013-01-22 10:59:06,899 INFO org.apache.hama.bsp.GroomServer:
> > >>>>> GroomServer up at: 127.0.0.1/127.0.0.1:59102
> > >>>>> 2013-01-22 10:59:06,900 INFO org.apache.hama.bsp.GroomServer:
> > Starting
> > >>>>> groom: node30:50000
> > >>>>> 2013-01-22 10:59:18,083 INFO org.apache.hama.bsp.GroomServer:
> Launch
> > 1 tasks.
> > >>>>> 2013-01-22 10:59:18,283 WARN
> org.apache.hadoop.util.NativeCodeLoader:
> > >>>>> Unable to load native-hadoop library for your platform... using
> > >>>>> builtin-java classes where applicable
> > >>>>> 2013-01-22 10:59:23,057 INFO org.apache.hama.bsp.GroomServer: Task
> > >>>>> 'attempt_201301221058_0002_000000_0' has started.
> > >>>>> 2013-01-22 10:59:31,403 INFO org.apache.hama.bsp.GroomServer: Task
> > >>>>> attempt_201301221058_0002_000000_0 is done.
> > >>>>> 2013-01-22 10:59:35,757 INFO org.apache.hama.bsp.GroomServer:
> Launch
> > 3 tasks.
> > >>>>> 2013-01-22 10:59:56,486 INFO org.apache.hama.bsp.GroomServer: Task
> > >>>>> 'attempt_201301221058_0001_000000_0' has started.
> > >>>>> 2013-01-22 10:59:56,487 INFO org.apache.hama.bsp.GroomServer:
> Launch
> > 3 tasks.
> > >>>>> 2013-01-22 11:00:06,973 INFO org.apache.hama.bsp.GroomServer: Task
> > >>>>> 'attempt_201301221058_0001_000002_0' has started.
> > >>>>> 2013-01-22 11:00:06,974 INFO org.apache.hama.bsp.GroomServer:
> Launch
> > 3 tasks.
> > >>>>> 2013-01-22 11:00:12,927 INFO org.apache.hama.bsp.GroomServer: Task
> > >>>>> 'attempt_201301221058_0001_000001_0' has started.
> > >>>>> 2013-01-22 11:03:50,330 ERROR org.apache.hama.bsp.TaskRunner: Error
> > >>>>> when executing BSPPeer process.
> > >>>>> java.io.IOException: BSP task process exit with nonzero status of
> > 137.
> > >>>>> command = [/usr/lib/jvm/jdk1.7.0_11/jre/bin/java, -Xmx6144m,
> > >>>>> -classpath,
> >
> /usr/local/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle//lib/tools.jar:/usr/local/hama-trunk/bin/../core/target/classes:/usr/local/hama-trunk/bin/../hama-**.jar:/usr/local/hama-trunk/bin/../lib/ant-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/avro-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/commons-cli-1.2.jar:/usr/local/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/usr/local/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hama-trunk/bin/../lib/commons-lang-2.6.jar:/usr/local/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/usr/local/hama-trunk/bin/../lib/commons-math3-3.0.jar:/usr/local/hama-trunk/bin/../lib/guava-10.0.1.jar:/usr/local/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jetty-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/junit-4.8.1.jar:/usr/local/hama-trunk/bin/../lib/log4j-1.2.16.jar:/usr/local/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/usr/local/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/usr/local/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/usr/local/hama-trunk/bin/../lib/zookeeper-3.4.5.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work,
> > >>>>> org.apache.hama.bsp.GroomServer$BSPPeerChild, 127.0.0.1, 59102,
> > >>>>> attempt_201301221058_0001_000001_0, node30, -1, RUNNING]
> > >>>>>         at
> > org.apache.hama.bsp.TaskRunner$BspChildRunner.call(TaskRunner.java:126)
> > >>>>>         at
> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>>>         at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> > >>>>>         at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> > >>>>>         at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > >>>>>         at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > >>>>>         at java.lang.Thread.run(Thread.java:722)
> > >>>>> 2013-01-22 11:04:39,789 INFO org.apache.hama.bsp.GroomServer:
> adding
> > >>>>> purge task: attempt_201301221058_0001_000001_0
> > >>>>> 2013-01-22 11:04:39,884 INFO org.apache.hama.bsp.GroomServer: About
> > to
> > >>>>> purge task: attempt_201301221058_0001_000001_0
> > >>>>> 2013-01-22 11:04:41,170 INFO org.apache.hama.bsp.GroomServer: Kill
> 1
> > tasks.
> > >>>>> 2013-01-22 11:04:41,219 INFO org.apache.hama.bsp.GroomServer: Kill
> 1
> > tasks.
> > >>>>> 2013-01-22 11:04:41,589 INFO org.apache.hama.bsp.GroomServer: Kill
> 1
> > tasks.
> > >>>>>
> > >>>>>
> > >>>>> What can it be? Some configuration? To not get the message of Java
> > >>>>> Heap Space configured:
> > >>>>>
> > >>>>>   <name> bsp.child.java.opts </ name>
> > >>>>>      <value>-Xmx6144m </ value>
> > >>>>>
> > >>>>> I have only 4 GB of memory in each Groomserver, can it be that? I
> am
> > >>>>> using the trunk version of Hama.
> > >>>>>
> > >>>>> --
> > >>>>> Francisco Sanches
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Best Regards, Edward J. Yoon
> > >>>> @eddieyoon
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Francisco Sanches
> > >>
> > >>
> > >>
> > >> --
> > >> Best Regards, Edward J. Yoon
> > >> @eddieyoon
> > >
> > >
> > >
> > > --
> > > Francisco Sanches
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>



-- 
Francisco Sanches

Re: Error when executing BSPPeer process

Posted by Suraj Menon <su...@apache.org>.
Hi Francisco, By any chance, is your aggregation runner getting messages
from all vertices, with each message being the vector from each vertex?
That is 35000*65537*4(assuming integer) = roughly 9GB. Apologies if you are
not, because assuming you have 15 tasks for the purpose, the memory usage
for vertex is coming to be around 600 MB. I am looking into limiting the
memory used during Vertex#compute. Apologies for the delay.

On Wed, Jan 30, 2013 at 2:01 AM, Edward J. Yoon <ed...@apache.org>wrote:

> Have you tried to increase the number of tasks?
>
> On Sat, Jan 26, 2013 at 2:34 AM, Francisco Sanches
> <sa...@gmail.com> wrote:
> > Hi
> >
> > With graphs with up to 15,000 vertices can run. I believe it is a
> > quick look at memory saw this being used enough memory swap.
> >
> > 2013/1/24 Edward J. Yoon <ed...@apache.org>:
> >> Have you checked your algorithm with small input? I'm not sure that is
> >> a memory issue.
> >>
> >> On Thu, Jan 24, 2013 at 7:53 PM, Francisco Sanches
> >> <sa...@gmail.com> wrote:
> >>> Hi thanks for the reply,
> >>>
> >>> My input is a graph of 35 000 vertices, but my application uses a
> >>> vector of 65537 to store the distance. Each node has a vector with the
> >>> number of positions (All-pair).
> >>>
> >>> 2013/1/24 Edward J. Yoon <ed...@apache.org>:
> >>>> Hi,
> >>>>
> >>>> How large your input data is?
> >>>>
> >>>> On Tue, Jan 22, 2013 at 11:21 PM, Francisco Sanches
> >>>> <sa...@gmail.com> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> When I try to run a larger graph in the application that I developed
> >>>>> here first get this message:
> >>>>>
> >>>>> hduser@node28:/usr/local/hama-trunk$ time bin/hama jar
> aplic19jan.jar
> >>>>> 65537 /hdfs/disco/dados/internet_a /hdfs/disco/outinternet_a 42 30
> >>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to
> process : 1
> >>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to
> process : 1
> >>>>> 13/01/22 10:59:17 INFO bsp.BSPJobClient: Running job:
> job_201301221058_0002
> >>>>> 13/01/22 10:59:20 INFO bsp.BSPJobClient: Current supersteps number: 0
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Current supersteps number: 1
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: The total number of
> supersteps: 1
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Counters: 6
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
> >>>>> org.apache.hama.bsp.JobInProgress$JobCounter
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEPS=1
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
> >>>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=1
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=70
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     IO_BYTES_READ=3467587
> >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=42089
> >>>>> 13/01/22 10:59:32 INFO bsp.FileInputFormat: Total input paths to
> process : 42
> >>>>> 13/01/22 10:59:35 INFO bsp.BSPJobClient: Running job:
> job_201301221058_0001
> >>>>> 13/01/22 10:59:38 INFO bsp.BSPJobClient: Current supersteps number: 0
> >>>>> 13/01/22 11:00:38 INFO bsp.BSPJobClient: Current supersteps number: 1
> >>>>> attempt_201301221058_0001_000001_0: SLF4J: Class path contains
> >>>>> multiple SLF4J bindings.
> >>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
> >>>>>
> [jar:file:/usr/local/hama-trunk/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
> >>>>>
> [file:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/org/slf4j/impl/StaticLoggerBinder.class]
> >>>>> attempt_201301221058_0001_000001_0: SLF4J: See
> >>>>> http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
> >>>>> sync.ZKSyncClient: Initializing ZK Sync Client
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
> >>>>> sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At
> >>>>> node30/10.16.2.30:61003
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> ipc.Server:
> >>>>> Starting SocketReader
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> ipc.Server:
> >>>>> IPC Server Responder: starting
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> ipc.Server:
> >>>>> IPC Server listener on 61003: starting
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> >>>>> message.HadoopMessageManagerImpl:  BSPPeer address:node30 port:61003
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> ipc.Server:
> >>>>> IPC Server handler 0 on 61003: starting
> >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:34 INFO
> >>>>> graph.GraphJobRunner: Loading finished at 1 steps.
> >>>>> 13/01/22 11:04:46 INFO bsp.BSPJobClient: Job failed.
> >>>>>
> >>>>> real 5m34.523s
> >>>>> user 0m4.832s
> >>>>> sys 0m0.960s
> >>>>>
> >>>>> Analyzing logs the one that fails has this log:
> >>>>>
> >>>>> /************************************************************
> >>>>> STARTUP_MSG: Starting GroomServer
> >>>>> STARTUP_MSG:   host = node30/10.16.2.30
> >>>>> STARTUP_MSG:   args = [node30]
> >>>>> STARTUP_MSG:   version = 1.0.0
> >>>>> STARTUP_MSG:   build =
> >>>>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0-r
> >>>>> 1214675; compiled by 'hortonfo' on Fri Dec 16 20:01:27 UTC 2011
> >>>>> ************************************************************/
> >>>>> 2013-01-22 10:58:51,118 INFO org.apache.hama.bsp.GroomServer: groom
> start
> >>>>> 2013-01-22 10:58:51,420 INFO org.apache.hama.bsp.GroomServer:
> >>>>> /tmp/hama-hduser/bsp/local
> >>>>> 2013-01-22 10:58:55,417 INFO org.apache.hadoop.ipc.Server: Starting
> SocketReader
> >>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> listener on 50000: starting
> >>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> Responder: starting
> >>>>> 2013-01-22 10:58:55,455 INFO org.apache.hama.bsp.GroomServer: Worker
> >>>>> rpc server --> node30:50000
> >>>>> 2013-01-22 10:58:55,455 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 0 on 50000: starting
> >>>>> 2013-01-22 10:58:55,560 INFO org.mortbay.log: Logging to
> >>>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> >>>>> org.mortbay.log.Slf4jLog
> >>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.bsp.GroomServer:
> starting
> >>>>> webserver: node30
> >>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.http.HttpServer: Port
> >>>>> returned by webServer.getConnectors()[0].getLocalPort() before open()
> >>>>> is -1. Opening the listener on 40015
> >>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer:
> >>>>> listener.getLocalPort() returned 40015
> >>>>> webServer.getConnectors()[0].getLocalPort() returned 40015
> >>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer: Jetty
> >>>>> bound to port 40015
> >>>>> 2013-01-22 10:58:55,646 INFO org.mortbay.log: jetty-6.1.14
> >>>>> 2013-01-22 10:58:56,252 INFO org.mortbay.log: Started
> >>>>> SelectChannelConnector@node30:40015
> >>>>> 2013-01-22 10:59:01,574 INFO org.apache.hadoop.ipc.Server: Starting
> SocketReader
> >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> Responder: starting
> >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> listener on 59102: starting
> >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 0 on 59102: starting
> >>>>> 2013-01-22 10:59:01,580 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 1 on 59102: starting
> >>>>> 2013-01-22 10:59:01,577 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 2 on 59102: starting
> >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 4 on 59102: starting
> >>>>> 2013-01-22 10:59:01,583 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 3 on 59102: starting
> >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 5 on 59102: starting
> >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 6 on 59102: starting
> >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 7 on 59102: starting
> >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 8 on 59102: starting
> >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>>>> handler 9 on 59102: starting
> >>>>> 2013-01-22 10:59:06,899 INFO org.apache.hama.bsp.GroomServer:
> >>>>> GroomServer up at: 127.0.0.1/127.0.0.1:59102
> >>>>> 2013-01-22 10:59:06,900 INFO org.apache.hama.bsp.GroomServer:
> Starting
> >>>>> groom: node30:50000
> >>>>> 2013-01-22 10:59:18,083 INFO org.apache.hama.bsp.GroomServer: Launch
> 1 tasks.
> >>>>> 2013-01-22 10:59:18,283 WARN org.apache.hadoop.util.NativeCodeLoader:
> >>>>> Unable to load native-hadoop library for your platform... using
> >>>>> builtin-java classes where applicable
> >>>>> 2013-01-22 10:59:23,057 INFO org.apache.hama.bsp.GroomServer: Task
> >>>>> 'attempt_201301221058_0002_000000_0' has started.
> >>>>> 2013-01-22 10:59:31,403 INFO org.apache.hama.bsp.GroomServer: Task
> >>>>> attempt_201301221058_0002_000000_0 is done.
> >>>>> 2013-01-22 10:59:35,757 INFO org.apache.hama.bsp.GroomServer: Launch
> 3 tasks.
> >>>>> 2013-01-22 10:59:56,486 INFO org.apache.hama.bsp.GroomServer: Task
> >>>>> 'attempt_201301221058_0001_000000_0' has started.
> >>>>> 2013-01-22 10:59:56,487 INFO org.apache.hama.bsp.GroomServer: Launch
> 3 tasks.
> >>>>> 2013-01-22 11:00:06,973 INFO org.apache.hama.bsp.GroomServer: Task
> >>>>> 'attempt_201301221058_0001_000002_0' has started.
> >>>>> 2013-01-22 11:00:06,974 INFO org.apache.hama.bsp.GroomServer: Launch
> 3 tasks.
> >>>>> 2013-01-22 11:00:12,927 INFO org.apache.hama.bsp.GroomServer: Task
> >>>>> 'attempt_201301221058_0001_000001_0' has started.
> >>>>> 2013-01-22 11:03:50,330 ERROR org.apache.hama.bsp.TaskRunner: Error
> >>>>> when executing BSPPeer process.
> >>>>> java.io.IOException: BSP task process exit with nonzero status of
> 137.
> >>>>> command = [/usr/lib/jvm/jdk1.7.0_11/jre/bin/java, -Xmx6144m,
> >>>>> -classpath,
> /usr/local/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle//lib/tools.jar:/usr/local/hama-trunk/bin/../core/target/classes:/usr/local/hama-trunk/bin/../hama-**.jar:/usr/local/hama-trunk/bin/../lib/ant-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/avro-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/commons-cli-1.2.jar:/usr/local/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/usr/local/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hama-trunk/bin/../lib/commons-lang-2.6.jar:/usr/local/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/usr/local/hama-trunk/bin/../lib/commons-math3-3.0.jar:/usr/local/hama-trunk/bin/../lib/guava-10.0.1.jar:/usr/local/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jetty-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/junit-4.8.1.jar:/usr/local/hama-trunk/bin/../lib/log4j-1.2.16.jar:/usr/local/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/usr/local/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/usr/local/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/usr/local/hama-trunk/bin/../lib/zookeeper-3.4.5.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work,
> >>>>> org.apache.hama.bsp.GroomServer$BSPPeerChild, 127.0.0.1, 59102,
> >>>>> attempt_201301221058_0001_000001_0, node30, -1, RUNNING]
> >>>>>         at
> org.apache.hama.bsp.TaskRunner$BspChildRunner.call(TaskRunner.java:126)
> >>>>>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>>>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> >>>>>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> >>>>>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>>>>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>>>>         at java.lang.Thread.run(Thread.java:722)
> >>>>> 2013-01-22 11:04:39,789 INFO org.apache.hama.bsp.GroomServer: adding
> >>>>> purge task: attempt_201301221058_0001_000001_0
> >>>>> 2013-01-22 11:04:39,884 INFO org.apache.hama.bsp.GroomServer: About
> to
> >>>>> purge task: attempt_201301221058_0001_000001_0
> >>>>> 2013-01-22 11:04:41,170 INFO org.apache.hama.bsp.GroomServer: Kill 1
> tasks.
> >>>>> 2013-01-22 11:04:41,219 INFO org.apache.hama.bsp.GroomServer: Kill 1
> tasks.
> >>>>> 2013-01-22 11:04:41,589 INFO org.apache.hama.bsp.GroomServer: Kill 1
> tasks.
> >>>>>
> >>>>>
> >>>>> What can it be? Some configuration? To not get the message of Java
> >>>>> Heap Space configured:
> >>>>>
> >>>>>   <name> bsp.child.java.opts </ name>
> >>>>>      <value>-Xmx6144m </ value>
> >>>>>
> >>>>> I have only 4 GB of memory in each Groomserver, can it be that? I am
> >>>>> using the trunk version of Hama.
> >>>>>
> >>>>> --
> >>>>> Francisco Sanches
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best Regards, Edward J. Yoon
> >>>> @eddieyoon
> >>>
> >>>
> >>>
> >>> --
> >>> Francisco Sanches
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >
> >
> >
> > --
> > Francisco Sanches
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>

Re: Error when executing BSPPeer process

Posted by "Edward J. Yoon" <ed...@apache.org>.
Have you tried to increase the number of tasks?

On Sat, Jan 26, 2013 at 2:34 AM, Francisco Sanches
<sa...@gmail.com> wrote:
> Hi
>
> With graphs with up to 15,000 vertices can run. I believe it is a
> quick look at memory saw this being used enough memory swap.
>
> 2013/1/24 Edward J. Yoon <ed...@apache.org>:
>> Have you checked your algorithm with small input? I'm not sure that is
>> a memory issue.
>>
>> On Thu, Jan 24, 2013 at 7:53 PM, Francisco Sanches
>> <sa...@gmail.com> wrote:
>>> Hi thanks for the reply,
>>>
>>> My input is a graph of 35 000 vertices, but my application uses a
>>> vector of 65537 to store the distance. Each node has a vector with the
>>> number of positions (All-pair).
>>>
>>> 2013/1/24 Edward J. Yoon <ed...@apache.org>:
>>>> Hi,
>>>>
>>>> How large your input data is?
>>>>
>>>> On Tue, Jan 22, 2013 at 11:21 PM, Francisco Sanches
>>>> <sa...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> When I try to run a larger graph in the application that I developed
>>>>> here first get this message:
>>>>>
>>>>> hduser@node28:/usr/local/hama-trunk$ time bin/hama jar aplic19jan.jar
>>>>> 65537 /hdfs/disco/dados/internet_a /hdfs/disco/outinternet_a 42 30
>>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
>>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
>>>>> 13/01/22 10:59:17 INFO bsp.BSPJobClient: Running job: job_201301221058_0002
>>>>> 13/01/22 10:59:20 INFO bsp.BSPJobClient: Current supersteps number: 0
>>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Current supersteps number: 1
>>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: The total number of supersteps: 1
>>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Counters: 6
>>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
>>>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEPS=1
>>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
>>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
>>>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=1
>>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=70
>>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     IO_BYTES_READ=3467587
>>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=42089
>>>>> 13/01/22 10:59:32 INFO bsp.FileInputFormat: Total input paths to process : 42
>>>>> 13/01/22 10:59:35 INFO bsp.BSPJobClient: Running job: job_201301221058_0001
>>>>> 13/01/22 10:59:38 INFO bsp.BSPJobClient: Current supersteps number: 0
>>>>> 13/01/22 11:00:38 INFO bsp.BSPJobClient: Current supersteps number: 1
>>>>> attempt_201301221058_0001_000001_0: SLF4J: Class path contains
>>>>> multiple SLF4J bindings.
>>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
>>>>> [jar:file:/usr/local/hama-trunk/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
>>>>> [file:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> attempt_201301221058_0001_000001_0: SLF4J: See
>>>>> http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
>>>>> sync.ZKSyncClient: Initializing ZK Sync Client
>>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
>>>>> sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At
>>>>> node30/10.16.2.30:61003
>>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>>>> Starting SocketReader
>>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>>>> IPC Server Responder: starting
>>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>>>> IPC Server listener on 61003: starting
>>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
>>>>> message.HadoopMessageManagerImpl:  BSPPeer address:node30 port:61003
>>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>>>> IPC Server handler 0 on 61003: starting
>>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:34 INFO
>>>>> graph.GraphJobRunner: Loading finished at 1 steps.
>>>>> 13/01/22 11:04:46 INFO bsp.BSPJobClient: Job failed.
>>>>>
>>>>> real 5m34.523s
>>>>> user 0m4.832s
>>>>> sys 0m0.960s
>>>>>
>>>>> Analyzing logs the one that fails has this log:
>>>>>
>>>>> /************************************************************
>>>>> STARTUP_MSG: Starting GroomServer
>>>>> STARTUP_MSG:   host = node30/10.16.2.30
>>>>> STARTUP_MSG:   args = [node30]
>>>>> STARTUP_MSG:   version = 1.0.0
>>>>> STARTUP_MSG:   build =
>>>>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
>>>>> 1214675; compiled by 'hortonfo' on Fri Dec 16 20:01:27 UTC 2011
>>>>> ************************************************************/
>>>>> 2013-01-22 10:58:51,118 INFO org.apache.hama.bsp.GroomServer: groom start
>>>>> 2013-01-22 10:58:51,420 INFO org.apache.hama.bsp.GroomServer:
>>>>> /tmp/hama-hduser/bsp/local
>>>>> 2013-01-22 10:58:55,417 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
>>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> listener on 50000: starting
>>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> Responder: starting
>>>>> 2013-01-22 10:58:55,455 INFO org.apache.hama.bsp.GroomServer: Worker
>>>>> rpc server --> node30:50000
>>>>> 2013-01-22 10:58:55,455 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> handler 0 on 50000: starting
>>>>> 2013-01-22 10:58:55,560 INFO org.mortbay.log: Logging to
>>>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>>>>> org.mortbay.log.Slf4jLog
>>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.bsp.GroomServer: starting
>>>>> webserver: node30
>>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.http.HttpServer: Port
>>>>> returned by webServer.getConnectors()[0].getLocalPort() before open()
>>>>> is -1. Opening the listener on 40015
>>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer:
>>>>> listener.getLocalPort() returned 40015
>>>>> webServer.getConnectors()[0].getLocalPort() returned 40015
>>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer: Jetty
>>>>> bound to port 40015
>>>>> 2013-01-22 10:58:55,646 INFO org.mortbay.log: jetty-6.1.14
>>>>> 2013-01-22 10:58:56,252 INFO org.mortbay.log: Started
>>>>> SelectChannelConnector@node30:40015
>>>>> 2013-01-22 10:59:01,574 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
>>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> Responder: starting
>>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> listener on 59102: starting
>>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> handler 0 on 59102: starting
>>>>> 2013-01-22 10:59:01,580 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> handler 1 on 59102: starting
>>>>> 2013-01-22 10:59:01,577 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> handler 2 on 59102: starting
>>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> handler 4 on 59102: starting
>>>>> 2013-01-22 10:59:01,583 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> handler 3 on 59102: starting
>>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> handler 5 on 59102: starting
>>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> handler 6 on 59102: starting
>>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> handler 7 on 59102: starting
>>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> handler 8 on 59102: starting
>>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>> handler 9 on 59102: starting
>>>>> 2013-01-22 10:59:06,899 INFO org.apache.hama.bsp.GroomServer:
>>>>> GroomServer up at: 127.0.0.1/127.0.0.1:59102
>>>>> 2013-01-22 10:59:06,900 INFO org.apache.hama.bsp.GroomServer: Starting
>>>>> groom: node30:50000
>>>>> 2013-01-22 10:59:18,083 INFO org.apache.hama.bsp.GroomServer: Launch 1 tasks.
>>>>> 2013-01-22 10:59:18,283 WARN org.apache.hadoop.util.NativeCodeLoader:
>>>>> Unable to load native-hadoop library for your platform... using
>>>>> builtin-java classes where applicable
>>>>> 2013-01-22 10:59:23,057 INFO org.apache.hama.bsp.GroomServer: Task
>>>>> 'attempt_201301221058_0002_000000_0' has started.
>>>>> 2013-01-22 10:59:31,403 INFO org.apache.hama.bsp.GroomServer: Task
>>>>> attempt_201301221058_0002_000000_0 is done.
>>>>> 2013-01-22 10:59:35,757 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>>>> 2013-01-22 10:59:56,486 INFO org.apache.hama.bsp.GroomServer: Task
>>>>> 'attempt_201301221058_0001_000000_0' has started.
>>>>> 2013-01-22 10:59:56,487 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>>>> 2013-01-22 11:00:06,973 INFO org.apache.hama.bsp.GroomServer: Task
>>>>> 'attempt_201301221058_0001_000002_0' has started.
>>>>> 2013-01-22 11:00:06,974 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>>>> 2013-01-22 11:00:12,927 INFO org.apache.hama.bsp.GroomServer: Task
>>>>> 'attempt_201301221058_0001_000001_0' has started.
>>>>> 2013-01-22 11:03:50,330 ERROR org.apache.hama.bsp.TaskRunner: Error
>>>>> when executing BSPPeer process.
>>>>> java.io.IOException: BSP task process exit with nonzero status of 137.
>>>>> command = [/usr/lib/jvm/jdk1.7.0_11/jre/bin/java, -Xmx6144m,
>>>>> -classpath, /usr/local/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle//lib/tools.jar:/usr/local/hama-trunk/bin/../core/target/classes:/usr/local/hama-trunk/bin/../hama-**.jar:/usr/local/hama-trunk/bin/../lib/ant-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/avro-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/commons-cli-1.2.jar:/usr/local/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/usr/local/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hama-trunk/bin/../lib/commons-lang-2.6.jar:/usr/local/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/usr/local/hama-trunk/bin/../lib/commons-math3-3.0.jar:/usr/local/hama-trunk/bin/../lib/guava-10.0.1.jar:/usr/local/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jetty-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/junit-4.8.1.jar:/usr/local/hama-trunk/bin/../lib/log4j-1.2.16.jar:/usr/local/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/usr/local/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/usr/local/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/usr/local/hama-trunk/bin/../lib/zookeeper-3.4.5.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work,
>>>>> org.apache.hama.bsp.GroomServer$BSPPeerChild, 127.0.0.1, 59102,
>>>>> attempt_201301221058_0001_000001_0, node30, -1, RUNNING]
>>>>>         at org.apache.hama.bsp.TaskRunner$BspChildRunner.call(TaskRunner.java:126)
>>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>         at java.lang.Thread.run(Thread.java:722)
>>>>> 2013-01-22 11:04:39,789 INFO org.apache.hama.bsp.GroomServer: adding
>>>>> purge task: attempt_201301221058_0001_000001_0
>>>>> 2013-01-22 11:04:39,884 INFO org.apache.hama.bsp.GroomServer: About to
>>>>> purge task: attempt_201301221058_0001_000001_0
>>>>> 2013-01-22 11:04:41,170 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>>>> 2013-01-22 11:04:41,219 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>>>> 2013-01-22 11:04:41,589 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>>>>
>>>>>
>>>>> What can it be? Some configuration? To not get the message of Java
>>>>> Heap Space configured:
>>>>>
>>>>>   <name> bsp.child.java.opts </ name>
>>>>>      <value>-Xmx6144m </ value>
>>>>>
>>>>> I have only 4 GB of memory in each Groomserver, can it be that? I am
>>>>> using the trunk version of Hama.
>>>>>
>>>>> --
>>>>> Francisco Sanches
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> @eddieyoon
>>>
>>>
>>>
>>> --
>>> Francisco Sanches
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>
>
>
> --
> Francisco Sanches



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error when executing BSPPeer process

Posted by Francisco Sanches <sa...@gmail.com>.
Hi

With graphs with up to 15,000 vertices can run. I believe it is a
quick look at memory saw this being used enough memory swap.

2013/1/24 Edward J. Yoon <ed...@apache.org>:
> Have you checked your algorithm with small input? I'm not sure that is
> a memory issue.
>
> On Thu, Jan 24, 2013 at 7:53 PM, Francisco Sanches
> <sa...@gmail.com> wrote:
>> Hi thanks for the reply,
>>
>> My input is a graph of 35 000 vertices, but my application uses a
>> vector of 65537 to store the distance. Each node has a vector with the
>> number of positions (All-pair).
>>
>> 2013/1/24 Edward J. Yoon <ed...@apache.org>:
>>> Hi,
>>>
>>> How large your input data is?
>>>
>>> On Tue, Jan 22, 2013 at 11:21 PM, Francisco Sanches
>>> <sa...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> When I try to run a larger graph in the application that I developed
>>>> here first get this message:
>>>>
>>>> hduser@node28:/usr/local/hama-trunk$ time bin/hama jar aplic19jan.jar
>>>> 65537 /hdfs/disco/dados/internet_a /hdfs/disco/outinternet_a 42 30
>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
>>>> 13/01/22 10:59:17 INFO bsp.BSPJobClient: Running job: job_201301221058_0002
>>>> 13/01/22 10:59:20 INFO bsp.BSPJobClient: Current supersteps number: 0
>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Current supersteps number: 1
>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: The total number of supersteps: 1
>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Counters: 6
>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
>>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEPS=1
>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
>>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=1
>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=70
>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     IO_BYTES_READ=3467587
>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=42089
>>>> 13/01/22 10:59:32 INFO bsp.FileInputFormat: Total input paths to process : 42
>>>> 13/01/22 10:59:35 INFO bsp.BSPJobClient: Running job: job_201301221058_0001
>>>> 13/01/22 10:59:38 INFO bsp.BSPJobClient: Current supersteps number: 0
>>>> 13/01/22 11:00:38 INFO bsp.BSPJobClient: Current supersteps number: 1
>>>> attempt_201301221058_0001_000001_0: SLF4J: Class path contains
>>>> multiple SLF4J bindings.
>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
>>>> [jar:file:/usr/local/hama-trunk/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
>>>> [file:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/org/slf4j/impl/StaticLoggerBinder.class]
>>>> attempt_201301221058_0001_000001_0: SLF4J: See
>>>> http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
>>>> sync.ZKSyncClient: Initializing ZK Sync Client
>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
>>>> sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At
>>>> node30/10.16.2.30:61003
>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>>> Starting SocketReader
>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>>> IPC Server Responder: starting
>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>>> IPC Server listener on 61003: starting
>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
>>>> message.HadoopMessageManagerImpl:  BSPPeer address:node30 port:61003
>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>>> IPC Server handler 0 on 61003: starting
>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:34 INFO
>>>> graph.GraphJobRunner: Loading finished at 1 steps.
>>>> 13/01/22 11:04:46 INFO bsp.BSPJobClient: Job failed.
>>>>
>>>> real 5m34.523s
>>>> user 0m4.832s
>>>> sys 0m0.960s
>>>>
>>>> Analyzing logs the one that fails has this log:
>>>>
>>>> /************************************************************
>>>> STARTUP_MSG: Starting GroomServer
>>>> STARTUP_MSG:   host = node30/10.16.2.30
>>>> STARTUP_MSG:   args = [node30]
>>>> STARTUP_MSG:   version = 1.0.0
>>>> STARTUP_MSG:   build =
>>>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
>>>> 1214675; compiled by 'hortonfo' on Fri Dec 16 20:01:27 UTC 2011
>>>> ************************************************************/
>>>> 2013-01-22 10:58:51,118 INFO org.apache.hama.bsp.GroomServer: groom start
>>>> 2013-01-22 10:58:51,420 INFO org.apache.hama.bsp.GroomServer:
>>>> /tmp/hama-hduser/bsp/local
>>>> 2013-01-22 10:58:55,417 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> listener on 50000: starting
>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> Responder: starting
>>>> 2013-01-22 10:58:55,455 INFO org.apache.hama.bsp.GroomServer: Worker
>>>> rpc server --> node30:50000
>>>> 2013-01-22 10:58:55,455 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 0 on 50000: starting
>>>> 2013-01-22 10:58:55,560 INFO org.mortbay.log: Logging to
>>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>>>> org.mortbay.log.Slf4jLog
>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.bsp.GroomServer: starting
>>>> webserver: node30
>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.http.HttpServer: Port
>>>> returned by webServer.getConnectors()[0].getLocalPort() before open()
>>>> is -1. Opening the listener on 40015
>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer:
>>>> listener.getLocalPort() returned 40015
>>>> webServer.getConnectors()[0].getLocalPort() returned 40015
>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer: Jetty
>>>> bound to port 40015
>>>> 2013-01-22 10:58:55,646 INFO org.mortbay.log: jetty-6.1.14
>>>> 2013-01-22 10:58:56,252 INFO org.mortbay.log: Started
>>>> SelectChannelConnector@node30:40015
>>>> 2013-01-22 10:59:01,574 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> Responder: starting
>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> listener on 59102: starting
>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 0 on 59102: starting
>>>> 2013-01-22 10:59:01,580 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 1 on 59102: starting
>>>> 2013-01-22 10:59:01,577 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 2 on 59102: starting
>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 4 on 59102: starting
>>>> 2013-01-22 10:59:01,583 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 3 on 59102: starting
>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 5 on 59102: starting
>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 6 on 59102: starting
>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 7 on 59102: starting
>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 8 on 59102: starting
>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 9 on 59102: starting
>>>> 2013-01-22 10:59:06,899 INFO org.apache.hama.bsp.GroomServer:
>>>> GroomServer up at: 127.0.0.1/127.0.0.1:59102
>>>> 2013-01-22 10:59:06,900 INFO org.apache.hama.bsp.GroomServer: Starting
>>>> groom: node30:50000
>>>> 2013-01-22 10:59:18,083 INFO org.apache.hama.bsp.GroomServer: Launch 1 tasks.
>>>> 2013-01-22 10:59:18,283 WARN org.apache.hadoop.util.NativeCodeLoader:
>>>> Unable to load native-hadoop library for your platform... using
>>>> builtin-java classes where applicable
>>>> 2013-01-22 10:59:23,057 INFO org.apache.hama.bsp.GroomServer: Task
>>>> 'attempt_201301221058_0002_000000_0' has started.
>>>> 2013-01-22 10:59:31,403 INFO org.apache.hama.bsp.GroomServer: Task
>>>> attempt_201301221058_0002_000000_0 is done.
>>>> 2013-01-22 10:59:35,757 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>>> 2013-01-22 10:59:56,486 INFO org.apache.hama.bsp.GroomServer: Task
>>>> 'attempt_201301221058_0001_000000_0' has started.
>>>> 2013-01-22 10:59:56,487 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>>> 2013-01-22 11:00:06,973 INFO org.apache.hama.bsp.GroomServer: Task
>>>> 'attempt_201301221058_0001_000002_0' has started.
>>>> 2013-01-22 11:00:06,974 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>>> 2013-01-22 11:00:12,927 INFO org.apache.hama.bsp.GroomServer: Task
>>>> 'attempt_201301221058_0001_000001_0' has started.
>>>> 2013-01-22 11:03:50,330 ERROR org.apache.hama.bsp.TaskRunner: Error
>>>> when executing BSPPeer process.
>>>> java.io.IOException: BSP task process exit with nonzero status of 137.
>>>> command = [/usr/lib/jvm/jdk1.7.0_11/jre/bin/java, -Xmx6144m,
>>>> -classpath, /usr/local/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle//lib/tools.jar:/usr/local/hama-trunk/bin/../core/target/classes:/usr/local/hama-trunk/bin/../hama-**.jar:/usr/local/hama-trunk/bin/../lib/ant-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/avro-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/commons-cli-1.2.jar:/usr/local/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/usr/local/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hama-trunk/bin/../lib/commons-lang-2.6.jar:/usr/local/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/usr/local/hama-trunk/bin/../lib/commons-math3-3.0.jar:/usr/local/hama-trunk/bin/../lib/guava-10.0.1.jar:/usr/local/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jetty-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/junit-4.8.1.jar:/usr/local/hama-trunk/bin/../lib/log4j-1.2.16.jar:/usr/local/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/usr/local/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/usr/local/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/usr/local/hama-trunk/bin/../lib/zookeeper-3.4.5.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work,
>>>> org.apache.hama.bsp.GroomServer$BSPPeerChild, 127.0.0.1, 59102,
>>>> attempt_201301221058_0001_000001_0, node30, -1, RUNNING]
>>>>         at org.apache.hama.bsp.TaskRunner$BspChildRunner.call(TaskRunner.java:126)
>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>         at java.lang.Thread.run(Thread.java:722)
>>>> 2013-01-22 11:04:39,789 INFO org.apache.hama.bsp.GroomServer: adding
>>>> purge task: attempt_201301221058_0001_000001_0
>>>> 2013-01-22 11:04:39,884 INFO org.apache.hama.bsp.GroomServer: About to
>>>> purge task: attempt_201301221058_0001_000001_0
>>>> 2013-01-22 11:04:41,170 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>>> 2013-01-22 11:04:41,219 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>>> 2013-01-22 11:04:41,589 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>>>
>>>>
>>>> What can it be? Some configuration? To not get the message of Java
>>>> Heap Space configured:
>>>>
>>>>   <name> bsp.child.java.opts </ name>
>>>>      <value>-Xmx6144m </ value>
>>>>
>>>> I have only 4 GB of memory in each Groomserver, can it be that? I am
>>>> using the trunk version of Hama.
>>>>
>>>> --
>>>> Francisco Sanches
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>
>>
>>
>> --
>> Francisco Sanches
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Francisco Sanches

Re: Error when executing BSPPeer process

Posted by "Edward J. Yoon" <ed...@apache.org>.
Have you checked your algorithm with small input? I'm not sure that is
a memory issue.

On Thu, Jan 24, 2013 at 7:53 PM, Francisco Sanches
<sa...@gmail.com> wrote:
> Hi thanks for the reply,
>
> My input is a graph of 35 000 vertices, but my application uses a
> vector of 65537 to store the distance. Each node has a vector with the
> number of positions (All-pair).
>
> 2013/1/24 Edward J. Yoon <ed...@apache.org>:
>> Hi,
>>
>> How large your input data is?
>>
>> On Tue, Jan 22, 2013 at 11:21 PM, Francisco Sanches
>> <sa...@gmail.com> wrote:
>>> Hi,
>>>
>>> When I try to run a larger graph in the application that I developed
>>> here first get this message:
>>>
>>> hduser@node28:/usr/local/hama-trunk$ time bin/hama jar aplic19jan.jar
>>> 65537 /hdfs/disco/dados/internet_a /hdfs/disco/outinternet_a 42 30
>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
>>> 13/01/22 10:59:17 INFO bsp.BSPJobClient: Running job: job_201301221058_0002
>>> 13/01/22 10:59:20 INFO bsp.BSPJobClient: Current supersteps number: 0
>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Current supersteps number: 1
>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: The total number of supersteps: 1
>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Counters: 6
>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEPS=1
>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=1
>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=70
>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     IO_BYTES_READ=3467587
>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=42089
>>> 13/01/22 10:59:32 INFO bsp.FileInputFormat: Total input paths to process : 42
>>> 13/01/22 10:59:35 INFO bsp.BSPJobClient: Running job: job_201301221058_0001
>>> 13/01/22 10:59:38 INFO bsp.BSPJobClient: Current supersteps number: 0
>>> 13/01/22 11:00:38 INFO bsp.BSPJobClient: Current supersteps number: 1
>>> attempt_201301221058_0001_000001_0: SLF4J: Class path contains
>>> multiple SLF4J bindings.
>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
>>> [jar:file:/usr/local/hama-trunk/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
>>> [file:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/org/slf4j/impl/StaticLoggerBinder.class]
>>> attempt_201301221058_0001_000001_0: SLF4J: See
>>> http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
>>> sync.ZKSyncClient: Initializing ZK Sync Client
>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
>>> sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At
>>> node30/10.16.2.30:61003
>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>> Starting SocketReader
>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>> IPC Server Responder: starting
>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>> IPC Server listener on 61003: starting
>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
>>> message.HadoopMessageManagerImpl:  BSPPeer address:node30 port:61003
>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>>> IPC Server handler 0 on 61003: starting
>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:34 INFO
>>> graph.GraphJobRunner: Loading finished at 1 steps.
>>> 13/01/22 11:04:46 INFO bsp.BSPJobClient: Job failed.
>>>
>>> real 5m34.523s
>>> user 0m4.832s
>>> sys 0m0.960s
>>>
>>> Analyzing logs the one that fails has this log:
>>>
>>> /************************************************************
>>> STARTUP_MSG: Starting GroomServer
>>> STARTUP_MSG:   host = node30/10.16.2.30
>>> STARTUP_MSG:   args = [node30]
>>> STARTUP_MSG:   version = 1.0.0
>>> STARTUP_MSG:   build =
>>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
>>> 1214675; compiled by 'hortonfo' on Fri Dec 16 20:01:27 UTC 2011
>>> ************************************************************/
>>> 2013-01-22 10:58:51,118 INFO org.apache.hama.bsp.GroomServer: groom start
>>> 2013-01-22 10:58:51,420 INFO org.apache.hama.bsp.GroomServer:
>>> /tmp/hama-hduser/bsp/local
>>> 2013-01-22 10:58:55,417 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> listener on 50000: starting
>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> Responder: starting
>>> 2013-01-22 10:58:55,455 INFO org.apache.hama.bsp.GroomServer: Worker
>>> rpc server --> node30:50000
>>> 2013-01-22 10:58:55,455 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 0 on 50000: starting
>>> 2013-01-22 10:58:55,560 INFO org.mortbay.log: Logging to
>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>>> org.mortbay.log.Slf4jLog
>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.bsp.GroomServer: starting
>>> webserver: node30
>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.http.HttpServer: Port
>>> returned by webServer.getConnectors()[0].getLocalPort() before open()
>>> is -1. Opening the listener on 40015
>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer:
>>> listener.getLocalPort() returned 40015
>>> webServer.getConnectors()[0].getLocalPort() returned 40015
>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer: Jetty
>>> bound to port 40015
>>> 2013-01-22 10:58:55,646 INFO org.mortbay.log: jetty-6.1.14
>>> 2013-01-22 10:58:56,252 INFO org.mortbay.log: Started
>>> SelectChannelConnector@node30:40015
>>> 2013-01-22 10:59:01,574 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> Responder: starting
>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> listener on 59102: starting
>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 0 on 59102: starting
>>> 2013-01-22 10:59:01,580 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 1 on 59102: starting
>>> 2013-01-22 10:59:01,577 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 2 on 59102: starting
>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 4 on 59102: starting
>>> 2013-01-22 10:59:01,583 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 3 on 59102: starting
>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 5 on 59102: starting
>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 6 on 59102: starting
>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 7 on 59102: starting
>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 8 on 59102: starting
>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 9 on 59102: starting
>>> 2013-01-22 10:59:06,899 INFO org.apache.hama.bsp.GroomServer:
>>> GroomServer up at: 127.0.0.1/127.0.0.1:59102
>>> 2013-01-22 10:59:06,900 INFO org.apache.hama.bsp.GroomServer: Starting
>>> groom: node30:50000
>>> 2013-01-22 10:59:18,083 INFO org.apache.hama.bsp.GroomServer: Launch 1 tasks.
>>> 2013-01-22 10:59:18,283 WARN org.apache.hadoop.util.NativeCodeLoader:
>>> Unable to load native-hadoop library for your platform... using
>>> builtin-java classes where applicable
>>> 2013-01-22 10:59:23,057 INFO org.apache.hama.bsp.GroomServer: Task
>>> 'attempt_201301221058_0002_000000_0' has started.
>>> 2013-01-22 10:59:31,403 INFO org.apache.hama.bsp.GroomServer: Task
>>> attempt_201301221058_0002_000000_0 is done.
>>> 2013-01-22 10:59:35,757 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> 2013-01-22 10:59:56,486 INFO org.apache.hama.bsp.GroomServer: Task
>>> 'attempt_201301221058_0001_000000_0' has started.
>>> 2013-01-22 10:59:56,487 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> 2013-01-22 11:00:06,973 INFO org.apache.hama.bsp.GroomServer: Task
>>> 'attempt_201301221058_0001_000002_0' has started.
>>> 2013-01-22 11:00:06,974 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> 2013-01-22 11:00:12,927 INFO org.apache.hama.bsp.GroomServer: Task
>>> 'attempt_201301221058_0001_000001_0' has started.
>>> 2013-01-22 11:03:50,330 ERROR org.apache.hama.bsp.TaskRunner: Error
>>> when executing BSPPeer process.
>>> java.io.IOException: BSP task process exit with nonzero status of 137.
>>> command = [/usr/lib/jvm/jdk1.7.0_11/jre/bin/java, -Xmx6144m,
>>> -classpath, /usr/local/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle//lib/tools.jar:/usr/local/hama-trunk/bin/../core/target/classes:/usr/local/hama-trunk/bin/../hama-**.jar:/usr/local/hama-trunk/bin/../lib/ant-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/avro-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/commons-cli-1.2.jar:/usr/local/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/usr/local/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hama-trunk/bin/../lib/commons-lang-2.6.jar:/usr/local/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/usr/local/hama-trunk/bin/../lib/commons-math3-3.0.jar:/usr/local/hama-trunk/bin/../lib/guava-10.0.1.jar:/usr/local/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jetty-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/junit-4.8.1.jar:/usr/local/hama-trunk/bin/../lib/log4j-1.2.16.jar:/usr/local/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/usr/local/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/usr/local/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/usr/local/hama-trunk/bin/../lib/zookeeper-3.4.5.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work,
>>> org.apache.hama.bsp.GroomServer$BSPPeerChild, 127.0.0.1, 59102,
>>> attempt_201301221058_0001_000001_0, node30, -1, RUNNING]
>>>         at org.apache.hama.bsp.TaskRunner$BspChildRunner.call(TaskRunner.java:126)
>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>         at java.lang.Thread.run(Thread.java:722)
>>> 2013-01-22 11:04:39,789 INFO org.apache.hama.bsp.GroomServer: adding
>>> purge task: attempt_201301221058_0001_000001_0
>>> 2013-01-22 11:04:39,884 INFO org.apache.hama.bsp.GroomServer: About to
>>> purge task: attempt_201301221058_0001_000001_0
>>> 2013-01-22 11:04:41,170 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>> 2013-01-22 11:04:41,219 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>> 2013-01-22 11:04:41,589 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>>
>>>
>>> What can it be? Some configuration? To not get the message of Java
>>> Heap Space configured:
>>>
>>>   <name> bsp.child.java.opts </ name>
>>>      <value>-Xmx6144m </ value>
>>>
>>> I have only 4 GB of memory in each Groomserver, can it be that? I am
>>> using the trunk version of Hama.
>>>
>>> --
>>> Francisco Sanches
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>
>
>
> --
> Francisco Sanches



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error when executing BSPPeer process

Posted by Francisco Sanches <sa...@gmail.com>.
Hi thanks for the reply,

My input is a graph of 35 000 vertices, but my application uses a
vector of 65537 to store the distance. Each node has a vector with the
number of positions (All-pair).

2013/1/24 Edward J. Yoon <ed...@apache.org>:
> Hi,
>
> How large your input data is?
>
> On Tue, Jan 22, 2013 at 11:21 PM, Francisco Sanches
> <sa...@gmail.com> wrote:
>> Hi,
>>
>> When I try to run a larger graph in the application that I developed
>> here first get this message:
>>
>> hduser@node28:/usr/local/hama-trunk$ time bin/hama jar aplic19jan.jar
>> 65537 /hdfs/disco/dados/internet_a /hdfs/disco/outinternet_a 42 30
>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
>> 13/01/22 10:59:17 INFO bsp.BSPJobClient: Running job: job_201301221058_0002
>> 13/01/22 10:59:20 INFO bsp.BSPJobClient: Current supersteps number: 0
>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Current supersteps number: 1
>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: The total number of supersteps: 1
>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Counters: 6
>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
>> org.apache.hama.bsp.JobInProgress$JobCounter
>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEPS=1
>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=1
>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=70
>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     IO_BYTES_READ=3467587
>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=42089
>> 13/01/22 10:59:32 INFO bsp.FileInputFormat: Total input paths to process : 42
>> 13/01/22 10:59:35 INFO bsp.BSPJobClient: Running job: job_201301221058_0001
>> 13/01/22 10:59:38 INFO bsp.BSPJobClient: Current supersteps number: 0
>> 13/01/22 11:00:38 INFO bsp.BSPJobClient: Current supersteps number: 1
>> attempt_201301221058_0001_000001_0: SLF4J: Class path contains
>> multiple SLF4J bindings.
>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
>> [jar:file:/usr/local/hama-trunk/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
>> [file:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/org/slf4j/impl/StaticLoggerBinder.class]
>> attempt_201301221058_0001_000001_0: SLF4J: See
>> http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
>> sync.ZKSyncClient: Initializing ZK Sync Client
>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
>> sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At
>> node30/10.16.2.30:61003
>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>> Starting SocketReader
>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>> IPC Server Responder: starting
>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>> IPC Server listener on 61003: starting
>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
>> message.HadoopMessageManagerImpl:  BSPPeer address:node30 port:61003
>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
>> IPC Server handler 0 on 61003: starting
>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:34 INFO
>> graph.GraphJobRunner: Loading finished at 1 steps.
>> 13/01/22 11:04:46 INFO bsp.BSPJobClient: Job failed.
>>
>> real 5m34.523s
>> user 0m4.832s
>> sys 0m0.960s
>>
>> Analyzing logs the one that fails has this log:
>>
>> /************************************************************
>> STARTUP_MSG: Starting GroomServer
>> STARTUP_MSG:   host = node30/10.16.2.30
>> STARTUP_MSG:   args = [node30]
>> STARTUP_MSG:   version = 1.0.0
>> STARTUP_MSG:   build =
>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
>> 1214675; compiled by 'hortonfo' on Fri Dec 16 20:01:27 UTC 2011
>> ************************************************************/
>> 2013-01-22 10:58:51,118 INFO org.apache.hama.bsp.GroomServer: groom start
>> 2013-01-22 10:58:51,420 INFO org.apache.hama.bsp.GroomServer:
>> /tmp/hama-hduser/bsp/local
>> 2013-01-22 10:58:55,417 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
>> listener on 50000: starting
>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
>> Responder: starting
>> 2013-01-22 10:58:55,455 INFO org.apache.hama.bsp.GroomServer: Worker
>> rpc server --> node30:50000
>> 2013-01-22 10:58:55,455 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 0 on 50000: starting
>> 2013-01-22 10:58:55,560 INFO org.mortbay.log: Logging to
>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> org.mortbay.log.Slf4jLog
>> 2013-01-22 10:58:55,645 INFO org.apache.hama.bsp.GroomServer: starting
>> webserver: node30
>> 2013-01-22 10:58:55,645 INFO org.apache.hama.http.HttpServer: Port
>> returned by webServer.getConnectors()[0].getLocalPort() before open()
>> is -1. Opening the listener on 40015
>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer:
>> listener.getLocalPort() returned 40015
>> webServer.getConnectors()[0].getLocalPort() returned 40015
>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer: Jetty
>> bound to port 40015
>> 2013-01-22 10:58:55,646 INFO org.mortbay.log: jetty-6.1.14
>> 2013-01-22 10:58:56,252 INFO org.mortbay.log: Started
>> SelectChannelConnector@node30:40015
>> 2013-01-22 10:59:01,574 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>> Responder: starting
>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>> listener on 59102: starting
>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 0 on 59102: starting
>> 2013-01-22 10:59:01,580 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 1 on 59102: starting
>> 2013-01-22 10:59:01,577 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 2 on 59102: starting
>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 4 on 59102: starting
>> 2013-01-22 10:59:01,583 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 3 on 59102: starting
>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 5 on 59102: starting
>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 6 on 59102: starting
>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 7 on 59102: starting
>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 8 on 59102: starting
>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 9 on 59102: starting
>> 2013-01-22 10:59:06,899 INFO org.apache.hama.bsp.GroomServer:
>> GroomServer up at: 127.0.0.1/127.0.0.1:59102
>> 2013-01-22 10:59:06,900 INFO org.apache.hama.bsp.GroomServer: Starting
>> groom: node30:50000
>> 2013-01-22 10:59:18,083 INFO org.apache.hama.bsp.GroomServer: Launch 1 tasks.
>> 2013-01-22 10:59:18,283 WARN org.apache.hadoop.util.NativeCodeLoader:
>> Unable to load native-hadoop library for your platform... using
>> builtin-java classes where applicable
>> 2013-01-22 10:59:23,057 INFO org.apache.hama.bsp.GroomServer: Task
>> 'attempt_201301221058_0002_000000_0' has started.
>> 2013-01-22 10:59:31,403 INFO org.apache.hama.bsp.GroomServer: Task
>> attempt_201301221058_0002_000000_0 is done.
>> 2013-01-22 10:59:35,757 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> 2013-01-22 10:59:56,486 INFO org.apache.hama.bsp.GroomServer: Task
>> 'attempt_201301221058_0001_000000_0' has started.
>> 2013-01-22 10:59:56,487 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> 2013-01-22 11:00:06,973 INFO org.apache.hama.bsp.GroomServer: Task
>> 'attempt_201301221058_0001_000002_0' has started.
>> 2013-01-22 11:00:06,974 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> 2013-01-22 11:00:12,927 INFO org.apache.hama.bsp.GroomServer: Task
>> 'attempt_201301221058_0001_000001_0' has started.
>> 2013-01-22 11:03:50,330 ERROR org.apache.hama.bsp.TaskRunner: Error
>> when executing BSPPeer process.
>> java.io.IOException: BSP task process exit with nonzero status of 137.
>> command = [/usr/lib/jvm/jdk1.7.0_11/jre/bin/java, -Xmx6144m,
>> -classpath, /usr/local/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle//lib/tools.jar:/usr/local/hama-trunk/bin/../core/target/classes:/usr/local/hama-trunk/bin/../hama-**.jar:/usr/local/hama-trunk/bin/../lib/ant-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/avro-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/commons-cli-1.2.jar:/usr/local/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/usr/local/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hama-trunk/bin/../lib/commons-lang-2.6.jar:/usr/local/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/usr/local/hama-trunk/bin/../lib/commons-math3-3.0.jar:/usr/local/hama-trunk/bin/../lib/guava-10.0.1.jar:/usr/local/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jetty-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/junit-4.8.1.jar:/usr/local/hama-trunk/bin/../lib/log4j-1.2.16.jar:/usr/local/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/usr/local/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/usr/local/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/usr/local/hama-trunk/bin/../lib/zookeeper-3.4.5.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work,
>> org.apache.hama.bsp.GroomServer$BSPPeerChild, 127.0.0.1, 59102,
>> attempt_201301221058_0001_000001_0, node30, -1, RUNNING]
>>         at org.apache.hama.bsp.TaskRunner$BspChildRunner.call(TaskRunner.java:126)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>         at java.lang.Thread.run(Thread.java:722)
>> 2013-01-22 11:04:39,789 INFO org.apache.hama.bsp.GroomServer: adding
>> purge task: attempt_201301221058_0001_000001_0
>> 2013-01-22 11:04:39,884 INFO org.apache.hama.bsp.GroomServer: About to
>> purge task: attempt_201301221058_0001_000001_0
>> 2013-01-22 11:04:41,170 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>> 2013-01-22 11:04:41,219 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>> 2013-01-22 11:04:41,589 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>
>>
>> What can it be? Some configuration? To not get the message of Java
>> Heap Space configured:
>>
>>   <name> bsp.child.java.opts </ name>
>>      <value>-Xmx6144m </ value>
>>
>> I have only 4 GB of memory in each Groomserver, can it be that? I am
>> using the trunk version of Hama.
>>
>> --
>> Francisco Sanches
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Francisco Sanches

Re: Error when executing BSPPeer process

Posted by "Edward J. Yoon" <ed...@apache.org>.
Hi,

How large your input data is?

On Tue, Jan 22, 2013 at 11:21 PM, Francisco Sanches
<sa...@gmail.com> wrote:
> Hi,
>
> When I try to run a larger graph in the application that I developed
> here first get this message:
>
> hduser@node28:/usr/local/hama-trunk$ time bin/hama jar aplic19jan.jar
> 65537 /hdfs/disco/dados/internet_a /hdfs/disco/outinternet_a 42 30
> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
> 13/01/22 10:59:17 INFO bsp.BSPJobClient: Running job: job_201301221058_0002
> 13/01/22 10:59:20 INFO bsp.BSPJobClient: Current supersteps number: 0
> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Current supersteps number: 1
> 13/01/22 10:59:32 INFO bsp.BSPJobClient: The total number of supersteps: 1
> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Counters: 6
> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
> org.apache.hama.bsp.JobInProgress$JobCounter
> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEPS=1
> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
> 13/01/22 10:59:32 INFO bsp.BSPJobClient:
> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=1
> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=70
> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     IO_BYTES_READ=3467587
> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=42089
> 13/01/22 10:59:32 INFO bsp.FileInputFormat: Total input paths to process : 42
> 13/01/22 10:59:35 INFO bsp.BSPJobClient: Running job: job_201301221058_0001
> 13/01/22 10:59:38 INFO bsp.BSPJobClient: Current supersteps number: 0
> 13/01/22 11:00:38 INFO bsp.BSPJobClient: Current supersteps number: 1
> attempt_201301221058_0001_000001_0: SLF4J: Class path contains
> multiple SLF4J bindings.
> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
> [jar:file:/usr/local/hama-trunk/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> attempt_201301221058_0001_000001_0: SLF4J: Found binding in
> [file:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/org/slf4j/impl/StaticLoggerBinder.class]
> attempt_201301221058_0001_000001_0: SLF4J: See
> http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
> sync.ZKSyncClient: Initializing ZK Sync Client
> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO
> sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At
> node30/10.16.2.30:61003
> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
> Starting SocketReader
> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
> IPC Server Responder: starting
> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
> IPC Server listener on 61003: starting
> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO
> message.HadoopMessageManagerImpl:  BSPPeer address:node30 port:61003
> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server:
> IPC Server handler 0 on 61003: starting
> attempt_201301221058_0001_000001_0: 13/01/22 11:00:34 INFO
> graph.GraphJobRunner: Loading finished at 1 steps.
> 13/01/22 11:04:46 INFO bsp.BSPJobClient: Job failed.
>
> real 5m34.523s
> user 0m4.832s
> sys 0m0.960s
>
> Analyzing logs the one that fails has this log:
>
> /************************************************************
> STARTUP_MSG: Starting GroomServer
> STARTUP_MSG:   host = node30/10.16.2.30
> STARTUP_MSG:   args = [node30]
> STARTUP_MSG:   version = 1.0.0
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
> 1214675; compiled by 'hortonfo' on Fri Dec 16 20:01:27 UTC 2011
> ************************************************************/
> 2013-01-22 10:58:51,118 INFO org.apache.hama.bsp.GroomServer: groom start
> 2013-01-22 10:58:51,420 INFO org.apache.hama.bsp.GroomServer:
> /tmp/hama-hduser/bsp/local
> 2013-01-22 10:58:55,417 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 50000: starting
> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2013-01-22 10:58:55,455 INFO org.apache.hama.bsp.GroomServer: Worker
> rpc server --> node30:50000
> 2013-01-22 10:58:55,455 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 50000: starting
> 2013-01-22 10:58:55,560 INFO org.mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 2013-01-22 10:58:55,645 INFO org.apache.hama.bsp.GroomServer: starting
> webserver: node30
> 2013-01-22 10:58:55,645 INFO org.apache.hama.http.HttpServer: Port
> returned by webServer.getConnectors()[0].getLocalPort() before open()
> is -1. Opening the listener on 40015
> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer:
> listener.getLocalPort() returned 40015
> webServer.getConnectors()[0].getLocalPort() returned 40015
> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer: Jetty
> bound to port 40015
> 2013-01-22 10:58:55,646 INFO org.mortbay.log: jetty-6.1.14
> 2013-01-22 10:58:56,252 INFO org.mortbay.log: Started
> SelectChannelConnector@node30:40015
> 2013-01-22 10:59:01,574 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 59102: starting
> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 59102: starting
> 2013-01-22 10:59:01,580 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 1 on 59102: starting
> 2013-01-22 10:59:01,577 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 59102: starting
> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 4 on 59102: starting
> 2013-01-22 10:59:01,583 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 3 on 59102: starting
> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 5 on 59102: starting
> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 6 on 59102: starting
> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 7 on 59102: starting
> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 59102: starting
> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 9 on 59102: starting
> 2013-01-22 10:59:06,899 INFO org.apache.hama.bsp.GroomServer:
> GroomServer up at: 127.0.0.1/127.0.0.1:59102
> 2013-01-22 10:59:06,900 INFO org.apache.hama.bsp.GroomServer: Starting
> groom: node30:50000
> 2013-01-22 10:59:18,083 INFO org.apache.hama.bsp.GroomServer: Launch 1 tasks.
> 2013-01-22 10:59:18,283 WARN org.apache.hadoop.util.NativeCodeLoader:
> Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
> 2013-01-22 10:59:23,057 INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201301221058_0002_000000_0' has started.
> 2013-01-22 10:59:31,403 INFO org.apache.hama.bsp.GroomServer: Task
> attempt_201301221058_0002_000000_0 is done.
> 2013-01-22 10:59:35,757 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> 2013-01-22 10:59:56,486 INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201301221058_0001_000000_0' has started.
> 2013-01-22 10:59:56,487 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> 2013-01-22 11:00:06,973 INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201301221058_0001_000002_0' has started.
> 2013-01-22 11:00:06,974 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> 2013-01-22 11:00:12,927 INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201301221058_0001_000001_0' has started.
> 2013-01-22 11:03:50,330 ERROR org.apache.hama.bsp.TaskRunner: Error
> when executing BSPPeer process.
> java.io.IOException: BSP task process exit with nonzero status of 137.
> command = [/usr/lib/jvm/jdk1.7.0_11/jre/bin/java, -Xmx6144m,
> -classpath, /usr/local/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle//lib/tools.jar:/usr/local/hama-trunk/bin/../core/target/classes:/usr/local/hama-trunk/bin/../hama-**.jar:/usr/local/hama-trunk/bin/../lib/ant-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/avro-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/commons-cli-1.2.jar:/usr/local/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/usr/local/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hama-trunk/bin/../lib/commons-lang-2.6.jar:/usr/local/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/usr/local/hama-trunk/bin/../lib/commons-math3-3.0.jar:/usr/local/hama-trunk/bin/../lib/guava-10.0.1.jar:/usr/local/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jetty-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/junit-4.8.1.jar:/usr/local/hama-trunk/bin/../lib/log4j-1.2.16.jar:/usr/local/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/usr/local/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/usr/local/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/usr/local/hama-trunk/bin/../lib/zookeeper-3.4.5.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work,
> org.apache.hama.bsp.GroomServer$BSPPeerChild, 127.0.0.1, 59102,
> attempt_201301221058_0001_000001_0, node30, -1, RUNNING]
>         at org.apache.hama.bsp.TaskRunner$BspChildRunner.call(TaskRunner.java:126)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> 2013-01-22 11:04:39,789 INFO org.apache.hama.bsp.GroomServer: adding
> purge task: attempt_201301221058_0001_000001_0
> 2013-01-22 11:04:39,884 INFO org.apache.hama.bsp.GroomServer: About to
> purge task: attempt_201301221058_0001_000001_0
> 2013-01-22 11:04:41,170 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
> 2013-01-22 11:04:41,219 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
> 2013-01-22 11:04:41,589 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>
>
> What can it be? Some configuration? To not get the message of Java
> Heap Space configured:
>
>   <name> bsp.child.java.opts </ name>
>      <value>-Xmx6144m </ value>
>
> I have only 4 GB of memory in each Groomserver, can it be that? I am
> using the trunk version of Hama.
>
> --
> Francisco Sanches



-- 
Best Regards, Edward J. Yoon
@eddieyoon