Posted to user@giraph.apache.org by Jyoti Yadav <ra...@gmail.com> on 2014/01/20 08:22:57 UTC

Problem occurred when running job with >1 worker.

Hi folks,

When I run an algorithm on a single-machine cluster with 1 worker, it
works fine. But when I increase the number of workers above 1, the following
error is thrown at run time:





*ERROR org.apache.giraph.master.BspServiceMaster:
superstepChosenWorkerAlive: Missing chosen worker
Worker(hostname=kanha-Vostro-1014, MRtaskID=2, port=30002) on superstep 17
2014-01-20 12:27:36,451 INFO org.apache.giraph.master.MasterThread:
masterThread: Coordination of superstep 17 took 1414.576 seconds ended with
state WORKER_FAILURE and is now on superstep 17
2014-01-20 12:28:02,723 ERROR org.apache.giraph.master.MasterThread:
masterThread: Master algorithm failed with ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException: -1*
    at
org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1276)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
2014-01-20 12:28:06,059 FATAL org.apache.giraph.graph.GraphMapper:
uncaughtException: OverrideExceptionHandler on thread
org.apache.giraph.master.MasterThread, msg =
java.lang.ArrayIndexOutOfBoundsException: -1, exiting...
java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException:
-1
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:185)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at
org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1276)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
2014-01-20 12:28:36,993 INFO org.apache.giraph.zk.ZooKeeperManager: run:
Shutdown hook started.
2014-01-20 12:28:36,993 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper
process.
2014-01-20 12:29:08,015 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143
typically means killed).




Any ideas?
Thanks,
Jyoti
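
One possible reading of the master trace, offered as an assumption rather
than a confirmed diagnosis: the master hits WORKER_FAILURE, asks for the last
good checkpoint, and fails with index -1. In Java, -1 is the index you get
when code takes the last element of an empty array (length - 1 with length 0),
which would fit a job that never wrote a checkpoint. If checkpointing is off
in this setup (the option is, as far as I know, giraph.checkpointFrequency;
please verify against your Giraph version), that may be why recovery fails.
The sketch below only reproduces the general pattern. The class and method
names are hypothetical and this is not Giraph's actual code.

```java
public class LastElementSketch {
    // Hypothetical helper: picks the last entry of a sorted list of names.
    // With an empty array this indexes position -1 and throws.
    static String lastOf(String[] sortedNames) {
        return sortedNames[sortedNames.length - 1];
    }

    // Wraps the lookup so both outcomes can be printed side by side.
    static String describeFailure(String[] names) {
        try {
            return lastOf(names);
        } catch (ArrayIndexOutOfBoundsException e) {
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Two made-up checkpoint names: the lookup succeeds.
        System.out.println(describeFailure(new String[] {"cp_4", "cp_8"}));
        // No checkpoints at all: the lookup throws, as in the trace above.
        System.out.println(describeFailure(new String[0]));
    }
}
```

If this reading is right, the ArrayIndexOutOfBoundsException is a secondary
symptom, and the question to chase is why the worker on port 30002 went
missing during superstep 17.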

Re: Problem occurred when running job with >1 worker.

Posted by Jyoti Yadav <ra...@gmail.com>.
Hi Kaya,

Below is the worker's log:






WARN org.apache.giraph.comm.netty.handler.ResponseClientHandler:
exceptionCaught: Channel failed with remote address kanha-Vostro-1014/
127.0.1.1:30002
java.nio.channels.ClosedChannelException
    at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:674)
    at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:642)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:98)
    at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:385)
    at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:256)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:40,161 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server kanha-Vostro-1014/127.0.1.1:22181
2014-01-20 12:29:40,106 WARN
org.apache.giraph.comm.netty.handler.ResponseClientHandler:
exceptionCaught: Channel failed with remote address null
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:404)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:366)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:282)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:40,297 WARN org.apache.zookeeper.ClientCnxn: Session
0x143ae2e202a0001 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-01-20 12:29:40,044 WARN
org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught:
Channel failed with remote address /127.0.0.1:43641
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
    at sun.nio.ch.IOUtil.read(IOUtil.java:193)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:63)
    at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:385)
    at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:256)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:40,074 WARN
org.apache.giraph.comm.netty.handler.ResponseClientHandler:
exceptionCaught: Channel failed with remote address kanha-Vostro-1014/
127.0.1.1:30002
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
    at sun.nio.ch.IOUtil.read(IOUtil.java:193)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:63)
    at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:385)
    at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:256)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:40,044 WARN org.apache.giraph.comm.netty.NettyClient:
getNextChannel: Failed to reconnect to kanha-Vostro-1014/127.0.1.1:30002 on
attempt 1 out of 1000 max attempts, sleeping for 5 secs
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:404)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:366)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:282)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:42,079 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server kanha-Vostro-1014/127.0.1.1:22181
2014-01-20 12:29:42,079 WARN org.apache.zookeeper.ClientCnxn: Session
0x143ae2e202a0001 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-01-20 12:29:44,133 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server kanha-Vostro-1014/127.0.1.1:22181
2014-01-20 12:29:44,134 WARN org.apache.zookeeper.ClientCnxn: Session
0x143ae2e202a0001 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-01-20 12:29:45,298 INFO org.apache.giraph.comm.netty.NettyClient:
Using Netty without authentication.
2014-01-20 12:29:45,299 WARN org.apache.giraph.comm.netty.NettyClient:
getNextChannel: Failed to reconnect to kanha-Vostro-1014/127.0.1.1:30002 on
attempt 2 out of 1000 max attempts, sleeping for 5 secs
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:404)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:366)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:282)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:45,318 WARN
org.apache.giraph.comm.netty.handler.ResponseClientHandler:
exceptionCaught: Channel failed with remote address null
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:404)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:366)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:282)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:46,101 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server kanha-Vostro-1014/127.0.1.1:22181
2014-01-20 12:29:46,102 WARN org.apache.zookeeper.ClientCnxn: Session
0x143ae2e202a0001 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-01-20 12:29:47,294 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server kanha-Vostro-1014/127.0.1.1:22181
2014-01-20 12:29:47,295 WARN org.apache.zookeeper.ClientCnxn: Session
0x143ae2e202a0001 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-01-20 12:29:48,509 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server kanha-Vostro-1014/127.0.1.1:22181
2014-01-20 12:29:48,510 WARN org.apache.zookeeper.ClientCnxn: Session
0x143ae2e202a0001 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-01-20 12:29:49,967 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server kanha-Vostro-1014/127.0.1.1:22181
2014-01-20 12:29:49,968 WARN org.apache.zookeeper.ClientCnxn: Session
0x143ae2e202a0001 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-01-20 12:29:50,318 INFO org.apache.giraph.comm.netty.NettyClient:
Using Netty without authentication.
2014-01-20 12:29:50,319 WARN org.apache.giraph.comm.netty.NettyClient:
getNextChannel: Failed to reconnect to kanha-Vostro-1014/127.0.1.1:30002 on
attempt 3 out of 1000 max attempts, sleeping for 5 secs
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:404)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:366)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:282)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:50,332 WARN
org.apache.giraph.comm.netty.handler.ResponseClientHandler:
exceptionCaught: Channel failed with remote address null
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:404)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:366)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:282)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:51,765 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server kanha-Vostro-1014/127.0.1.1:22181
2014-01-20 12:29:51,765 WARN org.apache.zookeeper.ClientCnxn: Session
0x143ae2e202a0001 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-01-20 12:29:53,042 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server kanha-Vostro-1014/127.0.1.1:22181
2014-01-20 12:29:53,043 WARN org.apache.zookeeper.ClientCnxn: Session
0x143ae2e202a0001 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-01-20 12:29:54,765 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server kanha-Vostro-1014/127.0.1.1:22181
2014-01-20 12:29:54,765 WARN org.apache.zookeeper.ClientCnxn: Session
0x143ae2e202a0001 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-01-20 12:29:55,332 INFO org.apache.giraph.comm.netty.NettyClient:
Using Netty without authentication.
2014-01-20 12:29:55,333 WARN org.apache.giraph.comm.netty.NettyClient:
getNextChannel: Failed to reconnect to kanha-Vostro-1014/127.0.1.1:30002 on
attempt 4 out of 1000 max attempts, sleeping for 5 secs
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:404)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:366)
    at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:282)
    at
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
    at
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:55,357 WARN
org.apache.giraph.comm.netty.handler.ResponseClientHandler:
exceptionCaught: Channel failed with remote address null



Thanks
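
In the worker log above, the hostname kanha-Vostro-1014 resolves to
127.0.1.1, which is a loopback address, and every reconnect attempt to port
30002 is refused. On a single machine a loopback mapping is not necessarily a
problem by itself; "Connection refused" more likely means nothing is
listening on that port any more. Still, it can help to confirm what the
hostname resolves to. The following is a minimal sketch using only
java.net.InetAddress; the class name and helper are made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostResolutionCheck {
    // Returns true if the given host name or IP literal is a loopback address.
    // IP literals such as "127.0.1.1" are parsed without a DNS lookup.
    static boolean isLoopback(String hostOrIp) {
        try {
            return InetAddress.getByName(hostOrIp).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve " + hostOrIp, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Shows what this machine's own hostname resolves to.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(local.getHostName() + " -> " + local.getHostAddress()
            + " (loopback: " + local.isLoopbackAddress() + ")");
        // The address that appears throughout the worker log.
        System.out.println("127.0.1.1 loopback: " + isLoopback("127.0.1.1"));
    }
}
```

If the hostname turns out to be loopback-mapped and you later move to more
than one machine, other hosts would not be able to reach a worker advertised
under that address.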



On Mon, Jan 20, 2014 at 1:41 PM, Sertuğ Kaya <se...@agmlab.com> wrote:

> Hi Jyoti,
> I assume this is the master's log. It seems the master cannot reach a
> worker for some reason. Did you also check the worker's log?
> Maybe you can share it too.
> Sertug
>
>
> On 20-01-2014 09:22, Jyoti Yadav wrote:
>
>
> *h.master.MasterThread: masterThread: Master algorithm failed with
> ArrayIndexOutOfBoundsException java.lang.ArrayIndexOutOfBoundsException: -1*
>
>
>

Re: Problem occurred when running job with >1 worker.

Posted by Sertuğ Kaya <se...@agmlab.com>.
Hi Jyoti,
I assume this is the master's log. It seems the master cannot reach a
worker for some reason. Did you also check the worker's log? Maybe you
can share it too.
Sertug

On 20-01-2014 09:22, Jyoti Yadav wrote:
> *h.master.MasterThread: masterThread: Master algorithm failed with 
> ArrayIndexOutOfBoundsException
> java.lang.ArrayIndexOutOfBoundsException: -1*
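
When comparing master and worker logs as suggested in this thread, it can
help to find the earliest non-INFO entry, because the entries are not always
written in timestamp order (the worker log in this thread has 12:29:40,161
before 12:29:40,106). The sketch below assumes the timestamp and level layout
seen in these logs ("yyyy-MM-dd HH:mm:ss,SSS LEVEL ..."); the class and
method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstProblemLine {
    // Matches entries that start with a timestamp followed by WARN, ERROR or FATAL.
    private static final Pattern ENTRY = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) (WARN|ERROR|FATAL)\\b.*");

    // Returns the matching line with the smallest timestamp, if any.
    // Timestamps in this layout sort correctly as plain strings.
    static Optional<String> earliestProblem(List<String> lines) {
        String bestTime = null;
        String bestLine = null;
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (!m.matches()) {
                continue; // stack-trace lines, INFO entries, wrapped text
            }
            String time = m.group(1);
            if (bestTime == null || time.compareTo(bestTime) < 0) {
                bestTime = time;
                bestLine = line;
            }
        }
        return Optional.ofNullable(bestLine);
    }

    public static void main(String[] args) {
        // A few first lines of entries copied from the worker log in this thread.
        List<String> sample = Arrays.asList(
            "2014-01-20 12:29:40,161 INFO org.apache.zookeeper.ClientCnxn: Opening",
            "2014-01-20 12:29:40,106 WARN",
            "java.net.ConnectException: Connection refused",
            "2014-01-20 12:29:40,044 WARN org.apache.giraph.comm.netty.NettyClient:");
        System.out.println(earliestProblem(sample).orElse("no WARN/ERROR/FATAL entries"));
    }
}
```

Reading a real file would only need the lines loaded first, for example with
java.nio.file.Files.readAllLines on the task's log path.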