Posted to user@giraph.apache.org by Pankaj Malhotra <pa...@gmail.com> on 2014/02/13 11:43:35 UTC

IllegalStateException getNextChannel with large number of vertices

Hi,

I am using giraph-1.0.0 with hadoop 1.0.0.

My cluster has 4 nodes with 32 processors each.

I am using 24 workers with default checkpointing settings.


My implementation works fine on a graph with 0.3M vertices and 2.0M edges,
but fails on a dataset with 1.5M vertices and 10.3M edges, with the
following errors:

*Error on each worker:*

java.lang.IllegalStateException: run: Caught an unrecoverable exception getNextChannel: Failed to connect to hadoop2:30000 in 1000 connect attempts
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.IllegalStateException: getNextChannel: Failed to connect to hadoop2:30000 in 1000 connect attempts
	at org.apache.giraph.comm.netty.NettyClient.getNextChannel(NettyClient.java:611)
	at org.apache.giraph.comm.netty.NettyClient.sendWritableRequest(NettyClient.java:635)
	at org.apache.giraph.comm.netty.NettyWorkerClient.sendWritableRequest(NettyWorkerClient.java:144)
	at org.apache.giraph.comm.netty.NettyWorkerAggregatorRequestProcessor.sendAggregatedValuesToMaster(NettyWorkerAggregatorRequestProcessor.java:119)
	at org.apache.giraph.worker.WorkerAggregatorHandler.finishSuperstep(WorkerAggregatorHandler.java:218)
	at org.apache.giraph.worker.BspServiceWorker.finishSuperstep(BspServiceWorker.java:758)
	at org.apache.giraph.graph.GraphTaskManager.completeSuperstepAndCollectStats(GraphTaskManager.java:387)
	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:276)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
	... 7 more


*Error on Master:*

2014-02-13 13:21:13,889 INFO org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Edges - Mean: 427869, Min: Worker(hostname=hadoop5, MRtaskID=18, port=30018) - 422292, Max: Worker(hostname=hadoop4, MRtaskID=19, port=30019) - 433205
2014-02-13 13:21:13,967 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 24 workers finished on superstep 8 on path /_hadoopBsp/job_201402131226_0005/_applicationAttemptsDir/0/_superstepDir/8/_workerFinishedDir
2014-02-13 13:31:18,353 ERROR org.apache.giraph.master.BspServiceMaster: superstepChosenWorkerAlive: Missing chosen worker Worker(hostname=hadoop5, MRtaskID=6, port=30006) on superstep 8
2014-02-13 13:31:18,362 INFO org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 8 took 604.513 seconds ended with state WORKER_FAILURE and is now on superstep 8
2014-02-13 13:31:19,663 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1219)
	at org.apache.giraph.master.MasterThread.run(MasterThread.java:135)
2014-02-13 13:31:19,679 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.ArrayIndexOutOfBoundsException: -1, exiting...
java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.giraph.master.MasterThread.run(MasterThread.java:181)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1219)
	at org.apache.giraph.master.MasterThread.run(MasterThread.java:135)
2014-02-13 13:31:19,803 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
2014-02-13 13:31:19,803 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
2014-02-13 13:31:20,276 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1442a26dbf10000, likely server has closed socket, closing socket connection and attempting reconnect
2014-02-13 13:31:20,366 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).

------------------------------

Please let me know if any additional details are required.

Thanks

Pankaj

IIT Delhi