Posted to user@giraph.apache.org by Vincentius Martin <vi...@gmail.com> on 2014/11/29 12:48:19 UTC

How to activate checkpoint?

Hi,

I'm using Giraph 1.0.0 and ran RandomMessageBenchmark.

In the middle of the job I killed a Hadoop task (i.e. a worker). The job
then failed with the following exception on the master:

2014-11-29 04:40:18,049 INFO org.apache.giraph.master.MasterThread:
masterThread: Coordination of superstep 1 took 611.669 seconds ended
with state WORKER_FAILURE and is now on superstep 1
2014-11-29 04:40:18,313 ERROR org.apache.giraph.master.MasterThread:
masterThread: Master algorithm failed with RuntimeException
java.lang.RuntimeException: restartFromCheckpoint: KeeperException
	at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1185)
	at org.apache.giraph.master.MasterThread.run(MasterThread.java:135)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for
/_hadoopBsp/job_201411290417_0003/_edgeInputSplitDir
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
	at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:307)
	at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1179)
	... 1 more
2014-11-29 04:40:18,315 FATAL org.apache.giraph.graph.GraphMapper:
uncaughtException: OverrideExceptionHandler on thread
org.apache.giraph.master.MasterThread, msg =
java.lang.RuntimeException: restartFromCheckpoint: KeeperException,
exiting...
java.lang.IllegalStateException: java.lang.RuntimeException:
restartFromCheckpoint: KeeperException
	at org.apache.giraph.master.MasterThread.run(MasterThread.java:181)
Caused by: java.lang.RuntimeException: restartFromCheckpoint: KeeperException
	at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1185)
	at org.apache.giraph.master.MasterThread.run(MasterThread.java:135)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for
/_hadoopBsp/job_201411290417_0003/_edgeInputSplitDir
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
	at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:307)
	at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1179)

Is this a bug in Giraph? From the log, the master tried to run
restartFromCheckpoint, but it failed with a NoNodeException while deleting
the ZooKeeper node _edgeInputSplitDir.

How can I enable checkpointing in Giraph?

Thanks

Regards,
Vincentius Martin