Posted to dev@twill.apache.org by "Albert Shau (JIRA)" <ji...@apache.org> on 2015/09/11 03:54:45 UTC

[jira] [Created] (TWILL-152) Zookeeper NodeExistsException on AM restarts

Albert Shau created TWILL-152:
---------------------------------

             Summary: Zookeeper NodeExistsException on AM restarts
                 Key: TWILL-152
                 URL: https://issues.apache.org/jira/browse/TWILL-152
             Project: Apache Twill
          Issue Type: Bug
            Reporter: Albert Shau


If the AM fails and is restarted (for example, due to expiration of the AMRM token), the new attempt fails on startup because ZooKeeper nodes already exist:

{code}

java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /c072c759-d7bf-488a-a8ca-782a3656392f/runnables
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[com.google.guava.guava-13.0.1.jar:na]
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[com.google.guava.guava-13.0.1.jar:na]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-13.0.1.jar:na]
	at org.apache.twill.internal.ServiceMain.doMain(ServiceMain.java:94) ~[org.apache.twill.twill-yarn-0.5.0-incubating.jar:0.5.0-incubating]
	at org.apache.twill.internal.appmaster.ApplicationMasterMain.main(ApplicationMasterMain.java:77) [org.apache.twill.twill-yarn-0.5.0-incubating.jar:0.5.0-incubating]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_75]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_75]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_75]
	at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_75]
	at org.apache.twill.launcher.TwillLauncher.main(TwillLauncher.java:86) [launcher.e5147f31-88e3-486c-a6e2-8d33bdc30ebb.jar:na]
java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /c072c759-d7bf-488a-a8ca-782a3656392f/runnables
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[com.google.guava.guava-13.0.1.jar:na]
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[com.google.guava.guava-13.0.1.jar:na]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-13.0.1.jar:na]
	at org.apache.twill.internal.appmaster.ApplicationMasterService.doStart(ApplicationMasterService.java:222) ~[org.apache.twill.twill-yarn-0.5.0-incubating.jar:0.5.0-incubating]
	at org.apache.twill.internal.AbstractTwillService.startUp(AbstractTwillService.java:171) ~[org.apache.twill.twill-core-0.5.0-incubating.jar:0.5.0-incubating]
	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
	at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_75]
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /c072c759-d7bf-488a-a8ca-782a3656392f/runnables
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[org.apache.zookeeper.zookeeper-3.4.5.jar:3.4.5-1392090]
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[org.apache.zookeeper.zookeeper-3.4.5.jar:3.4.5-1392090]
	at org.apache.twill.internal.zookeeper.DefaultZKClientService$Callbacks$1.processResult(DefaultZKClientService.java:500) ~[org.apache.twill.twill-zookeeper-0.5.0-incubating.jar:0.5.0-incubating]
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:602) ~[org.apache.zookeeper.zookeeper-3.4.5.jar:3.4.5-1392090]
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) ~[org.apache.zookeeper.zookeeper-3.4.5.jar:3.4.5-1392090]
{code}

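This happens because the restarted AM has the same run id as the failed attempt. It therefore tries to create ZooKeeper nodes under that run id, such as /c072c759-d7bf-488a-a8ca-782a3656392f/runnables in the trace above, that already exist.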
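To illustrate the failure mode, here is a minimal, self-contained Java sketch. It does not use Twill or the real ZooKeeper client. InMemoryNodeStore, NodeExistsException, AmStartup, startAttempt and createIfAbsent are hypothetical stand-ins that model the ZooKeeper namespace as a flat set of paths; parent nodes are not modeled. A strict create of /<run id>/runnables fails on a second attempt with the same run id. A create that treats "already exists" as success lets the restarted attempt proceed. This is only one possible mitigation, not necessarily how TWILL-152 was resolved. A real fix may also need to handle stale state left behind by the previous attempt.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for ZooKeeper's "node already exists" error. */
class NodeExistsException extends Exception {
    NodeExistsException(String path) {
        super("NodeExists for " + path);
    }
}

/** Hypothetical in-memory model of a ZooKeeper namespace (parents not modeled). */
class InMemoryNodeStore {
    private final Set<String> paths = new HashSet<>();

    /** Like a plain create: fails if the path is already present. */
    void create(String path) throws NodeExistsException {
        if (!paths.add(path)) {
            throw new NodeExistsException(path);
        }
    }

    boolean exists(String path) {
        return paths.contains(path);
    }
}

/** Hypothetical model of an AM attempt registering nodes under its run id. */
class AmStartup {
    /** Idempotent create: an existing node is treated as success. */
    static void createIfAbsent(InMemoryNodeStore store, String path) {
        try {
            store.create(path);
        } catch (NodeExistsException e) {
            // A previous AM attempt with the same run id already created this node.
        }
    }

    /** Simulates one AM attempt creating /<runId>/runnables. */
    static void startAttempt(InMemoryNodeStore store, String runId, boolean tolerateExisting)
            throws NodeExistsException {
        String path = "/" + runId + "/runnables";
        if (tolerateExisting) {
            createIfAbsent(store, path);
        } else {
            store.create(path); // strict create, as in the failing restart
        }
    }

    public static void main(String[] args) throws Exception {
        InMemoryNodeStore store = new InMemoryNodeStore();
        String runId = "c072c759-d7bf-488a-a8ca-782a3656392f";

        startAttempt(store, runId, false); // first AM attempt succeeds
        try {
            startAttempt(store, runId, false); // restarted AM, same run id
        } catch (NodeExistsException e) {
            System.out.println("restart failed: " + e.getMessage());
        }
        startAttempt(store, runId, true); // tolerant restart succeeds
        System.out.println("tolerant restart ok: " + store.exists("/" + runId + "/runnables"));
    }
}
```

With the real ZooKeeper Java client, the analogous pattern is catching KeeperException.NodeExistsException around ZooKeeper.create. Catching the exception avoids the race that a separate exists-then-create check would introduce.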



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)