Posted to dev@curator.apache.org by "Jordan Zimmerman (JIRA)" <ji...@apache.org> on 2015/04/01 16:44:54 UTC

[jira] [Commented] (CURATOR-196) this.client.create().creatingParentsIfNeeded() throw Puzzling EXCEPTION

    [ https://issues.apache.org/jira/browse/CURATOR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390651#comment-14390651 ] 

Jordan Zimmerman commented on CURATOR-196:
------------------------------------------

I don't see where the bug is. From what I can tell, you are calling create() for a node that already exists. The most likely problem is code like this:

{code}
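if (this.client.checkExists().forPath(path) != null) {
    this.client.setData().forPath(path, data);
} else {
    // another client may create the node right here, so create() can fail
    this.client.create().creatingParentsIfNeeded().forPath(path, data);
}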
{code}

This type of check is not correct. Between the call to checkExists() and the call to create(), another client may create the node, so create() fails with NodeExistsException. The only safe way to handle this is to always try create() first and, if it throws NodeExistsException, call setData(). Alternatively, acquire a Curator lock first.
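
The create-first approach can be sketched without a ZooKeeper server. In this stdlib-only sketch, `FakeZk` is a hypothetical in-memory stand-in for the ZooKeeper namespace, and `UpsertSketch`, `writeData` and `countFailures` are illustrative names that are not Curator API. With Curator, the `create` call corresponds to `client.create().creatingParentsIfNeeded().forPath(path, data)`. The catch block would handle `KeeperException.NodeExistsException` and call `client.setData().forPath(path, data)`. The sketch shows that many concurrent writers to the same path never fail.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class UpsertSketch {

    // Hypothetical stand-in for org.apache.zookeeper.KeeperException.NodeExistsException.
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) { super("NodeExists for " + path); }
    }

    // Hypothetical stand-in for org.apache.zookeeper.KeeperException.NoNodeException.
    static class NoNodeException extends Exception {
        NoNodeException(String path) { super("NoNode for " + path); }
    }

    // Hypothetical in-memory stand-in for the ZooKeeper namespace (NOT Curator's API).
    static class FakeZk {
        private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

        // Atomic like a server-side create: fails if the node already exists.
        void create(String path, byte[] data) throws NodeExistsException {
            if (nodes.putIfAbsent(path, data) != null) {
                throw new NodeExistsException(path);
            }
        }

        // Fails if the node does not exist, like setData() on a missing node.
        void setData(String path, byte[] data) throws NoNodeException {
            if (nodes.replace(path, data) == null) {
                throw new NoNodeException(path);
            }
        }

        byte[] getData(String path) {
            return nodes.get(path);
        }
    }

    // Create first; if the node already exists, overwrite it with setData().
    static void writeData(FakeZk zk, String path, byte[] data) throws NoNodeException {
        try {
            zk.create(path, data);
        } catch (NodeExistsException e) {
            // Someone else created the node first. If nodes can also be deleted
            // concurrently, setData() may throw NoNode and the call should be retried.
            zk.setData(path, data);
        }
    }

    // Runs many concurrent writers against one path and counts how many calls threw.
    static int countFailures(FakeZk zk, String path, int writers) throws InterruptedException {
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < writers; i++) {
            final byte[] value = {(byte) i};
            pool.execute(() -> {
                try {
                    writeData(zk, path, value);
                } catch (Exception e) {
                    failures.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        FakeZk zk = new FakeZk();
        int failures = countFailures(zk, "/workers/10.24.76.52/state", 100);
        System.out.println("failures: " + failures); // prints "failures: 0"
    }
}
```

The same pattern covers "make sure this node exists" helpers: create the node and simply ignore NodeExistsException. The lock alternative, for example Curator's `InterProcessMutex` recipe held around the check and the create, also avoids the race. However, it adds extra ZooKeeper round trips to every write.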

> this.client.create().creatingParentsIfNeeded()  throw Puzzling EXCEPTION
> ------------------------------------------------------------------------
>
>                 Key: CURATOR-196
>                 URL: https://issues.apache.org/jira/browse/CURATOR-196
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 2.6.0
>         Environment: RedHat
>            Reporter: HuanWang
>
> Scene One: in a single test, when I register with ZooKeeper, the code is as follows:
> {code}
> private void startWorker() {
> 		try {
> 			LOG.info("Start With Worker IP:" + this.workerIP);
> 			
> 			this.client.makeDir(SuperionConstant.ZOOKEEPER_WORKER_MONITOR_PATH);
> 			this.client.makeDir(SuperionConstant.ZOOKEEPER_WORKER_PATH);
> 			
> 			this.workerMonitorPath = SuperionConstant.ZOOKEEPER_WORKER_MONITOR_PATH + "/" + this.workerIP;
> 			/** Ephemeral Node: /workersMonitor/192.168.0.2 */
> 			this.client.createEphemeralNode(this.workerMonitorPath);
> 			
> 			
> 			this.workerPath = SuperionConstant.ZOOKEEPER_WORKER_PATH + "/" + this.workerIP;
> 			/** worker Node: /workers/192.168.0.2 */
> 			this.client.makeDir(this.workerPath);
> 			
> 			String workerStatePath = this.workerPath + "/" + "state";
> 			/** Persistent Node:  /workers/192.168.0.2/state */
> 			this.client.makeDir(workerStatePath);
> 			
> 			/** Persistent Node:  /workers/192.168.0.2/state/ProcessID */
> 			String workerStatePidPath = workerStatePath + "/" + "ProcessID";
> 			this.client.writeInt32(workerStatePidPath, workerPID);
> 			
> 			//this.client.makeDir(SuperionConstant.ZOOKEEPER_JOB_PATH);
> 			/** Persistent Node: /jobs/tmp   */
> 			this.client.makeDir(SuperionConstant.ZOOKEEPER_JOB_TMP_PATH);
> 			/** Persistent Node: /jobs/state   */
> 			this.client.makeDir(SuperionConstant.ZOOKEEPER_JOB_STATE_PATH);
> 			
> 			//register the worker in Zookeeper success
> 			this.containerManager.setBlockNewContainerRequests(false);	
> 		} catch (Exception e) {
> 			String errorMsg = "Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected";
> 			LOG.error(errorMsg, e);
> 			throw new SuperionRuntimeException(errorMsg,e);
> 		}
> 	}
> {code}
> ==========================================================
> The method I use calls creatingParentsIfNeeded():
> ==========================================================
> {code}
> public synchronized void writeData(String path,byte data[]) throws Exception {
> 		   System.out.println(path+"  : writeData");
> 		if(this.client.checkExists().forPath(path)!=null) {
> 			// node exists
> 			System.out.println(path+"  : checkExist");
> 			this.client.setData().forPath(path, data);
> 		} else {
> 			// node does not exist, create it
> 			System.out.println(path+ "  : node not exit");
> 			this.client.create().creatingParentsIfNeeded()
> 			.withMode(CreateMode.PERSISTENT).forPath(path, data);
> 		//	this.client.create().withMode(CreateMode.PERSISTENT).forPath(path, data);
> 			System.out.println(path+ "  : creatingParentsIfNeeded");
> 		}
> 	}
> {code}
> ======================================================
> but sometimes (not every time) it throws NodeExistsException:
> =======================================================
> {code}
> 2015-03-31 15:29:49,452 INFO  [main-EventThread] state.ConnectionStateManager (ConnectionStateManager.java:postState(228)) - State change: CONNECTED
> /workersMonitor  : checkExist
> /workers  : writeData
> /workers  : checkExist
> /workers/10.24.76.52  : writeData
> /workers/10.24.76.52  : node not exit
> /workers/10.24.76.52  : creatingParentsIfNeeded
> /workers/10.24.76.52/state  : writeData
> /workers/10.24.76.52/state  : node not exit
> 2015-03-31 15:29:50,508 ERROR [main] zookeeper.ZookeeperService (ZookeeperService.java:startWorker(331)) - Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
> 	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
> 	at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> 2015-03-31 15:29:50,510 INFO  [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService failed in state STARTED; cause: com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
> 	at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
> 	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
> 	... 10 more
> 2015-03-31 15:29:50,557 INFO  [main] zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x34a75c727c204a4 closed
> 2015-03-31 15:29:50,557 INFO  [main-EventThread] zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
> 2015-03-31 15:29:50,558 INFO  [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl failed in state STARTED; cause: com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
> 	at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
> 	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
> 	... 10 more
> 2015-03-31 15:29:50,561 INFO  [main] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:isEnabled(168)) - Neither virutal-memory nor physical-memory monitoring is needed. Not running the monitor-thread
> 2015-03-31 15:29:50,562 INFO  [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state STARTED; cause: com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
> 	at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
> 	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
> 	... 10 more
> 2015-03-31 15:29:50,562 INFO  [Public Localizer] localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(642)) - Public cache exiting
> 2015-03-31 15:29:50,563 INFO  [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping Worker metrics system...
> 2015-03-31 15:29:50,564 INFO  [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - Worker metrics system stopped.
> 2015-03-31 15:29:50,564 INFO  [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - Worker metrics system shutdown complete.
> 2015-03-31 15:29:50,564 FATAL [main] worker.Worker (Worker.java:initAndStartNodeManager(184)) - Error starting NodeManager
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
> 	at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
> 	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
> 	... 10 more
> {code}
> ===================================================
> Scene Two: when starting a job:
> ===================================================
> {code}
> private void startJob(ZookeeperEvent zookeeperEvent) {
> 		
> 		StartJobZookeeperEvent startJobZookeeperEvent = (StartJobZookeeperEvent) zookeeperEvent;
> 		String jobInstanceId = startJobZookeeperEvent
> 				.getStartContainerRequest().getContainerId().getApplicationId()
> 				.getJobInstanceZKId();
> 		
> 		String jobTmpEphemeral = SuperionConstant.ZOOKEEPER_JOB_TMP_PATH + "/" + jobInstanceId;
> 		String jobStatePersistent = SuperionConstant.ZOOKEEPER_JOB_STATE_PATH + "/" + jobInstanceId;
> 		
> 		String jobStateWorkerIP = jobStatePersistent + "/" + SuperionConstant.JobState.WorkerIP;
> 		String jobStateJobStatus = jobStatePersistent + "/" + SuperionConstant.JobState.JobStatus;
> 		String jobStateJobErrorMsg = jobStatePersistent + "/" + SuperionConstant.JobState.JobErrorMsg;
> 		String jobStateCreateTime = jobStatePersistent + "/" + SuperionConstant.JobState.CreateTime;
> 		
> 		try {
> 			/** Ephemeral Node: /job/tmp/jobInstanceId */
> 			this.client.createEphemeralNode(jobTmpEphemeral);
> 			if(this.client.checkExists(jobTmpEphemeral) == null)
> 				throw new Exception("ephemeral node["+jobTmpEphemeral+"] create fail");
> 			
> 			/** update job state-----------------  */
> 			/** Persistent Node: /jobs/state/jobInstanceId */
> 			this.client.makeDir(jobStatePersistent);
> 			/** Persistent Node: /jobs/state/jobInstanceId/WorkerIP */
> 			this.client.writeString(jobStateWorkerIP, this.workerIP);
> 			
> 			/** Persistent Node: /jobs/state/jobInstanceId/CreateTime */
> 			this.client.writeInt64(jobStateCreateTime, System.currentTimeMillis());
> 			/* start container request */
> 			StartContainerResponse response = this.containerManager.startContainers(
> 					startJobZookeeperEvent.getStartContainerRequest());
> 			
> 			int jobStatusInt = SuperionConstant.JOB_STATUS_TAKED;
> 			
> 			//TODO whtest
> 			
> 			if(!response.isSuccess()) {
> 			//	jobStatusInt = SuperionConstant.JOB_STATUS_PARAMETER_CHECK_ERROR;
> 				LOG.error(startJobZookeeperEvent.getStartContainerRequest().getContainerId().toString() + " start exception", 
> 						response.getFailureReason());
> 				String jobErrorMsg = response.getFailureReason().getMessage();
> 				throw new Exception(jobErrorMsg,response.getFailureReason());
> 				/** Persistent Node: /jobs/state/jobInstanceId/JobErrorMsg */
> 				// this.client.writeString(jobStateJobErrorMsg, jobErrorMsg);
> 			
> 			} 
> 			
> 			/** Persistent Node: /jobs/state/jobInstanceId/JobStatus */
> 			this.client.writeInt32(jobStateJobStatus, jobStatusInt);
> 		} catch (Exception e) {
> 			LOG.error("exception happened when start job" , e);
> 			
> 			if(e instanceof KeeperException.NodeExistsException){
> 				/*
> 				 * NodeExistsException while creating /job/tmp/jobInstanceId:
> 				 * if /job/tmp/jobInstanceId already exists, return
> 				 */
> 				KeeperException.NodeExistsException nodeExists = (KeeperException.NodeExistsException)e;
> 				     String existsPath = nodeExists.getPath();
> 				  
> 				if(existsPath != null && existsPath.startsWith(SuperionConstant.ZOOKEEPER_JOB_TMP_PATH)) {
> 					return;
> 				}
> 			}
> 			try{
> 				String jobErrorMsg = e.getMessage();
> 				/** Persistent Node: /jobs/state/jobInstanceId/JobErrorMsg */
> 				this.client.writeString(jobStateJobErrorMsg, jobErrorMsg);
> 				/** Persistent Node: /jobs/state/jobInstanceId/JobStatus */		
> 				this.client.writeInt32(jobStateJobStatus, SuperionConstant.JOB_STATUS_PARAMETER_CHECK_ERROR);
> 			} catch(Exception ignoreE) {
> 				LOG.warn("Ignore Exception", ignoreE);//ignore
> 			} finally {
> 				try {
> 					this.client.deleteEphemeralNode(jobTmpEphemeral);
> 				} catch(Exception exception) {
> 					LOG.warn("Ignore Exception", exception);//ignore
> 				}
> 			}
> 		}
> 	}
> {code}
> ====================================================
> When we checked the logs, we found that some jobs (not every one) throw the exception:
> ==================================================
> {code}
> ource_visiblity as resource9_2_ from job_depend_resource jobdependr0_ where jobdependr0_.job_id=?
> 2015-03-28 00:01:58,651 INFO  [AsyncDispatcher event handler] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(319)) - Start request for container_20150327000156_5755_0299_0144_ by user bicbt
> 2015-03-28 00:01:58,652 INFO  [AsyncDispatcher event handler] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(343)) - Creating a new application reference for app application_20150327000156_5755
> 2015-03-28 00:01:58,652 INFO  [AsyncDispatcher event handler] worker.WorkerAuditLogger (WorkerAuditLogger.java:logSuccess(98)) - USER=bicbt     OPERATION=Start Container Request       TARGET=ContainerManageImpl      RESULT=SUCCESS  APPID=application_20150327000156_5755   CONTAINERID=container_20150327000156_5755_0299_0144_
> 2015-03-28 00:01:58,675 ERROR [AsyncDispatcher event handler] zookeeper.ZookeeperService (ZookeeperService.java:startJob(178)) - exception happened when start job
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /jobs/state/1_299_20150328000156_144_0/JobStatus
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>         at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
>         at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
>         at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>         at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
>         at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>         at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>         at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>         at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:119)
>         at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeInt32(ZookeeperClient.java:126)
>         at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startJob(ZookeeperService.java:176)
>         at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.handle(ZookeeperService.java:104)
>         at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.handle(ZookeeperService.java:30)
>         at com.suning.cybertron.superion.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:138)
>         at com.suning.cybertron.superion.event.AsyncDispatcher$1.run(AsyncDispatcher.java:85)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
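
The intermittent failures in both scenes are consistent with the check-then-act race described in the comment. Note that `synchronized` on the reported writeData() only serializes threads that call that one ZookeeperClient object. It cannot stop another client or process from creating the node between checkExists() and create(). The stdlib-only sketch below reproduces the race deterministically. A `ConcurrentHashMap` stands in for ZooKeeper, and `CheckThenCreateRace`, `racyWriteData` and `runTwoWriters` are hypothetical names. A `CyclicBarrier` holds both writers after the existence check, so exactly one create() loses every time. In a real deployment the window is only the round trip between the two calls, which fits the "not every time" symptom.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

public class CheckThenCreateRace {

    // Hypothetical stand-in for org.apache.zookeeper.KeeperException.NodeExistsException.
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) { super("NodeExists for " + path); }
    }

    // Mirrors the reported writeData(): checkExists(), then setData() or create().
    // The map is a hypothetical stand-in for ZooKeeper; the barrier widens the
    // window between the check and the create so the race happens every time.
    static void racyWriteData(ConcurrentMap<String, byte[]> zk, String path,
                              byte[] data, CyclicBarrier window) throws Exception {
        if (zk.containsKey(path)) {                     // checkExists() != null
            zk.put(path, data);                         // setData()
        } else {
            window.await();                             // both writers have seen "no node"
            if (zk.putIfAbsent(path, data) != null) {   // create()
                throw new NodeExistsException(path);
            }
        }
    }

    // Two independent writers (think: two clients or processes) hit the same path.
    static int runTwoWriters(String path) throws InterruptedException {
        ConcurrentMap<String, byte[]> zk = new ConcurrentHashMap<>();
        CyclicBarrier window = new CyclicBarrier(2);
        AtomicInteger failures = new AtomicInteger();
        Runnable writer = () -> {
            try {
                racyWriteData(zk, path, new byte[] {1}, window);
            } catch (NodeExistsException e) {
                failures.incrementAndGet();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        a.join();
        b.join();
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int failures = runTwoWriters("/jobs/state/1_299_20150328000156_144_0/JobStatus");
        System.out.println("NodeExists failures: " + failures); // prints "NodeExists failures: 1"
    }
}
```

Replacing the check-then-create in racyWriteData with a create-first call that falls back to setData() on NodeExistsException removes the failing branch entirely.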



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)