You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@curator.apache.org by "Jordan Zimmerman (JIRA)" <ji...@apache.org> on 2015/03/31 17:49:54 UTC

[jira] [Commented] (CURATOR-196) this.client.create().creatingParentsIfNeeded() throw Puzzling EXCEPTION

    [ https://issues.apache.org/jira/browse/CURATOR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388735#comment-14388735 ] 

Jordan Zimmerman commented on CURATOR-196:
------------------------------------------

Please reformat the description of this issue. The formatting has been lost. Use the \{code\} tag around your code samples.

> this.client.create().creatingParentsIfNeeded()  throw Puzzling EXCEPTION
> ------------------------------------------------------------------------
>
>                 Key: CURATOR-196
>                 URL: https://issues.apache.org/jira/browse/CURATOR-196
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 2.6.0
>         Environment: RedHat
>            Reporter: HuanWang
>
> Scene One:In Single test. when I wanna register to zk. The code as below:
> private void startWorker() {
> 		try {
> 			LOG.info("Start With Worker IP:" + this.workerIP);
> 			
> 			this.client.makeDir(SuperionConstant.ZOOKEEPER_WORKER_MONITOR_PATH);
> 			this.client.makeDir(SuperionConstant.ZOOKEEPER_WORKER_PATH);
> 			
> 			this.workerMonitorPath = SuperionConstant.ZOOKEEPER_WORKER_MONITOR_PATH + "/" + this.workerIP;
> 			/** Ephemeral Node: /workersMonitor/192.168.0.2 */
> 			this.client.createEphemeralNode(this.workerMonitorPath);
> 			
> 			
> 			this.workerPath = SuperionConstant.ZOOKEEPER_WORKER_PATH + "/" + this.workerIP;
> 			/** worker Node: /workers/192.168.0.2 */
> 			this.client.makeDir(this.workerPath);
> 			
> 			String workerStatePath = this.workerPath + "/" + "state";
> 			/** Persistent Node:  /workers/192.168.0.2/state */
> 			this.client.makeDir(workerStatePath);
> 			
> 			/** Persistent Node:  /workers/192.168.0.2/state/ProcessID */
> 			String workerStatePidPath = workerStatePath + "/" + "ProcessID";
> 			this.client.writeInt32(workerStatePidPath, workerPID);
> 			
> 			//this.client.makeDir(SuperionConstant.ZOOKEEPER_JOB_PATH);
> 			/** Persistent Node: /jobs/tmp   */
> 			this.client.makeDir(SuperionConstant.ZOOKEEPER_JOB_TMP_PATH);
> 			/** Persistent Node: /jobs/state   */
> 			this.client.makeDir(SuperionConstant.ZOOKEEPER_JOB_STATE_PATH);
> 			
> 			//register the worker in Zookeeper success
> 			this.containerManager.setBlockNewContainerRequests(false);	
> 		} catch (Exception e) {
> 			String errorMsg = "Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected";
> 			LOG.error(errorMsg, e);
> 			throw new SuperionRuntimeException(errorMsg,e);
> 		}
> 	}
> ==========================================================
> the function I use is creatingParentsIfNeeded().
> ==========================================================
> public synchronized void writeData(String path,byte data[]) throws Exception {
> 		   System.out.println(path+"  : writeData");
> 		if(this.client.checkExists().forPath(path)!=null) {
> 			//node exit
> 			System.out.println(path+"  : checkExist");
> 			this.client.setData().forPath(path, data);
> 		} else {
> 			//node not exit, create new
> 			System.out.println(path+ "  : node not exit");
> 			this.client.create().creatingParentsIfNeeded()
> 			.withMode(CreateMode.PERSISTENT).forPath(path, data);
> 		//	this.client.create().withMode(CreateMode.PERSISTENT).forPath(path, data);
> 			System.out.println(path+ "  : creatingParentsIfNeeded");
> 		}
> ======================================================
> but sometimes (not every time) .it would throw NodeExistException:
> =======================================================
> 015-03-31 15:29:49,452 INFO  [main-EventThread] state.ConnectionStateManager (ConnectionStateManager.java:postState(228)) - State change: CONNECTED
> /workersMonitor  : checkExist
> /workers  : writeData
> /workers  : checkExist
> /workers/10.24.76.52  : writeData
> /workers/10.24.76.52  : node not exit
> /workers/10.24.76.52  : creatingParentsIfNeeded
> /workers/10.24.76.52/state  : writeData
> /workers/10.24.76.52/state  : node not exit
> 2015-03-31 15:29:50,508 ERROR [main] zookeeper.ZookeeperService (ZookeeperService.java:startWorker(331)) - Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
> 	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
> 	at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> 2015-03-31 15:29:50,510 INFO  [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService failed in state STARTED; cause: com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
> 	at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
> 	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
> 	... 10 more
> 2015-03-31 15:29:50,557 INFO  [main] zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x34a75c727c204a4 closed
> 2015-03-31 15:29:50,557 INFO  [main-EventThread] zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
> 2015-03-31 15:29:50,558 INFO  [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl failed in state STARTED; cause: com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
> 	at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
> 	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
> 	... 10 more
> 2015-03-31 15:29:50,561 INFO  [main] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:isEnabled(168)) - Neither virutal-memory nor physical-memory monitoring is needed. Not running the monitor-thread
> 2015-03-31 15:29:50,562 INFO  [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state STARTED; cause: com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
> 	at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
> 	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
> 	... 10 more
> 2015-03-31 15:29:50,562 INFO  [Public Localizer] localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(642)) - Public cache exiting
> 2015-03-31 15:29:50,563 INFO  [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping Worker metrics system...
> 2015-03-31 15:29:50,564 INFO  [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - Worker metrics system stopped.
> 2015-03-31 15:29:50,564 INFO  [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - Worker metrics system shutdown complete.
> 2015-03-31 15:29:50,564 FATAL [main] worker.Worker (Worker.java:initAndStartNodeManager(184)) - Error starting NodeManager
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> 	at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
> 	at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
> 	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
> 	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
> 	at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
> 	... 10 more
> ===================================================
> Scene Two:   When starting job :
> ===================================================
> private void startJob(ZookeeperEvent zookeeperEvent) {
> 		
> 		StartJobZookeeperEvent startJobZookeeperEvent = (StartJobZookeeperEvent) zookeeperEvent;
> 		String jobInstanceId = startJobZookeeperEvent
> 				.getStartContainerRequest().getContainerId().getApplicationId()
> 				.getJobInstanceZKId();
> 		
> 		String jobTmpEphemeral = SuperionConstant.ZOOKEEPER_JOB_TMP_PATH + "/" + jobInstanceId;
> 		String jobStatePersistent = SuperionConstant.ZOOKEEPER_JOB_STATE_PATH + "/" + jobInstanceId;
> 		
> 		String jobStateWorkerIP = jobStatePersistent + "/" + SuperionConstant.JobState.WorkerIP;
> 		String jobStateJobStatus = jobStatePersistent + "/" + SuperionConstant.JobState.JobStatus;
> 		String jobStateJobErrorMsg = jobStatePersistent + "/" + SuperionConstant.JobState.JobErrorMsg;
> 		String jobStateCreateTime = jobStatePersistent + "/" + SuperionConstant.JobState.CreateTime;
> 		
> 		try {
> 			/** Ephemeral Node: /job/tmp/jobInstanceId */
> 			this.client.createEphemeralNode(jobTmpEphemeral);
> 			if(this.client.checkExists(jobTmpEphemeral) == null)
> 				throw new Exception("ephemeral node["+jobTmpEphemeral+"] create fail");
> 			
> 			/** update job state-----------------  */
> 			/** Persistent Node: /jobs/state/jobInstanceId */
> 			this.client.makeDir(jobStatePersistent);
> 			/** Persistent Node: /jobs/state/jobInstanceId/WorkerIP */
> 			this.client.writeString(jobStateWorkerIP, this.workerIP);
> 			
> 			/** Persistent Node: /jobs/state/jobInstanceId/CreateTime */
> 			this.client.writeInt64(jobStateCreateTime, System.currentTimeMillis());
> 			/* start container request */
> 			StartContainerResponse response = this.containerManager.startContainers(
> 					startJobZookeeperEvent.getStartContainerRequest());
> 			
> 			int jobStatusInt = SuperionConstant.JOB_STATUS_TAKED;
> 			
> 			//TODO whtest
> 			
> 			if(!response.isSuccess()) {
> 			//	jobStatusInt = SuperionConstant.JOB_STATUS_PARAMETER_CHECK_ERROR;
> 				LOG.error(startJobZookeeperEvent.getStartContainerRequest().getContainerId().toString() + " start exception", 
> 						response.getFailureReason());
> 			String jobErrorMsg = response.getFailureReason().getMessage();
> 			throw new Exception(jobErrorMsg,response.getFailureReason());
> 				/** Persistent Node: /jobs/state/jobInstanceId/JobErrorMsg */
> //       			this.client.writeString(jobStateJobErrorMsg, jobErrorMsg);
> 			
> 			} 
> 			
> 			/** Persistent Node: /jobs/state/jobInstanceId/JobStatus */
> 			this.client.writeInt32(jobStateJobStatus, jobStatusInt);
> 		} catch (Exception e) {
> 			LOG.error("exception happened when start job" , e);
> 			
> 			if(e instanceof KeeperException.NodeExistsException){
> 				/*
> 				* node exit exception when /job/tmp/jobInstanceId create
> 				* if /job/tmp/jobInstanceId create then return
> 				* */
> 				KeeperException.NodeExistsException nodeExists = (KeeperException.NodeExistsException)e;
> 				     String existsPath = nodeExists.getPath();
> 				  
> 				if(existsPath != null && existsPath.startsWith(SuperionConstant.ZOOKEEPER_JOB_TMP_PATH)) {
> 					return;
> 				}
> 			}
> 			try{
> 				String jobErrorMsg = e.getMessage();
> 				/** Persistent Node: /jobs/state/jobInstanceId/JobErrorMsg */
> 				this.client.writeString(jobStateJobErrorMsg, jobErrorMsg);
> 				/** Persistent Node: /jobs/state/jobInstanceId/JobStatus */		
> 				this.client.writeInt32(jobStateJobStatus, SuperionConstant.JOB_STATUS_PARAMETER_CHECK_ERROR);
> 			} catch(Exception ignoreE) {
> 				LOG.warn("Ignore Exception", ignoreE);//ignore
> 			} finally {
> 				try {
> 					this.client.deleteEphemeralNode(jobTmpEphemeral);
> 				} catch(Exception exception) {
> 					LOG.warn("Ignore Exception", exception);//ignore
> 				}
> 			}
> 		}
> 	}
> ====================================================
> When we saw logs.we find some jobs(not every one) throw the Exception
> ==================================================
> ource_visiblity as resource9_2_ from job_depend_resource jobdependr0_ where jobdependr0_.job_id=?
> 2015-03-28 00:01:58,651 INFO  [AsyncDispatcher event handler] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(319)) - Start request for container_20150327000156_5755_0299_0144_ by user bicbt
> 2015-03-28 00:01:58,652 INFO  [AsyncDispatcher event handler] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(343)) - Creating a new application reference for app application_20150327000156_5755
> 2015-03-28 00:01:58,652 INFO  [AsyncDispatcher event handler] worker.WorkerAuditLogger (WorkerAuditLogger.java:logSuccess(98)) - USER=bicbt     OPERATION=Start Container Request       TARGET=ContainerManageImpl      RESULT=SUCCESS  APPID=application_20150327000156_5755   CONTAINERID=container_20150327000156_5755_0299_0144_
> 2015-03-28 00:01:58,675 ERROR [AsyncDispatcher event handler] zookeeper.ZookeeperService (ZookeeperService.java:startJob(178)) - exception happened when start job
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /jobs/state/1_299_20150328000156_144_0/JobStatus
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>         at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
>         at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
>         at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>         at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
>         at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>         at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>         at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>         at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:119)
>         at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeInt32(ZookeeperClient.java:126)
>         at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startJob(ZookeeperService.java:176)
>         at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.handle(ZookeeperService.java:104)
>         at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.handle(ZookeeperService.java:30)
>         at com.suning.cybertron.superion.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:138)
>         at com.suning.cybertron.superion.event.AsyncDispatcher$1.run(AsyncDispatcher.java:85)
>         at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)