Posted to issues@kylin.apache.org by GitBox <gi...@apache.org> on 2019/06/05 07:56:52 UTC
[GitHub] [kylin] wangxiaojing123 commented on a change in pull request #664:
KYLIN-4017 Build engine get zk(zookeeper) lock failed when building job,
it causes the whole build engine doesn't work
URL: https://github.com/apache/kylin/pull/664#discussion_r290618823
##########
File path: core-common/src/main/java/org/apache/kylin/common/util/ZKUtil.java
##########
@@ -84,7 +84,7 @@ public void onRemoval(RemovalNotification<String, CuratorFramework> notification
logger.error("Error at closing " + curator, ex);
}
}
- }).expireAfterWrite(1, TimeUnit.DAYS).build();
+ }).expireAfterWrite(10000, TimeUnit.DAYS).build();//never expired
Review comment:
> If the cache entry expires after 1 day, curator.close() is run; in other words, the newZookeeperClient client is closed. That client should stay in the started state for the whole lifecycle of the build engine, because it is used when building segments. If its state is not started, the job cannot get the ZK lock and cannot build:
DistributedScheduler:
```java
public void run() {
    try (SetThreadName ignored = new SetThreadName("Scheduler %s Job %s",
            System.identityHashCode(DistributedScheduler.this), executable.getId())) {
        if (jobLock.lock(getLockPath(executable.getId()))) {
            logger.info(executable.toString() + " scheduled in server: " + serverName);
            context.addRunningJob(executable);
            jobWithLocks.add(executable.getId());
            executable.execute(context);
        }
    } catch (ExecuteException e) {
        logger.error("ExecuteException job:" + executable.getId() + " in server: " + serverName, e);
    } catch (Exception e) {
        logger.error("unknown error execute job:" + executable.getId() + " in server: " + serverName, e);
    } finally {
        context.removeRunningJob(executable);
        releaseJobLock(executable);
        // trigger the next step asap
        fetcherPool.schedule(fetcher, 0, TimeUnit.SECONDS);
    }
}
```
ZookeeperDistributedLock:
```java
public boolean lock(String lockPath) {
    logger.debug("{} trying to lock {}", client, lockPath);

    try {
        curator.create().creatingParentsIfNeeded().withMode(CreateMode.EPHEMERAL).forPath(lockPath, clientBytes);
    } catch (KeeperException.NodeExistsException ex) {
        logger.debug("{} see {} is already locked", client, lockPath);
    } catch (Exception ex) {
        throw new IllegalStateException("Error while " + client + " trying to lock " + lockPath, ex);
    }

    String lockOwner = peekLock(lockPath);
    if (client.equals(lockOwner)) {
        logger.info("{} acquired lock at {}", client, lockPath);
        return true;
    } else {
        logger.debug("{} failed to acquire lock at {}, which is held by {}", client, lockPath, lockOwner);
        return false;
    }
}
```
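The failure mode described in this comment can be sketched without Guava or Curator. The block below is only a simulation under stated assumptions: `FakeClient`, `ExpiringEntry`, `canLockAfter` and the lock path are hypothetical names invented for illustration, not Kylin, Guava or Curator APIs. It assumes that a closed client rejects lock attempts, and that the component holding the client keeps its reference after the cache entry expires.

```java
import java.util.concurrent.TimeUnit;

public class StaleClientSketch {

    /** Hypothetical stand-in for a ZooKeeper client: usable only while started. */
    static final class FakeClient implements AutoCloseable {
        private boolean started = true;

        /** Mimics creating a lock node; assumed to fail once the client is closed. */
        boolean createLockNode(String path) {
            if (!started) {
                throw new IllegalStateException("client is not started, cannot lock " + path);
            }
            return true;
        }

        @Override
        public void close() {
            started = false;
        }
    }

    /** Hypothetical stand-in for a cache entry with expire-after-write semantics. */
    static final class ExpiringEntry {
        private final FakeClient client;
        private final long ttlMillis;

        ExpiringEntry(FakeClient client, long ttlMillis) {
            this.client = client;
            this.ttlMillis = ttlMillis;
        }

        /** Simulates cache cleanup: once the entry is expired, the removal hook closes the client. */
        void cleanUp(long elapsedMillis) {
            if (elapsedMillis >= ttlMillis) {
                client.close();
            }
        }
    }

    /**
     * Returns true if a long-lived holder of the client can still take a lock
     * after the cache has been cleaned up at the given elapsed time.
     */
    public static boolean canLockAfter(long ttlMillis, long elapsedMillis) {
        FakeClient client = new FakeClient();      // the holder keeps this reference
        ExpiringEntry entry = new ExpiringEntry(client, ttlMillis);
        entry.cleanUp(elapsedMillis);              // may close the shared client
        try {
            // "/hypothetical/lock/job-1" is an illustrative path, not a real Kylin path
            return client.createLockNode("/hypothetical/lock/job-1");
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long oneDay = TimeUnit.DAYS.toMillis(1);
        System.out.println(canLockAfter(oneDay, oneDay - 1)); // true: entry not yet expired
        System.out.println(canLockAfter(oneDay, oneDay));     // false: client was closed on expiry
        // With the 10000-day expiry from the diff, the client is still open after a year
        System.out.println(canLockAfter(TimeUnit.DAYS.toMillis(10000), TimeUnit.DAYS.toMillis(365))); // true
    }
}
```

If the real client behaves like this stand-in, the exception raised inside `lock` would be caught by the `catch (Exception e)` branch in `run()` above and only logged, so the job would never execute. Extending the expiry, as the diff does, is one way to avoid that; having the lock obtain a fresh client when the cached one is no longer started would be another, though that option is not part of this change.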