Posted to issues@kylin.apache.org by GitBox <gi...@apache.org> on 2019/06/05 07:56:52 UTC

[GitHub] [kylin] wangxiaojing123 commented on a change in pull request #664: KYLIN-4017 Build engine get zk(zookeeper) lock failed when building job, it causes the whole build engine doesn't work

wangxiaojing123 commented on a change in pull request #664: KYLIN-4017 Build engine get zk(zookeeper) lock failed when building job, it causes the whole build engine doesn't work
URL: https://github.com/apache/kylin/pull/664#discussion_r290618823
 
 

 ##########
 File path: core-common/src/main/java/org/apache/kylin/common/util/ZKUtil.java
 ##########
 @@ -84,7 +84,7 @@ public void onRemoval(RemovalNotification<String, CuratorFramework> notification
                         logger.error("Error at closing " + curator, ex);
                     }
                 }
-            }).expireAfterWrite(1, TimeUnit.DAYS).build();
+            }).expireAfterWrite(10000, TimeUnit.DAYS).build();//never expired
 
 Review comment:
   > If the cache entry expires after 1 day, the removal listener runs `curator.close()`; in other words, the client created by `newZookeeperClient` is closed. That client must stay in the started state for the whole lifecycle of the build engine, because it is used when building segments. If its state is not started, it cannot acquire the zk lock and the job cannot be built:
   
   DistributedScheduler:
   
   ```java
   public void run() {
       try (SetThreadName ignored = new SetThreadName("Scheduler %s Job %s",
               System.identityHashCode(DistributedScheduler.this), executable.getId())) {
           if (jobLock.lock(getLockPath(executable.getId()))) {
               logger.info(executable.toString() + " scheduled in server: " + serverName);
   
               context.addRunningJob(executable);
               jobWithLocks.add(executable.getId());
               executable.execute(context);
           }
       } catch (ExecuteException e) {
           logger.error("ExecuteException job:" + executable.getId() + " in server: " + serverName, e);
       } catch (Exception e) {
           logger.error("unknown error execute job:" + executable.getId() + " in server: " + serverName, e);
       } finally {
           context.removeRunningJob(executable);
           releaseJobLock(executable);
           // trigger the next step asap
           fetcherPool.schedule(fetcher, 0, TimeUnit.SECONDS);
       }
   }
   ```
   
   
   ZookeeperDistributedLock:

   ```java
   public boolean lock(String lockPath) {
       logger.debug("{} trying to lock {}", client, lockPath);
       try {
           curator.create().creatingParentsIfNeeded().withMode(CreateMode.EPHEMERAL).forPath(lockPath, clientBytes);
       } catch (KeeperException.NodeExistsException ex) {
           logger.debug("{} see {} is already locked", client, lockPath);
       } catch (Exception ex) {
           throw new IllegalStateException("Error while " + client + " trying to lock " + lockPath, ex);
       }
   
       String lockOwner = peekLock(lockPath);
       if (client.equals(lockOwner)) {
           logger.info("{} acquired lock at {}", client, lockPath);
           return true;
       } else {
           logger.debug("{} failed to acquire lock at {}, which is held by {}", client, lockPath, lockOwner);
           return false;
       }
   }
   ```
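
The failure mode described above can be reproduced in isolation. The following is only a minimal sketch using the JDK alone, not Kylin, Guava, or Curator code: `FakeClient` and `ExpiringCache` are hypothetical stand-ins for `CuratorFramework` and the expiring cache in `ZKUtil`, and the sketch assumes the lock object keeps the client reference it obtained once, as the `curator` field in `lock()` suggests. Time is passed in explicitly so that no real waiting is needed.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class ExpiringClientCacheSketch {
    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    /** Hypothetical stand-in for a ZooKeeper client: usable only while started. */
    static final class FakeClient {
        private boolean started = true;

        boolean isStarted() { return started; }

        void close() { started = false; }

        boolean lock(String lockPath) {
            if (!started) {
                throw new IllegalStateException("client is closed, cannot lock " + lockPath);
            }
            return true;
        }
    }

    /** Hypothetical cache with expire-after-write semantics that closes evicted clients. */
    static final class ExpiringCache {
        private final long ttlMillis;
        private final Map<String, FakeClient> clients = new HashMap<>();
        private final Map<String, Long> writtenAt = new HashMap<>();

        ExpiringCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

        /** Evicts entries older than the TTL and closes them, like the removal listener in the diff. */
        void evictExpired(long nowMillis) {
            Iterator<Map.Entry<String, Long>> it = writtenAt.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> entry = it.next();
                if (nowMillis - entry.getValue() >= ttlMillis) {
                    clients.remove(entry.getKey()).close();
                    it.remove();
                }
            }
        }

        /** Returns the cached client, creating a fresh one if none is present. */
        FakeClient get(String key, long nowMillis) {
            evictExpired(nowMillis);
            FakeClient client = clients.get(key);
            if (client == null) {
                client = new FakeClient();
                clients.put(key, client);
                writtenAt.put(key, nowMillis);
            }
            return client;
        }
    }

    public static void main(String[] args) {
        ExpiringCache cache = new ExpiringCache(DAY_MILLIS);

        // The lock object fetches the client once and keeps the reference.
        FakeClient held = cache.get("zk", 0L);
        System.out.println("lock before expiry: " + held.lock("/job/1"));

        // One day later the cache evicts the entry and closes the client.
        cache.evictExpired(DAY_MILLIS + 1);
        System.out.println("held client started: " + held.isStarted());

        try {
            held.lock("/job/2");
        } catch (IllegalStateException ex) {
            System.out.println("lock after expiry failed: " + ex.getMessage());
        }
    }
}
```

In this sketch, a caller that fetched the client again with `cache.get(...)` after the eviction would receive a fresh, started instance; only the long-lived reference breaks. That is why a very large `expireAfterWrite` value hides the problem rather than removing the stale-reference risk itself.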
   
