Posted to issues@kylin.apache.org by "nichunen (JIRA)" <ji...@apache.org> on 2019/07/17 06:33:00 UTC

[jira] [Updated] (KYLIN-4017) Build engine get zk(zookeeper) lock failed when building job, it causes the whole build engine doesn't work.

     [ https://issues.apache.org/jira/browse/KYLIN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nichunen updated KYLIN-4017:
----------------------------
    Affects Version/s:     (was: Future)
        Fix Version/s:     (was: v3.0.0-beta)
                       v3.0.0-alpha2

> Build engine get zk(zookeeper) lock failed when building job, it causes the whole build engine doesn't work.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4017
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4017
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine, Tools, Build and Test
>    Affects Versions: v3.0.0, v3.0.0-alpha
>            Reporter: wangxiaojing
>            Priority: Critical
>              Labels: build
>             Fix For: v3.0.0-alpha2
>
>         Attachments: zkinstancestart.png
>
>
> Kylin throws an exception while acquiring the ZooKeeper (ZK) lock when it builds a job. Only a restart resolves the problem; until then no job can be built and the whole build engine stops working. The problem recurs about one day after the restart. The log looks like this:
> {code:java}
> 2019-05-15 11:09:43,209 INFO [FetcherRunner 1910115020-57] threadpool.FetcherRunner:59 : CubingJob{id=878974c4-4c65-88a4-a912-b238fcc33bdc, name=BUILD CUBE - es_report_respnse_rate_cube - 20190513000000_20190514000000 - GMT+08:00 2019-05-15 11:03:15, state=READY} prepare to schedule and its priority is 20
> 2019-05-15 11:09:43,209 INFO [FetcherRunner 1910115020-57] threadpool.FetcherRunner:63 : CubingJob{id=878974c4-4c65-88a4-a912-b238fcc33bdc, name=BUILD CUBE - es_report_respnse_rate_cube - 20190513000000_20190514000000 - GMT+08:00 2019-05-15 11:03:15, state=READY} scheduled
> 2019-05-15 11:09:43,209 DEBUG [Scheduler 719764581 Job 878974c4-4c65-88a4-a912-b238fcc33bdc-132] zookeeper.ZookeeperDistributedLock:92 : 18786@bigdata-kylin-build01.gz01.diditaxi.com trying to lock /job_engine/lock/878974c4-4c65-88a4-a912-b238fcc33bdc
> 2019-05-15 11:09:43,212 ERROR [pool-12-thread-10] threadpool.DistributedScheduler:115 : unknown error execute job:878974c4-4c65-88a4-a912-b238fcc33bdc in server: 18786@bigdata-kylin-build01.gz01.diditaxi.com
> java.lang.IllegalStateException: Error while 18786@bigdata-kylin-build01.gz01.diditaxi.com trying to lock /job_engine/lock/878974c4-4c65-88a4-a912-b238fcc33bdc
>  at org.apache.kylin.job.lock.zookeeper.ZookeeperDistributedLock.lock(ZookeeperDistributedLock.java:99)
>  at org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock.lock(ZookeeperJobLock.java:41)
>  at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:105)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: instance must be started before calling this method
>  at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>  at org.apache.curator.framework.imps.CuratorFrameworkImpl.create(CuratorFrameworkImpl.java:351)
>  at org.apache.kylin.job.lock.zookeeper.ZookeeperDistributedLock.lock(ZookeeperDistributedLock.java:95)
>  ... 5 more{code}
>  
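
The "Caused by" frame in the log above comes from a state check in Curator: CuratorFrameworkImpl.create() refuses to run unless the client is in the STARTED state. The sketch below reproduces that failure mode in isolation. It is only an illustration under stated assumptions: FakeClient and tryLock are hypothetical stand-ins written for this example, not the real Curator or Kylin classes, and the sketch says nothing about why the client left the STARTED state in the reported case or how the issue was actually fixed.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LockStateSketch {
    // Mirrors the three values of Curator's CuratorFrameworkState enum.
    enum State { LATENT, STARTED, STOPPED }

    // Hypothetical stand-in for a Curator client; NOT the real Curator API.
    static class FakeClient {
        private final AtomicReference<State> state = new AtomicReference<>(State.LATENT);

        void start() { state.set(State.STARTED); }
        void close() { state.set(State.STOPPED); }
        State getState() { return state.get(); }

        // Same precondition and message as the "Caused by" line in the log.
        String create(String path) {
            if (state.get() != State.STARTED) {
                throw new IllegalStateException(
                        "instance must be started before calling this method");
            }
            return path;
        }
    }

    // Returns true if the lock node was created, false if the client was unusable.
    static boolean tryLock(FakeClient client, String lockPath) {
        try {
            client.create(lockPath);
            return true;
        } catch (IllegalStateException e) {
            // A never-started or already-closed client ends up here on every attempt.
            return false;
        }
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        client.start();
        System.out.println(tryLock(client, "/job_engine/lock/demo")); // started client
        client.close();
        System.out.println(tryLock(client, "/job_engine/lock/demo")); // closed client
    }
}
```

Once the stand-in client is closed, every later lock attempt fails in the same way, which matches the reported symptom that builds keep failing until the process is restarted.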



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)