Posted to issues@kylin.apache.org by "Guangxu Cheng (Jira)" <ji...@apache.org> on 2020/11/15 04:31:00 UTC

[jira] [Created] (KYLIN-4819) build cube failed when `kylin.metadata.hbase-client-retries-number` is greater than 1

Guangxu Cheng created KYLIN-4819:
------------------------------------

             Summary: build cube failed when `kylin.metadata.hbase-client-retries-number` is greater than 1
                 Key: KYLIN-4819
                 URL: https://issues.apache.org/jira/browse/KYLIN-4819
             Project: Kylin
          Issue Type: Bug
          Components: Job Engine
    Affects Versions: v3.1.1
            Reporter: Guangxu Cheng
            Assignee: Guangxu Cheng


{code:bash}
2020-11-11 07:31:49,187 TRACE [Scheduler 2133794029 Job 70c242ce-6756-f77a-4b79-6b75c6ecd884-22265] hbase.HBaseResourceStore:334 : Update row /execute_output/70c242ce-6756-f77a-4b79-6b75c6ecd884-10 from oldTs: 1605051060239, to newTs: 1605051080210, operation result: false
2020-11-11 07:31:49,196 ERROR [Scheduler 2133794029 Job 70c242ce-6756-f77a-4b79-6b75c6ecd884-22265] common.MapReduceExecutable:212 : error execute MapReduceExecutable{id=70c242ce-6756-f77a-4b79-6b75c6ecd884-10, name=Build N-Dimension Cuboid : level 5, state=RUNNING}
org.apache.kylin.common.persistence.WriteConflictException: Overwriting conflict /execute_output/70c242ce-6756-f77a-4b79-6b75c6ecd884-10, expect old TS 1605051060239, but it is 1605051080210
 at org.apache.kylin.storage.hbase.HBaseResourceStore.checkAndPutResourceImpl(HBaseResourceStore.java:337)
 at org.apache.kylin.common.persistence.ResourceStore$6.call(ResourceStore.java:443)
 at org.apache.kylin.common.persistence.ResourceStore$6.call(ResourceStore.java:440)
 at org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
 at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceWithRetry(ResourceStore.java:440)
 at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceCheckpoint(ResourceStore.java:428)
 at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResource(ResourceStore.java:422)
 at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResource(ResourceStore.java:402)
 at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResource(ResourceStore.java:381)
 at org.apache.kylin.job.dao.ExecutableDao.writeJobOutputResource(ExecutableDao.java:252)
 at org.apache.kylin.job.dao.ExecutableDao.updateJobOutput(ExecutableDao.java:426)
 at org.apache.kylin.job.execution.ExecutableManager.addJobInfo(ExecutableManager.java:570)
 at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:177)
 at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:191)
 at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
 at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:191)
 at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
{code}
When the HBase cluster has performance problems or regions are being moved, Kylin may fail to access HBase. Many of these failures can be recovered by retrying, so I suggest setting the default number of retries to 3 (see [KYLIN-4711|https://issues.apache.org/jira/browse/KYLIN-4711]).

However, once retries are enabled, a WriteConflictException appears in some scenarios; it is caused by the checkAndPut operation.
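
One plausible reading of the log above, offered as an assumption and not confirmed in this report: the timestamp found in the row already equals newTs, which is what would happen if an earlier checkAndPut attempt was applied on the server but its response never reached the client, so the retried attempt fails the old-timestamp check. The sketch below simulates that sequence with an in-memory map. FakeStore, checkAndPutWithRetry and lostAcks are hypothetical names for illustration only; they are not Kylin or HBase APIs.

```java
import java.util.HashMap;
import java.util.Map;

class CheckAndPutRetryDemo {

    /** Hypothetical in-memory stand-in for a store with compare-and-set on a row timestamp. */
    static class FakeStore {
        private final Map<String, Long> timestamps = new HashMap<>();

        void seed(String path, long ts) {
            timestamps.put(path, ts);
        }

        long currentTs(String path) {
            return timestamps.get(path);
        }

        /** Writes newTs only if the stored timestamp still equals oldTs. */
        boolean checkAndPut(String path, long oldTs, long newTs) {
            Long current = timestamps.get(path);
            if (current == null || current != oldTs) {
                return false;
            }
            timestamps.put(path, newTs);
            return true;
        }
    }

    /**
     * Simulated retrying client (assumption, not the real client logic):
     * the first lostAcks attempts reach the store, but their response is
     * treated as lost, so the same checkAndPut is sent again.
     */
    static boolean checkAndPutWithRetry(FakeStore store, String path,
                                        long oldTs, long newTs,
                                        int maxAttempts, int lostAcks) {
        boolean result = false;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            result = store.checkAndPut(path, oldTs, newTs);
            if (attempt < lostAcks) {
                continue; // response lost: the client retries the same operation
            }
            return result;
        }
        return result;
    }

    public static void main(String[] args) {
        String path = "/execute_output/demo";
        long oldTs = 1605051060239L;
        long newTs = 1605051080210L;

        // Single attempt, no lost response: the update succeeds.
        FakeStore single = new FakeStore();
        single.seed(path, oldTs);
        System.out.println(checkAndPutWithRetry(single, path, oldTs, newTs, 1, 0));

        // Three attempts, first response lost: the retry sees newTs and fails,
        // even though the row already holds the value the caller wanted.
        FakeStore retried = new FakeStore();
        retried.seed(path, oldTs);
        System.out.println(checkAndPutWithRetry(retried, path, oldTs, newTs, 3, 1));
        System.out.println(retried.currentTs(path) == newTs);
    }
}
```

In this simulation the retried call reports failure while the row already carries newTs, which matches the shape of the "expect old TS 1605051060239, but it is 1605051080210" message. Whether the real failure follows this exact path would need to be checked against the HBase client and HBaseResourceStore code.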



--
This message was sent by Atlassian Jira
(v8.3.4#803005)