You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@kylin.apache.org by liang <pw...@gmail.com> on 2019/12/10 09:43:01 UTC

Is refresh segments currently supported?

Hi there,
  We noticed a segment refresh job at ERROR status at "*2019-12-10 02:27:15*
". After a deep dive into kylin.log, we found that the
WriteConflictException was raised when trying to update the last modified
time for the dict.

(Sensitive content are consered)

2019-12-10 02:27:15,576 ERROR [Scheduler 1186524190 Job
> f76fb12e-477f-b404-e021-adaf520d04bb-321] common.HadoopShellExecutable:65 :
> error execute
> HadoopShellExecutable{id=f76fb12e-477f-b404-e021-adaf520d04bb-03,
> name=Build Dimension Dictionary, state=RUNNING}
> org.apache.kylin.common.persistence.WriteConflictException: Overwriting
> conflict
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict,
> expect old TS 1575916011421, but it is 1575916035573
> at
> org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
> at
> org.apache.kylin.common.persistence.ResourceStore.lambda$updateTimestampWithRetry$4(ResourceStore.java:443)
> at
> org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
> at
> org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:442)
> at
> org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:437)
> at
> org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:432)
> at
> org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
> at
> org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
> at
> org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:339)
> at
> org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1145)
> at org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1107)
> at
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:100)
> at
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
> at
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
> at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
> at
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
> at
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
> at
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>

We have a glance at the codebase (Kylin-2.6.4), we found the last modified
time of the dict (which has been built) will be updated at the step "Build
Dimension Dictionary". The log blow shows the same dict is processed many
times.

Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep "has already been built,
> save it"|grep "THIS_IS_COLUMN_NAME"
> 2019-12-10 02:07:10,103 DEBUG [Scheduler 1186524190 Job
> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:11,036 DEBUG [Scheduler 1186524190 Job
> 593979b8-e80b-590f-335b-526c2fc49080-295] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:20,455 DEBUG [Scheduler 1186524190 Job
> 160952f3-fef2-4fdf-c08a-77d49f830805-246] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:20,888 DEBUG [Scheduler 1186524190 Job
> 69304c1c-cc02-3bbe-67d4-efdd3858e88a-234] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:21,634 DEBUG [Scheduler 1186524190 Job
> 345b688c-7992-f2cf-ae06-835675f545ea-117] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:23,040 DEBUG [Scheduler 1186524190 Job
> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:36,358 DEBUG [Scheduler 1186524190 Job
> 6daf351f-5073-128f-8754-f499e345db54-308] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:26:43,716 DEBUG [Scheduler 1186524190 Job
> c90a765c-976f-49c2-38d8-e6812d091cfb-295] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:26:44,944 DEBUG [Scheduler 1186524190 Job
> cd108004-12c6-078f-8646-f29841de9e40-205] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:26:51,306 DEBUG [Scheduler 1186524190 Job
> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:27:15,523 DEBUG [Scheduler 1186524190 Job
> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:27:15,535 DEBUG [Scheduler 1186524190 Job
> f76fb12e-477f-b404-e021-adaf520d04bb-321] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:29:42,504 DEBUG [Scheduler 1186524190 Job
> f894f427-b170-19b6-76fb-e1b671659771-321] cli.DictionaryGeneratorCLI:99 :
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>

Since we issue the refresh segments request concurrently (with REST APIs),
the dict has a chance to be modified concurrently. Kylin uses checkAndPut
to update HBase to prevent race-condition, so the WriteConflictException
may be raised when updating the same resource.

Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict|grep
> "Update dictionary "
> 2019-12-10 02:07:10,223 INFO  [Scheduler 1186524190 Job
> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] dict.DictionaryManager:198 :
> Update dictionary
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
> lastModifiedTime to 1575914830203
> 2019-12-10 02:07:11,109 INFO  [Scheduler 1186524190 Job
> 593979b8-e80b-590f-335b-526c2fc49080-295] dict.DictionaryManager:198 :
> Update dictionary
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
> lastModifiedTime to 1575914831090
> 2019-12-10 02:07:20,554 INFO  [Scheduler 1186524190 Job
> 160952f3-fef2-4fdf-c08a-77d49f830805-246] dict.DictionaryManager:198 :
> Update dictionary
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
> lastModifiedTime to 1575914840528
> 2019-12-10 02:07:21,762 INFO  [Scheduler 1186524190 Job
> 345b688c-7992-f2cf-ae06-835675f545ea-117] dict.DictionaryManager:198 :
> Update dictionary
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
> lastModifiedTime to 1575914841727
> 2019-12-10 02:07:23,253 INFO  [Scheduler 1186524190 Job
> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] dict.DictionaryManager:198 :
> Update dictionary
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
> lastModifiedTime to 1575914843095
> 2019-12-10 02:26:43,843 INFO  [Scheduler 1186524190 Job
> c90a765c-976f-49c2-38d8-e6812d091cfb-295] dict.DictionaryManager:198 :
> Update dictionary
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
> lastModifiedTime to 1575916003771
> 2019-12-10 02:26:51,436 INFO  [Scheduler 1186524190 Job
> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] dict.DictionaryManager:198 :
> Update dictionary
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
> lastModifiedTime to 1575916011421
> 2019-12-10 02:27:15,593 INFO  [Scheduler 1186524190 Job
> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] dict.DictionaryManager:198 :
> Update dictionary
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
> lastModifiedTime to 1575916035573
> 2019-12-10 02:29:42,564 INFO  [Scheduler 1186524190 Job
> f894f427-b170-19b6-76fb-e1b671659771-321] dict.DictionaryManager:198 :
> Update dictionary
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
> lastModifiedTime to 1575916182549
>

The log above shown that the job "f76fb12e-477f-b404-e021-adaf520d04bb" has
not been print out the log - "Update dictionary blablabla...." because of
the WriteConflictException.

The question is: refreshing segments concurrently is supported for Kylin
2.6.4? Concurrently refreshing is very important for us, since we want to
reduce the building time.

Regards,
Liang

Re: Is refresh segments currently supported?

Posted by liang <pw...@gmail.com>.
Hi ShaoFeng,

I've created an issue: https://issues.apache.org/jira/browse/KYLIN-4291. I
hope this issue will be resolved soon.

Thanks,
Liang

On Tue, Dec 10, 2019 at 6:55 PM ShaoFeng Shi <sh...@apache.org> wrote:

> Awesome, thank you!
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofengshi@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscribe@kylin.apache.org
> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>
>
>
>
> liang <pw...@gmail.com> 于2019年12月10日周二 下午6:23写道:
>
>> Hi ShaoFeng,
>>
>> I'll create an issue in JIRA later. Thanks for your reply.
>>
>> Regards,
>> Liang
>>
>> On Tue, Dec 10, 2019 at 6:17 PM ShaoFeng Shi <sh...@apache.org>
>> wrote:
>>
>>> Yes, Kylin supports in parallel segment building. It seems to be a bug
>>> introduced in KYLIN-3977 in Kylin 2.6.3. Would you like to create it as a
>>> JIRA so that you can get notified when there is a hot-fix? The JIRA can be
>>> reported in https://issues.apache.org/jira/secure/Dashboard.jspa , and
>>> selecting KYLIN as the project. Thank you!
>>>
>>>
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>> Apache Kylin PMC
>>> Email: shaofengshi@apache.org
>>>
>>> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>>> Join Kylin user mail group: user-subscribe@kylin.apache.org
>>> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>>>
>>>
>>>
>>>
>>> liang <pw...@gmail.com> 于2019年12月10日周二 下午5:43写道:
>>>
>>>> Hi there,
>>>>   We noticed a segment refresh job at ERROR status at "*2019-12-10
>>>> 02:27:15*". After a deep dive into kylin.log, we found that the
>>>> WriteConflictException was raised when trying to update the last modified
>>>> time for the dict.
>>>>
>>>> (Sensitive content are consered)
>>>>
>>>> 2019-12-10 02:27:15,576 ERROR [Scheduler 1186524190 Job
>>>>> f76fb12e-477f-b404-e021-adaf520d04bb-321] common.HadoopShellExecutable:65 :
>>>>> error execute
>>>>> HadoopShellExecutable{id=f76fb12e-477f-b404-e021-adaf520d04bb-03,
>>>>> name=Build Dimension Dictionary, state=RUNNING}
>>>>> org.apache.kylin.common.persistence.WriteConflictException:
>>>>> Overwriting conflict
>>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict,
>>>>> expect old TS 1575916011421, but it is 1575916035573
>>>>> at
>>>>> org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
>>>>> at
>>>>> org.apache.kylin.common.persistence.ResourceStore.lambda$updateTimestampWithRetry$4(ResourceStore.java:443)
>>>>> at
>>>>> org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>>>>> at
>>>>> org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:442)
>>>>> at
>>>>> org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:437)
>>>>> at
>>>>> org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:432)
>>>>> at
>>>>> org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
>>>>> at
>>>>> org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
>>>>> at
>>>>> org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:339)
>>>>> at
>>>>> org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1145)
>>>>> at
>>>>> org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1107)
>>>>> at
>>>>> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:100)
>>>>> at
>>>>> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
>>>>> at
>>>>> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
>>>>> at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
>>>>> at
>>>>> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>>>>> at
>>>>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>>>>> at
>>>>> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>>>>> at
>>>>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>>>>> at
>>>>> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>
>>>> We have a glance at the codebase (Kylin-2.6.4), we found the last
>>>> modified time of the dict (which has been built) will be updated at the
>>>> step "Build Dimension Dictionary". The log blow shows the same dict is
>>>> processed many times.
>>>>
>>>> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep "has already been
>>>>> built, save it"|grep "THIS_IS_COLUMN_NAME"
>>>>> 2019-12-10 02:07:10,103 DEBUG [Scheduler 1186524190 Job
>>>>> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:07:11,036 DEBUG [Scheduler 1186524190 Job
>>>>> 593979b8-e80b-590f-335b-526c2fc49080-295] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:07:20,455 DEBUG [Scheduler 1186524190 Job
>>>>> 160952f3-fef2-4fdf-c08a-77d49f830805-246] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:07:20,888 DEBUG [Scheduler 1186524190 Job
>>>>> 69304c1c-cc02-3bbe-67d4-efdd3858e88a-234] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:07:21,634 DEBUG [Scheduler 1186524190 Job
>>>>> 345b688c-7992-f2cf-ae06-835675f545ea-117] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:07:23,040 DEBUG [Scheduler 1186524190 Job
>>>>> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:07:36,358 DEBUG [Scheduler 1186524190 Job
>>>>> 6daf351f-5073-128f-8754-f499e345db54-308] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:26:43,716 DEBUG [Scheduler 1186524190 Job
>>>>> c90a765c-976f-49c2-38d8-e6812d091cfb-295] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:26:44,944 DEBUG [Scheduler 1186524190 Job
>>>>> cd108004-12c6-078f-8646-f29841de9e40-205] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:26:51,306 DEBUG [Scheduler 1186524190 Job
>>>>> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:27:15,523 DEBUG [Scheduler 1186524190 Job
>>>>> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:27:15,535 DEBUG [Scheduler 1186524190 Job
>>>>> f76fb12e-477f-b404-e021-adaf520d04bb-321] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>> 2019-12-10 02:29:42,504 DEBUG [Scheduler 1186524190 Job
>>>>> f894f427-b170-19b6-76fb-e1b671659771-321] cli.DictionaryGeneratorCLI:99 :
>>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>>
>>>>
>>>> Since we issue the refresh segments request concurrently (with REST
>>>> APIs), the dict has a chance to be modified concurrently. Kylin uses checkAndPut
>>>> to update HBase to prevent race-condition, so the
>>>> WriteConflictException may be raised when updating the same resource.
>>>>
>>>> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep
>>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict|grep
>>>>> "Update dictionary "
>>>>> 2019-12-10 02:07:10,223 INFO  [Scheduler 1186524190 Job
>>>>> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] dict.DictionaryManager:198 :
>>>>> Update dictionary
>>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>>> lastModifiedTime to 1575914830203
>>>>> 2019-12-10 02:07:11,109 INFO  [Scheduler 1186524190 Job
>>>>> 593979b8-e80b-590f-335b-526c2fc49080-295] dict.DictionaryManager:198 :
>>>>> Update dictionary
>>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>>> lastModifiedTime to 1575914831090
>>>>> 2019-12-10 02:07:20,554 INFO  [Scheduler 1186524190 Job
>>>>> 160952f3-fef2-4fdf-c08a-77d49f830805-246] dict.DictionaryManager:198 :
>>>>> Update dictionary
>>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>>> lastModifiedTime to 1575914840528
>>>>> 2019-12-10 02:07:21,762 INFO  [Scheduler 1186524190 Job
>>>>> 345b688c-7992-f2cf-ae06-835675f545ea-117] dict.DictionaryManager:198 :
>>>>> Update dictionary
>>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>>> lastModifiedTime to 1575914841727
>>>>> 2019-12-10 02:07:23,253 INFO  [Scheduler 1186524190 Job
>>>>> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] dict.DictionaryManager:198 :
>>>>> Update dictionary
>>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>>> lastModifiedTime to 1575914843095
>>>>> 2019-12-10 02:26:43,843 INFO  [Scheduler 1186524190 Job
>>>>> c90a765c-976f-49c2-38d8-e6812d091cfb-295] dict.DictionaryManager:198 :
>>>>> Update dictionary
>>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>>> lastModifiedTime to 1575916003771
>>>>> 2019-12-10 02:26:51,436 INFO  [Scheduler 1186524190 Job
>>>>> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] dict.DictionaryManager:198 :
>>>>> Update dictionary
>>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>>> lastModifiedTime to 1575916011421
>>>>> 2019-12-10 02:27:15,593 INFO  [Scheduler 1186524190 Job
>>>>> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] dict.DictionaryManager:198 :
>>>>> Update dictionary
>>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>>> lastModifiedTime to 1575916035573
>>>>> 2019-12-10 02:29:42,564 INFO  [Scheduler 1186524190 Job
>>>>> f894f427-b170-19b6-76fb-e1b671659771-321] dict.DictionaryManager:198 :
>>>>> Update dictionary
>>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>>> lastModifiedTime to 1575916182549
>>>>>
>>>>
>>>> The log above shown that the job "f76fb12e-477f-b404-e021-adaf520d04bb"
>>>> has not been print out the log - "Update dictionary blablabla...."
>>>> because of the WriteConflictException.
>>>>
>>>> The question is: refreshing segments concurrently is supported for
>>>> Kylin 2.6.4? Concurrently refreshing is very important for us, since we
>>>> want to reduce the building time.
>>>>
>>>> Regards,
>>>> Liang
>>>>
>>>

Re: Is refresh segments currently supported?

Posted by ShaoFeng Shi <sh...@apache.org>.
Awesome, thank you!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




liang <pw...@gmail.com> 于2019年12月10日周二 下午6:23写道:

> Hi ShaoFeng,
>
> I'll create an issue in JIRA later. Thanks for your reply.
>
> Regards,
> Liang
>
> On Tue, Dec 10, 2019 at 6:17 PM ShaoFeng Shi <sh...@apache.org>
> wrote:
>
>> Yes, Kylin supports in parallel segment building. It seems to be a bug
>> introduced in KYLIN-3977 in Kylin 2.6.3. Would you like to create it as a
>> JIRA so that you can get notified when there is a hot-fix? The JIRA can be
>> reported in https://issues.apache.org/jira/secure/Dashboard.jspa , and
>> selecting KYLIN as the project. Thank you!
>>
>>
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>> Apache Kylin PMC
>> Email: shaofengshi@apache.org
>>
>> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>> Join Kylin user mail group: user-subscribe@kylin.apache.org
>> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>>
>>
>>
>>
>> liang <pw...@gmail.com> 于2019年12月10日周二 下午5:43写道:
>>
>>> Hi there,
>>>   We noticed a segment refresh job at ERROR status at "*2019-12-10
>>> 02:27:15*". After a deep dive into kylin.log, we found that the
>>> WriteConflictException was raised when trying to update the last modified
>>> time for the dict.
>>>
>>> (Sensitive content are consered)
>>>
>>> 2019-12-10 02:27:15,576 ERROR [Scheduler 1186524190 Job
>>>> f76fb12e-477f-b404-e021-adaf520d04bb-321] common.HadoopShellExecutable:65 :
>>>> error execute
>>>> HadoopShellExecutable{id=f76fb12e-477f-b404-e021-adaf520d04bb-03,
>>>> name=Build Dimension Dictionary, state=RUNNING}
>>>> org.apache.kylin.common.persistence.WriteConflictException: Overwriting
>>>> conflict
>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict,
>>>> expect old TS 1575916011421, but it is 1575916035573
>>>> at
>>>> org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
>>>> at
>>>> org.apache.kylin.common.persistence.ResourceStore.lambda$updateTimestampWithRetry$4(ResourceStore.java:443)
>>>> at
>>>> org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>>>> at
>>>> org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:442)
>>>> at
>>>> org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:437)
>>>> at
>>>> org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:432)
>>>> at
>>>> org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
>>>> at
>>>> org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
>>>> at
>>>> org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:339)
>>>> at
>>>> org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1145)
>>>> at
>>>> org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1107)
>>>> at
>>>> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:100)
>>>> at
>>>> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
>>>> at
>>>> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
>>>> at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
>>>> at
>>>> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>>>> at
>>>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>>>> at
>>>> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>>>> at
>>>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>>>> at
>>>> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>>
>>>
>>> We have a glance at the codebase (Kylin-2.6.4), we found the last
>>> modified time of the dict (which has been built) will be updated at the
>>> step "Build Dimension Dictionary". The log blow shows the same dict is
>>> processed many times.
>>>
>>> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep "has already been
>>>> built, save it"|grep "THIS_IS_COLUMN_NAME"
>>>> 2019-12-10 02:07:10,103 DEBUG [Scheduler 1186524190 Job
>>>> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:07:11,036 DEBUG [Scheduler 1186524190 Job
>>>> 593979b8-e80b-590f-335b-526c2fc49080-295] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:07:20,455 DEBUG [Scheduler 1186524190 Job
>>>> 160952f3-fef2-4fdf-c08a-77d49f830805-246] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:07:20,888 DEBUG [Scheduler 1186524190 Job
>>>> 69304c1c-cc02-3bbe-67d4-efdd3858e88a-234] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:07:21,634 DEBUG [Scheduler 1186524190 Job
>>>> 345b688c-7992-f2cf-ae06-835675f545ea-117] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:07:23,040 DEBUG [Scheduler 1186524190 Job
>>>> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:07:36,358 DEBUG [Scheduler 1186524190 Job
>>>> 6daf351f-5073-128f-8754-f499e345db54-308] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:26:43,716 DEBUG [Scheduler 1186524190 Job
>>>> c90a765c-976f-49c2-38d8-e6812d091cfb-295] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:26:44,944 DEBUG [Scheduler 1186524190 Job
>>>> cd108004-12c6-078f-8646-f29841de9e40-205] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:26:51,306 DEBUG [Scheduler 1186524190 Job
>>>> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:27:15,523 DEBUG [Scheduler 1186524190 Job
>>>> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:27:15,535 DEBUG [Scheduler 1186524190 Job
>>>> f76fb12e-477f-b404-e021-adaf520d04bb-321] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>> 2019-12-10 02:29:42,504 DEBUG [Scheduler 1186524190 Job
>>>> f894f427-b170-19b6-76fb-e1b671659771-321] cli.DictionaryGeneratorCLI:99 :
>>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>>
>>>
>>> Since we issue the refresh segments request concurrently (with REST
>>> APIs), the dict has a chance to be modified concurrently. Kylin uses checkAndPut
>>> to update HBase to prevent race-condition, so the
>>> WriteConflictException may be raised when updating the same resource.
>>>
>>> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep
>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict|grep
>>>> "Update dictionary "
>>>> 2019-12-10 02:07:10,223 INFO  [Scheduler 1186524190 Job
>>>> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] dict.DictionaryManager:198 :
>>>> Update dictionary
>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>> lastModifiedTime to 1575914830203
>>>> 2019-12-10 02:07:11,109 INFO  [Scheduler 1186524190 Job
>>>> 593979b8-e80b-590f-335b-526c2fc49080-295] dict.DictionaryManager:198 :
>>>> Update dictionary
>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>> lastModifiedTime to 1575914831090
>>>> 2019-12-10 02:07:20,554 INFO  [Scheduler 1186524190 Job
>>>> 160952f3-fef2-4fdf-c08a-77d49f830805-246] dict.DictionaryManager:198 :
>>>> Update dictionary
>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>> lastModifiedTime to 1575914840528
>>>> 2019-12-10 02:07:21,762 INFO  [Scheduler 1186524190 Job
>>>> 345b688c-7992-f2cf-ae06-835675f545ea-117] dict.DictionaryManager:198 :
>>>> Update dictionary
>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>> lastModifiedTime to 1575914841727
>>>> 2019-12-10 02:07:23,253 INFO  [Scheduler 1186524190 Job
>>>> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] dict.DictionaryManager:198 :
>>>> Update dictionary
>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>> lastModifiedTime to 1575914843095
>>>> 2019-12-10 02:26:43,843 INFO  [Scheduler 1186524190 Job
>>>> c90a765c-976f-49c2-38d8-e6812d091cfb-295] dict.DictionaryManager:198 :
>>>> Update dictionary
>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>> lastModifiedTime to 1575916003771
>>>> 2019-12-10 02:26:51,436 INFO  [Scheduler 1186524190 Job
>>>> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] dict.DictionaryManager:198 :
>>>> Update dictionary
>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>> lastModifiedTime to 1575916011421
>>>> 2019-12-10 02:27:15,593 INFO  [Scheduler 1186524190 Job
>>>> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] dict.DictionaryManager:198 :
>>>> Update dictionary
>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>> lastModifiedTime to 1575916035573
>>>> 2019-12-10 02:29:42,564 INFO  [Scheduler 1186524190 Job
>>>> f894f427-b170-19b6-76fb-e1b671659771-321] dict.DictionaryManager:198 :
>>>> Update dictionary
>>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>>> lastModifiedTime to 1575916182549
>>>>
>>>
>>> The log above shown that the job "f76fb12e-477f-b404-e021-adaf520d04bb"
>>> has not been print out the log - "Update dictionary blablabla...."
>>> because of the WriteConflictException.
>>>
>>> The question is: refreshing segments concurrently is supported for Kylin
>>> 2.6.4? Concurrently refreshing is very important for us, since we want to
>>> reduce the building time.
>>>
>>> Regards,
>>> Liang
>>>
>>

Re: Is refresh segments currently supported?

Posted by liang <pw...@gmail.com>.
Hi ShaoFeng,

I'll create an issue in JIRA later. Thanks for your reply.

Regards,
Liang

On Tue, Dec 10, 2019 at 6:17 PM ShaoFeng Shi <sh...@apache.org> wrote:

> Yes, Kylin supports in parallel segment building. It seems to be a bug
> introduced in KYLIN-3977 in Kylin 2.6.3. Would you like to create it as a
> JIRA so that you can get notified when there is a hot-fix? The JIRA can be
> reported in https://issues.apache.org/jira/secure/Dashboard.jspa , and
> selecting KYLIN as the project. Thank you!
>
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofengshi@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscribe@kylin.apache.org
> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>
>
>
>
> liang <pw...@gmail.com> 于2019年12月10日周二 下午5:43写道:
>
>> Hi there,
>>   We noticed a segment refresh job at ERROR status at "*2019-12-10
>> 02:27:15*". After a deep dive into kylin.log, we found that the
>> WriteConflictException was raised when trying to update the last modified
>> time for the dict.
>>
>> (Sensitive content are consered)
>>
>> 2019-12-10 02:27:15,576 ERROR [Scheduler 1186524190 Job
>>> f76fb12e-477f-b404-e021-adaf520d04bb-321] common.HadoopShellExecutable:65 :
>>> error execute
>>> HadoopShellExecutable{id=f76fb12e-477f-b404-e021-adaf520d04bb-03,
>>> name=Build Dimension Dictionary, state=RUNNING}
>>> org.apache.kylin.common.persistence.WriteConflictException: Overwriting
>>> conflict
>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict,
>>> expect old TS 1575916011421, but it is 1575916035573
>>> at
>>> org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
>>> at
>>> org.apache.kylin.common.persistence.ResourceStore.lambda$updateTimestampWithRetry$4(ResourceStore.java:443)
>>> at
>>> org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>>> at
>>> org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:442)
>>> at
>>> org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:437)
>>> at
>>> org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:432)
>>> at
>>> org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
>>> at
>>> org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
>>> at
>>> org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:339)
>>> at
>>> org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1145)
>>> at
>>> org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1107)
>>> at
>>> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:100)
>>> at
>>> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
>>> at
>>> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
>>> at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
>>> at
>>> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>>> at
>>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>>> at
>>> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>>> at
>>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>>> at
>>> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>
>> We have a glance at the codebase (Kylin-2.6.4), we found the last
>> modified time of the dict (which has been built) will be updated at the
>> step "Build Dimension Dictionary". The log blow shows the same dict is
>> processed many times.
>>
>> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep "has already been
>>> built, save it"|grep "THIS_IS_COLUMN_NAME"
>>> 2019-12-10 02:07:10,103 DEBUG [Scheduler 1186524190 Job
>>> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:07:11,036 DEBUG [Scheduler 1186524190 Job
>>> 593979b8-e80b-590f-335b-526c2fc49080-295] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:07:20,455 DEBUG [Scheduler 1186524190 Job
>>> 160952f3-fef2-4fdf-c08a-77d49f830805-246] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:07:20,888 DEBUG [Scheduler 1186524190 Job
>>> 69304c1c-cc02-3bbe-67d4-efdd3858e88a-234] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:07:21,634 DEBUG [Scheduler 1186524190 Job
>>> 345b688c-7992-f2cf-ae06-835675f545ea-117] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:07:23,040 DEBUG [Scheduler 1186524190 Job
>>> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:07:36,358 DEBUG [Scheduler 1186524190 Job
>>> 6daf351f-5073-128f-8754-f499e345db54-308] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:26:43,716 DEBUG [Scheduler 1186524190 Job
>>> c90a765c-976f-49c2-38d8-e6812d091cfb-295] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:26:44,944 DEBUG [Scheduler 1186524190 Job
>>> cd108004-12c6-078f-8646-f29841de9e40-205] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:26:51,306 DEBUG [Scheduler 1186524190 Job
>>> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:27:15,523 DEBUG [Scheduler 1186524190 Job
>>> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:27:15,535 DEBUG [Scheduler 1186524190 Job
>>> f76fb12e-477f-b404-e021-adaf520d04bb-321] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>> 2019-12-10 02:29:42,504 DEBUG [Scheduler 1186524190 Job
>>> f894f427-b170-19b6-76fb-e1b671659771-321] cli.DictionaryGeneratorCLI:99 :
>>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>>
>>
>> Since we issue the refresh segments request concurrently (with REST
>> APIs), the dict has a chance to be modified concurrently. Kylin uses checkAndPut
>> to update HBase to prevent race-condition, so the WriteConflictException
>> may be raised when updating the same resource.
>>
>> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep
>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict|grep
>>> "Update dictionary "
>>> 2019-12-10 02:07:10,223 INFO  [Scheduler 1186524190 Job
>>> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] dict.DictionaryManager:198 :
>>> Update dictionary
>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>> lastModifiedTime to 1575914830203
>>> 2019-12-10 02:07:11,109 INFO  [Scheduler 1186524190 Job
>>> 593979b8-e80b-590f-335b-526c2fc49080-295] dict.DictionaryManager:198 :
>>> Update dictionary
>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>> lastModifiedTime to 1575914831090
>>> 2019-12-10 02:07:20,554 INFO  [Scheduler 1186524190 Job
>>> 160952f3-fef2-4fdf-c08a-77d49f830805-246] dict.DictionaryManager:198 :
>>> Update dictionary
>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>> lastModifiedTime to 1575914840528
>>> 2019-12-10 02:07:21,762 INFO  [Scheduler 1186524190 Job
>>> 345b688c-7992-f2cf-ae06-835675f545ea-117] dict.DictionaryManager:198 :
>>> Update dictionary
>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>> lastModifiedTime to 1575914841727
>>> 2019-12-10 02:07:23,253 INFO  [Scheduler 1186524190 Job
>>> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] dict.DictionaryManager:198 :
>>> Update dictionary
>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>> lastModifiedTime to 1575914843095
>>> 2019-12-10 02:26:43,843 INFO  [Scheduler 1186524190 Job
>>> c90a765c-976f-49c2-38d8-e6812d091cfb-295] dict.DictionaryManager:198 :
>>> Update dictionary
>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>> lastModifiedTime to 1575916003771
>>> 2019-12-10 02:26:51,436 INFO  [Scheduler 1186524190 Job
>>> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] dict.DictionaryManager:198 :
>>> Update dictionary
>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>> lastModifiedTime to 1575916011421
>>> 2019-12-10 02:27:15,593 INFO  [Scheduler 1186524190 Job
>>> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] dict.DictionaryManager:198 :
>>> Update dictionary
>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>> lastModifiedTime to 1575916035573
>>> 2019-12-10 02:29:42,564 INFO  [Scheduler 1186524190 Job
>>> f894f427-b170-19b6-76fb-e1b671659771-321] dict.DictionaryManager:198 :
>>> Update dictionary
>>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>>> lastModifiedTime to 1575916182549
>>>
>>
>> The log above shown that the job "f76fb12e-477f-b404-e021-adaf520d04bb"
>> has not been print out the log - "Update dictionary blablabla...."
>> because of the WriteConflictException.
>>
>> The question is: refreshing segments concurrently is supported for Kylin
>> 2.6.4? Concurrently refreshing is very important for us, since we want to
>> reduce the building time.
>>
>> Regards,
>> Liang
>>
>

Re: Is refresh segments currently supported?

Posted by ShaoFeng Shi <sh...@apache.org>.
Yes, Kylin supports in parallel segment building. It seems to be a bug
introduced in KYLIN-3977 in Kylin 2.6.3. Would you like to create it as a
JIRA so that you can get notified when there is a hot-fix? The JIRA can be
reported in https://issues.apache.org/jira/secure/Dashboard.jspa , and
selecting KYLIN as the project. Thank you!


Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




liang <pw...@gmail.com> 于2019年12月10日周二 下午5:43写道:

> Hi there,
>   We noticed a segment refresh job at ERROR status at "*2019-12-10
> 02:27:15*". After a deep dive into kylin.log, we found that the
> WriteConflictException was raised when trying to update the last modified
> time for the dict.
>
> (Sensitive content are consered)
>
> 2019-12-10 02:27:15,576 ERROR [Scheduler 1186524190 Job
>> f76fb12e-477f-b404-e021-adaf520d04bb-321] common.HadoopShellExecutable:65 :
>> error execute
>> HadoopShellExecutable{id=f76fb12e-477f-b404-e021-adaf520d04bb-03,
>> name=Build Dimension Dictionary, state=RUNNING}
>> org.apache.kylin.common.persistence.WriteConflictException: Overwriting
>> conflict
>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict,
>> expect old TS 1575916011421, but it is 1575916035573
>> at
>> org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
>> at
>> org.apache.kylin.common.persistence.ResourceStore.lambda$updateTimestampWithRetry$4(ResourceStore.java:443)
>> at
>> org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>> at
>> org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:442)
>> at
>> org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:437)
>> at
>> org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:432)
>> at
>> org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
>> at
>> org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
>> at
>> org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:339)
>> at
>> org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1145)
>> at org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1107)
>> at
>> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:100)
>> at
>> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
>> at
>> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
>> at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
>> at
>> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>> at
>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>> at
>> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>> at
>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>> at
>> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>
> We have a glance at the codebase (Kylin-2.6.4), we found the last modified
> time of the dict (which has been built) will be updated at the step "Build
> Dimension Dictionary". The log blow shows the same dict is processed many
> times.
>
> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep "has already been
>> built, save it"|grep "THIS_IS_COLUMN_NAME"
>> 2019-12-10 02:07:10,103 DEBUG [Scheduler 1186524190 Job
>> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:07:11,036 DEBUG [Scheduler 1186524190 Job
>> 593979b8-e80b-590f-335b-526c2fc49080-295] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:07:20,455 DEBUG [Scheduler 1186524190 Job
>> 160952f3-fef2-4fdf-c08a-77d49f830805-246] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:07:20,888 DEBUG [Scheduler 1186524190 Job
>> 69304c1c-cc02-3bbe-67d4-efdd3858e88a-234] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:07:21,634 DEBUG [Scheduler 1186524190 Job
>> 345b688c-7992-f2cf-ae06-835675f545ea-117] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:07:23,040 DEBUG [Scheduler 1186524190 Job
>> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:07:36,358 DEBUG [Scheduler 1186524190 Job
>> 6daf351f-5073-128f-8754-f499e345db54-308] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:26:43,716 DEBUG [Scheduler 1186524190 Job
>> c90a765c-976f-49c2-38d8-e6812d091cfb-295] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:26:44,944 DEBUG [Scheduler 1186524190 Job
>> cd108004-12c6-078f-8646-f29841de9e40-205] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:26:51,306 DEBUG [Scheduler 1186524190 Job
>> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:27:15,523 DEBUG [Scheduler 1186524190 Job
>> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:27:15,535 DEBUG [Scheduler 1186524190 Job
>> f76fb12e-477f-b404-e021-adaf520d04bb-321] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>> 2019-12-10 02:29:42,504 DEBUG [Scheduler 1186524190 Job
>> f894f427-b170-19b6-76fb-e1b671659771-321] cli.DictionaryGeneratorCLI:99 :
>> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
>>
>
> Since we issue the refresh segments request concurrently (with REST APIs),
> the dict has a chance to be modified concurrently. Kylin uses checkAndPut
> to update HBase to prevent race-condition, so the WriteConflictException
> may be raised when updating the same resource.
>
> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep
>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict|grep
>> "Update dictionary "
>> 2019-12-10 02:07:10,223 INFO  [Scheduler 1186524190 Job
>> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] dict.DictionaryManager:198 :
>> Update dictionary
>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>> lastModifiedTime to 1575914830203
>> 2019-12-10 02:07:11,109 INFO  [Scheduler 1186524190 Job
>> 593979b8-e80b-590f-335b-526c2fc49080-295] dict.DictionaryManager:198 :
>> Update dictionary
>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>> lastModifiedTime to 1575914831090
>> 2019-12-10 02:07:20,554 INFO  [Scheduler 1186524190 Job
>> 160952f3-fef2-4fdf-c08a-77d49f830805-246] dict.DictionaryManager:198 :
>> Update dictionary
>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>> lastModifiedTime to 1575914840528
>> 2019-12-10 02:07:21,762 INFO  [Scheduler 1186524190 Job
>> 345b688c-7992-f2cf-ae06-835675f545ea-117] dict.DictionaryManager:198 :
>> Update dictionary
>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>> lastModifiedTime to 1575914841727
>> 2019-12-10 02:07:23,253 INFO  [Scheduler 1186524190 Job
>> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] dict.DictionaryManager:198 :
>> Update dictionary
>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>> lastModifiedTime to 1575914843095
>> 2019-12-10 02:26:43,843 INFO  [Scheduler 1186524190 Job
>> c90a765c-976f-49c2-38d8-e6812d091cfb-295] dict.DictionaryManager:198 :
>> Update dictionary
>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>> lastModifiedTime to 1575916003771
>> 2019-12-10 02:26:51,436 INFO  [Scheduler 1186524190 Job
>> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] dict.DictionaryManager:198 :
>> Update dictionary
>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>> lastModifiedTime to 1575916011421
>> 2019-12-10 02:27:15,593 INFO  [Scheduler 1186524190 Job
>> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] dict.DictionaryManager:198 :
>> Update dictionary
>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>> lastModifiedTime to 1575916035573
>> 2019-12-10 02:29:42,564 INFO  [Scheduler 1186524190 Job
>> f894f427-b170-19b6-76fb-e1b671659771-321] dict.DictionaryManager:198 :
>> Update dictionary
>> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>> lastModifiedTime to 1575916182549
>>
>
> The log above shown that the job "f76fb12e-477f-b404-e021-adaf520d04bb"
> has not been print out the log - "Update dictionary blablabla...."
> because of the WriteConflictException.
>
> The question is: refreshing segments concurrently is supported for Kylin
> 2.6.4? Concurrently refreshing is very important for us, since we want to
> reduce the building time.
>
> Regards,
> Liang
>