Posted to issues@kylin.apache.org by "hujiahua (Jira)" <ji...@apache.org> on 2021/11/17 02:28:00 UTC
[jira] [Updated] (KYLIN-5128) The job of resizing global dict bucket sometimes runs for a long time
[ https://issues.apache.org/jira/browse/KYLIN-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hujiahua updated KYLIN-5128:
----------------------------
Description:
I often encounter cube building jobs that run for a long time in the global dict resizing stage. After analyzing the Spark stages, I found the cause was too little task concurrency.
!image-2021-11-17-10-03-26-943.png!
I also found that Kylin uses sparkSession.createDataset to build the dict bucket dataset, which means the slice count is `sparkContext.defaultParallelism`. When Spark executor dynamic allocation is enabled (set spark.dynamicAllocation.enabled = true), sparkContext.defaultParallelism changes at runtime and can yield a small parallelism value.
!image-2021-11-17-10-12-46-187.png!
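A minimal sketch of the idea behind a fix, in plain Java: instead of letting the slice count fall to whatever `sparkContext.defaultParallelism` happens to be when executors have been released, clamp it to at least the number of dict buckets. The helper name `explicitSlices` is hypothetical (not Kylin's actual method); the real change in Kylin might instead pass an explicit slice count to the dataset creation or repartition afterwards.

```java
public class DictPartitioner {

    /**
     * Pick an explicit slice count for building the dict bucket dataset,
     * rather than relying on sparkContext.defaultParallelism, which can
     * shrink at runtime under dynamic allocation.
     *
     * bucketCount        number of global-dict buckets to resize
     * defaultParallelism the current (possibly shrunken) default parallelism
     */
    static int explicitSlices(int bucketCount, int defaultParallelism) {
        // Never go below the bucket count, even if dynamic allocation has
        // released executors and defaultParallelism dropped to a tiny value.
        return Math.max(bucketCount, defaultParallelism);
    }

    public static void main(String[] args) {
        // Dynamic allocation shrank defaultParallelism to 2, but 500 buckets
        // still get 500 tasks instead of 2.
        System.out.println(explicitSlices(500, 2));  // 500
        // When parallelism is already ample, keep it.
        System.out.println(explicitSlices(10, 64));  // 64
    }
}
```

With a clamped slice count like this, the resize stage's concurrency is tied to the dict bucket count rather than to the momentary executor pool size.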
was:
I often encounter cube building job running for a long time in the global dict resizing process stage. After spark stage analysis, I found that it was caused by too little concurrency of the task.
!image-2021-11-17-10-03-26-943.png!
And I also found kylin using sparkSession.createDataset to build dict bucket dataset, where mean the parallelize size was `sparkContext.defaultParallelism`. When enable spark executor dynamic allocation (spark.dynamicAllocation.enabled) ,sparkContext.defaultParallelism will change during runtime, and have a chance to get a small parallelism value.
!image-2021-11-17-10-12-46-187.png!
> The job of resizing global dict bucket sometimes runs for a long time
> ---------------------------------------------------------------------
>
> Key: KYLIN-5128
> URL: https://issues.apache.org/jira/browse/KYLIN-5128
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v4.0.0
> Reporter: hujiahua
> Priority: Major
> Attachments: image-2021-11-17-10-03-26-943.png, image-2021-11-17-10-12-46-187.png
>
>
> I often encounter cube building jobs that run for a long time in the global dict resizing stage. After analyzing the Spark stages, I found the cause was too little task concurrency.
> !image-2021-11-17-10-03-26-943.png!
> I also found that Kylin uses sparkSession.createDataset to build the dict bucket dataset, which means the slice count is `sparkContext.defaultParallelism`. When Spark executor dynamic allocation is enabled (set spark.dynamicAllocation.enabled = true), sparkContext.defaultParallelism changes at runtime and can yield a small parallelism value.
> !image-2021-11-17-10-12-46-187.png!
--
This message was sent by Atlassian Jira
(v8.20.1#820001)