Posted to issues@flink.apache.org by "Zhu Zhu (Jira)" <ji...@apache.org> on 2020/08/10 01:45:00 UTC

[jira] [Comment Edited] (FLINK-16069) Creation of TaskDeploymentDescriptor can block main thread for long time

    [ https://issues.apache.org/jira/browse/FLINK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173885#comment-17173885 ] 

Zhu Zhu edited comment on FLINK-16069 at 8/10/20, 1:44 AM:
-----------------------------------------------------------

Hi Yumeng, I think we have not reached a consensus. I once did a PoC of the idea to "cache generated ShuffleDescriptors for ALL-to-ALL edges for reuse", but there was no further progress because other work took priority.
I'd like to understand whether this has become a blocking problem in your production use. If it has, we can resume this discussion and prioritize this improvement.
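
To make the caching idea concrete, here is a minimal sketch, not the PoC itself. It assumes that on an ALL-to-ALL edge every consumer subtask reads every producer partition, so the descriptor list depends only on the intermediate result and can be built once and shared. All names (ShuffleDescriptorCacheSketch, Descriptor, descriptorsFor) are hypothetical stand-ins, not Flink classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; none of these types are Flink classes. */
final class ShuffleDescriptorCacheSketch {

    /** Stand-in for a shuffle descriptor: identifies one producer partition. */
    static final class Descriptor {
        final String resultId;
        final int partitionIndex;

        Descriptor(String resultId, int partitionIndex) {
            this.resultId = resultId;
            this.partitionIndex = partitionIndex;
        }
    }

    // One shared, read-only descriptor list per intermediate result.
    private final Map<String, List<Descriptor>> cache = new ConcurrentHashMap<>();

    // Counts how many times a list was actually built (for illustration only).
    final AtomicInteger builds = new AtomicInteger();

    /** Returns the cached list for the result, building it on first request. */
    List<Descriptor> descriptorsFor(String resultId, int producerParallelism) {
        return cache.computeIfAbsent(resultId, id -> {
            builds.incrementAndGet();
            List<Descriptor> list = new ArrayList<>(producerParallelism);
            for (int i = 0; i < producerParallelism; i++) {
                list.add(new Descriptor(id, i));
            }
            return Collections.unmodifiableList(list);
        });
    }

    public static void main(String[] args) {
        ShuffleDescriptorCacheSketch sketch = new ShuffleDescriptorCacheSketch();
        int parallelism = 8000;
        List<Descriptor> first = sketch.descriptorsFor("source-result", parallelism);
        // Every further consumer subtask gets the same list instance.
        for (int consumer = 1; consumer < parallelism; consumer++) {
            if (sketch.descriptorsFor("source-result", parallelism) != first) {
                throw new AssertionError("expected the cached list to be reused");
            }
        }
        System.out.println("lists built: " + sketch.builds.get()); // prints "lists built: 1"
    }
}
```

With 8000 producers and 8000 consumers, building the list per consumer would create 8000 x 8000 = 64 million descriptor objects, while the shared list needs 8000. The sketch ignores invalidation: a cached entry would have to be dropped whenever a producer partition is redeployed.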


was (Author: zhuzh):
Hi Yumeng, I think we have not reached a consensus. I once did a PoC of the idea to "cache generated ShuffleDescriptors for ALL-to-ALL edges for reuse", but there was no further progress because other work took priority.
I'd like to understand whether this has become a serious and urgent problem for you. If it has, we can resume this discussion.

> Creation of TaskDeploymentDescriptor can block main thread for long time
> ------------------------------------------------------------------------
>
>                 Key: FLINK-16069
>                 URL: https://issues.apache.org/jira/browse/FLINK-16069
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: huweihua
>            Priority: Major
>
> Deploying tasks takes a long time when we submit a high-parallelism job. Because Execution#deploy runs in the main thread, it blocks the JobMaster from processing other Akka messages, such as heartbeats. Creating the TaskDeploymentDescriptor takes most of that time. We could move the creation into a future.
> For example, for a job [source(8000)->sink(8000)], moving all 16000 tasks from SCHEDULED to DEPLOYING took more than 1 minute. This caused the TaskManager heartbeats to time out, and the job never succeeded.
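
The suggestion in the description to build the descriptor in a future could look roughly like the sketch below. It is only an illustration of the pattern, not Flink's code: buildDescriptor, deployAll, and both executors are hypothetical, and the single-threaded executor merely stands in for the JobMaster main thread. The expensive construction runs on a separate pool, and only the cheap hand-off is scheduled back onto the main thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch; none of these names are Flink APIs. */
final class AsyncDescriptorCreationSketch {

    /** Stand-in for the expensive descriptor construction. */
    static String buildDescriptor(int subtaskIndex) {
        return "descriptor-" + subtaskIndex;
    }

    /**
     * Builds each descriptor on ioExecutor, then records the deployment on mainThread.
     * Must be called from a thread other than mainThread, because it blocks in join().
     */
    static List<String> deployAll(int tasks, ExecutorService mainThread, ExecutorService ioExecutor) {
        List<String> deployed = new ArrayList<>(); // mutated only on mainThread
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final int subtask = i;
            futures.add(
                CompletableFuture
                    // Expensive part: runs off the main thread.
                    .supplyAsync(() -> buildDescriptor(subtask), ioExecutor)
                    // Cheap part: hop back to the main thread to record the result.
                    .thenAcceptAsync(deployed::add, mainThread));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return deployed;
    }

    public static void main(String[] args) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        ExecutorService ioExecutor = Executors.newFixedThreadPool(4);
        try {
            List<String> deployed = deployAll(16, mainThread, ioExecutor);
            System.out.println("deployed tasks: " + deployed.size()); // prints "deployed tasks: 16"
        } finally {
            mainThread.shutdown();
            ioExecutor.shutdown();
        }
    }
}
```

Between hand-offs the main-thread executor is free to run other queued work, which is the property the heartbeat handling needs. The trade-off is that anything the builder reads must be safe to access off the main thread.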


