Posted to issues@flink.apache.org by "Nico Kruber (JIRA)" <ji...@apache.org> on 2018/12/17 14:53:00 UTC
[jira] [Commented] (FLINK-10661) Initial credit should be configured in a separate parameter

    [ https://issues.apache.org/jira/browse/FLINK-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723037#comment-16723037 ] 

Nico Kruber commented on FLINK-10661:
-------------------------------------

[~zjwang] I'm not quite sure I understand the problem, so let's first establish a common understanding. Are the following the scenarios you were describing?

1) scaling out: if, for example, we are scaling from 10 to 20, under full load I would expect the output queues of the 10 senders to be full and the input queues of the 20 receivers to be at only 50%, simply because there are 10 network outputs vs. 20 inputs.

2) scaling in: inversely, if you scale from 20 to 10, the input queues are full but the output queues are at 50%.

In these cases, do you want to be able to reclaim the unused buffers for some other part of the pipeline? Simply splitting {{buffers-per-channel}} into one parameter for the sender and a separate one for the receiver would not be enough, because a single job graph may contain both operations, i.e. scale-out and scale-in. What you may actually want is to fine-tune this per operator, which would give you the desired control.
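
To make the arithmetic in the two scenarios concrete, here is a back-of-envelope sketch in Python. It is not Flink code: the function name rough_queue_usage, the edge names, and the simple ratio model (the side with fewer subtasks saturates, the other side fills proportionally) are assumptions taken only from the 10 vs. 20 reasoning above.

```python
def rough_queue_usage(upstream, downstream):
    """Return (output_queue_usage, input_queue_usage) under full load.

    Assumed model, not a measurement: the side with fewer subtasks is
    saturated, the side with more subtasks is filled proportionally.
    """
    if upstream <= 0 or downstream <= 0:
        raise ValueError("parallelism must be positive")
    out_usage = min(1.0, downstream / upstream)
    in_usage = min(1.0, upstream / downstream)
    return out_usage, in_usage


# Hypothetical job graph containing both a scale-out and a scale-in edge.
edges = [
    ("source -> map", 10, 20),  # scaling out
    ("map -> sink", 20, 10),    # scaling in
]

for name, up, down in edges:
    out_usage, in_usage = rough_queue_usage(up, down)
    print(f"{name}: output queues {out_usage:.0%}, input queues {in_usage:.0%}")
```

In this model the first edge has idle input buffers and the second has idle output buffers, so one global sender value plus one global receiver value cannot fit both edges at once.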

> Initial credit should be configured in a separate parameter
> -----------------------------------------------------------
>
>                 Key: FLINK-10661
>                 URL: https://issues.apache.org/jira/browse/FLINK-10661
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.4, 1.6.1
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Minor
>
> In credit-based network flow control, the required credits on the receiver side are calculated as the backlog plus the initial credit, which is equal to the value of the parameter {{taskmanager.network.memory.buffers-per-channel}}. We add the initial credit as overhead on top of the backlog in order to reduce the chance that the sender side has to wait for credits. Ideally, sender and receiver work concurrently and do not block each other.
>  
> We found a bad case in some rebalance or rescale scenarios: the outqueue usage reaches 100% on the sender side, but the inqueue usage is about 50% or less. That means the announced credits are not enough for the sender side although there are still many free credit resources on the receiver side. This is unreasonable because it wastes resources.
>  
> It would be better if we could adjust the credit overhead to debug performance online. That requires a separate parameter that defines the initial credit independently of {{taskmanager.network.memory.buffers-per-channel}}.
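
The credit calculation described in the issue can be sketched as follows. This is a minimal Python illustration, not Flink's implementation: required_credits and its initial_credit argument are hypothetical names, and all numbers are made up for the example.

```python
def required_credits(backlog, initial_credit):
    """Credits the receiver wants for one channel: sender backlog plus a fixed overhead."""
    if backlog < 0 or initial_credit < 0:
        raise ValueError("backlog and initial_credit must be non-negative")
    return backlog + initial_credit


# Illustrative value for taskmanager.network.memory.buffers-per-channel.
buffers_per_channel = 2
# Illustrative number of buffers queued on the sender side.
backlog = 6

# Behaviour described in the issue: the initial credit is tied to buffers-per-channel.
coupled = required_credits(backlog, initial_credit=buffers_per_channel)

# Proposal: a separate (hypothetical) setting lets the overhead be tuned on its own.
decoupled = required_credits(backlog, initial_credit=4)

print(coupled, decoupled)
```

With a separate setting, the overhead added to the backlog can change without also changing the number of buffers reserved per channel.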



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)