Posted to issues@flink.apache.org by "Piotr Nowojski (JIRA)" <ji...@apache.org> on 2018/11/27 14:38:00 UTC

[jira] [Closed] (FLINK-10367) Avoid recursion stack overflow during releasing SingleInputGate

     [ https://issues.apache.org/jira/browse/FLINK-10367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr Nowojski closed FLINK-10367.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.8.0
                   1.6.3

merged commit b379316 into apache:master
merged commit 73858ea6e7 into apache:release-1.6

Not yet merged to release-1.7 because the 1.7 release is in progress.


> Avoid recursion stack overflow during releasing SingleInputGate
> ---------------------------------------------------------------
>
>                 Key: FLINK-10367
>                 URL: https://issues.apache.org/jira/browse/FLINK-10367
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.6.3, 1.8.0
>
>
> When a task fails or is canceled, {{SingleInputGate#releaseAllResources}} is invoked before the task exits.
> {{SingleInputGate#releaseAllResources}} first loops over all input channels to release them, and then destroys the {{BufferPool}}. {{RemoteInputChannel#releaseAllResources}} returns its floating buffers to the {{BufferPool}}, which assigns each recycled buffer to another listener (a {{RemoteInputChannel}}).
> This process can recurse. If the notified listener has already been released, it immediately recycles the buffer back to the {{BufferPool}}, which then takes the next listener and notifies it of the available buffer. These steps can repeat recursively.
> If many input channels are registered as listeners in the {{BufferPool}}, the recursion can cause a {{StackOverflow}} error. In our test job, a scale of 10,000 input channels has triggered this error.
> I see two ways to solve this potential problem:
>  # When an input channel is released, it should notify the {{BufferPool}} to unregister it as a listener; otherwise the two are left in inconsistent states.
>  # {{SingleInputGate}} should destroy the {{BufferPool}} first, and then loop to release all of its input channels. This way, all listeners in the {{BufferPool}} are removed when the pool is destroyed, and the input channels have no further interaction with it during {{RemoteInputChannel#releaseAllResources}}.
> I prefer the second way, because we do not want to add another interface method for removing a buffer listener, and the current internal data structure in {{BufferPool}} does not support removing a listener directly.
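
The recursion described in the issue, and the effect of the second proposal, can be illustrated with a small self-contained model. This is only a sketch under stated assumptions, not Flink code: {{ToyBufferPool}}, {{ToyChannel}} and {{maxRecycleDepth}} are hypothetical names, the pool is reduced to a queue of listeners, and a depth counter stands in for real stack growth so that the example does not actually overflow. It assumes every channel except the last is registered as a listener and holds no floating buffer, while the last channel holds one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

final class ReleaseOrderSketch {

    /** Hypothetical toy pool: hands each recycled buffer to the next waiting listener. */
    static final class ToyBufferPool {
        private final ArrayDeque<ToyChannel> listeners = new ArrayDeque<>();
        private boolean destroyed;
        private int depth;   // current nesting of recycle() calls
        int maxDepth;        // deepest nesting observed so far

        void addListener(ToyChannel channel) {
            listeners.add(channel);
        }

        void recycle() {
            depth++;
            maxDepth = Math.max(maxDepth, depth);
            if (!destroyed) {
                ToyChannel next = listeners.poll();
                if (next != null) {
                    // May call back into recycle() if the listener was already released.
                    next.notifyBufferAvailable(this);
                }
            }
            depth--;
        }

        /** Destroying the pool drops all registered listeners. */
        void destroy() {
            destroyed = true;
            listeners.clear();
        }
    }

    /** Hypothetical toy channel: keeps a buffer if alive, hands it straight back if released. */
    static final class ToyChannel {
        private int floatingBuffers;
        private boolean released;

        ToyChannel(int floatingBuffers) {
            this.floatingBuffers = floatingBuffers;
        }

        void notifyBufferAvailable(ToyBufferPool pool) {
            if (released) {
                pool.recycle();      // nested call: this is where the recursion deepens
            } else {
                floatingBuffers++;
            }
        }

        void releaseAllResources(ToyBufferPool pool) {
            released = true;
            while (floatingBuffers > 0) {
                floatingBuffers--;
                pool.recycle();
            }
        }
    }

    /** Releases numChannels (>= 1) channels and reports the deepest recycle() nesting. */
    static int maxRecycleDepth(int numChannels, boolean destroyPoolFirst) {
        ToyBufferPool pool = new ToyBufferPool();
        List<ToyChannel> channels = new ArrayList<>();
        for (int i = 0; i < numChannels - 1; i++) {
            ToyChannel waiting = new ToyChannel(0);   // no buffers, waits as a listener
            pool.addListener(waiting);
            channels.add(waiting);
        }
        channels.add(new ToyChannel(1));              // the only channel holding a buffer

        if (destroyPoolFirst) {
            pool.destroy();                           // proposal 2: pool first
        }
        for (ToyChannel channel : channels) {
            channel.releaseAllResources(pool);
        }
        if (!destroyPoolFirst) {
            pool.destroy();                           // original order: channels first
        }
        return pool.maxDepth;
    }

    public static void main(String[] args) {
        System.out.println("channels first: depth " + maxRecycleDepth(1000, false));
        System.out.println("pool first:     depth " + maxRecycleDepth(1000, true));
    }
}
```

In this model, releasing the channels first lets the single returned buffer bounce through every already-released listener, so the nesting depth grows linearly with the number of channels (1000 for 1000 channels). Destroying the pool first clears the listeners, and the depth stays at 1 regardless of the channel count.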



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)