Posted to issues@flink.apache.org by "Zhijiang (Jira)" <ji...@apache.org> on 2020/05/22 03:09:00 UTC

[jira] [Resolved] (FLINK-17823) Resolve the race condition while releasing RemoteInputChannel

     [ https://issues.apache.org/jira/browse/FLINK-17823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijiang resolved FLINK-17823.
------------------------------
    Resolution: Fixed

Merged in release-1.11: 3eb1075ded64da20e6f7a5bc268f455eaf6573eb

Will merge to master later and update the info.

> Resolve the race condition while releasing RemoteInputChannel
> -------------------------------------------------------------
>
>                 Key: FLINK-17823
>                 URL: https://issues.apache.org/jira/browse/FLINK-17823
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.11.0
>            Reporter: Zhijiang
>            Assignee: Zhijiang
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>
> RemoteInputChannel#releaseAllResources might be called by the canceler thread while the task thread concurrently calls RemoteInputChannel#getNextBuffer. This can cause two problems:
>  * The task thread might get a null buffer after the canceler thread has already released all the buffers, which then causes a misleading NPE in getNextBuffer.
>  * The task thread and the canceler thread might pull the same buffer concurrently, which causes an unexpected exception when that buffer is recycled twice.
> The solution is to properly synchronize on the buffer queue in the release method, so that the same buffer cannot be pulled by both the canceler thread and the task thread. In addition, getNextBuffer gets explicit checks that avoid the misleading NPE and throw meaningful exceptions instead.
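A minimal Java sketch of the locking pattern described in the issue, assuming a simplified channel that holds received buffers in a queue. The class and method names (SketchBuffer, SketchRemoteInputChannel, onBuffer) are hypothetical and this is not Flink's actual RemoteInputChannel code. It shows two things: the release path and the task path both take the queue lock, so each buffer leaves the queue exactly once, and getNextBuffer checks the released flag explicitly instead of letting a null buffer surface later as an NPE.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a network buffer; recycling it twice is an error. */
final class SketchBuffer {
    private final AtomicInteger refCount = new AtomicInteger(1);

    void recycle() {
        // A second recycle would corrupt a real buffer pool, so fail loudly.
        if (refCount.decrementAndGet() < 0) {
            throw new IllegalStateException("Buffer recycled twice");
        }
    }
}

/** Simplified sketch of the synchronization idea; not Flink's real implementation. */
final class SketchRemoteInputChannel {
    private final ArrayDeque<SketchBuffer> receivedBuffers = new ArrayDeque<>();
    private boolean released; // guarded by the receivedBuffers lock

    /** Called by the network thread when a buffer arrives (hypothetical method). */
    void onBuffer(SketchBuffer buffer) {
        synchronized (receivedBuffers) {
            if (released) {
                buffer.recycle(); // channel already released: drop the buffer
                return;
            }
            receivedBuffers.add(buffer);
        }
    }

    /** Called by the task thread. */
    Optional<SketchBuffer> getNextBuffer() {
        synchronized (receivedBuffers) {
            // Explicit check with a clear message instead of a later NPE.
            if (released) {
                throw new IllegalStateException("Channel already released");
            }
            return Optional.ofNullable(receivedBuffers.poll());
        }
    }

    /** Called by the canceler thread; safe to call more than once. */
    void releaseAllResources() {
        ArrayDeque<SketchBuffer> toRecycle;
        synchronized (receivedBuffers) {
            if (released) {
                return;
            }
            released = true;
            // Drain under the lock, so the task thread can never poll these buffers.
            toRecycle = new ArrayDeque<>(receivedBuffers);
            receivedBuffers.clear();
        }
        // Each drained buffer was removed from the queue exactly once above.
        for (SketchBuffer buffer : toRecycle) {
            buffer.recycle();
        }
    }
}
```

Recycling happens outside the lock in this sketch so that no buffer-pool callbacks run while the queue is locked. That is a design choice made for the illustration; the merged fix may structure this differently.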



--
This message was sent by Atlassian Jira
(v8.3.4#803005)