Posted to issues@flink.apache.org by "zhijiang (JIRA)" <ji...@apache.org> on 2018/10/22 03:57:00 UTC

[jira] [Comment Edited] (FLINK-9761) Potential buffer leak in PartitionRequestClientHandler during job failures

    [ https://issues.apache.org/jira/browse/FLINK-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658543#comment-16658543 ] 

zhijiang edited comment on FLINK-9761 at 10/22/18 3:56 AM:
-----------------------------------------------------------

I quickly reviewed the related code and think this is still a problem, though one that exists only in non-credit-based mode.

When {{PartitionRequestClientHandler.BufferListenerTask#notifyBufferDestroyed}} is called by the canceler thread while a {{stagedBufferResponse}} currently exists, we directly set {{stagedBufferResponse = null}}. After that there is no longer any chance to consume and release this Netty message, which results in a leak.

Even if {{stagedMessages}} is not empty, the {{stagedMessagesHandler}} only consumes and releases the messages in the {{stagedMessages}} list; it does not first consume and release {{stagedBufferResponse}}. So I think the logic is still flawed.
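
To make the suspected path concrete, here is a minimal single-threaded sketch of it. It is not the Flink code: {{Message}}, {{Handler}} and both {{notifyBufferDestroyed*}} variants are hypothetical stand-ins, only the field names {{stagedBufferResponse}} and {{stagedMessages}} are taken from the discussion above, and the "releasing" variant is just one possible remedy, not necessarily the actual fix.

```java
import java.util.ArrayDeque;

class StagedResponseLeakSketch {

    /** Hypothetical stand-in for a reference-counted Netty message. */
    static final class Message {
        private boolean released;

        void release() {
            released = true;
        }

        boolean isReleased() {
            return released;
        }
    }

    /** Hypothetical, heavily simplified model of the handler state discussed above. */
    static final class Handler {
        final ArrayDeque<Message> stagedMessages = new ArrayDeque<>();
        Message stagedBufferResponse;

        /** Models the described behaviour: the reference is dropped without a release. */
        void notifyBufferDestroyedLeaky() {
            stagedBufferResponse = null;
            drainStagedMessages();
        }

        /** Assumed remedy for illustration: release the staged response before clearing it. */
        void notifyBufferDestroyedReleasing() {
            if (stagedBufferResponse != null) {
                stagedBufferResponse.release();
                stagedBufferResponse = null;
            }
            drainStagedMessages();
        }

        /** Releases only what sits in the queue, never the staged response itself. */
        private void drainStagedMessages() {
            Message message;
            while ((message = stagedMessages.poll()) != null) {
                message.release();
            }
        }
    }

    /** Runs one scenario and reports whether the staged response ended up released. */
    static boolean stagedResponseReleased(boolean releaseBeforeClearing) {
        Handler handler = new Handler();
        Message staged = new Message();
        handler.stagedBufferResponse = staged;
        handler.stagedMessages.add(new Message());

        if (releaseBeforeClearing) {
            handler.notifyBufferDestroyedReleasing();
        } else {
            handler.notifyBufferDestroyedLeaky();
        }
        return staged.isReleased();
    }

    public static void main(String[] args) {
        System.out.println("leaky path released staged response: " + stagedResponseReleased(false));
        System.out.println("releasing path released staged response: " + stagedResponseReleased(true));
    }
}
```

The sketch deliberately ignores the threading aspect (canceler thread versus Netty's IO thread); it only shows that draining the queue never touches the staged response once the reference has been cleared.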

 

[~NicoK], could you double-check whether my reading of the issue above is correct?


was (Author: zjwang):
I just quickly reviewed the related codes and think this is still a problem which exists only in non-credit-based mode.

When {{PartitionRequestClientHandler.BufferListenerTask#notifyBufferDestroyed}} is called by canceler thread, and the {{stagedBufferResponse}} is not currently. But we directly set {{stagedBufferResponse = null}}, so it has no chance to consume and release this netty message any more resulting in leak issue.

 

Even though the {{stageMessages}} is not empty, the {{stagedMessageHandler}} would only consume and release the messages in this {{stageMessages}} list, and it will not consume and release {{stagedBufferResponse}} firstly. So it still has logic problem I think.

 

Maybe need [~NicoK] double check if I guessed the above issue correctly.

> Potential buffer leak in PartitionRequestClientHandler during job failures
> --------------------------------------------------------------------------
>
>                 Key: FLINK-9761
>                 URL: https://issues.apache.org/jira/browse/FLINK-9761
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.5.0
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>            Priority: Critical
>             Fix For: 1.5.6, 1.6.3, 1.7.0
>
>
> {{PartitionRequestClientHandler#stagedMessages}} may be accessed from multiple threads:
> 1) Netty's IO thread
> 2) During cancellation, {{PartitionRequestClientHandler.BufferListenerTask#notifyBufferDestroyed}} is called
> If, however, {{PartitionRequestClientHandler.BufferListenerTask#notifyBufferDestroyed}} believes that {{stagedMessages}} is empty, it will not install the {{stagedMessagesHandler}} that consumes and releases buffers from received messages.
> Unless some unexpected combination of code calls prevents this from happening, this would leak the non-recycled buffers.


