Posted to issues@flink.apache.org by "Nico Kruber (JIRA)" <ji...@apache.org> on 2018/07/02 10:00:00 UTC

[jira] [Comment Edited] (FLINK-9636) Network buffer leaks in requesting a batch of segments during canceling

    [ https://issues.apache.org/jira/browse/FLINK-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529613#comment-16529613 ] 

Nico Kruber edited comment on FLINK-9636 at 7/2/18 9:59 AM:
------------------------------------------------------------

Actually, {{numRequiredBuffers}} is only a local variable in this method - why should we bother changing it?

Also, if there is an {{InterruptedException}} while polling memory segments from the {{availableMemorySegments}} queue, it is re-thrown and the request fails. {{NetworkBufferPool}} should then be restored to the state it was in before, which, as far as I can see, it is, isn't it?
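
One way to look at this question in isolation is a toy model of the request path. The sketch below is not Flink's {{NetworkBufferPool}}: the class {{ToyPool}}, the {{interruptAfterPolls}} knob and the {{rollBackFullRequest}} flag are hypothetical and exist only to simulate a cancel after some segments were polled. Under those assumptions it compares rolling the counter back by the number of polled segments with rolling it back by the full requested amount.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of an interrupted batch request; NOT Flink's NetworkBufferPool. */
final class InterruptedRequestSketch {

    static final class ToyPool {
        private final Deque<Object> availableMemorySegments = new ArrayDeque<>();
        private final int interruptAfterPolls; // hypothetical knob simulating a cancel
        private int numTotalRequiredBuffers;

        ToyPool(int totalSegments, int interruptAfterPolls) {
            for (int i = 0; i < totalSegments; i++) {
                availableMemorySegments.add(new Object());
            }
            this.interruptAfterPolls = interruptAfterPolls;
        }

        List<Object> requestMemorySegments(int numRequiredBuffers, boolean rollBackFullRequest)
                throws InterruptedException {
            numTotalRequiredBuffers += numRequiredBuffers; // reserve up front
            List<Object> segments = new ArrayList<>(numRequiredBuffers);
            try {
                while (segments.size() < numRequiredBuffers) {
                    if (segments.size() == interruptAfterPolls) {
                        throw new InterruptedException("simulated cancel");
                    }
                    segments.add(availableMemorySegments.remove());
                }
                return segments;
            } catch (InterruptedException e) {
                // Hand back what was polled, then undo the reservation.
                availableMemorySegments.addAll(segments);
                numTotalRequiredBuffers -= rollBackFullRequest ? numRequiredBuffers : segments.size();
                throw e;
            }
        }

        int getNumTotalRequiredBuffers() {
            return numTotalRequiredBuffers;
        }

        int getNumAvailable() {
            return availableMemorySegments.size();
        }
    }

    /** Requests 4 of 8 segments, is "interrupted" after 2 polls, returns the counter. */
    static int counterAfterInterruptedRequest(boolean rollBackFullRequest) {
        ToyPool pool = new ToyPool(8, 2);
        try {
            pool.requestMemorySegments(4, rollBackFullRequest);
        } catch (InterruptedException expected) {
            // The request failed, as in the cancel scenario.
        }
        return pool.getNumTotalRequiredBuffers();
    }

    public static void main(String[] args) {
        System.out.println(counterAfterInterruptedRequest(false)); // prints 2
        System.out.println(counterAfterInterruptedRequest(true));  // prints 0
    }
}
```

In this toy model the segments themselves always return to the queue; only the counter differs, depending on which amount is subtracted. Whether the real method behaves like either variant has to be checked against the actual code.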

I see only one point where the accounting for {{numTotalRequiredBuffers}} can be wrong: if an exception is thrown in the first of the {{redistributeBuffers()}} calls. Tracing it further down, this can only happen if {{SpillableSubpartition#releaseMemory()}} throws, e.g. due to a failure in creating a {{spillWriter}}. I'm working on a patch...
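
To make that scenario concrete, here is a minimal sketch under stated assumptions. {{ToyPool}}, {{Redistributor}}, {{reserveUnguarded}} and {{reserveGuarded}} are hypothetical names, not Flink classes, and the rollback shown is only one conceivable guard, not necessarily what the patch will do. The sketch increments the counter, lets a simulated redistribution step fail, and compares the counter with and without a rollback.

```java
import java.io.IOException;

/** Toy model of the reservation counter; NOT Flink's NetworkBufferPool. */
final class BufferAccountingSketch {

    /** Hypothetical stand-in for the redistribution step, which may fail. */
    interface Redistributor {
        void redistribute() throws IOException;
    }

    static final class ToyPool {
        private final int totalSegments;
        private final Redistributor redistributor;
        private int numTotalRequiredBuffers;

        ToyPool(int totalSegments, Redistributor redistributor) {
            this.totalSegments = totalSegments;
            this.redistributor = redistributor;
        }

        /** No rollback: a failing redistribution leaves the counter inflated. */
        synchronized void reserveUnguarded(int numRequiredBuffers) throws IOException {
            checkCapacity(numRequiredBuffers);
            numTotalRequiredBuffers += numRequiredBuffers;
            redistributor.redistribute();
        }

        /** With rollback: the reservation is undone before the failure propagates. */
        synchronized void reserveGuarded(int numRequiredBuffers) throws IOException {
            checkCapacity(numRequiredBuffers);
            numTotalRequiredBuffers += numRequiredBuffers;
            try {
                redistributor.redistribute();
            } catch (IOException | RuntimeException e) {
                numTotalRequiredBuffers -= numRequiredBuffers;
                throw e;
            }
        }

        private void checkCapacity(int numRequiredBuffers) throws IOException {
            if (numTotalRequiredBuffers + numRequiredBuffers > totalSegments) {
                throw new IOException("Insufficient number of buffers in this toy pool");
            }
        }

        synchronized int getNumTotalRequiredBuffers() {
            return numTotalRequiredBuffers;
        }
    }

    public static void main(String[] args) {
        // Simulates e.g. a failure while creating a spill writer.
        Redistributor failing = () -> {
            throw new IOException("simulated failure during redistribution");
        };

        ToyPool unguarded = new ToyPool(8, failing);
        try {
            unguarded.reserveUnguarded(4);
        } catch (IOException expected) {
            // The request failed, but the reservation stays.
        }
        System.out.println("unguarded: " + unguarded.getNumTotalRequiredBuffers()); // prints "unguarded: 4"

        ToyPool guarded = new ToyPool(8, failing);
        try {
            guarded.reserveGuarded(4);
        } catch (IOException expected) {
            // The request failed and the reservation was rolled back.
        }
        System.out.println("guarded: " + guarded.getNumTotalRequiredBuffers()); // prints "guarded: 0"
    }
}
```

In the unguarded variant, two failed requests of 4 against a pool of 8 leave the counter at 8, so a third request is rejected by the capacity check although no buffers are in use.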


> Network buffer leaks in requesting a batch of segments during canceling
> -----------------------------------------------------------------------
>
>                 Key: FLINK-9636
>                 URL: https://issues.apache.org/jira/browse/FLINK-9636
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: zhijiang
>            Priority: Major
>             Fix For: 1.5.1
>
>
> In {{NetworkBufferPool#requestMemorySegments}}, {{numTotalRequiredBuffers}} is increased by {{numRequiredBuffers}} first.
> If an {{InterruptedException}} is thrown while polling segments from the available queue, the requested segments are recycled back to {{NetworkBufferPool}} and {{numTotalRequiredBuffers}} is decreased by the number of polled segments, which is now inconsistent with {{numRequiredBuffers}}. So {{numTotalRequiredBuffers}} in {{NetworkBufferPool}} leaks in this case, and we can also decrease {{numRequiredBuffers}} to fix this bug.
>  


