Posted to issues@flink.apache.org by "Nico Kruber (JIRA)" <ji...@apache.org> on 2018/07/02 10:00:00 UTC
[jira] [Comment Edited] (FLINK-9636) Network buffer leaks in
requesting a batch of segments during canceling
[ https://issues.apache.org/jira/browse/FLINK-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529613#comment-16529613 ]
Nico Kruber edited comment on FLINK-9636 at 7/2/18 9:59 AM:
------------------------------------------------------------
Actually, {{numRequiredBuffers}} is only a local variable in this method - why should we bother changing it?
Also, if there is an {{InterruptedException}} when polling memory segments from the {{availableMemorySegments}} queue, it will be re-thrown and the request will fail - {{NetworkBufferPool}} should then be restored to the state it was in before, which it is, isn't it?
I see only one point where the accounting for {{numTotalRequiredBuffers}} can be wrong: if an exception is thrown in the first of the {{redistributeBuffers()}} calls. Tracing it further down, this can only happen if {{SpillableSubpartition#releaseMemory()}} throws, e.g. due to a failure in creating a {{spillWriter}}. I'm working on a patch...
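The hazard described above can be sketched in Java. This is a minimal, hypothetical model and not Flink's code: `ReservationRollbackSketch`, `reserve()`, `failNextRedistribution` and the stand-in `redistributeBuffers()` are illustrative names only. The flag simulates {{SpillableSubpartition#releaseMemory()}} throwing from inside the redistribution step. It shows that once the counter has been incremented, a throw from the next call leaves it incremented unless the reservation is explicitly undone in a catch block.

```java
import java.io.IOException;

// Hypothetical, heavily simplified stand-in for the reservation step of a buffer pool.
class ReservationRollbackSketch {
    private final int totalNumberOfMemorySegments;
    private int numTotalRequiredBuffers = 0;

    // Hypothetical switch simulating a failure inside redistributeBuffers(),
    // e.g. a subpartition failing to create its spill writer while releasing memory.
    boolean failNextRedistribution = false;

    ReservationRollbackSketch(int totalNumberOfMemorySegments) {
        this.totalNumberOfMemorySegments = totalNumberOfMemorySegments;
    }

    synchronized int numTotalRequiredBuffers() {
        return numTotalRequiredBuffers;
    }

    // Stand-in for the pool's redistribution; here it only fails on demand.
    private void redistributeBuffers() throws IOException {
        if (failNextRedistribution) {
            failNextRedistribution = false;
            throw new IOException("simulated failure while releasing memory");
        }
    }

    // Reserves numRequiredBuffers; with rollbackOnFailure the increment is undone
    // when redistributeBuffers() throws.
    synchronized void reserve(int numRequiredBuffers, boolean rollbackOnFailure) throws IOException {
        if (numTotalRequiredBuffers + numRequiredBuffers > totalNumberOfMemorySegments) {
            throw new IOException("Insufficient number of network buffers");
        }
        numTotalRequiredBuffers += numRequiredBuffers;
        try {
            redistributeBuffers();
        } catch (IOException e) {
            if (rollbackOnFailure) {
                numTotalRequiredBuffers -= numRequiredBuffers; // undo the reservation
            }
            throw e; // without the rollback, the counter stays incremented (the leak)
        }
    }

    public static void main(String[] args) {
        for (boolean rollback : new boolean[] {false, true}) {
            ReservationRollbackSketch pool = new ReservationRollbackSketch(4);
            pool.failNextRedistribution = true;
            try {
                pool.reserve(3, rollback);
            } catch (IOException expected) {
                // the request fails in both cases
            }
            System.out.println("rollback=" + rollback
                    + ": numTotalRequiredBuffers = " + pool.numTotalRequiredBuffers());
        }
    }
}
```

Without the rollback, the 4-segment pool in this sketch would reject a later `reserve(2, ...)` as insufficient even though no segment is actually in use. A real pool might also need to redistribute again after rolling back; that step is omitted here.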
> Network buffer leaks in requesting a batch of segments during canceling
> -----------------------------------------------------------------------
>
> Key: FLINK-9636
> URL: https://issues.apache.org/jira/browse/FLINK-9636
> Project: Flink
> Issue Type: Bug
> Components: Network
> Affects Versions: 1.5.0, 1.6.0
> Reporter: zhijiang
> Priority: Major
> Fix For: 1.5.1
>
>
> In {{NetworkBufferPool#requestMemorySegments}}, {{numTotalRequiredBuffers}} is increased by {{numRequiredBuffers}} first.
> If an {{InterruptedException}} is thrown while polling segments from the available queue, the already-polled segments are recycled back to {{NetworkBufferPool}} and {{numTotalRequiredBuffers}} is decreased by the number of polled segments, which is inconsistent with the {{numRequiredBuffers}} it was increased by. {{numTotalRequiredBuffers}} in {{NetworkBufferPool}} therefore leaks in this case, and we can also decrease {{numRequiredBuffers}} to fix this bug.
>
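The mismatch described in the issue can be reproduced with a small Java model. It is a hypothetical sketch, not Flink's implementation: `RequestAccountingSketch`, the `interruptAfter` hook (the thread interrupts itself to stand in for cancellation) and the `fixed` flag are illustrative assumptions. The counter is increased by the full request up front. On interruption, the leaky path gives back only the number of polled segments, while the fixed path gives back the full reservation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the request path described in the issue.
class RequestAccountingSketch {
    private final ArrayBlockingQueue<byte[]> availableMemorySegments;
    private int numTotalRequiredBuffers = 0;

    RequestAccountingSketch(int totalSegments, int segmentSize) {
        availableMemorySegments = new ArrayBlockingQueue<>(totalSegments);
        for (int i = 0; i < totalSegments; i++) {
            availableMemorySegments.add(new byte[segmentSize]);
        }
    }

    synchronized int numTotalRequiredBuffers() {
        return numTotalRequiredBuffers;
    }

    int numAvailableSegments() {
        return availableMemorySegments.size();
    }

    // interruptAfter: simulation hook; the thread interrupts itself once that many
    // segments have been polled. fixed == false gives back only segments.size()
    // on failure (the reported leak); fixed == true gives back numRequiredBuffers.
    List<byte[]> requestMemorySegments(int numRequiredBuffers, int interruptAfter, boolean fixed)
            throws InterruptedException {
        synchronized (this) {
            numTotalRequiredBuffers += numRequiredBuffers; // reserve the full request up front
        }
        List<byte[]> segments = new ArrayList<>(numRequiredBuffers);
        try {
            while (segments.size() < numRequiredBuffers) {
                if (segments.size() == interruptAfter) {
                    Thread.currentThread().interrupt(); // simulate canceling the task
                }
                // a timed poll throws InterruptedException if the thread is interrupted
                byte[] segment = availableMemorySegments.poll(2, TimeUnit.SECONDS);
                if (segment != null) {
                    segments.add(segment);
                }
            }
        } catch (InterruptedException e) {
            synchronized (this) {
                numTotalRequiredBuffers -= fixed ? numRequiredBuffers : segments.size();
            }
            availableMemorySegments.addAll(segments); // recycle what was already polled
            throw e;
        }
        return segments;
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            RequestAccountingSketch pool = new RequestAccountingSketch(4, 32);
            try {
                pool.requestMemorySegments(3, 1, fixed); // interrupted after 1 of 3 segments
            } catch (InterruptedException expected) {
                // the request fails, as it would during canceling
            }
            System.out.println((fixed ? "fixed" : "leaky")
                    + ": numTotalRequiredBuffers = " + pool.numTotalRequiredBuffers());
        }
    }
}
```

In both variants all four segments end up back in the queue. Only the counter differs: the leaky path keeps 2 of the 3 reserved buffers accounted for, even though nothing holds them. Subtracting the full reservation works because the counter was increased by {{numRequiredBuffers}}, not by the number of segments actually polled.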
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)