You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Roman Khachatryan (Jira)" <ji...@apache.org> on 2020/05/20 23:11:00 UTC

[jira] [Comment Edited] (FLINK-17820) Memory threshold is ignored for channel state

    [ https://issues.apache.org/jira/browse/FLINK-17820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112651#comment-17112651 ] 

Roman Khachatryan edited comment on FLINK-17820 at 5/20/20, 11:10 PM:
----------------------------------------------------------------------

The data to FsCheckpointStateOutputStream is written through DataOutputStream.

Without flushing DataOutputStream some data can be left in its buffer. Flushing DataOutputStream also flushes the underlying FsCheckpointStateOutputStream.

Although in current JDK DataOutputStream doesn't buffer data. Do you think we can rely on it?


was (Author: roman_khachatryan):
The data to FsCheckpointStateOutputStream is written through DataOutputStream.

Without flushing DataOutputStream some data can be left in its buffer. Flushing DataOutputStream also flushes the underlying FsCheckpointStateOutputStream.

Although in current JDK DataOutputStream doesn't buffer data, I think it's risky to rely on it.

 

> Memory threshold is ignored for channel state
> ---------------------------------------------
>
>                 Key: FLINK-17820
>                 URL: https://issues.apache.org/jira/browse/FLINK-17820
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / Task
>    Affects Versions: 1.11.0
>            Reporter: Roman Khachatryan
>            Assignee: Roman Khachatryan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>
> Config parameter state.backend.fs.memory-threshold is ignored for channel state. Causing each subtask to have a file per checkpoint. Regardless of the size of channel state (of this subtask).
> This also causes slow cleanup and delays the next checkpoint.
>  
> The problem is that {{ChannelStateCheckpointWriter.finishWriteAndResult}} calls flush(); which actually flushes the data on disk.
>  
> From FSDataOutputStream.flush Javadoc:
> A completed flush does not mean that the data is necessarily persistent. Data persistence can is only assumed after calls to close() or sync().
>  
> Possible solutions:
> 1. not to flush in {{ChannelStateCheckpointWriter.finishWriteAndResult (which can lead to data loss in a wrapping stream).}}
> {{2. change }}{{FsCheckpointStateOutputStream.flush behavior}}
> {{3. wrap }}{{FsCheckpointStateOutputStream to prevent flush}}{{}}{{}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)