Posted to issues@flink.apache.org by "ChangjiGuo (Jira)" <ji...@apache.org> on 2022/10/13 11:30:00 UTC
[jira] [Comment Edited] (FLINK-28984) FsCheckpointStateOutputStream is not being released normally
[ https://issues.apache.org/jira/browse/FLINK-28984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616973#comment-17616973 ]
ChangjiGuo edited comment on FLINK-28984 at 10/13/22 11:29 AM:
---------------------------------------------------------------
Hi [~Yanfei Lei], thanks for your reply! Yes, that's what I want to express!
was (Author: changjiguo):
Hi [~Yanfei Lei], thans for your reply! Yes, that's what I want to express!
> FsCheckpointStateOutputStream is not being released normally
> ------------------------------------------------------------
>
> Key: FLINK-28984
> URL: https://issues.apache.org/jira/browse/FLINK-28984
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.11.6, 1.15.1
> Reporter: ChangjiGuo
> Priority: Major
> Labels: pull-request-available
> Attachments: log.png
>
>
> If the checkpoint is aborted, AsyncSnapshotCallable closes the snapshotCloseableRegistry when it is canceled. There are two possible situations here:
> # The FSDataOutputStream has already been created, and it is closed when FsCheckpointStateOutputStream is closed.
> # The FSDataOutputStream has not been created yet, but the closed flag has already been set to true. You can see this in the log:
> {code:java}
> 2022-08-16 12:55:44,161 WARN org.apache.flink.core.fs.SafetyNetCloseableRegistry - Closing unclosed resource via safety-net: ClosingFSDataOutputStream(org.apache.flink.runtime.fs.hdfs.HadoopDataOutputStream@4ebe8e64) : xxxxx/flink/checkpoint/state/9214a2e302904b14baf2dc1aacbc7933/ae157c5a05a8922a46a179cdb4c86b10/shared/9d8a1e92-2f69-4ab0-8ce9-c1beb149229a
> {code}
> The output stream will be automatically closed by the SafetyNetCloseableRegistry but the file will not be deleted.
> The second case usually occurs when the storage system has high latency in creating files.
> How to reproduce?
> This is not easy to reproduce, but you can try setting a smaller checkpoint timeout and increasing the parallelism of the Flink job.
>
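The second situation in the quoted description can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: LazyStreamCloseRace, LazyCheckpointStream, FakeFileStream and the recheckAfterCreate flag are hypothetical names made up for illustration and are not Flink classes. The latches stand in for a storage system with high file-creation latency, and the safety-net close seen in the log is not modeled. The re-check after creation is one possible mitigation and is not claimed to be the fix in the linked pull request.

```java
import java.util.concurrent.CountDownLatch;

// Simplified, hypothetical model of the race; none of these classes are Flink's.
class LazyStreamCloseRace {

    /** Stand-in for the file-system stream and the file behind it. */
    static final class FakeFileStream {
        volatile boolean closed;
        volatile boolean fileDeleted;
    }

    /** Models a checkpoint output stream that creates its file lazily on flush. */
    static final class LazyCheckpointStream {
        private final CountDownLatch creationStarted = new CountDownLatch(1);
        private final CountDownLatch creationMayFinish = new CountDownLatch(1);
        private final boolean recheckAfterCreate;
        private volatile boolean closed;
        private volatile FakeFileStream outStream;

        LazyCheckpointStream(boolean recheckAfterCreate) {
            this.recheckAfterCreate = recheckAfterCreate;
        }

        /** Called by the snapshot thread; file creation is artificially slow. */
        void flush() throws InterruptedException {
            if (closed) {
                return;
            }
            creationStarted.countDown();
            creationMayFinish.await();          // simulates a high-latency create call
            FakeFileStream created = new FakeFileStream();
            outStream = created;
            if (recheckAfterCreate && closed) { // assumed mitigation, not Flink's actual fix
                release(created);
            }
        }

        /** Called by the canceling thread, as when a registry closes its resources. */
        void close() {
            closed = true;
            FakeFileStream s = outStream;
            if (s != null) {                    // still null while creation is in flight
                release(s);
            }
        }

        private static void release(FakeFileStream s) {
            s.closed = true;
            s.fileDeleted = true;
        }
    }

    /** Forces the interleaving deterministically; returns true if the file is leaked. */
    static boolean leaksFile(boolean recheckAfterCreate) {
        LazyCheckpointStream stream = new LazyCheckpointStream(recheckAfterCreate);
        Thread writer = new Thread(() -> {
            try {
                stream.flush();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        try {
            stream.creationStarted.await();     // writer is now inside the slow create
            stream.close();                     // abort arrives before the stream exists
            stream.creationMayFinish.countDown();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        FakeFileStream s = stream.outStream;
        return s != null && !s.fileDeleted;
    }

    public static void main(String[] args) {
        System.out.println("leak without re-check: " + leaksFile(false));
        System.out.println("leak with re-check:    " + leaksFile(true));
    }
}
```

In this model, close() finds no stream to release because creation is still in flight, so the stream that appears afterwards keeps its file unless the creating thread checks the closed flag again once the stream exists.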
--
This message was sent by Atlassian Jira
(v8.20.10#820010)