Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/12/14 12:13:03 UTC

[GitHub] [flink] 1996fanrui commented on pull request #20689: [FLINK-28984][runtime] Fix the problem that FsCheckpointStateOutputStream is not being released normally

1996fanrui commented on PR #20689:
URL: https://github.com/apache/flink/pull/20689#issuecomment-1351217015

   Hi @Myasuka, thanks a lot for your review. I'd like to add some more information here; please take a look when you have time, thanks!
   
   This problem also occurred in our production environment. The shared directory of one Flink job held more than 1 million files, which exceeded the HDFS upper limit, so new files could not be written.
   
   However, only 50k of those files are still valid; the other 950k should have been cleaned up.
   
   <img width="1670" alt="image" src="https://user-images.githubusercontent.com/38427477/207588272-dda7ba69-c84c-4372-aeb4-c54657b9b956.png">
   
   <img width="1451" alt="image" src="https://user-images.githubusercontent.com/38427477/207589898-7b8f6c1b-8947-4fa1-843a-c7e7103aa755.png">
   
   
   ## Restating the root cause
   
   The async thread is creating the output stream (`FsCheckpointStateOutputStream#flushToFile -> createStream`), and the HDFS response may be slow. At the same time, the task thread calls `FsCheckpointStateOutputStream#close`. Because `outputStream` and `statePath` are still null at that point, the output stream is not closed and the file at `statePath` is not cleaned up.
   
   When the async thread ends, `FileSystemSafetyNet` closes the output stream but does not delete the file, so the file is kept forever.
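
The interleaving described above can be modelled with a small, self-contained sketch. It is not Flink code: `CloseRaceSketch`, `LazyStream`, and `fileLeaks` are hypothetical names, the two latches exist only to force the "close during a slow create" ordering deterministically, and the local file system stands in for HDFS. The `recheckAfterCreate` flag illustrates one possible mitigation (re-checking the closed flag once creation returns); it is an assumption for illustration and not necessarily what this PR does.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class CloseRaceSketch {

    /** Hypothetical stand-in for a lazily created checkpoint stream (not Flink's class). */
    static final class LazyStream {
        private final Path target;
        private final boolean recheckAfterCreate;
        // Test scaffolding: forces close() to run while creation is "in flight".
        private final CountDownLatch createStarted = new CountDownLatch(1);
        private final CountDownLatch closeReturned = new CountDownLatch(1);
        private volatile OutputStream outputStream; // null until the file is created
        private volatile Path statePath;            // null until the file is created
        private volatile boolean closed;

        LazyStream(Path target, boolean recheckAfterCreate) {
            this.target = target;
            this.recheckAfterCreate = recheckAfterCreate;
        }

        /** Runs on the async thread. */
        void flushToFile() throws IOException, InterruptedException {
            createStarted.countDown();
            closeReturned.await(); // models a slow file system response
            OutputStream created = Files.newOutputStream(target); // creates the file
            outputStream = created;
            statePath = target;
            created.close(); // models the safety net: handle closed, file left behind
            if (recheckAfterCreate && closed) {
                Files.deleteIfExists(target); // assumed mitigation, for illustration only
            }
        }

        /** Runs on the task thread. */
        void close() throws IOException, InterruptedException {
            createStarted.await();
            closed = true;
            if (outputStream != null) { // still null here, so nothing is cleaned up
                outputStream.close();
                Files.deleteIfExists(statePath);
            }
            closeReturned.countDown();
        }
    }

    /** Returns true if the state file is still on disk after both threads finish. */
    static boolean fileLeaks(boolean recheckAfterCreate) throws Exception {
        Path dir = Files.createTempDirectory("close-race-sketch");
        Path file = dir.resolve("state-file");
        LazyStream stream = new LazyStream(file, recheckAfterCreate);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread async = new Thread(() -> {
            try {
                stream.flushToFile();
            } catch (Throwable t) {
                failure.set(t);
            }
        });
        async.start();
        stream.close(); // the current thread plays the task thread
        async.join();
        if (failure.get() != null) {
            throw new IllegalStateException(failure.get());
        }
        boolean leaked = Files.exists(file);
        Files.deleteIfExists(file); // tidy up the temporary directory
        Files.delete(dir);
        return leaked;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without recheck, file leaks: " + fileLeaks(false));
        System.out.println("with recheck, file leaks: " + fileLeaks(true));
    }
}
```

In this model, `fileLeaks(false)` returns `true` because `close()` sees null fields and the later handle close does not delete anything, while `fileLeaks(true)` returns `false`.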
   
   ## How to reproduce?
   
   I added some delay inside `createStream` and lowered the checkpoint timeout, which makes this bug easy to reproduce. Too many files are then left on HDFS forever.

