Posted to user@flink.apache.org by Ayush Verma <ay...@zalando.de> on 2018/08/09 10:09:03 UTC

Re: Heap Problem with Checkpoints

Hello Piotr, I work with Fabian and have been investigating the memory leak
associated with the issues mentioned in this thread. I took a heap dump of
our master node and noticed that there was >1 GB (and growing) of entries
in the set /files/ in the class *java.io.DeleteOnExitHook*.
Almost all the strings in this set look like
/tmp/hadoop-root/s3a/output-*****.tmp.

This means that the checkpointing code, which uploads the data to S3,
buffers it in a temporary local file that is supposed to be deleted on
exit of the JVM. In our case, the checkpointing is quite heavy, and because
we have a long-running Flink cluster, this /set/ grows unbounded,
eventually causing an OOM. Please see this image:
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1624/Screen_Shot_2018-08-09_at_11.png> 
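To make the mechanism concrete, here is a minimal standalone sketch (our
own illustration, not Flink or Hadoop code): every call to
File.deleteOnExit() appends the path to a static set inside
java.io.DeleteOnExitHook that is only drained at JVM shutdown, so in a
long-running process the set can only grow:

import java.io.File;
import java.io.IOException;

public class DeleteOnExitLeakDemo {
  public static void main(String[] args) throws IOException {
    // Simulates a long-running process that keeps creating temp files.
    for (int i = 0; i < 1_000_000; i++) {
      File tmp = File.createTempFile("output-", ".tmp");
      // Registers the path in the static set java.io.DeleteOnExitHook#files.
      // The entry is never removed before JVM shutdown, even if the file
      // itself is deleted, so heap usage grows with every iteration.
      tmp.deleteOnExit();
      tmp.delete(); // the file is gone, but the hook entry remains
    }
  }
}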

The culprit seems to be *org.apache.hadoop.fs.s3a.S3AOutputStream*, which,
in turn, calls
*org.apache.hadoop.fs.s3a.S3AFileSystem.createTmpFileForWrite()*. If we
follow the method call chain from there, we end up at
*org.apache.hadoop.fs.LocalDirAllocator.createTmpFileForWrite()*, where we
can see the temp file being created and the method deleteOnExit() being
called.
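For reference, the end of that chain looks roughly like this (a simplified
paraphrase of the hadoop-aws 2.8.x source, trimmed to the relevant lines,
not a verbatim copy):

// Simplified sketch of
// org.apache.hadoop.fs.LocalDirAllocator.createTmpFileForWrite().
public File createTmpFileForWrite(String pathStr, long size,
    Configuration conf) throws IOException {
  // Pick a local directory with enough free space for the data.
  Path path = getLocalPathForWrite(pathStr, size, conf, true);
  File dir = new File(path.getParent().toUri().getPath());
  String prefix = path.getName();
  // Create the temp file that buffers the data before the S3 upload...
  File result = File.createTempFile(prefix, null, dir);
  // ...and register it in DeleteOnExitHook, where the entry stays until
  // JVM shutdown.
  result.deleteOnExit();
  return result;
}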

Maybe instead of relying on *deleteOnExit()* we could keep track of these
tmp files ourselves and delete them as soon as they are no longer required.
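Such a tracker could look like this (a rough sketch with hypothetical
names, not an existing Flink or Hadoop API):

import java.io.File;
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper that owns the lifecycle of its temp files instead
// of handing them to deleteOnExit().
public class TmpFileTracker {
  private final Set<File> liveTmpFiles = ConcurrentHashMap.newKeySet();

  public File create(File dir, String prefix) throws IOException {
    File tmp = File.createTempFile(prefix, ".tmp", dir);
    liveTmpFiles.add(tmp); // tracked here, no JVM-global registration
    return tmp;
  }

  // Call this as soon as the upload no longer needs the local copy.
  public void release(File tmp) {
    liveTmpFiles.remove(tmp);
    if (!tmp.delete()) {
      System.err.println("Could not delete tmp file: " + tmp);
    }
  }
}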



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Heap Problem with Checkpoints

Posted by Piotr Nowojski <pi...@data-artisans.com>.
Hi,

Thanks for getting back with more information.

Apparently this is a known JDK bug, open since 2003 and still not resolved:
https://bugs.java.com/view_bug.do?bug_id=4872014
https://bugs.java.com/view_bug.do?bug_id=6664633

The code that is using this `deleteOnExit` is not part of Flink, but of an
external library that we are using (hadoop-aws:2.8.x), so we cannot fix it
for them; the bug should be reported/forwarded to them (I have already done
just that: https://issues.apache.org/jira/browse/HADOOP-15658). More
interestingly, S3AOutputStream already deletes those files manually once
they are no longer needed, in the finally block of
`org.apache.hadoop.fs.s3a.S3AOutputStream#close`:

} finally {
  // Eagerly delete the local buffer file once the upload has finished.
  if (!backupFile.delete()) {
    LOG.warn("Could not delete temporary s3a file: {}", backupFile);
  }
  super.close();
}

But this doesn’t remove the entry from DeleteOnExitHook. 
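You can confirm that with a small diagnostic (a sketch for Java 8; it reads
the private static field java.io.DeleteOnExitHook#files via reflection,
which newer JDKs may refuse):

import java.io.File;
import java.lang.reflect.Field;
import java.util.LinkedHashSet;

public class DeleteOnExitHookProbe {
  public static void main(String[] args) throws Exception {
    File tmp = File.createTempFile("probe-", ".tmp");
    tmp.deleteOnExit();
    tmp.delete(); // the file is gone from disk...

    Field filesField =
        Class.forName("java.io.DeleteOnExitHook").getDeclaredField("files");
    filesField.setAccessible(true);
    LinkedHashSet<?> files = (LinkedHashSet<?>) filesField.get(null);
    // ...but its path string is still retained on the heap.
    System.out.println("still registered: " + files.contains(tmp.getPath()));
  }
}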

From what I see in the code, the flink-s3-fs-presto filesystem
implementation that we provide doesn't use deleteOnExit, so switching to
that filesystem would solve the problem for you.
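For example (a sketch for the Flink 1.5/1.6 distribution layout; the bucket
name is a placeholder): copy opt/flink-s3-fs-presto-*.jar into lib/ on all
nodes, restart the cluster, and keep pointing checkpoints at S3; the s3://
scheme is then served by the presto-based filesystem instead of hadoop-aws:

# flink-conf.yaml
state.checkpoints.dir: s3://my-bucket/flink/checkpoints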

Piotrek
