Posted to issues@beam.apache.org by "Udi Meiri (JIRA)" <ji...@apache.org> on 2018/11/08 02:14:00 UTC

[jira] [Commented] (BEAM-6018) Memory leak in GCSUtil.java executeBatches

    [ https://issues.apache.org/jira/browse/BEAM-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679186#comment-16679186 ] 

Udi Meiri commented on BEAM-6018:
---------------------------------

Peeking at the code in MoreExecutors.java, addDelayedShutdownHook() doesn't save the reference to the thread it creates, so it won't be easy to remove the hook.
The solution I'm looking at is to use a single persistent thread pool for executeBatches.
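
A minimal sketch of what a single persistent pool could look like, using only java.util.concurrent. This is not the actual Beam change: the class name SharedBatchExecutor, the field SHARED_POOL, the pool size of 4, and the use of daemon threads are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not the real GcsUtil code.
final class SharedBatchExecutor {
  private static final AtomicInteger THREAD_IDS = new AtomicInteger();

  // Created once per JVM and reused by every call, so nothing new is
  // registered with the runtime each time a set of batches is executed.
  // Pool size 4 is an arbitrary value chosen for the sketch.
  private static final ExecutorService SHARED_POOL =
      Executors.newFixedThreadPool(
          4,
          r -> {
            Thread t = new Thread(r, "batch-worker-" + THREAD_IDS.incrementAndGet());
            // Daemon threads do not keep the JVM alive at exit.
            t.setDaemon(true);
            return t;
          });

  private SharedBatchExecutor() {}

  // Runs all batches on the shared pool and waits for every result.
  static <T> List<T> executeBatches(List<Callable<T>> batches) throws Exception {
    List<Future<T>> futures = SHARED_POOL.invokeAll(batches);
    List<T> results = new ArrayList<>();
    for (Future<T> f : futures) {
      results.add(f.get()); // rethrows a batch failure as ExecutionException
    }
    return results;
  }
}
```

Because the pool is created only once, any per-pool setup cost (including a shutdown hook, if one were added) is paid once instead of once per call. The trade-off in this sketch is that daemon threads may be stopped mid-task when the JVM exits.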

> Memory leak in GCSUtil.java executeBatches
> ------------------------------------------
>
>                 Key: BEAM-6018
>                 URL: https://issues.apache.org/jira/browse/BEAM-6018
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.7.0
>            Reporter: Udi Meiri
>            Assignee: Udi Meiri
>            Priority: Major
>
> In streaming pipelines there are multiple calls to moveToOutputFiles (https://github.com/apache/beam/blob/42984a821b3e73aee2966d11d7fb436b5ff22b68/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L796).
> When writing to GCS, this call uses executeBatches (https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java#L551), which wraps a thread pool in MoreExecutors.getExitingExecutorService(). This wrapper introduces a DelayedShutdownHook which persists until the worker exits.
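
The retention described above can be illustrated with plain JDK calls. The sketch below is a simplified stand-in and not Guava's implementation: the class ShutdownHookDemo and its methods are hypothetical. It relies on the documented behavior that the JVM holds every thread passed to Runtime.addShutdownHook until the hook is removed or the JVM exits, so anything the hook references (here, the pool) stays reachable for that long.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; names are illustrative only.
final class ShutdownHookDemo {
  private ShutdownHookDemo() {}

  // Registers a hook that shuts the pool down at JVM exit and returns the
  // hook thread. The hook captures `pool`, so the pool cannot be garbage
  // collected while the hook stays registered.
  static Thread registerHook(ExecutorService pool) {
    Thread hook =
        new Thread(
            () -> {
              pool.shutdown();
              try {
                pool.awaitTermination(1, TimeUnit.SECONDS);
              } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
              }
            });
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }

  // Removal needs the same Thread instance that was registered; without a
  // saved reference there is nothing to pass here.
  static boolean unregisterHook(Thread hook) {
    return Runtime.getRuntime().removeShutdownHook(hook);
  }
}
```

If registerHook were called once per batch and the returned thread discarded, each call would leave one more hook, and one more pool, reachable until the worker exits.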



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)