Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/08/23 08:21:00 UTC

[jira] [Commented] (FLINK-9693) Possible memory leak in jobmanager retaining archived checkpoints

    [ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589897#comment-16589897 ] 

ASF GitHub Bot commented on FLINK-9693:
---------------------------------------

TisonKun commented on issue #6251: [FLINK-9693] Set Execution#taskRestore to null after deployment
URL: https://github.com/apache/flink/pull/6251#issuecomment-415333090
 
 
   Thanks Till, this saves my weekend :-)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Possible memory leak in jobmanager retaining archived checkpoints
> -----------------------------------------------------------------
>
>                 Key: FLINK-9693
>                 URL: https://issues.apache.org/jira/browse/FLINK-9693
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager, State Backends, Checkpointing
>    Affects Versions: 1.5.0, 1.6.0
>         Environment: (embedded images: image.png, image (1).png)
>            Reporter: Steven Zhen Wu
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.4.3, 1.5.3, 1.6.1, 1.7.0
>
>         Attachments: 20180725_jm_mem_leak.png, 41K_ExecutionVertex_objs_retained_9GB.png, ExecutionVertexZoomIn.png
>
>
> First, some context about the job
>  * Flink 1.4.1
>  * stand-alone deployment mode
>  * embarrassingly parallel: all operators are chained together
>  * parallelism is over 1,000
>  * stateless except for Kafka source operators; checkpoint size is 8.4 MB
>  * set "state.backend.fs.memory-threshold" so that only the jobmanager writes to S3 when checkpointing
>  * internal checkpoint with 10 checkpoints retained in history
>  
> Summary of the observations
>  * 41,567 ExecutionVertex objects retained 9+ GB of memory
>  * Expanding one ExecutionVertex shows that it seems to be storing the Kafka offsets for the source operator



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)