Posted to issues@flink.apache.org by "Yun Gao (Jira)" <ji...@apache.org> on 2022/04/13 06:28:07 UTC
[jira] [Updated] (FLINK-25027) Allow GC of a finished job's JobMaster before the slot timeout is reached
[ https://issues.apache.org/jira/browse/FLINK-25027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yun Gao updated FLINK-25027:
----------------------------
Fix Version/s: 1.16.0
> Allow GC of a finished job's JobMaster before the slot timeout is reached
> -------------------------------------------------------------------------
>
> Key: FLINK-25027
> URL: https://issues.apache.org/jira/browse/FLINK-25027
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.14.0, 1.12.5, 1.13.3
> Reporter: Nico Kruber
> Assignee: Shammon
> Priority: Major
> Fix For: 1.15.0, 1.16.0
>
> Attachments: image-2021-11-23-20-32-20-479.png
>
>
> In a session cluster, after a (batch) job is finished, the JobMaster seems to stick around for another couple of minutes before being eligible for garbage collection.
> Looking at a heap dump, the JobMaster appears to be held by a {{PhysicalSlotRequestBulkCheckerImpl}} that is enqueued in the underlying Akka executor, which keeps the JM from being GC’d. By default, the check is scheduled with a delay of {{slot.request.timeout}}, which defaults to 5 minutes (thanks [~trohrmann] for helping out here).
> !image-2021-11-23-20-32-20-479.png!
> With the default setting, you have to provision enough metaspace for everything that stays reachable during those 5 minutes, which may span several jobs, needlessly!
> The problem seems to be that Flink schedules this check through the main thread executor, which uses the {{ActorSystem}}'s scheduler, and a task scheduled with Akka can (probably) not be cancelled easily.
> One idea could be to use a dedicated thread pool per JM, that we shut down when the JM terminates. That way we would not keep the JM from being GC’d.
> (The concrete example we investigated was a DataSet job)
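
The dedicated-thread-pool idea from the description can be illustrated with a minimal, self-contained sketch. It uses only the JDK and is not Flink's actual implementation: {{JobMasterLike}}, {{scheduleSlotTimeoutCheck}} and {{checkPendingRequests}} are hypothetical names made up for this example. It assumes one scheduler per JM; a delayed task captures its owner, and shutting the scheduler down drops the pending task so nothing in the scheduler keeps referencing the owner.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DedicatedSchedulerSketch {

    /** Hypothetical stand-in for a JobMaster; this is not Flink's actual class. */
    public static final class JobMasterLike {

        // Dedicated scheduler owned by this instance (assumption: one per JM).
        private final ScheduledExecutorService timeoutScheduler =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread thread = new Thread(runnable, "jm-timeout-checker");
                    thread.setDaemon(true);
                    return thread;
                });

        /** Schedules a delayed check; the pending task strongly references this object. */
        public void scheduleSlotTimeoutCheck(long delay, TimeUnit unit) {
            timeoutScheduler.schedule(this::checkPendingRequests, delay, unit);
        }

        private void checkPendingRequests() {
            // Placeholder for the real timeout check.
        }

        /** Shuts the scheduler down and returns how many pending tasks were dropped. */
        public int shutdown() {
            List<Runnable> dropped = timeoutScheduler.shutdownNow();
            return dropped.size();
        }

        public boolean isShutdown() {
            return timeoutScheduler.isShutdown();
        }
    }

    public static void main(String[] args) {
        JobMasterLike jobMaster = new JobMasterLike();
        // Mirrors the 5 minute default delay mentioned in the description.
        jobMaster.scheduleSlotTimeoutCheck(5, TimeUnit.MINUTES);
        // On termination the pending check is removed from the queue right away.
        int dropped = jobMaster.shutdown();
        System.out.println("Dropped pending tasks: " + dropped);
    }
}
```

With a shared scheduler, the pending task would stay queued until its delay expires. Here, {{shutdownNow()}} drains the queue as soon as the owner terminates, so the owner should become eligible for GC once no other references to it remain.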