Posted to issues@flink.apache.org by "Eron Wright (JIRA)" <ji...@apache.org> on 2018/05/23 23:47:00 UTC
[jira] [Updated] (FLINK-8622) flink-mesos: High memory usage of
scheduler + job manager. GC never kicks in.
[ https://issues.apache.org/jira/browse/FLINK-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eron Wright updated FLINK-8622:
--------------------------------
Fix Version/s: (was: 1.5.0)
> flink-mesos: High memory usage of scheduler + job manager. GC never kicks in.
> -----------------------------------------------------------------------------
>
> Key: FLINK-8622
> URL: https://issues.apache.org/jira/browse/FLINK-8622
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, Mesos, ResourceManager
> Affects Versions: 1.4.0, 1.3.2
> Reporter: Bhumika Bayani
> Priority: Major
> Attachments: flink-mem-usage-graph-for-jira.png
>
>
> We are deploying a Flink cluster with 1 jobmanager and 6 taskmanagers on Mesos.
> We have observed that the memory usage of the jobmanager is high. Even as we allocate more and more memory to it, it hits the limit within minutes.
> We started with 1.5 GB RAM and a 1 GB heap. Currently we have allocated 4 GB RAM and a 3 GB heap to the jobmanager cum scheduler. We also tried allocating 8 GB RAM while keeping the heap the same (3 GB). In that case too, the memory graph was identical.
> As the graph below shows, the scheduler almost always runs at its maximum memory allocation.
> !flink-mem-usage-graph-for-jira.png!
>
> Throughout the run of the scheduler, memory usage never goes down until the process is killed due to OOM. From this we infer that garbage collection never happens.
> We have tried Flink versions 1.4 and 1.3 and see the same issue on both.
>
> Is there any way we can find out where and how memory is being used?
> Are there any Flink config options for the jobmanager, or JVM parameters, that can help us restrict memory usage, force garbage collection, and prevent it from crashing?
> Please let us know if there are any resource recommendations from Flink for running Flink on Mesos at scale.
>
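One way to check whether garbage collection really never runs, and whether the memory sits in the heap or outside it, is to read the JVM's standard management beans. The sketch below is not part of the original report and is not a Flink API. It uses only java.lang.management from the JDK, and the class name JvmMemoryReport is hypothetical. It reports on the JVM it runs in, so it is only meaningful when executed inside the jobmanager process (or adapted to read the same beans over JMX).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: summarizes heap, non-heap, and GC activity of the current JVM.
final class JvmMemoryReport {

    private static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    static String report() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();

        StringBuilder out = new StringBuilder();
        out.append(String.format("heap: used=%d MiB, committed=%d MiB%n",
                toMiB(heap.getUsed()), toMiB(heap.getCommitted())));
        out.append(String.format("non-heap: used=%d MiB, committed=%d MiB%n",
                toMiB(nonHeap.getUsed()), toMiB(nonHeap.getCommitted())));

        // One line per collector; a count that keeps increasing between
        // two reports means collections are taking place.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(String.format("gc %s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

If the collection counts increase while the container-level graph stays flat, the collector is running and the flat graph more likely reflects memory the JVM has committed rather than live objects. The JDK tools jstat and jmap expose similar figures for a running process without code changes.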
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)