Posted to user@flink.apache.org by Andrey Zagrebin <az...@apache.org> on 2020/03/03 15:35:52 UTC

[Survey] Default size for the new JVM Metaspace limit in 1.10

Hi All,

Recently, FLIP-49 [1] introduced a new JVM Metaspace limit in the 1.10
release [2]. The Flink scripts that start the task manager JVM process set
this limit by adding the corresponding JVM argument. This was done to plan
resources properly, especially to derive the container size for
Yarn/Mesos/Kubernetes. It should also surface potential class loading
leaks. The limit can be changed with the option
'taskmanager.memory.jvm-metaspace.size' [3]. Its current default value is
96 MB.
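
To illustrate how the option relates to the JVM argument, here is a rough
Python sketch. It assumes the limit is passed as the standard HotSpot flag
-XX:MaxMetaspaceSize with the size given in bytes; the helper names
parse_memory_size and metaspace_jvm_arg are hypothetical and are not part of
Flink, and the accepted unit suffixes are a simplification.

```python
import re

# Binary unit multipliers; an assumption covering only the common suffixes.
_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
}


def parse_memory_size(text: str) -> int:
    """Convert a size string such as '96m' or '1gb' into bytes (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse memory size: {text!r}")
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in _UNITS:
        raise ValueError(f"unknown memory unit: {unit!r}")
    return int(number) * _UNITS[unit]


def metaspace_jvm_arg(configured_size: str) -> str:
    """Build the JVM flag for a given taskmanager.memory.jvm-metaspace.size value."""
    return f"-XX:MaxMetaspaceSize={parse_memory_size(configured_size)}"


if __name__ == "__main__":
    for size in ("96m", "256m"):
        print(size, "->", metaspace_jvm_arg(size))
```

Searching the task manager's startup log for MaxMetaspaceSize is one way to
check which limit is actually in effect for a given setup.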

This change has led to 'OutOfMemoryError: Metaspace' in certain cases after
upgrading to version 1.10. In some cases, a class loading leak was
detected [4], which has to be investigated on its own. In other cases,
simply increasing the option value helped because the default value was not
enough, presumably due to the specifics of the job. In general, the required
Metaspace size depends on the job, and no single default value can cover
all cases. There is an issue to improve the docs on this point [5].

The goal of this survey is to come up with the most reasonable default value
for this option. If you have encountered this issue and increasing the
Metaspace size helped (i.e. there was no class loading leak), please report
the option value that resolved it, along with any specifics of your job that
you think are relevant. There is also a dedicated Jira issue [6] for
reporting.

Thanks,
Andrey

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#jvm-parameters
[3]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#taskmanager-memory-jvm-metaspace-size
[4] https://issues.apache.org/jira/browse/FLINK-16142
[5] https://issues.apache.org/jira/browse/FLINK-16278
[6] https://jira.apache.org/jira/browse/FLINK-16406

Re: [Survey] Default size for the new JVM Metaspace limit in 1.10

Posted by Andrey Zagrebin <az...@apache.org>.
Hi all,

Bumping this topic. The poll is about:

*Increasing the default JVM Metaspace size from 96 MB to 256 MB, and*
*Existing Flink 1.10 setups with a small process memory size (~1 GB)*

The community is discussing the 1.10.1 bugfix release and whether to
increase the default JVM Metaspace size.
So far, increasing this setting from 96 MB to 256 MB has helped in all
reported cases where the default value of 96 MB was not enough.

Increasing the default value can affect existing Flink 1.10 setups,
especially those where the process memory size is explicitly set to a
relatively small value, e.g. around 1 GB, but the JVM Metaspace size is not
set. A larger Metaspace default then leaves less of the fixed process memory
for Flink memory and all of its components, e.g. JVM heap and managed memory.
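
The effect on a small setup can be estimated with a rough Python sketch. It
assumes the FLIP-49 breakdown in which the total process memory is the sum of
Flink memory, JVM Metaspace and JVM overhead. The JVM overhead defaults used
here (fraction 0.1, minimum 192 MB, maximum 1 GB) are my recollection of the
1.10 defaults and should be checked against the configuration docs [3]; the
function name flink_memory_mb is hypothetical.

```python
def flink_memory_mb(
    process_mb: float,
    metaspace_mb: float,
    overhead_fraction: float = 0.1,   # assumed 1.10 default, verify in the docs
    overhead_min_mb: float = 192,     # assumed 1.10 default, verify in the docs
    overhead_max_mb: float = 1024,    # assumed 1.10 default, verify in the docs
) -> float:
    """Estimate what is left for Flink memory (heap, managed memory, etc.)."""
    # JVM overhead is a fraction of the process memory, clamped to [min, max].
    overhead_mb = min(max(process_mb * overhead_fraction, overhead_min_mb),
                      overhead_max_mb)
    flink_mb = process_mb - metaspace_mb - overhead_mb
    if flink_mb <= 0:
        raise ValueError("process memory too small for Metaspace and JVM overhead")
    return flink_mb


if __name__ == "__main__":
    process_mb = 1024  # process memory explicitly set to about 1 GB
    for metaspace_mb in (96, 256):
        remaining = flink_memory_mb(process_mb, metaspace_mb)
        print(f"Metaspace {metaspace_mb} MB -> Flink memory {remaining} MB")
```

Under these assumptions, a 1 GB process loses 160 MB of Flink memory when the
Metaspace default grows from 96 MB to 256 MB, which is the kind of reduction
this poll is concerned with.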

The question is how many important setups like this (with a small process
memory size) already exist, so that we can investigate how badly they would
be affected by the suggested change.
Any feedback is appreciated.

Best,
Andrey
