Posted to issues@flink.apache.org by "ChangZhuo Chen (陳昌倬 Jira)" <ji...@apache.org> on 2020/02/25 02:42:00 UTC
[jira] [Updated] (FLINK-16267) Flink uses more memory than taskmanager.memory.process.size in Kubernetes
[ https://issues.apache.org/jira/browse/FLINK-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ChangZhuo Chen (陳昌倬) updated FLINK-16267:
-----------------------------------------
Description:
This issue was first raised on Stack Overflow: [https://stackoverflow.com/questions/60336764/flink-uses-more-memory-than-taskmanager-memory-process-size-in-kubernetes]
In Flink 1.10.0, we tried to use `taskmanager.memory.process.size` to limit the memory used by the TaskManagers so that Kubernetes does not kill them. However, many TaskManagers are still `OOMKilled` with the following setup.
* The Kubernetes setup is the same as described in [https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/kubernetes.html].
* The following is the resource configuration for the TaskManager deployment in Kubernetes:
{{resources:}}
{{  requests:}}
{{    cpu: 1000m}}
{{    memory: 4096Mi}}
{{  limits:}}
{{    cpu: 1000m}}
{{    memory: 4096Mi}}
* The following are all memory-related options in `flink-conf.yaml` for 1.10.0:
{{jobmanager.heap.size: 820m}}
{{taskmanager.memory.jvm-metaspace.size: 128m}}
{{taskmanager.memory.process.size: 4096m}}
* We use the RocksDB state backend and do not set `state.backend.rocksdb.memory.managed` in `flink-conf.yaml`.
** We use S3 as checkpoint storage.
* The code uses the DataStream API.
** Both input and output are Kafka.
* Our dependencies, for reference:
{{val flinkVersion = "1.10.0"}}
{{libraryDependencies += "com.squareup.okhttp3" % "okhttp" % "4.2.2"}}
{{libraryDependencies += "com.typesafe" % "config" % "1.4.0"}}
{{libraryDependencies += "joda-time" % "joda-time" % "2.10.5"}}
{{libraryDependencies += "org.apache.flink" %% "flink-connector-kafka" % flinkVersion}}
{{libraryDependencies += "org.apache.flink" % "flink-metrics-dropwizard" % flinkVersion}}
{{libraryDependencies += "org.apache.flink" %% "flink-scala" % flinkVersion % "provided"}}
{{libraryDependencies += "org.apache.flink" %% "flink-statebackend-rocksdb" % flinkVersion % "provided"}}
{{libraryDependencies += "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided"}}
{{libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.6.7"}}
{{libraryDependencies += "org.log4s" %% "log4s" % "1.8.2"}}
{{libraryDependencies += "org.rogach" %% "scallop" % "3.3.1"}}
* The configuration we used with Flink 1.9.1, which did not get `OOMKilled`, is the following:
** Kubernetes
{{resources:}}
{{  requests:}}
{{    cpu: 1200m}}
{{    memory: 2G}}
{{  limits:}}
{{    cpu: 1500m}}
{{    memory: 2G}}
** Flink 1.9.1
{{jobmanager.heap.size: 820m}}
{{taskmanager.heap.size: 1024m}}
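For reference, a rough sketch of how the 4096m process size would be divided in Flink 1.10, assuming the default memory options apply: JVM overhead is 10% of the process size (min 192m, max 1g), framework heap and framework off-heap are 128m each, task off-heap is 0, and managed memory and network memory are 40% and 10% (min 64m, max 1g) of total Flink memory. These defaults reflect our reading of the 1.10 memory configuration docs and the sketch ignores Flink's exact byte rounding. `TaskManagerMemorySketch`, `breakdown` and `k8sQuantityToMiB` are hypothetical helpers, not Flink APIs. Flink's `m` suffix is treated as a mebibyte; Kubernetes `Mi`/`Gi` are binary and `M`/`G` are decimal units.

```scala
// Sketch: approximate Flink 1.10 TaskManager memory breakdown (all values in MiB).
// The default fractions/sizes below are ASSUMED from the 1.10 docs, not read from Flink.
object TaskManagerMemorySketch {

  private val MiB: Double = 1024.0 * 1024.0

  // Kubernetes quantity suffixes: "Mi"/"Gi" are binary, "M"/"G" are decimal.
  def k8sQuantityToMiB(q: String): Double = {
    val units = Seq("Gi" -> 1024.0 * MiB, "Mi" -> MiB, "G" -> 1e9, "M" -> 1e6)
    units
      .collectFirst {
        case (suffix, factor) if q.endsWith(suffix) =>
          q.dropRight(suffix.length).toDouble * factor / MiB
      }
      .getOrElse(q.toDouble / MiB) // no suffix: plain bytes
  }

  // Fraction-based components are clamped into [lo, hi].
  private def clamp(v: Double, lo: Double, hi: Double): Double =
    math.max(lo, math.min(hi, v))

  final case class Breakdown(
      jvmOverhead: Double,
      totalFlink: Double,
      managed: Double,
      network: Double,
      taskHeap: Double)

  // processSize and metaspace in MiB, e.g. "4096m" -> 4096.0.
  def breakdown(processSize: Double, metaspace: Double): Breakdown = {
    val jvmOverhead = clamp(processSize * 0.1, 192, 1024) // assumed jvm-overhead defaults
    val totalFlink = processSize - metaspace - jvmOverhead
    val frameworkHeap = 128.0    // assumed framework.heap.size default
    val frameworkOffHeap = 128.0 // assumed framework.off-heap.size default
    val taskOffHeap = 0.0        // assumed task.off-heap.size default
    val managed = totalFlink * 0.4                  // assumed managed.fraction default
    val network = clamp(totalFlink * 0.1, 64, 1024) // assumed network defaults
    // Task heap is whatever remains of total Flink memory.
    val taskHeap =
      totalFlink - frameworkHeap - frameworkOffHeap - taskOffHeap - managed - network
    Breakdown(jvmOverhead, totalFlink, managed, network, taskHeap)
  }

  def main(args: Array[String]): Unit = {
    val b = breakdown(processSize = 4096, metaspace = 128)
    val limit = k8sQuantityToMiB("4096Mi")
    println(f"JVM overhead:       ${b.jvmOverhead}%8.2f MiB")
    println(f"Total Flink memory: ${b.totalFlink}%8.2f MiB")
    println(f"Managed memory:     ${b.managed}%8.2f MiB")
    println(f"Network memory:     ${b.network}%8.2f MiB")
    println(f"Task heap:          ${b.taskHeap}%8.2f MiB")
    println(f"Container limit:    $limit%8.2f MiB (headroom ${limit - 4096}%.2f MiB)")
    println(f"1.9.1 limit 2G:     ${k8sQuantityToMiB("2G")}%8.2f MiB")
  }
}
```

Under these assumptions, about 1423 MiB is managed memory (the budget RocksDB is expected to stay within when `state.backend.rocksdb.memory.managed` keeps its default, which we understand to be `true` in 1.10) and about 1523 MiB is task heap. The configured process size equals the 4096Mi container limit exactly, so any usage above Flink's budget has no headroom before Kubernetes kills the container. The 1.9.1 limit `2G` is a decimal quantity, about 1907 MiB.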
> Flink uses more memory than taskmanager.memory.process.size in Kubernetes
> -------------------------------------------------------------------------
>
> Key: FLINK-16267
> URL: https://issues.apache.org/jira/browse/FLINK-16267
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Affects Versions: 1.10.0
> Environment: * Dockerized Flink 1.10.0, with the following Dockerfile:
> {{FROM flink:1.10-scala_2.11}}
> {{RUN mkdir -p /opt/flink/plugins/s3 && \}}
> {{ ln -s /opt/flink/opt/flink-s3-fs-presto-1.10.0.jar /opt/flink/plugins/s3/}}
> {{RUN ln -s /opt/flink/opt/flink-metrics-prometheus-1.10.0.jar /opt/flink/lib/}}
> Reporter: ChangZhuo Chen (陳昌倬)
> Priority: Major
> Attachments: oomkilled_taskmanager.log