Posted to dev@flink.apache.org by "Sitan Pang (Jira)" <ji...@apache.org> on 2022/08/30 06:44:00 UTC
[jira] [Created] (FLINK-29134) fetch metrics may cause oom(ThreadPool task pile up)
Sitan Pang created FLINK-29134:
----------------------------------
Summary: fetch metrics may cause oom(ThreadPool task pile up)
Key: FLINK-29134
URL: https://issues.apache.org/jira/browse/FLINK-29134
Project: Flink
Issue Type: Improvement
Components: Runtime / Metrics
Affects Versions: 1.11.0
Reporter: Sitan Pang
Attachments: dump-queueTask.png, dump-threadPool.png
In queryMetrics, we use a thread pool to process the data returned by the TMs.
{code:java}
private void queryMetrics(final MetricQueryServiceGateway queryServiceGateway) {
    LOG.debug("Query metrics for {}.", queryServiceGateway.getAddress());

    queryServiceGateway
            .queryMetrics(timeout)
            .whenCompleteAsync(
                    (MetricDumpSerialization.MetricSerializationResult result, Throwable t) -> {
                        if (t != null) {
                            LOG.debug("Fetching metrics failed.", t);
                        } else {
                            metrics.addAll(deserializer.deserialize(result));
                        }
                    },
                    executor);
}
{code}
The only condition for fetching metrics is that the time since the last update is larger than updateInterval:
{code:java}
public void update() {
    synchronized (this) {
        long currentTime = System.currentTimeMillis();
        if (currentTime - lastUpdateTime > updateInterval) {
            lastUpdateTime = currentTime;
            fetchMetrics();
        }
    }
}
{code}
Therefore, if the returned data cannot be processed within one update interval, unprocessed metrics data will accumulate as tasks pile up in the thread pool.
Besides, the REST handlers and the metrics fetching share the same thread pool, so it may be even worse when the web UI is open.
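To make the pile-up concrete, here is a minimal standalone sketch. It is not Flink code: the class FetchBacklogDemo, the method queuedAfter and the flag skipWhilePending are hypothetical names used only for illustration. A single-worker pool with an unbounded queue stands in for the shared executor, and a latch simulates processing that takes longer than updateInterval. The optional guard (skip a round while an earlier task is still pending) is only one possible mitigation and is an assumption here, not a fix proposed in this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical demo of the task pile-up; not part of Flink. */
public class FetchBacklogDemo {

    /**
     * Submits one slow "process metrics" task per round and returns how many
     * tasks are still waiting in the executor queue after the last round.
     */
    public static int queuedAfter(int rounds, boolean skipWhilePending) {
        // One worker and an unbounded queue stand in for the shared thread pool.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicBoolean pending = new AtomicBoolean(false);
        try {
            for (int i = 0; i < rounds; i++) {
                // Assumed guard: skip this round if an earlier task has not finished.
                if (skipWhilePending && !pending.compareAndSet(false, true)) {
                    continue;
                }
                executor.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // simulates processing slower than updateInterval
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        pending.set(false);
                    }
                });
                started.await(); // ensure the worker is busy before the next round
            }
            return executor.getQueue().size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while submitting tasks", e);
        } finally {
            release.countDown(); // unblock the worker so the pool can drain
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("queued without guard: " + queuedAfter(5, false));
        System.out.println("queued with guard:    " + queuedAfter(5, true));
    }
}
```

With 5 rounds, the unguarded run leaves 4 tasks queued behind the one that is still running, while the guarded run leaves none. In the real fetcher each queued task also holds its serialized result, which is what would drive the memory growth.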
--
This message was sent by Atlassian Jira
(v8.20.10#820010)