Posted to jira@kafka.apache.org by "Kaare Nilsen (Jira)" <ji...@apache.org> on 2020/03/10 11:47:00 UTC
[jira] [Created] (KAFKA-9690) MemoryLeak in JMX Reporter
Kaare Nilsen created KAFKA-9690:
-----------------------------------
Summary: MemoryLeak in JMX Reporter
Key: KAFKA-9690
URL: https://issues.apache.org/jira/browse/KAFKA-9690
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 2.4.0
Reporter: Kaare Nilsen
Attachments: image-2020-03-10-12-37-49-259.png, image-2020-03-10-12-44-11-688.png
We use Kafka in a streaming HTTP application, creating a new consumer for each incoming request. In version 2.4.0 we experience that memory builds up for each new consumer. After a memory dump revealed the issue was in the JMX subsystem, we found that one of the JMX beans (kafka.consumer) builds up one metric, consumer-metrics, without releasing it when the consumer is closed.
What we found is in the metricRemoval method:
{code:java}
public void metricRemoval(KafkaMetric metric) {
    synchronized (LOCK) {
        MetricName metricName = metric.metricName();
        String mBeanName = getMBeanName(prefix, metricName);
        KafkaMbean mbean = removeAttribute(metric, mBeanName);
        if (mbean != null) {
            if (mbean.metrics.isEmpty()) {
                unregister(mbean);
                mbeans.remove(mBeanName);
            } else
                reregister(mbean);
        }
    }
}
{code}
The check mbean.metrics.isEmpty() never yields true for this particular mbean, so the mbean is never removed, and the mbeans HashMap keeps growing.
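To illustrate the mechanism, here is a minimal sketch using plain collections (not Kafka's actual classes; the names are illustrative): an mbean entry is only dropped once its attribute map is empty, so attributes that are never removed keep the entry alive forever.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of the removal logic: an mbean is only unregistered
// once its attribute map is empty. If some attributes are never
// removed, the entry stays in the mbeans map indefinitely.
public class LeakSketch {
    static final Map<String, Map<String, Object>> mbeans = new HashMap<>();

    // register an attribute under an mbean name
    static void addAttribute(String mBeanName, String attribute) {
        mbeans.computeIfAbsent(mBeanName, k -> new HashMap<>())
              .put(attribute, new Object());
    }

    // mirrors the shape of metricRemoval above
    static void metricRemoval(String mBeanName, String attribute) {
        Map<String, Object> attrs = mbeans.get(mBeanName);
        if (attrs == null) return;
        attrs.remove(attribute);
        if (attrs.isEmpty()) {
            // never reached while leaked attributes remain
            mbeans.remove(mBeanName);
        }
    }
}
```

With one consumer per request, closing a consumer removes most attributes, but if the four poll metrics are never passed to metricRemoval, every per-client mbean stays registered.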
The metrics that are not released are:
{code:java}
last-poll-seconds-ago
poll-idle-ratio-avg
time-between-poll-avg
time-between-poll-max
{code}
I have a workaround in my code now: a modified JMXReporter in my own project with the following close method.
{code:java}
public void close() {
    synchronized (LOCK) {
        for (KafkaMbean mbean : this.mbeans.values()) {
            mbean.removeAttribute("last-poll-seconds-ago");
            mbean.removeAttribute("poll-idle-ratio-avg");
            mbean.removeAttribute("time-between-poll-avg");
            mbean.removeAttribute("time-between-poll-max");
            unregister(mbean);
        }
    }
}
{code}
This removes the attributes that are not cleaned up and prevents the memory leak, but I have not found the root cause.
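One way to observe the buildup at runtime (a hypothetical helper, not part of this report) is to count the MBeans registered under the kafka.consumer domain via the platform MBeanServer:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Counts MBeans registered under the kafka.consumer JMX domain.
// With the leak, this number grows by one leaked consumer-metrics
// mbean per closed consumer; without it, the count stays stable.
public class ConsumerMBeanCount {
    public static int count() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return server.queryNames(new ObjectName("kafka.consumer:*"), null).size();
    }
}
```

Calling count() before and after a batch of connect/disconnect cycles makes the leak visible without attaching a JMX console.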
Another workaround is to use kafka client 2.3.1.
This is how it looks in the JMX console after a couple of clients have connected and disconnected. Here you can see that the one metric builds up, and the old ones have the four attributes that make the unregister fail.
!image-2020-03-10-12-37-49-259.png!
This is how it looks after a while with kafka client 2.3.1:
!image-2020-03-10-12-44-11-688.png!
As you can see no leakage here.
I suspect the change from this KIP to be the one that introduced the leak:
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-517%3A+Add+consumer+metrics+to+observe+user+poll+behavior]
https://issues.apache.org/jira/browse/KAFKA-8874
--
This message was sent by Atlassian Jira
(v8.3.4#803005)