Posted to dev@kafka.apache.org by "Kaare Nilsen (Jira)" <ji...@apache.org> on 2020/03/10 11:47:00 UTC

[jira] [Created] (KAFKA-9690) MemoryLeak in JMX Reporter

Kaare Nilsen created KAFKA-9690:
-----------------------------------

             Summary: MemoryLeak in JMX Reporter
                 Key: KAFKA-9690
                 URL: https://issues.apache.org/jira/browse/KAFKA-9690
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 2.4.0
            Reporter: Kaare Nilsen
         Attachments: image-2020-03-10-12-37-49-259.png, image-2020-03-10-12-44-11-688.png

We use Kafka in a streaming HTTP application, creating a new consumer for each incoming request. In version 2.4.0 we experience that memory builds up for each new consumer. A memory dump revealed the growth was in the JMX subsystem, and after further debugging we found that the mbeans under kafka.consumer of type consumer-metrics build up and are not released when the consumer is closed.
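For reference, the usage pattern is roughly this (a minimal sketch; broker address, topic name and group id are made up for illustration):
{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.UUID;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PerRequestConsumer {

    // Called once per incoming HTTP request.
    static void handleRequest() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "stream-" + UUID.randomUUID());
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            // ... poll and stream records back to the HTTP client until it disconnects ...
            consumer.poll(Duration.ofMillis(500));
        }
        // With 2.4.0 the kafka.consumer:type=consumer-metrics mbean for this client id
        // stays registered after close(); with 2.3.1 it is removed.
    }
}
{code}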

What we found is that in the metricRemoval method:
{code:java}
public void metricRemoval(KafkaMetric metric) {
    synchronized (LOCK) {
        MetricName metricName = metric.metricName();
        String mBeanName = getMBeanName(prefix, metricName);
        KafkaMbean mbean = removeAttribute(metric, mBeanName);
        if (mbean != null) {
            if (mbean.metrics.isEmpty()) {
                unregister(mbean);
                mbeans.remove(mBeanName);
            } else
                reregister(mbean);
        }
    }
}
{code}
The check mbean.metrics.isEmpty() never yields true for this particular mbean, so the mbean is never unregistered and the mbeans HashMap keeps growing.
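The buildup can also be observed programmatically; a small sketch (assuming the default platform MBeanServer, which is where JmxReporter registers its beans):
{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ConsumerMetricsLeakCheck {

    // Counts the kafka.consumer consumer-metrics mbeans currently registered.
    // After a consumer is closed this count should drop again; on 2.4.0 it only grows.
    static int countConsumerMetricsMbeans() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return server.queryNames(new ObjectName("kafka.consumer:type=consumer-metrics,*"), null).size();
    }
}
{code}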

The metrics that are not released are:
{code:java}
last-poll-seconds-ago
poll-idle-ratio-avg
time-between-poll-avg
time-between-poll-max
{code}
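To confirm that it is exactly these attributes that keep the stale mbeans alive, one can list what is left on them after the consumers have been closed (again a sketch against the platform MBeanServer):
{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class StaleMbeanAttributes {

    // Prints the attributes still present on each consumer-metrics mbean.
    // The leaked beans keep the four poll metrics listed above.
    static void printRemainingAttributes() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : server.queryNames(new ObjectName("kafka.consumer:type=consumer-metrics,*"), null)) {
            MBeanInfo info = server.getMBeanInfo(name);
            for (MBeanAttributeInfo attribute : info.getAttributes()) {
                System.out.println(name + " -> " + attribute.getName());
            }
        }
    }
}
{code}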
I have a workaround in my code now: a modified JMXReporter in my own project with the following close method:
{code:java}
public void close() {
    synchronized (LOCK) {
        for (KafkaMbean mbean : this.mbeans.values()) {
            // Explicitly drop the four poll attributes that are never removed via
            // metricRemoval, so that every mbean can actually be unregistered.
            mbean.removeAttribute("last-poll-seconds-ago");
            mbean.removeAttribute("poll-idle-ratio-avg");
            mbean.removeAttribute("time-between-poll-avg");
            mbean.removeAttribute("time-between-poll-max");
            unregister(mbean);
        }
    }
}
{code}
This removes the attributes that are not cleaned up and prevents the memory leak, but I have not found the root cause.
Another workaround is to use Kafka client 2.3.1.

 

This is how it looks in the JMX console after a couple of clients have connected and disconnected. You can see that the consumer-metrics mbeans build up, and the old ones keep the four attributes that make the unregister fail.

 

!image-2020-03-10-12-37-49-259.png!

 

This is how it looks after a while with Kafka client 2.3.1:
!image-2020-03-10-12-44-11-688.png!

As you can see, there is no leakage here.

I suspect this change to be the one that introduced the leak:
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-517%3A+Add+consumer+metrics+to+observe+user+poll+behavior]

https://issues.apache.org/jira/browse/KAFKA-8874



--
This message was sent by Atlassian Jira
(v8.3.4#803005)