Posted to jira@kafka.apache.org by "Jin Tianfan (JIRA)" <ji...@apache.org> on 2018/11/06 12:56:00 UTC

[jira] [Commented] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

    [ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676724#comment-16676724 ] 

Jin Tianfan commented on KAFKA-3980:
------------------------------------

[~ijuma] [~rsivaram]

I am hitting the same problem. My Kafka broker version is kafka_2.11-0.11.0.0. We ran jmap -histo:live <pid>, and this is the output:

--------------------------------------------------------------------------------------------------------------

num #instances #bytes class name
----------------------------------------------
 1: 10468626 1241525032 [C
 2: 5186139 448500296 [Ljava.util.HashMap$Node;
 3: 10468427 251242248 java.lang.String
 4: 10389530 249348720 javax.management.ObjectName$Property
 5: 10372951 249121296 [Ljavax.management.ObjectName$Property;
 6: 5179185 248600880 java.util.HashMap
 7: 5186476 207459040 javax.management.ObjectName
 8: 5240552 167697664 java.util.HashMap$Node
 9: 6302 139607712 [B
 10: 5176173 124228152 org.apache.kafka.common.metrics.JmxReporter$KafkaMbean
 11: 5176210 82819360 java.util.HashMap$EntrySet
 12: 90003 2160072 java.util.concurrent.ConcurrentSkipListMap$Node
 13: 84784 2034816 java.lang.Double
 14: 45662 1461184 java.util.concurrent.ConcurrentHashMap$Node
 15: 25106 1244408 [Ljava.lang.Object;
 16: 43453 1042872 java.util.concurrent.ConcurrentSkipListMap$Index
 17: 18418 736720 java.util.LinkedHashMap$Entry

--------------------------------------------------------------------------------------------------------------------------
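Alongside jmap, a lightweight way to watch this growth directly is to count the registered MBeans over JMX. This is only an illustrative plain-JDK sketch (the class name is made up, and the domain pattern would be "kafka.server:*" or "kafka.producer:*" on a real broker/client JVM):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical diagnostic helper: count MBeans matching a JMX domain pattern.
// Sampling this periodically on a broker would show whether JmxReporter's
// registrations grow without bound.
public class MBeanCounter {
    public static int countMBeans(String pattern) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // queryNames with a pattern ObjectName returns all matching registrations
        return server.queryNames(new ObjectName(pattern), null).size();
    }

    public static void main(String[] args) throws Exception {
        // On a broker one would use "kafka.server:*"; here we query the
        // always-present java.lang domain just to demonstrate the call.
        System.out.println("java.lang MBeans: " + countMBeans("java.lang:*"));
    }
}
```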

our jvm info as below:

/data/program/java/bin/java -Xmx4G -Xms4G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/data/program/kafka/kafka_2.11-0.11.0.0/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/data/program/kafka/kafka_2.11-0.11.0.0/bin/../logs -Dlog4j.configuration=

 

We found too many metrics objects in our broker's memory. The broker had been running healthily for nearly a year. Reviewing the code, only ReplicationQuotaManager and ClientQuotaManager set a sensor expiry time of one hour; all other sensors use an expiry of Long.MAX_VALUE. Could this be the cause of so many metrics in my heap? If you need it, I can send you my heap dump.
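To make the suspicion above concrete, here is a minimal self-contained sketch of the expiry idea (not Kafka's actual implementation; all names are illustrative): sensors that record within their expiry window survive a periodic sweep, while a sensor created with Long.MAX_VALUE effectively never expires, so its metrics stay registered forever.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of sensor expiry: a registry of named sensors, each
// tracking when it last recorded, plus a sweep that drops idle sensors.
// In Kafka the analogous sweep is a periodic expiration task on Metrics.
public class SensorRegistry {
    static final class Sensor {
        final long expirationMs;      // how long the sensor may sit idle
        volatile long lastRecordMs;   // last time a value was recorded
        Sensor(long expirationMs, long nowMs) {
            this.expirationMs = expirationMs;
            this.lastRecordMs = nowMs;
        }
        boolean hasExpired(long nowMs) {
            // A sensor with expirationMs == Long.MAX_VALUE can never expire.
            return nowMs - lastRecordMs > expirationMs;
        }
    }

    private final Map<String, Sensor> sensors = new ConcurrentHashMap<>();

    public Sensor sensor(String name, long expirationMs, long nowMs) {
        return sensors.computeIfAbsent(name, n -> new Sensor(expirationMs, nowMs));
    }

    // Sweep out idle sensors; without this (or with an infinite expiry),
    // per-client sensors accumulate indefinitely.
    public void removeExpired(long nowMs) {
        sensors.entrySet().removeIf(e -> e.getValue().hasExpired(nowMs));
    }

    public int size() {
        return sensors.size();
    }

    public static void main(String[] args) {
        SensorRegistry registry = new SensorRegistry();
        registry.sensor("quota-sensor", 60_000L, 0L);      // expires after 1 min idle
        registry.sensor("long-lived", Long.MAX_VALUE, 0L); // never expires
        registry.removeExpired(120_000L);                  // sweep 2 minutes later
        System.out.println("sensors remaining: " + registry.size()); // prints 1
    }
}
```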

I hope to receive your reply as soon as possible.

> JmxReporter uses excessive memory causing OutOfMemoryException
> --------------------------------------------------------------
>
>                 Key: KAFKA-3980
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3980
>             Project: Kafka
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.9.0.1
>            Reporter: Andrew Jorgensen
>            Priority: Major
>
> I have some nodes in a Kafka cluster that occasionally run out of memory whenever I restart the producers. I was able to take a heap dump from both a recently restarted Kafka node, which weighed in at about 20 MB, and a node that had been running for 2 months, which was using over 700 MB of memory. Looking at the heap dump, it appears that the JmxReporter is holding on to metrics and causing them to build up over time.
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance that when I restart the producers the node will hit a Java heap space exception and OOM. The nodes then fail to start up correctly and write a -1 as the leader number to the partitions they were responsible for, effectively resetting the offset and rendering those partitions unavailable. The Kafka process then needs to be restarted in order to reassign the node to the partitions it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)