Posted to dev@kafka.apache.org by "Dave Thomas (JIRA)" <ji...@apache.org> on 2017/02/17 23:03:44 UTC
[jira] [Created] (KAFKA-4778) OOM on kafka-streams instances with high numbers of unreaped Record classes
Dave Thomas created KAFKA-4778:
----------------------------------
Summary: OOM on kafka-streams instances with high numbers of unreaped Record classes
Key: KAFKA-4778
URL: https://issues.apache.org/jira/browse/KAFKA-4778
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 0.10.1.1
Environment: AWS m3.large Ubuntu 16.04.1 LTS. rocksDB on local SSD.
Kafka has 3 zk, 5 brokers.
stream processors are run with:
-XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Stream processors written in scala 2.11.8
Reporter: Dave Thomas
We have a stream processing app with ~8 source/sink stages operating roughly at the rate of 500k messages ingested/day (~4M across the 8 stages).
We get OOM kills roughly once every ~18 hours. Note it is the Linux OOM-killer terminating the process, not the JVM throwing OutOfMemoryError itself.
It may be worth noting that the stream processor uses ~50 MB while processing normally for hours on end, until the problem surfaces; then, over ~20-30 seconds, memory suddenly grows from under 100 MB to over 1 GB and does not shrink until the process is killed.
We are using supervisor to restart the instances. Sometimes the process dies again immediately once stream processing resumes, for the same reason; other times it runs for minutes or hours before failing again. This extended window has enabled us to capture a heap dump using jmap.
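For reference, a heap dump and class histogram can be captured from a live JVM roughly as follows (the process-name pattern and dump path below are illustrative, not from the original report; jhat ships with JDK 8 but was removed in JDK 9):

```shell
# Find the PID of the streams JVM (the process-name pattern is hypothetical)
PID=$(pgrep -f 'java.*MyStreamsApp')

# Write a binary heap dump of live objects (forces a full GC first)
jmap -dump:live,format=b,file=/tmp/streams-heap.hprof "$PID"

# Quick class histogram without taking a full dump
jmap -histo:live "$PID" | head -n 25

# Browse the dump (including the histogram view) at http://localhost:7000
jhat /tmp/streams-heap.hprof
```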
jhat's histogram feature reveals the following top objects in memory:
Class                                                              Instance Count  Total Size (bytes)
class [B                                                                  4070487           867857833
class [Ljava.lang.Object;                                                 2066986           268036184
class [C                                                                   539519            92010932
class [S                                                                  1003290            80263028
class [I                                                                   508208            77821516
class java.nio.HeapByteBuffer                                             1506943            58770777
class org.apache.kafka.common.record.Record                               1506783            36162792
class org.apache.kafka.clients.consumer.ConsumerRecord                     528652            35948336
class org.apache.kafka.common.record.MemoryRecords$RecordsIterator         501742            32613230
class org.apache.kafka.common.record.LogEntry                             2009373            32149968
class org.xerial.snappy.SnappyInputStream                                  501600            20565600
class java.io.DataInputStream                                              501742            20069680
class java.io.EOFException                                                 501606            20064240
class java.util.ArrayDeque                                                 501941             8031056
class java.lang.Long                                                       516463             4131704
Note that high on the list are org.apache.kafka.common.record.Record, org.apache.kafka.clients.consumer.ConsumerRecord, org.apache.kafka.common.record.MemoryRecords$RecordsIterator, and org.apache.kafka.common.record.LogEntry.
All of these have 500k-2M instances.
There is nothing in stream processing logs that is distinctive (log levels are still at default).
Could references (weak, phantom, etc.) be preventing these instances from being garbage-collected?
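As an aside on that question (a minimal sketch of our own, not code from Kafka): weak and phantom references never keep objects alive; what prevents reaping is any chain of strong references from a GC root, such as a long-lived collection. If something like the hypothetical buffer below were retaining consumed record payloads, they could not be collected no matter what reference types other code used:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a buffer that keeps strong references to every
// consumed record payload. As long as the deque itself is reachable,
// none of the payloads it holds can be garbage-collected, regardless
// of any weak or phantom references held elsewhere.
public class RecordBuffer {
    private final Deque<byte[]> pending = new ArrayDeque<>();

    public void add(byte[] recordValue) {
        pending.add(recordValue); // strong reference retained here
    }

    public int retainedCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        RecordBuffer buf = new RecordBuffer();
        for (int i = 0; i < 1_000; i++) {
            buf.add(new byte[1024]); // ~1 MB pinned until the buffer is drained
        }
        System.out.println("retained: " + buf.retainedCount());
    }
}
```

A heap dump would show such payloads as strongly reachable via the ArrayDeque, which is consistent with the ~500k ArrayDeque instances in the histogram above being worth a closer look.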
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)