Posted to jira@kafka.apache.org by "Vagesh Mathapati (Jira)" <ji...@apache.org> on 2020/08/13 06:57:00 UTC

[jira] [Created] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

Vagesh Mathapati created KAFKA-10396:
----------------------------------------

             Summary: Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
                 Key: KAFKA-10396
                 URL: https://issues.apache.org/jira/browse/KAFKA-10396
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect, streams
    Affects Versions: 2.5.0, 2.3.1
            Reporter: Vagesh Mathapati


We are observing that the overall memory of our container keeps growing and never comes back down.
After analysis we found that rocksdbjni.so keeps allocating 64M chunks of off-heap memory and never releases them. This causes an OOM kill once memory reaches the configured limit.

We use Kafka Streams and GlobalKTable for many of our Kafka topics.
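
For reference, the GlobalKTables are built roughly as in the sketch below. The topic name, serdes and store name are placeholders (our real topology uses Avro serdes and many topics); each globalTable() is materialized into a RocksDB-backed state store by default, which is where rocksdbjni allocates its off-heap memory.

{code:java}
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class GlobalTableExample {

    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-app");        // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder

        final StreamsBuilder builder = new StreamsBuilder();

        // Each globalTable() call gets its own RocksDB-backed store by default.
        final GlobalKTable<String, String> table = builder.globalTable(
                "example-topic",                                              // placeholder topic
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("example-global-store"));

        final KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
{code}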

Below is our environment:
 * Kubernetes cluster
 * openjdk 11.0.7 2020-04-14 LTS
 * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
 * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
 * Springboot 2.3
 * spring-kafka-2.5.0
 * kafka-streams-2.5.0
 * kafka-streams-avro-serde-5.4.0
 * rocksdbjni-5.18.3

We observed the same result with Kafka 2.3.

Below is a snippet of our analysis. From the pmap output we took the addresses of these 64M allocations (RSS):

Address Kbytes RSS Dirty Mode Mapping
00007f3ce8000000 65536 65532 65532 rw--- [ anon ]
00007f3cf4000000 65536 65536 65536 rw--- [ anon ]
00007f3d64000000 65536 65536 65536 rw--- [ anon ]

We tried to match these addresses against memory allocation logs, which we enabled with the help of the Azul Systems team.

@ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ff7ca0
@ /tmp/librocksdbjni6564497922441568920.so:_ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4] - 0x7f3ce8ff9780
@ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ff9750
@ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ff97c0
@ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ffccf0
@ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ffcd10
@ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ffccf0
@ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ffcd10


We also identified that the content of these 64M regions is all zeros, with no actual data present in them.

I tried to tune the RocksDB configuration as described in [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config], but it did not help.
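
For reference, the kind of bounded-memory RocksDBConfigSetter that page describes looks roughly like the sketch below. The size constants are illustrative placeholders, not the exact values we tried; the class is registered via the rocksdb.config.setter property (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG).

{code:java}
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

// Bounded-memory RocksDB config along the lines of the linked docs.
// All size constants are illustrative placeholders.
public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    private static final long TOTAL_OFF_HEAP_MEMORY = 128 * 1024 * 1024L;  // placeholder: shared block cache budget
    private static final long TOTAL_MEMTABLE_MEMORY = 32 * 1024 * 1024L;   // placeholder: shared memtable budget
    private static final double INDEX_FILTER_BLOCK_RATIO = 0.1;            // placeholder

    // Shared across all store instances so the limit applies process-wide.
    private static final Cache CACHE =
            new LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false, INDEX_FILTER_BLOCK_RATIO);
    private static final WriteBufferManager WRITE_BUFFER_MANAGER =
            new WriteBufferManager(TOTAL_MEMTABLE_MEMORY, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();

        // Count block cache, index/filter blocks and memtables against one shared budget.
        tableConfig.setBlockCache(CACHE);
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setWriteBufferManager(WRITE_BUFFER_MANAGER);

        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Do not close the shared Cache/WriteBufferManager; every store instance uses them.
    }
}
{code}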

 

Please let me know if you need any more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)