Posted to commits@samza.apache.org by "Marouane RAJI (JIRA)" <ji...@apache.org> on 2019/07/01 08:55:00 UTC
[jira] [Created] (SAMZA-2265) Memory leak potentially due to Kafka Checkpoint Management
Marouane RAJI created SAMZA-2265:
------------------------------------
Summary: Memory leak potentially due to Kafka Checkpoint Management
Key: SAMZA-2265
URL: https://issues.apache.org/jira/browse/SAMZA-2265
Project: Samza
Issue Type: Bug
Affects Versions: 1.0, 1.1
Environment:
```
job.container.count=110
yarn.container.memory.mb=4000
yarn.container.cpu.cores=8
yarn.am.container.cpu.cores=8
yarn.am.container.memory.mb=1024
task.opts=-Xmx2800M
task.checkpoint.replication.factor=2
```
Reporter: Marouane RAJI
Attachments: image-2019-07-01-09-47-11-241.png, image-2019-07-01-09-48-45-876.png, image-2019-07-01-09-50-04-693.png
Hi,
We recently upgraded one of our high-throughput Samza jobs from 0.13.1 to 1.0, then to 1.1. Both later versions appear to have a memory leak: the ever-increasing memory consumption leads to containers failing and YARN restarting them.
It is worth noting that we upgraded other, smaller (in container specs and throughput) Samza jobs without any issues.
Specs of the job:
* reading ~70k msg/sec
* 211 input topics, including one broadcast topic (2 msg/day, used for config updates)
* 1 output topic.
Below is the memory consumption in both versions for one container:
!image-2019-07-01-09-47-11-241.png!
Heap-dumps comparison:
!image-2019-07-01-09-48-45-876.png!
The difference between both versions keeps increasing slowly; the main cause is the growth in byte[] allocations.
In versions 1.0 and 1.1, the main reference holding these bytes seems to be KafkaCheckpointManager:
!image-2019-07-01-09-50-04-693.png!
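The growth pattern in the heap dumps can be illustrated with a minimal, hypothetical sketch: a long-lived object that retains a reference to every checkpoint record payload it reads, so the retained byte[] heap grows with every poll. The class and method names below are illustrative only; this is not Samza's actual KafkaCheckpointManager code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the suspected leak pattern: a long-lived reader
// that keeps a reference to every record payload it has ever read.
public class LeakyCheckpointReader {
    private final List<byte[]> retained = new ArrayList<>();

    // Simulates reading one checkpoint record and (incorrectly)
    // keeping its payload referenced forever.
    public void readRecord(int payloadSize) {
        retained.add(new byte[payloadSize]);
    }

    // Total bytes retained: this is what shows up as an
    // ever-growing byte[] population in a heap dump.
    public long retainedBytes() {
        long total = 0;
        for (byte[] b : retained) {
            total += b.length;
        }
        return total;
    }

    public static void main(String[] args) {
        LeakyCheckpointReader reader = new LeakyCheckpointReader();
        for (int i = 0; i < 1000; i++) {
            reader.readRecord(1024);
        }
        // 1000 reads of 1 KiB each -> 1,024,000 bytes retained
        System.out.println("retained bytes: " + reader.retainedBytes());
    }
}
```

Because the list is reachable from a live root for the lifetime of the container, none of those arrays are ever collected, matching the slow, monotonic growth seen above.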
Could this PR, [https://github.com/apache/samza/pull/993], solve this issue? It would release the KafkaConsumer used for checkpointing.
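The remedy being asked about can be sketched as follows: close the consumer used for checkpoint reads as soon as the checkpoint log has been read, instead of holding it (and its internal buffers) for the job's lifetime. `FakeConsumer` below is a stand-in for a Kafka consumer; all names are illustrative assumptions, not Samza or Kafka APIs.

```java
// Hypothetical sketch: release the checkpoint consumer deterministically
// with try-with-resources once checkpoint reading is done.
public class CheckpointReadOnce {
    // Stand-in for a Kafka consumer with a large internal fetch buffer.
    static class FakeConsumer implements AutoCloseable {
        private byte[] buffer = new byte[1 << 20]; // 1 MiB fetch buffer
        boolean closed = false;

        // Pretend to fetch one checkpoint record.
        byte[] poll() {
            return new byte[128];
        }

        @Override
        public void close() {
            buffer = null;  // drop the large internal buffer
            closed = true;
        }
    }

    // Reads the checkpoint log, then releases the consumer immediately
    // rather than at container shutdown.
    static FakeConsumer readCheckpoints() {
        FakeConsumer consumer = new FakeConsumer();
        try (FakeConsumer c = consumer) {
            for (int i = 0; i < 10; i++) {
                c.poll(); // drain the checkpoint topic
            }
        } // consumer (and its buffer) released here
        return consumer;
    }

    public static void main(String[] args) {
        System.out.println("closed: " + readCheckpoints().closed);
    }
}
```

If the leak really is the checkpoint consumer's buffers held by KafkaCheckpointManager, releasing the consumer this way would cap the retained heap at the post-read baseline.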
Thanks.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)