Posted to user@flink.apache.org by YennieChen88 <ch...@jd.com> on 2018/08/29 13:14:11 UTC

Taskmanager process memory increasing always

Hello,
	My use case is counting the number of successful and failed logins
within the last 1 hour, 10 min, 5 min, 3 min, 1 min, 10 seconds, and 1
second, keyed by login IP or device ID. Based on the previous counts over
these different time dimensions, we predict the compliance of the next
login.
	After various attempts, I chose sliding windows for the counting, e.g.
a 1 hour window with a 1 min slide, a 10 min window with a 10 second
slide, a 5 min window with a 5 second slide, and so on. In addition, I
use RocksDB as the state backend and have enabled checkpointing.
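	Roughly, one of these counts looks like the sketch below (simplified:
the event type is a plain (ip, success) tuple and the elements are
hard-coded; the real job reads login events from an external source):

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class LoginCountSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // (ip, loginSucceeded) pairs; hard-coded here for the sketch
        DataStream<Tuple2<String, Boolean>> logins = env.fromElements(
                Tuple2.of("1.2.3.4", true), Tuple2.of("1.2.3.4", false));

        logins
            .keyBy(0) // key by login IP (field 0 of the tuple)
            .window(SlidingProcessingTimeWindows.of(Time.hours(1), Time.minutes(1)))
            .aggregate(new AggregateFunction<Tuple2<String, Boolean>, long[], long[]>() {
                @Override
                public long[] createAccumulator() { return new long[2]; }

                @Override
                public long[] add(Tuple2<String, Boolean> e, long[] acc) {
                    acc[e.f1 ? 0 : 1]++; // acc[0] = successes, acc[1] = failures
                    return acc;
                }

                @Override
                public long[] getResult(long[] acc) { return acc; }

                @Override
                public long[] merge(long[] a, long[] b) {
                    return new long[] {a[0] + b[0], a[1] + b[1]};
                }
            })
            .print();

        env.execute("login count sketch");
    }
}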
	But now I have encountered some problems.
	1. The RES memory of every TaskManager process keeps increasing and
never stabilizes, until the process is killed, with no OOM exception in
the log.
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1520/memory_usage.png> 
	   After several tests, I found that the process memory growth is
related to the key (IP or device ID). If the key values stay within a
fixed range, the process memory can remain stable. But if the key values
change randomly, the memory keeps increasing. In fact, the login IPs and
device IDs used as keys are random. We also found that logins drop after
midnight, and the memory is then briefly stable, but it increases again
during the day. A job I started 15 days ago still shows increasing
memory. Is it normal that the memory grows whenever the keys change
randomly? (Some rough window-pane arithmetic on this is sketched below,
after question 3.)

	2. RocksDB seems to take up a lot of memory.
	   If I switch from RocksDB to the filesystem state backend, the memory
drops to around 30% of what it was. Without any limit configuration, will
RocksDB's memory usage keep growing forever? (A sketch of such a limit
configuration is shown below, after question 3.)

	3. Some TaskManagers in the Flink cluster do not run any task (no slot
is used), but their memory also increases linearly after the job has run
for several days. What do they use the memory for? I have no idea.
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1520/memory_usage2.png> 
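	For reference on question 1, my understanding of Flink's sliding
windows (an assumption on my part) is that each element is assigned to
window-size / slide panes: the 1 hour / 1 min, 10 min / 10 second, and
5 min / 5 second windows each put every login into 60 panes. With an
aggregating window function, Flink keeps one accumulator per key per
pane, so with effectively unbounded random keys the live state is roughly
(distinct keys seen within the window span) x 60 accumulators per window
operator, and it only shrinks when windows expire and their state is
purged.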
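	For reference on question 2, the kind of "limit configuration" I am
asking about would look roughly like the sketch below, written against
the Flink 1.6-era OptionsFactory hook of the RocksDB backend; the
checkpoint URI and all sizes are placeholders. Note that these settings
are per column family (Flink creates one per registered state per
operator instance), so they bound each RocksDB instance rather than the
TaskManager as a whole:

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class CappedRocksDbSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        RocksDBStateBackend backend =
                new RocksDBStateBackend("hdfs:///flink/checkpoints", true);

        backend.setOptions(new OptionsFactory() {
            @Override
            public DBOptions createDBOptions(DBOptions current) {
                return current; // leave DB-level options unchanged
            }

            @Override
            public ColumnFamilyOptions createColumnFamilyOptions(ColumnFamilyOptions current) {
                return current
                        .setWriteBufferSize(32 * 1024 * 1024)  // 32 MB per memtable
                        .setMaxWriteBufferNumber(2)            // at most 2 memtables per state
                        .setTableFormatConfig(new BlockBasedTableConfig()
                                .setBlockCacheSize(64 * 1024 * 1024)); // 64 MB block cache per state
            }
        });

        env.setStateBackend(backend);
        // ... build the job as usual ...
    }
}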

	Looking forward to your reply. Thank you.




Re: Taskmanager process memory increasing always

Posted by YennieChen88 <ch...@jd.com>.
As far as I know, RocksDB mainly uses off-heap memory, which is hard for
the JVM to control. Maybe you can monitor the off-heap memory of the
TaskManager process with specialized tools such as gperftools...
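
For what it's worth, a rough sketch with gperftools' heap profiler (the
tcmalloc library path and the way you start the TaskManager vary by
system; these lines are placeholders):

# preload tcmalloc so it tracks native allocations in the JVM process
export LD_PRELOAD=/usr/lib/libtcmalloc.so
# dump heap profiles under this path prefix
export HEAPPROFILE=/tmp/taskmanager-heap
./bin/taskmanager.sh start-foreground

# later, render one of the dumped profiles as text
pprof --text $JAVA_HOME/bin/java /tmp/taskmanager-heap.0001.heap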




Re: Taskmanager process memory increasing always

Posted by "Yan Zhou [FDS Science]" <yz...@coupang.com>.
I have met a similar issue. YARN kills the TaskManagers as their memory
usage grows to the limit. I think it might be RocksDB causing the
problem. Is there any way to debug the memory usage of the RocksDB
backend?


Best

Yan
