Posted to user@flink.apache.org by Shashwat Rastogi <sh...@reflektion.com> on 2017/07/27 09:29:05 UTC

Can someone explain memory usage in a flink worker?

Hi,

I have a setup of 7 task managers, each with 64 GB of physical memory, out of which I have allocated 35 GB as the task manager’s heap memory. I am using RocksDB as the state backend.

I see a lot of discrepancies between the reports generated by the Flink UI and my system metrics. Can someone please explain what is happening here?

- Why does the Flink UI show 35 GB as free memory when the system is currently using 48.4 GB of memory, which leaves only 62.5 - 48.4 = 14.1 GB free?
- Where is the memory used by RocksDB displayed? The machine does nothing except serve as the Flink worker, so can I assume that 62.5 GB - 35 GB - 14.1 GB - 523 MB = 12.9 GB is the memory used by RocksDB? When I run `ps -ef | grep rocksdb` I don’t see any process running; is this normal?
- Also, my system metrics show that memory usage keeps increasing until the task manager itself gets killed <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Memory-Leak-Flink-RocksDB-td14439.html>, but in the Flink UI I always see a lot of free memory. I am using the default configuration, so I don’t think Flink’s managed memory occupies non-heap memory. I cannot figure out where this ever-increasing memory consumption comes from; my guess is that it is used by RocksDB.

FLINK-UI
[screenshot not included in the plain-text archive]

SYSTEM METRICS
[screenshot not included in the plain-text archive]

Thanks in advance.
Shashwat

Re: Can someone explain memory usage in a flink worker?

Posted by Stephan Ewen <se...@apache.org>.
Hi Shashwat!

The issues and options mentioned by Flavio have nothing to do with the
situation reported above. There may be some issue in Netty, but at this
point it might just as well be that a library or input format used in the
user code has a memory leak, so I am not sure we can blame Netty yet.

Here are some other thoughts:

  - "Free Memory" is a wrong label (you are probably using an older
version of Flink); in newer versions it is correctly called "JVM Heap Size".
Hence the confusion ;-)

  - There is no separate RocksDB process - RocksDB is an embedded library;
its memory consumption contributes to the JVM process memory size, but not
to the heap (and also not to the "direct" memory). It simply consumes native
process memory. There is no easy way to limit that, so RocksDB may grow
indefinitely if you allow it to.

  - What you can do is limit the size that RocksDB may take; have a look at
the following two classes to see how to configure the RocksDB memory
footprint (a rough sketch follows below):

    => org.apache.flink.contrib.streaming.state.PredefinedOptions
    => org.apache.flink.contrib.streaming.state.OptionsFactory
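
    For illustration, here is a minimal sketch of how the two could be wired
together (class and method names as in the 1.3.x RocksDB backend; the
checkpoint URI and the byte sizes are placeholder values you would adapt):

    import org.apache.flink.contrib.streaming.state.OptionsFactory;
    import org.apache.flink.contrib.streaming.state.PredefinedOptions;
    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.ColumnFamilyOptions;
    import org.rocksdb.DBOptions;

    public class RocksDBMemoryOptionsExample {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Placeholder checkpoint URI - point this at your own checkpoint directory.
            RocksDBStateBackend backend =
                    new RocksDBStateBackend("hdfs:///flink/checkpoints");

            // Start from one of the predefined option profiles ...
            backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED);

            // ... and override the RocksDB knobs that dominate native memory use:
            // the memtables (write buffers) and the block cache.
            backend.setOptions(new OptionsFactory() {
                @Override
                public DBOptions createDBOptions(DBOptions currentOptions) {
                    return currentOptions;
                }

                @Override
                public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
                    return currentOptions
                            .setWriteBufferSize(64 * 1024 * 1024)   // 64 MB per memtable
                            .setMaxWriteBufferNumber(3)             // at most 3 memtables in memory
                            .setTableFormatConfig(
                                    new BlockBasedTableConfig()
                                            .setBlockCacheSize(256 * 1024 * 1024)); // 256 MB block cache
                }
            });

            env.setStateBackend(backend);
            // ... define and execute the streaming job as usual ...
        }
    }

    The options go through a factory (rather than pre-built option objects)
because the native RocksDB option objects are not serializable; the factory
is shipped with the job and the options are created on the task managers.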

  - The managed memory of Flink is used in Batch (DataSet) - in Streaming
(DataStream), RocksDB is effectively the managed off-heap memory. As you
experienced, RocksDB has a bit of a memory behavior of its own.
   We are looking to auto-configure that better, see:
https://issues.apache.org/jira/browse/FLINK-7289
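
  - As a very rough back-of-the-envelope figure (illustrative numbers, not an
official formula): each RocksDB column family - one per registered state and
per parallel operator instance - can hold on the order of

      write_buffer_size * max_write_buffer_number + block_cache_size
      (e.g. 64 MB * 3 + 256 MB = 448 MB)

of native memory, and that multiplies with the number of states and task
slots on a machine. That is how a 35 GB heap on a 64 GB box can still end up
exhausting physical memory.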


Hope that helps,
Stephan



On Thu, Jul 27, 2017 at 12:21 PM, Flavio Pompermaier <po...@okkam.it>
wrote:

> I also faced annoying problems with Flink memory and TMs killed by the OS
> because of OOM.
> To somehow limit the memory consumption of the TM *on a single job*, I do the
> following:
>
> Add to flink.yaml
>
>    - env.java.opts: -Dio.netty.recycler.maxCapacity.default=1
>
> Edit taskmanager.sh
>
>    - and change TM_MAX_OFFHEAP_SIZE from 8388607T to 5g
>
>
> PROBABLY the unexpected memory consumption is caused by Netty, and this
> allows a single job to terminate without being killed by the OS.
> However, the TM memory continuously grows one job after the other... it seems
> that Flink doesn't free all the memory somehow (but I don't know where).
>
> I hope this helps,
> Flavio
>
> On Thu, Jul 27, 2017 at 11:29 AM, Shashwat Rastogi <
> shashwat.rastogi@reflektion.com> wrote:
>
>> Hi,
>>
>> I have a setup of 7 task managers, each with 64 GB of physical memory, out
>> of which I have allocated 35 GB as the task manager’s heap memory. I am
>> using RocksDB as the state backend.
>>
>> I see a lot of discrepancies between the reports generated by the Flink UI
>> and my system metrics. Can someone please explain what is happening here?
>>
>> - Why does the Flink UI show 35 GB as free memory when the system is
>> currently using 48.4 GB of memory, which leaves only 62.5 - 48.4 = 14.1 GB
>> free?
>> - Where is the memory used by RocksDB displayed? The machine does nothing
>> except serve as the Flink worker, so can I assume that 62.5 GB - 35 GB -
>> 14.1 GB - 523 MB = 12.9 GB is the memory used by RocksDB? When I run
>> `ps -ef | grep rocksdb` I don’t see any process running; is this normal?
>> - Also, my system metrics show that memory usage keeps increasing until
>> the task manager itself gets killed
>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Memory-Leak-Flink-RocksDB-td14439.html>,
>> but in the Flink UI I always see a lot of free memory. I am using the
>> default configuration, so I don’t think Flink’s managed memory occupies
>> non-heap memory. I cannot figure out where this ever-increasing memory
>> consumption comes from; my guess is that it is used by RocksDB.
>>
>> FLINK-UI
>> [screenshot not included in the plain-text archive]
>>
>> SYSTEM METRICS
>> [screenshot not included in the plain-text archive]
>>
>> Thanks in advance.
>> Shashwat
>>
>
>

Re: Can someone explain memory usage in a flink worker?

Posted by Flavio Pompermaier <po...@okkam.it>.
I also faced annoying problems with Flink memory and TMs killed by the OS
because of OOM.
To somehow limit the memory consumption of the TM *on a single job*, I do the
following (a concrete sketch of the two edits is below):

Add to flink.yaml

   - env.java.opts: -Dio.netty.recycler.maxCapacity.default=1

Edit taskmanager.sh

   - and change TM_MAX_OFFHEAP_SIZE from 8388607T to 5g
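
Concretely, the two edits look roughly like this (the 5g value is just what I
use, and the main config file is conf/flink-conf.yaml in a standard
distribution):

    # conf/flink-conf.yaml: cap Netty's object recycler pools
    env.java.opts: -Dio.netty.recycler.maxCapacity.default=1

    # bin/taskmanager.sh: lower the off-heap / direct memory cap that the
    # script passes to the JVM (originally TM_MAX_OFFHEAP_SIZE=8388607T)
    TM_MAX_OFFHEAP_SIZE="5g"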


PROBABLY the unexpected memory consumption is caused by Netty, and this
allows a single job to terminate without being killed by the OS.
However, the TM memory continuously grows one job after the other... it seems
that Flink doesn't free all the memory somehow (but I don't know where).

I hope this helps,
Flavio

On Thu, Jul 27, 2017 at 11:29 AM, Shashwat Rastogi <
shashwat.rastogi@reflektion.com> wrote:

> Hi,
>
> I have a setup of 7 task managers, each with 64 GB of physical memory, out
> of which I have allocated 35 GB as the task manager’s heap memory. I am
> using RocksDB as the state backend.
>
> I see a lot of discrepancies between the reports generated by the Flink UI
> and my system metrics. Can someone please explain what is happening here?
>
> - Why does the Flink UI show 35 GB as free memory when the system is
> currently using 48.4 GB of memory, which leaves only 62.5 - 48.4 = 14.1 GB
> free?
> - Where is the memory used by RocksDB displayed? The machine does nothing
> except serve as the Flink worker, so can I assume that 62.5 GB - 35 GB -
> 14.1 GB - 523 MB = 12.9 GB is the memory used by RocksDB? When I run
> `ps -ef | grep rocksdb` I don’t see any process running; is this normal?
> - Also, my system metrics show that memory usage keeps increasing until
> the task manager itself gets killed
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Memory-Leak-Flink-RocksDB-td14439.html>,
> but in the Flink UI I always see a lot of free memory. I am using the
> default configuration, so I don’t think Flink’s managed memory occupies
> non-heap memory. I cannot figure out where this ever-increasing memory
> consumption comes from; my guess is that it is used by RocksDB.
>
> FLINK-UI
> [screenshot not included in the plain-text archive]
>
> SYSTEM METRICS
> [screenshot not included in the plain-text archive]
>
> Thanks in advance.
> Shashwat
>