Posted to user@flink.apache.org by Fritz Budiyanto <fb...@icloud.com> on 2017/05/20 01:12:51 UTC

Excessive stdout is causing java heap out of mem

Hi,

I noticed that when I enable DataStreamSink’s print() for debugging (with fairly excessive printing), it causes a Java heap out-of-memory error.
Possibly the TaskManager is buffering all stdout for the web interface? I haven’t spent time debugging it, but I wonder whether it is expected that massive printing exhausts the Java heap. I’m using standalone mode.

Is there a way to disable this in-memory logging for the web interface and just redirect stdout to a file with file rotation?
What is the suggested method of logging?

—
Fritz

Re: Excessive stdout is causing java heap out of mem

Posted by Robert Metzger <rm...@apache.org>.
One thing you can always do to reduce heap pressure from large state is to
use the RocksDB state backend. The state is then kept on disk instead of on
the JVM heap.
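
For reference, a minimal sketch of selecting that backend in flink-conf.yaml. The checkpoint path is a placeholder, and the name of the checkpoint-directory key differs between Flink releases, so both variants are shown commented out as assumptions to check against the documentation of the version in use:

```yaml
# flink-conf.yaml (sketch; paths are placeholders, not taken from this thread)
state.backend: rocksdb
# A checkpoint directory is also needed once checkpointing is enabled.
# The key name depends on the Flink version:
#   state.backend.fs.checkpointdir: file:///path/to/checkpoints   (older releases)
#   state.checkpoints.dir: file:///path/to/checkpoints            (newer releases)
```

The backend can also be set per job in code instead of cluster-wide in the configuration file.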

On Thu, May 25, 2017 at 7:20 AM, Fritz Budiyanto <fb...@icloud.com>
wrote:

> Hi Robert,
>
> Yes, there is a lot of buffering on the heap. The state backend is the
> JobManager (heap-based) backend, and I disabled checkpointing to debug
> this issue.
>
> I found a bug in my app that shows up on restart. On a restart, the app
> reads Kafka from the earliest offset, which holds days of data, so it gets
> a burst of elements with very fast-moving timestamps. Because of a bug
> triggered by this burst, my app didn’t schedule the ProcessFunction timer
> for the latest time, which caused the pipeline to back-pressure. As a
> result, the watermark also got stuck, which caused a lot of buffering.
>
> Thanks for your tips; they were helpful.
>
> —
> Fritz

Re: Excessive stdout is causing java heap out of mem

Posted by Fritz Budiyanto <fb...@icloud.com>.
Hi Robert,

Yes, there is a lot of buffering on the heap. The state backend is the JobManager (heap-based) backend, and I disabled checkpointing to debug this issue.

I found a bug in my app that shows up on restart. On a restart, the app reads Kafka from the earliest offset, which holds days of data, so it gets a burst of elements with very fast-moving timestamps. Because of a bug triggered by this burst, my app didn’t schedule the ProcessFunction timer for the latest time, which caused the pipeline to back-pressure. As a result, the watermark also got stuck, which caused a lot of buffering.

Thanks for your tips; they were helpful.

—
Fritz

> On May 23, 2017, at 6:02 AM, Robert Metzger <rm...@apache.org> wrote:
> 
> Hi Fritz,
> 
> What are you doing on your TaskManager?
> Are you keeping many objects on the heap in your application?
> Are you using any of Flink's window operators? If so, which state backend are you using?


Re: Excessive stdout is causing java heap out of mem

Posted by Robert Metzger <rm...@apache.org>.
Hi Fritz,

What are you doing on your TaskManager?
Are you keeping many objects on the heap in your application?
Are you using any of Flink's window operators? If so, which state backend are
you using?
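
One way to get a first answer to the heap question is to log JVM heap usage from inside the job and watch whether it grows steadily. The following is a minimal sketch that uses only the JDK's java.lang.management API; HeapReport is a hypothetical helper name introduced here for illustration and is not part of Flink.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper (not part of Flink): summarizes current JVM heap usage. */
final class HeapReport {
    private HeapReport() {}

    /** Returns a one-line summary of used, committed, and maximum heap in MB. */
    static String summary() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        // getMax() returns -1 when the JVM does not define a maximum.
        String max = heap.getMax() < 0 ? "undefined" : (heap.getMax() / mb) + "MB";
        return "heap used=" + (heap.getUsed() / mb) + "MB"
                + " committed=" + (heap.getCommitted() / mb) + "MB"
                + " max=" + max;
    }
}
```

Calling something like LOG.info(HeapReport.summary()) at regular intervals shows the trend over time. For a post-mortem, the HotSpot options -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=<dir> write a heap dump when the error occurs; in Flink, JVM options can usually be passed through env.java.opts in flink-conf.yaml, but check the documentation for the version in use.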



On Tue, May 23, 2017 at 7:02 AM, Fritz Budiyanto <fb...@icloud.com>
wrote:

> Hi Robert,
>
> Thanks, I’ll start using the logger.
>
> I didn’t pay attention to whether the error occurred when I accessed the
> log from the JobManager.
> I will check that in my next test.
>
> Does anyone have suggestions on how to debug an out-of-memory exception on
> the Flink JM/TM?
>
> —
> Fritz

Re: Excessive stdout is causing java heap out of mem

Posted by Fritz Budiyanto <fb...@icloud.com>.
Hi Robert,

Thanks, I’ll start using the logger.

I didn’t pay attention to whether the error occurred when I accessed the log from the JobManager.
I will check that in my next test.

Does anyone have suggestions on how to debug an out-of-memory exception on the Flink JM/TM?

—
Fritz


> On May 22, 2017, at 12:04 PM, Robert Metzger <rm...@apache.org> wrote:
> 
> Hi Fritz,
> 
> The TaskManagers are not buffering all stdout for the web interface (at least I'm not aware of that). Did the error occur when accessing the log from the JobManager?
> Flink's web frontend lazily loads the logs from the TaskManagers.
> 
> The suggested method for logging is to use slf4j, as in the following code snippet:
> 
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> private static final Logger LOG = LoggerFactory.getLogger(MyJob.class);
> 
> Then you can do stuff like:
> 
> LOG.info("My log statement");
> 
> Also, using a logging framework allows you to redirect the log output of your job to a separate file.
> 
> But I'm not sure if the logging is really causing the TaskManager JVMs to die ...


Re: Excessive stdout is causing java heap out of mem

Posted by Robert Metzger <rm...@apache.org>.
Hi Fritz,

The TaskManagers are not buffering all stdout for the web interface (at
least I'm not aware of that). Did the error occur when accessing the log
from the JobManager?
Flink's web frontend lazily loads the logs from the TaskManagers.

The suggested method for logging is to use slf4j, as in the following
code snippet:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

private static final Logger LOG = LoggerFactory.getLogger(MyJob.class);

Then you can do stuff like:

LOG.info("My log statement");

Also, using a logging framework allows you to redirect the log output
of your job to a separate file.
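
As an illustration of the file redirection with rotation asked about in the original question, here is a sketch of a rolling file appender. It assumes log4j 1.x as the slf4j binding and a conf/log4j.properties file, which is what Flink distributions of this period use as far as I know; the size and backup count are illustrative values, and ${log.file} is assumed to be the system property set by Flink's startup scripts.

```properties
# conf/log4j.properties (sketch, log4j 1.x syntax; limits are illustrative)
log4j.rootLogger=INFO, file

# Roll the log file once it reaches MaxFileSize and keep MaxBackupIndex old files.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${log.file}
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```

Note that this only covers output written through a logger. Anything written by print() goes to stdout, which in standalone mode is typically redirected by the startup scripts to a separate .out file that the logging configuration does not rotate.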

But I'm not sure if the logging is really causing the TaskManager JVMs to
die ...


On Sat, May 20, 2017 at 3:12 AM, Fritz Budiyanto <fb...@icloud.com>
wrote:

> Hi,
>
> I noticed that when I enable DataStreamSink’s print() for debugging (with
> fairly excessive printing), it causes a Java heap out-of-memory error.
> Possibly the TaskManager is buffering all stdout for the web interface? I
> haven’t spent time debugging it, but I wonder whether it is expected that
> massive printing exhausts the Java heap. I’m using standalone mode.
>
> Is there a way to disable this in-memory logging for the web interface and
> just redirect stdout to a file with file rotation?
> What is the suggested method of logging?
>
> —
> Fritz