Posted to user@flink.apache.org by Maxim Parkachov <la...@gmail.com> on 2020/07/30 13:59:52 UTC

Flink streaming job logging reserves space

Hi everyone,

I have a strange issue with Flink logging. I use a pretty much standard log4j
config, which writes to standard output so that the logs are visible in the
Flink GUI. Deployment is on YARN in per-job mode. I can see the logs in the
UI, no problem. On the servers where the Flink YARN containers run, there is
a disk quota on the partition where YARN normally creates logs. I see no
specific files in the application_xx directory, but the free space on that
disk keeps decreasing over time, and after several weeks we eventually hit
the quota. It seems like some file or pipe is created but never closed, yet
it still reserves the space. After I restart the Flink job, the space is
immediately returned. I'm sure the Flink job is the problem; I have
reproduced the issue on a cluster where only one Flink job was running.
Below is my log4j config. Any help or ideas are appreciated.

Thanks in advance,
Maxim.
-------------------------------------------
# This affects logging for both user code and Flink
log4j.rootLogger=INFO, file, stderr

# Uncomment this if you want to _only_ change Flink's logging
#log4j.logger.org.apache.flink=INFO

# The following lines keep the log level of common libraries/connectors on
# log level INFO. The root logger does not override this. You have to
# manually change the log levels here.
log4j.logger.akka=INFO
log4j.logger.org.apache.kafka=INFO
log4j.logger.org.apache.hadoop=INFO
log4j.logger.org.apache.zookeeper=INFO

# Log all infos in the given file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.file=${log.file}
log4j.appender.file.append=false
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n

# Suppress the irrelevant (wrong) warnings from the Netty channel handler
log4j.logger.org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline=ERROR, file

Re: Flink streaming job logging reserves space

Posted by Maxim Parkachov <la...@gmail.com>.
Hi Yang,

Thanks for your advice; now I have a good reason to upgrade to 1.11.

Regards,
Maxim.

Re: Flink streaming job logging reserves space

Posted by Yang Wang <da...@gmail.com>.
AFAIK, there is no way to roll the *.out/*.err files unless we hijack
stdout/stderr in the Flink code, and that would only be a temporary hack.

A better way is to write your logs to separate files that can roll via
log4j. If you want to access them in the Flink web UI, upgrade to version
1.11; you will then find a "Log List" tab in the JobManager sidebar.
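
A minimal sketch of such a rolling setup (untested; the logger name, file
name, and size limits are placeholders you would adapt):

# Route user-code logs to a separate rolling file instead of stdout/stderr
log4j.logger.com.mycompany.myjob=INFO, rolling
log4j.additivity.com.mycompany.myjob=false
# Keep at most 10 backups of 100MB each
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.File=${log.file}.user
log4j.appender.rolling.MaxFileSize=100MB
log4j.appender.rolling.MaxBackupIndex=10
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n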


Best,
Yang

Re: Flink streaming job logging reserves space

Posted by Maxim Parkachov <la...@gmail.com>.
Hi Yang,

You are right. Since then I have looked for open files and found *.out/*.err
files on that partition, and as you mentioned they don't roll.
I could implement a workaround that restarts the streaming job every week or
so, but I really don't want to go that way.

I tried forwarding the logs to files, which I could then roll, but then I
don't see the logs in the GUI.

So my question is: how do I make them roll?

Regards,
Maxim.

Re: Flink streaming job logging reserves space

Posted by Yang Wang <da...@gmail.com>.
Hi Maxim,

First, I want to confirm: have you checked all of the directories configured
in "yarn.nodemanager.log-dirs"? If you can access the logs in the Flink web
UI, the log files (e.g. taskmanager.log, taskmanager.out, taskmanager.err)
should exist. I suggest double-checking all of the configured log-dirs.

Since the *.out/*.err files do not roll, any user logs printed to
stdout/stderr will make these two files grow over time.
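
For reference, the "stderr" appender your rootLogger references is
presumably a ConsoleAppender along these lines (a sketch; its definition is
not in the snippet you posted), and everything it writes ends up in the
un-rolled taskmanager.err:

# Console appender writing to System.err; YARN redirects the container's
# stderr stream into the taskmanager.err file
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.Target=System.err
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n

Dropping "stderr" from log4j.rootLogger (keeping only "file") should stop
feeding taskmanager.err; the "file" appender already writes the
taskmanager.log that the web UI serves.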

When you stop the Flink application, YARN cleans up all the jars and logs,
which is why the disk space comes back.


Best,
Yang
