Posted to user@spark.apache.org by Deepak Sharma <de...@gmail.com> on 2019/02/14 06:40:07 UTC

Spark streaming filling the disk with logs

Hi All
I am running a Spark Streaming job with the below configuration:

--conf "spark.executor.extraJavaOptions=-Droot.logger=WARN,console"

But it is still filling the disk with INFO logs.
If the logging level is set to WARN at the cluster level, then only the WARN
logs get written, but that affects all the jobs.

Is there any way to get rid of INFO-level logging at the level of an
individual Spark Streaming job?

Thanks
Deepak

-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net

RE: Spark streaming filling the disk with logs

Posted by "Jain, Abhishek 3. (Nokia - IN/Bangalore)" <ab...@nokia.com>.
The properties provided earlier will work for standalone mode. For cluster mode, the below properties need to be added to the spark-submit command:
--files "<path>/log4j.properties"     (to make the log4j properties file available to both the driver and the executors)

(to enable the extra Java options for the driver and the executors)
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:<path>/log4j.properties"
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:<path>/log4j.properties"

Regards,
Abhishek Jain

RE: Spark streaming filling the disk with logs

Posted by em...@yeikel.com.
I have a quick question about this configuration. Particularly this line:

log4j.appender.rolling.file=/var/log/spark/<logfilename>

Where does that path live? On the driver, or on each executor individually?

Thank you

RE: Spark streaming filling the disk with logs

Posted by "Jain, Abhishek 3. (Nokia - IN/Bangalore)" <ab...@nokia.com>.
++
If you can afford losing a few old logs, you can make use of a rolling file appender as well.

log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
log4j.appender.rolling.file=/var/log/spark/<logfilename>
log4j.logger.org.apache.spark=<LogLevel>

This means log4j will roll the log file once it reaches 50 MB and keep only the 5 most recent backup files. The files are saved in the /var/log/spark directory under the file name you specify.
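
For reference, a minimal complete log4j.properties built around this appender could look like the sketch below; the WARN root level, the conversion pattern, and the example file name are illustrative choices, not taken from the thread:

# Root logger writes WARN and above to the rolling appender
log4j.rootLogger=WARN, rolling

log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.file=/var/log/spark/streaming-job.log
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Keep Spark's own loggers at WARN as well (adjust per package if needed)
log4j.logger.org.apache.spark=WARN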

Regards,
Abhishek Jain

RE: Spark streaming filling the disk with logs

Posted by "Jain, Abhishek 3. (Nokia - IN/Bangalore)" <ab...@nokia.com>.
Hi Deepak,

Spark logging can be tuned for different purposes. For example, if you want to control the spark-shell (REPL) console log, "log4j.logger.org.apache.spark.repl.Main=WARN/INFO/ERROR" can be set.

Similarly, to control third-party logs:
log4j.logger.org.spark_project.jetty=<LEVEL>, log4j.logger.org.apache.parquet=<LEVEL>, etc.

These properties can be set in the conf/log4j.properties file.
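
As a sketch, a conf/log4j.properties that keeps the root logger at WARN while silencing a couple of chatty third-party packages could look like this (the console appender and the ERROR levels are illustrative choices, not from the thread):

# Root logger: WARN and above to the console
log4j.rootLogger=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Per-package overrides for noisy third-party loggers
log4j.logger.org.spark_project.jetty=ERROR
log4j.logger.org.apache.parquet=ERROR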

Hope this helps! 😊

Regards,
Abhishek Jain
