Posted to user@spark.apache.org by Deepak Sharma <de...@gmail.com> on 2019/02/14 06:40:07 UTC
Spark streaming filling the disk with logs
Hi All
I am running a Spark streaming job with the below configuration:
--conf "spark.executor.extraJavaOptions=-Droot.logger=WARN,console"
But it is still filling the disk with INFO logs.
If the logging level is set to WARN at the cluster level, then only the WARN logs are written, but that affects all the jobs.
Is there any way to get rid of INFO-level logging at the level of a single Spark streaming job?
Thanks
Deepak
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
RE: Spark streaming filling the disk with logs
Posted by "Jain, Abhishek 3. (Nokia - IN/Bangalore)" <ab...@nokia.com>.
The properties provided earlier will work for standalone mode. For cluster mode, the below properties need to be added to the spark-submit command:
--files "<path>/log4j.properties" (to make the log4j properties file available to both the driver and the executors)
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:<path>/log4j.properties"
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:<path>/log4j.properties"
(to enable the extra Java options for the driver and the executors)
Regards,
Abhishek Jain
From: email@yeikel.com <em...@yeikel.com>
Sent: Friday, February 15, 2019 7:32 AM
To: Jain, Abhishek 3. (Nokia - IN/Bangalore) <ab...@nokia.com>; 'Deepak Sharma' <de...@gmail.com>
Cc: 'spark users' <us...@spark.apache.org>
Subject: RE: Spark streaming filling the disk with logs
I have a quick question about this configuration, particularly this line:
log4j.appender.rolling.file=/var/log/spark/<logfilename>
Where does that path live? On the driver, or on each executor individually?
Thank you
From: Jain, Abhishek 3. (Nokia - IN/Bangalore) <ab...@nokia.com>
Sent: Thursday, February 14, 2019 7:48 AM
To: Deepak Sharma <de...@gmail.com>
Cc: spark users <us...@spark.apache.org>
Subject: RE: Spark streaming filling the disk with logs
++
If you can afford to lose a few old logs, then you can make use of the RollingFileAppender as well.
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
log4j.appender.rolling.file=/var/log/spark/<logfilename>
log4j.logger.org.apache.spark=<LogLevel>
This means log4j will roll the log file once it reaches 50MB and keep only the 5 most recent files. These files are saved in the /var/log/spark directory, with the file name given above.
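For reference, a complete properties file built from the lines above might look like the sketch below; the ConversionPattern, the streaming-job.log file name, and the WARN level for Spark are illustrative assumptions, not values from this thread:

# conf/log4j.properties (sketch)
log4j.rootLogger=INFO, rolling

log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
# PatternLayout needs a ConversionPattern to format each log line
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
log4j.appender.rolling.file=/var/log/spark/streaming-job.log

# Keep Spark's own chatter down while the application still logs at INFO
log4j.logger.org.apache.spark=WARN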
Regards,
Abhishek Jain
From: Jain, Abhishek 3. (Nokia - IN/Bangalore)
Sent: Thursday, February 14, 2019 5:58 PM
To: Deepak Sharma <de...@gmail.com>
Cc: spark users <us...@spark.apache.org>
Subject: RE: Spark streaming filling the disk with logs
Hi Deepak,
Spark logging can be set for different purposes. For example, if you want to control the spark-submit log, “log4j.logger.org.apache.spark.repl.Main=WARN/INFO/ERROR” can be set.
Similarly, to control third-party logs:
log4j.logger.org.spark_project.jetty=<LEVEL>, log4j.logger.org.apache.parquet=<LEVEL> etc.
These properties can be set in the conf/log4j.properties file.
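For instance, to quiet the spark-submit/REPL output and the noisiest third-party libraries, conf/log4j.properties could contain lines like the following (the WARN and ERROR levels here are only an illustration):

# Reduce spark-submit / REPL startup noise
log4j.logger.org.apache.spark.repl.Main=WARN
# Reduce third-party library noise
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.apache.parquet=ERROR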
Hope this helps! 😊
Regards,
Abhishek Jain
From: Deepak Sharma <de...@gmail.com>
Sent: Thursday, February 14, 2019 12:10 PM
To: spark users <us...@spark.apache.org>
Subject: Spark streaming filling the disk with logs
Hi All
I am running a Spark streaming job with the below configuration:
--conf "spark.executor.extraJavaOptions=-Droot.logger=WARN,console"
But it is still filling the disk with INFO logs.
If the logging level is set to WARN at the cluster level, then only the WARN logs are written, but that affects all the jobs.
Is there any way to get rid of INFO-level logging at the level of a single Spark streaming job?
Thanks
Deepak
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
RE: Spark streaming filling the disk with logs
Posted by em...@yeikel.com.
I have a quick question about this configuration, particularly this line:
log4j.appender.rolling.file=/var/log/spark/<logfilename>
Where does that path live? On the driver, or on each executor individually?
Thank you
From: Jain, Abhishek 3. (Nokia - IN/Bangalore) <ab...@nokia.com>
Sent: Thursday, February 14, 2019 7:48 AM
To: Deepak Sharma <de...@gmail.com>
Cc: spark users <us...@spark.apache.org>
Subject: RE: Spark streaming filling the disk with logs
++
If you can afford to lose a few old logs, then you can make use of the RollingFileAppender as well.
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
log4j.appender.rolling.file=/var/log/spark/<logfilename>
log4j.logger.org.apache.spark=<LogLevel>
This means log4j will roll the log file once it reaches 50MB and keep only the 5 most recent files. These files are saved in the /var/log/spark directory, with the file name given above.
Regards,
Abhishek Jain
From: Jain, Abhishek 3. (Nokia - IN/Bangalore)
Sent: Thursday, February 14, 2019 5:58 PM
To: Deepak Sharma <deepakmca05@gmail.com>
Cc: spark users <user@spark.apache.org>
Subject: RE: Spark streaming filling the disk with logs
Hi Deepak,
Spark logging can be set for different purposes. For example, if you want to control the spark-submit log, “log4j.logger.org.apache.spark.repl.Main=WARN/INFO/ERROR” can be set.
Similarly, to control third-party logs:
log4j.logger.org.spark_project.jetty=<LEVEL>, log4j.logger.org.apache.parquet=<LEVEL> etc.
These properties can be set in the conf/log4j.properties file.
Hope this helps! 😊
Regards,
Abhishek Jain
From: Deepak Sharma <deepakmca05@gmail.com>
Sent: Thursday, February 14, 2019 12:10 PM
To: spark users <user@spark.apache.org>
Subject: Spark streaming filling the disk with logs
Hi All
I am running a Spark streaming job with the below configuration:
--conf "spark.executor.extraJavaOptions=-Droot.logger=WARN,console"
But it is still filling the disk with INFO logs.
If the logging level is set to WARN at the cluster level, then only the WARN logs are written, but that affects all the jobs.
Is there any way to get rid of INFO-level logging at the level of a single Spark streaming job?
Thanks
Deepak
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
RE: Spark streaming filling the disk with logs
Posted by "Jain, Abhishek 3. (Nokia - IN/Bangalore)" <ab...@nokia.com>.
++
If you can afford to lose a few old logs, then you can make use of the RollingFileAppender as well.
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
log4j.appender.rolling.file=/var/log/spark/<logfilename>
log4j.logger.org.apache.spark=<LogLevel>
This means log4j will roll the log file once it reaches 50MB and keep only the 5 most recent files. These files are saved in the /var/log/spark directory, with the file name given above.
Regards,
Abhishek Jain
From: Jain, Abhishek 3. (Nokia - IN/Bangalore)
Sent: Thursday, February 14, 2019 5:58 PM
To: Deepak Sharma <de...@gmail.com>
Cc: spark users <us...@spark.apache.org>
Subject: RE: Spark streaming filling the disk with logs
Hi Deepak,
Spark logging can be set for different purposes. For example, if you want to control the spark-submit log, “log4j.logger.org.apache.spark.repl.Main=WARN/INFO/ERROR” can be set.
Similarly, to control third-party logs:
log4j.logger.org.spark_project.jetty=<LEVEL>, log4j.logger.org.apache.parquet=<LEVEL> etc.
These properties can be set in the conf/log4j.properties file.
Hope this helps! 😊
Regards,
Abhishek Jain
From: Deepak Sharma <de...@gmail.com>
Sent: Thursday, February 14, 2019 12:10 PM
To: spark users <us...@spark.apache.org>
Subject: Spark streaming filling the disk with logs
Hi All
I am running a Spark streaming job with the below configuration:
--conf "spark.executor.extraJavaOptions=-Droot.logger=WARN,console"
But it is still filling the disk with INFO logs.
If the logging level is set to WARN at the cluster level, then only the WARN logs are written, but that affects all the jobs.
Is there any way to get rid of INFO-level logging at the level of a single Spark streaming job?
Thanks
Deepak
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
RE: Spark streaming filling the disk with logs
Posted by "Jain, Abhishek 3. (Nokia - IN/Bangalore)" <ab...@nokia.com>.
Hi Deepak,
Spark logging can be set for different purposes. For example, if you want to control the spark-submit log, “log4j.logger.org.apache.spark.repl.Main=WARN/INFO/ERROR” can be set.
Similarly, to control third-party logs:
log4j.logger.org.spark_project.jetty=<LEVEL>, log4j.logger.org.apache.parquet=<LEVEL> etc.
These properties can be set in the conf/log4j.properties file.
Hope this helps! 😊
Regards,
Abhishek Jain
From: Deepak Sharma <de...@gmail.com>
Sent: Thursday, February 14, 2019 12:10 PM
To: spark users <us...@spark.apache.org>
Subject: Spark streaming filling the disk with logs
Hi All
I am running a Spark streaming job with the below configuration:
--conf "spark.executor.extraJavaOptions=-Droot.logger=WARN,console"
But it is still filling the disk with INFO logs.
If the logging level is set to WARN at the cluster level, then only the WARN logs are written, but that affects all the jobs.
Is there any way to get rid of INFO-level logging at the level of a single Spark streaming job?
Thanks
Deepak
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net