Posted to user@spark.apache.org by Kostiantyn Kudriavtsev <ku...@gmail.com> on 2014/07/03 21:26:48 UTC

Spark logging strategy on YARN

Hi all,

Could you please share your best practices for writing logs in Spark? I’m running it on YARN, and when I check the logs I’m a bit confused.
Currently I’m calling System.err.println to put a message in the log and accessing it via the YARN history server, but I don’t like this approach. I’d like to use log4j/slf4j and write logs to a more concrete place. Any suggestions?
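For reference, here is a minimal sketch of what the slf4j route might look like (JobLogging, process, and the messages are made-up names for illustration; it assumes slf4j-api plus a binding such as log4j is on the classpath, which a standard Spark deployment provides):

```scala
// Sketch: replacing System.err.println with an slf4j logger.
import org.slf4j.{Logger, LoggerFactory}

object JobLogging {
  // One logger per class/object; its name appears in the log layout,
  // so messages can be filtered per-logger in log4j.properties.
  val log: Logger = LoggerFactory.getLogger(getClass)

  def process(records: Int): Unit = {
    // Parameterized message: no string concatenation unless the level is enabled.
    log.info("processing {} records", records)
    if (records == 0) log.warn("empty input partition")
  }
}
```

Where the output lands is then controlled entirely by the log4j configuration the JVM picks up, rather than by where stderr happens to be redirected.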

Thank you in advance

RE: Spark logging strategy on YARN

Posted by Andrew Lee <al...@hotmail.com>.
Hi Kudryavtsev,
Here's what I am doing as a common practice and reference. I don't want to call it best practice, since that requires a lot of real-world experience and feedback, but from a development and operations standpoint it helps to separate the YARN container logs from the Spark logs.
Event log - Use the HistoryServer to look at the workflow, overall resource usage, etc. for the job.

Spark log - Provides readable info on settings and configuration (some of which is also covered by the event logs). You can customize this in the 'conf' folder with your own log4j.properties file. This won't be picked up by your YARN container, since your Hadoop setup may be referring to a different log4j file somewhere else.
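As an illustration only (the package names and levels are examples, not a recommendation), a minimal conf/log4j.properties along the lines of Spark's shipped log4j.properties.template might look like:

```properties
# Minimal sketch of conf/log4j.properties -- adjust appenders and levels to taste.
# Root logger: INFO to the console (driver-side Spark log).
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Quiet down chatty third-party loggers.
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.hadoop=WARN

# Your own application logger at DEBUG (com.example.myapp is a placeholder).
log4j.logger.com.example.myapp=DEBUG
```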
Stderr/stdout log - This is actually picked up by the YARN container, and you won't be able to override it unless you override the file in the resource folder (yarn/common/src/main/resources/log4j-spark-container.properties) during the build process and include it in your build (JAR file).
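If you do rebuild with your own log4j-spark-container.properties, a sketch that keeps container stderr quieter might look like the following (the WARN threshold is just an example of suppressing INFO chatter):

```properties
# Example override for yarn/common/src/main/resources/log4j-spark-container.properties.
# Raise the threshold so container stderr only shows warnings and errors.
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```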
One thing I haven't tried yet is separating that resource file into its own JAR and including it via the extra-JARs option on HDFS to suppress the logging. This exploits the CLASSPATH search order to override the YARN container's log4j settings without rebuilding the JARs that include them. I don't know whether that's good practice, though; it's just an idea that gives people flexibility.
Anyone else have ideas or thoughts?
