Posted to user@spark.apache.org by Luca Borin <bo...@gmail.com> on 2019/07/24 05:05:48 UTC

Apache Spark Log4j logging applicationId

Hi,

I would like to add the applicationId to all logs produced by Spark through
Log4j. I have a cluster with several jobs running on it, so including the
applicationId would make it possible to tell their logs apart.

I have found a partial solution. If I change the conversion pattern of the
PatternLayout, I can print the ThreadContext (see here
<https://logging.apache.org/log4j/2.x/manual/thread-context.html>), which can
be populated through MDC with the applicationId (see here
<https://stackoverflow.com/questions/54706582/output-spark-application-id-in-the-logs-with-log4j>).
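For reference, this is roughly what I do on the driver. It is only a sketch,
assuming the Log4j 1.x API that Spark bundles (org.apache.log4j.MDC) and an
appender named "console"; the MDC key "appId" is just a name I chose:

    # log4j.properties: %X{appId} prints the MDC entry on every log line
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %X{appId} %c: %m%n

    import org.apache.log4j.MDC
    import org.apache.spark.sql.SparkSession

    object DriverMdcExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("mdc-demo").getOrCreate()
        // Store the applicationId in the driver JVM's MDC; the %X{appId}
        // conversion above then shows it on every driver log line.
        MDC.put("appId", spark.sparkContext.applicationId)
        spark.sparkContext.parallelize(1 to 10).count() // these logs carry the id
        spark.stop()
      }
    }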
This works for the driver, but I would like to set this information at Spark
application startup, for both the driver and the workers. Note that I'm
working in a managed environment (Databricks), so I'm partially limited in
how I can configure the cluster. One workaround for performing the MDC put on
all workers is to use a broadcast variable and run an action that reads it
(see the sketch below), but I don't think that is stable, since it also has
to keep working if a worker machine restarts or is replaced.
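To make that workaround concrete, this is roughly the sketch I had in mind
(same assumed MDC key "appId" as above, with a throwaway action whose only
purpose is to run the put on the executors):

    import org.apache.log4j.MDC

    val sc = spark.sparkContext
    val appIdBc = sc.broadcast(sc.applicationId)
    // Run a throwaway action spread across the cluster so that, in practice,
    // every executor performs the MDC put at least once. The MDC is
    // per-thread, so this only tags the task threads that happen to run
    // these partitions, and it is lost whenever an executor restarts or is
    // replaced; that is exactly why I doubt its stability.
    sc.parallelize(0 until sc.defaultParallelism, sc.defaultParallelism)
      .foreachPartition(_ => MDC.put("appId", appIdBc.value))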

Thank you