Posted to issues@spark.apache.org by "Prashant Sharma (Jira)" <ji...@apache.org> on 2020/07/01 11:19:00 UTC

[jira] [Updated] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

     [ https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Sharma updated SPARK-30985:
------------------------------------
    Description: 
SPARK_CONF_DIR hosts configuration files such as:
 1) spark-defaults.conf - contains all the Spark properties.
 2) log4j.properties - logger configuration.
 3) spark-env.sh - environment variables to be set up on the driver and executors.
 4) core-site.xml - Hadoop-related configuration.
 5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
 6) metrics.properties - Spark metrics configuration.
 7) Any other user-specific, library-specific, or framework-specific configuration file.
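As a minimal illustration of the first item in the list above, a spark-defaults.conf in SPARK_CONF_DIR might look like the following (all property values here are examples, not recommendations):

```
spark.master                      k8s://https://kubernetes.example.com:6443
spark.executor.instances          2
spark.kubernetes.container.image  example/spark:3.1.0
```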

Traditionally, SPARK_CONF_DIR has been the home of all user-specific configuration files.

So this feature will let user-specific configuration files be mounted into the driver and executor pods' SPARK_CONF_DIR.
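On the Kubernetes side, such propagation could be sketched roughly as below, assuming a hypothetical ConfigMap named spark-conf-map built from the submission client's SPARK_CONF_DIR and mounted at the conf directory of the standard Spark image (this is an illustrative sketch, not the implementation proposed in the design doc):

```
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver
spec:
  containers:
    - name: spark-driver
      image: example/spark:3.1.0
      volumeMounts:
        - name: spark-conf
          mountPath: /opt/spark/conf   # SPARK_CONF_DIR inside the image
  volumes:
    - name: spark-conf
      configMap:
        name: spark-conf-map           # hypothetical; built from the client's SPARK_CONF_DIR
```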



Please review the attached design doc for more details.

  was:
SPARK_CONF_DIR hosts configuration files such as:
1) spark-defaults.conf - contains all the Spark properties.
2) log4j.properties - logger configuration.
3) spark-env.sh - environment variables to be set up on the driver and executors.
4) core-site.xml - Hadoop-related configuration.
5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
6) metrics.properties - Spark metrics configuration.
7) Any other user-specific, library-specific, or framework-specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home of all user-specific configuration files, and the default behaviour in YARN or standalone mode is that users copy these configuration files to the worker nodes themselves as required. In other words, they are not auto-copied.

But in the case of Spark on Kubernetes, we use Spark images, and these images are generally approved or undergo some kind of standardisation. The user cannot simply copy these files into the SPARK_CONF_DIR of the running executor and driver pods.

So, at the moment we have special casing for providing each configuration, and for any other user-specific configuration files the process is more complex - e.g. one can start with a custom Spark image with the configuration files pre-installed.
Examples of special casing are:
1. Hadoop configuration via spark.kubernetes.hadoop.configMapName
2. spark-env.sh via spark.kubernetes.driverEnv.[EnvironmentVariableName]
3. log4j.properties as in https://github.com/apache/spark/pull/26193
... And for configurations where such special casing does not exist, users are simply out of luck.
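For reference, the first two special-cased routes above look roughly like this on the command line (the ConfigMap name, environment variable, and jar path are placeholders, not values from this issue):

```
spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.hadoop.configMapName=hadoop-conf-map \
  --conf spark.kubernetes.driverEnv.MY_ENV_VAR=some-value \
  local:///opt/spark/examples/jars/spark-examples.jar
```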

So this feature will let user-specific configuration files be mounted into the driver and executor pods' SPARK_CONF_DIR.
At the moment it is not clear whether there is a need to let the user specify which config files to propagate to the driver and/or executors. But if that feature turns out to be helpful, we can increase the scope of this work or create another JIRA issue to track it.


> Propagate SPARK_CONF_DIR files to driver and exec pods.
> -------------------------------------------------------
>
>                 Key: SPARK-30985
>                 URL: https://issues.apache.org/jira/browse/SPARK-30985
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Prashant Sharma
>            Priority: Major
>
> SPARK_CONF_DIR hosts configuration files such as:
>  1) spark-defaults.conf - contains all the Spark properties.
>  2) log4j.properties - logger configuration.
>  3) spark-env.sh - environment variables to be set up on the driver and executors.
>  4) core-site.xml - Hadoop-related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics configuration.
>  7) Any other user-specific, library-specific, or framework-specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home of all user-specific configuration files.
> So this feature will let user-specific configuration files be mounted into the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org