Posted to dev@oozie.apache.org by "Sergey Zhemzhitsky (JIRA)" <ji...@apache.org> on 2017/09/12 08:43:00 UTC

[jira] [Commented] (OOZIE-2812) SparkConfigurationService should support loading configurations from multiple Spark versions

    [ https://issues.apache.org/jira/browse/OOZIE-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162682#comment-16162682 ] 

Sergey Zhemzhitsky commented on OOZIE-2812:
-------------------------------------------

Hello guys,

I'm wondering whether the *oozie.service.SparkConfigurationService.spark.configurations* configuration option is really necessary.
Here are my two cents on this not-so-obvious option:

# Spark jobs are just YARN applications, like Flink applications or any Java application that can run on YARN.
There may be multiple Spark jobs that use completely different Spark versions (for example, custom-patched ones), so there is no need to create an indirect reference between the YARN resource manager and the Spark configuration. For MapReduce, the similar *oozie.service.HadoopAccessorService.hadoop.configurations* option makes sense, because it is currently hardly possible to run multiple MapReduce implementations on top of a single YARN resource manager.
# The *oozie.service.SparkConfigurationService.spark.configurations* option reads *spark\-defaults.properties* from the default location (i.e. */etc/spark/conf*) into java.util.Properties and then simply appends these properties to *spark-opts*, as if they had been specified as *\-\-conf* command line options.
But usually, when using *spark\-submit.sh*, *\-\-conf* command line options take precedence over properties specified in *spark\-defaults.properties*, so there is a chance that options the user provides via *\-\-conf* will be overridden by properties provided through *oozie.service.SparkConfigurationService.spark.configurations*.
# The latest implementation of the Spark action already supports the *<file ...>* element and a *spark\-defaults.properties* file in the current working directory. These two possibilities give more flexibility than *oozie.service.SparkConfigurationService.spark.configurations*, because the user can either
## add *spark\-defaults.properties* to the workflow application, and Oozie will pick it up, or
## add
{code}
<file>local:/etc/spark/conf/spark-defaults.properties</file>
{code}
and Oozie's Spark action will automatically pick up this file by means of the *\-\-properties\-file* option, preserving the usual precedence of properties (listed from lowest to highest):
** provided by means of the *spark\-defaults.properties* file,
** provided by means of the *\-\-properties\-file* command line option,
** provided by means of *\-\-conf* command line options.
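
To make the precedence concern from point 2 concrete, here is a minimal Python sketch. It is a simplified model only, not Oozie or Spark code: the function names and sample property values are hypothetical, and it assumes that when the same key is passed twice via *\-\-conf*, the later occurrence wins. It compares appending server-side defaults as extra *\-\-conf* options with layering the sources in the order listed above.

```python
"""Simplified model of the precedence concern (not Oozie or Spark code)."""


def parse_conf_args(args):
    """Collect --conf key=value pairs from an argument list.

    Assumption: a later duplicate key replaces an earlier one.
    """
    conf = {}
    it = iter(args)
    for arg in it:
        if arg == "--conf":
            pair = next(it, None)
            if pair is None:  # dangling --conf with no value; stop parsing
                break
            key, _, value = pair.partition("=")
            conf[key] = value
    return conf


def append_defaults_as_conf(user_args, server_defaults):
    """Model of the behaviour in point 2: defaults are appended as extra --conf options."""
    extra = []
    for key, value in server_defaults.items():
        extra += ["--conf", f"{key}={value}"]
    return user_args + extra


def layered(defaults_file, properties_file, conf_args):
    """Model of layered precedence: each later source overrides the earlier ones."""
    merged = dict(defaults_file)                 # lowest: defaults file
    merged.update(properties_file)               # middle: --properties-file
    merged.update(parse_conf_args(conf_args))    # highest: --conf options
    return merged


# Hypothetical sample values.
server_defaults = {"spark.executor.memory": "1g", "spark.eventLog.enabled": "true"}
user_args = ["--conf", "spark.executor.memory=4g"]

appended = parse_conf_args(append_defaults_as_conf(user_args, server_defaults))
layered_result = layered(server_defaults, {}, user_args)

print(appended["spark.executor.memory"])        # 1g: the user's value was overridden
print(layered_result["spark.executor.memory"])  # 4g: the user's value is preserved
```

Under these assumptions, the appended variant silently replaces the user's 4g with the server-side 1g, while the layered variant keeps the user's setting and still inherits the remaining defaults.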

> SparkConfigurationService should support loading configurations from multiple Spark versions
> --------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-2812
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2812
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Peter Cseh
>            Assignee: Peter Cseh
>         Attachments: OOZIE-2812.001.patch, OOZIE-2812.002.patch, OOZIE-2812.003.patch, OOZIE-2812.004.patch
>
>
> Right now SparkConfigurationService serves one Spark configuration set by
> {{oozie.service.SparkConfigurationService.spark.configurations}}
> We could improve this to support more versions depending on the name of the sharelib.
> E.g. the property could change to
> oozie.service.SparkConfigurationService.<sharelib_name>.configurations
> This would be backward compatible, as the name of the default Spark sharelib is spark, while it would be possible to add a sharelib named spark2 or spark2.1 and define their configurations via oozie.service.SparkConfigurationService.spark2.configurations and
> oozie.service.SparkConfigurationService.spark2.1.configurations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)