Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/03/16 22:55:05 UTC

[jira] [Updated] (SPARK-25920) Avoid custom processing of CLI options for cluster submission

     [ https://issues.apache.org/jira/browse/SPARK-25920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25920:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                       3.1.0

> Avoid custom processing of CLI options for cluster submission
> -------------------------------------------------------------
>
>                 Key: SPARK-25920
>                 URL: https://issues.apache.org/jira/browse/SPARK-25920
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Submit
>    Affects Versions: 3.1.0
>            Reporter: Marcelo Masiero Vanzin
>            Priority: Minor
>
> In {{SparkSubmit}}, when an app is being submitted in cluster mode, there is currently a lot of code specific to each resource manager (RM) that takes the {{SparkSubmit}} internals, packages them up as an RM-specific set of "command line options", and parses them back into memory when the RM-specific class is invoked.
> e.g. for YARN
> {code}
>     // In yarn-cluster mode, use yarn.Client as a wrapper around the user class
>     if (isYarnCluster) {
>       childMainClass = YARN_CLUSTER_SUBMIT_CLASS
>       if (args.isPython) {
>         childArgs += ("--primary-py-file", args.primaryResource)
>         childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
>   [blah blah blah]
> {code}
> For Mesos:
> {code}
>     if (isMesosCluster) {
>       assert(args.useRest, "Mesos cluster mode is only supported through the REST submission API")
>       childMainClass = REST_CLUSTER_SUBMIT_CLASS
>       if (args.isPython) {
>         // Second argument is main class
>         childArgs += (args.primaryResource, "")
>         if (args.pyFiles != null) {
>           sparkConf.set("spark.submit.pyFiles", args.pyFiles)
>         }
>   [blah blah blah]
> {code}
> For k8s:
> {code}
>     if (isKubernetesCluster) {
>       childMainClass = KUBERNETES_CLUSTER_SUBMIT_CLASS
>       if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
>         if (args.isPython) {
>           childArgs ++= Array("--primary-py-file", args.primaryResource)
>           childArgs ++= Array("--main-class", "org.apache.spark.deploy.PythonRunner")
>   [blah blah blah]
> {code}
> These parts of the code are all very similar, and there's no good reason why each RM needs its own processing here. We should simplify all this and pass pre-parsed command line options to the cluster submission classes.
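> A sketch of the direction this could take: instead of each RM re-encoding {{SparkSubmit}} state as strings and re-parsing it, the RM-specific submission classes could accept a shared, already-parsed description of the app. The names below ({{ClusterSubmitArgs}}, {{ClusterSubmit}}) are hypothetical, not existing Spark APIs:
> {code}
> // Hypothetical shared container holding the submission state that
> // SparkSubmit has already parsed, passed along as-is instead of being
> // flattened into RM-specific string options.
> case class ClusterSubmitArgs(
>     primaryResource: String,
>     mainClass: Option[String],
>     isPython: Boolean,
>     pyFiles: Seq[String],
>     appArgs: Seq[String])
>
> // Each RM-specific client (YARN, Mesos, k8s) would implement one common
> // entry point instead of defining its own "command line options".
> trait ClusterSubmit {
>   def submit(args: ClusterSubmitArgs, conf: SparkConf): Unit
> }
> {code}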



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org