Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/03/16 22:55:05 UTC
[jira] [Updated] (SPARK-25920) Avoid custom processing of CLI options for cluster submission
[ https://issues.apache.org/jira/browse/SPARK-25920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-25920:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                           3.1.0
> Avoid custom processing of CLI options for cluster submission
> -------------------------------------------------------------
>
> Key: SPARK-25920
> URL: https://issues.apache.org/jira/browse/SPARK-25920
> Project: Spark
> Issue Type: Improvement
> Components: Spark Submit
> Affects Versions: 3.1.0
> Reporter: Marcelo Masiero Vanzin
> Priority: Minor
>
> In {{SparkSubmit}}, when an app is submitted in cluster mode, there is currently a lot of code specific to each resource manager that takes the {{SparkSubmit}} internals, packages them up into an RM-specific set of "command line options", and parses them back into memory when the RM-specific class is invoked.
> e.g. for YARN
> {code}
> // In yarn-cluster mode, use yarn.Client as a wrapper around the user class
> if (isYarnCluster) {
>   childMainClass = YARN_CLUSTER_SUBMIT_CLASS
>   if (args.isPython) {
>     childArgs += ("--primary-py-file", args.primaryResource)
>     childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
>     [blah blah blah]
> {code}
> For Mesos:
> {code}
> if (isMesosCluster) {
>   assert(args.useRest, "Mesos cluster mode is only supported through the REST submission API")
>   childMainClass = REST_CLUSTER_SUBMIT_CLASS
>   if (args.isPython) {
>     // Second argument is main class
>     childArgs += (args.primaryResource, "")
>     if (args.pyFiles != null) {
>       sparkConf.set("spark.submit.pyFiles", args.pyFiles)
>     }
>     [blah blah blah]
> {code}
> For k8s:
> {code}
> if (isKubernetesCluster) {
>   childMainClass = KUBERNETES_CLUSTER_SUBMIT_CLASS
>   if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
>     if (args.isPython) {
>       childArgs ++= Array("--primary-py-file", args.primaryResource)
>       childArgs ++= Array("--main-class", "org.apache.spark.deploy.PythonRunner")
>       [blah blah blah]
> {code}
> These parts of the code are all very similar, and there is no good reason why each RM needs its own processing here. We should simplify all this and pass pre-parsed command line options to the cluster submission classes.
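> One hedged sketch of what "pre-parsed options" could look like, assuming the refactoring described above (the names {{ClusterSubmitArgs}} and {{buildSubmitArgs}} are illustrative, not actual Spark APIs): instead of each resource manager packing {{SparkSubmit}} state into flag strings and re-parsing them, {{SparkSubmit}} would build one typed structure that every cluster submission class consumes directly.

```scala
// Hypothetical sketch: ClusterSubmitArgs / buildSubmitArgs are illustrative
// names, not part of Spark. The idea is a single typed container replacing
// the per-RM "--flag value" packing and re-parsing shown above.
case class ClusterSubmitArgs(
    primaryResource: String,
    mainClass: Option[String], // e.g. PythonRunner wraps Python apps
    isPython: Boolean,
    pyFiles: Seq[String],
    appArgs: Seq[String])

object SubmitSketch {
  // One shared builder replaces the YARN/Mesos/k8s-specific branches.
  def buildSubmitArgs(
      primaryResource: String,
      mainClass: Option[String],
      isPython: Boolean,
      pyFiles: Seq[String],
      appArgs: Seq[String]): ClusterSubmitArgs =
    ClusterSubmitArgs(primaryResource, mainClass, isPython, pyFiles, appArgs)

  def main(args: Array[String]): Unit = {
    val submitted =
      buildSubmitArgs("app.py", None, isPython = true, Seq("dep.py"), Seq("--input", "x"))
    // Each cluster submission class reads fields directly; no string round-trip.
    assert(submitted.isPython)
    assert(submitted.pyFiles == Seq("dep.py"))
    println(submitted.primaryResource)
  }
}
```

> Each RM-specific submit class would then translate this one structure into its own launch commands, so the parsing logic lives in exactly one place.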
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org