Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2018/11/01 21:31:00 UTC
[jira] [Created] (SPARK-25920) Avoid custom processing of CLI options for cluster submission
Marcelo Vanzin created SPARK-25920:
--------------------------------------
Summary: Avoid custom processing of CLI options for cluster submission
Key: SPARK-25920
URL: https://issues.apache.org/jira/browse/SPARK-25920
Project: Spark
Issue Type: Improvement
Components: Spark Submit
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin
In {{SparkSubmit}}, when an app is submitted in cluster mode, there is currently a lot of code specific to each resource manager (RM). That code takes the {{SparkSubmit}} internals, packages them into an RM-specific set of "command line options", and then parses those options back into memory when the RM-specific class is invoked.
For example, for YARN:
{code}
// In yarn-cluster mode, use yarn.Client as a wrapper around the user class
if (isYarnCluster) {
  childMainClass = YARN_CLUSTER_SUBMIT_CLASS
  if (args.isPython) {
    childArgs += ("--primary-py-file", args.primaryResource)
    childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
    // ... (rest elided)
{code}
For Mesos:
{code}
if (isMesosCluster) {
  assert(args.useRest, "Mesos cluster mode is only supported through the REST submission API")
  childMainClass = REST_CLUSTER_SUBMIT_CLASS
  if (args.isPython) {
    // Second argument is main class
    childArgs += (args.primaryResource, "")
    if (args.pyFiles != null) {
      sparkConf.set("spark.submit.pyFiles", args.pyFiles)
    }
    // ... (rest elided)
{code}
For k8s:
{code}
if (isKubernetesCluster) {
  childMainClass = KUBERNETES_CLUSTER_SUBMIT_CLASS
  if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
    if (args.isPython) {
      childArgs ++= Array("--primary-py-file", args.primaryResource)
      childArgs ++= Array("--main-class", "org.apache.spark.deploy.PythonRunner")
      // ... (rest elided)
{code}
These code paths are all very similar, and there is no good reason for each RM to need its own processing here. We should simplify this and pass pre-parsed command line options directly to the cluster submission classes.
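A minimal Scala sketch of the difference, under stated assumptions: {{SubmitOptions}}, {{toYarnStyleArgs}}, {{fromYarnStyleArgs}}, {{ClusterSubmitter}} and {{RecordingSubmitter}} are hypothetical names, not Spark APIs, and the flag set is a simplified stand-in loosely modeled on the YARN excerpt ({{--jar}} and {{--arg}} are assumptions for illustration). The first half mimics the current encode/parse round trip. The second half shows the proposed shape, where an RM-specific class is handed the already-parsed options directly.

```scala
// Hypothetical sketch: none of these names are Spark APIs.

// Options that SparkSubmit has already parsed into memory (simplified).
final case class SubmitOptions(
    primaryResource: String,
    mainClass: String,
    isPython: Boolean,
    appArgs: Seq[String])

// Current pattern (simplified): flatten the options into RM-specific flags.
// Flag names are loosely modeled on the YARN excerpt; "--jar" and "--arg"
// are assumptions for illustration.
def toYarnStyleArgs(o: SubmitOptions): Seq[String] = {
  val resourceFlag = if (o.isPython) "--primary-py-file" else "--jar"
  val cls = if (o.isPython) "org.apache.spark.deploy.PythonRunner" else o.mainClass
  Seq(resourceFlag, o.primaryResource, "--class", cls) ++
    o.appArgs.flatMap(a => Seq("--arg", a))
}

// ...and the RM-specific entry point parses the flags back into memory.
def fromYarnStyleArgs(args: Seq[String]): SubmitOptions = {
  @annotation.tailrec
  def loop(rest: List[String], acc: SubmitOptions): SubmitOptions = rest match {
    case "--primary-py-file" :: v :: tail =>
      loop(tail, acc.copy(primaryResource = v, isPython = true))
    case "--jar" :: v :: tail   => loop(tail, acc.copy(primaryResource = v))
    case "--class" :: v :: tail => loop(tail, acc.copy(mainClass = v))
    case "--arg" :: v :: tail   => loop(tail, acc.copy(appArgs = acc.appArgs :+ v))
    case Nil                    => acc
    case other :: _ =>
      throw new IllegalArgumentException(s"Unrecognized or incomplete option: $other")
  }
  loop(args.toList, SubmitOptions("", "", isPython = false, Vector.empty))
}

// Proposed direction: every RM implements one interface that receives the
// already-parsed options, so there is nothing to encode or parse.
trait ClusterSubmitter {
  def submit(opts: SubmitOptions): Unit
}

// Test double that only records what it was handed.
final class RecordingSubmitter extends ClusterSubmitter {
  var received: Option[SubmitOptions] = None
  override def submit(opts: SubmitOptions): Unit = {
    received = Some(opts)
  }
}
```

In the round-trip version, each RM has to keep its encoder and parser in sync, and the encoding can be lossy. In this sketch, for example, a Python app's original {{mainClass}} is replaced by {{PythonRunner}} on the way through. The direct version has no such step to get wrong.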
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)