Posted to issues@spark.apache.org by "Kevin Doyle (Jira)" <ji...@apache.org> on 2019/10/24 17:12:00 UTC

[jira] [Updated] (SPARK-29593) Enhance Cluster Managers to be Pluggable

     [ https://issues.apache.org/jira/browse/SPARK-29593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Doyle updated SPARK-29593:
--------------------------------
    Description: 
Today, cluster managers are bundled with Spark and it is hard to add new ones. The Kubernetes cluster manager was built in a fork of the code and then brought into Spark, and a lot of work on it is still ongoing; it could ship more often if Spark had a pluggable way to bring in cluster managers. This would also benefit enterprises whose cluster managers are not open source and therefore cannot be part of Spark itself.

High-level ideas to be discussed (additional options welcome):
 1. Make the cluster manager pluggable.
 2. Ship the Spark Standalone cluster manager with Spark by default and make it the base cluster manager that others can inherit from. Other cluster managers can ship with Spark or separately (a hypothetical trait sketch appears at the end of this description).
 3. Each cluster manager can ship additional jars that are placed inside Spark; a configuration file then defines which cluster manager Spark runs with.
 4. The configuration file can define which classes to use for the various parts, either reusing classes from the Spark Standalone cluster manager or naming different ones (a hypothetical configuration-loading sketch follows the snippet below).
 5. For the classes that Spark allows to be switched out, code like the following can load a different implementation.

val clazz = Class.forName("<scheduler class name from the configuration file>")
val cons = clazz.getConstructor(classOf[SparkContext])
cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
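
To make points 3 and 4 more concrete, here is a rough sketch (not part of the proposal text itself) of how such a configuration file could be read. The file name conf/cluster-manager.properties, the property key spark.clusterManager.taskScheduler.class, and the loadTaskScheduler helper are all hypothetical; only the reflective loading mirrors the snippet above, and, as with that snippet, it assumes the code lives inside Spark where TaskSchedulerImpl is visible and a SparkContext is available.

import java.io.FileInputStream
import java.util.Properties

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.TaskSchedulerImpl

// Hypothetical helper: read the (hypothetical) configuration file and build the
// TaskScheduler it names, defaulting to Spark's own implementation (point 4).
def loadTaskScheduler(sc: SparkContext): TaskSchedulerImpl = {
  val props = new Properties()
  val in = new FileInputStream("conf/cluster-manager.properties") // hypothetical file
  try props.load(in) finally in.close()

  val className = props.getProperty(
    "spark.clusterManager.taskScheduler.class",     // hypothetical key
    "org.apache.spark.scheduler.TaskSchedulerImpl") // default: reuse Spark's class

  // Same reflective pattern as the snippet above (point 5).
  Class.forName(className)
    .getConstructor(classOf[SparkContext])
    .newInstance(sc)
    .asInstanceOf[TaskSchedulerImpl]
}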

Proposal discussed at Spark + AI Summit Europe 2019: https://databricks.com/session_eu19/refactoring-apache-spark-to-allow-additional-cluster-managers
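
For point 2, the pluggable seam itself might look roughly like the trait below. This is only a sketch to anchor the discussion: the name ClusterManagerPlugin and its methods are hypothetical and do not correspond to an existing Spark API.

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SchedulerBackend, TaskScheduler}

// Hypothetical plugin interface; every name here is illustrative only.
trait ClusterManagerPlugin {
  /** Identifier the configuration file would use to select this cluster manager. */
  def name: String

  /** Build the TaskScheduler for this cluster manager. */
  def createTaskScheduler(sc: SparkContext): TaskScheduler

  /** Build the SchedulerBackend that talks to the external cluster manager. */
  def createSchedulerBackend(sc: SparkContext, scheduler: TaskScheduler): SchedulerBackend
}

Under this sketch, the Standalone implementation would ship with Spark as the default implementation of the trait and could serve as the base class that other cluster managers extend, as described in point 2.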

> Enhance Cluster Managers to be Pluggable
> ----------------------------------------
>
>                 Key: SPARK-29593
>                 URL: https://issues.apache.org/jira/browse/SPARK-29593
>             Project: Spark
>          Issue Type: New Feature
>          Components: Scheduler
>    Affects Versions: 2.4.4
>            Reporter: Kevin Doyle
>            Priority: Major
>


