Posted to issues@spark.apache.org by "Matt Cheah (JIRA)" <ji...@apache.org> on 2017/01/06 00:48:59 UTC

[jira] [Commented] (SPARK-18278) Support native submission of spark jobs to a kubernetes cluster

    [ https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803050#comment-15803050 ] 

Matt Cheah commented on SPARK-18278:
------------------------------------

I refactored the scheduler code as a thought experiment on what it would take to make the scheduler pluggable. The goal was to let writers of a custom scheduler implementation avoid any references to {{CoarseGrainedSchedulerBackend}} or {{TaskSchedulerImpl}}. A preliminary rewrite of the existing schedulers is [here|https://github.com/palantir/spark/pull/81], and it's certainly a non-trivial change. The overarching philosophy of the prototype is to use dependency injection to wire together the scheduler components - see the implementations of [ExternalClusterManagerFactory|https://github.com/palantir/spark/pull/81/files#diff-a37079e493cd374ca7f0ac417ae6b3a4R21] such as [YarnClusterManagerFactory|https://github.com/palantir/spark/pull/81/files#diff-78f80e86e01a29956fad626b9c172a78R29] - but there might be a better way to do this, and I haven't given enough thought to alternative designs. Some of the new public APIs introduced by this change were also defined somewhat arbitrarily and deserve more careful thought, such as the method signatures on [ExecutorLifecycleHandler|https://github.com/palantir/spark/pull/81/files#diff-fbbfb3c6d8556728653f9f5636f86ccbR24] and the expectations around [ExternalClusterManager.validate()|https://github.com/palantir/spark/pull/81/files#diff-1163e4c135751192d763853e24a3629dR34]. Existing components still refer to {{CoarseGrainedSchedulerBackend}} and {{TaskSchedulerImpl}}, but that's fine, since the standalone, YARN, and Mesos scheduler internals can keep using the non-public APIs. An implementation of the Kubernetes feature on top of this draft API is [here|https://github.com/palantir/spark/pull/90]; the Kubernetes-specific components don't need to reference {{CoarseGrainedSchedulerBackend}} or {{TaskSchedulerImpl}} at all.
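
To make the dependency-injection idea concrete, here is a rough sketch of what such a factory could look like. The trait and method signatures below are a paraphrase of the prototype's intent rather than the actual code in the linked PR, and the {{k8s://}} master-URL prefix is purely illustrative:

{code:scala}
import org.apache.spark.SparkContext

// Stand-in for the prototype's cluster-manager abstraction; the real trait in the
// PR carries more methods (e.g. validate()) than shown here.
trait ExternalClusterManager

// Hypothetical factory interface - the method names and signatures are a paraphrase
// of the prototype, not the actual code in the linked PR.
trait ExternalClusterManagerFactory {
  /** Whether this factory understands the given master URL. */
  def canCreate(masterUrl: String): Boolean

  /** Wires together all cluster-manager-specific scheduler components. */
  def createExternalClusterManager(sc: SparkContext, masterUrl: String): ExternalClusterManager
}

// What a Kubernetes factory might look like: everything Kubernetes-specific is
// constructed and injected here, with no reference to CoarseGrainedSchedulerBackend
// or TaskSchedulerImpl.
class KubernetesClusterManagerFactory extends ExternalClusterManagerFactory {
  override def canCreate(masterUrl: String): Boolean = masterUrl.startsWith("k8s://")

  override def createExternalClusterManager(
      sc: SparkContext,
      masterUrl: String): ExternalClusterManager = {
    // Build the Kubernetes API client, executor lifecycle handler, pod allocator,
    // etc. and hand them to the cluster manager instance here.
    new ExternalClusterManager {}
  }
}
{code}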

The thought experiment shows that making the schedulers truly pluggable would introduce a non-trivial amount of complexity. That extra complexity, and the changes it forces on the existing scheduler, could destabilize the existing cluster manager support; for example, this prototype also reorganizes much of the executor-loss coordination logic, and I haven't tested those changes thoroughly. The alternative that avoids this complexity would be to make {{CoarseGrainedSchedulerBackend}} and {{TaskSchedulerImpl}} part of the public API, but I'm extremely wary of going down that path: we would not just be exposing an interface, we would be exposing a heavily opinionated implementation. Custom subclasses of {{CoarseGrainedSchedulerBackend}} would have to be resilient to changes in the internals of these complex classes.
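
To illustrate the interface-versus-implementation distinction, here is a deliberately simplified contrast (these are not Spark's actual class definitions): a pluggable API would expose a small, stable contract, whereas exposing {{CoarseGrainedSchedulerBackend}} means third parties subclass a large, stateful implementation and inherit its internal behavior.

{code:scala}
// Deliberately simplified sketch, not Spark's real class shapes.

// What a pluggable API would expose: a small, stable contract that a cluster
// manager implements without seeing any scheduler internals.
trait ExecutorLauncher {
  def requestExecutors(count: Int): Unit
  def killExecutors(executorIds: Seq[String]): Unit
}

// What making CoarseGrainedSchedulerBackend public amounts to: subclassing a large,
// stateful implementation (RPC endpoints, executor bookkeeping, locality handling,
// and so on) whose protected hooks and behavior can change from release to release.
abstract class OpinionatedSchedulerBackend {
  // ... dozens of fields and protected methods elided ...
  protected def doRequestTotalExecutors(requestedTotal: Int): Boolean = false
  protected def doKillExecutors(executorIds: Seq[String]): Boolean = false
}
{code}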

Given the results of this experiment and the drawbacks of maintaining a separate fork that I've already highlighted, I would still favor integrating Kubernetes into the existing scheduler framework in-repo and marking it as an experimental feature for several releases, following the precedent of how the SQL and YARN experimental features were built and released in the past. [~rxin], where should we go from here?

> Support native submission of spark jobs to a kubernetes cluster
> ---------------------------------------------------------------
>
>                 Key: SPARK-18278
>                 URL: https://issues.apache.org/jira/browse/SPARK-18278
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Build, Deploy, Documentation, Scheduler, Spark Core
>            Reporter: Erik Erlandson
>         Attachments: SPARK-18278 - Spark on Kubernetes Design Proposal.pdf
>
>
> A new Apache Spark sub-project that enables native support for submitting Spark applications to a Kubernetes cluster. The submitted application runs in a driver executing on a Kubernetes pod, and executor lifecycles are also managed as pods.


