Posted to issues@spark.apache.org by "Nicholas Chammas (JIRA)" <ji...@apache.org> on 2014/12/20 10:43:13 UTC

[jira] [Commented] (SPARK-3561) Allow for pluggable execution contexts in Spark

    [ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254605#comment-14254605 ] 

Nicholas Chammas commented on SPARK-3561:
-----------------------------------------

{quote}
One of the main reasons for this proposal was to have better elasticity in YARN. Towards that end, can you try out the new elastic scaling code for YARN (SPARK-3174)? It will ship in Spark 1.2 and be one of the main new features. This integrates nicely with YARN's native shuffle service. In fact IIRC the main design of the shuffle service in YARN was specifically for this purpose. In that way, I think elements of this proposal are indeed making it into Spark.
{quote}

Now that 1.2.0 (which includes [SPARK-3174]) is out, what is the status of this proposal?

From what I understand, the interest in this proposal stemmed mainly from people wanting elastic scaling and better utilization of cluster resources in Spark-on-YARN deployments. 1.2.0 includes such improvements as part of [SPARK-3174].
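
For reference, my understanding is that the elasticity shipped in 1.2.0 is driven purely by configuration. A minimal sketch, assuming the standard dynamic-allocation settings and that the external shuffle service has been registered as a YARN auxiliary service on each NodeManager (the executor bounds below are illustrative):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Enable executor elasticity on YARN (Spark 1.2+).
// The external shuffle service must be running on the NodeManagers so that
// shuffle files outlive executors removed when the application scales down.
val conf = new SparkConf()
  .setAppName("elastic-scaling-example")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")   // illustrative bounds
  .set("spark.dynamicAllocation.maxExecutors", "50")

val sc = new SparkContext(conf)  // submitted with --master yarn-client or yarn-cluster
{code}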

Are those improvements sufficient to address users' needs in that regard? If not, is the best solution to implement this proposal, or should we keep iterating on Spark's YARN support through its existing integration points?

Perhaps I didn't properly grasp the motivation for this proposal, but reading through the discussion in the comments here, it seems like implementing a new execution context for Spark just to improve its performance on YARN is overkill.

> Allow for pluggable execution contexts in Spark
> -----------------------------------------------
>
>                 Key: SPARK-3561
>                 URL: https://issues.apache.org/jira/browse/SPARK-3561
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>         Attachments: SPARK-3561.pdf
>
>
> Currently Spark provides integration with external resource managers such as Apache Hadoop YARN, Mesos, etc. Specifically in the context of YARN, the current architecture of Spark-on-YARN can be enhanced to provide significantly better utilization of cluster resources for large-scale, batch, and/or ETL applications when they run alongside other applications (Spark and others) and services in YARN. 
> Proposal: 
> The proposed approach would introduce a pluggable JobExecutionContext (trait) - a gateway and delegate to the Hadoop execution environment - as a non-public API (@Experimental) not exposed to end users of Spark. 
> The trait will define 6 operations: 
> * hadoopFile 
> * newAPIHadoopFile 
> * broadcast 
> * runJob 
> * persist
> * unpersist
> Each method maps directly to the corresponding method in the current version of SparkContext. A JobExecutionContext implementation would be selected by SparkContext via a master URL such as "execution-context:foo.bar.MyJobExecutionContext", with the default implementation containing the existing code from SparkContext, thus allowing the current (corresponding) methods of SparkContext to delegate to it. An integrator would then have the option to provide a custom implementation either by writing it from scratch or by extending DefaultExecutionContext. A rough sketch of such a trait appears after this description. 
> Please see the attached design doc for more details. 
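
To make the shape of the proposal concrete, here is a rough sketch of what such a trait could look like. This is an illustration only, not the actual API from the design doc: the method names come from the list above, but the signatures are assumptions that simply mirror the corresponding SparkContext methods, with the calling SparkContext passed in so that a default implementation can reuse today's code paths.

{code:scala}
import scala.reflect.ClassTag

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.InputFormat
import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat}

import org.apache.spark.{SparkContext, TaskContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical sketch of the proposed trait; the real signatures are in the
// attached design doc. Each method shadows the SparkContext method of the
// same name and receives the calling context as its first argument.
trait JobExecutionContext {

  def hadoopFile[K, V](
      sc: SparkContext,
      path: String,
      inputFormatClass: Class[_ <: InputFormat[K, V]],
      keyClass: Class[K],
      valueClass: Class[V],
      minPartitions: Int): RDD[(K, V)]

  def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
      sc: SparkContext,
      path: String,
      fClass: Class[F],
      kClass: Class[K],
      vClass: Class[V],
      conf: Configuration): RDD[(K, V)]

  def broadcast[T: ClassTag](sc: SparkContext, value: T): Broadcast[T]

  def runJob[T, U: ClassTag](
      sc: SparkContext,
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      allowLocal: Boolean,
      resultHandler: (Int, U) => Unit): Unit

  def persist[T](sc: SparkContext, rdd: RDD[T], newLevel: StorageLevel): RDD[T]

  def unpersist[T](sc: SparkContext, rdd: RDD[T], blocking: Boolean): RDD[T]
}
{code}

Per the description, an application would opt in by pointing the master URL at an implementation, e.g. setMaster("execution-context:foo.bar.MyJobExecutionContext"), while the default keeps SparkContext's current behavior.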



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org