Posted to reviews@spark.apache.org by olegz <gi...@git.apache.org> on 2014/10/20 02:40:20 UTC

[GitHub] spark pull request: Initial commit to provide pluggable strategy t...

GitHub user olegz opened a pull request:

    https://github.com/apache/spark/pull/2849

    Initial commit to provide pluggable strategy to facilitate access to nat...

    Initial commit to provide pluggable strategy to facilitate access to native Hadoop resources
    
    Added HadoopExecutionContext trait and its default implementation DefaultHadoopExecutionContext
    Modified SparkContext to instantiate and delegate to the instance of HadoopExecutionContext where appropriate
    
    Changed HadoopExecutionContext to JobExecutionContext
    Changed DefaultHadoopExecutionContext to DefaultExecutionContext
    Name changes are due to the fact that when Spark executes outside of Hadoop, having Hadoop in the name would be confusing
    Added initial documentation and tests
    
    polished scaladoc
    
    annotated JobExecutionContext with @DeveloperApi
    
    eliminated TaskScheduler null checks in favor of NoOpTaskScheduler
    to be used in cases where execution of Spark DAG is delegated to an external execution environment
    
    added execution-context check to SparkSubmit
    
    Added recognition of execution-context to SparkContext
    updated spark-class script to recognize when 'execution-context:' is used
    
    polished merge
    
    changed annotations from @DeveloperApi to @Experimental as part of the PR suggestion
    
    externalized persist and unpersist operations
    
    added classpath hooks to spark-class
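
    For illustration only, a minimal sketch of what the pluggable trait might
    look like; the names and signatures below are assumptions inferred from
    this description, not code taken from the patch:

        import org.apache.spark.SparkContext
        import org.apache.spark.rdd.RDD

        // Hypothetical shape of the pluggable execution context: a hook for
        // running a job plus the externalized persist/unpersist operations
        // mentioned in the change log above.
        trait JobExecutionContext {
          // Delegate execution of the Spark DAG to an external environment.
          def runJob[T, U](sc: SparkContext, rdd: RDD[T],
              func: Iterator[T] => U): Array[U]

          def persist[T](sc: SparkContext, rdd: RDD[T]): RDD[T]
          def unpersist[T](sc: SparkContext, rdd: RDD[T]): RDD[T]
        }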

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/olegz/spark-1 SH-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2849.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2849
    
----
commit 84556c86f95500f89bb57f2bcc6c35f025799dc5
Author: Oleg Zhurakousky <ol...@suitcase.io>
Date:   2014-09-16T15:26:48Z

    Initial commit to provide pluggable strategy to facilitate access to native Hadoop resources
    Added HadoopExecutionContext trait and its default implementation DefaultHadoopExecutionContext
    Modified SparkContext to instantiate and delegate to the instance of HadoopExecutionContext where appropriate
    
    Changed HadoopExecutionContext to JobExecutionContext
    Changed DefaultHadoopExecutionContext to DefaultExecutionContext
    Name changes are due to the fact that when Spark executes outside of Hadoop, having Hadoop in the name would be confusing
    Added initial documentation and tests
    
    polished scaladoc
    
    annotated JobExecutionContext with @DeveloperApi
    
    eliminated TaskScheduler null checks in favor of NoOpTaskScheduler
    to be used in cases where execution of Spark DAG is delegated to an external execution environment
    
    added execution-context check to SparkSubmit
    
    Added recognition of execution-context to SparkContext
    updated spark-class script to recognize when 'execution-context:' is used
    
    polished merge
    
    changed annotations from @DeveloperApi to @Experimental as part of the PR suggestion
    
    externalized persist and unpersist operations
    
    added classpath hooks to spark-class

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and would like it to be, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3561] Initial commit to provide pluggab...

Posted by olegz <gi...@git.apache.org>.
Github user olegz commented on the pull request:

    https://github.com/apache/spark/pull/2849#issuecomment-59807570
  
    @andrewor14 done.




[GitHub] spark pull request: [SPARK-3561] Initial commit to provide pluggab...

Posted by olegz <gi...@git.apache.org>.
Github user olegz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2849#discussion_r19648060
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1639,12 +1615,38 @@ object SparkContext extends Logging {
             scheduler.initialize(backend)
             (backend, scheduler)
     
    +      case EXECUTION_CONTEXT(sparkUrl) =>
    +        logInfo("Will use custom job execution context " + sparkUrl)
    +        sc.executionContext = Class.forName(sparkUrl).newInstance().
    +            asInstanceOf[JobExecutionContext]
    +        val scheduler = new NoOpTaskScheduler(sc)
    +        val backend = new LocalBackend(scheduler, 1)
    +        (backend, scheduler)
    +        
           case _ =>
             throw new SparkException("Could not parse Master URL: '" + master + "'")
         }
       }
     }
    -
    +/**
    + * No-op implementation of TaskScheduler, used in cases where
    + * execution of the Spark DAG is delegated to an external execution
    + * environment, thus relying on neither DAGScheduler nor TaskScheduler.
    + */
    +private class NoOpTaskScheduler(sc: SparkContext) extends TaskSchedulerImpl(sc, 1) {
    +  override val schedulingMode: SchedulingMode.SchedulingMode = SchedulingMode.NONE
    +  override def start(): Unit = {}
    +  override def stop(): Unit = {}
    +  override def submitTasks(taskSet: TaskSet): Unit = {}
    +  override def cancelTasks(stageId: Int, interruptThread: Boolean) = {}
    +  override def setDAGScheduler(dagScheduler: DAGScheduler): Unit = {}
    +  override def defaultParallelism(): Int = 1
    +  override def executorHeartbeatReceived(execId: String, 
    +      taskMetrics: Array[(Long, TaskMetrics)],
    +    blockManagerId: BlockManagerId): Boolean = true
    +  override def applicationId(): String = sc.appName
    --- End diff --
    
    Thanks, I'll address it.
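
    Side note for readers of the hunk above: the `EXECUTION_CONTEXT(sparkUrl)`
    case implies a master-URL regex extractor alongside SparkContext's existing
    ones (SPARK_REGEX, LOCAL_N_REGEX, etc.). Its definition is not shown in the
    quoted diff; a plausible sketch, assuming the 'execution-context:' prefix
    mentioned in the PR description:

        // Assumed definition, not shown in the quoted hunk: a master URL such
        // as "execution-context:com.example.MyContext" would match, capturing
        // the fully qualified class name that is then instantiated via
        // Class.forName(sparkUrl).newInstance().
        val EXECUTION_CONTEXT = """execution-context:(.*)""".r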




[GitHub] spark pull request: [SPARK-3561] Initial commit to provide pluggab...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2849#issuecomment-96770156
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-3561] Initial commit to provide pluggab...

Posted by maidh91 <gi...@git.apache.org>.
Github user maidh91 commented on the pull request:

    https://github.com/apache/spark/pull/2849#issuecomment-97692131
  
    Is this patch still being worked on? When will Spark finish verifying it?




[GitHub] spark pull request: [SPARK-3561] Initial commit to provide pluggab...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2849#discussion_r19647973
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1639,12 +1615,38 @@ object SparkContext extends Logging {
             scheduler.initialize(backend)
             (backend, scheduler)
     
    +      case EXECUTION_CONTEXT(sparkUrl) =>
    --- End diff --
    
    If we use a custom execution engine, DAGScheduler doesn't need to be initialized, right?




[GitHub] spark pull request: [SPARK-3561] Initial commit to provide pluggab...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2849#issuecomment-60014038
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: Initial commit to provide pluggable strategy t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2849#issuecomment-59672773
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-3561] Initial commit to provide pluggab...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2849#issuecomment-63709032
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-3561] Initial commit to provide pluggab...

Posted by olegz <gi...@git.apache.org>.
Github user olegz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2849#discussion_r19648083
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1639,12 +1615,38 @@ object SparkContext extends Logging {
             scheduler.initialize(backend)
             (backend, scheduler)
     
    +      case EXECUTION_CONTEXT(sparkUrl) =>
    --- End diff --
    
    That is correct. As you can see, it is not being used.
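
    To make the point concrete: per the PR description, SparkContext delegates
    to the pluggable context where appropriate, so with a custom context job
    submission never reaches DAGScheduler. A hypothetical sketch of that
    delegation, illustrative only and not the actual patch code:

        // Inside SparkContext, job submission could be routed through the
        // pluggable context instead of dagScheduler.runJob, leaving both
        // DAGScheduler and the NoOpTaskScheduler facade idle.
        def runJob[T, U](rdd: RDD[T], func: Iterator[T] => U): Array[U] =
          executionContext.runJob(this, rdd, func)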




[GitHub] spark pull request: [SPARK-3561] Initial commit to provide pluggab...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2849#discussion_r19647952
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1639,12 +1615,38 @@ object SparkContext extends Logging {
             scheduler.initialize(backend)
             (backend, scheduler)
     
    +      case EXECUTION_CONTEXT(sparkUrl) =>
    +        logInfo("Will use custom job execution context " + sparkUrl)
    +        sc.executionContext = Class.forName(sparkUrl).newInstance().
    +            asInstanceOf[JobExecutionContext]
    +        val scheduler = new NoOpTaskScheduler(sc)
    +        val backend = new LocalBackend(scheduler, 1)
    +        (backend, scheduler)
    +        
           case _ =>
             throw new SparkException("Could not parse Master URL: '" + master + "'")
         }
       }
     }
    -
    +/**
    + * No-op implementation of TaskScheduler, used in cases where
    + * execution of the Spark DAG is delegated to an external execution
    + * environment, thus relying on neither DAGScheduler nor TaskScheduler.
    + */
    +private class NoOpTaskScheduler(sc: SparkContext) extends TaskSchedulerImpl(sc, 1) {
    +  override val schedulingMode: SchedulingMode.SchedulingMode = SchedulingMode.NONE
    +  override def start(): Unit = {}
    +  override def stop(): Unit = {}
    +  override def submitTasks(taskSet: TaskSet): Unit = {}
    +  override def cancelTasks(stageId: Int, interruptThread: Boolean) = {}
    +  override def setDAGScheduler(dagScheduler: DAGScheduler): Unit = {}
    +  override def defaultParallelism(): Int = 1
    +  override def executorHeartbeatReceived(execId: String, 
    +      taskMetrics: Array[(Long, TaskMetrics)],
    +    blockManagerId: BlockManagerId): Boolean = true
    +  override def applicationId(): String = sc.appName
    --- End diff --
    
    Please don't use `appName` for the application ID, because application IDs should be unique.
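
    One way to satisfy the uniqueness requirement, sketched here purely for
    illustration (the reviewer does not prescribe a fix):

        import java.util.UUID

        // Combine the human-readable app name with a random UUID so that
        // every run produces a distinct application ID.
        override def applicationId(): String =
          sc.appName + "-" + UUID.randomUUID().toString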




[GitHub] spark pull request: Initial commit to provide pluggable strategy t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2849#issuecomment-59807087
  
    Hey @olegz, is there an associated JIRA for this? If so, could you include it in the title?

