You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by uncleGen <gi...@git.apache.org> on 2017/02/15 08:01:53 UTC

[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...

GitHub user uncleGen opened a pull request:

    https://github.com/apache/spark/pull/16936

    [SPARK-19605][DStream] Fail it if existing resource is not enough to run streaming job

    ## What changes were proposed in this pull request?
    
    For more detailed discussion, please review:
    - [SPARK-19605](https://issues.apache.org/jira/browse/SPARK-19605)
    
    ## How was this patch tested?
    
    add new unit test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19605

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16936.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16936
    
----
commit 4093330ee0c85f2c374db3c3ff1d05832c77b89e
Author: uncleGen <hu...@gmail.com>
Date:   2017-02-15T07:58:34Z

    SPARK-19605: Fail it if existing resource is not enough to run streaming job

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72928/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    (There is no detail in the JIRA that you point to.) I think we already discussed this in part? this duplicates some logic about how cluster resource management works. Does it really not work if not enough receivers can schedule? that seems like a condition to check somewhere else, like, log a warning that a batch can't operate or can't start because not all receivers scheduled. This could happen even if enough resources are theoretically available, just not available to the app.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    **[Test build #72928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72928/testReport)** for PR 16936 at commit [`19c8adb`](https://github.com/apache/spark/commit/19c8adbc1df4063c95fa55eae12be96f528bd204).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    Hm it just seems like the wrong approach, to externally estimate whether in theory it won't schedule. It is certainly a problem if streaming doesn't work though users would already realize it. The error check or message could be more explicit but it seems like something the streaming machinery should know and warn about?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    **[Test build #72926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72926/testReport)** for PR 16936 at commit [`4093330`](https://github.com/apache/spark/commit/4093330ee0c85f2c374db3c3ff1d05832c77b89e).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16936#discussion_r101223957
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
    @@ -938,7 +958,7 @@ object SlowTestReceiver {
     /** Streaming application for testing DStream and RDD creation sites */
     package object testPackage extends Assertions {
       def test() {
    -    val conf = new SparkConf().setMaster("local").setAppName("CreationSite test")
    +    val conf = new SparkConf().setMaster("local[2]").setAppName("CreationSite test")
         val ssc = new StreamingContext(conf, Milliseconds(100))
    --- End diff --
    
    unit test fail here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    Let us call @zsxwing for some suggestions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    **[Test build #72928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72928/testReport)** for PR 16936 at commit [`19c8adb`](https://github.com/apache/spark/commit/19c8adbc1df4063c95fa55eae12be96f528bd204).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen closed the pull request at:

    https://github.com/apache/spark/pull/16936


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16936#discussion_r101223862
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala ---
    @@ -215,10 +215,6 @@ package object config {
     
       /* Executor configuration. */
     
    -  private[spark] val EXECUTOR_CORES = ConfigBuilder("spark.executor.cores")
    -    .intConf
    -    .createWithDefault(1)
    -
       private[spark] val EXECUTOR_MEMORY_OVERHEAD = ConfigBuilder("spark.yarn.executor.memoryOverhead")
    --- End diff --
    
    multi-definition error in ApplicationMaster.scala, remove this as we add it in core


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16936#discussion_r101223265
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala ---
    @@ -437,6 +438,74 @@ class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false
       }
     
       /**
    +   * Check if existing resource is enough to run job.
    +   */
    +  private def checkResourceValid(): Unit = {
    +    val coresPerTask = ssc.conf.get(CPUS_PER_TASK)
    +
    +    def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
    +
    +    ssc.conf.get("spark.master") match {
    +      case m if m.startsWith("yarn") =>
    +        val numCoresPerExecutor = ssc.conf.get(EXECUTOR_CORES)
    +        val numExecutors = getTargetExecutorNumber()
    +        if (numExecutors * numCoresPerExecutor / coresPerTask < numReceivers) {
    +          throw new SparkException("There are no enough resource to run Spark Streaming job: " +
    +            s"existing resource can only be used to scheduler some of receivers." +
    +            s"$numExecutors executors, $numCoresPerExecutor cores per executor, $coresPerTask " +
    +            s"cores per task and $numReceivers receivers.")
    +        }
    +      case m if m.startsWith("spark") || m.startsWith("mesos") =>
    +        val coresMax = ssc.conf.get(CORES_MAX).getOrElse(0)
    +        if (coresMax / coresPerTask < numReceivers) {
    +          throw new SparkException("There are no enough resource to run Spark Streaming job: " +
    +            s"existing resource can only be used to scheduler some of receivers." +
    +            s"$coresMax cores totally, $coresPerTask cores per task and $numReceivers receivers.")
    +        }
    +      case m if m.startsWith("local") =>
    +        m match {
    +          case "local" =>
    +            throw new SparkException("There are no enough resource to run Spark Streaming job.")
    +          case SparkMasterRegex.LOCAL_N_REGEX(threads) =>
    +            val threadCount = if (threads == "*") localCpuCount else threads.toInt
    +            if (threadCount / coresPerTask < numReceivers) {
    +              throw new SparkException("There are no enough resource to run Spark Streaming job: " +
    +                s"existing resource can only be used to scheduler some of receivers." +
    +                s"$threadCount threads, $coresPerTask threads per task and $numReceivers " +
    +                s"receivers.")
    +            }
    +          case SparkMasterRegex.LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
    +            val threadCount = if (threads == "*") localCpuCount else threads.toInt
    +            if (threadCount / coresPerTask < numReceivers) {
    +              throw new SparkException("There are no enough resource to run Spark Streaming job: " +
    +                s"existing resource can only be used to scheduler some of receivers." +
    +                s"$threadCount threads, $coresPerTask threads per task and $numReceivers " +
    +                s"receivers.")
    +            }
    +          case SparkMasterRegex.LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
    +            val coresMax = numSlaves.toInt * coresPerSlave.toInt
    +            if (coresMax / coresPerTask < numReceivers) {
    +              throw new SparkException("There are no enough resource to run Spark Streaming job: " +
    +                s"existing resource can only be used to scheduler some of receivers." +
    +                s"$numSlaves slaves, $coresPerSlave cores per slave, $coresPerTask " +
    +                s"cores per task and $numReceivers receivers.")
    +            }
    +        }
    +    }
    +  }
    +
    +  private def getTargetExecutorNumber(): Int = {
    +    if (Utils.isDynamicAllocationEnabled(ssc.conf)) {
    +      ssc.conf.get(DYN_ALLOCATION_MAX_EXECUTORS)
    +    } else {
    +      val targetNumExecutors =
    +        sys.env.get("SPARK_EXECUTOR_INSTANCES").map(_.toInt).getOrElse(2)
    +      // System property can override environment variable.
    --- End diff --
    
    here "2" refers to "YarnSparkHadoopUtil.DEFAULT_NUMBER_EXECUTORS"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    I agreed with @srowen. Streaming should not need to know the details about the mode, and when someone changes other modules, they won't know these codes inside streaming. I would like to see a cleaner solution.
    
    In addition, when there are not enough resources in a cluster, even if the user sets a large core number, they will still not be able to run the streaming job. Hence, they should always be aware of this limitation of Streaming receivers, and it seems not worth to fix this issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72926/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    **[Test build #72926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72926/testReport)** for PR 16936 at commit [`4093330`](https://github.com/apache/spark/commit/4093330ee0c85f2c374db3c3ff1d05832c77b89e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on the issue:

    https://github.com/apache/spark/pull/16936
  
    @srowen 
    > Does it really not work if not enough receivers can schedule? 
    
    That's not what I want to express. What I mean is the stream output can not operate.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org