Posted to reviews@spark.apache.org by nyogesh <gi...@git.apache.org> on 2018/04/25 07:32:29 UTC

[GitHub] spark pull request #21150: [SPARK-24075][MESOS]

GitHub user nyogesh opened a pull request:

    https://github.com/apache/spark/pull/21150

    [SPARK-24075][MESOS]

    ## What changes were proposed in this pull request?
    
    * Spark drivers running on Mesos with supervise enabled are retried indefinitely on failure. This PR optionally limits the number of retries to a configurable value.
    * Introduces the SparkConf setting "spark.mesos.driver.supervise.maxRetries", which limits the number of times the driver can be retried. By default, the driver is retried indefinitely.
    * When the check is made to see if supervise is enabled and the job has to be retried, an additional check is made to see if the allowable retries have been exceeded.
    * Added unit tests for method hasDriverExceededRetries.
    * Added documentation for "spark.mesos.driver.supervise.maxRetries".
    
    ## How was this patch tested?
    
    Added unit tests
    
    Built the Spark package, dockerized it, and deployed it as a service on Mesos. Once the Spark service was running on Mesos, a series of drivers was submitted:
    
    1. With supervise disabled, ran a successful driver
    2. With supervise disabled, ran a failing driver
    3. With supervise enabled, ran a successful driver
    4. With supervise enabled and without setting "spark.mesos.driver.supervise.maxRetries", ran a failing driver
    5. With supervise enabled and setting "spark.mesos.driver.supervise.maxRetries=3", ran a failing driver
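
The retry check described in the pull request above can be sketched roughly as follows. This is an illustration only, not the code from the patch: `RetryState` and the `Map[String, String]` configuration are hypothetical stand-ins for Spark's own types, and the `>=` comparison is an assumption based on the doc comment quoted later in this thread ("true if driver has reached retry limit").

```scala
// Hypothetical stand-in for the scheduler's retry state; only the retry count matters here.
final case class RetryState(retries: Int)

object SuperviseRetries {
  // Configuration key introduced by the pull request.
  val MaxRetriesKey = "spark.mesos.driver.supervise.maxRetries"

  /**
   * Returns true once the driver has used up its allowed retries.
   * When the key is unset, the driver may be retried indefinitely, so the result is false.
   * A non-numeric value makes `toInt` throw NumberFormatException.
   */
  def hasDriverExceededRetries(
      retryState: Option[RetryState],
      conf: Map[String, String]): Boolean = {
    val maxRetries: Option[Int] = conf.get(MaxRetriesKey).map(_.trim.toInt)
    (retryState, maxRetries) match {
      // Assumption: reaching the limit counts as exceeding it.
      case (Some(state), Some(limit)) => state.retries >= limit
      // No retry state yet, or no limit configured: keep retrying.
      case _ => false
    }
  }
}
```

Under these assumptions, a driver that has been retried three times with the limit set to 3 is not retried again, while the same driver with the key unset still is.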


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nyogesh/spark SPARK-24075

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21150.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21150
    
----
commit fcfe1e80842f758b4ee0004b7ce5730bca40f543
Author: Yogesh Natarajan <yn...@...>
Date:   2018-04-25T07:08:55Z

    [SPARK-24075][MESOS]
    
    * Spark drivers running on Mesos with supervise enabled are retried indefinitely on failure. This PR optionally limits the number of retries to a configurable value.
    * Introduces the SparkConf setting "spark.mesos.driver.supervise.maxRetries", which limits the number of times the driver can be retried. By default, the driver is retried indefinitely.
    * When a check is made to see if supervise is enabled, an additional check is made to see if the allowable retries have been exceeded.
    * Added unit tests for method hasDriverExceededRetries.
    * Added documentation for "spark.mesos.driver.supervise.maxRetries".

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21150: [SPARK-24075][MESOS] Option to limit number of retries f...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21150
  
    ok to test


---



[GitHub] spark issue #21150: [SPARK-24075][MESOS] Option to limit number of retries f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21150
  
    **[Test build #92797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92797/testReport)** for PR 21150 at commit [`0a93f57`](https://github.com/apache/spark/commit/0a93f576c647f20751c4186475e9943b5be7c150).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21150: [SPARK-24075][MESOS] Option to limit number of re...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21150#discussion_r227028091
  
    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
    @@ -728,6 +729,28 @@ private[spark] class MesosClusterScheduler(
           state == MesosTaskState.TASK_LOST
       }
     
    +  /**
    +   * Check if the driver has exceeded the number of retries.
    +   * When "spark.mesos.driver.supervise.maxRetries" is not set,
    +   * the default behavior is to retry indefinitely
    +   *
    +   * @param retryState Retry state of the driver
    +   * @param conf Spark Context to check if it contains "spark.mesos.driver.supervise.maxRetries"
    +   * @return true if driver has reached retry limit
    +   *         false if driver can be retried
    +   */
    +  private[scheduler] def hasDriverExceededRetries(retryState: Option[MesosClusterRetryState],
    --- End diff --
    
    Please fix the param style:
    hasDriverExceededRetries(
         retryState: Option[MesosClusterRetryState],
         conf.....) 


---



[GitHub] spark issue #21150: [SPARK-24075][MESOS] Option to limit number of retries f...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on the issue:

    https://github.com/apache/spark/pull/21150
  
    ok to test


---



[GitHub] spark issue #21150: [SPARK-24075][MESOS] Option to limit number of retries f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21150
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21150: [SPARK-24075][MESOS] Option to limit number of retries f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21150
  
    **[Test build #92797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92797/testReport)** for PR 21150 at commit [`0a93f57`](https://github.com/apache/spark/commit/0a93f576c647f20751c4186475e9943b5be7c150).


---



[GitHub] spark issue #21150: [SPARK-24075][MESOS] Option to limit number of retries f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21150
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92797/


---



[GitHub] spark issue #21150: [SPARK-24075][MESOS] Option to limit number of retries f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21150
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21150: [SPARK-24075][MESOS] Option to limit number of retries f...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21150
  
    @tnachen 



---



[GitHub] spark issue #21150: [SPARK-24075][MESOS] Option to limit number of retries f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21150
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21150: [SPARK-24075][MESOS] Option to limit number of retries f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21150
  
    Merged build finished. Test FAILed.


---
