Posted to reviews@spark.apache.org by kbzod <gi...@git.apache.org> on 2014/07/17 19:08:56 UTC

[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

GitHub user kbzod opened a pull request:

    https://github.com/apache/spark/pull/1465

    SPARK-2083 Add support for spark.local.maxFailures configuration property

    The task scheduler creation logic in `SparkContext` now looks for a `spark.local.maxFailures` property that specifies how many task failures cause a local job to fail. It defaults to the prior fixed value of 1 (no retries).
    
    The patch includes documentation updates and new unit tests.
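
    For illustration, a minimal sketch of how the proposed property would be set from user code if this patch were applied; the app name and failure count are arbitrary:

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Allow each task to fail up to 3 times (2 retries) in local mode before
    // the job is failed, instead of the default of 1 (no retries).
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("local-max-failures-example")
      .set("spark.local.maxFailures", "3")

    val sc = new SparkContext(conf)
    ```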

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kbzod/spark SPARK-2083

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1465.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1465
    
----
commit a8b369dad154b7773ec8b535cf399b428ec3952c
Author: Bill Havanki <bh...@cloudera.com>
Date:   2014-07-17T16:01:22Z

    SPARK-2083 Add support for spark.local.maxFailures configuration property

----



[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1465#issuecomment-51719921
  
    I think there's already a mechanism to set this by using `local[N, maxFailures]` to create your SparkContext:
    
    ```scala
    // Regular expression for local[N, maxRetries], used in tests with failing tasks
    val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+)\s*,\s*([0-9]+)\]""".r

    // ...

    case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
      val scheduler = new TaskSchedulerImpl(sc, maxFailures.toInt, isLocal = true)
      val backend = new LocalBackend(scheduler, threads.toInt)
      scheduler.initialize(backend)
      scheduler
    ```
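
    For example, a context created like this already gets task retries in local mode (a sketch; the thread and failure counts are arbitrary):

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // "local[2, 4]": 2 worker threads, and each task may fail up to 4 times
    // (3 retries) before the job is failed.
    val conf = new SparkConf()
      .setMaster("local[2, 4]")
      .setAppName("local-n-failures-example")

    val sc = new SparkContext(conf)
    ```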




[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1465#discussion_r15148112
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1477,7 +1478,8 @@ object SparkContext extends Logging {
             def localCpuCount = Runtime.getRuntime.availableProcessors()
             // local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
             val threadCount = if (threads == "*") localCpuCount else threads.toInt
    -        val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
    +        val localTaskFailures = sc.conf.getInt("spark.local.maxFailures", MAX_LOCAL_TASK_FAILURES)
    --- End diff --
    
    here too



[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1465#issuecomment-62332453
  
    I'm going to close this issue as wontfix.




[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1465#issuecomment-52587730
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1465#issuecomment-49335741
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by kbzod <gi...@git.apache.org>.
Github user kbzod commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1465#discussion_r15245702
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1463,12 +1463,13 @@ object SparkContext extends Logging {
         // Regular expression for connection to Simr cluster
         val SIMR_REGEX = """simr://(.*)""".r
     
    -    // When running locally, don't try to re-execute tasks on failure.
    +    // When running locally, by default don't try to re-execute tasks on failure.
         val MAX_LOCAL_TASK_FAILURES = 1
     
         master match {
           case "local" =>
    -        val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
    +        val localTaskFailures = sc.conf.getInt("spark.local.maxFailures", MAX_LOCAL_TASK_FAILURES)
    --- End diff --
    
    Will do.



[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by kbzod <gi...@git.apache.org>.
Github user kbzod commented on the pull request:

    https://github.com/apache/spark/pull/1465#issuecomment-53260871
  
    @JoshRosen You are right, the `local[N, maxFailures]` mechanism already works, but the reporter of [SPARK-2083](https://issues.apache.org/jira/browse/SPARK-2083) stated that "it is not documented and hard to manage" and suggested a patch like this one.




[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1465#issuecomment-54694585
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1465#discussion_r15148114
  
    --- Diff: docs/configuration.md ---
    @@ -599,6 +599,15 @@ Apart from these, the following properties are also available, and may be useful
       <td>
         Number of individual task failures before giving up on the job.
         Should be greater than or equal to 1. Number of allowed retries = this value - 1.
    +    Does not apply to running Spark locally.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.local.maxFailures</code></td>
    +  <td>1</td>
    +  <td>
    +    Number of individual task failures before giving up on the job, when running Spark locally.
    +    Should be greater than or equal to 1. No retries are allowed.
    --- End diff --
    
    "No retries are allowed." ? What does this mean?



[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1465#issuecomment-55338673
  
    add to whitelist




[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by roji <gi...@git.apache.org>.
Github user roji commented on the pull request:

    https://github.com/apache/spark/pull/1465#issuecomment-51718343
  
    +1




[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1465#issuecomment-56902717
  
    If this is being used for testing, I don't see a compelling reason to add a config over using the constructor.




[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1465#discussion_r15148111
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1463,12 +1463,13 @@ object SparkContext extends Logging {
         // Regular expression for connection to Simr cluster
         val SIMR_REGEX = """simr://(.*)""".r
     
    -    // When running locally, don't try to re-execute tasks on failure.
    +    // When running locally, by default don't try to re-execute tasks on failure.
         val MAX_LOCAL_TASK_FAILURES = 1
     
         master match {
           case "local" =>
    -        val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
    +        val localTaskFailures = sc.conf.getInt("spark.local.maxFailures", MAX_LOCAL_TASK_FAILURES)
    --- End diff --
    
    i'd rename the variable maxTaskFailures



[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1465#issuecomment-55338661
  
    @kbzod Why do we need a separate config for the local case? I think the correct solution is to use the same config, but set a different default value for local mode. Right now this doesn't work because we pass in a hard-coded value of 1, but we can change that to take in `spark.task.maxFailures` instead (and then default to 1).
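
    A rough sketch of that approach in the `"local"` branch (not committed code; the variable name is illustrative):

    ```scala
    // Reuse spark.task.maxFailures in local mode, defaulting to 1 so the
    // existing no-retry behavior is kept when the property is not set.
    val maxTaskFailures = sc.conf.getInt("spark.task.maxFailures", 1)
    val scheduler = new TaskSchedulerImpl(sc, maxTaskFailures, isLocal = true)
    ```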
    
    Also, can you rebase to master when you have the chance?




[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by kbzod <gi...@git.apache.org>.
Github user kbzod commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1465#discussion_r15245710
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1477,7 +1478,8 @@ object SparkContext extends Logging {
             def localCpuCount = Runtime.getRuntime.availableProcessors()
             // local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
             val threadCount = if (threads == "*") localCpuCount else threads.toInt
    -        val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
    +        val localTaskFailures = sc.conf.getInt("spark.local.maxFailures", MAX_LOCAL_TASK_FAILURES)
    --- End diff --
    
    Will do.



[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1465




[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

Posted by kbzod <gi...@git.apache.org>.
Github user kbzod commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1465#discussion_r15245692
  
    --- Diff: docs/configuration.md ---
    @@ -599,6 +599,15 @@ Apart from these, the following properties are also available, and may be useful
       <td>
         Number of individual task failures before giving up on the job.
         Should be greater than or equal to 1. Number of allowed retries = this value - 1.
    +    Does not apply to running Spark locally.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.local.maxFailures</code></td>
    +  <td>1</td>
    +  <td>
    +    Number of individual task failures before giving up on the job, when running Spark locally.
    +    Should be greater than or equal to 1. No retries are allowed.
    --- End diff --
    
    Ah, that's an error. It should say "Number of allowed retries is this value - 1." Will fix.

