You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by djvulee <gi...@git.apache.org> on 2017/07/17 08:00:44 UTC

[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

GitHub user djvulee opened a pull request:

    https://github.com/apache/spark/pull/18651

    [SPARK-21383][Core] Fix the YarnAllocator allocates more Resource

    ## What changes were proposed in this pull request?
    When NodeManagers launching Executors,
    the `missing` value will excel the
    real value when the launch is slow, this can lead to YARN allocates more resource.
    
    
    We add the `numExecutorToBeLaunched` when calculate the `missing` to avoid this.
    
    
    ## How was this patch tested?
    Test by experiment.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/djvulee/spark YarnAllocate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18651.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18651
    
----
commit 818c9126959e8576861478e18389e6ed8fdbeac4
Author: DjvuLee <li...@bytedance.com>
Date:   2017-07-17T07:54:09Z

    [Core] Fix the YarnAllocator allocate more Resource
    
    When NodeManagers launched the Executors, the missing will excel the
    real value, this can lead to YARN allocate more resource.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by djvulee <gi...@git.apache.org>.
Github user djvulee commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    I update the code, please take a look at @vanzin @tgravescs 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by djvulee <gi...@git.apache.org>.
Github user djvulee commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128143898
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -82,6 +82,8 @@ private[yarn] class YarnAllocator(
     
       @volatile private var numExecutorsRunning = 0
     
    +  @volatile private var numExecutorToBeLaunched = 0
    --- End diff --
    
    OK,I will change the name.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79789/testReport)** for PR 18651 at commit [`582f424`](https://github.com/apache/spark/commit/582f4241e2592e63f58e9333b1e98cde6bc4d6f8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128096119
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -82,6 +82,8 @@ private[yarn] class YarnAllocator(
     
       @volatile private var numExecutorsRunning = 0
     
    +  @volatile private var numExecutorToBeLaunched = 0
    --- End diff --
    
    A better name would be `numExecutorsStarting`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79716/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79743/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79764/testReport)** for PR 18651 at commit [`cc73e5b`](https://github.com/apache/spark/commit/cc73e5bd774d9de2d27781afbf77e90413438711).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by djvulee <gi...@git.apache.org>.
Github user djvulee commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128943316
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    I agree that put `numExecutorsStarting.decrementAndGet()` together with `numExecutorsRunning.incrementAndGet()` in the `updateInternalState` is better if we can.
    
    Why I try to put `numExecutorsStarting.decrementAndGet()` in the `finally` block is that if there some Exceptions is not `NonFatal`, and caught by the following code, we may can not allocated resources as we  specified, this is the same as @vanzin worried.
    
    We may double the count in the current code, but this only slow down the allocation rate for a small time.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128828098
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    Note I thought the last version handled this properly where he had the decrement in the catch block


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79743/testReport)** for PR 18651 at commit [`856ebd6`](https://github.com/apache/spark/commit/856ebd6b51bd2531b12947afe46896d5ea5dbf8e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79928/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79789/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by djvulee <gi...@git.apache.org>.
Github user djvulee commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128144194
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -505,32 +508,37 @@ private[yarn] class YarnAllocator(
     
           if (numExecutorsRunning < targetNumExecutors) {
             if (launchContainers) {
    -          launcherPool.execute(new Runnable {
    -            override def run(): Unit = {
    -              try {
    -                new ExecutorRunnable(
    -                  Some(container),
    -                  conf,
    -                  sparkConf,
    -                  driverUrl,
    -                  executorId,
    -                  executorHostname,
    -                  executorMemory,
    -                  executorCores,
    -                  appAttemptId.getApplicationId.toString,
    -                  securityMgr,
    -                  localResources
    -                ).run()
    -                updateInternalState()
    -              } catch {
    -                case NonFatal(e) =>
    -                  logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    -                  amClient.releaseAssignedContainer(containerId)
    +          try {
    +            numExecutorToBeLaunched += 1
    +            launcherPool.execute(new Runnable {
    +              override def run(): Unit = {
    +                try {
    +                  new ExecutorRunnable(
    +                    Some(container),
    +                    conf,
    +                    sparkConf,
    +                    driverUrl,
    +                    executorId,
    +                    executorHostname,
    +                    executorMemory,
    +                    executorCores,
    +                    appAttemptId.getApplicationId.toString,
    +                    securityMgr,
    +                    localResources
    +                  ).run()
    +                  updateInternalState()
    +                } catch {
    +                  case NonFatal(e) =>
    +                    logError(s"Failed to launch executor $executorId on container $containerId", e)
    +                    // Assigned container should be released immediately
    +                    // to avoid unnecessary resource occupation.
    +                    amClient.releaseAssignedContainer(containerId)
    +                }
                   }
    -            }
    -          })
    +            })
    +          } finally {
    +            numExecutorToBeLaunched -= 1
    --- End diff --
    
    Yes, you're right. When I test the code by experiment, i decrease the `numExecutorToBeLaunched` in the `updateInternalState` function, but I later found this may impact the test.
    
    I will fix this soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128247587
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -270,7 +272,7 @@ private[yarn] class YarnAllocator(
           logDebug("Allocated containers: %d. Current executor count: %d. Cluster resources: %s."
             .format(
               allocatedContainers.size,
    -          numExecutorsRunning,
    +          numExecutorsRunning.get,
    --- End diff --
    
    it woudl be nice to print the numExecutorsStarting here as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128828035
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    this is "starting" not "running".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128827627
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    sorry don't follow. updateInternalState is called right after the run if it succeeds.  You do still need to handle the failure as well but in the failure the running is never incremented


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128339942
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -242,7 +244,7 @@ private[yarn] class YarnAllocator(
         if (executorIdToContainer.contains(executorId)) {
           val container = executorIdToContainer.get(executorId).get
           internalReleaseContainer(container)
    -      numExecutorsRunning -= 1
    --- End diff --
    
    This doesn't need to be an atomic integer because this method is synchronized already.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    LGTM.  @vanzin  anything further?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128829289
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    not sure I follow your reference between starting and running.  I was just saying in the failure case it doesn't matter because you aren't going to overcount.
    
    If we don't decrement the starting inside of updateInternalState we have the possibility to over count because running and starting aren't incremented/decremented within the synchronize block.  We don't want to do that. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128834971
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    Ok, I see. The current code can double count the same executor as starting and running, while the previous code could count it as starting even though it failed to start (for a really small window), but that is a self-healing situation while the previous can have some adverse effects.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by djvulee <gi...@git.apache.org>.
Github user djvulee commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128290766
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -294,7 +296,8 @@ private[yarn] class YarnAllocator(
       def updateResourceRequests(): Unit = {
         val pendingAllocate = getPendingAllocate
         val numPendingAllocate = pendingAllocate.size
    -    val missing = targetNumExecutors - numPendingAllocate - numExecutorsRunning
    +    val missing = targetNumExecutors - numPendingAllocate -
    +      numExecutorsStarting.get - numExecutorsRunning.get
     
    --- End diff --
    
    Thanks for your advice! I just add the debug info.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Merging to master / 2.2.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by djvulee <gi...@git.apache.org>.
Github user djvulee commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128432387
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -242,7 +244,7 @@ private[yarn] class YarnAllocator(
         if (executorIdToContainer.contains(executorId)) {
           val container = executorIdToContainer.get(executorId).get
           internalReleaseContainer(container)
    -      numExecutorsRunning -= 1
    --- End diff --
    
    Yes, I just try to keep consistency with `numExecutorsStarting`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128829650
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    `numExecutorsStarting` is incremented before the task is submitted and needs to be decremented after the task finishes, successfully or not. That's exactly what the current code does.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by djvulee <gi...@git.apache.org>.
Github user djvulee commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    I just update the code, and test by experiment, can you take a look at @vanzin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128095881
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -505,32 +508,37 @@ private[yarn] class YarnAllocator(
     
           if (numExecutorsRunning < targetNumExecutors) {
             if (launchContainers) {
    -          launcherPool.execute(new Runnable {
    -            override def run(): Unit = {
    -              try {
    -                new ExecutorRunnable(
    -                  Some(container),
    -                  conf,
    -                  sparkConf,
    -                  driverUrl,
    -                  executorId,
    -                  executorHostname,
    -                  executorMemory,
    -                  executorCores,
    -                  appAttemptId.getApplicationId.toString,
    -                  securityMgr,
    -                  localResources
    -                ).run()
    -                updateInternalState()
    -              } catch {
    -                case NonFatal(e) =>
    -                  logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    -                  amClient.releaseAssignedContainer(containerId)
    +          try {
    +            numExecutorToBeLaunched += 1
    +            launcherPool.execute(new Runnable {
    +              override def run(): Unit = {
    +                try {
    +                  new ExecutorRunnable(
    +                    Some(container),
    +                    conf,
    +                    sparkConf,
    +                    driverUrl,
    +                    executorId,
    +                    executorHostname,
    +                    executorMemory,
    +                    executorCores,
    +                    appAttemptId.getApplicationId.toString,
    +                    securityMgr,
    +                    localResources
    +                  ).run()
    +                  updateInternalState()
    +                } catch {
    +                  case NonFatal(e) =>
    +                    logError(s"Failed to launch executor $executorId on container $containerId", e)
    +                    // Assigned container should be released immediately
    +                    // to avoid unnecessary resource occupation.
    +                    amClient.releaseAssignedContainer(containerId)
    +                }
                   }
    -            }
    -          })
    +            })
    +          } finally {
    +            numExecutorToBeLaunched -= 1
    --- End diff --
    
    Same here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128096434
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -505,32 +508,37 @@ private[yarn] class YarnAllocator(
     
           if (numExecutorsRunning < targetNumExecutors) {
             if (launchContainers) {
    -          launcherPool.execute(new Runnable {
    -            override def run(): Unit = {
    -              try {
    -                new ExecutorRunnable(
    -                  Some(container),
    -                  conf,
    -                  sparkConf,
    -                  driverUrl,
    -                  executorId,
    -                  executorHostname,
    -                  executorMemory,
    -                  executorCores,
    -                  appAttemptId.getApplicationId.toString,
    -                  securityMgr,
    -                  localResources
    -                ).run()
    -                updateInternalState()
    -              } catch {
    -                case NonFatal(e) =>
    -                  logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    -                  amClient.releaseAssignedContainer(containerId)
    +          try {
    +            numExecutorToBeLaunched += 1
    +            launcherPool.execute(new Runnable {
    +              override def run(): Unit = {
    +                try {
    +                  new ExecutorRunnable(
    +                    Some(container),
    +                    conf,
    +                    sparkConf,
    +                    driverUrl,
    +                    executorId,
    +                    executorHostname,
    +                    executorMemory,
    +                    executorCores,
    +                    appAttemptId.getApplicationId.toString,
    +                    securityMgr,
    +                    localResources
    +                  ).run()
    +                  updateInternalState()
    +                } catch {
    +                  case NonFatal(e) =>
    +                    logError(s"Failed to launch executor $executorId on container $containerId", e)
    +                    // Assigned container should be released immediately
    +                    // to avoid unnecessary resource occupation.
    +                    amClient.releaseAssignedContainer(containerId)
    +                }
                   }
    -            }
    -          })
    +            })
    +          } finally {
    +            numExecutorToBeLaunched -= 1
    --- End diff --
    
    Also, shouldn't this be done in the actual thread that launching the executor? Otherwise you're decrementing the counter as soon as the task is submitted for execution, but you're not really waiting for the task to execute.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128249084
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -294,7 +296,8 @@ private[yarn] class YarnAllocator(
       def updateResourceRequests(): Unit = {
         val pendingAllocate = getPendingAllocate
         val numPendingAllocate = pendingAllocate.size
    -    val missing = targetNumExecutors - numPendingAllocate - numExecutorsRunning
    +    val missing = targetNumExecutors - numPendingAllocate -
    +      numExecutorsStarting.get - numExecutorsRunning.get
     
    --- End diff --
    
    can you also add in a debug message here, something like below.  I found this very useful when debugging this issue and think it would be useful for debugging other allocating issues in the future.
    
    logDebug(s"Updating resource requests, target: $targetNumExecutors, pending: " +
     +      s"$numPendingAllocate, running: $numExecutorsRunning, executorsStarting $numExecutorsPendingStart")


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79745/testReport)** for PR 18651 at commit [`bf7f78c`](https://github.com/apache/spark/commit/bf7f78c2ed43d7fc858a03e314c221fb834c020d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128831747
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    yes but its a bug right now as the numbers can be wrong. Are you looking at the synchronization?
    
    Right now everything is called synchronized up to the point of launcher pool to do the ExecutorRunnable.  At this point running is not incremented, pending is decremented and we now increment Starting.  That is fine.
    
    But when the ExecutorRunnable finishes the only place its called synchronized is in updateInternalState.  This right now increments running but does not decrement starting.  if updateResourceRequests gets called (which is synchronized), Right after updateInternalState (which leave the syncrhonized) but before the finally block executes and decrements starting the total number can be more then it really is.  That executor is counted as both running and starting


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79743/testReport)** for PR 18651 at commit [`856ebd6`](https://github.com/apache/spark/commit/856ebd6b51bd2531b12947afe46896d5ea5dbf8e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by djvulee <gi...@git.apache.org>.
Github user djvulee commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128435807
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,8 +535,9 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
    +                  numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    Yes, it is more robust. I have update the code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/18651
  
        the missing value will excel the
    
    Not sure what that means. Maybe "exceed"?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79745/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128828683
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -503,8 +511,9 @@ private[yarn] class YarnAllocator(
             allocatedContainerToHostMap.put(containerId, executorHostname)
           }
     
    -      if (numExecutorsRunning < targetNumExecutors) {
    +      if (numExecutorsRunning.get < targetNumExecutors) {
             if (launchContainers) {
    +          numExecutorsStarting.incrementAndGet()
    --- End diff --
    
    note if we change the below back this needs to move outside if(launchContainers)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79764/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128807459
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    why did you move this out of updateInternalState?  I would rather see those stay together because updateInternalState is synchronized and if they aren't changed at the same time you could end up with the wrong numbers. numExecutorsStarting could still be say 1 when you have already added that to the # running and then updateResourceRequests gets an incorrect number.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128095843
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -505,32 +508,37 @@ private[yarn] class YarnAllocator(
     
           if (numExecutorsRunning < targetNumExecutors) {
             if (launchContainers) {
    -          launcherPool.execute(new Runnable {
    -            override def run(): Unit = {
    -              try {
    -                new ExecutorRunnable(
    -                  Some(container),
    -                  conf,
    -                  sparkConf,
    -                  driverUrl,
    -                  executorId,
    -                  executorHostname,
    -                  executorMemory,
    -                  executorCores,
    -                  appAttemptId.getApplicationId.toString,
    -                  securityMgr,
    -                  localResources
    -                ).run()
    -                updateInternalState()
    -              } catch {
    -                case NonFatal(e) =>
    -                  logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    -                  amClient.releaseAssignedContainer(containerId)
    +          try {
    +            numExecutorToBeLaunched += 1
    --- End diff --
    
    This is not thread safe. `+=` is two operations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128838209
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    yep, the fact that its still marked as Starting even though failed will fix itself next loop through. Its no different then if we didn't know it failed and it was still in the ExecutorRunnable.run code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128340324
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,8 +535,9 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
    +                  numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    Safer to put this in a `finally`, no?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79745/testReport)** for PR 18651 at commit [`bf7f78c`](https://github.com/apache/spark/commit/bf7f78c2ed43d7fc858a03e314c221fb834c020d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r129038408
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    If its a fatal error I don't really expect us to continue so dynamic allocation doesn't matter. If you know a case where we would recover from fatal exception, I'm fine with adding in another catch there to decrement in the fatal case as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79928/testReport)** for PR 18651 at commit [`7703494`](https://github.com/apache/spark/commit/77034944610c5973325bb3fd71ac9f153f59d32b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79928/testReport)** for PR 18651 at commit [`7703494`](https://github.com/apache/spark/commit/77034944610c5973325bb3fd71ac9f153f59d32b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128807888
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    If you move this to `updateInternalState`, then if there's a problem in `ExecutorRunnable.run` this counter will not be decremented.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79716/testReport)** for PR 18651 at commit [`818c912`](https://github.com/apache/spark/commit/818c9126959e8576861478e18389e6ed8fdbeac4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18651: [SPARK-21383][Core] Fix the YarnAllocator allocat...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18651


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79716/testReport)** for PR 18651 at commit [`818c912`](https://github.com/apache/spark/commit/818c9126959e8576861478e18389e6ed8fdbeac4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79764/testReport)** for PR 18651 at commit [`cc73e5b`](https://github.com/apache/spark/commit/cc73e5bd774d9de2d27781afbf77e90413438711).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18651: [SPARK-21383][Core] Fix the YarnAllocator allocates more...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18651
  
    **[Test build #79789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79789/testReport)** for PR 18651 at commit [`582f424`](https://github.com/apache/spark/commit/582f4241e2592e63f58e9333b1e98cde6bc4d6f8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org