You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sitalkedia <gi...@git.apache.org> on 2017/08/25 01:08:02 UTC

[GitHub] spark pull request #19048: [SPARK-21834] Incorrect executor request in case ...

GitHub user sitalkedia opened a pull request:

    https://github.com/apache/spark/pull/19048

    [SPARK-21834] Incorrect executor request in case of dynamic allocation

    ## What changes were proposed in this pull request?
    
    killExecutor api currently does not allow killing an executor without updating the total number of executors needed. In case of dynamic allocation is turned on and the allocator tries to kill an executor, the scheduler reduces the total number of executors needed ( see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L635) which is incorrect because the allocator already takes care of setting the required number of executors itself.
    
    ## How was this patch tested?
    
    Ran a job on the cluster and made sure the executor request is correct


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sitalkedia/spark skedia/oss_fix_executor_allocation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19048.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19048
    
----
commit 120f3833c86776bd90c4e1db66846d7a9035f29a
Author: Sital Kedia <sk...@fb.com>
Date:   2017-08-25T00:24:53Z

    [SPARK-21834] Incorrect executor request in case of dynamic allocation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81195/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81116/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    >> Why? Because of the idle timeout? If that's your point, then the change I referenced above should avoid that.
    
    Yes because of idle timeout. Note that the `numExecutorsTarget` is 5 and EAM has 10 executors available, so it is fine to kill 2 of them. That is not the issue. 
    
    >> How? The scheduler (a.k.a. CGSB) does not kill executors on its own. It has to be told to do so in some way
    
    Because the EAM asks it to kill 2 of them. But please note that while killing 2 executors the EAM did not reduce its target to 3, it is still 5. But since scheduler keeps its internal target, it reduces its target from 5 to 3. And the EAM and scheduler gets out of sync.
    
    >> If you can actually provide logs that show what you're trying to say that would probably be easier.
    
    Actually, I added a lot of debug log to find this issue so probably the log is not going to be of any help to you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81116/testReport)** for PR 19048 at commit [`e000db3`](https://github.com/apache/spark/commit/e000db31f7ce620a004aa74841861abb4a0631af).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19048: [SPARK-21834] Incorrect executor request in case ...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia closed the pull request at:

    https://github.com/apache/spark/pull/19048


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81110/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    I'm not sure I understand why is this a problem. What is the undesired behavior that happens because of this? That's not explained either in the PR nor in the bug.
    
    The way I understand the code, yes, there are potentially redundant calls to update the target number of executors; but then, `ExecutorAllocationManager` makes that call every time it wakes up (100ms), so it doesn't seem like that causes any problems.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    SparkR tests have been super flaky lately.
    
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Not sure why the PRB is not picking up my requests. @sitalkedia can you close and re-open the PR to see if that does it?
    
    (The change looks fine, it just would be nice to get a clean test run.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    > May be it would be cleaner if we provide a new api like this - killExecutorsAndNotUpdateTotal?
    
    I think the main thing that bothers me is that adding anything to the API is making all this code even more complicated and confusing than it already is.
    
    Having two (3 if you count the YARN allocator) places track all this state is bound to lead to these issues. Optimally only the EAM would keep track of these things; the CGSB shouldn't really be dealing with executor allocation and de-allocation, just with managing the existing executors that connect to it. But fixing things like that is probably a much larger change (the words "hornets' nest" come to mind).
    
    Barring that, I think that we should make the change that leads to the correct behavior without making the internal interface more complicated than it needs to be. If changing the semantics of `ExecutorAllocationClient` lead to the code being easier to follow, then that's what we should do. After all, there is a single implementation of it (the CGSB). (And, digressing back to my paragraph above, maybe `ExecutorAllocationClient` shouldn't even exist and we should only have the EAM. But back to this PR.)
    
    Or maybe you can reach the same thing through other means. For example, maybe if you get rid of the `replace` argument and make `killExecutors` not update the CGSB target count, and then force the caller to call `requestTotalExecutors` before killing executors, you could achieve the same thing. Maybe there are corner cases doing that, but maybe it works?
    
    If none of those work, then we can talk about adding new things.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Created https://github.com/apache/spark/pull/19081.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81110/testReport)** for PR 19048 at commit [`e30bbac`](https://github.com/apache/spark/commit/e30bbac45d346e96f1004578234324cb3426b3b3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    jenkins retest this please.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81132 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81132/testReport)** for PR 19048 at commit [`e000db3`](https://github.com/apache/spark/commit/e000db31f7ce620a004aa74841861abb4a0631af).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Jenkins retest this please. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81195 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81195/testReport)** for PR 19048 at commit [`6cc5fab`](https://github.com/apache/spark/commit/6cc5fab56110034e5221c40c5dfcc38e72f8c05a).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Well, that's adding an API that does the same thing as existing APIs but a little bit differently. In my view that adds to the problem, instead of fixing it. Now every caller into the `ExecutorAllocationClient` has its own version of the API that does things the exact way they want.
    
    For example: the `replace` argument of the existing API is pretty much the functionality you want. The EAM can call `requestTotalExecutors` right before it calls `killExecutors(replace = true)` and achieve what it wants. It's just awkwardly named in this context.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81194/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    (Or it can call `killExecutors()` like it does today and then call `requestTotalExecutors` right after, same result without the awkwardness of the parameter name, but that adds a trip to the cluster manager.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    That's not really true.
    The EAM uses the `requestTotalExecutors` api to set the target for the scheduler.
    
    - 10 executors are running, each executor can run 4 tasks at max.
    - 20 tasks are running so EAM sets the internal target to 5 and also asks the CGSB to set its target to 5.
    - However, only 2 executors are idle, so the EAM will try to kill 2 of them.
    - CGSB now sets its target to 3 because the EAM asks to kill 2 executors.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81194/testReport)** for PR 19048 at commit [`22c3596`](https://github.com/apache/spark/commit/22c3596091072b23c9e8368088e3385e9c3ea846).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81116/testReport)** for PR 19048 at commit [`e000db3`](https://github.com/apache/spark/commit/e000db31f7ce620a004aa74841861abb4a0631af).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    >> this code in the EAM: Should be changed to account for the current number of executors, so that the EAM doesn't tell the CGSB that it wants less executors than currently exist. 
    
    Actually if you look at the api, `ExecutorAllocationManager` api, this is how `requestTotalExecutors` behaves - `The total number of executors we'd like to have. The cluster manager shouldn't kill any running executor to reach this number, but, if all existing executors were to die, this is the number of executors
    we'd want to be allocated.` So the EAM is right in setting the number of total executors it needs to 5 because lets say all executors die, it is up to the cluster manager to spawn 5 executors (not 10). 
    
    >>Your solution (the new updateTotalExecutor) looks too much like the existing replace parameter, and it's a little confusing if you try to think about how to use both. What does it mean to ask for updateTotalExecutor = false and replace = false? The latter means you want the executor count to go down, while the former means you don't.
    
    I agree with you on this. May be it would be cleaner if we provide a new api like this - `killExecutorsAndNotUpdateTotal`? 
    
    
     


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81193 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81193/testReport)** for PR 19048 at commit [`297059f`](https://github.com/apache/spark/commit/297059fd0767c10621722cb200e3394977e4a731).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    @jiangxb1987 - I agree with you. I do not have the context or history to comment on that. Unfortunately, the api has been designed that way and book keeping of target number of executors is done by the CGSB. Changing the existing scheduler behavior will require a bigger change and possibly breaking some existing api behavior which I think is out of the scope of this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    I think I see what you're saying. But I still think it's the fault of the EAM.
    
    > But please note that while killing 2 executors the EAM did not reduce its target to 3, it is still 5.
    
    And I think the problem here is that the EAM should not be telling the CGSB that the target is 5 when 5 is actually the "minimum" the EAM wants, but there may be more executors running that haven't timed out yet. Basically, this code in the EAM:
    
    ```
          if (numExecutorsTarget < oldNumExecutorsTarget) {
            client.requestTotalExecutors(numExecutorsTarget, localityAwareTasks, hostToLocalTaskCount)
            logDebug(s"Lowering target number of executors to $numExecutorsTarget (previously " +
              s"$oldNumExecutorsTarget) because not all requested executors are actually needed")
          }
    ```
    
    Should be changed to account for the current number of executors, so that the EAM doesn't tell the CGSB that it wants less executors than currently exist. Because even if the EAM may not currently "need" the extra executors, it hasn't timed them out, so they need to be counted towards the "number of executors that I expect to be active".
    
    Your solution (the new `updateTotalExecutor`) looks too much like the existing `replace` parameter, and it's a little confusing if you try to think about how to use both. What does it mean to ask for `updateTotalExecutor = false` and `replace  = false`? The latter means you want the executor count to go down, while the former means you don't.
    
    Now if the EAM tells the CGSB the correct amount of executors it expects to be active (which means something like `max(executors I need, active executors)`) then the problem should go away, no?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    cc - @markhamstra , @sameeragarwal, @rxin, @vanzin,  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    On a high level I agree that keeping the states in 3 places is creating a mess but changing that would require a big refactoring which is probably outside of the scope of this change.
    
    >> Or maybe you can reach the same thing through other means. For example, maybe if you get rid of the replace argument and make killExecutors not update the CGSB target count, and then force the caller to call requestTotalExecutors before killing executors, you could achieve the same thing. Maybe there are corner cases doing that, but maybe it works?
    
    That might work. But there is a race-condition in doing that.  In order to do that, we need to have a `getTotalExecutors` api to CGSB and the caller needs to use the api as follows
    
    `val executors = getTotalExecutors();
    requestTotalExecutors(executors - 1)
    killExecutors(x)`
    
    It is possible that the total executors value changed by another thread between `getTotalExecutors` and `requestTotalExecutors` call. Setting aside the above potential race condition, I do not personally like that the caller need to call three apis which could have been done in just one.
    
    Instead of doing that, how about we add a new api to `ExecutorAllocationClient` which looks like this  - killExecutors(List<Executors>, Int requestedTotal) - This will be used only by the EAM to kill a specific executor and also set the total exected count in CGSB. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81193/testReport)** for PR 19048 at commit [`297059f`](https://github.com/apache/spark/commit/297059fd0767c10621722cb200e3394977e4a731).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    > This is when things get out of sync because now the scheduler will set the number of total executors needed from 4 to 1.
    
    Have you actually observed that behavior?
    
    The way I understand the code, both `ExecutorAllocationManager` and `CoarseGrainedSchedulerBackend` keep track of the target number separately, and deal in absolutes. So you have this order of events:
    
    - 10 executors are running
    - EAM detects 5 as idle and requests that 5 be killed, update its internal target to 5
    - CGSB tries to kill 5 executors and update its internal target to 5 too
    - EAM in its periodic task tells CGSB that it expects 5 executors to exist
    - Everybody is happy
    
    Is that not what you're seeing?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    > but the EAM asks the scheduler to kill 2 of them.
    
    Why? Because of the idle timeout? If that's your point, then the change I referenced above should avoid that.
    
    > The scheduler decieds to kill 2 of them and sets the new target as 3. While the EAM has set the target as 5
    
    How? The scheduler (a.k.a. CGSB) does not kill executors on its own. It has to be told to do so in some way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81132/testReport)** for PR 19048 at commit [`e000db3`](https://github.com/apache/spark/commit/e000db31f7ce620a004aa74841861abb4a0631af).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81193/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    >>Or it can call killExecutors() like it does today and then call requestTotalExecutors right after, same result without the awkwardness of the parameter name, but that adds a trip to the cluster manager.
    
    Okay, that seems like a reasonable hack. Only downside as you mentioned is extra trip to the CM and adding more confusion to the usage of `ExecutorAllocationClient` :/. But like you mentioned, cleaning up the api will be a much larger change which is outside of the scope of this PR. I will make the change as you suggested soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    If you can actually provide logs that show what you're trying to say that would probably be easier.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81194/testReport)** for PR 19048 at commit [`22c3596`](https://github.com/apache/spark/commit/22c3596091072b23c9e8368088e3385e9c3ea846).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81110/testReport)** for PR 19048 at commit [`e30bbac`](https://github.com/apache/spark/commit/e30bbac45d346e96f1004578234324cb3426b3b3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Not sure why the test failed? May be the build is unstable? cc - @vanzin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    **[Test build #81195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81195/testReport)** for PR 19048 at commit [`6cc5fab`](https://github.com/apache/spark/commit/6cc5fab56110034e5221c40c5dfcc38e72f8c05a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81132/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    I think I'm starting to understand what you're getting at, but I still don't see why this has anything to do with the CGSB. What I understand from your comment is that the EAM may reduce its target and at the same time try to kill idle executors, basically doubling down and killing too many executors in the process.
    
    Isn't this what this piece of code is trying to prevent?
    
    ```
          } else if (newExecutorTotal - 1 < numExecutorsTarget) {
            logDebug(s"Not removing idle executor $executorIdToBeRemoved because there are only " +
              s"$newExecutorTotal executor(s) left (number of executor target $numExecutorsTarget)")
    ```
    
    If killing an idle executor would bring the number below the current target, and it won't be killed. That's a pretty recent fix so maybe you haven't seen it (SPARK-21656).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    retest this please



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Looking at the scheduler and the dynamic executor allocator code, this is what my understanding, correct me if I am wrong. 
    
    Let's say the dynamic executor allocator is ramping down the number of executors. There are 10 executors running and it needs only 4. Then  `ExecutorAllocationManager` will make a call to set the total executors to 4 and also try to kill some idle executors (say 3). This is when things get out of sync because now the scheduler will set the number of total executors needed from 4 to 1. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Yea I agree the change made in this PR looks good for your issue, I'm just suggesting maybe we could refactor the code further more, maybe as a follow up work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    One thing I don't understand clearly is why we should update the `requestedTotalExecutors` inside the function `killExecutors`, asking to kill some executor(s) don't implies we are requesting for less executor(s) IMO. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19048: [SPARK-21834] Incorrect executor request in case of dyna...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    To be clear there is no issue on EAM side. Consider the following situation -
    
    - 10 executors are running, each executor can run 4 tasks at max.
    - 20 tasks are running so EAM sets the internal target to 5 and also asks the CGSB to set its `requestedTotalExecutors` to 5. However, it can not kill any executor yet because all of them have atleast one running tasks.
    - 2 tasks on 2 executors succeeds, now the target is still 5 (18/4), but the EAM asks the scheduler to kill 2 of them. 
    - The scheduler decieds to kill 2 of them and sets the new target as 3. While the EAM has set the target as 5.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org