You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sitalkedia <gi...@git.apache.org> on 2017/10/19 05:29:14 UTC

[GitHub] spark pull request #19534: [SPARK-22312][CORE] Fix bug in Executor allocatio...

GitHub user sitalkedia opened a pull request:

    https://github.com/apache/spark/pull/19534

    [SPARK-22312][CORE] Fix bug in Executor allocation manager in running…

    ## What changes were proposed in this pull request?
    
    We often see the issue of Spark jobs stuck because the Executor Allocation Manager does not ask for any executor even if there are pending tasks in case dynamic allocation is turned on. Looking at the logic in Executor Allocation Manager, which calculates the running tasks, it can happen that the calculation will be wrong and the number of running tasks can become negative.
    
    
    ## How was this patch tested?
    
    Added unit test


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sitalkedia/spark skedia/fix_stuck_job

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19534
    
----
commit 4f0cffa42c828d3f49e983dae8b2188b78036fcc
Author: Sital Kedia <sk...@fb.com>
Date:   2017-10-19T05:24:38Z

    [SPARK-22312][CORE] Fix bug in Executor allocation manager in running tasks calculation

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    @sitalkedia That makes sense. The proposed solutions are quite similar, we can choose to continue with either PR, WDYT @jerryshao @sitalkedia ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    @sitalkedia would you please fix the PR title, seems it is broken now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    **[Test build #82914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82914/testReport)** for PR 19534 at commit [`f8fcc35`](https://github.com/apache/spark/commit/f8fcc3560e087440c7618b33cc892f3feafd4a3a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    **[Test build #82914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82914/testReport)** for PR 19534 at commit [`f8fcc35`](https://github.com/apache/spark/commit/f8fcc3560e087440c7618b33cc892f3feafd4a3a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    @sitalkedia I'm OK with either.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82903/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    @sitalkedia would you please reopen this PR, I think the second issue I fixed before is not valid anymore, for the first issue the fix is no difference compared to here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19534: [SPARK-22312][CORE] Fix bug in Executor allocatio...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia closed the pull request at:

    https://github.com/apache/spark/pull/19534


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    Jenkins retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    **[Test build #82903 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82903/testReport)** for PR 19534 at commit [`f8fcc35`](https://github.com/apache/spark/commit/f8fcc3560e087440c7618b33cc892f3feafd4a3a).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    I think other PR is fixing one more issue on top of runningTasks being negative, so we can proceed with the other one.  What do you think @jerryshao ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82914/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    Do you mean we may first set `numRunningTasks` to 0 and then run into `onTaskEnd` and have `numRunningTasks -= 1`? Could we simply check `stageIdToSpeculativeTaskIndices`/`stageIdToTaskIndices` to see whether the stageId is still valid to avoid the issue?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    **[Test build #82903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82903/testReport)** for PR 19534 at commit [`f8fcc35`](https://github.com/apache/spark/commit/f8fcc3560e087440c7618b33cc892f3feafd4a3a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    Let's fix up Saisai's PR then.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    @sitalkedia I have a very old similar PR #11205 , maybe you can refer to it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    cc - @vanzin 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/19534
  
    @jiangxb1987 - yes that is the issue and you are right, we can avoid it by checking if the stageId is valid when we get a task end event. But I like this approach better because we can clean up the hack which sets the `numRunningTasks` to 0 when stage ends and also it is inline with the way we are doing bookkeeping in ExecutorAllocationListener i.e, keeping entry per stage. Let me know what you think.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org