Posted to reviews@spark.apache.org by jerryshao <gi...@git.apache.org> on 2015/12/29 10:46:32 UTC

[GitHub] spark pull request: [SPARK-12552]Correctly count the driver resour...

GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/10506

    [SPARK-12552]Correctly count the driver resource when recover from failure for Master

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-12552

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10506.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10506
    
----
commit 710f5de578449c9f8156540bdc26b4b12d2567d5
Author: jerryshao <ss...@hortonworks.com>
Date:   2015-12-29T09:42:28Z

    Correctly count the driver resource when recover from failure for Master

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167765195
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167955370
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48458/
    Test FAILed.


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    @jiangxb1987, to reproduce this issue, you can:
    
    1. Enable standalone HA, for example with "spark.deploy.recoveryMode FILESYSTEM" and "spark.deploy.recoveryDirectory recovery".
    2. Start a local standalone cluster (master and worker on the same machine).
    3. Submit a Spark application in standalone cluster mode, for example "./bin/spark-submit --master spark://NT00022.local:6066 --deploy-mode cluster --class org.apache.spark.examples.SparkPi examples/target/scala-2.11/jars/spark-examples_2.11-2.3.0-SNAPSHOT.jar 10000"
    4. While the application is running, stop the master process and restart it.
    5. Wait for the application to finish; you will see unexpected core/memory numbers in the master UI.
    
    ![screen shot 2017-06-09 at 1 53 58 pm](https://user-images.githubusercontent.com/850797/26963102-7a40cbbc-4d1d-11e7-9fd8-3be0fd9d1d9e.png)
    
    This is mainly because, when the Master recovers a Driver, it doesn't count the resources (cores/memory) used by that Driver. Those resources therefore look free and get used to allocate a new executor. When the application finishes, the resources over-allocated to the new executor make the worker's resource counts go negative.
    
    Besides, in the current Master, an application's state is changed to "RUNNING" only when a new executor is allocated. A recovered application never gets the chance to move from "WAITING" to "RUNNING" because no new executor is allocated.
    
    Could you please give it a try? This issue does exist and has been reported in JIRA and on the mailing list several times.
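
The accounting problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `WorkerModel` and `RecoveryAccounting` are hypothetical names, not Spark's `WorkerInfo` or `Master`, and the 16-core worker with a 1-core driver is borrowed from the numbers used in this PR's unit test. It shows how a counter that never added the driver's cores on recovery drops below zero once those cores are subtracted at completion.

```scala
// Toy model of a worker's core bookkeeping. All names are hypothetical;
// this is not Spark's WorkerInfo.
final class WorkerModel(val cores: Int) {
  private var used = 0
  def coresUsed: Int = used
  def coresFree: Int = cores - used
  def acquire(n: Int): Unit = used += n
  def release(n: Int): Unit = used -= n
}

object RecoveryAccounting {
  /** Cores still marked as used after the application and its driver finish. */
  def coresUsedAfterRun(countDriverOnRecovery: Boolean): Int = {
    val worker = new WorkerModel(cores = 16)
    val driverCores = 1
    val executorCores = 15

    // Recovery: executors reported by the worker are always counted.
    worker.acquire(executorCores)
    // The driver is counted only when recovery does the bookkeeping.
    if (countDriverOnRecovery) worker.acquire(driverCores)

    // Whatever still looks free may be handed to a new executor.
    val extraExecutorCores = worker.coresFree
    worker.acquire(extraExecutorCores)

    // Application finishes: all executors and the driver are released.
    worker.release(executorCores + extraExecutorCores)
    worker.release(driverCores)
    worker.coresUsed
  }
}
```

In this model, `coresUsedAfterRun(countDriverOnRecovery = false)` ends at -1 used cores, while `coresUsedAfterRun(countDriverOnRecovery = true)` ends at 0.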



[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #73742 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73742/testReport)** for PR 10506 at commit [`f231aed`](https://github.com/apache/spark/commit/f231aed3865e2e9ee3becd73fda6b1086d6968db).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #77985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77985/testReport)** for PR 10506 at commit [`0bb82bb`](https://github.com/apache/spark/commit/0bb82bb8ee2993a3c79d0fd109c023ba14ed2a9f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Is anyone still working on this and if not, can you close the PR?


[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167820998
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #77990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77990/testReport)** for PR 10506 at commit [`0bb82bb`](https://github.com/apache/spark/commit/0bb82bb8ee2993a3c79d0fd109c023ba14ed2a9f).


[GitHub] spark pull request #10506: [SPARK-12552][Core]Correctly count the driver res...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10506#discussion_r121015644
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
    @@ -367,7 +367,7 @@ private[deploy] class Master(
                 drivers.find(_.id == driverId).foreach { driver =>
                   driver.worker = Some(worker)
                   driver.state = DriverState.RUNNING
    -              worker.drivers(driverId) = driver
    +              worker.addDriver(driver)
    --- End diff --
    
    One major question (though I haven't tested this): won't we call schedule() after we complete recovery? I think the resource change would be handled correctly there.


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #77627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77627/testReport)** for PR 10506 at commit [`e2d6dbf`](https://github.com/apache/spark/commit/e2d6dbfb8cb31c20c051e966707c8db3d38d211a).


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77627/
    Test PASSed.


[GitHub] spark pull request #10506: [SPARK-12552][Core]Correctly count the driver res...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10506#discussion_r121041958
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
    @@ -367,7 +367,7 @@ private[deploy] class Master(
                 drivers.find(_.id == driverId).foreach { driver =>
                   driver.worker = Some(worker)
                   driver.state = DriverState.RUNNING
    -              worker.drivers(driverId) = driver
    +              worker.addDriver(driver)
    --- End diff --
    
    From my understanding, `schedule()` only handles waiting drivers, whereas this change is about correctly accounting for the existing (already running) drivers, so I don't think `schedule()` will solve the issue here. Let me test on the latest master and get back to you with the result.
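
A rough sketch of the point about `schedule()`, assuming a much simplified scheduler: `ScheduleSketch`, `Driver`, and `Worker` below are hypothetical stand-ins, not the real `Master` code. The scheduling pass only looks at the waiting list, so a driver that was recovered directly into the RUNNING state is never visited and its cores are never added to the worker's usage.

```scala
// Hypothetical, simplified stand-in for a scheduling pass; not Spark's Master.
object ScheduleSketch {
  final case class Driver(id: String, cores: Int)

  final class Worker(val cores: Int) {
    var coresUsed: Int = 0
    def coresFree: Int = cores - coresUsed
  }

  /** Launches every waiting driver that fits; returns the drivers still waiting. */
  def schedule(worker: Worker, waiting: List[Driver]): List[Driver] =
    waiting.filter { d =>
      if (d.cores <= worker.coresFree) {
        worker.coresUsed += d.cores // bookkeeping happens only at launch time
        false                       // launched, so no longer waiting
      } else {
        true                        // does not fit, keep waiting
      }
    }
}
```

Under these assumptions, calling `schedule` with an empty waiting list leaves `coresUsed` untouched, which is the situation for a recovered driver unless the recovery path itself records its resources.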


[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167957380
  
    **[Test build #48463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48463/consoleFull)** for PR 10506 at commit [`3eb0b71`](https://github.com/apache/spark/commit/3eb0b713934a7881b6cc135403160e631594c1fa).


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #77976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77976/testReport)** for PR 10506 at commit [`c62889a`](https://github.com/apache/spark/commit/c62889ac7229fb92d70cb2965185ca5fb65331b7).


[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167765169
  
    **[Test build #48410 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48410/consoleFull)** for PR 10506 at commit [`710f5de`](https://github.com/apache/spark/commit/710f5de578449c9f8156540bdc26b4b12d2567d5).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by GavinGavinNo1 <gi...@git.apache.org>.
Github user GavinGavinNo1 commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-203702270
  
    Thank you for your comment on PR #12054. Regarding changing the app state from WAITING to RUNNING in the function completeRecovery: suppose some app is WAITING before the master switch, and then all apps and all workers learn that the master has changed. If the last signal (WorkerSchedulerStateResponse or MasterChangeAcknowledged) comes from a worker, completeRecovery is invoked, which means the app I mentioned above ends up in the RUNNING state. If the cluster doesn't have enough resources for all apps, that app may be in the wrong state for a while.


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77985/
    Test PASSed.


[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167972802
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48463/
    Test PASSed.


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #73740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73740/testReport)** for PR 10506 at commit [`88b58eb`](https://github.com/apache/spark/commit/88b58eb23c25fb69cdc288fe64dfa331f91ebd33).


[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167955973
  
    Jenkins, retest this please.


[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-190575732
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52223/
    Test FAILed.


[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167972673
  
    **[Test build #48463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48463/consoleFull)** for PR 10506 at commit [`3eb0b71`](https://github.com/apache/spark/commit/3eb0b713934a7881b6cc135403160e631594c1fa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    I think this problem shouldn't happen in the general case. Could you give a more specific description of your integrated cluster?


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77976/
    Test FAILed.


[GitHub] spark pull request #10506: [SPARK-12552][Core]Correctly count the driver res...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10506#discussion_r121596180
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
    @@ -134,6 +138,71 @@ class MasterSuite extends SparkFunSuite
         CustomRecoveryModeFactory.instantiationAttempts should be > instantiationAttempts
       }
     
    +  test("master correctly recover the application") {
    +    val conf = new SparkConf(loadDefaults = false)
    +    conf.set("spark.deploy.recoveryMode", "CUSTOM")
    +    conf.set("spark.deploy.recoveryMode.factory",
    +      classOf[FakeRecoveryModeFactory].getCanonicalName)
    +    conf.set("spark.master.rest.enabled", "false")
    +
    +    val fakeAppInfo = makeAppInfo(1024)
    +    val fakeWorkerInfo = makeWorkerInfo(8192, 16)
    +    val fakeDriverInfo = new DriverInfo(
    +      startTime = 0,
    +      id = "test_driver",
    +      desc = new DriverDescription(
    +        jarUrl = "",
    +        mem = 1024,
    +        cores = 1,
    +        supervise = false,
    +        command = new Command("", Nil, Map.empty, Nil, Nil, Nil)),
    +      submitDate = new Date())
    +
    +    // Build the fake recovery data
    +    FakeRecoveryModeFactory.persistentData.put(s"app_${fakeAppInfo.id}", fakeAppInfo)
    +    FakeRecoveryModeFactory.persistentData.put(s"driver_${fakeDriverInfo.id}", fakeDriverInfo)
    +    FakeRecoveryModeFactory.persistentData.put(s"worker_${fakeWorkerInfo.id}", fakeWorkerInfo)
    +
    +    var master: Master = null
    +    try {
    +      master = makeMaster(conf)
    +      master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
    +      // Wait until Master recover from checkpoint data.
    +      eventually(timeout(5 seconds), interval(100 milliseconds)) {
    +        master.idToApp.size should be(1)
    +      }
    +
    +      master.idToApp.keySet should be(Set(fakeAppInfo.id))
    +      getDrivers(master) should be(Set(fakeDriverInfo))
    +      master.workers should be(Set(fakeWorkerInfo))
    +
    +      // Notify Master about the executor and driver info to make it correctly recovered.
    +      val fakeExecutors = List(
    +        new ExecutorDescription(fakeAppInfo.id, 0, 8, ExecutorState.RUNNING),
    +        new ExecutorDescription(fakeAppInfo.id, 0, 7, ExecutorState.RUNNING))
    +      master.self.send(MasterChangeAcknowledged(fakeAppInfo.id))
    +      master.self.send(
    +        WorkerSchedulerStateResponse(fakeWorkerInfo.id, fakeExecutors, Seq(fakeDriverInfo.id)))
    +
    +      eventually(timeout(5 seconds), interval(100 microseconds)) {
    +        getState(master) should be(RecoveryState.ALIVE)
    +      }
    +
    +      // If driver's resource is also counted, free cores should 0
    +      fakeWorkerInfo.coresFree should be(0)
    +      fakeWorkerInfo.coresUsed should be(16)
    +      // State of application should be RUNNING
    +      fakeAppInfo.state should be(ApplicationState.RUNNING)
    --- End diff --
    
    Done, thanks for the review.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #77976 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77976/testReport)** for PR 10506 at commit [`c62889a`](https://github.com/apache/spark/commit/c62889ac7229fb92d70cb2965185ca5fb65331b7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Jenkins, retest this please.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-190575725
  
    **[Test build #52223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52223/consoleFull)** for PR 10506 at commit [`a117dcd`](https://github.com/apache/spark/commit/a117dcdcfc4ebffb5aa338ead75fbc03515a2db5).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #73742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73742/testReport)** for PR 10506 at commit [`f231aed`](https://github.com/apache/spark/commit/f231aed3865e2e9ee3becd73fda6b1086d6968db).




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #10506: [SPARK-12552][Core]Correctly count the driver res...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10506




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167765197
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48410/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-190629099
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    BTW, @jerryshao, it would be great if we could add a test framework to verify the states and statistics under Driver/Executor Lost/Join/Relaunch conditions. Is there any hope that you could invest some time in that?




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-190575730
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #77985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77985/testReport)** for PR 10506 at commit [`0bb82bb`](https://github.com/apache/spark/commit/0bb82bb8ee2993a3c79d0fd109c023ba14ed2a9f).




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    I think the fix is right and the test case also looks good. We'd better merge this after adding some new test cases covering the application running state issue. @cloud-fan Could you please have a look too?




[GitHub] spark pull request #10506: [SPARK-12552][Core]Correctly count the driver res...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10506#discussion_r121576458
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
    @@ -134,6 +138,71 @@ class MasterSuite extends SparkFunSuite
         CustomRecoveryModeFactory.instantiationAttempts should be > instantiationAttempts
       }
     
    +  test("master correctly recover the application") {
    +    val conf = new SparkConf(loadDefaults = false)
    +    conf.set("spark.deploy.recoveryMode", "CUSTOM")
    +    conf.set("spark.deploy.recoveryMode.factory",
    +      classOf[FakeRecoveryModeFactory].getCanonicalName)
    +    conf.set("spark.master.rest.enabled", "false")
    +
    +    val fakeAppInfo = makeAppInfo(1024)
    +    val fakeWorkerInfo = makeWorkerInfo(8192, 16)
    +    val fakeDriverInfo = new DriverInfo(
    +      startTime = 0,
    +      id = "test_driver",
    +      desc = new DriverDescription(
    +        jarUrl = "",
    +        mem = 1024,
    +        cores = 1,
    +        supervise = false,
    +        command = new Command("", Nil, Map.empty, Nil, Nil, Nil)),
    +      submitDate = new Date())
    +
    +    // Build the fake recovery data
    +    FakeRecoveryModeFactory.persistentData.put(s"app_${fakeAppInfo.id}", fakeAppInfo)
    +    FakeRecoveryModeFactory.persistentData.put(s"driver_${fakeDriverInfo.id}", fakeDriverInfo)
    +    FakeRecoveryModeFactory.persistentData.put(s"worker_${fakeWorkerInfo.id}", fakeWorkerInfo)
    +
    +    var master: Master = null
    +    try {
    +      master = makeMaster(conf)
    +      master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
    +      // Wait until Master recover from checkpoint data.
    +      eventually(timeout(5 seconds), interval(100 milliseconds)) {
    +        master.idToApp.size should be(1)
    +      }
    +
    +      master.idToApp.keySet should be(Set(fakeAppInfo.id))
    +      getDrivers(master) should be(Set(fakeDriverInfo))
    +      master.workers should be(Set(fakeWorkerInfo))
    +
    +      // Notify Master about the executor and driver info to make it correctly recovered.
    +      val fakeExecutors = List(
    +        new ExecutorDescription(fakeAppInfo.id, 0, 8, ExecutorState.RUNNING),
    +        new ExecutorDescription(fakeAppInfo.id, 0, 7, ExecutorState.RUNNING))
    +      master.self.send(MasterChangeAcknowledged(fakeAppInfo.id))
    +      master.self.send(
    +        WorkerSchedulerStateResponse(fakeWorkerInfo.id, fakeExecutors, Seq(fakeDriverInfo.id)))
    +
    +      eventually(timeout(5 seconds), interval(100 microseconds)) {
    +        getState(master) should be(RecoveryState.ALIVE)
    +      }
    +
    +      // If driver's resource is also counted, free cores should 0
    +      fakeWorkerInfo.coresFree should be(0)
    +      fakeWorkerInfo.coresUsed should be(16)
    +      // State of application should be RUNNING
    +      fakeAppInfo.state should be(ApplicationState.RUNNING)
    --- End diff --
    
    Shall we also test these before the recovery, to show that recovering actually changes something?
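
The before/after check suggested above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyWorker` and `RecoveryAccountingSketch` are hypothetical stand-ins, not Spark's `WorkerInfo` or `Master`. The numbers mirror the quoted test: a 16-core worker, two executors with 8 and 7 cores, and a driver with 1 core.

```scala
// Hypothetical toy model; these are NOT Spark's WorkerInfo/Master classes.
final class ToyWorker(val cores: Int) {
  private var used = 0
  def coresUsed: Int = used
  def coresFree: Int = cores - used

  /** Marks `n` cores as in use, refusing to over-commit the worker. */
  def claim(n: Int): Unit = {
    require(n >= 0 && n <= coresFree, s"cannot claim $n cores, only $coresFree free")
    used += n
  }
}

object RecoveryAccountingSketch {
  /** Replays what a worker reports after a failover: executor cores plus driver cores. */
  def recover(worker: ToyWorker, executorCores: Seq[Int], driverCores: Seq[Int]): Unit = {
    executorCores.foreach(worker.claim)
    driverCores.foreach(worker.claim)
  }

  def main(args: Array[String]): Unit = {
    val worker = new ToyWorker(16)

    // Before recovery nothing has been accounted for yet.
    assert(worker.coresUsed == 0 && worker.coresFree == 16)

    recover(worker, executorCores = Seq(8, 7), driverCores = Seq(1))

    // After recovery: 8 + 7 executor cores plus 1 driver core fill the worker.
    assert(worker.coresUsed == 16 && worker.coresFree == 0)
    println(s"used=${worker.coresUsed} free=${worker.coresFree}")
  }
}
```

Asserting on both sides of `recover` makes the test fail if the counters were already at their final values beforehand, which appears to be the point of the review comment.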




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Hi @kayousterhout, I believe the issue still exists, but unfortunately no one is reviewing this patch. I could rebase the code if someone is able to review it.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    LGTM




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Ping @zsxwing, I hope you're the right person to review this very old PR. The issue still exists in the latest master; could you please take a look? Thanks a lot.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-190575051
  
    **[Test build #52223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52223/consoleFull)** for PR 10506 at commit [`a117dcd`](https://github.com/apache/spark/commit/a117dcdcfc4ebffb5aa338ead75fbc03515a2db5).




[GitHub] spark pull request #10506: [SPARK-12552][Core]Correctly count the driver res...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10506#discussion_r121609933
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
    @@ -134,6 +138,79 @@ class MasterSuite extends SparkFunSuite
         CustomRecoveryModeFactory.instantiationAttempts should be > instantiationAttempts
       }
     
    +  test("master correctly recover the application") {
    +    val conf = new SparkConf(loadDefaults = false)
    +    conf.set("spark.deploy.recoveryMode", "CUSTOM")
    +    conf.set("spark.deploy.recoveryMode.factory",
    +      classOf[FakeRecoveryModeFactory].getCanonicalName)
    +    conf.set("spark.master.rest.enabled", "false")
    +
    +    val fakeAppInfo = makeAppInfo(1024)
    +    val fakeWorkerInfo = makeWorkerInfo(8192, 16)
    +    val fakeDriverInfo = new DriverInfo(
    +      startTime = 0,
    +      id = "test_driver",
    +      desc = new DriverDescription(
    +        jarUrl = "",
    +        mem = 1024,
    +        cores = 1,
    +        supervise = false,
    +        command = new Command("", Nil, Map.empty, Nil, Nil, Nil)),
    +      submitDate = new Date())
    +
    +    // Build the fake recovery data
    +    FakeRecoveryModeFactory.persistentData.put(s"app_${fakeAppInfo.id}", fakeAppInfo)
    +    FakeRecoveryModeFactory.persistentData.put(s"driver_${fakeDriverInfo.id}", fakeDriverInfo)
    +    FakeRecoveryModeFactory.persistentData.put(s"worker_${fakeWorkerInfo.id}", fakeWorkerInfo)
    +
    +    var master: Master = null
    +    try {
    +      master = makeMaster(conf)
    +      master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
    +      // Wait until Master recover from checkpoint data.
    +      eventually(timeout(5 seconds), interval(100 milliseconds)) {
    +        master.idToApp.size should be(1)
    +      }
    +
    +      master.idToApp.keySet should be(Set(fakeAppInfo.id))
    +      getDrivers(master) should be(Set(fakeDriverInfo))
    +      master.workers should be(Set(fakeWorkerInfo))
    +
    +      // Notify Master about the executor and driver info to make it correctly recovered.
    +      val fakeExecutors = List(
    +        new ExecutorDescription(fakeAppInfo.id, 0, 8, ExecutorState.RUNNING),
    +        new ExecutorDescription(fakeAppInfo.id, 0, 7, ExecutorState.RUNNING))
    +
    +      fakeAppInfo.state should be(ApplicationState.UNKNOWN)
    +
    +      master.self.send(MasterChangeAcknowledged(fakeAppInfo.id))
    +      master.self.send(
    +        WorkerSchedulerStateResponse(fakeWorkerInfo.id, fakeExecutors, Seq(fakeDriverInfo.id)))
    +
    +      eventually(timeout(1 second), interval(10 milliseconds)) {
    --- End diff --
    
    hmmm will this be flaky?
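
For readers unfamiliar with the polling pattern in question, here is a minimal sketch of an `eventually`-style helper. It is an illustrative re-implementation, not ScalaTest's `Eventually`, and `EventuallySketch` is a hypothetical name. One possible reading of the flakiness concern (an assumption, not confirmed in the thread) is that polling for a state that is only held briefly can miss it, whereas polling for a terminal condition, as below, cannot.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec
import scala.concurrent.duration._

object EventuallySketch {
  /** Retries `check` every `interval` until it stops throwing or `timeout` elapses. */
  def eventually[T](timeout: FiniteDuration, interval: FiniteDuration)(check: => T): T = {
    val deadline = timeout.fromNow

    @tailrec def loop(): T = {
      // Run the check once, capturing a failed assertion instead of propagating it.
      val attempt: Either[AssertionError, T] =
        try Right(check) catch { case e: AssertionError => Left(e) }
      attempt match {
        case Right(value) => value
        case Left(error) if deadline.isOverdue() => throw error
        case Left(_) =>
          Thread.sleep(interval.toMillis)
          loop()
      }
    }
    loop()
  }

  def main(args: Array[String]): Unit = {
    val state = new AtomicReference[String]("UNKNOWN")

    // A background thread stands in for the component that changes state asynchronously.
    val updater = new Thread(() => { Thread.sleep(50); state.set("ALIVE") })
    updater.start()

    // "ALIVE" is terminal here, so the poll succeeds once the update lands.
    eventually(5.seconds, 10.millis) { assert(state.get == "ALIVE") }
    updater.join()
    println(state.get)
  }
}
```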


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73742/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #77627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77627/testReport)** for PR 10506 at commit [`e2d6dbf`](https://github.com/apache/spark/commit/e2d6dbfb8cb31c20c051e966707c8db3d38d211a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167820776
  
    **[Test build #48413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48413/consoleFull)** for PR 10506 at commit [`710f5de`](https://github.com/apache/spark/commit/710f5de578449c9f8156540bdc26b4b12d2567d5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Sure, I will bring this up to date.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #73740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73740/testReport)** for PR 10506 at commit [`88b58eb`](https://github.com/apache/spark/commit/88b58eb23c25fb69cdc288fe64dfa331f91ebd33).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    OK fine to leave this open then (I don't have the time or expertise to review this unfortunately)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73740/
    Test PASSed.




[GitHub] spark pull request #10506: [SPARK-12552][Core]Correctly count the driver res...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10506#discussion_r121609987
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
    @@ -134,6 +138,79 @@ class MasterSuite extends SparkFunSuite
         CustomRecoveryModeFactory.instantiationAttempts should be > instantiationAttempts
       }
     
    +  test("master correctly recover the application") {
    +    val conf = new SparkConf(loadDefaults = false)
    +    conf.set("spark.deploy.recoveryMode", "CUSTOM")
    +    conf.set("spark.deploy.recoveryMode.factory",
    +      classOf[FakeRecoveryModeFactory].getCanonicalName)
    +    conf.set("spark.master.rest.enabled", "false")
    +
    +    val fakeAppInfo = makeAppInfo(1024)
    +    val fakeWorkerInfo = makeWorkerInfo(8192, 16)
    +    val fakeDriverInfo = new DriverInfo(
    +      startTime = 0,
    +      id = "test_driver",
    +      desc = new DriverDescription(
    +        jarUrl = "",
    +        mem = 1024,
    +        cores = 1,
    +        supervise = false,
    +        command = new Command("", Nil, Map.empty, Nil, Nil, Nil)),
    +      submitDate = new Date())
    +
    +    // Build the fake recovery data
    +    FakeRecoveryModeFactory.persistentData.put(s"app_${fakeAppInfo.id}", fakeAppInfo)
    +    FakeRecoveryModeFactory.persistentData.put(s"driver_${fakeDriverInfo.id}", fakeDriverInfo)
    +    FakeRecoveryModeFactory.persistentData.put(s"worker_${fakeWorkerInfo.id}", fakeWorkerInfo)
    +
    +    var master: Master = null
    +    try {
    +      master = makeMaster(conf)
    +      master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
    +      // Wait until the Master recovers from the checkpoint data.
    +      eventually(timeout(5 seconds), interval(100 milliseconds)) {
    +        master.idToApp.size should be(1)
    +      }
    +
    +      master.idToApp.keySet should be(Set(fakeAppInfo.id))
    +      getDrivers(master) should be(Set(fakeDriverInfo))
    +      master.workers should be(Set(fakeWorkerInfo))
    +
    +      // Notify the Master about the executor and driver info so that it recovers correctly.
    +      val fakeExecutors = List(
    +        new ExecutorDescription(fakeAppInfo.id, 0, 8, ExecutorState.RUNNING),
    +        new ExecutorDescription(fakeAppInfo.id, 0, 7, ExecutorState.RUNNING))
    +
    +      fakeAppInfo.state should be(ApplicationState.UNKNOWN)
    +
    +      master.self.send(MasterChangeAcknowledged(fakeAppInfo.id))
    +      master.self.send(
    +        WorkerSchedulerStateResponse(fakeWorkerInfo.id, fakeExecutors, Seq(fakeDriverInfo.id)))
    +
    +      eventually(timeout(1 second), interval(10 milliseconds)) {
    +        // Application state should be WAITING once "MasterChangeAcknowledged" is processed.
    +        fakeAppInfo.state should be(ApplicationState.WAITING)
    +      }
    +
    +      eventually(timeout(5 seconds), interval(100 milliseconds)) {
    +        getState(master) should be(RecoveryState.ALIVE)
    +      }
    +
    +      // If the driver's resources are also counted, free cores should be 0.
    +      fakeWorkerInfo.coresFree should be(0)
    +      fakeWorkerInfo.coresUsed should be(16)
    --- End diff --
    
    We can also check these two values before recovery.
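
The accounting behind those two assertions can be sketched in isolation. This is a minimal sketch, not Spark's code: `WorkerSketch` and `RecoveryAccounting` are hypothetical names, only core counts are modeled, and it assumes a restored worker starts with no cores marked as used. The numbers come from the test above: a 16-core worker, executors holding 8 and 7 cores, and a 1-core driver.

```scala
// Hypothetical, simplified stand-in for Spark's WorkerInfo; only core counts are modeled.
final class WorkerSketch(val cores: Int) {
  private var used = 0

  def coresUsed: Int = used
  def coresFree: Int = cores - used

  // Record cores held by an executor that the worker reports after failover.
  def addExecutor(executorCores: Int): Unit = used += executorCores

  // Record cores held by a driver; skipping this step under-counts usage.
  def addDriver(driverCores: Int): Unit = used += driverCores
}

object RecoveryAccounting {
  // Worker as restored from persisted data, before it reports what it is running
  // (assumption of this sketch: nothing is counted as used yet).
  def beforeRecovery(): WorkerSketch = new WorkerSketch(cores = 16)

  // Worker after it has reported two executors (8 and 7 cores) and one 1-core driver.
  def afterRecovery(): WorkerSketch = {
    val worker = beforeRecovery()
    worker.addExecutor(8)
    worker.addExecutor(7)
    worker.addDriver(1)
    worker
  }
}
```

Comparing `beforeRecovery()` with `afterRecovery()` shows why asserting on both sides is useful: if the driver's core were not counted, the second worker would report one free core and the regression would be visible.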




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77990/
    Test PASSed.




[GitHub] spark pull request #10506: [SPARK-12552][Core]Correctly count the driver res...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10506#discussion_r121610617
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
    @@ -134,6 +138,79 @@ class MasterSuite extends SparkFunSuite
         CustomRecoveryModeFactory.instantiationAttempts should be > instantiationAttempts
       }
     
    +  test("master correctly recover the application") {
    +    val conf = new SparkConf(loadDefaults = false)
    +    conf.set("spark.deploy.recoveryMode", "CUSTOM")
    +    conf.set("spark.deploy.recoveryMode.factory",
    +      classOf[FakeRecoveryModeFactory].getCanonicalName)
    +    conf.set("spark.master.rest.enabled", "false")
    +
    +    val fakeAppInfo = makeAppInfo(1024)
    +    val fakeWorkerInfo = makeWorkerInfo(8192, 16)
    +    val fakeDriverInfo = new DriverInfo(
    +      startTime = 0,
    +      id = "test_driver",
    +      desc = new DriverDescription(
    +        jarUrl = "",
    +        mem = 1024,
    +        cores = 1,
    +        supervise = false,
    +        command = new Command("", Nil, Map.empty, Nil, Nil, Nil)),
    +      submitDate = new Date())
    +
    +    // Build the fake recovery data
    +    FakeRecoveryModeFactory.persistentData.put(s"app_${fakeAppInfo.id}", fakeAppInfo)
    +    FakeRecoveryModeFactory.persistentData.put(s"driver_${fakeDriverInfo.id}", fakeDriverInfo)
    +    FakeRecoveryModeFactory.persistentData.put(s"worker_${fakeWorkerInfo.id}", fakeWorkerInfo)
    +
    +    var master: Master = null
    +    try {
    +      master = makeMaster(conf)
    +      master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
    +      // Wait until the Master recovers from the checkpoint data.
    +      eventually(timeout(5 seconds), interval(100 milliseconds)) {
    +        master.idToApp.size should be(1)
    +      }
    +
    +      master.idToApp.keySet should be(Set(fakeAppInfo.id))
    +      getDrivers(master) should be(Set(fakeDriverInfo))
    +      master.workers should be(Set(fakeWorkerInfo))
    +
    +      // Notify the Master about the executor and driver info so that it recovers correctly.
    +      val fakeExecutors = List(
    +        new ExecutorDescription(fakeAppInfo.id, 0, 8, ExecutorState.RUNNING),
    +        new ExecutorDescription(fakeAppInfo.id, 0, 7, ExecutorState.RUNNING))
    +
    +      fakeAppInfo.state should be(ApplicationState.UNKNOWN)
    +
    +      master.self.send(MasterChangeAcknowledged(fakeAppInfo.id))
    +      master.self.send(
    +        WorkerSchedulerStateResponse(fakeWorkerInfo.id, fakeExecutors, Seq(fakeDriverInfo.id)))
    +
    +      eventually(timeout(1 second), interval(10 milliseconds)) {
    --- End diff --
    
    Because RPC `send` is asynchronous, checking the app state immediately after `send` would return "UNKNOWN" instead of "WAITING".
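
The effect described here can be reproduced without Spark. The following is a minimal sketch under stated assumptions: `Endpoint` is a hypothetical stand-in for an RPC endpoint that handles messages on its own thread, the 50 ms delay is artificial, and `pollUntil` is this sketch's own helper written in the spirit of ScalaTest's `eventually`, not its actual implementation.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicReference

object AsyncSendSketch {
  // Hypothetical stand-in for an RPC endpoint: messages are handled on another thread.
  final class Endpoint {
    private val inbox = Executors.newSingleThreadExecutor()
    val state = new AtomicReference[String]("UNKNOWN")

    // Fire-and-forget: returns before the handler has run.
    def send(newState: String): Unit =
      inbox.execute(new Runnable {
        def run(): Unit = {
          Thread.sleep(50) // artificial handling delay, for illustration only
          state.set(newState)
        }
      })

    def stop(): Unit = inbox.shutdown()
  }

  // Re-evaluate `cond` every `intervalMs` until it holds or `timeoutMs` elapses.
  def pollUntil(timeoutMs: Long, intervalMs: Long)(cond: => Boolean): Boolean = {
    val deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs)
    var ok = cond
    while (!ok && System.nanoTime() < deadline) {
      Thread.sleep(intervalMs)
      ok = cond
    }
    ok
  }

  // Returns the state read right after `send`, and whether polling saw "WAITING".
  def run(): (String, Boolean) = {
    val endpoint = new Endpoint
    try {
      endpoint.send("WAITING")
      val immediately = endpoint.state.get()
      val reached = pollUntil(timeoutMs = 1000, intervalMs = 10) {
        endpoint.state.get() == "WAITING"
      }
      (immediately, reached)
    } finally endpoint.stop()
  }
}
```

In this sketch the value read straight after `send` is typically still "UNKNOWN", while the polled check succeeds once the handler has run, which is the reason for wrapping the assertion in `eventually` in the test.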




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-190629100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52225/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-190628757
  
    **[Test build #52225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52225/consoleFull)** for PR 10506 at commit [`7cec07c`](https://github.com/apache/spark/commit/7cec07c59ffb73261c743c5dffd5ea262ca9c0dc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167793719
  
    Jenkins, retest this please.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167911683
  
    Sure, will do.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167795222
  
    **[Test build #48413 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48413/consoleFull)** for PR 10506 at commit [`710f5de`](https://github.com/apache/spark/commit/710f5de578449c9f8156540bdc26b4b12d2567d5).




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Could you rebase this? @jerryshao 




[GitHub] spark pull request #10506: [SPARK-12552][Core]Correctly count the driver res...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10506#discussion_r121016096
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
    @@ -547,6 +547,9 @@ private[deploy] class Master(
         workers.filter(_.state == WorkerState.UNKNOWN).foreach(removeWorker)
         apps.filter(_.state == ApplicationState.UNKNOWN).foreach(finishApplication)
     
    +    // Update the state of recovered apps to RUNNING
    +    apps.filter(_.state == ApplicationState.WAITING).foreach(_.state = ApplicationState.RUNNING)
    --- End diff --
    
    This should also be done later in schedule().
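
The transitions touched by this hunk, together with the UNKNOWN to WAITING step exercised in the test, can be sketched as a small state machine. This is a simplified illustration, not the `Master` implementation: the state objects, `App`, `acknowledge`, and `completeRecovery` below are hypothetical, and finishing an application is reduced to setting a `Finished` state.

```scala
object RecoveryStateSketch {
  // Hypothetical simplified states; Spark's ApplicationState defines more values.
  sealed trait AppState
  case object Unknown extends AppState
  case object Waiting extends AppState
  case object Running extends AppState
  case object Finished extends AppState

  final case class App(id: String, state: AppState)

  // An application that acknowledges the new master moves from Unknown to Waiting.
  def acknowledge(apps: List[App], ackedId: String): List[App] =
    apps.map { app =>
      if (app.id == ackedId && app.state == Unknown) app.copy(state = Waiting) else app
    }

  // At the end of recovery, unacknowledged apps are finished and
  // recovered (Waiting) apps are marked Running.
  def completeRecovery(apps: List[App]): List[App] =
    apps.map {
      case app if app.state == Unknown => app.copy(state = Finished)
      case app if app.state == Waiting => app.copy(state = Running)
      case app => app
    }
}
```

For example, given two persisted applications of which only one acknowledges the master change, `completeRecovery(acknowledge(apps, "a"))` leaves "a" in `Running` and the other in `Finished`.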




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Currently `MasterSuite` doesn't cover the Driver/Executor Lost/Relaunch cases, and we have seen several issues related to relaunching drivers in standalone mode. It would be great to write a test framework that verifies that the Driver/Worker states and statistics (memory, cores, etc.) meet our expectations on Worker Join/Lost/ReJoin, and to fix the inconsistencies in follow-up PRs.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167758670
  
    **[Test build #48410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48410/consoleFull)** for PR 10506 at commit [`710f5de`](https://github.com/apache/spark/commit/710f5de578449c9f8156540bdc26b4b12d2567d5).




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-190575489
  
    @andrewor14, would you please review this patch again? It has been pending for a long time, and I think this is actually a bug. Thanks a lot.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    @jerryshao Thank you for your effort, I'll try this tomorrow!




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Thanks @srowen. I think the fix is OK; at the least, it should be no worse than the previous code.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167821001
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48413/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167860300
  
    Can you add a unit test? You might have to mock the `completeRecovery` method




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    Thanks, merging to master/2.2! The fix is only 2 lines, so it should be safe to backport.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    I also don't feel like I know enough to review this, but if you're confident about the fix, I think you can go ahead. The change looks reasonable on its face.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167955369
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-190579869
  
    **[Test build #52225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52225/consoleFull)** for PR 10506 at commit [`7cec07c`](https://github.com/apache/spark/commit/7cec07c59ffb73261c743c5dffd5ea262ca9c0dc).




[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10506#issuecomment-167972801
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    > It would be great if we can add a test framework to verify the states and statistics under Driver/Executor Lost/Join/Relaunch conditions
    
    @jiangxb1987 can you explain more about what you want?




[GitHub] spark issue #10506: [SPARK-12552][Core]Correctly count the driver resource w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10506
  
    **[Test build #77990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77990/testReport)** for PR 10506 at commit [`0bb82bb`](https://github.com/apache/spark/commit/0bb82bb8ee2993a3c79d0fd109c023ba14ed2a9f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

