You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by witgo <gi...@git.apache.org> on 2017/05/06 13:31:34 UTC

[GitHub] spark pull request #17882: [WIP][SPARK-20079][try 2][yarn] Re registration o...

GitHub user witgo opened a pull request:

    https://github.com/apache/spark/pull/17882

    [WIP][SPARK-20079][try 2][yarn] Re registration of AM hangs spark cluster in yarn-client mode.

    See #17480 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/witgo/spark SPARK-20079_try2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17882.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17882
    
----
commit fd4c486725ae49d068e67088a185fb4cc229e21f
Author: Guoqiang Li <wi...@qq.com>
Date:   2017-03-30T14:17:49Z

    SPARK-20079: Re registration of AM hangs spark cluster in yarn-client mode.

commit fa27d7f6da49659c8aa6efe55a3e457e883eb17a
Author: Guoqiang Li <wi...@qq.com>
Date:   2017-04-04T04:33:45Z

    review commits

commit 6341d31c2961a08db2f36339f5ebe8c814eeb4c7
Author: Guoqiang Li <wi...@qq.com>
Date:   2017-04-07T10:32:29Z

    review commits

commit e992df93e7222d5d2bd66d9a2c19984c9b241fd5
Author: Guoqiang Li <wi...@qq.com>
Date:   2017-04-23T04:56:58Z

    Delete "initializing = true" in ExecutorAllocationManager.reset

commit 917cf43ffaaeb20347df3c7e480cb75ae87dca83
Author: Guoqiang Li <wi...@qq.com>
Date:   2017-05-06T13:28:28Z

    Add msg: RetrieveLastAllocatedExecutorNumber

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    I think this could be closed, @vanzin already created a new PR based on this (#18663).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78055/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    I'm very sorry, I haven't taken the time to update this code recently.
    @vanzin , thank you for your work. I'll close this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    **[Test build #78052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78052/testReport)** for PR 17882 at commit [`f019209`](https://github.com/apache/spark/commit/f019209d40943eec2d4cbd4ba8284f59d9de8440).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17882: [WIP][SPARK-20079][yarn] Re registration of AM ha...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17882#discussion_r121972913
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
    @@ -176,16 +179,6 @@ private[spark] abstract class YarnSchedulerBackend(
       }
     
       /**
    -   * Reset the state of SchedulerBackend to the initial state. This is happened when AM is failed
    -   * and re-registered itself to driver after a failure. The stale state in driver should be
    -   * cleaned.
    -   */
    -  override protected def reset(): Unit = {
    --- End diff --
    
    I think `ExecutorAllocationManager#reset` is still necessary, but the following code should be removed
    ```scala
        initializing = true
        numExecutorsTarget = initialNumExecutors
        numExecutorsToAdd = 1
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    **[Test build #76523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76523/testReport)** for PR 17882 at commit [`917cf43`](https://github.com/apache/spark/commit/917cf43ffaaeb20347df3c7e480cb75ae87dca83).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17882: [WIP][SPARK-20079][yarn] Re registration of AM ha...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17882#discussion_r122336025
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
    @@ -68,6 +68,12 @@ private[spark] abstract class YarnSchedulerBackend(
       // Flag to specify whether this schedulerBackend should be reset.
       private var shouldResetOnAmRegister = false
     
    +  private val currentState = new CurrentAMState(0,
    +    RequestExecutors(Utils.getDynamicAllocationInitialExecutors(conf), 0, Map.empty, Set.empty))
    +
    +  protected class CurrentAMState(
    --- End diff --
    
    The way this class is used isn't really helping. You could have the two fields as mutable fields in the parent instead of declaring a separate type.
    
    I'd suggest either getting rid of this class, or turning it into a proper type to be sent as a reply to the "get" RPC, defined in `CoarseGrainedClusterMessage.scala` (meaning you'd keep state in fields here, and the message itself would be instantiated only when replying to the RPC, and would be immutable).
    
    Also, either suggestion means you don't need the changes around `setCurrentExecutorIdCounter`. The old code was fine.
    
    Finally, I know I suggested the name, but making the names of the classes more generic would be better (since that file is not YARN-specific). e.g. `GetExecutorAllocatorState` and `ExecutorAllocatorState` for the reply.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17882: [SPARK-20079][yarn] Re registration of AM hangs s...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17882#discussion_r120710864
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
    @@ -307,6 +301,9 @@ private[spark] abstract class YarnSchedulerBackend(
     
           case RetrieveLastAllocatedExecutorId =>
             context.reply(currentExecutorIdCounter)
    +
    +      case RetrieveLastRequestExecutors =>
    --- End diff --
    
    Why make a new call for this? What I had in mind would re-use `RetrieveLastAllocatedExecutorId` adding a new properties that tell the AM what's the initial state. Might be good to change the name of the message at that point to (e.g. `GetAMInitialState` or something).
    
    That call is already made during initialization of `YarnAllocator`, so it should be a smaller change overall.
    
    So, you'd have one already existing RPC instead of one RPC that may cause the AM to send another RPC to signal the driver to send an RPC to the AM (my head is spinning now).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [SPARK-20079][try 2][yarn] Re registration of AM hangs s...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    CC @mridulm , can you please review this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [SPARK-20079][try 2][yarn] Re registration of AM hangs s...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Could you remove `[try 2]` from the title and write a proper commit message? I'll try to review the code later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    @jerryshao @vanzin 
    Would you take some time to review this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [SPARK-20079][yarn] Re registration of AM hangs spark cl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    **[Test build #77797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77797/testReport)** for PR 17882 at commit [`bce09cb`](https://github.com/apache/spark/commit/bce09cb37de45c546a1dd68ca6db1d2701143443).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    **[Test build #78055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78055/testReport)** for PR 17882 at commit [`45aac34`](https://github.com/apache/spark/commit/45aac34d312bd80d4e67ab41939cebca8164868b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    **[Test build #76533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76533/testReport)** for PR 17882 at commit [`53d0c25`](https://github.com/apache/spark/commit/53d0c2551ef73dc843a53a088c5c7c835956f490).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17882: [WIP][SPARK-20079][yarn] Re registration of AM ha...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17882#discussion_r122335403
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -358,14 +358,26 @@ private[spark] class ApplicationMaster(
           dummyRunner.launchContextDebugInfo()
         }
     
    +    /**
    +     * (executorIdCounter, requestExecutors) should be the initial state
    +     * or the last state AM restart.
    +     *
    +     * @see SPARK-12864, SPARK-20079
    +     */
    +    val (executorIdCounter, requestExecutors) =
    +      driverRef.askSync[(Int, RequestExecutors)](GetAMInitialState)
         allocator = client.register(driverUrl,
           driverRef,
           yarnConf,
           _sparkConf,
           uiAddress,
           historyAddress,
           securityMgr,
    -      localResources)
    +      localResources,
    +      executorIdCounter)
    +    if (requestExecutors.requestedTotal != allocator.getTargetNumExecutors) {
    +      amEndpoint.send(requestExecutors)
    --- End diff --
    
    This doesn't work; because you're using `send`, the handler would have to be in the `receive` method of the endpoint, and it's actually in `receiveAndReply` (which is the handler for `ask`).
    
    You have to call `requestTotalExecutorsWithPreferredLocalities` directly here, or make that method take a `RequestExecutors` message as an argument, but you shouldn't go through the RPC system just to make a method call.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17882: [WIP][SPARK-20079][yarn] Re registration of AM ha...

Posted by witgo <gi...@git.apache.org>.
Github user witgo closed the pull request at:

    https://github.com/apache/spark/pull/17882


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    ping @witgo how it is going?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17882: [SPARK-20079][yarn] Re registration of AM hangs s...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17882#discussion_r120652302
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
    @@ -68,6 +68,8 @@ private[spark] abstract class YarnSchedulerBackend(
       // Flag to specify whether this schedulerBackend should be reset.
       private var shouldResetOnAmRegister = false
     
    +  private var lastRequestExecutors = RequestExecutors(-1, -1, Map.empty, Set.empty)
    --- End diff --
    
    Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76523/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Thank you @jerryshao.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    **[Test build #76533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76533/testReport)** for PR 17882 at commit [`53d0c25`](https://github.com/apache/spark/commit/53d0c2551ef73dc843a53a088c5c7c835956f490).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    gentle ping @witgo for review comments above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [SPARK-20079][yarn] Re registration of AM hangs spark cl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    **[Test build #78055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78055/testReport)** for PR 17882 at commit [`45aac34`](https://github.com/apache/spark/commit/45aac34d312bd80d4e67ab41939cebca8164868b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  protected class CurrentAMState(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [SPARK-20079][yarn] Re registration of AM hangs spark cl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77797/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [SPARK-20079][yarn] Re registration of AM hangs spark cl...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    @vanzin Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17882: [SPARK-20079][yarn] Re registration of AM hangs s...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17882#discussion_r120644588
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
    @@ -176,16 +179,6 @@ private[spark] abstract class YarnSchedulerBackend(
       }
     
       /**
    -   * Reset the state of SchedulerBackend to the initial state. This is happened when AM is failed
    -   * and re-registered itself to driver after a failure. The stale state in driver should be
    -   * cleaned.
    -   */
    -  override protected def reset(): Unit = {
    --- End diff --
    
    From the current code, we reset the state of `ExecutorAllocationManager` is not correct. Iy causes the `RetrieveLastRequestExecutors` message to not work properly


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76533/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17882: [SPARK-20079][try 2][yarn] Re registration of AM ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17882#discussion_r120351695
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
    @@ -68,6 +68,8 @@ private[spark] abstract class YarnSchedulerBackend(
       // Flag to specify whether this schedulerBackend should be reset.
       private var shouldResetOnAmRegister = false
     
    +  private var lastRequestExecutors = RequestExecutors(-1, -1, Map.empty, Set.empty)
    --- End diff --
    
    I would suggest to change the first `-1` to `Utils#getDynamicAllocationInitialExecutors`, and the second to be "0".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    **[Test build #76523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76523/testReport)** for PR 17882 at commit [`917cf43`](https://github.com/apache/spark/commit/917cf43ffaaeb20347df3c7e480cb75ae87dca83).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17882: [SPARK-20079][yarn] Re registration of AM hangs s...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17882#discussion_r120809352
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
    @@ -176,16 +179,6 @@ private[spark] abstract class YarnSchedulerBackend(
       }
     
       /**
    -   * Reset the state of SchedulerBackend to the initial state. This is happened when AM is failed
    -   * and re-registered itself to driver after a failure. The stale state in driver should be
    -   * cleaned.
    -   */
    -  override protected def reset(): Unit = {
    --- End diff --
    
    OK, then if `ExecutorAllocationManager#reset` is not required, then why would you still keep that method?
    
    Also should we clean this two fields during AM restart in `ExecutorAllocationManager`?
    
    ```
        executorsPendingToRemove.clear()
        removeTimes.clear()
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78052/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [SPARK-20079][yarn] Re registration of AM hangs spark cl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    **[Test build #77797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77797/testReport)** for PR 17882 at commit [`bce09cb`](https://github.com/apache/spark/commit/bce09cb37de45c546a1dd68ca6db1d2701143443).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17882: [SPARK-20079][try 2][yarn] Re registration of AM ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17882#discussion_r120352440
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
    @@ -176,16 +179,6 @@ private[spark] abstract class YarnSchedulerBackend(
       }
     
       /**
    -   * Reset the state of SchedulerBackend to the initial state. This is happened when AM is failed
    -   * and re-registered itself to driver after a failure. The stale state in driver should be
    -   * cleaned.
    -   */
    -  override protected def reset(): Unit = {
    --- End diff --
    
    Why would you remove this method, AFAIK it is still necessary, otherwise who will reset the state of `ExecutorAllocationManager`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17882: [WIP][SPARK-20079][yarn] Re registration of AM hangs spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17882
  
    **[Test build #78052 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78052/testReport)** for PR 17882 at commit [`f019209`](https://github.com/apache/spark/commit/f019209d40943eec2d4cbd4ba8284f59d9de8440).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  protected class CurrentAMState(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org