You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2015/10/08 04:53:54 UTC

[GitHub] spark pull request: [SPARK-10987] [yarn] Work around for netty rpc...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/9021

    [SPARK-10987] [yarn] Work around for netty rpc disconnection event.

    In YARN client mode, when the AM connects to the driver, it may be the case
    that the driver never needs to send a message back to the AM (i.e., no
    dynamic allocation or preemption). This triggers an issue in the netty rpc
    backend where no disconnection event is sent to endpoints, and the AM never
    exits after the driver shuts down.
    
    The real fix is too complicated, so this is a quick hack to unblock YARN
    client mode until we can work on the real fix. It forces the driver to
    send a message to the AM when the AM registers, thus establishing that
    connection and enabling the disconnection event when the driver goes
    away.
    
    Also, a minor side issue: when the executor is shutting down, it needs
    to send an "ack" back to the driver when using the netty rpc backend; but
    that "ack" wasn't being sent because the handler was shutting down the rpc
    env before returning. So added a change to delay the shutdown a little bit,
    allowing the ack to be sent back.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-10987

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9021.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9021
    
----
commit 8b7620fcfa0a677865dde799ea933e0e7ee266ab
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2015-10-08T00:35:50Z

    [SPARK-10987] [yarn] Work around for netty rpc disconnection event.
    
    In YARN client mode, when the AM connects to the driver, it may be the case
    that the driver never needs to send a message back to the AM (i.e., no
    dynamic allocation or preemption). This triggers an issue in the netty rpc
    backend where no disconnection event is sent to endpoints, and the AM never
    exits after the driver shuts down.
    
    The real fix is too complicated, so this is a quick hack to unblock YARN
    client mode until we can work on the real fix. It forces the driver to
    send a message to the AM when the AM registers, thus establishing that
    connection and enabling the disconnection event when the driver goes
    away.
    
    Also, a minor side issue: when the executor is shutting down, it needs
    to send an "ack" back to the driver when using the netty rpc backend; but
    that "ack" wasn't being sent because the handler was shutting down the rpc
    env before returning. So added a change to delay the shutdown a little bit,
    allowing the ack to be sent back.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Workaround for missing ne...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146419475
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43377/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Workaround for missing ne...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146617647
  
    > It's because we use a thread pool to call sendRpc.
    
    That too, but it can also happen if you enable multiple connections per peer (`spark.shuffle.io.numConnectionsPerPeer`). I still think for the RPC library we should hardcode that to `1` - that should also make implementing SPARK-10997 simpler.
    
    The bugs I saw in the transport library weren't really related to connection establishment, but I can take a look at how hard it would be to make connections non-blocking.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Workaround for missing ne...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146419472
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Work around for netty rpc...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146403614
  
    /cc @zsxwing 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Work around for netty rpc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146403827
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Workaround for missing ne...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146420921
  
    > This though worries me that there are other similar bugs lurking in the background.
    
    I'm sure there are. Any code that relies on the disconnection even may be broken, depending on the situation. Also, while investigating this, I found a couple of issues in the transport library that I will fix separately...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Workaround for missing ne...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146420692
  
    LGTM. 
    
    This though worries me that there are other similar bugs lurking in the background. I would expect the new RPC to behave exactly like the old one (minus any akka-related issues we saw). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Work around for netty rpc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146404426
  
      [Test build #43377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43377/consoleFull) for   PR 9021 at commit [`8b7620f`](https://github.com/apache/spark/commit/8b7620fcfa0a677865dde799ea933e0e7ee266ab).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Workaround for missing ne...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146419423
  
      [Test build #43377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43377/console) for   PR 9021 at commit [`8b7620f`](https://github.com/apache/spark/commit/8b7620fcfa0a677865dde799ea933e0e7ee266ab).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Workaround for missing ne...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9021


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Workaround for missing ne...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146582988
  
    LGTM. 
    
    @vanzin I think maybe we should monitor dual connections. I also found an issue about the message order. If we run the following code in order,
    ```
    ref.send("A")
    ref.send("B")
    ```
    the receiver side may see "B" before "A". It's because we use a thread pool to call `sendRpc`. While you fix the  transport library, is it possible to make `clientFactory.createClient` non-blocking so that we don't need the thread pool?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Work around for netty rpc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146403818
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Workaround for missing ne...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146420122
  
    Output from test:
    
        [info] YarnClusterSuite:
        [info] - run Spark in yarn-client mode (15 seconds, 863 milliseconds)
        [info] - run Spark in yarn-cluster mode (13 seconds, 956 milliseconds)
        [info] - run Spark in yarn-cluster mode unsuccessfully (10 seconds, 916 milliseconds)
        [info] - run Python application in yarn-client mode (18 seconds, 526 milliseconds)
        [info] - run Python application in yarn-cluster mode (16 seconds, 987 milliseconds)
        [info] - user class path first in client mode (15 seconds, 256 milliseconds)
        [info] - user class path first in cluster mode (13 seconds, 875 milliseconds)
    
    Compare to the version in the bug.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10987] [yarn] Workaround for missing ne...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9021#issuecomment-146618288
  
    Merging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org