You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by SaintBacchus <gi...@git.apache.org> on 2015/04/16 03:57:43 UTC

[GitHub] spark pull request: Do not let Yarn Shuffle Server retry its serve...

GitHub user SaintBacchus opened a pull request:

    https://github.com/apache/spark/pull/5537

    Do not let Yarn Shuffle Server retry its server port.

    It's better to let the NodeManager get down rather than take a port retry when `spark.shuffle.service.port` has been conflicted during starting the Spark Yarn Shuffle Server, because the retry mechanism will make the inconsistency of shuffle port and also make client fail to find the port.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark YarnShufflePortRetry

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5537.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5537
    
----
commit 7126547da9ca6447d8c53bec8bb4094e11badb2e
Author: huangzhaowei <ca...@gmail.com>
Date:   2015-04-16T01:37:20Z

    Do not let Yarn Shuffle Server retry its server port.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-94187864
  
    Ah, I realize now that the original change that brought this bug up (#4240) was perhaps not the right solution. The issue that #4240 solved was actually brought up beforehand (https://github.com/apache/spark/pull/3688#issuecomment-69588907). I think my recommended fix then would still be good, as it would not affect the Yarn or Standalone Worker shuffle services.
    
    I created a patch which aims to resolve both #4240 and this PR's issues: #5575.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-93949861
  
      [Test build #30473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30473/consoleFull) for   PR 5537 at commit [`962770c`](https://github.com/apache/spark/commit/962770c914a1a1928dccbf14a26df735ba4f77f3).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5537#discussion_r28622548
  
    --- Diff: core/src/main/scala/org/apache/spark/network/netty/SparkTransportConf.scala ---
    @@ -43,8 +43,12 @@ object SparkTransportConf {
        * @param numUsableCores if nonzero, this will restrict the server and client threads to only
        *                       use the given number of cores, rather than all of the machine's cores.
        *                       This restriction will only occur if these properties are not already set.
    +   * @param disablePortRetry if true, server will not retry its port. It's better for the long-run
    +   *                        server to disable it since the server and client had the agreement of
    +   *                        the specific port.
        */
    -  def fromSparkConf(_conf: SparkConf, numUsableCores: Int = 0): TransportConf = {
    +  def fromSparkConf(_conf: SparkConf, numUsableCores: Int = 0,
    --- End diff --
    
    Also, this change broke the MiMA checks. It doesn't feel like these classes should be public (so maybe an exclusion should be fine here), but you can also work around it by declaring an overloaded method instead.
    
    @aarondav any comments about whether these classes are really meant to be public?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-94062816
  
    The changes as they currently stand LGTM, let's just fix up the style as @vanzin pointed out.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-93836401
  
    Given that the network-common library (where the retry logic is) is supposed to be generic, and not just for shuffle, I'd rather keep the retry logic there, so this change looks better to me. I would just avoid setting the value in the passed configuration, preferring to clone it instead (or some other way that does not modify the input conf).
    
    Perhaps a similar change in the non-YARN service (does that exist?) would be needed too.
    
    That being said, the javadoc for `TransportServer.bindRightPort` seems wrong; it says it will try to bind to the given port multiple times, but it actually tries different ports.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-93949870
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30473/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-93687641
  
    CC @andrewor14 That makes some sense. I hope there aren't side effects to setting this globally in this process.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5537#discussion_r28622275
  
    --- Diff: core/src/main/scala/org/apache/spark/network/netty/SparkTransportConf.scala ---
    @@ -43,8 +43,12 @@ object SparkTransportConf {
        * @param numUsableCores if nonzero, this will restrict the server and client threads to only
        *                       use the given number of cores, rather than all of the machine's cores.
        *                       This restriction will only occur if these properties are not already set.
    +   * @param disablePortRetry if true, server will not retry its port. It's better for the long-run
    +   *                        server to disable it since the server and client had the agreement of
    --- End diff --
    
    super nit: alignment is off by 1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by SaintBacchus <gi...@git.apache.org>.
Github user SaintBacchus commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-93691448
  
    @srowen This code was used in `YarnShuffleService`, any spark application can't get this property.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-94114473
  
    I see, it makes sense to remove it for services set up independently of the Spark application, but not for Netty because the executors will still be able to find the port. This workaround seems appropriate then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by SaintBacchus <gi...@git.apache.org>.
Github user SaintBacchus commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-93947685
  
    @andrewor14 `TransportServer#bindRightPort` will use in `Netty` network, in that case have a retry mechanism is a better way.
    @vanzin  I have clone the configuration and modify the javadoc and also set no retry in `StandaloneWorkerShuffleService`. Please have a check.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by SaintBacchus <gi...@git.apache.org>.
Github user SaintBacchus closed the pull request at:

    https://github.com/apache/spark/pull/5537


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5537#discussion_r28624950
  
    --- Diff: core/src/main/scala/org/apache/spark/network/netty/SparkTransportConf.scala ---
    @@ -43,8 +43,12 @@ object SparkTransportConf {
        * @param numUsableCores if nonzero, this will restrict the server and client threads to only
        *                       use the given number of cores, rather than all of the machine's cores.
        *                       This restriction will only occur if these properties are not already set.
    +   * @param disablePortRetry if true, server will not retry its port. It's better for the long-run
    +   *                        server to disable it since the server and client had the agreement of
    +   *                        the specific port.
        */
    -  def fromSparkConf(_conf: SparkConf, numUsableCores: Int = 0): TransportConf = {
    +  def fromSparkConf(_conf: SparkConf, numUsableCores: Int = 0,
    --- End diff --
    
    Definitely not meant to be public, could make it `private[spark]` and add exclusion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-93948083
  
      [Test build #30473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30473/consoleFull) for   PR 5537 at commit [`962770c`](https://github.com/apache/spark/commit/962770c914a1a1928dccbf14a26df735ba4f77f3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-93628986
  
      [Test build #30391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30391/consoleFull) for   PR 5537 at commit [`7126547`](https://github.com/apache/spark/commit/7126547da9ca6447d8c53bec8bb4094e11badb2e).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-93628990
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30391/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5537#discussion_r28640416
  
    --- Diff: core/src/main/scala/org/apache/spark/network/netty/SparkTransportConf.scala ---
    @@ -43,8 +43,12 @@ object SparkTransportConf {
        * @param numUsableCores if nonzero, this will restrict the server and client threads to only
        *                       use the given number of cores, rather than all of the machine's cores.
        *                       This restriction will only occur if these properties are not already set.
    +   * @param disablePortRetry if true, server will not retry its port. It's better for the long-run
    +   *                        server to disable it since the server and client had the agreement of
    +   *                        the specific port.
    --- End diff --
    
    if true, the server will not retry to bind to a different port than the one configured.
    This is better for long-running servers since they may have agreed upon specific
    ports with the clients.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5537#discussion_r28640429
  
    --- Diff: core/src/main/scala/org/apache/spark/network/netty/SparkTransportConf.scala ---
    @@ -43,8 +43,12 @@ object SparkTransportConf {
        * @param numUsableCores if nonzero, this will restrict the server and client threads to only
        *                       use the given number of cores, rather than all of the machine's cores.
        *                       This restriction will only occur if these properties are not already set.
    +   * @param disablePortRetry if true, server will not retry its port. It's better for the long-run
    +   *                        server to disable it since the server and client had the agreement of
    +   *                        the specific port.
    --- End diff --
    
    actually, since the only place we ever set this to true is in `StandaloneWorkerShuffleService`, does it make sense to just move the conf setting there along with this comment? (similar to what you did in YARN)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5537#discussion_r28622322
  
    --- Diff: core/src/main/scala/org/apache/spark/network/netty/SparkTransportConf.scala ---
    @@ -43,8 +43,12 @@ object SparkTransportConf {
        * @param numUsableCores if nonzero, this will restrict the server and client threads to only
        *                       use the given number of cores, rather than all of the machine's cores.
        *                       This restriction will only occur if these properties are not already set.
    +   * @param disablePortRetry if true, server will not retry its port. It's better for the long-run
    +   *                        server to disable it since the server and client had the agreement of
    +   *                        the specific port.
        */
    -  def fromSparkConf(_conf: SparkConf, numUsableCores: Int = 0): TransportConf = {
    +  def fromSparkConf(_conf: SparkConf, numUsableCores: Int = 0,
    --- End diff --
    
    nit: style here is to indent next line by 2 spaces, or have one parameter per line indented by 4 spaces each.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-93799183
  
    @SaintBacchus It's fine to set it to 0, but maybe it makes more sense to just remove the logic that retries port in the first place since the shuffle service is the only place that binds to ports in the whole subproject. More specifically I'm referring to `TransportServer#bindRightPort` and `TransportConf#portMaxRetries`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Do not let Yarn Shuffle Server retry its serve...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5537#issuecomment-93617048
  
      [Test build #30391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30391/consoleFull) for   PR 5537 at commit [`7126547`](https://github.com/apache/spark/commit/7126547da9ca6447d8c53bec8bb4094e11badb2e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org