Posted to reviews@spark.apache.org by liyichao <gi...@git.apache.org> on 2017/05/24 17:47:07 UTC

[GitHub] spark pull request #18092: Make rpc timeout and retry for shuffle registrati...

GitHub user liyichao opened a pull request:

    https://github.com/apache/spark/pull/18092

    Make rpc timeout and retry for shuffle registration configurable.

    ## What changes were proposed in this pull request?
    
    As the title says.
    
    ## How was this patch tested?
    no
    
    cc @sitalkedia


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyichao/spark SPARK-20640

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18092.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18092
    
----
commit 80e9ad9e02fbfd24bbd6d97e03b1bdf01e4c922c
Author: Li Yichao <ly...@zhihu.com>
Date:   2017-05-24T17:42:43Z

    Make rpc timeout and retry for shuffle registration configurable.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77355/testReport)** for PR 18092 at commit [`aa51261`](https://github.com/apache/spark/commit/aa512614f303bd2de3144fe452d5bb62616e9756).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    @fabboe This does not qualify for backporting to 2.2.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    retest this please




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by swevrywhere <gi...@git.apache.org>.
Github user swevrywhere commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Was this included in the new PySpark 2.2.1 release? I didn't see this item number in https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12340470, but am hoping that it was, since this item has had a fix for 6 months now.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77356/
    Test PASSed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78112/testReport)** for PR 18092 at commit [`5bee019`](https://github.com/apache/spark/commit/5bee0196e8ffe92db66a566dc940a01f05246b54).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class KVStoreSerializer `
      * `public abstract class KVStoreView<T> implements Iterable<T> `
      * `public class KVTypeInfo `
      * `public class LevelDB implements KVStore `
      * `  public static class TypeAliases `
      * `class LevelDBIterator<T> implements KVStoreIterator<T> `
      * `class LevelDBTypeInfo `
      * `  class Index `
      * `public class UnsupportedStoreVersionException extends IOException `
      * `          logError(s\"Not measuring processing time for listener class $className because a \" +`
      * `class FilteredObjectInputStream extends ObjectInputStream `
      * `        String.format(\"Unexpected class in stream: %s\", desc.getName()));`
      * `case class Uuid() extends LeafExpression `
      * `case class UnresolvedHint(name: String, parameters: Seq[Any], child: LogicalPlan)`
      * `case class HintInfo(broadcast: Boolean = false) `
      * `public final class ParquetDictionary implements Dictionary `




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78145/testReport)** for PR 18092 at commit [`c06e871`](https://github.com/apache/spark/commit/c06e871e46d174f8812e3b3ed2a61809de0ca794).




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r122617448
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val tryAgainMsg = "test_spark_20640_try_again"
    +    // a server which delays response 50ms and must try twice for success.
    +    def newShuffleServer(port: Int): (TransportServer, Int) = {
    +      val attempts = new mutable.HashMap[String, Int]()
    +      val handler = new NoOpRpcHandler {
    +        override def receive(client: TransportClient, message: ByteBuffer,
    --- End diff --
    
    nit:
    ```
    def xxx(
        para1: XXX,
        para2: XXX): XXX
    ```




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78379/testReport)** for PR 18092 at commit [`4e0169f`](https://github.com/apache/spark/commit/4e0169fec340543fbd3c9d680ecc35ece7fca5d9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78379/
    Test PASSed.




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r122620995
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1286,61 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val tryAgainMsg = "test_spark_20640_try_again"
    +    // a server which delays response 50ms and must try twice for success.
    +    def newShuffleServer(port: Int): (TransportServer, Int) = {
    +      val attempts = new mutable.HashMap[String, Int]()
    +      val handler = new NoOpRpcHandler {
    +        override def receive(
    +                     client: TransportClient,
    --- End diff --
    
    I mean 4-space indentation, not aligning with `def`...
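
For illustration, the style being requested puts each parameter of a multi-line signature on its own line, indented 4 spaces relative to `def`, with the body indented 2 spaces. The sketch below uses hypothetical names (`IndentExample`, `describe`, `clientId`, `payloadSize`) that do not come from the PR.

```scala
object IndentExample {
  // Parameters are indented 4 spaces relative to `def`;
  // the method body is indented 2 spaces.
  def describe(
      clientId: String,
      payloadSize: Int): String = {
    s"$clientId sent $payloadSize bytes"
  }
}
```

Applied to the diff above, `client: TransportClient,` would sit 4 spaces to the right of `override def receive(` rather than being aligned under the opening parenthesis.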




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    >> I can not think of meaningful test cases, are there any suggestions?
    
    How about just "unit tests" ?
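
As a rough idea of what such a unit test could exercise, here is a self-contained sketch of a bounded retry loop and a stand-in for a server that fails the first attempt and succeeds on the second, as the test server in the diff does. It does not use Spark's classes and does not model the timeout; `RegistrationRetry`, `withRetries`, and `flakyRegister` are hypothetical names introduced only for this example.

```scala
import scala.util.{Failure, Success, Try}

object RegistrationRetry {
  // Hypothetical helper: calls `attempt` with the attempt number (starting
  // at 1) up to `maxAttempts` times. Returns the first success, or the last
  // failure if every call fails.
  def withRetries[T](maxAttempts: Int)(attempt: Int => Try[T]): Try[T] = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")
    var n = 1
    var result = attempt(n)
    while (result.isFailure && n < maxAttempts) {
      n += 1
      result = attempt(n)
    }
    result
  }

  // Stand-in for the test server: the first attempt fails, later ones succeed.
  def flakyRegister(n: Int): Try[String] =
    if (n < 2) Failure(new Exception("test_spark_20640_try_again"))
    else Success("registered")
}
```

With this stand-in, a test can assert that a single allowed attempt fails while two allowed attempts succeed, which is the behaviour a configurable maximum number of attempts is meant to control.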




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77355/testReport)** for PR 18092 at commit [`aa51261`](https://github.com/apache/spark/commit/aa512614f303bd2de3144fe452d5bb62616e9756).




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77894/testReport)** for PR 18092 at commit [`5bee019`](https://github.com/apache/spark/commit/5bee0196e8ffe92db66a566dc940a01f05246b54).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class KVStoreSerializer `
      * `public abstract class KVStoreView<T> implements Iterable<T> `
      * `public class KVTypeInfo `
      * `public class LevelDB implements KVStore `
      * `  public static class TypeAliases `
      * `class LevelDBIterator<T> implements KVStoreIterator<T> `
      * `class LevelDBTypeInfo `
      * `  class Index `
      * `public class UnsupportedStoreVersionException extends IOException `
      * `          logError(s\"Not measuring processing time for listener class $className because a \" +`
      * `class FilteredObjectInputStream extends ObjectInputStream `
      * `        String.format(\"Unexpected class in stream: %s\", desc.getName()));`
      * `case class Uuid() extends LeafExpression `
      * `case class UnresolvedHint(name: String, parameters: Seq[Any], child: LogicalPlan)`
      * `case class HintInfo(broadcast: Boolean = false) `
      * `public final class ParquetDictionary implements Dictionary `




[GitHub] spark issue #18092: Make rpc timeout and retry for shuffle registration conf...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Thank you for making this change.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r122617592
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val tryAgainMsg = "test_spark_20640_try_again"
    +    // a server which delays response 50ms and must try twice for success.
    +    def newShuffleServer(port: Int): (TransportServer, Int) = {
    +      val attempts = new mutable.HashMap[String, Int]()
    +      val handler = new NoOpRpcHandler {
    +        override def receive(client: TransportClient, message: ByteBuffer,
    +                             callback: RpcResponseCallback): Unit = {
    +          val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
    +          msgObj match {
    +            case exec: RegisterExecutor =>
    +              Thread.sleep(50)
    +              val attempt = attempts.getOrElse(exec.execId, 0) + 1
    +              attempts(exec.execId) = attempt
    +              if (attempt < 2) {
    +                callback.onFailure(new Exception(tryAgainMsg))
    +                return
    +              }
    +              callback.onSuccess(ByteBuffer.wrap(new Array[Byte](0)))
    +          }
    +        }
    +      }
    +
    +      val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numUsableCores = 0)
    +      val transCtx = new TransportContext(transConf, handler, true)
    +      (transCtx.createServer(port, Seq.empty[TransportServerBootstrap].asJava), port)
    +    }
    +    val candidatePort = RandomUtils.nextInt(1024, 65536)
    +    val (server, shufflePort) = Utils.startServiceOnPort(candidatePort,
    --- End diff --
    
    Will this be flaky, e.g. if the port is already occupied by another test suite?
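
One general way to reduce this kind of port collision in JVM tests is to bind to port 0 and let the operating system assign a free port, instead of choosing a random candidate up front. The sketch below uses only `java.net.ServerSocket`; it is not what the PR does, and whether the transport server in the diff can be started the same way is not stated in this thread. `EphemeralPortExample` and `bindToFreePort` are hypothetical names.

```scala
import java.net.ServerSocket

object EphemeralPortExample {
  // Port 0 asks the operating system to pick a free port at bind time.
  def bindToFreePort(): ServerSocket = new ServerSocket(0)

  def main(args: Array[String]): Unit = {
    val server = bindToFreePort()
    try {
      // getLocalPort reports the port the OS actually assigned.
      println(s"listening on port ${server.getLocalPort}")
    } finally {
      server.close()
    }
  }
}
```

Because the port is chosen at bind time, there is no window between picking a candidate and binding to it in which another test suite can take the port.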




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78365/
    Test FAILed.




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r122918448
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -303,6 +303,16 @@ package object config {
           .bytesConf(ByteUnit.BYTE)
           .createWithDefault(100 * 1024 * 1024)
     
    +  private[spark] val SHUFFLE_REGISTRATION_TIMEOUT =
    +    ConfigBuilder("spark.shuffle.registration.timeout")
    --- End diff --
    
    can you add `.doc("xxx")` to explain it?




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78112/testReport)** for PR 18092 at commit [`5bee019`](https://github.com/apache/spark/commit/5bee0196e8ffe92db66a566dc940a01f05246b54).




[GitHub] spark issue #18092: Make rpc timeout and retry for shuffle registration conf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78256/
    Test PASSed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78151/testReport)** for PR 18092 at commit [`d01134e`](https://github.com/apache/spark/commit/d01134ef92401a5275c7388c8e6d65c82785acfa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78366/testReport)** for PR 18092 at commit [`59a9ebd`](https://github.com/apache/spark/commit/59a9ebd41fbe3a657bfe8cc6348561f46c0aaa6d).




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78249/
    Test PASSed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r122661511
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleClient.java ---
    @@ -49,6 +49,7 @@
       private final TransportConf conf;
       private final boolean authEnabled;
       private final SecretKeyHolder secretKeyHolder;
    +  private final long registrationTimeoutMilli;
    --- End diff --
    
    "MS" or "Millis" is more consistent. Milli suggests something different. https://en.wikipedia.org/wiki/Milli_Vanilli




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77422 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77422/testReport)** for PR 18092 at commit [`fb2b706`](https://github.com/apache/spark/commit/fb2b7061c1775e7e502228c3f551010cca6b001c).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78365/testReport)** for PR 18092 at commit [`97f825e`](https://github.com/apache/spark/commit/97f825e4e29b2f892f3c104848f9f9086e8b608f).




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r118547246
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -170,11 +170,17 @@ private[spark] class BlockManager(
       // service, or just our own Executor's BlockManager.
       private[spark] var shuffleServerId: BlockManagerId = _
     
    +  private val registrationTimeout =
    +    conf.getTimeAsMs("spark.shuffle.registration.timeout", "5s")
    --- End diff --
    
    For new configurations, should we be putting these into the `config` package object? See https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/package.scala and https://github.com/apache/spark/pull/10205
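
A minimal Java sketch of the idea behind this suggestion: declare each setting once, with its key and default, and read it through that single definition. `TimeoutEntry` and `ShuffleConfigSketch` are hypothetical names used only for illustration and are not the API of Spark's `config` package. The key and the 5000 ms default mirror the `"5s"` default in the diff above, and the sketch assumes values are plain milliseconds rather than time strings such as `5s`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical typed config entry: one key, one default, one way to read it. */
final class TimeoutEntry {
  final String key;
  final long defaultMs;

  TimeoutEntry(String key, long defaultMs) {
    this.key = key;
    this.defaultMs = defaultMs;
  }

  /** Returns the configured value in milliseconds, or the default when the key is unset. */
  long readFrom(Map<String, String> conf) {
    String raw = conf.get(key);
    return raw == null ? defaultMs : Long.parseLong(raw);
  }
}

/** Central place for the definitions, so call sites never repeat the key or the default. */
final class ShuffleConfigSketch {
  static final TimeoutEntry REGISTRATION_TIMEOUT =
      new TimeoutEntry("spark.shuffle.registration.timeout", 5000L);

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // Unset: falls back to the default declared above.
    System.out.println(REGISTRATION_TIMEOUT.readFrom(conf));
    // Set: the explicit value wins.
    conf.put(REGISTRATION_TIMEOUT.key, "40");
    System.out.println(REGISTRATION_TIMEOUT.readFrom(conf));
  }
}
```

Call sites then refer to `ShuffleConfigSketch.REGISTRATION_TIMEOUT` instead of a string literal, which is also how the quoted test code uses `SHUFFLE_REGISTRATION_TIMEOUT.key`.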




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77849/testReport)** for PR 18092 at commit [`e073070`](https://github.com/apache/spark/commit/e07307011fd05c07aff014db206e8d25fcaad4a2).




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r121269157
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1285,57 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val shufflePort = 10000
    +    val tryAgainMsg = "test_spark_20640_try_again"
    +    conf.set("spark.shuffle.service.enabled", "true")
    +    conf.set("spark.shuffle.service.port", shufflePort.toString)
    +    // a server which delays response 50ms and must try twice for success.
    +    def newShuffleServer(): TransportServer = {
    +      val attempts = new mutable.HashMap[String, Int]()
    +      val handler = new NoOpRpcHandler {
    +        override def receive(client: TransportClient, message: ByteBuffer,
    +                             callback: RpcResponseCallback): Unit = {
    +          val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
    +          msgObj match {
    +            case exec: RegisterExecutor =>
    +              Thread.sleep(50)
    +              val attempt = attempts.getOrElse(exec.execId, 0) + 1
    +              attempts(exec.execId) = attempt
    +              if (attempt < 2) {
    +                callback.onFailure(new Exception(tryAgainMsg))
    +                return
    +              }
    +              callback.onSuccess(ByteBuffer.wrap(new Array[Byte](0)))
    +          }
    +        }
    +      }
    +
    +      val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numUsableCores = 0)
    +      val transCtx = new TransportContext(transConf, handler, true)
    +      transCtx.createServer(shufflePort, Nil.asInstanceOf[Seq[TransportServerBootstrap]].asJava)
    +    }
    +    newShuffleServer()
    +
    +    conf.set(SHUFFLE_REGISTRATION_TIMEOUT.key, "40")
    +    conf.set(SHUFFLE_REGISTRATION_MAX_ATTEMPTS.key, "1")
    +    var e = intercept[SparkException]{
    +      makeBlockManager(8000, "executor1")
    +    }.getMessage
    +    assert(e.contains("TimeoutException"))
    +
    +    conf.set(SHUFFLE_REGISTRATION_TIMEOUT.key, "1000")
    +    conf.set(SHUFFLE_REGISTRATION_MAX_ATTEMPTS.key, "1")
    +    e = intercept[SparkException]{
    --- End diff --
    
    Ahh, I see why you needed the `sleep()` above: so we can actually return an error in the non-timeout case.




[GitHub] spark issue #18092: Make rpc timeout and retry for shuffle registration conf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Can one of the admins verify this patch?




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78366/testReport)** for PR 18092 at commit [`59a9ebd`](https://github.com/apache/spark/commit/59a9ebd41fbe3a657bfe8cc6348561f46c0aaa6d).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77894/
    Test FAILed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78287/
    Test PASSed.




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r122621671
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1286,61 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val tryAgainMsg = "test_spark_20640_try_again"
    +    // a server which delays response 50ms and must try twice for success.
    +    def newShuffleServer(port: Int): (TransportServer, Int) = {
    +      val attempts = new mutable.HashMap[String, Int]()
    +      val handler = new NoOpRpcHandler {
    +        override def receive(
    +                     client: TransportClient,
    --- End diff --
    
    Oh.




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r121281612
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1285,57 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val shufflePort = 10000
    +    val tryAgainMsg = "test_spark_20640_try_again"
    +    conf.set("spark.shuffle.service.enabled", "true")
    +    conf.set("spark.shuffle.service.port", shufflePort.toString)
    +    // a server which delays response 50ms and must try twice for success.
    +    def newShuffleServer(): TransportServer = {
    +      val attempts = new mutable.HashMap[String, Int]()
    +      val handler = new NoOpRpcHandler {
    +        override def receive(client: TransportClient, message: ByteBuffer,
    +                             callback: RpcResponseCallback): Unit = {
    +          val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
    +          msgObj match {
    +            case exec: RegisterExecutor =>
    +              Thread.sleep(50)
    +              val attempt = attempts.getOrElse(exec.execId, 0) + 1
    +              attempts(exec.execId) = attempt
    +              if (attempt < 2) {
    +                callback.onFailure(new Exception(tryAgainMsg))
    +                return
    +              }
    +              callback.onSuccess(ByteBuffer.wrap(new Array[Byte](0)))
    +          }
    +        }
    +      }
    +
    +      val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numUsableCores = 0)
    +      val transCtx = new TransportContext(transConf, handler, true)
    +      transCtx.createServer(shufflePort, Nil.asInstanceOf[Seq[TransportServerBootstrap]].asJava)
    +    }
    +    newShuffleServer()
    +
    +    conf.set(SHUFFLE_REGISTRATION_TIMEOUT.key, "40")
    +    conf.set(SHUFFLE_REGISTRATION_MAX_ATTEMPTS.key, "1")
    +    var e = intercept[SparkException]{
    +      makeBlockManager(8000, "executor1")
    +    }.getMessage
    +    assert(e.contains("TimeoutException"))
    +
    +    conf.set(SHUFFLE_REGISTRATION_TIMEOUT.key, "1000")
    +    conf.set(SHUFFLE_REGISTRATION_MAX_ATTEMPTS.key, "1")
    +    e = intercept[SparkException]{
    --- End diff --
    
    Hi, what's your suggestion? When attempt < 2, we already return the error `tryAgainMsg`. The request must fail when the configured timeout is shorter than the server's delay and succeed otherwise, so there seems to be no alternative to `sleep`.
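
The interaction between the per-attempt timeout and the attempt limit discussed here can be sketched in plain Java. This is an illustration under stated assumptions, not the test from the PR: `RegistrationRetrySketch`, `flakyServer` and `register` are hypothetical names, the server is a local `Callable` rather than a `TransportServer`, and the delay and timeouts are wider than the 50 ms and 40 ms in the diff to keep the timing robust.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

final class RegistrationRetrySketch {
  /** Fake server: every reply is delayed, and the first completed attempt is rejected. */
  static Callable<String> flakyServer(long delayMs, AtomicInteger attempts) {
    return () -> {
      Thread.sleep(delayMs);
      if (attempts.incrementAndGet() < 2) {
        throw new IllegalStateException("try_again");
      }
      return "registered";
    };
  }

  /** Calls the server up to maxAttempts times, waiting at most timeoutMs per attempt. */
  static String register(Callable<String> server, long timeoutMs, int maxAttempts)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    ExecutorService pool = Executors.newCachedThreadPool();
    try {
      Exception last = null;
      for (int i = 0; i < maxAttempts; i++) {
        Future<String> reply = pool.submit(server);
        try {
          return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
          // The reply did not arrive in time: abandon this attempt.
          reply.cancel(true);
          last = e;
        } catch (ExecutionException e) {
          // The server answered with an error: remember it and retry if allowed.
          Throwable cause = e.getCause();
          last = cause instanceof Exception ? (Exception) cause : e;
        }
      }
      throw last;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // The timeout exceeds the delay and a second attempt is allowed, so this succeeds.
    System.out.println(register(flakyServer(200, new AtomicInteger()), 2000, 2));
  }
}
```

With a 20 ms timeout and one attempt the call ends in a `TimeoutException`; with a 2000 ms timeout and one attempt it surfaces the server's `try_again` error; only with two attempts does it succeed. The delay is what lets the first two cases be told apart, which is the role `Thread.sleep(50)` plays in the quoted test.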




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78151/
    Test PASSed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    retest this please




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    LGTM except 2 minor comments




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78379/testReport)** for PR 18092 at commit [`4e0169f`](https://github.com/apache/spark/commit/4e0169fec340543fbd3c9d680ecc35ece7fca5d9).




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r121269224
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleClient.java ---
    @@ -60,10 +61,19 @@
       public ExternalShuffleClient(
           TransportConf conf,
           SecretKeyHolder secretKeyHolder,
    -      boolean authEnabled) {
    +      boolean authEnabled,
    +      long registrationTimeoutMilli) {
         this.conf = conf;
         this.secretKeyHolder = secretKeyHolder;
         this.authEnabled = authEnabled;
    +    this.registrationTimeoutMilli = registrationTimeoutMilli;
    +  }
    +
    +  public ExternalShuffleClient(
    --- End diff --
    
    Do we actually need this backwards-compatible constructor? AFAIK this interface is internal only and shouldn't be used from outside of Spark.
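
For context, the pattern being questioned is a constructor overload that keeps the old signature and delegates to the new one with a default. A minimal Java sketch follows; `ShuffleClientSketch` is a hypothetical class, not `ExternalShuffleClient`, and the 5000 ms default is assumed to match the 5s default used elsewhere in this PR.

```java
/** Hypothetical client; only the constructor pattern matters here. */
final class ShuffleClientSketch {
  /** Named default instead of a bare literal at each call site. */
  static final long DEFAULT_REGISTRATION_TIMEOUT_MS = 5000L;

  private final boolean authEnabled;
  private final long registrationTimeoutMs;

  /** New signature: the caller chooses the registration timeout. */
  ShuffleClientSketch(boolean authEnabled, long registrationTimeoutMs) {
    this.authEnabled = authEnabled;
    this.registrationTimeoutMs = registrationTimeoutMs;
  }

  /** Old signature, kept only for existing callers; delegates with the default. */
  ShuffleClientSketch(boolean authEnabled) {
    this(authEnabled, DEFAULT_REGISTRATION_TIMEOUT_MS);
  }

  boolean authEnabled() {
    return authEnabled;
  }

  long registrationTimeoutMs() {
    return registrationTimeoutMs;
  }

  public static void main(String[] args) {
    System.out.println(new ShuffleClientSketch(true).registrationTimeoutMs());
    System.out.println(new ShuffleClientSketch(true, 40L).registrationTimeoutMs());
  }
}
```

If every caller lives inside the same code base, as the comment suggests for this class, the second constructor can be dropped and the callers updated to pass the timeout explicitly.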




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    @JoshRosen Could you please take a look at the failed test? It seems unrelated to this PR.




[GitHub] spark issue #18092: Make rpc timeout and retry for shuffle registration conf...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    cc - @squito




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77894/testReport)** for PR 18092 at commit [`5bee019`](https://github.com/apache/spark/commit/5bee0196e8ffe92db66a566dc940a01f05246b54).




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r122238555
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/mesos/MesosExternalShuffleClient.java ---
    @@ -61,7 +61,7 @@ public MesosExternalShuffleClient(
           TransportConf conf,
           SecretKeyHolder secretKeyHolder,
           boolean authEnabled) {
    -    super(conf, secretKeyHolder, authEnabled);
    +    super(conf, secretKeyHolder, authEnabled, 5000);
    --- End diff --
    
    Let's put this magic number into a config value.
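
A minimal sketch of what moving the hard-coded `5000` behind a configuration key could look like. The key name `spark.shuffle.registration.timeout` is the one discussed elsewhere in this thread; the `ShuffleRegistrationSettings` class and its `Map`-backed lookup are hypothetical stand-ins for illustration, not Spark's `TransportConf` API.

```java
import java.util.Map;

// Hypothetical helper, not part of Spark: resolves the registration timeout
// from a key/value config instead of hard-coding it at the call site.
final class ShuffleRegistrationSettings {
    static final String TIMEOUT_KEY = "spark.shuffle.registration.timeout";
    static final long DEFAULT_TIMEOUT_MS = 5000L;

    private final Map<String, String> conf;

    ShuffleRegistrationSettings(Map<String, String> conf) {
        this.conf = conf;
    }

    // Returns the configured timeout in milliseconds, or the default when unset.
    long registrationTimeoutMs() {
        String raw = conf.get(TIMEOUT_KEY);
        return raw == null ? DEFAULT_TIMEOUT_MS : Long.parseLong(raw.trim());
    }
}
```

With this shape, a constructor such as the one in the diff above would receive `settings.registrationTimeoutMs()` rather than a literal, and the default stays in one place.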




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77356/testReport)** for PR 18092 at commit [`bb1801e`](https://github.com/apache/spark/commit/bb1801e278609ba378632c533665960cc80bd007).




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78365/testReport)** for PR 18092 at commit [`97f825e`](https://github.com/apache/spark/commit/97f825e4e29b2f892f3c104848f9f9086e8b608f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r122918471
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -303,6 +303,16 @@ package object config {
           .bytesConf(ByteUnit.BYTE)
           .createWithDefault(100 * 1024 * 1024)
     
    +  private[spark] val SHUFFLE_REGISTRATION_TIMEOUT =
    +    ConfigBuilder("spark.shuffle.registration.timeout")
    +      .timeConf(TimeUnit.MILLISECONDS)
    +      .createWithDefault(5000)
    +
    +  private[spark] val SHUFFLE_REGISTRATION_MAX_ATTEMPTS =
    --- End diff --
    
    ditto




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78246/
    Test FAILed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    retest this please




[GitHub] spark pull request #18092: Make rpc timeout and retry for shuffle registrati...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r118327566
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -174,14 +174,18 @@ private[spark] class BlockManager(
       // standard BlockTransferService to directly connect to other Executors.
       private[spark] val shuffleClient = if (externalShuffleServiceEnabled) {
         val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numUsableCores)
    -    new ExternalShuffleClient(transConf, securityManager, securityManager.isAuthenticationEnabled())
    +    new ExternalShuffleClient(transConf, securityManager, securityManager.isAuthenticationEnabled(), registrationTimeout)
       } else {
         blockTransferService
       }
     
       // Max number of failures before this block manager refreshes the block locations from the driver
       private val maxFailuresBeforeLocationRefresh =
         conf.getInt("spark.block.failures.beforeLocationRefresh", 5)
    +  private val registrationTimeout =
    --- End diff --
    
    Please use conf.getTimeAsMs instead of getInt.
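
The reason a time-aware getter helps is that the value can carry its own unit, so `5s` and `5000ms` mean the same thing. The sketch below illustrates that idea with a hypothetical `parseTimeAsMs` helper; it is not Spark's implementation, it handles only the `ms`, `s` and `m` suffixes, and it assumes a bare number is already in milliseconds.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical parser, illustrative only; it does not reproduce Spark's own
// time-string handling.
final class TimeStrings {
    private TimeStrings() {}

    // Converts strings such as "5000", "5000ms", "5s" or "2m" to milliseconds.
    // A bare number is assumed to already be in milliseconds.
    static long parseTimeAsMs(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.endsWith("ms")) {
            return Long.parseLong(v.substring(0, v.length() - 2).trim());
        }
        if (v.endsWith("s")) {
            long seconds = Long.parseLong(v.substring(0, v.length() - 1).trim());
            return TimeUnit.SECONDS.toMillis(seconds);
        }
        if (v.endsWith("m")) {
            long minutes = Long.parseLong(v.substring(0, v.length() - 1).trim());
            return TimeUnit.MINUTES.toMillis(minutes);
        }
        return Long.parseLong(v);
    }
}
```

An integer getter would reject `5s` outright, which is the practical difference the review comment is pointing at.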




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r123040340
  
    --- Diff: docs/configuration.md ---
    @@ -639,6 +639,20 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    +  <td><code>spark.shuffle.registration.timeout</code></td>
    +  <td>5000</td>
    +  <td>
    +    Timeout in milliseconds for registration to the external service.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.shuffle.registration.maxAttempts</code></td>
    +  <td>3</td>
    +  <td>
    +    When we fail to register to the external service, we will retry for maxAttempts times.
    --- End diff --
    
    nit:  When we fail to register to the external shuffle service, we will retry for maxAttempts times.
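
To make the `maxAttempts` wording concrete, here is a small retry sketch. It assumes `maxAttempts` counts the total number of tries, including the first one; the `Retry` class and its `withMaxAttempts` method are hypothetical names used only for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper; the name and signature are illustrative.
final class Retry {
    private Retry() {}

    // Invokes the action up to maxAttempts times in total and returns its first
    // successful result. If every attempt fails, the last exception is rethrown.
    static <T> T withMaxAttempts(int maxAttempts, Callable<T> action) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure; loop again if attempts remain
            }
        }
        throw last;
    }
}
```

Under this reading, a value of 3 means one initial attempt plus at most two further tries. A real client would likely also pause between attempts, which the sketch omits.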




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    ping @JoshRosen 




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r121269201
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1285,57 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val shufflePort = 10000
    +    val tryAgainMsg = "test_spark_20640_try_again"
    +    conf.set("spark.shuffle.service.enabled", "true")
    +    conf.set("spark.shuffle.service.port", shufflePort.toString)
    +    // a server which delays response 50ms and must try twice for success.
    +    def newShuffleServer(): TransportServer = {
    +      val attempts = new mutable.HashMap[String, Int]()
    +      val handler = new NoOpRpcHandler {
    +        override def receive(client: TransportClient, message: ByteBuffer,
    +                             callback: RpcResponseCallback): Unit = {
    +          val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
    +          msgObj match {
    +            case exec: RegisterExecutor =>
    +              Thread.sleep(50)
    +              val attempt = attempts.getOrElse(exec.execId, 0) + 1
    +              attempts(exec.execId) = attempt
    +              if (attempt < 2) {
    +                callback.onFailure(new Exception(tryAgainMsg))
    +                return
    +              }
    +              callback.onSuccess(ByteBuffer.wrap(new Array[Byte](0)))
    +          }
    +        }
    +      }
    +
    +      val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numUsableCores = 0)
    +      val transCtx = new TransportContext(transConf, handler, true)
    +      transCtx.createServer(shufflePort, Nil.asInstanceOf[Seq[TransportServerBootstrap]].asJava)
    +    }
    +    newShuffleServer()
    --- End diff --
    
    Do we need to shut down the server created here or otherwise perform any cleanup after the test finishes?
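
One common way to address this kind of cleanup concern is to close the server in a `finally` block so the port is released even when the test body throws. The sketch below uses a hypothetical `FakeServer` in place of the real transport server; only the open/close contract is modelled.

```java
// Hypothetical stand-in for a test server; only the close() contract matters here.
final class FakeServer implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false; // a real server would release its port and threads here
    }
}

final class ServerCleanupExample {
    private ServerCleanupExample() {}

    // Runs the test body and closes the server afterwards, even if the body throws.
    static void runWithServer(FakeServer server, Runnable testBody) {
        try {
            testBody.run();
        } finally {
            server.close();
        }
    }
}
```

Leaving a listener bound to a fixed port such as the one in the diff could cause later tests that reuse the port to fail, which is why the cleanup matters.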




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78247/
    Test FAILed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78246/testReport)** for PR 18092 at commit [`c8e7c64`](https://github.com/apache/spark/commit/c8e7c64d7e599c3f6283f2390c1ea188e4ed899a).




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r121269114
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1285,57 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val shufflePort = 10000
    +    val tryAgainMsg = "test_spark_20640_try_again"
    +    conf.set("spark.shuffle.service.enabled", "true")
    +    conf.set("spark.shuffle.service.port", shufflePort.toString)
    +    // a server which delays response 50ms and must try twice for success.
    +    def newShuffleServer(): TransportServer = {
    +      val attempts = new mutable.HashMap[String, Int]()
    +      val handler = new NoOpRpcHandler {
    +        override def receive(client: TransportClient, message: ByteBuffer,
    +                             callback: RpcResponseCallback): Unit = {
    +          val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
    +          msgObj match {
    +            case exec: RegisterExecutor =>
    +              Thread.sleep(50)
    --- End diff --
    
    Do we actually need this sleep? What if we just simply never returned any response if `attempt < 2`?
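
The alternative suggested here, never responding on the first attempt so that the client can only observe a timeout, can be sketched without any sleep on the server side. All names below (`SilentFirstAttemptHandler`, `TimeoutClient`) are hypothetical, and a `CompletableFuture` stands in for the RPC callback; this is not the test that was merged.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handler: stays silent on the first request, replies on later ones.
final class SilentFirstAttemptHandler {
    private final AtomicInteger attempts = new AtomicInteger();

    // Returns a future that never completes for the first call, so the caller
    // can only observe that attempt as a timeout.
    CompletableFuture<String> handle() {
        if (attempts.incrementAndGet() < 2) {
            return new CompletableFuture<>(); // no response at all
        }
        return CompletableFuture.completedFuture("registered");
    }

    int attempts() {
        return attempts.get();
    }
}

final class TimeoutClient {
    private TimeoutClient() {}

    // Tries up to maxAttempts times, waiting at most timeoutMs for each reply.
    static String register(SilentFirstAttemptHandler handler, long timeoutMs, int maxAttempts)
            throws InterruptedException, ExecutionException, TimeoutException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TimeoutException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return handler.handle().get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                last = e; // this attempt timed out; retry if attempts remain
            }
        }
        throw last;
    }
}
```

Because the first reply never arrives, the outcome does not depend on how a server-side delay compares with the client timeout, which may make such a test less sensitive to scheduling jitter.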




[GitHub] spark issue #18092: Make rpc timeout and retry for shuffle registration conf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77315/testReport)** for PR 18092 at commit [`80e9ad9`](https://github.com/apache/spark/commit/80e9ad9e02fbfd24bbd6d97e03b1bdf01e4c922c).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78112/
    Test FAILed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78287/testReport)** for PR 18092 at commit [`020f9e2`](https://github.com/apache/spark/commit/020f9e21f40f93a7a3b93f3e698ae4ecc0f94685).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    thanks, merging to master!




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77356/testReport)** for PR 18092 at commit [`bb1801e`](https://github.com/apache/spark/commit/bb1801e278609ba378632c533665960cc80bd007).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Sorry, I thought it was not necessary to duplicate the message from JIRA; thanks for the suggestion.
    The PR is updated. As for the test plan, the modification seems straightforward and I cannot think of meaningful test cases. Are there any suggestions?




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78366/
    Test FAILed.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77422/
    Test PASSed.




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r122918254
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleClient.java ---
    @@ -132,7 +135,7 @@ public void registerWithShuffleServer(
         checkInit();
         try (TransportClient client = clientFactory.createUnmanagedClient(host, port)) {
           ByteBuffer registerMessage = new RegisterExecutor(appId, execId, executorInfo).toByteBuffer();
    -      client.sendRpcSync(registerMessage, 5000 /* timeoutMs */);
    --- End diff --
    
    Previously we used `Ms`; can we keep that instead of `Millis`?




[GitHub] spark issue #18092: Make rpc timeout and retry for shuffle registration conf...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    ok to test




[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r123040230
  
    --- Diff: docs/configuration.md ---
    @@ -639,6 +639,20 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    +  <td><code>spark.shuffle.registration.timeout</code></td>
    +  <td>5000</td>
    +  <td>
    +    Timeout in milliseconds for registration to the external service.
    --- End diff --
    
    nit: Timeout in milliseconds for registration to the external shuffle service.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Looks good overall. Left a couple of comments regarding something that I worry could potentially make the tests flaky.


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78151/testReport)** for PR 18092 at commit [`d01134e`](https://github.com/apache/spark/commit/d01134ef92401a5275c7388c8e6d65c82785acfa).


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Build finished. Test PASSed.


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    LGTM except for some minor comments. Can we also add some documentation for these 2 new options to the "Shuffle Behavior" section of `configuration.md`? Thanks!


[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r122620600
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val tryAgainMsg = "test_spark_20640_try_again"
    +    // a server which delays response 50ms and must try twice for success.
    +    def newShuffleServer(port: Int): (TransportServer, Int) = {
    +      val attempts = new mutable.HashMap[String, Int]()
    +      val handler = new NoOpRpcHandler {
    +        override def receive(client: TransportClient, message: ByteBuffer,
    --- End diff --
    
    Updated. By the way, I am a little confused.
    
    First, when you insert the line break, IntelliJ auto-indents like this:
    
    ```
    override def receive(
                                      client: TransportClient
    ```
    
    Second, in the same file, near line 1349, `fetchBlocks` is indented like this:
    
    ```
        override def fetchBlocks(
            host: String,
            port: Int,
            execId: String,
            blockIds: Array[String],
            listener: BlockFetchingListener,
            shuffleFiles: Array[File]): Unit = {
    ```


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77849/testReport)** for PR 18092 at commit [`e073070`](https://github.com/apache/spark/commit/e07307011fd05c07aff014db206e8d25fcaad4a2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class HasMinSupport(Params):`
      * `class HasNumPartitions(Params):`
      * `class HasMinConfidence(Params):`
      * `case class UnresolvedRelation(`
      * `case class DayOfWeek(child: Expression) extends UnaryExpression with ImplicitCastInputTypes `
      * `case class StringReplace(srcExpr: Expression, searchExpr: Expression, replaceExpr: Expression)`
      * `case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes `
      * `trait Command extends LogicalPlan `
      * `case class ResolvedHint(child: LogicalPlan, hints: HintInfo = HintInfo())`
      * `case class HintInfo(`
      * `case class ExecutedCommandExec(cmd: RunnableCommand, children: Seq[SparkPlan]) extends SparkPlan `
      * `case class StateStoreId(`
      * `class UnsafeRowPair(var key: UnsafeRow = null, var value: UnsafeRow = null) `
      * `trait StateStoreWriter extends StatefulOperator `


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78287/testReport)** for PR 18092 at commit [`020f9e2`](https://github.com/apache/spark/commit/020f9e21f40f93a7a3b93f3e698ae4ecc0f94685).


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78256/testReport)** for PR 18092 at commit [`ca308bc`](https://github.com/apache/spark/commit/ca308bcc12243d1a3011997688883c58f0b2d801).


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78249/testReport)** for PR 18092 at commit [`d31d8da`](https://github.com/apache/spark/commit/d31d8da7952e1db527fa892087b2feb85799cae4).


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by fabboe <gi...@git.apache.org>.
Github user fabboe commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Can we merge this into branch-2.2 for the next minor release?


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    ping @jiangxb1987


[GitHub] spark issue #18092: Make rpc timeout and retry for shuffle registration conf...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77315/testReport)** for PR 18092 at commit [`80e9ad9`](https://github.com/apache/spark/commit/80e9ad9e02fbfd24bbd6d97e03b1bdf01e4c922c).


[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r121269120
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1285,57 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val shufflePort = 10000
    +    val tryAgainMsg = "test_spark_20640_try_again"
    +    conf.set("spark.shuffle.service.enabled", "true")
    +    conf.set("spark.shuffle.service.port", shufflePort.toString)
    +    // a server which delays response 50ms and must try twice for success.
    +    def newShuffleServer(): TransportServer = {
    +      val attempts = new mutable.HashMap[String, Int]()
    +      val handler = new NoOpRpcHandler {
    +        override def receive(client: TransportClient, message: ByteBuffer,
    +                             callback: RpcResponseCallback): Unit = {
    +          val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
    +          msgObj match {
    +            case exec: RegisterExecutor =>
    +              Thread.sleep(50)
    +              val attempt = attempts.getOrElse(exec.execId, 0) + 1
    +              attempts(exec.execId) = attempt
    +              if (attempt < 2) {
    +                callback.onFailure(new Exception(tryAgainMsg))
    +                return
    +              }
    +              callback.onSuccess(ByteBuffer.wrap(new Array[Byte](0)))
    +          }
    +        }
    +      }
    +
    +      val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numUsableCores = 0)
    +      val transCtx = new TransportContext(transConf, handler, true)
    +      transCtx.createServer(shufflePort, Nil.asInstanceOf[Seq[TransportServerBootstrap]].asJava)
    --- End diff --
    
    Nit: you can just write `Seq.empty[TransportServerBootstrap].asJava`
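
To illustrate the nit above, here is a minimal, self-contained sketch of the two spellings. `Bootstrap` is a hypothetical stand-in for `TransportServerBootstrap`, introduced only so the snippet compiles without Spark on the classpath. It uses `scala.collection.JavaConverters`, which matches the Scala versions Spark targeted at the time (newer Scala versions prefer `scala.jdk.CollectionConverters`).

```scala
import java.util.{List => JList}
import scala.collection.JavaConverters._

// Hypothetical stand-in for TransportServerBootstrap, used only for illustration.
trait Bootstrap

object EmptySeqExample {
  // Suggested form: a typed empty Seq, converted to a java.util.List view.
  def emptyBootstraps: JList[Bootstrap] = Seq.empty[Bootstrap].asJava

  // Original form: casting Nil also yields an empty list, but the cast
  // hides the element type from the reader.
  def emptyViaCast: JList[Bootstrap] = Nil.asInstanceOf[Seq[Bootstrap]].asJava

  def main(args: Array[String]): Unit = {
    println(emptyBootstraps.isEmpty)
    println(emptyViaCast.isEmpty)
  }
}
```

Both expressions produce an empty Java list; the suggested form simply states the element type directly instead of going through a cast.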


[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r121269084
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1285,57 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val shufflePort = 10000
    --- End diff --
    
    I'm afraid that this may lead to flakiness in Jenkins: we run multiple concurrent builds on the machine and they aren't containerized, so hardcoding ports in unit tests risks port conflicts (especially when several jobs kick off at about the same time; this actually _is_ an issue in practice).
    
    If you need to know the port that it binds to then I would recommend using `Utils.startServiceOnPort` (see examples of this in existing tests elsewhere in the codebase, such as in the Kafka module).
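
The idea behind that recommendation can be sketched without Spark. The helper below, `startOnFreePort`, is hypothetical and only loosely modeled on the behavior described for `Utils.startServiceOnPort`; it is not Spark's implementation. It assumes the service reports the port it actually bound to, and it retries on `BindException` instead of failing on the first conflict.

```scala
import java.net.{BindException, ServerSocket}
import scala.annotation.tailrec

object PortRetryExample {
  // Hypothetical helper: try `startPort`, and on a bind conflict move on to
  // the next port, for at most `maxRetries` further attempts.
  // A startPort of 0 asks the OS for any free port on every attempt.
  @tailrec
  def startOnFreePort[T](startPort: Int, maxRetries: Int)(start: Int => (T, Int)): (T, Int) = {
    val attempt: Option[(T, Int)] =
      try Some(start(startPort))
      catch { case _: BindException if maxRetries > 0 => None }
    attempt match {
      case Some(result) => result
      case None =>
        val nextPort = if (startPort == 0) 0 else startPort + 1
        startOnFreePort(nextPort, maxRetries - 1)(start)
    }
  }

  def main(args: Array[String]): Unit = {
    // The service returns itself together with the port it really bound to.
    val (socket, port) = startOnFreePort(0, maxRetries = 3) { p =>
      val s = new ServerSocket(p)
      (s, s.getLocalPort)
    }
    try println(s"bound to port $port") finally socket.close()
  }
}
```

Because the test reads the bound port back from the service rather than assuming a fixed value, concurrent builds on the same machine would not collide on a hardcoded port.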


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77624/testReport)** for PR 18092 at commit [`e073070`](https://github.com/apache/spark/commit/e07307011fd05c07aff014db206e8d25fcaad4a2).


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r118671187
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -170,11 +170,17 @@ private[spark] class BlockManager(
       // service, or just our own Executor's BlockManager.
       private[spark] var shuffleServerId: BlockManagerId = _
     
    +  private val registrationTimeout =
    +    conf.getTimeAsMs("spark.shuffle.registration.timeout", "5s")
    --- End diff --
    
    Updated, and added a unit test.
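
The diff above passes a default of `"5s"` to `getTimeAsMs`, while the documentation row elsewhere in this thread lists `5000`. A rough sketch of why the two can agree is shown below. `timeStringAsMs` is a hypothetical, simplified parser written for this note, not Spark's own; it assumes a bare number is read as milliseconds and it supports only a few suffixes.

```scala
object TimeStringExample {
  // Digits followed by an optional lowercase unit suffix.
  private val TimePattern = """(\d+)\s*([a-z]*)""".r

  // Hypothetical simplified parser: converts a time string to milliseconds.
  // A bare number is assumed to already be in milliseconds.
  def timeStringAsMs(s: String): Long = s.trim.toLowerCase match {
    case TimePattern(n, "" | "ms") => n.toLong
    case TimePattern(n, "s") => n.toLong * 1000L
    case TimePattern(n, "m" | "min") => n.toLong * 60L * 1000L
    case other => throw new NumberFormatException(s"Unsupported time string: $other")
  }

  def main(args: Array[String]): Unit = {
    println(timeStringAsMs("5s"))
    println(timeStringAsMs("5000"))
  }
}
```

Under these assumptions, `"5s"` and `"5000"` describe the same timeout.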


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77422 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77422/testReport)** for PR 18092 at commit [`fb2b706`](https://github.com/apache/spark/commit/fb2b7061c1775e7e502228c3f551010cca6b001c).


[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18092#discussion_r122620196
  
    --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
    @@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
         assert(master.getLocations("item").isEmpty)
       }
     
    +  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
    +    val tryAgainMsg = "test_spark_20640_try_again"
    +    // a server which delays response 50ms and must try twice for success.
    +    def newShuffleServer(port: Int): (TransportServer, Int) = {
    +      val attempts = new mutable.HashMap[String, Int]()
    +      val handler = new NoOpRpcHandler {
    +        override def receive(client: TransportClient, message: ByteBuffer,
    +                             callback: RpcResponseCallback): Unit = {
    +          val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
    +          msgObj match {
    +            case exec: RegisterExecutor =>
    +              Thread.sleep(50)
    +              val attempt = attempts.getOrElse(exec.execId, 0) + 1
    +              attempts(exec.execId) = attempt
    +              if (attempt < 2) {
    +                callback.onFailure(new Exception(tryAgainMsg))
    +                return
    +              }
    +              callback.onSuccess(ByteBuffer.wrap(new Array[Byte](0)))
    +          }
    +        }
    +      }
    +
    +      val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numUsableCores = 0)
    +      val transCtx = new TransportContext(transConf, handler, true)
    +      (transCtx.createServer(port, Seq.empty[TransportServerBootstrap].asJava), port)
    +    }
    +    val candidatePort = RandomUtils.nextInt(1024, 65536)
    +    val (server, shufflePort) = Utils.startServiceOnPort(candidatePort,
    --- End diff --
    
    No, because `startServiceOnPort` will handle the port-conflict case.


[GitHub] spark issue #18092: Make rpc timeout and retry for shuffle registration conf...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77315/
    Test FAILed.


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78247/testReport)** for PR 18092 at commit [`d31d8da`](https://github.com/apache/spark/commit/d31d8da7952e1db527fa892087b2feb85799cae4).


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77849/
    Test PASSed.


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78145/testReport)** for PR 18092 at commit [`c06e871`](https://github.com/apache/spark/commit/c06e871e46d174f8812e3b3ed2a61809de0ca794).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #18092: Make rpc timeout and retry for shuffle registration conf...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Please update the summary and test plan for the PR. In the summary you can put the JIRA description:
    
    "Currently the shuffle service registration timeout and retry count are hardcoded. This works well for small workloads, but under heavy workloads, when the shuffle service is busy transferring large amounts of data, we see significant delays in responding to registration requests. As a result, executors often fail to register with the shuffle service, eventually failing the job. We need to make these two parameters configurable."
    
    Also you can change the title of the PR to 
    
    [SPARK-20640][CORE]Make rpc timeout and retry for shuffle registration configurable.
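
The behavior that description asks for can be sketched as a bounded retry loop with a per-attempt timeout. `registerWithRetry` below is a hypothetical helper written for this note, not the code in the PR. `maxAttempts` and `timeoutMs` stand in for the two values the PR makes configurable, and the callback that fails once before succeeding mimics the test server shown in the diffs in this thread.

```scala
import scala.annotation.tailrec
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object RegistrationRetryExample {
  // Hypothetical helper: runs `register` up to `maxAttempts` times, waiting at
  // most `timeoutMs` for each attempt. Returns the attempt number that
  // succeeded, or rethrows the last failure once the attempts are used up.
  // Note: an attempt that times out is abandoned here, not cancelled.
  def registerWithRetry(maxAttempts: Int, timeoutMs: Long)(register: Int => Unit): Int = {
    @tailrec
    def loop(attempt: Int): Int = {
      val outcome = Try(Await.result(Future(register(attempt)), timeoutMs.millis))
      outcome match {
        case Success(_) => attempt
        case Failure(_) if attempt < maxAttempts => loop(attempt + 1)
        case Failure(e) => throw e
      }
    }
    loop(1)
  }

  def main(args: Array[String]): Unit = {
    // Fails on the first attempt and succeeds on the second.
    val used = registerWithRetry(maxAttempts = 3, timeoutMs = 5000L) { attempt =>
      if (attempt < 2) throw new RuntimeException("try again")
    }
    println(s"registered after $used attempt(s)")
  }
}
```

With hardcoded values, a busy shuffle service that answers slowly would exhaust the attempts; exposing both numbers lets operators trade registration latency against robustness.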


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78145/
    Test FAILed.


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78249/testReport)** for PR 18092 at commit [`d31d8da`](https://github.com/apache/spark/commit/d31d8da7952e1db527fa892087b2feb85799cae4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77355/
    Test FAILed.


[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18092




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    jenkins retest this please




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #78256 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78256/testReport)** for PR 18092 at commit [`ca308bc`](https://github.com/apache/spark/commit/ca308bcc12243d1a3011997688883c58f0b2d801).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77624/




[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18092
  
    **[Test build #77624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77624/testReport)** for PR 18092 at commit [`e073070`](https://github.com/apache/spark/commit/e07307011fd05c07aff014db206e8d25fcaad4a2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class HasMinSupport(Params):`
      * `class HasNumPartitions(Params):`
      * `class HasMinConfidence(Params):`
      * `case class UnresolvedRelation(`
      * `case class DayOfWeek(child: Expression) extends UnaryExpression with ImplicitCastInputTypes `
      * `case class StringReplace(srcExpr: Expression, searchExpr: Expression, replaceExpr: Expression)`
      * `case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes `
      * `trait Command extends LogicalPlan `
      * `case class ResolvedHint(child: LogicalPlan, hints: HintInfo = HintInfo())`
      * `case class HintInfo(`
      * `case class ExecutedCommandExec(cmd: RunnableCommand, children: Seq[SparkPlan]) extends SparkPlan `
      * `case class StateStoreId(`
      * `class UnsafeRowPair(var key: UnsafeRow = null, var value: UnsafeRow = null) `
      * `trait StateStoreWriter extends StatefulOperator `

