Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/18 02:21:22 UTC

[GitHub] [spark] turboFei opened a new pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

turboFei opened a new pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604458017
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25135/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600577639
 
 
   **[Test build #119986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119986/testReport)** for PR 27943 at commit [`73d2a63`](https://github.com/apache/spark/commit/73d2a63fddc1c6efe4f797514eb4a1711f9fd175).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607319253
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25381/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600738366
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24718/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604626916
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603163006
 
 
   Agreed, the fast-fail time window should be slightly shorter than conf.ioRetryWaitTimeMs().
   
   
   
   > The only other question I have is about connections not going through the retryingblockfetcher: this could potentially fail them much faster. If it's a one-time fetch, is that what we want? I would need to look a bit more at the usages there.
   
   I think we should fast fail the connection only if conf.maxIORetries > 0.
   
   
   
   > Also, I'm curious: could this lead to the scenario where, when you have two tasks and only one client, the connection request from the second task fails fast every time it tries to connect (because the connection from the first task always fails beforehand)?
   
   I think that if the service address is still unreachable after maxIORetries retries, it does not matter if the other task fails as well.
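
The two conditions discussed in this comment (a window slightly shorter than the IO retry wait, and fast fail only when retries are enabled) can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the class `FastFailWindow` and all of its methods are hypothetical, the integer factor 95/100 stands in for the 0.95 factor that appears in the diff under review, and the exception text only borrows the phrase the test in this PR expects.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration; this is not Spark's TransportClientFactory.
final class FastFailWindow {
  private final long ioRetryWaitTimeMs;
  private final int maxIORetries;
  // Timestamp (ms) of the last failed connection attempt, keyed by remote address.
  private final Map<String, Long> lastFailureMs = new ConcurrentHashMap<>();

  FastFailWindow(long ioRetryWaitTimeMs, int maxIORetries) {
    this.ioRetryWaitTimeMs = ioRetryWaitTimeMs;
    this.maxIORetries = maxIORetries;
  }

  // Slightly shorter than the retry wait; 95/100 mirrors the 0.95 factor in the diff.
  long windowMs() {
    return ioRetryWaitTimeMs * 95 / 100;
  }

  void recordFailure(String address, long nowMs) {
    lastFailureMs.put(address, nowMs);
  }

  // True only when retries are enabled and the last failure is still inside the window.
  boolean shouldFastFail(String address, long nowMs) {
    if (maxIORetries <= 0) {
      return false;
    }
    Long last = lastFailureMs.get(address);
    return last != null && nowMs - last < windowMs();
  }

  // Rejects the attempt immediately instead of waiting for another connect timeout.
  void checkBeforeConnect(String address, long nowMs) throws IOException {
    if (shouldFastFail(address, nowMs)) {
      throw new IOException("Connecting to " + address
          + " failed recently, fail this connection directly");
    }
  }

  public static void main(String[] args) {
    FastFailWindow window = new FastFailWindow(5000L, 3);
    window.recordFailure("host:7337", 1_000L);
    System.out.println(window.shouldFastFail("host:7337", 2_000L));
    System.out.println(window.shouldFastFail("host:7337", 5_750L));
  }
}
```

Keeping the window shorter than the retry wait means that a request's own next retry, which starts one full retry wait after its failure, is not rejected by the failure it just recorded.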
   



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603838714
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605098041
 
 
   **[Test build #120502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120502/testReport)** for PR 27943 at commit [`8942ae5`](https://github.com/apache/spark/commit/8942ae5304e8ea645625ef7d4f994fc7b47c9e57).



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-606998340
 
 
   **[Test build #120661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120661/testReport)** for PR 27943 at commit [`fe8c08c`](https://github.com/apache/spark/commit/fe8c08c8aabefa842c83a7173c5cf02dfe60f9b3).



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603657165
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25022/
   Test PASSed.



[GitHub] [spark] turboFei edited a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei edited a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600431305
 
 
   Thanks for the reply.
   We hit this issue when the ESS (node manager) was busy with a full GC: the task took a long time (connectionTimeout * maxRetry * (number of requests to the same ESS)) and then ended in a fetch failure.
   We would like the task to fail fast instead of wasting time waiting for the client lock and then timing out while creating the client.
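
As a rough worked example of the formula above, the following sketch multiplies the three factors. The class and method names are hypothetical, the input values (2-minute timeout, 3 retries, 2 requests) are illustrative only, and the wait between retries is ignored.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not part of Spark.
final class WorstCaseFetchWait {
  // connectionTimeout * maxRetry * (number of requests to the same ESS)
  static Duration worstCase(Duration connectionTimeout, int maxRetry, int requestsToSameEss) {
    return connectionTimeout.multipliedBy((long) maxRetry * requestsToSameEss);
  }

  public static void main(String[] args) {
    // Illustrative values: 2-minute timeout, 3 retries, 2 requests to the same ESS.
    Duration wait = worstCase(Duration.ofMinutes(2), 3, 2);
    System.out.println(wait.toMinutes() + " minutes"); // prints "12 minutes"
  }
}
```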
   



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607400417
 
 
   **[Test build #120682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120682/testReport)** for PR 27943 at commit [`9739b8a`](https://github.com/apache/spark/commit/9739b8a299cb0e58653018e9d570a4b7e9f5058a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607323072
 
 
   **[Test build #120682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120682/testReport)** for PR 27943 at commit [`9739b8a`](https://github.com/apache/spark/commit/9739b8a299cb0e58653018e9d570a4b7e9f5058a).



[GitHub] [spark] cloud-fan commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-601095546
 
 
   I think the problem can be described as: the client pool has no more healthy clients and we need to run a task, so what shall we do?
   
   Currently we just run and time out 3 times, and this PR proposes to fail fast.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605102126
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603157520
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600392427
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r397566176
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,7 +193,20 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      double fastFailTimeWindow = conf.ioRetryWaitTimeMs() * 0.95;
 
 Review comment:
   thanks



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604198431
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] turboFei edited a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei edited a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-601209554
 
 
   Just attaching the example mentioned in the description.
   > For example, there are two connection requests, rc1 and rc2.
   In particular, io.numConnectionsPerPeer is 1 and the connection timeout is 2 minutes.
   1: rc1 holds the client lock and times out after 2 minutes.
   2: rc2 holds the client lock and times out after 2 minutes.
   3: rc1 starts its second retry, holds the lock, and times out after 2 minutes.
   4: rc2 starts its second retry, holds the lock, and times out after 2 minutes.
   5: rc1 starts its third retry, holds the lock, and times out after 2 minutes.
   6: rc2 starts its third retry, holds the lock, and times out after 2 minutes.
   This wastes a lot of time.
   
   The concern is that, in some cases, these connection requests block each other.
   
   We could also just fast break the connection without increasing its retry count.
   If rc1 times out while connecting, we fast *break* the first retry of rc2 but do not increase rc2's retry count.
   rc1 then waits for one IO retry wait, starts its second retry, and times out again.
   We fast break rc2 again, still without increasing its retry count.
   rc1 then waits for another IO retry wait, starts its third retry, and times out; at that point rc1 throws a fetch failed exception.
   
   
   I think this is better than having the connection requests block each other.
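
The two timelines described above can be compared with a small simulation. This is only a sketch under stated assumptions: the class `RetryTimeline` and its methods are hypothetical, the 5-second retry wait is an illustrative value, and the blocking case ignores the retry wait just as the six-step example does.

```java
// Hypothetical simulation for illustration; not part of Spark.
final class RetryTimeline {
  // Blocking case: every attempt of every request holds the single client lock
  // for a full connection timeout, so the attempts are serialized.
  static long blockingTotalMs(int requests, int attempts, long timeoutMs) {
    long clock = 0;
    for (int attempt = 1; attempt <= attempts; attempt++) {
      for (int rc = 1; rc <= requests; rc++) {
        clock += timeoutMs;
      }
    }
    return clock;
  }

  // Fast-break case: only rc1 pays the timeouts, separated by the retry wait;
  // other requests are rejected immediately and do not consume a retry.
  static long fastBreakRc1FailureMs(int attempts, long timeoutMs, long retryWaitMs) {
    long clock = 0;
    for (int attempt = 1; attempt <= attempts; attempt++) {
      clock += timeoutMs;
      if (attempt < attempts) {
        clock += retryWaitMs;
      }
    }
    return clock;
  }

  public static void main(String[] args) {
    long timeoutMs = 120_000L; // 2-minute connection timeout from the example
    long retryWaitMs = 5_000L; // illustrative retry wait, an assumption
    System.out.println(blockingTotalMs(2, 3, timeoutMs));             // 720000 (12 minutes)
    System.out.println(fastBreakRc1FailureMs(3, timeoutMs, retryWaitMs)); // 370000 (6 min 10 s)
  }
}
```

With these inputs, the blocking case takes 12 minutes before the last request gives up, while in the fast-break case rc1 reports the fetch failure after 6 minutes and 10 seconds.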
   
   
   



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607319253
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25381/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605098821
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25208/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605098821
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25208/
   Test PASSed.



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402149106
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   I think I get your point.
   Do you mean adding `expectedException = ExpectedException.none();` after this test case completes?
   
   I just followed the usage in SparkSubmitCommandBuilderSuite; it was the only suite using **Rule** before this PR.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603169552
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604458005
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605102139
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120502/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603640018
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120298/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603676932
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402149106
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   I think I get your point.
   Do you mean adding `expectedException = ExpectedException.none();` after this test case completes?
   
   I just followed the usage in SparkSubmitCommandBuilderSuite; that is the only place it was used before this PR.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603737294
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25045/
   Test PASSed.



[GitHub] [spark] tgravescs commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603866859
 
 
   I'll take a closer look this afternoon, but can you update the description to mention that retries still take effect, since we had a discussion about that?



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603737288
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603290788
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600663088
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600663104
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119989/
   Test PASSed.



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402155245
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   https://github.com/junit-team/junit4/blob/435d41f0d45cfdbc1a38e1ad4eb1d5300da533f9/src/main/java/org/junit/Rule.java#L27-L30
   
   Here is the Javadoc comment: the rule is applied for each \@Test.



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402100291
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,11 +206,30 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      // If this connection should fast fail when last connection failed in last fast fail time
+      // window and it did, fail this connection directly.
+      if (fastFail && System.currentTimeMillis() - clientPool.lastConnectionFailed <
+        fastFailTimeWindow) {
+        throw new IOException(
+          String.format("Connecting to %s failed in the last %s ms, fail this connection directly",
+            resolvedAddress, fastFailTimeWindow));
+      }
+      try {
+        clientPool.clients[clientIndex] = createClient(resolvedAddress);
+        clientPool.lastConnectionFailed = 0;
+      } catch (IOException e) {
+        clientPool.lastConnectionFailed = System.currentTimeMillis();
+        throw e;
+      }
       return clientPool.clients[clientIndex];
     }
   }
 
+  public TransportClient createClient(String remoteHost, int remotePort)
 
 Review comment:
   I think it is not necessary, because it is a public method.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607401468
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120682/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-601040166
 
 
   > How about this: if the last connection failed within the last retry IO wait, the new connection would be broken off, but its retry count would not increase.
   
   So you don't want to reduce the connection chances for the request/task, in case it could have connected successfully had it not given up within ioRetryWait?
   
   If so, I think it is a valid concern, but it really complicates things, and I'm wondering whether it's worth doing.
   
   And at least, as per @tgravescs's comment, I think we should record the failed time for the whole `ClientPool` instead of for every client from the same `ClientPool`.
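
The per-pool bookkeeping suggested above can be sketched in plain Java. This is only an illustration modeled on the diff hunks quoted in this thread, not Spark's actual implementation: `FastFailPoolSketch`, `Connector`, and the injected `nowMs` clock are hypothetical names introduced here so the example runs without Spark.

```java
// Illustrative sketch only: one failure timestamp is shared by the whole pool.
final class FastFailPoolSketch {
  /** Hypothetical stand-in for the real connection attempt. */
  interface Connector {
    String connect() throws java.io.IOException;
  }

  private final long fastFailTimeWindowMs;
  // Time of the most recent failed attempt for the whole pool; 0 means no recorded failure.
  private long lastConnectionFailedMs = 0;

  FastFailPoolSketch(long fastFailTimeWindowMs) {
    this.fastFailTimeWindowMs = fastFailTimeWindowMs;
  }

  // nowMs is passed in (instead of calling System.currentTimeMillis()) to keep the sketch testable.
  synchronized String createClient(Connector connector, boolean fastFail, long nowMs)
      throws java.io.IOException {
    // The explicit "!= 0" guard is an addition of this sketch, needed because test clocks can be small.
    boolean failedRecently = lastConnectionFailedMs != 0
        && nowMs - lastConnectionFailedMs < fastFailTimeWindowMs;
    if (fastFail && failedRecently) {
      throw new java.io.IOException(String.format(
          "Connecting failed in the last %s ms, fail this connection directly",
          fastFailTimeWindowMs));
    }
    try {
      String client = connector.connect();
      lastConnectionFailedMs = 0;      // success clears the failure record
      return client;
    } catch (java.io.IOException e) {
      lastConnectionFailedMs = nowMs;  // remember when the pool last failed
      throw e;
    }
  }
}
```

Because the timestamp lives on the pool rather than on an individual client, a failure seen by one request short-circuits every other request to the same address until the window has elapsed.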
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604529606
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604395845
 
 
   **[Test build #120412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120412/testReport)** for PR 27943 at commit [`8784906`](https://github.com/apache/spark/commit/8784906c927936a9f062c3b29efc7d5b36e608d7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604827678
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-606998709
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25360/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600738351
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604552169
 
 
   **[Test build #120423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120423/testReport)** for PR 27943 at commit [`d7a1ad5`](https://github.com/apache/spark/commit/d7a1ad57bda7600b3d16c82823617aed3354b061).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604792486
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r397623831
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -112,6 +114,7 @@ public TransportClientFactory(
     }
     this.metrics = new NettyMemoryMetrics(
       this.pooledAllocator, conf.getModuleName() + "-client", conf);
+    fastFailTimeWindow = conf.ioRetryWaitTimeMs() * 0.95;
 
 Review comment:
   thanks



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604538168
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] turboFei edited a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei edited a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-602625438
 
 
   Thanks for the reply @tgravescs 
   Sorry for the unclear description. 
   
   By `All connections` above, I meant the requests that are sent to connect to the same unreachable address.
   
   
   It was my mistake not to recognize that there may be several clients for the same address; maybe we need to keep a lastConnectionFailedTime variable per `clientPool`.
   
   The problem is that a single task may issue several connection requests to the same address.
   In particular, for a shuffle read task with only one client in the client pool, that client is always picked by every connection that wants to reach the same ESS.
   If this address is unreachable, these connections block each other (during createClient).
   These connections belong to the same task and target the same ESS, so if the ESS stays unreachable,
   it costs connectionNum \* connectionTimeOut \* maxRetry of retrying before the task fails.
   Ideally, the task would fail within connectionTimeOut \* maxRetry.
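
The per-address bookkeeping discussed in this comment can be sketched roughly as follows. This is only an illustration and not the code in this PR: the class `FastFailGate`, its method names, and the injectable clock are hypothetical; only `lastConnectionFailedTime` and the idea of a fast-fail time window come from the discussion.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the TransportClientFactory code from the PR.
class FastFailGate {
  private final long fastFailTimeWindowMs;
  private final LongSupplier clockMs;  // injectable clock so the logic can be tested
  // One failure timestamp per remote address (for example "host:port").
  private final ConcurrentHashMap<String, Long> lastConnectionFailedTime =
      new ConcurrentHashMap<>();

  FastFailGate(long fastFailTimeWindowMs, LongSupplier clockMs) {
    this.fastFailTimeWindowMs = fastFailTimeWindowMs;
    this.clockMs = clockMs;
  }

  // Remember when connecting to this address last failed.
  void recordFailure(String address) {
    lastConnectionFailedTime.put(address, clockMs.getAsLong());
  }

  // A successful connection clears the failure record.
  void recordSuccess(String address) {
    lastConnectionFailedTime.remove(address);
  }

  // True if the previous attempt failed within the window, so the caller
  // can fail immediately instead of waiting for another connection timeout.
  boolean shouldFastFail(String address) {
    Long failedAt = lastConnectionFailedTime.get(address);
    return failedAt != null && clockMs.getAsLong() - failedAt < fastFailTimeWindowMs;
  }
}
```

A caller would check `shouldFastFail` before blocking in connection setup, so requests queued behind a failed attempt fail right away rather than each waiting out its own timeout.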
   
   
   
   
   
   
    
   



[GitHub] [spark] Ngone51 commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603733697
 
 
   retest this please



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r398562861
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    server.close();
+    int unreachablePort = server.getPort();
 
 Review comment:
   I think it is not necessary.
   I have checked the code: the port is only assigned during init().



[GitHub] [spark] tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r398560833
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    server.close();
+    int unreachablePort = server.getPort();
 
 Review comment:
   It seems safer to get the port before close(), just in case the close call cleans it up or makes it inaccessible.
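
The ordering suggested here can be illustrated with a plain `java.net.ServerSocket` instead of Spark's `TransportServer`. This is a minimal sketch under that substitution; the class `UnreachablePort` and the method `pickClosedPort` are hypothetical names and are not part of the test suite under review.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of TransportClientFactorySuite.
class UnreachablePort {
  // Binds an ephemeral port, reads the number while the socket is still
  // open, then closes the socket so nothing is listening there afterwards
  // (until some other process happens to reuse the port).
  static int pickClosedPort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();  // evaluated before close() runs
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Reading the port first means the result does not depend on what `close()` does to the object's internal state.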



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604257195
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603157520
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607047773
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120661/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604245789
 
 
   **[Test build #120389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120389/testReport)** for PR 27943 at commit [`3ef1612`](https://github.com/apache/spark/commit/3ef16128e6ea095faa3a0cdabccfc7bc66dc7f7c).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604311763
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600432565
 
 
   I think it may happen in the cases below:
   - NM GC
   - NM crash
   - temporary network issue



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600578225
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24707/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604265614
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600577988
 
 
   **[Test build #119986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119986/testReport)** for PR 27943 at commit [`73d2a63`](https://github.com/apache/spark/commit/73d2a63fddc1c6efe4f797514eb4a1711f9fd175).
    * This patch **fails build dependency tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603656641
 
 
   **[Test build #120312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120312/testReport)** for PR 27943 at commit [`dd126cf`](https://github.com/apache/spark/commit/dd126cf99d024923178780431981a172e119d0cb).



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603597247
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604542407
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r398060555
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -86,6 +87,7 @@
   private EventLoopGroup workerGroup;
   private final PooledByteBufAllocator pooledAllocator;
   private final NettyMemoryMetrics metrics;
+  private final double fastFailTimeWindow;
 
 Review comment:
   We don't really need this as a double; just cast to an int. ioRetryWaitTimeMs returns an int.
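
A small sketch of the suggested cast, assuming the window is derived by scaling `ioRetryWaitTimeMs` as the comment implies. The class `FastFailWindow`, the method `fromRetryWaitMs`, and the `WINDOW_FRACTION` value are hypothetical and chosen only for illustration; the actual factor used in the PR is not shown in this thread.

```java
// Hypothetical sketch; WINDOW_FRACTION is an illustrative value, not taken from the PR.
class FastFailWindow {
  static final double WINDOW_FRACTION = 0.5;

  // Scale the int retry wait and truncate back to int, so the field
  // holding the result can be declared as int rather than double.
  static int fromRetryWaitMs(int ioRetryWaitTimeMs) {
    return (int) (ioRetryWaitTimeMs * WINDOW_FRACTION);
  }
}
```

The cast truncates toward zero, which only shortens the window by less than one millisecond.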



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604396879
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120412/
   Test PASSed.



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603876706
 
 
   Thanks, I have updated the description.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607727689
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25416/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604542421
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25139/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605116074
 
 
   **[Test build #120505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120505/testReport)** for PR 27943 at commit [`49c8e13`](https://github.com/apache/spark/commit/49c8e13d2c351103231b0185d793baeff56f2771).



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605287329
 
 
   **[Test build #120505 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120505/testReport)** for PR 27943 at commit [`49c8e13`](https://github.com/apache/spark/commit/49c8e13d2c351103231b0185d793baeff56f2771).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603168872
 
 
   **[Test build #120269 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120269/testReport)** for PR 27943 at commit [`02537b3`](https://github.com/apache/spark/commit/02537b31c996d4427f6fa51dabfcfe6555014a25).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604198439
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25098/
   Test PASSed.



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604828095
 
 
   Thank you for your patient review, thanks a lot @tgravescs  



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604256784
 
 
   **[Test build #120402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120402/testReport)** for PR 27943 at commit [`8784906`](https://github.com/apache/spark/commit/8784906c927936a9f062c3b29efc7d5b36e608d7).



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604265614
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603169559
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24980/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604541562
 
 
   **[Test build #120431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120431/testReport)** for PR 27943 at commit [`7af60c8`](https://github.com/apache/spark/commit/7af60c8171d8bf95fa07b3c101c7cd888e595643).



[GitHub] [spark] tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r398062878
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,7 +195,18 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      if (System.currentTimeMillis() - clientPool.lastConnectionFailed < fastFailTimeWindow) {
+        throw new IOException(
+          String.format("Connecting to %s failed in the last %s ms, fail this connection directly",
+            resolvedAddress, fastFailTimeWindow));
+      }
+      try {
+        clientPool.clients[clientIndex] = createClient(resolvedAddress);
+        clientPool.lastConnectionFailed = 0;
 
 Review comment:
   Initialize this when `clientPool` is created above.
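
The check in the hunk above can be sketched in isolation. This is a minimal illustration, not Spark's code: the class `FastFailPoolSketch` and its method names are hypothetical, the clock value is passed in explicitly so the behavior can be exercised without sleeping, and `0` is assumed to be the "no failure recorded" sentinel that is set once at construction, as the comment suggests.

```java
import java.io.IOException;

/** Hypothetical stand-in for the per-address client pool; not Spark's actual class. */
final class FastFailPoolSketch {
  private final long fastFailTimeWindowMs;
  // 0 means "no failure recorded"; assigned once when the pool object is created.
  private volatile long lastConnectionFailedMs = 0L;

  FastFailPoolSketch(long fastFailTimeWindowMs) {
    this.fastFailTimeWindowMs = fastFailTimeWindowMs;
  }

  /** Throws immediately if a failure was recorded less than one window before nowMs. */
  void checkFastFail(String address, long nowMs) throws IOException {
    if (nowMs - lastConnectionFailedMs < fastFailTimeWindowMs) {
      throw new IOException(String.format(
          "Connecting to %s failed in the last %s ms, fail this connection directly",
          address, fastFailTimeWindowMs));
    }
  }

  /** Remember when the most recent connection attempt failed. */
  void recordFailure(long nowMs) {
    lastConnectionFailedMs = nowMs;
  }

  /** A successful connection clears the recorded failure. */
  void recordSuccess() {
    lastConnectionFailedMs = 0L;
  }
}
```

With wall-clock values such as those returned by `System.currentTimeMillis()`, the sentinel `0` is always far outside the window, so a freshly created pool never fast-fails.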



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607815600
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605098815
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604257199
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25111/
   Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402140496
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   How do we associate `expectedException` to this specific test case?



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604311262
 
 
   **[Test build #120412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120412/testReport)** for PR 27943 at commit [`8784906`](https://github.com/apache/spark/commit/8784906c927936a9f062c3b29efc7d5b36e608d7).



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604538168
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604200446
 
 
   **[Test build #120389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120389/testReport)** for PR 27943 at commit [`3ef1612`](https://github.com/apache/spark/commit/3ef16128e6ea095faa3a0cdabccfc7bc66dc7f7c).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603657152
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603598937
 
 
   **[Test build #120298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120298/testReport)** for PR 27943 at commit [`9ff5ca2`](https://github.com/apache/spark/commit/9ff5ca20be633c17ceeabbaad46693da001ed943).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605288444
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604453680
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600420568
 
 
   Are you saying we should fail fast for bad clients? When and how can a client go "bad"?



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402158870
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   https://github.com/junit-team/junit4/blob/435d41f0d45cfdbc1a38e1ad4eb1d5300da533f9/src/main/java/org/junit/Rule.java#L27-L42
   
   Here is an example; `folder` there is a **Rule**.
   As described in its comments, the rule creates a temporary folder before
   each test method and deletes it after each one.
   So I think a **Rule** is scoped to each individual test.
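
Another way to make the association explicit is to tie the expectation to the single call that should throw, which is what `Assert.assertThrows` does in JUnit 4.13 and later. The plain-Java helper below is only a sketch of that idea: `ExpectThrowsSketch`, `ThrowingCall`, and `expectThrows` are hypothetical names, not part of JUnit or Spark.

```java
import java.io.IOException;

final class ExpectThrowsSketch {
  /** A call that may throw a checked exception. */
  interface ThrowingCall {
    void run() throws Exception;
  }

  /**
   * Runs the call and returns the thrown exception if it has the expected type.
   * Fails with AssertionError if nothing is thrown or the type does not match.
   */
  static <T extends Throwable> T expectThrows(Class<T> type, ThrowingCall call) {
    try {
      call.run();
    } catch (Throwable t) {
      if (type.isInstance(t)) {
        return type.cast(t);
      }
      throw new AssertionError("Unexpected exception type: " + t.getClass().getName(), t);
    }
    throw new AssertionError("Expected " + type.getName() + " but nothing was thrown");
  }
}
```

In the test above, the second `createClient` call would be the lambda passed to such a helper, and the message check would run on the returned exception, so the first call could not satisfy the expectation by accident.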



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604437368
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25131/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600742375
 
 
   **[Test build #119998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119998/testReport)** for PR 27943 at commit [`fa266f2`](https://github.com/apache/spark/commit/fa266f2e70b2d53fc1207050629dac4f1dc23c23).



[GitHub] [spark] tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r401664640
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -121,18 +126,25 @@ public MetricSet getAllMetrics() {
   /**
    * Create a {@link TransportClient} connecting to the given remote host / port.
    *
-   * We maintains an array of clients (size determined by spark.shuffle.io.numConnectionsPerPeer)
+   * We maintain an array of clients (size determined by spark.shuffle.io.numConnectionsPerPeer)
    * and randomly picks one to use. If no client was previously created in the randomly selected
    * spot, this function creates a new client and places it there.
    *
+   * We also maintain a last connection failed time of these clients and a fast fail time window
+   * based on io retry wait time. If this connection request can be retried and the last connection
+   * failed time of these clients in the fast fail time window, fail this connection directly.
+   *
    * Prior to the creation of a new TransportClient, we will execute all
    * {@link TransportClientBootstrap}s that are registered with this factory.
    *
    * This blocks until a connection is successfully established and fully bootstrapped.
    *
    * Concurrency: This method is safe to call from multiple threads.
+   *
+   * @param fastFail whether this connection should fast fail when the last connection of these
 
 Review comment:
   Can you add the other 2 parameters here?
   
   Can you also clarify the text here a bit:
   
   whether this call should fail immediately when the last attempt to the same address failed within the last fast fail time window.



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604403094
 
 
   thanks @HeartSaVioR 
   
   test passed cc @Ngone51 @tgravescs @jiangxb1987  @cloud-fan 



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-601027932
 
 
   How about this: if the last connection failed within the last retry IO wait, the new connection attempt would be aborted, but its retry count would not increase.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605102139
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120502/
   Test FAILed.



[GitHub] [spark] tgravescs commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-601202011
 
 
   > Currently we just run and timeout 3 times, and this PR proposes to fail fast.
   
   We should not be failing without retrying. Is that really what this does? I'd have to take a closer look, but I thought the RetryingBlockFetcher caught this and did its normal retries within it. That was my question yesterday; can you confirm?
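
The retry behavior being asked about can be sketched as a loop in which any `IOException` from a connection attempt, including a fast-fail one, is caught and retried after a wait. This is a simplified, synchronous illustration under assumed names (`RetrySketch`, `Connector`, `connectWithRetries`); it is not the actual `RetryingBlockFetcher`.

```java
import java.io.IOException;

final class RetrySketch {
  /** Hypothetical connection attempt that may fail with an IOException. */
  interface Connector {
    void connect() throws IOException;
  }

  /**
   * Tries to connect, retrying up to maxRetries times after the first attempt.
   * Returns the number of attempts made; rethrows the last IOException if all fail.
   */
  static int connectWithRetries(Connector connector, int maxRetries, long retryWaitMs)
      throws IOException, InterruptedException {
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        connector.connect();
        return attempts;
      } catch (IOException e) {
        // A fast-fail exception is handled like any other connection failure here.
        if (attempts > maxRetries) {
          throw e;
        }
        Thread.sleep(retryWaitMs);
      }
    }
  }
}
```

In this sketch a fast-failed attempt still consumes one retry; whether it should is the question raised in this thread.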



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607815607
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120718/
   Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r394123402
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,7 +194,17 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      if (System.currentTimeMillis() - clientPool.lastConnectionFailed[clientIndex] < conf.ioRetryWaitTimeMs()) {
+        throw new IOException(
+          String.format("Connecting to %s failed in the last (%s ms), fail this connection directly",
 
 Review comment:
   nit: `in the last %s ms`



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605073646
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605102098
 
 
   **[Test build #120502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120502/testReport)** for PR 27943 at commit [`8942ae5`](https://github.com/apache/spark/commit/8942ae5304e8ea645625ef7d4f994fc7b47c9e57).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] turboFei removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605450821
 
 
   cc @travishegner 



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600392680
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603156824
 
 
   **[Test build #120266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120266/testReport)** for PR 27943 at commit [`62eedae`](https://github.com/apache/spark/commit/62eedaed23632104c566f7b8588132d7e5b0233d).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603640018
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120298/
   Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402163164
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   yea, let's reset to `ExpectedException.none()`.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603676936
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120312/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603838724
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120333/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603597250
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25010/
   Test PASSed.



[GitHub] [spark] tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r396608339
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,7 +194,18 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      if (System.currentTimeMillis() - clientPool.lastConnectionFailed[clientIndex]
+        < conf.ioRetryWaitTimeMs()) {
+        throw new IOException(
+          String.format("Connecting to %s failed in the last %s ms, fail this connection directly",
+            resolvedAddress, conf.ioRetryWaitTimeMs()));
+      }
+      try {
+        clientPool.clients[clientIndex] = createClient(resolvedAddress);
 
 Review comment:
   Never mind, the `< conf.ioRetryWaitTimeMs()` check already bounds the time window.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605217946
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120498/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603290799
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120269/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604538183
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120426/
   Test FAILed.



[GitHub] [spark] Ngone51 commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r394339400
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,7 +194,18 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      if (System.currentTimeMillis() - clientPool.lastConnectionFailed[clientIndex]
+        < conf.ioRetryWaitTimeMs()) {
+        throw new IOException(
+          String.format("Connecting to %s failed in the last %s ms, fail this connection directly",
+            resolvedAddress, conf.ioRetryWaitTimeMs()));
+      }
+      try {
+        clientPool.clients[clientIndex] = createClient(resolvedAddress);
 
 Review comment:
   Shall we reset the waiting time if the connection succeeds?
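
   The fast-fail window in the hunk above can be sketched in isolation. This is a minimal illustration, not the code in the PR: `FastFailWindow`, its `Connector` interface, and the injected clock are hypothetical names introduced here, and the window length stands in for `conf.ioRetryWaitTimeMs()`. The sketch records only the time of the last failure and does not reset anything on success, on the assumption that an old timestamp stops mattering once the window has elapsed.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

// Hypothetical helper (not part of Spark): remembers when the last connection
// attempt to one address failed and rejects new attempts inside the window.
class FastFailWindow {
  // Hypothetical stand-in for whatever actually opens the connection.
  interface Connector<T> {
    T connect() throws IOException;
  }

  private final long windowMs;      // plays the role of conf.ioRetryWaitTimeMs()
  private final LongSupplier clock; // injected so the window can be tested without sleeping
  private long lastFailedAtMs = -1; // -1 means no failure has been recorded yet

  FastFailWindow(long windowMs, LongSupplier clock) {
    this.windowMs = windowMs;
    this.clock = clock;
  }

  synchronized <T> T connect(String address, Connector<T> connector) throws IOException {
    long now = clock.getAsLong();
    // Fail immediately if the previous attempt failed less than windowMs ago.
    if (lastFailedAtMs >= 0 && now - lastFailedAtMs < windowMs) {
      throw new IOException(String.format(
        "Connecting to %s failed in the last %s ms, fail this connection directly",
        address, windowMs));
    }
    try {
      // No reset on success: a stale timestamp falls out of the window on its own.
      return connector.connect();
    } catch (IOException e) {
      // Only real connection failures start (or restart) the window.
      lastFailedAtMs = clock.getAsLong();
      throw e;
    }
  }
}
```

   With a 5000 ms window, a failure at t=1000 makes an attempt at t=3000 fail directly, while an attempt at t=6000 goes through to the connector again.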



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605072770
 
 
   **[Test build #120498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120498/testReport)** for PR 27943 at commit [`0b5dfe1`](https://github.com/apache/spark/commit/0b5dfe1154ed5f75393fa9e502a5e6180db72d0b).



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604553306
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604541562
 
 
   **[Test build #120431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120431/testReport)** for PR 27943 at commit [`7af60c8`](https://github.com/apache/spark/commit/7af60c8171d8bf95fa07b3c101c7cd888e595643).



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600584777
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402090205
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   Also, AFAIK JUnit has `assertThrows`, which can check both the exception and its error message.
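
   The pattern suggested here can be sketched without a test framework. JUnit 4.13 and later provide `Assert.assertThrows(Class, ThrowingRunnable)`, which returns the caught exception; the block below uses a hand-written stand-in named `expectThrows` so that it runs on its own. `AssertThrowsSketch`, `expectThrows`, and `connectInsideWindow` are hypothetical names, and `connectInsideWindow` only imitates the second `createClient` call by throwing the message from the diff.

```java
import java.io.IOException;

// Hypothetical, dependency-free illustration of the assertThrows pattern.
class AssertThrowsSketch {
  interface ThrowingRunnable {
    void run() throws Throwable;
  }

  // Stand-in for JUnit's assertThrows: runs the body, returns the exception if it
  // has the expected type, and fails otherwise.
  static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable body) {
    try {
      body.run();
    } catch (Throwable t) {
      if (expected.isInstance(t)) {
        return expected.cast(t);
      }
      throw new AssertionError("unexpected exception type: " + t.getClass().getName(), t);
    }
    throw new AssertionError("expected " + expected.getName() + " but nothing was thrown");
  }

  // Imitates a connection attempt made inside the fast-fail window.
  static void connectInsideWindow() throws IOException {
    throw new IOException(
      "Connecting to localhost failed in the last 5000 ms, fail this connection directly");
  }

  // Checks both the exception type and the message, with no shared @Rule state.
  static String checkFastFailMessage() {
    IOException e = expectThrows(IOException.class, AssertThrowsSketch::connectInsideWindow);
    if (!e.getMessage().contains("fail this connection directly")) {
      throw new AssertionError("unexpected message: " + e.getMessage());
    }
    return e.getMessage();
  }
}
```

   Because the expectation is scoped to a single call, nothing is left on a shared rule field afterwards, so the question of resetting `expectedException` would not arise with this style.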



[GitHub] [spark] cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402089187
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   Shall we reset the `expectedException` after this test case completes?



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607727680
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604529615
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120427/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604528998
 
 
   **[Test build #120427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120427/testReport)** for PR 27943 at commit [`8e354da`](https://github.com/apache/spark/commit/8e354dac46f28275de789f4ea1b171f204f330f2).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605073656
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25204/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603733919
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25044/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605116673
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25211/
   Test PASSed.



[GitHub] [spark] Ngone51 commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r397618892
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -112,6 +114,7 @@ public TransportClientFactory(
     }
     this.metrics = new NettyMemoryMetrics(
       this.pooledAllocator, conf.getModuleName() + "-client", conf);
+    fastFailTimeWindow = conf.ioRetryWaitTimeMs() * 0.95;
 
 Review comment:
   I just have another idea to make the code a bit simpler:
   
   ```suggestion
       fastFailTimeWindow = conf.maxIORetries() > 0 ? conf.ioRetryWaitTimeMs() * 0.95 : 0;
   ```
   
   Then we can simplify the `if` condition below and call `conf.maxIORetries()` only once.
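
A minimal sketch of why a zero window simplifies the later check. The class and method names are hypothetical, and the `(int)` cast assumes the window is stored as whole milliseconds, which the thread does not confirm. With a window of 0 the comparison can never be true, so no separate `maxIORetries() > 0` test is needed at the call site.

```java
// Sketch only: the class and method names here are hypothetical.
final class FastFailWindowSketch {

  /** Window in ms; 0 when retries are disabled, which switches fast fail off. */
  static int fastFailTimeWindow(int maxIORetries, int ioRetryWaitTimeMs) {
    // The (int) cast is an assumption about the field type.
    return maxIORetries > 0 ? (int) (ioRetryWaitTimeMs * 0.95) : 0;
  }

  /** True when the previous failure is recent enough to skip a new attempt. */
  static boolean shouldFastFail(long nowMs, long lastConnectionFailedMs, int windowMs) {
    // With windowMs == 0 this is never true for nowMs >= lastConnectionFailedMs.
    return nowMs - lastConnectionFailedMs < windowMs;
  }

  public static void main(String[] args) {
    long lastFailed = 1_000_000L;
    int enabled = fastFailTimeWindow(3, 5000);
    int disabled = fastFailTimeWindow(0, 5000);
    System.out.println(shouldFastFail(lastFailed + 1000, lastFailed, enabled));
    System.out.println(shouldFastFail(lastFailed + 1000, lastFailed, disabled));
  }
}
```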



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604265621
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120402/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607047767
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603736731
 
 
   **[Test build #120333 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120333/testReport)** for PR 27943 at commit [`dd126cf`](https://github.com/apache/spark/commit/dd126cf99d024923178780431981a172e119d0cb).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604553321
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120423/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603733912
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604827078
 
 
   **[Test build #120448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120448/testReport)** for PR 27943 at commit [`66adc09`](https://github.com/apache/spark/commit/66adc0983a5011b077096e5f9e62c04844793bde).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605102126
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r398562861
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    server.close();
+    int unreachablePort = server.getPort();
 
 Review comment:
   I think it is not necessary.
   I have checked the code; the port is only set during init().



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607727006
 
 
   **[Test build #120718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120718/testReport)** for PR 27943 at commit [`f9be82e`](https://github.com/apache/spark/commit/f9be82e249088ae083e2ac84b94be8e69bf50304).



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607323072
 
 
   **[Test build #120682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120682/testReport)** for PR 27943 at commit [`9739b8a`](https://github.com/apache/spark/commit/9739b8a299cb0e58653018e9d570a4b7e9f5058a).



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603282084
 
 
   **[Test build #120266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120266/testReport)** for PR 27943 at commit [`62eedae`](https://github.com/apache/spark/commit/62eedaed23632104c566f7b8588132d7e5b0233d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600577997
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] turboFei edited a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei edited a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600432565
 
 
   I think it may happen in the following cases:
   - nm GC
   - nm crash
   - temporary network issue



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600578218
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604200446
 
 
   **[Test build #120389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120389/testReport)** for PR 27943 at commit [`3ef1612`](https://github.com/apache/spark/commit/3ef16128e6ea095faa3a0cdabccfc7bc66dc7f7c).



[GitHub] [spark] cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402088951
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,11 +206,30 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      // If this connection should fast fail when last connection failed in last fast fail time
+      // window and it did, fail this connection directly.
+      if (fastFail && System.currentTimeMillis() - clientPool.lastConnectionFailed <
+        fastFailTimeWindow) {
+        throw new IOException(
+          String.format("Connecting to %s failed in the last %s ms, fail this connection directly",
+            resolvedAddress, fastFailTimeWindow));
+      }
+      try {
+        clientPool.clients[clientIndex] = createClient(resolvedAddress);
+        clientPool.lastConnectionFailed = 0;
+      } catch (IOException e) {
+        clientPool.lastConnectionFailed = System.currentTimeMillis();
+        throw e;
+      }
       return clientPool.clients[clientIndex];
     }
   }
 
+  public TransportClient createClient(String remoteHost, int remotePort)
 
 Review comment:
   also add `@VisibleForTesting`?
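
For readers following the hunk above, here is a standalone sketch of the same fast-fail pattern with an injectable clock, so the window can be exercised without sleeping. `FastFailConnector` and `Dialer` are hypothetical stand-ins, not Spark classes. Only the field names `lastConnectionFailed` and `fastFailTimeWindow` and the exception message come from the diff.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

// Sketch only: FastFailConnector and Dialer are hypothetical stand-ins for the
// factory and its per-address client pool; they are not Spark classes.
final class FastFailConnector {

  /** One connection attempt; throws IOException when the peer is unreachable. */
  interface Dialer {
    void dial() throws IOException;
  }

  private final long fastFailTimeWindow;  // milliseconds
  private final LongSupplier clock;       // injectable so tests need not sleep
  private long lastConnectionFailed = 0;  // 0 means "no recent failure"

  FastFailConnector(long fastFailTimeWindow, LongSupplier clock) {
    this.fastFailTimeWindow = fastFailTimeWindow;
    this.clock = clock;
  }

  synchronized void connect(Dialer dialer, boolean fastFail) throws IOException {
    // Skip the attempt entirely if the previous one failed inside the window.
    if (fastFail && clock.getAsLong() - lastConnectionFailed < fastFailTimeWindow) {
      throw new IOException(String.format(
          "Connecting failed in the last %s ms, fail this connection directly",
          fastFailTimeWindow));
    }
    try {
      dialer.dial();
      lastConnectionFailed = 0;  // success clears the failure marker
    } catch (IOException e) {
      lastConnectionFailed = clock.getAsLong();  // remember when we failed
      throw e;
    }
  }

  public static void main(String[] args) {
    FastFailConnector connector = new FastFailConnector(4750, System::currentTimeMillis);
    Dialer refused = () -> { throw new IOException("connection refused"); };
    for (int i = 0; i < 2; i++) {
      try {
        connector.connect(refused, true);
      } catch (IOException e) {
        System.out.println(e.getMessage());
      }
    }
  }
}
```

The clock must return wall-clock-sized values (as `System.currentTimeMillis()` does). A fake clock that starts near 0 would make the very first call look like it falls inside the window, because 0 doubles as the "no failure" marker.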



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600738366
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24718/
   Test PASSed.



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-601209554
 
 
   Just attaching the example mentioned in the description.
   > For example, there are two connection requests, rc1 and rc2.
   In particular, io.numConnectionsPerPeer is 1 and the connection timeout is 2 minutes.
   1: rc1 holds the client lock and times out after 2 minutes.
   2: rc2 holds the client lock and times out after 2 minutes.
   3: rc1 starts its second retry, holds the lock, and times out after 2 minutes.
   4: rc2 starts its second retry, holds the lock, and times out after 2 minutes.
   5: rc1 starts its third retry, holds the lock, and times out after 2 minutes.
   6: rc2 starts its third retry, holds the lock, and times out after 2 minutes.
   This wastes a lot of time.
   
   The concern is that, in some cases, these connection requests block each other.
   If rc1's connection times out, we fast *break* the first retry of rc2 but don't increase rc2's retry count.
   Then rc1 waits for one IO retry wait, starts its second retry, and times out again.
   Then we fast break rc2 again and don't increase its retry count.
   Then rc1 waits for another IO retry wait, starts its third retry, times out, and throws a fetch failed exception.
   
   
   I think this is better than having the connection requests block each other.
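
A rough way to put numbers on the example above. This is a back-of-the-envelope model under stated assumptions, not a measurement. The 2-minute timeout and the three attempts per request come from the example. The 5-second retry wait and all names in the sketch are assumptions. The blocked case ignores retry waits, as the example does.

```java
// Sketch only: a back-of-the-envelope model with hypothetical names.
final class RetryCostSketch {

  /** Every attempt serializes on the client lock, so all timeouts add up. */
  static long blockedMs(int requests, int attemptsPerRequest, long connectTimeoutMs) {
    return (long) requests * attemptsPerRequest * connectTimeoutMs;
  }

  /** Only one request pays the timeouts; it also sleeps between its attempts. */
  static long fastFailMs(int attempts, long connectTimeoutMs, long retryWaitMs) {
    return attempts * connectTimeoutMs + (attempts - 1) * retryWaitMs;
  }

  public static void main(String[] args) {
    long timeoutMs = 2 * 60 * 1000L;  // 2 minutes, from the example
    long retryWaitMs = 5_000L;        // assumed value, not stated in the thread
    System.out.println("blocked:   " + blockedMs(2, 3, timeoutMs) + " ms");
    System.out.println("fast fail: " + fastFailMs(3, timeoutMs, retryWaitMs) + " ms");
  }
}
```

Under these assumptions the blocked case takes 12 minutes before both requests give up, while the fast-fail case reaches rc1's fetch failure after three timeouts plus two retry waits.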
   
   
   



[GitHub] [spark] Ngone51 commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-602501688
 
 
   >  >Currently we just run and timeout 3 times, and this PR proposes to fail fast.
   >
   > We should not be failing without retrying. Is that really what this does? I'd have to take a closer look but I thought the RetryingBlockFetcher caught this and did its normal retries within it, but that was my question yesterday to confirm?
   
   @cloud-fan @tgravescs IIUC, this PR only fails fast for a single connection attempt, but it will still retry if it's a `RetryingBlockFetcher`.



[GitHub] [spark] cloud-fan commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600575484
 
 
   ok to test



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604827683
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120448/
   Test PASSed.



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402167657
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   done.



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-606998340
 
 
   **[Test build #120661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120661/testReport)** for PR 27943 at commit [`fe8c08c`](https://github.com/apache/spark/commit/fe8c08c8aabefa842c83a7173c5cf02dfe60f9b3).



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600578013
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119986/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604311772
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25120/
   Test PASSed.



[GitHub] [spark] asfgit closed pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943
 
 
   



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402107095
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   It seems we do not need to reset the expectedException: it is unique to each test, and there is no reset method anyway.
   
   Yes, JUnit 5 provides the assertThrows API, but we are on JUnit 4.
   JUnit 4 only provides `@Test(expected = Exception.class)`, which is not suitable for this test.
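
To illustrate the constraint discussed above: the test makes two calls that both throw IOException, and only the second should carry the fast-fail message, so each call needs its own check. A minimal sketch of a hand-rolled helper that would allow this without JUnit 5 follows. `expectThrows` and `ThrowingRunnable` are hypothetical names introduced here for illustration, not part of Spark or of the JUnit version in use, and the lambdas are stand-ins for the two `createClient` calls.

```java
import java.io.IOException;

/** Hypothetical sketch; not part of Spark or JUnit. */
final class ExpectThrowsSketch {
  /** A body that may throw a checked exception. */
  interface ThrowingRunnable {
    void run() throws Exception;
  }

  /**
   * Runs the body and returns the exception it threw. Fails with an
   * AssertionError if nothing was thrown or the type does not match.
   */
  static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable body) {
    try {
      body.run();
    } catch (Throwable t) {
      if (expected.isInstance(t)) {
        return expected.cast(t);
      }
      throw new AssertionError("Expected " + expected.getName() + " but got " + t, t);
    }
    throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
  }

  public static void main(String[] args) {
    // Stand-in for the first createClient call: a real connection failure.
    IOException first = expectThrows(IOException.class, () -> {
      throw new IOException("Connection refused");
    });
    // Stand-in for the second createClient call: rejected by the fast-fail check.
    IOException second = expectThrows(IOException.class, () -> {
      throw new IOException("fail this connection directly");
    });
    System.out.println("first:  " + first.getMessage());
    System.out.println("second: " + second.getMessage());
  }
}
```

Unlike a bare try/catch around the first call, this form also fails when the call unexpectedly succeeds, and it lets the message of each exception be checked separately.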



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607046965
 
 
   **[Test build #120661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120661/testReport)** for PR 27943 at commit [`fe8c08c`](https://github.com/apache/spark/commit/fe8c08c8aabefa842c83a7173c5cf02dfe60f9b3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604792486
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604246164
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120389/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-606998709
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25360/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600392427
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402155245
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   https://github.com/junit-team/junit4/blob/435d41f0d45cfdbc1a38e1ad4eb1d5300da533f9/src/main/java/org/junit/Rule.java#L27-L30
   
   Here is the Javadoc comment: the rule applies to each \@Test separately.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603657152
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607815600
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402158870
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   https://github.com/junit-team/junit4/blob/435d41f0d45cfdbc1a38e1ad4eb1d5300da533f9/src/main/java/org/junit/Rule.java#L27-L42
   
   Here is an example: this folder is a rule.
   As described in the Javadoc, it creates a temporary folder before
   each test method and deletes it after each one.
   So I think an \@Rule is unique to each test.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607401448
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r401667407
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala
 ##########
 @@ -119,7 +119,7 @@ private[spark] class NettyBlockTransferService(
         override def createAndStart(blockIds: Array[String],
             listener: BlockFetchingListener): Unit = {
           try {
-            val client = clientFactory.createClient(host, port)
+            val client = clientFactory.createClient(host, port, true)
 
 Review comment:
   This is passed as true all the time, even when maxRetries is 0. Change the argument to be based on maxRetries:
   if (maxRetries > 0) true else false
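
A minimal sketch of the suggestion above, written in Java for consistency with the rest of the change. The helper name `shouldFastFail` is hypothetical and does not come from the Spark code base. The rationale in the comment is an assumption about the intent: presumably, with no retries there is no later attempt, so the only attempt should really try to connect.

```java
/** Hypothetical helper; the name is not taken from the Spark code base. */
final class FastFailFlagSketch {
  /**
   * Requests fast fail only when the caller will retry (assumed intent).
   * With maxRetries == 0 the single attempt should really try to connect.
   */
  static boolean shouldFastFail(int maxRetries) {
    // Same result as the suggested `if (maxRetries > 0) true else false`.
    return maxRetries > 0;
  }

  public static void main(String[] args) {
    for (int maxRetries : new int[] {0, 1, 3}) {
      System.out.println(
          "maxRetries=" + maxRetries + " -> fastFail=" + shouldFastFail(maxRetries));
    }
  }
}
```

The call site would then pass the result of this check as the third argument of `createClient` instead of the literal `true`.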



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604827683
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120448/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607319249
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600661759
 
 
   **[Test build #119989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119989/testReport)** for PR 27943 at commit [`5d40e8d`](https://github.com/apache/spark/commit/5d40e8db3bc578944c9cae023873ce5eb364e2b4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605116673
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25211/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605098815
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r401327864
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -121,18 +126,22 @@ public MetricSet getAllMetrics() {
   /**
    * Create a {@link TransportClient} connecting to the given remote host / port.
    *
-   * We maintains an array of clients (size determined by spark.shuffle.io.numConnectionsPerPeer)
+   * We maintain an array of clients (size determined by spark.shuffle.io.numConnectionsPerPeer)
    * and randomly picks one to use. If no client was previously created in the randomly selected
    * spot, this function creates a new client and places it there.
    *
+   * We also maintain a last connection failed time of these clients and a fast fail time window
+   * based on io retry wait time. If this connection request can be retried and the last connection
+   * failed time of these clients in the fast fail time window, fail this connection directly.
+   *
    * Prior to the creation of a new TransportClient, we will execute all
    * {@link TransportClientBootstrap}s that are registered with this factory.
    *
    * This blocks until a connection is successfully established and fully bootstrapped.
    *
    * Concurrency: This method is safe to call from multiple threads.
    */
-  public TransportClient createClient(String remoteHost, int remotePort)
+  public TransportClient createClient(String remoteHost, int remotePort, boolean withRetry)
 
 Review comment:
   Thanks. I have renamed this parameter to `fastFail` and added Javadoc and comments.
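
The fast-fail window described in the Javadoc of the diff above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the pull request: the class and method names are hypothetical, the window length is passed in directly (the diff only says it is based on the IO retry wait time), and the current time is supplied by the caller so the behaviour is deterministic.

```java
import java.io.IOException;

/** Hypothetical sketch of a per-peer fast-fail window; not the actual Spark class. */
final class FastFailWindowSketch {
  // Assumed to be derived from the IO retry wait time; the derivation is not shown here.
  private final long windowMs;
  // Time of the most recent failed connection attempt, or -1 if none was recorded.
  private volatile long lastFailedMs = -1L;

  FastFailWindowSketch(long windowMs) {
    this.windowMs = windowMs;
  }

  /** Records that a real connection attempt failed at the given time. */
  void recordFailure(long nowMs) {
    lastFailedMs = nowMs;
  }

  /** True when a failure was recorded less than windowMs before nowMs. */
  boolean inWindow(long nowMs) {
    long last = lastFailedMs;
    return last >= 0 && nowMs - last < windowMs;
  }

  /** Rejects the request immediately if fast fail is requested inside the window. */
  void checkBeforeConnect(boolean fastFail, long nowMs) throws IOException {
    if (fastFail && inWindow(nowMs)) {
      throw new IOException("Last connection failed " + (nowMs - lastFailedMs)
          + " ms ago, fail this connection directly");
    }
  }

  public static void main(String[] args) throws IOException {
    FastFailWindowSketch window = new FastFailWindowSketch(5000L);
    window.checkBeforeConnect(true, 0L);      // no earlier failure: proceeds
    window.recordFailure(1000L);              // a real connect attempt failed
    window.checkBeforeConnect(false, 2000L);  // fast fail not requested: proceeds
    window.checkBeforeConnect(true, 7000L);   // window has elapsed: proceeds
    try {
      window.checkBeforeConnect(true, 2000L); // inside the window: rejected
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Callers that pass `fastFail = false` always attempt a real connection, so the check affects only requests that opted in.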



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604257199
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25111/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605288452
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120505/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605116655
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604257195
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603168872
 
 
   **[Test build #120269 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120269/testReport)** for PR 27943 at commit [`02537b3`](https://github.com/apache/spark/commit/02537b31c996d4427f6fa51dabfcfe6555014a25).



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603290799
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120269/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603290788
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607319249
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600451872
 
 
   OK to test



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604792489
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25156/
   Test PASSed.



[GitHub] [spark] tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r398585034
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -112,6 +116,7 @@ public TransportClientFactory(
     }
     this.metrics = new NettyMemoryMetrics(
       this.pooledAllocator, conf.getModuleName() + "-client", conf);
+    fastFailTimeWindow = conf.maxIORetries() > 0 ? (int)(conf.ioRetryWaitTimeMs() * 0.95) : 0;
 
 Review comment:
   I think you may have misunderstood my comment. My concern is really about the callers that don't go through RetryingBlockFetcher and don't use maxIORetries, such as code that just sends RPC messages.
   For instance, the external block client fetching the local host dirs, or any other RPC message. I guess the RPC messages sent through the outbox all cache the client, so that isn't as much of a concern. Looking some more, those cases look pretty limited, so it should be OK.
   Also, this doesn't really disable fast fail when max retries is 0, because two attempts could still theoretically land in the same millisecond, and then the second would fail fast. How about setting it to -1 in that case?
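
A minimal sketch of the -1 suggestion above, not the code in the patch. The class name `FastFailWindowSketch`, both method names, and the inclusive `<=` comparison are assumptions made for illustration; only the window expression mirrors the diff line under review.

```java
// Hypothetical sketch: why a window of 0 does not disable fast fail,
// while a window of -1 does. Not taken from TransportClientFactory.
final class FastFailWindowSketch {

  // Mirrors the expression in the diff, but returns -1 instead of 0
  // when retries are disabled.
  static int fastFailTimeWindow(int maxIORetries, int ioRetryWaitTimeMs) {
    return maxIORetries > 0 ? (int) (ioRetryWaitTimeMs * 0.95) : -1;
  }

  // Assumed check: fail fast if the last failure is within the window.
  // The inclusive comparison is an assumption about the patch.
  static boolean shouldFastFail(long nowMs, long lastFailedMs, int windowMs) {
    return nowMs - lastFailedMs <= windowMs;
  }

  public static void main(String[] args) {
    long t = 1_000L;
    // Two attempts in the same millisecond with a window of 0.
    System.out.println(shouldFastFail(t, t, 0));                           // prints "true"
    // Same attempts with the window disabled via -1.
    System.out.println(shouldFastFail(t, t, fastFailTimeWindow(0, 5000))); // prints "false"
  }
}
```

Since the elapsed time is never negative on a well-behaved clock, a window of -1 can never match, which is what makes it a safe "disabled" value under this assumed comparison.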



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607047773
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120661/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603837561
 
 
   **[Test build #120333 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120333/testReport)** for PR 27943 at commit [`dd126cf`](https://github.com/apache/spark/commit/dd126cf99d024923178780431981a172e119d0cb).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603736731
 
 
   **[Test build #120333 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120333/testReport)** for PR 27943 at commit [`dd126cf`](https://github.com/apache/spark/commit/dd126cf99d024923178780431981a172e119d0cb).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604542421
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25139/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600584788
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24710/
   Test PASSed.



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r394829873
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,7 +194,19 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      if (System.currentTimeMillis() - clientPool.lastConnectionFailed[clientIndex]
 
 Review comment:
   I think I get your point.
   Do you mean that we need to define a new exception type, so that when RetryingBlockFetcher catches this exception, the retry count does not increase?



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607815607
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120718/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604311763
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605288452
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120505/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605116074
 
 
   **[Test build #120505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120505/testReport)** for PR 27943 at commit [`49c8e13`](https://github.com/apache/spark/commit/49c8e13d2c351103231b0185d793baeff56f2771).



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605098041
 
 
   **[Test build #120502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120502/testReport)** for PR 27943 at commit [`8942ae5`](https://github.com/apache/spark/commit/8942ae5304e8ea645625ef7d4f994fc7b47c9e57).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603597250
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25010/
   Test PASSed.



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603730142
 
 
   The unit tests passed before; the latest test run was killed manually.
   cc @cloud-fan  @Ngone51 



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-602592168
 
 
   > > > Currently we just run and timeout 3 times, and this PR proposes to fail fast.
   > > 
   > > 
   > > We should not be failing without retrying. Is that really what this does? I'd have to take a closer look but I thought the RetryingBlockFetcher caught this and did its normal retries within it, but that was my question yesterday to confirm?
   > 
   > @cloud-fan @tgravescs IIUC, this PR only fail fast in a single one connection try but will still retry if it's a `RetryingBlockFetcher`.
   
   In fact, the current implementation in this patch would fast fail all connections.
   In the comments I only proposed a compromise that fast fails a single connection.
   
   I prefer to fast fail all connections to the unreachable ESS.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-606998700
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402107095
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   It seems that we do not need to reset the `expectedException`; it is unique to each test.
   
   Yes, JUnit 5 provides the `assertThrows` API, but our JUnit version is JUnit 4.
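
For comparison with the rule-based check in the test above, the same assertion can be written as a small helper that returns the thrown exception. This is a minimal sketch in plain Java with no JUnit dependency; `ExpectThrows`, `ThrowingRunnable` and `expectThrows` are hypothetical names used for illustration, not JUnit or Spark APIs (JUnit 4.13 added a similar `Assert.assertThrows`).

```java
import java.io.IOException;

// Hypothetical helper, not part of JUnit: runs a body and returns the expected exception.
final class ExpectThrows {
  interface ThrowingRunnable {
    void run() throws Exception;
  }

  static <T extends Throwable> T expectThrows(Class<T> type, ThrowingRunnable body) {
    try {
      body.run();
    } catch (Throwable t) {
      if (type.isInstance(t)) {
        return type.cast(t);
      }
      // A different exception type counts as a test failure.
      throw new AssertionError("Expected " + type.getName() + " but got " + t, t);
    }
    // Reaching this point means the body completed normally.
    throw new AssertionError("Expected " + type.getName() + " but nothing was thrown");
  }

  public static void main(String[] args) {
    IOException e = expectThrows(IOException.class, () -> {
      throw new IOException("fail this connection directly");
    });
    System.out.println(e.getMessage());
  }
}
```

Unlike a bare try/catch, the helper fails explicitly when no exception is thrown, and the returned exception can be inspected for its message.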



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603838714
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604538183
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120426/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604453703
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25134/
   Test PASSed.



[GitHub] [spark] jiangxb1987 commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-602941555
 
 
   Agreed: the fail-fast time window should be a little shorter than `conf.ioRetryWaitTimeMs()` so that the retried connection does not fail immediately. I'm also curious whether this could lead to a scenario where, with two tasks and only one client, the connection request from the second task fails fast every time it tries to connect (because the connection from the first task always fails beforehand).
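
The timing argument can be sketched with a small model of the fast-fail decision. This is an illustration under assumptions, not the PR's code: `FastFailWindow` and its methods are hypothetical names, timestamps are passed in explicitly instead of read from the clock, and the 5000 ms retry wait is an example value.

```java
// Hypothetical model of the fast-fail decision; names are illustrative, not Spark's API.
final class FastFailWindow {
  private final long windowMs;
  private long lastFailureMs = -1L;  // -1 means no failure has been recorded yet

  FastFailWindow(long windowMs) {
    this.windowMs = windowMs;
  }

  void recordFailure(long nowMs) {
    lastFailureMs = nowMs;
  }

  // True when a failure was recorded less than windowMs before nowMs.
  boolean shouldFailFast(long nowMs) {
    return lastFailureMs >= 0 && nowMs - lastFailureMs < windowMs;
  }

  public static void main(String[] args) {
    long retryWaitMs = 5000;  // example value standing in for conf.ioRetryWaitTimeMs()

    FastFailWindow shorter = new FastFailWindow(4750);
    shorter.recordFailure(0);
    // A second task asking 1 s after the failure is rejected without connecting.
    System.out.println(shorter.shouldFailFast(1000));
    // The scheduled retry arrives after the window has closed, so it really connects.
    System.out.println(shorter.shouldFailFast(retryWaitMs));

    FastFailWindow longer = new FastFailWindow(6000);
    longer.recordFailure(0);
    // With a window longer than the retry wait, the retry itself would be rejected.
    System.out.println(longer.shouldFailFast(retryWaitMs));
  }
}
```

In this model the second task is only rejected while the window from the most recent failure is open; whether it can be rejected on every attempt depends on how the two tasks' retries interleave, which the sketch does not simulate.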



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604246158
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r396464340
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,7 +194,19 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      if (System.currentTimeMillis() - clientPool.lastConnectionFailed[clientIndex]
 
 Review comment:
   No, I mean that you are using a random number here, so the last task that failed might not have used the same index. Don't you need an indicator over the entire clientPool?
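
The difference between per-index and pool-wide tracking can be sketched as follows. This is a simplified, hypothetical model: `PoolFailureTracking` is not Spark's `ClientPool`, the slot indices are fixed instead of random, and timestamps are passed in explicitly.

```java
import java.util.Arrays;

// Hypothetical simplification of a client pool's failure bookkeeping.
final class PoolFailureTracking {
  private final long[] lastFailedPerSlot;  // one timestamp per client index
  private long lastFailedForPool = -1L;    // one timestamp for the whole pool
  private final long windowMs;

  PoolFailureTracking(int slots, long windowMs) {
    this.lastFailedPerSlot = new long[slots];
    Arrays.fill(this.lastFailedPerSlot, -1L);  // -1 means no failure recorded
    this.windowMs = windowMs;
  }

  void recordFailure(int slot, long nowMs) {
    lastFailedPerSlot[slot] = nowMs;
    lastFailedForPool = nowMs;
  }

  // Per-index check: only sees failures recorded on the same slot.
  boolean failFastPerSlot(int slot, long nowMs) {
    long last = lastFailedPerSlot[slot];
    return last >= 0 && nowMs - last < windowMs;
  }

  // Pool-wide check: sees the most recent failure regardless of slot.
  boolean failFastForPool(long nowMs) {
    return lastFailedForPool >= 0 && nowMs - lastFailedForPool < windowMs;
  }

  public static void main(String[] args) {
    PoolFailureTracking pool = new PoolFailureTracking(2, 4750);
    pool.recordFailure(0, 0);  // the first task failed on index 0
    // The next request happens to draw index 1 shortly afterwards.
    System.out.println(pool.failFastPerSlot(1, 100));  // misses the recent failure
    System.out.println(pool.failFastForPool(100));     // catches it
  }
}
```

Because all slots in a pool point at the same remote address, a single pool-level timestamp is enough to cover the case where consecutive requests draw different indices.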



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r398591598
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -112,6 +116,7 @@ public TransportClientFactory(
     }
     this.metrics = new NettyMemoryMetrics(
       this.pooledAllocator, conf.getModuleName() + "-client", conf);
+    fastFailTimeWindow = conf.maxIORetries() > 0 ? (int)(conf.ioRetryWaitTimeMs() * 0.95) : 0;
 
 Review comment:
   Thanks, that makes sense.
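
As a worked instance of the expression in the diff above, the window computation can be isolated into a small function. The sketch assumes plain `int` parameters standing in for `conf.maxIORetries()` and `conf.ioRetryWaitTimeMs()`; `FastFailConfig` and the sample values are hypothetical.

```java
// Hypothetical wrapper around the expression from the diff; not Spark's class.
final class FastFailConfig {
  static int fastFailTimeWindow(int maxIORetries, int ioRetryWaitTimeMs) {
    // With retries disabled there is no retry wait to stay under, so the window is 0
    // and no request is ever rejected on account of a recent failure.
    return maxIORetries > 0 ? (int) (ioRetryWaitTimeMs * 0.95) : 0;
  }

  public static void main(String[] args) {
    // Example values only: 3 retries with a 5000 ms wait give a window of about 95% of the wait.
    System.out.println(fastFailTimeWindow(3, 5000));
    // Retries disabled: the window collapses to zero.
    System.out.println(fastFailTimeWindow(0, 5000));
  }
}
```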
   



[GitHub] [spark] HeartSaVioR commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604309299
 
 
   retest this, please



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600823837
 
 
   **[Test build #119998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119998/testReport)** for PR 27943 at commit [`fa266f2`](https://github.com/apache/spark/commit/fa266f2e70b2d53fc1207050629dac4f1dc23c23).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600584271
 
 
   **[Test build #119989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119989/testReport)** for PR 27943 at commit [`5d40e8d`](https://github.com/apache/spark/commit/5d40e8db3bc578944c9cae023873ce5eb364e2b4).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600663104
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119989/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604265512
 
 
   **[Test build #120402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120402/testReport)** for PR 27943 at commit [`8784906`](https://github.com/apache/spark/commit/8784906c927936a9f062c3b29efc7d5b36e608d7).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r394610498
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,7 +194,19 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      if (System.currentTimeMillis() - clientPool.lastConnectionFailed[clientIndex]
 
 Review comment:
   clientIndex here is just a random number, so this isn't always going to work if the last task that failed got a different clientIndex, right?



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605116655
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604396875
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605450853
 
 
   gentle ping @tgravescs 



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604626926
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120431/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603284116
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120266/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604792133
 
 
   **[Test build #120448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120448/testReport)** for PR 27943 at commit [`66adc09`](https://github.com/apache/spark/commit/66adc0983a5011b077096e5f9e62c04844793bde).



[GitHub] [spark] jiangxb1987 commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-601025341
 
 
   IIUC, throwing an IOException here would eventually lead to a task failure with FetchFailedException? If that's really the case, then I would be conservative here and wait instead of failing fast.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603284104
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607401448
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604529606
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605217936
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603676936
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120312/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604452896
 
 
   **[Test build #120426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120426/testReport)** for PR 27943 at commit [`3246606`](https://github.com/apache/spark/commit/32466065d29d091bb036c8908f2dc7f768f1aad1).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600392680
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603169559
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24980/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607727680
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607047767
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] tgravescs commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-607842053
 
 
   merging to master



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r394487813
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,7 +194,18 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      if (System.currentTimeMillis() - clientPool.lastConnectionFailed[clientIndex]
+        < conf.ioRetryWaitTimeMs()) {
+        throw new IOException(
+          String.format("Connecting to %s failed in the last %s ms, fail this connection directly",
+            resolvedAddress, conf.ioRetryWaitTimeMs()));
+      }
+      try {
+        clientPool.clients[clientIndex] = createClient(resolvedAddress);
 
 Review comment:
   Thanks, I will set `clientPool.lastConnectionFailed[clientIndex] = 0;` after the client is created successfully.
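
A minimal sketch of the check-then-reset flow discussed in this review thread, under stated assumptions: `FastFailSketch`, `Connector`, the `nowMs` parameter, and the explicit `last != 0` guard are hypothetical names and simplifications introduced for illustration. Only the `lastConnectionFailed` array and the error message text come from the diff above; this is not the code in `TransportClientFactory`.

```java
import java.io.IOException;

// Hypothetical, simplified stand-in for the per-slot bookkeeping in the diff above.
class FastFailSketch {
  /** Stand-in for whatever actually opens the connection (an assumption). */
  interface Connector {
    void connect() throws IOException;
  }

  // 0 means "no failure recorded" for that slot.
  private final long[] lastConnectionFailed;
  private final long retryWaitMs;

  FastFailSketch(int slots, long retryWaitMs) {
    this.lastConnectionFailed = new long[slots];
    this.retryWaitMs = retryWaitMs;
  }

  void createClient(int clientIndex, long nowMs, Connector connector) throws IOException {
    long last = lastConnectionFailed[clientIndex];
    // Fail fast only while the previous failure is still inside the wait window.
    if (last != 0 && nowMs - last < retryWaitMs) {
      throw new IOException("fail this connection directly");
    }
    try {
      connector.connect();
      // Reset after a successful connect so later calls are not rejected.
      lastConnectionFailed[clientIndex] = 0;
    } catch (IOException e) {
      // Remember when the failure happened, then propagate it.
      lastConnectionFailed[clientIndex] = nowMs;
      throw e;
    }
  }
}
```

Passing the clock value in as `nowMs` is only a convenience for exercising the window boundaries; the diff above reads `System.currentTimeMillis()` directly.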



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604265621
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120402/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604198439
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25098/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604396875
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r396466146
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,7 +194,18 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      if (System.currentTimeMillis() - clientPool.lastConnectionFailed[clientIndex]
+        < conf.ioRetryWaitTimeMs()) {
+        throw new IOException(
+          String.format("Connecting to %s failed in the last %s ms, fail this connection directly",
+            resolvedAddress, conf.ioRetryWaitTimeMs()));
+      }
+      try {
+        clientPool.clients[clientIndex] = createClient(resolvedAddress);
 
 Review comment:
   I'm confused about how it would ever succeed after failing once: if it failed once, it never gets here to try to create the client. Maybe I'm missing something with the clientIndex, but the code above seems like a bug.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603157533
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24977/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604453680
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605450821
 
 
   cc @travishegner 



[GitHub] [spark] cloud-fan commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600442399
 
 
   makes sense to me to treat `conf.ioRetryWaitTimeMs` as per client not per task, as multiple tasks may share the same client.
   
   cc @squito @vanzin  @jiangxb1987 @Ngone51 



[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-602625438
 
 
   Thanks for the reply, @tgravescs.
   Sorry for the unclear description.
   
   By `All connections` above I meant the request connections sent to the same unreachable address.
   
   It was my mistake not to recognize that there may be several clients for the same address; maybe we need to keep a lastConnectionFailedTime variable per clientPool.
   
   The problem is that a single task may issue several request connections to the same address.
   In particular, for a shuffle read task with only one client in the client pool, that client is always picked by every connection that wants to reach the same ESS.
   If this address is unreachable, these connections block each other (during createClient).
   They all belong to the same task and target the same ESS, so as long as the ESS stays unreachable,
   it costs connectionNum \* connectionTimeOut \* maxRetry of retrying before the task fails.
   Ideally, the task would fail within connectionTimeOut \* maxRetry.
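
To make the two bounds in this comment concrete, here is a small worked example. The class and method names are hypothetical, and the input numbers (5 requests, a 120 s timeout, 3 retries) are illustrative assumptions rather than values taken from this thread.

```java
// Hypothetical helper that evaluates the two cost formulas from the comment above.
class RetryCostSketch {
  /** Cost when blocked connections retry one after another. */
  static long serializedCostMs(int connectionNum, long connectionTimeoutMs, int maxRetry) {
    return (long) connectionNum * connectionTimeoutMs * maxRetry;
  }

  /** Cost when the task fails after a single round of retries. */
  static long idealCostMs(long connectionTimeoutMs, int maxRetry) {
    return connectionTimeoutMs * maxRetry;
  }

  public static void main(String[] args) {
    // Illustrative inputs only: 5 requests, 120 s timeout, 3 retries.
    System.out.println(serializedCostMs(5, 120_000L, 3)); // prints 1800000 (30 minutes)
    System.out.println(idealCostMs(120_000L, 3));         // prints 360000 (6 minutes)
  }
}
```

With these assumed inputs the serialized case is connectionNum times slower than the ideal case, which is the gap the fast-fail window is meant to close.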
   
   
   
   
   
   
    
   



[GitHub] [spark] Ngone51 commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#discussion_r397563886
 
 

 ##########
 File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
 ##########
 @@ -192,7 +193,20 @@ public TransportClient createClient(String remoteHost, int remotePort)
           logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
         }
       }
-      clientPool.clients[clientIndex] = createClient(resolvedAddress);
+      double fastFailTimeWindow = conf.ioRetryWaitTimeMs() * 0.95;
 
 Review comment:
   Shall we make it a global variable?



[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600663088
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-600584271
 
 
   **[Test build #119989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119989/testReport)** for PR 27943 at commit [`5d40e8d`](https://github.com/apache/spark/commit/5d40e8db3bc578944c9cae023873ce5eb364e2b4).



[GitHub] [spark] turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
turboFei commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r402107095
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    int unreachablePort = server.getPort();
+    server.close();
+    try {
+      factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
+    } catch (Exception e) {
+      assert(e instanceof IOException);
+    }
+    expectedException.expect(IOException.class);
+    expectedException.expectMessage("fail this connection directly");
+    factory.createClient(TestUtils.getLocalHost(), unreachablePort, true);
 
 Review comment:
   It seems that we do not need to reset the expectedException; it is unique to each test, and there is no reset method anyway.
   
   Yes, JUnit 5 provides the assertThrows API, but we are on JUnit 4.
   And JUnit 4 only provides `@Test(expected = Exception.class)`, which is not suitable for this test.
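
For comparison, the same "second call must throw an IOException with a specific message" check can be written without any test-framework rule. The sketch below uses plain Java only; `ExpectFailureSketch`, `ThrowingAction`, and `expectIOException` are hypothetical names introduced for illustration and are not part of JUnit or of this patch.

```java
import java.io.IOException;

// Hypothetical plain-Java helper; not a JUnit API.
class ExpectFailureSketch {
  /** An action that may throw a checked exception. */
  interface ThrowingAction {
    void run() throws Exception;
  }

  /** Runs the action and returns the IOException it throws; fails otherwise. */
  static IOException expectIOException(ThrowingAction action) {
    try {
      action.run();
    } catch (IOException e) {
      return e;  // the expected failure; the caller can inspect its message
    } catch (Exception e) {
      throw new AssertionError("expected IOException but got " + e, e);
    }
    throw new AssertionError("expected IOException but nothing was thrown");
  }
}
```

A test could then call `expectIOException(...)` around the second `createClient` call only and check that the returned exception's message contains "fail this connection directly", which keeps the first (setup) failure separate from the one being asserted.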



[GitHub] [spark] tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#discussion_r398578161
 
 

 ##########
 File path: common/network-common/src/test/java/org/apache/spark/network/client/TransportClientFactorySuite.java
 ##########
 @@ -224,4 +226,23 @@ public void closeFactoryBeforeCreateClient() throws IOException, InterruptedExce
     factory.close();
     factory.createClient(TestUtils.getLocalHost(), server1.getPort());
   }
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void fastFailConnectionInTimeWindow() throws IOException, InterruptedException {
+    TransportClientFactory factory = context.createClientFactory();
+    TransportServer server = context.createServer();
+    server.close();
+    int unreachablePort = server.getPort();
 
 Review comment:
   Thanks, I understand it works now; it's just future-proofing in case someone decides to clear the port on close or something.



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-603156824
 
 
   **[Test build #120266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120266/testReport)** for PR 27943 at commit [`62eedae`](https://github.com/apache/spark/commit/62eedaed23632104c566f7b8588132d7e5b0233d).



[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-605216687
 
 
   **[Test build #120498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120498/testReport)** for PR 27943 at commit [`0b5dfe1`](https://github.com/apache/spark/commit/0b5dfe1154ed5f75393fa9e502a5e6180db72d0b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
URL: https://github.com/apache/spark/pull/27943#issuecomment-604626926
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120431/
   Test FAILed.
