Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/10/16 06:24:00 UTC

[jira] [Commented] (FLINK-4052) Unstable test ConnectionUtilsTest

    [ https://issues.apache.org/jira/browse/FLINK-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651188#comment-16651188 ] 

ASF GitHub Bot commented on FLINK-4052:
---------------------------------------

leanken opened a new pull request #6853: [FLINK-4052] using non-local unreachable ip:port to fix unstable Test testReturnLocalHostAddressUsingHeuristics
URL: https://github.com/apache/flink/pull/6853
 
 
   ## What is the purpose of the change
   
   See [FLINK-4052](https://issues.apache.org/jira/browse/FLINK-4052) and also [FLINK-3687](https://issues.apache.org/jira/browse/FLINK-3687).
   
   Fix the intermittent failure of testReturnLocalHostAddressUsingHeuristics.
   
   ## Diagnosis
   
   I was able to reproduce the failure reliably in my local environment, as shown below.
   
   ```
   java.lang.AssertionError: 
   Expected :linxuewei.local/30.5.17.21
   Actual   :/127.0.0.1
   
     at org.junit.Assert.fail(Assert.java:88)
     at org.junit.Assert.failNotEquals(Assert.java:834)
     at org.junit.Assert.assertEquals(Assert.java:118)
     at org.junit.Assert.assertEquals(Assert.java:144)
     at org.apache.flink.runtime.net.ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics(ConnectionUtilsTest.java:64)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   ``` 
   
   In my local environment, the test case tries two local addresses when connecting to the supposedly unreachable target endpoint: **30.5.17.21** and **127.0.0.1**.
   
   Steps to reproduce:
   
   * When the test case tries 30.5.17.21, do nothing.
   * When the test case tries 127.0.0.1, start a local process that listens on the target port, so that the supposedly unreachable endpoint becomes reachable.
   * Before the test case returns 127.0.0.1, the code tries once more to connect to the endpoint, this time from 30.5.17.21, the address returned by **InetAddress.getLocalHost()**:
   
   ```
   // See ConnectionUtils.java, line 276
   
   case SLOW_CONNECT:
     LOG.debug("Trying to connect to {} from local address {} with timeout {}",
         targetAddress, interfaceAddress, strategy.getTimeout());
   
     if (tryToConnect(interfaceAddress, targetAddress, strategy.getTimeout(), logging)) {
       return tryLocalHostBeforeReturning(interfaceAddress, targetAddress, logging);
     }
     break;
   ```
   
   * Before tryLocalHostBeforeReturning is called, kill the local process that listens on the target port. The second connection attempt from 30.5.17.21 then fails, so the method returns 127.0.0.1, and that is the value the assertion compares against the expected local host address. Hence the test case fails.
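
The steps above can be replayed without real sockets. The following is a minimal sketch under stated assumptions: `HeuristicRaceSketch`, `pick`, `raceResult`, and `quietResult` are hypothetical names, and `pick` is a simplified model of the probing order described here, not Flink's actual `ConnectionUtils` code. Connectivity is simulated by a predicate that reports the target port as open only during the second probe.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Simplified, hypothetical model of the race; not Flink's actual ConnectionUtils code. */
final class HeuristicRaceSketch {

    /** Builds an address from literal octets, so no DNS lookup is involved. */
    static InetAddress ip(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Tries each candidate in order. After the first successful probe, it re-checks
     * from localHost before returning (the role tryLocalHostBeforeReturning plays above).
     */
    static InetAddress pick(
            List<InetAddress> candidates, InetAddress localHost, Predicate<InetAddress> canConnect) {
        for (InetAddress candidate : candidates) {
            if (canConnect.test(candidate)) {
                if (!candidate.equals(localHost) && canConnect.test(localHost)) {
                    return localHost;
                }
                return candidate;
            }
        }
        return localHost; // nothing connected: fall back to the local host address
    }

    /** Replays the steps above: the target port is open only during the second probe. */
    static InetAddress raceResult() {
        InetAddress lan = ip(30, 5, 17, 21);
        InetAddress loopback = ip(127, 0, 0, 1);
        int[] probes = {0};
        return pick(Arrays.asList(lan, loopback), lan, from -> ++probes[0] == 2);
    }

    /** Replays the undisturbed case: the target port stays closed for every probe. */
    static InetAddress quietResult() {
        InetAddress lan = ip(30, 5, 17, 21);
        return pick(Arrays.asList(lan, ip(127, 0, 0, 1)), lan, from -> false);
    }

    public static void main(String[] args) {
        System.out.println("undisturbed: " + quietResult().getHostAddress());
        System.out.println("with race:   " + raceResult().getHostAddress());
    }
}
```

In this model the undisturbed run yields 30.5.17.21, the value the test expects, while the run with the short-lived listener yields 127.0.0.1, which matches the assertion error shown earlier.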
   
   ## Proposal
   
   In general, in the build environment, the local port may occasionally be taken by another process. I therefore suggest changing the local unreachable endpoint to an external unreachable endpoint, for instance 8.8.8.8:65535 as in this pull request. I think this prevents the test failure from happening again.
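
As a rough illustration of the proposal, the sketch below builds the external target from literal octets and checks that it is not an address a local process could listen on. `ExternalTargetSketch`, `externalUnreachableTarget`, and `isLocalOnly` are hypothetical names chosen for this example, not identifiers from the Flink test, and the check is only an approximation. Whether a connection to 8.8.8.8:65535 actually fails still depends on the network the build runs in.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

/** Hypothetical helper illustrating the proposal; names are not taken from the Flink test. */
final class ExternalTargetSketch {

    /** Builds 8.8.8.8:65535 from literal octets, so no DNS lookup is needed. */
    static InetSocketAddress externalUnreachableTarget() {
        try {
            InetAddress host = InetAddress.getByAddress(new byte[] {8, 8, 8, 8});
            return new InetSocketAddress(host, 65535);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Rough check for addresses that a process on the build machine could serve. */
    static boolean isLocalOnly(InetAddress address) {
        return address.isLoopbackAddress()
                || address.isAnyLocalAddress()
                || address.isLinkLocalAddress();
    }

    public static void main(String[] args) {
        InetSocketAddress target = externalUnreachableTarget();
        System.out.println(target.getAddress().getHostAddress() + ":" + target.getPort()
                + " localOnly=" + isLocalOnly(target.getAddress()));
    }
}
```

Because the target address is not assigned to the build machine, a concurrent test cannot make it reachable by binding a socket, which is the interference described in the diagnosis.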
   
   ## Brief change log
   
   * Use a non-local unreachable ip:port to fix the unstable test testReturnLocalHostAddressUsingHeuristics
   
   ## Verifying this change
   
   Test testReturnLocalHostAddressUsingHeuristics fixed.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not documented)
   



> Unstable test ConnectionUtilsTest
> ---------------------------------
>
>                 Key: FLINK-4052
>                 URL: https://issues.apache.org/jira/browse/FLINK-4052
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 1.0.2, 1.3.0
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>             Fix For: 1.7.0, 1.6.2, 1.5.5
>
>
> The error is the following:
> {code}
> ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics:59 expected:<testing-worker-linux-docker-e744e561-3361-linux-13/172.17.5.9> but was:</127.0.0.1>
> {code}
> The probable cause for the failure is that the test tries to select an unused closed port (from the ephemeral range), and then assumes that all connections to that port fail.
> If a concurrent test actually uses that port, connections to the port will succeed.


