You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by rmetzger <gi...@git.apache.org> on 2015/09/21 18:21:56 UTC

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

GitHub user rmetzger opened a pull request:

    https://github.com/apache/flink/pull/1159

    [FLINK-2722] Use InetAddress.getLocalHost() as first approach when detecting the TMs own ip/hostname

    A user reported a connection issue with Netty being unable to connect to a TaskManager to subscribe to an intermediate result.
    The problem occurred when the TaskManager and JobManager were running on the same host (something that can easily happen on YARN).
    In that case, the TaskManager was reporting a host-local ip address to the JobManager when connecting.
    
    To avoid the issue in the future, the TaskManager first tries to use the hostname returned by InetAddress.getLocalHost(). In a properly set-up environment, this will return a connection which is accessible by all machines in a cluster.
    
    I tested this implementation on a cluster in GCE using YARN (note that I was not able to reproduce the issue originally reported, but the user reported that the issue has been fixed)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rmetzger/flink network_detection_0.10

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1159.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1159
    
----
commit f9913e731f44c0c84e83670dfe35d2c4ead09a4a
Author: Robert Metzger <rm...@apache.org>
Date:   2015-09-04T12:51:40Z

    [FLINK-2722] Use InetAddress.getLocalHost() as first approach when detecting the TMs own ip/hostname

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1159#discussion_r40048959
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/NetUtils.java ---
    @@ -189,9 +191,17 @@ public static InetAddress findConnectingAddress(InetSocketAddress targetAddress,
     		long currentSleepTime = MIN_SLEEP_TIME;
     		long elapsedTime = 0;
     
    +		// before trying with different strategies: test with getLocalHost():
    +		InetAddress localhostName = InetAddress.getLocalHost();
    +
    +		if(tryToConnect(localhostName, targetAddress, AddressDetectionState.ADDRESS.getTimeout(), false)) {
    +			LOG.debug("Using immediately InetAddress.getLocalHost() for the connecting address");
    --- End diff --
    
    it might be useful to log the value of `localhostName` - `InetAddress.getLocalHost()` sometimes returns odd values.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1159#discussion_r40064020
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/NetUtils.java ---
    @@ -189,9 +191,17 @@ public static InetAddress findConnectingAddress(InetSocketAddress targetAddress,
     		long currentSleepTime = MIN_SLEEP_TIME;
     		long elapsedTime = 0;
     
    +		// before trying with different strategies: test with getLocalHost():
    +		InetAddress localhostName = InetAddress.getLocalHost();
    +
    +		if(tryToConnect(localhostName, targetAddress, AddressDetectionState.ADDRESS.getTimeout(), false)) {
    --- End diff --
    
    Why do we have this statement here? It should be covered by the `LOCAL_HOST` strategy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1159#issuecomment-142221347
  
    Thank you for the thorough review @tillrohrmann. I removed the `resolveAddress()` method. It had a lot of duplicates with `findConnectingAddress()` and was only used in one other location.
    I'm now using the `findConnectingAddress()` there as well.
    
    Regarding a test: There is a little test for `findConnectingAddress()`, but we should add another test for the feature. My main problem is that I would really like to include the fix into 0.10-milestone-1, which seems to be blocked on this PR.
    
    I'm currently manually testing my change on YARN to see if everything is working as expected ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1159#issuecomment-142279741
  
    Except for the missing test case, it looks good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1159


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1159#discussion_r40064104
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/NetUtils.java ---
    @@ -276,6 +289,13 @@ private static InetAddress findAddressUsingStrategy(AddressDetectionState strate
     				InetAddress interfaceAddress = ee.nextElement();
     
     				switch (strategy) {
    +					case LOCAL_HOST:
    --- End diff --
    
    Why do we execute the `LOCAL_HOST` strategy for every interface? I thought we use `InetAddress.getLocalHost` as the outbound interface.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1159#issuecomment-142216804
  
    I had some comments. Additionally, I was wondering whether it wouldn't make sense to also include the new strategy `LOCAL_HOST` in the `resolveAddress` method. Seems odd to only support it in `findAddressUsingStrategy`.
    
    It would also be nice to add a test case for this feature if the time allows it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1159#discussion_r40058785
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/NetUtils.java ---
    @@ -189,9 +191,17 @@ public static InetAddress findConnectingAddress(InetSocketAddress targetAddress,
     		long currentSleepTime = MIN_SLEEP_TIME;
     		long elapsedTime = 0;
     
    +		// before trying with different strategies: test with getLocalHost():
    +		InetAddress localhostName = InetAddress.getLocalHost();
    +
    +		if(tryToConnect(localhostName, targetAddress, AddressDetectionState.ADDRESS.getTimeout(), false)) {
    +			LOG.debug("Using immediately InetAddress.getLocalHost() for the connecting address");
    --- End diff --
    
    These are the produced log statements in `DEBUG` level:
    ```
    16:12:19,822 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - Trying to select the network interface and address to use by connecting to the leading JobManager.
    16:12:19,822 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
    16:12:19,833 INFO  org.apache.flink.runtime.net.NetUtils                         - Retrieved new target address /10.240.221.7:33378.
    16:12:19,835 DEBUG org.apache.flink.runtime.net.NetUtils                         - Trying to connect to (/10.240.221.7:33378) from local address cdh544-master.c.astral-sorter-757.internal/10.240.242.143 with timeout 50
    16:12:19,838 DEBUG org.apache.flink.runtime.net.NetUtils                         - Using immediately InetAddress.getLocalHost() for the connecting address
    16:12:19,839 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager will use hostname/address 'cdh544-master.c.astral-sorter-757.internal' (10.240.242.143) for communication.
    16:12:19,839 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Starting TaskManager in streaming mode BATCH_ONLY
    16:12:19,839 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Starting TaskManager actor system at cdh544-master.c.astral-sorter-757.internal:0
    ```
    I think the messages in `INFO` level contain enough information for users to understand whats going on.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1159#discussion_r40049021
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/NetUtils.java ---
    @@ -189,9 +191,17 @@ public static InetAddress findConnectingAddress(InetSocketAddress targetAddress,
     		long currentSleepTime = MIN_SLEEP_TIME;
     		long elapsedTime = 0;
     
    +		// before trying with different strategies: test with getLocalHost():
    +		InetAddress localhostName = InetAddress.getLocalHost();
    +
    +		if(tryToConnect(localhostName, targetAddress, AddressDetectionState.ADDRESS.getTimeout(), false)) {
    +			LOG.debug("Using immediately InetAddress.getLocalHost() for the connecting address");
    --- End diff --
    
    also it could throw `UnknownHostException` - not sure if that could be a problem


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1159#issuecomment-142281622
  
    okay, I'm merging



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2722] Use InetAddress.getLocalHost() as...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/1159#issuecomment-142215963
  
    +1 if the user reported that his problem is gone and it doesn't interfere with other normal operation then we should merge it now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---