
Posted to issues@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2015/02/24 21:43:04 UTC

[jira] [Updated] (FLINK-1608) TaskManagers may pick wrong network interface when starting before JobManager

     [ https://issues.apache.org/jira/browse/FLINK-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen updated FLINK-1608:
--------------------------------
    Description: 
The TaskManagers use a NetUtils routine to find a network interface through which they can talk to the JobManager. However, if the JobManager is not online yet, they fall back to some non-localhost device.

In cases where the TaskManagers start faster than the JobManager, they pick the wrong hostname and interface.

The later logic (which tries to connect to the JobManager actor) retries on failure. I think we need similar retry logic here.
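A minimal sketch of such retry logic in Java, assuming the TaskManager knows the JobManager's host and port up front. ConnectingAddressFinder, findConnectingAddress, and the timeout and backoff values are hypothetical illustrations, not Flink's actual NetUtils API. The idea is to try each local interface address as the source of a TCP connection to the JobManager. If none succeeds, the sketch waits with exponential backoff and tries again until a deadline, instead of falling back to another interface right away.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.Socket;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper class; not part of Flink.
public class ConnectingAddressFinder {

    /**
     * Returns the first local address from which a TCP connection to the
     * JobManager at {@code target} succeeds. While the JobManager is not yet
     * reachable, retries with exponential backoff until {@code maxWaitMillis}
     * has elapsed, then returns null so the caller can pick a fallback.
     */
    public static InetAddress findConnectingAddress(
            InetSocketAddress target, long maxWaitMillis, int connectTimeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        long backoffMillis = 50; // assumed initial backoff
        while (true) {
            for (InetAddress local : candidateAddresses()) {
                if (canConnect(local, target, connectTimeoutMillis)) {
                    return local;
                }
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null; // gave up only after waiting, not immediately
            }
            Thread.sleep(Math.min(backoffMillis, remaining));
            backoffMillis = Math.min(backoffMillis * 2, 2000); // assumed cap
        }
    }

    // Collects all addresses of network interfaces that are currently up.
    static List<InetAddress> candidateAddresses() {
        List<InetAddress> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
            if (nifs == null) {
                return result;
            }
            for (NetworkInterface nif : Collections.list(nifs)) {
                if (nif.isUp()) {
                    result.addAll(Collections.list(nif.getInetAddresses()));
                }
            }
        } catch (SocketException e) {
            // Interface enumeration failed; return what was collected so far.
        }
        return result;
    }

    // Binds a socket to the given local address and tries to reach the target.
    static boolean canConnect(InetAddress local, InetSocketAddress target, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.bind(new InetSocketAddress(local, 0));
            socket.connect(target, timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Returning null only after the deadline leaves the choice of fallback interface to the caller, so a TaskManager that starts before the JobManager waits for it instead of committing to the wrong interface.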

  was:
The TaskManagers use a NetUtils routine to find a network interface through which they can talk to the JobManager. However, if the JobManager is not online yet, they fall back to localhost.

In cases where the TaskManagers start faster than the JobManager, they pick the wrong hostname and interface.


> TaskManagers may pick wrong network interface when starting before JobManager
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-1608
>                 URL: https://issues.apache.org/jira/browse/FLINK-1608
>             Project: Flink
>          Issue Type: Bug
>          Components: TaskManager
>    Affects Versions: 0.9
>            Reporter: Stephan Ewen
>             Fix For: 0.9



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)