Posted to dev@slider.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/03/05 16:22:00 UTC

[jira] [Updated] (SLIDER-1259) Slider does not work in multi homed environments

     [ https://issues.apache.org/jira/browse/SLIDER-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated SLIDER-1259:
-----------------------------------
    Attachment: SLIDER-1259-001.patch

> Slider does not work in multi homed environments
> ------------------------------------------------
>
>                 Key: SLIDER-1259
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1259
>             Project: Slider
>          Issue Type: Bug
>          Components: appmaster
>    Affects Versions: Slider 0.92
>            Reporter: Lev Bronshtein
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: SLIDER-1259-001.patch
>
>
> In an environment where Hadoop worker nodes bind the Node Manager to an interface whose hostname differs from the one returned by socket.getfqdn(), we are unable to introspect running jobs. In our test environment, for example, the difference is between f-bcpc-vm3 and just bcpc-vm3; the latter is the hostname bound to the management interface, not to the interface for Hadoop/production traffic.
>  
> For example, running *slider registry --name slider_poc --listexp* produces the following output in the ResourceManager logs:
> {quote}2018-01-26 17:30:32,147 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: ubuntu is accessing unchecked [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports] which is the app master GUI of application_1516910361403_0094 owned by ubuntu 
>  2018-01-26 17:31:13,639 WARN org.mortbay.log: /proxy/application_1516910361403_0094/ws/v1/slider/publisher/exports: java.net.ConnectException: Connection timed out (Connection timed out) 
> {quote}
>  
> Note how the redirect is to [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports], whereas it should have been to [http://f-bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports]. Renaming the host to f-bcpc-vm3 results in the expected behavior.
>  
> Perhaps *hostname.py* can be instructed to look at one of the following before registering:
> *yarn.nodemanager.address*
> *yarn.nodemanager.bind-host*
> *yarn.nodemanager.hostname*
>  
> The hostname is picked up when hostname.public_hostname() is called in Register.py:
> register = {'responseId': int(id),
>   'timestamp': timestamp,
>   'label': self.config.getLabel(),
>   'publicHostname': hostname.public_hostname(),  # the hostname in question
>   'agentVersion': version,
>   'actualState': actualState,
>   'expectedState': expectedState,
>   'allocatedPorts': allocated_ports,
>   'logFolders': log_folders,
>   'tags': tags
> }
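
The suggestion in the description, that hostname.py consult the NodeManager settings before registering, could look roughly like the sketch below. It is not taken from the attached patch. The function names, the lookup order, the plain dict standing in for the parsed YARN configuration, and the example port 8041 are all assumptions made for illustration; only the three property names and socket.getfqdn() come from the issue text.

```python
import socket

# Property names come from the issue text; the lookup order is an assumption.
CANDIDATE_KEYS = (
    "yarn.nodemanager.hostname",
    "yarn.nodemanager.bind-host",
    "yarn.nodemanager.address",
)

# Values that do not identify one specific interface.
WILDCARDS = {"", "0.0.0.0", "::"}


def host_part(value):
    """Return the host from 'host' or 'host:port'.

    Only a single colon followed by digits is treated as a port separator,
    so IPv6 literals are returned unchanged.
    """
    value = value.strip()
    if value.count(":") == 1:
        host, port = value.split(":")
        if port.isdigit():
            return host
    return value


def resolve_public_hostname(yarn_conf, fallback=socket.getfqdn):
    """Pick a hostname from the NodeManager settings, else use the fallback.

    yarn_conf is a hypothetical dict of property name -> string value.
    """
    for key in CANDIDATE_KEYS:
        raw = yarn_conf.get(key)
        # Skip missing values and unexpanded ${...} references.
        if raw is None or "${" in raw:
            continue
        host = host_part(raw)
        if host not in WILDCARDS:
            return host
    return fallback()


if __name__ == "__main__":
    conf = {"yarn.nodemanager.address": "f-bcpc-vm3.bcpc.example.com:8041"}
    print(resolve_public_hostname(conf))
```

With a helper along these lines, the 'publicHostname' entry would carry the name the Node Manager is actually bound to, and socket.getfqdn() would only be used when the configuration gives no specific host.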



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)