Posted to issues@mesos.apache.org by "Ilya Pronin (JIRA)" <ji...@apache.org> on 2018/02/02 00:17:00 UTC

[jira] [Comment Edited] (MESOS-7698) Libprocess doesn't handle IP changes

    [ https://issues.apache.org/jira/browse/MESOS-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349554#comment-16349554 ] 

Ilya Pronin edited comment on MESOS-7698 at 2/2/18 12:16 AM:
-------------------------------------------------------------

[~greggomann], libprocess looks up the address of the host it's running on at process startup and remembers that address for the lifetime of the process. If the {{--advertise_ip}} flag is not provided, this address is used as the return address in inter-libprocess communication (the {{User-Agent: libprocess/*}} header field). When I encountered the described problem, the IP address of one of our hosts had changed due to network maintenance. The agent on that host tried to re-register with the master, telling it that the agent was located at addr1, while in reality it was at addr2. Because of that return address logic, the master was sending its responses to the wrong host at addr1. I never tried to reproduce the problem, but I suppose it should be relatively easy to reproduce by changing the IP address of the interface the agent uses for communicating with the master.

Maybe we could make the usage of return addresses in libprocess-to-libprocess communication more "relaxed". If the user doesn't want libprocess to advertise a specific address, the sending libprocess could omit the address in the {{User-Agent}} field and the receiver would use the return address from the connection?
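
A rough sketch of the receiver-side fallback, just to illustrate the idea. Everything here is hypothetical: {{SenderInfo}}, {{returnAddress}} and the use of an empty string to mean "address omitted" are made up for this example and are not existing libprocess types or APIs, and the {{id@ip:port}} output format is only assumed for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical view of what the receiver learned about the sender.
// These are NOT libprocess types; they exist only for this sketch.
struct SenderInfo {
  std::string id;            // process id, e.g. "slave(1)"
  std::string advertisedIp;  // empty if the sender omitted its address
  std::uint16_t port;        // port the sender listens on
};

// Picks the address the receiver should reply to:
// - if the sender advertised an IP, trust it (the --advertise_ip case);
// - otherwise fall back to the peer IP observed on the connection.
std::string returnAddress(const SenderInfo& sender, const std::string& peerIp) {
  const std::string& ip =
      sender.advertisedIp.empty() ? peerIp : sender.advertisedIp;
  return sender.id + "@" + ip + ":" + std::to_string(sender.port);
}
```

With something like this, an agent whose IP changed from addr1 to addr2 and that advertises nothing would be answered at addr2, because that is where the connection actually came from. An explicitly advertised address would still take precedence.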

I can work on a patch if somebody can shepherd this work.



> Libprocess doesn't handle IP changes
> ------------------------------------
>
>                 Key: MESOS-7698
>                 URL: https://issues.apache.org/jira/browse/MESOS-7698
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 1.2.0
>            Reporter: Ilya Pronin
>            Priority: Major
>
> If a host's IP address changes, libprocess will never learn about it and will continue to send messages "from" the old IP.
> This leads to incorrect behavior. E.g. an agent will indefinitely try to re-register with a master, claiming that it can be reached at the old IP. The master will send {{SlaveReregisteredMessage}} to the wrong host (potentially a different agent), using the IP from the {{User-Agent: libprocess/*}} header.


