Posted to issues@mesos.apache.org by "Carles Figuerola (JIRA)" <ji...@apache.org> on 2017/06/06 14:38:18 UTC

[jira] [Created] (MESOS-7628) Changing from --ip to --advertise_ip makes the mesos-slaves not take any new jobs

Carles Figuerola created MESOS-7628:
---------------------------------------

             Summary: Changing from --ip to --advertise_ip makes the mesos-slaves not take any new jobs
                 Key: MESOS-7628
                 URL: https://issues.apache.org/jira/browse/MESOS-7628
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 0.28.1
         Environment: CentOS Linux release 7.2.1511 (Core) 
            Reporter: Carles Figuerola


We have been running a large environment in which every Mesos agent is started with the --ip flag so that the masters can find it. Because --ip makes the agent bind only to that address, calls to http://localhost:5051 do not work. We found that replacing --ip with --advertise_ip keeps the agents findable by the masters while the process binds to 0.0.0.0 instead. After making this change in a live environment, however, the masters no longer schedule any tasks on the agents:

master log:
{code}
Jun 06 14:30:16 mesosmst002.us-west-2.lab.example.com mesos-master[869]: E0606 14:30:16.905573   918 process.cpp:1958] Failed to shutdown socket with fd 45: Transport endpoint is not connected
Jun 06 14:32:24 mesosmst002.us-west-2.lab.example.com mesos-master[869]: E0606 14:32:24.137552   918 process.cpp:1958] Failed to shutdown socket with fd 29: Transport endpoint is not connected
Jun 06 14:32:41 mesosmst002.us-west-2.lab.example.com mesos-master[869]: E0606 14:32:41.033612   918 process.cpp:1958] Failed to shutdown socket with fd 45: Transport endpoint is not connected
{code}

agent logs:
{code}
Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:32:37.103865 26516 process.cpp:1958] Failed to shutdown socket with fd 24: Transport endpoint is not connected
Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:32:37.103961 26516 process.cpp:1958] Failed to shutdown socket with fd 23: Transport endpoint is not connected
Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:32:37.104019 26516 process.cpp:1958] Failed to shutdown socket with fd 21: Transport endpoint is not connected
Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:32:37.104082 26516 process.cpp:1958] Failed to shutdown socket with fd 15: Transport endpoint is not connected
Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:34:47.151888 26516 process.cpp:1958] Failed to shutdown socket with fd 24: Transport endpoint is not connected
Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:34:47.152065 26516 process.cpp:1958] Failed to shutdown socket with fd 23: Transport endpoint is not connected
Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:34:47.152196 26516 process.cpp:1958] Failed to shutdown socket with fd 21: Transport endpoint is not connected
Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:34:47.152262 26516 process.cpp:1958] Failed to shutdown socket with fd 15: Transport endpoint is not connected
{code}

When we test this in another region, on a new cluster with --advertise_ip enabled, the tasks get scheduled and the system works as expected.
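
To make the bind behaviour described above easier to check, here is a minimal Python sketch that tests whether a TCP port accepts connections on a given address. It is only an illustration: the helper name can_connect is hypothetical, the address 10.24.0.10 in the usage comment is a placeholder, and 5051 is the agent port mentioned earlier. With --ip one would expect the localhost check to fail and the agent-IP check to succeed; with --advertise_ip (binding to 0.0.0.0) both would be expected to succeed.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage on an agent host (placeholder address, assumed agent port):
#   can_connect("127.0.0.1", 5051)
#   can_connect("10.24.0.10", 5051)
```

This only checks reachability of the listening socket; it says nothing about whether the master can actually register or schedule tasks on the agent.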

Any help is appreciated. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)