Posted to issues@mesos.apache.org by "Guangya Liu (JIRA)" <ji...@apache.org> on 2015/11/02 14:26:27 UTC

[jira] [Commented] (MESOS-1826) Improve logging for when master cannot connect to slaves

    [ https://issues.apache.org/jira/browse/MESOS-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985204#comment-14985204 ] 

Guangya Liu commented on MESOS-1826:
------------------------------------

Thanks [~adam-mesos]. After some testing with the steps you provided, I think the log messages in the master are now very clear, and the end user can tell from the master log what is wrong with the slave. What do you think? Thanks!

{code}
I1102 21:21:18.333128 26844 replica.cpp:512] Replica received write request for position 1836 from (8)@192.168.0.101:5050
E1102 21:21:18.335482 26851 process.cpp:1911] Failed to shutdown socket with fd 11: Transport endpoint is not connected
I1102 21:21:18.336086 26845 hierarchical.cpp:335] Added slave 0ad8ede6-9627-4a95-a5c9-d7a21c1ac4c8-S0 (localhost) with cpus(*):1; mem(*):623; disk(*):9618; ports(*):[31000-32000] (allocated: )
I1102 21:21:18.337069 26846 master.cpp:3921] Registered slave 0ad8ede6-9627-4a95-a5c9-d7a21c1ac4c8-S0 at slave(1)@127.0.0.1:5051 (localhost) with cpus(*):1; mem(*):623; disk(*):9618; ports(*):[31000-32000]
I1102 21:21:18.337333 26846 master.cpp:1077] Slave 0ad8ede6-9627-4a95-a5c9-d7a21c1ac4c8-S0 at slave(1)@127.0.0.1:5051 (localhost) disconnected
I1102 21:21:18.337376 26846 master.cpp:2525] Disconnecting slave 0ad8ede6-9627-4a95-a5c9-d7a21c1ac4c8-S0 at slave(1)@127.0.0.1:5051 (localhost)
I1102 21:21:18.337473 26846 master.cpp:2544] Deactivating slave 0ad8ede6-9627-4a95-a5c9-d7a21c1ac4c8-S0 at slave(1)@127.0.0.1:5051 (localhost)
{code}
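
For anyone who wants to spot this pattern in a long master log, here is a minimal Python sketch. It is not part of Mesos: the function name find_flapping_slaves and both regular expressions are hypothetical and were inferred only from the sample lines above, so other Mesos versions may word these messages differently. A registration followed by a disconnect is only a hint that the slave pid may not be reachable from the master, not proof.

```python
import re

# Patterns inferred from the sample master.cpp lines above (an assumption,
# not an official log format). Matching is case-sensitive on purpose so that
# "Disconnecting slave ..." and "Deactivating slave ..." are not picked up.
REGISTERED = re.compile(r"Registered slave (?P<id>\S+) at (?P<pid>\S+)")
DISCONNECTED = re.compile(r"Slave (?P<id>\S+) at (?P<pid>\S+) \(\S+\) disconnected")


def find_flapping_slaves(lines):
    """Return {slave_id: pid} for slaves that registered and later disconnected."""
    registered = {}
    flapping = {}
    for line in lines:
        match = REGISTERED.search(line)
        if match:
            registered[match.group("id")] = match.group("pid")
            continue
        match = DISCONNECTED.search(line)
        if match and match.group("id") in registered:
            flapping[match.group("id")] = match.group("pid")
    return flapping
```

It can be fed any iterable of log lines, for example an open file handle on the master log.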

> Improve logging for when master cannot connect to slaves
> --------------------------------------------------------
>
>                 Key: MESOS-1826
>                 URL: https://issues.apache.org/jira/browse/MESOS-1826
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Thomas Rampelberg
>            Assignee: Guangya Liu
>            Priority: Minor
>              Labels: newbie
>
> When first setting up a Mesos cluster, it is possible to get into a state where your slaves are constantly re-registering. This happens because the slave pid is not reachable from the master.
> Currently, the master logs make it pretty tough to figure out that this is the problem. It would be fantastic if there were a better explanation in the logs, something like:
>     Unable to connect to slave X at x.x.x.x:5051. Please make sure that host is reachable from your master.
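
The reachability problem described in the issue can also be checked by hand from the master host. The following is a minimal Python sketch, assuming the slave listens on TCP port 5051 as in the example message; check_slave_reachable is a hypothetical helper, not a Mesos API, and a successful TCP connect shows only that the port is reachable, not that the slave is healthy.

```python
import socket


def check_slave_reachable(host, port=5051, timeout=3.0):
    """Try a plain TCP connection to the slave; return (ok, message)."""
    try:
        # create_connection resolves the host and connects within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "Connected to slave at %s:%d." % (host, port)
    except OSError as err:
        # Covers refused connections, timeouts and name resolution failures.
        return False, (
            "Unable to connect to slave at %s:%d (%s). Please make sure "
            "that host is reachable from your master." % (host, port, err)
        )
```

Run it on the master with the address the slave advertises in its pid, since that is the address the master will try to use.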



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)