Posted to issues@mesos.apache.org by "Dalton Matos Coelho Barreto (Jira)" <ji...@apache.org> on 2019/12/13 15:07:00 UTC

[jira] [Created] (MESOS-10068) Mesos Master doesn't send AGENT_REMOVED when removing agent from internal state

Dalton Matos Coelho Barreto created MESOS-10068:
---------------------------------------------------

             Summary: Mesos Master doesn't send AGENT_REMOVED when removing agent from internal state
                 Key: MESOS-10068
                 URL: https://issues.apache.org/jira/browse/MESOS-10068
             Project: Mesos
          Issue Type: Bug
          Components: master
    Affects Versions: 1.7.3, 1.8.2, 1.9.1
            Reporter: Dalton Matos Coelho Barreto


Hello,

 

Looking at the documentation of the master {{/api/v1}} endpoint, the {{SUBSCRIBE}} message says that only {{TASK_ADDED}} and {{TASK_UPDATED}} are supported for this endpoint, but when a new agent joins the cluster an {{AGENT_ADDED}} event is received.

The problem is that when this agent is stopped, the {{AGENT_REMOVED}} event is not received by clients subscribed to the master API.

 

I tested this behavior with versions {{1.7.3}}, {{1.8.2}} and {{1.9.1}}, all using the docker image {{mesos/mesos-centos}}.

The only time I saw an {{AGENT_REMOVED}} event was when a new agent joined the cluster but the master couldn't communicate with it. In that specific test a firewall was blocking port {{5051}} on the slave, so nobody was able to talk to the slave on port {{5051}}.

 
h2. Here are the steps to reproduce the problem
 * Start a new mesos master
 * Connect to the {{/api/v1}} endpoint, sending a {{SUBSCRIBE}} message:
 ** 
{noformat}
curl --no-buffer -Ld '{"type": "SUBSCRIBE"}' -H "Content-Type: application/json" http://MASTER_IP:5050/api/v1{noformat}

 * Start a new slave and confirm the {{AGENT_ADDED}} event is delivered;
 * Stop this slave;
 * Check that {{/slaves?slave_id=AGENT_ID}} returns a JSON response with the field {{active=false}};
 * Wait for the Mesos master to stop listing this slave, that is, until {{/slaves?slave_id=AGENT_ID}} returns an empty response.
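
The two checks on {{/slaves?slave_id=AGENT_ID}} can be scripted. The following is only a minimal sketch: it assumes the response is a JSON object with a {{slaves}} list whose entries carry {{id}} and {{active}} fields, and it treats both an empty body and an empty {{slaves}} list as the "empty response" mentioned above. The helper name {{agent_state}} is hypothetical.

```python
import json


def agent_state(body: str, agent_id: str) -> str:
    """Classify one agent as 'active', 'inactive' or 'gone'.

    Assumption: `body` is the raw text returned by the master's /slaves
    endpoint, shaped like {"slaves": [{"id": ..., "active": ...}, ...]}.
    """
    # An empty body is treated the same as an empty "slaves" list.
    data = json.loads(body) if body.strip() else {}
    for slave in data.get("slaves", []):
        if slave.get("id") == agent_id:
            return "active" if slave.get("active") else "inactive"
    # The master no longer lists the agent at all.
    return "gone"
```

Polling this helper until it returns {{gone}} marks the moment after which an {{AGENT_REMOVED}} event would be expected on the subscription.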

Even after the empty response, the event never reaches the subscriber.
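
To make it easier to confirm which events actually arrive, the output of the {{curl}} subscription can be decoded programmatically. The sketch below assumes the stream uses the RecordIO framing described in the operator API documentation (the record length in bytes, a newline, then the JSON record). The function names {{decode_recordio}} and {{agent_events}} are hypothetical, and only the top-level {{type}} field of each event is inspected.

```python
import json
from typing import Iterable, Iterator, List


def decode_recordio(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield JSON events from a stream framed as '<length>\\n<record>'.

    `chunks` can be any iterable of byte strings, for example pieces of
    the HTTP response body as they arrive; records may span chunks.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        while True:
            newline = buf.find(b"\n")
            if newline == -1:
                break  # the length prefix is not complete yet
            length = int(buf[:newline])
            start = newline + 1
            if len(buf) < start + length:
                break  # the record body is not complete yet
            yield json.loads(buf[start:start + length])
            buf = buf[start + length:]


def agent_events(events: Iterable[dict]) -> List[dict]:
    """Keep only the agent lifecycle events discussed in this report."""
    wanted = {"AGENT_ADDED", "AGENT_REMOVED"}
    return [event for event in events if event.get("type") in wanted]
```

With this in place, a subscriber that sees an {{AGENT_ADDED}} event but never a matching {{AGENT_REMOVED}} event reproduces the behavior described here.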

 

The Mesos master logs show this:
{noformat}
I1213 15:03:10.338935    13 master.cpp:1297] Agent 2cd23025-c09d-401b-8f26-9265eda8f800-S1 at slave(1)@172.18.0.51:5051 (86813ca2a964) disconnected
I1213 15:03:10.339089    13 master.cpp:3399] Disconnecting agent 2cd23025-c09d-401b-8f26-9265eda8f800-S1 at slave(1)@172.18.0.51:5051 (86813ca2a964)
I1213 15:03:10.339207    13 master.cpp:3418] Deactivating agent 2cd23025-c09d-401b-8f26-9265eda8f800-S1 at slave(1)@172.18.0.51:5051 (86813ca2a964)
{noformat}
And then:
{noformat}
W1213 15:04:40.726670    15 process.cpp:1917] Failed to send 'mesos.internal.PingSlaveMessage' to '172.18.0.51:5051', connect: Failed to connect to 172.18.0.51:5051: No route to host{noformat}
And some time after this:
{noformat}
I1213 15:04:37.685007     7 hierarchical.cpp:900] Removed agent 2cd23025-c09d-401b-8f26-9265eda8f800-S1   {noformat}
I will attach the full master logs also.

 

Do you think this could be a bug?


