Posted to user@mesos.apache.org by Qiang Chen <qz...@gmail.com> on 2016/07/18 09:08:23 UTC

What will happen in maintenance mode

Hi all,

I'm confused about using maintenance mode.

I read this on the Mesos [doc site](http://mesos.apache.org/documentation/latest/maintenance/):

```
When maintenance is triggered by the operator, all agents on the machine
are told to shutdown. These agents are removed from the master, which
means that a TASK_LOST status update will be sent for every task
running on each of those agents. The scheduler driver’s slaveLost
callback will also be invoked for each of the removed agents. Any agents
on machines in maintenance are also prevented from re-registering with
the master in the future (until maintenance is completed and the machine
is brought back up).
```
But when I tested the maintenance HTTP endpoints, I didn't see the agent
machines shut down or any tasks fail.

When Mesos agents are in that mode, are their running tasks moved to other
agents? That is, are all the tasks on those agents evacuated, and are the
agents then shut down?

When I POST to "/maintenance/schedule" and "/machine/down" with a proper
maintenance time window, a GET on "/maintenance/status" shows the specified
agents in the "draining_machines" and "down_machines" lists, but the agents
are not shut down and no tasks are evacuated. Why? Does this make sense?
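As a concrete reference for the requests described above, here is a minimal Python sketch that only builds the two JSON bodies (it sends nothing). The shapes are an assumption based on the examples in the maintenance documentation: `/maintenance/schedule` takes a list of `windows`, each with `machine_ids` and an `unavailability` whose `start` and `duration` are given in nanoseconds, and `/machine/down` takes a plain list of machine IDs. The helper names and the `foo-bar`/`127.0.0.1` machine are hypothetical; check the field names against your Mesos version.

```python
import json
import time


def machine_id(hostname: str, ip: str) -> dict:
    # Hypothetical helper. Both fields are filled in, because later in this
    # thread only hostname + IP together matched the running agents.
    return {"hostname": hostname, "ip": ip}


def schedule_body(machines: list, start_ns: int, duration_ns: int) -> dict:
    # Assumed body for POST /maintenance/schedule: a single window that
    # covers every given machine.
    return {
        "windows": [
            {
                "machine_ids": machines,
                "unavailability": {
                    "start": {"nanoseconds": start_ns},
                    "duration": {"nanoseconds": duration_ns},
                },
            }
        ]
    }


def down_body(machines: list) -> list:
    # Assumed body for POST /machine/down: a plain list of machine IDs.
    return list(machines)


if __name__ == "__main__":
    machines = [machine_id("foo-bar", "127.0.0.1")]
    start = time.time_ns() + 60 * 10**9  # one minute from now
    one_hour = 3600 * 10**9
    print(json.dumps(schedule_body(machines, start, one_hour), indent=2))
    print(json.dumps(down_body(machines)))
```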

Thanks.

-- 
Best Regards,
Chen, Qiang


Re: What will happen in maintenance mode

Posted by Joseph Wu <jo...@mesosphere.io>.
My guess is that your agents don't match the machines you specified. Note:
The maintenance endpoints in Mesos allow you to specify maintenance against
non-existent machines, because the operator may add agents on those
machines in the future.

In Mesos' maintenance primitives, a "machine" is a hostname + IP.  (A
physical/virtual machine can hold multiple agents.)  The response in
/maintenance/status is in terms of machines, not agents.  If none of your
frameworks support inverse offers, then you won't get any useful
information from the /maintenance/status endpoint.
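To illustrate reading the status in terms of machines, here is a small Python sketch that summarizes a /maintenance/status response. It assumes the response has a `draining_machines` list whose entries carry an `id` (hostname/IP) and an optional `statuses` list of per-framework inverse-offer statuses, plus a `down_machines` list of bare machine IDs; that layout is my reading of the maintenance protobufs, not something stated in this thread. The sample response and the `summarize` helper are invented for illustration.

```python
import json

# Hypothetical /maintenance/status response; the field layout is an assumption.
SAMPLE = """
{
  "draining_machines": [
    {"id": {"hostname": "foo-bar", "ip": "127.0.0.1"}}
  ],
  "down_machines": [
    {"hostname": "baz", "ip": "10.0.0.2"}
  ]
}
"""


def summarize(status: dict) -> list:
    lines = []
    for entry in status.get("draining_machines", []):
        mid = entry["id"]
        # Per this thread, these statuses carry little useful information
        # unless some framework supports inverse offers.
        statuses = entry.get("statuses", [])
        lines.append(f"DRAINING {mid.get('hostname')}/{mid.get('ip')}: "
                     f"{len(statuses)} inverse-offer status(es)")
    for mid in status.get("down_machines", []):
        lines.append(f"DOWN {mid.get('hostname')}/{mid.get('ip')}")
    return lines


if __name__ == "__main__":
    for line in summarize(json.loads(SAMPLE)):
        print(line)
```

Note that a machine showing up here says nothing about whether any registered agent actually matched it.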

You can figure out an agent's hostname/IP by hitting the /master/slaves
endpoint:

{
  "slaves": [
    {
      "pid":"slave(1)@127.0.0.1:5051",
      "hostname":"foo-bar",
      ...

^ The above translates to a machine = { "hostname": "foo-bar", "ip": "127.0.0.1" }
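The same translation can be automated. A minimal Python sketch, assuming the /master/slaves response looks like the excerpt above (a `slaves` list whose entries have a `pid` of the form `slave(N)@IP:PORT` and a `hostname`) and that the address is IPv4; the helper name is hypothetical.

```python
import json


def machine_ids_from_slaves(slaves_json: str) -> list:
    """Derive {"hostname", "ip"} machine IDs from a /master/slaves response."""
    data = json.loads(slaves_json)
    machines = []
    for agent in data.get("slaves", []):
        # "slave(1)@127.0.0.1:5051" -> "127.0.0.1:5051" -> "127.0.0.1"
        # (assumes an IPv4 address without brackets)
        address = agent["pid"].split("@", 1)[1]
        ip = address.rsplit(":", 1)[0]
        machine = {"hostname": agent["hostname"], "ip": ip}
        # Several agents can run on one machine; list each machine once.
        if machine not in machines:
            machines.append(machine)
    return machines


if __name__ == "__main__":
    sample = '{"slaves": [{"pid": "slave(1)@127.0.0.1:5051", "hostname": "foo-bar"}]}'
    print(machine_ids_from_slaves(sample))
    # [{'hostname': 'foo-bar', 'ip': '127.0.0.1'}]
```

The resulting list can be pasted directly into the `machine_ids` of a maintenance window, which avoids hand-typing a hostname or IP that differs from what the agent registered with.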

On Mon, Jul 18, 2016 at 2:08 AM, Qiang Chen <qz...@gmail.com> wrote:

Re: What will happen in maintenance mode

Posted by Joseph Wu <jo...@mesosphere.io>.
There are some cluster environments where nodes lack either an IP or a
hostname. That's why each MachineID must have one OR the other (at least
one of the two), not one XOR the other (exactly one).
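The rule is an inclusive OR: a machine ID is acceptable with a hostname, an IP, or both, and is rejected only when both are missing. A minimal Python sketch of that rule (the function name is hypothetical; this is not Mesos's own validation code):

```python
def is_valid_machine_id(machine: dict) -> bool:
    # Inclusive OR: at least one of the two fields must be non-empty.
    has_hostname = bool(machine.get("hostname"))
    has_ip = bool(machine.get("ip"))
    # An XOR (has_hostname != has_ip) would wrongly reject machines
    # that specify both fields.
    return has_hostname or has_ip


if __name__ == "__main__":
    for m in ({"hostname": "foo-bar"}, {"ip": "127.0.0.1"},
              {"hostname": "foo-bar", "ip": "127.0.0.1"}, {}):
        print(m, is_valid_machine_id(m))
```

Passing this check only means the schedule is accepted; whether it matches a registered agent is a separate question, which is what the rest of the thread is about.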

There is a note further up the page that explains how Mesos matches
machines to agents:
https://github.com/apache/mesos/blame/3e115accca390663575753279f4400495625cb91/docs/maintenance.md#L135-L142

On Fri, Jul 22, 2016 at 9:34 PM, tommy xiao <xi...@gmail.com> wrote:

> yes, in recently mesos deployment, if i ignore the hostname, just
> specified IP, the mesos cluster sometime is not working. because the
> hostname is not correct. so i also curious the machine definition:
> "Each machine must have at least a hostname or IP included. The hostname
> is not case-sensitive."
>
> it should be defined must hostname and ip included.
>
>
> 2016-07-19 11:38 GMT+08:00 Qiang Chen <qz...@gmail.com>:
>
>> Thanks Joseph.
>>
>> I saw this from mesos [doc site](
>> http://mesos.apache.org/documentation/latest/maintenance/):
>>
>> "Each machine must have at least a hostname or IP included. The hostname
>> is not case-sensitive."
>>
>> From my test, the statement above is not correct, as if I only specific
>> the hostname or IP, it will NOT take effect for the maintenance agents.
>> but should specific both will OK.
>>
>> On 2016年07月19日 02:17, Joseph Wu wrote:
>>
>> [image: Boxbe] <https://www.boxbe.com/overview> This message is eligible
>> for Automatic Cleanup! (joseph@mesosphere.io) Add cleanup rule
>> <https://www.boxbe.com/popup?url=https%3A%2F%2Fwww.boxbe.com%2Fcleanup%3Fkey%3Dm%252B%252F9y8szBbdXKWiZ%252FDADQ0%252Fzx2OsVPpMz1%252BhAd8WOjE%253D%26token%3D7yPWMILH6f2hh7W8GLG1B4W3dWqI9yjvahQVEYFryQn3PGah0U1DPo7rfMlTIncRBOxGwo9jI4CHtQ%252BZ435zSbIfdjC1em9cdavejMkUAGEDLcp7EpoDgqU0pX3rrX3o0uawWqnSxys%253D&tc_serial=26129651012&tc_rand=629032590&utm_source=stf&utm_medium=email&utm_campaign=ANNO_CLEANUP_ADD&utm_content=001>
>> | More info
>> <http://blog.boxbe.com/general/boxbe-automatic-cleanup?tc_serial=26129651012&tc_rand=629032590&utm_source=stf&utm_medium=email&utm_campaign=ANNO_CLEANUP_ADD&utm_content=001>
>>
>>
>> My guess is that your agents don't match the machines you specified.
>> Note: The maintenance endpoints in Mesos allow you to specify maintenance
>> against non-existent machines, because the operator may add agents on those
>> machines in future.
>>
>> In Mesos' maintenance primitives, a "machine" is a hostname + IP.  (A
>> physical/virtual machine can hold multiple agents.)  The response in
>> /maintenance/status is in terms of machines, not agents.  If none of your
>> frameworks support inverse offers, then you won't get any useful
>> information from the /maintenance/status endpoint.
>>
>> You can figure out an agent's hostname/IP by hitting the /master/slaves
>> endpoint:
>>
>> {
>>   "slaves": [
>>     {
>>       "pid":"slave(1)@127.0.0.1:5051",
>>       "hostname":"foo-bar",
>>       ...
>>
>> ^ The above translates to a machine = { "hostname": "foo-bar", "ip" : "
>> 127.0.0.1" }
>>
>> On Mon, Jul 18, 2016 at 2:08 AM, Qiang Chen <qz...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm puzzled in using maintenance mode.
>>>
>>> I see this from mesos [doc site](
>>> http://mesos.apache.org/documentation/latest/maintenance/):
>>>
>>> ```
>>> When maintenance is triggered by the operator, all agents on the machine
>>> are told to shutdown. These agents are removed from the master, which means
>>> that a TASK_LOST status update will be sent for every task running on
>>> each of those agents. The scheduler driver’s slaveLost callback will
>>> also be invoked for each of the removed agents. Any agents on machines in
>>> maintenance are also prevented from re-registering with the master in the
>>> future (until maintenance is completed and the machine is brought back up).
>>> ```
>>> But I didn't find the agent machine shutdown or task failed when I test
>>> the maintenance HTTP endpoints.
>>>
>>> If mesos agents are in that mode will move the running tasks to other
>>> agents? namely, it will evacuate all the tasks in those agents? and the
>>> shutdown?
>>>
>>> When I POST "/maintenance/schedule" and "/machine/down" and give a
>>> proper maintain time window. I got the response that those specified agents
>>> are in the "draining_machines" and "down_machines" list by GET
>>> "/maintenance/status", but didn't shutdown and evacuate any tasks, why ?
>>> does it make sense?
>>>
>>> Thanks.
>>>
>>> --
>>> Best Regards,
>>> Chen, Qiang
>>>
>>>
>>
>> --
>> Best Regards,
>> Chen, Qiang
>>
>>
>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>

Re: What will happen in maintenance mode

Posted by tommy xiao <xi...@gmail.com>.
Yes. In a recent Mesos deployment, when I left out the hostname and
specified only the IP, the Mesos cluster sometimes did not work, because the
hostname was not correct. So I'm also curious about the machine definition:
"Each machine must have at least a hostname or IP included. The hostname is
not case-sensitive."

I think the definition should require both the hostname and the IP.


2016-07-19 11:38 GMT+08:00 Qiang Chen <qz...@gmail.com>:

-- 
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com

Re: What will happen in maintenance mode

Posted by Qiang Chen <qz...@gmail.com>.
Thanks Joseph.

I saw this on the Mesos [doc site](http://mesos.apache.org/documentation/latest/maintenance/):

"Each machine must have at least a hostname or IP included. The hostname 
is not case-sensitive."

From my testing, the statement above is not correct: if I specify only the
hostname or only the IP, maintenance does NOT take effect on the agents,
but specifying both works.
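This result is what you would see if the master matched a scheduled machine to an agent by comparing the full (hostname, IP) pair the agent registered with, comparing the hostname case-insensitively. The Python sketch below models that assumption only; it is consistent with the test described above and with the documented case-insensitive hostname, but it is not Mesos's actual matching code, and the helper names are hypothetical.

```python
def normalize(machine: dict) -> tuple:
    # Hostnames are documented as case-insensitive, so compare lowercased.
    return (machine.get("hostname", "").lower(), machine.get("ip", ""))


def matches(scheduled: dict, agent_machine: dict) -> bool:
    # Assumed rule: exact equality on the whole (hostname, ip) pair, so a
    # scheduled entry missing either field never equals a registered agent.
    return normalize(scheduled) == normalize(agent_machine)


if __name__ == "__main__":
    agent = {"hostname": "foo-bar", "ip": "127.0.0.1"}  # as from /master/slaves
    for scheduled in ({"hostname": "foo-bar"},
                      {"ip": "127.0.0.1"},
                      {"hostname": "FOO-BAR", "ip": "127.0.0.1"}):
        print(scheduled, matches(scheduled, agent))
```

Under this model, hostname-only and IP-only entries are accepted by the endpoints yet never affect any agent, which is the behaviour reported here.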

On 2016-07-19 02:17, Joseph Wu wrote:

-- 
Best Regards,
Chen, Qiang