Posted to user@mesos.apache.org by Qiang Chen <qz...@gmail.com> on 2016/07/18 09:08:23 UTC
What will happen in maintenance mode
Hi all,
I'm puzzled about how maintenance mode works.
I saw this on the Mesos [doc
site](http://mesos.apache.org/documentation/latest/maintenance/):
```
When maintenance is triggered by the operator, all agents on the machine
are told to shutdown. These agents are removed from the master, which
means that a TASK_LOST status update will be sent for every task
running on each of those agents. The scheduler driver’s slaveLost
callback will also be invoked for each of the removed agents. Any agents
on machines in maintenance are also prevented from re-registering with
the master in the future (until maintenance is completed and the machine
is brought back up).
```
But when I tested the maintenance HTTP endpoints, I didn't see any agent
shut down or any task fail.
When Mesos agents are in maintenance mode, are their running tasks moved
to other agents? That is, does Mesos evacuate all the tasks on those
agents and then shut the agents down?
I POSTed to "/maintenance/schedule" and "/machine/down" with a proper
maintenance time window. A GET on "/maintenance/status" lists the
specified agents under "draining_machines" and "down_machines", but
nothing was shut down and no tasks were evacuated. Why? Is this expected?
Thanks.
--
Best Regards,
Chen, Qiang
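A minimal Python sketch of the two request bodies described in the message above may help readers reproduce this. It assumes the schedule format shown in the Mesos maintenance docs (a `windows` list whose entries carry `machine_ids` and an `unavailability` with a `start` and optional `duration`, both in nanoseconds) and that /machine/down takes a bare JSON array of machine IDs; verify both against your Mesos version. The helper names are hypothetical, and the sketch only builds the JSON rather than sending it. Each machine ID sets both `hostname` and `ip`, which is what the later replies in this thread found necessary.

```python
import json
import time

# Hypothetical helpers. Field names follow the schedule example in the
# Mesos maintenance docs; check them against your Mesos version.


def machine_id(hostname, ip):
    """A MachineID with both fields set, as the later replies recommend."""
    return {"hostname": hostname, "ip": ip}


def schedule_body(machines, start_ns, duration_ns=None):
    """Body for POST /maintenance/schedule: one window covering `machines`."""
    unavailability = {"start": {"nanoseconds": start_ns}}
    if duration_ns is not None:
        unavailability["duration"] = {"nanoseconds": duration_ns}
    return {
        "windows": [
            {"machine_ids": list(machines), "unavailability": unavailability}
        ]
    }


def machine_down_body(machines):
    """Body for POST /machine/down: assumed to be a bare JSON array of MachineIDs."""
    return list(machines)


if __name__ == "__main__":
    machines = [machine_id("foo-bar", "127.0.0.1")]
    start_ns = time.time_ns() + 10 * 60 * 10**9  # window starts in 10 minutes
    one_hour_ns = 60 * 60 * 10**9
    print(json.dumps(schedule_body(machines, start_ns, one_hour_ns), indent=2))
    print(json.dumps(machine_down_body(machines)))
```

The printed JSON could then be POSTed to the master, for example with curl and a `Content-Type: application/json` header.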
Re: What will happen in maintenance mode
Posted by Joseph Wu <jo...@mesosphere.io>.
My guess is that your agents don't match the machines you specified. Note:
The maintenance endpoints in Mesos allow you to specify maintenance against
non-existent machines, because the operator may add agents on those
machines in future.
In Mesos' maintenance primitives, a "machine" is a hostname + IP. (A
physical/virtual machine can hold multiple agents.) The response in
/maintenance/status is in terms of machines, not agents. If none of your
frameworks support inverse offers, then you won't get any useful
information from the /maintenance/status endpoint.
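To make the machines-versus-agents point concrete, here is a small Python sketch that summarizes a /maintenance/status response. The response shape in the comment (`draining_machines` entries with an `id` and a `statuses` list of inverse-offer statuses, `down_machines` as plain machine IDs) is an assumption based on the Mesos `ClusterStatus` message and should be checked against your version; `summarize_status` is a hypothetical helper. It reports machines only: a machine listed there says nothing about whether any registered agent actually matched it, and an empty `statuses` list is what you would expect when no framework supports inverse offers.

```python
import json

# Assumed /maintenance/status shape (verify against your Mesos version):
# {"draining_machines": [{"id": {"hostname": ..., "ip": ...},
#                         "statuses": [...inverse-offer statuses...]}],
#  "down_machines": [{"hostname": ..., "ip": ...}]}


def summarize_status(body):
    """Return one human-readable line per machine in the status response."""
    status = json.loads(body)
    lines = []
    for entry in status.get("draining_machines", []):
        machine = entry["id"]
        # Per this thread, expect an empty list if no framework supports
        # inverse offers.
        statuses = entry.get("statuses", [])
        lines.append(
            f"draining {machine.get('hostname')}/{machine.get('ip')}: "
            f"{len(statuses)} inverse-offer status(es)"
        )
    for machine in status.get("down_machines", []):
        lines.append(f"down {machine.get('hostname')}/{machine.get('ip')}")
    return lines


if __name__ == "__main__":
    sample = json.dumps({
        "draining_machines": [{"id": {"hostname": "foo-bar", "ip": "127.0.0.1"}}],
        "down_machines": [],
    })
    for line in summarize_status(sample):
        print(line)
```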
You can figure out an agent's hostname/IP by hitting the /master/slaves
endpoint:
{
  "slaves": [
    {
      "pid": "slave(1)@127.0.0.1:5051",
      "hostname": "foo-bar",
      ...
^ The above translates to a machine = { "hostname": "foo-bar", "ip": "127.0.0.1" }
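The translation above can be scripted. This Python sketch assumes IPv4 libprocess pids of the form `slave(N)@IP:PORT`, as in the example, and takes the IP from the pid; the function names are hypothetical. It works on the JSON text of a /master/slaves response so it can be tried without a running master.

```python
import json


def machine_from_agent(agent):
    """Derive {hostname, ip} for one entry of /master/slaves.

    Assumes an IPv4 libprocess pid such as "slave(1)@127.0.0.1:5051".
    """
    address = agent["pid"].split("@", 1)[1]  # "127.0.0.1:5051"
    ip = address.rsplit(":", 1)[0]           # drop the port
    return {"hostname": agent["hostname"], "ip": ip}


def machines_from_slaves(body):
    """Map a /master/slaves JSON response to a list of MachineIDs."""
    return [machine_from_agent(agent) for agent in json.loads(body)["slaves"]]


if __name__ == "__main__":
    sample = '{"slaves": [{"pid": "slave(1)@127.0.0.1:5051", "hostname": "foo-bar"}]}'
    print(machines_from_slaves(sample))
    # [{'hostname': 'foo-bar', 'ip': '127.0.0.1'}]
```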
Re: What will happen in maintenance mode
Posted by Joseph Wu <jo...@mesosphere.io>.
There are some cluster environments where nodes do not have an IP or
hostname. That's why each MachineID must have one OR the other (at least
one, possibly both), not one XOR the other.
There is a note further up the page that explains how Mesos matches
machines to agents:
https://github.com/apache/mesos/blame/3e115accca390663575753279f4400495625cb91/docs/maintenance.md#L135-L142
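For diagnosing the mismatch suspected in the first reply, a pre-flight check like the following Python sketch may help. It assumes, consistent with Qiang's test in this thread, that a scheduled machine only affects an agent when both its hostname and IP equal the agent's own, and it compares hostnames case-insensitively because the docs say hostnames are not case-sensitive. That matching rule is an assumption for diagnosis, not a restatement of the master's code; the doc note linked above is the authority. `unmatched_machines` is a hypothetical helper.

```python
def _key(machine):
    # Hostnames are documented as case-insensitive; normalize before comparing.
    hostname = machine.get("hostname")
    return (hostname.lower() if hostname else None, machine.get("ip"))


def unmatched_machines(scheduled, registered):
    """Return scheduled MachineIDs that match no registered agent.

    `scheduled`: MachineIDs from the maintenance schedule.
    `registered`: MachineIDs derived from /master/slaves (hostname + IP).
    Assumed rule: both fields must be present and equal.
    """
    agents = {_key(m) for m in registered}
    return [m for m in scheduled if _key(m) not in agents]


if __name__ == "__main__":
    registered = [{"hostname": "foo-bar", "ip": "127.0.0.1"}]
    scheduled = [
        {"hostname": "FOO-BAR", "ip": "127.0.0.1"},  # both fields given
        {"hostname": "foo-bar"},                     # hostname only
        {"ip": "127.0.0.1"},                         # IP only
    ]
    for m in unmatched_machines(scheduled, registered):
        print("no agent matches", m)
```

The `registered` list can come from /master/slaves (hostname plus the IP in each agent's pid). Any entry the check reports is worth rewriting with both fields before POSTing the schedule.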
On Fri, Jul 22, 2016 at 9:34 PM, tommy xiao <xi...@gmail.com> wrote:
> Yes, in a recent Mesos deployment, if I omitted the hostname and
> specified only the IP, the Mesos cluster sometimes did not work, because
> the hostname was not correct. So I'm also curious about the machine
> definition:
> "Each machine must have at least a hostname or IP included. The hostname
> is not case-sensitive."
>
> It should say that both the hostname and the IP must be included.
Re: What will happen in maintenance mode
Posted by tommy xiao <xi...@gmail.com>.
Yes, in a recent Mesos deployment, if I omitted the hostname and specified
only the IP, the Mesos cluster sometimes did not work, because the hostname
was not correct. So I'm also curious about the machine definition:
"Each machine must have at least a hostname or IP included. The hostname is
not case-sensitive."
It should say that both the hostname and the IP must be included.
2016-07-19 11:38 GMT+08:00 Qiang Chen <qz...@gmail.com>:
> Thanks Joseph.
>
> I saw this on the Mesos [doc site](
> http://mesos.apache.org/documentation/latest/maintenance/):
>
> "Each machine must have at least a hostname or IP included. The hostname
> is not case-sensitive."
>
> From my test, the statement above is not correct: if I specify only the
> hostname or only the IP, maintenance does NOT take effect for the agents,
> but specifying both works.
--
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com
Re: What will happen in maintenance mode
Posted by Qiang Chen <qz...@gmail.com>.
Thanks Joseph.
I saw this on the Mesos [doc
site](http://mesos.apache.org/documentation/latest/maintenance/):
"Each machine must have at least a hostname or IP included. The hostname
is not case-sensitive."
From my test, the statement above is not correct: if I specify only the
hostname or only the IP, maintenance does NOT take effect for the agents,
but specifying both works.
On 2016-07-19 02:17, Joseph Wu wrote:
> My guess is that your agents don't match the machines you specified.
> Note: The maintenance endpoints in Mesos allow you to specify
> maintenance against non-existent machines, because the operator may
> add agents on those machines in future.
>
> In Mesos' maintenance primitives, a "machine" is a hostname + IP. (A
> physical/virtual machine can hold multiple agents.) The response in
> /maintenance/status is in terms of machines, not agents. If none of
> your frameworks support inverse offers, then you won't get any useful
> information from the /maintenance/status endpoint.
>
> You can figure out an agent's hostname/IP by hitting the
> /master/slaves endpoint:
> {
>   "slaves": [
>     {
>       "pid": "slave(1)@127.0.0.1:5051",
>       "hostname": "foo-bar",
>       ...
> ^ The above translates to a machine = { "hostname": "foo-bar", "ip": "127.0.0.1" }
--
Best Regards,
Chen, Qiang