Posted to user@aurora.apache.org by Ziliang Chen <zl...@gmail.com> on 2016/06/25 11:08:28 UTC

Prevent service Job moved from one machine to another periodically

Hi,

I have a "service" job scheduled by Aurora. I found that, periodically, the
service job is moved from one machine to another (it is stopped on the
previous machine and restarted on another one). May I ask whether this is
expected behavior and, if it is, how to make the service job stick to one
machine unless there is a failure?

Thank you very much !

-- 
Regards, Zi-Liang

Mail:zlchen.ken@gmail.com

Re: Prevent service Job moved from one machine to another periodically

Posted by Ziliang Chen <zl...@gmail.com>.
Got it. Thanks Bill!

On Sun, Jun 26, 2016 at 9:31 AM, Bill Farner <wf...@apache.org> wrote:

> FYI the behavior of an update will have a similar outcome - tasks are
> subject to move when restarted in the course of an update.


-- 
Regards, Zi-Liang

Mail:zlchen.ken@gmail.com

Re: Prevent service Job moved from one machine to another periodically

Posted by Bill Farner <wf...@apache.org>.
FYI the behavior of an update will have a similar outcome - tasks are
subject to move when restarted in the course of an update.

On Saturday, June 25, 2016, Ziliang Chen <zl...@gmail.com> wrote:

> Found the issue in the code: when updating the job, I first did a
> kill. Thanks Bill/Erb!

Re: Prevent service Job moved from one machine to another periodically

Posted by Ziliang Chen <zl...@gmail.com>.
Found the issue in the code: when updating the job, I first did a kill.
Thanks Bill/Erb!

On Sun, Jun 26, 2016 at 1:09 AM, Bill Farner <wf...@apache.org> wrote:

> Entering the KILLING state suggests that a user issued a kill command for
> the service.  Does that sound plausible?


-- 
Regards, Zi-Liang

Mail:zlchen.ken@gmail.com

Re: Prevent service Job moved from one machine to another periodically

Posted by Bill Farner <wf...@apache.org>.
Entering the KILLING state suggests that a user issued a kill command for
the service.  Does that sound plausible?
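
As a rough illustration of that reasoning, here is a minimal Python sketch
that scans a task event history, in the text form shown by the scheduler UI,
for a state that records a stop request. The helper names (parse_event,
stop_cause) are made up for this example. KILLING is the state discussed in
this thread; RESTARTING and DRAINING are assumed from Aurora's task state
names and are not confirmed anywhere in this thread.

```python
# Intermediate states that record why a task is being stopped.
# KILLING comes from this thread; RESTARTING and DRAINING are assumptions
# based on Aurora's task state names.
CAUSES = {
    "KILLING": "kill requested by a user or client",
    "RESTARTING": "restart requested",
    "DRAINING": "host being drained",
}


def parse_event(line):
    """Split '<timestamp> • <STATE> • <message>' into (state, message)."""
    parts = [part.strip() for part in line.split("•")]
    state = parts[1] if len(parts) > 1 else ""
    message = parts[2] if len(parts) > 2 else ""
    return state, message


def stop_cause(history):
    """Return a short description of the first stop request in the history."""
    for line in history:
        state, message = parse_event(line)
        if state in CAUSES:
            if message:
                return f"{CAUSES[state]} ({message})"
            return CAUSES[state]
    return "no stop request recorded; check for a crash or a lost agent"


# Event history as reported in this thread.
history = [
    "06/25 22:32:23 LOCAL • PENDING",
    "06/25 22:33:06 LOCAL • ASSIGNED",
    "06/25 22:33:07 LOCAL • STARTING • Initializing sandbox.",
    "06/25 22:33:09 LOCAL • RUNNING",
    "06/25 22:42:15 LOCAL • KILLING • Killed by UNSECURE",
    "06/25 22:42:18 LOCAL • KILLED • Instructed to kill task.",
]

print(stop_cause(history))
```

A history that goes straight from RUNNING to a terminal state, with no such
intermediate state, would instead point towards a crash, a failed health
check, or a lost agent.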

On Saturday, June 25, 2016, Ziliang Chen <zl...@gmail.com> wrote:

> Instructed KILL.
>
>  4 minutes ago - KILLED : Instructed to kill task.
>
>    - 06/25 22:32:23 LOCAL • PENDING
>    - 06/25 22:33:06 LOCAL • ASSIGNED
>    - 06/25 22:33:07 LOCAL • STARTING • Initializing sandbox.
>    - 06/25 22:33:09 LOCAL • RUNNING
>    - 06/25 22:42:15 LOCAL • KILLING • Killed by UNSECURE
>    - 06/25 22:42:18 LOCAL • KILLED • Instructed to kill task.

Re: Prevent service Job moved from one machine to another periodically

Posted by Ziliang Chen <zl...@gmail.com>.
Instructed KILL.

 4 minutes ago - KILLED : Instructed to kill task.

   - 06/25 22:32:23 LOCAL • PENDING
   - 06/25 22:33:06 LOCAL • ASSIGNED
   - 06/25 22:33:07 LOCAL • STARTING • Initializing sandbox.
   - 06/25 22:33:09 LOCAL • RUNNING
   - 06/25 22:42:15 LOCAL • KILLING • Killed by UNSECURE
   - 06/25 22:42:18 LOCAL • KILLED • Instructed to kill task.


On Sat, Jun 25, 2016 at 9:55 PM, Erb, Stephan <St...@blue-yonder.com>
wrote:

> When you go to the scheduler website, you should be able to expand the
> task event history of a terminated instance (by clicking on the + icon).
> What does it say there?


-- 
Regards, Zi-Liang

Mail:zlchen.ken@gmail.com

Re: Prevent service Job moved from one machine to another periodically

Posted by "Erb, Stephan" <St...@blue-yonder.com>.
When you go to the scheduler website, you should be able to expand the task event history of a terminated instance (by clicking on the + icon). What does it say there?

From: Ziliang Chen <zl...@gmail.com>
Reply-To: "user@aurora.apache.org" <us...@aurora.apache.org>
Date: Saturday 25 June 2016 at 15:08
To: "user@aurora.apache.org" <us...@aurora.apache.org>
Subject: Re: Prevent service Job moved from one machine to another periodically

Hi Erb,

As always, I appreciate your quick response!
With your explanation, I understand Aurora's philosophy. But in my case, my service program is up and running in a good state, yet it seems that the Aurora scheduler kills my service program periodically and moves it to another machine. I expect my service program to keep running there unless there is a restart, crash, etc.



Re: Prevent service Job moved from one machine to another periodically

Posted by Ziliang Chen <zl...@gmail.com>.
Hi Erb,

As always, I appreciate your quick response!
With your explanation, I understand Aurora's philosophy. But in my case, my
service program is up and running in a good state, yet it seems that the
Aurora scheduler kills my service program periodically and moves it to
another machine. I expect my service program to keep running there unless
there is a restart, crash, etc.


On Sat, Jun 25, 2016 at 8:27 PM, Erb, Stephan <St...@blue-yonder.com>
wrote:

> Hi Zi-Liang,
>
>
>
> By default, services in Aurora are not pinned to a particular machine.
> This is based on the philosophy that services should be stateless and thus
> not dependent on a particular host, if possible.
>
> Whenever an instance/task of your service has terminated, the scheduler
> might pick any other random machine to launch a replacement. There are many
> reasons why an instance could terminate:
>
>    - Your instance has crashed, run out of memory, or simply exited
>      normally.
>    - If enabled, your health checks may have detected that the instance is
>      no longer responding.
>    - The agent machine it was running on failed or lost connectivity with
>      Mesos.
>    - You have used the aurora_admin client to drain a machine.
>    - You used a client command such as restart or update.
>
> If necessary, you could use constraints [1] to force Aurora to always
> schedule a service on the same host. However, this is not really
> recommended as it can easily lead to situations where your service cannot
> be launched at all, due to missing resources on the selected host.
>
>
>
> [1]
> https://github.com/apache/aurora/blob/master/docs/features/constraints.md
>
>
>
> Best regards,
>
> Stephan


-- 
Regards, Zi-Liang

Mail:zlchen.ken@gmail.com

Re: Prevent service Job moved from one machine to another periodically

Posted by "Erb, Stephan" <St...@blue-yonder.com>.
Hi Zi-Liang,

By default, services in Aurora are not pinned to a particular machine. This is based on the philosophy that services should be stateless and thus not dependent on a particular host, if possible.

Whenever an instance/task of your service has terminated, the scheduler might pick any other random machine to launch a replacement. There are many reasons why an instance could terminate:

   - Your instance has crashed, run out of memory, or simply exited normally.
   - If enabled, your health checks may have detected that the instance is no longer responding.
   - The agent machine it was running on failed or lost connectivity with Mesos.
   - You have used the aurora_admin client to drain a machine.
   - You used a client command such as restart or update.

If necessary, you could use constraints [1] to force Aurora to always schedule a service on the same host. However, this is not really recommended as it can easily lead to situations where your service cannot be launched at all, due to missing resources on the selected host.

[1] https://github.com/apache/aurora/blob/master/docs/features/constraints.md
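
To illustrate the trade-off, here is a minimal Python sketch of how a value constraint narrows the set of eligible hosts. It is a toy model, not Aurora's scheduler code: the eligible_hosts helper, the "up" flag, the host names, and the attribute values are all made up for illustration. The constraints dict follows the attribute-to-value form described in [1], assuming each agent exposes a "host" attribute.

```python
def eligible_hosts(constraints, hosts):
    """Return the names of hosts that satisfy every value constraint.

    constraints: attribute name -> comma-separated allowed values
    hosts: host name -> {"up": bool, "attributes": {name: value}}
    """
    matches = []
    for name, info in hosts.items():
        if not info["up"]:
            continue  # a failed or drained host cannot accept tasks
        attrs = info["attributes"]
        if all(attrs.get(key) in allowed.split(",")
               for key, allowed in constraints.items()):
            matches.append(name)
    return matches


# Hypothetical two-node cluster.
hosts = {
    "node-a": {"up": True, "attributes": {"host": "node-a"}},
    "node-b": {"up": True, "attributes": {"host": "node-b"}},
}

pinned = {"host": "node-a"}  # pin the service to a single host
unpinned = {}                # default: any host is acceptable

print(eligible_hosts(pinned, hosts))    # ['node-a']
hosts["node-a"]["up"] = False           # the pinned host goes away
print(eligible_hosts(pinned, hosts))    # []
print(eligible_hosts(unpinned, hosts))  # ['node-b']
```

With the pin in place, the replacement has nowhere to go once node-a is unavailable, whereas the unpinned service can simply be launched on node-b.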

Best regards,
Stephan

