Posted to dev@cloudstack.apache.org by ilya musayev <il...@gmail.com> on 2018/04/04 20:15:18 UTC

[DISCUSS] CloudStack graceful shutdown

Use case:
In any environment, from time to time, an administrator needs to perform
maintenance. The current stop sequence of the CloudStack management server
ignores the fact that there may be long-running async jobs, and terminates
the process. This in turn can create a poor user experience and occasional
inconsistency in the CloudStack DB.

This is especially painful in large environments where the user has
thousands of nodes and continuous patching happens around the clock,
requiring migration of workloads from one node to another.

With that said, I've created a script that monitors the async job queue
for a given MS and waits for it to complete all jobs. More details are
posted below.

I'd like to introduce a "graceful-shutdown" action into the
systemctl/service handling of the cloudstack-management service.

The details of how it will work are below:

Workflow for graceful shutdown:
  Using iptables/firewalld, block any connection attempts on 8080/8443 (we
can identify the ports dynamically).
  Identify the MSID for the node; using the proper msid, query the
async_job table for:
1) any jobs that are still running (job_status="0")
2) job_dispatcher not like "pseudoJobDispatcher"
3) job_init_msid=$my_ms_id

Monitor the async_job table for up to 60 minutes, until all async jobs for
the MSID are done, then proceed with shutdown.
    If the script fails for any reason or is terminated, catch the exit via
the trap command and unblock 8080/8443.
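
A minimal shell sketch of this workflow (not the actual script; the table
and column names follow the cloud DB schema above, but the firewall rule,
DB access and polling loop are illustrative assumptions):

    #!/bin/bash
    MSID="$1"    # msid of this management server (from the mshost table)
    # Block new API traffic while we drain
    iptables -I INPUT -p tcp -m multiport --dports 8080,8443 -j REJECT
    # Whatever happens (failure, Ctrl-C), unblock 8080/8443 on exit
    trap 'iptables -D INPUT -p tcp -m multiport --dports 8080,8443 -j REJECT' EXIT
    for i in $(seq 1 60); do    # poll once a minute, for up to 60 minutes
        RUNNING=$(mysql -N cloud -e "SELECT COUNT(*) FROM async_job \
            WHERE job_status=0 \
            AND job_dispatcher NOT LIKE 'pseudoJobDispatcher' \
            AND job_init_msid=${MSID};")
        if [ "$RUNNING" -eq 0 ]; then
            service cloudstack-management stop
            exit 0
        fi
        echo "${RUNNING} async jobs still running for msid ${MSID}; waiting..."
        sleep 60
    done
    echo "Timed out with async jobs still running; leaving the MS up" >&2
    exit 1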

Comments are welcome

Regards,
ilya

Re: [DISCUSS] CloudStack graceful shutdown

Posted by ilya musayev <il...@gmail.com>.
Rafael

> * Regarding the tasks/jobs that management servers (MSs) execute: do
> these tasks originate from requests that come to the MS, or is it
> possible for requests received by one management server to be executed by
> another? I mean, if I execute a request against MS1, will this request
> always be executed/treated by MS1, or is it possible that this request is
> executed by another MS (e.g. MS2)?

Yes, it's possible, but it will be tracked in async_job with the proper MS
that is responsible for the task.

My initial goal was to prevent users from creating more async jobs on the
node that's about to go down for maintenance - but as I think about it, I
don't know if it matters, since an async job will be executed on the MS
node that tracks the specific hypervisor/agent, as defined in the
cloud.host table.
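
For reference, a quick way to see which agents a given MS currently tracks
(a sketch, assuming local DB access; mgmt_server_id in cloud.host holds
the msid of the MS an agent is connected to - substitute your msid):

    mysql cloud -e "SELECT id, name, status FROM host \
        WHERE mgmt_server_id=<msid> AND removed IS NULL;"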

Maybe I'll leave off the blocking of 8080/8443 and just focus on tracking
async_jobs instead. Assuming you are managing your MSs with a load
balancer, it should be smart enough to shift user traffic to an MS that is
up.

> * I would suggest that after we block traffic coming from 8080/8443/8250
> (we will need to block this as well, right?), we can log the execution of
> tasks. I mean, something saying: there are XXX tasks (enumerate tasks)
> still being executed; we will wait for them to finish before shutting down.

Blocking 8250 is a bit too aggressive in my opinion, and we don't want to
do that. If you block 8250 while you have long-running tasks you are
waiting on to complete, they may fail - because you've blocked agent
communication on 8250.

Thanks
ilya


On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <
rafaelweingartner@gmail.com> wrote:

> Big +1 for this feature; I only have a few doubts.
>
> * Regarding the tasks/jobs that management servers (MSs) execute: do these
> tasks originate from requests that come to the MS, or is it possible for
> requests received by one management server to be executed by another? I
> mean, if I execute a request against MS1, will this request always be
> executed/treated by MS1, or is it possible that this request is executed
> by another MS (e.g. MS2)?
>
> * I would suggest that after we block traffic coming from 8080/8443/8250
> (we will need to block this as well, right?), we can log the execution of
> tasks. I mean, something saying: there are XXX tasks (enumerate tasks)
> still being executed; we will wait for them to finish before shutting down.
>
> * The timeout (60 minutes suggested) could be a global setting that we
> load before executing the graceful-shutdown.
>
> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <ilya.mailing.lists@gmail.com>
> wrote:
>
> > [...]
>
>
>
> --
> Rafael Weingärtner
>

Re: [DISCUSS] CloudStack graceful shutdown

Posted by ilya musayev <il...@gmail.com>.
Andrija

This is the reason for this enhancement: snapshots, migrations and others
are all async jobs, and therefore should be tracked in the async_job table
under the specific MS. It is known that they may take a while to complete,
and the last thing we want is to interrupt them.

Depending on what value you have set in Configurations, a job may time out
but keep working in the background, meaning CloudStack will stop tracking
the async job beyond a specific interval - but the CloudStack agent will
push forward.

I don't see any harm in taking the server offline if there are no jobs
being tracked.

However, we should not stop the server if we identify any jobs that are
still active. The user can decide to append a forceful shutdown after the
graceful one if he feels like it. For example:

[shell] # service cloudstack-management graceful-shutdown; service cloudstack-management shutdown

For your issue, please check the value of "job.cancel.threshold.minutes":

      "category": "Advanced",

      "description": "Time (in minutes) for async-jobs to be forcely
cancelled if it has been in process for long",

      "name": "job.cancel.threshold.minutes",

      "value": "60"


I propose that the graceful-shutdown command source
"job.cancel.threshold.minutes" as the max wait value before giving up on
the endeavor.
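
A sketch of sourcing that value in the shutdown script (assumes local
access to the cloud DB; falls back to 60 minutes if the setting is empty):

    THRESHOLD=$(mysql -N cloud -e \
        "SELECT value FROM configuration WHERE name='job.cancel.threshold.minutes';")
    THRESHOLD=${THRESHOLD:-60}    # default to 60 minutes if unset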


The only issue I'm on the fence about is blocking access to 8080/8443 if
you have a single-node setup.


There is a chance you may block access to CloudStack for over an hour, and
that may not be what you intended.


Perhaps we add a parameter in db.properties, e.g.
"graceful.shutdown.block.api.server = true/false".


Regards,

ilya

On Wed, Apr 4, 2018 at 2:22 PM, Andrija Panic <an...@gmail.com>
wrote:

> One comment here (I had to shut down a whole DC for a few hours
> recently...): please make sure to at least consider the snapshotting
> process as a special case - it can really take a few hours for a snapshot
> to complete (the copy process from primary to secondary storage).
>
> I did (in my recent unfortunate DC shutdown) actually stop the MS (we also
> have a script to identify running async jobs), so we stop it once safe, but
> any running qemu-img processes (we use KVM) need to be killed manually
> (Ansible) after the MS is stopped, etc., etc.
>
> I can assume most jobs take reasonably long to complete, but snapshots are
> probably the biggest exception, as they can take extremely long...
>
> Cheers
>
> On 4 April 2018 at 22:46, Tutkowski, Mike <Mi...@netapp.com>
> wrote:
>
> > [...]
>
>
>
> --
>
> Andrija Panić
>

Re: [DISCUSS] CloudStack graceful shutdown

Posted by Andrija Panic <an...@gmail.com>.
One comment here (I had to shut down a whole DC for a few hours
recently...): please make sure to at least consider the snapshotting
process as a special case - it can really take a few hours for a snapshot
to complete (the copy process from primary to secondary storage).

I did (in my recent unfortunate DC shutdown) actually stop the MS (we also
have a script to identify running async jobs), so we stop it once safe, but
any running qemu-img processes (we use KVM) need to be killed manually
(Ansible) after the MS is stopped, etc., etc.

I can assume most jobs take reasonably long to complete, but snapshots are
probably the biggest exception, as they can take extremely long...

Cheers

On 4 April 2018 at 22:46, Tutkowski, Mike <Mi...@netapp.com> wrote:

> I may be remembering this incorrectly, but from what I recall, if a
> resource is owned by one MS and a request related to that resource comes in
> to another MS, the MS that received the request passes it on to the other
> MS.
>
> > On Apr 4, 2018, at 2:36 PM, Rafael Weingärtner <rafaelweingartner@gmail.com> wrote:
> >
> > > [...]
>



-- 

Andrija Panić

Re: [DISCUSS] CloudStack graceful shutdown

Posted by "Tutkowski, Mike" <Mi...@netapp.com>.
I may be remembering this incorrectly, but from what I recall, if a resource is owned by one MS and a request related to that resource comes in to another MS, the MS that received the request passes it on to the other MS.

> On Apr 4, 2018, at 2:36 PM, Rafael Weingärtner <ra...@gmail.com> wrote:
> 
> [...]

Re: [DISCUSS] CloudStack graceful shutdown

Posted by ilya musayev <il...@gmail.com>.
After much useful input from many of you, I realize my approach is
somewhat incomplete and possibly very optimistic.

Speaking to Marcus, here is what we propose as an alternate solution. I
was hoping to stay outside of the "core", but it looks like there is no
way around it.

Proposed functionality: a management server function to prepare for
maintenance.
* I'm thinking this should be applicable to multinode setups only.
* Drain all connections on 8250 for KVM and other agents, by issuing a
reconnect command on the agents (see the sketch below). While 8250 is
still listening, a new attempt to connect will be blocked and the agent
will be asked to reconnect (if you have an LB, it will route the agent to
another node and eventually reconnect all agents to other nodes - this
might be an area where Marc's HAProxy solution would plug in). In 4.11
there is a new framework for managing agent connectivity without needing a
load balancer; we need to investigate how this will work.
* Allow the existing running async tasks to complete, as per the
"job.cancel.threshold.minutes" max value.
* Queue the new tasks and process them on the next management server.

Still don't know what will happen with Xen or VMware in this case - perhaps
the ShapeBlue team can help answer or fill in the blanks for us.
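
A rough sketch of the drain step (hedged: it leans on the existing
reconnectHost API via CloudMonkey and on the mgmt_server_id column in
cloud.host; whether a plain reconnect is an acceptable drain is exactly
what needs investigating):

    #!/bin/bash
    MSID="$1"    # msid of the management server being drained
    # Find agents currently connected to this MS and ask each to reconnect;
    # with an LB (or the 4.11 agent framework) they should land on another MS.
    for UUID in $(mysql -N cloud -e \
            "SELECT uuid FROM host WHERE mgmt_server_id=${MSID} AND removed IS NULL;"); do
        cloudmonkey reconnect host id=${UUID}
    done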

Regards,
ilya

On Thu, Apr 5, 2018 at 2:48 PM, ilya musayev <il...@gmail.com>
wrote:

> Hi Sergey
>
> Glad to see you are doing well,
>
> I was gonna say drop the "enterprise virtualization company" and save a
> $fortune$ - but it's not for everyone :)
>
> I'll post another proposed solution to the bottom of this thread.
>
> Regards
> ilya
>
>
> On Wed, Apr 4, 2018 at 5:22 PM, Sergey Levitskiy <se...@hotmail.com>
> wrote:
>
>> Now without spellchecking :)
>>
>> This is not simple, e.g. for VMware. Each management server also acts as
>> an agent proxy, so tasks against a particular ESX host will always be
>> forwarded. The right answer would be to support a native "maintenance
>> mode" for the management server. When entered into such a mode, the
>> management server should release all agents including SSVM,
>> block/redirect API calls and login requests, and finish all async jobs
>> it originated.
>>
>> On Apr 4, 2018, at 5:15 PM, Sergey Levitskiy <serg38l@hotmail.com> wrote:
>>
>> [...]
>>
>> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <rafaelweingartner@gmail.com> wrote:
>>
>> Ilya, still regarding the management server that is being shut down:
>> if other MSs (or maybe system VMs - I am not sure whether they are able
>> to do such tasks) can direct/redirect/send new jobs to this management
>> server (the one being shut down), the process might never end, because
>> new tasks are always being created for the management server that we
>> want to shut down. Is this scenario possible?
>>
>> That is why I mentioned blocking port 8250 for the "graceful-shutdown".
>>
>> If this scenario is not possible, then everything is fine.
>>
>>
>> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <ilya.mailing.lists@gmail.com>
>> wrote:
>>
>> I'm thinking of using the configuration value
>> "job.cancel.threshold.minutes" - it will be the longest we wait
>>
>>     "category": "Advanced",
>>
>>     "description": "Time (in minutes) for async-jobs to be forcely
>> cancelled if it has been in process for long",
>>
>>     "name": "job.cancel.threshold.minutes",
>>
>>     "value": "60"
>>
>>
>> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <rafaelweingartner@gmail.com> wrote:
>>
>> [...]
>>
>
>

Re: [DISCUSS] CloudStack graceful shutdown

Posted by ilya musayev <il...@gmail.com>.
Hi Sergey

Glad to see you are doing well,

I was gonna say drop the "enterprise virtualization company" and save a
$fortune$ - but it's not for everyone :)

I'll post another proposed solution to the bottom of this thread.

Regards
ilya


On Wed, Apr 4, 2018 at 5:22 PM, Sergey Levitskiy <se...@hotmail.com>
wrote:

> [...]

Re: [DISCUSS] CloudStack graceful shutdown

Posted by ilya musayev <il...@gmail.com>.
Marc

Thank you for posting the details of how your implementation works.
Unfortunately for us, HAProxy is not an option - hence we can't take
advantage of this implementation. But please do share it with the
community - perhaps it will help someone else.

I'm going to post to the bottom of this thread with the new proposed
solution.

Regards
ilya

On Thu, Apr 5, 2018 at 2:36 AM, Marc-Aurèle Brothier <ma...@exoscale.ch>
wrote:

> Hi all,
>
> Good point ilya, but as stated by Sergey there are more things to consider
> before being able to do a proper shutdown. I augmented the script I gave
> you originally and changed code in CS. What we're doing for our
> environment is as follows:
>
> 1. The MGMT looks for a change in the file /etc/lb-agent, which contains
> keywords for HAProxy[2] (ready, maint), so that HAProxy can disable the
> mgmt on the keyword "maint", and the mgmt server stops a couple of
> threads[1] to stop processing async jobs in the queue.
> 2. Look at the async jobs and wait until there are none, to ensure you can
> send the reconnect commands (if jobs are running, a reconnect will result
> in a failed job, since the result will never reach the management server -
> the agent waits for the current job to be done before reconnecting, and
> discards the result... room for improvement here!).
> 3. Issue a reconnectHost command to all the hosts connected to the mgmt
> server so that they reconnect to another one; otherwise the mgmt must stay
> up, since it is used to forward commands to agents.
> 4. When all agents are reconnected, we can shut down the management server
> and perform the maintenance.
>
> One issue remains for me: during the reconnect, the commands that are
> processed at the same time should be kept in a queue until the agents have
> finished any current jobs and have reconnected. Today the little time
> window during which the reconnect happens can lead to failed jobs, due to
> the agent not being connected at the right moment.
>
> I could push a PR for the change to stop some processing threads based on
> the content of a file. It's also possible to cancel the drain of the
> management server by simply changing the content of the file back to
> "ready" again, instead of "maint" [2].
>
> [1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
> [2] HAProxy documentation on agent checks:
> https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#5.2-agent-check
>
> Regarding your issue with the port blocking, I think it's fair to consider
> that if you want to shut down your server at some point, you have to stop
> serving (some) requests. Here the only way is to stop serving everything.
> If the API had a REST design, we could reject any POST/PUT/DELETE
> operations and allow GET ones. I don't know how hard it would be today to
> only allow listBaseCmd operations, to be more friendly to the users.
>
> Marco
>
>
> On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <se...@hotmail.com>
> wrote:
>
> > [...]
>

Re: [DISCUSS] CloudStack graceful shutdown

Posted by Marc-Aurèle Brothier <ma...@exoscale.ch>.
As we are already using a list of management server API calls to handle the
scripting of the shutdown/upgrade/start, I manually backported the code:

https://github.com/apache/cloudstack/pull/2578

On Tue, Apr 17, 2018 at 9:31 PM, Rafael Weingärtner <
rafaelweingartner@gmail.com> wrote:

> Ron, that is a good analogy.
>
> There is something else that I forgot to mention. We discussed the issue
> of migrating jobs/tasks to other management servers. This is not easy to
> achieve because of the way it is currently implemented in ACS. However,
> as soon as we have a more comprehensive solution for a graceful shutdown,
> this becomes something feasible for us to work on.
>
> I do not know if Ilya is going to develop the graceful shutdown or if
> someone else will pick it up, but we are willing to work on it. Of course,
> it is not something that we would develop right away, because it will
> probably take quite some work, and we have some other priorities. However,
> I will discuss this further internally and see what we can come up with.
>
> On Tue, Apr 17, 2018 at 1:46 PM, Ron Wheeler <rwheeler@artifact-software.com>
> wrote:
>
> > Part of this sounds like the Windows shutdown process, which is familiar
> > to many.
> >
> > For those who have never used Windows:
> >
> > Once you initiate the shutdown, it asks the tasks to shut down.
> > If tasks have not shut down within a "reasonable period", it lists them
> > and asks you if you want to wait a bit longer, force them to close, or
> > abort the shutdown so that you can manually shut them down.
> > If you "force" a shutdown, it closes all of the tasks using all of the
> > brutality at its command.
> > If you abort, then you have to redo the shutdown after you have manually
> > exited from the processes that you care about.
> >
> > This is pretty user friendly but requires that you have a way to signal
> > to a task that it is time to say goodbye.
> >
> > The "reasonable time" needs to have a default that is short enough to
> > make the operator happy and long enough to have a reasonable chance of
> > getting everything stopped without intervention. If you allow the
> > shutdown to proceed after the interval while the operator waits, then
> > you need to refresh the list of running tasks as tasks end.
> >
> > Ron
> >
> >
> > On 17/04/2018 11:27 AM, Rafael Weingärtner wrote:
> >
> >> Ilya and others,
> >>
> >> We have been discussing this idea of a graceful/nice shutdown. Our
> >> feeling is that we (in the CloudStack community) might have been trying
> >> to solve this problem with too much scripting. What if we developed a
> >> more integrated (native) solution?
> >>
> >> Let me explain our idea.
> >>
> >> ACS has a table called "mshost", which is used to store management
> >> server information. During balancing, and when jobs are dispatched to
> >> other management servers, this table is consulted/queried. Therefore,
> >> we have been discussing the idea of creating a management API for
> >> management servers. We could have an API method that changes the state
> >> of a management server to "prepare for maintenance" and then
> >> "maintenance" (as soon as all of the tasks/jobs it is managing finish).
> >> The idea is that during rebalancing we would remove the hosts of
> >> servers that are not in the "Up" state (of course, we would also
> >> prevent servers in the aforementioned states from receiving hosts to
> >> manage). Moreover, when we send/dispatch jobs to other management
> >> servers, we could ignore the ones that are not in the "Up" state (which
> >> is something already done).
> >>
> >> By doing this, the nice shutdown could be executed in a few steps.
> >>
> >> 1 - issue the maintenance method for the management server you desire
> >> 2 - wait until the MS goes into maintenance mode; while there are still
> >> running jobs, it (the management server) will be kept in prepare for
> >> maintenance
> >> 3 - execute the Linux shutdown command
> >>
> >> We would need other API methods to manage MSs then: an (i) API method
> >> to list MSs, and we could even create an (ii) API to remove
> >> old/de-activated management servers, which we currently do not have
> >> (forcing users to apply changes directly in the database).
> >>
> >> Moreover, in this model, we would not kill hanging jobs; we would wait
> >> until they expire and ACS expunges them. Of course, it is possible to
> >> develop a forceful maintenance method as well. Then, when the "prepare
> >> for maintenance" takes longer than a parameter, we could kill hanging
> >> jobs.
> >>
> >> All of this would allow the MS to be kept up and receiving requests
> >> until it can be safely shut down. What do you guys think about this
> >> approach?
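> >>
> >> A hypothetical CloudMonkey flow for the above (hedged: none of these
> >> API names exist today; they only spell out the proposal):
> >>
> >>     prepare managementserver id=<ms-id>    # proposed API, hypothetical
> >>     list managementservers                 # proposed API, hypothetical
> >>     # once the MS reports state "Maintenance":
> >>     shutdown -h now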
> >>
> >> On Tue, Apr 10, 2018 at 6:52 PM, Yiping Zhang <yz...@marketo.com>
> >> wrote:
> >>
> >>> As a cloud admin, I would love to have this feature.
> >>>
> >>> It so happens that I just accidentally restarted my ACS management
> >>> server while two instances were migrating to another Xen cluster (via
> >>> storage migration, not live migration). As a result, both instances
> >>> ended up with a corrupted data disk which can't be reattached or
> >>> migrated.
> >>>
> >>> Any feature which prevents this from happening would be great. A
> >>> low-hanging fruit is simply checking for any running async jobs,
> >>> especially any kind of migration job or other known long-running type
> >>> of job, and warning the operator so that he has a chance to abort
> >>> server shutdowns.
> >>>
> >>> Yiping
> >>>
> >>> On 4/5/18, 3:13 PM, "ilya musayev" <il...@gmail.com>
> >>> wrote:
> >>>
> >>>      Andrija
> >>>
> >>>      This is a tough scenario.
> >>>
> >>>      As an admin, the way I would have handled this situation is to
> >>>      advertise the upcoming outage and then take away specific API
> >>>      commands from a user a day before, so he does not cause any
> >>>      long-running async jobs. Once maintenance completes, enable the
> >>>      API commands back for the user. However, I don't know who your
> >>>      user base is and if this would be an acceptable solution.
> >>>
> >>>      Perhaps also investigate what can be done to speed up your
> >>>      long-running tasks...
> >>>
> >>>      As a side note, we will be working on a feature that would allow
> >>>      for a graceful termination of the process/job, meaning if the
> >>>      agent notices a disconnect or termination request, it will abort
> >>>      the command in flight. We can also consider restarting these
> >>>      tasks again or what not - but that would not be part of this
> >>>      enhancement.
> >>>
> >>>      Regards
> >>>      ilya
> >>>
> >>>      On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <andrija.panic@gmail.com>
> >>>      wrote:
> >>>
> >>>      > Hi Ilya,
> >>>      >
> >>>      > Thanks for the feedback - but in the "real world", you need to
> >>>      > "understand" that 60 min is a next-to-useless timeout for some
> >>>      > jobs (if I understand this specific parameter correctly?? - the
> >>>      > job is really canceled, not only the job monitoring???).
> >>>      >
> >>>      > My value for "job.cancel.threshold.minutes" is 2880 minutes
> >>>      > (2 days).
> >>>      >
> >>>      > I can tell you, when you have CEPH/NFS (CEPH is an even "worse"
> >>>      > case, since reads are slower during the qemu-img convert
> >>>      > process...) with 500GB volumes, a snapshot job will take many
> >>>      > hours. Should I mention 1TB volumes (yes, we had clients like
> >>>      > that...)?
> >>>      > Attaching a 1TB volume that was uploaded to ACS (it lives
> >>>      > originally on secondary storage, and takes time to be copied
> >>>      > over to NFS/CEPH) will also take up to a few hours.
> >>>      > Then migrating a 1TB volume from NFS to CEPH, or CEPH to NFS,
> >>>      > also takes time... etc.
> >>>      >
> >>>      > I'm just giving you feedback as a "user", an admin of the
> >>>      > cloud, zero DEV skills here :), just to make sure you make
> >>>      > practical decisions (and I admit I might be wrong with my
> >>>      > stuff, but I'm just giving you feedback from our public cloud
> >>>      > setup).
> >>>      >
> >>>      >
> >>>      > Cheers!
> >>>      >
> >>>      >
> >>>      >
> >>>      >
> >>>      > On 5 April 2018 at 15:16, Tutkowski, Mike <Mike.Tutkowski@netapp.com>
> >>>      > wrote:
> >>>      >
> >>>      > > Wow, there have been a lot of good details noted from several
> >>>      > > people on how this process works today and how we'd like it
> >>>      > > to work in the near future.
> >>>      > >
> >>>      > > 1) Any chance this is already documented on the Wiki?
> >>>      > >
> >>>      > > 2) If not, any chance someone would be willing to do so (a
> >>>      > > flow diagram would be particularly useful)?
> >>>      > >
> >>>      > > > On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <
> >>> marco@exoscale.ch>
> >>>      > > wrote:
> >>>      > > >
> >>>      > > > Hi all,
> >>>      > > >
> >>>      > > > Good point ilya but as stated by Sergey there's more thing
> to
> >>> consider
> >>>      > > > before being able to do a proper shutdown. I augmented my
> >>> script
> >>> I gave
> >>>      > > you
> >>>      > > > originally and changed code in CS. What we're doing for our
> >>> environment
> >>>      > > is
> >>>      > > > as follow:
> >>>      > > >
> >>>      > > > 1. the MGMT looks for a change in the file /etc/lb-agent
> which
> >>> contains
> >>>      > > > keywords for HAproxy[2] (ready, maint) so that HA-proxy can
> >>> disable the
> >>>      > > > mgmt on the keyword "maint" and the mgmt server stops a
> >>> couple of
> >>>      > > > threads[1] to stop processing async jobs in the queue
> >>>      > > > 2. Looks for the async jobs and wait until there is none to
> >>> ensure you
> >>>      > > can
> >>>      > > > send the reconnect commands (if jobs are running, a
> reconnect
> >>> will
> >>>      > result
> >>>      > > > in a failed job since the result will never reach the
> >>> management
> >>>      > server -
> >>>      > > > the agent waits for the current job to be done before
> >>> reconnecting, and
> >>>      > > > discard the result... rooms for improvement here!)
> >>>      > > > 3. Issue a reconnectHost command to all the hosts connected
> to
> >>> the mgmt
> >>>      > > > server so that they reconnect to another one, otherwise the
> >>> mgmt
> >>> must
> >>>      > be
> >>>      > > up
> >>>      > > > since it is used to forward commands to agents.
> >>>      > > > 4. when all agents are reconnected, we can shutdown the
> >>> management
> >>>      > server
> >>>      > > > and perform the maintenance.
> >>>      > > >
> >>>      > > > One issue remains for me, during the reconnect, the commands
> >>> that are
> >>>      > > > processed at the same time should be kept in a queue until
> the
> >>> agents
> >>>      > > have
> >>>      > > > finished any current jobs and have reconnected. Today the
> >>> little
> >>> time
> >>>      > > > window during which the reconnect happens can lead to failed
> >>> jobs due
> >>>      > to
> >>>      > > > the agent not being connected at the right moment.
> >>>      > > >
> >>>      > > > I could push a PR for the change to stop some processing
> >>> threads
> >>> based
> >>>      > on
> >>>      > > > the content of a file. It's possible also to cancel the
> drain
> >>> of
> >>> the
> >>>      > > > management by simply changing the content of the file back
> to
> >>> "ready"
> >>>      > > > again, instead of "maint" [2].
> >>>      > > >
> >>>      > > > [1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
> >>>      > > > [2] HA proxy documentation on agent checker:
> >>> https://cbonte.github.io/
> >>>      > > > haproxy-dconv/1.6/configuration.html#5.2-agent-check
> >>>      > > >
> >>>      > > > Regarding your issue on the port blocking, I think it's fair
> >>> to
> >>>      > consider
> >>>      > > > that if you want to shutdown your server at some point, you
> >>> have
> >>> to
> >>>      > stop
> >>>      > > > serving (some) requests. Here the only way it's to stop
> >>> serving
> >>>      > > everything.
> >>>      > > > If the API had a REST design, we could reject any
> >>> POST/PUT/DELETE
> >>>      > > > operations and allow GET ones. I don't know how hard it
> would
> >>> be
> >>> today
> >>>      > to
> >>>      > > > only allow listBaseCmd operations to be more friendly with
> the
> >>> users.
> >>>      > > >
> >>>      > > > Marco
> >>>      > > >
> >>>      > > >
> >>>      > > > On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <
> >>> serg38l@hotmail.com>
> >>>      > > > wrote:
> >>>      > > >
> >>>      > > >> Now without spellchecking :)
> >>>      > > >>
> >>>      > > >> This is not simple e.g. for VMware. Each management server
> >>> also
> >>> acts
> >>>      > as
> >>>      > > an
> >>>      > > >> agent proxy so tasks against a particular ESX host will be
> >>> always
> >>>      > > >> forwarded. That right answer will be to support a native
> >>> “maintenance
> >>>      > > mode”
> >>>      > > >> for management server. When entered to such mode the
> >>> management
> >>> server
> >>>      > > >> should release all agents including SSVM, block/redirect
> API
> >>> calls and
> >>>      > > >> login request and finish all async job it originated.
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >> On Apr 4, 2018, at 5:15 PM, Sergey Levitskiy <
> >>> serg38l@hotmail.com
> >>>      > > <mailto:
> >>>      > > >> serg38l@hotmail.com>> wrote:
> >>>      > > >>
> >>>      > > >> This is not simple e.g. for VMware. Each management server
> >>> also
> >>> acts
> >>>      > as
> >>>      > > an
> >>>      > > >> agent proxy so tasks against a particular ESX host will be
> >>> always
> >>>      > > >> forwarded. That right answer will be to a native support
> for
> >>>      > > “maintenance
> >>>      > > >> mode” for management server. When entered to such mode the
> >>> management
> >>>      > > >> server should release all agents including save,
> >>> block/redirect
> >>> API
> >>>      > > calls
> >>>      > > >> and login request and finish all a sync job it originated.
> >>>      > > >>
> >>>      > > >> Sent from my iPhone
> >>>      > > >>
> >>>      > > >> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <
> >>>      > > >> rafaelweingartner@gmail.com<mailto:rafaelweingartner@gmail
> .
> >>> com
> >>>      > wrote:
> >>>      > > >>
> >>>      > > >> Ilya, still regarding the management server that is being
> >>> shut
> >>> down
> >>>      > > issue;
> >>>      > > >> if other MSs/or maybe system VMs (I am not sure to know if
> >>> they
> >>> are
> >>>      > > able to
> >>>      > > >> do such tasks) can direct/redirect/send new jobs to this
> >>> management
> >>>      > > server
> >>>      > > >> (the one being shut down), the process might never end
> >>> because
> >>> new
> >>>      > tasks
> >>>      > > >> are always being created for the management server that we
> >>> want
> >>> to
> >>>      > shut
> >>>      > > >> down. Is this scenario possible?
> >>>      > > >>
> >>>      > > >> That is why I mentioned blocking the port 8250 for the
> >>>      > > “graceful-shutdown”.
> >>>      > > >>
> >>>      > > >> If this scenario is not possible, then everything s fine.
> >>>      > > >>
> >>>      > > >>
> >>>      > > >> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <
> >>>      > > ilya.mailing.lists@gmail.com
> >>>      > > >> <ma...@gmail.com>>
> >>>      > > >> wrote:
> >>>      > > >>
> >>>      > > >> I'm thinking of using a configuration from
> >>>      > > "job.cancel.threshold.minutes" -
> >>>      > > >> it will be the longest
> >>>      > > >>
> >>>      > > >>    "category": "Advanced",
> >>>      > > >>
> >>>      > > >>    "description": "Time (in minutes) for async-jobs to be
> >>> forcely
> >>>      > > >> cancelled if it has been in process for long",
> >>>      > > >>
> >>>      > > >>    "name": "job.cancel.threshold.minutes",
> >>>      > > >>
> >>>      > > >>    "value": "60"
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <
> >>>      > > >> rafaelweingartner@gmail.com<mailto:rafaelweingartner@gmail
> .
> >>> com
> >>>      > wrote:
> >>>      > > >>
> >>>      > > >> Big +1 for this feature; I only have a few doubts.
> >>>      > > >>
> >>>      > > >> * Regarding the tasks/jobs that management servers (MSs)
> >>> execute; are
> >>>      > > >> these
> >>>      > > >> tasks originate from requests that come to the MS, or is it
> >>> possible
> >>>      > > that
> >>>      > > >> requests received by one management server to be executed
> by
> >>> other? I
> >>>      > > >> mean,
> >>>      > > >> if I execute a request against MS1, will this request
> always
> >>> be
> >>>      > > >> executed/threated by MS1, or is it possible that this
> >>> request is
> >>>      > > executed
> >>>      > > >> by another MS (e.g. MS2)?
> >>>      > > >>
> >>>      > > >> * I would suggest that after we block traffic coming from
> >>>      > > >> 8080/8443/8250(we
> >>>      > > >> will need to block this as well right?), we can log the
> >>> execution of
> >>>      > > >> tasks.
> >>>      > > >> I mean, something saying, there are XXX tasks (enumerate
> >>> tasks)
> >>> still
> >>>      > > >> being
> >>>      > > >> executed, we will wait for them to finish before shutting
> >>> down.
> >>>      > > >>
> >>>      > > >> * The timeout (60 minutes suggested) could be global
> settings
> >>> that we
> >>>      > > can
> >>>      > > >> load before executing the graceful-shutdown.
> >>>      > > >>
> >>>      > > >> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <
> >>>      > > >> ilya.mailing.lists@gmail.com>
> >>>      > > >>
> >>>      > > >> wrote:
> >>>      > > >>
> >>>      > > >> Use case:
> >>>      > > >> In any environment - time to time - administrator needs to
> >>> perform a
> >>>      > > >> maintenance. Current stop sequence of cloudstack management
> >>> server
> >>>      > will
> >>>      > > >> ignore the fact that there may be long running async jobs -
> >>> and
> >>>      > > >> terminate
> >>>      > > >> the process. This in turn can create a poor user experience
> >>> and
> >>>      > > >> occasional
> >>>      > > >> inconsistency  in cloudstack db.
> >>>      > > >>
> >>>      > > >> This is especially painful in large environments where the
> >>> user
> >>> has
> >>>      > > >> thousands of nodes and there is a continuous patching that
> >>> happens
> >>>      > > >> around
> >>>      > > >> the clock - that requires migration of workload from one
> >>> node to
> >>>      > > >> another.
> >>>      > > >>
> >>>      > > >> With that said - i've created a script that monitors the
> >>> async
> >>> job
> >>>      > > >> queue
> >>>      > > >> for given MS and waits for it complete all jobs. More
> details
> >>> are
> >>>      > > >> posted
> >>>      > > >> below.
> >>>      > > >>
> >>>      > > >> I'd like to introduce "graceful-shutdown" into the
> >>> systemctl/service
> >>>      > of
> >>>      > > >> cloudstack-management service.
> >>>      > > >>
> >>>      > > >> The details of how it will work is below:
> >>>      > > >>
> >>>      > > >> Workflow for graceful shutdown:
> >>>      > > >> Using iptables/firewalld - block any connection attempts on
> >>> 8080/8443
> >>>      > > >> (we
> >>>      > > >> can identify the ports dynamically)
> >>>      > > >> Identify the MSID for the node, using the proper msid -
> query
> >>>      > > >> async_job
> >>>      > > >> table for
> >>>      > > >> 1) any jobs that are still running (or job_status=“0”)
> >>>      > > >> 2) job_dispatcher not like “pseudoJobDispatcher"
> >>>      > > >> 3) job_init_msid=$my_ms_id
> >>>      > > >>
> >>>      > > >> Monitor this async_job table for 60 minutes - until all
> async
> >>> jobs for
> >>>      > > >> MSID
> >>>      > > >> are done, then proceed with shutdown
> >>>      > > >>  If failed for any reason or terminated, catch the exit via
> >>> trap
> >>>      > > >> command
> >>>      > > >> and unblock the 8080/8443
> >>>      > > >>
> >>>      > > >> Comments are welcome
> >>>      > > >>
> >>>      > > >> Regards,
> >>>      > > >> ilya
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >> --
> >>>      > > >> Rafael Weingärtner
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >> --
> >>>      > > >> Rafael Weingärtner
> >>>      > > >>
> >>>      > >
> >>>      >
> >>>      >
> >>>      >
> >>>      > --
> >>>      >
> >>>      > Andrija Panić
> >>>      >
> >>>
> >>>
> >>>
> >>>
> >>
> > --
> > Ron Wheeler
> > President
> > Artifact Software Inc
> > email: rwheeler@artifact-software.com
> > skype: ronaldmwheeler
> > phone: 866-970-2435, ext 102
> >
> >
>
>
> --
> Rafael Weingärtner
>

Re: [DISCUSS] CloudStack graceful shutdown

Posted by Rafael Weingärtner <ra...@gmail.com>.
Ron, that is a good analogy.

There is something else that I forgot to mention. We discussed the issue of
migrating Jobs/tasks to other management servers. This is not something
easy to achieve because of the way it is currently implemented in ACS.
However, as soon as we have a more comprehensive solution to a graceful
shutdown, this becomes something feasible for us to work on.

I do not know if Ilya is going to develop a graceful shutdown or if someone
else will pick this up, but we are willing to work on it. Of course, it is
not something that we would develop right away because it will probably
take quite some work, and we have some other priorities. However,  I will
discuss this further internally and see what we can come up with.

On Tue, Apr 17, 2018 at 1:46 PM, Ron Wheeler <rwheeler@artifact-software.com
> wrote:

> Part of this sounds like the Windows shutdown process, which is familiar
> to many.
>
> For those who have never used Windows:
>
> Once you initiate the shutdown, it asks the tasks to shut down.
> If tasks have not shut down within a "reasonable period", it lists them and
> asks you if you want to wait a bit longer, force them to close or abort the
> shutdown so that you can manually shut them down.
> If you "force" a shutdown it closes all of the tasks using all of the
> brutality at its command.
> If you abort, then you have to redo the shutdown after you have manually
> exited from the processes that you care about.
>
> This is pretty user friendly but requires that you have a way to signal to
> a task that it is time to say goodbye.
>
> The "reasonable time" needs to have a default that is short enough to make
> the operator happy and long enough to have a reasonable chance of getting
> everything stopped without intervention. If you allow the shutdown to
> proceed after the interval while the operator waits, then you need to
> refresh the list of running tasks as tasks end.
>
> Ron
>
>
> On 17/04/2018 11:27 AM, Rafael Weingärtner wrote:
>
>> Ilya and others,
>>
>> We have been discussing this idea of graceful/nicely shutdown.  Our
>> feeling
>> is that we (in CloudStack community) might have been trying to solve this
>> problem with too much scripting. What if we developed a more integrated
>> (native) solution?
>>
>> Let me explain our idea.
>>
>> ACS has a table called “mshost”, which is used to store management server
>> information. During balancing and when jobs are dispatched to other
>> management servers this table is consulted/queried.  Therefore, we have
>> been discussing the idea of creating a management API for management
>> servers.  We could have an API method that changes the state of management
>> servers to “prepare to maintenance” and then “maintenance” (as soon as all
>> of the task/jobs it is managing finish). The idea is that during
>> rebalancing we would remove the hosts of servers that are not in “Up”
>> state
>> (of course we would also ignore hosts in the aforementioned state to
>> receive hosts to manage).  Moreover, when we send/dispatch jobs to other
>> management servers, we could ignore the ones that are not in “Up” state
>> (which is something already done).
>>
>> By doing this, the nicely shutdown could be executed in a few steps.
>>
>> 1 – issue the maintenance method for the management server you desire
>> 2 – wait until the MS goes into maintenance mode, while there are still
>> running jobs it (the management server) will be maintained in prepare for
>> maintenance
>> 3 – execute the Linux shutdown command
>>
>> We would need other APIs methods to manage MSs then. An (i) API method to
>> list MSs, and we could even create an (ii) API to remove old/de-activated
>> management servers, which we currently do not have (forcing users to apply
>> changed directly in the database).
>>
>> Moreover, in this model, we would not kill hanging jobs; we would wait
>> until they expire and ACS expunges them. Of course, it is possible to
>> develop a forceful maintenance method as well. Then, when the “prepare for
>> maintenance” takes longer than a parameter, we could kill hanging jobs.
>>
>> All of this would allow the MS to be kept up and receiving requests until
>> it can be safely shut down. What do you guys think about this approach?
>>
>> On Tue, Apr 10, 2018 at 6:52 PM, Yiping Zhang <yz...@marketo.com> wrote:
>>
>> As a cloud admin, I would love to have this feature.
>>>
>>> It so happens that I just accidentally restarted my ACS management server
>>> while two instances are migrating to another Xen cluster (via storage
>>> migration, not live migration). As a result, both instances
>>> end up with corrupted data disks which can't be reattached or migrated.
>>>
>>> Any feature which prevents this from happening would be great.  A low
>>> hanging fruit is simply checking for
>>> if there are any async jobs running, especially any kind of migration
>>> jobs
>>> or other known long running type of
>>> jobs and warn the operator  so that he has a chance to abort server
>>> shutdowns.
>>>
>>> Yiping
>>>
>>> On 4/5/18, 3:13 PM, "ilya musayev" <il...@gmail.com>
>>> wrote:
>>>
>>>      Andrija
>>>
>>>      This is a tough scenario.
>>>
> >>>      As an admin, the way I would have handled this situation is to
>>> advertise
>>>      the upcoming outage and then take away specific API commands from a
>>> user a
>>>      day before - so he does not cause any long running async jobs. Once
>>>      maintenance completes - enable the API commands back to the user.
>>> However -
> >>>      I don't know who your user base is and if this would be an acceptable
>>>      solution.
>>>
>>>      Perhaps also investigate what can be done to speed up your long
>>> running
>>>      tasks...
>>>
> >>>      As a side note, we will be working on a feature that would allow
>>> for a
>>>      graceful termination of the process/job, meaning if agent noticed a
>>>      disconnect or termination request - it will abort the command in
>>> flight. We
> >>>      can also consider restarting these tasks again or what not - but it
>>> would
>>>      not be part of this enhancement.
>>>
>>>      Regards
>>>      ilya
>>>
>>>      On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <
>>> andrija.panic@gmail.com
>>>      wrote:
>>>
>>>      > Hi Ilya,
>>>      >
>>>      > thanks for the feedback - but in "real world", you need to
>>> "understand"
>>>      > that 60min is next to useless timeout for some jobs (if I
>>> understand
>>> this
>>>      > specific parameter correctly ?? - job is really canceled, not only
>>> job
>>>      > monitoring is canceled ???) -
>>>      >
>>>      > My value for the  "job.cancel.threshold.minutes" is 2880 minutes
>>> (2
>>> days?)
>>>      >
>>>      > I can tell you when you have CEPH/NFS (CEPH even "worse" case,
>>> since
>>> slower
> >>>      > read during qemu-img convert process...) of 500GB, then imagine
>>> snapshot
>>>      > job will take many hours. Should I mention 1TB volumes (yes, we
>>> had
> >>>      > clients like that...)
> >>>      > Then attaching a 1TB volume, that was uploaded to ACS (lives
>>> originally on
>>>      > Secondary Storage, and takes time to be copied over to NFS/CEPH)
>>> will take
> >>>      > up to a few hours.
>>>      > Then migrating 1TB volume from NFS to CEPH, or CEPH to NFS, also
>>> takes
>>>      > time...etc.
>>>      >
>>>      > I'm just giving you feedback as "user", admin of the cloud, zero
>>> DEV
>>> skills
>>>      > here :) , just to make sure you make practical decisions (and I
>>> admit I
>>>      > might be wrong with my stuff, but just giving you feedback from
>>> our
>>> public
>>>      > cloud setup)
>>>      >
>>>      >
>>>      > Cheers!
>>>      >
>>>      >
>>>      >
>>>      >
>>>      > On 5 April 2018 at 15:16, Tutkowski, Mike <
>>> Mike.Tutkowski@netapp.com
>>>      > wrote:
>>>      >
>>>      > > Wow, there’s been a lot of good details noted from several
>>> people
>>> on how
>>>      > > this process works today and how we’d like it to work in the
>>> near
>>> future.
>>>      > >
>>>      > > 1) Any chance this is already documented on the Wiki?
>>>      > >
>>>      > > 2) If not, any chance someone would be willing to do so (a flow
>>> diagram
>>>      > > would be particularly useful).
>>>      > >
>>>      > > > On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <
>>> marco@exoscale.ch>
>>>      > > wrote:
>>>      > > >
>>>      > > > Hi all,
>>>      > > >
> >>>      > > > Good point ilya but as stated by Sergey there are more things to
>>> consider
>>>      > > > before being able to do a proper shutdown. I augmented my
>>> script
>>> I gave
>>>      > > you
>>>      > > > originally and changed code in CS. What we're doing for our
>>> environment
>>>      > > is
> >>>      > > > as follows:
>>>      > > >
>>>      > > > 1. the MGMT looks for a change in the file /etc/lb-agent which
>>> contains
>>>      > > > keywords for HAproxy[2] (ready, maint) so that HA-proxy can
>>> disable the
>>>      > > > mgmt on the keyword "maint" and the mgmt server stops a
>>> couple of
>>>      > > > threads[1] to stop processing async jobs in the queue
>>>      > > > 2. Looks for the async jobs and wait until there is none to
>>> ensure you
>>>      > > can
>>>      > > > send the reconnect commands (if jobs are running, a reconnect
>>> will
>>>      > result
>>>      > > > in a failed job since the result will never reach the
>>> management
>>>      > server -
>>>      > > > the agent waits for the current job to be done before
>>> reconnecting, and
> >>>      > > > discard the result... room for improvement here!)
>>>      > > > 3. Issue a reconnectHost command to all the hosts connected to
>>> the mgmt
>>>      > > > server so that they reconnect to another one, otherwise the
>>> mgmt
>>> must
>>>      > be
>>>      > > up
>>>      > > > since it is used to forward commands to agents.
>>>      > > > 4. when all agents are reconnected, we can shutdown the
>>> management
>>>      > server
>>>      > > > and perform the maintenance.
>>>      > > >
>>>      > > > One issue remains for me, during the reconnect, the commands
>>> that are
>>>      > > > processed at the same time should be kept in a queue until the
>>> agents
>>>      > > have
>>>      > > > finished any current jobs and have reconnected. Today the
>>> little
>>> time
>>>      > > > window during which the reconnect happens can lead to failed
>>> jobs due
>>>      > to
>>>      > > > the agent not being connected at the right moment.
>>>      > > >
>>>      > > > I could push a PR for the change to stop some processing
>>> threads
>>> based
>>>      > on
>>>      > > > the content of a file. It's possible also to cancel the drain
>>> of
>>> the
>>>      > > > management by simply changing the content of the file back to
>>> "ready"
>>>      > > > again, instead of "maint" [2].
>>>      > > >
>>>      > > > [1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
>>>      > > > [2] HA proxy documentation on agent checker:
>>> https://cbonte.github.io/
>>>      > > > haproxy-dconv/1.6/configuration.html#5.2-agent-check
>>>      > > >
>>>      > > > Regarding your issue on the port blocking, I think it's fair
>>> to
>>>      > consider
>>>      > > > that if you want to shutdown your server at some point, you
>>> have
>>> to
>>>      > stop
> >>>      > > > serving (some) requests. Here the only way is to stop
>>> serving
>>>      > > everything.
>>>      > > > If the API had a REST design, we could reject any
>>> POST/PUT/DELETE
>>>      > > > operations and allow GET ones. I don't know how hard it would
>>> be
>>> today
>>>      > to
>>>      > > > only allow listBaseCmd operations to be more friendly with the
>>> users.
>>>      > > >
>>>      > > > Marco
>>>      > > >
>>>      > > >
>>>      > > > On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <
>>> serg38l@hotmail.com>
>>>      > > > wrote:
>>>      > > >
>>>      > > >> Now without spellchecking :)
>>>      > > >>
>>>      > > >> This is not simple e.g. for VMware. Each management server
>>> also
>>> acts
>>>      > as
>>>      > > an
>>>      > > >> agent proxy so tasks against a particular ESX host will be
>>> always
>>>      > > >> forwarded. That right answer will be to support a native
>>> “maintenance
>>>      > > mode”
>>>      > > >> for management server. When entered to such mode the
>>> management
>>> server
>>>      > > >> should release all agents including SSVM, block/redirect API
>>> calls and
>>>      > > >> login request and finish all async job it originated.
>>>      > > >>
>>>      > > >>
>>>      > > >>
>>>      > > >> On Apr 4, 2018, at 5:15 PM, Sergey Levitskiy <
>>>      > > >> serg38l@hotmail.com> wrote:
>>>      > > >>
>>>      > > >> This is not simple e.g. for VMware. Each management server
>>> also
>>> acts
>>>      > as
>>>      > > an
>>>      > > >> agent proxy so tasks against a particular ESX host will be
>>> always
>>>      > > >> forwarded. That right answer will be to a native support for
>>>      > > “maintenance
>>>      > > >> mode” for management server. When entered to such mode the
>>> management
>>>      > > >> server should release all agents including save,
>>> block/redirect
>>> API
>>>      > > calls
>>>      > > >> and login request and finish all a sync job it originated.
>>>      > > >>
>>>      > > >> Sent from my iPhone
>>>      > > >>
>>>      > > >> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <
>>>      > > >> rafaelweingartner@gmail.com> wrote:
>>>      > > >>
>>>      > > >> Ilya, still regarding the management server that is being
>>> shut
>>> down
>>>      > > issue;
>>>      > > >> if other MSs/or maybe system VMs (I am not sure to know if
>>> they
>>> are
>>>      > > able to
>>>      > > >> do such tasks) can direct/redirect/send new jobs to this
>>> management
>>>      > > server
>>>      > > >> (the one being shut down), the process might never end
>>> because
>>> new
>>>      > tasks
>>>      > > >> are always being created for the management server that we
>>> want
>>> to
>>>      > shut
>>>      > > >> down. Is this scenario possible?
>>>      > > >>
>>>      > > >> That is why I mentioned blocking the port 8250 for the
>>>      > > “graceful-shutdown”.
>>>      > > >>
>>>      > > >> If this scenario is not possible, then everything is fine.
>>>      > > >>
>>>      > > >>
>>>      > > >> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <
>>>      > > ilya.mailing.lists@gmail.com>
>>>      > > >> wrote:
>>>      > > >>
>>>      > > >> I'm thinking of using a configuration from
>>>      > > "job.cancel.threshold.minutes" -
>>>      > > >> it will be the longest
>>>      > > >>
>>>      > > >>    "category": "Advanced",
>>>      > > >>
>>>      > > >>    "description": "Time (in minutes) for async-jobs to be
>>> forcely
>>>      > > >> cancelled if it has been in process for long",
>>>      > > >>
>>>      > > >>    "name": "job.cancel.threshold.minutes",
>>>      > > >>
>>>      > > >>    "value": "60"
>>>      > > >>
>>>      > > >>
>>>      > > >>
>>>      > > >>
>>>      > > >> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <
>>>      > > >> rafaelweingartner@gmail.com> wrote:
>>>      > > >>
>>>      > > >> Big +1 for this feature; I only have a few doubts.
>>>      > > >>
>>>      > > >> * Regarding the tasks/jobs that management servers (MSs)
>>> execute; are
>>>      > > >> these
>>>      > > >> tasks originate from requests that come to the MS, or is it
>>> possible
>>>      > > that
>>>      > > >> requests received by one management server to be executed by
>>> other? I
>>>      > > >> mean,
>>>      > > >> if I execute a request against MS1, will this request always
>>> be
>>>      > > >> executed/threated by MS1, or is it possible that this
>>> request is
>>>      > > executed
>>>      > > >> by another MS (e.g. MS2)?
>>>      > > >>
>>>      > > >> * I would suggest that after we block traffic coming from
>>>      > > >> 8080/8443/8250(we
>>>      > > >> will need to block this as well right?), we can log the
>>> execution of
>>>      > > >> tasks.
>>>      > > >> I mean, something saying, there are XXX tasks (enumerate
>>> tasks)
>>> still
>>>      > > >> being
>>>      > > >> executed, we will wait for them to finish before shutting
>>> down.
>>>      > > >>
>>>      > > >> * The timeout (60 minutes suggested) could be global settings
>>> that we
>>>      > > can
>>>      > > >> load before executing the graceful-shutdown.
>>>      > > >>
>>>      > > >> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <
>>>      > > >> ilya.mailing.lists@gmail.com>
>>>      > > >>
>>>      > > >> wrote:
>>>      > > >>
>>>      > > >> Use case:
>>>      > > >> In any environment - time to time - administrator needs to
>>> perform a
>>>      > > >> maintenance. Current stop sequence of cloudstack management
>>> server
>>>      > will
>>>      > > >> ignore the fact that there may be long running async jobs -
>>> and
>>>      > > >> terminate
>>>      > > >> the process. This in turn can create a poor user experience
>>> and
>>>      > > >> occasional
>>>      > > >> inconsistency  in cloudstack db.
>>>      > > >>
>>>      > > >> This is especially painful in large environments where the
>>> user
>>> has
>>>      > > >> thousands of nodes and there is a continuous patching that
>>> happens
>>>      > > >> around
>>>      > > >> the clock - that requires migration of workload from one
>>> node to
>>>      > > >> another.
>>>      > > >>
>>>      > > >> With that said - i've created a script that monitors the
>>> async
>>> job
>>>      > > >> queue
>>>      > > >> for given MS and waits for it complete all jobs. More details
>>> are
>>>      > > >> posted
>>>      > > >> below.
>>>      > > >>
>>>      > > >> I'd like to introduce "graceful-shutdown" into the
>>> systemctl/service
>>>      > of
>>>      > > >> cloudstack-management service.
>>>      > > >>
>>>      > > >> The details of how it will work is below:
>>>      > > >>
>>>      > > >> Workflow for graceful shutdown:
>>>      > > >> Using iptables/firewalld - block any connection attempts on
>>> 8080/8443
>>>      > > >> (we
>>>      > > >> can identify the ports dynamically)
>>>      > > >> Identify the MSID for the node, using the proper msid - query
>>>      > > >> async_job
>>>      > > >> table for
>>>      > > >> 1) any jobs that are still running (or job_status=“0”)
>>>      > > >> 2) job_dispatcher not like “pseudoJobDispatcher"
>>>      > > >> 3) job_init_msid=$my_ms_id
>>>      > > >>
>>>      > > >> Monitor this async_job table for 60 minutes - until all async
>>> jobs for
>>>      > > >> MSID
>>>      > > >> are done, then proceed with shutdown
>>>      > > >>  If failed for any reason or terminated, catch the exit via
>>> trap
>>>      > > >> command
>>>      > > >> and unblock the 8080/8443
>>>      > > >>
>>>      > > >> Comments are welcome
>>>      > > >>
>>>      > > >> Regards,
>>>      > > >> ilya
>>>      > > >>
>>>      > > >>
>>>      > > >>
>>>      > > >>
>>>      > > >> --
>>>      > > >> Rafael Weingärtner
>>>      > > >>
>>>      > > >>
>>>      > > >>
>>>      > > >>
>>>      > > >>
>>>      > > >> --
>>>      > > >> Rafael Weingärtner
>>>      > > >>
>>>      > >
>>>      >
>>>      >
>>>      >
>>>      > --
>>>      >
>>>      > Andrija Panić
>>>      >
>>>
>>>
>>>
>>>
>>
> --
> Ron Wheeler
> President
> Artifact Software Inc
> email: rwheeler@artifact-software.com
> skype: ronaldmwheeler
> phone: 866-970-2435, ext 102
>
>


-- 
Rafael Weingärtner

Re: [DISCUSS] CloudStack graceful shutdown

Posted by Ron Wheeler <rw...@artifact-software.com>.
Part of this sounds like the Windows shutdown process, which is familiar
to many.

For those who have never used Windows:

Once you initiate the shutdown, it asks the tasks to shut down.
If tasks have not shut down within a "reasonable period", it lists them
and asks you if you want to wait a bit longer, force them to close or 
abort the shutdown so that you can manually shut them down.
If you "force" a shutdown it closes all of the tasks using all of the 
brutality at its command.
If you abort, then you have to redo the shutdown after you have manually 
exited from the processes that you care about.

This is pretty user friendly but requires that you have a way to signal 
to a task that it is time to say goodbye.

The "reasonable time" needs to have a default that is short enough to 
make the operator happy and long enough to have a reasonable chance of 
getting everything stopped without intervention. If you allow the 
shutdown to proceed after the interval while the operator waits, then
you need to refresh the list of running tasks as tasks end.
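
In shell terms, the loop could be as simple as this sketch (the service
name and the pgrep/pkill matching are illustrative assumptions, not how
CloudStack would actually implement it):

    #!/bin/bash
    systemctl stop cloudstack-management &   # ask the tasks to shut down
    sleep "${REASONABLE_PERIOD:-120}"        # the "reasonable period"
    while pgrep -f cloudstack-management >/dev/null; do
      pgrep -af cloudstack-management        # refresh the list of stragglers
      read -r -p "[w]ait longer, [f]orce, [a]bort? " choice
      case "$choice" in
        w) sleep 60 ;;
        f) pkill -9 -f cloudstack-management ;;  # all available brutality
        a) exit 1 ;;                             # operator finishes manually
      esac
    done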

Ron

On 17/04/2018 11:27 AM, Rafael Weingärtner wrote:
> Ilya and others,
>
> We have been discussing this idea of graceful/nicely shutdown.  Our feeling
> is that we (in CloudStack community) might have been trying to solve this
> problem with too much scripting. What if we developed a more integrated
> (native) solution?
>
> Let me explain our idea.
>
> ACS has a table called “mshost”, which is used to store management server
> information. During balancing and when jobs are dispatched to other
> management servers this table is consulted/queried.  Therefore, we have
> been discussing the idea of creating a management API for management
> servers.  We could have an API method that changes the state of management
> servers to “prepare to maintenance” and then “maintenance” (as soon as all
> of the task/jobs it is managing finish). The idea is that during
> rebalancing we would remove the hosts of servers that are not in “Up” state
> (of course we would also ignore hosts in the aforementioned state to
> receive hosts to manage).  Moreover, when we send/dispatch jobs to other
> management servers, we could ignore the ones that are not in “Up” state
> (which is something already done).
>
> By doing this, the nicely shutdown could be executed in a few steps.
>
> 1 – issue the maintenance method for the management server you desire
> 2 – wait until the MS goes into maintenance mode, while there are still
> running jobs it (the management server) will be maintained in prepare for
> maintenance
> 3 – execute the Linux shutdown command
>
> We would need other APIs methods to manage MSs then. An (i) API method to
> list MSs, and we could even create an (ii) API to remove old/de-activated
> management servers, which we currently do not have (forcing users to apply
> changed directly in the database).
>
> Moreover, in this model, we would not kill hanging jobs; we would wait
> until they expire and ACS expunges them. Of course, it is possible to
> develop a forceful maintenance method as well. Then, when the “prepare for
> maintenance” takes longer than a parameter, we could kill hanging jobs.
>
> All of this would allow the MS to be kept up and receiving requests until
> it can be safely shut down. What do you guys think about this approach?
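>
> As a sketch, the operator flow could then look like this (every API name
> below is hypothetical - nothing here exists yet, it only illustrates the
> proposal):
>
>     # 1 - move the MS into "prepare for maintenance"
>     cloudmonkey prepareForMaintenance managementserverid=<uuid>
>     # 2 - poll until its running jobs finish and state becomes Maintenance
>     cloudmonkey list managementservers id=<uuid>
>     # 3 - only then stop the process / shut the box down
>     systemctl stop cloudstack-management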
>
> On Tue, Apr 10, 2018 at 6:52 PM, Yiping Zhang <yz...@marketo.com> wrote:
>
>> As a cloud admin, I would love to have this feature.
>>
>> It so happens that I just accidentally restarted my ACS management server
>> while two instances are migrating to another Xen cluster (via storage
>> migration, not live migration). As a result, both instances
>> end up with corrupted data disks which can't be reattached or migrated.
>>
>> Any feature which prevents this from happening would be great.  A low
>> hanging fruit is simply checking for
>> if there are any async jobs running, especially any kind of migration jobs
>> or other known long running type of
>> jobs and warn the operator  so that he has a chance to abort server
>> shutdowns.
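>>
>> That low-hanging-fruit check could be a one-liner around the async_job
>> table (a sketch; it assumes DB credentials in ~/.my.cnf and counts jobs
>> across all management servers, not just the local one):
>>
>>     RUNNING=$(mysql -N cloud -e "SELECT COUNT(*) FROM async_job WHERE job_status=0")
>>     if [ "$RUNNING" -gt 0 ]; then
>>       echo "WARNING: $RUNNING async job(s) still running" >&2
>>       exit 1   # give the operator the chance to abort the shutdown
>>     fi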
>>
>> Yiping
>>
>> On 4/5/18, 3:13 PM, "ilya musayev" <il...@gmail.com> wrote:
>>
>>      Andrija
>>
>>      This is a tough scenario.
>>
>>      As an admin, the way I would have handled this situation is to
>> advertise
>>      the upcoming outage and then take away specific API commands from a
>> user a
>>      day before - so he does not cause any long running async jobs. Once
>>      maintenance completes - enable the API commands back to the user.
>> However -
>>      I don't know who your user base is and if this would be an acceptable
>>      solution.
>>
>>      Perhaps also investigate what can be done to speed up your long running
>>      tasks...
>>
>>      As a side note, we will be working on a feature that would allow for a
>>      graceful termination of the process/job, meaning if agent noticed a
>>      disconnect or termination request - it will abort the command in
>> flight. We
>>      can also consider restarting these tasks again or what not - but it
>> would
>>      not be part of this enhancement.
>>
>>      Regards
>>      ilya
>>
>>      On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <andrija.panic@gmail.com
>>      wrote:
>>
>>      > Hi Ilya,
>>      >
>>      > thanks for the feedback - but in "real world", you need to
>> "understand"
>>      > that 60min is next to useless timeout for some jobs (if I understand
>> this
>>      > specific parameter correctly ?? - job is really canceled, not only
>> job
>>      > monitoring is canceled ???) -
>>      >
>>      > My value for the  "job.cancel.threshold.minutes" is 2880 minutes (2
>> days?)
>>      >
>>      > I can tell you when you have CEPH/NFS (CEPH even "worse" case, since
>> slower
>>      > read during qemu-img convert process...) of 500GB, then imagine
>> snapshot
>>      > job will take many hours. Should I mention 1TB volumes (yes, we had
>>      > clients like that...)
>>      > Then attaching a 1TB volume, that was uploaded to ACS (lives
>> originally on
>>      > Secondary Storage, and takes time to be copied over to NFS/CEPH)
>> will take
>>      > up to a few hours.
>>      > Then migrating 1TB volume from NFS to CEPH, or CEPH to NFS, also
>> takes
>>      > time...etc.
>>      >
>>      > I'm just giving you feedback as "user", admin of the cloud, zero DEV
>> skills
>>      > here :) , just to make sure you make practical decisions (and I
>> admit I
>>      > might be wrong with my stuff, but just giving you feedback from our
>> public
>>      > cloud setup)
>>      >
>>      >
>>      > Cheers!
>>      >
>>      >
>>      >
>>      >
>>      > On 5 April 2018 at 15:16, Tutkowski, Mike <Mike.Tutkowski@netapp.com
>>      > wrote:
>>      >
>>      > > Wow, there’s been a lot of good details noted from several people
>> on how
>>      > > this process works today and how we’d like it to work in the near
>> future.
>>      > >
>>      > > 1) Any chance this is already documented on the Wiki?
>>      > >
>>      > > 2) If not, any chance someone would be willing to do so (a flow
>> diagram
>>      > > would be particularly useful).
>>      > >
>>      > > > On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <
>> marco@exoscale.ch>
>>      > > wrote:
>>      > > >
>>      > > > Hi all,
>>      > > >
>>      > > > Good point ilya but as stated by Sergey there are more things to
>> consider
>>      > > > before being able to do a proper shutdown. I augmented my script
>> I gave
>>      > > you
>>      > > > originally and changed code in CS. What we're doing for our
>> environment
>>      > > is
>>      > > > as follows:
>>      > > >
>>      > > > 1. the MGMT looks for a change in the file /etc/lb-agent which
>> contains
>>      > > > keywords for HAproxy[2] (ready, maint) so that HA-proxy can
>> disable the
>>      > > > mgmt on the keyword "maint" and the mgmt server stops a couple of
>>      > > > threads[1] to stop processing async jobs in the queue
>>      > > > 2. Looks for the async jobs and wait until there is none to
>> ensure you
>>      > > can
>>      > > > send the reconnect commands (if jobs are running, a reconnect
>> will
>>      > result
>>      > > > in a failed job since the result will never reach the management
>>      > server -
>>      > > > the agent waits for the current job to be done before
>> reconnecting, and
>>      > > > discard the result... room for improvement here!)
>>      > > > 3. Issue a reconnectHost command to all the hosts connected to
>> the mgmt
>>      > > > server so that they reconnect to another one, otherwise the mgmt
>> must
>>      > be
>>      > > up
>>      > > > since it is used to forward commands to agents.
>>      > > > 4. when all agents are reconnected, we can shutdown the
>> management
>>      > server
>>      > > > and perform the maintenance.
>>      > > >
>>      > > > One issue remains for me, during the reconnect, the commands
>> that are
>>      > > > processed at the same time should be kept in a queue until the
>> agents
>>      > > have
>>      > > > finished any current jobs and have reconnected. Today the little
>> time
>>      > > > window during which the reconnect happens can lead to failed
>> jobs due
>>      > to
>>      > > > the agent not being connected at the right moment.
>>      > > >
>>      > > > I could push a PR for the change to stop some processing threads
>> based
>>      > on
>>      > > > the content of a file. It's possible also to cancel the drain of
>> the
>>      > > > management by simply changing the content of the file back to
>> "ready"
>>      > > > again, instead of "maint" [2].
>>      > > >
>>      > > > [1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
>>      > > > [2] HA proxy documentation on agent checker:
>> https://cbonte.github.io/
>>      > > > haproxy-dconv/1.6/configuration.html#5.2-agent-check
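>>      > > >
>>      > > > For anyone who has not used [2], the wiring is roughly as below
>>      > > > (addresses and the agent-check port are made up; the agent-check
>>      > > > side only has to echo the keyword from /etc/lb-agent back to
>>      > > > HAproxy):
>>      > > >
>>      > > >     # haproxy.cfg (HAproxy 1.6 syntax)
>>      > > >     backend cloudstack-mgmt
>>      > > >         server ms1 192.0.2.11:8080 check agent-check agent-port 8967 agent-inter 5s
>>      > > >         server ms2 192.0.2.12:8080 check agent-check agent-port 8967 agent-inter 5s
>>      > > >
>>      > > >     # on each mgmt server, expose the file ("ready" or "maint"):
>>      > > >     socat TCP-LISTEN:8967,fork,reuseaddr SYSTEM:'cat /etc/lb-agent'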
>>      > > >
>>      > > > Regarding your issue on the port blocking, I think it's fair to
>>      > consider
>>      > > > that if you want to shutdown your server at some point, you have
>> to
>>      > stop
>>      > > > serving (some) requests. Here the only way is to stop serving
>>      > > everything.
>>      > > > If the API had a REST design, we could reject any POST/PUT/DELETE
>>      > > > operations and allow GET ones. I don't know how hard it would be
>> today
>>      > to
>>      > > > only allow listBaseCmd operations to be more friendly with the
>> users.
>>      > > >
>>      > > > Marco
>>      > > >
>>      > > >
>>      > > > On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <
>> serg38l@hotmail.com>
>>      > > > wrote:
>>      > > >
>>      > > >> Now without spellchecking :)
>>      > > >>
>>      > > >> This is not simple e.g. for VMware. Each management server also
>> acts
>>      > as
>>      > > an
>>      > > >> agent proxy so tasks against a particular ESX host will be
>> always
>>      > > >> forwarded. That right answer will be to support a native
>> “maintenance
>>      > > mode”
>>      > > >> for management server. When entered to such mode the management
>> server
>>      > > >> should release all agents including SSVM, block/redirect API
>> calls and
>>      > > >> login request and finish all async job it originated.
>>      > > >>
>>      > > >>
>>      > > >>
>>      > > >> On Apr 4, 2018, at 5:15 PM, Sergey Levitskiy <
>>      > > >> serg38l@hotmail.com> wrote:
>>      > > >>
>>      > > >> This is not simple e.g. for VMware. Each management server also
>> acts
>>      > as
>>      > > an
>>      > > >> agent proxy so tasks against a particular ESX host will be
>> always
>>      > > >> forwarded. That right answer will be to a native support for
>>      > > “maintenance
>>      > > >> mode” for management server. When entered to such mode the
>> management
>>      > > >> server should release all agents including save, block/redirect
>> API
>>      > > calls
>>      > > >> and login request and finish all a sync job it originated.
>>      > > >>
>>      > > >> Sent from my iPhone
>>      > > >>
>>      > > >> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <
>>      > > >> rafaelweingartner@gmail.com> wrote:
>>      > > >>
>>      > > >> Ilya, still regarding the management server that is being shut
>> down
>>      > > issue;
>>      > > >> if other MSs/or maybe system VMs (I am not sure to know if they
>> are
>>      > > able to
>>      > > >> do such tasks) can direct/redirect/send new jobs to this
>> management
>>      > > server
>>      > > >> (the one being shut down), the process might never end because
>> new
>>      > tasks
>>      > > >> are always being created for the management server that we want
>> to
>>      > shut
>>      > > >> down. Is this scenario possible?
>>      > > >>
>>      > > >> That is why I mentioned blocking the port 8250 for the
>>      > > “graceful-shutdown”.
>>      > > >>
>>      > > >> If this scenario is not possible, then everything is fine.
>>      > > >>
>>      > > >>
>>      > > >> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <
>>      > > ilya.mailing.lists@gmail.com>
>>      > > >> wrote:
>>      > > >>
>>      > > >> I'm thinking of using a configuration from
>>      > > "job.cancel.threshold.minutes" -
>>      > > >> it will be the longest
>>      > > >>
>>      > > >>    "category": "Advanced",
>>      > > >>
>>      > > >>    "description": "Time (in minutes) for async-jobs to be
>> forcely
>>      > > >> cancelled if it has been in process for long",
>>      > > >>
>>      > > >>    "name": "job.cancel.threshold.minutes",
>>      > > >>
>>      > > >>    "value": "60"
>>      > > >>
>>      > > >>
>>      > > >>
>>      > > >>
>>      > > >> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <
>>      > > >> rafaelweingartner@gmail.com> wrote:
>>      > > >>
>>      > > >> Big +1 for this feature; I only have a few doubts.
>>      > > >>
>>      > > >> * Regarding the tasks/jobs that management servers (MSs)
>> execute; are
>>      > > >> these
>>      > > >> tasks originate from requests that come to the MS, or is it
>> possible
>>      > > that
>>      > > >> requests received by one management server to be executed by
>> other? I
>>      > > >> mean,
>>      > > >> if I execute a request against MS1, will this request always be
>>      > > >> executed/threated by MS1, or is it possible that this request is
>>      > > executed
>>      > > >> by another MS (e.g. MS2)?
>>      > > >>
>>      > > >> * I would suggest that after we block traffic coming from
>>      > > >> 8080/8443/8250(we
>>      > > >> will need to block this as well right?), we can log the
>> execution of
>>      > > >> tasks.
>>      > > >> I mean, something saying, there are XXX tasks (enumerate tasks)
>> still
>>      > > >> being
>>      > > >> executed, we will wait for them to finish before shutting down.
>>      > > >>
>>      > > >> * The timeout (60 minutes suggested) could be global settings
>> that we
>>      > > can
>>      > > >> load before executing the graceful-shutdown.
>>      > > >>
>>      > > >> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <
>>      > > >> ilya.mailing.lists@gmail.com>
>>      > > >>
>>      > > >> wrote:
>>      > > >>
>>      > > >> Use case:
>>      > > >> In any environment - time to time - administrator needs to
>> perform a
>>      > > >> maintenance. Current stop sequence of cloudstack management
>> server
>>      > will
>>      > > >> ignore the fact that there may be long running async jobs - and
>>      > > >> terminate
>>      > > >> the process. This in turn can create a poor user experience and
>>      > > >> occasional
>>      > > >> inconsistency  in cloudstack db.
>>      > > >>
>>      > > >> This is especially painful in large environments where the user
>> has
>>      > > >> thousands of nodes and there is a continuous patching that
>> happens
>>      > > >> around
>>      > > >> the clock - that requires migration of workload from one node to
>>      > > >> another.
>>      > > >>
>>      > > >> With that said - i've created a script that monitors the async
>> job
>>      > > >> queue
>>      > > >> for given MS and waits for it complete all jobs. More details
>> are
>>      > > >> posted
>>      > > >> below.
>>      > > >>
>>      > > >> I'd like to introduce "graceful-shutdown" into the
>> systemctl/service
>>      > of
>>      > > >> cloudstack-management service.
>>      > > >>
>>      > > >> The details of how it will work is below:
>>      > > >>
>>      > > >> Workflow for graceful shutdown:
>>      > > >> Using iptables/firewalld - block any connection attempts on
>> 8080/8443
>>      > > >> (we
>>      > > >> can identify the ports dynamically)
>>      > > >> Identify the MSID for the node, using the proper msid - query
>>      > > >> async_job
>>      > > >> table for
>>      > > >> 1) any jobs that are still running (or job_status=“0”)
>>      > > >> 2) job_dispatcher not like “pseudoJobDispatcher"
>>      > > >> 3) job_init_msid=$my_ms_id
>>      > > >>
>>      > > >> Monitor this async_job table for 60 minutes - until all async
>> jobs for
>>      > > >> MSID
>>      > > >> are done, then proceed with shutdown
>>      > > >>  If failed for any reason or terminated, catch the exit via trap
>>      > > >> command
>>      > > >> and unblock the 8080/8443
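>>      > > >>
>>      > > >> A minimal shell sketch of this workflow (illustrative only - it
>>      > > >> assumes DB credentials in ~/.my.cnf, a "cloud" database, and that
>>      > > >> the local msid can be looked up via mshost.service_ip):
>>      > > >>
>>      > > >>     #!/bin/bash
>>      > > >>     PORTS="8080,8443"
>>      > > >>     iptables -I INPUT -p tcp -m multiport --dports "$PORTS" -j REJECT
>>      > > >>     # on any exit (done, failure, ^C) unblock the API ports again
>>      > > >>     trap 'iptables -D INPUT -p tcp -m multiport --dports "$PORTS" -j REJECT' EXIT
>>      > > >>     MSID=$(mysql -N cloud -e \
>>      > > >>       "SELECT msid FROM mshost WHERE service_ip='$(hostname -i)'")
>>      > > >>     for i in $(seq 60); do   # poll for up to 60 minutes
>>      > > >>       LEFT=$(mysql -N cloud -e "SELECT COUNT(*) FROM async_job \
>>      > > >>         WHERE job_status=0 AND job_dispatcher NOT LIKE 'pseudoJobDispatcher' \
>>      > > >>         AND job_init_msid=$MSID")
>>      > > >>       [ "$LEFT" -eq 0 ] && break
>>      > > >>       sleep 60
>>      > > >>     done
>>      > > >>     systemctl stop cloudstack-management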
>>      > > >>
>>      > > >> Comments are welcome
>>      > > >>
>>      > > >> Regards,
>>      > > >> ilya
>>      > > >>
>>      > > >>
>>      > > >>
>>      > > >>
>>      > > >> --
>>      > > >> Rafael Weingärtner
>>      > > >>
>>      > > >>
>>      > > >>
>>      > > >>
>>      > > >>
>>      > > >> --
>>      > > >> Rafael Weingärtner
>>      > > >>
>>      > >
>>      >
>>      >
>>      >
>>      > --
>>      >
>>      > Andrija Panić
>>      >
>>
>>
>>
>

-- 
Ron Wheeler
President
Artifact Software Inc
email: rwheeler@artifact-software.com
skype: ronaldmwheeler
phone: 866-970-2435, ext 102


Re: [DISCUSS] CloudStack graceful shutdown

Posted by Rafael Weingärtner <ra...@gmail.com>.
Thanks for the feedback Ilya. Then, we would only need to adapt this new
feature introduced by you and ShapeBlue.


On Sat, Apr 21, 2018 at 4:03 PM, ilya musayev <il...@gmail.com>
wrote:

> Rafael
>
> What you are suggesting was already implemented. We've created Load
> Balancing algorithms - but we did not take into account the LB algo for
> maintenance (yet). Rohit and ShapeBlue were the developers behind the
> feature.
>
> What needs to happen is a tweak to LB Algorithms to become MS maintenance
> aware - or create new LB Algos altogether. Essentially we need to merge
> your work and this feature. Please read the FS below.
>
> Functional Spec:
>
>
> The new CA framework introduced basic support for a comma-separated
> list of management servers for the agent, which makes an external LB
> unnecessary.
>
> This extends that feature to implement LB sorting algorithms that
> sort the management server list before it is sent to the agents.
> This adds central intelligence in the management server plus
> additional enhancements to the Agent class to be algorithm aware and
> have a background mechanism to check/fall back to the preferred management
> server (assumed to be the first in the list). This adds support for any
> indirect agent such as the KVM, CPVM and SSVM agents, and would
> provide support for management server host migration during upgrade
> (when, instead of in-place, new hosts are used to set up new mgmt servers).
>
> This FR introduces two new global settings:
>
>    - indirect.agent.lb.algorithm: The algorithm for the indirect agent LB.
>    - indirect.agent.lb.check.interval: The preferred host check interval
>    for the agent's background task that checks and switches to the agent's
>    preferred host.
>
> The indirect.agent.lb.algorithm supports the following algorithm options:
>
>    - static: use the list as provided.
>    - roundrobin: evenly spreads hosts across management servers based on
>    host's id.
>    - shuffle: (pseudo) randomly sorts the list (not recommended for
>    production).
>
> Any changes to the global settings - indirect.agent.lb.algorithm and
> host - do not require restarting the management server(s) and the
> agents. A message bus based system dynamically reacts to change in these
> global settings and propagates them to all connected agents.
>
> The comma-separated management server list is propagated to agents in
> the following cases:
>
>    - Addition of a host (including SSVM and CPVM system VMs).
>    - Connection or reconnection by the agents to a management server.
>    - After admin changes the 'host' and/or the
>    'indirect.agent.lb.algorithm' global settings.
>
> On the agent side, the 'host' setting is saved in its properties file as:
> host=<comma separated addresses>@<algorithm name>.
>
> First the agent connects to the management server and sends its current
> management server list, which is compared by the management server and
> in case of a mismatch a new/updated list is sent for the agent to persist.
>
> From the agent's perspective, the first address in the propagated list
> will be considered the preferred host. A new background task can be
> activated by configuring the indirect.agent.lb.check.interval, which is
> a cluster-level global setting in CloudStack, and admins can also
> override this by configuring the 'host.lb.check.interval' in the
> agent.properties file.
>
> Every time the agent gets a ms-host list and the algorithm, the
> host-specific background check interval is also sent, and it dynamically
> reconfigures the background task without the need to restart agents.
>
> Note: The 'static' and 'roundrobin' algorithms strictly check for the
> order as expected by them; however, the 'shuffle' algorithm just checks
> for content and not the order of the comma-separated MS host addresses.
>
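> As a concrete illustration of the propagated setting (addresses made up):
>
>     # agent.properties
>     host=192.0.2.11,192.0.2.12@roundrobin
>     # optional per-agent override of the cluster-level check interval
>     host.lb.check.interval=300
>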
> Regards
> ilya
>
>
> On Fri, Apr 20, 2018 at 1:01 PM, Rafael Weingärtner <
> rafaelweingartner@gmail.com> wrote:
>
> > Is that management server load balancing feature using static
> > configurations? I heard about it on the mailing list, but I did not
> follow
> > the implementation.
> >
> > I do not see many problems with agents reconnecting. We can implement in
> > agents (not just KVM, but also system VMs) logic whereby, instead of using a
> > static pool of management servers configured in a properties file, they
> > dynamically request a list of available management servers via that list
> > management servers API method. This would require us to configure agents
> > with a load balancer URL that executes the balancing between multiple
> > management servers.
> >
> > I am +1 to remove the need for that VIP, which executes the load balancing
> > for connecting agents to management servers.
> >
> > On Fri, Apr 20, 2018 at 4:41 PM, ilya musayev <
> > ilya.mailing.lists@gmail.com>
> > wrote:
> >
> > > Rafael and Community
> > >
> > > All is well and good and I think we are thinking along similar lines -
> > > the only issue that I see right now with any approach is KVM Agents (or
> > > direct agents) and using a LoadBalancer on 8250.
> > >
> > > Here is a scenario:
> > >
> > > You have 2 Management Server setup fronted with a VIP on 8250.
> > > The LB Algorithm is either Round Robin or Least Connections used.
> > > You initiate a maintenance mode operation on one of the MS servers
> (call
> > it
> > > MS1) - assume you have a long running migration job that needs 60
> minutes
> > > to complete.
> > > We attempt to evacuate the agents by telling them to disconnect and
> > > reconnect again
> > > If we are using LB on 8250 with
> > > 1) Least Connections used - then all agents will continuously try to
> > > connect to the MS1 node that is attempting to go down for maintenance.
> > > Essentially, with this LB configuration the operation will never complete.
> > > 2) Round Robin - this will take a while - but eventually - you will get
> > all
> > > nodes connected to MS2
> > >
> > > The current limitation is the usage of an external LB on 8250. For this
> > > operation to work without issue, agents must connect to the MS servers
> > > without an LB. This is a recent feature we've developed with ShapeBlue -
> > > where we maintain the list of CloudStack Management Servers in the
> > > agent.properties file.
> > >
> > > Unless you can think of another solution - it appears we may be forced
> > > to bypass the 8250 VIP LB and use the new feature to maintain the list
> > > of management servers within agent.properties.
> > >
> > >
> > > I need to run now, let me know what your thoughts are.
> > >
> > > Regards
> > > ilya
> > >
> > >
> > >
> > > On Tue, Apr 17, 2018 at 8:27 AM, Rafael Weingärtner <
> > > rafaelweingartner@gmail.com> wrote:
> > >
> > > > Ilya and others,
> > > >
> > > > We have been discussing this idea of graceful/nicely shutdown.  Our
> > > feeling
> > > > is that we (in CloudStack community) might have been trying to solve
> > this
> > > > problem with too much scripting. What if we developed a more
> integrated
> > > > (native) solution?
> > > >
> > > > Let me explain our idea.
> > > >
> > > > ACS has a table called “mshost”, which is used to store management
> > server
> > > > information. During balancing and when jobs are dispatched to other
> > > > management servers this table is consulted/queried.  Therefore, we
> have
> > > > been discussing the idea of creating a management API for management
> > > > servers.  We could have an API method that changes the state of
> > > management
> > > > servers to “prepare to maintenance” and then “maintenance” (as soon
> as
> > > all
> > > > of the task/jobs it is managing finish). The idea is that during
> > > > rebalancing we would remove the hosts of servers that are not in “Up”
> > > state
> > > > (of course we would also ignore hosts in the aforementioned state to
> > > > receive hosts to manage).  Moreover, when we send/dispatch jobs to
> > other
> > > > management servers, we could ignore the ones that are not in “Up”
> state
> > > > (which is something already done).
> > > >
> > > > By doing this, the nicely shutdown could be executed in a few steps.
> > > >
> > > > 1 – issue the maintenance method for the management server you desire
> > > > 2 – wait until the MS goes into maintenance mode, while there are
> still
> > > > running jobs it (the management server) will be maintained in prepare
> > for
> > > > maintenance
> > > > 3 – execute the Linux shutdown command
> > > >
> > > > We would need other APIs methods to manage MSs then. An (i) API
> method
> > to
> > > > list MSs, and we could even create an (ii) API to remove
> > old/de-activated
> > > > management servers, which we currently do not have (forcing users to
> > > apply
> > > > changed directly in the database).
> > > >
> > > > Moreover, in this model, we would not kill hanging jobs; we would
> wait
> > > > until they expire and ACS expunges them. Of course, it is possible to
> > > > develop a forceful maintenance method as well. Then, when the
> “prepare
> > > for
> > > > maintenance” takes longer than a parameter, we could kill hanging
> jobs.
> > > >
> > > > All of this would allow the MS to be kept up and receiving requests
> > until
> > > > it can be safely shut down. What do you guys think about this approach?
> > > >
> > > > On Tue, Apr 10, 2018 at 6:52 PM, Yiping Zhang <yz...@marketo.com>
> > > wrote:
> > > >
> > > > > As a cloud admin, I would love to have this feature.
> > > > >
> > > > > It so happens that I just accidentally restarted my ACS management
> > > server
> > > > > while two instances are migrating to another Xen cluster (via
> storage
> > > > > migration, not live migration). As a result, both instances
> > > > > end up with corrupted data disks which can't be reattached or
> > migrated.
> > > > >
> > > > > Any feature which prevents this from happening would be great.  A
> low
> > > > > hanging fruit is simply checking for
> > > > > if there are any async jobs running, especially any kind of
> migration
> > > > jobs
> > > > > or other known long running type of
> > > > > jobs and warn the operator  so that he has a chance to abort server
> > > > > shutdowns.
> > > > >
> > > > > Yiping
> > > > >
> > > > > On 4/5/18, 3:13 PM, "ilya musayev" <il...@gmail.com>
> > > > wrote:
> > > > >
> > > > >     Andrija
> > > > >
> > > > >     This is a tough scenario.
> > > > >
> > > > >     As an admin, the way I would have handled this situation is to
> > > > > advertise
> > > > >     the upcoming outage and then take away specific API commands
> > from a
> > > > > user a
> > > > >     day before - so he does not cause any long running async jobs.
> > Once
> > > > >     maintenance completes - enable the API commands back to the
> user.
> > > > > However -
> > > > >     I don't know who your user base is and if this would be an
> > > acceptable
> > > > >     solution.
> > > > >
> > > > >     Perhaps also investigate what can be done to speed up your long
> > > > running
> > > > >     tasks...
> > > > >
> > > > >     As a side note, we will be working on a feature that would
> allow
> > > for
> > > > a
> > > > >     graceful termination of the process/job, meaning if agent
> > noticed a
> > > > >     disconnect or termination request - it will abort the command
> in
> > > > > flight. We
> > > > >     can also consider restarting these tasks again or what not - but
> > it
> > > > > would
> > > > >     not be part of this enhancement.
> > > > >
> > > > >     Regards
> > > > >     ilya
> > > > >
> > > > >     On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <
> > > > andrija.panic@gmail.com
> > > > > >
> > > > >     wrote:
> > > > >
> > > > >     > Hi Ilya,
> > > > >     >
> > > > >     > thanks for the feedback - but in "real world", you need to
> > > > > "understand"
> > > > >     > that 60min is next to useless timeout for some jobs (if I
> > > > understand
> > > > > this
> > > > >     > specific parameter correctly ?? - job is really canceled, not
> > > only
> > > > > job
> > > > >     > monitoring is canceled ???) -
> > > > >     >
> > > > >     > My value for the  "job.cancel.threshold.minutes" is 2880
> > minutes
> > > (2
> > > > > days?)
> > > > >     >
> > > > >     > I can tell you when you have CEPH/NFS (CEPH even "worse"
> case,
> > > > since
> > > > > slower
> > > > >     > read during qemu-img convert process...) of 500GB, then
> imagine
> > > > > snapshot
> > > > >     > job will take many hours. Should I mention 1TB volumes (yes,
> we
> > > had
> > > > >     > clients like that...)
> > > > >     > Then attaching a 1TB volume, that was uploaded to ACS (lives
> > > > > originally on
> > > > >     > Secondary Storage, and takes time to be copied over to
> > NFS/CEPH)
> > > > > will take
> > > > >     > up to a few hours.
> > > > >     > Then migrating 1TB volume from NFS to CEPH, or CEPH to NFS,
> > also
> > > > > takes
> > > > >     > time...etc.
> > > > >     >
> > > > >     > I'm just giving you feedback as "user", admin of the cloud,
> > zero
> > > > DEV
> > > > > skills
> > > > >     > here :) , just to make sure you make practical decisions
> (and I
> > > > > admit I
> > > > >     > might be wrong with my stuff, but just giving you feedback
> from
> > > our
> > > > > public
> > > > >     > cloud setup)
> > > > >     >
> > > > >     >
> > > > >     > Cheers!
> > > > >     >
> > > > >     >
> > > > >     >
> > > > >     >
> > > > >     > On 5 April 2018 at 15:16, Tutkowski, Mike <
> > > > Mike.Tutkowski@netapp.com
> > > > > >
> > > > >     > wrote:
> > > > >     >
> > > > >     > > Wow, there’s been a lot of good details noted from several
> > > people
> > > > > on how
> > > > >     > > this process works today and how we’d like it to work in
> the
> > > near
> > > > > future.
> > > > >     > >
> > > > >     > > 1) Any chance this is already documented on the Wiki?
> > > > >     > >
> > > > >     > > 2) If not, any chance someone would be willing to do so (a
> > flow
> > > > > diagram
> > > > >     > > would be particularly useful).
> > > > >     > >
> > > > >     > > > On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <
> > > > > marco@exoscale.ch>
> > > > >     > > wrote:
> > > > >     > > >
> > > > >     > > > Hi all,
> > > > >     > > >
> > > > >     > > > Good point ilya but as stated by Sergey there are more things to
> > > > > consider
> > > > >     > > > before being able to do a proper shutdown. I augmented my
> > > > script
> > > > > I gave
> > > > >     > > you
> > > > >     > > > originally and changed code in CS. What we're doing for
> our
> > > > > environment
> > > > >     > > is
> > > > >     > > > as follows:
> > > > >     > > >
> > > > >     > > > 1. the MGMT looks for a change in the file /etc/lb-agent
> > > which
> > > > > contains
> > > > >     > > > keywords for HAproxy[2] (ready, maint) so that HA-proxy
> can
> > > > > disable the
> > > > >     > > > mgmt on the keyword "maint" and the mgmt server stops a
> > > couple
> > > > of
> > > > >     > > > threads[1] to stop processing async jobs in the queue
> > > > >     > > > 2. Looks for the async jobs and wait until there is none
> to
> > > > > ensure you
> > > > >     > > can
> > > > >     > > > send the reconnect commands (if jobs are running, a
> > reconnect
> > > > > will
> > > > >     > result
> > > > >     > > > in a failed job since the result will never reach the
> > > > management
> > > > >     > server -
> > > > >     > > > the agent waits for the current job to be done before
> > > > > reconnecting, and
> > > > >     > > > discard the result... room for improvement here!)
> > > > >     > > > 3. Issue a reconnectHost command to all the hosts
> connected
> > > to
> > > > > the mgmt
> > > > >     > > > server so that they reconnect to another one, otherwise
> the
> > > > mgmt
> > > > > must
> > > > >     > be
> > > > >     > > up
> > > > >     > > > since it is used to forward commands to agents.
> > > > >     > > > 4. when all agents are reconnected, we can shutdown the
> > > > > management
> > > > >     > server
> > > > >     > > > and perform the maintenance.
> > > > >     > > >
> > > > >     > > > One issue remains for me, during the reconnect, the
> > commands
> > > > > that are
> > > > >     > > > processed at the same time should be kept in a queue
> until
> > > the
> > > > > agents
> > > > >     > > have
> > > > >     > > > finished any current jobs and have reconnected. Today the
> > > > little
> > > > > time
> > > > >     > > > window during which the reconnect happens can lead to
> > failed
> > > > > jobs due
> > > > >     > to
> > > > >     > > > the agent not being connected at the right moment.
> > > > >     > > >
> > > > >     > > > I could push a PR for the change to stop some processing
> > > > threads
> > > > > based
> > > > >     > on
> > > > >     > > > the content of a file. It's possible also to cancel the
> > drain
> > > > of
> > > > > the
> > > > >     > > > management by simply changing the content of the file
> back
> > to
> > > > > "ready"
> > > > >     > > > again, instead of "maint" [2].
> > > > >     > > >
> > > > >     > > > [1] AsyncJobMgr-Heartbeat, CapacityChecker,
> StatsCollector
> > > > >     > > > [2] HA proxy documentation on agent checker:
> > > > > https://cbonte.github.io/
> > > > >     > > > haproxy-dconv/1.6/configuration.html#5.2-agent-check
> > > > >     > > >
> > > > >     > > > Regarding your issue on the port blocking, I think it's
> > fair
> > > to
> > > > >     > consider
> > > > >     > > > that if you want to shutdown your server at some point,
> you
> > > > have
> > > > > to
> > > > >     > stop
> > > > >     > > > serving (some) requests. Here the only way is to stop
> > > serving
> > > > >     > > everything.
> > > > >     > > > If the API had a REST design, we could reject any
> > > > POST/PUT/DELETE
> > > > >     > > > operations and allow GET ones. I don't know how hard it
> > would
> > > > be
> > > > > today
> > > > >     > to
> > > > >     > > > only allow listBaseCmd operations to be more friendly
> with
> > > the
> > > > > users.
> > > > >     > > >
> > > > >     > > > Marco
> > > > >     > > >
> > > > >     > > >
> > > > >     > > > On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <
> > > > > serg38l@hotmail.com>
> > > > >     > > > wrote:
> > > > >     > > >
> > > > >     > > >> Now without spellchecking :)
> > > > >     > > >>
> > > > >     > > >> This is not simple e.g. for VMware. Each management
> server
> > > > also
> > > > > acts
> > > > >     > as
> > > > >     > > an
> > > > >     > > >> agent proxy so tasks against a particular ESX host will
> be
> > > > > always
> > > > >     > > >> forwarded. The right answer will be to support a native
> > > > > “maintenance
> > > > >     > > mode”
> > > > >     > > >> for management server. When entered to such mode the
> > > > management
> > > > > server
> > > > >     > > >> should release all agents including SSVM, block/redirect
> > API
> > > > > calls and
> > > > >     > > >> login request and finish all async jobs it originated.
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >> On Apr 4, 2018, at 5:15 PM, Sergey Levitskiy <
> > > > > serg38l@hotmail.com
> > > > >     > > <mailto:
> > > > >     > > >> serg38l@hotmail.com>> wrote:
> > > > >     > > >>
> > > > >     > > >> This is not simple e.g. for VMware. Each management
> server
> > > > also
> > > > > acts
> > > > >     > as
> > > > >     > > an
> > > > >     > > >> agent proxy so tasks against a particular ESX host will
> be
> > > > > always
> > > > >     > > >> forwarded. The right answer will be native support
> > for
> > > > >     > > “maintenance
> > > > >     > > >> mode” for management server. When entered to such mode
> the
> > > > > management
> > > > >     > > >> server should release all agents including SSVM,
> > > > block/redirect
> > > > > API
> > > > >     > > calls
> > > > >     > > >> and login request and finish all async jobs it
> originated.
> > > > >     > > >>
> > > > >     > > >> Sent from my iPhone
> > > > >     > > >>
> > > > >     > > >> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <
> > > > >     > > >> rafaelweingartner@gmail.com<mailto:rafaelweingartner@
> > > > gmail.com
> > > > > >>
> > > > >     > wrote:
> > > > >     > > >>
> > > > >     > > >> Ilya, still regarding the management server that is
> being
> > > shut
> > > > > down
> > > > >     > > issue;
> > > > >     > > >> if other MSs/or maybe system VMs (I am not sure to know
> if
> > > > they
> > > > > are
> > > > >     > > able to
> > > > >     > > >> do such tasks) can direct/redirect/send new jobs to this
> > > > > management
> > > > >     > > server
> > > > >     > > >> (the one being shut down), the process might never end
> > > because
> > > > > new
> > > > >     > tasks
> > > > >     > > >> are always being created for the management server that
> we
> > > > want
> > > > > to
> > > > >     > shut
> > > > >     > > >> down. Is this scenario possible?
> > > > >     > > >>
> > > > >     > > >> That is why I mentioned blocking the port 8250 for the
> > > > >     > > “graceful-shutdown”.
> > > > >     > > >>
> > > > >     > > >> If this scenario is not possible, then everything is
> fine.
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <
> > > > >     > > ilya.mailing.lists@gmail.com
> > > > >     > > >> <ma...@gmail.com>>
> > > > >     > > >> wrote:
> > > > >     > > >>
> > > > >     > > >> I'm thinking of using a configuration from
> > > > >     > > "job.cancel.threshold.minutes" -
> > > > >     > > >> it will be the longest
> > > > >     > > >>
> > > > >     > > >>    "category": "Advanced",
> > > > >     > > >>
> > > > >     > > >>    "description": "Time (in minutes) for async-jobs to
> be
> > > > > forcely
> > > > >     > > >> cancelled if it has been in process for long",
> > > > >     > > >>
> > > > >     > > >>    "name": "job.cancel.threshold.minutes",
> > > > >     > > >>
> > > > >     > > >>    "value": "60"
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <
> > > > >     > > >> rafaelweingartner@gmail.com<mailto:rafaelweingartner@
> > > > gmail.com
> > > > > >>
> > > > >     > wrote:
> > > > >     > > >>
> > > > >     > > >> Big +1 for this feature; I only have a few doubts.
> > > > >     > > >>
> > > > >     > > >> * Regarding the tasks/jobs that management servers (MSs)
> > > > > execute; are
> > > > >     > > >> these
> > > > >     > > >> tasks originate from requests that come to the MS, or is
> > it
> > > > > possible
> > > > >     > > that
> > > > >     > > >> requests received by one management server to be
> executed
> > by
> > > > > other? I
> > > > >     > > >> mean,
> > > > >     > > >> if I execute a request against MS1, will this request
> > always
> > > > be
> > > > >     > > >> executed/threated by MS1, or is it possible that this
> > > request
> > > > is
> > > > >     > > executed
> > > > >     > > >> by another MS (e.g. MS2)?
> > > > >     > > >>
> > > > >     > > >> * I would suggest that after we block traffic coming
> from
> > > > >     > > >> 8080/8443/8250(we
> > > > >     > > >> will need to block this as well right?), we can log the
> > > > > execution of
> > > > >     > > >> tasks.
> > > > >     > > >> I mean, something saying, there are XXX tasks (enumerate
> > > > tasks)
> > > > > still
> > > > >     > > >> being
> > > > >     > > >> executed, we will wait for them to finish before
> shutting
> > > > down.
> > > > >     > > >>
> > > > >     > > >> * The timeout (60 minutes suggested) could be global
> > > settings
> > > > > that we
> > > > >     > > can
> > > > >     > > >> load before executing the graceful-shutdown.
> > > > >     > > >>
> > > > >     > > >> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <
> > > > >     > > >> ilya.mailing.lists@gmail.com<mailto:ilya.mailing.lists@
> > > > > gmail.com>
> > > > >     > > >>
> > > > >     > > >> wrote:
> > > > >     > > >>
> > > > >     > > >> Use case:
> > > > >     > > >> In any environment - time to time - administrator needs
> to
> > > > > perform a
> > > > >     > > >> maintenance. Current stop sequence of cloudstack
> > management
> > > > > server
> > > > >     > will
> > > > >     > > >> ignore the fact that there may be long running async
> jobs
> > -
> > > > and
> > > > >     > > >> terminate
> > > > >     > > >> the process. This in turn can create a poor user
> > experience
> > > > and
> > > > >     > > >> occasional
> > > > >     > > >> inconsistency  in cloudstack db.
> > > > >     > > >>
> > > > >     > > >> This is especially painful in large environments where
> the
> > > > user
> > > > > has
> > > > >     > > >> thousands of nodes and there is a continuous patching
> that
> > > > > happens
> > > > >     > > >> around
> > > > >     > > >> the clock - that requires migration of workload from one
> > > node
> > > > to
> > > > >     > > >> another.
> > > > >     > > >>
> > > > >     > > >> With that said - i've created a script that monitors the
> > > async
> > > > > job
> > > > >     > > >> queue
> > > > >     > > >> for given MS and waits for it complete all jobs. More
> > > details
> > > > > are
> > > > >     > > >> posted
> > > > >     > > >> below.
> > > > >     > > >>
> > > > >     > > >> I'd like to introduce "graceful-shutdown" into the
> > > > > systemctl/service
> > > > >     > of
> > > > >     > > >> cloudstack-management service.
> > > > >     > > >>
> > > > >     > > >> The details of how it will work is below:
> > > > >     > > >>
> > > > >     > > >> Workflow for graceful shutdown:
> > > > >     > > >> Using iptables/firewalld - block any connection attempts
> > on
> > > > > 8080/8443
> > > > >     > > >> (we
> > > > >     > > >> can identify the ports dynamically)
> > > > >     > > >> Identify the MSID for the node, using the proper msid -
> > > query
> > > > >     > > >> async_job
> > > > >     > > >> table for
> > > > >     > > >> 1) any jobs that are still running (or job_status=“0”)
> > > > >     > > >> 2) job_dispatcher not like “pseudoJobDispatcher"
> > > > >     > > >> 3) job_init_msid=$my_ms_id
> > > > >     > > >>
> > > > >     > > >> Monitor this async_job table for 60 minutes - until all
> > > async
> > > > > jobs for
> > > > >     > > >> MSID
> > > > >     > > >> are done, then proceed with shutdown
> > > > >     > > >>  If failed for any reason or terminated, catch the exit
> > via
> > > > trap
> > > > >     > > >> command
> > > > >     > > >> and unblock the 8080/8443
> > > > >     > > >>
> > > > >     > > >> Comments are welcome
> > > > >     > > >>
> > > > >     > > >> Regards,
> > > > >     > > >> ilya
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >> --
> > > > >     > > >> Rafael Weingärtner
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >>
> > > > >     > > >> --
> > > > >     > > >> Rafael Weingärtner
> > > > >     > > >>
> > > > >     > >
> > > > >     >
> > > > >     >
> > > > >     >
> > > > >     > --
> > > > >     >
> > > > >     > Andrija Panić
> > > > >     >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Rafael Weingärtner
> > > >
> > >
> >
> >
> >
> > --
> > Rafael Weingärtner
> >
>



-- 
Rafael Weingärtner

Re: [DISCUSS] CloudStack graceful shutdown

Posted by ilya musayev <il...@gmail.com>.
Rafael

What you are suggesting was already implemented. We've created load
balancing algorithms - but we did not make the LB algo
maintenance-aware (yet). Rohit and ShapeBlue were the developers behind the
feature.

What needs to happen is a tweak to the LB algorithms to make them MS
maintenance aware - or new LB algos created altogether (see the sketch
after the FS below). Essentially we need to merge your work and this
feature. Please read the FS below.

Functional Spec:


The new CA framework introduced basic support for comma-separated
list of management servers for agent, which makes an external LB
unnecessary.

This extends that feature to implement LB sorting algorithms that
sorts the management server list before they are sent to the agents.
This adds a central intelligence in the management server and adds
additional enhancements to Agent class to be algorithm aware and
have a background mechanism to check/fallback to preferred management
server (assumed to be the first in the list). This works for any
indirect agent such as the KVM, CPVM and SSVM agents, and would
provide support for management server host migration during upgrade
(when, instead of in-place, new hosts are used to set up a new mgmt server).

This FR introduces two new global settings:

   - indirect.agent.lb.algorithm: The algorithm for the indirect agent LB.
   - indirect.agent.lb.check.interval: The preferred host check interval
   for the agent's background task that checks and switches to agent's
   preferred host.

The indirect.agent.lb.algorithm supports following algorithm options:

   - static: use the list as provided.
   - roundrobin: evenly spreads hosts across management servers based on
   host's id.
   - shuffle: (pseudo) randomly sorts the list (not recommended for
   production).
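
To illustrate the 'roundrobin' option above - this is just a rough sketch
of the idea, not the actual implementation - you can think of it as
rotating the MS list by the host id:

    import java.util.ArrayList;
    import java.util.List;

    class RoundRobinSketch {
        // Rotate the MS list by (hostId mod size) so that hosts start at
        // different entries and spread evenly across management servers.
        static List<String> sortFor(long hostId, List<String> msList) {
            int size = msList.size();
            List<String> sorted = new ArrayList<>(size);
            int start = (int) (hostId % size);
            for (int i = 0; i < size; i++) {
                sorted.add(msList.get((start + i) % size));
            }
            return sorted;
        }
    }

For example, sortFor(1, List.of("ms1", "ms2", "ms3")) yields
[ms2, ms3, ms1], while host 2 gets [ms3, ms1, ms2].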

Any changes to the global settings - indirect.agent.lb.algorithm and
host - do not require restarting the management server(s) or the
agents. A message bus based system dynamically reacts to changes in these
global settings and propagates them to all connected agents.

Comma-separated management server list is propagated to agents on
following cases:

   - Addition of a host (including SSVM and CPVM system VMs).
   - Connection or reconnection by the agents to a management server.
   - After admin changes the 'host' and/or the
   'indirect.agent.lb.algorithm' global settings.

On the agent side, the 'host' setting is saved in its properties file as:
host=<comma separated addresses>@<algorithm name>.
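
For example (the addresses below are made up), an agent given three
management servers and the round-robin algorithm would carry:

    host=10.1.1.1,10.1.1.2,10.1.1.3@roundrobin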

First the agent connects to the management server and sends its current
management server list, which is compared by the management server;
in case of a mismatch, a new/updated list is sent for the agent to persist.

From the agent's perspective, the first address in the propagated list
will be considered the preferred host. A new background task can be
activated by configuring the indirect.agent.lb.check.interval which is
a cluster level global setting from CloudStack and admins can also
override this by configuring the 'host.lb.check.interval' in the
agent.properties file.
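
So an admin override in agent.properties would look something like the
following (the value is a placeholder - I assume the same unit as the
global setting):

    host.lb.check.interval=300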

Every time the agent gets an ms-host list and the algorithm, the host-specific
background check interval is also sent, and the agent dynamically reconfigures
the background task without needing to restart the agents.

Note: the 'static' and 'roundrobin' algorithms strictly check for the
order as expected by them; the 'shuffle' algorithm, however, just checks
the content and not the order of the comma-separated ms host addresses.
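
Coming back to making this maintenance aware: a minimal sketch of what I
have in mind, with invented names (MsEntry and the "Up" state check stand
in for the real mshost data) - the list is filtered before the chosen
algorithm sorts it:

    import java.util.List;
    import java.util.stream.Collectors;

    // Invented types for illustration; real code would use the mshost data.
    record MsEntry(String address, String state) {}

    class MaintenanceAwareLb {
        // Drop management servers that are not "Up" (e.g. preparing for or
        // in maintenance) before the normal LB sorting runs, so agents are
        // never pointed at a server that is draining.
        static List<String> eligible(List<MsEntry> all) {
            return all.stream()
                      .filter(ms -> "Up".equals(ms.state()))
                      .map(MsEntry::address)
                      .collect(Collectors.toList());
        }
    }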

Regards
ilya


On Fri, Apr 20, 2018 at 1:01 PM, Rafael Weingärtner <
rafaelweingartner@gmail.com> wrote:

> Is that management server load balancing feature using static
> configurations? I heard about it on the mailing list, but I did not follow
> the implementation.
>
> I do not see many problems with agents reconnecting. We can implement in
> agents (not just KVM, but also system VMs) a logic that instead of using a
> static pool of management servers configured in a properties file, they
> dynamically request a list of available management servers via that list
> management servers API method. This would require us to configure agents
> with a load balancer URL that executes the balancing between multiple
> management servers.
>
> I am +1 to remove the need for that VIP, which executes the load balance
> for connecting agents to management servers.
>
> On Fri, Apr 20, 2018 at 4:41 PM, ilya musayev <
> ilya.mailing.lists@gmail.com>
> wrote:
>
> > Rafael and Community
> >
> > All is well and good and i think we are thinking along the similar lines
> -
> > the only issue that i see right now with any approach is KVM Agents (or
> > direct agents) and using LoadBalancer on 8250.
> >
> > Here is a scenario:
> >
> > You have 2 Management Server setup fronted with a VIP on 8250.
> > The LB Algorithm is either Round Robin or Least Connections used.
> > You initiate a maintenance mode operation on one of the MS servers (call
> it
> > MS1) - assume you have a long running migration job that needs 60 minutes
> > to complete.
> > We attempt to evacuate the agents by telling them to disconnect and
> > reconnect again
> > If we are using LB on 8250 with
> > 1) Least Connection used - then all agents will continuously try to
> connect
> > to a MS1 node that is attempting to go down for maintenance. Essentially
> > with this LB configuration this operation will never complete.
> > 2) Round Robin - this will take a while - but eventually - you will get
> all
> > nodes connected to MS2
> >
> > The current limitation is usage of external LB on 8250. For this
> operation
> > to work without issue - would mean agents must connect to MS server
> without
> > an LB. This is a recent feature we've developed with ShapeBlue - where we
> > maintain the list of CloudStack Management Servers in the
> agent.properties
> > file.
> >
> > Unless you can think of another solution - it appears we may be forced
> > to bypass the 8250 VIP LB and use the new feature to maintain the list of
> > management servers within agent.properties.
> >
> >
> > I need to run now, let me know what your thoughts are.
> >
> > Regards
> > ilya
> >
> >
> >
> > On Tue, Apr 17, 2018 at 8:27 AM, Rafael Weingärtner <
> > rafaelweingartner@gmail.com> wrote:
> >
> > > Ilya and others,
> > >
> > > We have been discussing this idea of graceful/nicely shutdown.  Our
> > feeling
> > > is that we (in CloudStack community) might have been trying to solve
> this
> > > problem with too much scripting. What if we developed a more integrated
> > > (native) solution?
> > >
> > > Let me explain our idea.
> > >
> > > ACS has a table called “mshost”, which is used to store management
> server
> > > information. During balancing and when jobs are dispatched to other
> > > management servers this table is consulted/queried.  Therefore, we have
> > > been discussing the idea of creating a management API for management
> > > servers.  We could have an API method that changes the state of
> > management
> > > servers to “prepare to maintenance” and then “maintenance” (as soon as
> > all
> > > of the task/jobs it is managing finish). The idea is that during
> > > rebalancing we would remove the hosts of servers that are not in “Up”
> > state
> > > (of course we would also ignore hosts in the aforementioned state to
> > > receive hosts to manage).  Moreover, when we send/dispatch jobs to
> other
> > > management servers, we could ignore the ones that are not in “Up” state
> > > (which is something already done).
> > >
> > > By doing this, the graceful shutdown could be executed in a few steps.
> > >
> > > 1 – issue the maintenance method for the management server you desire
> > > 2 – wait until the MS goes into maintenance mode, while there are still
> > > running jobs it (the management server) will be maintained in prepare
> for
> > > maintenance
> > > 3 – execute the Linux shutdown command
> > >
> > > We would need other API methods to manage MSs then. An (i) API method
> to
> > > list MSs, and we could even create an (ii) API to remove
> old/de-activated
> > > management servers, which we currently do not have (forcing users to
> > apply
> > > changed directly in the database).
> > >
> > > Moreover, in this model, we would not kill hanging jobs; we would wait
> > > until they expire and ACS expunges them. Of course, it is possible to
> > > develop a forceful maintenance method as well. Then, when the “prepare
> > for
> > > maintenance” takes longer than a parameter, we could kill hanging jobs.
> > >
> > > All of this would allow the MS to be kept up and receiving requests
> until
> > > it can be safely shut down. What do you guys think about this approach?
> > >
> > > On Tue, Apr 10, 2018 at 6:52 PM, Yiping Zhang <yz...@marketo.com>
> > wrote:
> > >
> > > > As a cloud admin, I would love to have this feature.
> > > >
> > > > It so happens that I just accidentally restarted my ACS management
> > server
> > > > while two instances were migrating to another Xen cluster (via storage
> > > > migration, not live migration). As a result, both instances
> > > > ended up with corrupted data disks which can't be reattached or
> migrated.
> > > >
> > > > Any feature which prevents this from happening would be great.  A low
> > > > hanging fruit is simply checking for
> > > > if there are any async jobs running, especially any kind of migration
> > > jobs
> > > > or other known long running type of
> > > > jobs and warn the operator  so that he has a chance to abort server
> > > > shutdowns.
> > > >
> > > > Yiping
> > > >
> > > > On 4/5/18, 3:13 PM, "ilya musayev" <il...@gmail.com>
> > > wrote:
> > > >
> > > >     Andrija
> > > >
> > > >     This is a tough scenario.
> > > >
> > > >     As an admin, they way i would have handled this situation, is to
> > > > advertise
> > > >     the upcoming outage and then take away specific API commands
> from a
> > > > user a
> > > >     day before - so he does not cause any long running async jobs.
> Once
> > > >     maintenance completes - enable the API commands back to the user.
> > > > However -
> > > >     i dont know who your user base is and if this would be an
> > acceptable
> > > >     solution.
> > > >
> > > >     Perhaps also investigate what can be done to speed up your long
> > > running
> > > >     tasks...
> > > >
> > > >     As a side note, we will be working on a feature that would allow
> > for
> > > a
> > > >     graceful termination of the process/job, meaning if agent
> noticed a
> > > >     disconnect or termination request - it will abort the command in
> > > > flight. We
> > > >     can also consider restarting these tasks again or what not - but
> it
> > > > would
> > > >     not be part of this enhancement.
> > > >
> > > >     Regards
> > > >     ilya
> > > >
> > > >     On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <
> > > andrija.panic@gmail.com
> > > > >
> > > >     wrote:
> > > >
> > > >     > Hi Ilya,
> > > >     >
> > > >     > thanks for the feedback - but in "real world", you need to
> > > > "understand"
> > > >     > that 60min is next to useless timeout for some jobs (if I
> > > understand
> > > > this
> > > >     > specific parameter correctly ?? - job is really canceled, not
> > only
> > > > job
> > > >     > monitoring is canceled ???) -
> > > >     >
> > > >     > My value for the  "job.cancel.threshold.minutes" is 2880
> minutes
> > (2
> > > > days?)
> > > >     >
> > > >     > I can tell you, when you have CEPH/NFS (CEPH even the "worse" case,
> > > >     > since reads are slower during the qemu-img convert process...) of
> > > >     > 500GB, then imagine a snapshot job will take many hours. Should I
> > > >     > mention 1TB volumes (yes, we had clients like that...)
> > > >     > Then attaching a 1TB volume that was uploaded to ACS (lives
> > > >     > originally on Secondary Storage, and takes time to be copied over to
> > > >     > NFS/CEPH) will take up to a few hours.
> > > >     > Then migrating a 1TB volume from NFS to CEPH, or CEPH to NFS, also
> > > >     > takes time... etc.
> > > >     >
> > > >     > I'm just giving you feedback as "user", admin of the cloud,
> zero
> > > DEV
> > > > skills
> > > >     > here :) , just to make sure you make practical decisions (and I
> > > > admit I
> > > >     > might be wrong with my stuff, but just giving you feedback from
> > our
> > > > public
> > > >     > cloud setup)
> > > >     >
> > > >     >
> > > >     > Cheers!
> > > >     >
> > > >     >
> > > >     >
> > > >     >
> > > >     > On 5 April 2018 at 15:16, Tutkowski, Mike <
> > > Mike.Tutkowski@netapp.com
> > > > >
> > > >     > wrote:
> > > >     >
> > > >     > > Wow, there’s been a lot of good details noted from several
> > people
> > > > on how
> > > >     > > this process works today and how we’d like it to work in the
> > near
> > > > future.
> > > >     > >
> > > >     > > 1) Any chance this is already documented on the Wiki?
> > > >     > >
> > > >     > > 2) If not, any chance someone would be willing to do so (a
> flow
> > > > diagram
> > > >     > > would be particularly useful).
> > > >     > >
> > > >     > > > On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <
> > > > marco@exoscale.ch>
> > > >     > > wrote:
> > > >     > > >
> > > >     > > > Hi all,
> > > >     > > >
> > > >     > > > Good point ilya but as stated by Sergey there are more things
> > > >     > > > to consider
> > > >     > > > before being able to do a proper shutdown. I augmented my
> > > script
> > > > I gave
> > > >     > > you
> > > >     > > > originally and changed code in CS. What we're doing for our
> > > > environment
> > > >     > > is
> > > >     > > > as follows:
> > > >     > > >
> > > >     > > > 1. the MGMT looks for a change in the file /etc/lb-agent
> > which
> > > > contains
> > > >     > > > keywords for HAproxy[2] (ready, maint) so that HA-proxy can
> > > > disable the
> > > >     > > > mgmt on the keyword "maint" and the mgmt server stops a
> > couple
> > > of
> > > >     > > > threads[1] to stop processing async jobs in the queue
> > > >     > > > 2. Looks for the async jobs and wait until there is none to
> > > > ensure you
> > > >     > > can
> > > >     > > > send the reconnect commands (if jobs are running, a
> reconnect
> > > > will
> > > >     > result
> > > >     > > > in a failed job since the result will never reach the
> > > management
> > > >     > server -
> > > >     > > > the agent waits for the current job to be done before
> > > > reconnecting, and
> > > >     > > > discard the result... room for improvement here!)
> > > >     > > > 3. Issue a reconnectHost command to all the hosts connected
> > to
> > > > the mgmt
> > > >     > > > server so that they reconnect to another one, otherwise the
> > > mgmt
> > > > must
> > > >     > be
> > > >     > > up
> > > >     > > > since it is used to forward commands to agents.
> > > >     > > > 4. when all agents are reconnected, we can shutdown the
> > > > management
> > > >     > server
> > > >     > > > and perform the maintenance.
> > > >     > > >
> > > >     > > > One issue remains for me, during the reconnect, the
> commands
> > > > that are
> > > >     > > > processed at the same time should be kept in a queue until
> > the
> > > > agents
> > > >     > > have
> > > >     > > > finished any current jobs and have reconnected. Today the
> > > little
> > > > time
> > > >     > > > window during which the reconnect happens can lead to
> failed
> > > > jobs due
> > > >     > to
> > > >     > > > the agent not being connected at the right moment.
> > > >     > > >
> > > >     > > > I could push a PR for the change to stop some processing
> > > threads
> > > > based
> > > >     > on
> > > >     > > > the content of a file. It's possible also to cancel the
> drain
> > > of
> > > > the
> > > >     > > > management by simply changing the content of the file back
> to
> > > > "ready"
> > > >     > > > again, instead of "maint" [2].
> > > >     > > >
> > > >     > > > [1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
> > > >     > > > [2] HA proxy documentation on agent checker:
> > > > https://cbonte.github.io/
> > > >     > > > haproxy-dconv/1.6/configuration.html#5.2-agent-check
> > > >     > > >
> > > >     > > > Regarding your issue on the port blocking, I think it's
> fair
> > to
> > > >     > consider
> > > >     > > > that if you want to shutdown your server at some point, you
> > > have
> > > > to
> > > >     > stop
> > > >     > > > serving (some) requests. Here the only way is to stop
> > serving
> > > >     > > everything.
> > > >     > > > If the API had a REST design, we could reject any
> > > POST/PUT/DELETE
> > > >     > > > operations and allow GET ones. I don't know how hard it
> would
> > > be
> > > > today
> > > >     > to
> > > >     > > > only allow listBaseCmd operations to be more friendly with
> > the
> > > > users.
> > > >     > > >
> > > >     > > > Marco
> > > >     > > >
> > > >     > > >
> > > >     > > > On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <
> > > > serg38l@hotmail.com>
> > > >     > > > wrote:
> > > >     > > >
> > > >     > > >> Now without spellchecking :)
> > > >     > > >>
> > > >     > > >> This is not simple e.g. for VMware. Each management server
> > > also
> > > > acts
> > > >     > as
> > > >     > > an
> > > >     > > >> agent proxy so tasks against a particular ESX host will be
> > > > always
> > > >     > > >> forwarded. The right answer will be to support a native
> > > > “maintenance
> > > >     > > mode”
> > > >     > > >> for management server. When entered to such mode the
> > > management
> > > > server
> > > >     > > >> should release all agents including SSVM, block/redirect
> API
> > > > calls and
> > > >     > > >> login request and finish all async jobs it originated.
> > > >     > > >>
> > > >     > > >>
> > > >     > > >>
> > > >     > > >> On Apr 4, 2018, at 5:15 PM, Sergey Levitskiy <
> > > > serg38l@hotmail.com
> > > >     > > <mailto:
> > > >     > > >> serg38l@hotmail.com>> wrote:
> > > >     > > >>
> > > >     > > >> This is not simple e.g. for VMware. Each management server
> > > also
> > > > acts
> > > >     > as
> > > >     > > an
> > > >     > > >> agent proxy so tasks against a particular ESX host will be
> > > > always
> > > >     > > >> forwarded. The right answer will be native support for
> for
> > > >     > > “maintenance
> > > >     > > >> mode” for management server. When entered to such mode the
> > > > management
> > > >     > > >> server should release all agents including SSVM,
> > > block/redirect
> > > > API
> > > >     > > calls
> > > >     > > >> and login request and finish all async jobs it originated.
> > > >     > > >>
> > > >     > > >> Sent from my iPhone
> > > >     > > >>
> > > >     > > >> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <
> > > >     > > >> rafaelweingartner@gmail.com<mailto:rafaelweingartner@
> > > gmail.com
> > > > >>
> > > >     > wrote:
> > > >     > > >>
> > > >     > > >> Ilya, still regarding the management server that is being
> > shut
> > > > down
> > > >     > > issue;
> > > >     > > >> if other MSs/or maybe system VMs (I am not sure to know if
> > > they
> > > > are
> > > >     > > able to
> > > >     > > >> do such tasks) can direct/redirect/send new jobs to this
> > > > management
> > > >     > > server
> > > >     > > >> (the one being shut down), the process might never end
> > because
> > > > new
> > > >     > tasks
> > > >     > > >> are always being created for the management server that we
> > > want
> > > > to
> > > >     > shut
> > > >     > > >> down. Is this scenario possible?
> > > >     > > >>
> > > >     > > >> That is why I mentioned blocking the port 8250 for the
> > > >     > > “graceful-shutdown”.
> > > >     > > >>
> > > >     > > >> If this scenario is not possible, then everything is fine.
> > > >     > > >>
> > > >     > > >>
> > > >     > > >> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <
> > > >     > > ilya.mailing.lists@gmail.com
> > > >     > > >> <ma...@gmail.com>>
> > > >     > > >> wrote:
> > > >     > > >>
> > > >     > > >> I'm thinking of using a configuration from
> > > >     > > "job.cancel.threshold.minutes" -
> > > >     > > >> it will be the longest
> > > >     > > >>
> > > >     > > >>    "category": "Advanced",
> > > >     > > >>
> > > >     > > >>    "description": "Time (in minutes) for async-jobs to be
> > > > forcely
> > > >     > > >> cancelled if it has been in process for long",
> > > >     > > >>
> > > >     > > >>    "name": "job.cancel.threshold.minutes",
> > > >     > > >>
> > > >     > > >>    "value": "60"
> > > >     > > >>
> > > >     > > >>
> > > >     > > >>
> > > >     > > >>
> > > >     > > >> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <
> > > >     > > >> rafaelweingartner@gmail.com<mailto:rafaelweingartner@
> > > gmail.com
> > > > >>
> > > >     > wrote:
> > > >     > > >>
> > > >     > > >> Big +1 for this feature; I only have a few doubts.
> > > >     > > >>
> > > >     > > >> * Regarding the tasks/jobs that management servers (MSs)
> > > > execute; are
> > > >     > > >> these
> > > >     > > >> tasks originate from requests that come to the MS, or is
> it
> > > > possible
> > > >     > > that
> > > >     > > >> requests received by one management server to be executed
> by
> > > > other? I
> > > >     > > >> mean,
> > > >     > > >> if I execute a request against MS1, will this request
> always
> > > be
> > > >     > > >> executed/threated by MS1, or is it possible that this
> > request
> > > is
> > > >     > > executed
> > > >     > > >> by another MS (e.g. MS2)?
> > > >     > > >>
> > > >     > > >> * I would suggest that after we block traffic coming from
> > > >     > > >> 8080/8443/8250(we
> > > >     > > >> will need to block this as well right?), we can log the
> > > > execution of
> > > >     > > >> tasks.
> > > >     > > >> I mean, something saying, there are XXX tasks (enumerate
> > > tasks)
> > > > still
> > > >     > > >> being
> > > >     > > >> executed, we will wait for them to finish before shutting
> > > down.
> > > >     > > >>
> > > >     > > >> * The timeout (60 minutes suggested) could be global
> > settings
> > > > that we
> > > >     > > can
> > > >     > > >> load before executing the graceful-shutdown.
> > > >     > > >>
> > > >     > > >> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <
> > > >     > > >> ilya.mailing.lists@gmail.com<mailto:ilya.mailing.lists@
> > > > gmail.com>
> > > >     > > >>
> > > >     > > >> wrote:
> > > >     > > >>
> > > >     > > >> Use case:
> > > >     > > >> In any environment - time to time - administrator needs to
> > > > perform a
> > > >     > > >> maintenance. Current stop sequence of cloudstack
> management
> > > > server
> > > >     > will
> > > >     > > >> ignore the fact that there may be long running async jobs
> -
> > > and
> > > >     > > >> terminate
> > > >     > > >> the process. This in turn can create a poor user
> experience
> > > and
> > > >     > > >> occasional
> > > >     > > >> inconsistency  in cloudstack db.
> > > >     > > >>
> > > >     > > >> This is especially painful in large environments where the
> > > user
> > > > has
> > > >     > > >> thousands of nodes and there is a continuous patching that
> > > > happens
> > > >     > > >> around
> > > >     > > >> the clock - that requires migration of workload from one
> > node
> > > to
> > > >     > > >> another.
> > > >     > > >>
> > > >     > > >> With that said - i've created a script that monitors the
> > async
> > > > job
> > > >     > > >> queue
> > > >     > > >> for given MS and waits for it complete all jobs. More
> > details
> > > > are
> > > >     > > >> posted
> > > >     > > >> below.
> > > >     > > >>
> > > >     > > >> I'd like to introduce "graceful-shutdown" into the
> > > > systemctl/service
> > > >     > of
> > > >     > > >> cloudstack-management service.
> > > >     > > >>
> > > >     > > >> The details of how it will work is below:
> > > >     > > >>
> > > >     > > >> Workflow for graceful shutdown:
> > > >     > > >> Using iptables/firewalld - block any connection attempts
> on
> > > > 8080/8443
> > > >     > > >> (we
> > > >     > > >> can identify the ports dynamically)
> > > >     > > >> Identify the MSID for the node, using the proper msid -
> > query
> > > >     > > >> async_job
> > > >     > > >> table for
> > > >     > > >> 1) any jobs that are still running (or job_status=“0”)
> > > >     > > >> 2) job_dispatcher not like “pseudoJobDispatcher"
> > > >     > > >> 3) job_init_msid=$my_ms_id
> > > >     > > >>
> > > >     > > >> Monitor this async_job table for 60 minutes - until all
> > async
> > > > jobs for
> > > >     > > >> MSID
> > > >     > > >> are done, then proceed with shutdown
> > > >     > > >>  If failed for any reason or terminated, catch the exit
> via
> > > trap
> > > >     > > >> command
> > > >     > > >> and unblock the 8080/8443
> > > >     > > >>
> > > >     > > >> Comments are welcome
> > > >     > > >>
> > > >     > > >> Regards,
> > > >     > > >> ilya
> > > >     > > >>
> > > >     > > >>
> > > >     > > >>
> > > >     > > >>
> > > >     > > >> --
> > > >     > > >> Rafael Weingärtner
> > > >     > > >>
> > > >     > > >>
> > > >     > > >>
> > > >     > > >>
> > > >     > > >>
> > > >     > > >> --
> > > >     > > >> Rafael Weingärtner
> > > >     > > >>
> > > >     > >
> > > >     >
> > > >     >
> > > >     >
> > > >     > --
> > > >     >
> > > >     > Andrija Panić
> > > >     >
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Rafael Weingärtner
> > >
> >
>
>
>
> --
> Rafael Weingärtner
>

Re: [DISCUSS] CloudStack graceful shutdown

Posted by Rafael Weingärtner <ra...@gmail.com>.
Is that management server load balancing feature using static
configurations? I heard about it on the mailing list, but I did not follow
the implementation.

I do not see many problems with agents reconnecting. We can implement in
agents (not just KVM, but also system VMs) a logic that instead of using a
static pool of management servers configured in a properties file, they
dynamically request a list of available management servers via that list
management servers API method. This would require us to configure agents
with a load balancer URL that executes the balancing between multiple
management servers.

I am +1 to remove the need for that VIP, which executes the load balance
for connecting agents to management servers.
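
To make that concrete, a minimal sketch of the agent side, assuming a
hypothetical listManagementServers command behind the balancer that
returns a comma-separated list (both the command name and the response
format are my invention here):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Arrays;
    import java.util.List;

    class MsListFetcher {
        // The agent asks one well-known LB URL for the current MS list,
        // instead of relying on a static pool in its properties file.
        static List<String> fetch(String lbUrl) throws Exception {
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create(lbUrl + "/client/api?command=listManagementServers"))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            return Arrays.asList(response.body().split(","));
        }
    }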

On Fri, Apr 20, 2018 at 4:41 PM, ilya musayev <il...@gmail.com>
wrote:

> Rafael and Community
>
> All is well and good and i think we are thinking along the similar lines -
> the only issue that i see right now with any approach is KVM Agents (or
> direct agents) and using LoadBalancer on 8250.
>
> Here is a scenario:
>
> You have 2 Management Server setup fronted with a VIP on 8250.
> The LB Algorithm is either Round Robin or Least Connections used.
> You initiate a maintenance mode operation on one of the MS servers (call it
> MS1) - assume you have a long running migration job that needs 60 minutes
> to complete.
> We attempt to evacuate the agents by telling them to disconnect and
> reconnect again
> If we are using LB on 8250 with
> 1) Least Connection used - then all agents will continuously try to connect
> to a MS1 node that is attempting to go down for maintenance. Essentially
> with this LB configuration this operation will never complete.
> 2) Round Robin - this will take a while - but eventually - you will get all
> nodes connected to MS2
>
> The current limitation is usage of external LB on 8250. For this operation
> to work without issue - would mean agents must connect to MS server without
> an LB. This is a recent feature we've developed with ShapeBlue - where we
> maintain the list of CloudStack Management Servers in the agent.properties
> file.
>
> Unless you can think of another solution - it appears we may be forced
> to bypass the 8250 VIP LB and use the new feature to maintain the list of
> management servers within agent.properties.
>
>
> I need to run now, let me know what your thoughts are.
>
> Regards
> ilya
>
>
>
> On Tue, Apr 17, 2018 at 8:27 AM, Rafael Weingärtner <
> rafaelweingartner@gmail.com> wrote:
>
> > Ilya and others,
> >
> > We have been discussing this idea of graceful/nicely shutdown.  Our
> feeling
> > is that we (in CloudStack community) might have been trying to solve this
> > problem with too much scripting. What if we developed a more integrated
> > (native) solution?
> >
> > Let me explain our idea.
> >
> > ACS has a table called “mshost”, which is used to store management server
> > information. During balancing and when jobs are dispatched to other
> > management servers this table is consulted/queried.  Therefore, we have
> > been discussing the idea of creating a management API for management
> > servers.  We could have an API method that changes the state of
> management
> > servers to “prepare to maintenance” and then “maintenance” (as soon as
> all
> > of the task/jobs it is managing finish). The idea is that during
> > rebalancing we would remove the hosts of servers that are not in “Up”
> state
> > (of course we would also ignore hosts in the aforementioned state to
> > receive hosts to manage).  Moreover, when we send/dispatch jobs to other
> > management servers, we could ignore the ones that are not in “Up” state
> > (which is something already done).
> >
> > By doing this, the graceful shutdown could be executed in a few steps.
> >
> > 1 – issue the maintenance method for the management server you desire
> > 2 – wait until the MS goes into maintenance mode, while there are still
> > running jobs it (the management server) will be maintained in prepare for
> > maintenance
> > 3 – execute the Linux shutdown command
> >
> > We would need other API methods to manage MSs then. An (i) API method to
> > list MSs, and we could even create an (ii) API to remove old/de-activated
> > management servers, which we currently do not have (forcing users to
> apply
> > changed directly in the database).
> >
> > Moreover, in this model, we would not kill hanging jobs; we would wait
> > until they expire and ACS expunges them. Of course, it is possible to
> > develop a forceful maintenance method as well. Then, when the “prepare
> for
> > maintenance” takes longer than a parameter, we could kill hanging jobs.
> >
> > All of this would allow the MS to be kept up and receiving requests until
> > it can be safely shut down. What do you guys think about this approach?
> >
> > On Tue, Apr 10, 2018 at 6:52 PM, Yiping Zhang <yz...@marketo.com>
> wrote:
> >
> > > As a cloud admin, I would love to have this feature.
> > >
> > > It so happens that I just accidentally restarted my ACS management
> server
> > > while two instances were migrating to another Xen cluster (via storage
> > > migration, not live migration). As a result, both instances
> > > ended up with corrupted data disks which can't be reattached or migrated.
> > >
> > > Any feature which prevents this from happening would be great.  A low
> > > hanging fruit is simply checking for
> > > if there are any async jobs running, especially any kind of migration
> > jobs
> > > or other known long running type of
> > > jobs and warn the operator  so that he has a chance to abort server
> > > shutdowns.
> > >
> > > Yiping
> > >
> > > On 4/5/18, 3:13 PM, "ilya musayev" <il...@gmail.com>
> > wrote:
> > >
> > >     Andrija
> > >
> > >     This is a tough scenario.
> > >
> > >     As an admin, they way i would have handled this situation, is to
> > > advertise
> > >     the upcoming outage and then take away specific API commands from a
> > > user a
> > >     day before - so he does not cause any long running async jobs. Once
> > >     maintenance completes - enable the API commands back to the user.
> > > However -
> > >     i dont know who your user base is and if this would be an
> acceptable
> > >     solution.
> > >
> > >     Perhaps also investigate what can be done to speed up your long
> > running
> > >     tasks...
> > >
> > >     As a side note, we will be working on a feature that would allow
> for
> > a
> > >     graceful termination of the process/job, meaning if agent noticed a
> > >     disconnect or termination request - it will abort the command in
> > > flight. We
> > >     can also consider restarting these tasks again or what not - but it
> > > would
> > >     not be part of this enhancement.
> > >
> > >     Regards
> > >     ilya
> > >
> > >     On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <
> > andrija.panic@gmail.com
> > > >
> > >     wrote:
> > >
> > >     > Hi Ilya,
> > >     >
> > >     > thanks for the feedback - but in "real world", you need to
> > > "understand"
> > >     > that 60min is next to useless timeout for some jobs (if I
> > understand
> > > this
> > >     > specific parameter correctly ?? - job is really canceled, not
> only
> > > job
> > >     > monitoring is canceled ???) -
> > >     >
> > >     > My value for the  "job.cancel.threshold.minutes" is 2880 minutes
> (2
> > > days?)
> > >     >
> > >     > I can tell you, when you have CEPH/NFS (CEPH even the "worse" case,
> > >     > since reads are slower during the qemu-img convert process...) of
> > >     > 500GB, then imagine a snapshot job will take many hours. Should I
> > >     > mention 1TB volumes (yes, we had clients like that...)
> > >     > Then attaching a 1TB volume that was uploaded to ACS (lives
> > >     > originally on Secondary Storage, and takes time to be copied over to
> > >     > NFS/CEPH) will take up to a few hours.
> > >     > Then migrating a 1TB volume from NFS to CEPH, or CEPH to NFS, also
> > >     > takes time... etc.
> > >     >
> > >     > I'm just giving you feedback as "user", admin of the cloud, zero
> > DEV
> > > skills
> > >     > here :) , just to make sure you make practical decisions (and I
> > > admit I
> > >     > might be wrong with my stuff, but just giving you feedback from
> our
> > > public
> > >     > cloud setup)
> > >     >
> > >     >
> > >     > Cheers!
> > >     >
> > >     >
> > >     >
> > >     >
> > >     > On 5 April 2018 at 15:16, Tutkowski, Mike <
> > Mike.Tutkowski@netapp.com
> > > >
> > >     > wrote:
> > >     >
> > >     > > Wow, there’s been a lot of good details noted from several
> people
> > > on how
> > >     > > this process works today and how we’d like it to work in the
> near
> > > future.
> > >     > >
> > >     > > 1) Any chance this is already documented on the Wiki?
> > >     > >
> > >     > > 2) If not, any chance someone would be willing to do so (a flow
> > > diagram
> > >     > > would be particularly useful).
> > >     > >
> > >     > > > On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <
> > > marco@exoscale.ch>
> > >     > > wrote:
> > >     > > >
> > >     > > > Hi all,
> > >     > > >
> > >     > > > Good point ilya but as stated by Sergey there are more things to
> > >     > > > consider
> > >     > > > before being able to do a proper shutdown. I augmented my
> > script
> > > I gave
> > >     > > you
> > >     > > > originally and changed code in CS. What we're doing for our
> > > environment
> > >     > > is
> > >     > > > as follow:
> > >     > > >
> > >     > > > 1. the MGMT looks for a change in the file /etc/lb-agent
> which
> > > contains
> > >     > > > keywords for HAproxy[2] (ready, maint) so that HA-proxy can
> > > disable the
> > >     > > > mgmt on the keyword "maint" and the mgmt server stops a
> couple
> > of
> > >     > > > threads[1] to stop processing async jobs in the queue
> > >     > > > 2. Looks for the async jobs and wait until there is none to
> > > ensure you
> > >     > > can
> > >     > > > send the reconnect commands (if jobs are running, a reconnect
> > > will
> > >     > result
> > >     > > > in a failed job since the result will never reach the
> > management
> > >     > server -
> > >     > > > the agent waits for the current job to be done before
> > > reconnecting, and
> > >     > > > discard the result... room for improvement here!)
> > >     > > > 3. Issue a reconnectHost command to all the hosts connected
> to
> > > the mgmt
> > >     > > > server so that they reconnect to another one, otherwise the
> > mgmt
> > > must
> > >     > be
> > >     > > up
> > >     > > > since it is used to forward commands to agents.
> > >     > > > 4. when all agents are reconnected, we can shutdown the
> > > management
> > >     > server
> > >     > > > and perform the maintenance.
> > >     > > >
> > >     > > > One issue remains for me, during the reconnect, the commands
> > > that are
> > >     > > > processed at the same time should be kept in a queue until
> the
> > > agents
> > >     > > have
> > >     > > > finished any current jobs and have reconnected. Today the
> > little
> > > time
> > >     > > > window during which the reconnect happens can lead to failed
> > > jobs due
> > >     > to
> > >     > > > the agent not being connected at the right moment.
> > >     > > >
> > >     > > > I could push a PR for the change to stop some processing
> > threads
> > > based
> > >     > on
> > >     > > > the content of a file. It's possible also to cancel the drain
> > of
> > > the
> > >     > > > management by simply changing the content of the file back to
> > > "ready"
> > >     > > > again, instead of "maint" [2].
> > >     > > >
> > >     > > > [1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
> > >     > > > [2] HA proxy documentation on agent checker:
> > > https://cbonte.github.io/
> > >     > > > haproxy-dconv/1.6/configuration.html#5.2-agent-check
> > >     > > >
> > >     > > > Regarding your issue on the port blocking, I think it's fair
> to
> > >     > consider
> > >     > > > that if you want to shutdown your server at some point, you
> > have
> > > to
> > >     > stop
> > >     > > > serving (some) requests. Here the only way is to stop
> serving
> > >     > > everything.
> > >     > > > If the API had a REST design, we could reject any
> > POST/PUT/DELETE
> > >     > > > operations and allow GET ones. I don't know how hard it would
> > be
> > > today
> > >     > to
> > >     > > > only allow listBaseCmd operations to be more friendly with
> the
> > > users.
> > >     > > >
> > >     > > > Marco
> > >     > > >
> > >     > > >
> > >     > > > On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <
> > > serg38l@hotmail.com>
> > >     > > > wrote:
> > >     > > >
> > >     > > >> Now without spellchecking :)
> > >     > > >>
> > >     > > >> This is not simple e.g. for VMware. Each management server
> > also
> > > acts
> > >     > as
> > >     > > an
> > >     > > >> agent proxy so tasks against a particular ESX host will be
> > > always
> > >     > > >> forwarded. The right answer will be to support a native
> > > “maintenance
> > >     > > mode”
> > >     > > >> for management server. When entered to such mode the
> > management
> > > server
> > >     > > >> should release all agents including SSVM, block/redirect API
> > > calls and
> > >     > > >> login request and finish all async jobs it originated.
> > >     > > >>
> > >     > > >>
> > >     > > >>
> > >     > > >> On Apr 4, 2018, at 5:15 PM, Sergey Levitskiy <
> > > serg38l@hotmail.com
> > >     > > <mailto:
> > >     > > >> serg38l@hotmail.com>> wrote:
> > >     > > >>
> > >     > > >> This is not simple e.g. for VMware. Each management server
> > also
> > > acts
> > >     > as
> > >     > > an
> > >     > > >> agent proxy so tasks against a particular ESX host will be
> > > always
> > >     > > >> forwarded. The right answer will be native support for
> > >     > > “maintenance
> > >     > > >> mode” for management server. When entered to such mode the
> > > management
> > >     > > >> server should release all agents including SSVM,
> > block/redirect
> > > API
> > >     > > calls
> > >     > > >> and login request and finish all async jobs it originated.
> > >     > > >>
> > >     > > >> Sent from my iPhone
> > >     > > >>
> > >     > > >> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <
> > >     > > >> rafaelweingartner@gmail.com<mailto:rafaelweingartner@
> > gmail.com
> > > >>
> > >     > wrote:
> > >     > > >>
> > >     > > >> Ilya, still regarding the management server that is being
> shut
> > > down
> > >     > > issue;
> > >     > > >> if other MSs/or maybe system VMs (I am not sure to know if
> > they
> > > are
> > >     > > able to
> > >     > > >> do such tasks) can direct/redirect/send new jobs to this
> > > management
> > >     > > server
> > >     > > >> (the one being shut down), the process might never end
> because
> > > new
> > >     > tasks
> > >     > > >> are always being created for the management server that we
> > want
> > > to
> > >     > shut
> > >     > > >> down. Is this scenario possible?
> > >     > > >>
> > >     > > >> That is why I mentioned blocking the port 8250 for the
> > >     > > “graceful-shutdown”.
> > >     > > >>
> > >     > > >> If this scenario is not possible, then everything is fine.
> > >     > > >>
> > >     > > >>
> > >     > > >> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <
> > >     > > ilya.mailing.lists@gmail.com
> > >     > > >> <ma...@gmail.com>>
> > >     > > >> wrote:
> > >     > > >>
> > >     > > >> I'm thinking of using a configuration from
> > >     > > "job.cancel.threshold.minutes" -



-- 
Rafael Weingärtner

Re: [DISCUSS] CloudStack graceful shutdown

Posted by ilya musayev <il...@gmail.com>.
Rafael and Community

All is well and good and I think we are thinking along similar lines -
the only issue that I see right now with any approach is KVM Agents (or
direct agents) and using a LoadBalancer on 8250.

Here is a scenario:

You have 2 Management Server setup fronted with a VIP on 8250.
The LB Algorithm is either Round Robin or Least Connections used.
You initiate a maintenance mode operation on one of the MS servers (call it
MS1) - assume you have a long running migration job that needs 60 minutes
to complete.
We attempt to evacuate the agents by telling them to disconnect and
reconnect again.
If we are using an LB on 8250 with:
1) Least Connections - then all agents will continuously try to connect
to the MS1 node that is attempting to go down for maintenance, since
draining it leaves it with the fewest connections. Essentially, with this
LB configuration the operation will never complete.
2) Round Robin - this will take a while - but eventually - you will get all
nodes connected to MS2 (see the sketch below).

The current limitation is the use of an external LB on 8250. For this
operation to work without issue, agents must connect to the MS servers
without an LB. This is possible with a recent feature we've developed with
ShapeBlue - where we maintain the list of CloudStack Management Servers in
the agent.properties file.

Unless you can think of another solution - it appears we may be forced to
bypass the 8250 VIP LB and use the new feature to maintain the list of
management servers within agent.properties. A rough sketch of what that
looks like on the agent side is below.
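
Assuming the feature takes a comma-separated list of management servers,
the agent side could look something like this (IPs made up, exact syntax
approximate):

    # /etc/cloudstack/agent/agent.properties (sketch)
    host=10.0.0.11,10.0.0.12
    port=8250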


I need to run now, let me know what your thoughts are.

Regards
ilya




Re: [DISCUSS] CloudStack graceful shutdown

Posted by Rafael Weingärtner <ra...@gmail.com>.
Ilya and others,

We have been discussing this idea of a graceful/nice shutdown. Our feeling
is that we (in the CloudStack community) might have been trying to solve
this problem with too much scripting. What if we developed a more
integrated (native) solution?

Let me explain our idea.

ACS has a table called “mshost”, which is used to store management server
information. During balancing, and when jobs are dispatched to other
management servers, this table is consulted/queried. Therefore, we have
been discussing the idea of creating a management API for management
servers. We could have an API method that changes the state of a management
server to “prepare to maintenance” and then to “maintenance” (as soon as
all of the tasks/jobs it is managing finish). The idea is that during
rebalancing we would remove the hosts of servers that are not in the “Up”
state (of course, we would also skip servers in the aforementioned
maintenance states when assigning hosts to manage). Moreover, when we
send/dispatch jobs to other management servers, we could ignore the ones
that are not in the “Up” state (which is something already done).
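
To make the idea concrete, the state we would manage already lives in that
table; a quick look (treat the column names as a sketch from memory,
verify against your schema):

    # list management servers and their current states
    mysql -u cloud -p cloud -e \
      "SELECT msid, name, state, last_update FROM mshost WHERE removed IS NULL;"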

By doing this, the graceful shutdown could be executed in a few steps (a
rough sketch of the flow follows the list).

1 – issue the maintenance method for the management server you desire
2 – wait until the MS goes into maintenance mode; while there are still
running jobs, it (the management server) will remain in “prepare for
maintenance”
3 – execute the Linux shutdown command
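
The API names below are hypothetical - nothing like this exists yet - but
the operator flow could look like:

    # hypothetical commands, purely to illustrate the proposed flow
    cloudmonkey prepare managementservermaintenance id=<ms-uuid>
    # poll until the state moves from "prepare to maintenance" to "maintenance"
    cloudmonkey list managementservers id=<ms-uuid>
    # then it is safe to stop the service
    systemctl stop cloudstack-management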

We would need other APIs methods to manage MSs then. An (i) API method to
list MSs, and we could even create an (ii) API to remove old/de-activated
management servers, which we currently do not have (forcing users to apply
changed directly in the database).

Moreover, in this model, we would not kill hanging jobs; we would wait
until they expire and ACS expunges them. Of course, it is possible to
develop a forceful maintenance method as well. Then, when the “prepare for
maintenance” state lasts longer than a configurable threshold, we could
kill hanging jobs.

All of this would allow the MS to be kept up and receiving requests until
it can be safely shut down. What do you guys think about this approach?



-- 
Rafael Weingärtner

Re: [DISCUSS] CloudStack graceful shutdown

Posted by Yiping Zhang <yz...@marketo.com>.
As a cloud admin, I would love to have this feature.

It so happens that I just accidentally restarted my ACS management server
while two instances were migrating to another Xen cluster (via storage
migration, not live migration). As a result, both instances ended up with
corrupted data disks which can't be reattached or migrated.

Any feature which prevents this from happening would be great. A
low-hanging fruit is simply checking whether there are any async jobs
running, especially migration jobs or other known long-running job types,
and warning the operator so that he has a chance to abort the server
shutdown.
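
A rough sketch of such a check, reusing the async_job columns mentioned
earlier in this thread (the mshost lookup assumes its name column matches
the hostname - verify that before relying on it):

    #!/bin/bash
    # abort the shutdown if this MS still owns running async jobs
    MSID=$(mysql -N -u cloud -p cloud -e \
      "SELECT msid FROM mshost WHERE name = '$(hostname)' AND removed IS NULL;")
    RUNNING=$(mysql -N -u cloud -p cloud -e \
      "SELECT COUNT(*) FROM async_job WHERE job_status = 0 \
         AND job_dispatcher NOT LIKE 'pseudoJobDispatcher' \
         AND job_init_msid = ${MSID};")
    if [ "${RUNNING}" -gt 0 ]; then
      echo "WARNING: ${RUNNING} async jobs still running on MS ${MSID}"
      exit 1
    fi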

Yiping



Re: [DISCUSS] CloudStack graceful shutdown

Posted by ilya musayev <il...@gmail.com>.
Andrija

This is a tough scenario.

As an admin, the way I would have handled this situation is to advertise
the upcoming outage and then take away specific API commands from a user a
day before - so he does not cause any long running async jobs. Once
maintenance completes - enable the API commands for the user again.
However - I don't know who your user base is and if this would be an
acceptable solution. A sketch of one way to do this is below.
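
With dynamic roles (4.9+), deny rules along these lines should work - treat
the exact CloudMonkey syntax as approximate:

    # deny the long-running async APIs for the user's role before maintenance
    cloudmonkey create rolepermission roleid=<role-uuid> \
        rule=migrateVirtualMachine permission=deny
    cloudmonkey create rolepermission roleid=<role-uuid> \
        rule=createSnapshot permission=deny
    # once maintenance completes, delete the deny rules again
    # (deleteRolePermission) to give the commands back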

Perhaps also investigate what can be done to speed up your long running
tasks...

As a side note, we will be working on a feature that would allow for a
graceful termination of the process/job, meaning if the agent notices a
disconnect or termination request - it will abort the command in flight. We
can also consider restarting these tasks - but that would not be part of
this enhancement.

Regards
ilya

On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <an...@gmail.com>
wrote:

> Hi Ilya,
>
> thanks for the feedback - but in "real world", you need to "understand"
> that 60min is next to useless timeout for some jobs (if I understand this
> specific parameter correctly ?? - job is really canceled, not only job
> monitoring is canceled ???) -
>
> My value for the  "job.cancel.threshold.minutes" is 2880 minutes (2 days?)
>
> I can tell you when you have CEPH/NFS (CEPH even "worse" case, since slower
> read during the qemu-img convert process...) of 500GB, then imagine a
> snapshot job will take many hours. Should I mention 1TB volumes (yes, we
> had clients like that...)
> Then attaching a 1TB volume, that was uploaded to ACS (lives originally on
> Secondary Storage, and takes time to be copied over to NFS/CEPH) will take
> up to a few hours.
> Then migrating a 1TB volume from NFS to CEPH, or CEPH to NFS, also takes
> time...etc.
>
> I'm just giving you feedback as "user", admin of the cloud, zero DEV skills
> here :) , just to make sure you make practical decisions (and I admit I
> might be wrong with my stuff, but just giving you feedback from our public
> cloud setup)
>
>
> Cheers!
>
>
>
>
> On 5 April 2018 at 15:16, Tutkowski, Mike <Mi...@netapp.com>
> wrote:
>
> > Wow, there’s been a lot of good details noted from several people on how
> > this process works today and how we’d like it to work in the near future.
> >
> > 1) Any chance this is already documented on the Wiki?
> >
> > 2) If not, any chance someone would be willing to do so (a flow diagram
> > would be particularly useful).
> >
> > > On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <ma...@exoscale.ch>
> > wrote:
> > >
> > > Hi all,
> > >
> > > Good point ilya but as stated by Sergey there's more thing to consider
> > > before being able to do a proper shutdown. I augmented my script I gave
> > you
> > > originally and changed code in CS. What we're doing for our environment
> > is
> > > as follow:
> > >
> > > 1. the MGMT looks for a change in the file /etc/lb-agent which contains
> > > keywords for HAproxy[2] (ready, maint) so that HA-proxy can disable the
> > > mgmt on the keyword "maint" and the mgmt server stops a couple of
> > > threads[1] to stop processing async jobs in the queue
> > > 2. Looks for the async jobs and wait until there is none to ensure you
> > can
> > > send the reconnect commands (if jobs are running, a reconnect will
> result
> > > in a failed job since the result will never reach the management
> server -
> > > the agent waits for the current job to be done before reconnecting, and
> > > discard the result... rooms for improvement here!)
> > > 3. Issue a reconnectHost command to all the hosts connected to the mgmt
> > > server so that they reconnect to another one, otherwise the mgmt must
> be
> > up
> > > since it is used to forward commands to agents.
> > > 4. when all agents are reconnected, we can shutdown the management
> server
> > > and perform the maintenance.
> > >
> --
>
> Andrija Panić
>

Re: [DISCUSS] CloudStack graceful shutdown

Posted by Andrija Panic <an...@gmail.com>.
Hi Ilya,

thanks for the feedback - but in the "real world", you need to understand
that a 60 min timeout is next to useless for some jobs (if I understand
this specific parameter correctly - the job is really cancelled, not just
the job monitoring?).

My value for "job.cancel.threshold.minutes" is 2880 minutes (2 days).
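
If you want to check or raise yours, here is a minimal sketch - the
"cloud" DB name, the credentials and the CloudMonkey invocation are
assumptions on my side, adjust them to your setup:

    # Current value, straight from the DB:
    mysql -u cloud -p cloud -e \
      "SELECT name, value FROM configuration
        WHERE name = 'job.cancel.threshold.minutes';"

    # Raise it to 2 days via the updateConfiguration API:
    cloudmonkey update configuration name=job.cancel.threshold.minutes value=2880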

I can tell you that when you have CEPH/NFS (CEPH being the even "worse"
case, since reads are slower during the qemu-img convert process...) and a
500GB volume, a snapshot job will take many hours. Should I mention 1TB
volumes (yes, we had clients like that...)?
Then attaching a 1TB volume that was uploaded to ACS (it lives originally
on Secondary Storage, and takes time to be copied over to NFS/CEPH) will
take up to a few hours.
Then migrating a 1TB volume from NFS to CEPH, or CEPH to NFS, also takes
time... etc.

I'm just giving you feedback as a "user" - an admin of the cloud, zero DEV
skills here :) - just to make sure you make practical decisions (I admit I
might be wrong on some of this, but this is feedback from our public cloud
setup).


Cheers!







-- 

Andrija Panić

Re: [DISCUSS] CloudStack graceful shutdown

Posted by "Tutkowski, Mike" <Mi...@netapp.com>.
Wow, there’s been a lot of good detail noted from several people on how this process works today and how we’d like it to work in the near future.

1) Any chance this is already documented on the Wiki?

2) If not, any chance someone would be willing to do so? (A flow diagram would be particularly useful.)


Re: [DISCUSS] CloudStack graceful shutdown

Posted by Marc-Aurèle Brothier <ma...@exoscale.ch>.
Hi all,

Good point ilya, but as stated by Sergey there are more things to consider
before being able to do a proper shutdown. I augmented the script I gave
you originally and changed some code in CS. What we're doing for our
environment is as follows (a shell sketch of the sequence follows the list):

1. The MGMT server looks for a change in the file /etc/lb-agent, which
contains keywords for HAProxy[2] (ready, maint), so that HAProxy can
disable the mgmt server on the keyword "maint"; the mgmt server then stops
a couple of threads[1] so that it stops processing async jobs in the queue
2. It watches the async jobs and waits until there are none, to ensure you
can send the reconnect commands (if jobs are running, a reconnect will
result in a failed job, since the result will never reach the management
server - the agent waits for the current job to be done before
reconnecting, and discards the result... room for improvement here!)
3. It issues a reconnectHost command to all the hosts connected to the
mgmt server so that they reconnect to another one; otherwise the mgmt
server must stay up, since it is used to forward commands to agents.
4. When all agents are reconnected, we can shut down the management server
and perform the maintenance.
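
The sketch, with placeholders: the "cloud" DB/user, DB_PASS, MY_IP (this
node's cluster IP) and the CloudMonkey call are assumptions, not our exact
script:

    #!/bin/bash
    DB="mysql -N -u cloud -p${DB_PASS} cloud -e"
    MSID=$($DB "SELECT msid FROM mshost WHERE service_ip = '${MY_IP}';")

    # 1. Tell HAProxy (via its agent-check) to take this MS out of the pool.
    echo maint > /etc/lb-agent

    # 2. Wait until this MS has no running async jobs (job_status = 0).
    while [ "$($DB "SELECT COUNT(*) FROM async_job
                     WHERE job_status = 0
                       AND job_dispatcher NOT LIKE 'pseudoJobDispatcher'
                       AND job_init_msid = ${MSID};")" -gt 0 ]; do
      echo "async jobs still running for msid ${MSID}, waiting..."
      sleep 30
    done

    # 3. Ask every host still attached to this MS to reconnect elsewhere.
    for UUID in $($DB "SELECT uuid FROM host
                        WHERE mgmt_server_id = ${MSID} AND removed IS NULL;"); do
      cloudmonkey reconnect host id="${UUID}"
    done

    # 4. Agents have re-registered elsewhere; safe to stop the service.
    systemctl stop cloudstack-management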

One issue remains for me: during the reconnect, the commands that are
being processed at the same time should be kept in a queue until the
agents have finished their current jobs and have reconnected. Today the
little time window during which the reconnect happens can lead to failed
jobs, due to the agent not being connected at the right moment.

I could push a PR for the change that stops some processing threads based
on the content of a file. It's also possible to cancel the drain of the
management server by simply changing the content of the file back to
"ready" again, instead of "maint" [2].

[1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
[2] HAProxy documentation on agent-check:
https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#5.2-agent-check
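
For illustration, the agent side of [2] can be as small as this - a
sketch, assuming socat is available and port 9777 is free (both are
assumptions of mine):

    # Reply with the content of /etc/lb-agent ("ready" or "maint") on
    # every HAProxy agent-check probe; run under systemd or similar.
    socat TCP-LISTEN:9777,fork,reuseaddr SYSTEM:'cat /etc/lb-agent'

    # Matching server line in haproxy.cfg:
    #   server ms1 10.0.0.11:8080 check agent-check agent-port 9777 agent-inter 5s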

Regarding your issue on the port blocking, I think it's fair to consider
that if you want to shut down your server at some point, you have to stop
serving (some) requests. Here the only way is to stop serving everything.
If the API had a REST design, we could reject any POST/PUT/DELETE
operations and allow GET ones. I don't know how hard it would be today to
allow only listBaseCmd operations, to be more friendly to the users.
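
If you do block ports instead, a sketch of the iptables side with the
trap-based cleanup ilya described (8080/8443 are the defaults, adjust if
yours differ):

    # Reject new API connections for the duration of the drain; the trap
    # removes the rule again even if the script dies or is interrupted.
    block()   { iptables -I INPUT -p tcp -m multiport --dports 8080,8443 -j REJECT; }
    unblock() { iptables -D INPUT -p tcp -m multiport --dports 8080,8443 -j REJECT; }
    trap unblock EXIT INT TERM
    block
    # ... drain/wait logic runs here; unblock fires automatically on exit ...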

Marco



Re: [DISCUSS] CloudStack graceful shutdown

Posted by Sergey Levitskiy <se...@hotmail.com>.
Now without spellchecking :)

This is not simple, e.g. for VMware. Each management server also acts as an agent proxy, so tasks against a particular ESX host will always be forwarded. The right answer would be to support a native “maintenance mode” for the management server. When entered into such a mode, the management server should release all agents (including the SSVM), block/redirect API calls and login requests, and finish all async jobs it originated.
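
To see what would have to be released, one can list the agents a given MS
currently holds - a sketch against the DB, where the "cloud" DB/user and
<msid> are placeholders (system VMs show up here too):

    mysql -u cloud -p cloud -e \
      "SELECT id, name, type, status FROM host
        WHERE mgmt_server_id = <msid> AND removed IS NULL;"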




Re: [DISCUSS] CloudStack graceful shutdown

Posted by Sergey Levitskiy <se...@hotmail.com>.
This is not simple, e.g. for VMware. Each management server also acts as an agent proxy, so tasks against a particular ESX host will always be forwarded. The right answer would be native support for a “maintenance mode” for the management server. When entered into such a mode, the management server should release all agents (including the SSVM), block/redirect API calls and login requests, and finish all async jobs it originated.

Sent from my iPhone


Re: [DISCUSS] CloudStack graceful shutdown

Posted by Rafael Weingärtner <ra...@gmail.com>.
Ilya, still regarding the issue of the management server that is being
shut down: if other MSs/or maybe system VMs (I am not sure whether they
are able to do such tasks) can direct/redirect/send new jobs to this
management server (the one being shut down), the process might never end,
because new tasks are always being created for the management server that
we want to shut down. Is this scenario possible?

That is why I mentioned blocking the port 8250 for the “graceful-shutdown”.

If this scenario is not possible, then everything is fine.
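
One way to check: the async_job table records both the MS that accepted a
request (job_init_msid) and the one executing it (job_executing_msid), so
cross-MS execution would be directly visible - a sketch, with placeholder
DB name/credentials:

    mysql -u cloud -p cloud -e \
      "SELECT id, job_cmd, job_init_msid, job_executing_msid
         FROM async_job WHERE job_status = 0;"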





-- 
Rafael Weingärtner

Re: [DISCUSS] CloudStack graceful shutdown

Posted by ilya musayev <il...@gmail.com>.
I'm thinking of using the value of the "job.cancel.threshold.minutes"
configuration - it will be the longest any job is allowed to run:

      "category": "Advanced",

      "description": "Time (in minutes) for async-jobs to be forcely
cancelled if it has been in process for long",

      "name": "job.cancel.threshold.minutes",

      "value": "60"





Re: [DISCUSS] CloudStack graceful shutdown

Posted by Rafael Weingärtner <ra...@gmail.com>.
Big +1 for this feature; I only have a few doubts.

* Regarding the tasks/jobs that management servers (MSs) execute: do these
tasks originate from requests that come to the MS, or is it possible for
requests received by one management server to be executed by another? I
mean, if I execute a request against MS1, will this request always be
executed/treated by MS1, or is it possible that this request is executed
by another MS (e.g. MS2)?

* I would suggest that after we block traffic coming from 8080/8443/8250
(we will need to block this as well, right?), we log the execution of
tasks. I mean, something saying: there are XXX tasks (enumerate tasks)
still being executed; we will wait for them to finish before shutting down.

* The timeout (60 minutes suggested) could be a global setting that we
load before executing the graceful-shutdown.
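
Something along these lines for the logging - a sketch; the "cloud"
DB/user, ${MSID} and the log path are placeholders:

    # Enumerate the jobs we are still waiting on and keep a record of them.
    mysql -u cloud -p cloud -e \
      "SELECT id, job_cmd, created FROM async_job
        WHERE job_status = 0 AND job_init_msid = ${MSID};" \
      | tee -a /var/log/cloudstack/management/graceful-shutdown.log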




-- 
Rafael Weingärtner