Posted to users@cloudstack.apache.org by Daan Hoogland <da...@gmail.com> on 2015/09/16 09:36:07 UTC

[DISCUSS][PROPOSAL]missing power state reports from hypervisors on VMs ([BLOCKER]?)

Ladies and gentlemen,

A ticket was entered by Rene Moser [1] on the way ACS handles missing power
state reports. His issue is that VMs may at times be regarded as off while
they are registered as running and are actually running as well. A missing
report has to be handled in some way, and right now it is handled as if the
VM is down. I made a PR [2] that does minimal handling by only logging the
fact. This seems an improvement, but I have no idea whether it is or how to
test it.

So, two questions:
- is this a blocker?
- is ignoring the missing report after logging it the appropriate way to handle this?

I would be very pleased with all and any of your opinions.

regards,

[1] https://issues.apache.org/jira/browse/CLOUDSTACK-8848
[2] https://github.com/apache/cloudstack/pull/829

-- 
Daan

Re: [DISCUSS][PROPOSAL]missing power state reports from hypervisors on VMs ([BLOCKER]?)

Posted by Anshul Gangwar <an...@citrix.com>.
I don’t think there was any discussion around this. Kelven has made fixes around VMSync. For details, see the FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/FS+-+VMSync+improvement

Regards,
Anshul

On 16-Sep-2015, at 3:32 PM, Daan Hoogland <da...@gmail.com> wrote:

> On Wed, Sep 16, 2015 at 11:46 AM, Anshul Gangwar <an...@citrix.com>
> wrote:
>
> > It’s not difficult to find a good grace period. It will simply depend on
> > your Hypervisor settings how it is configured for HA. You can easily figure
> > out for how much time there will be no VM on any Host from your settings
> > and simply put 2-3 times of that period as grace period.
>
> That seems kludgey.
>
> > It seems you have considered only one aspect of change i.e. User VMs HA.
> > Did you consider System VMs HA?
> > Did you consider that we have already explored that territory of separate
> > handling of PowerOff and PowerReportMissing?
>
> for VMware or for all hypervisors? Do you have a link to the discussion?
> These states are different.
>
> Why was it decided to treat them the same?
>
> > And even if you are still thinking of this change then add marvin tests
> > for this change. Unit tests will not tell anything about the change.
>
> Yes, that I definitely agree on.
>
> > Regards,
> > Anshul
> >
> > > On 16-Sep-2015, at 2:48 PM, Rene Moser <ma...@renemoser.net> wrote:
> > >
> > > Hi René
> > >
> > > On 09/16/2015 10:17 AM, Anshul Gangwar wrote:
> > >> Currently we report only PowerOn VMs and do not report PowerOff VMs
> > >> that's why we consider Missing and PowerOff as same And that's how most of
> > >> the code is written for VM sync and each Hypervisor resource has same
> > >> understanding. This will effect HA and many more unknown places. So please
> > >> do not even consider to merge this change.
> > >>
> > >> So Now coming to bug we can fix that by changing global setting
> > >> pingInterval to appropriate value according to hypervisor settings which
> > >> takes care of these transitional period of missing report here or can be
> > >> handled by introducing gracePeriod global setting.
> > >
> > > This is interesting, I also wrote in the bug report gracePeriod
> > > calculation might be related.
> > > https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/com/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L110.
> > >
> > > IMHO making this value configurable would might solve it, but it is hard
> > > to "guess" what a good grace period would be.
> > >
> > > In terms of VMware it depends on amounts of esx in the clusters, and
> > > they can be different.
> > >
> > > But another question is, why make one _global_ grace period for every
> > > hypervisor. Think about, users can have mixed hypervisors setups.
> > >
> > > So to me, a global grace period setting might not be the best solution,
> > > instead we should take care hypervisor functionality, in this case
> > > VMware, it handels HA by itself.
> > >
> > > I know a VR in 4.5 would be broken after an VMware HA event, but there
> > > is another global setting, which can be enabled if you like for out of
> > > band migrations router restarts.
> > >
> > > So to me, in 4.5 I am +1 for the patch of daan makes sense, if
> > > hypervisor is VMware.
> > >
> > > Yours
> > > René
>
> --
> Daan


Re: [DISCUSS][PROPOSAL]missing power state reports from hypervisors on VMs ([BLOCKER]?)

Posted by Daan Hoogland <da...@gmail.com>.
On Wed, Sep 16, 2015 at 11:46 AM, Anshul Gangwar <an...@citrix.com>
wrote:

> It’s not difficult to find a good grace period. It will simply depend on
> your Hypervisor settings how it is configured for HA. You can easily figure
> out for how much time there will be no VM on any Host from your settings
> and simply put 2-3 times of that period as grace period.
>
That seems kludgey.

> It seems you have considered only one aspect of change i.e. User VMs HA.
> Did you consider System VMs HA?
> Did you consider that we have already explored that territory of separate
> handling of PowerOff and PowerReportMissing?
>
For VMware or for all hypervisors? Do you have a link to the discussion?
These states are different.

Why was it decided to treat them the same?

> And even if you are still thinking of this change then add marvin tests
> for this change. Unit tests will not tell anything about the change.
>
Yes, that I definitely agree on.



> Regards,
> Anshul
>
> > On 16-Sep-2015, at 2:48 PM, Rene Moser <ma...@renemoser.net> wrote:
> >
> >
> > Hi René
> >
> > On 09/16/2015 10:17 AM, Anshul Gangwar wrote:
> >> Currently we report only PowerOn VMs and do not report PowerOff VMs
> >> that's why we consider Missing and PowerOff as same And that's how most of
> >> the code is written for VM sync and each Hypervisor resource has same
> >> understanding. This will effect HA and many more unknown places. So please
> >> do not even consider to merge this change.
> >>
> >> So Now coming to bug we can fix that by changing global setting
> >> pingInterval to appropriate value according to hypervisor settings which
> >> takes care of these transitional period of missing report here or can be
> >> handled by introducing gracePeriod global setting.
> >
> > This is interesting, I also wrote in the bug report gracePeriod
> > calculation might be related.
> > https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/com/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L110.
> >
> > IMHO making this value configurable would might solve it, but it is hard
> > to "guess" what a good grace period would be.
> >
> > In terms of VMware it depends on amounts of esx in the clusters, and
> > they can be different.
> >
> > But another question is, why make one _global_ grace period for every
> > hypervisor. Think about, users can have mixed hypervisors setups.
> >
> > So to me, a global grace period setting might not be the best solution,
> > instead we should take care hypervisor functionality, in this case
> > VMware, it handels HA by itself.
> >
> > I know a VR in 4.5 would be broken after an VMware HA event, but there
> > is another global setting, which can be enabled if you like for out of
> > band migrations router restarts.
> >
> > So to me, in 4.5 I am +1 for the patch of daan makes sense, if
> > hypervisor is VMware.
> >
> > Yours
> > René
> >
>
>


-- 
Daan

Re: [DISCUSS][PROPOSAL]missing power state reports from hypervisors on VMs ([BLOCKER]?)

Posted by Rene Moser <ma...@renemoser.net>.
On 09/16/2015 11:46 AM, Anshul Gangwar wrote:
> It’s not difficult to find a good grace period. It will simply depend on your Hypervisor settings how it is configured for HA. You can easily figure out for how much time there will be no VM on any Host from your settings and simply put 2-3 times of that period as grace period.

I still think I would not that easy as it seems in every case but to
stay constructive, I would like to give it a shot.

René



Re: [DISCUSS][PROPOSAL]missing power state reports from hypervisors on VMs ([BLOCKER]?)

Posted by Anshul Gangwar <an...@citrix.com>.
It’s not difficult to find a good grace period. It simply depends on how your hypervisor is configured for HA. From your settings you can easily work out the longest time during which a VM will not be present on any host, and then set the grace period to 2-3 times that period.
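
As a rough illustration of the rule of thumb above, the grace period could be derived from the hypervisor's HA restart window as sketched below. This is only a sketch: the function name, its parameters, and the example numbers are assumptions made for illustration and are not CloudStack settings or APIs.

```python
def grace_period_seconds(ha_restart_window_seconds, factor=3):
    """Return a grace period as a multiple of the HA restart window.

    ha_restart_window_seconds: assumed longest time a VM may be absent from
        every host while the hypervisor's own HA restarts it.
    factor: safety multiplier; the thread suggests 2 to 3.
    """
    if ha_restart_window_seconds < 0:
        raise ValueError("ha_restart_window_seconds must not be negative")
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return ha_restart_window_seconds * factor


# Example: if HA needs up to 120 seconds to restart a VM on another host,
# a factor of 3 gives a grace period of 360 seconds.
print(grace_period_seconds(120, factor=3))
```

The restart window itself still has to be read from the hypervisor's HA configuration; the sketch does not try to discover it.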

It seems you have considered only one aspect of the change, i.e. user VM HA.
Did you consider system VM HA?
Did you consider that we have already explored separate handling of PowerOff and PowerReportMissing?

Even if you still want to pursue this change, please add Marvin tests for it. Unit tests will not tell us anything about the change.

Regards,
Anshul

> On 16-Sep-2015, at 2:48 PM, Rene Moser <ma...@renemoser.net> wrote:
> 
> 
> Hi René
> 
> On 09/16/2015 10:17 AM, Anshul Gangwar wrote:
>> Currently we report only PowerOn VMs and do not report PowerOff VMs that's why we consider Missing and PowerOff as same And that's how most of the code is written for VM sync and each Hypervisor resource has same understanding. This will effect HA and many more unknown places. So please do not even consider to merge this change.
>> 
>> So Now coming to bug we can fix that by changing global setting pingInterval to appropriate value according to hypervisor settings which takes care of these transitional period of missing report here or can be handled by introducing gracePeriod global setting.
> 
> This is interesting, I also wrote in the bug report gracePeriod
> calculation might be related.
> https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/com/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L110.
> 
> IMHO making this value configurable would might solve it, but it is hard
> to "guess" what a good grace period would be.
> 
> In terms of VMware it depends on amounts of esx in the clusters, and
> they can be different.
> 
> But another question is, why make one _global_ grace period for every
> hypervisor. Think about, users can have mixed hypervisors setups.
> 
> So to me, a global grace period setting might not be the best solution,
> instead we should take care hypervisor functionality, in this case
> VMware, it handels HA by itself.
> 
> I know a VR in 4.5 would be broken after an VMware HA event, but there
> is another global setting, which can be enabled if you like for out of
> band migrations router restarts.
> 
> So to me, in 4.5 I am +1 for the patch of daan makes sense, if
> hypervisor is VMware.
> 
> Yours
> René
> 


Re: [DISCUSS][PROPOSAL]missing power state reports from hypervisors on VMs ([BLOCKER]?)

Posted by Rene Moser <ma...@renemoser.net>.
Hi René

On 09/16/2015 10:17 AM, Anshul Gangwar wrote:
> Currently we report only PowerOn VMs and do not report PowerOff VMs that's why we consider Missing and PowerOff as same And that's how most of the code is written for VM sync and each Hypervisor resource has same understanding. This will effect HA and many more unknown places. So please do not even consider to merge this change.
> 
> So Now coming to bug we can fix that by changing global setting pingInterval to appropriate value according to hypervisor settings which takes care of these transitional period of missing report here or can be handled by introducing gracePeriod global setting.

This is interesting. I also wrote in the bug report that the gracePeriod
calculation might be related:
https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/com/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L110

IMHO making this value configurable might solve it, but it is hard
to "guess" what a good grace period would be.

In terms of VMware, it depends on the number of ESX hosts in the clusters,
and they can be different.

But another question is: why make one _global_ grace period for every
hypervisor? Keep in mind that users can have mixed-hypervisor setups.

So to me, a global grace period setting might not be the best solution.
Instead, we should take hypervisor functionality into account; in this case
VMware handles HA by itself.
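
To make the idea of a per-hypervisor grace period concrete, here is a minimal sketch of a lookup that falls back to a single global value. The setting names and all numbers are hypothetical; they are not existing CloudStack configuration keys.

```python
# Hypothetical per-hypervisor grace periods in seconds; the values are made up.
GRACE_PERIOD_BY_HYPERVISOR = {
    "VMware": 900,
    "KVM": 300,
}

# Hypothetical global fallback, used when no per-hypervisor value is set.
DEFAULT_GRACE_PERIOD = 600


def grace_period_for(hypervisor_type,
                     overrides=GRACE_PERIOD_BY_HYPERVISOR,
                     default=DEFAULT_GRACE_PERIOD):
    """Return the grace period for a hypervisor type, or the default."""
    return overrides.get(hypervisor_type, default)


print(grace_period_for("VMware"))     # per-hypervisor value
print(grace_period_for("XenServer"))  # falls back to the global default
```

With this shape a mixed setup can tune VMware separately while other hypervisors keep the global value.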

I know a VR in 4.5 would be broken after a VMware HA event, but there
is another global setting, which can be enabled if you like, for router
restarts on out-of-band migrations.

So in 4.5, I am +1 for Daan's patch if the hypervisor is VMware.

Yours
René


Re: [DISCUSS][PROPOSAL]missing power state reports from hypervisors on VMs ([BLOCKER]?)

Posted by Anshul Gangwar <an...@citrix.com>.
Please find my answers inline

Regards,
Anshul

On 16-Sep-2015, at 1:53 PM, Daan Hoogland <da...@gmail.com> wrote:

> On Wed, Sep 16, 2015 at 10:17 AM, Anshul Gangwar <an...@citrix.com>
> wrote:
>
> > Currently we report only PowerOn VMs and do not report PowerOff VMs that's
> > why we consider Missing and PowerOff as same
>
> This is not the behavior reported in the ticket. It is intermittent.

[Anshul] Your PR is changing this behaviour. It is intermittent because of the ping interval.

> > And that's how most of the code is written for VM sync and each Hypervisor
> > resource has same understanding. This will effect HA and many more unknown
> > places. So please do not even consider to merge this change.
>
> sure, I will hold. We do need to not regard a missing report the same as
> power off however.

[Anshul] Initially PowerOff and PowerReportMissing were different states, but later, when many issues were reported in vmSync, it made more sense to treat PowerReportMissing and PowerOff as the same.
So if you are thinking of treating these states differently, make sure you have considered all these scenarios and tested the change on every hypervisor that we support, as each hypervisor reports states according to this understanding.

Kaushik can give a better understanding around this.
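
One way to picture the two positions in this exchange is a sync step that keeps PowerReportMissing distinct from PowerOff until a grace period has passed, and only then treats it as PowerOff. The sketch below is illustrative only: the state names come from this thread, while the function and its parameters are assumptions and do not reproduce the actual VMSync code.

```python
from enum import Enum


class PowerState(Enum):
    POWER_ON = "PowerOn"
    POWER_OFF = "PowerOff"
    POWER_REPORT_MISSING = "PowerReportMissing"


def effective_power_state(reported, seconds_since_last_report,
                          grace_period_seconds):
    """Treat a missing report as PowerOff only after the grace period."""
    if reported is not PowerState.POWER_REPORT_MISSING:
        # An explicit report is always taken at face value.
        return reported
    if seconds_since_last_report > grace_period_seconds:
        # Nothing heard for longer than the grace period: assume it is off.
        return PowerState.POWER_OFF
    # Still inside the grace period: keep the state as "missing".
    return PowerState.POWER_REPORT_MISSING


print(effective_power_state(PowerState.POWER_REPORT_MISSING, 30, 600))
print(effective_power_state(PowerState.POWER_REPORT_MISSING, 900, 600))
```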

​


> > So Now coming to bug we can fix that by changing global setting
> > pingInterval to appropriate value according to hypervisor settings which
> > takes care of these transitional period of missing report here or can be
> > handled by introducing gracePeriod global setting.
>
> I can see what you are talking about but not what you are saying here. Do
> you mean that we need to keep track of ping interval and grace period in
> the statemachine for a particular VM?

[Anshul] Here I am just talking about changing the value of the global setting “ping.interval”, or adding one more global setting for a grace period, to get finer control over these kinds of scenarios.

> > Regards,
> > Anshul
> >
> > On 16-Sep-2015, at 1:11 PM, Rohit Yadav <rohit.yadav@shapeblue.com> wrote:
> >
> > On 16-Sep-2015, at 1:06 pm, Daan Hoogland <daan.hoogland@gmail.com> wrote:
> >
> > so two questions:
> > - is this a blocker?
> >
> > A missing state handler is definitely a corner case, and IMO a blocker
> > (so, for both 4.6.0 and future 4.5.3).
> >
> > - is the ignoring after logging the appropriate way to handle this?
> >
> > It may be undesirable for all cases; so either introducing another global
> > setting to control logic (just log, vs update state in the db vs do any
> > operation?); or simply find out if the VM is actually running (or any other
> > state) just mark it as running in the DB.
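
The options listed in the paragraph above (just log, update the DB, or check the VM first) could be modelled as a policy switch, roughly as sketched here. Everything in this sketch is hypothetical: the policy names, the probe callback, and the plain "Running"/"Stopped" strings are illustrative stand-ins, not CloudStack code or settings.

```python
import logging
from enum import Enum

logger = logging.getLogger("vmsync.sketch")


class MissingReportPolicy(Enum):
    LOG_ONLY = "log"          # log the fact, leave the DB state alone
    UPDATE_DB = "update_db"   # treat the VM as stopped
    VERIFY = "verify"         # ask the hypervisor before deciding


def handle_missing_report(vm_name, db_state, policy, probe_is_running=None):
    """Return the state to store in the DB after a missing power report."""
    logger.warning("No power state report for VM %s (DB state: %s)",
                   vm_name, db_state)
    if policy is MissingReportPolicy.LOG_ONLY:
        return db_state
    if policy is MissingReportPolicy.VERIFY:
        if probe_is_running is None:
            raise ValueError("VERIFY needs a probe_is_running callback")
        return "Running" if probe_is_running(vm_name) else "Stopped"
    return "Stopped"


# Example with a stand-in probe that always reports the VM as running.
print(handle_missing_report("vm-1", "Running",
                            MissingReportPolicy.VERIFY,
                            probe_is_running=lambda name: True))
```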

> > Regards,
> > Rohit Yadav
> > Software Architect, ShapeBlue
> >
> > M. +91 88 262 30892 | rohit.yadav@shapeblue.com
> > Blog: bhaisaab.org<http://bhaisaab.org/> | Twitter: @_bhaisaab




Find out more about ShapeBlue and our range of CloudStack related services

IaaS Cloud Design & Build<
http://shapeblue.com/iaas-cloud-design-and-build//>
CSForge – rapid IaaS deployment framework<http://shapeblue.com/csforge/>
CloudStack Consulting<http://shapeblue.com/cloudstack-consultancy/>
CloudStack Software Engineering<
http://shapeblue.com/cloudstack-software-engineering/>
CloudStack Infrastructure Support<
http://shapeblue.com/cloudstack-infrastructure-support/>
CloudStack Bootcamp Training Courses<
http://shapeblue.com/cloudstack-training/>

This email and any attachments to it may be confidential and are intended
solely for the use of the individual to whom it is addressed. Any views or
opinions expressed are solely those of the author and do not necessarily
represent those of Shape Blue Ltd or related companies. If you are not the
intended recipient of this email, you must neither take any action based
upon its contents, nor copy or show it to anyone. Please contact the sender
if you believe you have received this email in error. Shape Blue Ltd is a
company incorporated in England & Wales. ShapeBlue Services India LLP is a
company incorporated in India and is operated under license from Shape Blue
Ltd. Shape Blue Brasil Consultoria Ltda is a company incorporated in Brasil
and is operated under license from Shape Blue Ltd. ShapeBlue SA Pty Ltd is
a company registered by The Republic of South Africa and is traded under
license from Shape Blue Ltd. ShapeBlue is a registered trademark.




--
Daan


Re: [DISCUSS][PROPOSAL]missing power state reports from hypervisors on VMs ([BLOCKER]?)

Posted by Daan Hoogland <da...@gmail.com>.
On Wed, Sep 16, 2015 at 10:17 AM, Anshul Gangwar <an...@citrix.com>
wrote:

> Currently we report only PowerOn VMs and do not report PowerOff VMs that's
> why we consider Missing and PowerOff as same

This is not the behavior reported in the ticket. It is intermittent.


> And that's how most of the code is written for VM sync and each Hypervisor
> resource has same understanding. This will effect HA and many more unknown
> places. So please do not even consider to merge this change.
>
Sure, I will hold. However, we do need to stop treating a missing report the
same as power off.


> So Now coming to bug we can fix that by changing global setting
> pingInterval to appropriate value according to hypervisor settings which
> takes care of these transitional period of missing report here or can be
> handled by introducing gracePeriod global setting.
>
I can see what you are talking about but not what you are saying here. Do
you mean that we need to keep track of ping interval and grace period in
the state machine for a particular VM?



>
> Regards,
> Anshul
>
> On 16-Sep-2015, at 1:11 PM, Rohit Yadav <rohit.yadav@shapeblue.com<mailto:
> rohit.yadav@shapeblue.com>> wrote:
>
>
> On 16-Sep-2015, at 1:06 pm, Daan Hoogland <daan.hoogland@gmail.com<mailto:
> daan.hoogland@gmail.com>> wrote:
>
>
> so two questions:
> - is this a blocker?
>
> A missing state handler is definitely a corner case, and IMO a blocker
> (so, for both 4.6.0 and future 4.5.3).
>
> - is the ignoring after logging the appropriate way to handle this?
>
> It may be undesirable for all cases; so either introducing another global
> setting to control logic (just log, vs update state in the db vs do any
> operation?); or simply find out if the VM is actually running (or any other
> state) just mark it as running in the DB.
>
> Regards,
> Rohit Yadav
> Software Architect, ShapeBlue


-- 
Daan

Re: [DISCUSS][PROPOSAL]missing power state reports from hypervisors on VMs ([BLOCKER]?)

Posted by Rene Moser <ma...@renemoser.net>.
Hi René

On 09/16/2015 10:17 AM, Anshul Gangwar wrote:
> Currently we report only PowerOn VMs and do not report PowerOff VMs that's why we consider Missing and PowerOff as same And that's how most of the code is written for VM sync and each Hypervisor resource has same understanding. This will effect HA and many more unknown places. So please do not even consider to merge this change.
> 
> So Now coming to bug we can fix that by changing global setting pingInterval to appropriate value according to hypervisor settings which takes care of these transitional period of missing report here or can be handled by introducing gracePeriod global setting.

This is interesting. I also wrote in the bug report that the gracePeriod
calculation might be related:
https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/com/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L110

IMHO making this value configurable might solve it, but it is hard
to "guess" what a good grace period would be.

In terms of VMware, it depends on the number of ESX hosts in the clusters,
and that can differ from cluster to cluster.

But another question is: why have one _global_ grace period for every
hypervisor? Keep in mind that users can have mixed-hypervisor setups.

So to me, a global grace period setting might not be the best solution.
Instead, we should take the hypervisor's own functionality into account; in
this case VMware handles HA by itself.
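
To illustrate the point about mixed-hypervisor setups, here is a minimal sketch of a grace period looked up per hypervisor type, with a single global value as the fallback. The class name, the enum values, and the method names are all hypothetical and chosen for illustration; none of them come from the CloudStack code base.

```java
import java.time.Duration;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical lookup of a grace period per hypervisor type, with a global fallback. */
final class HypervisorGracePeriods {
    // Illustrative hypervisor types only; this is not CloudStack's own enum.
    enum Hypervisor { VMWARE, XENSERVER, KVM }

    private final Duration globalDefault;
    private final Map<Hypervisor, Duration> overrides = new EnumMap<>(Hypervisor.class);

    HypervisorGracePeriods(Duration globalDefault) {
        this.globalDefault = globalDefault;
    }

    /** Set a hypervisor-specific grace period; returns this for chaining. */
    HypervisorGracePeriods withOverride(Hypervisor type, Duration period) {
        overrides.put(type, period);
        return this;
    }

    /** Grace period for the given hypervisor, or the global default if none is set. */
    Duration forHypervisor(Hypervisor type) {
        return overrides.getOrDefault(type, globalDefault);
    }
}
```

With something like this, a VMware cluster could be given a longer period than the other hypervisors without changing the global value.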

I know a VR in 4.5 would be broken after a VMware HA event, but there
is another global setting, which can be enabled if you like, for router
restarts after out-of-band migrations.

So to me, for 4.5, I am +1 on Daan's patch if the
hypervisor is VMware.

Yours
René


Re: [DISCUSS][PROPOSAL]missing power state reports from hypervisors on VMs ([BLOCKER]?)

Posted by Anshul Gangwar <an...@citrix.com>.
Currently we report only PowerOn VMs and do not report PowerOff VMs; that is why we treat Missing and PowerOff as the same. That is how most of the VM sync code is written, and each hypervisor resource has the same understanding. This change will affect HA and many more unknown places, so please do not even consider merging it.

Now, coming to the bug: we can fix it by changing the global setting pingInterval to an appropriate value according to the hypervisor settings, which takes care of these transitional periods of missing reports, or it can be handled by introducing a gracePeriod global setting.
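
The gracePeriod idea could look roughly like the sketch below: a VM that is absent from a power report is only treated as powered off once it has been missing for longer than the grace period. This is a sketch under assumptions, not the existing VM sync code; the class, the enum, and the method names are hypothetical, and the handling of a VM that was never seen (start the clock instead of assuming power-off) is an illustrative choice.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: decides when a VM missing from power reports counts as off. */
final class MissingReportTracker {
    enum Verdict { STILL_IN_GRACE, TREAT_AS_POWER_OFF }

    private final Duration gracePeriod;
    // Time at which each VM was last present in a power report.
    private final Map<Long, Instant> lastSeen = new HashMap<>();

    MissingReportTracker(Duration gracePeriod) {
        this.gracePeriod = gracePeriod;
    }

    /** Record that the VM appeared (as PowerOn) in a report at the given time. */
    void reported(long vmId, Instant when) {
        lastSeen.put(vmId, when);
    }

    /** Classify a VM that is absent from the report received at 'now'. */
    Verdict missing(long vmId, Instant now) {
        Instant seen = lastSeen.get(vmId);
        if (seen == null) {
            // Never seen before: start the clock now rather than assuming power-off.
            lastSeen.put(vmId, now);
            return Verdict.STILL_IN_GRACE;
        }
        Duration silent = Duration.between(seen, now);
        // Only a silence strictly longer than the grace period counts as power-off.
        return silent.compareTo(gracePeriod) > 0
                ? Verdict.TREAT_AS_POWER_OFF
                : Verdict.STILL_IN_GRACE;
    }
}
```

How long the grace period should be is exactly the open question in this thread; the sketch only shows where such a value would be applied.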

Regards,
Anshul

On 16-Sep-2015, at 1:11 PM, Rohit Yadav <ro...@shapeblue.com>> wrote:


On 16-Sep-2015, at 1:06 pm, Daan Hoogland <da...@gmail.com>> wrote:


so two questions:
- is this a blocker?

A missing state handler is definitely a corner case, and IMO a blocker (so, for both 4.6.0 and future 4.5.3).

- is the ignoring after logging the appropriate way to handle this?

It may be undesirable for all cases; so either introducing another global setting to control logic (just log, vs update state in the db vs do any operation?); or simply find out if the VM is actually running (or any other state) just mark it as running in the DB.

Regards,
Rohit Yadav
Software Architect, ShapeBlue






Re: [DISCUSS][PROPOSAL]missing power state reports from hypervisors on VMs ([BLOCKER]?)

Posted by Rohit Yadav <ro...@shapeblue.com>.
On 16-Sep-2015, at 1:06 pm, Daan Hoogland <da...@gmail.com>> wrote:


so two questions:
- is this a blocker?

A missing state handler is definitely a corner case, and IMO a blocker (so, for both 4.6.0 and future 4.5.3).

- is the ignoring after logging the appropriate way to handle this?

It may not be desirable in all cases. So we could either introduce another global setting to control the logic (just log, update the state in the DB, or perform some operation), or simply find out whether the VM is actually running (or in any other state) and mark it as such in the DB.
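
As a rough illustration of the first option, a global setting could be parsed into a small policy enum like the one below. The enum, its values, and the fallback to log-only for unknown input are assumptions made for this sketch; no such setting exists in the thread or, as far as stated here, in CloudStack.

```java
import java.util.Locale;

/** Hypothetical policy for what to do when a VM's power report is missing. */
enum MissingReportPolicy {
    LOG_ONLY,         // just log the missing report
    UPDATE_DB_STATE,  // update the VM state in the database
    RUN_OPERATION;    // trigger some further operation

    /** Parse a setting value; blank or unknown input falls back to LOG_ONLY. */
    static MissingReportPolicy fromSetting(String value) {
        if (value == null) {
            return LOG_ONLY;
        }
        // Accept values such as "log-only" or "update_db_state" in any case.
        String key = value.trim().toUpperCase(Locale.ROOT).replace('-', '_');
        for (MissingReportPolicy policy : values()) {
            if (policy.name().equals(key)) {
                return policy;
            }
        }
        return LOG_ONLY;
    }
}
```

Falling back to the least invasive behaviour keeps a mistyped setting from changing VM state.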

Regards,
Rohit Yadav
Software Architect, ShapeBlue




M. +91 88 262 30892 | rohit.yadav@shapeblue.com<ma...@shapeblue.com>
Blog: bhaisaab.org<http://bhaisaab.org> | Twitter: @_bhaisaab




