You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cloudstack.apache.org by Marcus Sorensen <sh...@gmail.com> on 2013/04/25 22:01:56 UTC

[ACS41] new critical bug

I didn't mark this one as a blocker, but it's still pretty bad for someone
managing VMs in an automated fashion. Trying to stop a VM when the KVM
agent is disconnected reports success.

https://issues.apache.org/jira/browse/CLOUDSTACK-2195

Re: [ACS41] new critical bug

Posted by Marcus Sorensen <sh...@gmail.com>.
Ok, review request in. I applied it to master, but I think there's a
whitespace issue or something that keeps it from applying to 4.1. It's
basically Edison's patch plus a logger message
in VirtualMachineManagerImpl.java that tells the admin why stopping the VM
failed (unable to connect to agent).


On Thu, Apr 25, 2013 at 10:13 PM, Marcus Sorensen <sh...@gmail.com>wrote:

> argh, that version of stopVirtualMachine (the one returning boolean) isn't
> even used!
>
>
> On Thu, Apr 25, 2013 at 10:07 PM, Marcus Sorensen <sh...@gmail.com>wrote:
>
>> Actually it's that your patch is for the 'forced' stop. Still doesn't
>> apply to 4.1, but the code I'm looking at for the moment is the non-forced
>> stop, which returns a boolean.
>>
>>
>> On Thu, Apr 25, 2013 at 10:05 PM, Marcus Sorensen <sh...@gmail.com>wrote:
>>
>>> Uh, nevermind, of course I can't return null there because it requires a
>>> boolean in 4.1. I'll play with this a bit and get back with you.
>>>
>>>
>>> On Thu, Apr 25, 2013 at 10:01 PM, Marcus Sorensen <sh...@gmail.com>wrote:
>>>
>>>> Thanks, Edison. That doesn't apply to 4.1, but it's basically the same
>>>> as the first hunk of my patch above, just that it returns null instead of
>>>> throwing an exception.  I'm going to test that now. I didn't see why that
>>>> CloudRuntimeException in the first hunk would be squelched, but the finally
>>>> clause makes sense in the second hunk.
>>>>
>>>>
>>>> On Thu, Apr 25, 2013 at 6:08 PM, Edison Su <Ed...@citrix.com>wrote:
>>>>
>>>>> This code will do the magic:
>>>>> } finally {
>>>>> if (!stopped) {
>>>>>                 if (!forced) {
>>>>>                     s_logger.warn("Unable to stop vm " + vm);
>>>>>                     try {
>>>>>                         stateTransitTo(vm, Event.OperationFailed,
>>>>> vm.getHostId());
>>>>>                     } catch (NoTransitionException e) {
>>>>>                         s_logger.warn("Unable to transition the state
>>>>> " + vm);
>>>>>                     }
>>>>>                     return false;
>>>>>                 } else {
>>>>>                     s_logger.warn("Unable to actually stop " + vm + "
>>>>> but continue with release because it's a force stop");
>>>>>                     vmGuru.finalizeStop(profile, answer);
>>>>>                 }
>>>>>             }
>>>>> }
>>>>>
>>>>>
>>>>> Basically, it means, if stop failed and it's not force stop, then mark
>>>>> the vm as running.
>>>>> I think the logic here sounds correct, as cloudstack can't delete the
>>>>> vm(due to the connection between mgt server and kvm agent is broken), then
>>>>> all the operations on the VM should fail, and the VM status shouldn't be
>>>>> changed.
>>>>>
>>>>> But the problem is, the stop api call should fail, as stop vm actually
>>>>> failed.
>>>>> Could you try the following patch:
>>>>>
>>>>> diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> index 7bf04ec..ff20c54 100755
>>>>> --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> @@ -3684,10 +3684,15 @@ public class UserVmManagerImpl extends
>>>>> ManagerBase implements UserVmManager, Use
>>>>>          }
>>>>>
>>>>>          UserVO user = _userDao.findById(userId);
>>>>> -
>>>>> +        boolean status = false;
>>>>>          try {
>>>>>              VirtualMachineEntity vmEntity =
>>>>> _orchSrvc.getVirtualMachine(vm.getUuid());
>>>>> -            vmEntity.stop(new Long(userId).toString());
>>>>> +            status = vmEntity.stop(new Long(userId).toString());
>>>>> +            if (status) {
>>>>> +               return _vmDao.findById(vmId);
>>>>> +            } else {
>>>>> +               return null;
>>>>> +            }
>>>>>          } catch (ResourceUnavailableException e) {
>>>>>              throw new CloudRuntimeException(
>>>>>                      "Unable to contact the agent to stop the virtual
>>>>> machine "
>>>>> @@ -3698,7 +3703,7 @@ public class UserVmManagerImpl extends
>>>>> ManagerBase implements UserVmManager, Use
>>>>>                              + vm, e);
>>>>>          }
>>>>>
>>>>> -        return _vmDao.findById(vmId);
>>>>> +
>>>>>      }
>>>>>
>>>>>      @Override
>>>>>
>>>>>
>>>>> > -----Original Message-----
>>>>> > From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>>>>> > Sent: Thursday, April 25, 2013 4:46 PM
>>>>> > To: dev@cloudstack.apache.org
>>>>> > Subject: Re: [ACS41] new critical bug
>>>>> >
>>>>> > I tried a few things, including throwing a CloudRuntimeException in
>>>>> several
>>>>> > places where I thought it made sense, such as an empty
>>>>> > AgentUnavailableException catch block, but it doesn't seem to do
>>>>> anything at
>>>>> > all, I don't see it in the logs, even though I do see the debug code
>>>>> that I
>>>>> > placed next to it.  So I give up for now :-)
>>>>> >
>>>>> > diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> > b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> > index 7bf04ec..0373690 100755
>>>>> > --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> > +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> > @@ -685,7 +685,8 @@ public class UserVmManagerImpl extends
>>>>> > ManagerBase implements UserVmManager, Use
>>>>> >          if (status) {
>>>>> >              return status;
>>>>> >          } else {
>>>>> > -            return status;
>>>>> > +            throw new CloudRuntimeException(
>>>>> > +                    "Unable to stop the virtual machine" + vm );
>>>>> >          }
>>>>> >      }
>>>>> >
>>>>> > diff --git a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>>> > b/server/src/com/cloud/vm/VirtualMachineManagerImpl.
>>>>> > index 2c2986f..e320ff1 100755
>>>>> > --- a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>>> > +++ b/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>>> > @@ -1083,7 +1083,11 @@ public class VirtualMachineManagerImpl extends
>>>>> > ManagerBase implements VirtualMac
>>>>> >              vmGuru.finalizeStop(profile, answer);
>>>>> >
>>>>> >          } catch (AgentUnavailableException e) {
>>>>> > +            s_logger.error("Unable to stop vm, agent unavailable: "
>>>>> +
>>>>> > e.toString());
>>>>> > +            throw new CloudRuntimeException("Unable to stop vm,
>>>>> agent
>>>>> > unavailable");
>>>>> >          } catch (OperationTimedoutException e) {
>>>>> > +            s_logger.error("Unable to stop vm, operation timed out:
>>>>> " +
>>>>> > e.toString());
>>>>> > +            throw new CloudRuntimeException("Unable to stop vm,
>>>>> > + operation
>>>>> > timed out");
>>>>> >          } finally {
>>>>> >              if (!stopped) {
>>>>> >                  if (!forced) {
>>>>> >
>>>>> >
>>>>> > On Thu, Apr 25, 2013 at 2:06 PM, Chip Childers
>>>>> > <ch...@sungard.com>wrote:
>>>>> >
>>>>> > > On Thu, Apr 25, 2013 at 02:01:56PM -0600, Marcus Sorensen wrote:
>>>>> > > > I didn't mark this one as a blocker, but it's still pretty bad
>>>>> for
>>>>> > > someone
>>>>> > > > managing VMs in an automated fashion. Trying to stop a VM when
>>>>> the
>>>>> > > > KVM agent is disconnected reports success.
>>>>> > > >
>>>>> > > > https://issues.apache.org/jira/browse/CLOUDSTACK-2195
>>>>> > >
>>>>> > > I'll pull a fix in, if we have one ready before the final blockers.
>>>>> > > Otherwise I'd pull it into a 4.1.1 release.
>>>>> > >
>>>>>
>>>>
>>>>
>>>
>>
>

Re: [ACS41] new critical bug

Posted by Marcus Sorensen <sh...@gmail.com>.
argh, that version of stopVirtualMachine (the one returning boolean) isn't
even used!


On Thu, Apr 25, 2013 at 10:07 PM, Marcus Sorensen <sh...@gmail.com>wrote:

> Actually it's that your patch is for the 'forced' stop. Still doesn't
> apply to 4.1, but the code I'm looking at for the moment is the non-forced
> stop, which returns a boolean.
>
>
> On Thu, Apr 25, 2013 at 10:05 PM, Marcus Sorensen <sh...@gmail.com>wrote:
>
>> Uh, nevermind, of course I can't return null there because it requires a
>> boolean in 4.1. I'll play with this a bit and get back with you.
>>
>>
>> On Thu, Apr 25, 2013 at 10:01 PM, Marcus Sorensen <sh...@gmail.com>wrote:
>>
>>> Thanks, Edison. That doesn't apply to 4.1, but it's basically the same
>>> as the first hunk of my patch above, just that it returns null instead of
>>> throwing an exception.  I'm going to test that now. I didn't see why that
>>> CloudRuntimeException in the first hunk would be squelched, but the finally
>>> clause makes sense in the second hunk.
>>>
>>>
>>> On Thu, Apr 25, 2013 at 6:08 PM, Edison Su <Ed...@citrix.com> wrote:
>>>
>>>> This code will do the magic:
>>>> } finally {
>>>> if (!stopped) {
>>>>                 if (!forced) {
>>>>                     s_logger.warn("Unable to stop vm " + vm);
>>>>                     try {
>>>>                         stateTransitTo(vm, Event.OperationFailed,
>>>> vm.getHostId());
>>>>                     } catch (NoTransitionException e) {
>>>>                         s_logger.warn("Unable to transition the state "
>>>> + vm);
>>>>                     }
>>>>                     return false;
>>>>                 } else {
>>>>                     s_logger.warn("Unable to actually stop " + vm + "
>>>> but continue with release because it's a force stop");
>>>>                     vmGuru.finalizeStop(profile, answer);
>>>>                 }
>>>>             }
>>>> }
>>>>
>>>>
>>>> Basically, it means, if stop failed and it's not force stop, then mark
>>>> the vm as running.
>>>> I think the logic here sounds correct, as cloudstack can't delete the
>>>> vm(due to the connection between mgt server and kvm agent is broken), then
>>>> all the operations on the VM should fail, and the VM status shouldn't be
>>>> changed.
>>>>
>>>> But the problem is, the stop api call should fail, as stop vm actually
>>>> failed.
>>>> Could you try the following patch:
>>>>
>>>> diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>> b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>> index 7bf04ec..ff20c54 100755
>>>> --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>> +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>> @@ -3684,10 +3684,15 @@ public class UserVmManagerImpl extends
>>>> ManagerBase implements UserVmManager, Use
>>>>          }
>>>>
>>>>          UserVO user = _userDao.findById(userId);
>>>> -
>>>> +        boolean status = false;
>>>>          try {
>>>>              VirtualMachineEntity vmEntity =
>>>> _orchSrvc.getVirtualMachine(vm.getUuid());
>>>> -            vmEntity.stop(new Long(userId).toString());
>>>> +            status = vmEntity.stop(new Long(userId).toString());
>>>> +            if (status) {
>>>> +               return _vmDao.findById(vmId);
>>>> +            } else {
>>>> +               return null;
>>>> +            }
>>>>          } catch (ResourceUnavailableException e) {
>>>>              throw new CloudRuntimeException(
>>>>                      "Unable to contact the agent to stop the virtual
>>>> machine "
>>>> @@ -3698,7 +3703,7 @@ public class UserVmManagerImpl extends
>>>> ManagerBase implements UserVmManager, Use
>>>>                              + vm, e);
>>>>          }
>>>>
>>>> -        return _vmDao.findById(vmId);
>>>> +
>>>>      }
>>>>
>>>>      @Override
>>>>
>>>>
>>>> > -----Original Message-----
>>>> > From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>>>> > Sent: Thursday, April 25, 2013 4:46 PM
>>>> > To: dev@cloudstack.apache.org
>>>> > Subject: Re: [ACS41] new critical bug
>>>> >
>>>> > I tried a few things, including throwing a CloudRuntimeException in
>>>> several
>>>> > places where I thought it made sense, such as an empty
>>>> > AgentUnavailableException catch block, but it doesn't seem to do
>>>> anything at
>>>> > all, I don't see it in the logs, even though I do see the debug code
>>>> that I
>>>> > placed next to it.  So I give up for now :-)
>>>> >
>>>> > diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>> > b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>> > index 7bf04ec..0373690 100755
>>>> > --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>> > +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>> > @@ -685,7 +685,8 @@ public class UserVmManagerImpl extends
>>>> > ManagerBase implements UserVmManager, Use
>>>> >          if (status) {
>>>> >              return status;
>>>> >          } else {
>>>> > -            return status;
>>>> > +            throw new CloudRuntimeException(
>>>> > +                    "Unable to stop the virtual machine" + vm );
>>>> >          }
>>>> >      }
>>>> >
>>>> > diff --git a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>> > b/server/src/com/cloud/vm/VirtualMachineManagerImpl.
>>>> > index 2c2986f..e320ff1 100755
>>>> > --- a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>> > +++ b/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>> > @@ -1083,7 +1083,11 @@ public class VirtualMachineManagerImpl extends
>>>> > ManagerBase implements VirtualMac
>>>> >              vmGuru.finalizeStop(profile, answer);
>>>> >
>>>> >          } catch (AgentUnavailableException e) {
>>>> > +            s_logger.error("Unable to stop vm, agent unavailable: " +
>>>> > e.toString());
>>>> > +            throw new CloudRuntimeException("Unable to stop vm, agent
>>>> > unavailable");
>>>> >          } catch (OperationTimedoutException e) {
>>>> > +            s_logger.error("Unable to stop vm, operation timed out:
>>>> " +
>>>> > e.toString());
>>>> > +            throw new CloudRuntimeException("Unable to stop vm,
>>>> > + operation
>>>> > timed out");
>>>> >          } finally {
>>>> >              if (!stopped) {
>>>> >                  if (!forced) {
>>>> >
>>>> >
>>>> > On Thu, Apr 25, 2013 at 2:06 PM, Chip Childers
>>>> > <ch...@sungard.com>wrote:
>>>> >
>>>> > > On Thu, Apr 25, 2013 at 02:01:56PM -0600, Marcus Sorensen wrote:
>>>> > > > I didn't mark this one as a blocker, but it's still pretty bad for
>>>> > > someone
>>>> > > > managing VMs in an automated fashion. Trying to stop a VM when the
>>>> > > > KVM agent is disconnected reports success.
>>>> > > >
>>>> > > > https://issues.apache.org/jira/browse/CLOUDSTACK-2195
>>>> > >
>>>> > > I'll pull a fix in, if we have one ready before the final blockers.
>>>> > > Otherwise I'd pull it into a 4.1.1 release.
>>>> > >
>>>>
>>>
>>>
>>
>

Re: [ACS41] new critical bug

Posted by Marcus Sorensen <sh...@gmail.com>.
Actually it's that your patch is for the 'forced' stop. Still doesn't apply
to 4.1, but the code I'm looking at for the moment is the non-forced stop,
which returns a boolean.


On Thu, Apr 25, 2013 at 10:05 PM, Marcus Sorensen <sh...@gmail.com>wrote:

> Uh, nevermind, of course I can't return null there because it requires a
> boolean in 4.1. I'll play with this a bit and get back with you.
>
>
> On Thu, Apr 25, 2013 at 10:01 PM, Marcus Sorensen <sh...@gmail.com>wrote:
>
>> Thanks, Edison. That doesn't apply to 4.1, but it's basically the same as
>> the first hunk of my patch above, just that it returns null instead of
>> throwing an exception.  I'm going to test that now. I didn't see why that
>> CloudRuntimeException in the first hunk would be squelched, but the finally
>> clause makes sense in the second hunk.
>>
>>
>> On Thu, Apr 25, 2013 at 6:08 PM, Edison Su <Ed...@citrix.com> wrote:
>>
>>> This code will do the magic:
>>> } finally {
>>> if (!stopped) {
>>>                 if (!forced) {
>>>                     s_logger.warn("Unable to stop vm " + vm);
>>>                     try {
>>>                         stateTransitTo(vm, Event.OperationFailed,
>>> vm.getHostId());
>>>                     } catch (NoTransitionException e) {
>>>                         s_logger.warn("Unable to transition the state "
>>> + vm);
>>>                     }
>>>                     return false;
>>>                 } else {
>>>                     s_logger.warn("Unable to actually stop " + vm + "
>>> but continue with release because it's a force stop");
>>>                     vmGuru.finalizeStop(profile, answer);
>>>                 }
>>>             }
>>> }
>>>
>>>
>>> Basically, it means, if stop failed and it's not force stop, then mark
>>> the vm as running.
>>> I think the logic here sounds correct, as cloudstack can't delete the
>>> vm(due to the connection between mgt server and kvm agent is broken), then
>>> all the operations on the VM should fail, and the VM status shouldn't be
>>> changed.
>>>
>>> But the problem is, the stop api call should fail, as stop vm actually
>>> failed.
>>> Could you try the following patch:
>>>
>>> diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>> b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>> index 7bf04ec..ff20c54 100755
>>> --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>> +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>> @@ -3684,10 +3684,15 @@ public class UserVmManagerImpl extends
>>> ManagerBase implements UserVmManager, Use
>>>          }
>>>
>>>          UserVO user = _userDao.findById(userId);
>>> -
>>> +        boolean status = false;
>>>          try {
>>>              VirtualMachineEntity vmEntity =
>>> _orchSrvc.getVirtualMachine(vm.getUuid());
>>> -            vmEntity.stop(new Long(userId).toString());
>>> +            status = vmEntity.stop(new Long(userId).toString());
>>> +            if (status) {
>>> +               return _vmDao.findById(vmId);
>>> +            } else {
>>> +               return null;
>>> +            }
>>>          } catch (ResourceUnavailableException e) {
>>>              throw new CloudRuntimeException(
>>>                      "Unable to contact the agent to stop the virtual
>>> machine "
>>> @@ -3698,7 +3703,7 @@ public class UserVmManagerImpl extends ManagerBase
>>> implements UserVmManager, Use
>>>                              + vm, e);
>>>          }
>>>
>>> -        return _vmDao.findById(vmId);
>>> +
>>>      }
>>>
>>>      @Override
>>>
>>>
>>> > -----Original Message-----
>>> > From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>>> > Sent: Thursday, April 25, 2013 4:46 PM
>>> > To: dev@cloudstack.apache.org
>>> > Subject: Re: [ACS41] new critical bug
>>> >
>>> > I tried a few things, including throwing a CloudRuntimeException in
>>> several
>>> > places where I thought it made sense, such as an empty
>>> > AgentUnavailableException catch block, but it doesn't seem to do
>>> anything at
>>> > all, I don't see it in the logs, even though I do see the debug code
>>> that I
>>> > placed next to it.  So I give up for now :-)
>>> >
>>> > diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>> > b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>> > index 7bf04ec..0373690 100755
>>> > --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>> > +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>> > @@ -685,7 +685,8 @@ public class UserVmManagerImpl extends
>>> > ManagerBase implements UserVmManager, Use
>>> >          if (status) {
>>> >              return status;
>>> >          } else {
>>> > -            return status;
>>> > +            throw new CloudRuntimeException(
>>> > +                    "Unable to stop the virtual machine" + vm );
>>> >          }
>>> >      }
>>> >
>>> > diff --git a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>> > b/server/src/com/cloud/vm/VirtualMachineManagerImpl.
>>> > index 2c2986f..e320ff1 100755
>>> > --- a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>> > +++ b/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>> > @@ -1083,7 +1083,11 @@ public class VirtualMachineManagerImpl extends
>>> > ManagerBase implements VirtualMac
>>> >              vmGuru.finalizeStop(profile, answer);
>>> >
>>> >          } catch (AgentUnavailableException e) {
>>> > +            s_logger.error("Unable to stop vm, agent unavailable: " +
>>> > e.toString());
>>> > +            throw new CloudRuntimeException("Unable to stop vm, agent
>>> > unavailable");
>>> >          } catch (OperationTimedoutException e) {
>>> > +            s_logger.error("Unable to stop vm, operation timed out: "
>>> +
>>> > e.toString());
>>> > +            throw new CloudRuntimeException("Unable to stop vm,
>>> > + operation
>>> > timed out");
>>> >          } finally {
>>> >              if (!stopped) {
>>> >                  if (!forced) {
>>> >
>>> >
>>> > On Thu, Apr 25, 2013 at 2:06 PM, Chip Childers
>>> > <ch...@sungard.com>wrote:
>>> >
>>> > > On Thu, Apr 25, 2013 at 02:01:56PM -0600, Marcus Sorensen wrote:
>>> > > > I didn't mark this one as a blocker, but it's still pretty bad for
>>> > > someone
>>> > > > managing VMs in an automated fashion. Trying to stop a VM when the
>>> > > > KVM agent is disconnected reports success.
>>> > > >
>>> > > > https://issues.apache.org/jira/browse/CLOUDSTACK-2195
>>> > >
>>> > > I'll pull a fix in, if we have one ready before the final blockers.
>>> > > Otherwise I'd pull it into a 4.1.1 release.
>>> > >
>>>
>>
>>
>

Re: [ACS41] new critical bug

Posted by Marcus Sorensen <sh...@gmail.com>.
Uh, nevermind, of course I can't return null there because it requires a
boolean in 4.1. I'll play with this a bit and get back with you.


On Thu, Apr 25, 2013 at 10:01 PM, Marcus Sorensen <sh...@gmail.com>wrote:

> Thanks, Edison. That doesn't apply to 4.1, but it's basically the same as
> the first hunk of my patch above, just that it returns null instead of
> throwing an exception.  I'm going to test that now. I didn't see why that
> CloudRuntimeException in the first hunk would be squelched, but the finally
> clause makes sense in the second hunk.
>
>
> On Thu, Apr 25, 2013 at 6:08 PM, Edison Su <Ed...@citrix.com> wrote:
>
>> This code will do the magic:
>> } finally {
>> if (!stopped) {
>>                 if (!forced) {
>>                     s_logger.warn("Unable to stop vm " + vm);
>>                     try {
>>                         stateTransitTo(vm, Event.OperationFailed,
>> vm.getHostId());
>>                     } catch (NoTransitionException e) {
>>                         s_logger.warn("Unable to transition the state " +
>> vm);
>>                     }
>>                     return false;
>>                 } else {
>>                     s_logger.warn("Unable to actually stop " + vm + " but
>> continue with release because it's a force stop");
>>                     vmGuru.finalizeStop(profile, answer);
>>                 }
>>             }
>> }
>>
>>
>> Basically, it means, if stop failed and it's not force stop, then mark
>> the vm as running.
>> I think the logic here sounds correct, as cloudstack can't delete the
>> vm(due to the connection between mgt server and kvm agent is broken), then
>> all the operations on the VM should fail, and the VM status shouldn't be
>> changed.
>>
>> But the problem is, the stop api call should fail, as stop vm actually
>> failed.
>> Could you try the following patch:
>>
>> diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
>> b/server/src/com/cloud/vm/UserVmManagerImpl.java
>> index 7bf04ec..ff20c54 100755
>> --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
>> +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
>> @@ -3684,10 +3684,15 @@ public class UserVmManagerImpl extends
>> ManagerBase implements UserVmManager, Use
>>          }
>>
>>          UserVO user = _userDao.findById(userId);
>> -
>> +        boolean status = false;
>>          try {
>>              VirtualMachineEntity vmEntity =
>> _orchSrvc.getVirtualMachine(vm.getUuid());
>> -            vmEntity.stop(new Long(userId).toString());
>> +            status = vmEntity.stop(new Long(userId).toString());
>> +            if (status) {
>> +               return _vmDao.findById(vmId);
>> +            } else {
>> +               return null;
>> +            }
>>          } catch (ResourceUnavailableException e) {
>>              throw new CloudRuntimeException(
>>                      "Unable to contact the agent to stop the virtual
>> machine "
>> @@ -3698,7 +3703,7 @@ public class UserVmManagerImpl extends ManagerBase
>> implements UserVmManager, Use
>>                              + vm, e);
>>          }
>>
>> -        return _vmDao.findById(vmId);
>> +
>>      }
>>
>>      @Override
>>
>>
>> > -----Original Message-----
>> > From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>> > Sent: Thursday, April 25, 2013 4:46 PM
>> > To: dev@cloudstack.apache.org
>> > Subject: Re: [ACS41] new critical bug
>> >
>> > I tried a few things, including throwing a CloudRuntimeException in
>> several
>> > places where I thought it made sense, such as an empty
>> > AgentUnavailableException catch block, but it doesn't seem to do
>> anything at
>> > all, I don't see it in the logs, even though I do see the debug code
>> that I
>> > placed next to it.  So I give up for now :-)
>> >
>> > diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
>> > b/server/src/com/cloud/vm/UserVmManagerImpl.java
>> > index 7bf04ec..0373690 100755
>> > --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
>> > +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
>> > @@ -685,7 +685,8 @@ public class UserVmManagerImpl extends
>> > ManagerBase implements UserVmManager, Use
>> >          if (status) {
>> >              return status;
>> >          } else {
>> > -            return status;
>> > +            throw new CloudRuntimeException(
>> > +                    "Unable to stop the virtual machine" + vm );
>> >          }
>> >      }
>> >
>> > diff --git a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>> > b/server/src/com/cloud/vm/VirtualMachineManagerImpl.
>> > index 2c2986f..e320ff1 100755
>> > --- a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>> > +++ b/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>> > @@ -1083,7 +1083,11 @@ public class VirtualMachineManagerImpl extends
>> > ManagerBase implements VirtualMac
>> >              vmGuru.finalizeStop(profile, answer);
>> >
>> >          } catch (AgentUnavailableException e) {
>> > +            s_logger.error("Unable to stop vm, agent unavailable: " +
>> > e.toString());
>> > +            throw new CloudRuntimeException("Unable to stop vm, agent
>> > unavailable");
>> >          } catch (OperationTimedoutException e) {
>> > +            s_logger.error("Unable to stop vm, operation timed out: " +
>> > e.toString());
>> > +            throw new CloudRuntimeException("Unable to stop vm,
>> > + operation
>> > timed out");
>> >          } finally {
>> >              if (!stopped) {
>> >                  if (!forced) {
>> >
>> >
>> > On Thu, Apr 25, 2013 at 2:06 PM, Chip Childers
>> > <ch...@sungard.com>wrote:
>> >
>> > > On Thu, Apr 25, 2013 at 02:01:56PM -0600, Marcus Sorensen wrote:
>> > > > I didn't mark this one as a blocker, but it's still pretty bad for
>> > > someone
>> > > > managing VMs in an automated fashion. Trying to stop a VM when the
>> > > > KVM agent is disconnected reports success.
>> > > >
>> > > > https://issues.apache.org/jira/browse/CLOUDSTACK-2195
>> > >
>> > > I'll pull a fix in, if we have one ready before the final blockers.
>> > > Otherwise I'd pull it into a 4.1.1 release.
>> > >
>>
>
>

Re: [ACS41] new critical bug

Posted by Marcus Sorensen <sh...@gmail.com>.
Thanks, Edison. That doesn't apply to 4.1, but it's basically the same as
the first hunk of my patch above, just that it returns null instead of
throwing an exception.  I'm going to test that now. I didn't see why that
CloudRuntimeException in the first hunk would be squelched, but the finally
clause makes sense in the second hunk.


On Thu, Apr 25, 2013 at 6:08 PM, Edison Su <Ed...@citrix.com> wrote:

> This code will do the magic:
> } finally {
> if (!stopped) {
>                 if (!forced) {
>                     s_logger.warn("Unable to stop vm " + vm);
>                     try {
>                         stateTransitTo(vm, Event.OperationFailed,
> vm.getHostId());
>                     } catch (NoTransitionException e) {
>                         s_logger.warn("Unable to transition the state " +
> vm);
>                     }
>                     return false;
>                 } else {
>                     s_logger.warn("Unable to actually stop " + vm + " but
> continue with release because it's a force stop");
>                     vmGuru.finalizeStop(profile, answer);
>                 }
>             }
> }
>
>
> Basically, it means, if stop failed and it's not force stop, then mark the
> vm as running.
> I think the logic here sounds correct, as cloudstack can't delete the
> vm(due to the connection between mgt server and kvm agent is broken), then
> all the operations on the VM should fail, and the VM status shouldn't be
> changed.
>
> But the problem is, the stop api call should fail, as stop vm actually
> failed.
> Could you try the following patch:
>
> diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
> b/server/src/com/cloud/vm/UserVmManagerImpl.java
> index 7bf04ec..ff20c54 100755
> --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
> +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
> @@ -3684,10 +3684,15 @@ public class UserVmManagerImpl extends ManagerBase
> implements UserVmManager, Use
>          }
>
>          UserVO user = _userDao.findById(userId);
> -
> +        boolean status = false;
>          try {
>              VirtualMachineEntity vmEntity =
> _orchSrvc.getVirtualMachine(vm.getUuid());
> -            vmEntity.stop(new Long(userId).toString());
> +            status = vmEntity.stop(new Long(userId).toString());
> +            if (status) {
> +               return _vmDao.findById(vmId);
> +            } else {
> +               return null;
> +            }
>          } catch (ResourceUnavailableException e) {
>              throw new CloudRuntimeException(
>                      "Unable to contact the agent to stop the virtual
> machine "
> @@ -3698,7 +3703,7 @@ public class UserVmManagerImpl extends ManagerBase
> implements UserVmManager, Use
>                              + vm, e);
>          }
>
> -        return _vmDao.findById(vmId);
> +
>      }
>
>      @Override
>
>
> > -----Original Message-----
> > From: Marcus Sorensen [mailto:shadowsor@gmail.com]
> > Sent: Thursday, April 25, 2013 4:46 PM
> > To: dev@cloudstack.apache.org
> > Subject: Re: [ACS41] new critical bug
> >
> > I tried a few things, including throwing a CloudRuntimeException in
> several
> > places where I thought it made sense, such as an empty
> > AgentUnavailableException catch block, but it doesn't seem to do
> anything at
> > all, I don't see it in the logs, even though I do see the debug code
> that I
> > placed next to it.  So I give up for now :-)
> >
> > diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
> > b/server/src/com/cloud/vm/UserVmManagerImpl.java
> > index 7bf04ec..0373690 100755
> > --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
> > +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
> > @@ -685,7 +685,8 @@ public class UserVmManagerImpl extends
> > ManagerBase implements UserVmManager, Use
> >          if (status) {
> >              return status;
> >          } else {
> > -            return status;
> > +            throw new CloudRuntimeException(
> > +                    "Unable to stop the virtual machine" + vm );
> >          }
> >      }
> >
> > diff --git a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
> > b/server/src/com/cloud/vm/VirtualMachineManagerImpl.
> > index 2c2986f..e320ff1 100755
> > --- a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
> > +++ b/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
> > @@ -1083,7 +1083,11 @@ public class VirtualMachineManagerImpl extends
> > ManagerBase implements VirtualMac
> >              vmGuru.finalizeStop(profile, answer);
> >
> >          } catch (AgentUnavailableException e) {
> > +            s_logger.error("Unable to stop vm, agent unavailable: " +
> > e.toString());
> > +            throw new CloudRuntimeException("Unable to stop vm, agent
> > unavailable");
> >          } catch (OperationTimedoutException e) {
> > +            s_logger.error("Unable to stop vm, operation timed out: " +
> > e.toString());
> > +            throw new CloudRuntimeException("Unable to stop vm,
> > + operation
> > timed out");
> >          } finally {
> >              if (!stopped) {
> >                  if (!forced) {
> >
> >
> > On Thu, Apr 25, 2013 at 2:06 PM, Chip Childers
> > <ch...@sungard.com>wrote:
> >
> > > On Thu, Apr 25, 2013 at 02:01:56PM -0600, Marcus Sorensen wrote:
> > > > I didn't mark this one as a blocker, but it's still pretty bad for
> > > someone
> > > > managing VMs in an automated fashion. Trying to stop a VM when the
> > > > KVM agent is disconnected reports success.
> > > >
> > > > https://issues.apache.org/jira/browse/CLOUDSTACK-2195
> > >
> > > I'll pull a fix in, if we have one ready before the final blockers.
> > > Otherwise I'd pull it into a 4.1.1 release.
> > >
>

RE: [ACS41] new critical bug

Posted by Edison Su <Ed...@citrix.com>.
This code will do the magic:
} finally {
if (!stopped) {
                if (!forced) {
                    s_logger.warn("Unable to stop vm " + vm);
                    try {
                        stateTransitTo(vm, Event.OperationFailed, vm.getHostId());
                    } catch (NoTransitionException e) {
                        s_logger.warn("Unable to transition the state " + vm);
                    }
                    return false;
                } else {
                    s_logger.warn("Unable to actually stop " + vm + " but continue with release because it's a force stop");
                    vmGuru.finalizeStop(profile, answer);
                }
            }
}


Basically, it means, if stop failed and it's not force stop, then mark the vm as running. 
I think the logic here sounds correct, as cloudstack can't delete the vm(due to the connection between mgt server and kvm agent is broken), then all the operations on the VM should fail, and the VM status shouldn't be changed.

But the problem is, the stop api call should fail, as stop vm actually failed.
Could you try the following patch:

diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java b/server/src/com/cloud/vm/UserVmManagerImpl.java
index 7bf04ec..ff20c54 100755
--- a/server/src/com/cloud/vm/UserVmManagerImpl.java
+++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
@@ -3684,10 +3684,15 @@ public class UserVmManagerImpl extends ManagerBase implements UserVmManager, Use
         }
 
         UserVO user = _userDao.findById(userId);
-
+        boolean status = false;
         try {
             VirtualMachineEntity vmEntity = _orchSrvc.getVirtualMachine(vm.getUuid());
-            vmEntity.stop(new Long(userId).toString());            
+            status = vmEntity.stop(new Long(userId).toString());   
+            if (status) {
+               return _vmDao.findById(vmId);
+            } else {
+               return null;
+            }
         } catch (ResourceUnavailableException e) {
             throw new CloudRuntimeException(
                     "Unable to contact the agent to stop the virtual machine "
@@ -3698,7 +3703,7 @@ public class UserVmManagerImpl extends ManagerBase implements UserVmManager, Use
                             + vm, e);
         }
 
-        return _vmDao.findById(vmId);
+        
     }
 
     @Override


> -----Original Message-----
> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
> Sent: Thursday, April 25, 2013 4:46 PM
> To: dev@cloudstack.apache.org
> Subject: Re: [ACS41] new critical bug
> 
> I tried a few things, including throwing a CloudRuntimeException in several
> places where I thought it made sense, such as an empty
> AgentUnavailableException catch block, but it doesn't seem to do anything at
> all, I don't see it in the logs, even though I do see the debug code that I
> placed next to it.  So I give up for now :-)
> 
> diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
> b/server/src/com/cloud/vm/UserVmManagerImpl.java
> index 7bf04ec..0373690 100755
> --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
> +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
> @@ -685,7 +685,8 @@ public class UserVmManagerImpl extends
> ManagerBase implements UserVmManager, Use
>          if (status) {
>              return status;
>          } else {
> -            return status;
> +            throw new CloudRuntimeException(
> +                    "Unable to stop the virtual machine" + vm );
>          }
>      }
> 
> diff --git a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
> b/server/src/com/cloud/vm/VirtualMachineManagerImpl.
> index 2c2986f..e320ff1 100755
> --- a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
> +++ b/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
> @@ -1083,7 +1083,11 @@ public class VirtualMachineManagerImpl extends
> ManagerBase implements VirtualMac
>              vmGuru.finalizeStop(profile, answer);
> 
>          } catch (AgentUnavailableException e) {
> +            s_logger.error("Unable to stop vm, agent unavailable: " +
> e.toString());
> +            throw new CloudRuntimeException("Unable to stop vm, agent
> unavailable");
>          } catch (OperationTimedoutException e) {
> +            s_logger.error("Unable to stop vm, operation timed out: " +
> e.toString());
> +            throw new CloudRuntimeException("Unable to stop vm,
> + operation
> timed out");
>          } finally {
>              if (!stopped) {
>                  if (!forced) {
> 
> 
> On Thu, Apr 25, 2013 at 2:06 PM, Chip Childers
> <ch...@sungard.com>wrote:
> 
> > On Thu, Apr 25, 2013 at 02:01:56PM -0600, Marcus Sorensen wrote:
> > > I didn't mark this one as a blocker, but it's still pretty bad for
> > someone
> > > managing VMs in an automated fashion. Trying to stop a VM when the
> > > KVM agent is disconnected reports success.
> > >
> > > https://issues.apache.org/jira/browse/CLOUDSTACK-2195
> >
> > I'll pull a fix in, if we have one ready before the final blockers.
> > Otherwise I'd pull it into a 4.1.1 release.
> >

Re: [ACS41] new critical bug

Posted by Marcus Sorensen <sh...@gmail.com>.
I tried a few things, including throwing a CloudRuntimeException in several
places where I thought it made sense, such as an empty
AgentUnavailableException catch block, but it doesn't seem to do anything
at all, I don't see it in the logs, even though I do see the debug code
that I placed next to it.  So I give up for now :-)

diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
b/server/src/com/cloud/vm/UserVmManagerImpl.java
index 7bf04ec..0373690 100755
--- a/server/src/com/cloud/vm/UserVmManagerImpl.java
+++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
@@ -685,7 +685,8 @@ public class UserVmManagerImpl extends ManagerBase
implements UserVmManager, Use
         if (status) {
             return status;
         } else {
-            return status;
+            throw new CloudRuntimeException(
+                    "Unable to stop the virtual machine" + vm );
         }
     }

diff --git a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
b/server/src/com/cloud/vm/VirtualMachineManagerImpl.
index 2c2986f..e320ff1 100755
--- a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
+++ b/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
@@ -1083,7 +1083,11 @@ public class VirtualMachineManagerImpl extends
ManagerBase implements VirtualMac
             vmGuru.finalizeStop(profile, answer);

         } catch (AgentUnavailableException e) {
+            s_logger.error("Unable to stop vm, agent unavailable: " +
e.toString());
+            throw new CloudRuntimeException("Unable to stop vm, agent
unavailable");
         } catch (OperationTimedoutException e) {
+            s_logger.error("Unable to stop vm, operation timed out: " +
e.toString());
+            throw new CloudRuntimeException("Unable to stop vm, operation
timed out");
         } finally {
             if (!stopped) {
                 if (!forced) {


On Thu, Apr 25, 2013 at 2:06 PM, Chip Childers <ch...@sungard.com>wrote:

> On Thu, Apr 25, 2013 at 02:01:56PM -0600, Marcus Sorensen wrote:
> > I didn't mark this one as a blocker, but it's still pretty bad for
> someone
> > managing VMs in an automated fashion. Trying to stop a VM when the KVM
> > agent is disconnected reports success.
> >
> > https://issues.apache.org/jira/browse/CLOUDSTACK-2195
>
> I'll pull a fix in, if we have one ready before the final blockers.
> Otherwise I'd pull it into a 4.1.1 release.
>

Re: [ACS41] new critical bug

Posted by Chip Childers <ch...@sungard.com>.
On Thu, Apr 25, 2013 at 02:01:56PM -0600, Marcus Sorensen wrote:
> I didn't mark this one as a blocker, but it's still pretty bad for someone
> managing VMs in an automated fashion. Trying to stop a VM when the KVM
> agent is disconnected reports success.
> 
> https://issues.apache.org/jira/browse/CLOUDSTACK-2195

I'll pull a fix in, if we have one ready before the final blockers.
Otherwise I'd pull it into a 4.1.1 release.