Posted to dev@cloudstack.apache.org by Mike Tutkowski <mi...@solidfire.com> on 2014/04/28 05:44:51 UTC

[ACS4.4, XenServer] Problem starting system VMs

Hi,

I recently installed XenServer 6.2 with XS62ESP1 and XS62ESP1004 (so that
Xenserver625StorageProcessor would be used).

When I create a cloud from scratch, my SSVM starts up fine, but the CPVM ends
up in the Paused state. I have to force a shutdown of that VM, after which
CloudStack restarts it and it works. This happens consistently. The system
VMs are being deployed to the local storage of the single XS host in my one
and only cluster.

Any thoughts on that?

Also, if I try to deploy a user VM to local storage, I get the generic
InsufficientCapacityException, and the virtual router does not even start.

Can anyone create a cloud similar to the one I've described here with XS 6.2,
XS62ESP1, and XS62ESP1004? I re-ran this test using an XS 6.1 host and it
works just fine.

At the moment, this is blocking a test case I'm trying to execute to verify
code I had to write in Xenserver625StorageProcessor.

Thanks!

-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*(tm)*

Re: [ACS4.4, XenServer] Problem starting system VMs

Posted by Punith S <pu...@cloudbyte.com>.
Hi Mike,

It seems you may be running out of management IP addresses, but I'm not sure
about this error.

Try these global settings:

router.extra.public.nics = 5

router.version.check = false

Thanks.


-- 
regards,

punith s
cloudbyte.com

Re: [ACS4.4, XenServer] Problem starting system VMs

Posted by Mike Tutkowski <mi...@solidfire.com>.
I didn't mention this, but I'm using a Basic Zone.

The problem is in the VirtualRouterElement, but I'm having a little trouble
with my debugger at the moment, so I haven't traced it any deeper than
VirtualRouterElement.deployVirtualRouterInGuestNetwork.

Re: [ACS4.4, XenServer] Problem starting system VMs

Posted by Mike Tutkowski <mi...@solidfire.com>.
It is actually a networking problem (although the exception is vague):

"Unable to create a deployment for VM[DomainRouter|r-10-VM]"

Inside NetworkOrchestrator.prepareNic, prepareElement fails for the
virtual router.

Is there some network setting I might be missing for this patched version
of XenServer 6.2?

Thanks!

        // Excerpt from NetworkOrchestrator.prepareNic
        List<Provider> providersToImplement = getNetworkProviders(network.getId());

        for (NetworkElement element : networkElements) {
            if (providersToImplement.contains(element.getProvider())) {
                if (!_networkModel.isProviderEnabledInPhysicalNetwork(
                        _networkModel.getPhysicalNetworkId(network), element.getProvider().getName())) {
                    throw new CloudRuntimeException("Service provider " + element.getProvider().getName()
                            + " either doesn't exist or is not enabled in physical network id: "
                            + network.getPhysicalNetworkId());
                }

                if (s_logger.isDebugEnabled()) {
                    s_logger.debug("Asking " + element.getName() + " to prepare for " + nic);
                }

                // This is the check that fails for the virtual router: any false return
                // from prepareElement() is reported as an address-capacity problem.
                if (!prepareElement(element, network, profile, vmProfile, dest, context)) {
                    throw new InsufficientAddressCapacityException(
                            "unable to configure the dhcp service, due to insufficiant address capacity",
                            Network.class, network.getId());
                }
            }
        }
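In the excerpt above, any false return from prepareElement is reported as InsufficientAddressCapacityException, so the error can point at address capacity even when, as here, there are enough IP addresses and the real failure lies elsewhere. A minimal sketch of an alternative in which each element reports why it failed; PrepareResult, Element and prepareAll are hypothetical names for illustration, not CloudStack APIs.

```java
import java.util.List;
import java.util.Optional;

public class PrepareSketch {

    // Hypothetical result type: success, or failure with a human-readable reason.
    record PrepareResult(boolean ok, String reason) {
        static PrepareResult success() { return new PrepareResult(true, null); }
        static PrepareResult failure(String reason) { return new PrepareResult(false, reason); }
    }

    // Hypothetical stand-in for a network element such as the virtual router.
    interface Element {
        String name();
        PrepareResult prepare();
    }

    // Prepares each element in turn and returns a message for the first failure,
    // naming the element and its own reason instead of assuming an address shortage.
    static Optional<String> prepareAll(List<Element> elements) {
        for (Element e : elements) {
            PrepareResult r = e.prepare();
            if (!r.ok()) {
                return Optional.of("Element " + e.name() + " failed to prepare: " + r.reason());
            }
        }
        return Optional.empty();
    }

    // Builds a fixed-result element for the demo below.
    static Element element(String name, PrepareResult result) {
        return new Element() {
            public String name() { return name; }
            public PrepareResult prepare() { return result; }
        };
    }

    public static void main(String[] args) {
        List<Element> elements = List.of(
                element("DhcpElement", PrepareResult.success()),
                element("VirtualRouter", PrepareResult.failure("router VM failed to start")));
        System.out.println(prepareAll(elements).orElse("all elements prepared"));
    }
}
```

Returning a reason rather than a bare boolean lets the caller pick an exception type and message that match the actual failure.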

Re: [ACS4.4, XenServer] Problem starting system VMs

Posted by Jon Ludlam <jo...@eu.citrix.com>.
On 01/05/14 11:35, Dave Scott wrote:
> Hi,
>
> I think I’ve tracked this down. I believe it’s a bug in the XenServer’s event mechanism, specifically a bug where some shared state causes parallel calls to event.from to interfere with each other. From CloudStack’s point of view this manifests as
>
> * spurious SESSION_INVALID exceptions in waitForTask, which triggers cleanup (Task.destroy), which prevents the VM.start from completing, leaving the VM paused
> * empty lists of events being returned in non-timeout cases
>
> I’ve prototyped a fix together with a test case (which fails before and passes after) and made a pull request containing both:
>
> https://github.com/xapi-project/xen-api/pull/1719

Pull request looks very nice. Your second bullet point was due to the
fact that the autogenerated code couldn't cope with the immutable
database being passed in, so we're generating the snapshots from the
live db. I believe this has now changed and we can associate a database
snapshot with a context, so we could make that problem go away
completely rather than looping until the problem doesn't happen :-)

I think the snapshots fix is a nice-to-have though, so if you could make
a PR for master rather than the clearwater branch, I'll merge.

Jon


> I’d appreciate review from xapi experts, particularly Jon Ludlam (cc:d). I’ve also cc:d the main xapi development list.
>
> Cheers,
> Dave
>


Re: [ACS4.4, XenServer] Problem starting system VMs

Posted by Dave Scott <Da...@citrix.com>.
Hi,

I think I’ve tracked this down. I believe it’s a bug in XenServer’s event mechanism, specifically a bug where some shared state causes parallel calls to event.from to interfere with each other. From CloudStack’s point of view this manifests as:

* spurious SESSION_INVALID exceptions in waitForTask, which triggers cleanup (Task.destroy), which prevents the VM.start from completing, leaving the VM paused
* empty lists of events being returned in non-timeout cases
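Dave's diagnosis (parallel event.from calls interfering through shared state, and empty event lists in non-timeout cases) can be illustrated with a toy model. This is a hypothetical Java sketch, not xapi's implementation; the actual fix is in xen-api pull request 1719. The buggy reader keeps one delivery cursor shared by every caller, while the fixed reader works only from the token each caller passes in.

```java
import java.util.ArrayList;
import java.util.List;

public class EventFromSketch {

    // Toy event log; an event's index doubles as its token.
    static final List<String> log = new ArrayList<>();

    // BUGGY: one cursor shared by every caller. Whoever reads first advances
    // it, so the next caller gets an empty list even though it has never
    // seen those events.
    static int sharedCursor = 0;

    static List<String> fromShared() {
        List<String> events = new ArrayList<>(log.subList(sharedCursor, log.size()));
        sharedCursor = log.size();
        return events;
    }

    // FIXED: events since the caller's own token, plus the token to pass next time.
    record Batch(List<String> events, int nextToken) {}

    static Batch fromToken(int token) {
        return new Batch(new ArrayList<>(log.subList(token, log.size())), log.size());
    }

    public static void main(String[] args) {
        log.add("task/1 progress=0.5");
        log.add("task/1 status=success");

        // Two independent watchers (think two waitForTask loops), interleaved.
        System.out.println("buggy A: " + fromShared());
        System.out.println("buggy B: " + fromShared());
        System.out.println("fixed A: " + fromToken(0).events());
        System.out.println("fixed B: " + fromToken(0).events());
    }
}
```

Running main prints both events for buggy A, an empty list for buggy B, and both events for each fixed reader. Passing the token explicitly keeps each watcher's position private, which is exactly the property the shared cursor breaks.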

I’ve prototyped a fix together with a test case (which fails before and passes after) and made a pull request containing both:

https://github.com/xapi-project/xen-api/pull/1719

I’d appreciate review from xapi experts, particularly Jon Ludlam (cc:d). I’ve also cc:d the main xapi development list.

Cheers,
Dave
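Dave's CloudStack-side patch, quoted below from his 28 April message, catches SESSION_INVALID in XenServerConnection and transparently logs back in, and his second point argues the XenAPI bindings themselves should do this. A minimal sketch of that wrapper idea, assuming a hypothetical Transport interface and SessionInvalidException; these are not the com.xensource.xenapi classes and are not meant to mirror their signatures.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RelogSketch {

    // Hypothetical exception a transport throws when the server rejects the session.
    static class SessionInvalidException extends Exception {
        SessionInvalidException(String session) { super("SESSION_INVALID: " + session); }
    }

    // Hypothetical low-level RPC transport.
    interface Transport {
        String login() throws Exception;
        Object call(String session, String method) throws Exception;
    }

    // Wraps a transport: on SESSION_INVALID it logs in again and retries once,
    // so higher-level code never has to handle an expired session itself.
    static final class RelogConnection {
        private final Transport transport;
        private String session;

        RelogConnection(Transport transport) throws Exception {
            this.transport = transport;
            this.session = transport.login();
        }

        Object dispatch(String method) throws Exception {
            try {
                return transport.call(session, method);
            } catch (SessionInvalidException e) {
                session = transport.login();            // re-authenticate transparently
                return transport.call(session, method); // retry exactly once
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger logins = new AtomicInteger();
        Transport fake = new Transport() {
            public String login() { return "session-" + logins.incrementAndGet(); }
            public Object call(String session, String method) throws Exception {
                // Simulate the first session being logged out behind our back.
                if (session.equals("session-1")) throw new SessionInvalidException(session);
                return method + " ok via " + session;
            }
        };
        RelogConnection conn = new RelogConnection(fake);
        System.out.println(conn.dispatch("event.from"));
    }
}
```

The sketch retries exactly once and assumes a call rejected with SESSION_INVALID did no work on the server, so repeating it is safe. It is also single-threaded for brevity; a real connection pool would need to coordinate re-login between threads without holding a lock across long blocking calls such as event.from.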

On 29 Apr 2014, at 05:15, Mike Tutkowski <mi...@solidfire.com> wrote:

> Actually, the only issue I'm noticing now is the SSVM being automatically
> paused shortly after being created (while creating a new cloud).
> 
> If I go to XenCenter and forcefully shut the VM down, CloudStack restarts
> it OK.
> 
> 
> On Mon, Apr 28, 2014 at 7:34 PM, Mike Tutkowski <
> mike.tutkowski@solidfire.com> wrote:
> 
>> Figured I'd CC Anthony and Edison to see if they have any input on this
>> (it looks like most of the changes on the relevant file
>> (Xenserver625StorageProcessor.java) were performed by one or the other).
>> 
>> 
>> On Mon, Apr 28, 2014 at 12:40 PM, Mike Tutkowski <
>> mike.tutkowski@solidfire.com> wrote:
>> 
>>> Thanks for the reply, guys.
>>> 
>>> Just wanted to point out that this is on 4.4 for me (although the issue
>>> may also be present on master).
>>> 
>>> I have a sufficient number of IP addresses for both system and user VMs,
>>> so that should be OK (but good thought, Punith).
>>> 
>>> I plan to continue debugging this later this afternoon, but have been in
>>> meetings all morning.
>>> 
>>> Thanks!
>>> 
>>> 
>>> On Mon, Apr 28, 2014 at 10:41 AM, Dave Scott <Da...@citrix.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> (sorry to reply to my own email!)
>>>> 
>>>> On 28 Apr 2014, at 11:42, Dave Scott <Da...@citrix.com> wrote:
>>>> 
>>>>> 
>>>>> Hi Mike,
>>>>> 
>>>>> On 28 Apr 2014, at 04:44, Mike Tutkowski <mi...@solidfire.com> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I recently installed 6.2 with XS62ESP1 and XS62ESP1004 (so that
>>>>>> Xenserver625StorageProcessor would be utilized).
>>>>>> 
>>>>>> When I create a cloud from scratch, my SSVM starts up fine, but CPVM ends
>>>>>> up in the Paused state. I have to force a shutdown of that VM and then
>>>>>> CloudStack restarts it and it works. This consistently happens. The system
>>>>>> VMs are being deployed to the local storage of the one XS host I have in my
>>>>>> one and only cluster.
>>>>>> 
>>>>>> Any thoughts on that?
>>>>> 
>>>>> I'm seeing the same symptom on my test cloud with 6.2 and XS62ESP1004.
>>>>> I think there's a problem with XenAPI session and task handling in the
>>>>> cloudstack master branch, although I've not tracked it down yet. In my
>>>>> management server log I see:
>>>>> 
>>>>> WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-5:ctx-47dccee1) Unable to start VM(v-2-VM) on host(1c4a31e9-469e-45c3-a0ad-9792ac7b20f6) due to You gave an invalid session reference.  It may have been invalidated by a server restart, or timed out.  You should get a new session handle, using one of the session.login_ calls.  This error does not invalidate the current connection.  The handle parameter echoes the bad value given.
>>>>> You gave an invalid session reference.  It may have been invalidated by a server restart, or timed out.  You should get a new session handle, using one of the session.login_ calls.  This error does not invalidate the current connection.  The handle parameter echoes the bad value given.
>>>>>       at com.xensource.xenapi.Types.checkResponse(Types.java:218)
>>>>>       at com.xensource.xenapi.Connection.dispatch(Connection.java:395)
>>>>>       at com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:463)
>>>>>       at com.xensource.xenapi.Event.from(Event.java:270)
>>>>>       at org.apache.cloudstack.hypervisor.xenserver.XenServerResourceNewBase.waitForTask(XenServerResourceNewBase.java:113)
>>>>>       at com.cloud.hypervisor.xen.resource.CitrixResourceBase.startVM(CitrixResourceBase.java:3455)
>>>>> 
>>>>> Somehow the XenAPI session being used by the Event.from in the
>>>>> XenServerResourceNewBase.waitForTask (used for recent 6.2 XenServers only)
>>>>> is being logged-out somewhere. When this happens, the cloudstack cleanup
>>>>> code calls Task.cancel and Task.destroy, and then the XenServer
>>>>> Async.VM.start fails trying to update Task.progress before it internally
>>>>> calls VM.unpause.
>>>>> 
>>>>> I made a hack to disable caching of Connection/sessions:
>>>>> 
>>>>> 
>>>>> https://github.com/djs55/cloudstack/commit/a388b71279086e42710e26340df0632d0d8135e4
>>>> 
>>>> For reference / experimentation, I've made a slightly more plausible
>>>> patch:
>>>> 
>>>> 
>>>> https://github.com/djs55/cloudstack/commit/9d40f56c6384d04a5f0fb22e5b97530c0164e0b2
>>>> 
>>>> It catches the SESSION_INVALID in the XenServerConnection and
>>>> transparently logs back in. This would prevent the higher level bits of the
>>>> XenServer plugin from having to deal with sessions being expired beneath
>>>> them.
>>>> 
>>>> Cheers,
>>>> Dave
>>>> 
>>>>> 
>>>>> I suspect this now leaks Connections/sessions, but the symptom goes
>>>>> away.
>>>>> 
>>>>> So far my thoughts are:
>>>>> 
>>>>> 1. we need to find who's calling session.logout and why -- this will
>>>>> help fix the problem in the short term
>>>>> 
>>>>> 2. The XenServer XenAPI bindings are harder to use than they should be
>>>>> (IMHO). In particular I think the bindings should take care of handling
>>>>> SESSION_INVALID exceptions and re-authenticating transparently, to avoid
>>>>> polluting the cloudstack code with rarely-used exception handlers.
>>>>> 
>>>>> 3. the semantics of XenAPI task.destroy could be improved: instead of
>>>>> immediately removing the task (which then causes cleanup code to fail
>>>>> randomly it seems), it should be more like Unix waitpid with NOHANG i.e.
>>>>> set a bit which says, "I'm done with this. Destroy it when you are finished
>>>>> with it."
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Also, if I try to kick off a user VM to local storage, I get the
>>>>>> general-purpose InsufficientCapacityException and the virtual router does
>>>>>> not even start up.
>>>>> 
>>>>> No idea about this one :)
>>>>> 
>>>>> Cheers,
>>>>> Dave
>>>>> 
>>>>>> 
>>>>>> Can anyone create a similar cloud to what I've described here with XS 6.2,
>>>>>> XS62ESP1, and XS62ESP1004? I re-ran this test using a XS 6.1 host and it
>>>>>> works just fine.
>>>>>> 
>>>>>> At the moment, this is blocking a test case I'm trying to execute to verify
>>>>>> code I had to write in Xenserver625StorageProcessor.
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 
> 

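The third point in Dave's 28 April analysis, quoted above, proposes that task.destroy behave like a request to "destroy it when you are finished" rather than an immediate removal, so that an early cleanup cannot break a still-running Async.VM.start. A hypothetical Java sketch of those semantics; TaskTable, create, destroy, setProgress and finish are illustrative names, not XenAPI calls.

```java
import java.util.HashMap;
import java.util.Map;

public class DeferredDestroySketch {

    static final class Task {
        double progress = 0.0;
        boolean finished = false;
        boolean destroyRequested = false;
    }

    // Hypothetical task table standing in for the server's task records.
    static final class TaskTable {
        private final Map<String, Task> tasks = new HashMap<>();

        void create(String id) { tasks.put(id, new Task()); }

        boolean exists(String id) { return tasks.containsKey(id); }

        // Worker side: progress updates keep working after a destroy request;
        // with an immediate destroy, this lookup would fail instead.
        void setProgress(String id, double p) {
            Task t = tasks.get(id);
            if (t == null) throw new IllegalStateException("task " + id + " no longer exists");
            t.progress = p;
        }

        // Worker side: on completion, reap the record if the client already let go.
        void finish(String id) {
            Task t = tasks.get(id);
            t.finished = true;
            if (t.destroyRequested) tasks.remove(id);
        }

        // Client side: "I'm done with this. Destroy it when you are finished with it."
        void destroy(String id) {
            Task t = tasks.get(id);
            if (t == null) return;
            if (t.finished) tasks.remove(id);
            else t.destroyRequested = true;
        }
    }

    public static void main(String[] args) {
        TaskTable table = new TaskTable();
        table.create("start-vm");
        table.destroy("start-vm");          // client cleanup runs early
        table.setProgress("start-vm", 0.9); // worker can still report progress
        table.finish("start-vm");           // e.g. after the VM has been unpaused
        System.out.println(table.exists("start-vm"));
    }
}
```

In this model the failure sequence Dave describes (destroy while VM.start is still running) no longer makes the later progress update fail; the record disappears only once the task finishes, and main prints false.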

Re: [ACS4.4, XenServer] Problem starting system VMs

Posted by Mike Tutkowski <mi...@solidfire.com>.
Actually, the only issue I'm noticing now is the SSVM being automatically
paused shortly after being created (while creating a new cloud).

If I go to XenCenter and forcefully shut the VM down, CloudStack restarts
it OK.

Re: [ACS4.4, XenServer] Problem starting system VMs

Posted by Mike Tutkowski <mi...@solidfire.com>.
Figured I'd CC Anthony and Edison to see if they have any input on this (it
looks like most of the changes on the relevant file
(Xenserver625StorageProcessor.java) were performed by one or the other).


On Mon, Apr 28, 2014 at 12:40 PM, Mike Tutkowski <
mike.tutkowski@solidfire.com> wrote:

> Thanks for the reply, guys.
>
> Just wanted to point out that this is on 4.4 for me (although the issue
> may also be present on master).
>
> I have a sufficient number of IP addresses for both system and user VMs,
> so that should be OK (but good thought, Punith).
>
> I plan to continue debugging this later this afternoon, but have been in
> meetings all morning.
>
> Thanks!
>
>
> On Mon, Apr 28, 2014 at 10:41 AM, Dave Scott <Da...@citrix.com> wrote:
>
>> Hi,
>>
>> (sorry to reply to my own email!)
>>
>> On 28 Apr 2014, at 11:42, Dave Scott <Da...@citrix.com> wrote:
>>
>> >
>> > Hi Mike,
>> >
>> > On 28 Apr 2014, at 04:44, Mike Tutkowski <mi...@solidfire.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> I recently installed 6.2 with XS62ESP1 and XS62ESP1004 (so that
>> >> Xenserver625StorageProcessor would be utilized).
>> >>
>> >> When I create a cloud from scratch, my SSVM starts up fine, but CPVM ends
>> >> up in the Paused state. I have to force a shutdown of that VM and then
>> >> CloudStack restarts it and it works. This consistently happens. The system
>> >> VMs are being deployed to the local storage of the one XS host I have in my
>> >> one and only cluster.
>> >>
>> >> Any thoughts on that?
>> >
>> > I'm seeing the same symptom on my test cloud with 6.2 and XS62ESP1004.
>> > I think there's a problem with XenAPI session and task handling in the
>> > cloudstack master branch, although I've not tracked it down yet. In my
>> > management server log I see:
>> >
>> > WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-5:ctx-47dccee1) Unable to start VM(v-2-VM) on host(1c4a31e9-469e-45c3-a0ad-9792ac7b20f6) due to You gave an invalid session reference.  It may have been invalidated by a server restart, or timed out.  You should get a new session handle, using one of the session.login_ calls.  This error does not invalidate the current connection.  The handle parameter echoes the bad value given.
>> > You gave an invalid session reference.  It may have been invalidated by a server restart, or timed out.  You should get a new session handle, using one of the session.login_ calls.  This error does not invalidate the current connection.  The handle parameter echoes the bad value given.
>> >        at com.xensource.xenapi.Types.checkResponse(Types.java:218)
>> >        at com.xensource.xenapi.Connection.dispatch(Connection.java:395)
>> >        at com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:463)
>> >        at com.xensource.xenapi.Event.from(Event.java:270)
>> >        at org.apache.cloudstack.hypervisor.xenserver.XenServerResourceNewBase.waitForTask(XenServerResourceNewBase.java:113)
>> >        at com.cloud.hypervisor.xen.resource.CitrixResourceBase.startVM(CitrixResourceBase.java:3455)
>> >
>> > Somehow the XenAPI session being used by the Event.from in the
>> > XenServerResourceNewBase.waitForTask (used for recent 6.2 XenServers only)
>> > is being logged-out somewhere. When this happens, the cloudstack cleanup
>> > code calls Task.cancel and Task.destroy, and then the XenServer
>> > Async.VM.start fails trying to update Task.progress before it internally
>> > calls VM.unpause.
>> >
>> > I made a hack to disable caching of Connection/sessions:
>> >
>> > https://github.com/djs55/cloudstack/commit/a388b71279086e42710e26340df0632d0d8135e4
>>
>> For reference / experimentation, I've made a slightly more plausible
>> patch:
>>
>>
>> https://github.com/djs55/cloudstack/commit/9d40f56c6384d04a5f0fb22e5b97530c0164e0b2
>>
>> It catches the SESSION_INVALID in the XenServerConnection and
>> transparently logs back in. This would prevent the higher level bits of the
>> XenServer plugin from having to deal with sessions being expired beneath
>> them.
>>
>> Chers,
>> Dave
>>
>> >
>> > I suspect this now leaks Connections/sessions, but the symptom goes
>> away.
>> >
>> > So far my thoughts are:
>> >
>> > 1. we need to find who's calling session.logout and why -- this will
>> help fix the problem in the short term
>> >
>> > 2. The XenServer XenAPI bindings are harder to use than they should be
>> (IMHO). In particular I think the bindings should take care of handling
>> SESSION_INVALID exceptions and re-authenticating transparently, to avoid
>> polluting the cloudstack code with rarely-used exception handlers.
>> >
>> > 3. the semantics of XenAPI task.destroy could be improved: instead of
>> immediately removing the task (which then causes cleanup code to fail
>> randomly it seems), it should be more like Unix waitpid with NOHANG i.e.
>> set a bit which says, "I'm done with this. Destroy it when you are finished
>> with it."
>> >
>> >
>> >>
>> >> Also, if I try to kick off a user VM to local storage, I get the
>> >> general-purpose InsufficientCapacityException and the virtual router
>> does
>> >> not even start up.
>> >
>> > No idea about this one :)
>> >
>> > Cheers,
>> > Dave
>> >
>> >>
>> >> Can anyone create a similar cloud to what I've described here with XS
>> 6.2,
>> >> XS62ESP1, and XS62ESP1004? I re-ran this test using a XS 6.1 host and
>> it
>> >> works just fine.
>> >>
>> >> At the moment, this is blocking a test case I'm trying to execute to
>> verify
>> >> code I had to write in Xenserver625StorageProcessor.
>> >>
>> >> Thanks!
>> >>
>> >> --
>> >> *Mike Tutkowski*
>> >> *Senior CloudStack Developer, SolidFire Inc.*
>> >> e: mike.tutkowski@solidfire.com
>> >> o: 303.746.7302
>> >> Advancing the way the world uses the
>> >> cloud<http://solidfire.com/solution/overview/?video=play>
>> >> *(tm)*
>> >
>>
>>
>
>
> --
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkowski@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the cloud<http://solidfire.com/solution/overview/?video=play>
> *(tm)*
>



-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*(tm)*

Re: [ACS4.4, XenServer] Problem starting system VMs

Posted by Mike Tutkowski <mi...@solidfire.com>.
Thanks for the reply, guys.

Just wanted to point out that this is on 4.4 for me (although the issue may
also be present on master).

I have a sufficient number of IP addresses for both system and user VMs, so
that should be OK (but good thought, Punith).

I plan to continue debugging this later this afternoon, but have been in
meetings all morning.

Thanks!


On Mon, Apr 28, 2014 at 10:41 AM, Dave Scott <Da...@citrix.com> wrote:

> Hi,
>
> (sorry to reply to my own email!)
>
>
> For reference / experimentation, I've made a slightly more plausible patch:
>
>
> https://github.com/djs55/cloudstack/commit/9d40f56c6384d04a5f0fb22e5b97530c0164e0b2
>
> It catches the SESSION_INVALID in the XenServerConnection and
> transparently logs back in. This would prevent the higher level bits of the
> XenServer plugin from having to deal with sessions being expired beneath
> them.
>
> Cheers,
> Dave
>
>
>


-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*(tm)*

Re: [ACS4.4, XenServer] Problem starting system VMs

Posted by Dave Scott <Da...@citrix.com>.
Hi,

(sorry to reply to my own email!)

On 28 Apr 2014, at 11:42, Dave Scott <Da...@citrix.com> wrote:

> I made a hack to disable caching of Connection/sessions:
> 
> https://github.com/djs55/cloudstack/commit/a388b71279086e42710e26340df0632d0d8135e4

For reference / experimentation, I’ve made a slightly more plausible patch:

https://github.com/djs55/cloudstack/commit/9d40f56c6384d04a5f0fb22e5b97530c0164e0b2

It catches the SESSION_INVALID error in the XenServerConnection and transparently logs back in. This would prevent the higher-level bits of the XenServer plugin from having to deal with sessions being expired beneath them.
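A minimal Java sketch of the "catch SESSION_INVALID and transparently log back in" idea, for illustration only. The exception class, the ReloggingConnection wrapper and the login supplier are hypothetical stand-ins, not the XenAPI Java bindings or the code in the linked commit. The sketch assumes the server rejects a call made with a stale session before doing any work, so retrying that call once is safe.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Minimal sketch: all types below are hypothetical stand-ins, not the XenAPI Java bindings.
public class ReloginSketch {

    /** Hypothetical stand-in for the server rejecting a stale session (SESSION_INVALID). */
    static class SessionInvalidException extends RuntimeException {
        SessionInvalidException(String sessionRef) {
            super("SESSION_INVALID: " + sessionRef);
        }
    }

    /** Caches one session and transparently logs back in once if the server rejects it. */
    static class ReloggingConnection {
        private final Supplier<String> login; // hypothetical: would wrap a session.login_ call
        private String sessionRef;            // cached session handle, null until first use
        int logins = 0;                       // counts logins, for illustration only

        ReloggingConnection(Supplier<String> login) {
            this.login = login;
        }

        private void relogin() {
            sessionRef = login.get();
            logins++;
        }

        // synchronized keeps the sketch simple; it serializes all calls on this connection.
        synchronized <T> T dispatch(Function<String, T> call) {
            if (sessionRef == null) {
                relogin();
            }
            try {
                return call.apply(sessionRef);
            } catch (SessionInvalidException e) {
                // The cached session was logged out or timed out beneath us:
                // get a fresh handle and retry exactly once.
                relogin();
                return call.apply(sessionRef);
            }
        }
    }

    public static void main(String[] args) {
        int[] issued = {0};
        ReloggingConnection conn =
                new ReloggingConnection(() -> "session-" + (++issued[0]));
        // Fake server call: only "session-2" is accepted, as if "session-1" had been logged out.
        String result = conn.dispatch(ref -> {
            if (!ref.equals("session-2")) {
                throw new SessionInvalidException(ref);
            }
            return "ok via " + ref;
        });
        System.out.println(result);      // ok via session-2
        System.out.println(conn.logins); // 2
    }
}
```

Retrying exactly once, rather than looping, means a login that keeps failing still surfaces as an error. Synchronizing the whole dispatch keeps the sketch short, but it would serialize long-polling calls such as Event.from; a real version would more likely guard only the session swap.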

Cheers,
Dave



Re: [ACS4.4, XenServer] Problem starting system VMs

Posted by Dave Scott <Da...@citrix.com>.
Hi Mike,

On 28 Apr 2014, at 04:44, Mike Tutkowski <mi...@solidfire.com> wrote:

> Hi,
> 
> I recently installed 6.2 with XS62ESP1 and XS62ESP1004 (so that
> Xenserver625StorageProcessor would be utilized).
> 
> When I create a cloud from scratch, my SSVM starts up fine, but CPVM ends
> up in the Paused state. I have to force a shutdown of that VM and then
> CloudStack restarts it and it works. This consistently happens. The system
> VMs are being deployed to the local storage of the one XS host I have in my
> one and only cluster.
> 
> Any thoughts on that?

I’m seeing the same symptom on my test cloud with 6.2 and XS62ESP1004. I think there’s a problem with XenAPI session and task handling in the CloudStack master branch, although I’ve not tracked it down yet. In my management server log I see:

WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-5:ctx-47dccee1) Unable to start VM(v-2-VM) on host(1c4a31e9-469e-45c3-a0ad-9792ac7b20f6) due to You gave an invalid session reference.  It may have been invalidated by a server restart, or timed out.  You should get a new session handle, using one of the session.login_ calls.  This error does not invalidate the current connection.  The handle parameter echoes the bad value given.
You gave an invalid session reference.  It may have been invalidated by a server restart, or timed out.  You should get a new session handle, using one of the session.login_ calls.  This error does not invalidate the current connection.  The handle parameter echoes the bad value given.
        at com.xensource.xenapi.Types.checkResponse(Types.java:218)
        at com.xensource.xenapi.Connection.dispatch(Connection.java:395)
        at com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:463)
        at com.xensource.xenapi.Event.from(Event.java:270)
        at org.apache.cloudstack.hypervisor.xenserver.XenServerResourceNewBase.waitForTask(XenServerResourceNewBase.java:113)
        at com.cloud.hypervisor.xen.resource.CitrixResourceBase.startVM(CitrixResourceBase.java:3455)

Somehow the XenAPI session being used by the Event.from call in XenServerResourceNewBase.waitForTask (used for recent 6.2 XenServers only) is being logged out somewhere. When this happens, the CloudStack cleanup code calls Task.cancel and Task.destroy, and then the XenServer Async.VM.start fails trying to update Task.progress before it internally calls VM.unpause, which leaves the VM in the Paused state.
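To make that ordering concrete, here is a toy Java model. The Task class, PowerState enum and asyncStart method are all hypothetical, not XenServer or CloudStack code. It assumes, as described above, that the start operation builds the domain paused, updates task progress, and only then unpauses. If client-side cleanup has already destroyed the task, the progress update fails and the VM stays paused, which would match the CPVM symptom.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the ordering described above; the classes are hypothetical, not XenServer code.
public class PausedVmModel {

    enum PowerState { PAUSED, RUNNING }

    /** Hypothetical task record that the client may destroy. */
    static class Task {
        boolean destroyed = false;
        double progress = 0.0;

        void setProgress(double p) {
            if (destroyed) {
                throw new IllegalStateException("task no longer exists");
            }
            progress = p;
        }
    }

    /**
     * Models an async VM start: the domain is built paused, task progress is
     * updated, and only then is the VM unpaused.
     */
    static PowerState asyncStart(Task task, boolean clientCleanupRan, List<String> log) {
        PowerState state = PowerState.PAUSED;
        log.add("domain built, paused");
        if (clientCleanupRan) {
            // Client gave up after its session was invalidated: cancel + destroy the task.
            task.destroyed = true;
            log.add("client destroyed task");
        }
        try {
            task.setProgress(0.9);
            state = PowerState.RUNNING; // stands in for the internal unpause
            log.add("unpaused");
        } catch (IllegalStateException e) {
            log.add("start aborted: " + e.getMessage()); // unpause is never reached
        }
        return state;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(asyncStart(new Task(), false, log)); // RUNNING
        System.out.println(asyncStart(new Task(), true, log));  // PAUSED
        System.out.println(log);
    }
}
```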

I made a hack to disable caching of Connection/sessions:

https://github.com/djs55/cloudstack/commit/a388b71279086e42710e26340df0632d0d8135e4

I suspect this now leaks Connections/sessions, but the symptom goes away.

So far my thoughts are:

1. We need to find who’s calling session.logout and why; this will help fix the problem in the short term.

2. The XenServer XenAPI bindings are harder to use than they should be (IMHO). In particular, I think the bindings should handle SESSION_INVALID exceptions and re-authenticate transparently, to avoid polluting the CloudStack code with rarely used exception handlers.

3. The semantics of XenAPI task.destroy could be improved: instead of immediately removing the task (which then seems to cause cleanup code to fail randomly), it should behave more like Unix waitpid with WNOHANG, i.e. set a bit which says, “I’m done with this. Destroy it when you are finished with it.”
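Point 3 can be sketched as a small in-memory model. TaskTable and TaskRecord below are hypothetical and are not the XenAPI task API. They assume the proposed behaviour: destroy() on an unfinished task only marks it, and the record is removed once the operation completes, so a late progress update from the running operation no longer fails.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "destroy it when you are finished with it"; not the XenAPI task API.
public class DeferredDestroySketch {

    /** One server-side task record. */
    static final class TaskRecord {
        boolean finished = false;
        boolean destroyPending = false;
        double progress = 0.0;
    }

    static final class TaskTable {
        private final Map<String, TaskRecord> tasks = new HashMap<>();

        void create(String ref) {
            tasks.put(ref, new TaskRecord());
        }

        boolean exists(String ref) {
            return tasks.containsKey(ref);
        }

        /** Client-side "I'm done with this": delete now if finished, otherwise only mark it. */
        void destroy(String ref) {
            TaskRecord t = tasks.get(ref);
            if (t == null) {
                return; // already gone; destroying twice is harmless in this model
            }
            if (t.finished) {
                tasks.remove(ref);
            } else {
                t.destroyPending = true;
            }
        }

        /** Server-side progress update; still valid after the client asked for destruction. */
        void setProgress(String ref, double progress) {
            TaskRecord t = tasks.get(ref);
            if (t == null) {
                throw new IllegalStateException("no such task: " + ref);
            }
            t.progress = progress;
        }

        /** Server-side completion; a pending destroy is carried out only at this point. */
        void complete(String ref) {
            TaskRecord t = tasks.get(ref);
            if (t == null) {
                return;
            }
            t.finished = true;
            if (t.destroyPending) {
                tasks.remove(ref);
            }
        }
    }

    public static void main(String[] args) {
        TaskTable table = new TaskTable();
        table.create("task-1");
        table.destroy("task-1");                    // client cleanup runs early
        table.setProgress("task-1", 0.9);           // the running operation is not broken by it
        System.out.println(table.exists("task-1")); // true
        table.complete("task-1");
        System.out.println(table.exists("task-1")); // false
    }
}
```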


> 
> Also, if I try to kick off a user VM to local storage, I get the
> general-purpose InsufficientCapacityException and the virtual router does
> not even start up.

No idea about this one :)

Cheers,
Dave
