Posted to users@cloudstack.apache.org by Steve Searles <ss...@zimcom.net> on 2014/11/21 16:40:33 UTC

Re: ALL Hosts Stuck in Maintenance

For some reason it is affecting every host: VMware, KVM, and XenServer.  No hosts will come out of maintenance; the NPE is the same for all.  Storage will go in and out of maintenance fine.  Weird. Any ideas?  The only way to get a host back online is to remove it and re-add it.


Steven Searles


On Nov 21, 2014, at 9:04 AM, Steve Searles <ss...@zimcom.net> wrote:

Yeah, tried all that.  Now it's affecting KVM as well. Thanks for the reply, I will dig a bit deeper.


Steven Searles


On Nov 20, 2014, at 4:00 PM, Motty Cruz <mo...@gmail.com> wrote:

Hi Steve,
Have you tried stopping and restarting ACS? Also, I would run the following on the XenServer host:
xe-toolstack-restart
It won't affect your VMs.

To restart CloudStack:
service cloudstack-management restart (on CentOS)

Thanks,
Motty
On 11/20/2014 12:55 PM, Steve Searles wrote:
Found this in the catalina.out log on the management server.

INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-4:ctx-8faa1563 job-12695) Add job-12695 into job monitoring
WARN  [c.c.a.d.ParamGenericValidationWorker] (API-Job-Executor-4:ctx-8faa1563 job-12695 ctx-81dfab11) Received unknown parameters for command cancelHostMaintenance. Unknown parameters : signatureversion expires
ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-4:ctx-8faa1563 job-12695) Unexpected exception while executing org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd


Anyone know what the unknown parameters are all about?

—Steve
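
A note on the WARN line above, offered as a hedged sketch rather than a confirmed diagnosis: as far as I know, signatureVersion and expires are optional parameters of CloudStack's API request signing (an expiry timestamp, sent together with signatureVersion=3). If that is right, the warning only means the generic parameter check does not recognise them for this command, and it is probably unrelated to the NPE, which also shows up for the session-key request in the original report. The Python below illustrates the signing scheme as I understand it (sort parameters, URL-encode values, lower-case, HMAC-SHA1, Base64). The API key, secret key and expiry value are made-up placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(params, secret_key):
    """Return (string_to_sign, signature) for a dict of API parameters."""
    # Sort by parameter name (case-insensitive) and URL-encode each value.
    pairs = sorted(params.items(), key=lambda kv: kv[0].lower())
    query = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in pairs)
    # The whole string is lower-cased before it is signed.
    to_sign = query.lower()
    digest = hmac.new(
        secret_key.encode("utf-8"), to_sign.encode("utf-8"), hashlib.sha1
    ).digest()
    return to_sign, base64.b64encode(digest).decode("ascii")


# Placeholder credentials and expiry; only the command and host id come
# from the log in this thread.
params = {
    "command": "cancelHostMaintenance",
    "id": "189c3843-8d92-419b-a8b2-e343ea02c8fd",
    "response": "json",
    "apiKey": "EXAMPLE-API-KEY",
    "signatureVersion": "3",
    "expires": "2014-11-20T21:00:00+0000",
}
to_sign, signature = sign_request(params, "EXAMPLE-SECRET-KEY")
print(to_sign)
print(signature)
```

A client that signs its calls this way would send signatureVersion and expires along with every command, which would explain why they reach cancelHostMaintenance at all.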



On Nov 20, 2014, at 3:14 PM, Steve Searles <ss...@zimcom.net> wrote:

CS 4.4.1 - 4.4.2
I am having a problem with my XenServer hosts getting stuck in maintenance.  Trying to cancel the maintenance produces the following NPE.

2014-11-20 15:04:28,575 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-14:ctx-4e8a63d4 job-12626) Add job-12626 into job monitoring
2014-11-20 15:04:28,576 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-14:ctx-4e8a63d4 job-12626) Executing AsyncJobVO {id:12626, userId: 2, accountId: 2, instanceType: Host, instanceId: 114, cmd: org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd, cmdInfo: {"id":"189c3843-8d92-419b-a8b2-e343ea02c8fd","response":"json","sessionkey":"OEXANRcg2kzKJrfGXpvCK3E6k28\u003d","ctxDetails":"{\"com.cloud.host.Host\":\"189c3843-8d92-419b-a8b2-e343ea02c8fd\"}","cmdEventType":"MAINT.CANCEL","ctxUserId":"2","httpmethod":"GET","_":"1416513869627","uuid":"189c3843-8d92-419b-a8b2-e343ea02c8fd","ctxAccountId":"2","ctxStartEventId":"140662"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 345049793560, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2014-11-20 15:04:28,577 DEBUG [c.c.a.ApiServlet] (catalina-exec-12:ctx-78ec5c48 ctx-e7500b2b) ===END===  172.23.0.1 -- GET  command=cancelHostMaintenance&id=189c3843-8d92-419b-a8b2-e343ea02c8fd&response=json&sessionkey=OEXANRcg2kzKJrfGXpvCK3E6k28%3D&_=1416513869627
2014-11-20 15:04:28,601 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-14:ctx-4e8a63d4 job-12626) Unexpected exception while executing org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd
java.lang.NullPointerException
at com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2083)
at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2140)
at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:1127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)




I have tried upgrading to the latest git build 4.4.2 and the problem still exists.   I think it started in 4.4.1 because it used to work properly in 4.4.0.  I also deleted and re-created the SSVM but that did not help either.  Does anyone have a solution or workaround?  Is there a way to manually take a host out of maintenance? I think there is more to it than setting the status in the DB?


— Steve








Re: ALL Hosts Stuck in Maintenance

Posted by Steve Searles <ss...@zimcom.net>.
Thanks for the reply. Yes, that did seem to be the issue.  I did a manual cleanup of all of the VMs that were assigned to that host, cleared out the async job queue, and restarted the management server.  This seems to have resolved the issue system-wide.


Steven Searles, CTO | ssearles@zimcom.net
Zimcom Internet Solutions  | www.zimcom.net
O: 513.231.9500  |  D: 513.233.4130


On Nov 22, 2014, at 2:54 PM, Daan Hoogland <da...@gmail.com> wrote:

Steven,

I had a look at your stack trace. It seems a VM is marked to be migrated
back during the maintenance mode cancel and it doesn't exist. Did you destroy
any of the VMs on the host while it was in maintenance?

There are a number of migration jobs, each of which pertains to a VM. For
one of them the VM does not exist. "Doesn't exist" in this case means that it
is not in the database. This is a bug, obviously. I can't be sure what
happened during maintenance mode, and thus not what exactly the nature of
the bug is. It would be nice if you could investigate this.

Daan




Re: ALL Hosts Stuck in Maintenance

Posted by Daan Hoogland <da...@gmail.com>.
Steven,

I had a look at your stack trace. It seems a VM is marked to be migrated
back during the maintenance mode cancel and it doesn't exist. Did you destroy
any of the VMs on the host while it was in maintenance?

There are a number of migration jobs, each of which pertains to a VM. For
one of them the VM does not exist. "Doesn't exist" in this case means that it
is not in the database. This is a bug, obviously. I can't be sure what
happened during maintenance mode, and thus not what exactly the nature of
the bug is. It would be nice if you could investigate this.

Daan
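
To make the failure mode described above concrete, here is a small Python model of it. This is only a sketch under stated assumptions, not the actual ResourceManagerImpl code: the Vm class, the function names and the in-memory dictionary standing in for the database are all hypothetical. It shows how one migration work item that points at a VM missing from the database breaks an unguarded loop (AttributeError in Python, the analogue of the Java NPE), and how a null check lets the cancel proceed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Vm:
    vm_id: int
    host_id: Optional[int]


def lookup_work_vms(work_vm_ids: List[int],
                    vms_by_id: Dict[int, Vm]) -> List[Optional[Vm]]:
    """Resolve each work item's VM id; a VM missing from the 'database' yields None."""
    return [vms_by_id.get(vm_id) for vm_id in work_vm_ids]


def migrating_on_host_unguarded(vms: List[Optional[Vm]], host_id: int) -> bool:
    # Dereferences every entry, so a None entry raises AttributeError.
    return any(vm.host_id == host_id for vm in vms)


def migrating_on_host_guarded(vms: List[Optional[Vm]], host_id: int) -> bool:
    # Skips entries whose VM no longer exists instead of dereferencing them.
    return any(vm is not None and vm.host_id == host_id for vm in vms)


# Hypothetical data: VM 3 was destroyed, but a migration work item still refers to it.
vms_by_id = {1: Vm(1, 114), 2: Vm(2, 115)}
resolved = lookup_work_vms([3, 2], vms_by_id)

try:
    migrating_on_host_unguarded(resolved, 114)
except AttributeError as exc:
    print("unguarded check failed:", exc)

print("guarded check:", migrating_on_host_guarded(resolved, 114))
```

In this model a single stale work item fails the check no matter which host is being taken out of maintenance, which would be consistent with every hypervisor type showing the same NPE.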




-- 
Daan

Re: ALL Hosts Stuck in Maintenance

Posted by Steve Searles <ss...@zimcom.net>.
Below is the full NPE from the catalina.out if that would help.  This is the same NPE for all hypervisor types.

INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-10:ctx-208629c7 job-12679) Add job-12679 into job monitoring
ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-10:ctx-208629c7 job-12679) Unexpected exception while executing org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd
java.lang.NullPointerException
at com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2083)
at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2140)
at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:1127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at com.sun.proxy.$Proxy148.cancelMaintenance(Unknown Source)
at org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd.execute(CancelMaintenanceCmd.java:102)
at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:141)
at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:503)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:460)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-10:ctx-208629c7 job-12679) Remove job-12679 from job monitoring
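
For anyone hitting a similar trace: the frame that matters is the first one inside CloudStack's own packages, since the frames below it are reflection, Spring AOP and thread-pool plumbing. Below is a minimal Python sketch for pulling that frame out of a pasted trace, assuming one "at ..." frame per line as in the log above. The helper name first_frame_in and the regular expression are my own and not part of any CloudStack tooling.

```python
import re

# Matches "at package.Class.method(File.java:123)" and captures class, method, location.
FRAME = re.compile(r"^\s*at\s+([\w.$]+)\.([\w$<>]+)\(([^)]*)\)\s*$")


def first_frame_in(trace_lines, package_prefix):
    """Return (class, method, location) of the first frame under package_prefix, or None."""
    for line in trace_lines:
        match = FRAME.match(line)
        if match and match.group(1).startswith(package_prefix):
            return match.group(1), match.group(2), match.group(3)
    return None


trace = """java.lang.NullPointerException
at com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2083)
at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2140)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)""".splitlines()

frame = first_frame_in(trace, "com.cloud.")
print(frame)
```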


Steven Searles, CTO | ssearles@zimcom.net
Zimcom Internet Solutions  | www.zimcom.net
O: 513.231.9500  |  D: 513.233.4130


Re: ALL Hosts Stuck in Maintenance

Posted by Steve Searles <ss...@zimcom.net>.
In my case the agent is connected fine.  I have tried even restarting the hosts.  I have the agent log in debug for the KVM server and am not seeing the mgmt server even trying to execute anything when coming out of maintenance.  I just get the NPE immediately in the management server logs.

Steven Searles, CTO | ssearles@zimcom.net
Zimcom Internet Solutions  | www.zimcom.net
O: 513.231.9500  |  D: 513.233.4130


On Nov 21, 2014, at 10:54 AM, Andrija Panic <an...@gmail.com> wrote:

Experienced similar behaviour for KVM. It seems that restarting libvirtd and
giving it some time to settle helps, and then the agent connects ... on its own...

Sent from Google Nexus 4
On Nov 20, 2014, at 3:14 PM, Steve Searles <ss...@zimcom.net><
mailto:ssearles@zimcom.net <ss...@zimcom.net>>>> wrote:

CS 4.4.1 - 4.4.2
I am having a problem with my xenserver hosts getting stuck in
maintenance.  Trying to cancel the maintenance produces the following NPE.

2014-11-20 15:04:28,575 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
(API-Job-Executor-14:ctx-4e8a63d4 job-12626) Add job-12626 into job
monitoring
2014-11-20 15:04:28,576 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
(API-Job-Executor-14:ctx-4e8a63d4 job-12626) Executing AsyncJobVO
{id:12626, userId: 2, accountId: 2, instanceType: Host, instanceId: 114,
cmd: org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd,
cmdInfo:
{"id":"189c3843-8d92-419b-a8b2-e343ea02c8fd","response":"json","sessionkey":"OEXANRcg2kzKJrfGXpvCK3E6k28\u003d","ctxDetails":"{\"com.cloud.host.Host\":\"189c3843-8d92-419b-a8b2-e343ea02c8fd\"}","cmdEventType":"MAINT.CANCEL","ctxUserId":"2","httpmethod":"GET","_":"1416513869627","uuid":"189c3843-8d92-419b-a8b2-e343ea02c8fd","ctxAccountId":"2","ctxStartEventId":"140662"},
cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
result: null, initMsid: 345049793560, completeMsid: null, lastUpdated:
null, lastPolled: null, created: null}
2014-11-20 15:04:28,577 DEBUG [c.c.a.ApiServlet]
(catalina-exec-12:ctx-78ec5c48 ctx-e7500b2b) ===END===  172.23.0.1 -- GET
command=cancelHostMaintenance&id=189c3843-8d92-419b-a8b2-e343ea02c8fd&response=json&sessionkey=OEXANRcg2kzKJrfGXpvCK3E6k28%3D&_=1416513869627
2014-11-20 15:04:28,601 ERROR [c.c.a.ApiAsyncJobDispatcher]
(API-Job-Executor-14:ctx-4e8a63d4 job-12626) Unexpected exception while
executing org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd
java.lang.NullPointerException
at
com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2083)
at
com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2140)
at
com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:1127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)




I have tried upgrading to the latest git build 4.4.2 and the problem still
exists.   I think it started in 4.4.1 because it used to work properly in
4.4.0.  I also deleted and re-created the SSVM but that did not help
either.  Does anyone have a solution or workaround?  Is there a way to
manually take a host out of maintenance? I think there is more to it than
setting the status in the DB?


— Steve










Re: ALL Hosts Stuck in Maintenance

Posted by Andrija Panic <an...@gmail.com>.
I experienced similar behaviour on KVM. Restarting libvirtd and giving it
some time to settle seems to help, and then the agent connects on its own.

On Nov 21, 2014 4:43 PM, "Steve Searles" <ss...@zimcom.net> wrote:

>  For some reason it is affecting every host.  VMware, KVM, and XenServer.
> No hosts will come out of maintenance same NPE for all.  Storage will go in
> and out of maintenance fine.  Weird. Any ideas?  The only way to get the
> host back online is to remove it and re-add it.

Re: ALL Hosts Stuck in Maintenance

Posted by Martin Emrich <ma...@empolis.com>.
OK, thanks. We found that a host can be safely removed from CloudStack and re-added without further disruption, so that is currently our way to go ;)
This even works for XenServer pool masters.

Ciao

Martin 

-----Original Message-----
From: Daan Hoogland [mailto:daan.hoogland@gmail.com]
Sent: Wednesday, 28 January 2015 11:34
To: users@cloudstack.apache.org
Subject: Re: ALL Hosts Stuck in Maintenance

Martin, my diagnosis hasn't been altered, studied, or challenged ;)

For a quick fix, a "vm != null &&" check should be added to the expression on line
2083 of ResourceManagerImpl. A more robust solution must be possible, but I don't have time to look into that right now.

--
Daan

Re: ALL Hosts Stuck in Maintenance

Posted by Daan Hoogland <da...@gmail.com>.
Martin, my diagnosis hasn't been altered, studied, or challenged ;)

For a quick fix, a "vm != null &&" check should be added to the expression
on line 2083 of ResourceManagerImpl. A more robust solution must be
possible, but I don't have time to look into that right now.
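
The kind of null guard suggested above can be sketched as follows. This is only an illustration, not the actual CloudStack source: the Vm class, the hasVmOnHost method, and the loop shape are hypothetical stand-ins, since the thread does not show the expression on line 2083. The sketch assumes the NPE comes from dereferencing a null entry while checking VMs against a host id.

```java
import java.util.Arrays;
import java.util.List;

public class MaintenanceGuard {
    /** Hypothetical stand-in for a VM record; not CloudStack's own class. */
    static final class Vm {
        private final Long hostId; // may be null when the VM is not placed on a host

        Vm(Long hostId) {
            this.hostId = hostId;
        }

        Long getHostId() {
            return hostId;
        }
    }

    /**
     * Returns true if any VM in the list is assigned to the given host.
     * Null list entries and null host ids are skipped instead of dereferenced.
     */
    static boolean hasVmOnHost(List<Vm> vms, long hostId) {
        for (Vm vm : vms) {
            // "vm != null &&" is the guard suggested in the thread; without it,
            // vm.getHostId() throws NullPointerException on a null entry.
            if (vm != null && vm.getHostId() != null && vm.getHostId() == hostId) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One placed VM, one null entry, one VM without a host.
        List<Vm> vms = Arrays.asList(new Vm(7L), null, new Vm(null));
        System.out.println(hasVmOnHost(vms, 114L));
        System.out.println(hasVmOnHost(vms, 7L));
    }
}
```

The guard only stops the exception. It does not explain why a null entry shows up in the first place, which is presumably what a more robust solution would need to address.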

On Tue, Jan 27, 2015 at 4:16 PM, Martin Emrich
<ma...@empolis.com> wrote:
> Hi!
>
> Any news on this issue? I just fell in to this pit again :(
>
> Regards,
>
> Martin



-- 
Daan

Re: ALL Hosts Stuck in Maintenance

Posted by Martin Emrich <ma...@empolis.com>.
Hi!

Any news on this issue? I just fell into this pit again :(

Regards,

Martin

-----Original Message-----
From: Martin Emrich [mailto:martin.emrich@empolis.com]
Sent: Tuesday, 25 November 2014 13:39
To: users@cloudstack.apache.org
Subject: Re: ALL Hosts Stuck in Maintenance

Hi!

Same problem here with CS 4.4.1 and 5 XenServers in one Cluster: Two hosts are in maintenance mode, and canceling maintenance mode results in "Internal Server Error". Here's my backtrace:

2014-11-25 13:08:45,448 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-54:ctx-a076520e job-765) Unexpected exception while executing org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd
java.lang.NullPointerException
    at com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2083)
    at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2140)
    at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:1127)
    at sun.reflect.GeneratedMethodAccessor400.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy148.cancelMaintenance(Unknown Source)
    at org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd.execute(CancelMaintenanceCmd.java:102)
    at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:141)
    at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:503)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:460)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2014-11-25 13:08:45,449 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-54:ctx-a076520e job-765) Complete async job-765, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530}


Ciao

Martin

Re: ALL Hosts Stuck in Maintenance

Posted by Martin Emrich <ma...@empolis.com>.
Hi!

Same problem here with CS 4.4.1 and 5 XenServers in one Cluster: Two 
hosts are in maintenance mode, and canceling maintenance mode results in 
"Internal Server Error". Here's my backtrace:

2014-11-25 13:08:45,448 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-54:ctx-a076520e job-765) Unexpected exception while executing org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd
java.lang.NullPointerException
    at com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2083)
    at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2140)
    at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:1127)
    at sun.reflect.GeneratedMethodAccessor400.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy148.cancelMaintenance(Unknown Source)
    at org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd.execute(CancelMaintenanceCmd.java:102)
    at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:141)
    at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:503)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:460)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2014-11-25 13:08:45,449 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-54:ctx-a076520e job-765) Complete async job-765, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530}


Ciao

Martin