You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "edison su (JIRA)" <ji...@apache.org> on 2013/12/31 22:42:52 UTC

[jira] [Assigned] (CLOUDSTACK-5688) NPE when the KVM host is rebooted on the upgraded environment

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

edison su reassigned CLOUDSTACK-5688:
-------------------------------------

    Assignee: Kelven Yang  (was: edison su)

NPE related to new vmsync

> NPE when the KVM host is rebooted  on the upgraded environment
> --------------------------------------------------------------
>
>                 Key: CLOUDSTACK-5688
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5688
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: KVM, Upgrade
>    Affects Versions: 4.3.0
>         Environment: upgraded from 4.2.1 to 4.3
>            Reporter: manasaveloori
>            Assignee: Kelven Yang
>            Priority: Critical
>             Fix For: 4.3.0
>
>         Attachments: agent.log, management-server.rar, mysqldump421.dmp, mysqldump43.dmp
>
>
> Steps:
> 1.	Have CS 4.2.1 with KVM host.---advanced zone.
> 2.	Upgrade the CS to 4.3.
> 3.	Reboot the KVM host(did not enable maintenance)
> 4.	The KVM host is connected and then disconnecting from the CS.
> Observing the following NPE in CS logs:
> 2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to BaremetalDhcpManagerImpl
> 2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to NetworkUsageManagerImpl
> 2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to BaremetalPxeManagerImpl
> 2013-12-31 18:03:38,128 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to PaloAltoExternalFirewallElement
> 2013-12-31 18:03:38,128 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to KvmServerDiscoverer
> 2013-12-31 18:03:38,172 DEBUG [c.c.r.ResourceState] (AgentConnectTaskPool-194:ctx-00137ad5) Resource state update: [id = 1; name = RHLE64; old state = Enabled; event = InternalCreated; new state = Enabled]
> 2013-12-31 18:03:38,173 DEBUG [c.c.h.Status] (AgentConnectTaskPool-194:ctx-00137ad5) Transition:[Resource state = Enabled, Agent event = AgentConnected, Host id = 1, name = RHLE64]
> 2013-12-31 18:03:38,185 DEBUG [c.c.h.Status] (AgentConnectTaskPool-194:ctx-00137ad5) Agent status update: [id = 1; name = RHLE64; old status = Alert; event = AgentConnected; new status = Connecting; old update count = 390; new update count = 391]
> 2013-12-31 18:03:38,185 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) create ClusteredAgentAttache for 1
> 2013-12-31 18:03:38,188 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: XcpServerDiscoverer
> 2013-12-31 18:03:38,189 DEBUG [c.c.h.x.d.XcpServerDiscoverer] (AgentConnectTaskPool-194:ctx-00137ad5) Not XenServer so moving on.
> 2013-12-31 18:03:38,189 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: HypervServerDiscoverer
> 2013-12-31 18:03:38,189 DEBUG [c.c.h.h.d.HypervServerDiscoverer] (AgentConnectTaskPool-194:ctx-00137ad5) Not Hyper-V hypervisor, so moving on.
> 2013-12-31 18:03:38,189 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: StoragePoolMonitor
> 2013-12-31 18:03:38,202 DEBUG [c.c.s.l.StoragePoolMonitor] (AgentConnectTaskPool-194:ctx-00137ad5) Host 1 connected, sending down storage pool information ...
> 2013-12-31 18:03:38,205 DEBUG [c.c.s.StorageManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Adding pool null to  host 1
> 2013-12-31 18:03:38,215 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387585: Sending  { Cmd , MgmtId: 6758231703598, via: 1(RHLE64), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":true,"pool":{"id":1,"uuid":"efbe1d5a-12c4-3a30-bef6-a219dfe81249","host":"10.147.40.27","path":"/export/home/manasa/primaryKVM","port":2049,"type":"NetworkFilesystem"},"localPath":"/mnt//efbe1d5a-12c4-3a30-bef6-a219dfe81249","wait":0}}] }
> 2013-12-31 18:03:38,220 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-6:null) Ping from 1
> 2013-12-31 18:03:38,228 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-6:null) Not processing PingRoutingCommand for agent id=0; can't find the host in the DB
> 2013-12-31 18:03:38,233 DEBUG [c.c.a.t.Request] (AgentManager-Handler-4:null) Seq 1-920387585: Processing:  { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.ModifyStoragePoolAnswer":{"poolInfo":{"host":"10.147.40.27","localPath":"/mnt//efbe1d5a-12c4-3a30-bef6-a219dfe81249","hostPath":"/export/home/manasa/primaryKVM","poolType":"NetworkFilesystem","capacityBytes":52844687360,"availableBytes":40225341440},"templateInfo":{},"result":true,"wait":0}}] }
> 2013-12-31 18:03:38,234 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387585: Received:  { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 10, { ModifyStoragePoolAnswer } }
> 2013-12-31 18:03:38,273 INFO  [o.a.c.s.d.p.DefaultHostListener] (AgentConnectTaskPool-194:ctx-00137ad5) Connection established between org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@22daf59c host + 1
> 2013-12-31 18:03:38,283 DEBUG [c.c.s.StorageManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Successfully set Capacity - 105689374720 for capacity type - 3 , DataCenterId - 1, HostOrPoolId - 1, PodId null
> 2013-12-31 18:03:38,283 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: DeploymentPlanningManagerImpl
> 2013-12-31 18:03:38,284 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: NetworkOrchestrator
> 2013-12-31 18:03:38,286 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-194:ctx-00137ad5) Host's hypervisorType is: KVM
> 2013-12-31 18:03:38,293 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-194:ctx-00137ad5) Sending CheckNetworkCommand to check the Network is setup correctly on Agent
> 2013-12-31 18:03:38,306 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387586: Sending  { Cmd , MgmtId: 6758231703598, via: 1(RHLE64), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.CheckNetworkCommand":{"networkInfoList":[{"physicalNetworkId":200}],"wait":0}}] }
> 2013-12-31 18:03:38,360 DEBUG [c.c.a.t.Request] (AgentManager-Handler-9:null) Seq 1-920387586: Processing:  { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 110, [{"com.cloud.agent.api.CheckNetworkAnswer":{"_reconnect":false,"result":true,"wait":0}}] }
> 2013-12-31 18:03:38,360 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387586: Received:  { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 110, { CheckNetworkAnswer } }
> 2013-12-31 18:03:38,360 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-194:ctx-00137ad5) Network setup is correct on Agent
> 2013-12-31 18:03:38,361 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: VmwareManagerImpl
> 2013-12-31 18:03:38,362 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: SecondaryStorageListener
> 2013-12-31 18:03:38,362 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: ClusteredVirtualMachineManagerImpl
> 2013-12-31 18:03:38,362 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Received startup command from hypervisor host. host id: 1
> 2013-12-31 18:03:38,362 INFO  [c.c.v.VirtualMachinePowerStateSyncImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Reset VM power state sync for host: 1
> 2013-12-31 18:03:38,361 DEBUG [c.c.a.m.AgentAttache] (AgentManager-Handler-9:null) Seq 1-920387586: No more commands found
> 2013-12-31 18:03:38,383 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Found 8 VMs for host 1
> 2013-12-31 18:03:38,385 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM s-1-VM: cs state = Running and realState = Stopped
> 2013-12-31 18:03:38,385 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM s-1-VM: cs state = Running and realState = Stopped
> 2013-12-31 18:03:38,386 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM does not require investigation so I'm marking it as Stopped: VM[SecondaryStorageVm|s-1-VM]
> 2013-12-31 18:03:38,386 WARN  [o.a.c.alerts] (AgentConnectTaskPool-194:ctx-00137ad5)  alertType:: 19 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name: s-1-VM, id: 1) stopped unexpectedly on host id:1, availability zone id:1, pod id:1
> 2013-12-31 18:03:38,396 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM is not HA enabled so we're done.
> 2013-12-31 18:03:38,396 ERROR [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Monitor ClusteredVirtualMachineManagerImpl says there is an error in the connect process for 1 due to null
> java.lang.NullPointerException
>         at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1264)
>         at com.cloud.ha.HighAvailabilityManagerImpl.scheduleRestart(HighAvailabilityManagerImpl.java:346)
>         at com.cloud.vm.VirtualMachineManagerImpl.compareState(VirtualMachineManagerImpl.java:2715)
>         at com.cloud.vm.VirtualMachineManagerImpl.fullHostSync(VirtualMachineManagerImpl.java:2278)
>         at com.cloud.vm.VirtualMachineManagerImpl.processConnect(VirtualMachineManagerImpl.java:2929)
>         at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:495)
>         at com.cloud.agent.manager.AgentManagerImpl.handleConnectedAgent(AgentManagerImpl.java:999)
>         at com.cloud.agent.manager.AgentManagerImpl.access$000(AgentManagerImpl.java:117)
>         at com.cloud.agent.manager.AgentManagerImpl$HandleAgentConnectTask.runInContext(AgentManagerImpl.java:1082)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> 2013-12-31 18:03:38,398 INFO  [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Host 1 is disconnecting with event AgentDisconnected
> 2013-12-31 18:03:38,401 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) The next status of agent 1is Alert, current status is Connecting
> 2013-12-31 18:03:38,402 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Deregistering link for 1 with state Alert
> 2013-12-31 18:03:38,402 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Remove Agent : 1
> 2013-12-31 18:03:38,402 DEBUG [c.c.a.m.ConnectedAgentAttache] (AgentConnectTaskPool-194:ctx-00137ad5) Processing Disconnect.
> 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer
> 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
> 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.storage.listener.StoragePoolMonitor
> 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.deploy.DeploymentPlanningManagerImpl
> 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
> 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.hypervisor.vmware.manager.VmwareManagerImpl
> 2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.storage.secondary.SecondaryStorageListener
> 2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
> 2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.network.security.SecurityGroupListener
> 2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.consoleproxy.ConsoleProxyListener
> 2013-12-31 18:03:38,407 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
> 2013-12-31 18:03:38,407 DEBUG [c.c.n.NetworkUsageManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Disconnected called on 1 with status Alert
> 2013-12-31 18:03:38,407 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.storage.LocalStoragePoolListener
> 2013-12-31 18:03:38,408 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.agent.manager.AgentManager
> The KVM host will be in Alert state:
> The log snippet from agent log:
> 2013-12-31 13:06:17,978 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
> 2013-12-31 13:06:17,979 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.147.38.164:8250
> 2013-12-31 13:06:18,070 INFO  [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> 2013-12-31 13:06:18,070 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.147.38.164:8250
> 2013-12-31 13:06:18,124 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Proccess agent startup answer, agent id = 0
> 2013-12-31 13:06:18,125 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Set agent id 0
> 2013-12-31 13:06:18,125 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Startup Response Received: agent id = 0
> 2013-12-31 13:06:23,070 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) Connected to the server
> 2013-12-31 13:06:23,378 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-31 13:06:28,379 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
> 2013-12-31 13:06:28,380 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.147.38.164:8250
> 2013-12-31 13:06:28,473 INFO  [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> 2013-12-31 13:06:28,473 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.147.38.164:8250
> 2013-12-31 13:06:28,528 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Proccess agent startup answer, agent id = 0
> 2013-12-31 13:06:28,528 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Set agent id 0
> 2013-12-31 13:06:28,529 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Startup Response Received: agent id = 0
> Attaching the MS and agent logs.
> Dumps before and after upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)