You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "manasaveloori (JIRA)" <ji...@apache.org> on 2013/12/31 08:42:50 UTC

[jira] [Created] (CLOUDSTACK-5688) NPE when the KVM host is rebooted on the upgraded environment

manasaveloori created CLOUDSTACK-5688:
-----------------------------------------

             Summary: NPE when the KVM host is rebooted  on the upgraded environment
                 Key: CLOUDSTACK-5688
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5688
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: KVM, Upgrade
    Affects Versions: 4.3.0
         Environment: upgraded from 4.2.1 to 4.3
            Reporter: manasaveloori
            Priority: Critical
             Fix For: 4.3.0


Steps:
1.	Have CS 4.2.1 with KVM host.---advanced zone.
2.	Upgrade the CS to 4.3.
3.	Reboot the KVM host(did not enable maintenance)
4.	The KVM host is connected and then disconnecting from the CS.
Observing the following NPE in CS logs:


2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to BaremetalDhcpManagerImpl
2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to NetworkUsageManagerImpl
2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to BaremetalPxeManagerImpl
2013-12-31 18:03:38,128 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to PaloAltoExternalFirewallElement
2013-12-31 18:03:38,128 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to KvmServerDiscoverer
2013-12-31 18:03:38,172 DEBUG [c.c.r.ResourceState] (AgentConnectTaskPool-194:ctx-00137ad5) Resource state update: [id = 1; name = RHLE64; old state = Enabled; event = InternalCreated; new state = Enabled]
2013-12-31 18:03:38,173 DEBUG [c.c.h.Status] (AgentConnectTaskPool-194:ctx-00137ad5) Transition:[Resource state = Enabled, Agent event = AgentConnected, Host id = 1, name = RHLE64]
2013-12-31 18:03:38,185 DEBUG [c.c.h.Status] (AgentConnectTaskPool-194:ctx-00137ad5) Agent status update: [id = 1; name = RHLE64; old status = Alert; event = AgentConnected; new status = Connecting; old update count = 390; new update count = 391]
2013-12-31 18:03:38,185 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) create ClusteredAgentAttache for 1
2013-12-31 18:03:38,188 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: XcpServerDiscoverer
2013-12-31 18:03:38,189 DEBUG [c.c.h.x.d.XcpServerDiscoverer] (AgentConnectTaskPool-194:ctx-00137ad5) Not XenServer so moving on.
2013-12-31 18:03:38,189 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: HypervServerDiscoverer
2013-12-31 18:03:38,189 DEBUG [c.c.h.h.d.HypervServerDiscoverer] (AgentConnectTaskPool-194:ctx-00137ad5) Not Hyper-V hypervisor, so moving on.
2013-12-31 18:03:38,189 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: StoragePoolMonitor
2013-12-31 18:03:38,202 DEBUG [c.c.s.l.StoragePoolMonitor] (AgentConnectTaskPool-194:ctx-00137ad5) Host 1 connected, sending down storage pool information ...
2013-12-31 18:03:38,205 DEBUG [c.c.s.StorageManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Adding pool null to  host 1
2013-12-31 18:03:38,215 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387585: Sending  { Cmd , MgmtId: 6758231703598, via: 1(RHLE64), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":true,"pool":{"id":1,"uuid":"efbe1d5a-12c4-3a30-bef6-a219dfe81249","host":"10.147.40.27","path":"/export/home/manasa/primaryKVM","port":2049,"type":"NetworkFilesystem"},"localPath":"/mnt//efbe1d5a-12c4-3a30-bef6-a219dfe81249","wait":0}}] }
2013-12-31 18:03:38,220 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-6:null) Ping from 1
2013-12-31 18:03:38,228 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-6:null) Not processing PingRoutingCommand for agent id=0; can't find the host in the DB
2013-12-31 18:03:38,233 DEBUG [c.c.a.t.Request] (AgentManager-Handler-4:null) Seq 1-920387585: Processing:  { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.ModifyStoragePoolAnswer":{"poolInfo":{"host":"10.147.40.27","localPath":"/mnt//efbe1d5a-12c4-3a30-bef6-a219dfe81249","hostPath":"/export/home/manasa/primaryKVM","poolType":"NetworkFilesystem","capacityBytes":52844687360,"availableBytes":40225341440},"templateInfo":{},"result":true,"wait":0}}] }
2013-12-31 18:03:38,234 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387585: Received:  { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 10, { ModifyStoragePoolAnswer } }
2013-12-31 18:03:38,273 INFO  [o.a.c.s.d.p.DefaultHostListener] (AgentConnectTaskPool-194:ctx-00137ad5) Connection established between org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@22daf59c host + 1
2013-12-31 18:03:38,283 DEBUG [c.c.s.StorageManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Successfully set Capacity - 105689374720 for capacity type - 3 , DataCenterId - 1, HostOrPoolId - 1, PodId null
2013-12-31 18:03:38,283 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: DeploymentPlanningManagerImpl
2013-12-31 18:03:38,284 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: NetworkOrchestrator
2013-12-31 18:03:38,286 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-194:ctx-00137ad5) Host's hypervisorType is: KVM
2013-12-31 18:03:38,293 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-194:ctx-00137ad5) Sending CheckNetworkCommand to check the Network is setup correctly on Agent
2013-12-31 18:03:38,306 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387586: Sending  { Cmd , MgmtId: 6758231703598, via: 1(RHLE64), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.CheckNetworkCommand":{"networkInfoList":[{"physicalNetworkId":200}],"wait":0}}] }
2013-12-31 18:03:38,360 DEBUG [c.c.a.t.Request] (AgentManager-Handler-9:null) Seq 1-920387586: Processing:  { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 110, [{"com.cloud.agent.api.CheckNetworkAnswer":{"_reconnect":false,"result":true,"wait":0}}] }
2013-12-31 18:03:38,360 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387586: Received:  { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 110, { CheckNetworkAnswer } }
2013-12-31 18:03:38,360 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-194:ctx-00137ad5) Network setup is correct on Agent
2013-12-31 18:03:38,361 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: VmwareManagerImpl
2013-12-31 18:03:38,362 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: SecondaryStorageListener
2013-12-31 18:03:38,362 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: ClusteredVirtualMachineManagerImpl
2013-12-31 18:03:38,362 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Received startup command from hypervisor host. host id: 1
2013-12-31 18:03:38,362 INFO  [c.c.v.VirtualMachinePowerStateSyncImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Reset VM power state sync for host: 1
2013-12-31 18:03:38,361 DEBUG [c.c.a.m.AgentAttache] (AgentManager-Handler-9:null) Seq 1-920387586: No more commands found
2013-12-31 18:03:38,383 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Found 8 VMs for host 1
2013-12-31 18:03:38,385 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM s-1-VM: cs state = Running and realState = Stopped
2013-12-31 18:03:38,385 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM s-1-VM: cs state = Running and realState = Stopped
2013-12-31 18:03:38,386 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM does not require investigation so I'm marking it as Stopped: VM[SecondaryStorageVm|s-1-VM]
2013-12-31 18:03:38,386 WARN  [o.a.c.alerts] (AgentConnectTaskPool-194:ctx-00137ad5)  alertType:: 19 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name: s-1-VM, id: 1) stopped unexpectedly on host id:1, availability zone id:1, pod id:1
2013-12-31 18:03:38,396 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM is not HA enabled so we're done.
2013-12-31 18:03:38,396 ERROR [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Monitor ClusteredVirtualMachineManagerImpl says there is an error in the connect process for 1 due to null
java.lang.NullPointerException
        at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1264)
        at com.cloud.ha.HighAvailabilityManagerImpl.scheduleRestart(HighAvailabilityManagerImpl.java:346)
        at com.cloud.vm.VirtualMachineManagerImpl.compareState(VirtualMachineManagerImpl.java:2715)
        at com.cloud.vm.VirtualMachineManagerImpl.fullHostSync(VirtualMachineManagerImpl.java:2278)
        at com.cloud.vm.VirtualMachineManagerImpl.processConnect(VirtualMachineManagerImpl.java:2929)
        at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:495)
        at com.cloud.agent.manager.AgentManagerImpl.handleConnectedAgent(AgentManagerImpl.java:999)
        at com.cloud.agent.manager.AgentManagerImpl.access$000(AgentManagerImpl.java:117)
        at com.cloud.agent.manager.AgentManagerImpl$HandleAgentConnectTask.runInContext(AgentManagerImpl.java:1082)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
2013-12-31 18:03:38,398 INFO  [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Host 1 is disconnecting with event AgentDisconnected
2013-12-31 18:03:38,401 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) The next status of agent 1is Alert, current status is Connecting
2013-12-31 18:03:38,402 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Deregistering link for 1 with state Alert
2013-12-31 18:03:38,402 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Remove Agent : 1
2013-12-31 18:03:38,402 DEBUG [c.c.a.m.ConnectedAgentAttache] (AgentConnectTaskPool-194:ctx-00137ad5) Processing Disconnect.
2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer
2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.storage.listener.StoragePoolMonitor
2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.deploy.DeploymentPlanningManagerImpl
2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.hypervisor.vmware.manager.VmwareManagerImpl
2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.storage.secondary.SecondaryStorageListener
2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.network.security.SecurityGroupListener
2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.consoleproxy.ConsoleProxyListener
2013-12-31 18:03:38,407 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
2013-12-31 18:03:38,407 DEBUG [c.c.n.NetworkUsageManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Disconnected called on 1 with status Alert
2013-12-31 18:03:38,407 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.storage.LocalStoragePoolListener
2013-12-31 18:03:38,408 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.agent.manager.AgentManager



The KVM host will be in Alert state:


The log snippet from agent log:

2013-12-31 13:06:17,978 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
2013-12-31 13:06:17,979 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.147.38.164:8250
2013-12-31 13:06:18,070 INFO  [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
2013-12-31 13:06:18,070 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.147.38.164:8250
2013-12-31 13:06:18,124 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Proccess agent startup answer, agent id = 0
2013-12-31 13:06:18,125 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Set agent id 0
2013-12-31 13:06:18,125 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Startup Response Received: agent id = 0
2013-12-31 13:06:23,070 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) Connected to the server
2013-12-31 13:06:23,378 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) Lost connection to the server. Dealing with the remaining commands...
2013-12-31 13:06:28,379 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
2013-12-31 13:06:28,380 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.147.38.164:8250
2013-12-31 13:06:28,473 INFO  [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
2013-12-31 13:06:28,473 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.147.38.164:8250
2013-12-31 13:06:28,528 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Proccess agent startup answer, agent id = 0
2013-12-31 13:06:28,528 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Set agent id 0
2013-12-31 13:06:28,529 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Startup Response Received: agent id = 0


Attaching the MS and agent logs.
Dumps before and after upgrade.






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)