You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cloudstack.apache.org by GitBox <gi...@apache.org> on 2018/10/30 08:09:16 UTC

[GitHub] rhtyd commented on issue #2890: KVMHAMonitor thread blocks indefinitely while NFS not available

rhtyd commented on issue #2890: KVMHAMonitor thread blocks indefinitely while NFS not available
URL: https://github.com/apache/cloudstack/issues/2890#issuecomment-434207830
 
 
   @csquire I could not reproduce the issue. I guess it depends on how you're simulating NFS/shared-storage failure and what your iptables rules are. In my case I simply shutdown the NFS server instead of iptables rules and observed following:
   
   ```
   .
   2018-10-30 08:04:04,842 WARN  [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) Timed out: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 172.20.0.1 -p /export/testing/primary -m /mnt/05c30f6d-5725-3369-a8d6-b5fb37c7ba8f -h 172.20.1.10 .  Output is: 
   2018-10-30 08:04:04,843 WARN  [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) write heartbeat failed: timeout, try: 5 of 5
   2018-10-30 08:04:04,843 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) [ignored] interupted between heartbeat retries.
   2018-10-30 08:04:04,843 WARN  [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) write heartbeat failed: timeout; stopping cloudstack-agent
   2018-10-30 08:04:04,843 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 172.20.0.1 -p /export/testing/primary -m /mnt/05c30f6d-5725-3369-a8d6-b5fb37c7ba8f -c 
   2018-10-30 08:04:04,845 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) Executing while with timeout : 60000
   2018-10-30 08:04:05,007 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) (logid:) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py get_rule_logs_for_vms 
   2018-10-30 08:04:05,009 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) (logid:) Executing while with timeout : 1800000
   2018-10-30 08:04:05,109 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) (logid:) Execution is successful.
   2018-10-30 08:04:05,109 DEBUG [kvm.resource.LibvirtConnection] (UgentTask-5:null) (logid:) Looking for libvirtd connection at: qemu:///system
   2018-10-30 08:04:05,114 DEBUG [cloud.agent.Agent] (UgentTask-5:null) (logid:) Sending ping: Seq 1-157:  { Cmd , MgmtId: -1, via: 1, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"_hostVmStateReport":{"r-5-VM":{"state":"PowerOn","host":"centos7-kvm1"},"s-1-VM":{"state":"PowerOn","host":"centos7-kvm1"},"v-2-VM":{"state":"PowerOn","host":"centos7-kvm1"},"i-2-4-VM":{"state":"PowerOn","host":"centos7-kvm1"}},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":1,"wait":0}}] }
   2018-10-30 08:04:05,166 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) (logid:) Received response: Seq 1-157:  { Ans: , MgmtId: 2485476061376, via: 1, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":1,"wait":0},"result":true,"wait":0}}] }
   2018-10-30 08:04:06,762 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:0aa26d2d) Request:Seq 1-3303108851699548248:  { Cmd , MgmtId: 2485476061376, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.GetVncPortCommand":{"id":4,"name":"i-2-4-VM","wait":0}}] }
   2018-10-30 08:04:06,762 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:0aa26d2d) Processing command: com.cloud.agent.api.GetVncPortCommand
   2018-10-30 08:04:06,762 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-5:null) (logid:0aa26d2d) Looking for libvirtd connection at: qemu:///system
   2018-10-30 08:04:06,765 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:0aa26d2d) Seq 1-3303108851699548248:  { Ans: , MgmtId: 2485476061376, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVncPortAnswer":{"address":"172.20.1.10","port":5903,"result":true,"wait":0}}] }
   2018-10-30 08:04:09,871 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) Exit value is 143
   2018-10-30 08:04:09,872 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) Redirecting to /bin/systemctl stop cloudstack-agent.service
   2018-10-30 08:04:09,873 INFO  [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Stopping the agent: Reason = sig.kill
   2018-10-30 08:04:09,874 DEBUG [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Sending shutdown to management server
   2018-10-30 08:04:09,876 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Location 1: Socket Socket[addr=/172.20.0.1,port=8250,localport=43646] closed on read.  Probably -1 returned: Connection closed with -1 on reading size.
   2018-10-30 08:04:09,876 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Closing socket Socket[addr=/172.20.0.1,port=8250,localport=43646]
   2018-10-30 08:04:10,875 DEBUG [kvm.resource.LibvirtConnection] (AgentShutdownThread:null) (logid:) Looking for libvirtd connection at: qemu:///system
   ```
   
   I could see the `kvmheartbeat.sh` try 5 times every 60 seconds and eventually failing to shutdown the agent by this: `/bin/systemctl stop cloudstack-agent.service`. On the management server side, I saw the host state go from Up to Disconnected:
   
   ![screenshot from 2018-10-30 13-38-20](https://user-images.githubusercontent.com/95203/47704265-223dd100-dc49-11e8-9f71-1aa36e66e47d.png)
   
   The only change in behaviour I see is that the KVM host is not rebooted but only the agent gets shutdown.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services