You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cloudstack.apache.org by GitBox <gi...@apache.org> on 2018/10/30 08:13:00 UTC
[GitHub] rhtyd edited a comment on issue #2890: KVMHAMonitor thread blocks
indefinitely while NFS not available
rhtyd edited a comment on issue #2890: KVMHAMonitor thread blocks indefinitely while NFS not available
URL: https://github.com/apache/cloudstack/issues/2890#issuecomment-434207830
@csquire I could not reproduce the issue. I guess it depends on how you're simulating NFS/shared-storage failure and what your iptables rules are. In my case I simply shutdown the NFS server instead of iptables rules and observed following:
```
.
2018-10-30 08:04:04,842 WARN [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) Timed out: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 172.20.0.1 -p /export/testing/primary -m /mnt/05c30f6d-5725-3369-a8d6-b5fb37c7ba8f -h 172.20.1.10 . Output is:
2018-10-30 08:04:04,843 WARN [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) write heartbeat failed: timeout, try: 5 of 5
2018-10-30 08:04:04,843 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) [ignored] interupted between heartbeat retries.
2018-10-30 08:04:04,843 WARN [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) write heartbeat failed: timeout; stopping cloudstack-agent
2018-10-30 08:04:04,843 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 172.20.0.1 -p /export/testing/primary -m /mnt/05c30f6d-5725-3369-a8d6-b5fb37c7ba8f -c
2018-10-30 08:04:04,845 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) Executing while with timeout : 60000
2018-10-30 08:04:05,007 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) (logid:) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py get_rule_logs_for_vms
2018-10-30 08:04:05,009 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) (logid:) Executing while with timeout : 1800000
2018-10-30 08:04:05,109 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) (logid:) Execution is successful.
2018-10-30 08:04:05,109 DEBUG [kvm.resource.LibvirtConnection] (UgentTask-5:null) (logid:) Looking for libvirtd connection at: qemu:///system
2018-10-30 08:04:05,114 DEBUG [cloud.agent.Agent] (UgentTask-5:null) (logid:) Sending ping: Seq 1-157: { Cmd , MgmtId: -1, via: 1, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"_hostVmStateReport":{"r-5-VM":{"state":"PowerOn","host":"centos7-kvm1"},"s-1-VM":{"state":"PowerOn","host":"centos7-kvm1"},"v-2-VM":{"state":"PowerOn","host":"centos7-kvm1"},"i-2-4-VM":{"state":"PowerOn","host":"centos7-kvm1"}},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":1,"wait":0}}] }
2018-10-30 08:04:05,166 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) (logid:) Received response: Seq 1-157: { Ans: , MgmtId: 2485476061376, via: 1, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":1,"wait":0},"result":true,"wait":0}}] }
2018-10-30 08:04:06,762 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:0aa26d2d) Request:Seq 1-3303108851699548248: { Cmd , MgmtId: 2485476061376, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.GetVncPortCommand":{"id":4,"name":"i-2-4-VM","wait":0}}] }
2018-10-30 08:04:06,762 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:0aa26d2d) Processing command: com.cloud.agent.api.GetVncPortCommand
2018-10-30 08:04:06,762 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-5:null) (logid:0aa26d2d) Looking for libvirtd connection at: qemu:///system
2018-10-30 08:04:06,765 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) (logid:0aa26d2d) Seq 1-3303108851699548248: { Ans: , MgmtId: 2485476061376, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVncPortAnswer":{"address":"172.20.1.10","port":5903,"result":true,"wait":0}}] }
2018-10-30 08:04:09,871 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) Exit value is 143
2018-10-30 08:04:09,872 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) (logid:) Redirecting to /bin/systemctl stop cloudstack-agent.service
2018-10-30 08:04:09,873 INFO [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Stopping the agent: Reason = sig.kill
2018-10-30 08:04:09,874 DEBUG [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Sending shutdown to management server
2018-10-30 08:04:09,876 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Location 1: Socket Socket[addr=/172.20.0.1,port=8250,localport=43646] closed on read. Probably -1 returned: Connection closed with -1 on reading size.
2018-10-30 08:04:09,876 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Closing socket Socket[addr=/172.20.0.1,port=8250,localport=43646]
2018-10-30 08:04:10,875 DEBUG [kvm.resource.LibvirtConnection] (AgentShutdownThread:null) (logid:) Looking for libvirtd connection at: qemu:///system
```
I could see the `kvmheartbeat.sh` try 5 times every 60 seconds and eventually failing to shutdown the agent by this: `/bin/systemctl stop cloudstack-agent.service`. On the management server side, I saw the host state go from Up to Disconnected:
![screenshot from 2018-10-30 13-38-20](https://user-images.githubusercontent.com/95203/47704265-223dd100-dc49-11e8-9f71-1aa36e66e47d.png)
The only change in behaviour I see is that the KVM host is not rebooted but only the agent gets shutdown. The downside of this approach is that when the kvm agent is restarted and NFS server is started, the KVM hosts and its VMs are not responsive and I had to reboot the host manually to start on a clean slate, perhaps the previous behaviour of triggering a host reboot was better. Can you advise if we should revert that behaviour - @DaanHoogland @PaulAngus @borisstoyanov ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services