
Posted to issues@cloudstack.apache.org by "Wido den Hollander (JIRA)" <ji...@apache.org> on 2017/03/27 12:03:41 UTC

[jira] [Closed] (CLOUDSTACK-8643) Helper for KVM High Availability

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wido den Hollander closed CLOUDSTACK-8643.
------------------------------------------
    Resolution: Won't Fix

> Helper for KVM High Availability
> --------------------------------
>
>                 Key: CLOUDSTACK-8643
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8643
>             Project: CloudStack
>          Issue Type: Improvement
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: KVM, Management Server
>         Environment: KVM hypervisors
>            Reporter: Wido den Hollander
>              Labels: fence, high-availability, kvm, libvirt
>             Fix For: Future
>
>
> When running KVM with NFS storage all Agents will write a heartbeat to the NFS.
> Should an Agent go down, it will still be writing heartbeats even if libvirt has died.
> Using these heartbeats the Management Server can ask other KVM Agents if the other server is still beating. If not, it can fence it.
> While this works, I've also encountered scenarios where you run without NFS and still want investigators.
> My proposal would be an Agent Helper running NEXT to the Agent itself.
> A simple Python daemon running a Basic HTTP server which queries libvirt every X seconds about:
> * Running Instances
> * Storage pools
> It keeps this in memory, so that even when libvirt goes down it knows what the last state was.
> Using the QEMU Monitor sockets we can actually see if the guests we have in memory are still online.
> If they are, we simply keep the list.
> Now, if an investigator comes by and wants to know if the host is still up, it can ALSO ask the helper.
> The management server can ask the helper, but the other agents could as well.
> This doesn't work in all cases, e.g. where storage is lost. But an additional helper would be useful to catch scenarios where the Agent itself became unresponsive.
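
The helper described above could be sketched roughly as follows. This is a minimal illustration, not an existing CloudStack component: the names StateCache, query_libvirt, make_handler and serve, the JSON layout, the port and the polling interval are all assumptions made for this example. It assumes the libvirt-python bindings are installed on the host, and it leaves out the check of guests through the QEMU Monitor sockets.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def query_libvirt(uri="qemu:///system"):
    """Ask libvirt for the running instances and the storage pools."""
    # libvirt-python is imported lazily so the cache below works without it.
    import libvirt

    conn = libvirt.openReadOnly(uri)
    try:
        domains = conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE)
        pools = conn.listAllStoragePools(0)
        return {
            "instances": sorted(d.name() for d in domains),
            "storage_pools": sorted(p.name() for p in pools),
        }
    finally:
        conn.close()


class StateCache:
    """Keeps the last state libvirt reported, even after libvirt goes down."""

    def __init__(self, poll):
        self._poll = poll  # callable returning the state dict (hypothetical hook)
        self._lock = threading.Lock()
        self._state = {"instances": [], "storage_pools": []}
        self._last_success = None
        self._libvirt_ok = False

    def refresh(self):
        """Poll once; on failure keep the previous state and flag libvirt as down."""
        try:
            state = self._poll()
        except Exception:
            with self._lock:
                self._libvirt_ok = False
            return False
        with self._lock:
            self._state = state
            self._last_success = time.time()
            self._libvirt_ok = True
        return True

    def snapshot(self):
        """Return a copy of the cached state plus the polling status."""
        with self._lock:
            return {
                "instances": list(self._state["instances"]),
                "storage_pools": list(self._state["storage_pools"]),
                "libvirt_ok": self._libvirt_ok,
                "last_success": self._last_success,
            }


def make_handler(cache):
    """Build a request handler that answers every GET with the cached state."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(cache.snapshot()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler


def serve(cache, interval=10, port=8080):
    """Poll every `interval` seconds in the background and serve HTTP forever."""

    def loop():
        while True:
            cache.refresh()
            time.sleep(interval)

    threading.Thread(target=loop, daemon=True).start()
    HTTPServer(("0.0.0.0", port), make_handler(cache)).serve_forever()
```

Calling serve(StateCache(query_libvirt)) would start the helper. An investigator, whether the management server or another agent, could then issue a GET request and read both the last known instance list and whether the most recent libvirt poll succeeded.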


