Posted to dev@cloudstack.apache.org by "Alejandro Z. Tomsic" <al...@gmail.com> on 2013/04/09 13:45:24 UTC

how is failure detection achieved in cloudstack?

I would like to know how failure detection is achieved in CloudStack (if at
all). I am interested in the implementation details, i.e. whether it is done
at the physical, virtual machine, or application level. Does CloudStack use
any known failure detection mechanisms, e.g. [1][2][3][4] or any other?
Where can I find this information?

Thank you in advance.

Alejandro




[1] M. Bertier, O. Marin, and P. Sens. Implementation and performance
evaluation of an adaptable failure detector. In International Conference on
Dependable Systems and Networks (DSN), pages 354–363, June 2002.

[2] W. Chen, S. Toueg, and M. K. Aguilera. On the quality of service of
failure detectors. IEEE Transactions on Computers, 51(5):561–580, May 2002.

[3] N. Hayashibara, X. Défago, R. Yared, and T. Katayama. The φ accrual
failure detector. In IEEE Symposium on Reliable Distributed Systems (SRDS),
pages 66–78, Oct. 2004.

[4] Joshua B. Leners, Hao Wu, Wei-Lun Hung, Marcos K. Aguilera, and Michael
Walfish. 2011. Detecting failures in distributed systems with the Falcon
spy network. In Proceedings of the Twenty-Third ACM Symposium on Operating
Systems Principles (SOSP '11). ACM, New York, NY, USA, 279-294.

Re: how is failure detection achieved in cloudstack?

Posted by Chip Childers <ch...@sungard.com>.

On Tue, Apr 09, 2013 at 01:45:24PM +0200, Alejandro Z. Tomsic wrote:
> I would like to know how failure detection is achieved in CloudStack (if at
> all). I am interested in the implementation details, i.e. whether it is done
> at the physical, virtual machine, or application level. Does CloudStack use
> any known failure detection mechanisms, e.g. [1][2][3][4] or any other?
> Where can I find this information?

Failure detection of *what*?

There is a feature to enable "HA" (which is incorrectly named, frankly)
for certain compute offerings. When that feature is enabled and the
underlying host dies, the affected VMs are restarted on another
host.
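
To make the behavior described above concrete, here is a minimal sketch in
Python. It is not CloudStack code: the Host and VM classes, the ha_enabled
flag, and the least-loaded placement rule are all hypothetical names and
assumptions chosen for illustration. It only shows the idea that, once a
host is known to be dead, VMs flagged for HA are restarted elsewhere while
the others stay down.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    ha_enabled: bool  # hypothetical stand-in for an HA-enabled compute offering


@dataclass
class Host:
    name: str
    alive: bool = True
    vms: list = field(default_factory=list)


def restart_ha_vms(failed, hosts):
    """Move HA-enabled VMs off a dead host onto surviving hosts.

    Returns a dict mapping each restarted VM name to its new host name.
    The placement rule (fewest VMs wins) is an assumption of this sketch.
    """
    survivors = [h for h in hosts if h.alive and h is not failed]
    placements = {}
    if not survivors:
        return placements  # nowhere to restart anything
    for vm in list(failed.vms):
        if not vm.ha_enabled:
            continue  # VMs without HA are left down
        target = min(survivors, key=lambda h: len(h.vms))
        failed.vms.remove(vm)
        target.vms.append(vm)
        placements[vm.name] = target.name
    return placements


# Example: host-1 dies; only the HA-enabled VM is restarted on host-2.
h1 = Host("host-1", vms=[VM("web", True), VM("batch", False)])
h2 = Host("host-2")
h1.alive = False
print(restart_ha_vms(h1, [h1, h2]))
```

Note that the sketch takes "the host is dead" as an input. How that is
decided is exactly the failure detection question asked above, and the
sketch deliberately says nothing about it.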
