Posted to builds@apache.org by Lance Albertson <la...@osuosl.org> on 2018/06/13 14:48:29 UTC

[Hosting] Unplanned outage: Hypervisor issue with gprod1 on primary Ganeti cluster

All,

At approximately 2:36AM PDT (0936 UTC), one of the hypervisors (gprod1) in
our primary Ganeti cluster started having hardware issues. This took down
all of the instances running on that node. I attempted to bring the node
back online, but the hardware issue prevented it from coming back up. At
that point I failed all of the VM instances over to their secondary nodes
and forced another node to become the Ganeti master (since gprod1 WAS the
master). All of the instances were back online by around 7:40AM PDT (1440
UTC).

Everything at this point seems to be back to normal (except for gprod1). I
will look into bringing gprod1 back online later today.

Thank you and sorry for the outages this caused.

-- 
Lance Albertson
Director
Oregon State University | Open Source Lab