Posted to builds@apache.org by Lance Albertson <la...@osuosl.org> on 2018/06/30 16:21:56 UTC

[Hosting] Partial datacenter outage this morning

All,

It seems as though we had some kind of power event at approximately
6:21AM PDT (13:21 UTC) that affected some (but not all) of our hosts. At
this point I'm not entirely sure what happened, but my guess is that one of
the power circuits went down and then came back online. This is confusing
since the UPS should have prevented that. I'm going to be heading into the
datacenter soon to do a visual inspection.

If you have any hosts that are offline and need me to help bring them back,
please send an email to support@osuosl.org and I will take a look. Feel
free to also reach out on IRC at #osuosl.

Thanks-

-- 
Lance Albertson
Director
Oregon State University | Open Source Lab

Re: [Hosting] Partial datacenter outage this morning

Posted by Lance Albertson <la...@osuosl.org>.
Looks like we had another power event while they were trying to fix the UPS
today. We didn't have any outages except for one project machine, which is
a single-PSU host. Apologies if this affected anyone's hosts. Hopefully
this is the last of it!

Thanks-

---------- Forwarded message ----------
From: Fowler, Stephen Lee <st...@oregonstate.edu>
Date: Mon, Jul 2, 2018 at 2:42 PM
Subject: Re: [Kerr_b210-announce] Saturday Power issue

All,

We had the maintenance techs in and it turns out that we had a battery
short when the UPS tried to take the load.  That resulted in a momentary
power loss until the generator was able to spin up and provide power.
While we do have these units tested on a regular basis there is no way to
predict when a battery is going to give up the fight.  The battery has been
replaced and the unit returned to normal operation.






-- 
Lance Albertson
Director
Oregon State University | Open Source Lab

Re: [Hosting] Partial datacenter outage this morning

Posted by Lance Albertson <la...@osuosl.org>.
FYI: I got the following regarding the power event on Saturday morning.

---------- Forwarded message ----------
From: Fowler, Stephen Lee <st...@oregonstate.edu>
Date: Mon, Jul 2, 2018 at 10:26 AM
Subject: [Kerr_b210-announce] Saturday Power issue

All,


I learned after the fact that we had a power event on Saturday that
affected power in B210.  I did see that the generator came online, but I
did not get any alerts from the other units in that power chain.  Further
investigation revealed that one of the UPS units suffered an inverter fault
that is likely the cause of some systems losing power.  While we monitor the
systems in B210, we did not receive any errors from the UPS units
themselves, so I was not aware there had been an issue.

What is happening:

I have engaged the UPS maintenance service to investigate and repair the
faulty UPS.  I will also be talking with them about the logging and
notification failure of both units.





-- 
Lance Albertson
Director
Oregon State University | Open Source Lab