Posted to builds@apache.org by Lance Albertson <la...@osuosl.org> on 2022/12/10 18:23:38 UTC

[Hosting] [OUTAGE]: Backbone switch reboot - Dec 10, 2022 5:01-5:08AM PST (1301-1308 UTC)

All,

Early this morning, at around 5:01AM PST (1301 UTC), one of our backbone
switches (oslsw1 - Cisco Nexus) had a kernel panic and rebooted itself.
Unfortunately, this is an ongoing problem that I've described before.
All services came back online at around 5:08AM PST (1308 UTC), except for a
few services that were affected by the outage; I fixed those a few minutes
ago.

As I mentioned before, our long term plan is to completely migrate off of
this switch.

Here is where we stand currently with that:

   - We have two (2) "new" edge switches ready to replace two (2) of the
   three (3) edge switches connected to this troublesome switch
   - We will start migrating internal OSL hosts to these switches next week
   (which will eliminate the secondary issues we currently have to fix
   manually whenever this switch reboots)
   - After we've completed migrating our internal hosts, we'll start
   migrating project hosts to these switches

Longer term, we still need to either purchase or receive donations of
Arista 48-port 1G edge switches to complete this process. We will need an
additional eight (8) Arista 48-port 1G edge switches to fully replace our
aging network backbone. This would also get us off of an ancient Cisco 6509
system that we're still paying a support contract for!

Related to all of the above, we will need another network maintenance
window in the coming weeks to fully migrate to the new set of core
switches. I'll send out another email about that when I'm ready to make it
happen.

Thanks and Happy Holidays everyone!

-- 
Lance Albertson
Director
Oregon State University | Open Source Lab