Posted to dev@spamassassin.apache.org by jm...@jmason.org on 2006/05/11 11:39:04 UTC

SVN server problems (fwd)

some more info.  sounds like quite a nasty one :(

--j.

------- Forwarded Message

Date:    Wed, 10 May 2006 18:15:04 -0700
From:    "Henri Yandell" <fl...@gmail.com>
To:      "Apache Infrastructure" <in...@apache.org>
Subject: SVN server problems

Putting an email on the infra list to provide an update (from a
spectator's point of view) on today's issues.

Minotaur, which is chiefly handling the SVN repository, Apache project
sites, people.apache.org sites and login@apache.org email addresses,
had a kernel panic and was acting up after the reboot (errors on
commands that should not have errored).

Things were taken down for a memtest run and the static websites (not
the dynamic tcl.apache or perl.apache) were failed over to Ajax
(failover machine in Europe). The memtest passed, and the subversion
repositories are being checked for validity while also making sure
that an up to date backup of the svn repositories is made.

Minotaur is slowly being brought back online. Sometime tonight
read-only SVN will be re-enabled; and presuming things go well I
imagine read-write and unix accounts will be turned back on. Medium
term, SVN will be moved to one of the machines that Infra have
recently been getting ready.

----

As an aside, it's very impressive to sit on #asfinfra and watch the
infra@ volunteers dealing with such things. Half a dozen people have
juggled their schedules to spend a lot of today dealing with the
problem, and I'm sure they'll be putting many more hours in to get
there.

So remember to say thanks in Dublin next month if you get the chance :)

Hen

------- End of Forwarded Message

Subject: Outage on www.apache.org, svn.apache.org, and other related infrastructure
From: "Garrett Rooney" <ro...@apache.org>
Date: Wed, 10 May 2006 22:08:07 -0700 (Thu 06:08 IST)
To: committers@apache.org
Cc: infrastructure@apache.org

We've been having a heck of a day in infrastructure land, as you may
have noticed if you tried to access the Subversion repository or any
of the web sites...

Early this morning (PDT) minotaur, the machine that hosts
svn.apache.org, the ASF web sites, people.apache.org, and various
other things, kernel panicked.  After it was brought back up there was
some odd behavior observed: random programs aborting, stuff like that.

As a result, we decided to take some action before bringing all
services back online.  The data stored on the machine has been backed
up, both to ajax (the European backup server) and to helios, another
machine in the same datacenter as minotaur.  We've also run 'svnadmin
verify' on the repositories, both on the backup copies and on minotaur
itself, to confirm that whatever is wrong with minotaur has had no ill
effects.  We also ran memtest86 tests on minotaur, and have found no
sign of memory failure, which was our primary fear.

During this time, DNS for the websites served off of minotaur was
failed over to ajax, but svn.apache.org, perl.apache.org, and
tcl.apache.org remained down.  Subversion stayed down because we
needed to verify that the repository was ok before doing anything with
it, and TCL and Perl remained down because they require special setups
that are not mirrored on ajax.

At this point we have brought minotaur back to multiuser mode, and the
mail that was backed up on it (@apache.org mailing lists and
addresses) is flowing again.  The Subversion repositories have been
verified to be ok; no corruption was found.  Soon the Subversion
repository will be brought back up in a read-only mode.  We are also
planning on upgrading minotaur to a newer version of FreeBSD, because
our best guess for what caused the initial problem is now a kernel
bug.  Once that is done we will likely turn write access to the
Subversion repository back on.  Sometime in the near future the
Subversion repository will be moved to an entirely new machine, but
the details of when and how that will happen are still being
determined.

Thank you for your patience during this outage,

The Apache Infrastructure Team

------- End of Forwarded Message