Posted to infrastructure-dev@apache.org by Paul Querna <pa...@querna.org> on 2009/08/23 04:55:04 UTC

long term goal: reliable services for developers

(CC'ed to infra-private to get eyes, please discuss on infra-dev)

The ASF Infra Team set a goal years ago to remove as many
single points of failure on public facing services as possible -- and
today, you see the results.  The Websites, Version Control, and E-Mail,
traditionally our 'core' services, are all redundant across multiple data
centers in the United States and Europe.  It was not a quick process,
and it was not painless, but today we sit pretty happily for most public
facing services.

The ASF of 5 years ago, when having public facing services redundant
was enough, is not the ASF of today.

We are pushing 2300+ committers, with tens of thousands of accounts
on the Wikis and Issue trackers.  It is feasible to see the ASF hit
100+ Top level projects before long -- so many that there is always
someone doing a release, and development work is always going on.

I believe our next goal should be to make all Developer facing
services more redundant and reliable. This includes everything from
more public facing ones like issue trackers to private ones like shell
accounts.

Recently the disk array on minotaur aka people.apache.org has been
having problems.  We can go into other threads about minotaur's ZFS
issues, but it has exposed how we have abused people.apache.org as a
shortcut to provide many services.

When minotaur.apache.org is having problems, the following services
are disrupted:
 1) people.apache.org website
 2) people.apache.org/~userid websites
 3) planet.apache.org website (ease of access)
 4) Maven Repositories (ease of access)
 5) E-Mail Forwarding of userid@apache.org
 6) Mirror Network Seed  (ease of access)
 7) TLP & www.apache.org Website Seeds (ease of access)
 8) DNS Hidden Master

When brutus.apache.org is having problems:
 9) issues.apache.org Website & Databases
   9a) JIRA
   9b) Bugzilla
 10) cwiki.apache.org (Confluence)

When eos.apache.org is having problems:
 11) wiki.apache.org (MoinMoin)  [semi-mirrored to eu, but not trivial]

A common theme for many of the services hosted on minotaur is using
them as seeds, due to every committer having an account there.  With
LDAP backends coming online, we can start to use other methods of
authentication for committers.   The biggest bang for our buck would
be to figure out how to distribute files in a decentralized way, without
needing a single central host. (Website Seeds, Mirror Network Seeds,
Maven Repositories).

What can you do to help?

Pick one service, figure out a plan to make it reliable, preferably
hosted in both the EU and the US, recruit volunteers to help you out, and
make it happen.

We have a budget, and we will spend it on hardware as needed.  This won't
all be done in 1 year, but fixing one service at a time will put us
into a better place eventually.

Thoughts?

Thanks,

Paul

RE: long term goal: reliable services for developers

Posted by Gavin <ga...@16degrees.com.au>.

> -----Original Message-----
> From: Paul Querna [mailto:paul@querna.org]
> Sent: Tuesday, 25 August 2009 7:03 PM
> To: infrastructure-dev@apache.org
> Subject: Re: long term goal: reliable services for developers
> 
> On Tue, Aug 25, 2009 at 1:41 AM, Tony Stevenson<to...@pc-tony.com> wrote:
> >
> > On 25 Aug 2009, at 09:31, Jukka Zitting wrote:
> >
> >> Hi,
> >>
> >> On Sun, Aug 23, 2009 at 4:55 AM, Paul Querna<pa...@querna.org> wrote:
> >>>
> >>>  7) TLP & www.apache.org Website Seeds (ease of access)
> >>
> >> Picking up one item that's been bugging me. The idea of deploying
> >> sites directly from svn was already mentioned, and I'd also like to
> >> simplify the ways in which projects can set up CI builds for site
> >> deployment.
> >
> > The problem is that there are several methods of deployment at the
> moment.
> >  i.e. XML->HTML (ala httpd) Confluence exports
> > To rationalise these into one kind of site would be extremely difficult
> and
> > ultimately a laborious task which I am not sure many folks will be all
> that
> > willing to be involved in.
> >
> > For example to convert the httpd to confluence would not go down to
> well,
> > due to the sheer size and nature of the content.
> >
> > The inverse is true too, I'm sure, for sites like the spamassassin,
> > converting them to the xml->html (via ant) may not go down so well.
> 
> Exactly, there is not a one size fit all for websites, but I do
> believe we can offer multiple options.
> 
> The only one currently available: Put static files onto
> people.apache.org, and once an hour they get rsync'ed to the live
> machines.
> 
> Adding direct subversion pulls, which would just map a path in
> subversion -> path on live machines, and automatically sync them.
> This is another option TLPs could consider for their site.
> 
> In the long run it would be nice to have an alternative file store,
> like maybe a WebDav URL, which a TLP can push their static files to,
> and it would be synced to all the web servers right away.  Maybe even
> WebDAV backed by a replicated Subversion server :)

Just to mention yet another option recently added: any project that
wants to use Buildbot to automatically build its site from svn can
do so.

For instance, log4php uses it. Their site docs, API docs, etc. all get built
on svn commits and previewed at (http://ci.apache.org/projects/log4php/).
The result is then synced across to /www/buildbot-exports/$project/, from
where the project syncs it to the proper location via a cron job. Jukka
mentions CI making it easier for projects to deploy -- this cannot be made
any easier for the project, just ask and it shall be done! (It is not direct
to eos/aurora yet but I think we can do that)

Gav...

> 
> Thoughts?



Re: long term goal: reliable services for developers

Posted by chris <ch...@ia.gov>.
Daniel Kulp wrote:
> On Wed August 26 2009 9:39:22 am Justin Erenkrantz wrote:
>> 2009/8/26 Daniel Kulp <dk...@apache.org>:
>>> Is the /www area on people.apache.org really considered a staging site?  
>>> As far as I know, I cannot (easily) point a browser at it to make sure
>>> the output looks correct and such.    Maybe setup https on p.a.o to
>>> direct things like http://cxf.staging.apache.org
>>> to the appropriate /www dir?
>> Set your browser's proxy to p.a.o.  -- justin
> 
> Interesting.   I learned something new.   :-)
> 
> Kind of a pain that you have to reset browser settings and obviously wouldn't 
> work if your already behind a proxy, but good to know.   Thanks!
> 

Use a local proxy.pac file to automagically choose a proxy based on destination.  Example:

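function FindProxyForURL(url, host)
        {
        // Match on host: url also carries the scheme and path, so it never equals the bare name.
        if (shExpMatch(host, "cxf.staging.apache.org"))            {return "PROXY people.apache.org:3128";}
        // Everything else goes out directly.
        else { return "DIRECT";}
        }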


That else could also be a default local proxy, like so:
	else { return "PROXY someotherproxy.com:3128";}

You define this in the "auto configuration url" setting in most browsers' proxy config area.

If you already have to go out via a proxy, google "proxy chaining" to see how to make that work.  There are different
tricks depending on what type of proxies are in use.
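
If more than one project wants this, a slightly more general version of the same idea might look like the sketch below. It assumes every staging host sits under a hypothetical ".staging.apache.org" suffix (only cxf.staging.apache.org is mentioned in this thread) and that people.apache.org accepts proxy requests on port 3128 as in the example above. It sticks to plain string operations instead of the PAC helper functions, so it can also be tried outside a browser.

```javascript
// Sketch of a PAC file that proxies every host under one staging suffix.
// STAGING_SUFFIX and STAGING_PROXY are assumptions for illustration.
var STAGING_SUFFIX = ".staging.apache.org";
var STAGING_PROXY = "PROXY people.apache.org:3128";

function FindProxyForURL(url, host) {
    var name = host.toLowerCase();
    // True when the host ends with the staging suffix, e.g. cxf.staging.apache.org.
    var isStaging = name.length > STAGING_SUFFIX.length &&
        name.slice(-STAGING_SUFFIX.length) === STAGING_SUFFIX;
    return isStaging ? STAGING_PROXY : "DIRECT";
}
```

Any host that does not end in the suffix falls through to "DIRECT", so normal browsing is unaffected.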

crr/arryder












Re: long term goal: reliable services for developers

Posted by Daniel Kulp <dk...@apache.org>.
On Wed August 26 2009 9:39:22 am Justin Erenkrantz wrote:
> 2009/8/26 Daniel Kulp <dk...@apache.org>:
> > Is the /www area on people.apache.org really considered a staging site?  
> > As far as I know, I cannot (easily) point a browser at it to make sure
> > the output looks correct and such.    Maybe setup https on p.a.o to
> > direct things like http://cxf.staging.apache.org
> > to the appropriate /www dir?
>
> Set your browser's proxy to p.a.o.  -- justin

Interesting.   I learned something new.   :-)

Kind of a pain that you have to reset browser settings and obviously wouldn't 
work if you're already behind a proxy, but good to know.   Thanks!

-- 
Daniel Kulp
dkulp@apache.org
http://www.dankulp.com/blog

Re: long term goal: reliable services for developers

Posted by Tony Stevenson <to...@pc-tony.com>.
On 26 Aug 2009, at 17:20, Paul Querna wrote:

> worked magically

Really?  Magic.  Great.

:-)




Cheers,
Tony


--------------------------------------------
Tony Stevenson

tony@pc-tony.com - pctony@apache.org
pctony@freenode.net - tony@caret.cam.ac.uk

http://blog.pc-tony.com

1024D/51047D66 ECAF DC55 C608 5E82 0B5E
3359 C9C7 924E 5104 7D66
--------------------------------------------






Re: long term goal: reliable services for developers

Posted by Paul Querna <pa...@querna.org>.
On Wed, Aug 26, 2009 at 6:39 AM, Justin Erenkrantz<ju...@erenkrantz.com> wrote:
> 2009/8/26 Daniel Kulp <dk...@apache.org>:
>> Is the /www area on people.apache.org really considered a staging site?   As
>> far as I know, I cannot (easily) point a browser at it to make sure the output
>> looks correct and such.    Maybe setup https on p.a.o to direct things like
>> http://cxf.staging.apache.org
>> to the appropriate /www dir?
>
> Set your browser's proxy to p.a.o.  -- justin
>

IMO it's a hack.

I'd rather just have an "apr.staging.apache.org" or something that
worked magically.

Re: long term goal: reliable services for developers

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
2009/8/26 Daniel Kulp <dk...@apache.org>:
> Is the /www area on people.apache.org really considered a staging site?   As
> far as I know, I cannot (easily) point a browser at it to make sure the output
> looks correct and such.    Maybe setup https on p.a.o to direct things like
> http://cxf.staging.apache.org
> to the appropriate /www dir?

Set your browser's proxy to p.a.o.  -- justin

Re: long term goal: reliable services for developers

Posted by Paul Davis <pa...@gmail.com>.
> Several times now I've had a couple pages that "worked" in the Confluence
> staging are (cwiki.apache.org/CXF), but when synced off of cwiki, they stopped
> working.   Between the delay in the sync from cwiki -> people and the delay
> for my cron to sync from the confluence->export to /www/cxf and then the delay
> syncing from /www/cxf to the live site, it was SEVERAL hours before we figured
> it out, and several more before a fix would sync.
>

Personally I'd rather avoid a staging area for this reason alone.
Perhaps this could be something that'd be configurable per project as
part of the CI setup?

Paul Davis

Re: long term goal: reliable services for developers

Posted by Daniel Kulp <dk...@apache.org>.
 
Is the /www area on people.apache.org really considered a staging site?   As 
far as I know, I cannot (easily) point a browser at it to make sure the output 
looks correct and such.    Maybe set up https on p.a.o to direct things like
http://cxf.staging.apache.org 
to the appropriate /www dir?

Several times now I've had a couple pages that "worked" in the Confluence 
staging area (cwiki.apache.org/CXF), but when synced off of cwiki, they stopped 
working.   Between the delay in the sync from cwiki -> people and the delay 
for my cron to sync from the confluence->export to /www/cxf and then the delay 
syncing from /www/cxf to the live site, it was SEVERAL hours before we figured 
it out, and several more before a fix would sync.

Dan


On Wed August 26 2009 4:49:34 am Justin Mason wrote:
> On Wed, Aug 26, 2009 at 09:23, Jukka Zitting<ju...@gmail.com> wrote:
> > Hi,
> >
> > On Tue, Aug 25, 2009 at 6:30 PM, Martin Cooper<ma...@apache.org> wrote:
> >> I'm all for speedy and automated deployment, but I just want to throw
> >> in one thing that has been deemed important in previous discussions of
> >> this topic (of which there have been many, over the years). That is
> >> the notion of having a staging area for proofing prior to live
> >> deployment. Once the site is built, it should be made available
> >> somewhere so that it can be checked over by a real live person before
> >> going live. Obviously this is to prevent inadvertent live site
> >> screw-ups.
> >
> > In all the CI site build setups I've created, the CI build simply runs
> > the same site build command that the committer uses locally to check
> > the generated site. And the CI builds will only deploy the site if the
> > build command finishes successfully, so if you're paranoid you could
> > even add explicit site tests to the process.
>
> The issues I'd imagine (based on my experience) would be:
>
> 1. typos
> 2. unclosed tags, e.g. bolding extending to the end of the paragraph
> 3. accidentally-broken links, e.g. if the closer.cgi link format is screwed
> up
>
> Writing automated tests for that kind of thing is hard to do and they
> can be hard to anticipate.  I'd prefer to be able to check on a
> staging site, as Martin suggests.

-- 
Daniel Kulp
dkulp@apache.org
http://www.dankulp.com/blog

Re: long term goal: reliable services for developers

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Aug 26, 2009 at 10:49 AM, Justin Mason<jm...@jmason.org> wrote:
> Writing automated tests for that kind of thing is hard to do and they
> can be hard to anticipate.  I'd prefer to be able to check on a
> staging site, as Martin suggests.

"mvn site:run" is what I use.

BR,

Jukka Zitting

Re: long term goal: reliable services for developers

Posted by Justin Mason <jm...@jmason.org>.
On Wed, Aug 26, 2009 at 09:23, Jukka Zitting<ju...@gmail.com> wrote:
> Hi,
>
> On Tue, Aug 25, 2009 at 6:30 PM, Martin Cooper<ma...@apache.org> wrote:
>> I'm all for speedy and automated deployment, but I just want to throw
>> in one thing that has been deemed important in previous discussions of
>> this topic (of which there have been many, over the years). That is
>> the notion of having a staging area for proofing prior to live
>> deployment. Once the site is built, it should be made available
>> somewhere so that it can be checked over by a real live person before
>> going live. Obviously this is to prevent inadvertent live site
>> screw-ups.
>
> In all the CI site build setups I've created, the CI build simply runs
> the same site build command that the committer uses locally to check
> the generated site. And the CI builds will only deploy the site if the
> build command finishes successfully, so if you're paranoid you could
> even add explicit site tests to the process.

The issues I'd imagine (based on my experience) would be:

1. typos
2. unclosed tags, e.g. bolding extending to the end of the paragraph
3. accidentally-broken links, e.g. if the closer.cgi link format is screwed up

Writing automated tests for that kind of thing is hard to do and they
can be hard to anticipate.  I'd prefer to be able to check on a
staging site, as Martin suggests.

-- 
--j.

Re: long term goal: reliable services for developers

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Aug 25, 2009 at 6:30 PM, Martin Cooper<ma...@apache.org> wrote:
> I'm all for speedy and automated deployment, but I just want to throw
> in one thing that has been deemed important in previous discussions of
> this topic (of which there have been many, over the years). That is
> the notion of having a staging area for proofing prior to live
> deployment. Once the site is built, it should be made available
> somewhere so that it can be checked over by a real live person before
> going live. Obviously this is to prevent inadvertent live site
> screw-ups.

In all the CI site build setups I've created, the CI build simply runs
the same site build command that the committer uses locally to check
the generated site. And the CI builds will only deploy the site if the
build command finishes successfully, so if you're paranoid you could
even add explicit site tests to the process.
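
As a rough illustration of such an explicit site test, the sketch below counts opening and closing tags for a few inline elements in a generated page and reports any mismatch. The function name findUnbalancedTags and the tag list are made up for this example; it is a naive regex count, not an HTML parser, and it would not catch typos or broken links.

```javascript
// Naive site check: report inline tags whose open and close counts differ.
// findUnbalancedTags and the default tag list are hypothetical.
function findUnbalancedTags(html, tags) {
    tags = tags || ["b", "strong", "i", "em", "a"];
    var problems = [];
    tags.forEach(function (tag) {
        // Opening tag: "<b>" or "<b attr=...>", but not "<br>" or "<body>".
        var open = html.match(new RegExp("<" + tag + "(\\s[^>]*)?>", "gi")) || [];
        var close = html.match(new RegExp("</" + tag + "\\s*>", "gi")) || [];
        if (open.length !== close.length) {
            problems.push({ tag: tag, opened: open.length, closed: close.length });
        }
    });
    return problems;
}
```

A CI step could run this over each generated file and refuse to deploy when the returned list is not empty.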

BR,

Jukka Zitting

RE: long term goal: reliable services for developers

Posted by Gavin <ga...@16degrees.com.au>.

> -----Original Message-----
> From: mfncooper@gmail.com [mailto:mfncooper@gmail.com] On Behalf Of Martin
> Cooper
> Sent: Wednesday, 26 August 2009 2:30 AM
> To: infrastructure-dev@apache.org
> Subject: Re: long term goal: reliable services for developers
> 
> On Tue, Aug 25, 2009 at 2:03 AM, Paul Querna<pa...@querna.org> wrote:
> > On Tue, Aug 25, 2009 at 1:41 AM, Tony Stevenson<to...@pc-tony.com> wrote:
> >>
> >> On 25 Aug 2009, at 09:31, Jukka Zitting wrote:
> >>
> >>> Hi,
> >>>
> >>> On Sun, Aug 23, 2009 at 4:55 AM, Paul Querna<pa...@querna.org> wrote:
> >>>>
> >>>>  7) TLP & www.apache.org Website Seeds (ease of access)
> >>>
> >>> Picking up one item that's been bugging me. The idea of deploying
> >>> sites directly from svn was already mentioned, and I'd also like to
> >>> simplify the ways in which projects can set up CI builds for site
> >>> deployment.
> >>
> >> The problem is that there are several methods of deployment at the
> moment.
> >>  i.e. XML->HTML (ala httpd) Confluence exports
> >> To rationalise these into one kind of site would be extremely difficult
> and
> >> ultimately a laborious task which I am not sure many folks will be all
> that
> >> willing to be involved in.
> >>
> >> For example to convert the httpd to confluence would not go down to
> well,
> >> due to the sheer size and nature of the content.
> >>
> >> The inverse is true too, I'm sure, for sites like the spamassassin,
> >> converting them to the xml->html (via ant) may not go down so well.
> >
> > Exactly, there is not a one size fit all for websites, but I do
> > believe we can offer multiple options.
> >
> > The only one currently available: Put static files onto
> > people.apache.org, and once an hour they get rsync'ed to the live
> > machines.
> >
> > Adding direct subversion pulls, which would just map a path in
> > subversion -> path on live machines, and automatically sync them.
> > This is another option TLPs could consider for their site.
> >
> > In the long run it would be nice to have an alternative file store,
> > like maybe a WebDav URL, which a TLP can push their static files to,
> > and it would be synced to all the web servers right away.  Maybe even
> > WebDAV backed by a replicated Subversion server :)
> >
> > Thoughts?
> 
> I'm all for speedy and automated deployment, but I just want to throw
> in one thing that has been deemed important in previous discussions of
> this topic (of which there have been many, over the years). That is
> the notion of having a staging area for proofing prior to live
> deployment. Once the site is built, it should be made available
> somewhere so that it can be checked over by a real live person before
> going live. Obviously this is to prevent inadvertent live site
> screw-ups.
> 
> Anyway, just wanted to throw that into the 'requirements' pile and
> make sure it's not missed.

The ASF Buildbot does that already, within a few seconds of a commit build
(see link in my other post on this thread). I'm not sure about Hudson etc.,
but I'm sure they could.

BTW, seems to me that having a staging area as a requirement counteracts the
requirement for immediate syncing to the live web servers. So our current 1
hour delay seems appropriate given the need to check a staging area first.

Gav...

> 
> --
> Martin Cooper



Re: long term goal: reliable services for developers

Posted by Martin Cooper <ma...@apache.org>.
On Tue, Aug 25, 2009 at 2:03 AM, Paul Querna<pa...@querna.org> wrote:
> On Tue, Aug 25, 2009 at 1:41 AM, Tony Stevenson<to...@pc-tony.com> wrote:
>>
>> On 25 Aug 2009, at 09:31, Jukka Zitting wrote:
>>
>>> Hi,
>>>
>>> On Sun, Aug 23, 2009 at 4:55 AM, Paul Querna<pa...@querna.org> wrote:
>>>>
>>>>  7) TLP & www.apache.org Website Seeds (ease of access)
>>>
>>> Picking up one item that's been bugging me. The idea of deploying
>>> sites directly from svn was already mentioned, and I'd also like to
>>> simplify the ways in which projects can set up CI builds for site
>>> deployment.
>>
>> The problem is that there are several methods of deployment at the moment.
>>  i.e. XML->HTML (ala httpd) Confluence exports
>> To rationalise these into one kind of site would be extremely difficult and
>> ultimately a laborious task which I am not sure many folks will be all that
>> willing to be involved in.
>>
>> For example to convert the httpd to confluence would not go down to well,
>> due to the sheer size and nature of the content.
>>
>> The inverse is true too, I'm sure, for sites like the spamassassin,
>> converting them to the xml->html (via ant) may not go down so well.
>
> Exactly, there is not a one size fit all for websites, but I do
> believe we can offer multiple options.
>
> The only one currently available: Put static files onto
> people.apache.org, and once an hour they get rsync'ed to the live
> machines.
>
> Adding direct subversion pulls, which would just map a path in
> subversion -> path on live machines, and automatically sync them.
> This is another option TLPs could consider for their site.
>
> In the long run it would be nice to have an alternative file store,
> like maybe a WebDav URL, which a TLP can push their static files to,
> and it would be synced to all the web servers right away.  Maybe even
> WebDAV backed by a replicated Subversion server :)
>
> Thoughts?

I'm all for speedy and automated deployment, but I just want to throw
in one thing that has been deemed important in previous discussions of
this topic (of which there have been many, over the years). That is
the notion of having a staging area for proofing prior to live
deployment. Once the site is built, it should be made available
somewhere so that it can be checked over by a real live person before
going live. Obviously this is to prevent inadvertent live site
screw-ups.

Anyway, just wanted to throw that into the 'requirements' pile and
make sure it's not missed.

--
Martin Cooper

Re: long term goal: reliable services for developers

Posted by Paul Querna <pa...@querna.org>.
On Tue, Aug 25, 2009 at 1:41 AM, Tony Stevenson<to...@pc-tony.com> wrote:
>
> On 25 Aug 2009, at 09:31, Jukka Zitting wrote:
>
>> Hi,
>>
>> On Sun, Aug 23, 2009 at 4:55 AM, Paul Querna<pa...@querna.org> wrote:
>>>
>>>  7) TLP & www.apache.org Website Seeds (ease of access)
>>
>> Picking up one item that's been bugging me. The idea of deploying
>> sites directly from svn was already mentioned, and I'd also like to
>> simplify the ways in which projects can set up CI builds for site
>> deployment.
>
> The problem is that there are several methods of deployment at the moment.
>  i.e. XML->HTML (ala httpd) Confluence exports
> To rationalise these into one kind of site would be extremely difficult and
> ultimately a laborious task which I am not sure many folks will be all that
> willing to be involved in.
>
> For example to convert the httpd to confluence would not go down to well,
> due to the sheer size and nature of the content.
>
> The inverse is true too, I'm sure, for sites like the spamassassin,
> converting them to the xml->html (via ant) may not go down so well.

Exactly, there is no one-size-fits-all for websites, but I do
believe we can offer multiple options.

The only one currently available: Put static files onto
people.apache.org, and once an hour they get rsync'ed to the live
machines.

Another option would be adding direct subversion pulls, which would just
map a path in subversion -> path on live machines, and automatically sync
them.  TLPs could consider this for their site.

In the long run it would be nice to have an alternative file store,
like maybe a WebDAV URL, which a TLP can push their static files to,
and it would be synced to all the web servers right away.  Maybe even
WebDAV backed by a replicated Subversion server :)
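
To make the "path in subversion -> path on live machines" idea a bit more concrete, here is a minimal sketch of what such a mapping table and lookup could look like. The repository paths, the /www/... targets, and the function name liveTargetFor are all hypothetical; nothing like this exists yet, and the actual sync step (checkout, export, or rsync) is left out.

```javascript
// Hypothetical table: subversion path prefix -> directory on the live web servers.
var SITE_MAPPINGS = [
    { svnPrefix: "/cxf/site/", livePath: "/www/cxf.apache.org/" },
    { svnPrefix: "/apr/site/trunk/docs/", livePath: "/www/apr.apache.org/" }
];

// Given a path changed in a commit, return the live file to update,
// or null when the commit does not touch any mapped site.
function liveTargetFor(changedPath) {
    for (var i = 0; i < SITE_MAPPINGS.length; i++) {
        var m = SITE_MAPPINGS[i];
        if (changedPath.indexOf(m.svnPrefix) === 0) {
            return m.livePath + changedPath.slice(m.svnPrefix.length);
        }
    }
    return null;
}
```

A commit hook or polling job could call this for each changed path and only sync the sites that were actually touched.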

Thoughts?

Re: long term goal: reliable services for developers

Posted by Tony Stevenson <to...@pc-tony.com>.
On 25 Aug 2009, at 09:31, Jukka Zitting wrote:

> Hi,
>
> On Sun, Aug 23, 2009 at 4:55 AM, Paul Querna<pa...@querna.org> wrote:
>>  7) TLP & www.apache.org Website Seeds (ease of access)
>
> Picking up one item that's been bugging me. The idea of deploying
> sites directly from svn was already mentioned, and I'd also like to
> simplify the ways in which projects can set up CI builds for site
> deployment.

The problem is that there are several methods of deployment at the  
moment, e.g. XML->HTML (a la httpd) and Confluence exports.
To rationalise these into one kind of site would be extremely  
difficult and ultimately a laborious task which I am not sure many  
folks will be all that willing to be involved in.

For example, converting the httpd site to Confluence would not go down too  
well, due to the sheer size and nature of the content.

The inverse is true too, I'm sure: for sites like spamassassin,  
converting them to XML->HTML (via ant) may not go down so well.



>
> I unfortunately don't know much about the current setup beyond the
> /www subtree on people.apache.org, but I'd be happy to help with this
> task in any way I can.
>
> BR,
>
> Jukka Zitting
>




Cheers,
Tony


--------------------------------------------
Tony Stevenson

tony@pc-tony.com - pctony@apache.org
pctony@freenode.net - tony@caret.cam.ac.uk

http://blog.pc-tony.com

1024D/51047D66 ECAF DC55 C608 5E82 0B5E
3359 C9C7 924E 5104 7D66
--------------------------------------------






Re: long term goal: reliable services for developers

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Sun, Aug 23, 2009 at 4:55 AM, Paul Querna<pa...@querna.org> wrote:
>  7) TLP & www.apache.org Website Seeds (ease of access)

Picking up one item that's been bugging me. The idea of deploying
sites directly from svn was already mentioned, and I'd also like to
simplify the ways in which projects can set up CI builds for site
deployment.

I unfortunately don't know much about the current setup beyond the
/www subtree on people.apache.org, but I'd be happy to help with this
task in any way I can.

BR,

Jukka Zitting

Re: long term goal: reliable services for developers

Posted by Graham Leggett <mi...@sharp.fm>.
Paul Querna wrote:

> What can you do to help?
> 
> Pick one service, figure out a plan to make it reliable, preferably
> hosted in both EU and the US., recruit volunteers to help you out, and
> make it happen.
> 
> We have a budget, we will spend it on hardware as needed.  This won't
> all be done in 1 year, but fixing one services at a time will put us
> into a better place eventually.
> 
> Thoughts?

A solid goal with a clear plan to get there, +1.

Regards,
Graham
--