Posted to mirrors@apache.org by Brian Behlendorf <br...@hyperreal.com> on 1997/06/13 04:30:35 UTC

some updates

The release of 1.2.0 caused an unprecedented amount of traffic on our
site; so much so that the bandwidth provider for www.apache.org has
begun to grumble.  To address this, we have taken some measures to bolster
the effectiveness of the site in supporting mirrors:

1) The home page has been rearranged to make the mirror link more
prominent.

2) The "Download" link has been turned into a CGI script which selects
from the list of mirrors those most appropriate for your domain, based
upon the country-code of the requesting agent.  This list is also
randomized to rotate priority between the selected mirrors.  If it can't
find any, it gives you a default listing of the US addresses.

3) The "List of mirror sites" has also been turned into a CGI script.

4) I have turned on "Expires" headers for the whole site, meaning content
will stay in proxy caches (and ideally, ProxyPass-based mirrors) for 24
hours after being accessed.  
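
The selection described in item 2 can be sketched roughly as below. This is
only an illustration in Python, not the actual script (which is perl); the
mirror table, the host names in it, and the function names are all
hypothetical.

```python
import random

# Hypothetical mirror table keyed by country code; these hosts are placeholders.
MIRRORS = {
    "us": ["http://mirror1.example.com/apache/", "http://mirror2.example.com/apache/"],
    "de": ["http://mirror.example.de/apache/"],
    "il": ["http://mirror.example.org.il/apache/"],
}
DEFAULT_COUNTRY = "us"  # fallback listing, as described in item 2


def country_code(hostname):
    """Return the last label of a hostname, lowercased (e.g. 'de')."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()


def pick_mirrors(remote_host, mirrors=MIRRORS, rng=random):
    """Select mirrors matching the client's country code, in random order.

    Falls back to the US list when no mirror matches the client's domain.
    """
    selected = list(mirrors.get(country_code(remote_host), mirrors[DEFAULT_COUNTRY]))
    rng.shuffle(selected)  # rotate priority between the selected mirrors
    return selected
```

A client whose host name ends in an unlisted code (".com", or a bare IP
address) simply gets the shuffled US list.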

This of course has some interesting ramifications:

1) We will now be running CGI scripts on mirror sites.  Previously all CGI
scripts, such as the search field and bug database, had an explicit link
back to www.apache.org.  These CGI scripts only rely upon perl 4 (or 5)
being at "/usr/local/bin/perl".  Is this a problem?

2) Sites which pull down their content via ProxyPass will not have
dynamically generated mirror pages, though they should be cacheable.


There may be more issues brought up by this than I anticipated - let's
discuss them here.  Thanks again to everyone for providing mirror sites,
hopefully the storm will die down soon.

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hyperreal.com     http://www.apache.org     http://www.organic.com/jobs


Re: some updates

Posted by Ira Abramov <ir...@scso.com>.
On Fri, 13 Jun 1997, Brian Behlendorf wrote:

> One other thing this has brought up - apparently some mirrors are not
> preserving file permissions, so some of the CGI's are apparently not
> executable.  In addition, the "index.cgi" in the /mirrors/ directory
> is not being executed.  Looks like we have two more dependencies:
> 
> .cgi being able to run anywhere
> DirectoryIndex index
> Options Multiviews turned on.
> 
> Eek.  This fancy mirror strategy is looking to be a boondoggle.  What
> do you all think?

for better service you need more complicated setups. it's not a lot of
work, just a small change in the configs and well worth the bother. All
for the promotion of GNU software and Bill's demise :-)

> > I hope I'm not out of my line here: is this method at all realistic for
> > this use? sounds to me like a cool way to mirror joe's page of jokes, not
> > a busy site such as Apache is.
> 
> Not out of line at all.  In theory, this should be a very slick

Many surfers -> many hits -> slower response -> nervous surfers clicking
"reload" a lot -> cache server polls master site again (and no expire
headers will help here).

My apache mirror doesn't take binary distribs and old sources, just the
website, latest sources and the contrib, I think 6 megs cover most of my
pollers' needs. not worth hacking around the proxy configuration for this,
IMHO.

> Right.  I am so used to sprinkling .cgi's throughout the site, I
> didn't even think about that.  I had abolished /cgi-bin/ long ago :)
> The problem is that if we do that, everyone will have a different
> /cgi-bin/ directory they'll have to configure, since many people have
> their apache mirror a few directory levels down.  That will make
> several things difficult.

ok, sprinkled style script setup (SSSS from now on? :) it is. Throw in the
.htaccess files, make sure override is permitted, add one line in
pmirror's config, hey presto.




   -------------------------------------------------------------
   Ira Abramov          <ir...@scso.com>        Scalable Solutions
   POBox 3600, Jerusalem 91035, Israel       Tel (972)2-642-6822
   http://www.scso.com/~ira   Check out: http://www.linux.org.il



Re: some updates

Posted by Brian Behlendorf <br...@organic.com>.
oops, got cut off

> > 2) Sites which pull down their content via ProxyPass will not have
> > dynamically generated mirror pages, though they should be cacheable.
> 
> I hope I'm not out of my line here: is this method at all realistic for
> this use? sounds to me like a cool way to mirror joe's page of jokes, not
> a busy site such as Apache is.

Not out of line at all.  In theory, this should be a very slick way of
mirroring a site.  A request comes to you for the front page, you
query the Apache site for it, and then return it - the next request
which comes in gets the cached copy.  No worrying about cgi
configurations, etc.  The 24 hour ExpiresDefault I've set on
www.apache.org content should mean that cached content is never more
than 24 hours old.  We can tune that depending on how well it works, or
how close we are to a major release, etc.
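
The behaviour described above can be sketched as a small cache with a fixed
lifetime. This is a minimal Python illustration under stated assumptions, not
how mod_proxy is implemented; the class name, the fetch callable and the
injectable clock are hypothetical.

```python
import time

DAY = 24 * 60 * 60  # the 24-hour lifetime mentioned above, in seconds


class MirrorCache:
    """Serve cached copies until they are max_age seconds old."""

    def __init__(self, fetch, max_age=DAY, clock=time.time):
        self.fetch = fetch      # callable(path) -> content, asks the origin site
        self.max_age = max_age
        self.clock = clock      # injectable so the expiry can be tested
        self.store = {}         # path -> (fetched_at, content)

    def get(self, path):
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and now - entry[0] < self.max_age:
            return entry[1]     # still fresh: no request to the origin
        content = self.fetch(path)  # first request, or the copy has expired
        self.store[path] = (now, content)
        return content
```

The first request for a path goes to the origin; later requests within the
lifetime are answered from the local copy.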

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  www.apache.org  hyperreal.com  http://www.organic.com/JOBS


Re: some updates

Posted by Brian Behlendorf <br...@organic.com>.
On Fri, 13 Jun 1997, Ira Abramov wrote:
> On Thu, 12 Jun 1997, Brian Behlendorf wrote:
> 
> > The release of 1.2.0 caused an unprecedented amount of traffic on our
> > site; so much so that the bandwidth provider for www.apache.org has
> > begun to grumble.  To address this, we have taken some measures to bolster
> > the effectiveness of the site in supporting mirrors:
> 
> why am I not surprised? ;)
> 
> If netscape can do it, so can we... I think the load on apache's central
> site means that it shouldn't even be linked as a download site, (smart
> users will go directly to the FTP, but web surfers will never be turned
> directly to ftp://www.apache.org).

Heh - bandwidth has calmed down quite a bit since we've made the
changes, but this is something we'll consider.

> > 1) We will now be running CGI scripts on mirror sites.  Previously all CGI
> > scripts, such as the search field and bug database, had an explicit link
> > back to www.apache.org.  These CGI scripts only rely upon perl 4 (or 5)
> > being at "/usr/local/bin/perl".  Is this a problem?
> 
> I imagine every perl installation around has either the binary or a link
> there, to support the many scripts out there.

One other thing this has brought up - apparently some mirrors are not
preserving file permissions, so some of the CGI's are apparently not
executable.  In addition, the "index.cgi" in the /mirrors/ directory
is not being executed.  Looks like we have two more dependencies:

.cgi being able to run anywhere
DirectoryIndex index
Options Multiviews turned on.

Eek.  This fancy mirror strategy is looking to be a boondoggle.  What
do you all think?
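
A mirror admin could check for the permissions problem with something like
the following. It is a rough Python sketch, assuming a POSIX filesystem; the
function name is hypothetical and the check only covers the execute bit, not
the server configuration.

```python
import os
import stat


def non_executable_cgis(root):
    """Return paths of *.cgi files under root that lack any execute bit."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".cgi"):
                continue
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            # No execute bit for user, group or other: the mirror lost it.
            if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
                missing.append(path)
    return sorted(missing)
```

Running it against the top of the mirrored tree lists the scripts that would
need a chmod after each mirror run.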

> > 2) Sites which pull down their content via ProxyPass will not have
> > dynamically generated mirror pages, though they should be cacheable.
> 
> I hope I'm not out of my line here: is this method at all realistic for
> this use? sounds to me like a cool way to mirror joe's page of jokes, not
> a busy site such as Apache is.

Not out of line at all.  In theory, this should be a very slick

> > There may be more issues brought up by this than I anticipated - let's
> 
> first: where are the CGI's going to sit permanently? if there is a cgi-bin
> dir, we should make sure there is a ScriptAlias for it, otherwise make
> *.cgi executable in the apache dir (default for me, but not for some).

Right.  I am so used to sprinkling .cgi's throughout the site, I
didn't even think about that.  I had abolished /cgi-bin/ long ago :)
The problem is that if we do that, everyone will have a different
/cgi-bin/ directory they'll have to configure, since many people have
their apache mirror a few directory levels down.  That will make
several things difficult.

> maybe make sure through a .htaccess to cover at least SOME of the holes.
> (assuming it's allowed to override)

Hmm, maybe if we put a .htaccess in the apache directory with these
directives, things would be a lot smoother.  How many out there use
.htaccess?

	Brian
	
--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  www.apache.org  hyperreal.com  http://www.organic.com/JOBS


Re: some updates

Posted by Ira Abramov <ir...@scso.com>.
On Thu, 12 Jun 1997, Brian Behlendorf wrote:

> The release of 1.2.0 caused an unprecedented amount of traffic on our
> site; so much so that the bandwidth provider for www.apache.org has
> begun to grumble.  To address this, we have taken some measures to bolster
> the effectiveness of the site in supporting mirrors:

why am I not surprised? ;)

If netscape can do it, so can we... I think the load on apache's central
site means that it shouldn't even be linked as a download site, (smart
users will go directly to the FTP, but web surfers will never be turned
directly to ftp://www.apache.org).

> 4) I have turned on "Expires" headers for the whole site, meaning content
> will stay in proxy caches (and ideally, ProxyPass-based mirrors) for 24
> hours after being accessed.  

how about 48 hours for two weeks?

> 1) We will now be running CGI scripts on mirror sites.  Previously all CGI
> scripts, such as the search field and bug database, had an explicit link
> back to www.apache.org.  These CGI scripts only rely upon perl 4 (or 5)
> being at "/usr/local/bin/perl".  Is this a problem?

I imagine every perl installation around has either the binary or a link
there, to support the many scripts out there.

> 2) Sites which pull down their content via ProxyPass will not have
> dynamically generated mirror pages, though they should be cacheable.

I hope I'm not out of my line here: is this method at all realistic for
this use? sounds to me like a cool way to mirror joe's page of jokes, not
a busy site such as Apache is.

> There may be more issues brought up by this than I anticipated - let's

first: where are the CGI's going to sit permanently? if there is a cgi-bin
dir, we should make sure there is a ScriptAlias for it, otherwise make
*.cgi executable in the apache dir (default for me, but not for some).
maybe make sure through a .htaccess to cover at least SOME of the holes.
(assuming it's allowed to override)

ok, off to check on what's new since the night... :)


   -------------------------------------------------------------
   Ira Abramov          <ir...@scso.com>        Scalable Solutions
   POBox 3600, Jerusalem 91035, Israel       Tel (972)2-642-6822
   http://www.scso.com/~ira   Check out: http://www.linux.org.il


Re: some updates

Posted by "Karsten W. Rohrbach" <ro...@nacamar.net>.
On Thu, 12 Jun 1997, Brian Behlendorf wrote:

(...deletia...)
> 
> This of course has some interesting ramifications:
> 
> 1) We will now be running CGI scripts on mirror sites.  Previously all CGI
> scripts, such as the search field and bug database, had an explicit link
> back to www.apache.org.  These CGI scripts only rely upon perl 4 (or 5)
> being at "/usr/local/bin/perl".  Is this a problem?
i think it is, since bofh's like me like to produce the security loopholes
on their own =) no, seriously, i won't allow cgi stuff on my system when
mirroring via ftp from another site. a hacked script transferred to
www.apache.org would infest all of the mirrors, which makes
www.apache.org a primary target for those wannabe-crackers out there, so i
would propose to have a http://cgi.apache.org with bugdb and every other
cgi stuff on it.

> 
> 2) Sites which pull down their content via ProxyPass will not have
> dynamically generated mirror pages, though they should be cacheable.
anyway, shouldn't proxy systems bail out of the cache procedure when they
encounter a ? or & in the url? i think this has been standard for a while
(squid/harvest cached do it this way)
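
the rule of thumb above, as a tiny python sketch. it only encodes the
heuristic as stated in this message, it is not taken from the squid or
harvest sources, and the function name is made up.

```python
def looks_cacheable(url):
    """Heuristic: a '?' or '&' in the URL suggests dynamic output, so skip the cache."""
    return not any(ch in url for ch in "?&")
```

a static path such as http://www.apache.org/dist/ passes, anything with a
query string does not.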

> 
> 
> There may be more issues brought up by this than I anticipated - let's
> discuss them here.  Thanks again to everyone for providing mirror sites,
> hopefully the storm will die down soon.

try to randomize download locations on dns basis (dns round robin) for the
different locations in the net topology.

like having the following:
download.us.apache.org for the us
download.eu.apache.org for europe

i think we can learn a lot from the freebsd mirroring strategy on this
point. they don't have dns round robin, but it is something that would
spread the traffic a little more evenly.
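
a rough python sketch of the round robin idea, just to show the rotation. the
two host names are the ones proposed above; the addresses are placeholders
from the documentation ranges, and resolve() is a made-up stand-in for what a
name server would do.

```python
from collections import deque

# Regional names as proposed above; the addresses are placeholders,
# not real mirrors.
REGIONS = {
    "download.us.apache.org": deque(["192.0.2.10", "192.0.2.11", "192.0.2.12"]),
    "download.eu.apache.org": deque(["198.51.100.20", "198.51.100.21"]),
}


def resolve(name, regions=REGIONS):
    """Return the address list for name, then rotate it for the next query."""
    addresses = regions[name]
    answer = list(addresses)
    addresses.rotate(-1)  # the next caller sees a different first address
    return answer
```

clients usually take the first address in the answer, so successive queries
end up on different mirrors within the same region.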

> 
> 	Brian
> 
> --=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
> brian@hyperreal.com     http://www.apache.org     http://www.organic.com/jobs
> 

With best regards,
Karsten W. Rohrbach

-- 
Nuclear war can ruin your whole compile. (Karl Lehenbauer)
-> http://www.webmonster.de
-> http://www.nacamar.de
-> http://www.quakeforum.de
-> http://www.apache.de