Posted to dev@httpd.apache.org by Jeroen Massar <je...@unfix.org> on 2001/03/14 01:04:21 UTC

ProxyPass does cache the data, but gives 502 when the real host dies...

Boo :)

After some fiddling around I came to the following config (excerpt of important parts):
8<-----------------
LoadModule		proxy_module /usr/local/libexec/apache/libproxy.so
ProxyRequests		Off
CacheRoot		/usr/local/www/proxy
CacheSize		5
CacheGcInterval		4
CacheMaxExpire		24
CacheLastModifiedFactor	666.0
CacheDefaultExpire	1
ProxyPass		/other/		http://box2/other/
ProxyPassReverse	/other/		http://box2/other/
-------------------->8

This allows access to the /other/ dir through box1, though the dir actually resides on box2.
box1 is running Apache 1.3.19+IPv6 on FreeBSD.

The following questions arise:
- The CacheRoot dir contains mangled filenames. Could these be non-mangled, preserving their original names and paths, so I could put them into the /other/ dir, effectively mirroring on demand? See also point 3.

- It caches correctly, except for some non-static files. Which headers are required to make sure a file gets stored in the cache?

- Whenever I turn off box2, box1 will start spitting 502's (Bad gateway - proxy error) because it can't contact box2.
Is there a fallback to serve directly from the cache (at least the data it has... the rest should 404... ) ?

NOTE: This last thing will probably also happen with all the apache.org mirrors around the world using the:
8<-----------
ProxyPass / http://www.apache.org/
CacheDefaultExpire 24
------------->8
Method as described on http://www.apache.org/info/how-to-mirror.html...

Greets,
 Jeroen


Re: ProxyPass does cache the data, but gives 502 when the real host dies...

Posted by Ben Wise <gi...@gidon.com>.
Hello everybody,

I've modified a caching proxy for my own
site's special needs. (It has lots of slow CGIs,
BIG traffic, and lots of people clicking reload.)

It works really well. Here are my experiences and hacks.


> The mangled filenames are designed for speed and uniqueness, as well as
> an even distribution across the filesystem - all performance issues.

1) MANUALLY EXPIRING
Sometimes people want to write their own scripts to expire
particular URLs. So I just linked to the apache library
and made a small C executable that returns a mangled name
given a URL as an argument. Then you can just delete the cached
file for any URLs you want.
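
A rough sketch of what such a helper could look like, in Python rather
than C. It assumes the cache name is derived from an MD5 digest of the
URL and spread over nested directories. The alphabet, `dir_levels` and
`dir_length` below are illustrative assumptions, not a copy of the
module's real hashing code, so the names it prints will not necessarily
match the files in a real CacheRoot. Linking against the apache library,
as described above, is the way to get exact names.

```python
import hashlib
import os

# Hypothetical 64-character alphabet (no "/" so names are filesystem safe).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_@"


def cache_path(url, cache_root, dir_levels=3, dir_length=1):
    """Map a URL to a hashed file path below cache_root (illustrative only)."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    bits = int.from_bytes(digest, "big")
    # 22 characters of 6 bits each are enough to cover the 128-bit digest.
    chars = []
    for _ in range(22):
        chars.append(ALPHABET[bits & 0x3F])
        bits >>= 6
    name = "".join(chars)
    # The first characters become directory names, the rest the file name.
    parts = [name[i * dir_length:(i + 1) * dir_length] for i in range(dir_levels)]
    rest = name[dir_levels * dir_length:]
    return os.path.join(cache_root, *parts, rest)


if __name__ == "__main__":
    print(cache_path("http://box2/other/index.html", "/usr/local/www/proxy"))
```

Hashing gives every URL a fixed-length, filesystem-safe name and spreads
the files evenly over the directories, which is the property the quoted
reply refers to.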

2) MANUALLY EXPIRING WITH RULES
Sometimes people want to write a script that retires URLs
based upon some RULE. (i.e. $url =~ /34.*oldies.*cgi?.*dog.*$/ )
In this case the solution is to create an index of URL ==> mangled.
If you have a limited number of URLs you can create the index
with the executable in point 1. If you have an unlimited number
of URLs (dynamic sites...), then you'll either have to crawl through
the filesystem, or add a HOOK into the cache so that it adds a record
to the index file each time it caches a file.
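
The index idea can be sketched as follows. This is a minimal
illustration, assuming a tab-separated index file with one
"URL<TAB>cache file" record per line. The file layout and the function
names `record` and `expire_matching` are made up for the example and are
not part of the proxy.

```python
import os
import re


def record(index_path, url, cache_file):
    """Append one record; this is what a hook in the cache would call."""
    with open(index_path, "a", encoding="utf-8") as fh:
        fh.write(f"{url}\t{cache_file}\n")


def expire_matching(index_path, pattern):
    """Delete cached files whose URL matches pattern; return the expired URLs."""
    rule = re.compile(pattern)
    kept, expired = [], []
    with open(index_path, encoding="utf-8") as fh:
        for line in fh:
            url, _, cache_file = line.rstrip("\n").partition("\t")
            if rule.search(url):
                try:
                    os.remove(cache_file)
                except FileNotFoundError:
                    pass  # the cache's own cleanup may have removed it already
                expired.append(url)
            else:
                kept.append(line)
    # Rewrite the index so that it only lists files that are still cached.
    with open(index_path, "w", encoding="utf-8") as fh:
        fh.writelines(kept)
    return expired
```

A cron job could then call something like
`expire_matching("/path/to/index", r"oldies.*\.cgi")` to retire every
URL that matches the rule.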

> One way of doing this is - when the "special mode" is on - replace a 502
> Bad Gateway result from a conditional request with a 304 Not Modified.
> This way when the browser or the intermediate proxy asks "is my cached
> copy fresh" the Apache proxy will say "yes" - even though the backend
> server is toasted and there is no way of being sure.
 

3) STOPPING PRAGMA NO-CACHE
Sometimes URLs are so expensive for the backend to create
that you NEED to use a "special mode" that ignores "Pragma: no-cache"
and other such headers. I just removed all of that code myself;
it's not a big deal. But then the resulting proxy is only good
for that one application. I didn't do it in general.

4) SIMULTANEOUS REQUESTS NOT CACHED
Sometimes URLs are so expensive that your server can go DOWN
in the time that it takes to generate new content for the URL.
For example, I have a CGI that takes 7 seconds to generate. In
the time that it takes to generate that CGI, 20 people can click
and thereby kill the server. For this you need some sort of FLOCK
or semaphore to make them all wait. I didn't implement this and instead
generate the file with a program in the background. Alternatively
you can set the server that responds to this to only handle one
process, but then I think you'll get error messages to the client...
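
The FLOCK variant could look roughly like this. It is a sketch only,
assuming a Unix system (it uses `fcntl.flock`) and a hypothetical
`generate` callable that stands in for the slow CGI. It ignores expiry
altogether: once the file exists it is served as is.

```python
import fcntl
import os


def get_or_generate(cache_file, generate):
    """Return the cached bytes, calling generate() at most once at a time."""
    lock_path = cache_file + ".lock"
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # all other requests wait here
        try:
            if os.path.exists(cache_file):  # somebody else already built it
                with open(cache_file, "rb") as fh:
                    return fh.read()
            data = generate()  # the expensive part, run by one request only
            tmp = cache_file + ".tmp"
            with open(tmp, "wb") as fh:
                fh.write(data)
            os.replace(tmp, cache_file)  # publish the finished file atomically
            return data
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

With this in place the 20 people clicking during the 7 seconds all block
on the lock, and 19 of them get the copy that the first one produced.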

5) I WANT TO HELP
By the way, I hope to contribute in some way to the proxy cache.
I know C and CVS and am pretty meticulous. Unfortunately, I don't
have huge amounts of time and have never worked on a group open
source project before. Nevertheless, if someone needs something
done, maybe I can do it.

Bye

Re: ProxyPass does cache the data, but gives 502 when the real host dies...

Posted by Gabriel Russell <g....@ieee.org>.
At 10:43 PM 3/14/2001 +0100, you wrote:
>This is something I am keen to sort out in the v2.0 mod_proxy - handling
>of dodgy or unreliable backend servers in a useful way. To do this
>though means doing some things that break HTTP/1.1, so they'll have to
>be special configuration options, and not the default.

Are you sure they would break HTTP/1.1?
Doesn't this section (13.1.1) possibly cover the dodgy backend server 
situation?

"If a stored response is not "fresh enough" by the most restrictive 
freshness requirement of both the client and the origin server, in 
carefully considered circumstances the cache MAY still return the response 
with the appropriate Warning header (see section 13.1.5 and 14.46), unless 
such a response is prohibited (e.g., by a "no-store" cache-directive, or by 
a "no-cache" cache-request-directive; see section 14.9)."

I'm not saying that there shouldn't be "special configuration options" for 
these things, but I'm just not totally convinced that it would be breaking 
HTTP/1.1.
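
To make the quoted clause concrete, here is a small sketch of a cache
that falls back to a stale entry when the backend cannot be reached. The
function name `stale_fallback`, the dict layout of `cached` and the
agent name "box1" are assumptions for the example. It only checks the
two directives named in the quote, and uses the 110 and 111 warn-codes
from RFC 2616 section 14.46.

```python
def stale_fallback(cached, request_cache_control=""):
    """Return (status, headers, body) from a stale entry, or None if forbidden.

    cached is a dict with 'headers' (dict) and 'body' (bytes), or None when
    nothing is stored for the URL.
    """
    if cached is None:
        return None

    def directives(value):
        # "max-age=0, no-cache" -> {"max-age", "no-cache"}
        return {d.strip().split("=")[0].lower() for d in value.split(",") if d.strip()}

    request = directives(request_cache_control)
    stored = directives(cached["headers"].get("Cache-Control", ""))
    # The quoted text rules out "no-store" and a "no-cache" request directive.
    if "no-store" in request or "no-store" in stored or "no-cache" in request:
        return None

    headers = dict(cached["headers"])  # copy, so the cache entry is untouched
    headers["Warning"] = '110 box1 "Response is stale", 111 box1 "Revalidation failed"'
    return 200, headers, cached["body"]
```

The Warning header is what keeps this honest: the client still gets the
page, but is told that the copy could not be revalidated.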

- Gabriel


Re: ProxyPass does cache the data, but gives 502 when the real host dies...

Posted by Graham Leggett <mi...@sharp.fm>.
Jeroen Massar wrote:

> Boo :)

Aaaaaargh!!! Don't wake me up like that...!

> - The CacheRoot dir contains mangled filenames, could
> these be non-mangled, preserving their original names
> and path so I could put them into the /other/ dir
> effectively mirroring on-demand... see also point 3...

The mangled filenames are designed for speed and uniqueness, as well as
an even distribution across the filesystem - all performance issues.

> - It also caches it correctly, except for some non-static
> files... which headers are required to make sure a file will be cached into the cache?

On HTTP/1.0, the Pragma header (simply "Pragma: no-cache") prevents
caching. The Expires header gives the time at which the object becomes
non-fresh and will be revalidated.

On HTTP/1.1, it's the Cache-Control header, which has a whole lot of
options, like no-cache (same as "Pragma: no-cache"), no-store, and a
whole lot of others. Each object representation is identified by an
ETag (entity tag), a unique string that changes when the
object changes. This allows very fine control over whether an object is
fresh or not.
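
As a rough illustration of how these headers interact, the sketch below
decides whether a response looks storable. The function name
`looks_cacheable` is made up, the "private" directive is an addition not
mentioned above, and a real proxy applies more rules than this (status
code, Last-Modified, request method and so on).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def looks_cacheable(headers, now=None):
    """Very rough check of the response headers discussed above."""
    now = now or datetime.now(timezone.utc)
    # HTTP/1.0 style opt-out.
    if "no-cache" in headers.get("Pragma", "").lower():
        return False
    # HTTP/1.1 style opt-outs ("private" is an assumption added for the sketch).
    directives = {d.strip().split("=")[0].lower()
                  for d in headers.get("Cache-Control", "").split(",") if d.strip()}
    if directives & {"no-cache", "no-store", "private"}:
        return False
    expires = headers.get("Expires")
    if expires:
        try:
            when = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return False  # an unparseable date is treated as already expired
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        return when > now  # already expired objects are not worth storing
    return True
```

A non-static page that never gets cached is typically one whose backend
sends one of the opt-out headers, or an Expires date in the past.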

> - Whenever I turn off box2, box1 will start spitting
> 502's (Bad gateway - proxy error) because it can't contact
> box2.
> Is there a fallback to serve directly from the cache (at
> least the data it has... the rest should 404... ) ?

In theory using the Expires header correctly should support this, but
often browsers will send a conditional request for checking whether
cached data has changed - and this ends up all the way at the backend
which isn't there - so a bad gateway gets sent back up the chain.

This is something I am keen to sort out in the v2.0 mod_proxy - handling
of dodgy or unreliable backend servers in a useful way. To do this
though means doing some things that break HTTP/1.1, so they'll have to
be special configuration options, and not the default.

One way of doing this is - when the "special mode" is on - replace a 502
Bad Gateway result from a conditional request with a 304 Not Modified.
This way when the browser or the intermediate proxy asks "is my cached
copy fresh" the Apache proxy will say "yes" - even though the backend
server is toasted and there is no way of being sure.

There is nothing that can be done about normal requests - if the backend
is dead and nothing is cached then there is nothing the browser can show
(502 Bad Gateway). Sending a 404 is a bad idea - it could lead someone to
mail in saying "the file is missing" when in fact the server was
missing, sending admins on wild goose chases.
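
The decision described above is small enough to sketch. This is only an
illustration of the idea, not code from mod_proxy. The function name
`answer_when_backend_down` and the `special_mode` flag are hypothetical.

```python
def answer_when_backend_down(request_headers, special_mode=False):
    """Pick the status to send when the backend cannot be reached."""
    names = {name.lower() for name in request_headers}
    # A conditional request asks "is my cached copy still fresh?".
    conditional = bool(names & {"if-modified-since", "if-none-match"})
    if special_mode and conditional:
        return 304  # claim the copy is still good, without being able to check
    return 502  # nothing to verify against and nothing to fetch: Bad Gateway
```

With the mode off, or for a plain request, the answer stays 502, so a
dead server is never reported as a missing file.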

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight..."
