Posted to dev@httpd.apache.org by Graham Leggett <mi...@sharp.fm> on 2002/03/01 03:01:12 UTC

Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Igor Sysoev wrote:

> mod_proxy cannot do many things that mod_accel can. Some of
> them can be easily implemented, some not.

Keep in mind that mod_proxy is exactly that - a proxy. It does not try
to duplicate functionality that is performed by other parts of Apache.
(This is the main reason mod_proxy and mod_cache were separated from
each other in v2.0)

> mod_accel can:
> 
> *) ignore headers like 'Pragma: no-cache' and 'Authorization'.

This is the job of mod_headers, not mod_proxy.

However: ignoring headers violates the HTTP protocol and is not
something that should be included in a product that claims to be as HTTP
compliant as possible. If you want to cache heavy data sources, use the
Cache-Control header correctly, or correct the design of the application
so as to be less inefficient.

> *) log its results.

In theory mod_proxy (and mod_cache) should allow results to be logged
via the normal logging modules. If this is not yet possible, it should
be fixed.

> *) pass cookies to backend even response can be cached.

Again, RFC 2616 dictates how this should be done - the proxy should
support the specification.

> *) taking cookies into account while caching responses.
> 
> *) mod_accel has AccelNoPass directive.

What does this do?

If it allows certain parts of a proxied URL space to be "not-proxied",
then the following will achieve this effect:

ProxyPass /blah/somewhere/else !
ProxyPass /blah http://somewhere/blah

Everything under /blah is proxied, except for everything under
/blah/somewhere/else.

> *) proxy mass name-based virtual hosts with one directive on frontend:
>    AccelPass   /      http://192.168.1.1/    [PH]
>    [PH] means preserve hostname, i.e. request to backend would go with
>    original 'Host' header.

mod_accel does this in one directive, mod_proxy does it in two - but the
effect is the same. Should we consider adding a combined directive to
mod_proxy the same way mod_accel works...?

> *) resolve backend on startup.

This is a good idea.

> *) make simple fault-tolerance with dns-balanced backends.

mod_proxy does this already.

> *) use timeout when it connects to backend.

mod_proxy should do this - if it doesn't, it is a bug.

> *) use temporary file for buffering client request body (there is patch
>    for mod_proxy).

What advantage does this give?

> *) serve byte-range requests.

This needs to be fixed in proxy, yes.

> *) get backend response as soon as possible even if it's very big.
>    mod_accel uses a temporary file for buffering the backend response if
>    the response does not fit in mod_accel's configurable buffer.

This kind of thing is fixed in v2.0 in mod_cache. It is too big an
architecture change for the v1.3 proxy.

> *) use busy locks. If there are several identical requests to the backend
>    then only one of them goes to the backend during a specified time.
> 
> *) limit concurrent connections and waiting processes on per-backend
>    or per Location basis.

This is not the job of mod_proxy, but the job of a separate module.

Both busy locks and limiting concurrent connections can be useful in a
normal Apache server using mod_cgi, or one of the Java servlet
connectors. Adding this to proxy means it can only be used in proxy -
which is a bad idea.
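
The busy-lock idea quoted above can be sketched independently of any proxy code. This is a minimal illustration in Python rather than Apache's C; the class name BusyLock, its methods, and the key/callback interface are all hypothetical, and the "specified time" limit from the description is left out for brevity. It shows only the core behaviour: identical concurrent requests share one backend call.

```python
import threading


class BusyLock:
    """Let one caller per key reach the backend; the rest wait for its result.

    Hypothetical sketch: not mod_accel's implementation and not an Apache API.
    """

    def __init__(self):
        self._mutex = threading.Lock()
        self._in_flight = {}  # key -> {"done": Event, "result": ..., "waiters": int}

    def waiters(self, key):
        """Number of callers currently waiting on an in-flight request for key."""
        with self._mutex:
            entry = self._in_flight.get(key)
            return entry["waiters"] if entry else 0

    def fetch(self, key, backend_call):
        with self._mutex:
            entry = self._in_flight.get(key)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "result": None, "waiters": 0}
                self._in_flight[key] = entry
            else:
                entry["waiters"] += 1

        if not leader:
            # Someone else is already asking the backend: wait for that answer.
            entry["done"].wait()
            return entry["result"]

        try:
            entry["result"] = backend_call()
        finally:
            # Release the key even if the backend call raised; in that case
            # the waiting callers simply see a result of None.
            with self._mutex:
                del self._in_flight[key]
            entry["done"].set()
        return entry["result"]
```

Because nothing here depends on how the backend is reached, the same object could sit in front of a CGI script or a servlet connector, which is the argument for keeping it out of the proxy.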

> *) mod_accel has mod_randban module that allow to randomize some
>    part of content. For example it can replace '11111' number in
>    <img src="http://host/path1?place=1&key=1234&rand=11111">
>    with random value.

This is the job of mod_rewrite.

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight..."

Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Graham Leggett <mi...@sharp.fm>.
Bill Stoddard wrote:

> > - In mem_cache, objects need content-lengths. Partially cached objects
> > are fetchable, solving the load spike problem.
> 
> I think mem_cache should be able to cache (or begin caching) objects with unknown content
> length.  Perhaps by mirroring the content to a temp file on disk and promoting it to
> in-mem when the full content is received or garbage collecting it if it exceeds max cache
> object size thresholds. Many content generators (I am thinking servlets and JSPs) generate
> small (cacheable) responses but we may not know the length of these responses upon first
> entry to CACHE_IN.

Perhaps there is an alternative to this:

When mod_proxy fetches a request, and it is small enough to fit in its
internal buffer (of say a configurable 64kb or whatever), and the
content-length is missing, a content-length should be added to the
stream by mod_proxy. (Let me check that proxy does this)

As a result, responses of up to a certain size will have content-lengths
added before they hit the cache, making them cacheable. Responses over
that size will have no content-length, will be chunked, and will not be
cached.

This way the cache can be kept simple (no cache without a
content-length) but small dynamic responses will become cacheable
through the addition of content-length.
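
A rough sketch of that rule, in Python rather than Apache's C. The function name frame_response and the buffer_limit parameter are hypothetical, and this is not a claim about what mod_proxy actually does: the body is buffered up to a limit, a length is reported if the whole body fit, and otherwise the length is left unknown so the response would be sent chunked and skipped by the cache.

```python
from itertools import chain
from typing import Iterable, Iterator, Optional, Tuple


def frame_response(
    body_chunks: Iterable[bytes], buffer_limit: int = 64 * 1024
) -> Tuple[Optional[int], Iterator[bytes]]:
    """Return (content_length, body).

    content_length is the body size if the whole body fit within
    buffer_limit bytes, otherwise None (unknown length).
    """
    chunks = iter(body_chunks)
    buffered = []
    size = 0
    for chunk in chunks:
        buffered.append(chunk)
        size += len(chunk)
        if size > buffer_limit:
            # Too big to buffer: pass on what we already hold, then stream
            # the remainder straight from the backend without a length.
            return None, chain(buffered, chunks)
    # The backend finished inside the buffer, so the length is now known.
    return size, iter([b"".join(buffered)])
```

For example, frame_response([b"abc", b"de"], buffer_limit=8) reports a length of 5, while a body that grows past the limit comes back with a length of None and is streamed unchanged.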

Thoughts?

> Serving partially cached responses seems rather flaky to me. And as you alluded to,
> handling the case where you are serving a partially cached response that is subsequently
> abandoned is a really funky problem to solve cleanly. Need to give it some more
> thought...

If a response has a content-length, then the only time that response
will be abandoned is if the backend server flakes out. If this happens,
the front end response will be forced to flake out (probably by
connection closed).

If another request is shadowing this response, and this response flakes
out, then both the original request and the other request will both
flake out. I don't see this as a serious problem.

> To solve the backend load spike problem, it would be relatively straightforward to stall
> threads requesting partially cached objects (with a user-definable sleep time and retry
> period) to keep those threads from firing requests off to the backend servers.

I think the best way (from the point of view of delivering content to
the client as fast as possible) is for the shadowing threads to ship all
the cached content they can for as long as cached data is available. If
a shadowing thread runs out of data to send, it should sleep until there
is more. A simple flag on the cached file will tell whether the file is
"finished" or not. Shadowing threads will simply read as much as
possible from the cached file until it is marked as complete, and can
then signal that their own transmission is complete too.
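
The shadowing behaviour described above can be sketched as follows. This is an illustration in Python, not mod_cache code; the class GrowingCacheEntry and its method names are hypothetical, and an in-memory buffer stands in for the cached file. One writer appends data as it arrives from the backend, and any number of readers send what is available, sleep when they catch up, and stop once the entry is flagged complete.

```python
import threading


class GrowingCacheEntry:
    """A cache entry that can be read while it is still being filled."""

    def __init__(self):
        self._data = bytearray()
        self._complete = False  # the "finished" flag on the cached file
        self._cond = threading.Condition()

    def append(self, chunk):
        """Called by the thread receiving from the backend."""
        with self._cond:
            self._data.extend(chunk)
            self._cond.notify_all()

    def finish(self):
        """Mark the entry complete and wake any sleeping readers."""
        with self._cond:
            self._complete = True
            self._cond.notify_all()

    def read_from(self, offset=0):
        """Yield data from offset onwards until the entry is complete."""
        while True:
            with self._cond:
                # Sleep until there is more to send or the entry is finished.
                while offset >= len(self._data) and not self._complete:
                    self._cond.wait()
                chunk = bytes(self._data[offset:])
                done = self._complete
            if chunk:
                offset += len(chunk)
                yield chunk
            elif done:
                return
```

The sketch does not cover the abandoned-download case raised earlier in the thread; a real implementation would need a second flag or an error state so that readers can fail along with the writer.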

Regards,
Graham

Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Bill Stoddard <bi...@wstoddard.com>.
> Bill Stoddard wrote:
>
> > mod_disk_cache does not require knowledge of content length. In principle, do you think
> > this is a problem for a proxy cache provided we can gracefully detect and handle cases
> > where cache thresholds are being exceeded? What does squid and apache 1.3 do?
>
> I have no idea what squid does. Apache v1.3 only makes a cached object
> available after it has been downloaded completely, and I think only
> objects with content-lengths. This causes the problem of nasty load
> spikes hitting a backend server when cached content expires.
>
> I think the following logic is a compromise:
>
> - In mem_cache, objects need content-lengths. Partially cached objects
> are fetchable, solving the load spike problem.

I think mem_cache should be able to cache (or begin caching) objects with unknown content
length.  Perhaps by mirroring the content to a temp file on disk and promoting it to
in-mem when the full content is received or garbage collecting it if it exceeds max cache
object size thresholds. Many content generators (I am thinking servlets and JSPs) generate
small (cacheable) responses but we may not know the length of these responses upon first
entry to CACHE_IN.

>
> - In disk_cache, objects do not need content-lengths, but attempts to
> cache may be abandoned once the magic threshold is reached.
Yep.

>
> - As a result of the above possibility that downloads might be
> abandoned, partially cached objects should not be fetchable.
>
> Does this make sense?
>
> Is there a way you can see to make disk_cache support partial responses
> being fetchable?

Serving partially cached responses seems rather flaky to me. And as you alluded to,
handling the case where you are serving a partially cached response that is subsequently
abandoned is a really funky problem to solve cleanly. Need to give it some more
thought...

To solve the backend load spike problem, it would be relatively straightforward to stall
threads requesting partially cached objects (with a user-definable sleep time and retry
period) to keep those threads from firing requests off to the backend servers.

>
> Regards,
> Graham
> --

Bill


Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Graham Leggett <mi...@sharp.fm>.
Bill Stoddard wrote:

> mod_disk_cache does not require knowledge of content length. In principle, do you think
> this is a problem for a proxy cache provided we can gracefully detect and handle cases
> where cache thresholds are being exceeded? What does squid and apache 1.3 do?

I have no idea what squid does. Apache v1.3 only makes a cached object
available after it has been downloaded completely, and I think only
objects with content-lengths. This causes the problem of nasty load
spikes hitting a backend server when cached content expires.

I think the following logic is a compromise:

- In mem_cache, objects need content-lengths. Partially cached objects
are fetchable, solving the load spike problem.

- In disk_cache, objects do not need content-lengths, but attempts to
cache may be abandoned once the magic threshold is reached.

- As a result of the above possibility that downloads might be
abandoned, partially cached objects should not be fetchable.

Does this make sense?

Is there a way you can see to make disk_cache support partial responses
being fetchable?

Regards,
Graham

Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Bill Stoddard <bi...@wstoddard.com>.
> Bill Stoddard wrote:
>
> > Haven't thought this through, but there is at least one complicated case to consider and
> > handle correctly.  If the backend is chunking a response back to the proxy and that
> > response exceeds the size the proxy is allowed to cache, then the proxy would need to
> > abort the caching, send the partial cached file, clean up that file, then continue reading
> > from the backend. And would we want to make this behaviour configurable? Are there
> > practical (non-contrived) cases where it is unacceptable to defer sending bytes to the
> > client?
>
> In the default design (I dunno if this has been changed) only responses
> with content-lengths were able to be cached for this reason.
>
> Regards,
> Graham

mod_disk_cache does not require knowledge of content length. In principle, do you think
this is a problem for a proxy cache provided we can gracefully detect and handle cases
where cache thresholds are being exceeded? What does squid and apache 1.3 do?

Bill



Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Graham Leggett <mi...@sharp.fm>.
Bill Stoddard wrote:

> Haven't thought this through, but there is at least one complicated case to consider and
> handle correctly.  If the backend is chunking a response back to the proxy and that
> response exceeds the size the proxy is allowed to cache, then the proxy would need to
> abort the caching, send the partial cached file, clean up that file, then continue reading
> from the backend. And would we want to make this behaviour configurable? Are there
> practical (non-contrived) cases where it is unacceptable to defer sending bytes to the
> client?

In the default design (I dunno if this has been changed) only responses
with content-lengths were able to be cached for this reason.

Regards,
Graham

Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Bill Stoddard <bi...@wstoddard.com>.
> >
> > *) get backend response as soon as possible even if it's very big.
> >    mod_accel uses a temporary file for buffering the backend response if
> >    the response does not fit in mod_accel's configurable buffer.
>
> This kind of thing is fixed in v2.0 in mod_cache. It is too big an
> architecture change for the v1.3 proxy.

Haven't thought this through, but there is at least one complicated case to consider and
handle correctly.  If the backend is chunking a response back to the proxy and that
response exceeds the size the proxy is allowed to cache, then the proxy would need to
abort the caching, send the partial cached file, clean up that file, then continue reading
from the backend. And would we want to make this behaviour configurable? Are there
practical (non-contrived) cases where it is unacceptable to defer sending bytes to the
client?


Bill


Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Igor Sysoev <is...@rambler-co.ru>.
On Fri, 8 Mar 2002, Graham Leggett wrote:

> Igor Sysoev wrote:
> 
> > > > *) make simple fault-tolerance with dns-balanced backends.
> > >
> > > mod_proxy does this already.
> > 
> > No. mod_proxy tries it but code is broken. If connection failed it try
> > to connect with the same socket. It should make new socket.
> > Anyway mod_accel tries another backend if connection failed, backend
> > has not sent header, and backend has send 5xx response.
> 
> I just checked this code - when a connection fails a new socket is
> created. Are you sure this has not been fixed since you last checked?

I looked at 1.3.23.

Igor Sysoev


Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Graham Leggett <mi...@sharp.fm>.
Igor Sysoev wrote:

> > > *) make simple fault-tolerance with dns-balanced backends.
> >
> > mod_proxy does this already.
> 
> No. mod_proxy tries it but code is broken. If connection failed it try
> to connect with the same socket. It should make new socket.
> Anyway mod_accel tries another backend if connection failed, backend
> has not sent header, and backend has send 5xx response.

I just checked this code - when a connection fails a new socket is
created. Are you sure this has not been fixed since you last checked?

Regards,
Graham

Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Graham Leggett <mi...@sharp.fm>.
Igor Sysoev wrote:

> Do you mean that Squid returns cached gzipped content to client
> that does not send 'Accept-Encoding' ? mod_proxy 1.3.23 does the same.
> Would it be changed in 1.3.24 ?

Looking into this further, proxy uses the HTTP/1.1 Vary mechanism for
determining whether a negotiated response is cacheable or not. HTTP/1.0
requests are not checked for negotiated responses. This is in line with
the behaviour of mod_negotiation, which adds Pragma: no-cache to
negotiated responses on HTTP/1.0 requests. I assume other webservers
have similar behaviour, which is why this hasn't been raised as a
problem before.

In the v1.3 mod_proxy, if a cached variant turns out not to match the
Vary mechanism, that cached variant is deleted and a new variant is
requested. This ensures that the client is not sent the wrong variant.

In the v2.0 mod_cache, the capability exists to cache multiple variants
of the same URL simultaneously. As a result, should a cached variant fit
the client's published capabilities, then that variant will be
returned, otherwise a new variant will be requested from the remote
server, possibly adding an additional variant to the cache. As mod_cache
has built in negotiation capabilities, this should in theory work with
both HTTP/1.0 and HTTP/1.1 requests.
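
To make the multiple-variant idea concrete, here is a small sketch in Python. It is not mod_cache's data structure: the class VariantCache is hypothetical, header dictionaries are assumed to use lower-case names, and variants are matched by exact comparison of the request headers named in Vary, which is far simpler than real content negotiation.

```python
class VariantCache:
    """Keep several variants per URL, selected by the headers named in Vary."""

    def __init__(self):
        # url -> list of (vary_names, selecting_values, body)
        self._variants = {}

    @staticmethod
    def _select(vary_names, request_headers):
        # A missing request header is treated as an empty value.
        return tuple(request_headers.get(name, "") for name in vary_names)

    def store(self, url, request_headers, vary, body):
        """Remember body as the variant chosen for these request headers."""
        names = tuple(n.strip().lower() for n in vary.split(",") if n.strip())
        key = self._select(names, request_headers)
        self._variants.setdefault(url, []).append((names, key, body))

    def lookup(self, url, request_headers):
        """Return a stored variant that fits this request, or None."""
        for names, key, body in self._variants.get(url, []):
            if self._select(names, request_headers) == key:
                return body
        return None
```

With "Vary: Accept-Encoding", a gzipped copy stored for a client that sent "Accept-Encoding: gzip" is not returned to a client that sent no such header; that client causes a second, plain variant to be fetched and stored alongside the first.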

Regards,
Graham

Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Graham Leggett <mi...@sharp.fm>.
Igor Sysoev wrote:

> The main reason why Squid is better than Apache is its much lower
> memory overhead per connection. And of course, Squid has many other
> proxying features - it's a proxy, not a webserver.

In my experience, use a proxy as a forward proxy (like Squid), and use a
webserver as a reverse proxy (like Apache).

> Do you mean that Squid returns cached gzipped content to client
> that does not send 'Accept-Encoding' ? mod_proxy 1.3.23 does the same.
> Would it be changed in 1.3.24 ?

It should not do - if it does, it's a bug.

> I live in real world and many webmasters are too. It's not always possible
> to redesign backend. Unfortunately while Internet boom too many brain-damaged
> solutions were born.

But Apache tries to be the reference implementation of HTTP/1.1.
Although there are features in Apache to compensate for client and
server brokenness, trying too hard to accommodate broken design allows
too many webmasters to get away with broken design. If the clients fixed
all server errors, why bother creating a server that meets spec?

> > Use the ProxyPreserveHost option.
> 
> I suppose in 1.3.24 ?

Someone posted a patch a few weeks ago - no idea which versions it
appeared in, other than it's in the head of both 1.3 and 2.0.

> > The idea behind mod_cache was to separate the "send" threads from the
> > "receive" thread. This means that if a response is half-way downloaded,
> > and a new request comes in, the new request will be served from the
> > half-cached half-downloaded file, and not from a new request. When the
> > original request is finished, the backend is released, and the "receive"
> > threads carry on regardless.
> 
> Would it be work in prefork MPM ?

The requirement would be based on the presence of shared memory, and
should work in all MPMs.

> > You should have created a separate module for this, and run it alongside
> > mod_accel. This can still be done though.
> 
> I did not use mod_cgi and Java.

But other people do. If the busy locks logic was in its own module,
there would be a lot more use for it out there.

> Your phrase is like 'mod_rewrite should be patched to do some SSI job'
> mod_rewrite works with URLs and filenames only. It can not change content.
> mod_randban changes content on the fly.

Then I misunderstood what you were trying to do - fiddling with content
on the fly is the job of a separate module entirely, probably a filter.
I thought you were manipulating URLs.

Regards,
Graham

Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Igor Sysoev <is...@rambler-co.ru>.
On Wed, 6 Mar 2002, Graham Leggett wrote:

> > mod_accel is not a proxy. It's an accelerator. It cannot work as a usual proxy.
> > I did not even try to implement it - Apache 1.3 is poor proxy. Squid or
> > Oops are much better.
> 
> Until recently you were not aware that the proxy had been updated - I
> would look at the code again before passing this judgement ;)

The main reason why Squid is better than Apache is its much lower
memory overhead per connection. And of course, Squid has many other
proxying features - it's a proxy, not a webserver.

> For example, you pointed out some problems with Squid and content
> negotiation - mod_proxy doesn't have these problems.

Do you mean that Squid returns cached gzipped content to a client
that does not send 'Accept-Encoding'? mod_proxy 1.3.23 does the same.
Will this be changed in 1.3.24?

> > mod_accel can ignore client's 'Pragma: no-cache' and
> > 'Cache-Control: no-cache'. These headers are sent if you press Reload
> > button in Netscape or Mozilla. By default if mod_accel gets these headers
> > then it does not look cache but send request to backend.
> > Webmaster can set 'AccelIgnoreNoCache on' if he sure that
> > backend did not give fresh data and such requests only overload backend.
> 
> This design is broken.
> 
> If the client sent a cache-control or pragma header it was because the
> client specifically wanted that behaviour. If this causes grief on the
> backend, then your backend needs to be redesigned so that it does not
> have such a performance hit.

I live in the real world, and so do many webmasters. It's not always possible
to redesign the backend. Unfortunately, too many brain-damaged solutions
were born during the Internet boom.

> Breaking the HTTP protocol isn't the fix to a broken backend.

I consider mod_accel and the backend a single entity. It does not
matter to me which protocol I use for communication between them.
Clients see a nice HTTP protocol.

> > > Everything under /blah is proxied, except for everything under
> > > /blah/somewhere/else.
> > 
> > Yes. But '!' is already implemented ?
> 
> Yes it is.

I suppose in 1.3.24 ? By the way mod_accel's syntax is more flexible -
mod_accel can use regexp.

> > > > *) proxy mass name-based virtual hosts with one directive on frontend:
> > > >    AccelPass   /      http://192.168.1.1/    [PH]
> > > >    [PH] means preserve hostname, i.e. request to backend would go with
> > > >    original 'Host' header.
> > >
> > > mod_accel does this in one directive, mod_proxy does it in two - but the
> > > effect is the same. Should we consider adding a combined directive to
> > > mod_proxy the same way mod_accel works...?
> > 
> > What are two mod_proxy's directives ?
> > As far as I know mod_proxy always change 'Host' header.
> 
> Use the ProxyPreserveHost option.

I suppose in 1.3.24 ?

> > mod_accel can send part of answer to client even backend has not sent
> > whole answer. But even in this case slow client never block backend -
> > I use nonblocking operations and select().
> > Would it be possible with mod_cache ?
> 
> The idea behind mod_cache was to separate the "send" threads from the
> "receive" thread. This means that if a response is half-way downloaded,
> and a new request comes in, the new request will be served from the
> half-cached half-downloaded file, and not from a new request. When the
> original request is finished, the backend is released, and the "receive"
> threads carry on regardless.

Would it work in the prefork MPM?

> > > Both busy locks and limiting concurrent connections can be useful in a
> > > normal Apache server using mod_cgi, or one of the Java servlet
> > > connectors. Adding this to proxy means it can only be used in proxy -
> > > which is a bad idea.
> > 
> > Probably but Apache 1.3.x has not such module and I needed it too much
> > in mod_accel.
> 
> You should have created a separate module for this, and run it alongside
> mod_accel. This can still be done though.

I did not use mod_cgi and Java.

> > > This is the job of mod_rewrite.
> > 
> > mod_rewrite can not do it.
> 
> Then rewrite should be patched to do it.

That is like saying 'mod_rewrite should be patched to do some SSI job'.
mod_rewrite works with URLs and filenames only. It cannot change content.
mod_randban changes content on the fly.

Igor Sysoev


Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Graham Leggett <mi...@sharp.fm>.
Igor Sysoev wrote:

> mod_accel is not a proxy. It's an accelerator. It cannot work as a usual proxy.
> I did not even try to implement it - Apache 1.3 is poor proxy. Squid or
> Oops are much better.

Until recently you were not aware that the proxy had been updated - I
would look at the code again before passing this judgement ;)

For example, you pointed out some problems with Squid and content
negotiation - mod_proxy doesn't have these problems.

> mod_accel can ignore client's 'Pragma: no-cache' and
> 'Cache-Control: no-cache'. These headers are sent if you press Reload
> button in Netscape or Mozilla. By default if mod_accel gets these headers
> then it does not look cache but send request to backend.
> Webmaster can set 'AccelIgnoreNoCache on' if he sure that
> backend did not give fresh data and such requests only overload backend.

This design is broken.

If the client sent a cache-control or pragma header it was because the
client specifically wanted that behaviour. If this causes grief on the
backend, then your backend needs to be redesigned so that it does not
have such a performance hit.

Breaking the HTTP protocol isn't the fix to a broken backend.

> > In theory mod_proxy (and mod_cache) should allow results to be logged
> > via the normal logging modules. If this is not yet possible, it should
> > be fixed.
> 
> In theory but not in practice.

Then it needs fixing.

> > Everything under /blah is proxied, except for everything under
> > /blah/somewhere/else.
> 
> Yes. But '!' is already implemented ?

Yes it is.

> > > *) proxy mass name-based virtual hosts with one directive on frontend:
> > >    AccelPass   /      http://192.168.1.1/    [PH]
> > >    [PH] means preserve hostname, i.e. request to backend would go with
> > >    original 'Host' header.
> >
> > mod_accel does this in one directive, mod_proxy does it in two - but the
> > effect is the same. Should we consider adding a combined directive to
> > mod_proxy the same way mod_accel works...?
> 
> What are two mod_proxy's directives ?
> As far as I know mod_proxy always change 'Host' header.

Use the ProxyPreserveHost option.

> No. mod_proxy tries it but code is broken. If connection failed it try
> to connect with the same socket. It should make new socket.
> Anyway mod_accel tries another backend if connection failed, backend
> has not sent header, and backend has send 5xx response.

This should be fixed.

> mod_accel can send part of answer to client even backend has not sent
> whole answer. But even in this case slow client never block backend -
> I use nonblocking operations and select().
> Would it be possible with mod_cache ?

The idea behind mod_cache was to separate the "send" threads from the
"receive" thread. This means that if a response is half-way downloaded,
and a new request comes in, the new request will be served from the
half-cached half-downloaded file, and not from a new request. When the
original request is finished, the backend is released, and the "receive"
threads carry on regardless.

> > Both busy locks and limiting concurrent connections can be useful in a
> > normal Apache server using mod_cgi, or one of the Java servlet
> > connectors. Adding this to proxy means it can only be used in proxy -
> > which is a bad idea.
> 
> Probably but Apache 1.3.x has not such module and I needed it too much
> in mod_accel.

You should have created a separate module for this, and run it alongside
mod_accel. This can still be done though.

> > This is the job of mod_rewrite.
> 
> mod_rewrite can not do it.

Then rewrite should be patched to do it.

Regards,
Graham

Re: mod_proxy Cache-Control: no-cache= support Apache1.3

Posted by Igor Sysoev <is...@rambler-co.ru>.
On Fri, 1 Mar 2002, Graham Leggett wrote:

> Igor Sysoev wrote:
> 
> > mod_proxy can not do many things that mod_accel can. Some of
> > them can be easy implemented, some not.
> 
> Keep in mind that mod_proxy is exactly that - a proxy. It does not try
> to duplicate functionality that is performed by other parts of Apache.
> (This is the main reason mod_proxy and mod_cache were separated from
> each other in v2.0)

mod_accel is not a proxy. It's an accelerator. It cannot work as a usual proxy.
I did not even try to implement that - Apache 1.3 is a poor proxy. Squid or
Oops are much better.

> > mod_accel can:
> > 
> > *) ignore headers like 'Pragma: no-cache' and 'Authorization'.
> 
> This is the job of mod_headers, not mod_proxy.
> 
> However: ignoring headers violates the HTTP protocol and is not
> something that should be included in a product that claims to be as HTTP
> compliant as possible. If you want to cache heavy data sources, use the
> Cache-Control header correctly, or correct the design of the application
> so as to be less inefficient.

mod_accel can ignore the client's 'Pragma: no-cache' and
'Cache-Control: no-cache'. These headers are sent if you press the Reload
button in Netscape or Mozilla. By default, if mod_accel gets these headers
it does not look in the cache but sends the request to the backend.
The webmaster can set 'AccelIgnoreNoCache on' if he is sure that the
backend does not give fresh data and that such requests only overload it.

As to 'Authorization', mod_accel by default sends this header
to the backend and never caches such answers. The webmaster can set
'AccelIgnoreAuth on' if the backend never asks for authorization but
the client sends 'Authorization' anyway - in that case 'Authorization'
is simply a very powerful 'no-cache' header.
I know at least one download utility, FlashGet, that sends the name and
password for anonymous FTP access in the 'Authorization' header.
It's probably a bug in FlashGet, but this bug effectively trashes the cache
and the backend.

Yes, of course all these directives work at the Location and Files level.

> > *) log its results.
> 
> In theory mod_proxy (and mod_cache) should allow results to be logged
> via the normal logging modules. If this is not yet possible, it should
> be fixed.

In theory but not in practice.

> > *) pass cookies to backend even response can be cached.
> 
> Again RFC2616 dictates how this should be done - proxy should support
> the specification.

As I said, mod_accel is not a proxy.
By default mod_accel does not send cookies to the backend if the response
can be cached. But the webmaster can set 'AccelPassCookie on'
and all cookies go to the backend. The backend is then responsible for
controlling which answers should be cached and which should not.
In any case, 'Set-Cookie' headers never go into the cache.
This directive works at the Location and Files level.

> > *) taking cookies into account while caching responses.
> > 
> > *) mod_accel has AccelNoPass directive.
> 
> What does this do?
> 
> If it allows certain parts of a proxied URL space to be "not-proxied",
> then the following will achieve this effect:
> 
> ProxyPass /blah/somewhere/else !
> ProxyPass /blah http://somewhere/blah
> 
> Everything under /blah is proxied, except for everything under
> /blah/somewhere/else.

Yes. But '!' is already implemented ?
I use another syntax:

AccelPass     /     http://backend/
AccelNoPass   /images  /download  ~*\.jpg$

> > *) proxy mass name-based virtual hosts with one directive on frontend:
> >    AccelPass   /      http://192.168.1.1/    [PH]
> >    [PH] means preserve hostname, i.e. request to backend would go with
> >    original 'Host' header.
> 
> mod_accel does this in one directive, mod_proxy does it in two - but the
> effect is the same. Should we consider adding a combined directive to
> mod_proxy the same way mod_accel works...?

What are two mod_proxy's directives ?
As far as I know mod_proxy always change 'Host' header.

> > *) resolve backend on startup.
> 
> This is a good idea.

mod_accel does it by default. You can disable it with [NR] flag
in AccelPass directive.

> > *) make simple fault-tolerance with dns-balanced backends.
> 
> mod_proxy does this already.

No. mod_proxy tries, but the code is broken. If a connection fails, it tries
to connect again with the same socket. It should create a new socket.
In any case, mod_accel tries another backend if the connection failed, the
backend has not sent a header, or the backend has sent a 5xx response.
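
The connection part of this failover can be sketched as below. It is an illustration in Python, not code from mod_proxy or mod_accel; the function name connect_to_any and its parameters are hypothetical, and only IPv4 TCP is handled. Each attempt uses a fresh socket with a connect timeout, and the next address is tried when an attempt fails. Retrying on a missing header or a 5xx response is not shown.

```python
import socket


def connect_to_any(addresses, timeout=5.0):
    """Try each (host, port) in turn and return the first connected socket."""
    last_error = None
    for host, port in addresses:
        # A new socket for every attempt: a socket whose connect() has
        # failed should not be reused for the next address.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)  # bounds how long connect() may block
        try:
            sock.connect((host, port))
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise ConnectionError(f"no backend reachable: {last_error}")
```

With DNS-balanced backends, the address list would be the set of addresses the backend hostname resolves to.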

> > *) use timeout when it connects to backend.
> 
> mod_proxy should do this - if it doesn't, it is a bug.

mod_proxy does not.

> > *) use temporary file for buffering client request body (there is patch
> >    for mod_proxy).
> 
> What advantage does this give?

Suppose a slow client (3K/s) POSTs a 10K form. The backend is busy
for about 3 seconds. Now suppose the client uploads a 100K file.
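
The arithmetic behind this can be written out as a small Python calculation. It assumes, as a simplification, that without buffering the backend is tied up for exactly as long as the client takes to send the body; the function name backend_busy_seconds is hypothetical.

```python
def backend_busy_seconds(body_kbytes: float, client_kbytes_per_sec: float) -> float:
    """Time the backend spends receiving a body relayed at the client's speed."""
    return body_kbytes / client_kbytes_per_sec


for size in (10, 100):
    print(f"{size}K body at 3K/s: {backend_busy_seconds(size, 3):.1f} s")
# prints:
# 10K body at 3K/s: 3.3 s
# 100K body at 3K/s: 33.3 s
```

If the frontend first collects the body in a temporary file, the backend is occupied only for the time it takes to read that file, which no longer depends on the client's speed.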

> > *) get backend response as soon as possible even it's very big.
> >    mod_accel uses temporary file for buffering backend response if
> >    reponse can not fill in mod_accel configurable buffer.
> 
> This kind of thing is fixed in v2.0 in mod_cache. It is too big an
> architecture change for the v1.3 proxy.

mod_accel can send part of the answer to the client even if the backend
has not sent the whole answer. But even in this case a slow client never
blocks the backend - I use nonblocking operations and select().
Would this be possible with mod_cache?

> > *) use busy locks. If there are several the same requests to backend
> >    then only one of them would go to backend during specified time.
> > 
> > *) limit concurrent connections and waiting processes on per-backend
> >    or per Location basis.
> 
> This is not the job of mod_proxy, but the job of a separate module.
> 
> Both busy locks and limiting concurrent connections can be useful in a
> normal Apache server using mod_cgi, or one of the Java servlet
> connectors. Adding this to proxy means it can only be used in proxy -
> which is a bad idea.

Probably, but Apache 1.3.x has no such module and I needed it badly
in mod_accel.

> > *) mod_accel has mod_randban module that allow to randomize some
> >    part of content. For example it can replace '11111' number in
> >    <img src="http://host/path1?place=1&key=1234&rand=11111">
> >    with random value.
> 
> This is the job of mod_rewrite.

mod_rewrite cannot do it.
Suppose we cache a response containing a banner's or counter's URL.
If the client reloads the page, it gets the same URL in the content and
does not load the banner. mod_randban changes the content on the fly.

Igor Sysoev