Posted to dev@httpd.apache.org by "Roy T. Fielding" <fi...@gbiv.com> on 2009/02/25 04:27:38 UTC

notes on filters in 2.2.x

I spent a while looking at mod_deflate and various filter related
issues in 2.2.x/trunk, but I had to context switch away before
I could create such a large fix.  This message is to write
down my conclusions so that I can remember them and maybe fix
the silly thing when I get a chance, or maybe encourage someone
else to do it in the meantime.

The current state is broken.  There is no other way to describe it.
mod_deflate is adding -gzip to the end of etags in order to be
correct in regard to variants, but ap_meets_conditions() is not
aware of the -gzip addition; in fact, etags are compared before
the output filter is even initialized for a given response, so
there is no way for ap_meets_conditions() to work without resorting
to stupid hacks like always checking for "foo" and "foo-gzip" in
the core (even the hack is difficult without completely rewriting
how etags are compared).
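
To make the mismatch concrete, here is a minimal standalone C sketch;
it is not httpd code.  etag_equal() and etag_equal_hack() are
hypothetical names, the real logic in ap_meets_conditions() is more
involved, and the sketch assumes the suffix sits just before the
closing quote of the tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: exact byte-for-byte comparison of two entity tags. */
bool etag_equal(const char *stored, const char *from_client)
{
    assert(stored != NULL && from_client != NULL);
    return strcmp(stored, from_client) == 0;
}

/* Hypothetical form of the "check both" hack: also accept the stored tag
 * with -gzip inserted before its closing quote. */
bool etag_equal_hack(const char *stored, const char *from_client)
{
    static const char suffix[] = "-gzip\"";
    size_t slen, clen;

    if (etag_equal(stored, from_client)) {
        return true;
    }
    slen = strlen(stored);
    clen = strlen(from_client);
    /* The stored tag must end in a quote, e.g. "foo". */
    if (slen < 2 || stored[slen - 1] != '"') {
        return false;
    }
    /* The client tag must be: stored minus closing quote, plus -gzip". */
    if (clen != slen - 1 + strlen(suffix)) {
        return false;
    }
    return strncmp(stored, from_client, slen - 1) == 0
        && strcmp(from_client + slen - 1, suffix) == 0;
}
```

With a stored tag of "foo" and a client tag of "foo-gzip", the plain
comparison fails and only the hack matches, which is the special
case the core would have to carry.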

As a result, mod_deflate in 2.2.x and trunk defeats all conditional
GET requests, which is far worse than the problem that called
for appending the -gzip in the first place.  That change needs to
be reverted or fixed prior to the next release.

The deeper problem here is that output filters create a dynamic
view of representations that must be accounted for on the
interface in *both* directions.  In other words, if an output
filter is applied on responses that causes the response metadata
to be modified, then an input filter must also be applied on all
corresponding requests.  The input filter must adjust the incoming
etags within the If* headers (if any) and, for resources allowing
PUT, perform the inverse filtering on a request body.
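
As a rough illustration of the request-side adjustment, the sketch
below strips the suffix from every tag in an If-None-Match style
value.  strip_gzip_suffix() is a hypothetical name, not an httpd
API, and the code assumes the suffix always appears as -gzip
immediately before a closing quote.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical inverse of the output-side etag rewrite: replace every
 * occurrence of -gzip" with a bare closing quote.  Rewriting in place is
 * safe because the result is never longer than the input. */
void strip_gzip_suffix(char *header_value)
{
    static const char marker[] = "-gzip\"";
    const size_t mlen = sizeof(marker) - 1;
    char *src = header_value;
    char *dst = header_value;

    assert(header_value != NULL);
    while (*src != '\0') {
        if (strncmp(src, marker, mlen) == 0) {
            *dst++ = '"';   /* keep the closing quote */
            src += mlen;    /* drop the suffix */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

One limitation of this naive version: a resource whose own etag
already ends in -gzip would be rewritten too, so a real inverse
filter would need to know which tags the output filter actually
modified.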

Seems simple, right?  Unfortunately, no, because the filter
design is borked.  The filters are currently being placed in the
chain using various mechanisms and then deciding for themselves
whether or not to do work based on the nature of the message
once the data stream is being sent to the client.  In other words,
there is no way for the server to know whether a resource has
its content-encoding changed until after the encoding is applied
and sent down the wire.

The first fix we need is independent of the others.  We need an
implementation of transfer-encoding in the http filters, layered
above the chunked filter, that automatically handles TE and
transfer-encoding as designed in HTTP/1.1.  This would allow us
to start encouraging browser implementation of on-the-fly encoding
that doesn't suffer from any of these problems (Opera already
implements it).

The second fix is to add a pre_send_response hook between the
handler's "handle the request" and "generate the response"
phases.  I am not sure if this should be a true hook or just
a requirement that a special bucket be sent down the filters
before the response is generated.  In any case, the call to
ap_meets_conditions() must be after this hook but before any
action is applied to the resource (perhaps it should always be
the very last hook run).  Another way of thinking of it is that
we need a pre_handler phase that takes effect after all filters
are chosen and metadata is set by the handler+filters,
after which we must forbid any changes to the metadata (aside
from error redirects).
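
The ordering described here can be sketched in standalone C.
Everything in the block is hypothetical (struct response_meta,
pre_send_fn, gzip_pre_send, run_pre_send_then_check); none of it is
an existing httpd hook.  It only shows metadata being finalized by
the filters first and compared afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the response metadata of one request. */
struct response_meta {
    char etag[64];
    bool frozen;    /* set once the pre-send phase has finished */
};

/* Hypothetical pre-send callback: a filter adjusts metadata here. */
typedef void (*pre_send_fn)(struct response_meta *meta);

/* Toy callback standing in for a compressing filter: "foo" -> "foo-gzip". */
void gzip_pre_send(struct response_meta *meta)
{
    size_t len = strlen(meta->etag);

    assert(!meta->frozen);  /* no metadata changes after the freeze */
    if (len >= 2 && meta->etag[len - 1] == '"'
        && len + 6 <= sizeof(meta->etag)) {
        /* Overwrite the closing quote with -gzip" plus the NUL. */
        memcpy(meta->etag + len - 1, "-gzip\"", 7);
    }
}

/* Run every callback, freeze the metadata, and only then compare.
 * Returns true when the If-None-Match value matches the final etag. */
bool run_pre_send_then_check(struct response_meta *meta,
                             const pre_send_fn *hooks, size_t nhooks,
                             const char *if_none_match)
{
    for (size_t i = 0; i < nhooks; i++) {
        hooks[i](meta);
    }
    meta->frozen = true;
    return strcmp(meta->etag, if_none_match) == 0;
}
```

Because the comparison runs after the callbacks, a client that sends
back "foo-gzip" matches without any special casing in the comparison
itself.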

The third fix is to change the filters so that they
do all initialization, checks, and setting of metadata within
the pre_send_response hook/whatever.  Filters also need to be
registered in pairs (filter, inverse) so that the insert_filter
hook can apply the inverse filter to requests at the same time
the filter is applied to responses.
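
A minimal sketch of what paired registration might look like,
assuming a simple in-place filter signature.  filter_fn, struct
filter_pair, register_filter_pair(), find_filter_pair() and the toy
filters are all hypothetical and are unrelated to the real
ap_register_output_filter() interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filter signature: transform a NUL-terminated buffer in place. */
typedef void (*filter_fn)(char *buf);

/* A filter and its inverse, registered together. */
struct filter_pair {
    const char *name;
    filter_fn   output;   /* applied to response bodies */
    filter_fn   inverse;  /* applied to the matching request bodies */
};

#define MAX_PAIRS 8
static struct filter_pair registry[MAX_PAIRS];
static size_t registry_len;

/* Refuse registration unless both directions are supplied. */
int register_filter_pair(const char *name, filter_fn output, filter_fn inverse)
{
    if (name == NULL || output == NULL || inverse == NULL
        || registry_len == MAX_PAIRS) {
        return -1;
    }
    registry[registry_len].name = name;
    registry[registry_len].output = output;
    registry[registry_len].inverse = inverse;
    registry_len++;
    return 0;
}

/* Look a pair up by name so both halves can be inserted together. */
const struct filter_pair *find_filter_pair(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < registry_len; i++) {
        if (strcmp(registry[i].name, name) == 0) {
            return &registry[i];
        }
    }
    return NULL;
}

/* Toy pair standing in for a real encoding: shift each ASCII byte by one. */
void toy_encode(char *buf)
{
    for (; *buf != '\0'; buf++) {
        *buf = (char)((unsigned char)*buf + 1);
    }
}

void toy_decode(char *buf)
{
    for (; *buf != '\0'; buf++) {
        *buf = (char)((unsigned char)*buf - 1);
    }
}
```

The point of the design is that a filter without an inverse cannot
be registered at all, so whatever inserts the output half always has
the request half available.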

Alternatively, separate all filter initialization and
placement so that the arranger of filters (adding to the chain)
is the one responsible for setting metadata and ensuring the
corresponding inverse filter is placed on requests.  Filters
would then become simple brigade-replacement devices and
forbidden from setting headers/metadata.

The fourth fix is to somehow make mod_filter work with this.

The final fix is to replace the other filter-adding directives
with mod_filter and make it a core module, since it is stupid
to do all this work in two places.

I need to switch back to editing the HTTPbis spec, and then
preparing an unexpected presentation and unplanned trip to
Italy.  I probably won't be able to do any more work on this
until April.

....Roy

Re: notes on filters in 2.2.x

Posted by Albert Lash <al...@gmail.com>.
On Tue, Feb 24, 2009 at 10:27 PM, Roy T. Fielding <fi...@gbiv.com> wrote:

> I spent a while looking at mod_deflate and various filter related
> issues in 2.2.x/trunk, but I had to context switch away before
> I could create such a large fix.  This message is to write
> down my conclusions so that I can remember them and maybe fix
> the silly thing when I get a chance, or maybe encourage someone
> else to do it in the meantime.
>
> The current state is broken.  There is no other way to describe it.
> mod_deflate is adding -gzip to the end of etags in order to be
> correct in regard to variants, but ap_meets_conditions() is not
> aware of the -gzip addition; in fact, etags are compared before
> the output filter is even initialized for a given response, so
> there is no way for ap_meets_conditions() to work without resorting
> to stupid hacks like always checking for "foo" and "foo-gzip" in
> the core (even the hack is difficult without completely rewriting
> how etags are compared).
>
>
Roy,

For what it's worth (maybe to confirm your observations?), while trying to
track down a memory leak with mod_perl, I noted that Apache started using a
little more memory for every request only when mod_deflate was enabled.

http://www.docunext.com/blog/2008/06/24/even-more-perl-notes/

And to sing the praise of filters, I believe them to be a very good thing
for dynamic content generation. And, even if it's not the most efficient,
mod_ext_filter is a sweet one.

- Albert