You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Cliff Woolley <jw...@virginia.edu> on 2002/05/20 03:06:21 UTC

Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c

On 20 May 2002 ianh@apache.org wrote:

> ianh        02/05/19 17:07:33
>
>   Modified:    modules/filters mod_deflate.c
>   Log:
>   content with "Content-Encoding" header, content is encoded.
>   But mod_deflate does not check it. It cause to encode content twice.
>
>   This problem is reproducable by getting encoded content via mod_proxy.
>
>   Patch Contributed by kaz@asada.sytes.net (ASADA Kazuhisa)
>   Bug #9222

Whoa, wait a minute.  That doesn't strike me as the right solution.  The
encoding should be one-hop only.  If it's encoded and we want to maintain
that encoding, chances are we'll have to decode it and re-encode it later.
Why you ask?  Because leaving it encoded makes it impossible to apply
another filter on the proxy server (eg mod_include).  Now perhaps if we
can guarantee that there are no other filters in the chain that will want
to modify the content *and* that the client can actually accept the
encoding, then as an optimization we can pass the data through the filter
chain still encoded.  But that would only be an optimization.  And it
seems like it could be tricky to get it to always work doing it that way,
perhaps.  (Is there ever a case where the client does not accept an
encoding but the proxy does?)

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: Content Codings (was: Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c)

Posted by Ian Holsman <ia...@apache.org>.
This is a patch to implement what you mentioned Greg.
I haven't got the time to fully test so hence it's in a email
instead of just being committed.
--Ian

Index: mod_deflate.c
===================================================================
RCS file: /home/cvs/httpd-2.0/modules/filters/mod_deflate.c,v
retrieving revision 1.6
diff -u -r1.6 mod_deflate.c
--- mod_deflate.c	20 May 2002 00:07:33 -0000	1.6
+++ mod_deflate.c	26 May 2002 04:21:00 -0000
@@ -232,6 +232,7 @@
  {
      apr_bucket *e;
      const char *accepts;
+    const char *contentencoding;
      request_rec *r = f->r;
      deflate_ctx *ctx = f->ctx;
      char *token = NULL;
@@ -273,8 +274,9 @@
              return ap_pass_brigade(f->next, bb);
          }

-        /* content is already encoded, so don't encode it again */
-        if (apr_table_get(r->headers_in, "Content-Encoding")) {
+        /* content is already gzip encoded, so don't encode it again */
+        contentencoding = apr_table_get(r->headers_in, "Content-Encoding");
+        if (contentencoding && ap_strstr(contentencoding,"gzip")) {
              ap_remove_output_filter(f);
              return ap_pass_brigade(f->next, bb);			
          }
@@ -303,12 +305,7 @@
          ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
          ctx->bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
          ctx->buffer = apr_palloc(r->pool, c->bufferSize);
-/*
-        ctx->stream.zalloc = (alloc_func) 0;
-        ctx->stream.zfree = (free_func) 0;
-        ctx->stream.opaque = (voidpf) 0;
-        ctx->crc = 0L;
-*/
+
          zRC = deflateInit2(&ctx->stream, Z_BEST_SPEED, Z_DEFLATED,
                             c->windowSize, c->memlevel,
                             Z_DEFAULT_STRATEGY);
@@ -328,7 +325,16 @@
          e = apr_bucket_pool_create(buf, 10, r->pool, f->c->bucket_alloc);
          APR_BRIGADE_INSERT_TAIL(ctx->bb, e);

-        apr_table_setn(r->headers_out, "Content-Encoding", "gzip");
+        /*
+         * if it's not there, or is set to 'identity' just hard code it 
to gzip
+         * otherwise append 'gzip' to the other ones already there
+         */
+        if (contentencoding && strcmp(contentencoding,"identity")) {
+            apr_table_setn(r->headers_out, "Content-Encoding", 
apr_pstrcat(r->pool,contentencoding,", gzip", NULL));
+        }
+        else {
+            apr_table_setn(r->headers_out, "Content-Encoding", "gzip");
+        }
          apr_table_setn(r->headers_out, "Vary", "Accept-Encoding");
          apr_table_unset(r->headers_out, "Content-Length");
      }


Greg Stein wrote:
> On Mon, May 20, 2002 at 08:40:03AM -0700, Ian Holsman wrote:
> 
>>Cliff Woolley wrote:
>>
>>>On 20 May 2002 ianh@apache.org wrote:
>>>
>>>
>>>
>>>>ianh        02/05/19 17:07:33
>>>>
>>>> Modified:    modules/filters mod_deflate.c
>>>> Log:
>>>> content with "Content-Encoding" header, content is encoded.
>>>> But mod_deflate does not check it. It cause to encode content twice.
>>>>
>>>> This problem is reproducable by getting encoded content via mod_proxy.
>>>>
>>>> Patch Contributed by kaz@asada.sytes.net (ASADA Kazuhisa)
>>>> Bug #9222
>>>
> 
> This is the wrong fix. Take a look at RFC 2616. MULTIPLE encodings are
> allowed. If the Content-Encoding header said "identity", then we'll skip the
> compression. Bad Bad Bad.
> 
> The following is entirely legal:
> 
> Content-Encoding: gzip, compress, deflate
> 
> (dumb, but legal)
> 
> [ see sections 14.11 and 3.5 of RFC 2616 for more info ]
> 
> 
>>>Whoa, wait a minute.  That doesn't strike me as the right solution.  The
>>>encoding should be one-hop only.  If it's encoded and we want to maintain
>>>that encoding, chances are we'll have to decode it and re-encode it later.
>>
> 
> Depends on what we want to do with proxied requests. Do we run them through
> our own filter stack? (uncompressed) But also, it is entirely reasonable
> that we could build an inflator on the client-piece of our proxy to handle
> the case where Apache can speak compressed to the origin server, but we have
> to inflate it for our client.
> 
> (right now, the proxy is just copying over what the client allows; it would
>  be entirely reasonable to tell the origin server we can always accept
>  deflated content; of course, that also means we need to write the inflate
>  logic :-)
> 
> 
>>deflate runs at the end (I think it's a header/network type output 
>>filter)  and the output of this is compressed data.
>>
>>we do check if the user can accept the gzip encoding,
>>i assumed that if this has been content-encoded already that module has 
>>done all the checks necessary to make sure the client supports it.
> 
> 
> It happens to work because the proxy client stands in for the true client.
> But that isn't always a good assumption.
> 
> 
>>This should also fix the case where you have multiple content-encoders 
>>running on the same file.. the one with the highest priority (first in 
>>the chain) will do the encoding and the others will leave it alone.
> 
> 
> Absolutely wrong. Each of them can and should be able to apply themselves if
> they so choose.
> 
> The correct test is something like this:
> 
>   if Content-Encoding is present:
>       if Content-Encoding contains "deflate" then
>           skip-self
> 
> Essentially, we will apply a "deflate" encoding if it isn't already present.
> After doing the deflate, we then need to set the Content-Encoding using:
> 
>   if Content-Encoding is not present, or it equals "identity":
>       set Content-Encoding to "deflate"
>   else
>       append ", deflate" to Content-Encoding
> 
> 
>>...
>>
>>>Why you ask?  Because leaving it encoded makes it impossible to apply
>>>another filter on the proxy server (eg mod_include).  Now perhaps if we
>>>can guarantee that there are no other filters in the chain that will want
>>>to modify the content *and* that the client can actually accept the
>>>encoding, then as an optimization we can pass the data through the filter
>>>chain still encoded.  But that would only be an optimization.  And it
>>
> 
> Tough optimization. Right now, we don't have these kinds of "whole-chain"
> introspection concepts. Really, what we'd want is some kind of flag for the
> filters to say "I want uncompressed content". When the proxy wants to
> deliver compressed content, but it sees that filter in the stack, then it
> will decompress it.
> 
> (or the filter code itself would examine the stack and insert the right
>  translators between filters to ensure their inputs and ouputs are matched
>  up with the right encodings, character sets, etc)
> 
> 
>>>seems like it could be tricky to get it to always work doing it that way,
>>>perhaps.  (Is there ever a case where the client does not accept an
>>>encoding but the proxy does?)
>>
> 
> We could certainly implement the proxy that way.
> 
> Cheers,
> -g
> 




Content Codings (was: Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c)

Posted by Greg Stein <gs...@lyra.org>.
On Mon, May 20, 2002 at 08:40:03AM -0700, Ian Holsman wrote:
> Cliff Woolley wrote:
> > On 20 May 2002 ianh@apache.org wrote:
> > 
> > 
> >>ianh        02/05/19 17:07:33
> >>
> >>  Modified:    modules/filters mod_deflate.c
> >>  Log:
> >>  content with "Content-Encoding" header, content is encoded.
> >>  But mod_deflate does not check it. It cause to encode content twice.
> >>
> >>  This problem is reproducable by getting encoded content via mod_proxy.
> >>
> >>  Patch Contributed by kaz@asada.sytes.net (ASADA Kazuhisa)
> >>  Bug #9222

This is the wrong fix. Take a look at RFC 2616. MULTIPLE encodings are
allowed. If the Content-Encoding header said "identity", then we'll skip the
compression. Bad Bad Bad.

The following is entirely legal:

Content-Encoding: gzip, compress, deflate

(dumb, but legal)

[ see sections 14.11 and 3.5 of RFC 2616 for more info ]

> > Whoa, wait a minute.  That doesn't strike me as the right solution.  The
> > encoding should be one-hop only.  If it's encoded and we want to maintain
> > that encoding, chances are we'll have to decode it and re-encode it later.

Depends on what we want to do with proxied requests. Do we run them through
our own filter stack? (uncompressed) But also, it is entirely reasonable
that we could build an inflator on the client-piece of our proxy to handle
the case where Apache can speak compressed to the origin server, but we have
to inflate it for our client.

(right now, the proxy is just copying over what the client allows; it would
 be entirely reasonable to tell the origin server we can always accept
 deflated content; of course, that also means we need to write the inflate
 logic :-)

> deflate runs at the end (I think it's a header/network type output 
> filter)  and the output of this is compressed data.
> 
> we do check if the user can accept the gzip encoding,
> i assumed that if this has been content-encoded already that module has 
> done all the checks necessary to make sure the client supports it.

It happens to work because the proxy client stands in for the true client.
But that isn't always a good assumption.

> This should also fix the case where you have multiple content-encoders 
> running on the same file.. the one with the highest priority (first in 
> the chain) will do the encoding and the others will leave it alone.

Absolutely wrong. Each of them can and should be able to apply themselves if
they so choose.

The correct test is something like this:

  if Content-Encoding is present:
      if Content-Encoding contains "deflate" then
          skip-self

Essentially, we will apply a "deflate" encoding if it isn't already present.
After doing the deflate, we then need to set the Content-Encoding using:

  if Content-Encoding is not present, or it equals "identity":
      set Content-Encoding to "deflate"
  else
      append ", deflate" to Content-Encoding

>...
> > Why you ask?  Because leaving it encoded makes it impossible to apply
> > another filter on the proxy server (eg mod_include).  Now perhaps if we
> > can guarantee that there are no other filters in the chain that will want
> > to modify the content *and* that the client can actually accept the
> > encoding, then as an optimization we can pass the data through the filter
> > chain still encoded.  But that would only be an optimization.  And it

Tough optimization. Right now, we don't have these kinds of "whole-chain"
introspection concepts. Really, what we'd want is some kind of flag for the
filters to say "I want uncompressed content". When the proxy wants to
deliver compressed content, but it sees that filter in the stack, then it
will decompress it.

(or the filter code itself would examine the stack and insert the right
 translators between filters to ensure their inputs and ouputs are matched
 up with the right encodings, character sets, etc)

> > seems like it could be tricky to get it to always work doing it that way,
> > perhaps.  (Is there ever a case where the client does not accept an
> > encoding but the proxy does?)

We could certainly implement the proxy that way.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c

Posted by Ian Holsman <ia...@apache.org>.
Cliff Woolley wrote:
> On 20 May 2002 ianh@apache.org wrote:
> 
> 
>>ianh        02/05/19 17:07:33
>>
>>  Modified:    modules/filters mod_deflate.c
>>  Log:
>>  content with "Content-Encoding" header, content is encoded.
>>  But mod_deflate does not check it. It cause to encode content twice.
>>
>>  This problem is reproducable by getting encoded content via mod_proxy.
>>
>>  Patch Contributed by kaz@asada.sytes.net (ASADA Kazuhisa)
>>  Bug #9222
> 
> 
> Whoa, wait a minute.  That doesn't strike me as the right solution.  The
> encoding should be one-hop only.  If it's encoded and we want to maintain
> that encoding, chances are we'll have to decode it and re-encode it later.

deflate runs at the end (I think it's a header/network type output 
filter)  and the output of this is compressed data.

we do check if the user can accept the gzip encoding,
i assumed that if this has been content-encoded already that module has 
done all the checks necessary to make sure the client supports it.

This should also fix the case where you have multiple content-encoders 
running on the same file.. the one with the highest priority (first in 
the chain) will do the encoding and the others will leave it alone.

I think what your after is a mod-inflate which de-compresses the data


> Why you ask?  Because leaving it encoded makes it impossible to apply
> another filter on the proxy server (eg mod_include).  Now perhaps if we
> can guarantee that there are no other filters in the chain that will want
> to modify the content *and* that the client can actually accept the
> encoding, then as an optimization we can pass the data through the filter
> chain still encoded.  But that would only be an optimization.  And it
> seems like it could be tricky to get it to always work doing it that way,
> perhaps.  (Is there ever a case where the client does not accept an
> encoding but the proxy does?)

It shouldn't .. the proxy should be fixed if it does IMHO

--Ian

> 
> --Cliff
> 
> --------------------------------------------------------------
>    Cliff Woolley
>    cliffwoolley@yahoo.com
>    Charlottesville, VA
> 
>