Posted to dev@httpd.apache.org by Thomas Eckert <th...@gmail.com> on 2014/01/03 14:39:01 UTC

Re: Revisiting: xml2enc, mod_proxy_html and content compression

After applying

@@ -1569,10 +1579,13 @@ static void proxy_html_insert(request_rec *r)
     proxy_html_conf *cfg;
     cfg = ap_get_module_config(r->per_dir_config, &proxy_html_module);
     if (cfg->enabled) {
-        if (xml2enc_filter)
+        ap_add_output_filter("INFLATE", NULL, r, r->connection);
+        if (xml2enc_filter) {
             xml2enc_filter(r, NULL, ENCIO_INPUT_CHECKS);
+        }
         ap_add_output_filter("proxy-html", NULL, r, r->connection);
         ap_add_output_filter("proxy-css", NULL, r, r->connection);
+        ap_add_output_filter("DEFLATE", NULL, r, r->connection);
     }
 }

a simple

  ProxyHTMLEnable On

will do the trick for simple text/html but I did have to remove the
mod_deflate config (see further down). This does not solve the problem
regarding .gz files however. They still suffer from a double-compression.
Using the above patch/configuration we could either
  1) patch mod_deflate to bail out when it sees a .gz file
or
  2) patch mod_proxy_html (in the above mentioned section) to bail out if
it sees a .gz file.
I cannot think of a situation where we would actually want to "HTTP
compress" a .gz file. There might also be other formats than gzip involved
- at least the RFC allows for them, though I've only seen gzip in the wild.
For these two reasons I would go with 1).


In order to get the above patch working I also had to remove

  AddOutputFilterByType DEFLATE text/html text/plain text/xml
  AddOutputFilterByType DEFLATE text/css
  AddOutputFilterByType DEFLATE application/x-javascript application/javascript application/ecmascript
  AddOutputFilterByType DEFLATE application/rss+xml

from the (global) configuration because the compression would kick in
*before* mod_xml2enc was called for the second time in the output filter
chain. This makes mod_xml2enc see compressed content and fail. Here's what
the output filter chain looks like at different points in time:

called: inflate_out_filter()
output filters:
inflate
xml2enc
proxy-html
proxy-css
BYTYPE:DEFLATE
deflate
mod_session_out
byterange
content_length
http_header
http_outerror
core

called: proxy_html_filter()
output filters:
xml2enc
proxy-html
proxy-css
BYTYPE:DEFLATE
deflate
mod_session_out
byterange
content_length
http_header
http_outerror
core

called: proxy_css_filter()
output filters:
xml2enc
proxy-html
proxy-css
BYTYPE:DEFLATE
xml2enc
deflate
mod_session_out
byterange
content_length
http_header
http_outerror
core

How do I move the second pass to xml2enc before BYTYPE:DEFLATE ? I'm not
aware of a variant of ap_add_output_filter() which lets one adjust the
position of the to-insert filter.
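
On the ordering question: as far as I understand it, httpd keeps each
request's output filter chain sorted by the filter type given when the
filter was registered, so a filter's position follows from that type rather
than from the order of the ap_add_output_filter() calls. The toy model below
only illustrates that idea. toy_filter, toy_insert and the rank values are
made up for this sketch and are not httpd's data structures or code.

```c
#include <stddef.h>

/* Toy stand-in for a filter; "rank" plays the role of the filter type. */
struct toy_filter {
    const char *name;
    int rank;
    struct toy_filter *next;
};

/* Insert f so that the chain stays sorted by ascending rank. A new filter
 * is placed after existing filters of the same rank. Returns the head. */
static struct toy_filter *toy_insert(struct toy_filter *head,
                                     struct toy_filter *f)
{
    struct toy_filter **link = &head;

    /* Skip every filter whose rank is lower than or equal to ours. */
    while (*link != NULL && (*link)->rank <= f->rank) {
        link = &(*link)->next;
    }
    f->next = *link;
    *link = f;
    return head;
}
```

In this model a filter added late with a lower rank still lands ahead of a
higher-ranked one, while two filters of equal rank keep their insertion
order. If the real chain behaves like that, the second xml2enc pass would
need a lower type than the compression filter to end up in front of it.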

Solving this problem would allow removing the call to
ap_add_output_filter() in the above patch, which in turn allows for nice
and clean configurations (e.g. by using the example config of mod_deflate)
as well as allowing the reverse proxy to do "HTTP compression" even if the
backend did not choose to do so.


On Thu, Dec 19, 2013 at 4:01 AM, Nick Kew <ni...@webthing.com> wrote:

>
> On 18 Dec 2013, at 14:47, Thomas Eckert wrote:
>
> > No, yes and I tried but couldn't get it to work. Following your advice I
> > went along the lines of
>
> Yes, I'd be trying something like that.  You can insert inflate (and
> deflate)
> unconditionally, as they will check the headers themselves and remove
> themselves if appropriate.
>
> But I'd make at least the deflate component of that configurable:
> many sysops may prefer to sacrifice compression to avoid that
> unnecessary overhead.
>
> --
> Nick Kew

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Thomas Eckert <th...@gmail.com>.
Here is what I ended up with.

diff --git a/modules/filters/mod_deflate.c b/modules/filters/mod_deflate.c
index 605c158..fd3662a 100644
--- a/modules/filters/mod_deflate.c
+++ b/modules/filters/mod_deflate.c
@@ -450,6 +450,12 @@ static apr_status_t deflate_out_filter(ap_filter_t *f,
         return APR_SUCCESS;
     }

+    if (!strncasecmp(f->r->content_type, "application/x-gzip", 18)) {
+      ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, f->r, "not going to compress application/x-gzip content");
+      ap_remove_output_filter(f);
+      return ap_pass_brigade(f->next, bb);
+    }
+
     c = ap_get_module_config(r->server->module_config,
                              &deflate_module);

@@ -1162,7 +1168,6 @@ static apr_status_t deflate_in_filter(ap_filter_t *f,
     return APR_SUCCESS;
 }

-
 /* Filter to inflate for a content-transforming proxy.  */
 static apr_status_t inflate_out_filter(ap_filter_t *f,
                                       apr_bucket_brigade *bb)
@@ -1181,6 +1186,12 @@ static apr_status_t inflate_out_filter(ap_filter_t *f,
         return APR_SUCCESS;
     }

+    if (!strncasecmp(f->r->content_type, "application/x-gzip", 18)) {
+      ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, f->r, "not going to decompress application/x-gzip content");
+      ap_remove_output_filter(f);
+      return ap_pass_brigade(f->next, bb);
+    }
+
     c = ap_get_module_config(r->server->module_config, &deflate_module);

     if (!ctx) {


diff --git a/modules/filters/mod_proxy_html.c b/modules/filters/mod_proxy_html.c
index b964fec..61834ff 100644
--- a/modules/filters/mod_proxy_html.c
+++ b/modules/filters/mod_proxy_html.c
@@ -107,6 +107,8 @@ typedef struct {
     int strip_comments;
     int interp;
     int enabled;
+    int inflate;
+    int deflate;
 } proxy_html_conf;
 typedef struct {
     ap_filter_t *f;
@@ -1322,6 +1324,8 @@ static void *proxy_html_merge(apr_pool_t *pool, void *BASE, void *ADD)
         conf->interp = add->interp;
         conf->strip_comments = add->strip_comments;
         conf->enabled = add->enabled;
+        conf->inflate = add->inflate;
+        conf->deflate = add->deflate;
     }
     else {
         conf->flags = base->flags | add->flags;
@@ -1330,6 +1334,8 @@ static void *proxy_html_merge(apr_pool_t *pool, void *BASE, void *ADD)
         conf->interp = base->interp | add->interp;
         conf->strip_comments = base->strip_comments | add->strip_comments;
         conf->enabled = add->enabled | base->enabled;
+        conf->inflate = add->inflate | base->inflate;
+        conf->deflate = add->deflate | base->deflate;
     }
     return conf;
 }
@@ -1537,6 +1543,14 @@ static const command_rec proxy_html_cmds[] = {
                  (void*)APR_OFFSETOF(proxy_html_conf, enabled),
                  RSRC_CONF|ACCESS_CONF,
                  "Enable proxy-html and xml2enc filters"),
+    AP_INIT_FLAG("ProxyHTMLInflate", ap_set_flag_slot,
+                (void*)APR_OFFSETOF(proxy_html_conf, inflate),
+                RSRC_CONF|ACCESS_CONF,
+                "Will inflate compressed content before rewriting"),
+    AP_INIT_FLAG("ProxyHTMLDeflate", ap_set_flag_slot,
+                (void*)APR_OFFSETOF(proxy_html_conf, deflate),
+                RSRC_CONF|ACCESS_CONF,
+                "Will deflate content after rewriting"),
     { NULL }
 };
 static int mod_proxy_html(apr_pool_t *p, apr_pool_t *p1, apr_pool_t *p2)
@@ -1569,10 +1583,16 @@ static void proxy_html_insert(request_rec *r)
     proxy_html_conf *cfg;
     cfg = ap_get_module_config(r->per_dir_config, &proxy_html_module);
     if (cfg->enabled) {
+        if (cfg->inflate) {
+          ap_add_output_filter("inflate", NULL, r, r->connection);
+        }
         if (xml2enc_filter)
             xml2enc_filter(r, NULL, ENCIO_INPUT_CHECKS);
         ap_add_output_filter("proxy-html", NULL, r, r->connection);
         ap_add_output_filter("proxy-css", NULL, r, r->connection);
+        if (cfg->deflate) {
+          ap_add_output_filter("deflate", NULL, r, r->connection);
+        }
     }
 }
 static void proxy_html_hooks(apr_pool_t *p)


The diffs are obviously not against trunk/2.4.x since they are just meant
to show what I have in mind. I'm still worried about mod_xml2enc
though. Seeing how it inserts itself into the output filter chain, the above
mod_proxy_html patch might actually result in xml2enc attaching itself
*behind* deflate - which is bad. I haven't figured out how to work around
this yet. Any suggestions on how to do this?

In general, is this a sensible way to approach the proxy-html/compression
issue in your opinion ?
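
One detail about the media-type test in the mod_deflate hunks above: it
hands content_type straight to strncasecmp(), and a prefix match on 18
characters would also accept longer type names that merely start with
"application/x-gzip". A slightly more defensive check could look like the
sketch below. is_gzip_media_type is a hypothetical helper name, not an
existing httpd function, and I'm assuming the content type may be NULL when
nothing has been set.

```c
#include <stddef.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper: returns 1 if content_type names the
 * application/x-gzip media type, ignoring case and any parameters
 * such as "; charset=...", and 0 otherwise (including for NULL). */
static int is_gzip_media_type(const char *content_type)
{
    static const char gz[] = "application/x-gzip";
    const size_t n = sizeof(gz) - 1;
    char c;

    if (content_type == NULL) {
        return 0;
    }
    if (strncasecmp(content_type, gz, n) != 0) {
        return 0;
    }
    /* The type must end here or be followed by a parameter/whitespace. */
    c = content_type[n];
    return c == '\0' || c == ';' || c == ' ' || c == '\t';
}
```

Both filters could then share the one test instead of repeating the string
and its length in two places.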



On Tue, Jan 14, 2014 at 2:08 PM, Thomas Eckert <th...@gmail.com> wrote:

> > IIRC the OP wants to decompress such contents and run them
> > through mod_proxy_html.  I don't think that works with any sane
> > setup: running non-HTML content-types through proxy_html
> > will always be an at-your-own-risk hack.
>
> What I want is a (preferably as simple as possible) method of configuring
> mod_proxy_html in such a way that it will attempt to rewrite html(/css/js)
> content even if the content was delivered in a compressed format by the
> backend server. In my opinion the part about compression should actually be
> done transparently (to the user) by mod_proxy_html/mod_deflate.
>
> The reason I brought the .gz files up as example is because they were
> handled slightly incorrectly (unnecessary overhead + unpleasant side effect
> on client side).
>
>
>
> > Gzip compressed content sometimes gets served with no declared encoding
> > and a media type of, e.g., “application/x-gzip”. I reckon that's more
> > common than serving it as application/octet-stream or with no
> > Content-Type: declared.
>
> > mod_deflate could use this information to avoid compressing the
> > response, and without sniffing the content.
>
> Exactly what I'm aiming for. I think that's the way to go here, see '1)'
> in my previous reply. In this case we should also make mod_xml2enc bail out
> with corresponding log message when it gets to see compressed content, e.g.
> either via env variable set by inflate filter or read Content-Type header,
> so all of the involved modules act consistently and their log output will
> not be misunderstood as errors.
>
>
>
> > This more limited approach is already available through configuration,
> > so maybe the way to handle this is via a change to documentation / default
> > configuration, rather than code.
>
> In order to make mod_proxy_html work with possibly compressed contents you
> cannot simply do a
>   ProxyHTMLEnable On
> and what I have been using since the last discussion which I mentioned
> before is
>   SetOutputFilter inflate;xml2enc;proxy-html;deflate
> with no other explicit configuration of mod_deflate. I'm aware of
>
>     AddOutputFilterByType DEFLATE text/html text/plain text/xml
>     AddOutputFilterByType DEFLATE text/css
>     AddOutputFilterByType DEFLATE application/x-javascript application/javascript application/ecmascript
>     AddOutputFilterByType DEFLATE application/rss+xml
> but this is not compatible with the above output filter chain (see my
> previous reply).
>
> Maybe one is able to disable output compression on already-compressed
> content with a smart <If> like block but do we really want this as default
> configuration ? Is there ever a case where someone does *NOT* want
> mod_proxy_html and friends to handle compression transparently ?
>
>
>
> On Sun, Jan 5, 2014 at 2:57 PM, Tim Bannister <is...@jellybaby.net> wrote:
>
>> On 5 Jan 2014, at 02:21, Nick Kew wrote:
>>
>> > IIRC the OP wants to decompress such contents and run them through
>> > mod_proxy_html.  I don't think that works with any sane setup: running
>> > non-HTML content-types through proxy_html will always be an
>> > at-your-own-risk hack.
>>
>> I've believed for a while that the right way to address this is for httpd
>> to support gzip Transfer-Encoding which is always hop-by-hop and applies to
>> the transfer rather than the entity being transferred. For this scenario,
>> it could look like this:
>>
>> [Client] ⇦ gzip content-encoding ⇦ [transforming reverse proxy] ⇦
>> gzip,chunked transfer-encodings ⇦ [origin server]
>>
>> (I'm assuming that the client doesn't negotiate gzip transfer encoding)
>>
>>
>> Of course, this still won't help with a badly-configured origin server.
>>
>> --
>> Tim Bannister – isoma@jellybaby.net
>>
>>
>

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Thomas Eckert <th...@gmail.com>.
> IIRC the OP wants to decompress such contents and run them
> through mod_proxy_html.  I don't think that works with any sane
> setup: running non-HTML content-types through proxy_html
> will always be an at-your-own-risk hack.

What I want is a (preferably as simple as possible) method of configuring
mod_proxy_html in such a way that it will attempt to rewrite html(/css/js)
content even if the content was delivered in a compressed format by the
backend server. In my opinion the part about compression should actually be
done transparently (to the user) by mod_proxy_html/mod_deflate.

The reason I brought the .gz files up as example is because they were
handled slightly incorrectly (unnecessary overhead + unpleasant side effect
on client side).


> Gzip compressed content sometimes gets served with no declared encoding
> and a media type of, e.g., “application/x-gzip”. I reckon that's more
> common than serving it as application/octet-stream or with no
> Content-Type: declared.

> mod_deflate could use this information to avoid compressing the response,
> and without sniffing the content.

Exactly what I'm aiming for. I think that's the way to go here, see '1)' in
my previous reply. In this case we should also make mod_xml2enc bail out
with corresponding log message when it gets to see compressed content, e.g.
either via env variable set by inflate filter or read Content-Type header,
so all of the involved modules act consistently and their log output will
not be misunderstood as errors.


> This more limited approach is already available through configuration, so
> maybe the way to handle this is via a change to documentation / default
> configuration, rather than code.

In order to make mod_proxy_html work with possibly compressed contents you
cannot simply do a
  ProxyHTMLEnable On
and what I have been using since the last discussion which I mentioned
before is
  SetOutputFilter inflate;xml2enc;proxy-html;deflate
with no other explicit configuration of mod_deflate. I'm aware of
    AddOutputFilterByType DEFLATE text/html text/plain text/xml
    AddOutputFilterByType DEFLATE text/css
    AddOutputFilterByType DEFLATE application/x-javascript application/javascript application/ecmascript
    AddOutputFilterByType DEFLATE application/rss+xml
but this is not compatible with the above output filter chain (see my
previous reply).

Maybe one is able to disable output compression on already-compressed
content with a smart <If> like block but do we really want this as default
configuration ? Is there ever a case where someone does *NOT* want
mod_proxy_html and friends to handle compression transparently ?



On Sun, Jan 5, 2014 at 2:57 PM, Tim Bannister <is...@jellybaby.net> wrote:

> On 5 Jan 2014, at 02:21, Nick Kew wrote:
>
> > IIRC the OP wants to decompress such contents and run them through
> > mod_proxy_html.  I don't think that works with any sane setup: running
> > non-HTML content-types through proxy_html will always be an
> > at-your-own-risk hack.
>
> I've believed for a while that the right way to address this is for httpd
> to support gzip Transfer-Encoding which is always hop-by-hop and applies to
> the transfer rather than the entity being transferred. For this scenario,
> it could look like this:
>
> [Client] ⇦ gzip content-encoding ⇦ [transforming reverse proxy] ⇦
> gzip,chunked transfer-encodings ⇦ [origin server]
>
> (I'm assuming that the client doesn't negotiate gzip transfer encoding)
>
>
> Of course, this still won't help with a badly-configured origin server.
>
> --
> Tim Bannister – isoma@jellybaby.net
>
>

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Tim Bannister <is...@jellybaby.net>.
On 5 Jan 2014, at 02:21, Nick Kew wrote:

> IIRC the OP wants to decompress such contents and run them through mod_proxy_html.  I don't think that works with any sane setup: running non-HTML content-types through proxy_html will always be an at-your-own-risk hack.

I've believed for a while that the right way to address this is for httpd to support gzip Transfer-Encoding which is always hop-by-hop and applies to the transfer rather than the entity being transferred. For this scenario, it could look like this:

[Client] ⇦ gzip content-encoding ⇦ [transforming reverse proxy] ⇦ gzip,chunked transfer-encodings ⇦ [origin server]

(I'm assuming that the client doesn't negotiate gzip transfer encoding)


Of course, this still won't help with a badly-configured origin server.

-- 
Tim Bannister – isoma@jellybaby.net


Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Nick Kew <ni...@webthing.com>.
On 4 Jan 2014, at 13:36, Tim Bannister wrote:

> Gzip compressed content sometimes gets served with no declared encoding and a media type of, e.g., “application/x-gzip”. I reckon that's more common than serving it as application/octet-stream or with no Content-Type: declared.
> 
> mod_deflate could use this information to avoid compressing the response, and without sniffing the content.
> 
> This more limited approach is already available through configuration, so maybe the way to handle this is via a change to documentation / default configuration, rather than code.
> 
> Any thoughts?

Agree in principle.  In practice, to make it work we'd want to enumerate
the scenarios we can/should support, then figure out whether they can
all be accomplished using configuration alone, and if not what
code changes we can or should make.

IIRC the OP wants to decompress such contents and run them
through mod_proxy_html.  I don't think that works with any sane
setup: running non-HTML content-types through proxy_html
will always be an at-your-own-risk hack.

-- 
Nick Kew

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Tim Bannister <is...@jellybaby.net>.
On 4 Jan 2014, at 00:20, Nick Kew wrote:
> On 3 Jan 2014, at 13:39, Thomas Eckert wrote:
> 
>> This does not solve the problem regarding .gz files however. They still suffer from a double-compression.
…
> I'd say any such fix must lie in adding a compression-sniffing option
> to mod_deflate:
>  - let the inflate filter sniff for compressed contents
>  - let the deflate filter sniff for already-compressed contents
> even if the headers fail to declare it.
> 
> An option with big "at your own risk" warnings.

Gzip compressed content sometimes gets served with no declared encoding and a media type of, e.g., “application/x-gzip”. I reckon that's more common than serving it as application/octet-stream or with no Content-Type: declared.

mod_deflate could use this information to avoid compressing the response, and without sniffing the content.

This more limited approach is already available through configuration, so maybe the way to handle this is via a change to documentation / default configuration, rather than code.

Any thoughts?

-- 
Tim Bannister – isoma@jellybaby.net


Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Nick Kew <ni...@webthing.com>.
On 3 Jan 2014, at 13:39, Thomas Eckert wrote:

>  This does not solve the problem regarding .gz files however. They still suffer from a double-compression.

AFAICT that's only when the backend sends compressed contents but
fails to declare the content-encoding?

> Using the above patch/configuration we could either
>   1) patch mod_deflate to bail out when it sees a .gz file
> or
>   2) patch mod_proxy_html (in the above mentioned section) to bail out if it sees a .gz file.
> I cannot think of a situation where we would actually want to "HTTP compress" a .gz file. There might also be other formats than gzip involved - at least the RFC allows for them, though I've only seen gzip in the wild. For these two reasons I would go with 1).

I don't think we can do any of those: we'd be breaking HTTP.
Any solution to your issue has to be configurable and not a default.

I'd say any such fix must lie in adding a compression-sniffing option
to mod_deflate:
  - let the inflate filter sniff for compressed contents
  - let the deflate filter sniff for already-compressed contents
even if the headers fail to declare it.

An option with big "at your own risk" warnings.
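
For illustration only, the sniffing part could be as small as checking the
gzip member header: a gzip stream starts with the two magic bytes 0x1f 0x8b,
followed by the compression method byte (8 for deflate). looks_like_gzip
below is a hypothetical helper and not something mod_deflate provides today.
It only inspects a plain byte buffer, so how those first bytes would be
pulled out of a brigade is left open.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the buffer starts with a gzip member
 * header (magic bytes 0x1f 0x8b, compression method 8 = deflate),
 * 0 otherwise. Buffers shorter than three bytes are treated as not gzip. */
static int looks_like_gzip(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 3) {
        return 0;
    }
    return buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 0x08;
}
```

Three bytes are a weak signal, of course: uncompressed data can start with
the same bytes by coincidence, which is one more reason for the "at your own
risk" label.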

-- 
Nick Kew