Posted to dev@httpd.apache.org by Thomas Eckert <th...@gmail.com> on 2013/12/17 11:32:48 UTC

Revisiting: xml2enc, mod_proxy_html and content compression

I've been over this with Nick before: mod_proxy_html uses mod_xml2enc to do
the detection magic but mod_xml2enc fails to detect compressed content
correctly. Hence a simple "ProxyHTMLEnable" fails when content compression
is in place.

To work around this without dropping support for content compression you
can do

  SetOutputFilter INFLATE;xml2enc;proxy-html;DEFLATE

or at least that was the kind-of-result of the half-finished discussion
last time.

Aside from being plain ugly and troublesome to use (e.g. if you want to
use AddOutputFilter somewhere else), the above also has a major
shortcoming, which lies with already-compressed content.

Suppose the client does

  GET /something.tar.gz HTTP/1.1
  ...
  Accept-Encoding: gzip, deflate

to which the backend will respond with 200 but *not* send a
"Content-Encoding" header since the content is already compressed. With the
above filter chain the content gets "corrupted": INFLATE steps aside for
lack of a Content-Encoding header, but DEFLATE still compresses the
already-gzipped payload, double-compressing it in the end.

Imho this whole issue lies with proxy_html using xml2enc to do the content
type detection and xml2enc failing to detect the content encoding. I guess
all it really takes is to have xml2enc inspect the headers_in to see if
there is a "Content-Encoding" header and then add the inflate/deflate
filters (unless there is a general reason not to rely on the input headers,
see below).

Adding the inflate/deflate filters inside xml2enc is where I need some
advice. For the deflate part I can probably do something like

    const char *compression_method = apr_table_get(f->r->headers_in,
                                                   "Content-Encoding");
    if (compression_method != NULL &&
        strncasecmp(compression_method, "gzip", 4) == 0) {
        ap_add_output_filter("deflate", NULL, f->r, NULL);
    }

but what about the inflate part? I can't simply add the inflate input
filter because at that point (in mod_xml2enc's xml2enc_ffunc()) I would
then need to "go back" in the input filter chain, which is afaik not
possible. So I would have to run the inflate input filter "in place".

Of course, this whole issue would disappear if inflate/deflate were run
automagically (upon seeing a Content-Encoding header) in general. Anyway,
what's the reasoning behind not having them always run and giving them the
knowledge (e.g. about the input headers) to get out of the way if
necessary?
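As a side note on the header check above: strncasecmp(value, "gzip", 4)
only matches when gzip is the *first* coding listed. A standalone sketch
(hypothetical helper name, not part of any module) that scans the whole
comma-separated Content-Encoding value could look like this:

```c
#include <string.h>
#include <strings.h>

/* Hypothetical helper: does a Content-Encoding value include the "gzip"
 * coding?  Unlike strncasecmp(value, "gzip", 4), this walks the
 * comma-separated list, so "identity, gzip" is matched too. */
static int encoding_contains_gzip(const char *value)
{
    const char *p = value;
    while (p && *p) {
        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;                                  /* skip separators */
        if (strncasecmp(p, "gzip", 4) == 0 &&
            (p[4] == '\0' || p[4] == ',' || p[4] == ' '))
            return 1;                             /* whole-token match */
        p = strchr(p, ',');                       /* advance to next element */
    }
    return 0;
}
```

Note that this deliberately does not match "x-gzip"; whether that legacy
token should count is a separate decision.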

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Ewald Dieterich <ew...@t-online.de>.
On 02/11/2014 06:03 PM, Nick Kew wrote:
>
> On 6 Feb 2014, at 09:40, Ewald Dieterich wrote:
>> My wishlist:
>>
>> * Make the configuration option as powerful as the compiled in fallback so that you can configure eg. "contains xml". But how would you do that? Support regular expressions?
>
> Nice thought.  Perhaps the expression parser would be the ideal solution?

A good idea even if it could be a challenge to configure because the 
expression parser covers so much.

>> * Provide a configuration option to blacklist content types so that you can use the defaults that are compiled in but exclude specific types from processing (this is how I work around the Sharepoint problem, I simply exclude content type "multipart/related").
>
> Perhaps combined with the expression parser as a 'magic' clause that
> expands to the default?

Yes, that would be very convenient.

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Nick Kew <ni...@webthing.com>.
On 6 Feb 2014, at 09:40, Ewald Dieterich wrote:

> Thanks for the patch!
> 
> On 02/05/2014 02:57 PM, Nick Kew wrote:
>> 
>> The hesitation is because I've been wanting to review the
>> patch before committing, and round tuits are in woefully
>> short supply.  So I'm attaching it here.  I'll take any feedback
>> from you or other users as a substitute for my own review,
>> and commit if it works for you without glitches.
> 
> Minor glitch: the patch doesn't compile because it uses the unknown variable cfg in xml2enc_ffunc().

Damn.  That may mean it's a "wrong" version of the patch.  At least incomplete.
Thanks for picking that up.

> Otherwise it works as advertised.

That's good to hear.

> My wishlist:
> 
> * Make the configuration option as powerful as the compiled in fallback so that you can configure eg. "contains xml". But how would you do that? Support regular expressions?

Nice thought.  Perhaps the expression parser would be the ideal solution?

> * Provide a configuration option to blacklist content types so that you can use the defaults that are compiled in but exclude specific types from processing (this is how I work around the Sharepoint problem, I simply exclude content type "multipart/related").

Perhaps combined with the expression parser as a 'magic' clause that
expands to the default?

${markup-types}

[Thomas Eckert]

> Doesn't
> 
> +    else { 
> +        /* default - only act if starts-with "text/" or contains "xml" */ 
> +        wanted = !strncmp(ctype, "text/", 5) || strstr(ctype, "xml"); 
> +    }
> 
> suffer from the same problem as the original code ?

That *is* the old behaviour, and keeps back-compatibility for the majority
of users who are fine with that.  If the default doesn't work for you then just
set something else!

What do you think of using the expression parser to determine whether to run?

-- 
Nick Kew



Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Thomas Eckert <th...@gmail.com>.
Doesn't

+    else {
+        /* default - only act if starts-with "text/" or contains "xml" */
+        wanted = !strncmp(ctype, "text/", 5) || strstr(ctype, "xml");
+    }

suffer from the same problem as the original code? So if the user did not
set any "xml2Types" the default behaviour will hit the same problem as in
http://www.mail-archive.com/dev@httpd.apache.org/msg57029.html (the
earlier discussion regarding ctypes mentioned above). IMO this is especially
important seeing how the new code requires configuration entries to be
added, which a lot of admins will either not be aware of or simply not
bother with. In regard to the cited patch section above, I suggest something
along the lines of my patch suggestion in the first message of the
mentioned thread, e.g. include
  !strncmp(ctype, "application", 11)

Otherwise it's fine I think. Will test it sometime this week.
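The suggested default, extended with the "application" prefix, can be
exercised standalone (hypothetical function name; the real check lives
inside mod_xml2enc):

```c
#include <string.h>

/* Sketch of the compiled-in default discussed above, plus the suggested
 * "application" prefix: act on text/* types, anything containing "xml",
 * and application/* types. */
static int xml2enc_wants_ctype(const char *ctype)
{
    return !strncmp(ctype, "text/", 5)
        || strstr(ctype, "xml") != NULL
        || !strncmp(ctype, "application", 11);
}
```

Note the trade-off: with the "application" prefix included, types such as
application/x-gzip would also be "wanted", which circles back to the
compression problem at the top of the thread.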



On Thu, Feb 6, 2014 at 10:40 AM, Ewald Dieterich <ewald_dieterich@t-online.de> wrote:

> Thanks for the patch!
>
>
> On 02/05/2014 02:57 PM, Nick Kew wrote:
>
>>
>> The hesitation is because I've been wanting to review the
>> patch before committing, and round tuits are in woefully
>> short supply.  So I'm attaching it here.  I'll take any feedback
>> from you or other users as a substitute for my own review,
>> and commit if it works for you without glitches.
>>
>
> Minor glitch: the patch doesn't compile because it uses the unknown
> variable cfg in xml2enc_ffunc(). Otherwise it works as advertised.
>
> My wishlist:
>
> * Make the configuration option as powerful as the compiled in fallback so
> that you can configure eg. "contains xml". But how would you do that?
> Support regular expressions?
>
> * Provide a configuration option to blacklist content types so that you
> can use the defaults that are compiled in but exclude specific types from
> processing (this is how I work around the Sharepoint problem, I simply
> exclude content type "multipart/related").
>

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Ewald Dieterich <ew...@t-online.de>.
Thanks for the patch!

On 02/05/2014 02:57 PM, Nick Kew wrote:
>
> The hesitation is because I've been wanting to review the
> patch before committing, and round tuits are in woefully
> short supply.  So I'm attaching it here.  I'll take any feedback
> from you or other users as a substitute for my own review,
> and commit if it works for you without glitches.

Minor glitch: the patch doesn't compile because it uses the unknown 
variable cfg in xml2enc_ffunc(). Otherwise it works as advertised.

My wishlist:

* Make the configuration option as powerful as the compiled-in fallback 
so that you can configure e.g. "contains xml". But how would you do that? 
Support regular expressions?

* Provide a configuration option to blacklist content types so that you 
can use the compiled-in defaults but exclude specific types from 
processing (this is how I work around the SharePoint problem; I simply 
exclude content type "multipart/related").
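A self-contained sketch of the second wishlist item (hypothetical helper,
not actual mod_xml2enc code): compare the response media type against one
excluded type, ignoring parameters such as the type="..." clause that
SharePoint appends:

```c
#include <string.h>
#include <strings.h>

/* Hypothetical sketch: does the response media type match an excluded
 * type?  Comparison stops at ';' so parameters like
 * "multipart/related; type=..." do not defeat the match. */
static int ctype_is_excluded(const char *ctype, const char *excluded)
{
    size_t len = strcspn(ctype, ";");   /* media type without parameters */
    /* trim trailing whitespace before any parameters */
    while (len > 0 && (ctype[len - 1] == ' ' || ctype[len - 1] == '\t'))
        len--;
    return strlen(excluded) == len &&
           strncasecmp(ctype, excluded, len) == 0;
}
```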

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Nick Kew <ni...@webthing.com>.
On 21 Jan 2014, at 07:14, Ewald Dieterich wrote:

> On 12/17/2013 12:47 PM, Nick Kew wrote:
>> 
>> On 17 Dec 2013, at 10:32, Thomas Eckert wrote:
>> 
>>> I've been over this with Nick before: mod_proxy_html uses mod_xml2enc to do the detection magic but mod_xml2enc fails to detect compressed content correctly. Hence a simple "ProxyHTMLEnable" fails when content compression is in place.
>> 
>> Aha!  Revisiting that, I see I still have an uncommitted patch to make
>> content types to process configurable.  I think that was an issue you
>> originally raised?  But compression is another issue.
> 
> I don't think you committed the patch to make content types configurable. Would you mind to share that patch? I have problems with a SharePoint 2013 server that sends a response with a multipart/related content type and I need to exclude that content type from processing:

OK, I've been meaning to commit for far too long.
Very sorry to have left you in limbo!

The hesitation is because I've been wanting to review the
patch before committing, and round tuits are in woefully
short supply.  So I'm attaching it here.  I'll take any feedback
from you or other users as a substitute for my own review,
and commit if it works for you without glitches.

-- 

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Ewald Dieterich <ew...@t-online.de>.
On 12/17/2013 12:47 PM, Nick Kew wrote:
>
> On 17 Dec 2013, at 10:32, Thomas Eckert wrote:
>
>> I've been over this with Nick before: mod_proxy_html uses mod_xml2enc to do the detection magic but mod_xml2enc fails to detect compressed content correctly. Hence a simple "ProxyHTMLEnable" fails when content compression is in place.
>
> Aha!  Revisiting that, I see I still have an uncommitted patch to make
> content types to process configurable.  I think that was an issue you
> originally raised?  But compression is another issue.

I don't think you committed the patch to make content types 
configurable. Would you mind sharing that patch? I have problems with a 
SharePoint 2013 server that sends a response with a multipart/related 
content type and I need to exclude that content type from processing:

Content-Type: multipart/related;
   type="application/xop+xml";
   boundary="urn:uuid:96f4525c-3b5b-4abf-ab09-7cfc8d346216";
   start="<6c...@tempuri.org>";
   start-Info="text/xml; charset=utf-8"

Ewald

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Ruediger Pluem <rp...@apache.org>.

Nick Kew wrote:

> 
> Returning to:
>> SetOutputfilter INFLATE;xml2enc;proxy-html;DEFLATE
> 
> AFAICS the only thing that's missing is the nonessential step 4 above.

Which can be avoided with a SetEnvIf Request_URI "\.gz$" no-gzip

Regards

Rüdiger
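Spelled out, this tip combined with the filter chain from earlier in the
thread might look like the following (a sketch; the no-gzip note is
honoured by mod_deflate's DEFLATE output filter):

```apacheconf
# Sketch: exempt .gz downloads from the final DEFLATE pass so they are
# not compressed a second time; DEFLATE honours the no-gzip note.
SetEnvIf Request_URI "\.gz$" no-gzip
SetOutputFilter INFLATE;xml2enc;proxy-html;DEFLATE
```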

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Thomas Eckert <th...@gmail.com>.
Here is what I ended up with.

diff --git a/modules/filters/mod_deflate.c b/modules/filters/mod_deflate.c
index 605c158..fd3662a 100644
--- a/modules/filters/mod_deflate.c
+++ b/modules/filters/mod_deflate.c
@@ -450,6 +450,12 @@ static apr_status_t deflate_out_filter(ap_filter_t *f,
         return APR_SUCCESS;
     }

+    if (!strncasecmp(f->r->content_type, "application/x-gzip", 18)) {
+      ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, f->r, "not going to compress application/x-gzip content");
+      ap_remove_output_filter(f);
+      return ap_pass_brigade(f->next, bb);
+    }
+
     c = ap_get_module_config(r->server->module_config,
                              &deflate_module);

@@ -1162,7 +1168,6 @@ static apr_status_t deflate_in_filter(ap_filter_t *f,
     return APR_SUCCESS;
 }

-
 /* Filter to inflate for a content-transforming proxy.  */
 static apr_status_t inflate_out_filter(ap_filter_t *f,
                                       apr_bucket_brigade *bb)
@@ -1181,6 +1186,12 @@ static apr_status_t inflate_out_filter(ap_filter_t *f,
         return APR_SUCCESS;
     }

+    if (!strncasecmp(f->r->content_type, "application/x-gzip", 18)) {
+      ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, f->r, "not going to decompress application/x-gzip content");
+      ap_remove_output_filter(f);
+      return ap_pass_brigade(f->next, bb);
+    }
+
     c = ap_get_module_config(r->server->module_config, &deflate_module);

     if (!ctx) {


diff --git a/modules/filters/mod_proxy_html.c b/modules/filters/mod_proxy_html.c
index b964fec..61834ff 100644
--- a/modules/filters/mod_proxy_html.c
+++ b/modules/filters/mod_proxy_html.c
@@ -107,6 +107,8 @@ typedef struct {
     int strip_comments;
     int interp;
     int enabled;
+    int inflate;
+    int deflate;
 } proxy_html_conf;
 typedef struct {
     ap_filter_t *f;
@@ -1322,6 +1324,8 @@ static void *proxy_html_merge(apr_pool_t *pool, void *BASE, void *ADD)
         conf->interp = add->interp;
         conf->strip_comments = add->strip_comments;
         conf->enabled = add->enabled;
+        conf->inflate = add->inflate;
+        conf->deflate = add->deflate;
     }
     else {
         conf->flags = base->flags | add->flags;
@@ -1330,6 +1334,8 @@ static void *proxy_html_merge(apr_pool_t *pool, void *BASE, void *ADD)
         conf->interp = base->interp | add->interp;
         conf->strip_comments = base->strip_comments | add->strip_comments;
         conf->enabled = add->enabled | base->enabled;
+        conf->inflate = add->inflate | base->inflate;
+        conf->deflate = add->deflate | base->deflate;
     }
     return conf;
 }
@@ -1537,6 +1543,14 @@ static const command_rec proxy_html_cmds[] = {
                  (void*)APR_OFFSETOF(proxy_html_conf, enabled),
                  RSRC_CONF|ACCESS_CONF,
                  "Enable proxy-html and xml2enc filters"),
+    AP_INIT_FLAG("ProxyHTMLInflate", ap_set_flag_slot,
+                (void*)APR_OFFSETOF(proxy_html_conf, inflate),
+                RSRC_CONF|ACCESS_CONF,
+                "Will inflate compressed content before rewriting"),
+    AP_INIT_FLAG("ProxyHTMLDeflate", ap_set_flag_slot,
+                (void*)APR_OFFSETOF(proxy_html_conf, deflate),
+                RSRC_CONF|ACCESS_CONF,
+                "Will deflate content after rewriting"),
     { NULL }
 };
 static int mod_proxy_html(apr_pool_t *p, apr_pool_t *p1, apr_pool_t *p2)
@@ -1569,10 +1583,16 @@ static void proxy_html_insert(request_rec *r)
     proxy_html_conf *cfg;
     cfg = ap_get_module_config(r->per_dir_config, &proxy_html_module);
     if (cfg->enabled) {
+        if (cfg->inflate) {
+          ap_add_output_filter("inflate", NULL, r, r->connection);
+        }
         if (xml2enc_filter)
             xml2enc_filter(r, NULL, ENCIO_INPUT_CHECKS);
         ap_add_output_filter("proxy-html", NULL, r, r->connection);
         ap_add_output_filter("proxy-css", NULL, r, r->connection);
+        if (cfg->deflate) {
+          ap_add_output_filter("deflate", NULL, r, r->connection);
+        }
     }
 }
 static void proxy_html_hooks(apr_pool_t *p)


The diffs are obviously not against trunk/2.4.x since they are just meant
to show what I have in mind. I'm still worried about mod_xml2enc though.
Seeing how it inserts itself into the output filter chain, the above
mod_proxy_html patch might actually result in xml2enc attaching itself
*behind* deflate - which is bad. I haven't figured out how to work around
this yet. Any suggestions on how to do this?

In general, is this a sensible way to approach the proxy-html/compression
issue in your opinion?
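The content-type guard used in the patch above can be illustrated
standalone (hypothetical helper; the patch inlines the check). One detail
worth adding over the patch as posted: f->r->content_type can be NULL, so
a NULL check is prudent before the comparison. strncasecmp with length 18
matches "application/x-gzip" even when parameters follow:

```c
#include <string.h>
#include <strings.h>

/* Sketch of the guard from the mod_deflate hunks above: skip
 * (de)compression when the response media type is already a gzip
 * container.  Includes the NULL check the posted patch omits. */
static int is_gzip_media_type(const char *ctype)
{
    return ctype != NULL &&
           strncasecmp(ctype, "application/x-gzip", 18) == 0;
}
```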



On Tue, Jan 14, 2014 at 2:08 PM, Thomas Eckert <th...@gmail.com> wrote:

> > IIRC the OP wants to decompress such contents and run them
> > through mod_proxy_html.  I don't think that works with any sane
> > setup: running non-HTML content-types through proxy_html
> > will always be an at-your-own-risk hack.
>
> What I want is a (preferably as simple as possible) method of configuring
> mod_proxy_html in such a way that it will attempt to rewrite html(/css/js)
> content even if the content was delivered in a compressed format by the
> backend server. In my opinion the part about compression should actually be
> done transparently (to the user) by mod_proxy_html/mod_deflate.
>
> The reason I brought the .gz files up as example is because they were
> handled slightly incorrectly (unnecessary overhead + unpleasant side effect
> on client side).
>
>
>
> > Gzip compressed content sometimes gets served with no declared encoding
> > and a media type of, e.g., “application/x-gzip”. I reckon that's more
> > common than serving it as application/octet-stream or with no
> > Content-Type: declared.
>
> > mod_deflate could use this information to avoid compressing the
> > response, and without sniffing the content.
>
> Exactly what I'm aiming for. I think that's the way to go here, see '1)'
> in my previous reply. In this case we should also make mod_xml2enc bail out
> with corresponding log message when it gets to see compressed content, e.g.
> either via env variable set by inflate filter or read Content-Type header,
> so all of the involved modules act consistently and their log output will
> not be misunderstood as errors.
>
>
>
> > This more limited approach is already available through configuration,
> > so maybe the way to handle this is via a change to documentation /
> > default configuration, rather than code.
>
> In order to make mod_proxy_html work with possibly compressed contents you
> cannot simply do a
>   ProxyHTMLEnable On
> and what I have been using since the last discussion which I mentioned
> before is
>   SetOutputFilter inflate;xml2enc;proxy-html;deflate
> with no other explicit configuration of mod_deflate. I'm aware of
>
>     AddOutputFilterByType DEFLATE text/html text/plain text/xml
>     AddOutputFilterByType DEFLATE text/css
>     AddOutputFilterByType DEFLATE application/x-javascript
> application/javascript application/ecmascript
>     AddOutputFilterByType DEFLATE application/rss+xml
> but this is not compatible with the above output filter chain (see my
> previous reply).
>
> Maybe one could disable output compression on already-compressed
> content with a smart <If>-like block, but do we really want this as default
> configuration? Is there ever a case where someone does *NOT* want
> mod_proxy_html and friends to handle compression transparently?
>
>
>
> On Sun, Jan 5, 2014 at 2:57 PM, Tim Bannister <is...@jellybaby.net> wrote:
>
>> On 5 Jan 2014, at 02:21, Nick Kew wrote:
>>
>> > IIRC the OP wants to decompress such contents and run them through
>> > mod_proxy_html.  I don't think that works with any sane setup: running
>> > non-HTML content-types through proxy_html will always be an
>> > at-your-own-risk hack.
>>
>> I've believed for a while that the right way to address this is for httpd
>> to support gzip Transfer-Encoding which is always hop-by-hop and applies to
>> the transfer rather than the entity being transferred. For this scenario,
>> it could look like this:
>>
>> [Client] ⇦ gzip content-encoding ⇦ [transforming reverse proxy] ⇦
>> gzip,chunked transfer-encodings ⇦ [origin server]
>>
>> (I'm assuming that the client doesn't negotiate gzip transfer encoding)
>>
>>
>> Of course, this still won't help with a badly-configured origin server.
>>
>> --
>> Tim Bannister – isoma@jellybaby.net
>>
>>
>

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Thomas Eckert <th...@gmail.com>.
> IIRC the OP wants to decompress such contents and run them
> through mod_proxy_html.  I don't think that works with any sane
> setup: running non-HTML content-types through proxy_html
> will always be an at-your-own-risk hack.

What I want is a (preferably as simple as possible) method of configuring
mod_proxy_html in such a way that it will attempt to rewrite html(/css/js)
content even if the content was delivered in a compressed format by the
backend server. In my opinion the compression part should actually be
handled transparently (to the user) by mod_proxy_html/mod_deflate.

The reason I brought up the .gz files as an example is because they were
handled slightly incorrectly (unnecessary overhead + unpleasant side effect
on the client side).


> Gzip compressed content sometimes gets served with no declared encoding
> and a media type of, e.g., “application/x-gzip”. I reckon that's more
> common than serving it as application/octet-stream or with no
> Content-Type: declared.

> mod_deflate could use this information to avoid compressing the
> response, and without sniffing the content.

Exactly what I'm aiming for. I think that's the way to go here, see '1)' in
my previous reply. In this case we should also make mod_xml2enc bail out
with corresponding log message when it gets to see compressed content, e.g.
either via env variable set by inflate filter or read Content-Type header,
so all of the involved modules act consistently and their log output will
not be misunderstood as errors.


> This more limited approach is already available through configuration,
> so maybe the way to handle this is via a change to documentation /
> default configuration, rather than code.

In order to make mod_proxy_html work with possibly compressed contents you
cannot simply do a
  ProxyHTMLEnable On
and what I have been using since the last discussion which I mentioned
before is
  SetOutputFilter inflate;xml2enc;proxy-html;deflate
with no other explicit configuration of mod_deflate. I'm aware of
    AddOutputFilterByType DEFLATE text/html text/plain text/xml
    AddOutputFilterByType DEFLATE text/css
    AddOutputFilterByType DEFLATE application/x-javascript
application/javascript application/ecmascript
    AddOutputFilterByType DEFLATE application/rss+xml
but this is not compatible with the above output filter chain (see my
previous reply).

Maybe one could disable output compression on already-compressed
content with a smart <If>-like block, but do we really want this as default
configuration? Is there ever a case where someone does *NOT* want
mod_proxy_html and friends to handle compression transparently?



On Sun, Jan 5, 2014 at 2:57 PM, Tim Bannister <is...@jellybaby.net> wrote:

> On 5 Jan 2014, at 02:21, Nick Kew wrote:
>
> > IIRC the OP wants to decompress such contents and run them through
> > mod_proxy_html.  I don't think that works with any sane setup: running
> > non-HTML content-types through proxy_html will always be an
> > at-your-own-risk hack.
>
> I've believed for a while that the right way to address this is for httpd
> to support gzip Transfer-Encoding which is always hop-by-hop and applies to
> the transfer rather than the entity being transferred. For this scenario,
> it could look like this:
>
> [Client] ⇦ gzip content-encoding ⇦ [transforming reverse proxy] ⇦
> gzip,chunked transfer-encodings ⇦ [origin server]
>
> (I'm assuming that the client doesn't negotiate gzip transfer encoding)
>
>
> Of course, this still won't help with a badly-configured origin server.
>
> --
> Tim Bannister – isoma@jellybaby.net
>
>

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Tim Bannister <is...@jellybaby.net>.
On 5 Jan 2014, at 02:21, Nick Kew wrote:

> IIRC the OP wants to decompress such contents and run them through mod_proxy_html.  I don't think that works with any sane setup: running non-HTML content-types through proxy_html will always be an at-your-own-risk hack.

I've believed for a while that the right way to address this is for httpd to support gzip Transfer-Encoding which is always hop-by-hop and applies to the transfer rather than the entity being transferred. For this scenario, it could look like this:

[Client] ⇦ gzip content-encoding ⇦ [transforming reverse proxy] ⇦ gzip,chunked transfer-encodings ⇦ [origin server]

(I'm assuming that the client doesn't negotiate gzip transfer encoding)


Of course, this still won't help with a badly-configured origin server.

-- 
Tim Bannister – isoma@jellybaby.net


Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Nick Kew <ni...@webthing.com>.
On 4 Jan 2014, at 13:36, Tim Bannister wrote:

> Gzip compressed content sometimes gets served with no declared encoding and a media type of, e.g., “application/x-gzip”. I reckon that's more common than serving it as application/octet-stream or with no Content-Type: declared.
> 
> mod_deflate could use this information to avoid compressing the response, and without sniffing the content.
> 
> This more limited approach is already available through configuration, so maybe the way to handle this is via a change to documentation / default configuration, rather than code.
> 
> Any thoughts?

Agree in principle.  In practice, to make it work we'd want to enumerate
the scenarios we can/should support, then figure out whether they can
all be accomplished using configuration alone, and if not, what
code changes we can or should make.

IIRC the OP wants to decompress such contents and run them
through mod_proxy_html.  I don't think that works with any sane
setup: running non-HTML content-types through proxy_html
will always be an at-your-own-risk hack.

-- 
Nick Kew

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Tim Bannister <is...@jellybaby.net>.
On 4 Jan 2014, at 00:20, Nick Kew wrote:
> On 3 Jan 2014, at 13:39, Thomas Eckert wrote:
> 
>> This does not solve the problem regarding .gz files however. They still suffer from a double-compression.
…
> I'd say any such fix must lie in adding a compression-sniffing option
> to mod_deflate:
>  - let the inflate filter sniff for compressed contents
>  - let the deflate filter sniff for already-compressed contents
> even if the headers fail to declare it.
> 
> An option with big "at your own risk" warnings.

Gzip compressed content sometimes gets served with no declared encoding and a media type of, e.g., “application/x-gzip”. I reckon that's more common than serving it as application/octet-stream or with no Content-Type: declared.

mod_deflate could use this information to avoid compressing the response, and without sniffing the content.

This more limited approach is already available through configuration, so maybe the way to handle this is via a change to documentation / default configuration, rather than code.

Any thoughts?

-- 
Tim Bannister – isoma@jellybaby.net


Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Nick Kew <ni...@webthing.com>.
On 3 Jan 2014, at 13:39, Thomas Eckert wrote:

>  This does not solve the problem regarding .gz files however. They still suffer from a double-compression.

AFAICT that's only when the backend sends compressed contents but
fails to declare the content-encoding?

> Using the above patch/configuration we could either
>   1) patch mod_deflate to bail out when it sees a .gz file
> or
>   2) patch mod_proxy_html (in the above mentioned section) to bail out if it sees a .gz file.
> I cannot think of a situation where we would actually want to "HTTP compress" a .gz file. There might also be other formats than gzip involved - at least the RFC allows for them, though I've only seen gzip in the wild. For these two reasons I would go with 1).

I don't think we can do either of those: we'd be breaking HTTP.
Any solution to your issue has to be configurable and not a default.

I'd say any such fix must lie in adding a compression-sniffing option
to mod_deflate:
  - let the inflate filter sniff for compressed contents
  - let the deflate filter sniff for already-compressed contents
even if the headers fail to declare it.

An option with big "at your own risk" warnings.
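The sniffing idea can be illustrated standalone: gzip streams begin with
the fixed two-byte magic 0x1f 0x8b (RFC 1952), so a filter could peek at
the first bytes of the first brigade before deciding whether to step
aside. A minimal sketch (hypothetical helper, not mod_deflate code):

```c
#include <stddef.h>

/* Sketch of compression sniffing: gzip streams start with the magic
 * bytes 0x1f 0x8b (RFC 1952), regardless of what the headers declare. */
static int looks_like_gzip(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}
```

As Nick says, acting on a sniff rather than on the declared headers is
exactly the kind of behaviour that should stay behind an opt-in flag.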

-- 
Nick Kew


Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Thomas Eckert <th...@gmail.com>.
After applying

@@ -1569,10 +1579,13 @@ static void proxy_html_insert(request_rec *r)
     proxy_html_conf *cfg;
     cfg = ap_get_module_config(r->per_dir_config, &proxy_html_module);
     if (cfg->enabled) {
-        if (xml2enc_filter)
+        ap_add_output_filter("INFLATE", NULL, r, r->connection);
+        if (xml2enc_filter) {
             xml2enc_filter(r, NULL, ENCIO_INPUT_CHECKS);
+        }
         ap_add_output_filter("proxy-html", NULL, r, r->connection);
         ap_add_output_filter("proxy-css", NULL, r, r->connection);
+        ap_add_output_filter("DEFLATE", NULL, r, r->connection);
     }
 }

a simple

  ProxyHTMLEnable On

will do the trick for simple text/html but I did have to remove the
mod_deflate config (see further down). This does not solve the problem
regarding .gz files however. They still suffer from a double-compression.
Using the above patch/configuration we could either
  1) patch mod_deflate to bail out when it sees a .gz file
or
  2) patch mod_proxy_html (in the above mentioned section) to bail out if
it sees a .gz file.
I cannot think of a situation where we would actually want to "HTTP
compress" a .gz file. There might also be other formats than gzip involved
- at least the RFC allows for them, though I've only seen gzip in the wild.
For these two reasons I would go with 1).


In order to get the above patch working I also had to remove

  AddOutputFilterByType DEFLATE text/html text/plain text/xml
  AddOutputFilterByType DEFLATE text/css
  AddOutputFilterByType DEFLATE application/x-javascript
application/javascript application/ecmascript
  AddOutputFilterByType DEFLATE application/rss+xml

from the (global) configuration because the compression would kick in
*before* mod_xml2enc was called for the second time in the output filter
chain. This makes mod_xml2enc see compressed content and fail. Here's how
the output filter chain looks at different points in time:

called: inflate_out_filter()
output filters:
inflate
xml2enc
proxy-html
proxy-css
BYTYPE:DEFLATE
deflate
mod_session_out
byterange
content_length
http_header
http_outerror
core

called: proxy_html_filter()
output filters:
xml2enc
proxy-html
proxy-css
BYTYPE:DEFLATE
deflate
mod_session_out
byterange
content_length
http_header
http_outerror
core

called: proxy_css_filter()
output filters:
xml2enc
proxy-html
proxy-css
BYTYPE:DEFLATE
xml2enc
deflate
mod_session_out
byterange
content_length
http_header
http_outerror
core

How do I move the second pass of xml2enc before BYTYPE:DEFLATE? I'm not
aware of a variant of ap_add_output_filter() which lets one adjust the
position of the to-be-inserted filter.

Solving this problem would allow removing the call to
ap_add_output_filter() in the above patch, which in turn allows for nice
and clean configurations (e.g. by using the example config of mod_deflate)
as well as allowing the reverse proxy to do "HTTP compression" even if the
backend did not choose to do so.


On Thu, Dec 19, 2013 at 4:01 AM, Nick Kew <ni...@webthing.com> wrote:

>
> On 18 Dec 2013, at 14:47, Thomas Eckert wrote:
>
> > No, yes and I tried but couldn't get it to work. Following your advice I
> went along the lines of
>
> Yes, I'd be trying something like that.  You can insert inflate (and
> deflate)
> unconditionally, as they will check the headers themselves and remove
> themselves if appropriate.
>
> But I'd make at least the deflate component of that configurable:
> many sysops may prefer to sacrifice compression to avoid that
> unnecessary overhead.
>
> --
> Nick Kew

Re: Revisiting: xml2enc, mod_proxy_html and content compression

Posted by Thomas Eckert <th...@gmail.com>.
> Aha!  Revisiting that, I see I still have an uncommitted patch to make
> content types to process configurable.  I think that was an issue you
> originally raised?  But compression is another issue.

Yep.

> Hmmm?

> If the backend sends compressed contents with no content-encoding,
> doesn't that imply:
> 1. INFLATE doesn't see encoding, so steps away.
> 2. xml2enc and proxy-html can't parse compressed content, so step away
>    (log an error?)
> 3. DEFLATE … aha, that's what you meant about double-compression.
> In effect the whole chain was reduced to just DEFLATE. That's a bit
> nonsensical but not incorrect, and the user-agent will reverse the
> DEFLATE and restore the original from the backend, yesno?

I think you are right. Yet when using Firefox or Chrome (both in their
latest versions) the final result is 'double compressed' nonetheless.
Repeating the steps 'manually' (curl + gzip) it's all good, meaning the
original file from the server is restored as it should be. I'm reluctant to
blame the clients, however.

> But is the real issue anything more than an inability to use
> ProxyHTMLEnable with compressed contents? In which case, wouldn't
> mod_proxy_html be the place to patch? Have it test/insert deflate at
> the same point as it inserts xml2enc?

No, yes and I tried but couldn't get it to work. Following your advice I
went along the lines of

diff --git a/modules/filters/mod_proxy_html.c b/modules/filters/mod_proxy_html.c
index b964fec..9760115 100644
--- a/modules/filters/mod_proxy_html.c
+++ b/modules/filters/mod_proxy_html.c
@@ -1569,10 +1569,19 @@ static void proxy_html_insert(request_rec *r)
     proxy_html_conf *cfg;
     cfg = ap_get_module_config(r->per_dir_config, &proxy_html_module);
     if (cfg->enabled) {
-        if (xml2enc_filter)
+        int add_deflate_output_filter = 0;
+        if (apr_table_get(r->headers_in, "Content-Encoding:") != NULL) {
+            ap_add_input_filter("inflate", NULL, r, r->connection);
+            add_deflate_output_filter = 1;
+        }
+        if (xml2enc_filter) {
             xml2enc_filter(r, NULL, ENCIO_INPUT_CHECKS);
+        }
         ap_add_output_filter("proxy-html", NULL, r, r->connection);
         ap_add_output_filter("proxy-css", NULL, r, r->connection);
+        if (add_deflate_output_filter) {
+            ap_add_output_filter("deflate", NULL, r, r->connection);
+        }
     }
 }
 static void proxy_html_hooks(apr_pool_t *p)

but it appears to be way off because it does exactly nothing. When logging
the headers at this point, I found r->headers_in to contain the client
request whereas r->headers_out was empty. Doesn't this tell me I'm doing
all of this too early?
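Two things stand out in the patch above. First, it runs from the
insert_filter hook, i.e. before the backend has produced a response, which
is why r->headers_out is still empty at that point; the response's
Content-Encoding can only be examined later, e.g. on the filter's first
invocation. Second, the lookup key "Content-Encoding:" carries a trailing
colon, while header names stored in APR tables do not include the colon
(and are matched case-insensitively), so that apr_table_get() can never
succeed even once the header is present. A standalone mock of the lookup
behaviour (plain C with hypothetical fixed tables, not APR):

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Standalone mock of a case-insensitive header lookup, mirroring how
 * apr_table_get() matches keys in a headers table. Note that the
 * stored names have no trailing colon, so querying "Content-Encoding:"
 * (as in the patch above) finds nothing. */
static const char *header_names[]  = { "Host", "Accept-Encoding", "Content-Encoding" };
static const char *header_values[] = { "example.com", "gzip, deflate", "gzip" };

static const char *lookup_header(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof(header_names) / sizeof(header_names[0]); i++) {
        if (strcasecmp(header_names[i], name) == 0)
            return header_values[i];
    }
    return NULL;  /* no such header */
}
```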




On Tue, Dec 17, 2013 at 12:47 PM, Nick Kew <ni...@webthing.com> wrote:

>
> On 17 Dec 2013, at 10:32, Thomas Eckert wrote:
>
> > I've been over this with Nick before: mod_proxy_html uses mod_xml2enc
> > to do the detection magic but mod_xml2enc fails to detect compressed
> > content correctly. Hence a simple "ProxyHTMLEnable" fails when content
> > compression is in place.
>
> Aha!  Revisiting that, I see I still have an uncommitted patch to make
> content types to process configurable.  I think that was an issue you
> originally raised?  But compression is another issue.
>
> > To work around this without dropping support for content compression
> > you can do
> >
> >   SetOutputfilter INFLATE;xml2enc;proxy-html;DEFLATE
> >
> > or at least that was the kind-of-result of the half-finished
> > discussion last time.
>
> I didn't find that discussion.  But I suspect my reaction would have
> included a certain aversion to that level of processing overhead in the
> proxy in these days of fatter pipes and hardware compression.
>
> > Suppose the client does
> >
> >   GET /something.tar.gz HTTP/1.1
> >   ...
> >   Accept-Encoding: gzip, deflate
> >
> > to which the backend will respond with 200 but *not* send a
> > "Content-Encoding" header since the content is already encoded. Using
> > the above filter chain "corrupts" the content because it will be
> > inflated and then deflated, double compressing it in the end.
>
> Hmmm?
>
> If the backend sends compressed contents with no content-encoding,
> doesn't that imply:
> 1. INFLATE doesn't see encoding, so steps away.
> 2. xml2enc and proxy-html can't parse compressed content, so step away
>    (log an error?)
> 3. DEFLATE … aha, that's what you meant about double-compression.
> In effect the whole chain was reduced to just DEFLATE. That's a bit
> nonsensical but not incorrect, and the user-agent will reverse the
> DEFLATE and restore the original from the backend, yesno?
>
> > Imho this whole issue lies with proxy_html using xml2enc to do the
> > content type detection and xml2enc failing to detect the content
> > encoding. I guess all it really takes is to have xml2enc inspect the
> > headers_in to see if there is a "Content-Encoding" header and then add
> > the inflate/deflate filters (unless there is a general reason not to
> > rely on the input headers, see below).
>
> Well in this particular case, surely it lies with the backend?
> But is the real issue anything more than an inability to use
> ProxyHTMLEnable
> with compressed contents?  In which case, wouldn't mod_proxy_html be the
> place to patch?  Have it test/insert deflate at the same point as it
> inserts xml2enc?
>
> > Of course, this whole issue would disappear if inflate/deflate would
> > be run automagically (upon seeing a Content-Encoding header) in
> > general. Anyway, what's the reasoning behind not having them run
> > always and give them the knowledge (e.g. about the input headers) to
> > get out of the way if necessary ?
>
> That's an interesting thought.  mod_deflate will of course do exactly that
> if configured, so the issue seems to boil down to configuring that filter
> chain.
>
> The ultimate chain here would be:
> 1.      INFLATE // unpack compressed contents
> 2.      xml2enc         // deal with charset for libxml2/mod_proxy_html
> 3.      proxy-html      // fix URLs
> 4.      xml2enc         // set an output encoding other than utf-8
> 5.      DEFLATE // compress
>
> That's not possible with SetOutputFilter or FilterChain&family, because
> you can't configure both instances of xml2enc at once (that's what
> ProxyHTMLEnable deals with).  But of those, 4 and 5 seem low-priority
> as they're not doing really essential things.
>
> Returning to:
> > SetOutputfilter INFLATE;xml2enc;proxy-html;DEFLATE
>
> AFAICS the only thing that's missing is the nonessential step 4 above.
>
> Am I missing something?
>
> --
> Nick Kew
