Posted to users@httpd.apache.org by Thomas Eckert <Th...@Sophos.com> on 2012/11/14 16:53:19 UTC

[users@httpd] mod_proxy_html, HTML rewrite and content compression

Hi folks

I'm using Apache httpd (2.4.3) as a reverse proxy with mod_proxy_html (as 
delivered with 2.4.3) and have run into an issue when combining HTML 
rewriting with content compression, i.e. with the "Accept-Encoding" and 
"Content-Encoding" HTTP headers.

This issue has been encountered by numerous people and the solution 
presented always comes down to setting the output filters "manually" 
instead of using the mod_proxy_html directive to do it (e.g. see 
http://forums.gentoo.org/viewtopic-t-908890-start-0.html). So it boils 
down to setting

         SetOutputFilter INFLATE;proxy-html;DEFLATE
instead of
         ProxyHTMLEnable On
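
For context, a minimal sketch of where the manual chain might sit in a 
reverse-proxy configuration. The /app/ path and backend.example.com are 
placeholders, not taken from the actual setup, and the ProxyHTMLURLMap 
rule is only an illustrative assumption:

```apache
# Hypothetical mapping; path and backend host are placeholders.
ProxyPass        /app/ http://backend.example.com/
ProxyPassReverse /app/ http://backend.example.com/

<Location "/app/">
    # Explicit chain: decompress, rewrite links, recompress.
    # xml2enc is not part of this chain.
    SetOutputFilter INFLATE;proxy-html;DEFLATE
    ProxyHTMLURLMap http://backend.example.com/ /app/
</Location>
```

With this chain proxy-html runs on the inflated body, but nothing 
inserts the xml2enc filter.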

In my test setup this actually solved the problem but it has a side 
effect which I am worried about.

In modules/filters/mod_proxy_html.c, function proxy_html_insert(), 
it's clearly visible that the xml2enc function is only called if 
cfg->enabled is set, which in turn is set via the ProxyHTMLEnable 
directive as declared with

AP_INIT_FLAG("ProxyHTMLEnable", ap_set_flag_slot,
                  (void*)APR_OFFSETOF(proxy_html_conf, enabled),
                  RSRC_CONF|ACCESS_CONF,
                  "Enable proxy-html and xml2enc filters")

I took a look at mod_xml2enc to see if there was a directive which I 
could use to set up the filtering in a way that matches the 
ProxyHTMLEnable directive but I could find none.

Is there a way to work around this ? I do want the call to mod_xml2enc 
to happen but I also want the reverse proxy to support content compression.

Any suggestions on how to go forward/where to dig on this issue ?

Regards,
   Thomas


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] mod_proxy_html, HTML rewrite and content compression

Posted by Thomas Eckert <Th...@Sophos.com>.
On 11/16/2012 05:12 PM, Nick Kew wrote:
> On Fri, 16 Nov 2012 11:31:38 +0100
> Thomas Eckert<Th...@Sophos.com>  wrote:
>
>> Thanks for the hint but unfortunately "manually" adding xml2enc to the
>> filtering chain does not help.
> Looks like you've got problems over and above anything to do with
> your configuration!
>
>>       "SetOutputFilter INFLATE;proxy-html" gets the page displayed correctly
> I thought you said it had charset issues?
>
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(259): [client
>> 10.10.10.10:40388] AH01434: Charset ISO-8859-1 not supported by libxml2;
>> trying apr_xlate
> That seems implausible.  How do you get a libxml2 install that
> doesn't natively support ISO-8859-1 (latin1)?
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument:
>> [client 10.10.10.10:40388] AH01441: xml2enc: converted 1/1 bytes
>> [pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping
>> invalid byte(s) in input stream!
>> (and more conversion errors)
> It looks as if your backend incorrectly identifies the charset
> of the page in question.  Either that or you found a bug.
> Do you have a URL where your unprocessed page could be viewed?
>
Sorry for the delay on this. The basic problem remains: If I enable html 
rewriting and connect with a client requesting content compression the 
reverse proxy will fail with a message pointing at libxml2/encoding. I 
can also see different log entries depending on whether I set the 
charset of the page.

So if I just send the page with "Content-Type: text/html" this is what I get

mod_deflate.c(1283): [client 10.10.10.10:39771] AH01398: Zlib: Inflated 
348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:39771] AH01430: Content-Type is 
text/html
mod_xml2enc.c(259): [client 10.10.10.10:39771] AH01434: Charset 
ISO-8859-1 not supported by libxml2; trying apr_xlate
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: 
consuming 682 bytes from bucket
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc: 
converted 682/682 bytes
mod_deflate.c(763): [client 10.10.10.10:39771] AH01384: Zlib: Compressed 
668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: 
consuming 10 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder 
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] 
AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc: 
converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: 
consuming 344 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder 
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] 
AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] 
AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] 
AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(481): [client 10.10.10.10:39771] AH01440: xml2enc: 
reinserting 334 unconsumed bytes from bucket
[client 10.10.10.10:39771] AH01385: Zlib error -2 flushing zlib output 
buffer ((null))


But if "Content-Type: text/html; charset=ISO-8859-1" is sent this is 
what I get

mod_deflate.c(1283): [client 10.10.10.10:40040] AH01398: Zlib: Inflated 
348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:40040] AH01430: Content-Type is 
text/html;charset=ISO-8859-1
[client 10.10.10.10:40040] AH01431: Got charset ISO-8859-1 from HTTP headers
mod_deflate.c(763): [client 10.10.10.10:40040] AH01384: Zlib: Compressed 
668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc: 
consuming 10 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder 
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] 
AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): [client 10.10.10.10:40040] AH01441: xml2enc: 
converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc: 
consuming 344 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder 
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] 
AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] 
AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] 
AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(481): [client 10.10.10.10:40040] AH01440: xml2enc: 
reinserting 334 unconsumed bytes from bucket

From what I can tell, this is still the "wrong" processing, as the page 
cannot be inflated correctly at the user's end. Nevertheless the message
   AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
does not show up anymore. Looking at mod_xml2enc.c lines 185-194 and 
251-268 that makes sense, but it would imply that the encoding detection 
in lines 198-206 failed. I suggest adding some sort of "failed" debug 
message for the case where xmlDetectCharEncoding() did not work as 
desired.

I've tried a couple more combinations, including using mod_charset_lite 
and different non-latin1 encodings on the backend, but the only thing 
that works is using the Header directive on the backend to set 
"Content-Type: text/html; charset=UTF-8" while leaving the actual 
contents unchanged. Here, "works" means the page is displayed correctly 
at the client's end.
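
A sketch of what such a backend override could look like, assuming the 
backend is itself an Apache httpd with mod_headers loaded. The 
FilesMatch pattern is a hypothetical stand-in for however the page is 
actually matched:

```apache
# Backend server configuration, not the reverse proxy.
<IfModule mod_headers.c>
    <FilesMatch "\.html$">
        # Only the header changes; the page bytes are served unmodified.
        Header set Content-Type "text/html; charset=UTF-8"
    </FilesMatch>
</IfModule>
```

For a pure-ASCII page this label is harmless, because ASCII is a subset 
of UTF-8.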

The goal is still to get mod_proxy_html to rewrite the HTML just like it 
would do with "ProxyHTMLEnable On" while at the same time retaining 
compression support. So setting
  SetOutputFilter INFLATE;proxy-html
which "drops out" the "xml2enc" filter might be problematic.

Unfortunately, the page is not accessible publicly. It is rather simple, 
though, and I made sure there is nothing 'special' on that page - e.g. 
it's just plain ASCII, no meta tags, etc.

Note, I tried both "ProxyHTMLEnable On" and "SetOutputFilter 
INFLATE;proxy-html" as filter directives for all of the setups mentioned 
above. Neither worked except with the forced UTF-8 header.



Re: [users@httpd] mod_proxy_html, HTML rewrite and content compression

Posted by Nick Kew <ni...@webthing.com>.
On Fri, 16 Nov 2012 11:31:38 +0100
Thomas Eckert <Th...@Sophos.com> wrote:

> Thanks for the hint but unfortunately "manually" adding xml2enc to the 
> filtering chain does not help.

Looks like you've got problems over and above anything to do with
your configuration!

>      "SetOutputFilter INFLATE;proxy-html" gets the page displayed correctly

I thought you said it had charset issues?


> [pid 15039:tid 3007834992] mod_xml2enc.c(259): [client 
> 10.10.10.10:40388] AH01434: Charset ISO-8859-1 not supported by libxml2; 
> trying apr_xlate

That seems implausible.  How do you get a libxml2 install that
doesn't natively support ISO-8859-1 (latin1)?

> [pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument: 
> [client 10.10.10.10:40388] AH01441: xml2enc: converted 1/1 bytes
> [pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping 
> invalid byte(s) in input stream!

> (and more conversion errors)

It looks as if your backend incorrectly identifies the charset
of the page in question.  Either that or you found a bug.
Do you have a URL where your unprocessed page could be viewed?

-- 
Nick Kew



Re: [users@httpd] mod_proxy_html, HTML rewrite and content compression

Posted by Thomas Eckert <Th...@Sophos.com>.
On 11/14/2012 06:12 PM, Nick Kew wrote:
> On 14 Nov 2012, at 15:53, Thomas Eckert wrote:
>
>> Is there a way to work around this ? I do want the call to mod_xml2enc to happen but I also want the reverse proxy to support content compression.
> That's a lot of correct analysis.
>
> The output chain you want is INFLATE;xml2enc;proxy-html;DEFLATE .
> What problems do you encounter (apart from processing overhead)
> when you set all that?
>
> I guess ideally ProxyHTMLEnable should detect compressed content
> and insert INFLATE where necessary.  Maybe even have an option to
> set DEFLATE.  But as of now it doesn't: I got the impression most
> users prefer to disable compression, and avoid the substantial
> processing overhead of zipping in a proxy.
>
Thanks for the hint but unfortunately "manually" adding xml2enc to the 
filtering chain does not help. The output is still broken. One thing I 
noticed was the "DEFLATE" filter is not necessary, since apache will do 
the compression anyway (even though I removed "AddOutputFilter text/html 
DEFLATE" from my global config). That's why it's not present below.

Here are some debug log extracts which confuse me. Note, I patched 
mod_proxy_html.c with a one-liner to write "Running proxy_html_filter" 
to the log; otherwise mod_proxy_html only gives feedback in error 
situations and I wouldn't be able to tell from the log when 
mod_proxy_html is running.


     "SetOutputFilter INFLATE;proxy-html" gets the page displayed correctly

[pid 14245:tid 2714090352] proxy_util.c(1998): AH00943: http: has 
released connection for (vhost01.backend03.local)
[pid 14245:tid 2714090352] mod_deflate.c(1283): [client 
10.10.10.10:40375] AH01398: Zlib: Inflated 348 to 674 : URL /
[pid 14245:tid 2714090352] [client 10.128.128.60:40375] Running 
proxy_html_filter
[pid 14245:tid 2714090352] mod_deflate.c(763): [client 
10.10.10.60:40375] AH01384: Zlib: Compressed 655 to 342 : URL /
< customized log message from LogFormat+CustomLog appears here >


     "SetOutputFilter INFLATE;xml2enc;proxy-html" results in "curl: (52) 
Empty reply from server"

[pid 15039:tid 3007834992] proxy_util.c(1998): AH00943: http: has 
released connection for (vhost01.backend03.local)
[pid 15039:tid 3007834992] mod_deflate.c(1283): [client 
10.10.10.10:40388] AH01398: Zlib: Inflated 348 to 674 : URL /
[pid 15039:tid 3007834992] mod_xml2enc.c(183): [client 
10.10.10.10:40388] AH01430: Content-Type is text/html
[pid 15039:tid 3007834992] mod_xml2enc.c(259): [client 
10.10.10.10:40388] AH01434: Charset ISO-8859-1 not supported by libxml2; 
trying apr_xlate
[pid 15039:tid 3007834992] mod_xml2enc.c(463): [client 
10.10.10.10:40388] AH01439: xml2enc: consuming 674 bytes from bucket
[pid 15039:tid 3007834992] mod_xml2enc.c(490): [client 
10.10.10.10:40388] AH01441: xml2enc: converted 674/674 bytes
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] Running 
proxy_html_filter
[pid 15039:tid 3007834992] mod_deflate.c(763): [client 
10.10.10.10:40388] AH01384: Zlib: Compressed 655 to 342 : URL /
[pid 15039:tid 3007834992] mod_xml2enc.c(463): [client 
10.10.10.10:40388] AH01439: xml2enc: consuming 10 bytes from bucket
[pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument: 
[client 10.10.10.10:40388] AH01441: xml2enc: converted 1/1 bytes
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping 
invalid byte(s) in input stream!
[pid 15039:tid 3007834992] mod_xml2enc.c(490): [client 
10.10.10.10:40388] AH01441: xml2enc: converted 9/8 bytes
[pid 15039:tid 3007834992] mod_xml2enc.c(463): [client 
10.10.10.10:40388] AH01439: xml2enc: consuming 342 bytes from bucket
< customized log message from LogFormat+CustomLog appears here >
[pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument: 
[client 10.10.10.10:40388] AH01441: xml2enc: converted 2/2 bytes
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping 
invalid byte(s) in input stream!
[pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument: 
[client 10.10.10.10:40388] AH01441: xml2enc: converted 2/1 bytes
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping 
invalid byte(s) in input stream!
[pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument: 
[client 10.10.10.10:40388] AH01441: xml2enc: converted 4/3 bytes
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping 
invalid byte(s) in input stream!
[pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument: 
[client 10.10.10.10:40388] AH01441: xml2enc: converted 1/0 bytes
[pid 15039:tid 3007834992] [client 10.10.10.100:40388] AH01444: Skipping 
invalid byte(s) in input stream!
[pid 15039:tid 3007834992] mod_xml2enc.c(481): [client 
10.10.10.10:40388] AH01440: xml2enc: reinserting 332 unconsumed bytes 
from bucket
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01385: Zlib 
error -2 flushing zlib output buffer ((null))


     "ProxyHTMLEnable On" results in  "curl: (61) Error while processing 
content unencoding: invalid code lengths set"

[pid 16165:tid 3007834992] proxy_util.c(1998): AH00943: http: has 
released connection for (vhost01.backend03.local)
[pid 16165:tid 3007834992] mod_xml2enc.c(183): [client 
10.10.10.10:40406] AH01430: Content-Type is text/html
[pid 16165:tid 3007834992] mod_xml2enc.c(259): [client 
10.10.10.10:40406] AH01434: Charset ISO-8859-1 not supported by libxml2; 
trying apr_xlate
[pid 16165:tid 3007834992] mod_xml2enc.c(463): [client 
10.10.10.10:40406] AH01439: xml2enc: consuming 366 bytes from bucket
[pid 16165:tid 3007834992] mod_xml2enc.c(490): [client 
10.10.10.10:40406] AH01441: xml2enc: converted 255/366 bytes
[pid 16165:tid 3007834992] mod_xml2enc.c(490): [client 
10.10.10.10:40406] AH01441: xml2enc: converted 111/159 bytes
[pid 16165:tid 3007834992] [client 10.128.128.60:40406] Running 
proxy_html_filter
[pid 16165:tid 3007834992] mod_xml2enc.c(463): [client 
10.10.10.10:40406] AH01439: xml2enc: consuming 537 bytes from bucket
[pid 16165:tid 3007834992] mod_xml2enc.c(490): [client 
10.10.10.10:40406] AH01441: xml2enc: converted 1306/1306 bytes
[pid 16165:tid 3007834992] mod_proxy_balancer.c(656): [client 
10.10.10.10:40406] AH01176: proxy_balancer_post_request for 
(balancer://cd107d9706d71153bafd4ab15f1c6b5d)
< customized log message from LogFormat+CustomLog appears here >



Re: [users@httpd] mod_proxy_html, HTML rewrite and content compression

Posted by Nick Kew <ni...@webthing.com>.
On 14 Nov 2012, at 15:53, Thomas Eckert wrote:

> Is there a way to work around this ? I do want the call to mod_xml2enc to happen but I also want the reverse proxy to support content compression.

That's a lot of correct analysis.

The output chain you want is INFLATE;xml2enc;proxy-html;DEFLATE .
What problems do you encounter (apart from processing overhead)
when you set all that?
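
Spelled out as configuration, that chain might look like the following 
sketch. The Location path is a placeholder, and mod_deflate, 
mod_xml2enc and mod_proxy_html are assumed to be loaded:

```apache
<Location "/">
    # Filters are listed in the order they should process the response:
    # decompress the backend response, normalise the charset,
    # rewrite links, then recompress for the client.
    SetOutputFilter INFLATE;xml2enc;proxy-html;DEFLATE
</Location>
```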

I guess ideally ProxyHTMLEnable should detect compressed content
and insert INFLATE where necessary.  Maybe even have an option to
set DEFLATE.  But as of now it doesn't: I got the impression most
users prefer to disable compression, and avoid the substantial
processing overhead of zipping in a proxy.
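
One common way to disable compression is to strip the request header 
before the request is forwarded. A minimal sketch, assuming mod_headers 
is available on the proxy; the Location path is a placeholder:

```apache
<Location "/">
    # Without Accept-Encoding the backend should answer uncompressed,
    # so xml2enc and proxy-html see plain HTML.
    RequestHeader unset Accept-Encoding
    ProxyHTMLEnable On
</Location>
```

The trade-off is that the client should then also receive an 
uncompressed response, since the header is removed from the request as 
a whole.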

-- 
Nick Kew