Posted to users@httpd.apache.org by Thomas Eckert <Th...@Sophos.com> on 2012/11/14 16:53:19 UTC
[users@httpd] mod_proxy_html, HTML rewrite and content compression
Hi folks
I'm using Apache httpd (2.4.3) as a reverse proxy with mod_proxy_html (as
delivered with 2.4.3) and have run into an issue when HTML rewriting is
combined with content compression, i.e. with the "Accept-Encoding" and
"Content-Encoding" HTTP headers.
This issue has been encountered by numerous people and the solution
presented always comes down to setting the output filters "manually"
instead of using the mod_proxy_html directive to do it (e.g. see
http://forums.gentoo.org/viewtopic-t-908890-start-0.html). So it boils
down to setting
SetOutputFilter INFLATE;proxy-html;DEFLATE
instead of
ProxyHTMLEnable On
In my test setup this actually solved the problem but it has a side
effect which I am worried about.
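As an illustrative sketch only, the two variants might look like this in
a reverse-proxy context. The backend host backend.example, the /app/
mount point and the ProxyHTMLURLMap rule are hypothetical placeholders,
not taken from the setup described here:

```apache
# Hypothetical reverse-proxy mount; host and path are placeholders.
ProxyPass        /app/ http://backend.example/
ProxyPassReverse /app/ http://backend.example/

<Location /app/>
    # Variant A: let mod_proxy_html insert its filters itself.
    # ProxyHTMLEnable On

    # Variant B: build the output chain by hand, so that compressed
    # responses are inflated before rewriting and deflated afterwards.
    # Note that xml2enc is not part of this chain.
    SetOutputFilter INFLATE;proxy-html;DEFLATE

    # Rewrite absolute backend links to the proxied path (assumed rule).
    ProxyHTMLURLMap http://backend.example/ /app/
</Location>
```

Only one of the two variants would be active at a time; variant A is
commented out here to show the manual chain.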
In modules/filters/mod_proxy_html.c, function proxy_html_insert(), the
xml2enc function is clearly only called if cfg->enabled is set, which in
turn is set via the ProxyHTMLEnable directive, declared with
    AP_INIT_FLAG("ProxyHTMLEnable", ap_set_flag_slot,
                 (void*)APR_OFFSETOF(proxy_html_conf, enabled),
                 RSRC_CONF|ACCESS_CONF,
                 "Enable proxy-html and xml2enc filters")
I took a look at mod_xml2enc to see if there was a directive which I
could use to establish the filtering in a way that matches the
ProxyHTMLEnable directive but I could find none.
Is there a way to work around this ? I do want the call to mod_xml2enc
to happen but I also want the reverse proxy to support content compression.
Any suggestions on how to go forward/where to dig on this issue ?
Regards,
Thomas
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org
Re: [users@httpd] mod_proxy_html, HTML rewrite and content compression
Posted by Thomas Eckert <Th...@Sophos.com>.
On 11/16/2012 05:12 PM, Nick Kew wrote:
> On Fri, 16 Nov 2012 11:31:38 +0100
> Thomas Eckert<Th...@Sophos.com> wrote:
>
>> Thanks for the hint but unfortunately "manually" adding xml2enc to the
>> filtering chain does not help.
> Looks like you've got problems over and above anything to do with
> your configuration!
>
>> "SetOutputFilter INFLATE;proxy-html" gets the page displayed correctly
> I thought you said it had charset issues?
>
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(259): [client
>> 10.10.10.10:40388] AH01434: Charset ISO-8859-1 not supported by libxml2;
>> trying apr_xlate
> That seems implausible. How do you get a libxml2 install that
> doesn't natively support ISO-8859-1 (latin1)?
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument:
>> [client 10.10.10.10:40388] AH01441: xml2enc: converted 1/1 bytes
>> [pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping
>> invalid byte(s) in input stream!
>> (and more conversion errors)
> It looks as if your backend incorrectly identifies the charset
> of the page in question. Either that or you found a bug.
> Do you have a URL where your unprocessed page could be viewed?
>
Sorry for the delay on this. The basic problem remains: if I enable HTML
rewriting and connect with a client requesting content compression, the
reverse proxy fails with a message pointing at libxml2/encoding. I
also see different log entries depending on whether I set the
charset of the page.
So if I just send the page with "Content-Type: text/html", this is what
I get:
mod_deflate.c(1283): [client 10.10.10.10:39771] AH01398: Zlib: Inflated
348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:39771] AH01430: Content-Type is
text/html
mod_xml2enc.c(259): [client 10.10.10.10:39771] AH01434: Charset
ISO-8859-1 not supported by libxml2; trying apr_xlate
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc:
consuming 682 bytes from bucket
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc:
converted 682/682 bytes
mod_deflate.c(763): [client 10.10.10.10:39771] AH01384: Zlib: Compressed
668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc:
consuming 10 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771]
AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc:
converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc:
consuming 344 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771]
AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771]
AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771]
AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(481): [client 10.10.10.10:39771] AH01440: xml2enc:
reinserting 334 unconsumed bytes from bucket
[client 10.10.10.10:39771] AH01385: Zlib error -2 flushing zlib output
buffer ((null))
But if "Content-Type: text/html; charset=ISO-8859-1" is sent, this is
what I get:
mod_deflate.c(1283): [client 10.10.10.10:40040] AH01398: Zlib: Inflated
348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:40040] AH01430: Content-Type is
text/html;charset=ISO-8859-1
[client 10.10.10.10:40040] AH01431: Got charset ISO-8859-1 from HTTP headers
mod_deflate.c(763): [client 10.10.10.10:40040] AH01384: Zlib: Compressed
668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc:
consuming 10 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040]
AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): [client 10.10.10.10:40040] AH01441: xml2enc:
converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc:
consuming 344 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040]
AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040]
AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040]
AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(481): [client 10.10.10.10:40040] AH01440: xml2enc:
reinserting 334 unconsumed bytes from bucket
From what I can tell, this is still the "wrong" processing, as the page
cannot be inflated correctly at the user's end. Nevertheless, the
message
AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
does not show up anymore. Looking at mod_xml2enc.c +185-194 and +251-268
that makes sense, but it would imply that the encoding detection in
+198-206 failed. I suggest adding some sort of "failed" debug message
for the case where xmlDetectCharEncoding() does not work as desired.
I've tried a couple more combinations, including using mod_charset_lite
and different non-latin1 encodings on the backend, but the only thing
that works is using the Header directive on the backend to set
"Content-Type: text/html; charset=UTF-8" while leaving the actual
contents unchanged. Here, "works" means the page is displayed correctly
at the client's end.
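A minimal sketch of such a backend-side override, assuming mod_headers is
loaded on the backend and that the pages in question match a
hypothetical "\.html?$" file pattern (the pattern is an assumption, not
part of the setup described above):

```apache
# On the backend server, not on the proxy. Requires mod_headers.
# Only the declared charset changes; the page bytes stay untouched.
<FilesMatch "\.html?$">
    Header set Content-Type "text/html; charset=UTF-8"
</FilesMatch>

# A possible alternative using the core directive, which adds a charset
# to text/html and text/plain responses that do not declare one:
# AddDefaultCharset UTF-8
```

For a page that is plain ASCII this declaration remains accurate, since
ASCII is a subset of UTF-8.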
The goal is still to get mod_proxy_html to rewrite the HTML just like it
would do with "ProxyHTMLEnable On" while at the same time retaining
compression support. So setting
SetOutputFilter INFLATE;proxy-html
which "drops out" the "xml2enc" filter might be problematic.
Unfortunately, the page is not publicly accessible. It is rather simple,
though, and I made sure there is nothing 'special' on that page - e.g.
it's just plain ASCII, no meta tags, etc.
Note, I tried both "ProxyHTMLEnable On" and "SetOutputFilter
INFLATE;proxy-html" as filter directives for all above mentioned setups.
Neither worked except with the mentioned forced UTF-8 header.
Re: [users@httpd] mod_proxy_html, HTML rewrite and content compression
Posted by Nick Kew <ni...@webthing.com>.
On Fri, 16 Nov 2012 11:31:38 +0100
Thomas Eckert <Th...@Sophos.com> wrote:
> Thanks for the hint but unfortunately "manually" adding xml2enc to the
> filtering chain does not help.
Looks like you've got problems over and above anything to do with
your configuration!
> "SetOutputFilter INFLATE;proxy-html" gets the page displayed correctly
I thought you said it had charset issues?
> [pid 15039:tid 3007834992] mod_xml2enc.c(259): [client
> 10.10.10.10:40388] AH01434: Charset ISO-8859-1 not supported by libxml2;
> trying apr_xlate
That seems implausible. How do you get a libxml2 install that
doesn't natively support ISO-8859-1 (latin1)?
> [pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument:
> [client 10.10.10.10:40388] AH01441: xml2enc: converted 1/1 bytes
> [pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping
> invalid byte(s) in input stream!
> (and more conversion errors)
It looks as if your backend incorrectly identifies the charset
of the page in question. Either that or you found a bug.
Do you have a URL where your unprocessed page could be viewed?
--
Nick Kew
Re: [users@httpd] mod_proxy_html, HTML rewrite and content compression
Posted by Thomas Eckert <Th...@Sophos.com>.
On 11/14/2012 06:12 PM, Nick Kew wrote:
> On 14 Nov 2012, at 15:53, Thomas Eckert wrote:
>
>> Is there a way to work around this ? I do want the call to mod_xml2enc to happen but I also want the reverse proxy to support content compression.
> That's a lot of correct analysis.
>
> The output chain you want is INFLATE;xml2enc;proxy-html;DEFLATE .
> What problems do you encounter (apart from processing overhead)
> when you set all that?
>
> I guess ideally ProxyHTMLEnable should detect compressed content
> and insert INFLATE where necessary. Maybe even have an option to
> set DEFLATE. But as of now it doesn't: I got the impression most
> users prefer to disable compression, and avoid the substantial
> processing overhead of zipping in a proxy.
>
Thanks for the hint but unfortunately "manually" adding xml2enc to the
filtering chain does not help. The output is still broken. One thing I
noticed is that the "DEFLATE" filter is not necessary, since Apache does
the compression anyway (even though I removed "AddOutputFilter text/html
DEFLATE" from my global config). That is why it is not present below.
Here are some debug log extracts which confuse me. Note, I patched
mod_proxy_html.c with a one-liner to get "Running proxy_html_filter"
into the log; otherwise mod_proxy_html only gives feedback in error
situations and I wouldn't be able to infer from the log when
mod_proxy_html is running.
"SetOutputFilter INFLATE;proxy-html" gets the page displayed correctly
[pid 14245:tid 2714090352] proxy_util.c(1998): AH00943: http: has
released connection for (vhost01.backend03.local)
[pid 14245:tid 2714090352] mod_deflate.c(1283): [client
10.10.10.10:40375] AH01398: Zlib: Inflated 348 to 674 : URL /
[pid 14245:tid 2714090352] [client 10.128.128.60:40375] Running
proxy_html_filter
[pid 14245:tid 2714090352] mod_deflate.c(763): [client
10.10.10.60:40375] AH01384: Zlib: Compressed 655 to 342 : URL /
< customized log message from LogFormat+CustomLog appears here >
"SetOutputFilter INFLATE;xml2enc;proxy-html" results in "curl: (52)
Empty reply from server"
[pid 15039:tid 3007834992] proxy_util.c(1998): AH00943: http: has
released connection for (vhost01.backend03.local)
[pid 15039:tid 3007834992] mod_deflate.c(1283): [client
10.10.10.10:40388] AH01398: Zlib: Inflated 348 to 674 : URL /
[pid 15039:tid 3007834992] mod_xml2enc.c(183): [client
10.10.10.10:40388] AH01430: Content-Type is text/html
[pid 15039:tid 3007834992] mod_xml2enc.c(259): [client
10.10.10.10:40388] AH01434: Charset ISO-8859-1 not supported by libxml2;
trying apr_xlate
[pid 15039:tid 3007834992] mod_xml2enc.c(463): [client
10.10.10.10:40388] AH01439: xml2enc: consuming 674 bytes from bucket
[pid 15039:tid 3007834992] mod_xml2enc.c(490): [client
10.10.10.10:40388] AH01441: xml2enc: converted 674/674 bytes
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] Running
proxy_html_filter
[pid 15039:tid 3007834992] mod_deflate.c(763): [client
10.10.10.10:40388] AH01384: Zlib: Compressed 655 to 342 : URL /
[pid 15039:tid 3007834992] mod_xml2enc.c(463): [client
10.10.10.10:40388] AH01439: xml2enc: consuming 10 bytes from bucket
[pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument:
[client 10.10.10.10:40388] AH01441: xml2enc: converted 1/1 bytes
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping
invalid byte(s) in input stream!
[pid 15039:tid 3007834992] mod_xml2enc.c(490): [client
10.10.10.10:40388] AH01441: xml2enc: converted 9/8 bytes
[pid 15039:tid 3007834992] mod_xml2enc.c(463): [client
10.10.10.10:40388] AH01439: xml2enc: consuming 342 bytes from bucket
< customized log message from LogFormat+CustomLog appears here >
[pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument:
[client 10.10.10.10:40388] AH01441: xml2enc: converted 2/2 bytes
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping
invalid byte(s) in input stream!
[pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument:
[client 10.10.10.10:40388] AH01441: xml2enc: converted 2/1 bytes
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping
invalid byte(s) in input stream!
[pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument:
[client 10.10.10.10:40388] AH01441: xml2enc: converted 4/3 bytes
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping
invalid byte(s) in input stream!
[pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument:
[client 10.10.10.10:40388] AH01441: xml2enc: converted 1/0 bytes
[pid 15039:tid 3007834992] [client 10.10.10.100:40388] AH01444: Skipping
invalid byte(s) in input stream!
[pid 15039:tid 3007834992] mod_xml2enc.c(481): [client
10.10.10.10:40388] AH01440: xml2enc: reinserting 332 unconsumed bytes
from bucket
[pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01385: Zlib
error -2 flushing zlib output buffer ((null))
"ProxyHTMLEnable On" results in "curl: (61) Error while processing
content unencoding: invalid code lengths set"
[pid 16165:tid 3007834992] proxy_util.c(1998): AH00943: http: has
released connection for (vhost01.backend03.local)
[pid 16165:tid 3007834992] mod_xml2enc.c(183): [client
10.10.10.10:40406] AH01430: Content-Type is text/html
[pid 16165:tid 3007834992] mod_xml2enc.c(259): [client
10.10.10.10:40406] AH01434: Charset ISO-8859-1 not supported by libxml2;
trying apr_xlate
[pid 16165:tid 3007834992] mod_xml2enc.c(463): [client
10.10.10.10:40406] AH01439: xml2enc: consuming 366 bytes from bucket
[pid 16165:tid 3007834992] mod_xml2enc.c(490): [client
10.10.10.10:40406] AH01441: xml2enc: converted 255/366 bytes
[pid 16165:tid 3007834992] mod_xml2enc.c(490): [client
10.10.10.10:40406] AH01441: xml2enc: converted 111/159 bytes
[pid 16165:tid 3007834992] [client 10.128.128.60:40406] Running
proxy_html_filter
[pid 16165:tid 3007834992] mod_xml2enc.c(463): [client
10.10.10.10:40406] AH01439: xml2enc: consuming 537 bytes from bucket
[pid 16165:tid 3007834992] mod_xml2enc.c(490): [client
10.10.10.10:40406] AH01441: xml2enc: converted 1306/1306 bytes
[pid 16165:tid 3007834992] mod_proxy_balancer.c(656): [client
10.10.10.10:40406] AH01176: proxy_balancer_post_request for
(balancer://cd107d9706d71153bafd4ab15f1c6b5d)
< customized log message from LogFormat+CustomLog appears here >
Re: [users@httpd] mod_proxy_html, HTML rewrite and content compression
Posted by Nick Kew <ni...@webthing.com>.
On 14 Nov 2012, at 15:53, Thomas Eckert wrote:
> Is there a way to work around this ? I do want the call to mod_xml2enc to happen but I also want the reverse proxy to support content compression.
That's a lot of correct analysis.
The output chain you want is INFLATE;xml2enc;proxy-html;DEFLATE .
What problems do you encounter (apart from processing overhead)
when you set all that?
I guess ideally ProxyHTMLEnable should detect compressed content
and insert INFLATE where necessary. Maybe even have an option to
set DEFLATE. But as of now it doesn't: I got the impression most
users prefer to disable compression, and avoid the substantial
processing overhead of zipping in a proxy.
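As an untested sketch, the chain named above and the
"disable compression" alternative might be configured roughly as
follows. The /app/ location, the backend.example host and the
ProxyHTMLURLMap rule are hypothetical placeholders:

```apache
# Option 1: the full chain set by hand.
# Inflate, normalise the charset, rewrite links, then recompress.
<Location /app/>
    SetOutputFilter INFLATE;xml2enc;proxy-html;DEFLATE
    ProxyHTMLURLMap http://backend.example/ /app/
</Location>

# Option 2: keep the backend from compressing in the first place, so
# ProxyHTMLEnable can be used unchanged. Requires mod_headers.
# <Location /app/>
#     RequestHeader unset Accept-Encoding
#     ProxyHTMLEnable On
#     ProxyHTMLURLMap http://backend.example/ /app/
# </Location>
```

Option 2 trades bandwidth between proxy and backend for less processing
in the proxy, which matches the preference described above.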
--
Nick Kew