Posted to users@httpd.apache.org by Peter <pm...@citylink.dinoex.sub.org> on 2019/04/15 13:44:09 UTC
[users@httpd] [patch] Apache converts GZIPed data into UTF-8 - 2nd Act
Oh, nobody has an answer to the issue?
Okay...
Investigating, it appears that mod_xml2enc indeed grabs everything it
can lay hands on, if only it is tagged as some 'text/whatever', and
"converts" it (assuming it were ISO8859-1), no matter the damage, and
giving a f*** damn on compressed data. :((
This is obvious from the code, and it is also visible in the
debug log:
[proxy_http:trace3] [pid 52505] mod_proxy_http.c(1402): [client 192.168.97.18:28882] Status from backend: 200
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1052): [client 192.168.97.18:28882] Headers received from backend:
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1075): [client 192.168.97.18:28882] Last-Modified: Sun, 14 Apr 2019 05:53:26 GMT
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1075): [client 192.168.97.18:28882] Content-Type: text/css
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1075): [client 192.168.97.18:28882] Content-Encoding: gzip
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1075): [client 192.168.97.18:28882] Vary: Accept-Encoding
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1075): [client 192.168.97.18:28882] Content-Length: 6194
[proxy_http:trace3] [pid 52505] mod_proxy_http.c(1672): [client 192.168.97.18:28882] start body send
[xml2enc:debug] [pid 52505] mod_xml2enc.c(176): [client 192.168.97.18:28882] AH01430: Content-Type is text/css
[xml2enc:debug] [pid 52505] mod_xml2enc.c(250): [client 192.168.97.18:28882] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
[xml2enc:debug] [pid 52505] mod_xml2enc.c(464): [client 192.168.97.18:28882] AH01439: xml2enc: consuming 6194 bytes from bucket
[xml2enc:debug] [pid 52505] mod_xml2enc.c(490): [client 192.168.97.18:28882] AH01441: xml2enc: converted 4049/6193 bytes
[xml2enc:debug] [pid 52505] mod_xml2enc.c(490): [client 192.168.97.18:28882] AH01441: xml2enc: converted 2145/3242 bytes
[proxy_html:trace1] [pid 52505] mod_proxy_html.c(832): [client 192.168.97.18:28882] Non-HTML content; not inserting proxy-html filter
[http:trace3] [pid 52505] http_filters.c(1125): [client 192.168.97.18:28882] Response sent with status 200, headers:
[http:trace5] [pid 52505] http_filters.c(1134): [client 192.168.97.18:28882] Date: Sun, 14 Apr 2019 16:07:20 GMT
[http:trace5] [pid 52505] http_filters.c(1137): [client 192.168.97.18:28882] Server: Apache/2.4.39 (FreeBSD)
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882] Last-Modified: Sun, 14 Apr 2019 05:53:26 GMT
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882] Content-Type: text/css;charset=utf-8
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882] Content-Encoding: gzip
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882] Vary: Accept-Encoding
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882] Keep-Alive: timeout=15, max=100
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882] Connection: Keep-Alive
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882] Transfer-Encoding: chunked
Then, depending on which filters are configured, this may or may not
happen. It may even be runtime dependent. I tried to put proxy_html
into a filter chain to get a more defined behaviour, but this is not
possible, it produces a configuration error with FilterProvider,
although the documentation says:
"Any content filter may be used as a provider to mod_filter;
no change to existing filter modules is required"
So this does not work, either.
Finally I decided to fix the code, as well as I can. (As stated before,
I have absolutely no idea about this stuff and its conventions, I just
need to make the thing workable.)
---------------------------------------------------------------------------
--- modules/filters/mod_xml2enc.c.orig 2018-06-22 10:43:46.000000000 +0000
+++ modules/filters/mod_xml2enc.c 2019-04-14 23:33:16.661705000 +0000
@@ -305,6 +305,7 @@
apr_size_t insz = 0;
int pending_meta = 0;
char *ctype;
+ const char *c_enc = NULL;
char *p;
if (!ctx || !f->r->content_type) {
@@ -324,6 +325,17 @@
return ap_pass_brigade(f->next, bb) ;
}
+ if((c_enc = apr_table_get(f->r->headers_out, "Content-Encoding")) &&
+ !strstr(c_enc, "identity") &&
+ !apr_table_get(f->r->notes, "X-PMc-was-here")) {
+ ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, f->r, APLOGNO(66666)
+ "Probable deflated content, standing down") ;
+ ap_remove_output_filter(f);
+ return ap_pass_brigade(f->next, bb) ;
+ } else {
+ apr_table_set(f->r->notes, "X-PMc-was-here", "1");
+ }
+
if (ctx->bbsave == NULL) {
ctx->bbsave = apr_brigade_create(f->r->pool,
f->r->connection->bucket_alloc);
---------------------------------------------------------------------------
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org
Re: [users@httpd] [patch] Apache converts GZIPed data into UTF-8 -
2nd Act
Posted by Peter <pm...@citylink.dinoex.sub.org>.
On Mon, Apr 15, 2019 at 05:21:27PM +0100, Nick Kew wrote:
! > Oh, nobody has an answer to the issue?
!
! Well I might have done, but I was out rehearsing and performing Bach,
! not reading your email!
Oh, You're perfectly welcome to do so!
In fact I was just hoping for *any* reply - I didn't have the hope to
actually reach somebody deeply involved. Your reply is highly
appreciated!!
! mod_proxy_html knows to remove itself from the chain when it sees non-HTML,
! but mod_xml2enc doesn't.
From my viewpoint, the problem seemed to be that xml2enc is always
pulled into the process chain, whether one wants it or not, and
the (apparently) only way to avoid that is to not load the module
(and live with the warnings issued on server start).
! > [xml2enc:debug] [pid 52505] mod_xml2enc.c(176): [client 192.168.97.18:28882] AH01430: Content-Type is text/css
!
! At which point, you want the same reaction from xml2enc as from proxy_html.
! i.e. remove itself and leave your contents untouched.
Not really, but that would be a viable approach in the sense of
"do-the-least-unexpected".
No, I would indeed like to run the xml2enc on all kinds of text
(because that may ease my issue with the always-postponed character
coding cleanup on my 20+ years old machines); I just want it to run
where _I_ want it to run - and definitely not on compressed data.
! > [xml2enc:debug] [pid 52505] mod_xml2enc.c(250): [client 192.168.97.18:28882] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
!
! That looks to me like a problem with your libxml2.
! But that's outside the scope of this discussion.
Hm. Another piece of software I never looked at...
! > Then, depending on which filters are configured, this may or may not
! > happen. It may even be runtime dependent. I tried to put proxy_html
! > into a filter chain to get a more defined behaviour, but this is not
! > possible, it produces a configuration error with FilterProvider,
!
! Did you misspell it? It's proxy-html (hyphen, not underscore).
Now that's a hint! Indeed, I probably missed that one - I tried
with and without underscore, upper and lowercase, but likely
not the hyphen... and I failed to find the place in the source
where that name is declared. (Now, knowing the spelling, it is
easy to find ;))
And indeed! That works like I had hoped for - with
"ProxyHTMLEnable Off" and properly steered from the FilterChain,
so I can suppress it on probably compressed objects.
But it seems proxy-html does not even invoke xml2enc when called
in the filter chain - so the whole issue vaporizes in beauty. ;)
Nevertheless, the average stupid user (like me) might likely start
with the most simple configuration, and might run into this, and
would have a hard time figuring what is actually wrong; so we should
do something about it, and spare them a night searching.
! > Finally I decided to fix the code, as good as I can. (As stated before,
! > I have absolutely no idea about this stuff and its conventions, I just
! > need to make the thing workable.)
!
! Hmm. Your fix does the job for you, but shouldn't be necessary.
No, it's just that I didn't get it running discretely from the
filter chain.
! I'm thinking, mod_proxy_html does the right thing removing itself.
! mod_xml2enc should do the same when inserted by mod_proxy_html.
Yepp, and leave the option to insert xml2enc explicitly for other
kinds of files, if one wants to do that! Agreed!
Whereas, in an ideal world, mod_proxy_html would not stand down, but
would fixup the URLs in the stylesheet-documents as well.
But then, most people are concerned about performance and use an
asset-server anyway and not get such documents from the backend
(while I am just using Rails as scriptable database-GUI that I can
reach from anywhere in the world, disregarding performance), so
public demand for this may be limited; and it can nicely be done with
substitute.
! Thanks for the detailed analysis!
Thanks for the (involuntary) invitation to look a bit deeper
into the internals of that apache beast. :))
cheerio,
PMc
Re: [users@httpd] [patch] Apache converts GZIPed data into UTF-8 -
2nd Act
Posted by Peter <pm...@citylink.dinoex.sub.org>.
On Mon, Apr 15, 2019 at 11:43:21PM +0100, Nick Kew wrote:
Hi Nick,
! OK, I've looked.
me too. ;)
! What I'd like to do - pass responsibility back to the module
! that inserted the xml2enc filter - calls for a minor API
! change, so isn't going to happen in 2.4.x. A variant on
! that approach might work, but right now I don't see anything
! better than replicating mod_proxy_html's logic in mod_xml2enc
! to deal with the situation where they're interacting.
!
! Your check on content-encoding also looks good.
! Except that unless I'm missing something, your use of f->r->notes
! is unnecessary: ap_remove_output_filter means we don't revisit
! that code!
Yes, it would be unnecessary, but for a different reason: my code is
currently not in the proper place.
Given a chain DEFLATE;XML2ENC;INFLATE it looks like this:
[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:126 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'inflate' matched
[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:127 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'xml2enc' matched
[xml2enc:debug] [pid 77874] mod_xml2enc.c(176): [client 192.168.97.18:65401] AH01430: Content-Type is text/css
[xml2enc:debug] [pid 77874] mod_xml2enc.c(250): [client 192.168.97.18:65401] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:130 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'deflate' matched
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[deflate:debug] [pid 77874] mod_deflate.c(1622): [client 192.168.97.18:65401] AH01398: Zlib: Inflated 6176 to 28247 : URL /fin-stage/assets/application-3a5821b5be536e0108d5934c96815299001dfa3c1ddff9f39676a3a3126d8190.css
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 3959 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 3959/3959 bytes
[deflate:debug] [pid 77874] mod_deflate.c(854): [client 192.168.97.18:65401] AH01384: Zlib: Compressed 28247 to 6226 : URL /fin-stage/assets/application-3a5821b5be536e0108d5934c96815299001dfa3c1ddff9f39676a3a3126d8190.css
Currently my snippet is run for each of these chunks of data
(which is not a good idea, but I didn't hope to be able to understand
the code in its fullness and find a better place). So, with the
DEFLATE walking behind, when it comes to the second chunk, the
DEFLATE will already have put the "gzip" header back in, and so
I watched xml2enc quit in the midst of the document.
That's why I put that in.
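To make the point about per-chunk evaluation concrete, here is a minimal standalone sketch of "decide once, on the first chunk, and remember the answer". It is not code from mod_xml2enc: the names enc_state and should_bypass are made up for illustration, and in a real filter the state would presumably live in the filter's own context rather than in f->r->notes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-response state; NOT the real xml2ctx structure. */
typedef struct {
    int decided;   /* non-zero once the first chunk has been inspected */
    int bypass;    /* cached decision: 1 = pass the body through untouched */
} enc_state;

/*
 * Decide on the first chunk only. Later chunks reuse the cached answer,
 * even if a filter further down the chain has set Content-Encoding in
 * the meantime. content_encoding is the header value as seen at call
 * time, or NULL if the header is absent.
 */
int should_bypass(enc_state *st, const char *content_encoding)
{
    assert(st != NULL);
    if (!st->decided) {
        st->bypass = (content_encoding != NULL && content_encoding[0] != '\0');
        st->decided = 1;
    }
    return st->bypass;
}
```

With this shape, a "gzip" header that only appears at the second chunk no longer makes the filter quit in mid-document, because the header is consulted exactly once.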
Another minor flaw is that the test for "Content-Encoding: identity"
(btw: does anybody use that?) is probably not case-insensitive.
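For what it is worth, a case-insensitive variant of that test could look roughly like the sketch below. It is plain C without APR, the function names are invented for this example, and it assumes the header value is a comma-separated list of codings. Unlike the strstr() test in the patch, it treats "identity, gzip" as encoded, because at least one coding other than identity is present.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the n-byte token at s with the C string word. */
int token_equals(const char *s, size_t n, const char *word)
{
    size_t i;

    assert(s != NULL && word != NULL);
    if (strlen(word) != n)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)word[i]))
            return 0;
    }
    return 1;
}

/*
 * Return 1 if the Content-Encoding value names any coding other than
 * "identity"; return 0 if it is NULL, empty, or lists only "identity".
 */
int has_real_content_coding(const char *value)
{
    const char *p = value;

    if (value == NULL)
        return 0;
    while (*p != '\0') {
        const char *start;
        const char *end;

        /* skip separators and leading whitespace */
        while (*p == ',' || *p == ' ' || *p == '\t')
            p++;
        start = p;
        /* advance to the end of this token */
        while (*p != '\0' && *p != ',')
            p++;
        end = p;
        /* trim trailing whitespace */
        while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
            end--;
        if (end > start && !token_equals(start, (size_t)(end - start), "identity"))
            return 1;
    }
    return 0;
}
```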
And then I was thinking about a different and probably better approach:
if we can check the first few bytes of the actual document
beforehand, we can test these against the signatures of the usual
compression-algorithms (in the same way as the "file" command does it
on Unix). This seems safer than relying on header information.
Because, I don't see a reason why an HTML document might not also be
compressed - and then it wouldn't help to just stop processing CSS
documents.
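Such a signature test could be sketched as follows. This is only an illustration under assumptions: the function name looks_compressed is invented, it works on a plain byte buffer rather than on bucket brigades, and it knows just four container formats (gzip, bzip2, xz, zstd). Raw deflate and Brotli streams carry no magic number, so they would slip through, and a text file that happens to begin with "BZh" would be misjudged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the first bytes of buf match a well-known compression
 * signature, 0 otherwise. This is a heuristic on leading bytes only.
 */
int looks_compressed(const unsigned char *buf, size_t len)
{
    static const unsigned char xz_magic[]   = { 0xFD, '7', 'z', 'X', 'Z', 0x00 };
    static const unsigned char zstd_magic[] = { 0x28, 0xB5, 0x2F, 0xFD };

    assert(buf != NULL || len == 0);

    /* gzip: ID1 = 0x1f, ID2 = 0x8b */
    if (len >= 2 && buf[0] == 0x1F && buf[1] == 0x8B)
        return 1;
    /* bzip2: ASCII "BZh" */
    if (len >= 3 && memcmp(buf, "BZh", 3) == 0)
        return 1;
    /* xz container */
    if (len >= sizeof(xz_magic) && memcmp(buf, xz_magic, sizeof(xz_magic)) == 0)
        return 1;
    /* zstd frame */
    if (len >= sizeof(zstd_magic) && memcmp(buf, zstd_magic, sizeof(zstd_magic)) == 0)
        return 1;
    return 0;
}
```

Because of those gaps, a check like this would probably complement the header test rather than replace it.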
Btw, concerning this message, I had a look at that one, too:
AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
It seems to me that this message is reached just because the document
is compressed (and libxml2 can obviously not find a charset in
that); only the message text seems misleading.
Maybe a conservative approach would be to just stop at that point
and give up - because, compression might not be the only issue here;
people might get the idea to use some end-to-end encryption for
certain documents, and that would also appear as binary data that we
must not tamper with...
(just thinking along)
cheerio,
PMc
Re: [users@httpd] [patch] Apache converts GZIPed data into UTF-8 -
2nd Act
Posted by Nick Kew <ni...@apache.org>.
On Mon, 15 Apr 2019 17:21:27 +0100
Nick Kew <ni...@apache.org> wrote:
> Hmm. Your fix does the job for you, but shouldn't be necessary.
>
> I'm thinking, mod_proxy_html does the right thing removing itself.
> mod_xml2enc should do the same when inserted by mod_proxy_html.
> That should be straightforward to fix. I'll take a look later today.
>
> Thanks for the detailed analysis!
OK, I've looked.
What I'd like to do - pass responsibility back to the module
that inserted the xml2enc filter - calls for a minor API
change, so isn't going to happen in 2.4.x. A variant on
that approach might work, but right now I don't see anything
better than replicating mod_proxy_html's logic in mod_xml2enc
to deal with the situation where they're interacting.
Your check on content-encoding also looks good.
Except that unless I'm missing something, your use of f->r->notes
is unnecessary: ap_remove_output_filter means we don't revisit
that code!
--
Nick Kew
Re: [users@httpd] [patch] Apache converts GZIPed data into UTF-8 -
2nd Act
Posted by Nick Kew <ni...@apache.org>.
> On 15 Apr 2019, at 14:44, Peter <pm...@citylink.dinoex.sub.org> wrote:
>
>
> Oh, nobody has an answer to the issue?
Well I might have done, but I was out rehearsing and performing Bach,
not reading your email!
> Okay...
>
> Investigating, it appears that mod_xml2enc indeed grabs everything it
> can lay hands on, if only it is tagged as some 'text/whatever', and
> "converts" it (assuming it were ISO8859-1), no matter the damage, and
> giving a f*** damn on compressed data. :((
Heh.
Well, you've identified an issue, albeit in rather colourful language!
mod_proxy_html knows to remove itself from the chain when it sees non-HTML,
but mod_xml2enc doesn't.
Probably my fault.
> [xml2enc:debug] [pid 52505] mod_xml2enc.c(176): [client 192.168.97.18:28882] AH01430: Content-Type is text/css
At which point, you want the same reaction from xml2enc as from proxy_html.
i.e. remove itself and leave your contents untouched.
> [xml2enc:debug] [pid 52505] mod_xml2enc.c(250): [client 192.168.97.18:28882] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
That looks to me like a problem with your libxml2.
But that's outside the scope of this discussion.
> Then, depending on which filters are configured, this may or may not
> happen. It may even be runtime dependent. I tried to put proxy_html
> into a filter chain to get a more defined behaviour, but this is not
> possible, it produces a configuration error with FilterProvider,
Did you misspell it? It's proxy-html (hyphen, not underscore).
> Finally I decided to fix the code, as good as I can. (As stated before,
> I have absolutely no idea about this stuff and its conventions, I just
> need to make the thing workable.)
Hmm. Your fix does the job for you, but shouldn't be necessary.
I'm thinking, mod_proxy_html does the right thing removing itself.
mod_xml2enc should do the same when inserted by mod_proxy_html.
That should be straightforward to fix. I'll take a look later today.
Thanks for the detailed analysis!
--
Nick Kew