Posted to users@httpd.apache.org by Peter <pm...@citylink.dinoex.sub.org> on 2019/04/15 13:44:09 UTC

[users@httpd] [patch] Apache converts GZIPed data into UTF-8 - 2nd Act

Oh, nobody has an answer to the issue?

Okay...

Investigating, it appears that mod_xml2enc indeed grabs everything it
can lay hands on, if only it is tagged as some 'text/whatever', and
"converts" it (assuming it to be ISO-8859-1), no matter the damage, and
not giving a f*** damn about compressed data. :((

This is obvious from the code, and it is also visible in the
debug log:

[proxy_http:trace3] [pid 52505] mod_proxy_http.c(1402): [client 192.168.97.18:28882] Status from backend: 200
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1052): [client 192.168.97.18:28882] Headers received from backend:
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1075): [client 192.168.97.18:28882] Last-Modified: Sun, 14 Apr 2019 05:53:26 GMT
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1075): [client 192.168.97.18:28882] Content-Type: text/css
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1075): [client 192.168.97.18:28882] Content-Encoding: gzip
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1075): [client 192.168.97.18:28882] Vary: Accept-Encoding
[proxy_http:trace4] [pid 52505] mod_proxy_http.c(1075): [client 192.168.97.18:28882] Content-Length: 6194
[proxy_http:trace3] [pid 52505] mod_proxy_http.c(1672): [client 192.168.97.18:28882] start body send
[xml2enc:debug] [pid 52505] mod_xml2enc.c(176): [client 192.168.97.18:28882] AH01430: Content-Type is text/css
[xml2enc:debug] [pid 52505] mod_xml2enc.c(250): [client 192.168.97.18:28882] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
[xml2enc:debug] [pid 52505] mod_xml2enc.c(464): [client 192.168.97.18:28882] AH01439: xml2enc: consuming 6194 bytes from bucket
[xml2enc:debug] [pid 52505] mod_xml2enc.c(490): [client 192.168.97.18:28882] AH01441: xml2enc: converted 4049/6193 bytes
[xml2enc:debug] [pid 52505] mod_xml2enc.c(490): [client 192.168.97.18:28882] AH01441: xml2enc: converted 2145/3242 bytes
[proxy_html:trace1] [pid 52505] mod_proxy_html.c(832): [client 192.168.97.18:28882] Non-HTML content; not inserting proxy-html filter
[http:trace3] [pid 52505] http_filters.c(1125): [client 192.168.97.18:28882] Response sent with status 200, headers:
[http:trace5] [pid 52505] http_filters.c(1134): [client 192.168.97.18:28882]   Date: Sun, 14 Apr 2019 16:07:20 GMT
[http:trace5] [pid 52505] http_filters.c(1137): [client 192.168.97.18:28882]   Server: Apache/2.4.39 (FreeBSD)
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882]   Last-Modified: Sun, 14 Apr 2019 05:53:26 GMT
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882]   Content-Type: text/css;charset=utf-8
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882]   Content-Encoding: gzip
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882]   Vary: Accept-Encoding
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882]   Keep-Alive: timeout=15, max=100
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882]   Connection: Keep-Alive
[http:trace4] [pid 52505] http_filters.c(955): [client 192.168.97.18:28882]   Transfer-Encoding: chunked


Then, depending on which filters are configured, this may or may not
happen. It may even be runtime dependent. I tried to put proxy_html
into a filter chain to get a more defined behaviour, but this is not
possible; it produces a configuration error with FilterProvider,
although the documentation says:
        "Any content filter may be used as a provider to mod_filter;
         no change to existing filter modules is required"
So this does not work, either.

Finally I decided to fix the code, as well as I can. (As stated before,
I have absolutely no idea about this stuff and its conventions, I just
need to make the thing workable.)
---------------------------------------------------------------------------
--- modules/filters/mod_xml2enc.c.orig  2018-06-22 10:43:46.000000000 +0000
+++ modules/filters/mod_xml2enc.c       2019-04-14 23:33:16.661705000 +0000
@@ -305,6 +305,7 @@
     apr_size_t insz = 0;
     int pending_meta = 0;
     char *ctype;
+    const char *c_enc = NULL;
     char *p;
 
     if (!ctx || !f->r->content_type) {
@@ -324,6 +325,17 @@
         return ap_pass_brigade(f->next, bb) ;
     }
 
+    if((c_enc = apr_table_get(f->r->headers_out, "Content-Encoding")) &&
+            !strstr(c_enc, "identity") &&
+            !apr_table_get(f->r->notes, "X-PMc-was-here")) {
+        ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, f->r, APLOGNO(66666)
+             "Probable deflated content, standing down") ;
+        ap_remove_output_filter(f);
+        return ap_pass_brigade(f->next, bb) ;
+    } else {
+        apr_table_set(f->r->notes, "X-PMc-was-here", "1");
+    }
+    
     if (ctx->bbsave == NULL) {
         ctx->bbsave = apr_brigade_create(f->r->pool,
                                          f->r->connection->bucket_alloc);
---------------------------------------------------------------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] [patch] Apache converts GZIPed data into UTF-8 - 2nd Act

Posted by Peter <pm...@citylink.dinoex.sub.org>.
On Mon, Apr 15, 2019 at 05:21:27PM +0100, Nick Kew wrote:
! > Oh, nobody has an answer to the issue?
! 
! Well I might have done, but I was out rehearsing and performing Bach,
! not reading your email!

Oh, you're perfectly welcome to do so!

In fact I was just hoping for *any* reply - I didn't have the hope to
actually reach somebody deeply involved. Your reply is highly
appreciated!!

! mod_proxy_html knows to remove itself from the chain when it sees non-HTML,
! but mod_xml2enc doesn't.

From my viewpoint, the problem seemed to be that xml2enc is always
pulled into the processing chain, whether one wants it or not, and
the (apparently) only way to avoid that is to not load the module
(and live with the warnings issued on server start).

! > [xml2enc:debug] [pid 52505] mod_xml2enc.c(176): [client 192.168.97.18:28882] AH01430: Content-Type is text/css
! 
! At which point, you want the same reaction from xml2enc as from proxy_html.
! i.e. remove itself and leave your contents untouched.

Not really, but that would be a viable approach in the sense of 
"do-the-least-unexpected".

No, I would indeed like to run the xml2enc on all kinds of text
(because that may ease my issue with the always-postponed character
coding cleanup on my 20+ years old machines); I just want it to run
where _I_ want it to run - and definitely not on compressed data.

! > [xml2enc:debug] [pid 52505] mod_xml2enc.c(250): [client 192.168.97.18:28882] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
! 
! That looks to me like a problem with your libxml2.
! But that's outside the scope of this discussion.

Hm. Another piece of software I never looked at...

! > Then, depending on which filters are configured, this may or may not
! > happen. It may even be runtime dependent. I tried to put proxy_html
! > into a filter chain to get a more defined behaviour, but this is not
! > possible, it produces a configuration error with FilterProvider, 
! 
! Did you misspell it?  It's proxy-html (hyphen, not underscore).

Now that's a hint! Indeed, I probably missed that one - I tried
with and without underscore, upper and lowercase, but likely
not the hyphen... and I failed to find the place in the source
where that name is declared. (Now, knowing the spelling, it is
easy to find ;))

And indeed! That works like I had hoped for - with
"ProxyHTMLEnable Off" and properly steered from the FilterChain,
so I can suppress it on probably compressed objects.
But it seems proxy-html does not even invoke xml2enc when called
in the filter chain - so the whole issue vaporizes in beauty. ;)

Nevertheless, the average stupid user (like me) might likely start
with the most simple configuration, and might run into this, and
would have a hard time figuring what is actually wrong; so we should
do something about it, and spare them a night searching.

! > Finally I decided to fix the code, as well as I can. (As stated before,
! > I have absolutely no idea about this stuff and its conventions, I just
! > need to make the thing workable.)
! 
! Hmm.  Your fix does the job for you, but shouldn't be necessary.

No, it's just that I didn't get it running discretely from the
filter chain.

! I'm thinking, mod_proxy_html does the right thing removing itself.
! mod_xml2enc should do the same when inserted by mod_proxy_html.

Yepp, and leave the option to insert xml2enc explicitly for other
kinds of files, if one wants to do that! Agreed!

Whereas, in an ideal world, mod_proxy_html would not stand down, but
would fix up the URLs in the stylesheet documents as well.
But then, most people are concerned about performance and use an
asset server anyway and do not get such documents from the backend
(while I am just using Rails as a scriptable database GUI that I can
reach from anywhere in the world, disregarding performance), so
public demand for this may be limited; and it can nicely be done with
substitute.

! Thanks for the detailed analysis!

Thanks for the (involuntary) invitation to look a bit deeper
into the internals of that apache beast. :))

cheerio,
PMc



Re: [users@httpd] [patch] Apache converts GZIPed data into UTF-8 - 2nd Act

Posted by Peter <pm...@citylink.dinoex.sub.org>.
On Mon, Apr 15, 2019 at 11:43:21PM +0100, Nick Kew wrote:

Hi Nick,

! OK, I've looked.

me too. ;)

! What I'd like to do - pass responsibility back to the module
! that inserted the xml2enc filter - calls for a minor API
! change, so isn't going to happen in 2.4.x.  A variant on
! that approach might work, but right now I don't see anything
! better than replicating mod_proxy_html's logic in mod_xml2enc
! to deal with the situation where they're interacting.
! 
! Your check on content-encoding also looks good.
! Except that unless I'm missing something, your use of f->r->notes
! is unnecessary: ap_remove_output_filter means we don't revisit
! that code!

Yes, it would be unnecessary, but for a different reason: my code is
currently not in the proper place.
Given a chain INFLATE;XML2ENC;DEFLATE it looks like this:

[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:126 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'inflate' matched
[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:127 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'xml2enc' matched
[xml2enc:debug] [pid 77874] mod_xml2enc.c(176): [client 192.168.97.18:65401] AH01430: Content-Type is text/css
[xml2enc:debug] [pid 77874] mod_xml2enc.c(250): [client 192.168.97.18:65401] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:130 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'deflate' matched
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[deflate:debug] [pid 77874] mod_deflate.c(1622): [client 192.168.97.18:65401] AH01398: Zlib: Inflated 6176 to 28247 : URL /fin-stage/assets/application-3a5821b5be536e0108d5934c96815299001dfa3c1ddff9f39676a3a3126d8190.css
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 3959 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 3959/3959 bytes
[deflate:debug] [pid 77874] mod_deflate.c(854): [client 192.168.97.18:65401] AH01384: Zlib: Compressed 28247 to 6226 : URL /fin-stage/assets/application-3a5821b5be536e0108d5934c96815299001dfa3c1ddff9f39676a3a3126d8190.css

Currently my snippet is run for each of these chunks of data
(which is not a good idea, but I didn't hope to be able to understand
the code in its fullness and find a better place). So, with the
DEFLATE walking behind, when it comes to the second chunk, the
DEFLATE will already have put the "gzip" header back in, and so
I watched xml2enc quit in the midst of the document.
That's why I put that in.
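
To illustrate the "decide on the first chunk, then stick with it" idea
without r->notes, here is a minimal sketch in plain C. It uses no httpd
types; the struct and function names are hypothetical, and in a real
filter the state would presumably sit in the filter's own per-request
context instead.

```c
#include <stddef.h>

/* Hypothetical per-request state; in a real filter this would live in
 * the filter's own context structure rather than in r->notes. */
struct enc_state {
    int decided;      /* 0 until the first chunk has been inspected */
    int stand_down;   /* 1 = pass data through untouched */
};

/* Called once per chunk. content_encoding is the current value of the
 * response header, or NULL if it is not set. Only the first call looks
 * at the header; later calls reuse the stored decision, so a header
 * added by a downstream filter in between has no effect. */
static int must_stand_down(struct enc_state *st, const char *content_encoding)
{
    if (!st->decided) {
        st->stand_down = (content_encoding != NULL);
        st->decided = 1;
    }
    return st->stand_down;
}
```

With this, a second chunk that suddenly sees "gzip" in the header is
still converted, because the decision was taken when the header was
still absent.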

Another minor flaw is that the test for "Content-Encoding: identity" 
(btw: does anybody use that?) is probably not case-insensitive.
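
A case-insensitive variant of that test could look roughly like the
sketch below. It is plain C with POSIX strncasecmp, not what the patch
above does: it walks the comma-separated header value and reports
whether any coding other than "identity" is listed. The function name
is made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Returns 1 if the Content-Encoding value lists any coding other than
 * "identity" (compared case-insensitively), 0 otherwise.
 * A NULL or empty value counts as "not encoded". */
static int has_real_content_coding(const char *value)
{
    const char *p = value;

    if (p == NULL)
        return 0;

    while (*p != '\0') {
        const char *start;
        size_t len;

        /* skip separators and leading whitespace */
        while (*p == ',' || isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        /* the token runs up to the next comma */
        start = p;
        while (*p != '\0' && *p != ',')
            p++;
        len = (size_t)(p - start);

        /* trim trailing whitespace */
        while (len > 0 && isspace((unsigned char)start[len - 1]))
            len--;

        if (!(len == 8 && strncasecmp(start, "identity", 8) == 0))
            return 1;   /* e.g. gzip, deflate, br */
    }
    return 0;
}
```

Unlike a plain strstr(), this does not treat "identity, gzip" as
unencoded.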

And then I was thinking about a different and probably better approach: 
if we can check the first few bytes of the actual document
beforehand, we can test these against the signatures of the usual
compression-algorithms (in the same way as the "file" command does it
on Unix). This seems safer than relying on header information.

Because, I don't see a reason why an HTML document might not also be
compressed - and then it wouldn't help to just stop processing CSS 
documents. 
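
A rough sketch of such a signature test, in plain C and independent of
httpd: it compares the first bytes of a buffer against the well-known
magic numbers of gzip, bzip2, xz and zstd. The function name and the
choice of formats are my assumptions, not anything taken from
mod_xml2enc.

```c
#include <stddef.h>
#include <string.h>

/* Returns a short name for the compression format whose signature
 * matches the start of buf, or NULL if none of the known ones match. */
static const char *sniff_compression(const unsigned char *buf, size_t len)
{
    static const struct {
        const char *name;
        unsigned char magic[6];
        size_t magic_len;
    } sigs[] = {
        { "gzip",  { 0x1f, 0x8b },                     2 },
        { "bzip2", { 'B', 'Z', 'h' },                  3 },
        { "xz",    { 0xfd, '7', 'z', 'X', 'Z', 0x00 }, 6 },
        { "zstd",  { 0x28, 0xb5, 0x2f, 0xfd },         4 },
    };
    size_t i;

    if (buf == NULL)
        return NULL;

    for (i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++) {
        if (len >= sigs[i].magic_len &&
            memcmp(buf, sigs[i].magic, sigs[i].magic_len) == 0)
            return sigs[i].name;
    }
    return NULL;
}
```

One caveat: not every content coding has a signature (brotli streams,
for instance, carry no magic number), so a check like this could only
complement the header test, not replace it.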

Btw, concerning this message, I had a look at that one, too:
   AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate

It seems to me that this message is reached just because the document
is compressed (and libxml2 can obviously not find a charset in
that); only the message text seems misleading.
Maybe a conservative approach would be to just stop at that point
and give up - because, compression might not be the only issue here;
people might get the idea to use some end-to-end encryption for
certain documents, and that would also appear as binary data that we
must not tamper with...
(just thinking along)

cheerio,
PMc



Re: [users@httpd] [patch] Apache converts GZIPed data into UTF-8 - 2nd Act

Posted by Nick Kew <ni...@apache.org>.
On Mon, 15 Apr 2019 17:21:27 +0100
Nick Kew <ni...@apache.org> wrote:


> Hmm.  Your fix does the job for you, but shouldn't be necessary.
> 
> I'm thinking, mod_proxy_html does the right thing removing itself.
> mod_xml2enc should do the same when inserted by mod_proxy_html.
> That should be straightforward to fix.  I'll take a look later today.
> 
> Thanks for the detailed analysis!

OK, I've looked.

What I'd like to do - pass responsibility back to the module
that inserted the xml2enc filter - calls for a minor API
change, so isn't going to happen in 2.4.x.  A variant on
that approach might work, but right now I don't see anything
better than replicating mod_proxy_html's logic in mod_xml2enc
to deal with the situation where they're interacting.
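
For readers following along, the kind of check meant here can be
sketched in plain C as below. This assumes the logic amounts to a
case-insensitive prefix test for the two HTML media types; the function
name is hypothetical and this is not the actual mod_proxy_html code.

```c
#include <stddef.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Returns 1 if the Content-Type starts with an HTML or XHTML media
 * type (parameters such as "; charset=..." may follow), else 0. */
static int looks_like_html(const char *ctype)
{
    if (ctype == NULL)
        return 0;
    return strncasecmp(ctype, "text/html", 9) == 0
        || strncasecmp(ctype, "application/xhtml+xml", 21) == 0;
}
```

A filter applying this would step aside for the text/css response in
the log above, while an explicitly configured xml2enc could still be
used on other text types.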

Your check on content-encoding also looks good.
Except that unless I'm missing something, your use of f->r->notes
is unnecessary: ap_remove_output_filter means we don't revisit
that code!

-- 
Nick Kew



Re: [users@httpd] [patch] Apache converts GZIPed data into UTF-8 - 2nd Act

Posted by Nick Kew <ni...@apache.org>.

> On 15 Apr 2019, at 14:44, Peter <pm...@citylink.dinoex.sub.org> wrote:
> 
> 
> Oh, nobody has an answer to the issue?

Well I might have done, but I was out rehearsing and performing Bach,
not reading your email!

> Okay...
> 
> Investigating, it appears that mod_xml2enc indeed grabs everything it
> can lay hands on, if only it is tagged as some 'text/whatever', and
> "converts" it (assuming it to be ISO-8859-1), no matter the damage, and
> not giving a f*** damn about compressed data. :((

Heh.

Well, you've identified an issue, albeit in rather colourful language!
mod_proxy_html knows to remove itself from the chain when it sees non-HTML,
but mod_xml2enc doesn't.

Probably my fault.

> [xml2enc:debug] [pid 52505] mod_xml2enc.c(176): [client 192.168.97.18:28882] AH01430: Content-Type is text/css

At which point, you want the same reaction from xml2enc as from proxy_html.
i.e. remove itself and leave your contents untouched.

> [xml2enc:debug] [pid 52505] mod_xml2enc.c(250): [client 192.168.97.18:28882] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate

That looks to me like a problem with your libxml2.
But that's outside the scope of this discussion.

> Then, depending on which filters are configured, this may or may not
> happen. It may even be runtime dependent. I tried to put proxy_html
> into a filter chain to get a more defined behaviour, but this is not
> possible, it produces a configuration error with FilterProvider, 

Did you misspell it?  It's proxy-html (hyphen, not underscore).

> Finally I decided to fix the code, as well as I can. (As stated before,
> I have absolutely no idea about this stuff and its conventions, I just
> need to make the thing workable.)

Hmm.  Your fix does the job for you, but shouldn't be necessary.

I'm thinking, mod_proxy_html does the right thing removing itself.
mod_xml2enc should do the same when inserted by mod_proxy_html.
That should be straightforward to fix.  I'll take a look later today.

Thanks for the detailed analysis!

-- 
Nick Kew