You are viewing a plain text version of this content. The canonical link for it is here.
Posted to cvs@httpd.apache.org by jo...@apache.org on 2020/12/16 16:23:23 UTC

svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt modules/filters/mod_xml2enc.c

Author: jorton
Date: Wed Dec 16 16:23:23 2020
New Revision: 1884505

URL: http://svn.apache.org/viewvc?rev=1884505&view=rev
Log:
The Microsoft OOXML format uses xml packaged into a zip file, and has
mimetypes like:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

This mimetypes contains 'xml', but is unfortunately not an xml file.

xml2enc processes these files (in particular, when mod_proxy_html is
used), typically resulting in them being corrupted as it seems to
attempt to perform a ISO-8859-1 to UTF-8 conversion on them.

* modules/filters/mod_xml2enc.c (xml2enc_ffunc): Restrict test for XML
  types to matching "+xml".

Submitted by: Joseph Heenan <joseph.heenan fintechlabs.io>
PR: 64339
Github: closes #150

Added:
    httpd/httpd/trunk/changes-entries/pr64339.txt
Modified:
    httpd/httpd/trunk/modules/filters/mod_xml2enc.c

Added: httpd/httpd/trunk/changes-entries/pr64339.txt
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/changes-entries/pr64339.txt?rev=1884505&view=auto
==============================================================================
--- httpd/httpd/trunk/changes-entries/pr64339.txt (added)
+++ httpd/httpd/trunk/changes-entries/pr64339.txt Wed Dec 16 16:23:23 2020
@@ -0,0 +1,4 @@
+  *) mod_xml2enc: Update check to match MIME types matching
+     "+xml" rather than anything containing "xml", avoiding
+     corruption of Microsoft OOXML formats.  PR 64339.
+     [Joseph Heenan <joseph.heenan fintechlabs.io>]

Modified: httpd/httpd/trunk/modules/filters/mod_xml2enc.c
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/filters/mod_xml2enc.c?rev=1884505&r1=1884504&r2=1884505&view=diff
==============================================================================
--- httpd/httpd/trunk/modules/filters/mod_xml2enc.c (original)
+++ httpd/httpd/trunk/modules/filters/mod_xml2enc.c Wed Dec 16 16:23:23 2020
@@ -343,8 +343,8 @@ static apr_status_t xml2enc_ffunc(ap_fil
         if (isupper(*p))
             *p = tolower(*p);
 
-    /* only act if starts-with "text/" or contains "xml" */
-    if (strncmp(ctype, "text/", 5) && !strstr(ctype, "xml"))  {
+    /* only act if starts-with "text/" or contains "+xml" */
+    if (strncmp(ctype, "text/", 5) && !strstr(ctype, "+xml"))  {
         ap_remove_output_filter(f);
         return ap_pass_brigade(f->next, bb) ;
     }



Re: svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt modules/filters/mod_xml2enc.c

Posted by Yann Ylavic <yl...@gmail.com>.
On Sat, Dec 19, 2020 at 11:02 PM Nick Kew <ni...@apache.org> wrote:
>
> On Thu, 17 Dec 2020 17:31:20 +0000
> Nick Kew <ni...@apache.org> wrote:
>
> > > On 17 Dec 2020, at 16:22, Joe Orton <jo...@redhat.com> wrote:
> >  [chop]
> > Thanks for prompting me to take a proper look at where this thread
> > started.
>
> Further thought: anything hardwired will come up against future
> use cases where it acts contrary to expectations and indeed
> requirements.  A future-proof solution is to make it configurable.
>
> I attach a patch I've just hacked (untested): if folks are happy
> with the approach I'll flesh it out and commit.

Looks good to me.

I still wonder if the default (when no xml2MimeType is configured)
shouldn't be something like:

        if (!ctx->checkedmime) {
            if (strncmp(ctype, "text/html", 9)
                && (!(x = strstr(ctype, "xml"))
                    || (x > ctype && apr_isalnum(x[-1])
                    || apr_isalnum(x[3]))))  {
                ap_remove_output_filter(f);
                return ap_pass_brigade(f->next, bb) ;
            }
            ctx->checkedmime = 1;
        }

so that we still include "text/html" and anything "xml" but not in the
middle of a word like the problematic
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
from PR 64339 (or github #150).
Or are there some real middle-word "xml" mime types which we want to handle?

Regards;
Yann.

Re: svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt modules/filters/mod_xml2enc.c

Posted by Nick Kew <ni...@apache.org>.
On Thu, 17 Dec 2020 17:31:20 +0000
Nick Kew <ni...@apache.org> wrote:

> > On 17 Dec 2020, at 16:22, Joe Orton <jo...@redhat.com> wrote:
>  [chop]
> Thanks for prompting me to take a proper look at where this thread
> started.

Further thought: anything hardwired will come up against future
use cases where it acts contrary to expectations and indeed
requirements.  A future-proof solution is to make it configurable.

I attach a patch I've just hacked (untested): if folks are happy
with the approach I'll flesh it out and commit.

-- 
Nick Kew

Re: svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt modules/filters/mod_xml2enc.c

Posted by Nick Kew <ni...@apache.org>.

> On 17 Dec 2020, at 16:22, Joe Orton <jo...@redhat.com> wrote:
> 
> On Wed, Dec 16, 2020 at 07:41:59PM +0000, Nick Kew wrote:
>>> On 16 Dec 2020, at 17:47, Yann Ylavic <yl...@gmail.com> wrote:
>>>> Wouldn't this stop matching "application/xml" for instance?
>>>> 
>>>> Possibly this test instead:
>>>>   if (strncmp(ctype, "text/", 5)
>>>>       && (!(x = strstr(ctype, "xml"))
>>>>           || x == ctype || !strchr("/+", x[-1]))) {
>>>> ?
>>> 
>>> I would even remove the "text/" check (why act on "text/plain" for
>>> instance), so maybe:
>>>   if (!(x = strstr(ctype, "xml"))
>>>           || x == ctype || !strchr("/+", x[-1])
>>>           || apr_isalnum(x[3])) {
>>> ?
>> 
>> Be liberal in what you accept.  You can limit it further in configuration,
>> but you can't override a hardwired check.
>> 
>> It certainly needs to operate on text/html for mod_proxy_html, and users might
>> find reasons for running it on other text types as an alternative to an iconv
>> filter like mod_charset(_lite).
> 
> I'm not sure if you are agreeing with Yann or not.  You wrote the code, 
> how do you think we should you resolve PR 64339, NOTABUG & revert 
> r1884505 or something else?

Thanks for prompting me to take a proper look at where this thread started.

I started to compose a reply here, bug got bogged down.  So I've gone to the
PR instead.  I see there's a patch submitted by Giovanni Bechis which looks
more-or-less acceptable, though it does raise further questions.

I've marked the bug NEEDINFO and asked further questions there.

-- 
Nick Kew

Re: svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt modules/filters/mod_xml2enc.c

Posted by Joe Orton <jo...@redhat.com>.
On Wed, Dec 16, 2020 at 07:41:59PM +0000, Nick Kew wrote:
> > On 16 Dec 2020, at 17:47, Yann Ylavic <yl...@gmail.com> wrote:
> >> Wouldn't this stop matching "application/xml" for instance?
> >> 
> >> Possibly this test instead:
> >>    if (strncmp(ctype, "text/", 5)
> >>        && (!(x = strstr(ctype, "xml"))
> >>            || x == ctype || !strchr("/+", x[-1]))) {
> >> ?
> > 
> > I would even remove the "text/" check (why act on "text/plain" for
> > instance), so maybe:
> >    if (!(x = strstr(ctype, "xml"))
> >            || x == ctype || !strchr("/+", x[-1])
> >            || apr_isalnum(x[3])) {
> > ?
> 
> Be liberal in what you accept.  You can limit it further in configuration,
> but you can't override a hardwired check.
> 
> It certainly needs to operate on text/html for mod_proxy_html, and users might
> find reasons for running it on other text types as an alternative to an iconv
> filter like mod_charset(_lite).

I'm not sure if you are agreeing with Yann or not.  You wrote the code, 
how do you think we should you resolve PR 64339, NOTABUG & revert 
r1884505 or something else?

Regards, Joe


Re: svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt modules/filters/mod_xml2enc.c

Posted by Nick Kew <ni...@apache.org>.

> On 16 Dec 2020, at 17:47, Yann Ylavic <yl...@gmail.com> wrote:
> 
> On Wed, Dec 16, 2020 at 6:36 PM Yann Ylavic <yl...@gmail.com> wrote:
>> 
>> On Wed, Dec 16, 2020 at 5:23 PM <jo...@apache.org> wrote:
>>> 
>>> -    /* only act if starts-with "text/" or contains "xml" */
>>> -    if (strncmp(ctype, "text/", 5) && !strstr(ctype, "xml"))  {
>>> +    /* only act if starts-with "text/" or contains "+xml" */
>>> +    if (strncmp(ctype, "text/", 5) && !strstr(ctype, "+xml"))  {
>>>         ap_remove_output_filter(f);
>>>         return ap_pass_brigade(f->next, bb) ;
>>>     }
>> 
>> Wouldn't this stop matching "application/xml" for instance?
>> 
>> Possibly this test instead:
>>    if (strncmp(ctype, "text/", 5)
>>        && (!(x = strstr(ctype, "xml"))
>>            || x == ctype || !strchr("/+", x[-1]))) {
>> ?
> 
> I would even remove the "text/" check (why act on "text/plain" for
> instance), so maybe:
>    if (!(x = strstr(ctype, "xml"))
>            || x == ctype || !strchr("/+", x[-1])
>            || apr_isalnum(x[3])) {
> ?

Be liberal in what you accept.  You can limit it further in configuration,
but you can't override a hardwired check.

It certainly needs to operate on text/html for mod_proxy_html, and users might
find reasons for running it on other text types as an alternative to an iconv
filter like mod_charset(_lite).

-- 
Nick Kew

Re: svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt modules/filters/mod_xml2enc.c

Posted by Yann Ylavic <yl...@gmail.com>.
On Wed, Dec 16, 2020 at 6:36 PM Yann Ylavic <yl...@gmail.com> wrote:
>
> On Wed, Dec 16, 2020 at 5:23 PM <jo...@apache.org> wrote:
> >
> > -    /* only act if starts-with "text/" or contains "xml" */
> > -    if (strncmp(ctype, "text/", 5) && !strstr(ctype, "xml"))  {
> > +    /* only act if starts-with "text/" or contains "+xml" */
> > +    if (strncmp(ctype, "text/", 5) && !strstr(ctype, "+xml"))  {
> >          ap_remove_output_filter(f);
> >          return ap_pass_brigade(f->next, bb) ;
> >      }
>
> Wouldn't this stop matching "application/xml" for instance?
>
> Possibly this test instead:
>     if (strncmp(ctype, "text/", 5)
>         && (!(x = strstr(ctype, "xml"))
>             || x == ctype || !strchr("/+", x[-1]))) {
> ?

I would even remove the "text/" check (why act on "text/plain" for
instance), so maybe:
    if (!(x = strstr(ctype, "xml"))
            || x == ctype || !strchr("/+", x[-1])
            || apr_isalnum(x[3])) {
?

Re: svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt modules/filters/mod_xml2enc.c

Posted by Yann Ylavic <yl...@gmail.com>.
On Wed, Dec 16, 2020 at 5:23 PM <jo...@apache.org> wrote:
>
> -    /* only act if starts-with "text/" or contains "xml" */
> -    if (strncmp(ctype, "text/", 5) && !strstr(ctype, "xml"))  {
> +    /* only act if starts-with "text/" or contains "+xml" */
> +    if (strncmp(ctype, "text/", 5) && !strstr(ctype, "+xml"))  {
>          ap_remove_output_filter(f);
>          return ap_pass_brigade(f->next, bb) ;
>      }

Wouldn't this stop matching "application/xml" for instance?

Possibly this test instead:
    if (strncmp(ctype, "text/", 5)
        && (!(x = strstr(ctype, "xml"))
            || x == ctype || !strchr("/+", x[-1]))) {
?