You are viewing a plain text version of this content. The canonical link for it is here.
Posted to cvs@httpd.apache.org by jo...@apache.org on 2020/12/16 16:23:23 UTC
svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt
modules/filters/mod_xml2enc.c
Author: jorton
Date: Wed Dec 16 16:23:23 2020
New Revision: 1884505
URL: http://svn.apache.org/viewvc?rev=1884505&view=rev
Log:
The Microsoft OOXML format uses xml packaged into a zip file, and has
mimetypes like:
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
This mimetypes contains 'xml', but is unfortunately not an xml file.
xml2enc processes these files (in particular, when mod_proxy_html is
used), typically resulting in them being corrupted as it seems to
attempt to perform a ISO-8859-1 to UTF-8 conversion on them.
* modules/filters/mod_xml2enc.c (xml2enc_ffunc): Restrict test for XML
types to matching "+xml".
Submitted by: Joseph Heenan <joseph.heenan fintechlabs.io>
PR: 64339
Github: closes #150
Added:
httpd/httpd/trunk/changes-entries/pr64339.txt
Modified:
httpd/httpd/trunk/modules/filters/mod_xml2enc.c
Added: httpd/httpd/trunk/changes-entries/pr64339.txt
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/changes-entries/pr64339.txt?rev=1884505&view=auto
==============================================================================
--- httpd/httpd/trunk/changes-entries/pr64339.txt (added)
+++ httpd/httpd/trunk/changes-entries/pr64339.txt Wed Dec 16 16:23:23 2020
@@ -0,0 +1,4 @@
+ *) mod_xml2enc: Update check to match MIME types matching
+ "+xml" rather than anything containing "xml", avoiding
+ corruption of Microsoft OOXML formats. PR 64339.
+ [Joseph Heenan <joseph.heenan fintechlabs.io>]
Modified: httpd/httpd/trunk/modules/filters/mod_xml2enc.c
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/filters/mod_xml2enc.c?rev=1884505&r1=1884504&r2=1884505&view=diff
==============================================================================
--- httpd/httpd/trunk/modules/filters/mod_xml2enc.c (original)
+++ httpd/httpd/trunk/modules/filters/mod_xml2enc.c Wed Dec 16 16:23:23 2020
@@ -343,8 +343,8 @@ static apr_status_t xml2enc_ffunc(ap_fil
if (isupper(*p))
*p = tolower(*p);
- /* only act if starts-with "text/" or contains "xml" */
- if (strncmp(ctype, "text/", 5) && !strstr(ctype, "xml")) {
+ /* only act if starts-with "text/" or contains "+xml" */
+ if (strncmp(ctype, "text/", 5) && !strstr(ctype, "+xml")) {
ap_remove_output_filter(f);
return ap_pass_brigade(f->next, bb) ;
}
Re: svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt
modules/filters/mod_xml2enc.c
Posted by Yann Ylavic <yl...@gmail.com>.
On Sat, Dec 19, 2020 at 11:02 PM Nick Kew <ni...@apache.org> wrote:
>
> On Thu, 17 Dec 2020 17:31:20 +0000
> Nick Kew <ni...@apache.org> wrote:
>
> > > On 17 Dec 2020, at 16:22, Joe Orton <jo...@redhat.com> wrote:
> > [chop]
> > Thanks for prompting me to take a proper look at where this thread
> > started.
>
> Further thought: anything hardwired will come up against future
> use cases where it acts contrary to expectations and indeed
> requirements. A future-proof solution is to make it configurable.
>
> I attach a patch I've just hacked (untested): if folks are happy
> with the approach I'll flesh it out and commit.
Looks good to me.
I still wonder if the default (when no xml2MimeType is configured)
shouldn't be something like:
if (!ctx->checkedmime) {
if (strncmp(ctype, "text/html", 9)
&& (!(x = strstr(ctype, "xml"))
|| (x > ctype && apr_isalnum(x[-1])
|| apr_isalnum(x[3])))) {
ap_remove_output_filter(f);
return ap_pass_brigade(f->next, bb) ;
}
ctx->checkedmime = 1;
}
so that we still include "text/html" and anything "xml" but not in the
middle of a word like the problematic
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
from PR 64339 (or github #150).
Or are there some real middle-word "xml" mime types which we want to handle?
Regards;
Yann.
Re: svn commit: r1884505 - in /httpd/httpd/trunk:
changes-entries/pr64339.txt modules/filters/mod_xml2enc.c
Posted by Nick Kew <ni...@apache.org>.
On Thu, 17 Dec 2020 17:31:20 +0000
Nick Kew <ni...@apache.org> wrote:
> > On 17 Dec 2020, at 16:22, Joe Orton <jo...@redhat.com> wrote:
> [chop]
> Thanks for prompting me to take a proper look at where this thread
> started.
Further thought: anything hardwired will come up against future
use cases where it acts contrary to expectations and indeed
requirements. A future-proof solution is to make it configurable.
I attach a patch I've just hacked (untested): if folks are happy
with the approach I'll flesh it out and commit.
--
Nick Kew
Re: svn commit: r1884505 - in /httpd/httpd/trunk:
changes-entries/pr64339.txt modules/filters/mod_xml2enc.c
Posted by Nick Kew <ni...@apache.org>.
> On 17 Dec 2020, at 16:22, Joe Orton <jo...@redhat.com> wrote:
>
> On Wed, Dec 16, 2020 at 07:41:59PM +0000, Nick Kew wrote:
>>> On 16 Dec 2020, at 17:47, Yann Ylavic <yl...@gmail.com> wrote:
>>>> Wouldn't this stop matching "application/xml" for instance?
>>>>
>>>> Possibly this test instead:
>>>> if (strncmp(ctype, "text/", 5)
>>>> && (!(x = strstr(ctype, "xml"))
>>>> || x == ctype || !strchr("/+", x[-1]))) {
>>>> ?
>>>
>>> I would even remove the "text/" check (why act on "text/plain" for
>>> instance), so maybe:
>>> if (!(x = strstr(ctype, "xml"))
>>> || x == ctype || !strchr("/+", x[-1])
>>> || apr_isalnum(x[3])) {
>>> ?
>>
>> Be liberal in what you accept. You can limit it further in configuration,
>> but you can't override a hardwired check.
>>
>> It certainly needs to operate on text/html for mod_proxy_html, and users might
>> find reasons for running it on other text types as an alternative to an iconv
>> filter like mod_charset(_lite).
>
> I'm not sure if you are agreeing with Yann or not. You wrote the code,
> how do you think we should you resolve PR 64339, NOTABUG & revert
> r1884505 or something else?
Thanks for prompting me to take a proper look at where this thread started.
I started to compose a reply here, bug got bogged down. So I've gone to the
PR instead. I see there's a patch submitted by Giovanni Bechis which looks
more-or-less acceptable, though it does raise further questions.
I've marked the bug NEEDINFO and asked further questions there.
--
Nick Kew
Re: svn commit: r1884505 - in /httpd/httpd/trunk:
changes-entries/pr64339.txt modules/filters/mod_xml2enc.c
Posted by Joe Orton <jo...@redhat.com>.
On Wed, Dec 16, 2020 at 07:41:59PM +0000, Nick Kew wrote:
> > On 16 Dec 2020, at 17:47, Yann Ylavic <yl...@gmail.com> wrote:
> >> Wouldn't this stop matching "application/xml" for instance?
> >>
> >> Possibly this test instead:
> >> if (strncmp(ctype, "text/", 5)
> >> && (!(x = strstr(ctype, "xml"))
> >> || x == ctype || !strchr("/+", x[-1]))) {
> >> ?
> >
> > I would even remove the "text/" check (why act on "text/plain" for
> > instance), so maybe:
> > if (!(x = strstr(ctype, "xml"))
> > || x == ctype || !strchr("/+", x[-1])
> > || apr_isalnum(x[3])) {
> > ?
>
> Be liberal in what you accept. You can limit it further in configuration,
> but you can't override a hardwired check.
>
> It certainly needs to operate on text/html for mod_proxy_html, and users might
> find reasons for running it on other text types as an alternative to an iconv
> filter like mod_charset(_lite).
I'm not sure if you are agreeing with Yann or not. You wrote the code,
how do you think we should you resolve PR 64339, NOTABUG & revert
r1884505 or something else?
Regards, Joe
Re: svn commit: r1884505 - in /httpd/httpd/trunk:
changes-entries/pr64339.txt modules/filters/mod_xml2enc.c
Posted by Nick Kew <ni...@apache.org>.
> On 16 Dec 2020, at 17:47, Yann Ylavic <yl...@gmail.com> wrote:
>
> On Wed, Dec 16, 2020 at 6:36 PM Yann Ylavic <yl...@gmail.com> wrote:
>>
>> On Wed, Dec 16, 2020 at 5:23 PM <jo...@apache.org> wrote:
>>>
>>> - /* only act if starts-with "text/" or contains "xml" */
>>> - if (strncmp(ctype, "text/", 5) && !strstr(ctype, "xml")) {
>>> + /* only act if starts-with "text/" or contains "+xml" */
>>> + if (strncmp(ctype, "text/", 5) && !strstr(ctype, "+xml")) {
>>> ap_remove_output_filter(f);
>>> return ap_pass_brigade(f->next, bb) ;
>>> }
>>
>> Wouldn't this stop matching "application/xml" for instance?
>>
>> Possibly this test instead:
>> if (strncmp(ctype, "text/", 5)
>> && (!(x = strstr(ctype, "xml"))
>> || x == ctype || !strchr("/+", x[-1]))) {
>> ?
>
> I would even remove the "text/" check (why act on "text/plain" for
> instance), so maybe:
> if (!(x = strstr(ctype, "xml"))
> || x == ctype || !strchr("/+", x[-1])
> || apr_isalnum(x[3])) {
> ?
Be liberal in what you accept. You can limit it further in configuration,
but you can't override a hardwired check.
It certainly needs to operate on text/html for mod_proxy_html, and users might
find reasons for running it on other text types as an alternative to an iconv
filter like mod_charset(_lite).
--
Nick Kew
Re: svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt
modules/filters/mod_xml2enc.c
Posted by Yann Ylavic <yl...@gmail.com>.
On Wed, Dec 16, 2020 at 6:36 PM Yann Ylavic <yl...@gmail.com> wrote:
>
> On Wed, Dec 16, 2020 at 5:23 PM <jo...@apache.org> wrote:
> >
> > - /* only act if starts-with "text/" or contains "xml" */
> > - if (strncmp(ctype, "text/", 5) && !strstr(ctype, "xml")) {
> > + /* only act if starts-with "text/" or contains "+xml" */
> > + if (strncmp(ctype, "text/", 5) && !strstr(ctype, "+xml")) {
> > ap_remove_output_filter(f);
> > return ap_pass_brigade(f->next, bb) ;
> > }
>
> Wouldn't this stop matching "application/xml" for instance?
>
> Possibly this test instead:
> if (strncmp(ctype, "text/", 5)
> && (!(x = strstr(ctype, "xml"))
> || x == ctype || !strchr("/+", x[-1]))) {
> ?
I would even remove the "text/" check (why act on "text/plain" for
instance), so maybe:
if (!(x = strstr(ctype, "xml"))
|| x == ctype || !strchr("/+", x[-1])
|| apr_isalnum(x[3])) {
?
Re: svn commit: r1884505 - in /httpd/httpd/trunk: changes-entries/pr64339.txt
modules/filters/mod_xml2enc.c
Posted by Yann Ylavic <yl...@gmail.com>.
On Wed, Dec 16, 2020 at 5:23 PM <jo...@apache.org> wrote:
>
> - /* only act if starts-with "text/" or contains "xml" */
> - if (strncmp(ctype, "text/", 5) && !strstr(ctype, "xml")) {
> + /* only act if starts-with "text/" or contains "+xml" */
> + if (strncmp(ctype, "text/", 5) && !strstr(ctype, "+xml")) {
> ap_remove_output_filter(f);
> return ap_pass_brigade(f->next, bb) ;
> }
Wouldn't this stop matching "application/xml" for instance?
Possibly this test instead:
if (strncmp(ctype, "text/", 5)
&& (!(x = strstr(ctype, "xml"))
|| x == ctype || !strchr("/+", x[-1]))) {
?