You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Gareth McCaughan <gm...@synaptics-uk.com> on 2006/06/02 16:02:04 UTC

mod_mbox: incorrect parsing of MIME part-names

(I sent this to dev@ yesterday, but I wasn't subscribed then.
Presumably the message just got dropped. It would have been
nice to have a bounce message at least, but perhaps the levels
of spam make even that impossible. Alas.)

When mod_mbox sees a multipart message, the code in
mbox_mime_decode_multipart looks in the Content-Type header
of each part for a "name" parameter. Unfortunately, its
handling of this is broken: it treats any whitespace or ";"
as ending the parameter value, which is wrong because a
parameter value can be an RFC822 quoted-string, which can
include any characters at all.

So when mod_mbox sees

    Content-Type: text/plain; name="I am the walrus.txt"

it takes the name to be <<<"I>>>; it then doesn't strip
the double-quotes because the name doesn't end as well as
start with a double-quote; the name then gets stuffed into
an XML attribute value without any sort of encoding; the
result is something that looks like

    <part name=""I" ...>

which confuses the mod_mbox AJAX front end and makes it
fail to retrieve, or at least fail to show, the message.
That's a pity :-).

The following patch kinda-sorta deals with this. Remaining
deficiencies, definite and arguable:

  - Double-quoted parts are permitted anywhere within
    the value, whereas in fact the MIME specs insist
    that a value be *either* a token *or* a single
    quoted-string. (I'm coping reasonably robustly
    with broken data here; I don't think this is a
    deficiency.)

  - Double-quote pairs other than at the start and end
    of the value aren't stripped. In other words, the
    above-mentioned robustness isn't used properly.
    This is a deficiency.

  - Escaped characters are left escaped. This is probably
    a deficiency, but a minor one.

  - Isolated double-quotes aren't escaped in any way.
    There's no clearly-correct way to deal with them,
    I think, but leaving them as they are means that
    the problem that prompted me to write this has only
    been lessened, not entirely fixed. This is a
    deficiency. Fixing it would substantially complicate
    the code. Kludging it, for instance by a second pass
    that simply annihilates double-quotes, would be
    pretty easy but arguably the Wrong Thing.

  - I haven't looked to see whether there are other bits
    of MIME handling that have the same problem.

Applying this patch would certainly be an improvement,
despite these deficiencies. It might be better to do
something that goes further, but I shan't attempt to
do so without some idea of what kind of going-further
is thought appropriate by the mod_mbox maintainers.

---------- patch begins ----------
--- mod_mbox_mime.c.ORIG        Thu Jun  1 13:32:37 2006
+++ mod_mbox_mime.c     Thu Jun  1 14:04:55 2006
@@ -85,12 +85,22 @@
        tmp = ap_strstr(body, "name=");
        if (tmp && tmp < headers_bound) {
             char c;
+            int in_quoted=0;
+            int escaped=0;
            tmp += sizeof("name=") - 1;
            k = tmp;
 
            while (*k) {
-               if (isspace(*k) || *k == ';') {
-                    c = *k;
+               if (in_quoted) {
+                   in_quoted = escaped || (*k != '"');
+                   escaped = (!escaped) && (*k == '\\');
+               }
+               else if (*k == '"') {
+                   in_quoted = 1;
+                   escaped = 0;
+               }
+               else if (isspace(*k) || *k == ';') {
+                   c = *k;
                    *k = 0;
                    break;
                }
---------- patch ends ----------

-- 
Gareth McCaughan


Re: mod_mbox: incorrect parsing of MIME part-names

Posted by Paul Querna <ch...@force-elite.com>.
Gareth McCaughan wrote:
  > Speaking of mod_mbox problems, the dev@ archives at apache.org
> (which are managed by mod_mbox) seem to be some way behind the
> times. Gmane shows 20 messages already in June, whereas the official
> archives have only 8. Perhaps it's just that the archives are
> only updated daily; if not, something may be amiss. I expect this
> is the wrong place to report the problem (if it *is* a problem),
> though...

When everything is working correctly, the mod_mbox archives should be 
about 1 hour behind.

Currently, it might be more, since the server hosting the mail archives, 
ajax, has an rsync that can take longer than an hour to complete... so, 
it might be up to several hours behind, but it should never be more than 
that.

-Paul


Re: mod_mbox: incorrect parsing of MIME part-names

Posted by Gareth McCaughan <gm...@synaptics-uk.com>.
On Friday 2006-06-02 16:14, Maxime Petazzoni wrote:

[me:]
> >   - Isolated double-quotes aren't escaped in any way.
> >     There's no clearly-correct way to deal with them,
> >     I think, but leaving them as they are means that
> >     the problem that prompted me to write this has only
> >     been lessened, not entirely fixed. This is a
> >     deficiency. Fixing it would substantially complicate
> >     the code. Kludging it, for instance by a second pass
> >     that simply annihilates double-quotes, would be
> >     pretty easy but arguably the Wrong Thing.

[Sam:]
> This would be pretty hard to parse properly anyway. Although it should
> be the job of the MUA to escape those quotes and encapsulate the whole
> filename in double-quotes, I've seen so many d***ss mailers while
> working on the multipart decoding that we can't rely on them :)

Right. :-)

> >   - Escaped characters are left escaped. This is probably
> >     a deficiency, but a minor one.
> 
> Aren't they supposed to remain escaped ? If they're unescaped, won't
> they hurt the AJAX interface too ?

The mechanism for escaping characters in XML attribute values
isn't the same as the mechanism for escaping characters in
MIME parameter values. The XML way is to replace the offending
characters with entities (&#34; or whatever), and the MIME way
is to backslash them. Unfortunately, escaping XML-style can
make the string longer, which is the reason why a lazy
quick-and-dirty patch like mine doesn't do it :-).

> That's a good patch, and working. Thanks for your work and for making
> it neat.

You're welcome.

                              *

Speaking of mod_mbox problems, the dev@ archives at apache.org
(which are managed by mod_mbox) seem to be some way behind the
times. Gmane shows 20 messages already in June, whereas the official
archives have only 8. Perhaps it's just that the archives are
only updated daily; if not, something may be amiss. I expect this
is the wrong place to report the problem (if it *is* a problem),
though...

-- 
g


Re: mod_mbox: incorrect parsing of MIME part-names

Posted by Maxime Petazzoni <ma...@bulix.org>.
Hi,

* Gareth McCaughan <gm...@synaptics-uk.com> [2006-06-02 15:02:04]:

> (I sent this to dev@ yesterday, but I wasn't subscribed then.
> Presumably the message just got dropped. It would have been
> nice to have a bounce message at least, but perhaps the levels
> of spam make even that impossible. Alas.)

Thanks a lot for re-posting, feedback like this is always welcome !

> The following patch kinda-sorta deals with this. Remaining
> deficiencies, definite and arguable:
> 
>   - Double-quote pairs other than at the start and end
>     of the value aren't stripped. In other words, the
>     above-mentioned robustness isn't used properly.
>     This is a deficiency.
> 
>   - Isolated double-quotes aren't escaped in any way.
>     There's no clearly-correct way to deal with them,
>     I think, but leaving them as they are means that
>     the problem that prompted me to write this has only
>     been lessened, not entirely fixed. This is a
>     deficiency. Fixing it would substantially complicate
>     the code. Kludging it, for instance by a second pass
>     that simply annihilates double-quotes, would be
>     pretty easy but arguably the Wrong Thing.

This would be pretty hard to parse properly anyway. Although it should
be the job of the MUA to escape those quotes and encapsulate the whole
filename in double-quotes, I've seen so many d***ss mailers while
working on the multipart decoding that we can't rely on them :)

>   - Escaped characters are left escaped. This is probably
>     a deficiency, but a minor one.

Aren't they supposed to remain escaped ? If they're unescaped, won't
they hurt the AJAX interface too ?

>   - I haven't looked to see whether there are other bits
>     of MIME handling that have the same problem.

I've also fixed a boundary research problem, and it should not suffer
from the same problem ... I hope.

> Applying this patch would certainly be an improvement,
> despite these deficiencies. It might be better to do
> something that goes further, but I shan't attempt to
> do so without some idea of what kind of going-further
> is thought appropriate by the mod_mbox maintainers.

That's a good patch, and working. Thanks for your work and for making
it neat.

Bests,
- Sam
-- 
Maxime Petazzoni (http://www.bulix.org)
 -- gone crazy, back soon. leave message.