Posted to server-dev@james.apache.org by Markus Wiederkehr <ma...@gmail.com> on 2008/12/03 20:15:42 UTC

[mime4j] Possible header field parsing problem

I think I have found a minor(?) issue when parsing header fields.

RFC 822 defines a field as:
     field       =  field-name ":" [ field-body ] CRLF
     field-name  =  1*<any CHAR, excluding CTLs, SPACE, and ":">

... which implies two things. First, a field name must consist of at
least one character. Second, a field name may not contain spaces or
tabs, not even trailing ones.
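
As a minimal sketch of what that rule means in code, a strict check
could look like the following. FieldNameCheck is a hypothetical helper
made up for illustration; it is not existing mime4j code.

```java
// Hypothetical helper, not part of mime4j: checks the RFC 822 field-name rule.
final class FieldNameCheck {
    private FieldNameCheck() {}

    /** True if name matches 1*<any CHAR, excluding CTLs, SPACE, and ":">. */
    static boolean isValidFieldName(String name) {
        if (name == null || name.isEmpty()) {
            return false; // "1*" requires at least one character
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // CHAR is ASCII 0-127. CTLs are 0-31 and 127 (TAB is a CTL),
            // SPACE is 32, and ':' terminates the field name.
            if (c <= 32 || c >= 127 || c == ':') {
                return false;
            }
        }
        return true;
    }
}
```

With this check "Subject" passes, while "", "Subject " and "Subject\t"
are all rejected.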

Now take a look at o.a.j.mime4j.parser.AbstractEntity#parseField. This
method accepts an empty field name, that is, header lines that
immediately start with a colon. It does not accept trailing tabs or
spaces in the field name.

On the other hand o.a.j.mime4j.field.Field#parse uses a regular
expression that allows trailing tabs or spaces in the field name. The
regex does not match empty field names.

I think both methods should be very strict and allow neither empty
field names nor trailing spaces. Or at the very least they should be
consistent with each other.
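
To illustrate the difference between the two behaviours, here is a
rough sketch with two regular expressions. Both patterns and the
FieldPatterns class are my own assumptions for illustration; they are
not the expressions currently used in mime4j.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical patterns for illustration only, not the ones used in mime4j.
final class FieldPatterns {
    private FieldPatterns() {}

    // Printable ASCII except ':' (0x3A); excludes SPACE and all CTLs.
    private static final String NAME = "[\\x21-\\x39\\x3B-\\x7E]+";

    // Lenient: tolerates tabs or spaces between the field name and the colon.
    static final Pattern LENIENT =
            Pattern.compile("^(" + NAME + ")[ \\t]*:(.*)$", Pattern.DOTALL);

    // Strict: the colon must follow the field name immediately.
    static final Pattern STRICT =
            Pattern.compile("^(" + NAME + "):(.*)$", Pattern.DOTALL);

    /** Returns the field name, or null if the raw line does not match. */
    static String fieldName(Pattern pattern, String raw) {
        Matcher m = pattern.matcher(raw);
        return m.matches() ? m.group(1) : null;
    }
}
```

Both patterns reject a line that starts with a colon. Only the lenient
one accepts "Subject : hi", and that is exactly the kind of input on
which two differently written parsers disagree.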

If both methods behaved consistently, this would also resolve
another issue that's been bothering me: the MimeException in
Field#parse. It could be changed back to an IllegalArgumentException
because the method would never be invoked with an invalid argument
when parsing an InputStream. AbstractEntity#parseField already drops
such invalid header fields.
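
Roughly, the contract I have in mind looks like the sketch below. All
names here (HeaderSketch, isWellFormed, parse, parseAll) are
hypothetical stand-ins and do not reflect the actual mime4j classes or
signatures. The point is only that the stream parser and the field
parser share one validity rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not the actual mime4j API.
final class HeaderSketch {
    private HeaderSketch() {}

    /** Single validity rule shared by both code paths. */
    static boolean isWellFormed(String raw) {
        int colon = raw.indexOf(':');
        if (colon <= 0) {
            return false; // no colon at all, or an empty field name
        }
        for (int i = 0; i < colon; i++) {
            char c = raw.charAt(i);
            if (c <= 32 || c >= 127) {
                return false; // CTL, SPACE or non-ASCII in the field name
            }
        }
        return true;
    }

    /** Stands in for Field#parse: an invalid argument is a caller error. */
    static String[] parse(String raw) {
        if (!isWellFormed(raw)) {
            throw new IllegalArgumentException("Invalid field: " + raw);
        }
        int colon = raw.indexOf(':');
        return new String[] { raw.substring(0, colon), raw.substring(colon + 1) };
    }

    /** Stands in for the stream parser: drops invalid lines before parse(). */
    static List<String[]> parseAll(List<String> rawLines) {
        List<String[]> fields = new ArrayList<>();
        for (String raw : rawLines) {
            if (isWellFormed(raw)) {
                fields.add(parse(raw)); // cannot throw: same rule as above
            }
        }
        return fields;
    }
}
```

Because parseAll only passes on lines that satisfy the same rule parse
enforces, the exception can only be triggered by code that calls parse
directly with bad input, which is what an unchecked exception is for.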

Opinions?

Markus

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

Re: [mime4j] Possible header field parsing problem

Posted by Oleg Kalnichevski <ol...@apache.org>.

Markus Wiederkehr wrote:
> I think I have found a minor(?) issue when parsing header fields.
> 
> RFC 822 defines a field as:
>      field       =  field-name ":" [ field-body ] CRLF
>      field-name  =  1*<any CHAR, excluding CTLs, SPACE, and ":">
> 
> ... which implies two things. First, a field name must consist of at
> least one character. Second, a field name may not contain spaces or
> tabs, not even trailing ones.
> 
> Now take a look at o.a.j.mime4j.parser.AbstractEntity#parseField. This
> method accepts an empty field name, that is, header lines that
> immediately start with a colon. 

This is clearly wrong and should be fixed.


> It does not accept trailing
> tabs/spaces.
> 
> On the other hand o.a.j.mime4j.field.Field#parse uses a regular
> expression that allows trailing tabs or spaces in the field name. The
> regex does not match empty field names.
> 

In the HTTP world, fields with trailing blanks in the field name are
not uncommon. I do not mind either way as long as it is consistent.

Oleg



