You are viewing a plain text version of this content. The canonical link for it is here.

Posted to server-dev@james.apache.org by Antony Bowesman <ad...@teamware.com> on 2008/03/18 00:43:08 UTC

Bug in ContentTypeParser.jj?

Hi,

The ContentTypeParser does not like boundary strings that only contain digits. 
My .jj is not so hot, but I debugged the parser and it is returning 20 as the 
token.kind, whereas it seems to want 19 (QUOTEDSTRING) or 21 (ATOKEN).

I guess 20 is 'DIGITS' and the definition of digits is before ATOKEN, so it is 
satisfied before ATOKEN.

Should the fix be either

String value() :
{Token t;}
{
(	t=<ATOKEN>
|	t=<DIGITS>
|	t=<QUOTEDSTRING>
)
	{ return t.image; }
}

or the ATOKEN definition be defined before DIGITS?

Antony



---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

Re: Bug in ContentTypeParser.jj?

Posted by Robert Burrell Donkin <ro...@gmail.com>.

On Tue, Mar 18, 2008 at 9:54 PM, Antony Bowesman <ad...@teamware.com> wrote:

> Hi Robert,
>
> > IIRC there are some test mails which use boundary string containing only
> > digits. if you are willing to contribute a unit test or two that
> > demonstrates the problem, it'd help a lot. please use
> > https://issues.apache.org/jira/browse/MIME4J
>
> Done.  I ticked the wrong box for the ASF licence, but could not edit it
> to
> grant licence to ASF, but the test case can be included if needed.

i did wonder whether that was the case but decided to err on the side of
caution. if (hopefully when ;-) you contribute again it's usually best to
use an apache namespace and to include the license boilerplate at the top of
any new files (it's not strictly necessary but your intention to contribute
is clear that way).

in the end, the existing example mails also break Message. the test case was
very useful since it pointed me at exactly the problem,

- robert

Re: Bug in ContentTypeParser.jj?

Posted by Robert Burrell Donkin <ro...@gmail.com>.

On Wed, Mar 19, 2008 at 12:07 AM, Antony Bowesman <ad...@teamware.com> wrote:

> Hi Robert,
>
> > the main pull parser parses all digit boundaries fine.
>
> Ah, now I see what you mean.  I debugged again and there's two lots of
> parsing
> going on,
>
> MimeStreamParser.parseHeader()
>   ContentHandler.field(fieldData)
>   BodyDescriptor.addField()
>     getHeaderParams()
>
> The Message implementation of ContentHandler.field() is parsing the
> message
> using the JavaCC parser, the same as the SimpleContentHandler
> implementation.


the pull parser parses the fields into name and value but the content
handler does not expose the results


> BodyDescriptor.addField() only parses the
> Content-Transfer-Encoding/Content-Type
> headers.  NB: It does not check 'Content-*' headers as the Javadocs state,
> e.g.
> Content-Disposition.
>
> So, it seems like BodyDescriptor.addField() is the odd-man-out.


IMHO BodyDescriptor.addField() doesn't belong in the interface at all: it
should only be used for initial population by the pull parser


> It's parsing
> the Content-Type header, but using a different parsing mechanism to the
> standard
> ContentTypeParser, and as there's no way in the ContentHandler
> implementation to
> get this BodyDescriptor parsed information to the handler, the handler
> must also
> parse it.


a fully parsed body descriptor is passed in. should be able to use that.

If BodyDescriptor is managing Content-* headers, it should it save them all,
> e.g. in
>
> private params Map<String, Map<String, String>>;
>
> where Strings are headerName/paramName/paramValue.
>
> For example, the Content-Disposition parameters should be available.
>  Currently
> only Content-Type parameters are.
>
> The it could offer methods such as
>
> // Returns the parameters for the given header, e.g. C-T or C-D
> String getContentParam(String headerName, String paramName);
> Map getContentParams(String headerName);


+1


> and helper methods:
> boolean isInline();
> boolean isAttachment();


i agree in principal but since BodyDescriptor is an interface it's may be
kinder to implementors to factor them into MimeUtil


> Anyway, I've just dived into the Mime4J as I want to use it for a project
> of
> mine.


it's just unfortunate that Message really isn't as well tested or mature as
the pull parser


> If I make any changes, I'll submit them back as patches at a later date.


that'd be great

- robert

Re: Bug in ContentTypeParser.jj?

Posted by Antony Bowesman <ad...@teamware.com>.

Hi Robert,

> the main pull parser parses all digit boundaries fine. 

Ah, now I see what you mean.  I debugged again and there's two lots of parsing 
going on,

MimeStreamParser.parseHeader()
   ContentHandler.field(fieldData)
   BodyDescriptor.addField()
     getHeaderParams()

The Message implementation of ContentHandler.field() is parsing the message 
using the JavaCC parser, the same as the SimpleContentHandler implementation.

BodyDescriptor.addField() only parses the Content-Transfer-Encoding/Content-Type 
headers.  NB: It does not check 'Content-*' headers as the Javadocs state, e.g. 
Content-Disposition.

So, it seems like BodyDescriptor.addField() is the odd-man-out.  It's parsing 
the Content-Type header, but using a different parsing mechanism to the standard 
ContentTypeParser, and as there's no way in the ContentHandler implementation to 
get this BodyDescriptor parsed information to the handler, the handler must also 
parse it.

If BodyDescriptor is managing Content-* headers, it should it save them all, 
e.g. in

private params Map<String, Map<String, String>>;

where Strings are headerName/paramName/paramValue.

For example, the Content-Disposition parameters should be available.  Currently 
only Content-Type parameters are.

The it could offer methods such as

// Returns the parameters for the given header, e.g. C-T or C-D
String getContentParam(String headerName, String paramName);
Map getContentParams(String headerName);

and helper methods:
boolean isInline();
boolean isAttachment();

Anyway, I've just dived into the Mime4J as I want to use it for a project of 
mine.  If I make any changes, I'll submit them back as patches at a later date.

 > having taken a look,
> Message seems to me to do far too much of it's own parsing. this parsing
> seems pretty buggy and also unnecessary: BodyDescriptor contains parsed
> meta-data.
> 
> IMHO the right way to fix this problem is to rewrite Message so that the
> information is obtained from the meta-data provided by the main parser. but
> Message isn't really something i've played around with before and i'm out of
> time this evening. if you'd like to had a go at revising the implementation,
> it should be reasonable easy. otherwise, unless anyone jumps in with a
> better implementations strategy, i should be able to take a look at this
> tomorrow.

Antony



---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

Re: Bug in ContentTypeParser.jj?

Posted by Robert Burrell Donkin <ro...@gmail.com>.

On Tue, Mar 18, 2008 at 9:54 PM, Antony Bowesman <ad...@teamware.com> wrote:

> Hi Robert,
>
> > IIRC there are some test mails which use boundary string containing only
> > digits. if you are willing to contribute a unit test or two that
> > demonstrates the problem, it'd help a lot. please use
> > https://issues.apache.org/jira/browse/MIME4J
>
> Done.  I ticked the wrong box for the ASF licence, but could not edit it
> to
> grant licence to ASF, but the test case can be included if needed.

thanks

> It's related to general digit only parameter values, not just boundary
> strings
> but the test case shows a digit only example boundary string.

yes

> > hmmm...
> >
> > fiddling with the tokenisation seems more risky. so the first sounds
> like
> > the better solution to me. (hopefully people will jump in if this seems
> > wrong to them...)
>
> I agree, RFC allows all digits, so there's probably nothing wrong in
> including it.

the main pull parser parses all digit boundaries fine. having taken a look,
Message seems to me to do far too much of it's own parsing. this parsing
seems pretty buggy and also unnecessary: BodyDescriptor contains parsed
meta-data.

IMHO the right way to fix this problem is to rewrite Message so that the
information is obtained from the meta-data provided by the main parser. but
Message isn't really something i've played around with before and i'm out of
time this evening. if you'd like to had a go at revising the implementation,
it should be reasonable easy. otherwise, unless anyone jumps in with a
better implementations strategy, i should be able to take a look at this
tomorrow.

- robert

Re: Bug in ContentTypeParser.jj?

Posted by Antony Bowesman <ad...@teamware.com>.

Hi Robert,

> IIRC there are some test mails which use boundary string containing only
> digits. if you are willing to contribute a unit test or two that
> demonstrates the problem, it'd help a lot. please use
> https://issues.apache.org/jira/browse/MIME4J

Done.  I ticked the wrong box for the ASF licence, but could not edit it to 
grant licence to ASF, but the test case can be included if needed.

It's related to general digit only parameter values, not just boundary strings 
but the test case shows a digit only example boundary string.

> hmmm...
> 
> fiddling with the tokenisation seems more risky. so the first sounds like
> the better solution to me. (hopefully people will jump in if this seems
> wrong to them...)

I agree, RFC allows all digits, so there's probably nothing wrong in including it.

Antony



---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

Re: Bug in ContentTypeParser.jj?

Posted by Robert Burrell Donkin <ro...@gmail.com>.

On Mon, Mar 17, 2008 at 11:43 PM, Antony Bowesman <ad...@teamware.com> wrote:

> Hi,
>
> The ContentTypeParser does not like boundary strings that only contain
> digits.

IIRC there are some test mails which use boundary string containing only
digits. if you are willing to contribute a unit test or two that
demonstrates the problem, it'd help a lot. please use
https://issues.apache.org/jira/browse/MIME4J

> My .jj is not so hot,

(mine neither but we usually manage to work it out in the end ;-)

> but I debugged the parser and it is returning 20 as the
> token.kind, whereas it seems to want 19 (QUOTEDSTRING) or 21 (ATOKEN).
>
> I guess 20 is 'DIGITS' and the definition of digits is before ATOKEN, so
> it is
> satisfied before ATOKEN.
>
> Should the fix be either
>
> String value() :
> {Token t;}
> {
> (       t=<ATOKEN>
> |       t=<DIGITS>
> |       t=<QUOTEDSTRING>
> )
>        { return t.image; }
> }
>
> or the ATOKEN definition be defined before DIGITS?

hmmm...

fiddling with the tokenisation seems more risky. so the first sounds like
the better solution to me. (hopefully people will jump in if this seems
wrong to them...)

- robert