Posted to server-dev@james.apache.org by "Noel J. Bergman" <no...@devtech.com> on 2005/06/05 03:28:00 UTC

Mime4J

>>> We've just added some committers to James to incorporate MIME4j
>>> (currently at http://mime4j.sourceforge.net/).  Maybe we should consider
>>> using that.
>> Absolutely, and develop the rest of the NotJavaMail APIs around it.
>> It looks like these guys know MIME and if there are problems I'm sure
>> we can work with them in short order.  Once we get this effort off the
>> ground we should ask them to join the Mailet2 subproject.

> note that mime4j only provides read-only capabilities at this time.
> mime4j doesn't directly support creating or modifying messages.

This is something that could be worked on, though.  :-)

> On the other hand, emitting RFC822/MIME messages seems like a much
> easier problem than parsing them.  The hard part IMO is coming up with
> an API to let you work naturally with messages in a read/write fashion
> while remaining performant and memory-efficient.

When do you expect to get mime4j loaded into source control so that we can
start looking at these areas, as well as just using it within JAMES?

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

Re: Mime4J

Posted by Laurent Rouvet <la...@rouvet.com>.

Joe Cheng wrote:

> * Is it useful/desirable to parse the message-id field (into left and 
> right parts)?  


no.

> * Is it useful/desirable to parse the Received field into name/value 
> pairs? 


Yes, I'm using the date....

Laurent

RE: Mime4J

Posted by "Noel J. Bergman" <no...@devtech.com>.

> > * Is it useful/desirable to parse the message-id field (into left and
> > right parts)?

Pretty low on my priority list.

> > * Is it useful/desirable to parse the Received field into name/value
> > pairs?

If you can, yes.  Otherwise, we would have to do some on our own.  I've used
regex to do it for specific cases, e.g., (from some old code):

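  if (checkRelays) try
  {
    matcher = new Perl5Matcher();
    // One pattern string, split in two only to fit the line width.
    pattern = new Perl5Compiler().compile("(from|FROM) "
      + ".*[\\(\\[]([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3})[\\]\\)]");
  }
  // (catch clause not shown in this excerpt)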

The intent was to pull IP addresses from intermediate servers.
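
For anyone without Jakarta ORO on the classpath, here is a minimal sketch of the same idea using java.util.regex instead. It assumes the line break in the quoted pattern stood for a single space; the class and method names (ReceivedRelayIp, relayIp) are made up for illustration and are not from JAMES or mime4j.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReceivedRelayIp {
    // Same expression as the ORO pattern above (assuming a space after the
    // first group): "from" or "FROM", then, because ".*" is greedy, the last
    // dotted quad on the line that is wrapped in () or [].
    private static final Pattern RELAY_IP = Pattern.compile(
        "(from|FROM) "
        + ".*[\\(\\[]([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3})[\\]\\)]");

    /** Returns the relay IP found in an unfolded Received value, if any. */
    static Optional<String> relayIp(String receivedValue) {
        Matcher m = RELAY_IP.matcher(receivedValue);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }
}
```

Like the original, this only range-checks nothing (999.999.999.999 would match) and expects the header to be unfolded onto one line first.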

	--- Noel



RE: Mime4J

Posted by Daniel Perry <d....@netcase.co.uk>.

> * Is it useful/desirable to parse the message-id field (into left and
> right parts)?  I get the impression that a relatively high number of
> mail messages have syntactically illegal message-id values--two @ signs
> seems to be a particularly common offense.  Since the message-id is
> really intended to be used as an opaque value, wouldn't it be more
> robust and equally useful for the parser to treat it as such?

I don't think there's any need to parse this.

> * Is it useful/desirable to parse the Received field into name/value
> pairs?  If so, anyone on this list have a lot of experience with
> real-world Received values?  It looks to me like there are a lot of
> illegal Received headers out there, as well as a lot of useful
> information stuck in parenthetical comments.  In fact, according to Dan
> Bernstein, "It is probably best for readers to treat everything before
> the final semicolon as unstructured text, purely for human consumption."
> (http://cr.yp.to/immhf/envelope.html)  Agree/disagree?

Yes and no. I don't really know if it's intended to be machine readable; it
is mostly useful for tracing messages.  However, one useful part of this
header is the "for" attribute, and I believe this is used in fetchmail.
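
A rough sketch of pulling the "for" attribute out of an unfolded Received value, in case it helps the discussion. This is only an illustration: the names (ReceivedForClause, recipient) are hypothetical, it is not how fetchmail or mime4j does it, and it can be fooled by the word "for" appearing inside a parenthetical comment.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReceivedForClause {
    // The word "for", whitespace, then an address with optional angle brackets.
    private static final Pattern FOR_CLAUSE =
        Pattern.compile("\\bfor\\s+<?([^\\s<>;]+)>?", Pattern.CASE_INSENSITIVE);

    /** Returns the address of the first "for" clause, if there is one. */
    static Optional<String> recipient(String receivedValue) {
        Matcher m = FOR_CLAUSE.matcher(receivedValue);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```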

Daniel.



Re: Mime4J

Posted by Joe Cheng <co...@joecheng.com>.

Noel J. Bergman wrote:

> Do you want to post some revised JAMES code to demonstrate the features?

Anything in particular you had in mind?  I have to admit I hardly know 
the JAMES codebase at all.

>> The header parser is also missing the ability to parse trace
>> (i.e. return-path, received) and message-id fields.

> Both Stream and DOM, or just the latter?

Both--they use the same header parser(s).  When I say "missing the 
ability to parse", I mean, you can ask for the header value, but other 
than whitespace unfolding, we don't do any transforming of the field data.
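
To be concrete about what "whitespace unfolding" means here, a minimal sketch following the RFC 2822 rule (remove any line break that is immediately followed by a space or tab). This is not mime4j's actual code; the class and method names are invented for the example.

```java
final class HeaderUnfold {
    /**
     * Unfolds a raw header field body: every CRLF (or bare LF) that is
     * immediately followed by a space or tab is removed. The space or tab
     * itself is kept, and nothing else is changed.
     */
    static String unfold(String rawFieldBody) {
        return rawFieldBody.replaceAll("\\r?\\n(?=[ \\t])", "");
    }
}
```

So a Received value folded over three lines comes back as one line, comments and all, with no further interpretation.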

Actually, I've been a little uneasy about implementing these two 
parsers, maybe some people on this list can shed some light on these 
questions.

* Is it useful/desirable to parse the message-id field (into left and 
right parts)?  I get the impression that a relatively high number of 
mail messages have syntactically illegal message-id values--two @ signs 
seems to be a particularly common offense.  Since the message-id is 
really intended to be used as an opaque value, wouldn't it be more 
robust and equally useful for the parser to treat it as such?

* Is it useful/desirable to parse the Received field into name/value 
pairs?  If so, anyone on this list have a lot of experience with 
real-world Received values?  It looks to me like there are a lot of 
illegal Received headers out there, as well as a lot of useful 
information stuck in parenthetical comments.  In fact, according to Dan 
Bernstein, "It is probably best for readers to treat everything before 
the final semicolon as unstructured text, purely for human consumption." 
(http://cr.yp.to/immhf/envelope.html)  Agree/disagree?
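
One possible middle ground for the Received question, sketched under the assumption that we follow Bernstein's advice literally: keep everything before the final semicolon as opaque text and only try to parse the date after it. The names (ReceivedStamp, trace, date) are hypothetical, and it uses java.time, so it needs Java 8 or later.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class ReceivedStamp {
    final String trace;                   // text before the final ';', left unparsed
    final Optional<OffsetDateTime> date;  // empty if missing or not parseable

    private ReceivedStamp(String trace, Optional<OffsetDateTime> date) {
        this.trace = trace;
        this.date = date;
    }

    /** Splits an unfolded Received value at its final semicolon. */
    static ReceivedStamp parse(String unfoldedValue) {
        int semi = unfoldedValue.lastIndexOf(';');
        if (semi < 0) {
            return new ReceivedStamp(unfoldedValue.trim(), Optional.empty());
        }
        String trace = unfoldedValue.substring(0, semi).trim();
        // Drop parenthetical comments such as "(PDT)" before parsing the date.
        String dateText = unfoldedValue.substring(semi + 1)
                                       .replaceAll("\\([^)]*\\)", "").trim();
        try {
            return new ReceivedStamp(trace, Optional.of(
                OffsetDateTime.parse(dateText, DateTimeFormatter.RFC_1123_DATE_TIME)));
        } catch (DateTimeParseException e) {
            // Illegal dates are common in the wild; keep the trace text anyway.
            return new ReceivedStamp(trace, Optional.empty());
        }
    }
}
```

A malformed date costs you only the date; the trace text is still available for humans or for ad hoc regexes.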

-jmc


RE: Mime4J

Posted by "Noel J. Bergman" <no...@devtech.com>.

Joe Cheng wrote:

> To use mime4j in streaming mode [...]

> To use mime4j's DOM-like mode [...]

Do you want to post some revised JAMES code to demonstrate the features?

> I'm currently working on getting the [DOM] mode to work without using 
> temp files, when possible.

That'd be good :-)

> The header parser is also missing the ability to parse trace
> (i.e. return-path, received) and message-id fields.

Both Stream and DOM, or just the latter?

	--- Noel


Re: Mime4J

Posted by Joe Cheng <co...@joecheng.com>.

Noel J. Bergman wrote:

>> note that mime4j only provides read-only capabilities at this time.
>> mime4j doesn't directly support creating or modifying messages.

> This is something that could be worked on, though.  :-)

Absolutely. :)

>> On the other hand, emitting RFC822/MIME messages seems like a much
>> easier problem than parsing them.  The hard part IMO is coming up with
>> an API to let you work naturally with messages in a read/write fashion
>> while remaining performant and memory-efficient.

> When do you expect to get mime4j loaded into source control so that we can
> start looking at these areas, as well as just using it within JAMES?

It's up now:
https://svn.apache.org/repos/asf/james/mime4j/trunk/

Sorry, and thanks for the gentle prod.

To use mime4j in streaming mode, write an implementation of the 
ContentHandler interface (or subclass SimpleContentHandler, which 
automates header parsing and content-transfer decoding).  Then create a 
MimeStreamParser, call setContentHandler(ContentHandler) on it, and 
finally call mimeStreamParser.parse(InputStream).

To use mime4j's DOM-like mode, you just use new Message(InputStream) to 
create a message, then call methods on it.  I can't be more specific 
about the methods it offers, as I haven't used it much--this is Niklas' 
area of expertise.

I'm currently working on getting the latter mode to work without using 
temp files, when possible.  The header parser is also missing the 
ability to parse trace (i.e. return-path, received) and message-id 
fields.  Other than that, mime4j is ready to use (and in fact has been 
in use in at least two production apps for several months).  So far it 
has proven completely robust for everyone who has tried it AFAIK.


Re: Mime4J

Posted by Joe Cheng <co...@joecheng.com>.

Serge Knystautas wrote:

> I would strongly suggest implementing something with segments instead
> of DOM.  To see what I mean, take a look at
> http://jerichohtml.sourceforge.net/.  It gives you a roughly DOM-style
> way to access and modify an HTML document.  But when you rebuild the
> modified HTML, it assembles the content by merging the existing raw
> stream content with whatever you've changed.

Thanks for the pointer... this is a weird coincidence.  I recently wrote 
an HTML parsing library with a strikingly similar design!  It's closed 
source (belongs to my employer) so I'm glad to see Jericho exists.

> In comparison, building a stream from a DOM means you are converting
> all your object representations back into streams.  This has the
> downsides of (a) additional processing time and (b) possible changes to
> parts you didn't modify.  Part (b) is key to me since (as with HTML)
> MIME can have badly formatted parts that I would prefer we just
> ignore and leave alone if we didn't touch them.

Keep in mind that mime4j is a read-only parser, at least for now.  
Little or no thought has gone into how the API would work for read-write 
cases.  But I strongly agree that if and when we get around to working 
on modifying messages, we should work very hard to avoid collateral damage.


Re: Mime4J

Posted by Serge Knystautas <sk...@gmail.com>.

On 6/5/05, Joe Cheng <co...@joecheng.com> wrote:
> To use mime4j's DOM-like mode, you just use new Message(InputStream) to
> create a message, then call methods on it.  I can't be more specific
> about the methods it offers, as I haven't used it much--this is Niklas'
> area of expertise.
> 
> I'm currently working on getting the latter mode to work without using
> temp files, when possible.  The header parser is also missing the
> ability to parse trace (i.e. return-path, received) and message-id
> fields.  Other than that, mime4j is ready to use (and in fact has been
> in use in at least two production apps for several months).  So far it
> has proven completely robust for everyone who has tried it AFAIK.

I would strongly suggest implementing something with segments instead
of DOM.  To see what I mean, take a look at
http://jerichohtml.sourceforge.net/.  It gives you a roughly DOM-style
way to access and modify an HTML document.  But when you rebuild the
modified HTML, it assembles the content by merging the existing raw
stream content with whatever you've changed.

In comparison, building a stream from a DOM means you are converting
all your object representations back into streams.  This has the
downsides of (a) additional processing time and (b) possible changes to
parts you didn't modify.  Part (b) is key to me since (as with HTML)
MIME can have badly formatted parts that I would prefer we just
ignore and leave alone if we didn't touch them.
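
To make the segment idea concrete, here is a minimal sketch of the merge step. It is not Jericho's API and not anything in mime4j; SegmentedDocument, replace and render are names I made up. The point is only that untouched ranges are copied from the raw source verbatim, so a badly formatted part comes out exactly as it went in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SegmentedDocument {
    private static final class Edit {
        final int begin;
        final int end;
        final String text;
        Edit(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    private final String source;
    private final List<Edit> edits = new ArrayList<>();

    SegmentedDocument(String source) {
        this.source = source;
    }

    /** Registers a replacement for source[begin, end); overlapping edits are rejected. */
    void replace(int begin, int end, String text) {
        if (begin < 0 || end < begin || end > source.length()) {
            throw new IndexOutOfBoundsException("bad segment " + begin + ".." + end);
        }
        for (Edit e : edits) {
            if (begin < e.end && e.begin < end) {
                throw new IllegalArgumentException("overlapping edit");
            }
        }
        edits.add(new Edit(begin, end, text));
    }

    /** Rebuilds the output: raw source for untouched ranges, new text for edited ones. */
    String render() {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingInt((Edit e) -> e.begin)
                              .thenComparingInt(e -> e.end));
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Edit e : sorted) {
            out.append(source, pos, e.begin);  // untouched bytes, copied as-is
            out.append(e.text);
            pos = e.end;
        }
        out.append(source, pos, source.length());
        return out.toString();
    }
}
```

A real version would work on byte ranges of the message stream rather than a String, but the merge logic would look much the same.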

Anyway, that's my 2 cents.  Again, strongly suggest checking out how
Jericho does this stuff and looking forward to seeing what you come up
with.

-- 
Serge Knystautas
Lokitech >> software . strategy . design >> http://www.lokitech.com
p. 301.656.5501
e. sergek@lokitech.com
