Posted to server-dev@james.apache.org by "Noel J. Bergman" <no...@devtech.com> on 2005/06/05 03:28:00 UTC
Mime4J
>>> We've just added some committers to James to incorporate MIME4j
>>> (currently at http://mime4j.sourceforge.net/). Maybe we should consider
>>> using that.
>> Absolutely, and develop the rest of the NotJavaMail APIs around it.
>> It looks like these guys know MIME and if there are problems I'm sure
>> we can work with them in short order. Once we get this effort off the
>> ground we should ask them to join the Mailet2 subproject.
> note that mime4j only provides read-only capabilities at this time.
> mime4j doesn't directly support creating or modifying messages.
This is something that could be worked on, though. :-)
> On the other hand, emitting RFC822/MIME messages seems like a much
> easier problem than parsing them. The hard part IMO is coming up with
> an API to let you work naturally with messages in a read/write fashion
> while remaining performant and memory-efficient.
When do you expect to get mime4j loaded into source control so that we can
start looking at these areas, as well as just using it within JAMES?
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org
Re: Mime4J
Posted by Laurent Rouvet <la...@rouvet.com>.
Joe Cheng wrote:
> * Is it useful/desirable to parse the message-id field (into left and
> right parts)?
no.
> * Is it useful/desirable to parse the Received field into name/value
> pairs?
Yes, I'm using the date....
Laurent
RE: Mime4J
Posted by "Noel J. Bergman" <no...@devtech.com>.
> > * Is it useful/desirable to parse the message-id field (into left and
> > right parts)?
Pretty low on my priority list.
> > * Is it useful/desirable to parse the Received field into name/value
> > pairs?
If you can, yes. Otherwise, we would have to do some on our own. I've used
regex to do it for specific cases, e.g., (from some old code):
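if (checkRelays) try
{
    // Perl5Matcher and Perl5Compiler come from Jakarta ORO (org.apache.oro.text.regex)
    matcher = new Perl5Matcher();
    pattern = new Perl5Compiler().compile("(from|FROM) .*[\\(\\[]"
        + "([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3})[\\]\\)]");
}
// (the catch block for MalformedPatternException is not part of this excerpt)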
The intent was to pull IP addresses from intermediate servers.
--- Noel
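For readers without Jakarta ORO on the classpath, the same idea can be sketched with java.util.regex. This is an illustration rather than Noel's code: the class and method names (RelayIpExtractor, relayIp) are made up for the example, the match is lazy and case-insensitive where the original is greedy and accepts only "from" or "FROM", and it expects a single Received value as input.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RelayIpExtractor {
    // "from", then the first IPv4 literal wrapped in () or [].
    // Assumption: lazy and case-insensitive, unlike the ORO pattern above.
    private static final Pattern RELAY_IP = Pattern.compile(
        "from\\s.*?[(\\[]([0-9]{1,3}(?:\\.[0-9]{1,3}){3})[\\])]",
        Pattern.CASE_INSENSITIVE);

    /** Returns the first bracketed IPv4 literal after "from", if there is one. */
    static Optional<String> relayIp(String receivedValue) {
        Matcher m = RELAY_IP.matcher(receivedValue);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String received = "from mail.example.org (mail.example.org [192.0.2.10])"
            + " by mx.example.net with ESMTP; Sun, 5 Jun 2005 03:28:00 -0400";
        System.out.println(relayIp(received).orElse("(none)"));
    }
}
```

The pattern does not check that each octet is in the range 0 to 255, and it ignores IPv6 literals, so it is a tracing aid at best.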
RE: Mime4J
Posted by Daniel Perry <d....@netcase.co.uk>.
> * Is it useful/desirable to parse the message-id field (into left and
> right parts)? I get the impression that a relatively high number of
> mail messages have syntactically illegal message-id values--two @ signs
> seems to be a particularly common offense. Since the message-id is
> really intended to be used as an opaque value, wouldn't it be more
> robust and equally useful for the parser to treat it as such?
I don't think there's any need to parse this.
> * Is it useful/desirable to parse the Received field into name/value
> pairs? If so, anyone on this list have a lot of experience with
> real-world Received values? It looks to me like there are a lot of
> illegal Received headers out there, as well as a lot of useful
> information stuck in parenthetical comments. In fact, according to Dan
> Bernstein, "It is probably best for readers to treat everything before
> the final semicolon as unstructured text, purely for human consumption."
> (http://cr.yp.to/immhf/envelope.html) Agree/disagree?
Yes and no. I don't really know if it's intended to be machine readable, but
it is more useful for tracing messages. However, one useful part of this
header is the "for" attribute, and I believe this is used in fetchmail.
Daniel.
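As a rough illustration of pulling out the "for" clause Daniel mentions, the sketch below looks for "for" followed by an address, with or without angle brackets. It is a heuristic under stated assumptions: the names ReceivedForClause and recipient are hypothetical, the input is assumed to be an already unfolded Received value, and a "for" that appears inside a parenthetical comment would match as well.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReceivedForClause {
    // Whitespace, "for", whitespace, then something shaped like local@domain,
    // optionally wrapped in angle brackets.
    private static final Pattern FOR_CLAUSE = Pattern.compile(
        "\\sfor\\s+<?([^\\s<>;]+@[^\\s<>;]+)>?",
        Pattern.CASE_INSENSITIVE);

    /** Returns the address from the first "for" clause, if one is present. */
    static Optional<String> recipient(String unfoldedValue) {
        Matcher m = FOR_CLAUSE.matcher(unfoldedValue);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String received = "from a.example by b.example with ESMTP id abc123"
            + " for <user@example.com>; Sun, 5 Jun 2005 03:28:00 -0400";
        System.out.println(recipient(received).orElse("(no for clause)"));
    }
}
```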
Re: Mime4J
Posted by Joe Cheng <co...@joecheng.com>.
Noel J. Bergman wrote:
> Do you want to post some revised JAMES code to demonstrate the features?
Anything in particular you had in mind? I have to admit I hardly know
the JAMES codebase at all.
>> The header parser is also missing the ability to parse trace
>> (i.e. return-path, received) and message-id fields.
> Both Stream and DOM, or just the latter?
Both--they use the same header parser(s). When I say "missing the
ability to parse", I mean, you can ask for the header value, but other
than whitespace unfolding, we don't do any transforming of the field data.
Actually, I've been a little uneasy about implementing these two
parsers, maybe some people on this list can shed some light on these
questions.
* Is it useful/desirable to parse the message-id field (into left and
right parts)? I get the impression that a relatively high number of
mail messages have syntactically illegal message-id values--two @ signs
seems to be a particularly common offense. Since the message-id is
really intended to be used as an opaque value, wouldn't it be more
robust and equally useful for the parser to treat it as such?
* Is it useful/desirable to parse the Received field into name/value
pairs? If so, anyone on this list have a lot of experience with
real-world Received values? It looks to me like there are a lot of
illegal Received headers out there, as well as a lot of useful
information stuck in parenthetical comments. In fact, according to Dan
Bernstein, "It is probably best for readers to treat everything before
the final semicolon as unstructured text, purely for human consumption."
(http://cr.yp.to/immhf/envelope.html) Agree/disagree?
-jmc
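One way to act on the Bernstein quote above is to split a Received value at its final semicolon, keep everything before it as opaque text, and parse only the date-time that follows. The sketch below does that with java.time. It is not mime4j code: ReceivedField and its members are hypothetical names, the input is assumed to be unfolded already, and a date followed by a comment such as "(EDT)" will be rejected by the parser used here.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

final class ReceivedField {
    final String traceText;     // everything before the final ';', left untouched
    final OffsetDateTime date;  // the date-time after the final ';'

    private ReceivedField(String traceText, OffsetDateTime date) {
        this.traceText = traceText;
        this.date = date;
    }

    /** Splits at the last semicolon and parses only the trailing date-time. */
    static ReceivedField parse(String unfoldedValue) {
        int semi = unfoldedValue.lastIndexOf(';');
        if (semi < 0) {
            throw new IllegalArgumentException("no ';' in Received value");
        }
        String trace = unfoldedValue.substring(0, semi).trim();
        String dateText = unfoldedValue.substring(semi + 1).trim();
        // Throws DateTimeParseException if the date is not in RFC 1123 form.
        OffsetDateTime date =
            OffsetDateTime.parse(dateText, DateTimeFormatter.RFC_1123_DATE_TIME);
        return new ReceivedField(trace, date);
    }

    public static void main(String[] args) {
        ReceivedField f = parse("from a.example ([192.0.2.10]) by b.example"
            + " with ESMTP; Sun, 5 Jun 2005 03:28:00 -0400");
        System.out.println(f.traceText);
        System.out.println(f.date);
    }
}
```

The message-id question would get the same treatment in this style: store the raw string and compare it for equality, without splitting on "@".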
RE: Mime4J
Posted by "Noel J. Bergman" <no...@devtech.com>.
Joe Cheng wrote:
> To use mime4j in streaming mode [...]
> To use mime4j's DOM-like mode [...]
Do you want to post some revised JAMES code to demonstrate the features?
> I'm currently working on getting the [DOM] mode to work without using
> temp files, when possible.
That'd be good :-)
> The header parser is also missing the ability to parse trace
> (i.e. return-path, received) and message-id fields.
Both Stream and DOM, or just the latter?
--- Noel
Re: Mime4J
Posted by Joe Cheng <co...@joecheng.com>.
Noel J. Bergman wrote:
>> note that mime4j only provides read-only capabilities at this time.
>> mime4j doesn't directly support creating or modifying messages.
> This is something that could be worked on, though. :-)
Absolutely. :)
>> On the other hand, emitting RFC822/MIME messages seems like a much
>> easier problem than parsing them. The hard part IMO is coming up with
>> an API to let you work naturally with messages in a read/write fashion
>> while remaining performant and memory-efficient.
> When do you expect to get mime4j loaded into source control so that we can
> start looking at these areas, as well as just using it within JAMES?
It's up now:
https://svn.apache.org/repos/asf/james/mime4j/trunk/
Sorry, and thanks for the gentle prod.
To use mime4j in streaming mode, write an implementation of the
ContentHandler interface (or subclass SimpleContentHandler, which
automates header parsing and content-transfer decoding), then new up a
MimeStreamParser and call setContentHandler(ContentHandler) on it, and
finally call mimeStreamParser.parse(InputStream).
To use mime4j's DOM-like mode, you just use new Message(InputStream) to
create a message, then call methods on it. I can't be more specific
about the methods it offers, as I haven't used it much--this is Niklas'
area of expertise.
I'm currently working on getting the latter mode to work without using
temp files, when possible. The header parser is also missing the
ability to parse trace (i.e. return-path, received) and message-id
fields. Other than that, mime4j is ready to use (and in fact has been
in use in at least two production apps for several months). So far it
has proven completely robust for everyone who has tried it AFAIK.
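To make the streaming (callback) style described above concrete without depending on the mime4j jar, here is a self-contained toy version of the pattern. Everything in it is an assumption made for illustration: FieldHandler and parseHeader are hypothetical stand-ins, not mime4j's ContentHandler or MimeStreamParser, and the toy handles only the header block, reporting each field after unfolding it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class StreamingHeaderDemo {
    /** Hypothetical callback interface, standing in for a real content handler. */
    interface FieldHandler {
        void field(String name, String body);
    }

    /** Reads header lines up to the first blank line and reports each field. */
    static void parseHeader(Reader in, FieldHandler handler) {
        try {
            BufferedReader reader = new BufferedReader(in);
            StringBuilder current = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                char first = line.charAt(0);
                if (first != ' ' && first != '\t') {
                    emit(current, handler);  // a new field starts: flush the previous one
                }
                current.append(line);        // continuation lines keep their leading whitespace
            }
            emit(current, handler);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void emit(StringBuilder raw, FieldHandler handler) {
        int colon = raw.indexOf(":");
        if (colon > 0) {
            handler.field(raw.substring(0, colon).trim(), raw.substring(colon + 1).trim());
        }
        raw.setLength(0);
    }

    public static void main(String[] args) {
        String message = "Subject: Mime4J\r\n"
            + "Received: from a.example\r\n\tby b.example\r\n"
            + "\r\nbody text";
        parseHeader(new StringReader(message),
            (name, body) -> System.out.println(name + " -> " + body));
    }
}
```

The point of the style is that the parser never builds a tree: the caller sees each piece as it is read and decides what, if anything, to keep.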
Re: Mime4J
Posted by Joe Cheng <co...@joecheng.com>.
Serge Knystautas wrote:
> I would strongly suggest implementing something with segments instead
> of DOM. To see what I mean, take a look at
> http://jerichohtml.sourceforge.net/. It gives you roughly a DOM-style
> way to access and modify an HTML document. But when you rebuild the
> modified HTML, it assembles the content by merging the existing raw
> stream content with whatever you've changed.
Thanks for the pointer... this is a weird coincidence. I recently wrote
an HTML parsing library with a strikingly similar design! It's closed
source (belongs to my employer) so I'm glad to see Jericho exists.
> In comparison, building a stream from a DOM means you are converting
> all your object representations back into streams. This has the
> downsides of (a) additional processing time and (b) possible changes to
> parts you didn't modify. Point (b) is key to me since (as with HTML)
> MIME can have badly formatted parts that I would prefer we could just
> ignore and leave alone if we didn't touch them.
Keep in mind that mime4j is a read-only parser, at least for now.
Little or no thought has gone into how the API would work for read-write
cases. But I strongly agree that if and when we get around to working
on modifying messages, we should work very hard to avoid collateral damage.
Re: Mime4J
Posted by Serge Knystautas <sk...@gmail.com>.
On 6/5/05, Joe Cheng <co...@joecheng.com> wrote:
> To use mime4j's DOM-like mode, you just use new Message(InputStream) to
> create a message, then call methods on it. I can't be more specific
> about the methods it offers, as I haven't used it much--this is Niklas'
> area of expertise.
>
> I'm currently working on getting the latter mode to work without using
> temp files, when possible. The header parser is also missing the
> ability to parse trace (i.e. return-path, received) and message-id
> fields. Other than that, mime4j is ready to use (and in fact has been
> in use in at least two production apps for several months). So far it
> has proven completely robust for everyone who has tried it AFAIK.
I would strongly suggest implementing something with segments instead
of DOM. To see what I mean, take a look at
http://jerichohtml.sourceforge.net/. It gives you roughly a DOM-style
way to access and modify an HTML document. But when you rebuild the
modified HTML, it assembles the content by merging the existing raw
stream content with whatever you've changed.
In comparison, building a stream from a DOM means you are converting
all your object representations back into streams. This has the
downsides of (a) additional processing time and (b) possible changes to
parts you didn't modify. Point (b) is key to me since (as with HTML)
MIME can have badly formatted parts that I would prefer we could just
ignore and leave alone if we didn't touch them.
Anyway, that's my 2 cents. Again, strongly suggest checking out how
Jericho does this stuff and looking forward to seeing what you come up
with.
--
Serge Knystautas
Lokitech >> software . strategy . design >> http://www.lokitech.com
p. 301.656.5501
e. sergek@lokitech.com
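The segment approach Serge describes can be sketched in a few lines: keep the raw source, record replacements by character offset, and on output copy every untouched span verbatim. This is neither Jericho's nor mime4j's API. SegmentEditor, replace and render are hypothetical names, and the sketch works on a String for brevity where a real implementation would more likely work on bytes or a stream.

```java
import java.util.Map;
import java.util.TreeMap;

final class SegmentEditor {
    /** One recorded replacement, ending at source offset 'end' (exclusive). */
    private static final class Edit {
        final int end;
        final String text;
        Edit(int end, String text) { this.end = end; this.text = text; }
    }

    private final String source;
    // Keyed by start offset, so iteration follows source order.
    private final TreeMap<Integer, Edit> edits = new TreeMap<>();

    SegmentEditor(String source) { this.source = source; }

    /** Records a replacement for source[start, end); edits must not overlap. */
    void replace(int start, int end, String replacement) {
        if (start < 0 || end < start || end > source.length()) {
            throw new IndexOutOfBoundsException("bad range " + start + ".." + end);
        }
        edits.put(start, new Edit(end, replacement));
    }

    /** Copies untouched spans verbatim and splices in the recorded replacements. */
    String render() {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Map.Entry<Integer, Edit> e : edits.entrySet()) {
            if (e.getKey() < pos) {
                throw new IllegalStateException("overlapping edits");
            }
            out.append(source, pos, e.getKey());  // raw bytes of the original, unchanged
            out.append(e.getValue().text);
            pos = e.getValue().end;
        }
        return out.append(source, pos, source.length()).toString();
    }

    public static void main(String[] args) {
        String raw = "Subject:  odd   spacing\r\nX-Flag: no\r\n\r\nbody";
        SegmentEditor editor = new SegmentEditor(raw);
        int at = raw.indexOf("X-Flag: no") + "X-Flag: ".length();
        editor.replace(at, at + 2, "yes");
        System.out.print(editor.render());
    }
}
```

Because the oddly spaced Subject line is never reparsed or re-emitted, it comes out exactly as it went in, which is the "no collateral damage" property discussed in this thread.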