Posted to mime4j-dev@james.apache.org by "Sharma, Ashish" <as...@hp.com> on 2011/06/27 14:44:28 UTC
Using mime4j for parsing incoming emails
Hi,
I have a project where I need to parse incoming email streams, provided in raw form, and split them into their constituents: the email body as a separate file and the email attachments as separate files.
I am able to do this by extending the class org.apache.james.mime4j.message.SimpleContentHandler.
I am facing the following problem and would like suggestions:
1. Since I am splitting raw emails into their constituents, how can I test that the parsing works correctly on the large corpus of raw emails that I have to use to check the efficiency and correctness of mime4j's MIME parsing?
How can I write test cases for such a scenario?
That is, how can I determine whether a file that was extracted has been parsed correctly by mime4j?
2. Is there any other kind of testing that I should implement?
Thanks
Ashish
Re: Using mime4j for parsing incoming emails
Posted by Stefano Bagnara <ap...@bago.org>.
2011/7/1 Sharma, Ashish <as...@hp.com>:
> Norman,
>
> I got the following idea from one of the forums:
>
> "decode the attachments and then re-encode them. If the re-encoded stream matches (byte-for-byte) the original, then that's a good sign that mime4j is properly handling them"
This won't work. Serialization does not produce output that is
byte-for-byte identical to the original. It is semantically identical,
but not byte-for-byte. E.g. the way mime4j encodes quoted-printable or
base64 is hardcoded and cannot match every input encoding.
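A minimal sketch of why the byte-for-byte comparison fails, using only java.util.Base64 from the JDK rather than mime4j itself. The class and method names are made up for illustration, and the line lengths 60 and 76 are just two plausible wrapping choices:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class ReencodeCheck {
    private static final byte[] CRLF = "\r\n".getBytes(StandardCharsets.US_ASCII);

    // Decode base64 text; the MIME decoder ignores line breaks.
    static byte[] decode(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    // Encode with a given line length, standing in for "whatever the encoder hardcodes".
    static String encodeWrapped(byte[] data, int lineLength) {
        return Base64.getMimeEncoder(lineLength, CRLF).encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] payload = new byte[100];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }

        // The sender wrapped lines at 60 characters; the re-encoder wraps at 76.
        String original = encodeWrapped(payload, 60);
        String reencoded = encodeWrapped(decode(original), 76);

        System.out.println("encoded text identical:  " + original.equals(reencoded));
        System.out.println("decoded bytes identical: "
                + Arrays.equals(decode(original), decode(reencoded)));
    }
}
```

The encoded text differs while the decoded bytes are the same, so comparing decoded content is the only comparison that says anything here.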
There's no way to verify automatically that parsing is correct:
what we do is parse with the Perl MIME-tools tests, which generate XML
files in the same format that our test suite generates. At most
you can check whether the mime4j results equal the Perl MIME-tools
results and manually check where they don't match.
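The comparison step could be sketched roughly like this. It assumes you have already produced, per message, one output string from a reference tool and one from your own parser; the class name, the map layout, and the sample values are hypothetical and not part of mime4j:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class ReferenceComparison {
    // Returns, in sorted order, the names of messages whose output differs
    // from the reference or that exist on only one side.
    static List<String> mismatches(Map<String, String> reference, Map<String, String> actual) {
        Set<String> names = new TreeSet<>(reference.keySet());
        names.addAll(actual.keySet());
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (!Objects.equals(reference.get(name), actual.get(name))) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: message file name -> generated description.
        Map<String, String> reference = new HashMap<>();
        reference.put("simple.msg", "<message><body/></message>");
        reference.put("nested.msg", "<message><multipart/></message>");

        Map<String, String> actual = new HashMap<>();
        actual.put("simple.msg", "<message><body/></message>");
        actual.put("nested.msg", "<message><body/></message>");

        // Every name printed here still needs a manual look.
        for (String name : mismatches(reference, actual)) {
            System.out.println("needs manual review: " + name);
        }
    }
}
```

A mismatch only says that the two tools disagree, not which one is right, which is why the manual check remains.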
We have a test suite; if you believe a message is not parsed correctly,
submit it to us and we'll review it.
> http://stackoverflow.com/questions/6521010/verifying-testing-the-output-of-mime4j-parsed-content
>
> What is your comment on this and what classes should I use for implementing the suggestion?
As I said, I consider this approach useless.
> Thanks
> Ashish
What you ask is impossible: the fact that the parsed and rewritten
message equals the original doesn't even tell you that the parsing
was correct anyway.
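To illustrate with a deliberately extreme case (a hypothetical class, nothing to do with mime4j's real implementation): a "parser" that just keeps the raw bytes round-trips every message perfectly and still extracts nothing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripIsNotProof {
    // A useless "parser": it stores the raw bytes and understands none of them.
    static final class OpaqueMessage {
        private final byte[] raw;

        OpaqueMessage(byte[] raw) {
            this.raw = raw.clone();
        }

        // It never recognizes any attachment, whatever the input contains.
        int attachmentCount() {
            return 0;
        }

        byte[] serialize() {
            return raw.clone();
        }
    }

    // True when "parsing" and re-serializing reproduces the input exactly.
    static boolean roundTrips(byte[] input) {
        return Arrays.equals(input, new OpaqueMessage(input).serialize());
    }

    public static void main(String[] args) {
        String message = "Content-Type: multipart/mixed; boundary=\"b\"\r\n"
                + "\r\n"
                + "--b\r\n"
                + "Content-Type: text/plain\r\n"
                + "\r\n"
                + "hello\r\n"
                + "--b--\r\n";
        byte[] raw = message.getBytes(StandardCharsets.US_ASCII);

        System.out.println("round trip identical: " + roundTrips(raw));
        System.out.println("attachments found:    " + new OpaqueMessage(raw).attachmentCount());
    }
}
```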
Stefano
RE: Using mime4j for parsing incoming emails
Posted by "Sharma, Ashish" <as...@hp.com>.
Norman,
I got the following idea from one of the forums:
"decode the attachments and then re-encode them. If the re-encoded stream matches (byte-for-byte) the original, then that's a good sign that mime4j is properly handling them"
http://stackoverflow.com/questions/6521010/verifying-testing-the-output-of-mime4j-parsed-content
What is your comment on this, and which classes should I use to implement the suggestion?
Thanks
Ashish
Re: Using mime4j for parsing incoming emails
Posted by Norman Maurer <no...@apache.org>.
Hi there...
mime4j ships with many tests to check that it does the right thing. Still,
I'm almost sure the tests don't cover everything. You will need to read
the RFC to really understand whether an email is parsed correctly. I would
only do this if you think it is not doing the right thing.
For testing, I suggest you write JUnit tests.
Bye,
Norman