Posted to mime4j-dev@james.apache.org by "Sharma, Ashish" <as...@hp.com> on 2011/06/27 14:44:28 UTC

Using mime4j for parsing incoming emails

Hi,

I have a project where I need to take incoming email streams in raw form and parse them into their constituent parts, i.e. the email body as one file and each attachment as a separate file.

I am able to do this by extending the class org.apache.james.mime4j.message.SimpleContentHandler.

I am facing the following problems and would welcome suggestions:

1. Since I am parsing raw emails into their constituent parts, how can I test whether the parsing works correctly on the large corpus of raw emails I have to use to test the efficiency and correctness of mime4j's MIME parsing?
How can I write test cases for such a scenario?

In other words, how can I determine whether a file that was extracted was parsed correctly by mime4j?

2. Is there any other kind of testing I should implement to improve this?

Thanks
Ashish

Re: Using mime4j for parsing incoming emails

Posted by Stefano Bagnara <ap...@bago.org>.
2011/7/1 Sharma, Ashish <as...@hp.com>:
> Norman,
>
> I got the following idea from one of the forums:
>
> "decode the attachments and then re-encode them. If the re-encoded stream matches (byte-for-byte) the original, then that's a good sign that mime4j is properly handling them"

This won't work. Serialization is not byte-for-byte identical to the
original version. It is semantically identical, but not byte-for-byte.
E.g., the way mime4j encodes quoted-printable or base64 is hardcoded
and cannot reproduce every possible input encoding.
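To illustrate why a byte-for-byte comparison fails: the same bytes can be base64-encoded with different line lengths and line separators, so the encoded text differs even though the decoded payload is identical. The minimal sketch below uses only the JDK's java.util.Base64, not mime4j; the class name DecodedComparison and the 16-character line width are hypothetical choices made for the demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class DecodedComparison {

    /** Decodes a MIME-style base64 body; the MIME decoder ignores line separators. */
    static byte[] decodeBase64Body(String body) {
        return Base64.getMimeDecoder().decode(body);
    }

    /** True if two encoded bodies carry the same payload, however they are wrapped. */
    static boolean samePayload(String encodedA, String encodedB) {
        return Arrays.equals(decodeBase64Body(encodedA), decodeBase64Body(encodedB));
    }

    public static void main(String[] args) {
        byte[] payload = "The same attachment bytes, encoded twice."
                .getBytes(StandardCharsets.US_ASCII);

        // Encoding 1: standard MIME encoder (lines of at most 76 chars, CRLF separators).
        String wrapped76 = Base64.getMimeEncoder().encodeToString(payload);
        // Encoding 2: hypothetical stand-in for another mailer (16-char lines, LF only).
        String wrapped16 = Base64.getMimeEncoder(16, "\n".getBytes(StandardCharsets.US_ASCII))
                .encodeToString(payload);

        System.out.println("encoded text equal:    " + wrapped76.equals(wrapped16));
        System.out.println("decoded payload equal: " + samePayload(wrapped76, wrapped16));
    }
}
```

Running main prints false for the encoded text and true for the decoded payload. Comparing decoded content is the more meaningful check, but it still only shows that two encodings agree, not that the message structure was parsed correctly.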

There's no way you can automatically verify whether parsing is correct.
What we do is parse our test messages with Perl MIME-tools, which
generates XML files in the same format our test suite produces. At most
you can check whether the mime4j results equal the Perl MIME-tools
results and manually check the cases where they don't match.
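The differential approach described above (compare mime4j's results with the Perl MIME-tools results and review the differences by hand) could be scripted roughly as follows. This is a sketch under assumptions: it supposes both tools have already written one XML file per message, with matching file names, into two directories. That layout and the class name DifferentialCheck are hypothetical, and the comparison is a plain byte comparison with no XML normalization.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DifferentialCheck {

    /**
     * Compares each regular file in mime4jDir with the same-named file in referenceDir
     * (hypothetical layout: one XML result file per message on each side) and returns
     * the names that have no reference output or whose bytes differ.
     */
    static List<String> mismatches(Path mime4jDir, Path referenceDir) throws IOException {
        List<Path> ours;
        try (Stream<Path> files = Files.list(mime4jDir)) {
            ours = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> result = new ArrayList<>();
        for (Path file : ours) {
            String name = file.getFileName().toString();
            Path theirs = referenceDir.resolve(name);
            if (!Files.exists(theirs)) {
                result.add(name + " (no reference output)");
            } else if (!Arrays.equals(Files.readAllBytes(file), Files.readAllBytes(theirs))) {
                result.add(name + " (differs)");
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DifferentialCheck <mime4j-output-dir> <reference-output-dir>
        for (String name : mismatches(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println("review manually: " + name);
        }
    }
}
```

The output is only a to-do list: every reported file still needs a human to decide which side, if either, parsed the message correctly.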

We have a test suite; if you believe a message is not parsed correctly,
submit it to us and we'll review it.

> http://stackoverflow.com/questions/6521010/verifying-testing-the-output-of-mime4j-parsed-content
>
> What do you think of this, and which classes should I use to implement the suggestion?

As I said, I consider this approach useless.

> Thanks
> Ashish

What you ask is impossible: even if the parsed and rewritten message
equals the original, that doesn't tell you the parsing was correct
anyway.

Stefano

RE: Using mime4j for parsing incoming emails

Posted by "Sharma, Ashish" <as...@hp.com>.
Norman,

I got the following idea from one of the forums:

"decode the attachments and then re-encode them. If the re-encoded stream matches (byte-for-byte) the original, then that's a good sign that mime4j is properly handling them"

http://stackoverflow.com/questions/6521010/verifying-testing-the-output-of-mime4j-parsed-content

What do you think of this, and which classes should I use to implement the suggestion?

Thanks
Ashish

-----Original Message-----
From: Norman Maurer [mailto:norman@apache.org] 
Sent: Monday, June 27, 2011 6:24 PM
To: mime4j-dev@james.apache.org
Subject: Re: Using mime4j for parsing incoming emails

Hi there...

mime4j ships with many tests to check that it does the right thing. Still,
I'm almost sure the tests don't cover everything. You will need to read
the RFCs to really understand whether an email is parsed correctly. I would
only do this if you think it is not doing the right thing.
For testing, I suggest you write JUnit tests...

Bye,
Norman




Re: Using mime4j for parsing incoming emails

Posted by Norman Maurer <no...@apache.org>.
Hi there...

mime4j ships with many tests to check that it does the right thing. Still,
I'm almost sure the tests don't cover everything. You will need to read
the RFCs to really understand whether an email is parsed correctly. I would
only do this if you think it is not doing the right thing.
For testing, I suggest you write JUnit tests...
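As one possible shape for such a test, a common regression technique is a golden check: once an extracted part has been reviewed by hand, record its SHA-256 digest and fail whenever a later run produces different bytes. The sketch below uses only java.security.MessageDigest so it runs without JUnit; GoldenDigestCheck and assertMatchesGolden are hypothetical names, and the extracted bytes would come from whatever handler you use to split the message. In a JUnit test the final comparison would typically be an assertEquals call.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GoldenDigestCheck {

    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b)); // two hex digits per byte
        }
        return hex.toString();
    }

    /** Throws if an extracted part no longer matches the digest recorded after manual review. */
    static void assertMatchesGolden(String partName, byte[] extracted, String expectedSha256)
            throws NoSuchAlgorithmException {
        String actual = sha256Hex(extracted);
        if (!actual.equals(expectedSha256)) {
            throw new AssertionError(partName + ": expected " + expectedSha256 + " but got " + actual);
        }
    }
}
```

This only detects changes relative to output someone has already reviewed; it does not by itself prove that the reviewed parse was correct.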

Bye,
Norman