Posted to server-dev@james.apache.org by Oleg Kalnichevski <ol...@apache.org> on 2008/12/24 19:21:54 UTC

[mime4j] Simple benchmark for testing performance of the MIME stream parser

Folks

I took the liberty of committing an ultra-simple benchmark I use for testing
performance of the MIME stream parser.

http://svn.apache.org/viewvc?view=rev&revision=729347

Feel free to improve / extend / remove if useless.

Merry Xmas to those who celebrate it, Merry Whatever to those who do not.

Oleg

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org


Re: [mime4j] Simple benchmark for testing performance of the MIME stream parser

Posted by Markus Wiederkehr <ma...@gmail.com>.
On Sun, Jan 4, 2009 at 11:14 AM, Robert Burrell Donkin
<ro...@gmail.com> wrote:
> On Sat, Jan 3, 2009 at 8:15 PM, Markus Wiederkehr
> <ma...@gmail.com> wrote:
>> On Wed, Dec 24, 2008 at 7:21 PM, Oleg Kalnichevski <ol...@apache.org> wrote:
>>> Folks
>>>
>>> I took liberty to commit an ultra-simple benchmark I use for testing
>>> performance of the MIME stream parser.
>>>
>>> http://svn.apache.org/viewvc?view=rev&revision=729347
>>>
>>> Feel free to improve / extend / remove if useless.
>>
>> I have extended the class a bit. It is now possible to choose from
>> four different tests.
>>
>> Test 0 is the one Oleg wrote. It reads from a MimeTokenStream until
>> its end is reached.
>> Test 1 uses a MimeStreamParser and reports to an empty AbstractContentHandler.
>> Test 2 uses a MimeStreamParser and reports to an empty SimpleContentHandler.
>> Test 3 creates Message objects in memory.
>>
>> On my machine the results are:
>> Test 0: ~ 8 sec
>> Test 1: ~ 8 sec
>> Test 2: ~ 41 sec
>> Test 3: ~ 47 sec
>>
>> So it looks like parsing the header fields consumes about 80 percent of test 2.
>>
>> The difference between #2 and 3 is probably caused by copying the
>> message bodies into Storage objects.
>>
>> Maybe the header fields should be parsed lazily?
>
> IIRC there are a few wrinkles with this (at least some need to be
> parsed and some care need to be taken with folded values) but i think
> only structural headers really need to be parsed on the first pass.
>
>> Does anybody have a better idea?
>
> (this one isn't really a better idea but it's a little different so
> i'll throw it out there and see what happens...)
>
> the minimal useful MIME parser would read just the structural headers
> and the boundaries: dividing the stream into header lines and body
> parts without unnecessary parsing of the contents.

I think this already happens in a way. Look into DefaultBodyDescriptor
for example. There is a method parseContentType which determines the
boundary string from the content-type field. Note that this is used by
MimeTokenStream and has nothing to do with building a DOM.

Later, when a Message object gets built, all header fields are parsed
_again_. Only this time a JavaCC-generated parser is used. This is
where things get slow and this is what could be done lazily in my
opinion.
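
To illustrate the lazy approach, here is a minimal sketch. LazyField is a
hypothetical class, not part of Mime4j, and the simple colon split only
stands in for the expensive JavaCC-based parse. The raw line is kept as
read, and nothing is parsed until a caller asks for the name or body:

```java
/** Hypothetical sketch of a lazily parsed header field; not a Mime4j class. */
final class LazyField {
    private final String raw;   // header line as read from the stream, possibly folded
    private String name;        // stays null until the first access triggers parsing
    private String body;

    LazyField(String raw) {
        this.raw = raw;
    }

    boolean isParsed() {
        return name != null;
    }

    String getRaw() {
        return raw;
    }

    String getName() {
        parseIfNeeded();
        return name;
    }

    String getBody() {
        parseIfNeeded();
        return body;
    }

    // Stand-in for the expensive parse; runs at most once per field.
    private void parseIfNeeded() {
        if (name != null) {
            return;
        }
        int colon = raw.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a header field: " + raw);
        }
        // Unfold: drop each line break that is followed by a space or tab.
        body = raw.substring(colon + 1).replaceAll("\\r?\\n(?=[ \\t])", "").trim();
        name = raw.substring(0, colon).trim();
    }

    public static void main(String[] args) {
        LazyField field = new LazyField("Subject: benchmark\r\n results");
        System.out.println(field.isParsed()); // nothing parsed yet
        System.out.println(field.getBody());  // first access triggers the parse
        System.out.println(field.isParsed());
    }
}
```

A message whose fields are never inspected would then pay only for storing
the raw lines.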

> the generalised use case i have in mind is streaming into storage.
> this use case occurs naturally when dealing with mail protocols but
> has other applications (for example, in CMRs).
>
> 1. a MIME message starts to be delivered to a socket
> 2. the protocol processor feeds the stream to a parser
> 3. the processor analyzes the boundaries streams head lines and body
> parts to permanent storage without unnecessary semantic parsing of the
> meta-data
> 4. when the message is complete, the processor continues to parse the
> incoming stream
>
> one problem with full DOMs (as used by JavaMail) is that large MIME
> documents are too big to fit in memory. this causes problems for
> protocols server. a structural DOM (maintaining at most meta-data in
> memory whilst allowing access to content through streams) backed by
> storage would be much more useful in this case.

But isn't that what we already have with Mime4j? The structure of the
message is kept in memory (including the header fields) whereas the
contents of text and binary parts are kept in Storage objects (on disk
or wherever).

The only thing is that base64 and quoted-printable parts are decoded
before they get stored. But I don't think this is necessarily the
number one performance problem.

The benchmark test message currently has 7-bit encoded body parts. I
have changed that to base64 to see what happens. It turns out that
test #3 now runs in about 55 seconds (47 before the change).

My interpretation (from the differences in runtime of the various
tests) is that of those 55 seconds, 8 seconds are used up by
MimeTokenStream, about 6 seconds go into copying the body parts to
storage, 8 seconds are for base64 decoding and the remaining 33
seconds are for parsing header fields. That's about 60 percent of the
total.

And by parsing header fields I mean Field.parse(), not the parsing
already done in MimeTokenStream.

Of course this is only a rough estimate but I don't think I'm very far off.
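
The arithmetic behind that estimate can be written out as a short sketch.
The class name is made up for illustration; the timings are the ones
reported in this thread, and each component is simply the difference
between two test runs:

```java
/** Back-of-the-envelope breakdown of the test 3 runtime (base64 variant). */
final class RuntimeBreakdown {
    // Timings in seconds, as reported in this thread.
    static final double TOKEN_STREAM = 8;    // test 0
    static final double TEST_2 = 41;         // MimeStreamParser + SimpleContentHandler
    static final double TEST_3_7BIT = 47;    // Message objects, 7-bit bodies
    static final double TEST_3_BASE64 = 55;  // Message objects, base64 bodies

    // Test 3 minus test 2: copying bodies into Storage objects.
    static double storageCopy() {
        return TEST_3_7BIT - TEST_2;
    }

    // Base64 run minus 7-bit run: decoding cost.
    static double base64Decoding() {
        return TEST_3_BASE64 - TEST_3_7BIT;
    }

    // Whatever is left after the other three components: 55 - 8 - 6 - 8 = 33.
    static double fieldParsing() {
        return TEST_3_BASE64 - TOKEN_STREAM - storageCopy() - base64Decoding();
    }

    static double percentOfTotal(double seconds) {
        return 100.0 * seconds / TEST_3_BASE64;
    }

    public static void main(String[] args) {
        System.out.printf("field parsing: %.0f s, %.0f percent of the total%n",
                fieldParsing(), percentOfTotal(fieldParsing()));
    }
}
```

Attributing each difference to a single cause assumes the costs are
additive, which is only approximately true.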

Markus



Re: [mime4j] Simple benchmark for testing performance of the MIME stream parser

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Sat, Jan 3, 2009 at 8:15 PM, Markus Wiederkehr
<ma...@gmail.com> wrote:
> On Wed, Dec 24, 2008 at 7:21 PM, Oleg Kalnichevski <ol...@apache.org> wrote:
>> Folks
>>
>> I took liberty to commit an ultra-simple benchmark I use for testing
>> performance of the MIME stream parser.
>>
>> http://svn.apache.org/viewvc?view=rev&revision=729347
>>
>> Feel free to improve / extend / remove if useless.
>
> I have extended the class a bit. It is now possible to choose from
> four different tests.
>
> Test 0 is the one Oleg wrote. It reads from a MimeTokenStream until
> its end is reached.
> Test 1 uses a MimeStreamParser and reports to an empty AbstractContentHandler.
> Test 2 uses a MimeStreamParser and reports to an empty SimpleContentHandler.
> Test 3 creates Message objects in memory.
>
> On my machine the results are:
> Test 0: ~ 8 sec
> Test 1: ~ 8 sec
> Test 2: ~ 41 sec
> Test 3: ~ 47 sec
>
> So it looks like parsing the header fields consumes about 80 percent of test 2.
>
> The difference between #2 and 3 is probably caused by copying the
> message bodies into Storage objects.
>
> Maybe the header fields should be parsed lazily?

IIRC there are a few wrinkles with this (at least some need to be
parsed and some care needs to be taken with folded values) but i think
only structural headers really need to be parsed on the first pass.

> Does anybody have a better idea?

(this one isn't really a better idea but it's a little different so
i'll throw it out there and see what happens...)

the minimal useful MIME parser would read just the structural headers
and the boundaries: dividing the stream into header lines and body
parts without unnecessary parsing of the contents.

the generalised use case i have in mind is streaming into storage.
this use case occurs naturally when dealing with mail protocols but
has other applications (for example, in CMRs).

1. a MIME message starts to be delivered to a socket
2. the protocol processor feeds the stream to a parser
3. the processor analyzes the boundaries and streams header lines and body
parts to permanent storage without unnecessary semantic parsing of the
meta-data
4. when the message is complete, the processor continues to parse the
incoming stream

one problem with full DOMs (as used by JavaMail) is that large MIME
documents are too big to fit in memory. this causes problems for
protocol servers. a structural DOM (maintaining at most meta-data in
memory whilst allowing access to content through streams) backed by
storage would be much more useful in this case.
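
a rough sketch of the idea for a single entity, under stated assumptions:
StructuralSplitter is a made-up name, not an existing mime4j class, and
multipart boundaries are deliberately left out. header lines are kept raw
(folded continuation lines stay separate) and the body bytes go straight
to storage without being interpreted:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: separates raw header lines from the body of one entity. */
final class StructuralSplitter {

    /**
     * Reads header lines up to the blank separator line and returns them
     * unparsed, then copies all remaining bytes to the given storage stream.
     */
    static List<String> split(InputStream in, OutputStream storage) {
        try {
            InputStream buffered = new BufferedInputStream(in);
            List<String> headerLines = new ArrayList<>();
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            int b;
            while ((b = buffered.read()) != -1) {
                if (b != '\n') {
                    line.write(b);
                    continue;
                }
                String text = new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
                line.reset();
                if (text.endsWith("\r")) {
                    text = text.substring(0, text.length() - 1);
                }
                if (text.isEmpty()) {
                    break; // blank line: the header is complete
                }
                headerLines.add(text);
            }
            if (line.size() > 0) {
                // Stream ended in the middle of a header line; keep what was read.
                headerLines.add(new String(line.toByteArray(), StandardCharsets.ISO_8859_1));
            }
            // Body: copied byte for byte, no decoding and no semantic parsing.
            byte[] chunk = new byte[8192];
            int n;
            while ((n = buffered.read(chunk)) != -1) {
                storage.write(chunk, 0, n);
            }
            return headerLines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

a real implementation would still have to look at Content-Type to find the
boundary of a multipart, which is the structural parsing mentioned above.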

- robert



Re: [mime4j] Simple benchmark for testing performance of the MIME stream parser

Posted by Markus Wiederkehr <ma...@gmail.com>.
On Wed, Dec 24, 2008 at 7:21 PM, Oleg Kalnichevski <ol...@apache.org> wrote:
> Folks
>
> I took liberty to commit an ultra-simple benchmark I use for testing
> performance of the MIME stream parser.
>
> http://svn.apache.org/viewvc?view=rev&revision=729347
>
> Feel free to improve / extend / remove if useless.

I have extended the class a bit. It is now possible to choose from
four different tests.

Test 0 is the one Oleg wrote. It reads from a MimeTokenStream until
its end is reached.
Test 1 uses a MimeStreamParser and reports to an empty AbstractContentHandler.
Test 2 uses a MimeStreamParser and reports to an empty SimpleContentHandler.
Test 3 creates Message objects in memory.

On my machine the results are:
Test 0: ~ 8 sec
Test 1: ~ 8 sec
Test 2: ~ 41 sec
Test 3: ~ 47 sec

So it looks like parsing the header fields consumes about 80 percent of test 2.

The difference between tests 2 and 3 is probably caused by copying the
message bodies into Storage objects.

Maybe the header fields should be parsed lazily? Does anybody have a
better idea?

Markus

PS: I have also written a test that creates a JavaMail MimeMessage.
That takes about 12 seconds on my machine.



Re: [mime4j] Simple benchmark for testing performance of the MIME stream parser

Posted by Markus Wiederkehr <ma...@gmail.com>.
On Wed, Dec 24, 2008 at 7:21 PM, Oleg Kalnichevski <ol...@apache.org> wrote:
> Folks
>
> I took liberty to commit an ultra-simple benchmark I use for testing
> performance of the MIME stream parser.
>
> http://svn.apache.org/viewvc?view=rev&revision=729347

Good idea! I have added the two benchmark tests I have recently
written for MIME4J-71 and MIME4J-92.

Markus



Re: [mime4j] Simple benchmark for testing performance of the MIME stream parser

Posted by Oleg Kalnichevski <ol...@apache.org>.
Robert Burrell Donkin wrote:
> On Wed, Dec 24, 2008 at 6:21 PM, Oleg Kalnichevski <ol...@apache.org> wrote:
>> Folks
>>
>> I took liberty to commit an ultra-simple benchmark I use for testing
>> performance of the MIME stream parser.
>>
>> http://svn.apache.org/viewvc?view=rev&revision=729347
>>
>> Feel free to improve / extend / remove if useless.
> 
> cool
> 
> maybe it's about time to think about splitting mime4j into a
> multi-module project...
> 

Hi Robert

I doubt the benchmark code should be made into a releasable binary 
artifact, unless there is a way to run the benchmark automatically every 
once in a while using a build or a CI server.

+1 to the multi-module project layout.

Oleg

>> Merry XMass to those who celebrate it, Merry Whatever to those who do not.
> 
> +1
> 
> - robert
> 




Re: [mime4j] Simple benchmark for testing performance of the MIME stream parser

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Wed, Dec 24, 2008 at 6:21 PM, Oleg Kalnichevski <ol...@apache.org> wrote:
> Folks
>
> I took liberty to commit an ultra-simple benchmark I use for testing
> performance of the MIME stream parser.
>
> http://svn.apache.org/viewvc?view=rev&revision=729347
>
> Feel free to improve / extend / remove if useless.

cool

maybe it's about time to think about splitting mime4j into a
multi-module project...

> Merry XMass to those who celebrate it, Merry Whatever to those who do not.

+1

- robert
