You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by Nick Burch <ni...@torchbox.com> on 2007/07/10 00:37:28 UTC

Re: Exchange/PST/Mail parsing (fwd)

On Sat, 7 Jul 2007, Travis wrote:
> I've got an initial copy that reads most of the important String chunks
> in the root node/chunk.  Just to make sure I did this right here is a
> zip with all the data needed to reproduce what I've done (the msg files
> were generated with Outlook 2000).

Excellent, that all looks good


> Let me know if I did this correctly and I'll drop a bug into Bugzilla to
> patch dev svn trunk.

Now I've un-packed it, I might as well commit it as-is. Now it's set up in
svn, bugzilla's definately the way to go for the next patches :)


> The patch file doesn't contain the contents of the actual MSG files, so I
> included them in the zip file.  You'll need to move them into the proper
> test directory after you patch.

It seemed to work fine, once I un-packed the rar file (not used one of
those in ages!) and shuffled everything into the right place.

Now I've committed it all to svn, you should be able to just do a "svn
diff" and send that in. That should then apply cleanly, so just do the
"svn diff"  and attach the patch to bugzilla.

(Oh, but if you're adding new binary files, you'll need to upload those by
hand to bugzilla)


Looking at the code, a few thoughts / suggestions:
* Can you ever have more than one message in a file? If so, it might be
   good to make the top level class handle getting one or more model level
   messages out
* I'd be inclined to move quite a bit of the MAPIMessage logic under
   /model/. Have a top level class that works directly on the POIFS, then
   have that give you one (or more?) model level classes that actually get
   (and in future set) all the nice strings
* The Chunks.getInstance().<Chunk> ideom doesn't quite seem the best fit
   to me. I might be inclined to go for a hslf style types + static method
   (RecordTypes + Record.createRecordForType), or a hssf style factory
   (RecordFactory). Do either of those ways look like they might work?
* Once the code has settled, it'd be great to get something on the site.
   It's all done with forrest and src/documentation/content/xdocs/

Otherwise, looks like great work. Do keep the patches coming in!

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Re: Exchange/PST/Mail parsing (fwd)

Posted by Nick Burch <ni...@torchbox.com>.
On Mon, 9 Jul 2007, Travis wrote:
>> Looking at the code, a few thoughts / suggestions:
>> * Can you ever have more than one message in a file? If so, it might be
>>    good to make the top level class handle getting one or more model level
>>    messages out
>
> Not in the MSG file format.  If there is ever a case of an MSG file 
> inside an MSG file it would be due to "attachments"

OK, I wasn't sure how it all fitted together

> ...which is simply a case of creating an input stream to read the data 
> from the attachment, just like you would for a text document, HTML file, 
> or zip.  I'm not sure if creating a special method in the library itself 
> for pulling this directly into a model is the best route to go...what 
> are your thoughts?

I guess it depends on if the attachment will be wrapped in its own poifs 
or not. If it is, don't worry too much, just have the api allow you to get 
all the MAPIMessages of attachments. If not, probably just a new 
MAPIMessage constructor to take the new thing.

> * I'd be inclined to move quite a bit of the MAPIMessage logic under
>>    /model/. Have a top level class that works directly on the POIFS, then
>>    have that give you one (or more?) model level classes that actually get
>>    (and in future set) all the nice strings
>
> I was actually thinking about refactoring the "Chunks" object out into 
> subclasses of StringChunk.  (I basically just wrote a bunch of unittests 
> and then did the simplest thing that would pass the tests).

That might be handy, I'm not sure. Whatever you do though, I'd be inclined 
to have MAPIMessage just deal in Chunks, and stuff (1+ classes) under 
model that deal with the user facing stuff. The class would let you get 
the subject as a string, and would call down to MAPIMessage for the chunk 
stuff.

(In general, we try to split the low level from the user facing stuff, so 
it's easier to understand, and also so we can hide the user from some of 
the more crazy aspects of the file formats)

>> * The Chunks.getInstance().<Chunk> ideom doesn't quite seem the best fit
>>    to me. I might be inclined to go for a hslf style types + static method
>>    (RecordTypes + Record.createRecordForType), or a hssf style factory
>>    (RecordFactory). Do either of those ways look like they might work?
>
> Yes, It is however required to be a new instance of the chunk, due to 
> having the data value stored in it (this isn't that important right now, 
> but will be extremely important when we start writing the files out to 
> disk).

You'll have the factory / static method create the new chunk instance. The 
idea is to only have the one factory/static method, only one definition of 
all the different kinds of chunks, then many instances. Take a look at the 
code suggested for two different ways of achieving it, and see if either 
of them look like your style

>> * Once the code has settled, it'd be great to get something on the site.
>>    It's all done with forrest and src/documentation/content/xdocs/
>
> I'll look into forrest, I've never used it before.

If you want to just tweak an existing setup, it's pretty nice. You'll 
probably want forrest 0.5.x though, as I don't think our build environment 
works with forrest 0.6

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org