Posted to dev@poi.apache.org by Hölzl, Dominik <Do...@fabasoft.com> on 2019/05/17 10:24:35 UTC

Reduce memory footprint on parsing MSG attachments

Hello!

I have some suggestions to reduce the memory footprint when parsing MSG files with huge or many attachments.

Currently AttachmentChunks uses ByteChunk for the attachment content data.
When parsing an MSG file (MAPIMessage constructor -> POIFSChunkParser.parse), this causes the complete attachment data to be read into memory, because ByteChunk simply copies the content into a plain byte array in ByteChunk.readValue / POIFSChunkParser.process.
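To illustrate, here is a minimal sketch of how this shows up through the public API (the file name is a placeholder and the accessor names are written from memory, so treat them as assumptions):

import org.apache.poi.hsmf.MAPIMessage;
import org.apache.poi.hsmf.datatypes.AttachmentChunks;

public class AttachmentMemoryDemo {
    public static void main(String[] args) throws Exception {
        // Constructing the MAPIMessage already runs POIFSChunkParser.parse,
        // so at this point every attachment body sits in memory as a byte[].
        try (MAPIMessage msg = new MAPIMessage("large-attachments.msg")) {
            for (AttachmentChunks attachment : msg.getAttachmentFiles()) {
                // attachData is a ByteChunk; getValue() just hands back the
                // byte[] that was filled in ByteChunk.readValue during parsing.
                byte[] data = attachment.getAttachData().getValue();
                System.out.println(data.length + " bytes already held in memory");
            }
        }
    }
}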

My suggestion: replace this with a newly introduced "ByteStreamChunk" that does not read the data during parsing but only keeps a reference to the underlying InputStream, which makes it possible to read "directly" from the base input stream later.
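A rough sketch of what such a ByteStreamChunk could look like (the Chunk constructor and the readValue/writeValue signatures are copied from the existing base class as I read it, so this is only an outline, not a finished patch):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.hsmf.datatypes.Chunk;
import org.apache.poi.hsmf.datatypes.MAPIType;
import org.apache.poi.util.IOUtils;

public class ByteStreamChunk extends Chunk {
    private InputStream value;

    public ByteStreamChunk(String namePrefix, int chunkId, MAPIType type) {
        super(namePrefix, chunkId, type);
    }

    @Override
    public void readValue(InputStream stream) {
        // Do not drain the stream here; just keep the reference so the
        // attachment content can be streamed on demand later.
        this.value = stream;
    }

    @Override
    public void writeValue(OutputStream out) throws IOException {
        // Copy straight from the underlying stream, never materialising
        // the whole attachment as a byte[].
        IOUtils.copy(value, out);
    }

    // Direct access to the underlying attachment stream.
    public InputStream getValue() {
        return value;
    }
}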

This would be a breaking change, because the underlying stream (DocumentInputStream / POIFSFileSystem / ...) must then not be closed before the attachment content has been read.
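Usage would then have to look roughly like this, keeping the POIFSFileSystem open until every attachment has been consumed (getAttachDataStream() is a hypothetical accessor for the new chunk, not part of the current API):

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.hsmf.MAPIMessage;
import org.apache.poi.hsmf.datatypes.AttachmentChunks;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.util.IOUtils;

public class StreamingAttachmentExample {
    public static void main(String[] args) throws Exception {
        try (POIFSFileSystem fs = new POIFSFileSystem(new File("large-attachments.msg"))) {
            MAPIMessage msg = new MAPIMessage(fs);
            int i = 0;
            for (AttachmentChunks attachment : msg.getAttachmentFiles()) {
                // Streams the content directly from the POIFS stream,
                // so no complete byte[] copy is ever created.
                try (InputStream in = attachment.getAttachDataStream().getValue();
                     OutputStream out = new FileOutputStream("attachment-" + i++)) {
                    IOUtils.copy(in, out);
                }
            }
        } // only now may the file system (and its streams) be closed
    }
}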

Regards,
Dominik