Posted to dev@poi.apache.org by Nick Burch <ni...@apache.org> on 2010/12/13 02:33:16 UTC

Lower memory POIFS

Hi All

Just a heads-up that I'm planning to spend some time while I'm off over 
Christmas working on a lower memory POIFS implementation. The idea is to 
have it load blocks on demand, in disk instead of logical order, rather 
than the current code which loads everything and sorts before use.
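
To make the on-demand idea concrete, here is a rough sketch of reading a single block straight from its position on disk, rather than loading the whole file first. It is not the POI code: the class name OnDemandBlockReader and its methods are made up for illustration, and it assumes fixed-size blocks with a single header block ahead of block 0.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: fetches fixed-size blocks from a file only when asked. */
final class OnDemandBlockReader implements AutoCloseable {
    private final FileChannel channel;
    private final int blockSize;

    OnDemandBlockReader(Path file, int blockSize) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.blockSize = blockSize;
    }

    /** Reads block n, assuming one header block sits before block 0. */
    ByteBuffer readBlock(int n) throws IOException {
        long offset = (long) (n + 1) * blockSize;
        ByteBuffer buf = ByteBuffer.allocate(blockSize);
        // Positional reads leave the channel's own position untouched.
        while (buf.hasRemaining()) {
            int read = channel.read(buf, offset + buf.position());
            if (read < 0) {
                throw new EOFException("Block " + n + " is past the end of the file");
            }
        }
        buf.flip();
        return buf;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Only the blocks a caller asks for are ever held in memory, which is where the saving would come from; the cost is a seek per block when a stream's blocks are scattered across the file.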

Initially I'm aiming for enough support to be able to read the directory 
listing, and read streams. I'm not intending to support write, but I do 
aim to leave extension hooks in place so it can be added later if anyone 
wants to! Performance-wise, I suspect it'll be faster to load, but slower 
to read streams (due to disk seeks).

If anyone has done anything like this before that they might be willing to 
open source, please shout now! (There have previously been dev@ 
discussions about it, but alas no code that I'm aware of.) Otherwise I'll 
hopefully start committing stuff to svn in a week or so, and will 
report back when there's something to see.

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Re: Lower memory POIFS

Posted by David Fisher <df...@jmlafferty.com>.
Cheers! Quite an accomplishment!

Dave

On Dec 28, 2010, at 7:41 PM, Nick Burch wrote:

> On Mon, 13 Dec 2010, Nick Burch wrote:
>> Just a heads-up that I'm planning to spend some time while I'm off over Christmas working on a lower memory POIFS implementation. The idea is to have it load blocks on demand, in disk instead of logical order, rather than the current code which loads everything and sorts before use.
>> 
>> Initially I'm aiming for enough support to be able to read the directory listing, and read streams. I'm not intending to support write, but I do aim to leave extension hooks in place so it can be added later if anyone wants to! Performance-wise, I suspect it'll be faster to load, but slower to read streams (due to disk seeks).
> 
> I think we're now largely there with this, with all the code in svn trunk. It still needs to be documented, and not all the POIDocument implementations can work with it for read (write isn't supported yet).
> 
> However, running HPSFPropertiesExtractor against a 9 MB file:
> * POIFSFileSystem - minimum Xmx 12 MB, average time 250 ms
> * NPOIFSFileSystem - minimum Xmx 2 MB, average time 190 ms
> So, the new code is both lower memory and faster, which seems a result!
> 
> Once write support is done, and tested, I'd lean towards making NPOIFS an option for one release, then rename POIFS to OPOIFS and make NPOIFS the new default. More on that when they're feature compatible though!
> 
> Nick


Re: Lower memory POIFS

Posted by Nick Burch <ni...@apache.org>.
On Mon, 13 Dec 2010, Nick Burch wrote:
> Just a heads-up that I'm planning to spend some time while I'm off over 
> Christmas working on a lower memory POIFS implementation. The idea is to have 
> it load blocks on demand, in disk instead of logical order, rather than the 
> current code which loads everything and sorts before use.
>
> Initially I'm aiming for enough support to be able to read the directory 
> listing, and read streams. I'm not intending to support write, but I do aim 
> to leave extension hooks in place so it can be added later if anyone wants 
> to! Performance-wise, I suspect it'll be faster to load, but slower to read 
> streams (due to disk seeks).

I think we're now largely there with this, with all the code in svn trunk. 
It still needs to be documented, and not all the POIDocument 
implementations can work with it for read (write isn't supported yet).

However, running HPSFPropertiesExtractor against a 9 MB file:
* POIFSFileSystem - minimum Xmx 12 MB, average time 250 ms
* NPOIFSFileSystem - minimum Xmx 2 MB, average time 190 ms
So, the new code is both lower memory and faster, which seems a result!

Once write support is done, and tested, I'd lean towards making NPOIFS an 
option for one release, then rename POIFS to OPOIFS and make NPOIFS the 
new default. More on that when they're feature compatible though!

Nick



Re: Lower memory POIFS

Posted by Clemens <cl...@mysign.ch>.
Thanks for this clarification.



Re: Lower memory POIFS

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 23 Dec 2010, Clemens wrote:
> Do we approx. know what file sizes (of xlsx, ...) cause problems for (e.g.)
> 1G heap size?

xlsx files are OOXML, not OLE2, so they don't touch POIFS. It's only .xls 
/ .doc / etc. files that use POIFS.
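
If it helps to check which kind of container a given file is, the two can be told apart from the first few bytes: OLE2 files start with the signature D0 CF 11 E0 A1 B1 1A E1, while OOXML files are ZIP packages and start with "PK" 03 04. The sketch below is only an illustration; the ContainerSniffer class and its Kind enum are invented names, not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: classifies a file by its leading magic bytes. */
final class ContainerSniffer {
    enum Kind { OLE2, ZIP, UNKNOWN }

    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    /** Reads at most the first 8 bytes of the file and classifies them. */
    static Kind sniff(Path file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (filled < header.length
                    && (n = in.read(header, filled, header.length - filled)) > 0) {
                filled += n;
            }
        }
        return sniff(Arrays.copyOf(header, filled));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A ZIP result only says the file is a ZIP package, not that it is definitely OOXML, so treat it as a first-pass check rather than full format detection.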

Nick



Re: Lower memory POIFS

Posted by Clemens <cl...@mysign.ch>.
I would also like to have some "bad" files that are known to cause OOM...

Do we approx. know what file sizes (of xlsx, ...) cause problems for (e.g.)
1G heap size?


Re: Lower memory POIFS

Posted by Nick Burch <ni...@apache.org>.
On Mon, 13 Dec 2010, Nick Burch wrote:
> Just a heads-up that I'm planning to spend some time while I'm off 
> over Christmas working on a lower memory POIFS implementation. The idea 
> is to have it load blocks on demand, in disk instead of logical order, 
> rather than the current code which loads everything and sorts before 
> use.

This work is now in progress on trunk. I'm using the unit tests to try to 
avoid breaking anything as I go, but it's possible that problems could 
slip through. So, if you spot any POIFS issues when using trunk / 
nightlies due to my refactoring to support both uses, please shout...

Nick
