You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by David North <dt...@corefiling.co.uk> on 2014/07/02 11:07:07 UTC

Streaming read of XSSF

Greetings, fellow POI users.

I just wanted to check that my understanding is correct:

I have discovered that reading my entire workbook into memory using the
obvious method of loading it as an XSSFWorkbook consumes many gigabytes
of memory. I want to reduce this.

Fortunately, my code is already set up to consume the rows in one pass,
one at a time, so the obvious solution seems to be streaming.

The classes in org.apache.poi.xssf.streaming.* (in particular,
SXSSFWorkbook) are for streaming *write*.

If I want to do streaming *read*, I need to follow the example of
XSSFSheetXMLHandler, building something on top of SAX - i.e. there is no
API as such for streaming read access to Row and Cell objects.

Correct?

Thanks,
David

-- 
David North, Technical Lead, CoreFiling Limited
http://www.corefiling.com
Phone: +44-1865-203192


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Streaming read of XSSF

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 2 Jul 2014, David North wrote:
> I have discovered that reading my entire workbook into memory using the 
> obvious method of loading it as an XSSFWorkbook consumes many gigabytes 
> of memory. I want to reduce this.

Going from compressed xml, through decompressed xml and xml parsing, to 
many java objects does sadly mean XSSF needs many multiples of the file's 
size in memory use, sorry.

If you have a very large workbook with many sheets, and only care about a 
couple of sheets, then it's possible that a lazy-loading approach for 
sheets (not yet supported but  not much work) might help. For someone who 
needs most of the file's contents, your only option for using XSSF 
UserModel is to smile sweatly at your sysadmin and ask them to buy you 
some more memory...

> If I want to do streaming *read*, I need to follow the example of 
> XSSFSheetXMLHandler, building something on top of SAX - i.e. there is no 
> API as such for streaming read access to Row and Cell objects.

Correct, SAX reading is your only option for streaming read. POI provides 
a number of helper classes to make your life easier. There's also a few 
examples which show how to handle formatting etc when doing sax reading, 
try looking at the XLSX to CSV example in POI, and the XLSX to XHTML code 
in Apache Tika

Alternately, if you can think of a good model for providing a Workbook 
style paging XSSF reader on top of SAX, to provide a streaming read 
equivalent to SXSSF, please open a bugzilla enahncement and make a start!

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org