You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by David North <dt...@corefiling.co.uk> on 2014/07/02 11:07:07 UTC
Streaming read of XSSF
Greetings, fellow POI users.
I just wanted to check that my understanding is correct:
I have discovered that reading my entire workbook into memory using the
obvious method of loading it as an XSSFWorkbook consumes many gigabytes
of memory. I want to reduce this.
Fortunately, my code is already set up to consume the rows in one pass,
one at a time, so the obvious solution seems to be streaming.
The classes in org.apache.poi.xssf.streaming.* (in particular,
SXSSFWorkbook) are for streaming *write*.
If I want to do streaming *read*, I need to follow the example of
XSSFSheetXMLHandler, building something on top of SAX - i.e. there is no
API as such for streaming read access to Row and Cell objects.
Correct?
Thanks,
David
--
David North, Technical Lead, CoreFiling Limited
http://www.corefiling.com
Phone: +44-1865-203192
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Re: Streaming read of XSSF
Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 2 Jul 2014, David North wrote:
> I have discovered that reading my entire workbook into memory using the
> obvious method of loading it as an XSSFWorkbook consumes many gigabytes
> of memory. I want to reduce this.
Going from compressed xml, through decompressed xml and xml parsing, to
many java objects does sadly mean XSSF needs many multiples of the file's
size in memory use, sorry.
If you have a very large workbook with many sheets, and only care about a
couple of sheets, then it's possible that a lazy-loading approach for
sheets (not yet supported but not much work) might help. For someone who
needs most of the file's contents, your only option for using XSSF
UserModel is to smile sweatly at your sysadmin and ask them to buy you
some more memory...
> If I want to do streaming *read*, I need to follow the example of
> XSSFSheetXMLHandler, building something on top of SAX - i.e. there is no
> API as such for streaming read access to Row and Cell objects.
Correct, SAX reading is your only option for streaming read. POI provides
a number of helper classes to make your life easier. There's also a few
examples which show how to handle formatting etc when doing sax reading,
try looking at the XLSX to CSV example in POI, and the XLSX to XHTML code
in Apache Tika
Alternately, if you can think of a good model for providing a Workbook
style paging XSSF reader on top of SAX, to provide a streaming read
equivalent to SXSSF, please open a bugzilla enahncement and make a start!
Nick
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org