Posted to user@poi.apache.org by Charles G Harvey <ha...@isc.upenn.edu> on 2009/08/27 18:52:29 UTC
Handling extremely large files
Does anyone have any tips or pointers for reading in large Excel files? I expect to be working with files over 100 MB. As far as I can tell, even after registering a listener, POI reads the entire file before disposing of it, which means heap space errors for me. Do methods exist for parsing each row of an Excel document as POI reads it in from the stream, getting access to those cell objects, and then allowing the objects to be garbage collected as I move on to the next row of cells?
Thanks in advance for your advice!
Charles Harvey
Sr. Programmer Analyst
ISC-AIT
215.898.4773
Re: Handling extremely large files
Posted by Chris Lott <ma...@invest-faq.com>.
Charles G Harvey wrote:
> Does anyone have any tips or pointers for reading in large Excel files? I expect to be working with files over 100 MB. As far as I can tell, even after registering a listener, POI reads the entire file before disposing of it, which means heap space errors for me. Do methods exist for parsing each row of an Excel document as POI reads it in from the stream, getting access to those cell objects, and then allowing the objects to be garbage collected as I move on to the next row of cells?
Yes. Most of the examples show POI in the mode where it slurps the
whole file. POI is a memory hog even on 1 MB files. You can simply
forget reading 100 MB Excel workbooks into POI all at once.
You will probably have to use the POI event-driven API that reads the
file a bit at a time. (If you are familiar with XML processing, this is
like using DOM versus SAX parsing, but that's a digression.)
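
To make the SAX side of that analogy concrete, here is a minimal sketch
that uses only the JDK's SAX parser, not the POI API. It assumes the
input is worksheet XML shaped like the sheet part inside an XLSX file
(<row>, <c> and <v> elements). The class name SheetRowStreamer and the
callback are made up for illustration. It does no CSV quoting and does
not resolve shared strings (in a real XLSX file, the <v> of a string
cell is usually an index into a separate shared-strings part).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of POI: streams rows out of worksheet XML.
public class SheetRowStreamer {

    /** Parses worksheet XML and hands each finished row to onRow as a comma-joined line. */
    public static void stream(InputStream sheetXml, Consumer<String> onRow) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            // Only the cells of the current row are held in memory.
            private final List<String> cells = new ArrayList<>();
            private final StringBuilder text = new StringBuilder();
            private boolean inValue;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("row")) {
                    cells.clear();          // forget the previous row
                } else if (qName.equals("v")) {
                    inValue = true;         // start collecting a cell value
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inValue) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("v")) {
                    inValue = false;
                    cells.add(text.toString());
                } else if (qName.equals("row")) {
                    // Row complete: emit it, then let it go.
                    onRow.accept(String.join(",", cells));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet><sheetData>"
            + "<row r=\"1\"><c r=\"A1\"><v>10</v></c><c r=\"B1\"><v>20</v></c></row>"
            + "<row r=\"2\"><c r=\"A2\"><v>30</v></c></row>"
            + "</sheetData></worksheet>";
        stream(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
               System.out::println);
    }
}
```

The point of the callback is that the handler never builds a model of
the whole sheet. Each row is handed off as soon as its closing tag is
seen, so memory use depends on the widest row rather than on the size
of the file.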
POI provides example code "XLS2CSVmra" which does a stream-based
conversion of old-fashioned "XLS" files to CSV. Search for it, it's not
hard to find.
I provide example code "Xlsx2Csv" which does a stream-based conversion
of new-fangled "XLSX" documents to CSV. You might look here:
http://chris-lott.org/software/
Maybe someday the POI project will accept that code into their
examples area.
HTH
chris...