Posted to user@poi.apache.org by Charles G Harvey <ha...@isc.upenn.edu> on 2009/08/27 18:52:29 UTC

Handling extremely large files

Does anyone have any tips or pointers for reading large Excel files? I expect to be working with files over 100 MB. As far as I can tell, even after registering a listener, POI reads the entire file before disposing of it, which gives me heap space errors. Is there a way to parse each row of an Excel document as POI reads it from the stream, access those cell objects, and then let them be garbage collected as I move on to the next row of cells?

Thanks in advance for your advice!
Charles Harvey
Sr. Programmer Analyst
ISC-AIT
215.898.4773


Re: Handling extremely large files

Posted by Chris Lott <ma...@invest-faq.com>.
Charles G Harvey wrote:
> Does anyone have any tips or pointers for reading in large excel files? I expect to be working with files over 100megs... as far as I can tell, even after registering a listener, POI reads the entire file before disposing of it, which equals heap space errors for me. Do methods exist for parsing each line of an excel doc as poi reads it in from the stream, getting access to those cell objects, and then allowing the objects to be garbage collected as I move on to the next row of cells?

Yes.  Most of the examples show POI in the mode where it slurps the
whole file.  POI is a memory hog even on 1 MB files, so you can simply
forget reading 100 MB Excel workbooks into POI all at once.

You will probably have to use the POI event-driven API, which reads the
file a bit at a time.  (If you are familiar with XML processing, the
whole-file mode is like DOM parsing and the event-driven API is like
SAX parsing, but that's a digression.)
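
To make the SAX side of that analogy concrete, here is a minimal sketch
that uses only the JDK's SAX parser, not POI.  The class name
RowStreamSketch and the callback shape are hypothetical; the element
names row, c and v are assumed to follow the layout of an XLSX sheet's
XML.  Each row is handed to the callback as soon as its closing tag is
seen, and the parser keeps no reference to it afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: plain JDK SAX parsing, not POI's own API.
class RowStreamSketch {

    // Parses the XML and passes each completed <row> to the callback.
    static void stream(InputStream in, Consumer<List<String>> onRow) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            private List<String> row;     // cells of the row currently being read
            private StringBuilder text;   // non-null only while inside a <v> element

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("row")) {
                    row = new ArrayList<>();
                } else if (qName.equals("v")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one value in several chunks, so accumulate.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("v")) {
                    if (row != null) {
                        row.add(text.toString());
                    }
                    text = null;
                } else if (qName.equals("row")) {
                    onRow.accept(row);
                    row = null;           // drop our reference so the row can be collected
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sheetData><row><c><v>1</v></c><c><v>2</v></c></row>"
                   + "<row><c><v>3</v></c></row></sheetData>";
        stream(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
               row -> System.out.println(row));
    }
}
```

Memory use stays proportional to one row rather than to the whole
document, provided the callback does not keep the rows it is given.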

POI provides example code, "XLS2CSVmra", which does a stream-based
conversion of old-fashioned "XLS" files to CSV.  Search for it; it's not
hard to find.

I provide example code, "Xlsx2Csv", which does a stream-based conversion
of new-fangled "XLSX" documents to CSV.  You might look here:
http://chris-lott.org/software/  Maybe someday the POI project will
accept that code into their examples area.
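
For a rough idea of what a stream-based XLSX-to-CSV conversion
involves, here is a sketch that uses only JDK classes.  It is not the
Xlsx2Csv code mentioned above and not POI's API; the class name
XlsxStreamSketch and its method are hypothetical.  It assumes the
worksheet is stored in the zip under a name starting with
"xl/worksheets/sheet", and it simplifies heavily: shared-string cells
come out as their raw index, inline strings are ignored, and empty
cells are skipped, so columns can shift.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: streams the first worksheet of an XLSX (a zip of XML) to CSV.
class XlsxStreamSketch {

    // Quotes a value only when it contains a comma, a quote, or a newline.
    static String csv(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Returns true if a worksheet entry was found and converted.
    static boolean firstSheetToCsv(InputStream xlsx, StringBuilder out) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!e.getName().startsWith("xl/worksheets/sheet")) {
                    continue;   // skip styles, shared strings, relationships, ...
                }
                DefaultHandler handler = new DefaultHandler() {
                    private int cells;            // values written so far in this row
                    private StringBuilder text;   // non-null only while inside <v>

                    @Override
                    public void startElement(String uri, String local, String qName, Attributes a) {
                        if (qName.equals("row")) {
                            cells = 0;
                        } else if (qName.equals("v")) {
                            text = new StringBuilder();
                        }
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        if (text != null) {
                            text.append(ch, start, length);
                        }
                    }

                    @Override
                    public void endElement(String uri, String local, String qName) {
                        if (qName.equals("v")) {
                            if (cells++ > 0) {
                                out.append(',');
                            }
                            out.append(csv(text.toString()));
                            text = null;
                        } else if (qName.equals("row")) {
                            out.append('\n');   // row finished; nothing from it is retained
                        }
                    }
                };
                // The zip stream reports end-of-input at the end of this entry.
                SAXParserFactory.newInstance().newSAXParser().parse(zip, handler);
                return true;    // stop after the first worksheet
            }
        }
        return false;
    }
}
```

In real use the StringBuilder would be replaced by a Writer on the
output file so that the CSV is not held in memory either.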

HTH

chris...
