Posted to user@poi.apache.org by ajaygarga <aj...@gmail.com> on 2012/01/08 09:32:05 UTC

Reading Excel (xls and xlsx) files Row by Row

I am trying to read Excel (xls and xlsx) files row by row using POI 3.5. The
POI user model APIs (aka DOM) load the whole set of rows into memory
(the Java heap), which I don't want to happen.

Using the POI HSSF event model APIs for the XLS format, I tried
AbortableHSSFListener so that I can abort the processing as soon as I am
done with a row. But I ran into an issue with AbortableHSSFListener: it
doesn't emit records for missing rows and cells, or end-of-row records. If I
try to use MissingRecordAwareHSSFListener, it doesn't work with
AbortableHSSFListener; instead it invokes processRecord(Record record) on
the child listener. When I tried to write my own abortable,
missing-record-aware listener similar to MissingRecordAwareHSSFListener, I
found that it can emit LastCellOfRowDummyRecord to signal the end of a row
only after reading a cell of the next row, the beginning of the next sheet,
or the EOFRecord. If I want to pause the processing until the caller requests
the next record, I will have to push the last read record back into the
stream, and I am not aware of any low-level POI API for that.

Using the POI XSSF event model APIs for the XLSX format, my only option is
the SAX parser (because I am using JDK 5), which doesn't support pause and
resume. I know that with the JDK 6 StAX XML APIs I could pause and resume.

Please let me know if any of you know how to use POI 3.5 to read row by row
and pause the processing after reading and returning a row to the caller,
until the caller requests the next one.

Thanks
Ajay

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-Excel-xls-and-xlsx-files-Row-by-Row-tp5128945p5128945.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading Excel (xls and xlsx) files Row by Row

Posted by Nick Burch <ni...@alfresco.com>.
On Sun, 8 Jan 2012, ajaygarga wrote:
> Using POI HSSF Event model APIs for XLS format, I tried with
> AbortableHSSFListener so that I can abort the processing as soon as I am
> done with a row.

You could just use MissingRecordAwareHSSFListener and throw an exception 
when you want to abort. I believe that'll work.
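
A minimal sketch of the "throw to abort" idea, written against stand-in
types so it runs without POI on the classpath. RecListener, StopProcessing,
FirstRowCollector and AbortDemo are all hypothetical names, not POI classes;
in real code the listener would be an HSSFListener wrapped in
MissingRecordAwareHSSFListener. The exception is unchecked because a
callback like processRecord declares no checked exceptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a push-style record listener; NOT POI's HSSFListener. */
interface RecListener {
    void processRecord(int row, String value);
}

/** Unchecked, so it can escape a callback that declares no checked exceptions. */
final class StopProcessing extends RuntimeException {
    StopProcessing() {
        super("stop requested");
    }
}

/** Collects the values of row 0 and aborts as soon as a later row shows up. */
final class FirstRowCollector implements RecListener {
    final List<String> values = new ArrayList<String>();

    public void processRecord(int row, String value) {
        if (row > 0) {
            // The first row is complete; unwind out of the driver.
            throw new StopProcessing();
        }
        values.add(value);
    }
}

final class AbortDemo {
    /** Stand-in for the event driver that pushes every record to the listener. */
    static void pushAll(int[] rows, String[] values, RecListener listener) {
        for (int i = 0; i < rows.length; i++) {
            listener.processRecord(rows[i], values[i]);
        }
    }

    static List<String> readFirstRow(int[] rows, String[] values) {
        FirstRowCollector collector = new FirstRowCollector();
        try {
            pushAll(rows, values, collector);
        } catch (StopProcessing expected) {
            // Normal early exit, not an error.
        }
        return collector.values;
    }
}
```

Note that this aborts the run rather than pausing it: once the exception has
unwound the driver, processing cannot be resumed from the same point.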

> When I try to write my own Abortable MissingRecordAware Listener similar 
> to MissingRecordAwareHSSFListener, I found that it can emit 
> LastCellOfRowDummyRecord to notify the End of Row, only after reading 
> the cell of the next row or the beginning of the next sheet or the 
> EOFRecord. If I want to pause the processing until the caller request 
> for the next record, I will have to push back the last read record back 
> into the stream which I am not aware of any low level POI APIs for the 
> same.

That's correct - the Excel format doesn't have an "end of row" marker, so 
you just have to deduce it from the next record and spot that it has moved 
rows. You don't need to "push back" directly though; just keep a pre-read 
record cached on your side, and return that if present. I'm pretty sure we 
do something like that in a few places in POI.
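
The cached pre-read record could look roughly like the sketch below. It
assumes you already have some pull-style source of cell-level records, here
modelled as a plain Iterator. CellRec and RowReader are hypothetical
stand-ins, not POI classes, and nothing here shows how POI itself does it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Stand-in for a cell-level record; NOT a POI class. */
final class CellRec {
    final int row;
    final int col;
    final String value;

    CellRec(int row, int col, String value) {
        this.row = row;
        this.col = col;
        this.value = value;
    }
}

/** Groups a flat stream of cell records into rows using one cached look-ahead record. */
final class RowReader implements Iterator<List<CellRec>> {
    private final Iterator<CellRec> source;
    private CellRec pending; // pre-read record that belongs to the next row

    RowReader(Iterator<CellRec> source) {
        this.source = source;
    }

    public boolean hasNext() {
        return pending != null || source.hasNext();
    }

    public List<CellRec> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Start from the cached record if there is one, else pull a fresh one.
        CellRec first = (pending != null) ? pending : source.next();
        pending = null;

        List<CellRec> row = new ArrayList<CellRec>();
        row.add(first);
        while (source.hasNext()) {
            CellRec cell = source.next();
            if (cell.row != first.row) {
                // The row has changed: cache this record instead of pushing it back.
                pending = cell;
                break;
            }
            row.add(cell);
        }
        return row;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```

Each call to next() reads at most one record past the end of the current
row, so the caller decides when the following row gets read.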

> Please let me know if any of you know to use POI 3.5

As an aside, I'd suggest you try 3.8 beta 5 - there have been lots of 
fixes since 3.5

Nick
