Posted to user@poi.apache.org by William Graham <wi...@vanderbilt.edu> on 2010/07/07 19:25:05 UTC
Newbie: Problem using XSSFEventBasedExcelExtractor in 3.7 beta1
Hi Folks,
I'm new to using POI. I've searched the archives and the bug database but have
seen no reference to my problem. Please forgive me if I missed something ... I
really did try! :)
I'm using the XSSFEventBasedExcelExtractor to read very large Excel files and
return the sheets as delimited text. I love it because the memory footprint
and speed of access are amazing. I have, though, encountered one problem. In
using either the getText or processSheet methods it seems that this extractor
somehow treats an empty cell as not actually being there when it returns the
delimited text. So if there are column headers, but one column has empty
cells, the result in the delimited output is that cells to the right of the
empty cells are shifted over one cell to the left ... sort of like consecutive
delimiters being treated as a single delimiter.
I have sample code and a sample Excel file to demonstrate this if anyone would
like to take a look.
Again, very sorry if this is a known issue and I just missed it in my search.
Best Regards,
William Graham
Vanderbilt University
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Re: Newbie: Problem using XSSFEventBasedExcelExtractor in 3.7 beta1
Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 8 Jul 2010, William Graham wrote:
>> That's because in the file format, empty cells really aren't there!
>> (.xlsx is a zip file of XML files, so you can unzip it and see for
>> yourself. Empty cells are skipped.) When using the usermodel, we can
>> detect this and return an empty cell for you if requested. If you're using
>> the event model, you're much too low down and it's all up to you...
>
> We'll see ... I'd enjoy the challenge. It's just a matter of finding the
> time. :)
Always the way! The logic for detecting missing cells and rows in
MissingRecordAwareHSSFListener should work for you in the XSSF case. Only
how you tell what the current cell is, and how you trigger the missing
record events, will be different.
Nick
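For anyone finding this thread later, here is a rough sketch of what that
"watch the cell numbers" logic might look like on the XSSF side. It is not
POI code: MissingCellAwareHandler, scan, columnIndex and columnName are
hypothetical names, and it uses only the JDK's SAX parser on a hand-written
sheet XML string. It also assumes every <row> and <c> element carries an
"r" attribute, which Excel normally writes but which is not guaranteed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch, not a POI class: reports columns skipped between <c> elements. */
class MissingCellAwareHandler extends DefaultHandler {
    /** Events in document order, e.g. "cell A1" or "missing B1". */
    final List<String> events = new ArrayList<>();
    private int lastColumn = -1; // zero-based column of the previous cell in this row
    private int currentRow = 0;  // one-based row number taken from <row r="...">

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("row".equals(qName)) {
            // Assumption: the "r" attribute is present on every row.
            currentRow = Integer.parseInt(attrs.getValue("r"));
            lastColumn = -1;
        } else if ("c".equals(qName)) {
            // Assumption: the "r" attribute (e.g. "D1") is present on every cell.
            String ref = attrs.getValue("r");
            int column = columnIndex(ref);
            // Fire one "missing" event for each column skipped since the last cell.
            for (int gap = lastColumn + 1; gap < column; gap++) {
                events.add("missing " + columnName(gap) + currentRow);
            }
            events.add("cell " + ref);
            lastColumn = column;
        }
    }

    /** Zero-based column index from a reference such as "C7" (A=0, Z=25, AA=26). */
    static int columnIndex(String cellRef) {
        int col = 0;
        for (int i = 0; i < cellRef.length() && Character.isLetter(cellRef.charAt(i)); i++) {
            col = col * 26 + (Character.toUpperCase(cellRef.charAt(i)) - 'A' + 1);
        }
        return col - 1;
    }

    /** Column letters for a zero-based index (0=A, 25=Z, 26=AA). */
    static String columnName(int index) {
        StringBuilder sb = new StringBuilder();
        for (int n = index + 1; n > 0; n = (n - 1) / 26) {
            sb.insert(0, (char) ('A' + (n - 1) % 26));
        }
        return sb.toString();
    }

    /** Parses a sheet XML string and returns the recorded events. */
    static List<String> scan(String sheetXml) {
        MissingCellAwareHandler handler = new MissingCellAwareHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(sheetXml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sheet XML", e);
        }
        return handler.events;
    }
}
```

A real version would call back into whatever builds the delimited output
rather than collecting strings, and would also need to deal with rows that
are missing entirely, which this sketch ignores.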
Re: Newbie: Problem using XSSFEventBasedExcelExtractor in 3.7 beta1
Posted by William Graham <wi...@vanderbilt.edu>.
Thanks so much for the answer, Nick! Makes sense to me now.
Nick Burch <nick.burch <at> alfresco.com> writes:
> That's because in the file format, empty cells really aren't there!
> (.xlsx is a zip file of XML files, so you can unzip it and see for
> yourself. Empty cells are skipped.) When using the usermodel, we can
> detect this and return an empty cell for you if requested. If you're using
> the event model, you're much too low down and it's all up to you...
We'll see ... I'd enjoy the challenge. It's just a matter of finding the
time. :)
> If this matters to you, I'd suggest you try writing similar logic for
> XSSF. If you do get it working, do please send in the patch! :)
>
> Nick
>
Thanks again,
William
Re: Newbie: Problem using XSSFEventBasedExcelExtractor in 3.7 beta1
Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 7 Jul 2010, William Graham wrote:
> In using either the getText or processSheet methods it seems that this
> extractor somehow treats an empty cell as not actually being there when
> it returns the delimited text.
That's because in the file format, empty cells really aren't there!
(.xlsx is a zip file of XML files, so you can unzip it and see for
yourself. Empty cells are skipped.) When using the usermodel, we can
detect this and return an empty cell for you if requested. If you're using
the event model, you're much too low down and it's all up to you...
For HSSF, we have a wrapper around the HSSF event based stuff that
watches the cell numbers, and spots when things are missed:
http://poi.apache.org/apidocs/org/apache/poi/hssf/eventusermodel/MissingRecordAwareHSSFListener.html
If this matters to you, I'd suggest you try writing similar logic for
XSSF. If you do get it working, do please send in the patch! :)
Nick
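To make the shifting problem concrete: each cell that is present in the
sheet XML carries a reference such as "C2", so the column can be recovered
from the reference and the gaps padded before joining. The following is
only a minimal sketch of that idea in plain Java. GapFillingRowWriter,
columnIndex and joinRow are made-up names, not POI API, and the sketch
assumes the references arrive in ascending column order. POI's own
CellReference class may be a better choice than the hand-rolled parser.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of POI: pads skipped columns in delimited output. */
class GapFillingRowWriter {

    /** Zero-based column index from a reference such as "C7" (A=0, Z=25, AA=26). */
    static int columnIndex(String cellRef) {
        int col = 0;
        for (int i = 0; i < cellRef.length(); i++) {
            char ch = cellRef.charAt(i);
            if (!Character.isLetter(ch)) {
                break; // reached the row number
            }
            col = col * 26 + (Character.toUpperCase(ch) - 'A' + 1);
        }
        return col - 1;
    }

    /**
     * Joins one row's cells with the delimiter, leaving an empty field for
     * every column that has no cell. refs and values are parallel lists;
     * refs is assumed to be in ascending column order.
     */
    static String joinRow(List<String> refs, List<String> values, String delimiter) {
        if (refs.isEmpty()) {
            return "";
        }
        int lastColumn = columnIndex(refs.get(refs.size() - 1));
        String[] fields = new String[lastColumn + 1];
        Arrays.fill(fields, ""); // columns with no cell stay empty
        for (int i = 0; i < refs.size(); i++) {
            fields[columnIndex(refs.get(i))] = values.get(i);
        }
        return String.join(delimiter, fields);
    }
}
```

For example, joinRow(Arrays.asList("A2", "C2"), Arrays.asList("id", "name"), ",")
gives "id,,name", so the value from column C stays in the third field
instead of sliding left into the second.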