Posted to user@poi.apache.org by William Graham <wi...@vanderbilt.edu> on 2010/07/07 19:25:05 UTC

Newbie: Problem using XSSFEventBasedExcelExtractor in 3.7 beta1

Hi Folks,

I'm new to using POI. I've searched the archives and the bug database but have 
seen no reference to my problem. Please forgive me if I missed something ... I 
really did try! :)

I'm using the XSSFEventBasedExcelExtractor to read very large Excel files and 
return the sheets as delimited text. I love it because the memory footprint 
and speed of access are amazing. I have, though, encountered one problem. In 
using either the getText or processSheet methods it seems that this extractor 
somehow treats an empty cell as not actually being there when it returns the 
delimited text. So if there are column headers, but one column has empty 
cells, the result in the delimited output is that cells to the right of the 
empty cells are shifted over one cell to the left ... sort of like consecutive 
delimiters being treated as a single delimiter.

I have sample code and a sample Excel file to demonstrate this if anyone would 
like to take a look.

Again, very sorry if this is a known issue and I just missed it in my search.

Best Regards,
William Graham
Vanderbilt University


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Newbie: Problem using XSSFEventBasedExcelExtractor in 3.7 beta1

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 8 Jul 2010, William Graham wrote:
>> That's because in the file format, empty cells really aren't there!
>> (.xlsx is a zip file of XML files, so you can unzip it and see for
>> yourself. Empty cells are skipped.) When using the usermodel, we can
>> detect this and return an empty cell for you if requested. If you're using
>> the event model, you're much too low down and it's all up to you...
>
> We'll see ... I'd enjoy the challenge. It's just a matter of finding the
> time.  :)

Always the way! The logic for detecting missing cells and rows in 
MissingRecordAwareHSSFListener should work for you in the XSSF case. Only 
how you tell what the current cell is, and how you trigger the missing 
record events, will be different.
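
As a rough illustration of that idea (not the actual code of
MissingRecordAwareHSSFListener), the tracking can be sketched as below. The
names MissingCellTracker, CellListener and CsvCollector are made up for this
example and are not part of POI. The sketch assumes cells arrive in document
order with 0-based row and column numbers, and it only reports missing cells
to the left of a real cell, not whole missing rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not POI code: turns gaps in column numbers into events.
public class MissingCellTracker {

    /** Hypothetical callback interface for real and missing cells. */
    interface CellListener {
        void cell(int row, int col, String value);
        void missingCell(int row, int col);
        void endRow(int row);
    }

    /** Example listener that collects each row as a comma-delimited line. */
    static class CsvCollector implements CellListener {
        final List<String> lines = new ArrayList<>();
        private final List<String> current = new ArrayList<>();

        @Override
        public void cell(int row, int col, String value) {
            current.add(value);
        }

        @Override
        public void missingCell(int row, int col) {
            current.add(""); // keep the column position with an empty field
        }

        @Override
        public void endRow(int row) {
            lines.add(String.join(",", current));
            current.clear();
        }
    }

    private final CellListener target;
    private int currentRow = -1;
    private int lastCol = -1;

    MissingCellTracker(CellListener target) {
        this.target = target;
    }

    /** Feed every cell that is really present, in document order. */
    void onCell(int row, int col, String value) {
        if (row != currentRow) {
            if (currentRow >= 0) {
                target.endRow(currentRow);
            }
            currentRow = row;
            lastCol = -1;
        }
        // Any column numbers skipped since the last real cell are missing cells.
        for (int c = lastCol + 1; c < col; c++) {
            target.missingCell(row, c);
        }
        target.cell(row, col, value);
        lastCol = col;
    }

    /** Call once after the last cell so the final row is closed. */
    void finish() {
        if (currentRow >= 0) {
            target.endRow(currentRow);
        }
        currentRow = -1;
    }

    public static void main(String[] args) {
        CsvCollector out = new CsvCollector();
        MissingCellTracker tracker = new MissingCellTracker(out);
        tracker.onCell(0, 0, "Name");
        tracker.onCell(0, 1, "Dept");
        tracker.onCell(0, 2, "Phone");
        tracker.onCell(1, 0, "Ann");
        tracker.onCell(1, 2, "555"); // column 1 was skipped in the file
        tracker.finish();
        out.lines.forEach(System.out::println);
    }
}
```

In an XSSF event handler, the row and column would presumably come from the
cell reference of each cell element rather than from HSSF records.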

Nick



Re: Newbie: Problem using XSSFEventBasedExcelExtractor in 3.7 beta1

Posted by William Graham <wi...@vanderbilt.edu>.
Thanks so much for the answer, Nick! Makes sense to me now.

Nick Burch <nick.burch <at> alfresco.com> writes:

> That's because in the file format, empty cells really aren't there!
> (.xlsx is a zip file of XML files, so you can unzip it and see for 
> yourself. Empty cells are skipped.) When using the usermodel, we can 
> detect this and return an empty cell for you if requested. If you're using 
> the event model, you're much too low down and it's all up to you...


We'll see ... I'd enjoy the challenge. It's just a matter of finding the 
time.  :) 

> If this matters to you, I'd suggest you try writing similar logic for 
> XSSF. If you do get it working, do please send in the patch! :)
> 
> Nick
> 


Thanks again,
William




Re: Newbie: Problem using XSSFEventBasedExcelExtractor in 3.7 beta1

Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 7 Jul 2010, William Graham wrote:
> In using either the getText or processSheet methods it seems that this 
> extractor somehow treats an empty cell as not actually being there when 
> it returns the delimited text.

That's because in the file format, empty cells really aren't there!
(.xlsx is a zip file of XML files, so you can unzip it and see for 
yourself. Empty cells are skipped.) When using the usermodel, we can 
detect this and return an empty cell for you if requested. If you're using 
the event model, you're much too low down and it's all up to you...
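
To make the point concrete, here is a minimal sketch that uses only the JDK
SAX parser, not POI. It relies on the fact that cells written by Excel
normally carry their own reference in the "r" attribute (for example "C2"),
so a skipped column can be detected and padded. SheetXmlGapFiller and its
methods are hypothetical names. As a simplification, the sketch takes the
text of each <v> element literally and ignores shared strings, styles and
formulas.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of POI: pads cells that the sheet XML skips.
public class SheetXmlGapFiller {

    /** Converts the letters of a reference such as "C7" to a 0-based column index. */
    static int columnIndex(String cellRef) {
        int col = 0;
        for (int i = 0; i < cellRef.length(); i++) {
            char ch = cellRef.charAt(i);
            if (ch < 'A' || ch > 'Z') {
                break; // reached the row digits
            }
            col = col * 26 + (ch - 'A' + 1);
        }
        return col - 1;
    }

    /** Returns one delimited line per row element, with empty fields for skipped cells. */
    static List<String> toDelimited(String sheetXml, String delimiter) {
        final List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final List<String> cells = new ArrayList<>();
            private final StringBuilder value = new StringBuilder();
            private boolean inValue;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("row".equals(qName)) {
                    cells.clear();
                } else if ("c".equals(qName)) {
                    String ref = atts.getValue("r");
                    // Without a reference, assume the cell follows the previous one.
                    int col = (ref == null) ? cells.size() : columnIndex(ref);
                    while (cells.size() < col) {
                        cells.add(""); // the file skipped this column
                    }
                    value.setLength(0);
                } else if ("v".equals(qName)) {
                    inValue = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inValue) {
                    value.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName)) {
                    inValue = false;
                } else if ("c".equals(qName)) {
                    cells.add(value.toString());
                } else if ("row".equals(qName)) {
                    lines.add(String.join(delimiter, cells));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(sheetXml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Row 2 has no B2 element at all, as in a sheet with an empty cell.
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>10</v></c><c r=\"B1\"><v>20</v></c>"
                + "<c r=\"C1\"><v>30</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>40</v></c><c r=\"C2\"><v>60</v></c></row>"
                + "</sheetData></worksheet>";
        toDelimited(xml, ",").forEach(System.out::println);
        // prints:
        // 10,20,30
        // 40,,60
    }
}
```

The sketch pads only up to the last cell present in each row, so trailing
empty cells would still need the header width to be known.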

For HSSF, we have a wrapper around the HSSF event based stuff that 
watches the cell numbers, and spots when things are missed:
http://poi.apache.org/apidocs/org/apache/poi/hssf/eventusermodel/MissingRecordAwareHSSFListener.html

If this matters to you, I'd suggest you try writing similar logic for 
XSSF. If you do get it working, do please send in the patch! :)

Nick
