Posted to user@poi.apache.org by David Hoffer <da...@issinc.com> on 2008/08/09 04:05:03 UTC

How to set missing cell policy using ExcelExtractor?

I have an Excel file where extracting text using ExcelExtractor works fine
except that it does not insert extra tab characters for missing cells.  This
results in table formatted data being incorrect if there are any missing
cells.

Calling setMissingCellPolicy on the HSSFWorkbook doesn't seem to have any
effect.

How can I tell it to insert missing cells?  (i.e. add \t)

-Dave


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


RE: How to set missing cell policy using ExcelExtractor?

Posted by Nick Burch <ni...@torchbox.com>.
On Sat, 9 Aug 2008, David Hoffer wrote:
> I found that XLS2CSVmra was beyond my ability to repair as it generated 
> lots of errors that I didn't know how to fix.

That almost certainly means you're trying to compile the svn version of 
XLS2CSVmra against an earlier POI jar. That rarely works well :( Your best 
bet is to go for svn all the way.

> ExcelExtractor on the other hand, was almost perfect, all I had to do 
> was add the following 3 lines in the source in the cell type switch 
> statement.

I've put something like that into svn trunk. It actually needs a tiny bit 
more, for when the first cell isn't in column 0, but it's there now if 
you call setIncludeBlankCells(true)
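
[Editor's sketch] To illustrate why the case where the first cell isn't in column 0 needs extra handling, here is a minimal plain-Java model of the padding idea. It is not the code that went into svn trunk and does not use POI; the class name LeadingBlankPadSketch, the method renderRow, and the map-based row are all hypothetical stand-ins chosen for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class LeadingBlankPadSketch {

    // Renders a sparse row (column index -> cell text) as tab-separated text.
    // Every missing column, including those before the first real cell,
    // becomes an empty field so that columns stay aligned across rows.
    static String renderRow(Map<Integer, String> cellsByColumn) {
        TreeMap<Integer, String> sorted = new TreeMap<>(cellsByColumn);
        if (sorted.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        int lastColumn = sorted.lastKey();
        for (int col = 0; col <= lastColumn; col++) {
            if (col > 0) {
                out.append('\t');          // separator between adjacent columns
            }
            String text = sorted.get(col);
            if (text != null) {
                out.append(text);          // real cell; missing cells add nothing
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> row = new TreeMap<>();
        row.put(2, "c");                   // first real cell is in column 2
        row.put(4, "e");
        // Show tabs visibly as \t when printing.
        System.out.println(renderRow(row).replace("\t", "\\t"));
    }
}
```

Starting the loop at column 0 rather than at the first existing cell is what produces the leading tabs; a loop that starts at the first existing cell would shift such a row to the left.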

Nick



RE: How to set missing cell policy using ExcelExtractor?

Posted by David Hoffer <da...@issinc.com>.
Hi Nick,

I found that XLS2CSVmra was beyond my ability to repair as it generated lots
of errors that I didn't know how to fix.

ExcelExtractor, on the other hand, was almost perfect; all I had to do was
add the following 3 lines in the source in the cell type switch statement.

case HSSFCell.CELL_TYPE_BLANK:
     outputContents = true;
     break;

This worked in my case because although the text is numeric, Excel thinks
they are strings/text; they come back as "" and this enables the output of
the \t character.
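
[Editor's sketch] The effect of the three-line change above can be modelled in plain Java without POI. This is only an illustration under stated assumptions: the CellType enum, the Cell class, the extractRow method, and the includeBlankCells flag are hypothetical stand-ins, not ExcelExtractor's actual source.

```java
import java.util.List;

public class BlankCellSwitchSketch {

    // Hypothetical stand-in for the HSSFCell cell-type constants.
    enum CellType { STRING, NUMERIC, BLANK }

    // Hypothetical minimal cell: a type plus its already-formatted text.
    static final class Cell {
        final CellType type;
        final String text;

        Cell(CellType type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Joins the cells of one row with tabs. Blank cells are emitted as
    // empty fields only when includeBlankCells is true.
    static String extractRow(List<Cell> cells, boolean includeBlankCells) {
        StringBuilder out = new StringBuilder();
        boolean first = true;
        for (Cell cell : cells) {
            boolean outputContents;
            switch (cell.type) {
                case STRING:
                case NUMERIC:
                    outputContents = true;
                    break;
                case BLANK:
                    // Models the added case: blank cells now count as output.
                    outputContents = includeBlankCells;
                    break;
                default:
                    outputContents = false;
            }
            if (!outputContents) {
                continue;
            }
            if (!first) {
                out.append('\t');
            }
            out.append(cell.text);
            first = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Cell> row = List.of(
                new Cell(CellType.STRING, "a"),
                new Cell(CellType.BLANK, ""),
                new Cell(CellType.STRING, "c"));
        // Show tabs visibly as \t when printing.
        System.out.println(extractRow(row, false).replace("\t", "\\t"));
        System.out.println(extractRow(row, true).replace("\t", "\\t"));
    }
}
```

Note that this only covers blank cells that are actually present in the row; a cell that does not exist at all never reaches the switch.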

Ideally this would be a switch in the API just like
setIncludeCellComments(), etc.

-Dave



-----Original Message-----
From: Nick Burch [mailto:nick@torchbox.com] 
Sent: Saturday, August 09, 2008 3:57 AM
To: POI Users List
Subject: Re: How to set missing cell policy using ExcelExtractor?

On Fri, 8 Aug 2008, David Hoffer wrote:
> I have an Excel file where extracting text using ExcelExtractor works 
> fine except that it does not insert extra tab characters for missing 
> cells.  This results in table formatted data being incorrect if there 
> are any missing cells.

You might find org.apache.poi.hssf.eventusermodel.examples.XLS2CSVmra 
(from examples) a better fit for your needs.

ExcelExtractor is designed more for text extraction for Lucene-style 
indexing. As part of that, it only outputs cells in the range that exist 
in the row (so might not start at column 0), and skips all blank cells.

Nick



Re: How to set missing cell policy using ExcelExtractor?

Posted by Nick Burch <ni...@torchbox.com>.
On Fri, 8 Aug 2008, David Hoffer wrote:
> I have an Excel file where extracting text using ExcelExtractor works 
> fine except that it does not insert extra tab characters for missing 
> cells.  This results in table formatted data being incorrect if there 
> are any missing cells.

You might find org.apache.poi.hssf.eventusermodel.examples.XLS2CSVmra 
(from examples) a better fit for your needs.

ExcelExtractor is designed more for text extraction for Lucene-style 
indexing. As part of that, it only outputs cells in the range that exist 
in the row (so might not start at column 0), and skips all blank cells.

Nick
