You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by tcole6 <tc...@gmail.com> on 2012/02/15 16:07:46 UTC

Reading cell values that have a custom format

I have searched the forums, but cannot find a solution, or a definitive, "you
cannot do that". 

I am processing Excel (xlsx) files that are being delivered from multiple
locations. My goal is to be able to read the Excel file into a
multi-dimensional String array and have each item contain an exact String
duplication of the cell values as seen when opened in Excel. This would
include cells that contain custom formats.

In my current example, I have a column of values that each contain exactly
11 digits. The cells in this column have a custom format applied to them of
"000-00000-000". In the .xlsx file I can see the numberFormat
"000\-00000\-000" with an id of 164 in the styles.xml file.

When I read the cell value using DataFormatter.formatCellValue(cell), I am
returned the numeric value, not the formatted value. In my example I am
getting "12312345123" and not "123-12345-123". 

I have no way of knowing what custom formats will be applied to a cell or
group of cells before hand, each one can be different.

Is it possible within the API to read this cell as "123-12345-123" or am I
going to have to parse the styles.xml file and add each custom format that I
find there? And if I do, would it even work (I am assuming that when I add
the form I would have to unescape characters).

Any help would be appreciated, even if it's "you can't do that".

Thanks.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5486315.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
I have been using the following code block to retrieve sheet contents into a
String[][] array with success:

for (int i = 0, len = sheet.getLastRowNum(); i <= len; i++) {
	Row row = sheet.getRow(i);
	if (row != null) {
		for (int j = 0, jlen = row.getLastCellNum(); j < jlen; j++) {
			Cell cell = row.getCell(j);
			if (cell != null) {
				if (cell.getCellType() == Cell.CELL_TYPE_FORMULA) {
					if (cell.getCachedFormulaResultType() == Cell.CELL_TYPE_ERROR) { 
						data[i][j] = "#VALUE!"; 
					} 
					else { 
						data[i][j] = formatter.formatCellValue(cell, evaluator); 
					} 
				}
				else if (cell.getCellType() == Cell.CELL_TYPE_NUMERIC && !
DateUtil.isCellDateFormatted(cell)) {
					short formatIndex = cell.getCellStyle().getDataFormat();
					if (formatIndex >= 164) { //custom format
						CellFormat cellFormat =
CellFormat.getInstance(cell.getCellStyle().getDataFormatString());
						data[i][j] = cellFormat.apply(cell).text;
					}
					else {
						data[i][j] = formatter.formatCellValue(cell);
					}
				}
				else {
					try {
						data[i][j] = formatter.formatCellValue(cell);
					}
					catch(Exception e) {
						cell.setCellType(Cell.CELL_TYPE_STRING);
						data[i][j] = cell.getRichStringCellValue().toString();
					}
				}
			}
			else {
				data[i][j] = "";
			}
		}
	}
}

So far this has been working for all my test cases and no new issues have
been reported. This is obviously not the entire code, just the portion that
deals with the formatting of each cell value. There is other stuff there
that strips empty rows, etc. 

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5524910.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by svysny <st...@gmail.com>.
Is it still looking good?

What is odd, is that like you I've seen that the General and @ formats don't
work correctly, but I experience different incorrect behaviour than you do
for your 0.0005 example. For General format, I get '0.'  and for '@' format
I get 5.0E-4. This is with poi 3.7 jars. I'm wondering if there is also
something else going on. This is my sample code and output. Can you see
please what I'm doing differently in my code than in yours?


Sample code:

InputStream is = new FileInputStream("Workbook with 0.0005 in cell A1.xls");
    
XSSFWorkbook wb = new XSSFWorkbook(is);
XSSFSheet sheet = wb.getSheet("Sheet1");
XSSFRow row = sheet.getRow(0);
XSSFCell cell = row.getCell(0);

System.out.println(cell.getNumericCellValue());
        		
CellFormat cellFormat1 = CellFormat.getInstance("General");
String formattedData1 = cellFormat1.apply(cell).text;

System.out.println("'General' format: " + formattedData1);

CellFormat cellFormat2 = CellFormat.getInstance("@");
String formattedData2 = cellFormat2.apply(cell).text;

System.out.println("'@' format: " + formattedData2);
        
is.close();


Output:

5.0E-4
'General' format: 0.     <-- you see 5.0E-4 for 'General' format
'@' format: 5.0E-4

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5506619.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
It is General Format, no custom format applied. I also get the same display
(5.0E-4) when attempting to read it as a string by setting the cell type to
CELL_TYPE_STRING and reading it.

I have a bunch of guys running tests right now, but so far using
DataFormatter for any DataFormats < 164 and CellFormat for format >= 164
seems to be working pretty well. No reports of errors yet :) Fingers
crossed.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5502315.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by svysny <st...@gmail.com>.
What is the format string for the cell with contents 0.0005 that is being
returned as 5.0E-4? Is it '@'?

Using DataFormatter and CellFormat together as you suggest sounds like it
has potential, although (I assume) you will need to examine the cell format
string and determine which class to use.

>From my experience, DataFormatter has been reliable for standard formats.
But I found that DataFormatter does not support scaling commas and
quoted/escaped text in the format which was why I looked at CellFormat.

I'm interested in whether your approach works. Please post back with how you
get on.

Steve

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5500416.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
One other idea is that I could use DataFormatter to handle any cells that
have standard formats, and use CellFormat for cells that have custom
formats. That still appears to give me 100% results on the test cases I
have.

Any thoughts on the feasibility of that? Am I just hiding a potential future
issue?

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5499188.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
I found another issue today regarding the CellFormat that does make it nearly
impossible for me to use.

In an excel spreadsheet there is a numeric cell with the contents 0.0005.

Using the following code:

CellFormat cf = new CellFormat(cell.getCellStyle().getDataFormatString());
data[i][j] = cf.apply(cell).text;

Results in the data[i][j] containing the string 5.0E-4.

So now I would have to go about determining that the results were in SN, and
go about converting this. This occurence I hit way more often than the one
custom format that I still have to fix so for now I will have to switch back
to the DataFormatter (which returns me 0.0005 in the above example) and work
it out from there.

Thanks again for your assistance.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5498921.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by svysny <st...@gmail.com>.
Here are the issues I've encountered with CellFormat:

- for a format that has one section (e.g. "0.00"), negative numbers are
formatted without a minus ("-") sign
- for a format with three or four sections ("0.00;-0.00;-" or
"0.00;-0.00;-;@"), a negative number is formatted using the zero section and
vice versa
- for the "General" and "@" formats, decimal values are formatted with the
wrong number of decimal places
- TRUE and FALSE (for boolean values) are formatted in lower case
- date-time values are formatted incorrectly for a format with hours and
minutes but not seconds (e.g. "h:m") 

(I've created a patch for the first four issues (bug 52708 in bugzilla), the
last issue is raised as bug 52438 and the fix will be in 3.8-beta6.)

Steve

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5497290.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
Thanks for the reply Steve.

The good news is that this appears to work very well for me. I have 6 test
cases for which I had written a custom formatter that worked on 5 of the 6,
however your suggestion has fixed all 6 of my test cases.

Which issues have you discovered? Just in case I should come across a format
that doesn't work?

Again, thank you so much for your assistance. This gets me exactly where I
need to be!!!

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5495927.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by svysny <st...@gmail.com>.
If you are using version 3.7-beta1 or higher, try the CellFormat class
instead of DataFormatter. E.g. in your example code use

CellFormat cellFormat =
CellFormat.getInstance(cell.getCellStyle().getDataFormatString());

and

data[i][j] = cellFormat.apply(cell).text

CellFormat appears to work for your "000-00000-000" custom format but I've
found that CellFormat has some issues (I'm working on a patch) so it may not
work correctly for other custom cell formats in your Excel files.

Regards

Steve

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5495664.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
One has to wonder how I missed this, listed directly in the Javadocs for
DataFormatter:

"However the pattern "00-00-00" is incorrectly formatted by DecimalFormat as
"000000--". 

Hmmm...

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5492311.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
Raised bug 52684. Will have to work on the patch. I haven't found a
successful way to work this yet but when I do, I will submit it. :)

Thank you.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5489905.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 16 Feb 2012, tcole6 wrote:
> Thanks a ton for your feedback. I look forward to seeing/testing the 
> results.

You'll want to raise a bug in bugzilla, and ideally also work up a patch 
for it.... :)

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
Thanks a ton for your feedback. I look forward to seeing/testing the results.

I had another issue I had posted awhile back regarding Cell Comments and
cloning sheets. When cloning a sheet, the Comments weren't getting cloned,
though everything else (formats, styles, etc.) was.

I had to fake that in code to get it working and for now is getting me what
I needed. This custom format issue would be the only remaining issue for my
use case.

Again, thank you for the prompt replies. The POI API has allowed me to do
some really cool things using Excel to transport all kinds of disparrate
information between disconnected systems.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5489797.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 16 Feb 2012, tcole6 wrote:
> So I had originally tried that as well. Two ways...
>
> 1. Leaving the format string exactly as is "000\-00000\-000"
>
> This resulted in 12312345123\-\-.
>
> 2. I tried unescaping the format string which turned it into "000-00000-000"
>
> This resulted in 12312345123--.

OK, it looks like Decimial Format probably doesn't handle it. We'll have 
to fake it in code :/

The good news is that we do already have some support for that style of 
formatters though, in things like PhoneFormat and ZipFormat. The answer 
may be to make one of those more generic

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
So I had originally tried that as well. Two ways...

1. Leaving the format string exactly as is "000\-00000\-000"

This resulted in 12312345123\-\-.

2. I tried unescaping the format string which turned it into "000-00000-000"

This resulted in 12312345123--.

Terribly, neither was what I wanted nor what I was getting back from the
DataFormatter.

:(

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5489705.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 16 Feb 2012, tcole6 wrote:
> Then I obtain the cell in question...
>
>     cell.getCellStyle().getDataFormatString() returns "000\-00000\-000"
>     cell.getCellStyle().getDataFormat() returns 164
>
> Correct and correct. But...
>
>     formatter.formatCellValue(cell) returns "12312345123"

Hmm, I wonder if it's perhaps a limitation of the Java class libary 
underneath?

Can you try calling DecimalFormat directly with your format pattern, and 
see what it makes of it?

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
So far I have the following:

Cell H:2 in Excel Spreadsheet contains value 12312345123
Cell H:2 in Excel Spreadsheet has custom format 000-00000-000

I look at styles.xml and confirm the format is there:

     <numFmt formatCode="000\-00000\-000" numFmtId="164"/>

When I read this cell using 3.8 beta 5...

I create my DataFormatter and FormulaEvaluator

     DataFormatter formatter = new DataFormatter();
     FormulaEvaluator evaluator =
workbook.getCreationHelper().createFormulaEvaluator();

Then I obtain the cell in question...

     cell.getCellStyle().getDataFormatString() returns "000\-00000\-000" 
     cell.getCellStyle().getDataFormat() returns 164

Correct and correct. But...

     formatter.formatCellValue(cell) returns "12312345123"

Any ideas? Thanks again. 

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5489483.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
I was able to confirm that indeed there was a format with that pattern and it
had the appropriate short identifier of 164.

It must be something wrong in my code, I just can't figure out what it is.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5486738.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by tcole6 <tc...@gmail.com>.
Thank you for the reply.

The following code:

System.out.println("Cell " + getEquivalentColumnName(j) + ":" + (i + 1) + "
style = " + cell.getCellStyle().getDataFormatString());

Results in (for example):

Cell H:430 style = 000\-00000\-000

So that looks correct. However the following code:

DataFormatter formatter = new DataFormatter();
FormulaEvaluator evaluator =
workbook.getCreationHelper().createFormulaEvaluator();
...
if (cell.getCellType() == Cell.CELL_TYPE_FORMULA) {
     if (cell.getCachedFormulaResultType() == Cell.CELL_TYPE_ERROR) { 
          data[i][j] = "#VALUE!"; 
     } 
     else { 
          data[i][j] = formatter.formatCellValue(cell, evaluator); 
    } 
}
else {
     try {
          data[i][j] = formatter.formatCellValue(cell);
     }
     catch(Exception e) {
          System.out.println("Cell " + getEquivalentColumnName(j) + ":" + i
+ " could not be formatted. Switched to string type.");
          cell.setCellType(Cell.CELL_TYPE_STRING);
          data[i][j] = cell.getRichStringCellValue().toString();
     }
}

With a cell value of 12312345123, results in data[i][j] == "12312345123" and
not "123-12345-123".

I hope that helps clarify the issue.

Should I be searching to see if a format exists for 000\-00000\-000 and if
not then adding it prior to attempting to format? It would appear that the
format must be there as I am getting the appropriate format back.

Any help would be appreciated.





--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Reading-cell-values-that-have-a-custom-format-tp5486315p5486553.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reading cell values that have a custom format

Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 15 Feb 2012, tcole6 wrote:
> In my current example, I have a column of values that each contain exactly
> 11 digits. The cells in this column have a custom format applied to them of
> "000-00000-000". In the .xlsx file I can see the numberFormat
> "000\-00000\-000" with an id of 164 in the styles.xml file.
>
> When I read the cell value using DataFormatter.formatCellValue(cell), I am
> returned the numeric value, not the formatted value. In my example I am
> getting "12312345123" and not "123-12345-123".

I'd expect that to work properly. If you read the cell style for that 
cell, can you verify that the correct style and formatting really is 
applied to it?

Also, have you tried with POI 3.8 beta 5, to make sure it's not a bug that 
has been fixed?

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org