You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by David North <dt...@corefiling.co.uk> on 2014/06/02 13:55:32 UTC

Floating-point behaviour - difference between POI and Excel?

Consider the attached xlsx spreadsheet.

The value in sheet 1, cell A1, looks like this in the XML:

<v>62408.000000000007</v>

This is stripped down from a real spreadsheet supplied by one of my
users. I think that value was the result of a calculation.

Now, when opening the attached sheet in Excel 2010 or 2013, the UI shows
the value in cell A1 as:

62408

No matter how I format it, the UI refuses to acknowledge or display the
.000000000007 in the underlying file.

However, when reading this cell using POI, getNumericCellValue returns

62408.00000000001

Is this correct?

If I enter the formula =62408-A1 in Excel, the value serialized is
<v>0</v>, which strongly suggests Excel is reading 62408 as the closest
number to the serialized data. So it would be ideal if POI did likewise.

Thanks,
David

-- 
David North, Technical Lead, CoreFiling Limited
http://www.corefiling.com
Phone: +44-1865-203192


Re: Floating-point behaviour - difference between POI and Excel?

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 2 Jul 2014, David North wrote:
> "Precision in Excel means that any number exceeding 15 digits is stored
> and shown with only 15 digits of precision. Those digits can be in any
> combination before or after the decimal point. Any digits to the right
> of the 15^th digit will be zeros."
>
> So it looks like I could make my extracted numbers match what the user
> sees in Excel by dropping significant digits beyond the 15th. If we go
> down that route I'll consider submitting a patch, or at least some
> JavaDoc pointing out the trap.

If you could provide such a patch, that'd be great!

Thanks
Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Floating-point behaviour - difference between POI and Excel?

Posted by David North <dt...@corefiling.com>.
On 04/06/14 13:26, David North wrote:
> On 02/06/14 18:21, David North wrote:
>> On 02/06/14 17:01, Nick Burch wrote:
>>>>>> No matter how I format it, the UI refuses to acknowledge or display
>>>>>> the
>>>>>> .000000000007 in the underlying file.
>>>>> What about if you set a format string of something like
>>>>> 0.000000000000000 ? (I suspect the 62408 displayed is due to a format
>>>>> string of 0 or General)
>>>> Setting that format string results in 62408.000000000000000 - adding
>>>> more zeroes to the format string just adds more zeroes to the displayed
>>>> number, with no sign of any non-zero digits after the decimal point.
>>> Hmm, interesting
>>>
>>>>> Also, if you save the file as a .xls, does that change how both POI
>>>>> and Excel see it? (.xlsx and .xls use different ways of serialising
>>>>> numeric values to disk)
>>>> No, Excel 2010/2013 and POI both behave the same when the example is
>>>> converted to .xls - including DataFormatter giving "62408".
>>>>
>>>> I think I'll have to resort to reading the office formats spec to try
>>>> and find the official word on how applications are supposed to deal with
>>>> reading values outside the range of double-precision floating point...
>>> FWIW, the relevent code in XSSFCell is simply:
>>>
>>>                       return Double.parseDouble(_cell.getV());
>>>
>>> So we're relying on the default Java behaviour
>> Indeed, which does not seem unreasonable as Java's behaviour is closely
>> related to IEEE 754, which is also what Excel claims to use.
>>
>> I've posted here to see if someone from Microsoft can give me a
>> definitive answer:
>>
>> http://openxmldeveloper.org/discussions/formats/f/14/p/7774/163115.aspx
>>
>> Wading through the Office Open XML Specifications has failed to throw up
>> a definitive reference on number parsing.
>>
> http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%201st%20edition%20Part%204%20(PDF).zip
>
> Section 3.17.5.2 on page 2531:
>
> "[The value space of numbers in SpreadsheetML] is patterned after the
> IEEE double-precision 64-bit floating-point type"
>
> So it looks like POI is right and Excel is wrong. LibreOffice is
> consistent with Excel, so I'll try and find out the justification for that.

Further digging reveals this article on the Microsoft site:

http://office.microsoft.com/en-us/excel-help/change-formula-recalculation-iteration-or-precision-HA102749052.aspx?CTT=1#_Toc305944079

"Precision in Excel means that any number exceeding 15 digits is stored
and shown with only 15 digits of precision. Those digits can be in any
combination before or after the decimal point. Any digits to the right
of the 15^th digit will be zeros."

So it looks like I could make my extracted numbers match what the user
sees in Excel by dropping significant digits beyond the 15th. If we go
down that route I'll consider submitting a patch, or at least some
JavaDoc pointing out the trap.

David

-- 
David North, Technical Lead, CoreFiling Limited
http://www.corefiling.com
Phone: +44-1865-203192


Re: Floating-point behaviour - difference between POI and Excel?

Posted by David North <dt...@corefiling.co.uk>.
On 02/06/14 18:21, David North wrote:
> On 02/06/14 17:01, Nick Burch wrote:
>>>>> No matter how I format it, the UI refuses to acknowledge or display
>>>>> the
>>>>> .000000000007 in the underlying file.
>>>> What about if you set a format string of something like
>>>> 0.000000000000000 ? (I suspect the 62408 displayed is due to a format
>>>> string of 0 or General)
>>> Setting that format string results in 62408.000000000000000 - adding
>>> more zeroes to the format string just adds more zeroes to the displayed
>>> number, with no sign of any non-zero digits after the decimal point.
>> Hmm, interesting
>>
>>>> Also, if you save the file as a .xls, does that change how both POI
>>>> and Excel see it? (.xlsx and .xls use different ways of serialising
>>>> numeric values to disk)
>>> No, Excel 2010/2013 and POI both behave the same when the example is
>>> converted to .xls - including DataFormatter giving "62408".
>>>
>>> I think I'll have to resort to reading the office formats spec to try
>>> and find the official word on how applications are supposed to deal with
>>> reading values outside the range of double-precision floating point...
>> FWIW, the relevent code in XSSFCell is simply:
>>
>>                       return Double.parseDouble(_cell.getV());
>>
>> So we're relying on the default Java behaviour
> Indeed, which does not seem unreasonable as Java's behaviour is closely
> related to IEEE 754, which is also what Excel claims to use.
>
> I've posted here to see if someone from Microsoft can give me a
> definitive answer:
>
> http://openxmldeveloper.org/discussions/formats/f/14/p/7774/163115.aspx
>
> Wading through the Office Open XML Specifications has failed to throw up
> a definitive reference on number parsing.
>

http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%201st%20edition%20Part%204%20(PDF).zip

Section 3.17.5.2 on page 2531:

"[The value space of numbers in SpreadsheetML] is patterned after the
IEEE double-precision 64-bit floating-point type"

So it looks like POI is right and Excel is wrong. LibreOffice is
consistent with Excel, so I'll try and find out the justification for that.

David

-- 
David North, Technical Lead, CoreFiling Limited
http://www.corefiling.com
Phone: +44-1865-203192


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Floating-point behaviour - difference between POI and Excel?

Posted by David North <dt...@corefiling.co.uk>.
On 02/06/14 17:01, Nick Burch wrote:
>>>> No matter how I format it, the UI refuses to acknowledge or display
>>>> the
>>>> .000000000007 in the underlying file.
>>>
>>> What about if you set a format string of something like
>>> 0.000000000000000 ? (I suspect the 62408 displayed is due to a format
>>> string of 0 or General)
>>
>> Setting that format string results in 62408.000000000000000 - adding
>> more zeroes to the format string just adds more zeroes to the displayed
>> number, with no sign of any non-zero digits after the decimal point.
>
> Hmm, interesting
>
>>> Also, if you save the file as a .xls, does that change how both POI
>>> and Excel see it? (.xlsx and .xls use different ways of serialising
>>> numeric values to disk)
>>
>> No, Excel 2010/2013 and POI both behave the same when the example is
>> converted to .xls - including DataFormatter giving "62408".
>>
>> I think I'll have to resort to reading the office formats spec to try
>> and find the official word on how applications are supposed to deal with
>> reading values outside the range of double-precision floating point...
>
> FWIW, the relevent code in XSSFCell is simply:
>
>                       return Double.parseDouble(_cell.getV());
>
> So we're relying on the default Java behaviour

Indeed, which does not seem unreasonable as Java's behaviour is closely
related to IEEE 754, which is also what Excel claims to use.

I've posted here to see if someone from Microsoft can give me a
definitive answer:

http://openxmldeveloper.org/discussions/formats/f/14/p/7774/163115.aspx

Wading through the Office Open XML Specifications has failed to throw up
a definitive reference on number parsing.

-- 
David North, Technical Lead, CoreFiling Limited
http://www.corefiling.com
Phone: +44-1865-203192


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Floating-point behaviour - difference between POI and Excel?

Posted by Nick Burch <ap...@gagravarr.org>.
>>> No matter how I format it, the UI refuses to acknowledge or display the
>>> .000000000007 in the underlying file.
>>
>> What about if you set a format string of something like
>> 0.000000000000000 ? (I suspect the 62408 displayed is due to a format
>> string of 0 or General)
>
> Setting that format string results in 62408.000000000000000 - adding
> more zeroes to the format string just adds more zeroes to the displayed
> number, with no sign of any non-zero digits after the decimal point.

Hmm, interesting

>> Also, if you save the file as a .xls, does that change how both POI
>> and Excel see it? (.xlsx and .xls use different ways of serialising
>> numeric values to disk)
>
> No, Excel 2010/2013 and POI both behave the same when the example is
> converted to .xls - including DataFormatter giving "62408".
>
> I think I'll have to resort to reading the office formats spec to try
> and find the official word on how applications are supposed to deal with
> reading values outside the range of double-precision floating point...

FWIW, the relevent code in XSSFCell is simply:

                       return Double.parseDouble(_cell.getV());

So we're relying on the default Java behaviour

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Floating-point behaviour - difference between POI and Excel?

Posted by David North <dt...@corefiling.co.uk>.
On 02/06/14 13:09, Nick Burch wrote:
> On Mon, 2 Jun 2014, David North wrote:
>> The value in sheet 1, cell A1, looks like this in the XML:
>>
>> <v>62408.000000000007</v>
>
> Isn't floating point fun?
>
That's one word for it!
>> Now, when opening the attached sheet in Excel 2010 or 2013, the UI shows
>> the value in cell A1 as:
>>
>> 62408
>>
>> No matter how I format it, the UI refuses to acknowledge or display the
>> .000000000007 in the underlying file.
>
> What about if you set a format string of something like
> 0.000000000000000 ? (I suspect the 62408 displayed is due to a format
> string of 0 or General)

Setting that format string results in 62408.000000000000000 - adding
more zeroes to the format string just adds more zeroes to the displayed
number, with no sign of any non-zero digits after the decimal point.

>
>> However, when reading this cell using POI, getNumericCellValue returns
>>
>> 62408.00000000001
>>
>> Is this correct?
>
> Quite possibly. What does DataFormatter give you if you ask for the
> formatted String representation of the cell?

new DataFormatter().formatCellValue(theCell) gives the string 62408
>
> Also, if you save the file as a .xls, does that change how both POI
> and Excel see it? (.xlsx and .xls use different ways of serialising
> numeric values to disk)

No, Excel 2010/2013 and POI both behave the same when the example is
converted to .xls - including DataFormatter giving "62408".

I think I'll have to resort to reading the office formats spec to try
and find the official word on how applications are supposed to deal with
reading values outside the range of double-precision floating point...

David

-- 
David North, Technical Lead, CoreFiling Limited
http://www.corefiling.com
Phone: +44-1865-203192


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Floating-point behaviour - difference between POI and Excel?

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 2 Jun 2014, David North wrote:
> The value in sheet 1, cell A1, looks like this in the XML:
>
> <v>62408.000000000007</v>

Isn't floating point fun?

> Now, when opening the attached sheet in Excel 2010 or 2013, the UI shows
> the value in cell A1 as:
>
> 62408
>
> No matter how I format it, the UI refuses to acknowledge or display the
> .000000000007 in the underlying file.

What about if you set a format string of something like 0.000000000000000 
? (I suspect the 62408 displayed is due to a format string of 0 or 
General)

> However, when reading this cell using POI, getNumericCellValue returns
>
> 62408.00000000001
>
> Is this correct?

Quite possibly. What does DataFormatter give you if you ask for the 
formatted String representation of the cell?

Also, if you save the file as a .xls, does that change how both POI and 
Excel see it? (.xlsx and .xls use different ways of serialising numeric 
values to disk)

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org