
Posted to users@jena.apache.org by "Dr. Chavdar Ivanov" <ch...@outlook.hu> on 2020/08/18 04:48:19 UTC

Float comparison

Hello



I posted the message below to the TopBraid users mailing list, where it was already clarified that, because sh:equals is based on RDF node equality, values such as "1.0"^^xsd:float and "1"^^xsd:float count as distinct. I am keeping that part for the interest of others on this list.



On SPARQL float comparison, however, I was advised to ask on this mailing list for other opinions.

I understand that SPARQL comparison is mathematically based, so 1.0 should be equal to 1. However, in item 2 below you will see the numbers I compared, and the results confuse me. Note that in the data graph the 2 compared properties are typed literals with datatype xsd:float.

I wanted to know what the precision is when floats are compared. So I have 2 questions:

*       What is the precision? Is it the 6th decimal place, and is it OK to compare different forms of float, e.g. when one is in scientific form?
*       Why do I get a wrong comparison result for bigger values such as 100123456.1 and 100123459, which are reported as the same?



Best regards

Chavdar





========





Dear all,



I have a very basic question...

I need to compare literals that are floats and tried two ways: 1) using sh:equals to compare 2 properties and 2) using SPARQL with a FILTER on != to find values that differ.



For the filter I tried using

FILTER (xsd:float(?value1)!=xsd:float(?value1)).

or

FILTER (?value1!=?value1).

Both give the same outcome.



Below I listed a summary of the tests I did



I think sh:equals treats the literals as strings even though they are floats. It also reports each mismatch twice. I think this is in line with the SHACL spec, although I did not check whether sh:equals ignores the datatype.



However, in some cases the result from SPARQL is rather strange. It looks like the precision is 10^-6, but for big numbers and when the scientific form of a float is used we get something different.



What rule is followed to decide that two values differ?

If I use the Google calculator:

100123456.1-100.123459E+06=-2.90000000596



Normally it should be OK to compare different forms of float.





1) using sh:equals in the property shape

Value 1      ; Value 2         ; Comparison result
1.123456     ; 1.123456        ; same
1.1234560    ; 1.1234561       ; different (sh:equals reports it twice)
31.1234560   ; 31.1234561      ; different (sh:equals reports it twice)
30           ; 30.0000001      ; different (sh:equals reports it twice)
30           ; 30.000001       ; different (sh:equals reports it twice)
100123456.0  ; 100123456.1     ; different (sh:equals reports it twice)
100123456.0  ; 100123456.0     ; same
100123456    ; 100.123456E6    ; different (sh:equals reports it twice)
100123456    ; 100.123456E+06  ; different (sh:equals reports it twice)
-0.123456789 ; -123.456789E-3  ; different (sh:equals reports it twice)
-0.123456789 ; -123.456789E-03 ; different (sh:equals reports it twice)
100123456.1  ; 100.123456E+06  ; different (sh:equals reports it twice)
100123456.1  ; 100.123459E+06  ; different (sh:equals reports it twice)
100123456.1  ; 100123459       ; different (sh:equals reports it twice)
100123456.1  ; 100123459.0     ; different (sh:equals reports it twice)



2) using SPARQL (in the property shape)

Value 1      ; Value 2         ; Comparison result
1.123456     ; 1.123456        ; same
1.1234560    ; 1.1234561       ; different
31.1234560   ; 31.1234561      ; different
30           ; 30.0000001      ; same
30           ; 30.000001       ; different
100123456.0  ; 100123456.1     ; same
100123456.0  ; 100123456.0     ; same
100123456    ; 100.123456E6    ; same
100123456    ; 100.123456E+06  ; same
-0.123456789 ; -123.456789E-3  ; same
-0.123456789 ; -123.456789E-03 ; same
100123456.1  ; 100.123456E+06  ; same
100123456.1  ; 100.123459E+06  ; same
100123456.1  ; 100123459       ; same
100123456.1  ; 100123459.0     ; same
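
Editorial note: the SPARQL results listed here are consistent with how IEEE 754 single precision spaces its representable values. The sketch below is plain Java, not Jena or SHACL code, and the class and method names (FloatSpacing, sameFloat, spacingAt) are made up for illustration. It parses two lexical forms as float, reports whether they land on the same value, and shows the gap between neighbouring floats at that magnitude.

```java
class FloatSpacing {
    /** True if both lexical forms round to the same single-precision value. */
    static boolean sameFloat(String lex1, String lex2) {
        return Float.parseFloat(lex1) == Float.parseFloat(lex2);
    }

    /** Gap between the float nearest to the lexical form and its next larger neighbour. */
    static float spacingAt(String lex) {
        return Math.ulp(Float.parseFloat(lex));
    }

    public static void main(String[] args) {
        // Near 1e8 neighbouring floats are 8 apart, so both inputs round to 100123456.
        System.out.println(sameFloat("100123456.1", "100123459"));  // true
        System.out.println(spacingAt("100123456.1"));               // 8.0

        // Near 30 the gap is about 1.9e-6: a 1e-7 offset is absorbed, a 1e-6 offset is not.
        System.out.println(sameFloat("30", "30.0000001"));          // true
        System.out.println(sameFloat("30", "30.000001"));           // false

        // Plain and scientific notation denote the same value.
        System.out.println(sameFloat("100123456", "100.123456E+06")); // true
    }
}
```

So the "precision" is not a fixed number of decimal places: it is a fixed number of significant binary digits, and the absolute gap grows with the magnitude of the value.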



Best regards

Chavdar

RE: Float comparison

Posted by "Dr. Chavdar Ivanov" <ch...@outlook.hu>.

Good catch, but I only mixed this up in the email.
In the actual query the filter is OK: the 2 variables are different.
FILTER (xsd:decimal(?value1)!=xsd:decimal(?value2)).


-----Original Message-----
From: Marco Neumann <ma...@gmail.com>
Sent: Tuesday, 18 August, 2020 23:44
To: users@jena.apache.org
Subject: Re: Float comparison

take a look at your filter again. seems like you have a typo in your query for the variable name

On Tue, Aug 18, 2020 at 10:17 PM Dr. Chavdar Ivanov <ch...@outlook.hu>
wrote:

> Andy, Richard,
> Thank you for the feedback.
>
> In the graph I have the 2 values as xsd:float, so this is how the data
> arrives.
>
> In the SPARQL query I tried to cast the float to decimal by using
> FILTER (xsd:decimal(?value1)!=xsd:decimal(?value1)).
>
> I am not sure if this is the correct way, but I am now seeing a
> difference in the comparison result.
>
> 0.1001244561 is different from 0.1001234590, which is OK.
> But these are reported as the same: 100123456.1     and  100123459.0
> If I get the value before the comparison is executed, the xsd:decimal
> of the two values appears to be the same, 100123456.0, so this is why
> != does not report the difference.
> Here the decimal does not seem to help, but I guess this falls in the
> same category: large absolute values are less precise. So the same
> effect as for xsd:float.
>
> Best regards
> Chavdar
>
>
>
> -----Original Message-----
> From: Andy Seaborne <an...@apache.org>
> Sent: Tuesday, 18 August, 2020 19:07
> To: users@jena.apache.org
> Subject: Re: Float comparison
>
>
>
> On 18/08/2020 10:31, Richard Cyganiak wrote:
> > The xsd:float datatype represents IEEE 754 single-precision floating
> > point numbers.
> >
> > As with any floating-point datatype, the precision depends on the
> > size of the number. Numbers close to zero are very precise. Numbers
> > with a large absolute value (large positive or large negative) are
> > less precise. For the gory details see for example here:
> >
> > https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Precision_limitations_on_decimal_values_in_[1,_16777216]
> >
> > There is rarely a good reason to use xsd:float in RDF. xsd:double is
> much more precise at a small increase of storage cost (4 more bytes,
> which is negligible given the total size of an RDF triple).
> xsd:decimal provides arbitrary precision (in theory), but is more
> expensive in storage and computation.
> >
> > My general view is that if storage size and performance of
> > mathematical
> computations are a major concern for the application, RDF is probably
> not the best choice—RDF optimises for other concerns. Therefore the
> best choice for representing non-integer numbers in RDF is usually
> xsd:decimal—more expensive, but no issues with precision.
> >
> > Richard
>
> xsd:decimal can record any decimal precision but division may lose
> precision - otherwise "1/3" is infinite storage.
>
> Jena uses 24 digit precision for division for inexact results like 1/3.
>
> >
> >
> >> On 18 Aug 2020, at 05:48, Dr. Chavdar Ivanov <ch...@outlook.hu>
> wrote:
> >>
> >> Hello
> >>
> >>
> >>
> >> I posted the message below to the TopBraid users mailing list and
> >> already clarified that as sh:equals is based on RDF node equality,
> >> values such as "1.0"^^xsd:float and "1"^^xsd:float count as distinct.
> >> So I am keeping this for the interest of others in the list
>
> SPARQL has both comparisons.
>
> The "sameTerm()" operator for RDF term equality, and SPARQL "=" for
> value comparison (by op:numeric-equal).
>
>      Andy
>
> >
>


--


---
Marco Neumann
KONA

Re: Float comparison

Posted by Marco Neumann <ma...@gmail.com>.

take a look at your filter again. seems like you have a typo in your query
for the variable name

On Tue, Aug 18, 2020 at 10:17 PM Dr. Chavdar Ivanov <ch...@outlook.hu>
wrote:

> Andy, Richard,
> Thank you for the feedback.
>
> In the graph I have the 2 values as xsd:float, so this is how the data
> arrives.
>
> In the SPARQL query I tried to cast the float to decimal by using
> FILTER (xsd:decimal(?value1)!=xsd:decimal(?value1)).
>
> I am not sure if this is the correct way, but I am now seeing a difference in
> the comparison result.
>
> 0.1001244561 is different from 0.1001234590, which is OK.
> But these are reported as the same: 100123456.1     and  100123459.0
> If I get the value before the comparison is executed, the xsd:decimal of
> the two values appears to be the same, 100123456.0, so this is why != does
> not report the difference.
> Here the decimal does not seem to help, but I guess this falls in the same
> category: large absolute values are less precise. So the same effect as for
> xsd:float.
>
> Best regards
> Chavdar
>
>
>
> -----Original Message-----
> From: Andy Seaborne <an...@apache.org>
> Sent: Tuesday, 18 August, 2020 19:07
> To: users@jena.apache.org
> Subject: Re: Float comparison
>
>
>
> On 18/08/2020 10:31, Richard Cyganiak wrote:
> > The xsd:float datatype represents IEEE 754 single-precision floating
> point numbers.
> >
> > As with any floating-point datatype, the precision depends on the size
> of the number. Numbers close to zero are very precise. Numbers with a large
> absolute value (large positive or large negative) are less precise. For the
> gory details see for example here:
> >
> > https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Precision_limitations_on_decimal_values_in_[1,_16777216]
> >
> > There is rarely a good reason to use xsd:float in RDF. xsd:double is
> much more precise at a small increase of storage cost (4 more bytes, which
> is negligible given the total size of an RDF triple). xsd:decimal provides
> arbitrary precision (in theory), but is more expensive in storage and
> computation.
> >
> > My general view is that if storage size and performance of mathematical
> computations are a major concern for the application, RDF is probably not
> the best choice—RDF optimises for other concerns. Therefore the best choice
> for representing non-integer numbers in RDF is usually xsd:decimal—more
> expensive, but no issues with precision.
> >
> > Richard
>
> xsd:decimal can record any decimal precision but division may lose
> precision - otherwise "1/3" is infinite storage.
>
> Jena uses 24 digit precision for division for inexact results like 1/3.
>
> >
> >
> >> On 18 Aug 2020, at 05:48, Dr. Chavdar Ivanov <ch...@outlook.hu>
> wrote:
> >>
> >> Hello
> >>
> >>
> >>
> >> I posted the message below to the TopBraid users mailing list and
> >> already clarified that as sh:equals is based on RDF node equality,
> >> values such as "1.0"^^xsd:float and "1"^^xsd:float count as distinct.
> >> So I am keeping this for the interest of others in the list
>
> SPARQL has both comparisons.
>
> The "sameTerm()" operator for RDF term equality, and SPARQL "=" for value
> comparison (by op:numeric-equal).
>
>      Andy
>
> >
>


-- 


---
Marco Neumann
KONA

Re: Float comparison

Posted by Andy Seaborne <an...@apache.org>.


On 19/08/2020 10:44, Marco Neumann wrote:
> Andy, yes I would agree xsd:float can lead to some funky behavior here due
> to precision. While you are at it this could also explain why ?y is bound
> to ?x in the example below on blazegraph but still "correctly" mapped in
> Jena. Simply a bug in wikidata/blazegraph that doesn't throw an error and
> is not caught on the server side.

Blazegraph has inlined the value as a float so the lexical form is lost.

In Jena, in memory, it is keeping the lexical form around but as soon as 
you touch it that's lost.

See the example:

qexpr '"0.1111234521234567"^^xsd:float+0'

qexpr '"0.1111234591234567"^^xsd:float+0'


What happens is:

In F&O, arithmetic is done in one of integer, decimal, float or double.
Anything else is cast to one of those first.

1/ Get float value of the LHS - precision lost.
2/ Cast 0 to float, the least promotion
3/ Do floating point add.

You now have a FP number. It has about 6 digits of precision.

So X+0 != X.

Don't compare floats or doubles for exact equality!
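
Editorial note: one common alternative to exact comparison is to treat two values as equal when they differ by no more than a few units in the last place (ULPs). The following is a minimal sketch of that idea in plain Java; the class name ApproxFloat, the method nearlyEqual and the tolerance parameter maxUlps are hypothetical and are not part of Jena or SPARQL. The right tolerance depends on the data and is an assumption the caller has to make.

```java
class ApproxFloat {
    /**
     * True if a and b differ by at most maxUlps units in the last place,
     * measured at the larger of the two magnitudes.
     */
    static boolean nearlyEqual(float a, float b, int maxUlps) {
        if (Float.isNaN(a) || Float.isNaN(b)) {
            return false;                       // NaN is never equal to anything
        }
        if (a == b) {
            return true;                        // identical values, +0.0/-0.0, equal infinities
        }
        if (Float.isInfinite(a) || Float.isInfinite(b)) {
            return false;                       // an infinity is only equal to itself
        }
        float scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= maxUlps * Math.ulp(scale);
    }

    public static void main(String[] args) {
        float sum = 0f;
        for (int i = 0; i < 10; i++) {
            sum += 0.1f;                        // accumulates rounding error
        }
        System.out.println(sum == 1.0f);        // exact comparison
        System.out.println(nearlyEqual(sum, 1.0f, 4)); // tolerant comparison
    }
}
```

A relative tolerance like this scales with the magnitude of the operands, which matches the way float precision itself scales.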


Try this in Java:

         float f =  12345.6789f ;
         System.out.println(f);

==>
         12345.679

6 digits of precision then noise.

Now try TDB2 with

PREFIX : <http://example/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

:s :p "12345.678987654321"^^xsd:float .
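
Editorial note: the effect of keeping only the float value and not the lexical form can be imitated in plain Java. This is only an analogy under the assumption that a store inlines the value; it is not TDB2 or Blazegraph code, and the names LexicalFormLoss and roundTrip are invented for the example.

```java
class LexicalFormLoss {
    /** Parse a lexical form as a float, then print the value back as text. */
    static String roundTrip(String lexicalForm) {
        float value = Float.parseFloat(lexicalForm); // precision beyond single precision is dropped here
        return Float.toString(value);                // shortest text that identifies this float
    }

    public static void main(String[] args) {
        String original = "12345.678987654321";
        String stored = roundTrip(original);
        System.out.println(stored);                  // 12345.679
        System.out.println(stored.equals(original)); // false: the original digits are gone
    }
}
```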

     Andy

> 
> PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
> 
> SELECT ?x ?y ?z WHERE{
> values ?x { "100123456.01"^^xsd:float }
> values ?y { "100123459.01"^^xsd:float }
> values ?z { "100123451.01"^^xsd:float}
> }
> 
> Blazegraph
> 
> https://query.wikidata.org/#PREFIX%20xsd%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0A%0ASELECT%20%3Fx%20%3Fy%20%3Fz%20WHERE%7B%0Avalues%20%3Fx%20%7B%20%22100123456.01%22%5E%5Exsd%3Afloat%20%7D%0Avalues%20%3Fy%20%7B%20%22100123459.01%22%5E%5Exsd%3Afloat%20%7D%0Avalues%20%3Fz%20%7B%20%22100123451.01%22%5E%5Exsd%3Afloat%7D%0A%7D
> 
> 
> x                      y                      z
> 100123456.01 100123456.01 100123451.01
> 
> Jena 3.15
> 
> http://www.lotico.com:3030/lotico/sparql?query=PREFIX+xsd%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0A%0D%0ASELECT+%3Fx+%3Fy+%3Fz+WHERE%7B%0D%0Avalues+%3Fx+%7B+%22100123456.01%22%5E%5Exsd%3Afloat+%7D%0D%0Avalues+%3Fy+%7B+%22100123459.01%22%5E%5Exsd%3Afloat+%7D%0D%0Avalues+%3Fz+%7B+%22100123451.01%22%5E%5Exsd%3Afloat%7D%0D%0A%7D&output=text
> 
> 
> x                                             y
>                 z
> 100123456.01                     100123459.01
> 100123451.01
> 
> 
> On Wed, Aug 19, 2020 at 10:13 AM Andy Seaborne <an...@apache.org> wrote:
> 
>>
>>
>> On 18/08/2020 22:17, Dr. Chavdar Ivanov wrote:
>>> Andy, Richard,
>>> Thank you for the feedback.
>>>
>>> In the graph  I have the 2 values as xsd:float so this is how the data
>> is coming
>>>
>>> In the SPARQL query I tried to cast the float to decimal by using
>>> FILTER (xsd:decimal(?value1)!=xsd:decimal(?value1)).
>>>
>>> I am not sure if this is correct way, but I am now seeing a difference
>> in the comparison result
>>>
>>> 0.1001244561 Is different from 0.1001234590 which is OK
>>            ^^ typo?
>>
>>
>>> But these are reported as same 100123456.1     and  100123459.0
>>
>>
>> 100123456.1 is not a floating point number. It has more precision than
>> xsd:float can represent.
>>
>> It's "1.00123456E8"^^xsd:float
>>
>>
>> (Please copy and paste expressions into email.)
>>
>> xsd:decimal(?value1)
>>
>> is:
>> evaluate ?value1 to get an xsd:float.
>>
>> which is
>>
>> "0.1001244561"^^xsd:float
>>
>> Using Jena's expression evaluator:
>>
>> qexpr '"0.1001244561"^^xsd:float+0'
>>    ==>
>> "0.100124456"^^xsd:float
>>
>> See? Already lost precision.
>>
>> Then turn it into a decimal.
>>
>> it is different to:
>> xsd:decimal(str(?value1))
>>
>> which takes the lexical form, not the floating point value, of ?value1.
>>
>>> If I get the value before the comparison is executed the xsd:decimal of
>> the two values appears to be the same 100123456.0 so this is why != does
>> not reports the difference.
>>> Here the decimal does not seem to help,
>>
>> Because precision was lost making the decimal.  Start with a decimal.
>>
>> xsd:decimal("0.1001244561")
>>    or "0.1001244561"^^xsd:decimal
>>    or 0.1001244561   (in Turtle and SPARQL).
>>
>>> but I guess this falls in the same category that large absolute values
>> are less precise. So same effect as for xsd:float.
>>>
>>> Best regards
>>> Chavdar
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Andy Seaborne <an...@apache.org>
>>> Sent: Tuesday, 18 August, 2020 19:07
>>> To: users@jena.apache.org
>>> Subject: Re: Float comparison
>>>
>>>
>>>
>>> On 18/08/2020 10:31, Richard Cyganiak wrote:
>>>> The xsd:float datatype represents IEEE 754 single-precision floating
>> point numbers.
>>>>
>>>> As with any floating-point datatype, the precision depends on the size
>> of the number. Numbers close to zero are very precise. Numbers with a large
>> absolute value (large positive or large negative) are less precise. For the
>> gory details see for example here:
>>>>
>>>> https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Precision_limitations_on_decimal_values_in_[1,_16777216]
>>>>
>>>> There is rarely a good reason to use xsd:float in RDF. xsd:double is
>> much more precise at a small increase of storage cost (4 more bytes, which
>> is negligible given the total size of an RDF triple). xsd:decimal provides
>> arbitrary precision (in theory), but is more expensive in storage and
>> computation.
>>>>
>>>> My general view is that if storage size and performance of mathematical
>> computations are a major concern for the application, RDF is probably not
>> the best choice—RDF optimises for other concerns. Therefore the best choice
>> for representing non-integer numbers in RDF is usually xsd:decimal—more
>> expensive, but no issues with precision.
>>>>
>>>> Richard
>>>
>>> xsd:decimal can record any decimal precision but division may lose
>>> precision - otherwise "1/3" is infinite storage.
>>>
>>> Jena uses 24 digit precision for division for inexact results like 1/3.
>>>
>>>>
>>>>
>>>>> On 18 Aug 2020, at 05:48, Dr. Chavdar Ivanov <ch...@outlook.hu>
>> wrote:
>>>>>
>>>>> Hello
>>>>>
>>>>>
>>>>>
>>>>> I posted the message below to the TopBraid users mailing list and
>>>>> already clarified that as sh:equals is based on RDF node equality,
>>>>> values such as "1.0"^^xsd:float and "1"^^xsd:float count as distinct.
>>>>> So I am keeping this for the interest of others in the list
>>>
>>> SPARQL has both comparisons.
>>>
>>> The "sameTerm()" operator for RDF term equality, and SPARQL "=" for value
>>> comparison (by op:numeric-equal).
>>>
>>>        Andy
>>>
>>>>
>>
> 
>

RE: Float comparison

Posted by "Dr. Chavdar Ivanov" <ch...@outlook.hu>.

Andy, Richard,
Thanks a lot.
Indeed, starting from xsd:float is what causes the problem, together with the fact that it is better to have the float in scientific form.
I think all these explanations also help a lot in setting requirements on the data source side.

Best regards
Chavdar
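
Editorial note: the difference discussed in this thread between xsd:decimal(?value1), which converts the float value, and xsd:decimal(str(?value1)), which converts the lexical form, has a close analogue in Java's BigDecimal. The sketch below is an analogy only and not how Jena implements the casts; DecimalFromFloat, viaFloat and viaLexical are hypothetical names.

```java
import java.math.BigDecimal;

class DecimalFromFloat {
    /** Decimal built from the float value: digits the float could not hold are already lost. */
    static BigDecimal viaFloat(String lexicalForm) {
        return new BigDecimal(Float.parseFloat(lexicalForm));
    }

    /** Decimal built directly from the text: every digit is kept. */
    static BigDecimal viaLexical(String lexicalForm) {
        return new BigDecimal(lexicalForm);
    }

    public static void main(String[] args) {
        String a = "100123456.1";
        String b = "100123459.0";

        // Both floats round to 100123456, so the decimals compare as equal.
        System.out.println(viaFloat(a).compareTo(viaFloat(b)) == 0);     // true

        // Starting from the text keeps the difference.
        System.out.println(viaLexical(a).compareTo(viaLexical(b)) == 0); // false
    }
}
```

This is why casting to decimal after the value has already been read as a float cannot restore the lost digits; the data has to be decimal (or at least double) from the start.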

Re: Float comparison

Posted by Marco Neumann <ma...@gmail.com>.

Andy, yes, I would agree that xsd:float can lead to some funky behavior here
due to precision. While you are at it, this could also explain why ?y comes
back with the value of ?x in the example below on Blazegraph but is still
"correctly" mapped in Jena. It looks like simply a bug in Wikidata/Blazegraph
that doesn't throw an error and is not caught on the server side.

PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>

SELECT ?x ?y ?z WHERE{
values ?x { "100123456.01"^^xsd:float }
values ?y { "100123459.01"^^xsd:float }
values ?z { "100123451.01"^^xsd:float}
}

Blazegraph

https://query.wikidata.org/#PREFIX%20xsd%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0A%0ASELECT%20%3Fx%20%3Fy%20%3Fz%20WHERE%7B%0Avalues%20%3Fx%20%7B%20%22100123456.01%22%5E%5Exsd%3Afloat%20%7D%0Avalues%20%3Fy%20%7B%20%22100123459.01%22%5E%5Exsd%3Afloat%20%7D%0Avalues%20%3Fz%20%7B%20%22100123451.01%22%5E%5Exsd%3Afloat%7D%0A%7D


x            y            z
100123456.01 100123456.01 100123451.01

Jena 3.15

http://www.lotico.com:3030/lotico/sparql?query=PREFIX+xsd%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0A%0D%0ASELECT+%3Fx+%3Fy+%3Fz+WHERE%7B%0D%0Avalues+%3Fx+%7B+%22100123456.01%22%5E%5Exsd%3Afloat+%7D%0D%0Avalues+%3Fy+%7B+%22100123459.01%22%5E%5Exsd%3Afloat+%7D%0D%0Avalues+%3Fz+%7B+%22100123451.01%22%5E%5Exsd%3Afloat%7D%0D%0A%7D&output=text


x            y            z
100123456.01 100123459.01 100123451.01
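
For readers who want to see what happens to these three literals once they are treated as single-precision values, here is a minimal Python sketch. It is not what Blazegraph or Jena run internally; the helper name to_float32 is made up for this illustration, and it only rounds each lexical form to the nearest IEEE 754 single-precision value.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

# The three lexical forms from the query above.
x = to_float32(float("100123456.01"))
y = to_float32(float("100123459.01"))
z = to_float32(float("100123451.01"))

print(x, y, z)         # 100123456.0 100123456.0 100123448.0
print(x == y, x == z)  # True False
```

So ?x and ?y denote the same xsd:float value even though their lexical forms differ, while ?z denotes a different one. An engine that prints the stored value and one that prints the original lexical form can therefore show different results for the same query.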



-- 


---
Marco Neumann
KONA

Re: Float comparison

Posted by Andy Seaborne <an...@apache.org>.


On 18/08/2020 22:17, Dr. Chavdar Ivanov wrote:
> Andy, Richard,
> Thank you for the feedback.
> 
> In the graph I have the two values as xsd:float, so this is how the data arrives.
> 
> In the SPARQL query I tried to cast the float to decimal by using
> FILTER (xsd:decimal(?value1)!=xsd:decimal(?value1)).
> 
> I am not sure if this is the correct way, but I am now seeing a difference in the comparison result
> 
> 0.1001244561 Is different from 0.1001234590 which is OK
          ^^ typo?


> But these are reported as same 100123456.1     and  100123459.0


100123456.1 is not a floating point number. It has more precision than
xsd:float can represent.

It's "1.00123456E8"^^xsd:float


(Please copy and paste expressions into email.)

xsd:decimal(?value1)

is evaluated as follows: first evaluate ?value1 to get an xsd:float,

which is

"0.1001244561"^^xsd:float

Using Jena's expression evaluator:

qexpr '"0.1001244561"^^xsd:float+0'
  ==>
"0.100124456"^^xsd:float

See? Precision is already lost.

Only then is it turned into a decimal.

That is different from:
xsd:decimal(str(?value1))

which takes the lexical form, not the floating point value, of ?value1.
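
The difference between the two casts can be sketched in Python. This is only an analogy under stated assumptions: to_float32 is a hypothetical helper that rounds to single precision, Decimal(float) stands in for "cast the floating point value", and Decimal(string) stands in for "cast the lexical form". Jena's own conversion code is not shown here.

```python
import struct
from decimal import Decimal

def to_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

lexical = "0.1001244561"

# Analogue of xsd:decimal(?value1): the float value is used, so digits are already gone.
via_value = Decimal(to_float32(float(lexical)))

# Analogue of xsd:decimal(str(?value1)): the lexical form is used, so every digit survives.
via_lexical = Decimal(lexical)

print(via_value == via_lexical)  # False

# The large values from the thread collapse to one float, but stay distinct as strings.
big_a = Decimal(to_float32(float("100123456.1")))
big_b = Decimal(to_float32(float("100123459.0")))
print(big_a == big_b)                                       # True
print(Decimal("100123456.1") == Decimal("100123459.0"))     # False
```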

> If I get the value before the comparison is executed, the xsd:decimal of the two values appears to be the same, 100123456.0, so this is why != does not report the difference.
> Here the decimal does not seem to help,

Because precision was already lost before the decimal was made.  Start with a decimal.

xsd:decimal("0.1001244561")
  or "0.1001244561"^^xsd:decimal
  or 0.1001244561   (in Turtle and SPARQL).

> but I guess this falls in the same category that large absolute values are less precise. So same effect as for xsd:float.
> 

RE: Float comparison

Posted by "Dr. Chavdar Ivanov" <ch...@outlook.hu>.

Andy, Richard,
Thank you for the feedback.

In the graph I have the two values as xsd:float, so this is how the data arrives.

In the SPARQL query I tried to cast the float to decimal by using
FILTER (xsd:decimal(?value1)!=xsd:decimal(?value1)).

I am not sure if this is the correct way, but I am now seeing a difference in the comparison result

0.1001244561 is different from 0.1001234590, which is OK.
But these are reported as the same: 100123456.1 and 100123459.0.
If I get the value before the comparison is executed, the xsd:decimal of the two values appears to be the same, 100123456.0, so this is why != does not report the difference.
Here the decimal does not seem to help, but I guess this falls into the same category: large absolute values are less precise. So it is the same effect as for xsd:float.

Best regards
Chavdar 



-----Original Message-----
From: Andy Seaborne <an...@apache.org> 
Sent: Tuesday, 18 August, 2020 19:07
To: users@jena.apache.org
Subject: Re: Float comparison



On 18/08/2020 10:31, Richard Cyganiak wrote:
> The xsd:float datatype represents IEEE 754 single-precision floating point numbers.
> 
> As with any floating-point datatype, the precision depends on the size of the number. Numbers close to zero are very precise. Numbers with a large absolute value (large positive or large negative) are less precise. For the gory details see for example here:
> 
> https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Precision_limitations_on_decimal_values_in_[1,_16777216]
> 
> There is rarely a good reason to use xsd:float in RDF. xsd:double is much more precise at a small increase of storage cost (4 more bytes, which is negligible given the total size of an RDF triple). xsd:decimal provides arbitrary precision (in theory), but is more expensive in storage and computation.
> 
> My general view is that if storage size and performance of mathematical computations are a major concern for the application, RDF is probably not the best choice—RDF optimises for other concerns. Therefore the best choice for representing non-integer numbers in RDF is usually xsd:decimal—more expensive, but no issues with precision.
> 
> Richard
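
Richard's point that precision depends on the size of the number can be made concrete with a small Python sketch. The function name float32_spacing is invented for this note; it assumes a non-zero value in the normal single-precision range and computes the gap between neighbouring single-precision values from the 24-bit significand.

```python
import math

def float32_spacing(value: float) -> float:
    """Gap between adjacent single-precision values near `value`.

    Assumes a non-zero value in the normal (not subnormal) float32 range.
    """
    # abs(value) == m * 2**exponent with 0.5 <= m < 1
    _, exponent = math.frexp(abs(value))
    # 24 significand bits, counting the implicit leading one
    return 2.0 ** (exponent - 24)

for v in (0.123456789, 1.123456, 30.0, 100123456.0):
    print(v, float32_spacing(v))
```

Near 30 the gap is 2**-19 (about 0.0000019), which is consistent with 30 and 30.0000001 comparing as the same while 30 and 30.000001 differ. Near 100123456 the gap is 8, so 100123456.1 and 100123459 both land on the same value.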

xsd:decimal can record any decimal precision, but division may lose precision; otherwise "1/3" would need infinite storage.

Jena uses 24-digit precision for divisions with inexact results like 1/3.
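
As a rough illustration of what a fixed precision for inexact division looks like, here is a Python analogue using the standard decimal module. It assumes that "24 digit precision" means 24 significant digits, which is an interpretation for this sketch and not a statement about Jena's internals.

```python
from decimal import Decimal, localcontext

# Divide under a context limited to 24 significant digits (assumed meaning of "24 digit").
with localcontext() as ctx:
    ctx.prec = 24
    third = Decimal(1) / Decimal(3)

print(third)  # 0.333333333333333333333333

# Exact quotients are not affected by the limit.
with localcontext() as ctx:
    ctx.prec = 24
    quarter = Decimal(1) / Decimal(4)

print(quarter)  # 0.25
```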

> 
> 
>> On 18 Aug 2020, at 05:48, Dr. Chavdar Ivanov <ch...@outlook.hu> wrote:
>>
>> Hello
>>
>>
>>
>> I posted the message below to the TopBraid users mailing list and 
>> already clarified that as sh:equals is based on RDF node equality, 
>> values such as "1.0"^^xsd:float and "1"^^xsd:float count as distinct. 
>> So I am keeping this for the interest of others in the list

SPARQL has both comparisons:

the "sameTerm()" operator for RDF term equality, and SPARQL "=" for value comparison (by op:numeric-equal).

     Andy
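
The two kinds of comparison can be mimicked in a few lines of Python. This is a toy model, not the SPARQL or SHACL machinery: the Literal type and the functions same_term and value_equal are hypothetical names, and value_equal only handles literals that are all xsd:float.

```python
import struct
from typing import NamedTuple

class Literal(NamedTuple):
    lexical: str   # the characters as written in the data
    datatype: str  # e.g. "xsd:float"

def to_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def same_term(a: Literal, b: Literal) -> bool:
    """Term equality: lexical form and datatype must both match exactly."""
    return a == b

def value_equal(a: Literal, b: Literal) -> bool:
    """Value equality for xsd:float literals: compare the numbers they denote."""
    return to_float32(float(a.lexical)) == to_float32(float(b.lexical))

one_a = Literal("1.0", "xsd:float")
one_b = Literal("1", "xsd:float")
print(same_term(one_a, one_b), value_equal(one_a, one_b))  # False True

plain = Literal("100123456", "xsd:float")
sci = Literal("100.123456E6", "xsd:float")
print(same_term(plain, sci), value_equal(plain, sci))      # False True
```

This matches the pattern in the two tables: the term-based check reports different lexical forms as different, while the value-based check treats them as the same number.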

>>
>>
>>
>> But on SPARQL float comparison I was advised to check on this mailing list for other opinions.
>>
>> I understand that SPARQL comparison is mathematically based so 1.0 should be equal to 1. However below in item 2 you will see the numbers I compared and I am getting confused. Take into account that in the data graph the 2 compared properties are typed literals with datatype float.
>>
>> I wanted to know what the precision is when floats are compared. So I
>> have 2 questions:
>>
>> *       What is the precision? Is it the 6th decimal, and is it OK to compare different forms of float, e.g. when one is in scientific form?
>> *       Why am I getting a wrong comparison result for bigger values such as 100123456.1 and 100123459, which are found to be the same?
>>
>>
>>
>> Best regards
>>
>> Chavdar
>>
>>
>>
>>
>>
>> ========
>>
>>
>>
>>
>>
>> Dear all,
>>
>>
>>
>> I have a very basic question...
>>
>> I need to compare literals that are floats and tried to use two ways. 
>> 1) using sh:equals to compare 2 properties and 2) using SPARQL where 
>> I filter != different values
>>
>>
>>
>> For the filter I tried using
>>
>> FILTER (xsd:float(?value1)!=xsd:float(?value1)).
>>
>> or
>>
>> FILTER (?value1!=?value1).
>>
>> Both give the same outcome.
>>
>>
>>
>> Below I listed a summary of the tests I did
>>
>>
>>
>> I think sh:equals treats the literals as strings even though they are floats. It also gives 2 results. I think this is in line with the SHACL spec, although I didn't check whether sh:equals ignores the datatype.
>>
>>
>>
>> However, in some cases the result from SPARQL is kind of strange. It looks like the precision is 10^-6, but for big numbers and when the scientific form of a float is used we get something different.
>>
>>
>>
>> What is followed to define the difference?
>>
>> If I use google calculator
>>
>> 100123456.1-100.123459E+06=-2.90000000596
>>
>>
>>
>> Normally it should be OK to compare different forms of float.
>>
>>
>>
>>
>>
>> 1) using sh:equals in the property shape
>>
>> Value1 ; value 2  ; comparisson result
>>
>> 1.123456 ; 1.123456 ; same
>>
>> 1.1234560 ; 1.1234561 ; different (sh:equals reports it twice)
>>
>> 31.1234560 ; 31.1234561 ;different (sh:equals reports it twice)
>>
>> 30    ;      30.0000001 ; different (sh:equals reports it twice)
>>
>> 30     ;      30.000001 ; different (sh:equals reports it twice)
>>
>> 100123456.0  ; 100123456.1 ; different (sh:equals reports it twice)
>>
>> 100123456.0  ; 100123456.0 ; same
>>
>> 100123456    ;  100.123456E6 ; different (sh:equals reports it twice)
>>
>> 100123456    ;  100.123456E+06 ; different (sh:equals reports it twice)
>>
>> -0.123456789  ;  -123.456789E-3 ; different (sh:equals reports it 
>> twice)
>>
>> -0.123456789  ;  -123.456789E-03 ; different (sh:equals reports it 
>> twice)
>>
>> 100123456.1    ;  100.123456E+06  ; different (sh:equals reports it twice)
>>
>> 100123456.1     ;   100.123459E+06 ; different (sh:equals reports it twice)
>>
>> 100123456.1     ;  100123459      ; different (sh:equals reports it twice)
>>
>> 100123456.1     ;  100123459.0    ; different (sh:equals reports it twice)
>>
>>
>>
>> 2) using SPARQL (in the property shape)
>>
>> 1.123456 ; 1.123456 ; same
>>
>> 1.1234560 ; 1.1234561 ; different
>>
>> 31.1234560 ; 31.1234561 ;different
>>
>> 30    ;      30.0000001 ; same
>>
>> 30     ;      30.000001 ; different
>>
>> 100123456.0  ; 100123456.1 ; same
>>
>> 100123456.0  ; 100123456.0 ; same
>>
>> 100123456    ;  100.123456E6 ; same
>>
>> 100123456    ;  100.123456E+06 ; same
>>
>> -0.123456789  ;  -123.456789E-3 ; same
>>
>> -0.123456789  ;  -123.456789E-03 ; same
>>
>> 100123456.1    ;  100.123456E+06  ; same
>>
>> 100123456.1     ;   100.123459E+06 ; same
>>
>> 100123456.1     ;  100123459      ; same
>>
>> 100123456.1     ;  100123459.0    ; same
>>
>>
>>
>> Best regards
>>
>> Chavdar
>>
>>
>>
>

Re: Float comparison

Posted by Andy Seaborne <an...@apache.org>.


On 18/08/2020 10:31, Richard Cyganiak wrote:
> The xsd:float datatype represents IEEE 754 single-precision floating point numbers.
> 
> As with any floating-point datatype, the precision depends on the size of the number. Numbers close to zero are very precise. Numbers with a large absolute value (large positive or large negative) are less precise. For the gory details see for example here:
> 
> https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Precision_limitations_on_decimal_values_in_[1,_16777216]
> 
> There is rarely a good reason to use xsd:float in RDF. xsd:double is much more precise at a small increase of storage cost (4 more bytes, which is negligible given the total size of an RDF triple). xsd:decimal provides arbitrary precision (in theory), but is more expensive in storage and computation.
> 
> My general view is that if storage size and performance of mathematical computations are a major concern for the application, RDF is probably not the best choice—RDF optimises for other concerns. Therefore the best choice for representing non-integer numbers in RDF is usually xsd:decimal—more expensive, but no issues with precision.
> 
> Richard

xsd:decimal can record any decimal precision, but division may lose
precision; otherwise "1/3" would need infinite storage.

Jena uses 24-digit precision for division when the result is inexact, as with 1/3.
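
To make the point about inexact division concrete, here is a small Python sketch using the standard decimal module. It is only an analogy: the divide helper is a made-up name, and it assumes "24 digit precision" means 24 significant digits, which may not match Jena's exact rounding rules.

```python
from decimal import Decimal, localcontext

def divide(numerator, denominator, digits=24):
    """Divide two decimals, keeping at most `digits` significant digits.

    Hypothetical helper for illustration; not Jena's implementation.
    """
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(numerator) / Decimal(denominator)

# 1/3 has no finite decimal expansion, so it has to be cut off somewhere.
third = divide("1", "3")
# 1/4 is exact, so no digits are lost.
quarter = divide("1", "4")

print(third)
print(quarter)
```

Multiplying the truncated third by 3 no longer gives exactly 1, which is the loss of precision mentioned above.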

> 
> 
>> On 18 Aug 2020, at 05:48, Dr. Chavdar Ivanov <ch...@outlook.hu> wrote:
>>
>> Hello
>>
>>
>>
>> I posted the message below to the TopBraid users mailing list and already clarified that as sh:equals is based on RDF node equality, values such as "1.0"^^xsd:float and "1"^^xsd:float count as distinct. So I am keeping this for the interest of others in the list

SPARQL has both comparisons.

The "sameTerm()" operator tests RDF term equality, and SPARQL "=" compares
values (via op:numeric-equal).
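
The difference between the two kinds of comparison can be sketched in Python. This is a simplified model rather than SPARQL itself: the Literal class and the same_term and value_equal functions are names invented for the example, and value_equal only handles xsd:float literals.

```python
import struct
from typing import NamedTuple

XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"

class Literal(NamedTuple):
    """Toy model of an RDF typed literal (hypothetical, for illustration)."""
    lexical: str
    datatype: str

def same_term(a, b):
    """Term equality: lexical form and datatype must match exactly."""
    return a == b

def value_equal(a, b):
    """Value equality, assuming both literals are xsd:float."""
    def value(lit):
        # Round the lexical form to the nearest single-precision value.
        return struct.unpack("<f", struct.pack("<f", float(lit.lexical)))[0]
    return value(a) == value(b)

one = Literal("1", XSD_FLOAT)
one_point_zero = Literal("1.0", XSD_FLOAT)

terms_match = same_term(one, one_point_zero)
values_match = value_equal(one, one_point_zero)
print(terms_match, values_match)
```

The two literals are different terms but denote the same value, which is the same split reported in the thread between sh:equals and the SPARQL filter.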

     Andy


Re: Float comparison

Posted by Richard Cyganiak <ri...@cyganiak.de>.

The xsd:float datatype represents IEEE 754 single-precision floating point numbers.

As with any floating-point datatype, the precision depends on the size of the number. Numbers close to zero are very precise. Numbers with a large absolute value (large positive or large negative) are less precise. For the gory details see for example here:

https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Precision_limitations_on_decimal_values_in_[1,_16777216]
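
As a rough illustration of these precision limits, the following Python sketch rounds a few of the value pairs from the original question to single precision and compares them. The helper names to_float32 and same_as_float are made up for this example, and the standard struct module stands in for whatever an RDF store does internally; it is not Jena's implementation.

```python
import struct

def to_float32(lexical):
    """Round a decimal string to the nearest IEEE 754 single-precision value.

    Hypothetical helper: it goes via a Python double first, which is fine
    for these inputs but is not a general-purpose xsd:float parser.
    """
    return struct.unpack("<f", struct.pack("<f", float(lexical)))[0]

def same_as_float(a, b):
    """Compare two lexical forms by their single-precision values."""
    return to_float32(a) == to_float32(b)

# Near 1e8 adjacent floats are 8 apart, so these collapse to one value.
big_pair_equal = same_as_float("100123456.1", "100123459")
# Near 30 adjacent floats are about 1.9e-6 apart.
tiny_step_equal = same_as_float("30", "30.0000001")
larger_step_equal = same_as_float("30", "30.000001")

print(big_pair_equal, tiny_step_equal, larger_step_equal)
```

Under these assumptions the first two pairs compare as equal and the third as different, which matches the SPARQL results listed in the original question.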

There is rarely a good reason to use xsd:float in RDF. xsd:double is much more precise at a small increase of storage cost (4 more bytes, which is negligible given the total size of an RDF triple). xsd:decimal provides arbitrary precision (in theory), but is more expensive in storage and computation.

My general view is that if storage size and performance of mathematical computations are a major concern for the application, RDF is probably not the best choice—RDF optimises for other concerns. Therefore the best choice for representing non-integer numbers in RDF is usually xsd:decimal—more expensive, but no issues with precision.

Richard

