Posted to dev@systemds.apache.org by Kevin Pretterhofer <ke...@student.tugraz.at> on 2021/01/13 11:52:47 UTC

[Question] Regarding test cases and floating point errors

Hi all,

I hope this is the right place to ask questions; if not, I am sorry,
and it would be nice if someone could point me to the right place.

My question is about the unit tests. Currently I am implementing a
simple Gaussian classifier. Besides the class prior probabilities, the
implementation also outputs the per-class mean values, determinants,
and covariance matrices together with their inverses.

Now I face the problem that the values of my SystemDS implementation
and my R implementation are quite far apart for randomly generated
test matrices. I assume this is due to floating point errors / limited
floating point precision. At first glance the values look quite
similar, but since they are printed in scientific notation, one can
clearly see that the absolute difference between them is quite large.
E.g., for my determinant comparison I got the following:

(1,1): 1.2390121975770675E14 <--> 1.2390101941279517E14
(3,1): 1.510440018532407E85 <--> 1.5104388050968705E85
(2,1): 1.6420264128994816E38 <--> 1.6420263615987703E38
(5,1): 8.881025000211518E70 <--> 8.881037540234089E70
(4,1): 1.7888589555748764E22 <--> 1.78885700537877E22
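
(Just to put the scale into perspective: the (3,1) entry alone differs
by roughly 1.510440018532407E85 - 1.5104388050968705E85 ≈ 1.2E79 in
absolute terms, so any absolute epsilon would have to sit somewhere in
that order of magnitude.)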

I face similar issues with the inverses of my covariance matrices.

Since I use the eigenvalues and eigenvectors for calculating the
determinant and the inverse in SystemDS, I already compared them to
the eigenvalues and eigenvectors which R computes, and already there,
differences (due to floating point rounding) are observable.

My question is now: how should one test, or rather compare, such
matrices and vectors? It seems a bit odd to me to set the tolerance to
something like "2.0E80".

It would be great if someone could help me out!

Best,
Kevin


Re: [Question] Regarding test cases and floating point errors

Posted by Kevin Pretterhofer <ke...@student.tugraz.at>.
Hi,

thanks for pointing me to the compareMatricesBitAvgDistance function!
I tried it out with different settings. 2^14 for the
maxUnitsOfLeastPrecision parameter was a nice guess, but probably too
optimistic: taking the log2 of some of the distances in the error
output sadly reveals values beyond 30.
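
For reference, this is roughly how I look at a single value pair (a
standalone snippet in plain Java, not the actual TestUtils code):

    public class UlpDistanceCheck {
        /** Trailing-bit (ULP) distance between two finite doubles of the same sign. */
        static long bitDistance(double a, double b) {
            return Math.abs(Double.doubleToLongBits(a) - Double.doubleToLongBits(b));
        }

        public static void main(String[] args) {
            double a = 1.2390121975770675E14;  // determinant pair (1,1) from my first mail
            double b = 1.2390101941279517E14;
            long d = bitDistance(a, b);
            System.out.println("bit distance: " + d);
            System.out.println("log2: " + Math.log(d) / Math.log(2));  // about 33.6 here
        }
    }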

I will play around with it and hopefully find adequate settings.

Thanks again!


Best,
Kevin




Re: [Question] Regarding test cases and floating point errors

Posted by "Baunsgaard, Sebastian" <ba...@tugraz.at.INVALID>.
Hi Kevin,


Great question, and thanks for posting it to the mailing list!


When comparing the floating point values, I would suggest using our "distance in bits" comparison for matrices containing double values. This gives you the ability to specify a relative difference between the values, rather than the typical double comparison with an epsilon fixed to an exact value.


You can find the comparison method here:

File:    src/test/java/org/apache/sysds/test/TestUtils.java

Method:    compareMatricesBitAvgDistance.


Note that the bit distance is a long that specifies how many of the trailing bits of the double values are allowed to differ. It can take any value in the positive long range: Long.MAX_VALUE means completely different values are accepted, while 0 means the encoded double values must be exactly identical. I would suggest trying 2^14 to start with.
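
To illustrate the idea (a simplified sketch in plain Java, not the actual implementation in TestUtils.java, which may aggregate the distances differently), such a comparison boils down to something like:

    public class BitDistanceCompareSketch {
        /** Trailing-bit (ULP) distance between two finite doubles of the same sign. */
        static long bitDistance(double a, double b) {
            return Math.abs(Double.doubleToLongBits(a) - Double.doubleToLongBits(b));
        }

        /** True if the average bit distance over all cells stays within the allowed budget. */
        static boolean withinAvgBitDistance(double[][] a, double[][] b, long maxAvgBits) {
            long total = 0, cells = 0;
            for (int i = 0; i < a.length; i++)
                for (int j = 0; j < a[i].length; j++, cells++)
                    total += bitDistance(a[i][j], b[i][j]);
            return total / cells <= maxAvgBits;
        }

        public static void main(String[] args) {
            double[][] m1 = {{1.2390101941279517E14}};  // (1,1) determinant pair from Kevin's mail
            double[][] m2 = {{1.2390121975770675E14}};
            System.out.println(withinAvgBitDistance(m1, m2, 1L << 14));  // false: distance >> 2^14
        }
    }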


It is normal that values can be off by 2.0E80 if the values we are talking about are in those orders of magnitude, so it is okay for such tests to use an epsilon like that. Furthermore, in SystemDS we use Kahan correction for our double values, which lets them correct for rounding errors beyond the precision of plain 64-bit doubles. This correction can make the values deviate after a number of operations, such that the difference becomes more exaggerated.
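
For context, the textbook form of such a Kahan correction looks roughly like this (a generic sketch, not the actual SystemDS kernels):

    public class KahanSumExample {
        /** Kahan summation: carries a correction term for the low-order bits lost in each add. */
        static double kahanSum(double[] values) {
            double sum = 0.0, correction = 0.0;
            for (double v : values) {
                double y = v - correction;   // re-apply the bits lost in the previous step
                double t = sum + y;          // low-order bits of y may be lost here
                correction = (t - sum) - y;  // recover exactly what was lost
                sum = t;
            }
            return sum;
        }

        public static void main(String[] args) {
            double[] values = {1.0E16, 1.0, 1.0, 1.0, 1.0};
            double naive = 0.0;
            for (double v : values)
                naive += v;                                    // every +1.0 is rounded away
            System.out.println("naive: " + naive);             // 1.0E16
            System.out.println("kahan: " + kahanSum(values));  // 1.0000000000000004E16 (exact)
        }
    }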


Best regards

Sebastian Baunsgaard


