Posted to user@commons.apache.org by Martin Rosellen <Ma...@fu-berlin.de> on 2012/11/06 10:14:20 UTC

[math] Pearson Correlation NaNs

Dear all,

I am having difficulties using the Pearson Correlation because it seems 
that it does not work if some cell is NaN. Is that intended? Here is 
some code:

public static void main(String[] args) {
    double[] row1 = new double[]{3, 4};
    double[] row2 = new double[]{1, 8};
    double[] row3 = new double[]{Double.NaN, 4};
    double[][] data = new double[][]{row1, row2, row3};
    System.out.println(Arrays.deepToString(data));

    PearsonsCorrelation coefMatrixP = new PearsonsCorrelation(data);

    System.out.println(coefMatrixP.getCorrelationMatrix().toString());
}

Greetings
Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org

Re: [math] Pearson Correlation NaNs

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

Hi.

> >>I analyse blood tests and not every blood sample is analysed for the
> >>same values. It would be best if rows (tuples) that contain a NaN
> >>are ignored.
> >>
> >It would be dangerous if Commons Math were to simply discard NaN values, as
> >they could have occurred because of a bug (that would go unnoticed).
> >
> >The best thing would be to filter your data before attempting to analyze
> >them. The "to-be-ignored" property is indeed application-dependent, and so
> >is your choice of NaN to represent data that require special handling.
> >
> >
> >Best regards,
> >Gilles
> That is interesting, because SpearmansCorrelation does work with
> NaNs. The NaNs somehow get a rank and therefore influence the
> coefficient.
> 
> I also found a nanStrategy in the method rank in the class
> NaturalRanking. But if I set the nanStrategy to REMOVED I get an
> exception.

That would seem to indicate a bug in Commons Math.
Could you please create a minimal example that triggers the unexpected
exception and open an issue on the bug tracking system?
  https://issues.apache.org/jira/browse/MATH


Thanks,
Gilles


Re: [math] Pearson Correlation NaNs

Posted by Martin Rosellen <Ma...@fu-berlin.de>.

Hi Gilles,

> On Tue, Nov 06, 2012 at 03:06:07PM +0100, Martin Rosellen wrote:
>> Hi,
>>
>> I analyse blood tests and not every blood sample is analysed for the
>> same values. It would be best if rows (tuples) that contain a NaN
>> are ignored.
>>
> It would be dangerous if Commons Math were to simply discard NaN values, as
> they could have occurred because of a bug (that would go unnoticed).
>
> The best thing would be to filter your data before attempting to analyze
> them. The "to-be-ignored" property is indeed application-dependent, and so
> is your choice of NaN to represent data that require special handling.
>
>
> Best regards,
> Gilles
That is interesting, because SpearmansCorrelation does work with
NaNs. The NaNs somehow get a rank and therefore influence the
coefficient.

I also found a nanStrategy in the method rank in the class 
NaturalRanking. But if I set the nanStrategy to REMOVED I get an exception.

Regards
Martin

Re: [math] Pearson Correlation NaNs

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

On Tue, Nov 06, 2012 at 03:06:07PM +0100, Martin Rosellen wrote:
> Hi,
> 
> I analyse blood tests and not every blood sample is analysed for the
> same values. It would be best if rows (tuples) that contain a NaN
> are ignored.
> 

It would be dangerous if Commons Math were to simply discard NaN values, as
they could have occurred because of a bug (that would go unnoticed).

The best thing would be to filter your data before attempting to analyze
them. The "to-be-ignored" property is indeed application-dependent, and so
is your choice of NaN to represent data that require special handling.


Best regards,
Gilles
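
A minimal sketch of the pre-filtering suggested above, assuming that a row
should be dropped as soon as any of its cells is NaN (listwise deletion).
The class and method names (NaNRowFilter, dropRowsWithNaN) are made up for
illustration and are not part of Commons Math; the sample data is taken from
the original post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NaNRowFilter {

    /**
     * Hypothetical helper (not a Commons Math API): returns a copy of
     * data without any row that contains at least one NaN.
     */
    static double[][] dropRowsWithNaN(double[][] data) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : data) {
            boolean hasNaN = false;
            for (double v : row) {
                if (Double.isNaN(v)) {
                    hasNaN = true;
                    break;
                }
            }
            if (!hasNaN) {
                kept.add(row.clone()); // copy so the input stays untouched
            }
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        // Same data as in the original post: the third row has a NaN cell.
        double[][] data = {
            {3, 4},
            {1, 8},
            {Double.NaN, 4}
        };
        double[][] clean = dropRowsWithNaN(data);
        System.out.println(Arrays.deepToString(clean)); // prints [[3.0, 4.0], [1.0, 8.0]]
    }
}
```

The filtered array could then be handed to PearsonsCorrelation as in the
original post. Whether dropping whole rows is appropriate remains an
application-level decision, which is why the filtering sits in user code.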


Re: [math] Pearson Correlation NaNs

Posted by Martin Rosellen <Ma...@fu-berlin.de>.

Hi,

I analyse blood tests and not every blood sample is analysed for the 
same values. It would be best if rows (tuples) that contain a NaN are 
ignored.

Kind regards
Martin



Re: [math] Pearson Correlation NaNs

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

Hello.

> 
> I am having difficulties using the Pearson Correlation because it
> seems that it does not work if some cell is NaN. Is that intended?

Very likely. [When NaN appears in a computation, it propagates and the
result is NaN.]

> Here is some code:
> 
> public static void main(String[] args) {
>         double [] row1 = new double[]{3,4};
>         double [] row2 = new double[]{1,8};
>         double [] row3 = new double[]{Double.NaN,4};
>         double[][] data = new double[][]{row1,row2,row3};
>         System.out.println(Arrays.deepToString(data));
> 
>         PearsonsCorrelation coefMatrixP = new PearsonsCorrelation(data);
> 
> System.out.println(coefMatrixP.getCorrelationMatrix().toString());
>     }
> 

What would you suggest should happen?


Regards,
Gilles
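
To illustrate the propagation mentioned above, here is a hand-written
textbook Pearson formula in plain Java. It is only a sketch and not the
Commons Math implementation; the class and method names are hypothetical.
The two columns are those of the data in the original post, and the value 2
in the first call is an invented stand-in for the missing cell.

```java
public class NaNPropagation {

    /** Textbook Pearson correlation of two equally long samples (illustrative only). */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i]; // a single NaN makes the whole sum NaN
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] second = {4, 8, 4};

        // First column with an invented value (2) in place of the NaN: finite result.
        System.out.println(pearson(new double[]{3, 1, 2}, second));

        // First column as posted: the NaN reaches the mean, every deviation,
        // and finally the coefficient itself.
        System.out.println(pearson(new double[]{3, 1, Double.NaN}, second)); // prints NaN
    }
}
```

Because any arithmetic operation with a NaN operand yields NaN, one missing
cell is enough to make every coefficient involving that column NaN.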
