Posted to user@commons.apache.org by Martin Rosellen <Ma...@fu-berlin.de> on 2012/11/06 10:14:20 UTC
[math] Pearson Correlation NaNs
Dear all,
I am having difficulties using the Pearson Correlation because it seems
that it does not work if some cell is NaN. Is that intended? Here is
some code:
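import java.util.Arrays;
// Package name assumes Commons Math 3.x.
import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;

// Class name is illustrative.
public class PearsonNaNExample {
    public static void main(String[] args) {
        // Each row is one observation; each column is one variable.
        double[] row1 = {3, 4};
        double[] row2 = {1, 8};
        double[] row3 = {Double.NaN, 4}; // missing value in the first column
        double[][] data = {row1, row2, row3};
        System.out.println(Arrays.deepToString(data));
        PearsonsCorrelation coefMatrixP = new PearsonsCorrelation(data);
        System.out.println(coefMatrixP.getCorrelationMatrix().toString());
    }
}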
Greetings
Martin
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org
Re: [math] Pearson Correlation NaNs
Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
Hi.
> >>I analyse blood tests and not every blood sample is analysed for the
> >>same values. It would be best if rows (tuples) that contain a NaN
> >>are ignored.
> >>
> >It would be dangerous if Commons Math simply discarded NaN values, as
> >they could have occurred because of a bug (which would then go unnoticed).
> >
> >The best thing would be to filter your data before attempting to analyze
> >them. The "to-be-ignored" property is indeed application-dependent, and so
> >is your choice of NaN to represent data that require special handling.
> >
> >
> >Best regards,
> >Gilles
> That is interesting, because SpearmansCorrelation does work with
> NaNs. The NaNs somehow get a rank and therefore influence the
> coefficient.
>
> I also found a nanStrategy in the method rank in the class
> NaturalRanking. But if I set the nanStrategy to REMOVED I get an
> exception.
That would seem to indicate a bug in Commons Math.
Could you please create a minimal example that triggers the unexpected
exception and open an issue on the bug tracking system?
https://issues.apache.org/jira/browse/MATH
Thanks,
Gilles
Re: [math] Pearson Correlation NaNs
Posted by Martin Rosellen <Ma...@fu-berlin.de>.
Hi Gilles,
> On Tue, Nov 06, 2012 at 03:06:07PM +0100, Martin Rosellen wrote:
>> Hi,
>>
>> I analyse blood tests and not every blood sample is analysed for the
>> same values. It would be best if rows (tuples) that contain a NaN
>> are ignored.
>>
> It would be dangerous if Commons Math simply discarded NaN values, as
> they could have occurred because of a bug (which would then go unnoticed).
>
> The best thing would be to filter your data before attempting to analyze
> them. The "to-be-ignored" property is indeed application-dependent, and so
> is your choice of NaN to represent data that require special handling.
>
>
> Best regards,
> Gilles
That is interesting, because SpearmansCorrelation does work with
NaNs. The NaNs somehow get a rank and therefore influence the
coefficient.
I also found a nanStrategy in the method rank in the class
NaturalRanking. But if I set the nanStrategy to REMOVED I get an exception.
Regards
Martin
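
One way a NaN can end up with a rank is sketched below. This is not the NaturalRanking code from Commons Math; it is a plain-Java illustration that assumes values are ordered with Double.compare, which places NaN above every other double. The class and method names are made up for this example, and ties are not averaged.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Commons Math.
final class NaNRankSketch {

    /**
     * Assigns ranks 1..n in ascending order. Ordering uses Double.compare,
     * so a NaN is treated as larger than any other value and receives the
     * highest rank. Ties keep their input order (no tie averaging).
     */
    static double[] rank(double[] values) {
        Integer[] idx = new Integer[values.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        // Sort the indices by the values they point to.
        Arrays.sort(idx, (a, b) -> Double.compare(values[a], values[b]));
        double[] ranks = new double[values.length];
        for (int r = 0; r < idx.length; r++) {
            ranks[idx[r]] = r + 1;
        }
        return ranks;
    }

    public static void main(String[] args) {
        double[] column = {Double.NaN, 2.0, 5.0};
        // The NaN in position 0 gets the highest rank.
        System.out.println(Arrays.toString(rank(column))); // [3.0, 1.0, 2.0]
    }
}
```

Under that assumption the missing value is silently treated as the largest observation, so any rank-based coefficient computed from these ranks is affected by it.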
Re: [math] Pearson Correlation NaNs
Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
On Tue, Nov 06, 2012 at 03:06:07PM +0100, Martin Rosellen wrote:
> Hi,
>
> I analyse blood tests and not every blood sample is analysed for the
> same values. It would be best if rows (tuples) that contain a NaN
> are ignored.
>
It would be dangerous if Commons Math simply discarded NaN values, as
they could have occurred because of a bug (which would then go unnoticed).
The best thing would be to filter your data before attempting to analyze
them. The "to-be-ignored" property is indeed application-dependent, and so
is your choice of NaN to represent data that require special handling.
Best regards,
Gilles
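
The filtering step suggested above could look roughly like the following. It is a minimal sketch in plain Java, assuming that "ignore" means dropping every row (observation) that contains at least one NaN; the class and method names are made up for this example and are not part of Commons Math.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Commons Math.
final class NaNRowFilter {

    /** Returns a new array holding only the rows that contain no NaN. */
    static double[][] dropRowsWithNaN(double[][] data) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : data) {
            boolean complete = true;
            for (double v : row) {
                if (Double.isNaN(v)) {
                    complete = false;
                    break;
                }
            }
            if (complete) {
                kept.add(row);
            }
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] data = {{3, 4}, {1, 8}, {Double.NaN, 4}};
        double[][] filtered = dropRowsWithNaN(data);
        // The third row is dropped because its first cell is NaN.
        System.out.println(Arrays.deepToString(filtered)); // [[3.0, 4.0], [1.0, 8.0]]
    }
}
```

The filtered array can then be handed to new PearsonsCorrelation(filtered) as in the original post. Doing the filtering in application code keeps the decision about what counts as "missing" explicit, so a NaN produced by a bug elsewhere is not hidden by the library.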
Re: [math] Pearson Correlation NaNs
Posted by Martin Rosellen <Ma...@fu-berlin.de>.
Hi,
I analyse blood tests and not every blood sample is analysed for the
same values. It would be best if rows (tuples) that contain a NaN are
ignored.
Kind regards
Martin
Re: [math] Pearson Correlation NaNs
Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
Hello.
>
> I am having difficulties using the Pearson Correlation because it
> seems that it does not work if some cell is NaN. Is that intended?
Very likely. [When NaN appears in a computation, it propagates and the
result is NaN.]
> Here is some code:
>
> public static void main(String[] args) {
> double [] row1 = new double[]{3,4};
> double [] row2 = new double[]{1,8};
> double [] row3 = new double[]{Double.NaN,4};
> double[][] data = new double[][]{row1,row2,row3};
> System.out.println(Arrays.deepToString(data));
>
> PearsonsCorrelation coefMatrixP = new PearsonsCorrelation(data);
>
> System.out.println(coefMatrixP.getCorrelationMatrix().toString());
> }
>
What would you suggest should happen?
Regards,
Gilles
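
The propagation described above can be seen with a small plain-Java sketch. It uses a hand-written Pearson formula rather than the Commons Math class, so it says nothing about how PearsonsCorrelation is implemented; the class and method names are made up for this example.

```java
// Hypothetical demo class, not part of Commons Math.
final class NaNPropagationDemo {

    /** Textbook Pearson correlation of two samples of equal length. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // The two columns of the data from the original post.
        double[] col0 = {3, 1, Double.NaN};
        double[] col1 = {4, 8, 4};
        // One NaN makes the sum, the mean, and every later term NaN.
        System.out.println(pearson(col0, col1)); // NaN

        // Without the row that holds the NaN, the result is finite.
        System.out.println(pearson(new double[]{3, 1}, new double[]{4, 8})); // -1.0
    }
}
```

Any arithmetic operation with a NaN operand yields NaN, so a single missing cell is enough to turn the whole coefficient into NaN.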