Posted to issues@commons.apache.org by "Phil Steitz (JIRA)" <ji...@apache.org> on 2018/04/30 19:04:00 UTC

[jira] [Commented] (MATH-1453) Mann-Whitney U Test returns maximum of U1 and U2

    [ https://issues.apache.org/jira/browse/MATH-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458890#comment-16458890 ] 

Phil Steitz commented on MATH-1453:
-----------------------------------

The minimum of U1 and U2 is what should be reported as the value of the statistic, and that is in fact what the code uses to estimate p-values. The p-value computation also suffers from some accuracy issues. First, no continuity correction is applied when computing the normal approximation. Second (as noted in the javadoc), the variance is not adjusted for ties in the data. The patch applied to fix [this issue|https://github.com/Hipparchus-Math/hipparchus/issues/38] in Hipparchus could be fairly easily backported to the current [math] code. That patch also includes exact computation of p-values for very small samples. Patches welcome there too, of course.
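
To make the three points above concrete, here is a rough, self-contained sketch of reporting min(U1, U2) and computing a two-sided normal-approximation p-value with a continuity correction and a tie-adjusted variance. It is not the Hipparchus patch or the [math] code: the class and method names are hypothetical, it assumes both samples are non-empty and contain no NaN values, and the erfc helper is the Abramowitz and Stegun 7.1.26 approximation (absolute error roughly 1e-7), swapped in only to keep the example dependency-free.

```java
import java.util.Arrays;

/** Illustrative sketch only; NOT the Commons Math or Hipparchus implementation. */
final class MannWhitneySketch {

    /** Returns {U1, U2, tieTerm}, where tieTerm is the sum of (t^3 - t) over tie groups. */
    static double[] uStatistics(double[] x, double[] y) {
        int n1 = x.length, n2 = y.length, n = n1 + n2;
        double[] sorted = Arrays.copyOf(x, n);       // pooled sample: x followed by y
        System.arraycopy(y, 0, sorted, n1, n2);
        Arrays.sort(sorted);
        double sumRankX = 0.0, tieTerm = 0.0;
        int i = 0;
        while (i < n) {
            int j = i;                               // [i, j) is one group of equal values
            while (j < n && Double.compare(sorted[j], sorted[i]) == 0) {
                j++;
            }
            double t = j - i;                        // size of the tie group
            double midRank = (i + 1 + j) / 2.0;      // average of ranks i+1 .. j
            tieTerm += t * t * t - t;
            for (double v : x) {                     // every x in this group gets the midrank
                if (Double.compare(v, sorted[i]) == 0) {
                    sumRankX += midRank;
                }
            }
            i = j;
        }
        double u1 = sumRankX - (double) n1 * (n1 + 1) / 2.0;
        double u2 = (double) n1 * n2 - u1;
        return new double[] {u1, u2, tieTerm};
    }

    /** The statistic as usually reported: the smaller of U1 and U2. */
    static double minU(double[] x, double[] y) {
        double[] s = uStatistics(x, y);
        return Math.min(s[0], s[1]);
    }

    /** Two-sided normal-approximation p-value with continuity and tie corrections. */
    static double asymptoticPValue(double[] x, double[] y) {
        double[] s = uStatistics(x, y);
        double n1 = x.length, n2 = y.length, n = n1 + n2;
        double uMin = Math.min(s[0], s[1]);
        double mean = n1 * n2 / 2.0;
        // Tie-adjusted variance; reduces to n1*n2*(n+1)/12 when there are no ties.
        double var = n1 * n2 / 12.0 * ((n + 1) - s[2] / (n * (n - 1)));
        if (!(var > 0.0)) {
            return 1.0;                              // degenerate case, e.g. all values tied
        }
        // Continuity correction: shrink |U - mean| by 0.5, never below zero.
        double z = Math.max(Math.abs(uMin - mean) - 0.5, 0.0) / Math.sqrt(var);
        return erfc(z / Math.sqrt(2.0));             // equals 2 * (1 - Phi(z))
    }

    /** Abramowitz and Stegun 7.1.26 approximation of erfc(v), valid for v >= 0. */
    static double erfc(double v) {
        double t = 1.0 / (1.0 + 0.3275911 * v);
        double poly = t * (0.254829592 + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
        return poly * Math.exp(-v * v);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {4, 5, 6};
        System.out.println("min U = " + minU(x, y));
        System.out.println("p     = " + asymptoticPValue(x, y));
    }
}
```

For x = {1, 2, 3} and y = {4, 5, 6} the sketch gives U1 = 0 and U2 = 9, so the reported statistic is 0 rather than 9. For samples this small the normal approximation is crude, which is where an exact computation of the p-value would be preferable.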

> Mann-Whitney U Test returns maximum of U1 and U2
> ------------------------------------------------
>
>                 Key: MATH-1453
>                 URL: https://issues.apache.org/jira/browse/MATH-1453
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.6.1
>            Reporter: Nikos Katsipoulakis
>            Priority: Critical
>
> I need to use the Mann-Whitney U test and found that Apache Commons Math implements it. The [Wikipedia article|https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test] referenced in the Javadoc states that the U statistic of this test is the smaller of U1 and U2. However, the {{MannWhitneyUTest.mannWhitneyU()}} method in Apache Commons Math returns the maximum of U1 and U2. The code of this method is the following:
>  
> {code:java}
> public double mannWhitneyU(double[] x, double[] y) throws NullArgumentException, NoDataException {
>   this.ensureDataConformance(x, y);
>   double[] z = this.concatenateSamples(x, y);
>   double[] ranks = this.naturalRanking.rank(z);
>   double sumRankX = 0.0D;
>   for(int i = 0; i < x.length; ++i) {
>     sumRankX += ranks[i];
>   }
>   double U1 = sumRankX - (double)((long)x.length * (long)(x.length + 1) / 2L);
>   double U2 = (double)((long)x.length * (long)y.length) - U1;
>   return FastMath.max(U1, U2);
> }
> {code}
> The Javadoc also states that the maximum of U1 and U2 is returned.
>  
> My question is why Apache Commons returns the maximum of those two values, whereas all other sources I found online say to return the minimum. If this is not wrong, shouldn't the Javadoc be updated to cite a source that justifies returning the maximum U?


