Posted to issues@commons.apache.org by "Thomas Neidhart (JIRA)" <ji...@apache.org> on 2012/06/12 13:52:43 UTC
[jira] [Commented] (MATH-790) Mann-Whitney U Test Suffers From Integer Overflow With Large Data Sets
[ https://issues.apache.org/jira/browse/MATH-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293552#comment-13293552 ]
Thomas Neidhart commented on MATH-790:
--------------------------------------
As discussed on the ML, there may still be a problem with integer overflow in the code fragment below:
{noformat}
final double n1n2prod = n1 * n2;
// http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U#Normal_approximation
final double EU = n1n2prod / 2.0;
final double VarU = n1n2prod * (n1 + n2 + 1) / 12.0;
final double z = (Umin - EU) / FastMath.sqrt(VarU);
{noformat}
The calculation of n1n2prod may still overflow if n1 and n2 are large, because n1 * n2 is still evaluated as an int multiplication before the result is widened to double. I would suggest doing it like this instead:
{noformat}
final long n1n2prod = (long) n1 * n2;
// http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U#Normal_approximation
final double EU = n1n2prod / 2.0;
final double VarU = n1n2prod * (n1 + n2 + 1) / 12.0;
final double z = (Umin - EU) / FastMath.sqrt(VarU);
{noformat}
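A small standalone sketch can make the difference concrete. It evaluates the product both ways for two samples of 50,000 elements, a size chosen only because 50,000 * 50,000 = 2,500,000,000 exceeds Integer.MAX_VALUE (2,147,483,647). The class and method names are hypothetical illustrations, not part of Commons Math.

```java
/**
 * Hypothetical demo class (not Commons Math code) contrasting the two ways
 * of computing n1n2prod shown above.
 */
class ProductOverflowDemo {

    // Original form: both operands are int, so the multiplication happens in
    // 32-bit arithmetic and can wrap around before being widened to double.
    static double intProduct(int n1, int n2) {
        final double n1n2prod = n1 * n2;
        return n1n2prod;
    }

    // Suggested form: casting one operand to long makes Java perform the
    // multiplication in 64-bit arithmetic; the product of two non-negative
    // int values is below 2^62, so it always fits in a long.
    static long longProduct(int n1, int n2) {
        final long n1n2prod = (long) n1 * n2;
        return n1n2prod;
    }

    public static void main(String[] args) {
        final int n = 50_000; // illustrative size: 50,000^2 > Integer.MAX_VALUE
        System.out.println("int product:  " + intProduct(n, n));  // prints "int product:  -1.794967296E9"
        System.out.println("long product: " + longProduct(n, n)); // prints "long product: 2500000000"
    }
}
```

One caveat about the suggested form: in n1n2prod * (n1 + n2 + 1) the left operand is a long, so that multiplication is also done in 64-bit integer arithmetic. It is exact for the sizes in this report, but for equal sample sizes it would exceed Long.MAX_VALUE once each sample reaches roughly 1.66 million elements. Converting to double before that multiplication, e.g. n1n2prod * (double) (n1 + n2 + 1), would avoid this.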
> Mann-Whitney U Test Suffers From Integer Overflow With Large Data Sets
> ----------------------------------------------------------------------
>
> Key: MATH-790
> URL: https://issues.apache.org/jira/browse/MATH-790
> Project: Commons Math
> Issue Type: Bug
> Affects Versions: 3.0, Nightly Builds
> Environment: Ubuntu Linux x64, Sun Java 6
> Reporter: James Pickering
> Assignee: Mikkel Meyer Andersen
> Priority: Minor
> Labels: newbie, patch
> Fix For: 3.1
>
> Attachments: MannWhitnetUOVerflowPatch.diff
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When performing a Mann-Whitney U Test on large data sets (the attached test uses two 1500-element sets), intermediate integer values used in calculateAsymptoticPValue can overflow, leading to invalid results such as p-values of NaN, or to incorrect calculations.
> Attached is a patch, including a test and a fix, which modifies the affected code to use doubles.
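The failure described in the report can be reproduced with a short sketch of the normal-approximation variance, VarU = n1 * n2 * (n1 + n2 + 1) / 12. It assumes, as the description implies, that before the patch these intermediates were held in int; the class and method names are hypothetical and this is not the actual Commons Math source. With two 1500-element samples the exact numerator is 6,752,250,000, which does not fit in an int.

```java
/**
 * Hypothetical reproduction (not the Commons Math source): evaluates the
 * normal-approximation variance n1*n2*(n1 + n2 + 1) / 12 once with int
 * intermediates (assumed pre-patch behavior) and once with doubles.
 */
class VarianceOverflowDemo {

    // Every factor is an int, so n1n2prod * (n1 + n2 + 1) is computed in
    // 32-bit arithmetic and wraps before the division by 12.0 widens it.
    static double varianceWithInts(int n1, int n2) {
        final int n1n2prod = n1 * n2;
        return n1n2prod * (n1 + n2 + 1) / 12.0;
    }

    // Converting the sizes to double first keeps every step in floating point.
    static double varianceWithDoubles(int n1, int n2) {
        final double n1n2prod = (double) n1 * n2;
        return n1n2prod * ((double) n1 + n2 + 1) / 12.0;
    }

    public static void main(String[] args) {
        final int n1 = 1500, n2 = 1500; // sizes used by the test attached to the report
        final double bad = varianceWithInts(n1, n2);
        final double good = varianceWithDoubles(n1, n2);
        System.out.println("int variance:    " + bad);            // negative after wrap-around
        System.out.println("sqrt of it:      " + Math.sqrt(bad)); // NaN
        System.out.println("double variance: " + good);
        System.out.println("sqrt of it:      " + Math.sqrt(good));
    }
}
```

Because the wrapped variance is negative, its square root is NaN, and that NaN propagates through z into the p-value, which matches the symptom in the report.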
--
This message is automatically generated by JIRA.