Posted to issues@commons.apache.org by "Kexin Xie (JIRA)" <ji...@apache.org> on 2016/08/31 19:36:20 UTC

[jira] [Comment Edited] (MATH-1381) BinomialTest P-value > 1

    [ https://issues.apache.org/jira/browse/MATH-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453154#comment-15453154 ] 

Kexin Xie edited comment on MATH-1381 at 8/31/16 7:35 PM:
----------------------------------------------------------

Hi [~erans], thanks for looking at the PR. I agree that this looks like a dirty fix that could mask a potential bug in the computation.

However, the main problem is a corner case that the current algorithm does not consider: if the probability is large enough, the number of successes equals the number of trials, and both numbers are small enough, {{criticalValueLow}} rises too quickly and ends up equal to {{criticalValueHigh}}. The if condition at L138 is supposed to handle the symmetric case where {{pLow == pHigh}}, but it was not written for the case where {{criticalValueLow == criticalValueHigh}}. At that point the same probability is added twice, so the result always jumps above 1.

It may seem like a dirty fix, but I have checked the results against R and Python's scipy equivalent, and they produce the same values. I implemented it this way because it handles this boundary condition correctly and requires the least change to the original implementation. Note that scipy uses a similar approach to deal with the estimated value rising above 1: https://github.com/scipy/scipy/blob/v0.14.0/scipy/stats/morestats.py#L1661
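
To make the corner case concrete, here is a minimal, self-contained sketch of a two-sided accumulation loop of the kind described above. It is not the Commons Math source: the class and method names are hypothetical, the pmf is computed by hand rather than through the library's distribution class, and the capped variant only assumes that the fix clamps the result at 1, in the spirit of the scipy line linked above.

```java
// Illustrative sketch only; not the Commons Math implementation.
// All class and method names here are hypothetical.
final class BinomialTwoSidedSketch {

    /** Binomial pmf P(X = k) for n trials and success probability p in (0, 1), computed via logs. */
    static double pmf(int n, int k, double p) {
        double logCoeff = 0.0; // log of the binomial coefficient C(n, k)
        for (int i = 1; i <= k; i++) {
            logCoeff += Math.log(n - k + i) - Math.log(i);
        }
        return Math.exp(logCoeff + k * Math.log(p) + (n - k) * Math.log(1.0 - p));
    }

    /** Walks in from both tails, always adding the smaller tail probability first. */
    static double twoSidedUncapped(int n, int successes, double p) {
        int low = 0;   // plays the role of criticalValueLow
        int high = n;  // plays the role of criticalValueHigh
        double total = 0.0;
        while (true) {
            double pLow = pmf(n, low, p);
            double pHigh = pmf(n, high, p);
            if (pLow == pHigh) {
                // Meant for symmetric tails, but it also fires when low == high,
                // in which case the same probability mass is counted twice.
                total += 2 * pLow;
                low++;
                high--;
            } else if (pLow < pHigh) {
                total += pLow;
                low++;
            } else {
                total += pHigh;
                high--;
            }
            if (low > successes || high < successes) {
                break;
            }
        }
        return total;
    }

    /** Assumed shape of the fix: clamp the accumulated value at 1. */
    static double twoSidedCapped(int n, int successes, double p) {
        return Math.min(1.0, twoSidedUncapped(n, successes, p));
    }

    public static void main(String[] args) {
        // n = 2, p = 0.9: low reaches 2 while high is still 2, so P(2) is added twice.
        System.out.println(twoSidedUncapped(2, 2, 0.9));
        // The input from the issue report; the uncapped value exceeds 1.
        System.out.println(twoSidedUncapped(200, 200, 0.9950429));
        System.out.println(twoSidedCapped(200, 200, 0.9950429));
    }
}
```

With 2 trials, 2 successes and p = 0.9, the lower index walks up through 0 and 1 and then meets the upper index at 2, so the total becomes roughly 0.01 + 0.18 + 2 * 0.81. Capping is the smallest possible change; an alternative would be to treat {{criticalValueLow == criticalValueHigh}} as its own branch and add the probability only once.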

I've also updated the PR with more exhaustive test cases; please have another look. I think the current implementation is correct, as explained above, but I'm happy to change the estimation algorithm if that's required.


> BinomialTest P-value > 1
> ------------------------
>
>                 Key: MATH-1381
>                 URL: https://issues.apache.org/jira/browse/MATH-1381
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Wang Qiang
>
> When I use the binomial test, I get a p-value > 1 for the two-sided check.
> Example:
> (new BinomialTest()).binomialTest(200, 200, 0.9950429, AlternativeHypothesis.TWO_SIDED) == 1.3701357550780435
> In my case, when the expected p-value is 1 (as calculated by a package in another language, scipy in this case), the returned p-value can be > 1.


