Posted to issues@commons.apache.org by "Alex Herbert (Jira)" <ji...@apache.org> on 2021/07/28 17:11:00 UTC

[jira] [Commented] (STATISTICS-32) Add survival probability function to discrete distributions

    [ https://issues.apache.org/jira/browse/STATISTICS-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388891#comment-17388891 ] 

Alex Herbert commented on STATISTICS-32:
----------------------------------------

I have attempted to implement this for the discrete distributions, using R to generate high-precision reference values. See [PR 28|https://github.com/apache/commons-statistics/pull/28].

Two distributions use the RegularizedBeta function to compute the CDF and survival function.
 - BinomialDistribution
 - PascalDistribution (i.e. a negative binomial)

There is an identity that can be used here:
{noformat}
1 - I_z(a, b) = I_{1-z}(b, a)
{noformat}
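As a quick sanity check of the identity (an illustration only, not code from the PR), the sketch below evaluates both sides for integer a and b, where the regularized beta function reduces to a finite binomial sum. The class name BetaIdentityCheck and the helper regularizedBetaInt are hypothetical names chosen for this example.

```java
public final class BetaIdentityCheck {
    /** Binomial coefficient C(n, k) computed in double precision (fine for small n). */
    static double binomial(int n, int k) {
        double c = 1;
        for (int i = 1; i <= k; i++) {
            c = c * (n - k + i) / i;
        }
        return c;
    }

    /**
     * I_z(a, b) for positive integers a and b, using the finite sum
     * I_z(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) z^j (1-z)^(a+b-1-j).
     * Hypothetical helper for illustration; not the library routine.
     */
    static double regularizedBetaInt(double z, int a, int b) {
        final int n = a + b - 1;
        double sum = 0;
        for (int j = a; j <= n; j++) {
            sum += binomial(n, j) * Math.pow(z, j) * Math.pow(1 - z, n - j);
        }
        return sum;
    }

    public static void main(String[] args) {
        final double z = 0.3;
        final int a = 2;
        final int b = 3;
        final double lhs = 1 - regularizedBetaInt(z, a, b);
        final double rhs = regularizedBetaInt(1 - z, b, a);
        // Both sides should agree to within floating-point rounding.
        System.out.println("1 - I_z(a, b)  = " + lhs);
        System.out.println("I_{1-z}(b, a)  = " + rhs);
    }
}
```

The finite sum only applies to integer arguments; it is used here so the check needs no external library.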
In both distributions the z value is the probability of success p. Computing 1 - p loses precision as p approaches zero: the low-order bits of p are rounded away, and for very small p (2^-54 or smaller) the computed value of 1 - p is exactly 1. For these computations I have therefore used 1 - p only when p >= 0.5, where the subtraction is exact. The aim is to keep p as close as possible to the value input by the user. This, however, may not produce the most accurate value for the probability. See the example for the Pascal distribution:
{code:java}
@Override
public double survivalProbability(int x) {
    double ret;
    if (x < 0) {
        ret = 1.0;
    } else if (probabilityOfSuccess >= 0.5) {
        // 1 - p is exact.
        // Use the identity of the regularized beta function: 1 - I_z(a, b) = I_{1-z}(b, a)
        ret = RegularizedBeta.value(1.0 - probabilityOfSuccess,
                                    x + 1.0, numberOfSuccesses);
    } else {
        ret = 1.0 - RegularizedBeta.value(probabilityOfSuccess,
                                          numberOfSuccesses, x + 1.0);
    }
    return ret;
}
{code}
Depending on the parameters p and x, either computation may be more accurate.
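The claim that 1 - p is exact for p >= 0.5 but not in general can be checked with a small sketch like the one below. It is an illustration only, not code from the PR; the class name OneMinusPExactness and the method isExact are hypothetical. It relies on BigDecimal(double) converting a double to its exact decimal value, so the BigDecimal subtraction gives the true result to compare against.

```java
import java.math.BigDecimal;

public final class OneMinusPExactness {
    /**
     * Returns true if the double computation 1.0 - p has no rounding error,
     * i.e. it equals the exact real-number difference of 1 and the double p.
     */
    static boolean isExact(double p) {
        // new BigDecimal(double) is an exact conversion; subtract is exact.
        final BigDecimal exact = BigDecimal.ONE.subtract(new BigDecimal(p));
        final BigDecimal computed = new BigDecimal(1.0 - p);
        return exact.compareTo(computed) == 0;
    }

    public static void main(String[] args) {
        final double[] values = {0.75, 0.6, 0.5, 0.1, 1e-20};
        for (final double p : values) {
            System.out.println("p=" + p +
                               " exact=" + isExact(p) +
                               " 1-p=" + (1.0 - p));
        }
    }
}
```

For p in [0.5, 1] the check reports an exact result, while for small p such as 1e-20 the computed 1 - p collapses to 1.0 and the information in p is lost.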

The internals of RegularizedBeta.value actually detect and use this identity:
{code:java}
    public static double value(double x,
                               final double a,
                               final double b,
                               double epsilon,
                               int maxIterations) {
        if (...) {
            return Double.NaN;
        } else if (x > (a + 1) / (2 + b + a) &&
                   1 - x <= (b + 1) / (2 + b + a)) {
            return 1 - value(1 - x, b, a, epsilon, maxIterations);
        } else {
            // compute ...
        }
    }
{code}
I will investigate adding logic that calls RegularizedBeta with the most appropriate arguments, so that it avoids the branch where it computes 1 - value. The high-precision unit tests I have already added should detect whether the function is being used correctly.
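One way to explore this is to predict, from the caller's side, whether a given set of arguments would take the complement branch. The sketch below is an assumption about how that could be probed, not the planned implementation: it copies the condition from the excerpt above into a hypothetical predicate usesComplement (the class name BetaBranchCheck is also hypothetical) and applies it to arguments of the form used in the Pascal survivalProbability method.

```java
public final class BetaBranchCheck {
    /**
     * Mirrors the condition quoted from RegularizedBeta.value: returns true
     * when value(x, a, b) would be evaluated as 1 - value(1 - x, b, a).
     * Hypothetical helper for illustration only.
     */
    static boolean usesComplement(double x, double a, double b) {
        return x > (a + 1) / (2 + b + a) &&
               1 - x <= (b + 1) / (2 + b + a);
    }

    public static void main(String[] args) {
        // p >= 0.5 branch of the Pascal example: value(1 - p, x + 1, r)
        // with p = 0.9, r = 5, x = 2.
        System.out.println(usesComplement(1.0 - 0.9, 2 + 1.0, 5));

        // p < 0.5 branch: 1 - value(p, r, x + 1) with p = 0.4, r = 1, x = 10.
        // If this prints true, the result is effectively 1 - (1 - value(...)),
        // so two subtractions from 1 are applied.
        System.out.println(usesComplement(0.4, 1, 10 + 1.0));
    }
}
```

A caller could use such a check to swap to the identity form up front, so that the final result comes directly from the continued-fraction evaluation rather than from a subtraction.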

> Add survival probability function to discrete distributions
> -----------------------------------------------------------
>
>                 Key: STATISTICS-32
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-32
>             Project: Apache Commons Statistics
>          Issue Type: New Feature
>            Reporter: Benjamin W Trent
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Sibling issue to: STATISTICS-31
> It is useful to know the [survival function|https://en.wikipedia.org/wiki/Survival_function] of a number given a discrete distribution.
> While this can be approximated with
> {noformat}
> 1 - cdf(x){noformat}
> , there is an opportunity for greater accuracy in certain distributions.


