Posted to issues@commons.apache.org by "Alex Herbert (Jira)" <ji...@apache.org> on 2021/09/21 15:10:00 UTC

[jira] [Commented] (STATISTICS-34) Geometric distribution to switch PMF computation for increased accuracy

    [ https://issues.apache.org/jira/browse/STATISTICS-34?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418166#comment-17418166 ] 

Alex Herbert commented on STATISTICS-34:
----------------------------------------

I have compared the two formulas using BigDecimal to compute a high precision result:
{code:java}
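// Requires java.math.BigDecimal and java.math.MathContext.
// p is the probability of success (double); x is a non-negative int.
BigDecimal bp = new BigDecimal(p);              // exact value of the double p
BigDecimal b1mp = BigDecimal.ONE.subtract(bp);  // 1 - p with no rounding
MathContext mc = new MathContext(128);          // 128 significant digits
// Reference pmf(x) = (1 - p)^x * p, rounded to a double only at the end
double p3 = b1mp.pow(x, mc).multiply(bp, mc).doubleValue();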
{code}
The formulas were evaluated using:
 * Generate a random p in an interval
 * Create x using the inverse CDF of a range of percentiles
 * Evaluate the functions for the unique set of x (ignore x=0 as this is always exact)
 * Compute the ULP difference to the BigDecimal result and summarise errors
 * Repeat until a suitable sample size is achieved
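
The per-sample error measurement in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to generate the results: the class name GeometricPmfError, the method names, and the use of the difference in raw long bits as the ULP distance (valid for finite doubles of the same sign) are all hypothetical choices. The random sampling of p and the inverse CDF step are omitted.

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class GeometricPmfError {
    private GeometricPmfError() {}

    /** High precision reference for pmf(x) = (1 - p)^x * p, rounded to a double. */
    static double reference(double p, int x) {
        MathContext mc = new MathContext(128);
        BigDecimal bp = new BigDecimal(p);
        BigDecimal b1mp = BigDecimal.ONE.subtract(bp);
        return b1mp.pow(x, mc).multiply(bp, mc).doubleValue();
    }

    /** Power formula. */
    static double pmfPow(double p, int x) {
        return Math.pow(1.0 - p, x) * p;
    }

    /** Exponential formula. */
    static double pmfExp(double p, int x) {
        return Math.exp(Math.log1p(-p) * x) * p;
    }

    /**
     * Assumed ULP measure: the number of representable doubles between a and b.
     * Only meaningful for finite values of the same sign, which holds for PMF values.
     */
    static long ulpDistance(double a, double b) {
        return Math.abs(Double.doubleToLongBits(a) - Double.doubleToLongBits(b));
    }

    public static void main(String[] args) {
        // Arbitrary example values chosen for illustration only
        double p = 0.01;
        int x = 100;
        double ref = reference(p, x);
        System.out.println("pow ULP error: " + ulpDistance(pmfPow(p, x), ref));
        System.out.println("exp ULP error: " + ulpDistance(pmfExp(p, x), ref));
    }
}
```

Running main prints the ULP error of each formula for a single (p, x) pair; collecting these values over many samples would give summary statistics of the kind reported in the table.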

Here are the results for percentiles from 0.1 to 99.9:
||p lower||p upper||samples||function1||mean ULP||SD||max ULP||function2||mean ULP||SD||max ULP||
|0.00|0.05|100097|pow|60.488|167.368|6391|exp|0.708|0.860|8|
|0.05|0.10|100009|pow|7.276|7.273|61|exp|0.994|1.085|8|
|0.10|0.15|100026|pow|4.631|4.812|33|exp|1.103|1.186|10|
|0.15|0.20|100008|pow|3.656|3.888|21|exp|1.106|1.175|8|
|0.20|0.25|100020|pow|3.058|3.203|19|exp|1.208|1.286|12|
|0.25|0.30|100009|pow|2.529|3.202|13|exp|1.243|1.280|9|
|0.30|0.35|100008|pow|2.301|3.012|13|exp|1.172|1.237|9|
|0.35|0.40|100005|pow|2.136|2.726|13|exp|1.227|1.303|10|
|0.40|0.45|100011|pow|1.942|2.482|12|exp|1.460|1.504|13|
|0.45|0.50|100004|pow|1.745|2.235|11|exp|1.189|1.244|10|
|0.50|0.55|100005|pow|0.232|0.422|1|exp|1.243|1.259|9|
|0.55|0.60|100006|pow|0.228|0.419|1|exp|1.145|1.252|8|
|0.60|0.65|100005|pow|0.230|0.421|1|exp|1.308|1.427|11|
|0.65|0.70|100004|pow|0.209|0.407|1|exp|1.449|1.518|10|
|0.70|0.75|100001|pow|0.236|0.425|1|exp|1.263|1.340|10|
|0.75|0.80|100000|pow|0.175|0.380|1|exp|1.060|1.108|8|
|0.80|0.85|100003|pow|0.192|0.394|1|exp|1.073|1.210|9|
|0.85|0.90|100002|pow|0.171|0.377|1|exp|1.416|1.493|11|
|0.90|0.95|100000|pow|0.132|0.338|1|exp|1.090|0.954|6|
|0.95|1.00|100001|pow|0.071|0.257|1|exp|1.073|0.912|6|

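When p>=0.5 the power formula is more accurate: its maximum error is 1 ULP in every interval, compared to 6 to 11 ULP for the exponential formula.

When p<0.5 the exponential formula has the lower mean error in every interval and significantly reduces errors for small p: for p in [0.00, 0.05] the mean error drops from 60.488 to 0.708 ULP and the maximum from 6391 to 8 ULP.

I suggest updating the GeometricDistribution to switch implementations at p=0.5: the power formula for p>=0.5 and the exponential formula for p<0.5.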
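
A minimal sketch of such a switch, assuming a standalone static helper rather than the actual GeometricDistribution class (the class name GeometricPmfSketch and the method signature are hypothetical, and the handling of x < 0 is an assumption):

```java
final class GeometricPmfSketch {
    private GeometricPmfSketch() {}

    /**
     * Computes pmf(x) = (1 - p)^x * p, choosing the formula by the size of p.
     * Assumes 0 < p <= 1; returns 0 for x < 0.
     */
    static double pmf(double p, int x) {
        if (x < 0) {
            return 0.0;
        }
        if (p >= 0.5) {
            // 1 - p is exact for p >= 0.5, so the power formula is used directly
            return Math.pow(1.0 - p, x) * p;
        }
        // For smaller p, log1p(-p) avoids the rounding error in computing 1 - p
        return Math.exp(Math.log1p(-p) * x) * p;
    }
}
```

For example, GeometricPmfSketch.pmf(0.75, 0) takes the power branch and GeometricPmfSketch.pmf(0.25, 0) takes the exponential branch; both return p, as expected for x=0.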

> Geometric distribution to switch PMF computation for increased accuracy
> -----------------------------------------------------------------------
>
>                 Key: STATISTICS-34
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-34
>             Project: Apache Commons Statistics
>          Issue Type: Improvement
>          Components: distribution
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Trivial
>             Fix For: 1.0
>
>
> The Geometric distribution is defined by the probability of success (p).
> The PMF is:
> {noformat}
> pmf(x) = pow(1 - p, x) * p
> {noformat}
> This can be implemented directly or using exponential functions:
> {code:java}
> double p1 = Math.pow(1.0 - p, x) * p;
> double p2 = Math.exp(Math.log1p(-p) * x) * p;
> {code}
> The current code uses exponential functions. Implementations in Matlab, R and SciPy all use the power function. Both have advantages depending on the value of p.
> When p >= 0.5 the value (1-p) is exact. As p becomes increasingly small, (1-p) loses precision due to the limited precision of double values close to 1.


