You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@commons.apache.org by Alex Herbert <al...@gmail.com> on 2021/10/25 16:09:18 UTC

[statistics][numbers] Update the Gamma functions in Commons Numbers

Many of the issues within [statistics] have now been resolved. There
are some remaining issues that are linked to the gamma functions in
[numbers]:

Large values of the mean in the Poisson distribution have low
precision for the CDF. The distribution also has a configurable
threshold for convergence of the gamma function evaluation
(STATISTICS-38 [1]). Ideally this implementation detail should be
removed from the API.

The Gamma distribution has a function switch based on a threshold to
compute the PDF. This threshold results in incorrect values for
certain parameters (STATISTICS-39 [2]).

The function switch in the Gamma distribution is based on the
documentation for the Boost gamma functions [3]. However it does not
implement all the suggested optimisations detailed in the most recent
Boost documentation. This includes avoiding using the converging
series of the gamma function when the convergence is slow or unstable,
i.e. addresses the need for a configurable convergence threshold in
the Poisson distribution.

One part of the evaluation of the incomplete gamma functions require
accuracy in the leading term:

x^a exp(-x) / gamma(a) = exp(a log(x) - x - loggamma(a)

When x and a are large then using logs to compute this leads to
cancellation. Use of logs to compute this term is done in the current
implementation in numbers for RegularizedGamma P and Q. It is the
source of low precision for the CDF for the Poisson distribution when
the mean is large.

I propose to port the Boost gamma functions to numbers. This will be a
process similar to the recent port of the error functions to the same
package in numbers [4,5]. These functions improved both accuracy and
speed over the existing implementation. Once the gamma functions are
ported a comparison can be made between the existing and new
implementations.

Alex


[1] https://issues.apache.org/jira/browse/STATISTICS-38
[2] https://issues.apache.org/jira/browse/STATISTICS-39
[3] https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html
[4] https://issues.apache.org/jira/browse/NUMBERS-171
[5] https://issues.apache.org/jira/browse/NUMBERS-172

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [statistics][numbers] Update the Gamma functions in Commons Numbers

Posted by Alex Herbert <al...@gmail.com>.

On Tue, 26 Oct 2021 at 13:54, Gilles Sadowski <gi...@gmail.com> wrote:
<-- SNIP -->

> There is also
>   https://issues.apache.org/jira/browse/NUMBERS-167
> that definitely should entail a performance gain not achievable with the
> current API (a single "static" function).

I will keep this in mind when porting the Boost gamma function. This
enhancement was to avoid repeat computation of log(x) or logGamma(a).
The Boost implementation of the gamma function has many computation
paths depending on the parameters. Sometimes these factors would not
be required but there may be other constant expressions on alternate
branches that we can cache.

I plan to keep the Boost implementation in separate package private
classes. The existing RegularizedGamma class can call this. We can
then expose a new API to exploit the incomplete Gamma function as a
univariate function (with one parameter fixed).

The Boost library has implementations of all the gamma functions
currently in the gamma package. Most should be drop-in replacements
for the current functions. The Boost versions should be used if they
increase accuracy. This can be easily tested as the Boost source
provides high accuracy reference data.

The following functions would be additions to the library:

incomplete gamma : tgamma(a, z)
lower incomplete gamma : tgamma_lower(a, z)

(tgamma == true gamma)

These are computed using the same routines as the regularized
incomplete gamma functions P and Q. It would make sense to expose them
in a public API.

There is also the derivative of the regularized incomplete gamma
function. This is exposing some of the working of the incomplete gamma
function as this is the density function for the gamma distribution.
Exposing this will solve the issue with computation of the
GammaDistribution PDF which attempts to perform this function via the
Lanczos approximation but is not accurate for all parameters.

One functionally breaking change is a different Lanczos
implementation. It uses a different Lanczos factor g from the
implementation currently in the LanczosApproximation class. There is a
big section on this in their documentation on how they chose the
optimal constant g for each floating-point precision [1]. Their work
extends the work of Godfrey which is the basis for the numbers
implementation. Their functions uses g = 6.02 for a double. The
current implementation in numbers uses 4.74. However this number is
exposed via a method. Any use of the function value would have to
access the constant g via a method. Thus if the approximation is
changed then dependent code should continue to function correctly if
the Lanczos computation and g are updated  and if the dependent code
accesses g through the provided method.

One advantage of using the Boost Lanczos approximation is the
provision of a value that is pre-divided by exp(g). This has use in
computing logGamma. A comparison between the current and Boost
implementation for accuracy and speed should be done.

Alex

[1] https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/lanczos.html

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [statistics][numbers] Update the Gamma functions in Commons Numbers

Posted by Gilles Sadowski <gi...@gmail.com>.

Hello.

Le lun. 25 oct. 2021 à 18:09, Alex Herbert <al...@gmail.com> a écrit :
>
> Many of the issues within [statistics] have now been resolved. There
> are some remaining issues that are linked to the gamma functions in
> [numbers]:

Thanks for these accuracy and performance enhancements.

> [...]
>
> [1] https://issues.apache.org/jira/browse/STATISTICS-38
> [2] https://issues.apache.org/jira/browse/STATISTICS-39
> [3] https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html
> [4] https://issues.apache.org/jira/browse/NUMBERS-171
> [5] https://issues.apache.org/jira/browse/NUMBERS-172
>

There is also
  https://issues.apache.org/jira/browse/NUMBERS-167
that definitely should entail a performance gain not achievable with the
current API (a single "static" function).
As discussed in the comments, it seems that the required functionality
would kludge the existing class(es); so I wondered whether we should
keep those untouched (except for improving the implementation) and
provide an alternative API in new (factory) classes that would call and
cache the low-level "static" functions.

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org