You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Sébastien Brisard (JIRA)" <ji...@apache.org> on 2012/05/11 07:58:46 UTC

[jira] [Issue Comment Edited] (MATH-753) Better implementation for the gamma distribution density function

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273053#comment-13273053 ] 

Sébastien Brisard edited comment on MATH-753 at 5/11/12 5:58 AM:
-----------------------------------------------------------------

The attached file {{MATH-753.tar.gz}} compares the accuracy of four different calculation methods for the density.
* method 1 is the currently implemented method (leading to overflows)
* method 2 is the method proposed by Francesco (exp(log(...))
* method 3 is based on BOOST
* method 4 is the recommended method, partly based on method 3.

The provided files check the accuracy of the methods for several values of the shape parameter (scale is not really relevant to this problem: used 1). Reference values where computed with Maxima (with a precision of 64 digits, for exactly representable double values of the argument.

The report shows statistics on the error in ulps. For each method, two statistics are computed
* the first one corresponds to the case where the currently implemented method does not lead to overflow. Even in this case, the proposed method (4) achieves better accuracy
* the second one corresponds to the case where the currently implemented method does lead to overflow. In this case, method 4 is superior to method 2. Regardless of the method, the accuracy is worse than in the first case; this was expected.

As a conclusion, I suggest we implement method 4.

Here is the output of the test
{noformat}
---------- x not leading to overflow, shape = 1.0 ----------
Initial method
SummaryStatistics:
n: 3200
min: 2.0
max: 4.0
mean: 2.8724999999999934
geometric mean: 2.784676841906384
variance: 0.4982744607689899
sum of squares: 27998.0
standard deviation: 0.7058855861745513
sum of logs: 3277.2218605584508

Method proposed by Francesco
SummaryStatistics:
n: 3200
min: 0.0
max: 67.0
mean: 12.791562499999992
geometric mean: 0.0
variance: 148.30446145279777
sum of squares: 998023.0
standard deviation: 12.17803192033909
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 3200
min: 1.0
max: 2.0
mean: 1.4162499999999971
geometric mean: 1.3344543929682238
variance: 0.24306189434198203
sum of squares: 7196.0
standard deviation: 0.49301307725250254
sum of logs: 923.2720445058158

---------- x not leading to overflow, shape = 10.0 ----------
Initial method
SummaryStatistics:
n: 400
min: 1.0
max: 6.0
mean: 3.14
geometric mean: 3.0020752311936034
variance: 0.8324812030075192
sum of squares: 4276.0
standard deviation: 0.9124040787981601
sum of logs: 439.72151730195765

Method proposed by Francesco
SummaryStatistics:
n: 400
min: 0.0
max: 70.0
mean: 14.739999999999998
geometric mean: 0.0
variance: 144.37834586466164
sum of squares: 144514.0
standard deviation: 12.015754069747834
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 400
min: 0.0
max: 3.0
mean: 0.6075000000000004
geometric mean: 0.0
variance: 0.3593421052631577
sum of squares: 291.0
standard deviation: 0.5994515036791197
sum of logs: -Infinity

---------- x not leading to overflow, shape = 100.0 ----------
Initial method
SummaryStatistics:
n: 393
min: 4.0
max: 12.0
mean: 7.806615776081426
geometric mean: 7.619643567166563
variance: 2.916588772913744
sum of squares: 25094.0
standard deviation: 1.7078023225519234
sum of logs: 798.076729908358

Method proposed by Francesco
SummaryStatistics:
n: 393
min: 2.0
max: 824.0
mean: 164.59796437659037
geometric mean: 106.88809881158063
variance: 18545.179791764032
sum of squares: 1.7917059E7
standard deviation: 136.18068802794335
sum of logs: 1836.0105153185225

Proposed method, based on BOOST
SummaryStatistics:
n: 393
min: 0.0
max: 4.0
mean: 1.1679389312977109
geometric mean: 0.0
variance: 0.5176429350366102
sum of squares: 739.0
standard deviation: 0.7194740683559139
sum of logs: -Infinity

---------- x not leading to overflow, shape = 142.0 ----------
Initial method
SummaryStatistics:
n: 614
min: 0.0
max: 551.0
mean: 400.01140065146564
geometric mean: 0.0
variance: 7323.946036207893
sum of squares: 1.02735179E8
standard deviation: 85.5800562993966
sum of logs: -Infinity

Method proposed by Francesco
SummaryStatistics:
n: 614
min: 0.0
max: 1421.0
mean: 408.09609120521105
geometric mean: 0.0
variance: 64951.369217975334
sum of squares: 1.42072235E8
standard deviation: 254.8555850240982
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 614
min: 0.0
max: 90.0
mean: 0.8387622149837135
geometric mean: 0.0
variance: 21.84182293520944
sum of squares: 13821.0
standard deviation: 4.6735236102120465
sum of logs: -Infinity

---------- x leading to overflow, shape = 142.0 ----------
Method proposed by Francesco
SummaryStatistics:
n: 986
min: 2.0
max: 1434.0
mean: 427.799188640973
geometric mean: 297.73157979362674
variance: 88404.8205475645
sum of squares: 2.67528724E8
standard deviation: 297.32948146385434
sum of logs: 5616.4456488656715

Proposed method, based on BOOST
SummaryStatistics:
n: 986
min: 0.0
max: 203.0
mean: 30.59837728194727
geometric mean: 0.0
variance: 962.2852359427928
sum of squares: 1871004.0
standard deviation: 31.020722685695006
sum of logs: -Infinity

---------- x not leading to overflow, shape = 1000.0 ----------
Initial method
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

Method proposed by Francesco
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

---------- x leading to overflow, shape = 1000.0 ----------
Method proposed by Francesco
SummaryStatistics:
n: 3252
min: 0.0
max: 11647.0
mean: 2645.6626691266915
geometric mean: 0.0
variance: 6222262.112872347
sum of squares: 4.2991048807E10
standard deviation: 2494.4462537549985
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 3252
min: 0.0
max: 1966.0
mean: 150.69772447724463
geometric mean: 0.0
variance: 47598.90552542646
sum of squares: 2.28596325E8
standard deviation: 218.17173402030443
sum of logs: -Infinity
{noformat}
                
      was (Author: celestin):
    The attached file {{MATH-753.tar.gz}} compares the accuracy of four different calculation methods for the density.
* method 1 is the currently implemented method (leading to overflows)
* method 2 is the method proposed by Francesco (exp(log(...))
* method 3 is based on BOOST
* method 4 is the recommended method, partly based on method 3.

The provided files check the accuracy of the methods for several values of the shape parameter (scale is not really relevant to this problem: used 1). Reference values where computed with Maxima (with a precision of 64 digits, for exactly representable double values of the argument.

The report shows statistics on the error in ulps. For each method, two statistics are computed
* the first one corresponds to the case where the currently implemented method does not lead to overflow. Even in this case, the proposed method (4) achieves better accuracy
* the second one corresponds to the case where the currently implemented method does lead to overflow. In this case, method 4 is superior to method 2.

As a conclusion, I suggest we implement method 4.

Here is the output of the test
{noformat}
---------- x not leading to overflow, shape = 1.0 ----------
Initial method
SummaryStatistics:
n: 3200
min: 2.0
max: 4.0
mean: 2.8724999999999934
geometric mean: 2.784676841906384
variance: 0.4982744607689899
sum of squares: 27998.0
standard deviation: 0.7058855861745513
sum of logs: 3277.2218605584508

Method proposed by Francesco
SummaryStatistics:
n: 3200
min: 0.0
max: 67.0
mean: 12.791562499999992
geometric mean: 0.0
variance: 148.30446145279777
sum of squares: 998023.0
standard deviation: 12.17803192033909
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 3200
min: 1.0
max: 2.0
mean: 1.4162499999999971
geometric mean: 1.3344543929682238
variance: 0.24306189434198203
sum of squares: 7196.0
standard deviation: 0.49301307725250254
sum of logs: 923.2720445058158

---------- x not leading to overflow, shape = 10.0 ----------
Initial method
SummaryStatistics:
n: 400
min: 1.0
max: 6.0
mean: 3.14
geometric mean: 3.0020752311936034
variance: 0.8324812030075192
sum of squares: 4276.0
standard deviation: 0.9124040787981601
sum of logs: 439.72151730195765

Method proposed by Francesco
SummaryStatistics:
n: 400
min: 0.0
max: 70.0
mean: 14.739999999999998
geometric mean: 0.0
variance: 144.37834586466164
sum of squares: 144514.0
standard deviation: 12.015754069747834
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 400
min: 0.0
max: 3.0
mean: 0.6075000000000004
geometric mean: 0.0
variance: 0.3593421052631577
sum of squares: 291.0
standard deviation: 0.5994515036791197
sum of logs: -Infinity

---------- x not leading to overflow, shape = 100.0 ----------
Initial method
SummaryStatistics:
n: 393
min: 4.0
max: 12.0
mean: 7.806615776081426
geometric mean: 7.619643567166563
variance: 2.916588772913744
sum of squares: 25094.0
standard deviation: 1.7078023225519234
sum of logs: 798.076729908358

Method proposed by Francesco
SummaryStatistics:
n: 393
min: 2.0
max: 824.0
mean: 164.59796437659037
geometric mean: 106.88809881158063
variance: 18545.179791764032
sum of squares: 1.7917059E7
standard deviation: 136.18068802794335
sum of logs: 1836.0105153185225

Proposed method, based on BOOST
SummaryStatistics:
n: 393
min: 0.0
max: 4.0
mean: 1.1679389312977109
geometric mean: 0.0
variance: 0.5176429350366102
sum of squares: 739.0
standard deviation: 0.7194740683559139
sum of logs: -Infinity

---------- x not leading to overflow, shape = 142.0 ----------
Initial method
SummaryStatistics:
n: 614
min: 0.0
max: 551.0
mean: 400.01140065146564
geometric mean: 0.0
variance: 7323.946036207893
sum of squares: 1.02735179E8
standard deviation: 85.5800562993966
sum of logs: -Infinity

Method proposed by Francesco
SummaryStatistics:
n: 614
min: 0.0
max: 1421.0
mean: 408.09609120521105
geometric mean: 0.0
variance: 64951.369217975334
sum of squares: 1.42072235E8
standard deviation: 254.8555850240982
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 614
min: 0.0
max: 90.0
mean: 0.8387622149837135
geometric mean: 0.0
variance: 21.84182293520944
sum of squares: 13821.0
standard deviation: 4.6735236102120465
sum of logs: -Infinity

---------- x leading to overflow, shape = 142.0 ----------
Method proposed by Francesco
SummaryStatistics:
n: 986
min: 2.0
max: 1434.0
mean: 427.799188640973
geometric mean: 297.73157979362674
variance: 88404.8205475645
sum of squares: 2.67528724E8
standard deviation: 297.32948146385434
sum of logs: 5616.4456488656715

Proposed method, based on BOOST
SummaryStatistics:
n: 986
min: 0.0
max: 203.0
mean: 30.59837728194727
geometric mean: 0.0
variance: 962.2852359427928
sum of squares: 1871004.0
standard deviation: 31.020722685695006
sum of logs: -Infinity

---------- x not leading to overflow, shape = 1000.0 ----------
Initial method
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

Method proposed by Francesco
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

---------- x leading to overflow, shape = 1000.0 ----------
Method proposed by Francesco
SummaryStatistics:
n: 3252
min: 0.0
max: 11647.0
mean: 2645.6626691266915
geometric mean: 0.0
variance: 6222262.112872347
sum of squares: 4.2991048807E10
standard deviation: 2494.4462537549985
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 3252
min: 0.0
max: 1966.0
mean: 150.69772447724463
geometric mean: 0.0
variance: 47598.90552542646
sum of squares: 2.28596325E8
standard deviation: 218.17173402030443
sum of logs: -Infinity
{noformat}
                  
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>         Attachments: MATH-753.tar.gz, lanczos.patch
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira