You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@commons.apache.org by "Francesco Strino (Created) (JIRA)" <ji...@apache.org> on 2012/02/24 22:40:04 UTC

[jira] [Created] (MATH-753) Better implementation for the gamma distribution implementation

Better implementation for the gamma distribution implementation
---------------------------------------------------------------

                 Key: MATH-753
                 URL: https://issues.apache.org/jira/browse/MATH-753
             Project: Commons Math
          Issue Type: Improvement
    Affects Versions: 2.2
            Reporter: Francesco Strino
            Priority: Minor
             Fix For: 2.2


The way the density of the gamma distribution function is estimated can be improved.

It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 

It would be sufficient to change line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows (I write it down here because I don't know how to submit a patch with JIRA):

@Override
    public double density(double x) {
        if (x < 0) return 0;
        //return Math.pow(x / beta, alpha - 1) / beta * Math.exp(-x / beta) / Math.exp(Gamma.logGamma(alpha)); //unstable for large values
        return Math.exp(Math.log(x)*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
    }



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (Issue Comment Edited JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216529#comment-13216529 ] 

Sébastien Brisard edited comment on MATH-753 at 2/25/12 8:19 PM:
-----------------------------------------------------------------

This suits me fine. The only concern I have is that using exp(log( x )) in place of x might incur a loss of accuracy. Maybe we should use this substitution only when it is necessary (for large values of alpha and beta). This would require a little bit of investigation to find the appropriate thresholds.
                
      was (Author: celestin):
    This suits me fine. The only concern I have is that using exp(log(x)) in place of x might incur a loss of accuracy. Maybe we should use this substitution only when it is necessary (for large values of alpha and beta). This would require a little bit of investigation to find the appropriate thresholds.
                  
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2
>            Reporter: Francesco Strino
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 2.2
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (Commented JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216359#comment-13216359 ] 

Sébastien Brisard commented on MATH-753:
----------------------------------------

Thanks Francesco for reporting this issue. The fix you propose is indeed very simple. However, we should make sure that there is no loss of accuracy for smaller values of alpha and beta. I'd like to know what others think. Also, we are trying to stabilize version 3.0, so we do not have much time to investigate this right now.

Two options would be possible
# wait until 3.1 to include this fix,
# include this fix right now, but keep the ticket open, in order to thoroughly investigate accurracy issues in 3.1.
                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2
>            Reporter: Francesco Strino
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 2.2
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (Commented JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216529#comment-13216529 ] 

Sébastien Brisard commented on MATH-753:
----------------------------------------

This suits me fine. The only concern I have is that using exp(log(x)) in place of x might incur a loss of accuracy. Maybe we should use this substitution only when it is necessary (for large values of alpha and beta). This would require a little bit of investigation to find the appropriate thresholds.
                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2
>            Reporter: Francesco Strino
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 2.2
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276490#comment-13276490 ] 

Sébastien Brisard commented on MATH-753:
----------------------------------------

Unit tests of the new implementation of {{double density(double)}} in {{r1339014}}. This solves this issue.
                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>         Attachments: MATH-753.tar.gz, lanczos.patch
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272089#comment-13272089 ] 

Sébastien Brisard commented on MATH-753:
----------------------------------------

Patch {{lanczos.patch}} (09/May/12 10:38) committed in {{r1336483}}.
                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>         Attachments: lanczos.patch
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271328#comment-13271328 ] 

Sébastien Brisard commented on MATH-753:
----------------------------------------

Well spotted! I will investigate...
                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>         Attachments: lanczos.patch
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Gilles (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216716#comment-13216716 ] 

Gilles commented on MATH-753:
-----------------------------

As per my message on the "dev" ML, if you find out how to correctly fix this, no problem to apply it.

                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276417#comment-13276417 ] 

Sébastien Brisard commented on MATH-753:
----------------------------------------

Reference data for unit tests in {{r1338986}}.
                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>         Attachments: MATH-753.tar.gz, lanczos.patch
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sébastien Brisard resolved MATH-753.
------------------------------------

    Resolution: Fixed
    
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>         Attachments: MATH-753.tar.gz, lanczos.patch
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sébastien Brisard updated MATH-753:
-----------------------------------

    Attachment:     (was: lanczos.patch)
    
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sébastien Brisard updated MATH-753:
-----------------------------------

    Attachment: lanczos.patch

Patch against {{r1336059}}.
                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>         Attachments: lanczos.patch
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sébastien Brisard updated MATH-753:
-----------------------------------

    Attachment: lanczos.patch

{{lanczos.patch}} is the patch discussed in the the {{dev}} mailing-list:
{quote}
As I initially feared, what was proposed in the JIRA ticket leads to high floating point errors. I adapted a method proposed in [BOOST|http://www.boost.org/doc/libs/1_35_0/libs/math/doc/sf_and_dist/html/math_toolkit/special/sf_gamma/igamma.html] (formula (15))
with acceptable errors. Meanwhile, I've also managed to improve the accuracy of the computation of the density for the range of parameters where the previous implementation was already working: in this range, the accuracy _was_ about 300 ulps, and is now 1-2 ulps! {color:red}Note: I might have been too optimistic, here. There is a significant improvement, though{color}. I think this improvement is worth implementing.

The downside is that I need to expose the Lanczos implementation internally used by {{o.a.c.m3.special.Gamma}}. This approximation is so standard that I don't see it as a problem. I don't think that it reveals too much of the Gamma internals, since the javadoc of {{Gamma.logGamma}} states that it uses this approximation. So what I
propose is the addition of two methods in {{Gamma}}:
* {{double getLanczosG()}}: returns the {{g}} constant
* {{double lanczos(double)}}: returns the value of the Lanczos sum.
{quote}
                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>         Attachments: lanczos.patch
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (Updated JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sébastien Brisard updated MATH-753:
-----------------------------------

    Affects Version/s: 3.1
                       3.0
        Fix Version/s:     (was: 2.2)
                       3.1
    
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Gilles (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271320#comment-13271320 ] 

Gilles commented on MATH-753:
-----------------------------

I noticed that the code contains this line:

{noformat}
private static final double DEFAULT_EPSILON = 10e-15;
{noformat}

This is an unusual way to write 1e-14. Or, more probably, 1e-15 was intended.

                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>         Attachments: lanczos.patch
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Francesco Strino (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Strino updated MATH-753:
----------------------------------

    Description: 
The way the density of the gamma distribution function is estimated can be improved.

It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 

It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:

return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 

In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.



  was:
The way the density of the gamma distribution function is estimated can be improved.

It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 

It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:


return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 



    
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2
>            Reporter: Francesco Strino
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 2.2
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Gilles (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216391#comment-13216391 ] 

Gilles commented on MATH-753:
-----------------------------

bq. wait until 3.1 to include this fix,

+1

bq. include this fix right now, but keep the ticket open, in order to thoroughly investigate accurracy issues in 3.1.

-1
Let's freeze the "trunk" as proposed by Luc. This is an enhancement that can wait a few days, the more so that you suspect that there could be side-effects.
                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2
>            Reporter: Francesco Strino
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 2.2
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Francesco Strino (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Strino updated MATH-753:
----------------------------------

    Description: 
The way the density of the gamma distribution function is estimated can be improved.

It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 

It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:


return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 



  was:
The way the density of the gamma distribution function is estimated can be improved.

It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 

It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:


return Math.exp(Math.log(x)*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 



        Summary: Better implementation for the gamma distribution density function  (was: Better implementation for the gamma distribution implementation)
    
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2
>            Reporter: Francesco Strino
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 2.2
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sébastien Brisard updated MATH-753:
-----------------------------------

    Attachment: MATH-753.tar.gz

The attached file {{MATH-753.tar.gz}} compares the accuracy of four different calculation methods for the density.
* method 1 is the currently implemented method (leading to overflows)
* method 2 is the method proposed by Francesco (exp(log(...))
* method 3 is based on BOOST
* method 4 is the recommended method, partly based on method 3.

The provided files check the accuracy of the methods for several values of the shape parameter (scale is not really relevant to this problem: used 1). Reference values where computed with Maxima (with a precision of 64 digits, for exactly representable double values of the argument.

The report shows statistics on the error in ulps. For each method, two statistics are computed
* the first one corresponds to the case where the currently implemented method does not lead to overflow. Even in this case, the proposed method (4) achieves better accuracy
* the second one corresponds to the case where the currently implemented method does lead to overflow. In this case, method 4 is superior to method 2.

As a conclusion, I suggest we implement method 4.

Here is the output of the test
{noformat}
---------- x not leading to overflow, shape = 1.0 ----------
Initial method
SummaryStatistics:
n: 3200
min: 2.0
max: 4.0
mean: 2.8724999999999934
geometric mean: 2.784676841906384
variance: 0.4982744607689899
sum of squares: 27998.0
standard deviation: 0.7058855861745513
sum of logs: 3277.2218605584508

Method proposed by Francesco
SummaryStatistics:
n: 3200
min: 0.0
max: 67.0
mean: 12.791562499999992
geometric mean: 0.0
variance: 148.30446145279777
sum of squares: 998023.0
standard deviation: 12.17803192033909
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 3200
min: 1.0
max: 2.0
mean: 1.4162499999999971
geometric mean: 1.3344543929682238
variance: 0.24306189434198203
sum of squares: 7196.0
standard deviation: 0.49301307725250254
sum of logs: 923.2720445058158

---------- x not leading to overflow, shape = 10.0 ----------
Initial method
SummaryStatistics:
n: 400
min: 1.0
max: 6.0
mean: 3.14
geometric mean: 3.0020752311936034
variance: 0.8324812030075192
sum of squares: 4276.0
standard deviation: 0.9124040787981601
sum of logs: 439.72151730195765

Method proposed by Francesco
SummaryStatistics:
n: 400
min: 0.0
max: 70.0
mean: 14.739999999999998
geometric mean: 0.0
variance: 144.37834586466164
sum of squares: 144514.0
standard deviation: 12.015754069747834
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 400
min: 0.0
max: 3.0
mean: 0.6075000000000004
geometric mean: 0.0
variance: 0.3593421052631577
sum of squares: 291.0
standard deviation: 0.5994515036791197
sum of logs: -Infinity

---------- x not leading to overflow, shape = 100.0 ----------
Initial method
SummaryStatistics:
n: 393
min: 4.0
max: 12.0
mean: 7.806615776081426
geometric mean: 7.619643567166563
variance: 2.916588772913744
sum of squares: 25094.0
standard deviation: 1.7078023225519234
sum of logs: 798.076729908358

Method proposed by Francesco
SummaryStatistics:
n: 393
min: 2.0
max: 824.0
mean: 164.59796437659037
geometric mean: 106.88809881158063
variance: 18545.179791764032
sum of squares: 1.7917059E7
standard deviation: 136.18068802794335
sum of logs: 1836.0105153185225

Proposed method, based on BOOST
SummaryStatistics:
n: 393
min: 0.0
max: 4.0
mean: 1.1679389312977109
geometric mean: 0.0
variance: 0.5176429350366102
sum of squares: 739.0
standard deviation: 0.7194740683559139
sum of logs: -Infinity

---------- x not leading to overflow, shape = 142.0 ----------
Initial method
SummaryStatistics:
n: 614
min: 0.0
max: 551.0
mean: 400.01140065146564
geometric mean: 0.0
variance: 7323.946036207893
sum of squares: 1.02735179E8
standard deviation: 85.5800562993966
sum of logs: -Infinity

Method proposed by Francesco
SummaryStatistics:
n: 614
min: 0.0
max: 1421.0
mean: 408.09609120521105
geometric mean: 0.0
variance: 64951.369217975334
sum of squares: 1.42072235E8
standard deviation: 254.8555850240982
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 614
min: 0.0
max: 90.0
mean: 0.8387622149837135
geometric mean: 0.0
variance: 21.84182293520944
sum of squares: 13821.0
standard deviation: 4.6735236102120465
sum of logs: -Infinity

---------- x leading to overflow, shape = 142.0 ----------
Method proposed by Francesco
SummaryStatistics:
n: 986
min: 2.0
max: 1434.0
mean: 427.799188640973
geometric mean: 297.73157979362674
variance: 88404.8205475645
sum of squares: 2.67528724E8
standard deviation: 297.32948146385434
sum of logs: 5616.4456488656715

Proposed method, based on BOOST
SummaryStatistics:
n: 986
min: 0.0
max: 203.0
mean: 30.59837728194727
geometric mean: 0.0
variance: 962.2852359427928
sum of squares: 1871004.0
standard deviation: 31.020722685695006
sum of logs: -Infinity

---------- x not leading to overflow, shape = 1000.0 ----------
Initial method
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

Method proposed by Francesco
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

---------- x leading to overflow, shape = 1000.0 ----------
Method proposed by Francesco
SummaryStatistics:
n: 3252
min: 0.0
max: 11647.0
mean: 2645.6626691266915
geometric mean: 0.0
variance: 6222262.112872347
sum of squares: 4.2991048807E10
standard deviation: 2494.4462537549985
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 3252
min: 0.0
max: 1966.0
mean: 150.69772447724463
geometric mean: 0.0
variance: 47598.90552542646
sum of squares: 2.28596325E8
standard deviation: 218.17173402030443
sum of logs: -Infinity
{noformat}
                
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>         Attachments: MATH-753.tar.gz, lanczos.patch
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MATH-753) Better implementation for the gamma distribution implementation

Posted by "Francesco Strino (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Strino updated MATH-753:
----------------------------------

    Description: 
The way the density of the gamma distribution function is estimated can be improved.

It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 

It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:


return Math.exp(Math.log(x)*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 



  was:
The way the density of the gamma distribution function is estimated can be improved.

It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 

It would be sufficient to change line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows (I write it down here because I don't know how to submit a patch with JIRA):

@Override
    public double density(double x) {
        if (x < 0) return 0;
        //return Math.pow(x / beta, alpha - 1) / beta * Math.exp(-x / beta) / Math.exp(Gamma.logGamma(alpha)); //unstable for large values
        return Math.exp(Math.log(x)*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
    }



    
> Better implementation for the gamma distribution implementation
> ---------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2
>            Reporter: Francesco Strino
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 2.2
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log(x)*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (MATH-753) Better implementation for the gamma distribution density function

Posted by "Sébastien Brisard (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MATH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273053#comment-13273053 ] 

Sébastien Brisard edited comment on MATH-753 at 5/11/12 5:58 AM:
-----------------------------------------------------------------

The attached file {{MATH-753.tar.gz}} compares the accuracy of four different calculation methods for the density.
* method 1 is the currently implemented method (leading to overflows)
* method 2 is the method proposed by Francesco (exp(log(...))
* method 3 is based on BOOST
* method 4 is the recommended method, partly based on method 3.

The provided files check the accuracy of the methods for several values of the shape parameter (scale is not really relevant to this problem: used 1). Reference values where computed with Maxima (with a precision of 64 digits, for exactly representable double values of the argument.

The report shows statistics on the error in ulps. For each method, two statistics are computed
* the first one corresponds to the case where the currently implemented method does not lead to overflow. Even in this case, the proposed method (4) achieves better accuracy
* the second one corresponds to the case where the currently implemented method does lead to overflow. In this case, method 4 is superior to method 2. Regardless of the method, the accuracy is worse than in the first case; this was expected.

As a conclusion, I suggest we implement method 4.

Here is the output of the test
{noformat}
---------- x not leading to overflow, shape = 1.0 ----------
Initial method
SummaryStatistics:
n: 3200
min: 2.0
max: 4.0
mean: 2.8724999999999934
geometric mean: 2.784676841906384
variance: 0.4982744607689899
sum of squares: 27998.0
standard deviation: 0.7058855861745513
sum of logs: 3277.2218605584508

Method proposed by Francesco
SummaryStatistics:
n: 3200
min: 0.0
max: 67.0
mean: 12.791562499999992
geometric mean: 0.0
variance: 148.30446145279777
sum of squares: 998023.0
standard deviation: 12.17803192033909
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 3200
min: 1.0
max: 2.0
mean: 1.4162499999999971
geometric mean: 1.3344543929682238
variance: 0.24306189434198203
sum of squares: 7196.0
standard deviation: 0.49301307725250254
sum of logs: 923.2720445058158

---------- x not leading to overflow, shape = 10.0 ----------
Initial method
SummaryStatistics:
n: 400
min: 1.0
max: 6.0
mean: 3.14
geometric mean: 3.0020752311936034
variance: 0.8324812030075192
sum of squares: 4276.0
standard deviation: 0.9124040787981601
sum of logs: 439.72151730195765

Method proposed by Francesco
SummaryStatistics:
n: 400
min: 0.0
max: 70.0
mean: 14.739999999999998
geometric mean: 0.0
variance: 144.37834586466164
sum of squares: 144514.0
standard deviation: 12.015754069747834
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 400
min: 0.0
max: 3.0
mean: 0.6075000000000004
geometric mean: 0.0
variance: 0.3593421052631577
sum of squares: 291.0
standard deviation: 0.5994515036791197
sum of logs: -Infinity

---------- x not leading to overflow, shape = 100.0 ----------
Initial method
SummaryStatistics:
n: 393
min: 4.0
max: 12.0
mean: 7.806615776081426
geometric mean: 7.619643567166563
variance: 2.916588772913744
sum of squares: 25094.0
standard deviation: 1.7078023225519234
sum of logs: 798.076729908358

Method proposed by Francesco
SummaryStatistics:
n: 393
min: 2.0
max: 824.0
mean: 164.59796437659037
geometric mean: 106.88809881158063
variance: 18545.179791764032
sum of squares: 1.7917059E7
standard deviation: 136.18068802794335
sum of logs: 1836.0105153185225

Proposed method, based on BOOST
SummaryStatistics:
n: 393
min: 0.0
max: 4.0
mean: 1.1679389312977109
geometric mean: 0.0
variance: 0.5176429350366102
sum of squares: 739.0
standard deviation: 0.7194740683559139
sum of logs: -Infinity

---------- x not leading to overflow, shape = 142.0 ----------
Initial method
SummaryStatistics:
n: 614
min: 0.0
max: 551.0
mean: 400.01140065146564
geometric mean: 0.0
variance: 7323.946036207893
sum of squares: 1.02735179E8
standard deviation: 85.5800562993966
sum of logs: -Infinity

Method proposed by Francesco
SummaryStatistics:
n: 614
min: 0.0
max: 1421.0
mean: 408.09609120521105
geometric mean: 0.0
variance: 64951.369217975334
sum of squares: 1.42072235E8
standard deviation: 254.8555850240982
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 614
min: 0.0
max: 90.0
mean: 0.8387622149837135
geometric mean: 0.0
variance: 21.84182293520944
sum of squares: 13821.0
standard deviation: 4.6735236102120465
sum of logs: -Infinity

---------- x leading to overflow, shape = 142.0 ----------
Method proposed by Francesco
SummaryStatistics:
n: 986
min: 2.0
max: 1434.0
mean: 427.799188640973
geometric mean: 297.73157979362674
variance: 88404.8205475645
sum of squares: 2.67528724E8
standard deviation: 297.32948146385434
sum of logs: 5616.4456488656715

Proposed method, based on BOOST
SummaryStatistics:
n: 986
min: 0.0
max: 203.0
mean: 30.59837728194727
geometric mean: 0.0
variance: 962.2852359427928
sum of squares: 1871004.0
standard deviation: 31.020722685695006
sum of logs: -Infinity

---------- x not leading to overflow, shape = 1000.0 ----------
Initial method
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

Method proposed by Francesco
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

---------- x leading to overflow, shape = 1000.0 ----------
Method proposed by Francesco
SummaryStatistics:
n: 3252
min: 0.0
max: 11647.0
mean: 2645.6626691266915
geometric mean: 0.0
variance: 6222262.112872347
sum of squares: 4.2991048807E10
standard deviation: 2494.4462537549985
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 3252
min: 0.0
max: 1966.0
mean: 150.69772447724463
geometric mean: 0.0
variance: 47598.90552542646
sum of squares: 2.28596325E8
standard deviation: 218.17173402030443
sum of logs: -Infinity
{noformat}
                
      was (Author: celestin):
    The attached file {{MATH-753.tar.gz}} compares the accuracy of four different calculation methods for the density.
* method 1 is the currently implemented method (leading to overflows)
* method 2 is the method proposed by Francesco (exp(log(...))
* method 3 is based on BOOST
* method 4 is the recommended method, partly based on method 3.

The provided files check the accuracy of the methods for several values of the shape parameter (scale is not really relevant to this problem: used 1). Reference values where computed with Maxima (with a precision of 64 digits, for exactly representable double values of the argument.

The report shows statistics on the error in ulps. For each method, two statistics are computed
* the first one corresponds to the case where the currently implemented method does not lead to overflow. Even in this case, the proposed method (4) achieves better accuracy
* the second one corresponds to the case where the currently implemented method does lead to overflow. In this case, method 4 is superior to method 2.

As a conclusion, I suggest we implement method 4.

Here is the output of the test
{noformat}
---------- x not leading to overflow, shape = 1.0 ----------
Initial method
SummaryStatistics:
n: 3200
min: 2.0
max: 4.0
mean: 2.8724999999999934
geometric mean: 2.784676841906384
variance: 0.4982744607689899
sum of squares: 27998.0
standard deviation: 0.7058855861745513
sum of logs: 3277.2218605584508

Method proposed by Francesco
SummaryStatistics:
n: 3200
min: 0.0
max: 67.0
mean: 12.791562499999992
geometric mean: 0.0
variance: 148.30446145279777
sum of squares: 998023.0
standard deviation: 12.17803192033909
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 3200
min: 1.0
max: 2.0
mean: 1.4162499999999971
geometric mean: 1.3344543929682238
variance: 0.24306189434198203
sum of squares: 7196.0
standard deviation: 0.49301307725250254
sum of logs: 923.2720445058158

---------- x not leading to overflow, shape = 10.0 ----------
Initial method
SummaryStatistics:
n: 400
min: 1.0
max: 6.0
mean: 3.14
geometric mean: 3.0020752311936034
variance: 0.8324812030075192
sum of squares: 4276.0
standard deviation: 0.9124040787981601
sum of logs: 439.72151730195765

Method proposed by Francesco
SummaryStatistics:
n: 400
min: 0.0
max: 70.0
mean: 14.739999999999998
geometric mean: 0.0
variance: 144.37834586466164
sum of squares: 144514.0
standard deviation: 12.015754069747834
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 400
min: 0.0
max: 3.0
mean: 0.6075000000000004
geometric mean: 0.0
variance: 0.3593421052631577
sum of squares: 291.0
standard deviation: 0.5994515036791197
sum of logs: -Infinity

---------- x not leading to overflow, shape = 100.0 ----------
Initial method
SummaryStatistics:
n: 393
min: 4.0
max: 12.0
mean: 7.806615776081426
geometric mean: 7.619643567166563
variance: 2.916588772913744
sum of squares: 25094.0
standard deviation: 1.7078023225519234
sum of logs: 798.076729908358

Method proposed by Francesco
SummaryStatistics:
n: 393
min: 2.0
max: 824.0
mean: 164.59796437659037
geometric mean: 106.88809881158063
variance: 18545.179791764032
sum of squares: 1.7917059E7
standard deviation: 136.18068802794335
sum of logs: 1836.0105153185225

Proposed method, based on BOOST
SummaryStatistics:
n: 393
min: 0.0
max: 4.0
mean: 1.1679389312977109
geometric mean: 0.0
variance: 0.5176429350366102
sum of squares: 739.0
standard deviation: 0.7194740683559139
sum of logs: -Infinity

---------- x not leading to overflow, shape = 142.0 ----------
Initial method
SummaryStatistics:
n: 614
min: 0.0
max: 551.0
mean: 400.01140065146564
geometric mean: 0.0
variance: 7323.946036207893
sum of squares: 1.02735179E8
standard deviation: 85.5800562993966
sum of logs: -Infinity

Method proposed by Francesco
SummaryStatistics:
n: 614
min: 0.0
max: 1421.0
mean: 408.09609120521105
geometric mean: 0.0
variance: 64951.369217975334
sum of squares: 1.42072235E8
standard deviation: 254.8555850240982
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 614
min: 0.0
max: 90.0
mean: 0.8387622149837135
geometric mean: 0.0
variance: 21.84182293520944
sum of squares: 13821.0
standard deviation: 4.6735236102120465
sum of logs: -Infinity

---------- x leading to overflow, shape = 142.0 ----------
Method proposed by Francesco
SummaryStatistics:
n: 986
min: 2.0
max: 1434.0
mean: 427.799188640973
geometric mean: 297.73157979362674
variance: 88404.8205475645
sum of squares: 2.67528724E8
standard deviation: 297.32948146385434
sum of logs: 5616.4456488656715

Proposed method, based on BOOST
SummaryStatistics:
n: 986
min: 0.0
max: 203.0
mean: 30.59837728194727
geometric mean: 0.0
variance: 962.2852359427928
sum of squares: 1871004.0
standard deviation: 31.020722685695006
sum of logs: -Infinity

---------- x not leading to overflow, shape = 1000.0 ----------
Initial method
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

Method proposed by Francesco
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 65
min: 0.0
max: 0.0
mean: 0.0
geometric mean: 0.0
variance: 0.0
sum of squares: 0.0
standard deviation: 0.0
sum of logs: -Infinity

---------- x leading to overflow, shape = 1000.0 ----------
Method proposed by Francesco
SummaryStatistics:
n: 3252
min: 0.0
max: 11647.0
mean: 2645.6626691266915
geometric mean: 0.0
variance: 6222262.112872347
sum of squares: 4.2991048807E10
standard deviation: 2494.4462537549985
sum of logs: -Infinity

Proposed method, based on BOOST
SummaryStatistics:
n: 3252
min: 0.0
max: 1966.0
mean: 150.69772447724463
geometric mean: 0.0
variance: 47598.90552542646
sum of squares: 2.28596325E8
standard deviation: 218.17173402030443
sum of logs: -Infinity
{noformat}
                  
> Better implementation for the gamma distribution density function
> -----------------------------------------------------------------
>
>                 Key: MATH-753
>                 URL: https://issues.apache.org/jira/browse/MATH-753
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.2, 3.0, 3.1
>            Reporter: Francesco Strino
>            Assignee: Sébastien Brisard
>            Priority: Minor
>              Labels: improvement, stability
>             Fix For: 3.1
>
>         Attachments: MATH-753.tar.gz, lanczos.patch
>
>
> The way the density of the gamma distribution function is estimated can be improved.
> It's much more stable to calculate first the log of the density and then exponentiate, otherwise the function returns NaN for high values of the parameters alpha and beta. 
> It would be sufficient to change the public double density(double x) function at line 204 in the file org.apache.commons.math.distribution.GammaDistributionImpl as follows:
> return Math.exp(Math.log( x )*(alpha-1) - Math.log(beta)*alpha - x/beta - Gamma.logGamma(alpha)); 
> In order to improve performance, log(beta) and Gamma.logGamma(alpha) could also be precomputed and stored during initialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira