Posted to issues@commons.apache.org by "Otmar Ertl (JIRA)" <ji...@apache.org> on 2015/08/08 16:41:46 UTC

[jira] [Commented] (MATH-1253) Skewness could get more precision from slightly reordered code.

    [ https://issues.apache.org/jira/browse/MATH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662994#comment-14662994 ] 

Otmar Ertl commented on MATH-1253:
----------------------------------

The given example is problematic anyway, because the numbers (1.234E11, 1.234E51, 1.234E101, 1.234E151) are many orders of magnitude apart, for which cancellation effects are a major issue when using floating-point types. I can reproduce the NaN result for values that have either large or small exponents, for example (1.234E148, 1.234E149, 1.234E150, 1.234E151) and (1.234E-148, 1.234E-149, 1.234E-150, 1.234E-151). The proposed code does not avoid these over-/underflows either, because the division is performed after the two multiplications which cause the over-/underflow. However, the following code would work:
{code}
            final double variance = (accum - (accum2 * accum2 / length)) / (length - 1);
            final double stdDevInverse = 1d / FastMath.sqrt(variance);

            double accum3 = 0.0;
            for (int i = begin; i < begin + length; i++) {
                final double d = (values[i] - m) * stdDevInverse;
                accum3 += d * d * d;
            }
{code}
(Here the inverse is precomputed, in order to avoid additional divisions, which are likely to be more expensive than multiplications.) Nevertheless, I am not sure if we really should fix this. The variance calculation is also prone to over-/underflows for numbers greater than sqrt(Double.MAX_VALUE) and smaller than sqrt(Double.MIN_VALUE), respectively. So the fix would "slightly" extend the valid input range from (cbrt(Double.MIN_VALUE), cbrt(Double.MAX_VALUE)) to (sqrt(Double.MIN_VALUE), sqrt(Double.MAX_VALUE)) at the expense of an additional multiplication within the loop. Of course, if there were also an accurate variance calculation method that avoided over-/underflows for values outside of (sqrt(Double.MIN_VALUE), sqrt(Double.MAX_VALUE)), this fix would make much more sense to me.
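For illustration, here is a minimal standalone sketch (not the actual Commons Math implementation; class and method names are made up for this example) that contrasts the two orderings on the large-exponent reproducer above. {{skewRaw}} cubes the raw deviations and divides at the end, as the current code does, so {{d * d * d}} and {{variance * sqrt(variance)}} both overflow and the result is NaN; {{skewScaled}} scales each deviation by the precomputed inverse standard deviation before cubing, so the summands stay near 1 in magnitude:
{code}
// Hypothetical demo class, not part of Commons Math.
public class SkewnessOrdering {

    // Bias-corrected sample skewness, current ordering:
    // cube raw deviations, divide by variance * sqrt(variance) afterwards.
    static double skewRaw(double[] v) {
        final int n = v.length;
        double m = 0;
        for (double x : v) m += x;
        m /= n;
        double accum = 0, accum2 = 0;
        for (double x : v) { double d = x - m; accum += d * d; accum2 += d; }
        final double variance = (accum - accum2 * accum2 / n) / (n - 1);
        double accum3 = 0;
        for (double x : v) { double d = x - m; accum3 += d * d * d; }
        accum3 /= variance * Math.sqrt(variance); // Inf/Inf or NaN/Inf -> NaN
        return n / ((n - 1.0) * (n - 2.0)) * accum3;
    }

    // Same statistic, but each deviation is multiplied by the precomputed
    // inverse standard deviation before cubing, keeping the summands small.
    static double skewScaled(double[] v) {
        final int n = v.length;
        double m = 0;
        for (double x : v) m += x;
        m /= n;
        double accum = 0, accum2 = 0;
        for (double x : v) { double d = x - m; accum += d * d; accum2 += d; }
        final double variance = (accum - accum2 * accum2 / n) / (n - 1);
        final double stdDevInverse = 1d / Math.sqrt(variance);
        double accum3 = 0;
        for (double x : v) {
            double d = (x - m) * stdDevInverse;
            accum3 += d * d * d;
        }
        return n / ((n - 1.0) * (n - 2.0)) * accum3;
    }

    public static void main(String[] args) {
        double[] large = { 1.234E148, 1.234E149, 1.234E150, 1.234E151 };
        System.out.println(skewRaw(large));    // NaN: d*d*d overflows
        System.out.println(skewScaled(large)); // finite
    }
}
{code}
Note that this only helps while the variance itself is still representable; skewness is scale-invariant, so {{skewScaled}} on the large values agrees with either method applied to the unscaled pattern (1, 10, 100, 1000).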

> Skewness could get more precision from slightly reordered code.
> ---------------------------------------------------------------
>
>                 Key: MATH-1253
>                 URL: https://issues.apache.org/jira/browse/MATH-1253
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.5
>            Reporter: Bill Murphy
>            Priority: Minor
>
> In Skewness.java, approx line 180, there is code like:
> {noformat}
>             double accum3 = 0.0;
>             for (int i = begin; i < begin + length; i++) {
>                 final double d = values[i] - m;
>                 accum3 += d * d * d;
>             }
>             accum3 /= variance * FastMath.sqrt(variance);
> {noformat}
> If the division was moved into the for loop, accum3 would be less likely to overflow to Infinity (or -Infinity). This might allow computation to return a result in a case such as:
> {noformat}
> double[] numArray = { 1.234E11, 1.234E51, 1.234E101, 1.234E151 };
> Skewness    skew = new Skewness();
> double    sk = skew.evaluate( numArray );
> {noformat}
> Currently, this returns NaN, but I'd prefer it returned approx 1.154700538379252.
> The change I'm proposing would have the code instead read like:
> {noformat}
>             double accum3 = 0.0;
>             double divisor = variance * FastMath.sqrt(variance);
>             for (int i = begin; i < begin + length; i++) {
>                 final double d = values[i] - m;
>                 accum3 += d * d * d / divisor;
>             }
> {noformat}
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)