Posted to issues@commons.apache.org by "Otmar Ertl (JIRA)" <ji...@apache.org> on 2015/08/08 16:41:46 UTC
[jira] [Commented] (MATH-1253) Skewness could get more precision from slightly reordered code.
[ https://issues.apache.org/jira/browse/MATH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662994#comment-14662994 ]
Otmar Ertl commented on MATH-1253:
----------------------------------
The given example is problematic anyway, because the numbers (1.234E11, 1.234E51, 1.234E101, 1.234E151) are many orders of magnitude apart, for which cancellation effects are a major issue when using floating point types. I can reproduce the NaN result for values that have either large or small exponents, for example (1.234E148, 1.234E149, 1.234E150, 1.234E151) and (1.234E-148, 1.234E-149, 1.234E-150, 1.234E-151). The proposed code does not avoid these over-/underflows either, because the division is performed only after the two multiplications which cause the over-/underflow. However, the following code would work:
{code}
final double variance = (accum - (accum2 * accum2 / length)) / (length - 1);
final double stdDevInverse = 1d / FastMath.sqrt(variance);
double accum3 = 0.0;
for (int i = begin; i < begin + length; i++) {
    final double d = (values[i] - m) * stdDevInverse;
    accum3 += d * d * d;
}
{code}
(Here the inverse is precomputed in order to avoid additional divisions, which are likely to be more expensive than multiplications.) Nevertheless, I am not sure if we really should fix this. The variance calculation is also prone to over-/underflows for numbers greater than sqrt(Double.MAX_VALUE) or smaller than sqrt(Double.MIN_VALUE), respectively. So the fix would "slightly" extend the valid input range from (cbrt(Double.MIN_VALUE), cbrt(Double.MAX_VALUE)) to (sqrt(Double.MIN_VALUE), sqrt(Double.MAX_VALUE)) at the expense of an additional multiplication within the loop. Of course, if there were also an accurate variance calculation method that avoids over-/underflows for values outside of (sqrt(Double.MIN_VALUE), sqrt(Double.MAX_VALUE)), this fix would make much more sense to me.
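To illustrate the range argument above, here is a minimal, self-contained sketch (it uses plain java.lang.Math instead of FastMath, and the class and method names are illustrative, not Commons Math API). For the large-exponent sample, cubing the raw deviations overflows to mixed +/-Infinity terms whose sum is NaN, while standardizing each deviation first keeps every term of order 1:

```java
// Sketch only: demonstrates why d*d*d overflows for |d| > cbrt(Double.MAX_VALUE)
// (~5.6E102) and why dividing by the standard deviation inside the loop avoids it.
public class SkewnessOverflowDemo {

    // Two-pass arithmetic mean.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Bias-corrected standard deviation from explicit deviations.
    static double stdDev(double[] values, double mean) {
        double sumSq = 0.0;
        for (double v : values) {
            double d = v - mean;
            sumSq += d * d; // finite as long as |d| < sqrt(Double.MAX_VALUE)
        }
        return Math.sqrt(sumSq / (values.length - 1));
    }

    // Third-moment sum as the current Skewness loop accumulates it:
    // each d * d * d overflows once |d| exceeds cbrt(Double.MAX_VALUE).
    static double naiveThirdMomentSum(double[] values, double mean) {
        double accum3 = 0.0;
        for (double v : values) {
            double d = v - mean;
            accum3 += d * d * d;
        }
        return accum3;
    }

    // Same sum with each deviation standardized before cubing; the terms are
    // of order 1, so the accumulation stays finite for a much wider input range.
    static double standardizedThirdMomentSum(double[] values, double mean, double stdDev) {
        double stdDevInverse = 1d / stdDev; // one division instead of one per element
        double accum3 = 0.0;
        for (double v : values) {
            double d = (v - mean) * stdDevInverse;
            accum3 += d * d * d;
        }
        return accum3;
    }

    public static void main(String[] args) {
        double[] values = {1.234E148, 1.234E149, 1.234E150, 1.234E151};
        double m = mean(values);
        double s = stdDev(values, m);
        // Mixed -Infinity and +Infinity cube terms collapse to NaN:
        System.out.println(naiveThirdMomentSum(values, m));           // NaN
        // Standardizing first keeps the sum finite:
        System.out.println(standardizedThirdMomentSum(values, m, s));
    }
}
```

Note that the divisions by length and (length - 1) mirror the bias-corrected variance in the snippet above; the precomputed `stdDevInverse` follows the same division-for-multiplication trade described there.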
> Skewness could get more precision from slightly reordered code.
> ---------------------------------------------------------------
>
> Key: MATH-1253
> URL: https://issues.apache.org/jira/browse/MATH-1253
> Project: Commons Math
> Issue Type: Bug
> Affects Versions: 3.5
> Reporter: Bill Murphy
> Priority: Minor
>
> In Skewness.java, approx line 180, there is code like:
> {noformat}
> double accum3 = 0.0;
> for (int i = begin; i < begin + length; i++) {
>     final double d = values[i] - m;
>     accum3 += d * d * d;
> }
> accum3 /= variance * FastMath.sqrt(variance);
> {noformat}
> If the division were moved into the for loop, accum3 would be less likely to overflow to Infinity (or -Infinity). This might allow the computation to return a result in a case such as:
> {noformat}
> double[] numArray = { 1.234E11, 1.234E51, 1.234E101, 1.234E151 };
> Skewness skew = new Skewness();
> double sk = skew.evaluate( numArray );
> {noformat}
> Currently, this returns NaN, but I'd prefer it returned approx 1.154700538379252.
> The change I'm proposing would have the code instead read like:
> {noformat}
> double accum3 = 0.0;
> double divisor = variance * FastMath.sqrt(variance);
> for (int i = begin; i < begin + length; i++) {
>     final double d = values[i] - m;
>     accum3 += d * d * d / divisor;
> }
> {noformat}
> Thanks!
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)