Posted to issues@commons.apache.org by "Alex Herbert (Jira)" <ji...@apache.org> on 2023/06/29 11:21:00 UTC

[jira] [Commented] (NUMBERS-193) Add support for extended precision floating-point numbers

    [ https://issues.apache.org/jira/browse/NUMBERS-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738504#comment-17738504 ] 

Alex Herbert commented on NUMBERS-193:
--------------------------------------

I have created a JMH benchmark for 3 variants of the internal DD class in the Statistics distribution package:
||Implementation||Description||
|static mutable|Has mutable fields for the high and low part of the double-double number. All methods are static and write results to a mutable result argument. This is the current implementation in Statistics.|
|static immutable|Has immutable (final) fields for the high and low part. All methods are static and return a new DD instance.|
|OO immutable|Has immutable (final) fields for the high and low part. All methods are instance and return a new DD instance.|
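For reference, the three API styles can be sketched using the exact two-sum of two doubles as a representative operation. The class and method names here are illustrative only, not the actual Statistics internals:
{code:java}
// Illustrative sketches of the three benchmarked API styles; the real DD
// class in Statistics differs. twoSum computes hi + lo == a + b exactly.
final class DDStyles {
    // Style 1: static mutable - results written to a reusable holder.
    static final class MutableDD {
        double hi;
        double lo;
    }

    static void twoSum(double a, double b, MutableDD result) {
        double s = a + b;
        double bv = s - a; // "virtual" b
        result.hi = s;
        result.lo = (a - (s - bv)) + (b - bv); // exact round-off of a + b
    }

    // Style 2: static immutable - a static factory returns a new instance.
    static final class DD {
        final double hi;
        final double lo;

        DD(double hi, double lo) {
            this.hi = hi;
            this.lo = lo;
        }

        // Style 3: OO immutable - instance methods return a new instance.
        DD negate() {
            return new DD(-hi, -lo);
        }
    }

    static DD twoSum(double a, double b) {
        double s = a + b;
        double bv = s - a;
        return new DD(s, (a - (s - bv)) + (b - bv));
    }
}
{code}
The mutable style avoids allocation at the cost of a less natural call site; the immutable styles allocate one new instance per operation.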

I tested the implementations using the computation for the one-sided one-sample Kolmogorov-Smirnov distribution. This accepts arguments n for the sample size and x for the statistic (in [0, 1]). All implementations compute the same result.

The computation can be performed in two ways and the choice is based on the arguments. The parameters were chosen either to exercise the more time-consuming code path in the computation, or to span a range that exercises both code paths.

Using a simple counter I tracked the number of object instances created for the single test cases and the range test case:
||n||x min||x max||samples||Objects||
|10000|0.01| |1|282500|
|10000|0.05| |1|510159|
|100000|0.01| |1|2578505|
|100000|0.05| |1|4300223|
|1000|0.001|0.5|500|8543836|

For the range test case the samples were linearly spaced between x min and x max. The table shows that these cases create a non-trivial number of new instances of a double-double number.

Note: Object tracking was done for reference; this code was removed for all JMH tests. The mutable implementation creates only 3 object instances per computation.
h2. Results

Tested using JDK 11:

OpenJDK 64-Bit Server VM (build 11.0.19+7-post-Ubuntu-0ubuntu120.04.1, mixed mode, sharing).

*Single Cases*
||n||x||static mutable||static immutable||Relative||OO immutable||Relative||
|10000|0.01|4758677.4|4766630.6|1.002|4888738.5|1.027|
| |0.05|3833546.9|3929037.5|1.025|4058091.9|1.059|
|100000|0.01|51637595.6|52368277.9|1.014|53024335.0|1.027|
| |0.05|34571369.7|34994716.9|1.012|35030851.3|1.013|

*Range Case*
||n||lx||ux||samples||static mutable||static immutable||Relative||OO immutable||Relative||
|1000|0.001|0.5|500|113759933|114414561|1.006|110909383|0.975|
h2. Observations

There is very little speed difference across the implementations. For the single cases the mutable implementation is slightly faster. For the case using 500 samples across a range the OO immutable implementation is slightly faster, despite the large number of new objects created. These objects are transient within the computation method and so their allocation should be optimised away by the JVM using escape analysis.

These relative differences have been observed to change slightly across different parameterisations of the tests, favouring one or the other of the implementations. The differences are small enough to justify rewriting the DD class as an immutable object with little or no loss of performance.

Note that I also tested on JDK 17 and the results were similar. No test was done on JDK 8 as the Numbers JMH project, where I tried the code, targets Java 9 (for reasons unrelated to this code).
h2. Method Implementations

There are different ways to implement some of the operations on a double-double number; these trade accuracy for speed. The following table shows the methods I tested, with the accuracy reported in terms of the epsilon (eps) of a double, 2^-53:
||Method||Description||
|fastAdd|Error of 4 eps^2|
|add|Error of 1 eps^2|
|fastMultiply|multiplication without checks; error of 16 eps^2|
|multiply|multiplication with checks for intermediate overflow and non-finite input arguments; error of 16 eps^2|
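As a sketch of how the two addition variants typically differ, the well-known "sloppy" and "ieee" double-double additions from Bailey's QD library are shown below; the actual DD implementation may differ in detail. The fast variant performs one exact two-sum and folds both low parts into a single correction, while the accurate variant performs two two-sums and two renormalisations:
{code:java}
// Sketches of the two double-double addition variants, after the "sloppy"
// and "ieee" additions in Bailey's QD library; the real DD class may differ.
// Each DD value (hi, lo) is returned as a two-element array {hi, lo}.
final class DDAdd {
    // quick-two-sum: exact sum, valid when |a| >= |b|
    private static double[] quickTwoSum(double a, double b) {
        double s = a + b;
        return new double[] {s, b - (s - a)};
    }

    // two-sum: exact sum for any a, b
    private static double[] twoSum(double a, double b) {
        double s = a + b;
        double bv = s - a;
        return new double[] {s, (a - (s - bv)) + (b - bv)};
    }

    // fast add: one two-sum, low parts folded into a single correction
    static double[] fastAdd(double xh, double xl, double yh, double yl) {
        double[] s = twoSum(xh, yh);
        return quickTwoSum(s[0], s[1] + xl + yl);
    }

    // accurate add: two two-sums and two renormalisations
    static double[] add(double xh, double xl, double yh, double yl) {
        double[] s = twoSum(xh, yh);
        double[] t = twoSum(xl, yl);
        double[] u = quickTwoSum(s[0], s[1] + t[0]);
        return quickTwoSum(u[0], u[1] + t[1]);
    }
}
{code}
The extra two-sum and renormalisation in the accurate variant account for the speed difference reported below.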

The operations were tested using random double-double numbers with the high part in the range [-1, 1], i.e. not hitting edge cases for the multiply checks.
||Method||samples||regular||fast||Relative||
|add|1000|17772.6|12952.4|0.729|
|multiply|1000|13820.5|13354.6|0.966|

There is a significant difference in speed between fastAdd and add. The difference between the fastMultiply and multiply is small.
h3. Notes on Add

The two implementations of add were created for use in the Kolmogorov-Smirnov computation. One computation has a summation of thousands of terms. Here fastAdd can be used since any cumulative error is unlikely to affect the final rounding of the double-double to a double. The other computation is a summation of alternately signed terms where correct maintenance of the round-off is more important for the final rounding. Thus there is a use case for both, but adding both to an API may be confusing.
h3. Note on Multiply

Note that fastMultiply and multiply compute the same result. The only difference is that for a single operation that would compute a non-finite result, fastMultiply will compute NaN while multiply will compute the correct infinity. This difference may be significant for a single operation, but when the double-double number is used as part of a larger computation the difference is moot. This is because addition of infinite double-double numbers creates NaN, as the round-off computations subtract same-signed infinite values, e.g. Inf - Inf == NaN.
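A minimal illustration of why infinities do not survive double-double addition: the round-off term of an exact two-sum subtracts same-signed infinities.
{code:java}
// The round-off term of an exact two-sum is NaN for infinite input,
// because the correction terms subtract same-signed infinities.
final class InfDemo {
    static double twoSumRoundOff(double a, double b) {
        double s = a + b;
        double bv = s - a;                // Inf - Inf == NaN when a = b = Inf
        return (a - (s - bv)) + (b - bv); // NaN propagates to the low part
    }
}
{code}
For a = b = +Inf the result is NaN, so the low part of the double-double sum is poisoned even though the high part is a valid infinity.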

To support correct propagation of infinite values through all the operations (add, subtract, multiply, divide) would entail a significantly larger and more complex implementation of the double-double number. I would suggest that this support is omitted and the double-double used for computations well within the finite range of a double. This can be assisted by provision of a method to represent the number scaled to the range [0.5, 1) with a base 2 exponent ([frexp|https://www.cplusplus.com/reference/cmath/frexp/]) and to scale numbers by a power of 2 ([ldexp|https://www.cplusplus.com/reference/cmath/ldexp/]; scalb).
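In Java the ldexp functionality is provided directly by Math.scalb, and a frexp-style decomposition can be built from Math.getExponent. A sketch for normal, non-zero finite values (method names are illustrative, not a proposed API):
{code:java}
// Java equivalents of C's frexp/ldexp for normal, non-zero finite doubles.
// Subnormals, zeros and non-finite values would need extra handling.
final class Scaling {
    // frexp: return {f, e} with f in [0.5, 1) and x == f * 2^e
    static double[] frexp(double x) {
        // Math.getExponent returns e for a significand in [1, 2); shift by 1
        int e = Math.getExponent(x) + 1;
        return new double[] {Math.scalb(x, -e), e};
    }

    // ldexp: x * 2^e computed exactly (barring overflow/underflow)
    static double ldexp(double x, int e) {
        return Math.scalb(x, e);
    }
}
{code}
Math.scalb is exact because it only adjusts the exponent, so a DD number could be kept normalised with a separately tracked base-2 exponent without any loss of precision.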

Note that the use of scaling to a normal range and maintenance of a separate exponent is used within the Kolmogorov-Smirnov computation. This is in part simplified by knowing that individual terms should be probabilities when rescaled after intermediate computation, thus overflow is avoided and underflow computes a p-value of zero.
h1. Conclusion

Implementation of an immutable double-double extended precision floating-point number is viable. There is no significant performance difference to the current internal implementation in Statistics.

I propose adding a public version of the class to a new package for extended precision numbers, e.g.:
{noformat}
o.a.c.numbers.ext : DD
{noformat}
The initial implementation can forego any checks for overflow, and thus carry a warning that standard IEEE arithmetic that creates infinite values will generate NaNs.
h2. TBD
 * Whether to support multiple implementations of operations, such as add and fastAdd
 * Final package name and class name

> Add support for extended precision floating-point numbers
> ---------------------------------------------------------
>
>                 Key: NUMBERS-193
>                 URL: https://issues.apache.org/jira/browse/NUMBERS-193
>             Project: Commons Numbers
>          Issue Type: New Feature
>            Reporter: Alex Herbert
>            Priority: Major
>              Labels: full-time, gsoc2023, part-time
>
> Add implementations of extended precision floating point numbers.
> An extended precision floating point number is a series of floating-point numbers that are non-overlapping such that:
> {noformat}
> double-double (a, b):
> |a| > |b|
> a == a + b{noformat}
> Common representations are double-double and quad-double (see for example David Bailey's paper on a quad-double library: [QD|https://www.davidhbailey.com/dhbpapers/qd.pdf]).
> Many computations in the Commons Numbers and Statistics libraries use extended precision computations where the accumulated error of a double would lead to complete cancellation of all significant bits; or create intermediate overflow of integer values.
> This project would formalise the code underlying these use cases with a generic library applicable for use in the case where the result is expected to be a finite value and using Java's BigDecimal and/or BigInteger negatively impacts performance.
> An example would be the average of long values where the intermediate sum overflows or the conversion to a double loses bits:
> {code:java}
> long[] values = {Long.MAX_VALUE, Long.MAX_VALUE};
> System.out.println(Arrays.stream(values).average().getAsDouble());
> System.out.println(Arrays.stream(values).mapToObj(BigDecimal::valueOf)
>     .reduce(BigDecimal.ZERO, BigDecimal::add)
>     .divide(BigDecimal.valueOf(values.length)).doubleValue());
> long[] values2 = {Long.MAX_VALUE, Long.MIN_VALUE};
> System.out.println(Arrays.stream(values2).asDoubleStream().average().getAsDouble());
> System.out.println(Arrays.stream(values2).mapToObj(BigDecimal::valueOf)
>     .reduce(BigDecimal.ZERO, BigDecimal::add)
>     .divide(BigDecimal.valueOf(values2.length)).doubleValue());
> {code}
> Outputs:
> {noformat}
> -1.0
> 9.223372036854776E18
> 0.0
> -0.5{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)