Posted to dev@commons.apache.org by Alex Herbert <al...@gmail.com> on 2020/01/10 17:18:03 UTC

[numbers] LinearCombination dot product accuracy

I have been investigating the accuracy of the LinearCombination class. This was because I was using it to perform high precision computations in the Complex class and it did not achieve the desired precision.

The basis of the class is to compute dot products: sum a_i b_i. It does this by splitting each number into a two-part higher-precision representation, multiplying the parts exactly, and using extra additions to track the round-off of the standard summation. The accumulated round-off is then added at the end.

The case where it fails is a linear combination whose total sum is close to 0 but whose intermediate sums are far enough from zero that there is a large difference between the exponents of the summed terms and the final result. This case matters for Complex, which requires computing log1p(x^2+y^2-1) when x^2+y^2 is close to 1: log(1) would be ~zero, yet the true logarithm is representable to very high precision.

LinearCombination has a few things that could be changed. However, I note that the class has been written to avoid conditional statements on magnitude, and so far I have not done a speed analysis of the implications of any changes. I can only estimate, based on the assumption that one conditional (|x| < |y|) costs about 3 floating-point operations (flops). This is the figure used by Ogita et al. (2005) in the paper upon which LinearCombination is based [1].

For reference I have also referred to Dekker (1971) [2] and Shewchuk (1997) [3]. Dekker is widely cited as the originator of the split product and split addition, where a 53-bit double is split into two doubles of 26 bits each (one bit is carried in the sign of the low part) and the parts are multiplied separately with exact precision. Shewchuk's paper expands on two-part splits to produce algorithms that express a double as an expansion of n parts. In either case the sum of the expansion is the original number without loss of accuracy. Dekker's split addition requires knowing which part is larger in magnitude. An alternative is an algorithm from Knuth that uses more flops but does not require ordering comparisons.
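For concreteness, here are the two error-free additions in Java (standard algorithms; sumLow matches the name used in the dotK code below, fastSumLow is my own label):

// Dekker's fast-two-sum: the round-off of sum = a + b, exact when |a| >= |b|.
static double fastSumLow(double a, double b, double sum) {
    return b - (sum - a);
}

// Knuth's two-sum: the round-off of sum = a + b with no ordering
// requirement (6 flops, branch-free).
static double sumLow(double a, double b, double sum) {
    final double bVirtual = sum - a;
    return (a - (sum - bVirtual)) + (b - bVirtual);
}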

The algorithm in LinearCombination is an implementation of dot2 from Ogita et al. (algorithm 5.3). There is a subtle difference: the original dot2 algorithm sums the round-off from the products and the round-off from the product summation together, whereas the method in LinearCombination sums them separately (using an extra variable) and adds them at the end. This actually increases precision in my testing, as the two round-offs are on different scales.

Here are the things I have tried changing:

1. Using Dekker’s algorithm to split the number. As stated earlier this extracts two 26-bit mantissas from a 53-bit mantissa (in IEEE 754 the leading bit in front of the 52-bit stored mantissa is implicitly 1). This is done by multiplication by (2^s + 1) with s = ceil(53/2) = 27:

big = (2^s+1) * a
a_hi = (big - (big - a))

The extra bit of precision is carried into the sign of the low part of the split number.

This is in contrast to the method in LinearCombination, which uses a simple mask on the long bit representation to obtain a 26-bit a_hi part, leaving 27 bits in the low part.

The advantage of Dekker’s method is that it produces two parts with at most 26 bits in the mantissa, which can be multiplied exactly. The disadvantage is the potential for overflow.

It also appropriately handles very small sub-normal numbers, which under the current method would be masked to a 0 high part with all the bits in the low part. That has knock-on effects on the split multiplication, which expects the high part to be the larger.
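Side by side, the two splits look like this (a sketch; the Dekker constant follows from s = 27):

// Dekker's splitting constant: 2^27 + 1.
private static final double MULTIPLIER = 0x1.0p27 + 1;

// Dekker's split: the high part has at most 26 significand bits; the low
// part (value - hi) holds the rest, using its sign for the extra bit.
static double highPartDekker(double value) {
    final double c = MULTIPLIER * value;
    return c - (c - value);
}

// Mask-based split: zero the low 27 bits of the 52-bit stored significand,
// leaving a 26-bit high part (including the implicit leading bit).
static double highPartMask(double value) {
    return Double.longBitsToDouble(Double.doubleToRawLongBits(value) & ((-1L) << 27));
}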

2. Using Dekker’s dot product algorithm. This maintains a running split sum of the individual products. It requires an ordering comparison per summation to ensure the error term is appropriately maintained; one accumulation step is sketched below.
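A minimal sketch of that step, using the magnitude-ordered fast-two-sum (a simplification for illustration, not Dekker's exact add2 formulation):

// Add the product high part h and its round-off hLo to the running split
// sum (sum, err). Fast-two-sum needs the larger-magnitude operand first.
static double[] addProduct(double sum, double err, double h, double hLo) {
    final double s = sum + h;
    final double roundOff = Math.abs(sum) > Math.abs(h)
        ? (sum - s) + h   // |sum| >= |h|
        : (h - s) + sum;  // |h| > |sum|
    return new double[] {s, err + roundOff + hLo};
}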

3. A variant with higher-order summation using 3 doubles to represent the expanded split sum. This would be a Klein summation [4]. However, although this implementation was better, it was not significantly better, and I will not show results.

4. Using the dotK algorithm (algorithm 5.10) of Ogita et al. This method stores all the round-off parts from the products and sums in an array. If simply summed at the end, it matches dot2 (K=2). When K > 2, repeated passes over the array apply a split sum as an error-free transformation of the data, which is then summed. The transformation distils the data so that it is less affected by differences in the scale of each part to be summed.

The paper by Ogita et al. states that sum3 is approximately 2.2x the run-time of sum2. The error is that of a K-fold working-precision summation; thus the sum has 3-fold rather than 2-fold precision.

Results

I created increasingly large arrays composed of repeated quartets of numbers [a, b, c, d] with the conditions:

a^2 + b^2 = 1
c^2 + d^2 = 1

These are multiplied as the dot product [a, b, c, d] * [a, b, -c, -d]

So when summed the result should be 0, but there is a lot of floating-point error. I compute the correct result using BigDecimal. The scale of the BigDecimal is clipped to -324 so that it does not go below Double.MIN_VALUE. This never happens under the test conditions AFAIK, but it creates a realistic limit for the computation. I measure the error in units of least precision (ulps).
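For reference, one way to construct such quartets (my sketch; the actual test generator may differ):

import java.util.SplittableRandom;

// Build one quartet pair from random angles:
// cos^2 + sin^2 = 1 to within the rounding of cos/sin.
static double[][] quartet(SplittableRandom rng) {
    final double t1 = 2 * Math.PI * rng.nextDouble();
    final double t2 = 2 * Math.PI * rng.nextDouble();
    final double a = Math.cos(t1);
    final double b = Math.sin(t1);
    final double c = Math.cos(t2);
    final double d = Math.sin(t2);
    return new double[][] {{a, b, c, d}, {a, b, -c, -d}};
}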

In the following results the first column is the total length of the data. Then each method is broken into:

Mean ulps
Standard deviation ulps
Median ulps
Percentile 99 ulps
Percentile 99.9 ulps
Percentile 100 ulps (i.e. max error)
Error count (i.e. when ulps != 0)
Sign error (when the result has the wrong sign)

For all metrics lower is better. 100000 samples were run.
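As an aside, the ulp distance can be measured by mapping the IEEE 754 bit patterns onto a monotone integer scale (a sketch; the actual measurement code may differ):

// Count the representable doubles between the expected result (rounded to
// the nearest double) and the actual result.
static long ulpError(java.math.BigDecimal expected, double actual) {
    return Math.abs(orderedBits(expected.doubleValue()) - orderedBits(actual));
}

// Map an IEEE 754 bit pattern to a monotonically increasing long.
// Negative doubles sort in reverse bit order, so flip them below zero.
private static long orderedBits(double x) {
    final long bits = Double.doubleToRawLongBits(x);
    return bits < 0 ? Long.MIN_VALUE - bits : bits;
}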

The methods are:

1. LinearCombination.value
2. LinearCombination.value modified to use Dekker’s split
3. Dekker’s dot product
4. Dot2s as described in Ogita et al.
5. DotK with K=2
6. DotK with K=3

n : (1) : (2) : (3) : (4) : (5) : (6)   [each column: mean, sd, median, p99, p99.9, max, error count, sign errors]
4 : 2.76,50.3,0,32,272,8220,40510,0 : 1.17,28.7,0,15,128,5888,21154,0 : 1.05,37.4,0,9,118,8192,19680,0 : 1.39,39.0,0,16,128,6144,23002,0 : 1.21,28.9,0,16,128,5888,21961,0 : 0.00,0.00,0,0,0,0,0,0
8 : 9.01,1.66e+03,1,37,352,524288,53469,0 : 1.96,57.2,0,18,168,12288,33874,0 : 1.46,55.9,0,15,128,16384,31287,0 : 2.32,58.1,0,23,240,12288,40110,0 : 2.42,118,0,20,192,32768,35870,0 : 0.00,0.00,0,0,0,0,0,0
12 : 5.04,280,1,40,426,73728,58035,0 : 2.53,88.2,0,23,256,23032,40911,0 : 1.96,96.3,0,16,160,23032,36646,0 : 3.36,103,0,31,272,15872,49167,0 : 2.70,89.2,0,24,266,23032,43062,0 : 0.00,0.00,0,0,0,0,0,0
16 : 4.93,207,1,40,428,45784,59624,0 : 3.44,198,0,26,248,45784,44578,0 : 1.88,63.1,0,16,152,13824,38925,0 : 4.05,106,1,36,404,13968,53602,0 : 3.64,198,0,29,276,45784,46609,0 : 0.00,0.00,0,0,0,0,0,0
20 : 5.37,209,1,44,449,42880,61169,0 : 3.45,139,0,30,300,31744,47876,0 : 2.05,85.4,0,18,191,24576,40393,0 : 4.55,133,1,42,448,31744,57253,0 : 3.73,141,0,32,336,31744,49605,0 : 0.00,0.00,0,0,0,0,0,0
24 : 5.41,241,1,44,430,66816,61880,0 : 3.98,225,0,30,294,66816,49771,0 : 1.89,43.5,0,18,192,7936,41070,0 : 4.96,226,1,42,432,66816,59481,0 : 4.20,226,1,32,332,66816,51326,0 : 0.00,0.00,0,0,0,0,0,0
28 : 7.27,424,1,48,499,106904,62698,0 : 4.37,180,1,35,388,41368,52005,0 : 2.34,74.7,0,19,198,14048,41758,0 : 6.86,330,1,48,461,61440,61403,0 : 4.58,181,1,38,396,41368,53506,0 : 0.00,0.00,0,0,0,0,0,0
32 : 7.99,802,1,50,467,245760,63450,0 : 7.73,1.00e+03,1,37,350,311296,53600,0 : 2.36,83.7,0,20,226,16384,42182,0 : 6.23,251,1,52,502,49152,62968,0 : 7.94,1.00e+03,1,39,366,311296,54999,0 : 0.00,0.00,0,0,0,0,0,0


Conclusions

Dot3 is perfect on this data. The algorithm effectively handles the poorly conditioned dot product, as the vector transformation before the final summation redistributes the round-off errors.

The LinearCombination method can be made approximately 30% more accurate (measured by the drop in mean, median and 99th percentile) by switching to Dekker’s split algorithm.

The Dekker-split variant of LinearCombination outperforms the dot2s and dot2 algorithms from the Ogita paper. All of these methods sum the round-off terms in standard precision. Dot2s combines alternating product round-off and sum round-off in a single term. Dot2 sums all the product round-off and continues to add the summation round-off. LinearCombination keeps these separate and thus starts summing the sum round-off from zero. Where the total sum fluctuates around zero this is an advantage, and the method is more accurate.

Dekker’s dot product outperforms all the other methods that use a summation to 2-fold precision. This method requires a (|a| < |b|) comparison for each summation and 1 additional float addition over LinearCombination.value. They share 7 multiplies + 10 adds per split product and 8 adds per split sum. So 25n flops for length n, as per Ogita et al., and Dekker’s method is then 29n flops (assuming an absolute float compare costs 3 flops). Ogita et al. state that Dekker’s two-sum addition, counting the float compare, requires 6 flops but is up to 50% slower than the method from Knuth, which requires 6 flops without branching. I have not verified this, but on a modern processor branch prediction and per-cycle parallelism may change the picture (the paper dates from 2005).

The dot3 algorithm is fine with this test. I think there is a case for including this in LinearCombination for when higher accuracy is required without switching to BigDecimal. Here is an example of the code for dotK. I have not included the split, product and sum methods for brevity (a sketch of productLow follows the listing):

/**
 * Dot product to K-fold precision.
 *
 * @param a Factors.
 * @param b Factors.
 * @param k The precision.
 * @return \( \sum_i a_i b_i \).
 * @throws IllegalArgumentException if the sizes of the arrays are different.
 */
public static double dotK(double[] a,
                          double[] b, int k) {
    if (a.length != b.length) {
        throw new IllegalArgumentException("Dimension mismatch: " + a.length + " != " + b.length);
    }

    final int len = a.length;

    if (len == 1) {
        // Revert to scalar multiplication.
        return a[0] * b[0];
    }

    // Store all round-off parts
    final double[] r = new double[len * 2];

    // p is the standard scalar product sum initialised with the first product
    double p = a[0] * b[0];
    r[0] = productLow(a[0], b[0], p);

    // Remaining split products added to the current sum and round-off stored
    for (int i = 1; i < len; i++) {
        final double h = a[i] * b[i];
        r[i] = productLow(a[i], b[i], h);

        final double x = p + h;
        r[i + len - 1] = sumLow(p, h, x);
        p = x;
    }

    if (Double.isNaN(p)) {
        // Either we have split infinite numbers or some coefficients were NaNs,
        // return the naive implementation for the IEEE754 result
        return p;
    }

    // Sum the round-off with the standard sum as the final component
    r[r.length - 1] = p;
    return sumK(r, k - 1);
}

/**
 * Sum to K-fold precision.
 *
 * @param p Data to sum.
 * @param k The precision.
 * @return the sum
 */
private static double sumK(double[] p, int k) {
    // k=1 will skip the vector transformation and sum in standard precision.
    for (int i = 1; i < k; i++) {
        vecSum(p);
    }
    double sum = 0;
    for (final double pi : p) {
        sum += pi;
    }
    return sum;
}

/**
 * Error free vector transformation for summation.
 *
 * @param p Data.
 */
private static void vecSum(double[] p) {
    for (int i = 1; i < p.length; i++) {
        final double x = p[i] + p[i - 1];
        p[i - 1] = sumLow(p[i], p[i - 1], x);
        p[i] = x;
    }
}
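For completeness, the omitted productLow might look like this, reusing Dekker's split sketched earlier (sumLow was sketched under the two-sum discussion above; this is illustrative, not the exact class code):

// Round-off of the product x = a * b (Dekker's two-product).
// Exact provided the splits do not overflow.
private static double productLow(double a, double b, double x) {
    final double aHi = highPartDekker(a);
    final double aLo = a - aHi;
    final double bHi = highPartDekker(b);
    final double bLo = b - bHi;
    return aLo * bLo - (((x - aHi * bHi) - aLo * bHi) - aHi * bLo);
}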

It is possible to efficiently unroll this for small vectors as has been done in the class already. I did this in the Complex class while testing and the sumK part looks like this:

// Cascade summation of the terms p_i
s = p0 + p1;
p0 = sumLow(p0, p1, s);
p1 = s;
s += p2;
p1 = sumLow(p1, p2, s);
p2 = s;
s += p3;
p2 = sumLow(p2, p3, s);
p3 = s;
Etc …
return p0 + p1 + p2 + p3 + ...


For discussion:

1. Leave LinearCombination alone.

2. Update LinearCombination to use Dekker’s split algorithm and leave the rest unchanged. The split algorithm must be guarded against overflow. This adds two float comparisons to the extraction of the split part:

if (value > SAFE_UPPER || value < -SAFE_UPPER) {
    // handle overflow
    return …;
}
// normal conversion
final double c = MULTIPLIER * value;
return c - (c - value);


In the nominal case this condition is false and the rest of the split should be the same speed as the current conversion to long bits, masking and back to a double.
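For the overflow branch, one conventional approach (an illustration on my part, not a settled design) is to scale into the safe range by a power of two, split, and scale back; scaling by a power of two is exact:

// Illustrative only: 2^-30 brings the value into the safe range.
final double scaled = value * 0x1.0p-30;
final double c = MULTIPLIER * scaled;
return (c - (c - scaled)) * 0x1.0p30;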

The change would improve accuracy by ensuring that the split multiply has no error due to the loss of round-off bits from the final bit of the original mantissa.

I note that cm4.util.FastMathCalc uses this method but the factor is (2^30) and not the correct (2^27 + 1).

3. Add a method value(double[] a, double[] b, int k) with k-fold precision to complement the current 2-fold precision value(double[] a, double[] b).

4. Switch to Dekker’s algorithm. However, I have no error analysis showing whether it is comparable on other test sets; it is just better here.

There is a MATLAB script in Ogita et al. to generate vectors for multiplication. This may be worth investigating. For now I would not recommend switching to Dekker’s dot product unless there is evidence it is generally applicable.

5. Add methods value3(…) that use 3-fold precision and unroll the loops for small vectors.


I think that option 2 is preferred as all split multiply methods should use the correct split from Dekker.

I would vote for option 3 to add a higher-precision version if a user desires one. Since this may be 2x slower it may not be the best option for the default.

I would opt to implement option 5 and test it with JMH to see if dot3 is 2.2x slower than dot2 in the JVM.


Thus most of this can be put into a JMH module and tested against the current class to see the performance effect.

The type of change that may end up in LinearCombination comes down to whether you want precision over speed.


Opinions welcome.

Alex


[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.1547
[2] https://doi.org/10.1007/BF01397083
[3] http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps
[4] https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Further_enhancements



Re: [numbers] LinearCombination dot product accuracy

Posted by Gilles Sadowski <gi...@gmail.com>.
Hi.

On Fri, 10 Jan 2020 at 22:45, Alex Herbert <al...@gmail.com> wrote:
>
> [... quoted text trimmed; the full reply appears below in the thread ...]
>
> However I do not have any other examples.

One never knows.  Perhaps others will come up.
E.g. usage of "Complex" in order to call JTransform...

Gilles

> [... rest of quoted message trimmed ...]



Re: [numbers] LinearCombination dot product accuracy

Posted by Alex Herbert <al...@gmail.com>.

> On 10 Jan 2020, at 18:12, Gilles Sadowski <gi...@gmail.com> wrote:
> 
>> ...
> 
> IIUC, precision (without resorting to "BigDecimal") was the purpose of
> "LinearCombination".  +1 for making the appropriate changes to the
> existing "value" method.
> [I'd suggest to open a JIRA issue and mention the rest of the alternatives
> there, for future reference; but I wouldn't add them to the API until there
> is a need for it.]

OK. If the purpose is precision while avoiding BigDecimal, then switching to dot3 seems the best option, as it performs one pass over the round-off errors to improve the summation for badly conditioned dot products. It is an alternative to sorting the input products before (and during) summation, and faster.

I’ll write the code to allow expansion to dot4, dot5, etc but not add this to the API.

Previously I used the timings for the summation alone for the 2.2x-slower estimate. The timings for the full dot product are better: dot2 to dot3 should go from 12.5 to 18.5, so about 50% slower (table 6.5 in Ogita’s paper).

I am going to set up a JMH module for [numbers] as I would like to benchmark the operations in Complex. This relates to the previously discussed idea of a ComplexList that stores the numbers in primitive arrays and performs operations by creating Complex instances and writing the results back to the arrays. I hope to show that for most operations the overhead of creating Complex objects is negligible. There may be cases where special implementations would be useful, such as multiplying or dividing two lists in a paired operation.

I was going to use the format from [rng] for the JMH module thus:

commons-numbers/commons-numbers-examples/examples-jmh

However I do not have any other examples. So perhaps just a single child module, included via a profile, as it need not be a distributed artifact:

commons-numbers/commons-numbers-jmh

WDYT?

I can build the new LinearCombination methods and put the old ones in the jmh project with a test of relative speed.

Then I’ll get back to Complex (which is nearing completion) and the ComplexList.

Alex



Re: [numbers] LinearCombination dot product accuracy

Posted by Gilles Sadowski <gi...@gmail.com>.
Hi Alex.

Thanks a lot for this work (and all the "Complex" stuff)!

On Fri, 10 Jan 2020 at 18:18, Alex Herbert <al...@gmail.com> wrote:
>
> [... original message quoted in full, including the results table; trimmed ...]

Tables are not nice to read in email. :-}
I'll trust the conclusion.

>
> [... quoted text trimmed ...]
>
> The type of change that may end up in LinearCombination comes down to whether you want precision over speed.

IIUC, precision (without resorting to "BigDecimal") was the purpose of
"LinearCombination".  +1 for making the appropriate changes to the
existing "value" method.
[I'd suggest to open a JIRA issue and mention the rest of the alternatives
there, for future reference; but I wouldn't add them to the API until there
is a need for it.]

Thanks,
Gilles

