You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "aherbert (via GitHub)" <gi...@apache.org> on 2023/09/05 22:45:32 UTC

[GitHub] [commons-statistics] aherbert commented on a diff in pull request #52: STATISTICS-80: Variance Implementation

aherbert commented on code in PR #52:
URL: https://github.com/apache/commons-statistics/pull/52#discussion_r1316339953


##########
commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/SumOfSquaredDeviations.java:
##########
@@ -36,31 +36,25 @@
  *
  * <p><strong>Note that this implementation is not synchronized.</strong> If
  * multiple threads access an instance of this class concurrently, and at least
- * one of the threads invokes the <code>accept()</code> or
- * <code>combine()</code> method, it must be synchronized externally.
+ * one of the threads invokes the {@link java.util.function.DoubleConsumer#accept(double) accept} or
+ * {@link DoubleStatisticAccumulator#combine(DoubleStatistic) combine} method, it must be synchronized externally.
  *
- * <p>However, it is safe to use <code>accept()</code> and <code>combine()</code>
+ * <p>However, it is safe to use {@link java.util.function.DoubleConsumer#accept(double) accept} and {@link DoubleStatisticAccumulator#combine(DoubleStatistic) combine}

Review Comment:
   Wrap this to 100 or 120 character width



##########
commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/TestData.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.statistics.descriptive;
+
+import java.util.stream.Stream;
+import org.junit.jupiter.params.provider.Arguments;
+
+final class TestData {
+
+    /** Class contains only static methods. */
+    private TestData() {}
+
+    static Stream<double[]> testValues() {
+        return Stream.of(
+            new double[] {0.0},
+            new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5},
+            new double[] {8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68},
+            new double[] {9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74, 7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73},
+            new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8},
+            new double[] {6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89},
+            new double[] {0, 0, 0.0},
+            new double[] {1, -7, 6},
+            new double[] {1, 7, -15, 3},
+            new double[] {2, 2, 2, 2},
+            new double[] {2.3},
+            new double[] {3.14, 2.718, 1.414},
+            new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9, 12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3},
+            new double[] {-0.0, +0.0},
+            new double[] {0.0, -0.0},
+            new double[] {0.0, +0.0},
+            new double[] {0.001, 0.0002, 0.00003, 10000.11, 0.000004},
+            new double[] {10E-50, 5E-100, 25E-200, 35.345E-50},
+            // Overflow of the sum
+            new double[] {Double.MAX_VALUE, Double.MAX_VALUE},
+            new double[] {-Double.MAX_VALUE, -Double.MAX_VALUE},
+            new double[] {Double.MAX_VALUE, 1},
+            new double[] {-Double.MAX_VALUE, 1, 1},
+            new double[] {-Double.MAX_VALUE, -1, 1},
+            new double[] {Double.MAX_VALUE, -1},
+            new double[] {Double.MAX_VALUE, -Double.MAX_VALUE},
+            new double[] {1, -Double.MAX_VALUE},
+            new double[] {1, 1, 1, -Double.MAX_VALUE},
+            new double[] {Double.MAX_VALUE, Double.MAX_VALUE / 2},
+            new double[] {Double.MAX_VALUE, Double.MAX_VALUE, -Double.MAX_VALUE},
+            new double[] {Double.MAX_VALUE, -Double.MAX_VALUE}
+        );
+    }
+
+    static Stream<Arguments> testValuesNonFinite() {

Review Comment:
   The expected value should not be in this method. You are always using NaN so this is not helpful. Each statistic should be able to compute the expected value for the data that has non-finite values. For example the min/max/mean may return an infinity but the variance would be NaN.



##########
commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/Variance.java:
##########
@@ -114,11 +114,11 @@ public static Variance of(double... values) {
         double accum2 = 0.0;
         double squaredDevSum;
         for (final double value : values) {
-            dev = value - mean;
+            dev = value * 0.5 - mean * 0.5;

Review Comment:
   Why protect from overflow here when immediately you square the result?
   
   I removed this and the failing test case is: {MAX, MAX, -MAX}.
   
   This is similar to {2, 2, -2} which has a mean of 2/3 and a variance of 3.5555. So rescaled by 2^1023 squared the variance would be 3.5555 * (2^1023)^2. This is a large overflow. Your scaling down is protecting computing the deviation from the mean, but the deviation squared is infinite and the rest of the computation cannot be saved.
   
   I think your overflow detection can be done on the `accum` value and not `accum2`. We can document that if the sum of squared deviations from the mean overflows then the variance is returned as infinity. So we use the formula as defined (comments removed for brevity):
   ```java
           for (final double value : values) {
               dev = value - mean;
               accum += dev * dev;
               accum2 += dev;
           }
           final double accum2Squared = accum2 * accum2;
           final long n = values.length;
           if (accum == Double.POSITIVE_INFINITY) {
               squaredDevSum = Double.POSITIVE_INFINITY;
           } else {
               squaredDevSum = accum - (accum2Squared / n);
           }
   ```
   



##########
commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java:
##########
@@ -31,38 +31,55 @@
  * Test for {@link Variance}.
  */
 final class VarianceTest {
+    private static final int ULP_ARRAY = 5;

Review Comment:
   I wondered why you have a ULP for ARRAY_OF_ARRAYS. I see that combine uses the accept method but array of arrays uses the of method to construct each. Perhaps we should rename these to:
   ```
   ULP_COMBINE_ACCEPT
   ULP_COMBINE_OF
   ```
   Note: I have updated the `TestHelper` to output the actual ULP error in the message. This makes setting these limits a bit easier. I will push the updated TestHelper so you can rebase and collect the change.
   



##########
commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java:
##########
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.statistics.descriptive;
+
+import java.math.BigDecimal;
+import java.math.MathContext;
+import java.util.Arrays;
+import java.util.stream.Stream;
+import org.apache.commons.rng.UniformRandomProvider;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+/**
+ * Test for {@link Variance}.
+ */
+final class VarianceTest {
+    private static final int ULP_ARRAY = 5;
+
+    private static final int ULP_STREAM = 7;
+
+    private static final int ULP_COMBINE = 7;
+
+    private static final int ULP_ARRAY_OF_ARRAYS = 2;
+
+    @Test
+    void testEmpty() {
+        Variance var = Variance.create();
+        Assertions.assertEquals(Double.NaN, var.getAsDouble());
+    }
+
+    @Test
+    void testNaN() {
+        Variance variance = Variance.create();
+        double[] testArray = {Double.NaN, +0.0d, -0.0d, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY};
+        for (double value : testArray) {
+            variance.accept(value);
+        }
+        Assertions.assertEquals(Double.NaN, variance.getAsDouble());
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testValues")
+    void testVariance(double[] values) {
+        final double expected = computeExpectedVariance(values);
+        Variance var = Variance.create();
+        for (double value : values) {
+            var.accept(value);
+        }
+        TestHelper.assertEquals(expected, var.getAsDouble(), ULP_STREAM, () -> "variance");
+        TestHelper.assertEquals(expected, Variance.of(values).getAsDouble(), ULP_ARRAY, () -> "of (values)");
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testValues")
+    void testParallelStream(double[] values) {
+        final double expected = computeExpectedVariance(values);
+        final double actual = Arrays.stream(values)
+                .parallel()
+                .collect(Variance::create, Variance::accept, Variance::combine)
+                .getAsDouble();
+        TestHelper.assertEquals(expected, actual, ULP_COMBINE, () -> "parallel stream");
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testValues")
+    void testVarianceRandomOrder(double[] values) {
+        UniformRandomProvider rng = TestHelper.createRNG();
+        for (int i = 1; i <= 10; i++) {
+            testVariance(TestHelper.shuffle(rng, values));
+            testParallelStream(TestHelper.shuffle(rng, values));
+        }
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testValuesNonFinite", "testVarianceNonFinite"})
+    void testVarianceNonFinite(double[] values, double expected) {
+        Variance var = Variance.create();
+        for (double value : values) {
+            var.accept(value);
+        }
+        Assertions.assertEquals(expected, var.getAsDouble(), "variance non-finite");
+        Assertions.assertEquals(expected, Variance.of(values).getAsDouble(), "of (values) non-finite");
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testValuesNonFinite", "testVarianceNonFinite"})
+    void testParallelStreamNonFinite(double[] values, double expected) {
+        final double ans = Arrays.stream(values)
+                .parallel()
+                .collect(Variance::create, Variance::accept, Variance::combine)
+                .getAsDouble();
+        Assertions.assertEquals(expected, ans, "parallel stream non-finite");
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testValuesNonFinite", "testVarianceNonFinite"})
+    void testVarianceRandomOrderNonFinite(double[] values, double expected) {
+        UniformRandomProvider rng = TestHelper.createRNG();
+        for (int i = 1; i <= 10; i++) {
+            testVarianceNonFinite(TestHelper.shuffle(rng, values), expected);
+            testParallelStreamNonFinite(TestHelper.shuffle(rng, values), expected);
+        }
+    }
+
+    static Stream<Arguments> testVarianceNonFinite() {
+        return Stream.of(
+            Arguments.of(new double[]{Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.NaN),
+            Arguments.of(new double[]{Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY}, Double.NaN),
+            Arguments.of(new double[]{Double.POSITIVE_INFINITY, Double.MAX_VALUE}, Double.NaN),
+            Arguments.of(new double[]{Double.NEGATIVE_INFINITY, -Double.MIN_VALUE}, Double.NaN),
+            Arguments.of(new double[]{Double.NEGATIVE_INFINITY, Double.MAX_VALUE}, Double.NaN),
+            Arguments.of(new double[]{Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.NaN),
+            Arguments.of(new double[]{-Double.MAX_VALUE, Double.POSITIVE_INFINITY}, Double.NaN)
+        );
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testCombine")
+    void testCombine(double[] array1, double[] array2) {
+        final double[] combinedArray = TestHelper.concatenate(array1, array2);
+        final double expected = computeExpectedVariance(combinedArray);
+        Variance var1 = Variance.create();
+        Variance var2 = Variance.create();
+        Arrays.stream(array1).forEach(var1);
+        Arrays.stream(array2).forEach(var2);
+        final double var1BeforeCombine = var1.getAsDouble();
+        final double var2BeforeCombine = var2.getAsDouble();
+        var1.combine(var2);
+        TestHelper.assertEquals(expected, var1.getAsDouble(), ULP_COMBINE, () -> "combine");
+        Assertions.assertEquals(var2BeforeCombine, var2.getAsDouble());
+        // Combine in reverse order
+        Variance var1b = Variance.create();
+        Arrays.stream(array1).forEach(var1b);
+        var2.combine(var1b);
+        TestHelper.assertEquals(expected, var2.getAsDouble(), ULP_COMBINE, () -> "combine");
+        Assertions.assertEquals(var1BeforeCombine, var1b.getAsDouble());
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testCombine")
+    void testCombineRandomOrder(double[] array1, double[] array2) {
+        UniformRandomProvider rng = TestHelper.createRNG();
+        double[] data = TestHelper.concatenate(array1, array2);
+        final int n = array1.length;
+        for (int i = 1; i <= 10; i++) {
+            for (int j = 1; j <= 10; j++) {
+                TestHelper.shuffle(rng, array1);
+                TestHelper.shuffle(rng, array2);
+                testCombine(array1, array2);
+            }
+            TestHelper.shuffle(rng, data);
+            System.arraycopy(data, 0, array1, 0, n);
+            System.arraycopy(data, n, array2, 0, array2.length);
+            testCombine(array1, array2);
+        }
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testCombine")
+    void testArrayOfArrays(double[] array1, double[] array2) {
+        final double[] combinedArray = TestHelper.concatenate(array1, array2);
+        final double expected = computeExpectedVariance(combinedArray);
+        final double[][] values = {array1, array2};
+        final double actual = Arrays.stream(values)
+                .map(Variance::of)
+                .reduce(Variance::combine)
+                .map(Variance::getAsDouble)
+                .orElseThrow(RuntimeException::new);
+        TestHelper.assertEquals(expected, actual, ULP_ARRAY_OF_ARRAYS, () -> "array of arrays combined variance");
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testCombineNonFinite", "testCombineVarianceNonFinite"})
+    void testCombineNonFinite(double[][] values, double expected) {
+        Variance var1 = Variance.create();
+        Variance var2 = Variance.create();
+        Arrays.stream(values[0]).forEach(var1);
+        Arrays.stream(values[1]).forEach(var2);
+        final double mean2BeforeCombine = var2.getAsDouble();
+        var1.combine(var2);
+        Assertions.assertEquals(expected, var1.getAsDouble(), "combine non-finite");
+        Assertions.assertEquals(mean2BeforeCombine, var2.getAsDouble());
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testCombineNonFinite", "testCombineVarianceNonFinite"})
+    void testCombineRandomOrderNonFinite(double[][] values, double expected) {
+        UniformRandomProvider rng = TestHelper.createRNG();
+        final double[] data = TestHelper.concatenate(values[0], values[1]);
+        final int n = values[0].length;
+        for (int i = 1; i <= 10; i++) {
+            for (int j = 1; j <= 10; j++) {
+                TestHelper.shuffle(rng, values[0]);
+                TestHelper.shuffle(rng, values[1]);
+                testCombineNonFinite(values, expected);
+            }
+            TestHelper.shuffle(rng, data);
+            System.arraycopy(data, 0, values[0], 0, n);
+            System.arraycopy(data, n, values[1], 0, values[1].length);
+            testCombineNonFinite(values, expected);
+        }
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testCombineNonFinite", "testCombineVarianceNonFinite"})
+    void testArrayOfArraysNonFinite(double[][] values, double expected) {
+        final double actual = Arrays.stream(values)
+                .map(Variance::of)
+                .reduce(Variance::combine)
+                .map(Variance::getAsDouble)
+                .orElseThrow(RuntimeException::new);
+        Assertions.assertEquals(expected, actual, "array of arrays combined variance non-finite");
+    }
+
+    static Stream<Arguments> testCombineVarianceNonFinite() {

Review Comment:
   Again, these are not the same as the MeanTest. So we are missing cases in one or the other. Please consolidate them to TestData.



##########
commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/SumOfSquaredDeviations.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.statistics.descriptive;
+
+/**
+ * Computes the sum of squared deviations from the sample mean. This
+ * statistic is related to the second moment.
+ *
+ * <p>
+ * The following recursive updating formula is used:
+ * <p>
+ * Let <ul>
+ * <li> dev = (current obs - previous mean) </li>
+ * <li> n = number of observations (including current obs) </li>
+ * </ul>
+ * Then
+ * <p>
+ * new value = old value + dev^2 * (n - 1) / n.
+ * <p>
+ *
+ * Returns the sum of squared deviations of all values seen so far.
+ *
+ * <p><strong>Note that this implementation is not synchronized.</strong> If
+ * multiple threads access an instance of this class concurrently, and at least
+ * one of the threads invokes the {@link java.util.function.DoubleConsumer#accept(double) accept} or
+ * {@link DoubleStatisticAccumulator#combine(DoubleStatistic) combine} method, it must be synchronized externally.
+ *
+ * <p>However, it is safe to use {@link java.util.function.DoubleConsumer#accept(double) accept} and {@link DoubleStatisticAccumulator#combine(DoubleStatistic) combine}
+ * as <code>accumulator</code> and <code>combiner</code> functions of
+ * {@link java.util.stream.Collector Collector} on a parallel stream,
+ * because the parallel implementation of {@link java.util.stream.Stream#collect Stream.collect()}
+ * provides the necessary partitioning, isolation, and merging of results for
+ * safe and efficient parallel execution.
+ */
+class SumOfSquaredDeviations extends FirstMoment implements DoubleStatistic {

Review Comment:
   I think we should drop the implementation of DoubleStatistic. You could just implement DoubleConsumer in FirstMoment and drop the implementation of DoubleStaticticAccumilator<FirstMoment>. You then do not have to overload getAsDouble through the hierarchy. You can have typed methods for combine  and specific methods for the stat:
   
   ```
   FirstMoment implements DoubleConsumer:
   void accept(double)
   FirstMoment combine(FirstMoment);
   double getFirstMoment()
   
   SumOfSquaredDeviations extends FirstMoment:
   SumOfSquaredDeviations combine(SumOfSquareDeviations);
   double getSumOfSquaredDeviations();
   ```
   
   This allows the SumOfSquareDeviations to call `super.combine(other)` and `getFirstMoment()` directly without creating copy instances.
   



##########
commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/TestData.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.statistics.descriptive;
+
+import java.util.stream.Stream;
+import org.junit.jupiter.params.provider.Arguments;
+
+final class TestData {

Review Comment:
   Class and methods should have some comments.



##########
commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/MeanTest.java:
##########
@@ -152,37 +117,21 @@ void testMeanRandomOrderNonFinite(double[] values, double expected) {
 
     static Stream<Arguments> testMeanNonFinite() {
         return Stream.of(
-            Arguments.of(new double[] {}, Double.NaN),
-            Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY},
-                Double.NaN),
-            Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY},
-                Double.POSITIVE_INFINITY),
-            Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY},
-                Double.NEGATIVE_INFINITY),
-            Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.MAX_VALUE},
-                Double.POSITIVE_INFINITY),
-            Arguments.of(new double[] {Double.NEGATIVE_INFINITY, -Double.MIN_VALUE},
-                Double.NEGATIVE_INFINITY),
-            Arguments.of(new double[] {Double.NaN, 34.56, 89.74}, Double.NaN),
-            Arguments.of(new double[] {34.56, Double.NaN, 89.74}, Double.NaN),
-            Arguments.of(new double[] {34.56, 89.74, Double.NaN}, Double.NaN),
-            Arguments.of(new double[] {Double.NaN, 3.14, Double.NaN, Double.NaN},
-                Double.NaN),
-            Arguments.of(new double[] {Double.NaN, Double.NaN, Double.NaN}, Double.NaN),
-            Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.MAX_VALUE},
-                Double.NEGATIVE_INFINITY),
-            Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY,
-                Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.POSITIVE_INFINITY),
-            Arguments.of(new double[] {-Double.MAX_VALUE, Double.POSITIVE_INFINITY},
-                Double.POSITIVE_INFINITY)
+            Arguments.of(new double[]{Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.POSITIVE_INFINITY),

Review Comment:
   Can these apply to the variance too, i.e. move them to TestData



##########
commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java:
##########
@@ -116,75 +100,65 @@ void testVarianceNonFinite(double[] values, double expected) {
     }
 
     @ParameterizedTest
-    @MethodSource(value = "testVarianceNonFinite")
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testValuesNonFinite", "testVarianceNonFinite"})
     void testParallelStreamNonFinite(double[] values, double expected) {
-        double ans = Arrays.stream(values)
+        final double ans = Arrays.stream(values)
                 .parallel()
                 .collect(Variance::create, Variance::accept, Variance::combine)
                 .getAsDouble();
         Assertions.assertEquals(expected, ans, "parallel stream non-finite");
     }
 
     @ParameterizedTest
-    @MethodSource(value = "testVarianceNonFinite")
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testValuesNonFinite", "testVarianceNonFinite"})
     void testVarianceRandomOrderNonFinite(double[] values, double expected) {
         UniformRandomProvider rng = TestHelper.createRNG();
         for (int i = 1; i <= 10; i++) {
             testVarianceNonFinite(TestHelper.shuffle(rng, values), expected);
             testParallelStreamNonFinite(TestHelper.shuffle(rng, values), expected);
         }
     }
+
     static Stream<Arguments> testVarianceNonFinite() {
         return Stream.of(
-            Arguments.of(new double[] {}, Double.NaN),
-            Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY},
-                Double.NaN),
-            Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY},
-                Double.NaN),
-            Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY},
-                Double.NaN),
-            Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.MAX_VALUE},
-                Double.NaN),
-            Arguments.of(new double[] {Double.NEGATIVE_INFINITY, -Double.MIN_VALUE},
-                Double.NaN),
-            Arguments.of(new double[] {Double.NaN, +0.0d, -0.0d, Double.POSITIVE_INFINITY,
-                Double.NEGATIVE_INFINITY}, Double.NaN),
-            Arguments.of(new double[] {Double.NaN, 34.56, 89.74}, Double.NaN),
-            Arguments.of(new double[] {34.56, Double.NaN, 89.74}, Double.NaN),
-            Arguments.of(new double[] {34.56, 89.74, Double.NaN}, Double.NaN),
-            Arguments.of(new double[] {Double.NaN, 3.14, Double.NaN, Double.NaN},
-                Double.NaN),
-            Arguments.of(new double[] {Double.NaN, Double.NaN, Double.NaN}, Double.NaN),
-            Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.MAX_VALUE},
-                Double.NaN),
-            Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY,
-                Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.NaN),
-            Arguments.of(new double[] {-Double.MAX_VALUE, Double.POSITIVE_INFINITY},
-                Double.NaN)
+            Arguments.of(new double[]{Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.NaN),

Review Comment:
   These are not the same as the MeanTest. So we are missing cases in one or the other. I would put them all in TestData and then have a method to compute the expected value when non-finite values are encountered. IIUC when the mean is infinite the variance will be NaN so computing the expected value is a simple return of NaN.



##########
commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/MeanTest.java:
##########
@@ -298,17 +215,17 @@ void testCombineRandomOrderNonFinite(double[][] values, double expected) {
     }
 
     @ParameterizedTest
-    @MethodSource(value = "testCombineNonFinite")
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testCombineNonFinite", "testCombineMeanNonFinite"})
     void testArrayOfArraysNonFinite(double[][] values, double expected) {
-        double actual = Arrays.stream(values)
+        final double actual = Arrays.stream(values)
                 .map(Mean::of)
                 .reduce(Mean::combine)
                 .map(Mean::getAsDouble)
                 .orElseThrow(RuntimeException::new);
         Assertions.assertEquals(expected, actual, "array of arrays combined mean non-finite");
     }
 
-    static Stream<Arguments> testCombineNonFinite() {
+    static Stream<Arguments> testCombineMeanNonFinite() {

Review Comment:
   I do not think we should have expected value in this stream method. Why are these cases needed for mean and not for any other stat. Can these cases be moved to the TestData class?
   
   You can just have a second method to compute the expected mean for non-finite cases. IIUC it should just be the mean of the stream of values:
   ```java
   Arrays.stream(values).flatMapToDouble(Arrays::stream).average().orElse(Double.NaN)
   ```



##########
commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/SumOfSquaredDeviations.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.statistics.descriptive;
+
+/**
+ * Computes the sum of squared deviations from the sample mean. This
+ * statistic is related to the second moment.
+ *
+ * <p>
+ * The following recursive updating formula is used:
+ * <p>
+ * Let <ul>
+ * <li> dev = (current obs - previous mean) </li>
+ * <li> n = number of observations (including current obs) </li>
+ * </ul>
+ * Then
+ * <p>
+ * new value = old value + dev^2 * (n - 1) / n.
+ * <p>
+ *
+ * Returns the sum of squared deviations of all values seen so far.
+ *
+ * <p><strong>Note that this implementation is not synchronized.</strong> If
+ * multiple threads access an instance of this class concurrently, and at least
+ * one of the threads invokes the {@link java.util.function.DoubleConsumer#accept(double) accept} or
+ * {@link DoubleStatisticAccumulator#combine(DoubleStatistic) combine} method, it must be synchronized externally.
+ *
+ * <p>However, it is safe to use {@link java.util.function.DoubleConsumer#accept(double) accept} and {@link DoubleStatisticAccumulator#combine(DoubleStatistic) combine}
+ * as <code>accumulator</code> and <code>combiner</code> functions of
+ * {@link java.util.stream.Collector Collector} on a parallel stream,
+ * because the parallel implementation of {@link java.util.stream.Stream#collect Stream.collect()}
+ * provides the necessary partitioning, isolation, and merging of results for
+ * safe and efficient parallel execution.
+ */
+class SumOfSquaredDeviations extends FirstMoment implements DoubleStatistic {
+    /** Sum of squared deviations of the values that have been added. */
+    private double squaredDevSum;
+
+    /**
+     * Create a SumOfSquaredDeviations instance.
+     */
+    SumOfSquaredDeviations() {
+        // No-op
+    }
+
+    /**
+     * Create a SumOfSquaredDeviations instance with the given sum of
+     * squared deviations and a FirstMoment instance.
+     *
+     * @param squaredDevSum Sum of squared deviations.
+     * @param mean Mean of values.
+     * @param n Number of values.
+     * @param nonFiniteValue Sum of values.
+     */
+    SumOfSquaredDeviations(double squaredDevSum, double mean, long n, double nonFiniteValue) {
+        super(mean, n, nonFiniteValue);
+        this.squaredDevSum = squaredDevSum;
+    }
+
+    /**
+     * Updates the state of the statistic to reflect the addition of {@code value}.
+     * @param value Value.
+     */
+    @Override
+    public void accept(double value) {
+        super.accept(value);
+        squaredDevSum += (n - 1) * dev * nDev;
+    }
+
+    /**
+     * Gets the sum of squared deviations of all input values.
+     *
+     * @return {@code SumOfSquaredDeviations} of all values seen so far.
+     */
+    @Override
+    public double getAsDouble() {
+        return Double.isFinite(super.getAsDouble()) ? squaredDevSum : Double.NaN;
+    }
+
+    /**
+     * Gets a new FirstMoment instance with all of its parameters copied from the current instance.
+     * @return The FirstMoment instance.
+     */
+    FirstMoment getFirstMoment() {
+        return new FirstMoment(m1, n, super.getNonFiniteValue(), dev, nDev);
+    }
+
+    /**
+     * Combines the state of another {@code SumOfSquaredDeviations} into this one.
+     *
+     * @param other Another {@code SumOfSquaredDeviations} to be combined.
+     * @return {@code this} instance after combining {@code other}.
+     */
+    public SumOfSquaredDeviations combine(SumOfSquaredDeviations other) {
+        final long oldN = n;
+        final long otherN = other.n;
+        if (oldN == 0) {
+            squaredDevSum = other.squaredDevSum;
+        } else if (otherN != 0) {
+            final double diffOfMean = other.getFirstMoment().getAsDouble() - m1;
+            final double sqDiffOfMean = diffOfMean * diffOfMean;
+            squaredDevSum += other.squaredDevSum + sqDiffOfMean * ((double) (oldN * otherN) / (oldN + otherN));
+        }
+        super.combine(other.getFirstMoment());

Review Comment:
   No need to create a copy instance here. Just use: `super.combine(other)`



##########
commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/SumOfSquaredDeviations.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.statistics.descriptive;
+
+/**
+ * Computes the sum of squared deviations from the sample mean. This
+ * statistic is related to the second moment.
+ *
+ * <p>
+ * The following recursive updating formula is used:
+ * <p>
+ * Let <ul>
+ * <li> dev = (current obs - previous mean) </li>
+ * <li> n = number of observations (including current obs) </li>
+ * </ul>
+ * Then
+ * <p>
+ * new value = old value + dev^2 * (n - 1) / n.
+ * <p>
+ *
+ * Returns the sum of squared deviations of all values seen so far.
+ *
+ * <p><strong>Note that this implementation is not synchronized.</strong> If
+ * multiple threads access an instance of this class concurrently, and at least
+ * one of the threads invokes the {@link java.util.function.DoubleConsumer#accept(double) accept} or
+ * {@link DoubleStatisticAccumulator#combine(DoubleStatistic) combine} method, it must be synchronized externally.
+ *
+ * <p>However, it is safe to use {@link java.util.function.DoubleConsumer#accept(double) accept} and {@link DoubleStatisticAccumulator#combine(DoubleStatistic) combine}
+ * as <code>accumulator</code> and <code>combiner</code> functions of
+ * {@link java.util.stream.Collector Collector} on a parallel stream,
+ * because the parallel implementation of {@link java.util.stream.Stream#collect Stream.collect()}
+ * provides the necessary partitioning, isolation, and merging of results for
+ * safe and efficient parallel execution.
+ */
+class SumOfSquaredDeviations extends FirstMoment implements DoubleStatistic {
+    /** Sum of squared deviations of the values that have been added. */
+    private double squaredDevSum;
+
+    /**
+     * Create a SumOfSquaredDeviations instance.
+     */
+    SumOfSquaredDeviations() {
+        // No-op
+    }
+
+    /**
+     * Create a SumOfSquaredDeviations instance with the given sum of
+     * squared deviations and a FirstMoment instance.
+     *
+     * @param squaredDevSum Sum of squared deviations.
+     * @param mean Mean of values.
+     * @param n Number of values.
+     * @param nonFiniteValue Sum of values.
+     */
+    SumOfSquaredDeviations(double squaredDevSum, double mean, long n, double nonFiniteValue) {
+        super(mean, n, nonFiniteValue);
+        this.squaredDevSum = squaredDevSum;
+    }
+
+    /**
+     * Updates the state of the statistic to reflect the addition of {@code value}.
+     * @param value Value.
+     */
+    @Override
+    public void accept(double value) {
+        super.accept(value);
+        squaredDevSum += (n - 1) * dev * nDev;
+    }
+
+    /**
+     * Gets the sum of squared deviations of all input values.
+     *
+     * @return {@code SumOfSquaredDeviations} of all values seen so far.
+     */
+    @Override
+    public double getAsDouble() {
+        return Double.isFinite(super.getAsDouble()) ? squaredDevSum : Double.NaN;
+    }
+
+    /**
+     * Gets a new FirstMoment instance with all of its parameters copied from the current instance.
+     * @return The FirstMoment instance.
+     */
+    FirstMoment getFirstMoment() {
+        return new FirstMoment(m1, n, super.getNonFiniteValue(), dev, nDev);
+    }
+
+    /**
+     * Combines the state of another {@code SumOfSquaredDeviations} into this one.
+     *
+     * @param other Another {@code SumOfSquaredDeviations} to be combined.
+     * @return {@code this} instance after combining {@code other}.
+     */
+    public SumOfSquaredDeviations combine(SumOfSquaredDeviations other) {
+        final long oldN = n;
+        final long otherN = other.n;
+        if (oldN == 0) {
+            squaredDevSum = other.squaredDevSum;
+        } else if (otherN != 0) {
+            final double diffOfMean = other.getFirstMoment().getAsDouble() - m1;

Review Comment:
   Avoid creating a copy instance of the FirstMoment. Instead drop the implementation of the statistic interfaces (these are internal classes so do not require them):
   ```java
   // In FirstMoment:
       double getFirstMoment() {
           if (Double.isFinite(m1)) {
               return n == 0 ? Double.NaN : m1;
           }
           // A non-finite value must have been encountered, return nonFiniteValue which represents m1.
           return nonFiniteValue;
       }
   
   // In SumOfSquaredDeviations:
       final double diffOfMean = other.getFirstMoment() - m1
   ```



##########
commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/Variance.java:
##########
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.statistics.descriptive;
+
+/**
+ * Computes the variance of a set of values. By default, the
+ * "sample variance" is computed. The definitional formula for sample
+ * variance is:
+ * <p>
+ * sum((x_i - mean)^2) / (n - 1)
+ * <p>This formula does not have good numerical properties, so this
+ * implementation does not use it to compute the statistic.
+ * <ul>
+ * <li> The {@link #accept(double)} method computes the variance using
+ * updating formulae based on West's algorithm, as described in
+ * <a href="http://doi.acm.org/10.1145/359146.359152"> Chan, T. F. and
+ * J. G. Lewis 1979, <i>Communications of the ACM</i>,
+ * vol. 22 no. 9, pp. 526-531.</a></li>
+ *
+ * <li> The {@link #of(double...)} method leverages the fact that it has the
+ * full array of values in memory to execute a two-pass algorithm.
+ * Specifically, this method uses the "corrected two-pass algorithm" from
+ * Chan, Golub, Levesque, <i>Algorithms for Computing the Sample Variance</i>,
+ * American Statistician, vol. 37, no. 3 (1983) pp. 242-247.</li></ul>
+ *
+ * Note that adding values using {@code accept} and then executing {@code getAsDouble} will
+ * sometimes give a different, less accurate, result than executing
+ * {@code of} with the full array of values. The former approach
+ * should only be used when the full array of values is not available.
+ *
+ * <p>
+ * Returns <code>Double.NaN</code> if no data values have been added and
+ * returns <code>0</code> if there is just one finite value in the data set.
+ * Note that <code>Double.NaN</code> may also be returned if the input includes
+ * <code>Double.NaN</code> and / or infinite values.
+ *
+ * <p>This class is designed to work with (though does not require)
+ * {@linkplain java.util.stream streams}.
+ *
+ * <p><strong>Note that this implementation is not synchronized.</strong> If
+ * multiple threads access an instance of this class concurrently, and at least
+ * one of the threads invokes the {@link java.util.function.DoubleConsumer#accept(double) accept} or
+ * {@link DoubleStatisticAccumulator#combine(DoubleStatistic) combine} method, it must be synchronized externally.
+ *
+ * <p>However, it is safe to use <code>accept()</code> and <code>combine()</code>
+ * as <code>accumulator</code> and <code>combiner</code> functions of
+ * {@link java.util.stream.Collector Collector} on a parallel stream,
+ * because the parallel implementation of {@link java.util.stream.Stream#collect Stream.collect()}
+ * provides the necessary partitioning, isolation, and merging of results for
+ * safe and efficient parallel execution.
+ *
+ * @since 1.1
+ */
+public abstract class Variance implements DoubleStatistic, DoubleStatisticAccumulator<Variance> {
+
+    /**
+     * Create a Variance instance.
+     */
+    Variance() {
+        // No-op
+    }
+
+    /**
+     * Creates a {@code Variance} implementation which does not store the input value(s) it consumes.
+     *
+     * <p>The result is <code>NaN</code> if:
+     * <ul>
+     *     <li>no values have been added,</li>
+     *     <li>any of the values is <code>NaN</code>, or</li>
+     *     <li>an infinite value of either sign is encountered</li>
+     * </ul>
+     *
+     * @return {@code Variance} implementation.
+     */
+    public static Variance create() {
+        return new StorelessSampleVariance();
+    }
+
+    /**
+     * Returns a {@code Variance} instance that has the variance of all input values, or <code>NaN</code>
+     * if:
+     * <ul>
+     *     <li>the input array is empty,</li>
+     *     <li>any of the values is <code>NaN</code>, or</li>
+     *     <li>an infinite value of either sign is encountered</li>
+     * </ul>
+     *
+     * <p>Note: {@code Variance} computed using {@link Variance#accept Variance.accept()} may be different
+     * from this variance.
+     *
+     * <p>See {@link Variance} for details on the computing algorithm.
+     *
+     * @param values Values.
+     * @return {@code Variance} instance.
+     */
+    public static Variance of(double... values) {
+        final double mean = Mean.of(values).getAsDouble();

Review Comment:
   You can short-circuit here if the mean is non-finite. I think this is correct. We either return a infinity if the mean is +/-infinity or propagate the nan for the sum of squared deviations.
   ```java
           if (!Double.isFinite(mean)) {
               return StorelessSampleVariance.create(Math.abs(mean), mean, values.length, Math.abs(mean));
           }
   ```



##########
commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java:
##########
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.statistics.descriptive;
+
+import java.math.BigDecimal;
+import java.math.MathContext;
+import java.util.Arrays;
+import java.util.stream.Stream;
+import org.apache.commons.rng.UniformRandomProvider;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+/**
+ * Test for {@link Variance}.
+ */
+final class VarianceTest {
+    private static final int ULP_ARRAY = 5;
+
+    private static final int ULP_STREAM = 7;
+
+    private static final int ULP_COMBINE = 7;
+
+    private static final int ULP_ARRAY_OF_ARRAYS = 2;
+
+    @Test
+    void testEmpty() {
+        Variance var = Variance.create();
+        Assertions.assertEquals(Double.NaN, var.getAsDouble());
+    }
+
+    @Test
+    void testNaN() {
+        Variance variance = Variance.create();
+        double[] testArray = {Double.NaN, +0.0d, -0.0d, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY};
+        for (double value : testArray) {
+            variance.accept(value);
+        }
+        Assertions.assertEquals(Double.NaN, variance.getAsDouble());
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testValues")
+    void testVariance(double[] values) {
+        final double expected = computeExpectedVariance(values);
+        Variance var = Variance.create();
+        for (double value : values) {
+            var.accept(value);
+        }
+        TestHelper.assertEquals(expected, var.getAsDouble(), ULP_STREAM, () -> "variance");
+        TestHelper.assertEquals(expected, Variance.of(values).getAsDouble(), ULP_ARRAY, () -> "of (values)");
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testValues")
+    void testParallelStream(double[] values) {
+        final double expected = computeExpectedVariance(values);
+        final double actual = Arrays.stream(values)
+                .parallel()
+                .collect(Variance::create, Variance::accept, Variance::combine)
+                .getAsDouble();
+        TestHelper.assertEquals(expected, actual, ULP_COMBINE, () -> "parallel stream");
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testValues")
+    void testVarianceRandomOrder(double[] values) {
+        UniformRandomProvider rng = TestHelper.createRNG();
+        for (int i = 1; i <= 10; i++) {
+            testVariance(TestHelper.shuffle(rng, values));
+            testParallelStream(TestHelper.shuffle(rng, values));
+        }
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testValuesNonFinite", "testVarianceNonFinite"})
+    void testVarianceNonFinite(double[] values, double expected) {
+        Variance var = Variance.create();
+        for (double value : values) {
+            var.accept(value);
+        }
+        Assertions.assertEquals(expected, var.getAsDouble(), "variance non-finite");
+        Assertions.assertEquals(expected, Variance.of(values).getAsDouble(), "of (values) non-finite");
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testValuesNonFinite", "testVarianceNonFinite"})
+    void testParallelStreamNonFinite(double[] values, double expected) {
+        final double ans = Arrays.stream(values)
+                .parallel()
+                .collect(Variance::create, Variance::accept, Variance::combine)
+                .getAsDouble();
+        Assertions.assertEquals(expected, ans, "parallel stream non-finite");
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testValuesNonFinite", "testVarianceNonFinite"})
+    void testVarianceRandomOrderNonFinite(double[] values, double expected) {
+        UniformRandomProvider rng = TestHelper.createRNG();
+        for (int i = 1; i <= 10; i++) {
+            testVarianceNonFinite(TestHelper.shuffle(rng, values), expected);
+            testParallelStreamNonFinite(TestHelper.shuffle(rng, values), expected);
+        }
+    }
+
+    static Stream<Arguments> testVarianceNonFinite() {
+        return Stream.of(
+            Arguments.of(new double[]{Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.NaN),
+            Arguments.of(new double[]{Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY}, Double.NaN),
+            Arguments.of(new double[]{Double.POSITIVE_INFINITY, Double.MAX_VALUE}, Double.NaN),
+            Arguments.of(new double[]{Double.NEGATIVE_INFINITY, -Double.MIN_VALUE}, Double.NaN),
+            Arguments.of(new double[]{Double.NEGATIVE_INFINITY, Double.MAX_VALUE}, Double.NaN),
+            Arguments.of(new double[]{Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.NaN),
+            Arguments.of(new double[]{-Double.MAX_VALUE, Double.POSITIVE_INFINITY}, Double.NaN)
+        );
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testCombine")
+    void testCombine(double[] array1, double[] array2) {
+        final double[] combinedArray = TestHelper.concatenate(array1, array2);
+        final double expected = computeExpectedVariance(combinedArray);
+        Variance var1 = Variance.create();
+        Variance var2 = Variance.create();
+        Arrays.stream(array1).forEach(var1);
+        Arrays.stream(array2).forEach(var2);
+        final double var1BeforeCombine = var1.getAsDouble();
+        final double var2BeforeCombine = var2.getAsDouble();
+        var1.combine(var2);
+        TestHelper.assertEquals(expected, var1.getAsDouble(), ULP_COMBINE, () -> "combine");
+        Assertions.assertEquals(var2BeforeCombine, var2.getAsDouble());
+        // Combine in reverse order
+        Variance var1b = Variance.create();
+        Arrays.stream(array1).forEach(var1b);
+        var2.combine(var1b);
+        TestHelper.assertEquals(expected, var2.getAsDouble(), ULP_COMBINE, () -> "combine");
+        Assertions.assertEquals(var1BeforeCombine, var1b.getAsDouble());
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testCombine")
+    void testCombineRandomOrder(double[] array1, double[] array2) {
+        UniformRandomProvider rng = TestHelper.createRNG();
+        double[] data = TestHelper.concatenate(array1, array2);
+        final int n = array1.length;
+        for (int i = 1; i <= 10; i++) {
+            for (int j = 1; j <= 10; j++) {
+                TestHelper.shuffle(rng, array1);
+                TestHelper.shuffle(rng, array2);
+                testCombine(array1, array2);
+            }
+            TestHelper.shuffle(rng, data);
+            System.arraycopy(data, 0, array1, 0, n);
+            System.arraycopy(data, n, array2, 0, array2.length);
+            testCombine(array1, array2);
+        }
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = "org.apache.commons.statistics.descriptive.TestData#testCombine")
+    void testArrayOfArrays(double[] array1, double[] array2) {
+        final double[] combinedArray = TestHelper.concatenate(array1, array2);
+        final double expected = computeExpectedVariance(combinedArray);
+        final double[][] values = {array1, array2};
+        final double actual = Arrays.stream(values)
+                .map(Variance::of)
+                .reduce(Variance::combine)
+                .map(Variance::getAsDouble)
+                .orElseThrow(RuntimeException::new);
+        TestHelper.assertEquals(expected, actual, ULP_ARRAY_OF_ARRAYS, () -> "array of arrays combined variance");
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testCombineNonFinite", "testCombineVarianceNonFinite"})
+    void testCombineNonFinite(double[][] values, double expected) {
+        Variance var1 = Variance.create();
+        Variance var2 = Variance.create();
+        Arrays.stream(values[0]).forEach(var1);
+        Arrays.stream(values[1]).forEach(var2);
+        final double mean2BeforeCombine = var2.getAsDouble();
+        var1.combine(var2);
+        Assertions.assertEquals(expected, var1.getAsDouble(), "combine non-finite");
+        Assertions.assertEquals(mean2BeforeCombine, var2.getAsDouble());
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testCombineNonFinite", "testCombineVarianceNonFinite"})
+    void testCombineRandomOrderNonFinite(double[][] values, double expected) {
+        UniformRandomProvider rng = TestHelper.createRNG();
+        final double[] data = TestHelper.concatenate(values[0], values[1]);
+        final int n = values[0].length;
+        for (int i = 1; i <= 10; i++) {
+            for (int j = 1; j <= 10; j++) {
+                TestHelper.shuffle(rng, values[0]);
+                TestHelper.shuffle(rng, values[1]);
+                testCombineNonFinite(values, expected);
+            }
+            TestHelper.shuffle(rng, data);
+            System.arraycopy(data, 0, values[0], 0, n);
+            System.arraycopy(data, n, values[1], 0, values[1].length);
+            testCombineNonFinite(values, expected);
+        }
+    }
+
+    @ParameterizedTest
+    @MethodSource(value = {"org.apache.commons.statistics.descriptive.TestData#testCombineNonFinite", "testCombineVarianceNonFinite"})
+    void testArrayOfArraysNonFinite(double[][] values, double expected) {
+        final double actual = Arrays.stream(values)
+                .map(Variance::of)
+                .reduce(Variance::combine)
+                .map(Variance::getAsDouble)
+                .orElseThrow(RuntimeException::new);
+        Assertions.assertEquals(expected, actual, "array of arrays combined variance non-finite");
+    }
+
+    static Stream<Arguments> testCombineVarianceNonFinite() {
+        return Stream.of(
+            Arguments.of(new double[][] {{Double.POSITIVE_INFINITY}, {Double.POSITIVE_INFINITY}}, Double.NaN),
+            Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY}, {Double.NEGATIVE_INFINITY}}, Double.NaN),
+            Arguments.of(new double[][] {{Double.POSITIVE_INFINITY}, {Double.MAX_VALUE}}, Double.NaN),
+            Arguments.of(new double[][] {{-Double.MAX_VALUE}, {Double.POSITIVE_INFINITY}}, Double.NaN),
+            Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY}, {-Double.MIN_VALUE}}, Double.NaN),
+            Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY, -Double.MAX_VALUE, -Double.MIN_VALUE}, {Double.MAX_VALUE, Double.MIN_VALUE}}, Double.NaN)
+        );
+    }
+
+    // Helper function to compute the expected value of Variance using BigDecimal.
+    static double computeExpectedVariance(double[] values) {
+        long n = values.length;
+        if (n == 1) {
+            return BigDecimal.ZERO.doubleValue();

Review Comment:
   Just return zero here



##########
commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/TestData.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.statistics.descriptive;
+
+import java.util.stream.Stream;
+import org.junit.jupiter.params.provider.Arguments;
+
+final class TestData {
+
+    /** Class contains only static methods. */
+    private TestData() {}
+
+    static Stream<double[]> testValues() {
+        return Stream.of(
+            new double[] {0.0},
+            new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5},
+            new double[] {8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68},
+            new double[] {9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74, 7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73},
+            new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8},
+            new double[] {6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89},
+            new double[] {0, 0, 0.0},
+            new double[] {1, -7, 6},
+            new double[] {1, 7, -15, 3},
+            new double[] {2, 2, 2, 2},
+            new double[] {2.3},
+            new double[] {3.14, 2.718, 1.414},
+            new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9, 12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3},
+            new double[] {-0.0, +0.0},
+            new double[] {0.0, -0.0},
+            new double[] {0.0, +0.0},
+            new double[] {0.001, 0.0002, 0.00003, 10000.11, 0.000004},
+            new double[] {10E-50, 5E-100, 25E-200, 35.345E-50},
+            // Overflow of the sum
+            new double[] {Double.MAX_VALUE, Double.MAX_VALUE},
+            new double[] {-Double.MAX_VALUE, -Double.MAX_VALUE},
+            new double[] {Double.MAX_VALUE, 1},
+            new double[] {-Double.MAX_VALUE, 1, 1},
+            new double[] {-Double.MAX_VALUE, -1, 1},
+            new double[] {Double.MAX_VALUE, -1},
+            new double[] {Double.MAX_VALUE, -Double.MAX_VALUE},
+            new double[] {1, -Double.MAX_VALUE},
+            new double[] {1, 1, 1, -Double.MAX_VALUE},
+            new double[] {Double.MAX_VALUE, Double.MAX_VALUE / 2},
+            new double[] {Double.MAX_VALUE, Double.MAX_VALUE, -Double.MAX_VALUE},
+            new double[] {Double.MAX_VALUE, -Double.MAX_VALUE}
+        );
+    }
+
+    static Stream<Arguments> testValuesNonFinite() {
+        return Stream.of(
+            Arguments.of(new double[]{}, Double.NaN),
+            Arguments.of(new double[]{Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY}, Double.NaN),
+            Arguments.of(new double[]{Double.NaN, 34.56, 89.74}, Double.NaN),
+            Arguments.of(new double[]{34.56, Double.NaN, 89.74}, Double.NaN),
+            Arguments.of(new double[]{34.56, 89.74, Double.NaN}, Double.NaN),
+            Arguments.of(new double[]{Double.NaN, 3.14, Double.NaN, Double.NaN}, Double.NaN),
+            Arguments.of(new double[]{Double.NaN, Double.NaN, Double.NaN}, Double.NaN)
+        );
+    }
+
+    static Stream<Arguments> testCombine() {
+        return Stream.of(
+            Arguments.of(new double[] {}, new double[] {1}),
+            Arguments.of(new double[] {1}, new double[] {}),
+            Arguments.of(new double[] {}, new double[] {1, 7, -15, 3}),
+            Arguments.of(new double[] {0}, new double[] {0, 0.0}),
+            Arguments.of(new double[] {4, 8, -6, 3, 18}, new double[] {1, -7, 6}),
+            Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}, new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8}),
+            Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}, new double[] {7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73}),
+            Arguments.of(new double[] {6.0, -1.32, -5.78, 8.967, 13.32, -9.67, 0.14, 7.321, 11.456, -3.111}, new double[] {2, 2, 2, 2}),
+            Arguments.of(new double[] {2.3}, new double[] {-42, 10, -88, 5, -17}),
+            Arguments.of(new double[] {-20, 34.983, -12.745, 28.12, -8.34, 42, -4, 16}, new double[] {3.14, 2.718, 1.414}),
+            Arguments.of(new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9}, new double[] {12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3}),
+            Arguments.of(new double[] {-0.0}, new double[] {+0.0}),
+            Arguments.of(new double[] {0.0}, new double[] {-0.0}),
+            Arguments.of(new double[] {0.0}, new double[] {+0.0}),
+            Arguments.of(new double[] {10E-50, 5E-100}, new double[] {25E-200, 35.345E-50}),
+            Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {Double.MAX_VALUE}),
+            Arguments.of(new double[] {-Double.MAX_VALUE}, new double[] {-Double.MAX_VALUE}),
+            Arguments.of(new double[] {-Double.MAX_VALUE, 1}, new double[] {1}),
+            Arguments.of(new double[] {Double.MAX_VALUE, 3.1415E153}, new double[] {}),
+            Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {-Double.MAX_VALUE}),
+            Arguments.of(new double[] {1}, new double[] {-Double.MAX_VALUE}),
+            Arguments.of(new double[] {1, 1, 1}, new double[] {-Double.MAX_VALUE}),
+            Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {1, 1E300})
+        );
+    }
+
+    static Stream<Arguments> testCombineNonFinite() {

Review Comment:
   Again, the expected should not be here. It is always NaN so not helpful to any consumers of the data.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org