You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by "leerho (via GitHub)" <gi...@apache.org> on 2024/03/26 21:58:50 UTC

[PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

leerho opened a new pull request, #535:
URL: https://github.com/apache/datasketches-java/pull/535

   from the Classic DoublesSketch to KllDoublesSketch and the KllFloatsSketch.
   
   In order to do this I needed to add "getNormalizedRankError(boolean) to the root interface QuantilesAPI. All of the sketches (except REQ) had this anyway. In REQ I just throw an unsupported operation exception, because it makes no sense for REQ.
   
   This also means the KolmogorovSmirnov class and its test move to the "quantilescommon" package.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "jmalkin (via GitHub)" <gi...@apache.org>.
jmalkin commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1541764500


##########
src/main/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnov.java:
##########
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.apache.datasketches.quantilescommon.QuantilesAPI.UNSUPPORTED_MSG;
+
+import org.apache.datasketches.req.ReqSketch;
+
+/**
+ * Kolmogorov-Smirnov Test
+ * See <a href="https://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test">Kolmogorov–Smirnov Test</a>
+ */
+public final class KolmogorovSmirnov {
+
+  /**
+   * Computes the raw delta between two QuantilesDoublesAPI sketches for the <i>kolmogorovSmirnovTest(...)</i> method.
+   * @param sketch1 first Input QuantilesDoublesAPI
+   * @param sketch2 second Input QuantilesDoublesAPI
+   * @return the raw delta area between two QuantilesDoublesAPI sketches
+   */
+  public static double computeKSDelta(final QuantilesDoublesAPI sketch1, final QuantilesDoublesAPI sketch2) {
+    final DoublesSortedView p = sketch1.getSortedView();
+    final DoublesSortedView q = sketch2.getSortedView();
+
+    final double[] pSamplesArr = p.getQuantiles();
+    final double[] qSamplesArr = q.getQuantiles();
+    final long[] pCumWtsArr = p.getCumulativeWeights();
+    final long[] qCumWtsArr = q.getCumulativeWeights();
+    final int pSamplesArrLen = pSamplesArr.length;
+    final int qSamplesArrLen = qSamplesArr.length;
+
+    final double n1 = sketch1.getN();
+    final double n2 = sketch2.getN();
+
+    double deltaHeight = 0;
+    int i = 0;
+    int j = 0;
+
+    while ((i < pSamplesArrLen - 1) && (j < qSamplesArrLen - 1)) {
+      deltaHeight = Math.max(deltaHeight, Math.abs(pCumWtsArr[i] / n1 - qCumWtsArr[j] / n2));
+      if (pSamplesArr[i] < qSamplesArr[j]) {
+        i++;
+      } else if (qSamplesArr[j] < pSamplesArr[i]) {
+        j++;
+      } else {
+        i++;
+        j++;
+      }
+    }
+
+    deltaHeight = Math.max(deltaHeight, Math.abs(pCumWtsArr[i] / n1 - qCumWtsArr[j] / n2));
+    return deltaHeight;
+  }
+
+  /**
+   * Computes the raw delta between two QuantilesFloatsAPI sketches for the <i>kolmogorovSmirnovTest(...)</i> method.
+   * method.
+   * @param sketch1 first Input QuantilesFloatsAPI sketch
+   * @param sketch2 second Input QuantilesFloatsAPI sketch
+   * @return the raw delta area between two QuantilesFloatsAPI sketches
+   */
+  public static double computeKSDelta(final QuantilesFloatsAPI sketch1, final QuantilesFloatsAPI sketch2) {
+    final FloatsSortedView p = sketch1.getSortedView();
+    final FloatsSortedView q = sketch2.getSortedView();
+
+    final float[] pSamplesArr = p.getQuantiles();
+    final float[] qSamplesArr = q.getQuantiles();
+    final long[] pCumWtsArr = p.getCumulativeWeights();
+    final long[] qCumWtsArr = q.getCumulativeWeights();
+    final int pSamplesArrLen = pSamplesArr.length;
+    final int qSamplesArrLen = qSamplesArr.length;
+
+    final double n1 = sketch1.getN();
+    final double n2 = sketch2.getN();
+
+    double deltaHeight = 0;
+    int i = 0;
+    int j = 0;
+
+    while ((i < pSamplesArrLen - 1) && (j < qSamplesArrLen - 1)) {
+      deltaHeight = Math.max(deltaHeight, Math.abs(pCumWtsArr[i] / n1 - qCumWtsArr[j] / n2));
+      if (pSamplesArr[i] < qSamplesArr[j]) {
+        i++;
+      } else if (qSamplesArr[j] < pSamplesArr[i]) {
+        j++;
+      } else {
+        i++;
+        j++;
+      }
+    }
+
+    deltaHeight = Math.max(deltaHeight, Math.abs(pCumWtsArr[i] / n1 - qCumWtsArr[j] / n2));
+    return deltaHeight;
+  }
+
+  /**
+   * Computes the adjusted delta height threshold for the <i>kolmogorovSmirnovTest(...)</i> method.
+   * This adjusts the computed threshold by the error epsilons of the two given sketches.
+   * The two sketches must be of the same primitive type, double or float.
+   * This will not work with the REQ sketch.
+   * @param sketch1 first Input QuantilesAPI sketch
+   * @param sketch2 second Input QuantilesAPI sketch
+   * @param tgtPvalue Target p-value. Typically .001 to .1, e.g., .05.
+   * @return the adjusted threshold to be compared with the raw delta area.
+   */
+  public static double computeKSThreshold(final QuantilesAPI sketch1,
+                                          final QuantilesAPI sketch2,
+                                          final double tgtPvalue) {
+    final double r1 = sketch1.getNumRetained();
+    final double r2 = sketch2.getNumRetained();
+    final double alpha = tgtPvalue;
+    final double alphaFactor = Math.sqrt(-0.5 * Math.log(0.5 * alpha));
+    final double deltaAreaThreshold = alphaFactor * Math.sqrt((r1 + r2) / (r1 * r2));
+    final double eps1 = sketch1.getNormalizedRankError(false);
+    final double eps2 = sketch2.getNormalizedRankError(false);
+    return deltaAreaThreshold + eps1 + eps2;
+  }
+
+  /**
+   * Performs the Kolmogorov-Smirnov Test between two QuantilesAPI sketches.
+   * Note: if the given sketches have insufficient data or if the sketch sizes are too small,
+   * this will return false. The two sketches must be of the same primitive type, double or float.
+   * This will not work with the REQ sketch.
+   * @param sketch1 first Input QuantilesAPI
+   * @param sketch2 second Input QuantilesAPI
+   * @param tgtPvalue Target p-value. Typically .001 to .1, e.g., .05.
+   * @return Boolean indicating whether we can reject the null hypothesis (that the sketches
+   * reflect the same underlying distribution) using the provided tgtPValue.
+   */
+  public static boolean kolmogorovSmirnovTest(final QuantilesAPI sketch1,
+      final QuantilesAPI sketch2, final double tgtPvalue) {
+
+    final double delta = isDoubleType(sketch1, sketch2)

Review Comment:
   Kind of unfortunate we can't mix types :(



##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);

Review Comment:
   Same as above. `delta` is much too large to test anything meaningful here.



##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkIdenticalDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentClassicDoublesSketches() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentKllDoublesSketches() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentKllFloatsSketches() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);

Review Comment:
   same



##########
src/main/java/org/apache/datasketches/req/ReqSketch.java:
##########
@@ -235,6 +235,15 @@ public long getN() {
     return totalN;
   }
 
+  @Override

Review Comment:
   Didn't realize `@Override` could appear above a comment like this. Makes sense in that comments are stripped when compiling.



##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);

Review Comment:
   `delta` is not the correct test target here. For disjoint distributions it needs to be much, much smaller than that. Maybe not quite 0.0 but very close.



##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);

Review Comment:
   Again



##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkIdenticalDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentClassicDoublesSketches() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);

Review Comment:
   Here, `delta` is too restrictive in general. It may be fine with the given seed, but conceptually it's the wrong thing to test against. The allowable margin is larger than the single-sketch delta.
   
   Reading test code is a good way to learn how things are supposed to work, so we shouldn't have our tests suggest that this is a good target to check against. Or at the very least a comment noting that we're using it despite it being too tight a bound for this test in general.



##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkIdenticalDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentClassicDoublesSketches() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentKllDoublesSketches() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);

Review Comment:
   same



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1542059814


##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkIdenticalDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentClassicDoublesSketches() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);

Review Comment:
   After a long side conversation, resolved to change name delta -> eps and the tolerance to 2 * eps.



##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkIdenticalDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentClassicDoublesSketches() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentKllDoublesSketches() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);

Review Comment:
   After a long side conversation, resolved to change name delta -> eps and the tolerance to 2 * eps.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1542059194


##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);

Review Comment:
   After a long side conversation, resolved to change name delta -> eps and the tolerance to 2 * eps.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "jmalkin (via GitHub)" <gi...@apache.org>.
jmalkin commented on PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#issuecomment-2023447441

   I don't think it's that it makes no sense for REQ, but it won't be able to share code nicely. With classic quantiles and KLL we can share an API since error is constant over the sketch. For REQ we'd need a point-specific error estimate.
   
   Completely out of scope of this PR though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "jmalkin (via GitHub)" <gi...@apache.org>.
jmalkin commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1542009611


##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkIdenticalDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentClassicDoublesSketches() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);

Review Comment:
   Let's go with 2*delta as the tolerance here.  Each sketch can be off by +-normalizedRankError(k) at each point so if we're unlucky and the maximum error aligns in opposite directions it can be up to 2x that error.
   
   Obviously won't happen with a fixed seed, but we generally want the test to cover the design goal rather than the specific random seed (even if that's not 100% possible in practice).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1541866017


##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);

Review Comment:
   and you made Random deterministic.  I overlooked that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1541840889


##########
src/main/java/org/apache/datasketches/req/ReqSketch.java:
##########
@@ -235,6 +235,15 @@ public long getN() {
     return totalN;
   }
 
+  @Override

Review Comment:
   no comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "jmalkin (via GitHub)" <gi...@apache.org>.
jmalkin commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1541938769


##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);

Review Comment:
   Looking at this further, it's a pre-existing issue with our KS test. We can look at how to improve that, but it's not an issue with extending the test to KLL.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1542060190


##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkIdenticalDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentClassicDoublesSketches() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentKllDoublesSketches() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentKllFloatsSketches() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);

Review Comment:
   After a long side conversation, resolved to change name delta -> eps and the tolerance to 2 * eps.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "github-advanced-security[bot] (via GitHub)" <gi...@apache.org>.
github-advanced-security[bot] commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1540161421


##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,355 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkIdenticalDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentClassicDoublesSketches() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentKllDoublesSketches() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentKllFloatsSketches() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void mediumResolutionClassicDoubles() {
+   final int k = 2048;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + .05);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertFalse(reject);
+ }
+
+ @Test
+ public void mediumResolutionKllDoubles() {
+   final int k = 2048;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + .05);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertFalse(reject);
+ }
+
+ @Test
+ public void mediumResolutionKllFloats() {
+   final int k = 2048;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + .05F);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertFalse(reject);
+ }
+
+ @Test
+ public void highResolutionClassicDoubles() {
+   final int k = 8192;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + .05);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertTrue(reject);
+ }
+
+ @Test
+ public void highResolutionKllDoubles() {
+   final int k = 8192;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + .05);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertTrue(reject);
+ }
+
+ @Test
+ public void highResolutionKllFloats() {
+   final int k = 8192;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + .05F);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertTrue(reject);
+ }
+
+  @Test
+  public void printlnTest() {
+    println("PRINTING: "+this.getClass().getName());
+  }
+
+  /**
+   * @param s value to print
+   */
+  static void println(String s) {

Review Comment:
   ## Useless parameter
   
   The parameter 's' is never used.
   
   [Show more details](https://github.com/apache/datasketches-java/security/code-scanning/703)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1542057067


##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);

Review Comment:
   After a long side conversation, resolved to change name delta -> eps and the tolerance to 2 * eps.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1541395138


##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,355 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkIdenticalDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkIdenticalDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+   }
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s1));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s1), 0.0, 0.0);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentClassicDoublesSketches() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentKllDoublesSketches() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void checkSameDistributionDifferentKllFloatsSketches() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 0, delta);
+ }
+
+ @Test
+ public void mediumResolutionClassicDoubles() {
+   final int k = 2048;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + .05);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertFalse(reject);
+ }
+
+ @Test
+ public void mediumResolutionKllDoubles() {
+   final int k = 2048;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + .05);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertFalse(reject);
+ }
+
+ @Test
+ public void mediumResolutionKllFloats() {
+   final int k = 2048;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + .05F);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertFalse(reject);
+ }
+
+ @Test
+ public void highResolutionClassicDoubles() {
+   final int k = 8192;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + .05);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertTrue(reject);
+ }
+
+ @Test
+ public void highResolutionKllDoubles() {
+   final int k = 8192;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + .05);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertTrue(reject);
+ }
+
+ @Test
+ public void highResolutionKllFloats() {
+   final int k = 8192;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+   final double tgtPvalue = .05;
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + .05F);
+     s2.update(x);
+   }
+
+   double D = KolmogorovSmirnov.computeKSDelta(s1, s2);
+   double thresh = KolmogorovSmirnov.computeKSThreshold(s1, s2, tgtPvalue);
+   final boolean reject = KolmogorovSmirnov.kolmogorovSmirnovTest(s1, s2, tgtPvalue);
+   println("pVal = " + tgtPvalue + "\nK = " + k + "\nD = " + D + "\nTh = " + thresh
+       + "\nNull Hypoth Rejected = " + reject);
+   assertTrue(reject);
+ }
+
+  @Test
+  public void printlnTest() {
+    println("PRINTING: "+this.getClass().getName());
+  }
+
+  /**
+   * @param s value to print
+   */
+  static void println(String s) {

Review Comment:
   Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1541841762


##########
src/main/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnov.java:
##########
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.apache.datasketches.quantilescommon.QuantilesAPI.UNSUPPORTED_MSG;
+
+import org.apache.datasketches.req.ReqSketch;
+
+/**
+ * Kolmogorov-Smirnov Test
+ * See <a href="https://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test">Kolmogorov–Smirnov Test</a>
+ */
+public final class KolmogorovSmirnov {
+
+  /**
+   * Computes the raw delta between two QuantilesDoublesAPI sketches for the <i>kolmogorovSmirnovTest(...)</i> method.
+   * @param sketch1 first Input QuantilesDoublesAPI
+   * @param sketch2 second Input QuantilesDoublesAPI
+   * @return the raw delta area between two QuantilesDoublesAPI sketches
+   */
+  public static double computeKSDelta(final QuantilesDoublesAPI sketch1, final QuantilesDoublesAPI sketch2) {
+    final DoublesSortedView p = sketch1.getSortedView();
+    final DoublesSortedView q = sketch2.getSortedView();
+
+    final double[] pSamplesArr = p.getQuantiles();
+    final double[] qSamplesArr = q.getQuantiles();
+    final long[] pCumWtsArr = p.getCumulativeWeights();
+    final long[] qCumWtsArr = q.getCumulativeWeights();
+    final int pSamplesArrLen = pSamplesArr.length;
+    final int qSamplesArrLen = qSamplesArr.length;
+
+    final double n1 = sketch1.getN();
+    final double n2 = sketch2.getN();
+
+    double deltaHeight = 0;
+    int i = 0;
+    int j = 0;
+
+    while ((i < pSamplesArrLen - 1) && (j < qSamplesArrLen - 1)) {
+      deltaHeight = Math.max(deltaHeight, Math.abs(pCumWtsArr[i] / n1 - qCumWtsArr[j] / n2));
+      if (pSamplesArr[i] < qSamplesArr[j]) {
+        i++;
+      } else if (qSamplesArr[j] < pSamplesArr[i]) {
+        j++;
+      } else {
+        i++;
+        j++;
+      }
+    }
+
+    deltaHeight = Math.max(deltaHeight, Math.abs(pCumWtsArr[i] / n1 - qCumWtsArr[j] / n2));
+    return deltaHeight;
+  }
+
+  /**
+   * Computes the raw delta between two QuantilesFloatsAPI sketches for the <i>kolmogorovSmirnovTest(...)</i> method.
+   * method.
+   * @param sketch1 first Input QuantilesFloatsAPI sketch
+   * @param sketch2 second Input QuantilesFloatsAPI sketch
+   * @return the raw delta area between two QuantilesFloatsAPI sketches
+   */
+  public static double computeKSDelta(final QuantilesFloatsAPI sketch1, final QuantilesFloatsAPI sketch2) {
+    final FloatsSortedView p = sketch1.getSortedView();
+    final FloatsSortedView q = sketch2.getSortedView();
+
+    final float[] pSamplesArr = p.getQuantiles();
+    final float[] qSamplesArr = q.getQuantiles();
+    final long[] pCumWtsArr = p.getCumulativeWeights();
+    final long[] qCumWtsArr = q.getCumulativeWeights();
+    final int pSamplesArrLen = pSamplesArr.length;
+    final int qSamplesArrLen = qSamplesArr.length;
+
+    final double n1 = sketch1.getN();
+    final double n2 = sketch2.getN();
+
+    double deltaHeight = 0;
+    int i = 0;
+    int j = 0;
+
+    while ((i < pSamplesArrLen - 1) && (j < qSamplesArrLen - 1)) {
+      deltaHeight = Math.max(deltaHeight, Math.abs(pCumWtsArr[i] / n1 - qCumWtsArr[j] / n2));
+      if (pSamplesArr[i] < qSamplesArr[j]) {
+        i++;
+      } else if (qSamplesArr[j] < pSamplesArr[i]) {
+        j++;
+      } else {
+        i++;
+        j++;
+      }
+    }
+
+    deltaHeight = Math.max(deltaHeight, Math.abs(pCumWtsArr[i] / n1 - qCumWtsArr[j] / n2));
+    return deltaHeight;
+  }
+
+  /**
+   * Computes the adjusted delta height threshold for the <i>kolmogorovSmirnovTest(...)</i> method.
+   * This adjusts the computed threshold by the error epsilons of the two given sketches.
+   * The two sketches must be of the same primitive type, double or float.
+   * This will not work with the REQ sketch.
+   * @param sketch1 first Input QuantilesAPI sketch
+   * @param sketch2 second Input QuantilesAPI sketch
+   * @param tgtPvalue Target p-value. Typically .001 to .1, e.g., .05.
+   * @return the adjusted threshold to be compared with the raw delta area.
+   */
+  public static double computeKSThreshold(final QuantilesAPI sketch1,
+                                          final QuantilesAPI sketch2,
+                                          final double tgtPvalue) {
+    final double r1 = sketch1.getNumRetained();
+    final double r2 = sketch2.getNumRetained();
+    final double alpha = tgtPvalue;
+    final double alphaFactor = Math.sqrt(-0.5 * Math.log(0.5 * alpha));
+    final double deltaAreaThreshold = alphaFactor * Math.sqrt((r1 + r2) / (r1 * r2));
+    final double eps1 = sketch1.getNormalizedRankError(false);
+    final double eps2 = sketch2.getNormalizedRankError(false);
+    return deltaAreaThreshold + eps1 + eps2;
+  }
+
+  /**
+   * Performs the Kolmogorov-Smirnov Test between two QuantilesAPI sketches.
+   * Note: if the given sketches have insufficient data or if the sketch sizes are too small,
+   * this will return false. The two sketches must be of the same primitive type, double or float.
+   * This will not work with the REQ sketch.
+   * @param sketch1 first Input QuantilesAPI
+   * @param sketch2 second Input QuantilesAPI
+   * @param tgtPvalue Target p-value. Typically .001 to .1, e.g., .05.
+   * @return Boolean indicating whether we can reject the null hypothesis (that the sketches
+   * reflect the same underlying distribution) using the provided tgtPValue.
+   */
+  public static boolean kolmogorovSmirnovTest(final QuantilesAPI sketch1,
+      final QuantilesAPI sketch2, final double tgtPvalue) {
+
+    final double delta = isDoubleType(sketch1, sketch2)

Review Comment:
   In C++, yes, in Java, no.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on code in PR #535:
URL: https://github.com/apache/datasketches-java/pull/535#discussion_r1542059544


##########
src/test/java/org/apache/datasketches/quantilescommon/KolmogorovSmirnovTest.java:
##########
@@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.quantilescommon;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertFalse;
+import static org.testng.Assert.assertTrue;
+
+import java.util.Random;
+
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketch;
+import org.apache.datasketches.kll.KllSketch;
+import org.apache.datasketches.quantiles.DoublesSketch;
+import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.testng.annotations.Test;
+
+public class KolmogorovSmirnovTest {
+
+ @Test
+ public void checkDisjointDistributionClassicDoubles() {
+   final int k = 256;
+   final UpdateDoublesSketch s1 = DoublesSketch.builder().setK(k).build();
+   final UpdateDoublesSketch s2 = DoublesSketch.builder().setK(k).build();
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = DoublesSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllDoubles() {
+   final int k = 256;
+   final KllDoublesSketch s1 = KllDoublesSketch.newHeapInstance(k);
+   final KllDoublesSketch s2 = KllDoublesSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final double x = rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);
+ }
+
+ @Test
+ public void checkDisjointDistributionKllFloats() {
+   final int k = 256;
+   final KllFloatsSketch s1 = KllFloatsSketch.newHeapInstance(k);
+   final KllFloatsSketch s2 = KllFloatsSketch.newHeapInstance(k);
+
+   final Random rand = new Random(1);
+
+   final int n =  (3 * k) - 1;
+   for (int i = 0; i < n; ++i) {
+     final float x = (float)rand.nextGaussian();
+     s1.update(x + 500);
+     s2.update(x);
+   }
+   final double delta = KllSketch.getNormalizedRankError(k, false);
+   println("D = " + KolmogorovSmirnov.computeKSDelta(s1, s2));
+   assertEquals(KolmogorovSmirnov.computeKSDelta(s1, s2), 1.0, delta);

Review Comment:
   After a long side conversation, resolved to change name delta -> eps and the tolerance to 2 * eps.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] This extends the usability of the Kolmogorov-Smirnov test (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho merged PR #535:
URL: https://github.com/apache/datasketches-java/pull/535


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org