You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2021/07/19 19:49:12 UTC

[GitHub] [commons-statistics] aherbert commented on a change in pull request #27: [STATISTICS-31] Add survivalProbability function for cont. dist.

aherbert commented on a change in pull request #27:
URL: https://github.com/apache/commons-statistics/pull/27#discussion_r672529094



##########
File path: commons-statistics-distribution/src/main/java/org/apache/commons/statistics/distribution/CauchyDistribution.java
##########
@@ -47,6 +47,12 @@ public double cumulativeProbability(double x) {
         return 0.5 + (Math.atan((x - median) / scale) / Math.PI);
     }
 
+    /** {@inheritDoc} */
+    @Override
+    public double survivalProbability(double x) {
+        return cumulativeProbability(median - (x - median));

Review comment:
       Would this be better served with extracting the cumulativeProbability to a private method:
   ```java
       /** {@inheritDoc} */
       @Override
       public double cumulativeProbability(double x) {
           return cdf((x - median) / scale);
       }
   
       /** {@inheritDoc} */
       @Override
       public double survivalProbability(double x) {
           return cdf(-(x - median) / scale);
       }
   
       /**
        * Compute the CDF of the Cauchy distribution with location 0 and scale 1.
        *
        * @param x Point at which the CDF is evaluated.
        * @return CDF(x)
        */
       private static double cdf(double x) {
           return 0.5 + (Math.atan(x) / Math.PI);
       }
   ```
   
   Thus the deviation from the median is the same magnitude but with opposite sign.
   

##########
File path: commons-statistics-distribution/src/main/java/org/apache/commons/statistics/distribution/ContinuousDistribution.java
##########
@@ -94,6 +94,23 @@ default double logDensity(double x) {
      */
     double cumulativeProbability(double x);
 
+    /**
+     * For a random variable {@code X} whose values are distributed according
+     * to this distribution, this method returns {@code P(X > x)}.
+     * In other words, this method represents the complementary cumulative
+     * distribution function
+     *
+     * By default, this is defined as {@code 1 - cumulativeProbability(x)}, but

Review comment:
       `<p>` tag to start a new paragraph.

##########
File path: commons-statistics-distribution/src/main/java/org/apache/commons/statistics/distribution/ContinuousDistribution.java
##########
@@ -94,6 +94,23 @@ default double logDensity(double x) {
      */
     double cumulativeProbability(double x);
 
+    /**
+     * For a random variable {@code X} whose values are distributed according
+     * to this distribution, this method returns {@code P(X > x)}.
+     * In other words, this method represents the complementary cumulative
+     * distribution function
+     *
+     * By default, this is defined as {@code 1 - cumulativeProbability(x)}, but
+     * the specific implementation may be more accurate.
+     *
+     * @param x Point at which the survival function is evaluated

Review comment:
       End with a period

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/BetaDistributionTest.java
##########
@@ -149,6 +152,72 @@ void testCumulative() {
                             0.710208, 0.873964, 0.966656, 0.997272, 1.000000, 1.000000});
     }
 
+    @Test
+    void testSurvival() {
+        checkSurvival(1, 1, new double[]{
+            0.0, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.0});
+        checkSurvival(4, 1, new double[]{
+            0.0000, 0.0000, 0.0001, 0.0016, 0.0081, 0.0256, 0.0625, 0.1296, 0.2401,
+            0.4096, 0.6561, 1.0000, 1.0000});
+        checkSurvival(2, 4, new double[]{
+            0.00000, 0.00000, 0.08146, 0.26272, 0.47178, 0.66304, 0.81250, 0.91296,
+            0.96922, 0.99328, 0.99954, 1.00000, 1.00000});
+    }
+
+    @Test
+    void testCumulativeAndSurvivalComplement() {
+        checkCumulativeSurvivalComplement(0.1, 0.1);
+        checkCumulativeSurvivalComplement(0.1, 0.5);
+        checkCumulativeSurvivalComplement(0.1, 1.0);
+        checkCumulativeSurvivalComplement(0.1, 2.0);
+        checkCumulativeSurvivalComplement(0.1, 4.0);
+        checkCumulativeSurvivalComplement(0.5, 0.1);
+        checkCumulativeSurvivalComplement(0.5, 0.5);
+        checkCumulativeSurvivalComplement(0.5, 1.0);
+        checkCumulativeSurvivalComplement(0.5, 2.0);
+        checkCumulativeSurvivalComplement(0.5, 4.0);
+        checkCumulativeSurvivalComplement(1.0, 0.1);
+        checkCumulativeSurvivalComplement(1.0, 0.5);
+        checkCumulativeSurvivalComplement(1, 1);
+        checkCumulativeSurvivalComplement(1, 2);
+        checkCumulativeSurvivalComplement(1, 4);
+        checkCumulativeSurvivalComplement(2.0, 0.1);
+        checkCumulativeSurvivalComplement(2, 1);
+        checkCumulativeSurvivalComplement(2.0, 0.5);
+        checkCumulativeSurvivalComplement(2, 2);
+        checkCumulativeSurvivalComplement(2, 4);
+        checkCumulativeSurvivalComplement(4.0, 0.1);
+        checkCumulativeSurvivalComplement(4.0, 0.5);
+        checkCumulativeSurvivalComplement(4, 1);
+        checkCumulativeSurvivalComplement(4, 2);
+        checkCumulativeSurvivalComplement(4, 4);
+    }
+
+    /**Precision tests for verifying that CDF calculates accurately in cases where 1-cdf(x) is inaccurately 1*/
+    @Test
+    void testCumulativePrecision() {
+        // Calculated using WolframAlpha
+        checkCumulativePrecision(5.0, 5.0, 0.0001, 1.2595800539968654e-18);
+        checkCumulativePrecision(4.0, 5.0, 0.00001, 6.999776002800025e-19);
+        checkCumulativePrecision(5.0, 4.0, 0.0001, 5.598600119996539e-19);
+        checkCumulativePrecision(6.0, 2.0, 0.001, 6.994000000000028e-18);
+        checkCumulativePrecision(2.0, 6.0, 1e-9, 2.0999999930000014e-17);
+    }
+
+    /**
+     * Precision tests for verifying that survivalFunction calculates accurately in cases
+     * where 1-sf(x) is inaccurately 1
+     * */

Review comment:
       Change `* */` to `*/`

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/BetaDistributionTest.java
##########
@@ -149,6 +152,72 @@ void testCumulative() {
                             0.710208, 0.873964, 0.966656, 0.997272, 1.000000, 1.000000});
     }
 
+    @Test
+    void testSurvival() {
+        checkSurvival(1, 1, new double[]{
+            0.0, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.0});
+        checkSurvival(4, 1, new double[]{
+            0.0000, 0.0000, 0.0001, 0.0016, 0.0081, 0.0256, 0.0625, 0.1296, 0.2401,
+            0.4096, 0.6561, 1.0000, 1.0000});
+        checkSurvival(2, 4, new double[]{
+            0.00000, 0.00000, 0.08146, 0.26272, 0.47178, 0.66304, 0.81250, 0.91296,
+            0.96922, 0.99328, 0.99954, 1.00000, 1.00000});
+    }
+
+    @Test
+    void testCumulativeAndSurvivalComplement() {
+        checkCumulativeSurvivalComplement(0.1, 0.1);
+        checkCumulativeSurvivalComplement(0.1, 0.5);
+        checkCumulativeSurvivalComplement(0.1, 1.0);
+        checkCumulativeSurvivalComplement(0.1, 2.0);
+        checkCumulativeSurvivalComplement(0.1, 4.0);
+        checkCumulativeSurvivalComplement(0.5, 0.1);
+        checkCumulativeSurvivalComplement(0.5, 0.5);
+        checkCumulativeSurvivalComplement(0.5, 1.0);
+        checkCumulativeSurvivalComplement(0.5, 2.0);
+        checkCumulativeSurvivalComplement(0.5, 4.0);
+        checkCumulativeSurvivalComplement(1.0, 0.1);
+        checkCumulativeSurvivalComplement(1.0, 0.5);
+        checkCumulativeSurvivalComplement(1, 1);
+        checkCumulativeSurvivalComplement(1, 2);
+        checkCumulativeSurvivalComplement(1, 4);
+        checkCumulativeSurvivalComplement(2.0, 0.1);
+        checkCumulativeSurvivalComplement(2, 1);
+        checkCumulativeSurvivalComplement(2.0, 0.5);
+        checkCumulativeSurvivalComplement(2, 2);
+        checkCumulativeSurvivalComplement(2, 4);
+        checkCumulativeSurvivalComplement(4.0, 0.1);
+        checkCumulativeSurvivalComplement(4.0, 0.5);
+        checkCumulativeSurvivalComplement(4, 1);
+        checkCumulativeSurvivalComplement(4, 2);
+        checkCumulativeSurvivalComplement(4, 4);
+    }
+
+    /**Precision tests for verifying that CDF calculates accurately in cases where 1-cdf(x) is inaccurately 1*/
+    @Test
+    void testCumulativePrecision() {
+        // Calculated using WolframAlpha
+        checkCumulativePrecision(5.0, 5.0, 0.0001, 1.2595800539968654e-18);
+        checkCumulativePrecision(4.0, 5.0, 0.00001, 6.999776002800025e-19);
+        checkCumulativePrecision(5.0, 4.0, 0.0001, 5.598600119996539e-19);
+        checkCumulativePrecision(6.0, 2.0, 0.001, 6.994000000000028e-18);
+        checkCumulativePrecision(2.0, 6.0, 1e-9, 2.0999999930000014e-17);
+    }
+
+    /**
+     * Precision tests for verifying that survivalFunction calculates accurately in cases
+     * where 1-sf(x) is inaccurately 1

Review comment:
       Add a period at the end of the sentence

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/BetaDistributionTest.java
##########
@@ -149,6 +152,72 @@ void testCumulative() {
                             0.710208, 0.873964, 0.966656, 0.997272, 1.000000, 1.000000});
     }
 
+    @Test
+    void testSurvival() {
+        checkSurvival(1, 1, new double[]{
+            0.0, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.0});
+        checkSurvival(4, 1, new double[]{
+            0.0000, 0.0000, 0.0001, 0.0016, 0.0081, 0.0256, 0.0625, 0.1296, 0.2401,
+            0.4096, 0.6561, 1.0000, 1.0000});
+        checkSurvival(2, 4, new double[]{
+            0.00000, 0.00000, 0.08146, 0.26272, 0.47178, 0.66304, 0.81250, 0.91296,
+            0.96922, 0.99328, 0.99954, 1.00000, 1.00000});
+    }
+
+    @Test
+    void testCumulativeAndSurvivalComplement() {
+        checkCumulativeSurvivalComplement(0.1, 0.1);
+        checkCumulativeSurvivalComplement(0.1, 0.5);
+        checkCumulativeSurvivalComplement(0.1, 1.0);
+        checkCumulativeSurvivalComplement(0.1, 2.0);
+        checkCumulativeSurvivalComplement(0.1, 4.0);
+        checkCumulativeSurvivalComplement(0.5, 0.1);
+        checkCumulativeSurvivalComplement(0.5, 0.5);
+        checkCumulativeSurvivalComplement(0.5, 1.0);
+        checkCumulativeSurvivalComplement(0.5, 2.0);
+        checkCumulativeSurvivalComplement(0.5, 4.0);
+        checkCumulativeSurvivalComplement(1.0, 0.1);
+        checkCumulativeSurvivalComplement(1.0, 0.5);
+        checkCumulativeSurvivalComplement(1, 1);
+        checkCumulativeSurvivalComplement(1, 2);
+        checkCumulativeSurvivalComplement(1, 4);
+        checkCumulativeSurvivalComplement(2.0, 0.1);
+        checkCumulativeSurvivalComplement(2, 1);
+        checkCumulativeSurvivalComplement(2.0, 0.5);
+        checkCumulativeSurvivalComplement(2, 2);
+        checkCumulativeSurvivalComplement(2, 4);
+        checkCumulativeSurvivalComplement(4.0, 0.1);
+        checkCumulativeSurvivalComplement(4.0, 0.5);
+        checkCumulativeSurvivalComplement(4, 1);
+        checkCumulativeSurvivalComplement(4, 2);
+        checkCumulativeSurvivalComplement(4, 4);
+    }
+
+    /**Precision tests for verifying that CDF calculates accurately in cases where 1-cdf(x) is inaccurately 1*/

Review comment:
       Javadoc should have a space after `/**` and a period and space before `*/`

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/ContinuousDistributionAbstractTest.java
##########
@@ -46,6 +46,20 @@
  * makeInverseCumulativeTestPoints() -- arguments used to test inverse cdf
  * makeInverseCumulativeTestValues() -- expected inverse cdf values
  * <p>
+ * <p>

Review comment:
       Remove extra `<p>` tag

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/BetaDistributionTest.java
##########
@@ -160,6 +229,41 @@ private void checkCumulative(double alpha, double beta, double[] x, double[] cum
         }
     }
 
+    private void checkSurvival(double alpha, double beta, double[] cumes) {
+        final BetaDistribution d = new BetaDistribution(alpha, beta);
+        for (int i = 0; i < CUMULATIVE_TEST_POINTS.length; i++) {
+            Assertions.assertEquals(1 - cumes[i], d.survivalProbability(CUMULATIVE_TEST_POINTS[i]), 1e-8);
+        }
+    }
+
+    private void checkCumulativeSurvivalComplement(double alpha, double beta) {
+        final BetaDistribution d = new BetaDistribution(alpha, beta);
+        for (int i = 0; i < CUMULATIVE_TEST_POINTS.length; i++) {
+            double x = CUMULATIVE_TEST_POINTS[i];
+            Assertions.assertEquals(1, d.cumulativeProbability(x) + d.survivalProbability(x), 1e-8);
+        }
+    }
+
+    private void checkCumulativePrecision(double alpha, double beta, double value, double expected) {
+        final double tolerance = 1e-15;

Review comment:
       This tolerance is much larger than the expected which are below 5e-17. The tolerance should be lower.

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/ContinuousDistributionAbstractTest.java
##########
@@ -191,6 +252,54 @@ protected void verifyCumulativeProbabilities() {
         }
     }
 
+    protected void verifySurvivalProbability() {
+        for (int i = 0; i < cumulativeTestPoints.length; i++) {
+            TestUtils.assertEquals("Incorrect survival probability value returned for " +
+                            cumulativeTestPoints[i], 1 - cumulativeTestValues[i],
+                    distribution.survivalProbability(cumulativeTestPoints[i]),
+                    getTolerance());
+        }
+    }
+
+    protected void verifySurvivalAndCumulativeProbabilityComplement() {
+        for (final double x : cumulativeTestPoints) {
+            TestUtils.assertEquals("survival + cumulative probability were not close to 1.0" +
+                    x, 1.0,
+                distribution.survivalProbability(x) + distribution.cumulativeProbability(x),
+                getTolerance());
+        }
+    }
+
+    /**
+     * Verifies that survival is simply not 1-cdf by testing calculations that would underflow that calculation and
+     * result in an inaccurate answer

Review comment:
       Add a period to the end of the sentence.

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/ContinuousDistributionAbstractTest.java
##########
@@ -496,6 +627,10 @@ protected double getTolerance() {
         return tolerance;
     }
 
+    protected double getHighPrecisionTolerance() {

Review comment:
       Add a simple javadoc for this

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/ContinuousDistributionAbstractTest.java
##########
@@ -98,6 +127,34 @@
     /** Creates the default cumulative probability test expected values */
     public abstract double[] makeCumulativeTestValues();
 
+    /** Creates the default cumulative probability precision test input values */
+    public double[] makeCumulativePrecisionTestPoints() {
+        return new double[0];
+    }
+
+    /**
+     * Creates the default cumulative probability precision test expected values
+     * Note: The default threshold is 1e-15, any expected values with much higher precision may
+     *        not test the desired results without increasing precision threshold
+     */
+    public double[] makeCumulativePrecisionTestValues() {
+        return new double[0];
+    }
+
+    /** Creates the default survival probability precision test input values */
+    public double[] makeSurvivalPrecisionTestPoints() {
+        return new double[0];
+    }
+
+    /**
+     * Creates the default survival probability precision test expected values
+     * Note: The default threshold is 1e-22, any expected values with much higher precision may
+     *        not test the desired results without increasing precision threshold

Review comment:
       No need for the extra space here before `not`

##########
File path: commons-statistics-distribution/src/main/java/org/apache/commons/statistics/distribution/NormalDistribution.java
##########
@@ -94,6 +94,12 @@ public double cumulativeProbability(double x)  {
         return 0.5 * Erfc.value(-dev / (standardDeviation * SQRT2));
     }
 
+    /** {@inheritDoc} */
+    @Override
+    public double survivalProbability(double x) {
+        return cumulativeProbability(mean - (x - mean));

Review comment:
       I would avoid the double subtraction and duplicate the code instead:
   
   ```java
   public double survivalProbability(double x) {
       public double survivalProbability(double x) {
           final double dev = x - mean;
           if (Math.abs(dev) > 40 * standardDeviation) {
               return dev > 0 ? 0.0d : 1.0d;
           }
           return 0.5 * Erfc.value(dev / (standardDeviation * SQRT2));
       }
   }
   ```
   
   Duplicating logic with an opposite return for the edge of the support is similar to what is done in a few others such as the Beta, Exponential, or Levy.

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/ContinuousDistributionAbstractTest.java
##########
@@ -98,6 +127,34 @@
     /** Creates the default cumulative probability test expected values */
     public abstract double[] makeCumulativeTestValues();
 
+    /** Creates the default cumulative probability precision test input values */
+    public double[] makeCumulativePrecisionTestPoints() {
+        return new double[0];
+    }
+
+    /**
+     * Creates the default cumulative probability precision test expected values
+     * Note: The default threshold is 1e-15, any expected values with much higher precision may
+     *        not test the desired results without increasing precision threshold
+     */
+    public double[] makeCumulativePrecisionTestValues() {
+        return new double[0];
+    }
+
+    /** Creates the default survival probability precision test input values */
+    public double[] makeSurvivalPrecisionTestPoints() {
+        return new double[0];
+    }
+
+    /**
+     * Creates the default survival probability precision test expected values
+     * Note: The default threshold is 1e-22, any expected values with much higher precision may
+     *        not test the desired results without increasing precision threshold

Review comment:
       No need for the extra space here before `not`

##########
File path: commons-statistics-distribution/src/main/java/org/apache/commons/statistics/distribution/ContinuousDistribution.java
##########
@@ -94,6 +94,23 @@ default double logDensity(double x) {
      */
     double cumulativeProbability(double x);
 
+    /**
+     * For a random variable {@code X} whose values are distributed according
+     * to this distribution, this method returns {@code P(X > x)}.
+     * In other words, this method represents the complementary cumulative
+     * distribution function

Review comment:
       End with a period.

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/BetaDistributionTest.java
##########
@@ -160,6 +229,41 @@ private void checkCumulative(double alpha, double beta, double[] x, double[] cum
         }
     }
 
+    private void checkSurvival(double alpha, double beta, double[] cumes) {
+        final BetaDistribution d = new BetaDistribution(alpha, beta);
+        for (int i = 0; i < CUMULATIVE_TEST_POINTS.length; i++) {
+            Assertions.assertEquals(1 - cumes[i], d.survivalProbability(CUMULATIVE_TEST_POINTS[i]), 1e-8);
+        }
+    }
+
+    private void checkCumulativeSurvivalComplement(double alpha, double beta) {
+        final BetaDistribution d = new BetaDistribution(alpha, beta);
+        for (int i = 0; i < CUMULATIVE_TEST_POINTS.length; i++) {
+            double x = CUMULATIVE_TEST_POINTS[i];
+            Assertions.assertEquals(1, d.cumulativeProbability(x) + d.survivalProbability(x), 1e-8);
+        }
+    }
+
+    private void checkCumulativePrecision(double alpha, double beta, double value, double expected) {
+        final double tolerance = 1e-15;
+        BetaDistribution d = new BetaDistribution(alpha, beta);
+        TestUtils.assertEquals(
+                "cumulative probability not precise at " + value + " for a=" + alpha + " & b=" + beta,
+                d.cumulativeProbability(value),
+                expected,
+                tolerance);
+    }
+
+    private void checkSurvivalPrecision(double alpha, double beta, double value, double expected) {
+        final double tolerance = 1e-15;

Review comment:
       This tolerance is much larger than the expected which are below 5e-17. The tolerance should be lower.

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/ContinuousDistributionAbstractTest.java
##########
@@ -46,6 +46,20 @@
  * makeInverseCumulativeTestPoints() -- arguments used to test inverse cdf
  * makeInverseCumulativeTestValues() -- expected inverse cdf values
  * <p>
+ * <p>
+ * If the continuous distribution provides higher precision implementations of cumulativeProbability
+ * and/or survivalProbability, the following methods should be implemented to provide testing.
+ * To use these tests, calculate the cumulativeProbability and survivalProbability such that their naive
+ * complement is exceptionally close to `1` and consequently could lose precision due to floating point
+ * arithmetic.
+ *
+ * NOTE: The default high-precision threshold is 1e-15.
+ *
+ * makeCumulativePrecisionTestPoints() -- high precision test inputs

Review comment:
       Put this block in `<pre>` tags otherwise it will be rendered as a single sentence.

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/ContinuousDistributionAbstractTest.java
##########
@@ -46,6 +46,20 @@
  * makeInverseCumulativeTestPoints() -- arguments used to test inverse cdf
  * makeInverseCumulativeTestValues() -- expected inverse cdf values
  * <p>
+ * <p>
+ * If the continuous distribution provides higher precision implementations of cumulativeProbability
+ * and/or survivalProbability, the following methods should be implemented to provide testing.
+ * To use these tests, calculate the cumulativeProbability and survivalProbability such that their naive
+ * complement is exceptionally close to `1` and consequently could lose precision due to floating point
+ * arithmetic.
+ *
+ * NOTE: The default high-precision threshold is 1e-15.

Review comment:
       Add a `<p>` tag. The default tolerance should be changed to 1e-22.
   

##########
File path: commons-statistics-distribution/src/test/java/org/apache/commons/statistics/distribution/ContinuousDistributionAbstractTest.java
##########
@@ -98,6 +127,34 @@
     /** Creates the default cumulative probability test expected values */
     public abstract double[] makeCumulativeTestValues();
 
+    /** Creates the default cumulative probability precision test input values */
+    public double[] makeCumulativePrecisionTestPoints() {
+        return new double[0];
+    }
+
+    /**
+     * Creates the default cumulative probability precision test expected values
+     * Note: The default threshold is 1e-15, any expected values with much higher precision may

Review comment:
       Update tolerance and add a period at the end of the sentence.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org