You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by "leerho (via GitHub)" <gi...@apache.org> on 2023/02/27 20:05:55 UTC

[GitHub] [datasketches-java] leerho commented on a diff in pull request #430: HLL relative error

leerho commented on code in PR #430:
URL: https://github.com/apache/datasketches-java/pull/430#discussion_r1119247599


##########
src/main/java/org/apache/datasketches/hll/BaseHllSketch.java:
##########
@@ -118,8 +120,14 @@ public static final int getSerializationVersion(final Memory mem) {
    * <a href="{@docRoot}/resources/dictionary.html#numStdDev">Number of Standard Deviations</a>
    * @return the current (approximate) RelativeError
    */
-  public double getRelErr(final boolean upperBound, final boolean unioned,
+  public static double getRelErr(final boolean upperBound, final boolean unioned,
       final int lgConfigK, final int numStdDev) {
+    HllUtil.checkLgK(lgConfigK);
+    if (lgConfigK > 12) {
+      final double rseFactor = unioned ? HLL_NON_HIP_RSE_FACTOR : HLL_HIP_RSE_FACTOR;
+      final int configK = 1 << lgConfigK;
+      return (upperBound ? -1.0 : 1.0) * (numStdDev * rseFactor) / Math.sqrt(configK);

Review Comment:
   If the user calls this method, the returned relative error will be negative for the upper bound, which will be very confusing because Relative Error should never be a negative number! 
   
   In our internal method HllEstimators.hllUpperBound(...), the relative error is not really negative.  Think of it as a positive value subtracted from 1.0, only for the UB.  The resulting Upper Bound (and Lower Bound) values are always positive. 
   
   The fact that our internal tables return a negative value is an internal issue that the user should never see.



##########
src/main/java/org/apache/datasketches/hll/HllEstimators.java:
##########
@@ -50,39 +48,17 @@ static final double hllLowerBound(final AbstractHllArray absHllArr, final int nu
     final int configK = 1 << lgConfigK;
     final double numNonZeros =
         (absHllArr.getCurMin() == 0) ? configK - absHllArr.getNumAtCurMin() : configK;
-    final double estimate;
-    final double rseFactor;
+    final double estimate = absHllArr.getEstimate();
     final boolean oooFlag = absHllArr.isOutOfOrder();
-    if (oooFlag) {
-      estimate = absHllArr.getCompositeEstimate();
-      rseFactor = HLL_NON_HIP_RSE_FACTOR;
-    } else {
-      estimate = absHllArr.getHipAccum();
-      rseFactor = HLL_HIP_RSE_FACTOR;
-    }
-    final double relErr = (lgConfigK > 12)
-        ? (numStdDev * rseFactor) / Math.sqrt(configK)
-        : RelativeErrorTables.getRelErr(false, oooFlag, lgConfigK, numStdDev);
+    final double relErr = BaseHllSketch.getRelErr(false, oooFlag, lgConfigK, numStdDev);
     return Math.max(estimate / (1.0 + relErr), numNonZeros);
   }
 
   static final double hllUpperBound(final AbstractHllArray absHllArr, final int numStdDev) {
     final int lgConfigK = absHllArr.lgConfigK;
-    final int configK = 1 << lgConfigK;
-    final double estimate;
-    final double rseFactor;
+    final double estimate = absHllArr.getEstimate();
     final boolean oooFlag = absHllArr.isOutOfOrder();
-    if (oooFlag) {
-      estimate = absHllArr.getCompositeEstimate();
-      rseFactor = HLL_NON_HIP_RSE_FACTOR;
-    } else {
-      estimate = absHllArr.getHipAccum();
-      rseFactor = HLL_HIP_RSE_FACTOR;
-    }
-
-    final double relErr = (lgConfigK > 12)
-        ? ((-1.0) * (numStdDev * rseFactor)) / Math.sqrt(configK)
-        : RelativeErrorTables.getRelErr(true, oooFlag, lgConfigK, numStdDev);
+    final double relErr = BaseHllSketch.getRelErr(true, oooFlag, lgConfigK, numStdDev);
     return estimate / (1.0 + relErr);
   }

Review Comment:
   The hllUpperBound(...) method needs to be changed to accommodate the fact that the relErr obtained from the BaseHllSketch.getRelErr(...) must always be a positive value.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org