You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/08/24 21:46:50 UTC

[GitHub] [incubator-datasketches-java] davecromberge opened a new pull request #331: Correct minor java doc typos

davecromberge opened a new pull request #331:
URL: https://github.com/apache/incubator-datasketches-java/pull/331


   In order to better understand the inner workings of Theta sketch, I have made an attempt to study the calculation for approximation of lower and upper bounds.  This PR consists of a small number of minor corrections to the java docs.
   
   Concerning the module in question, the following comment caught my attention, as it provides a reference to an external writeup:
   ```
   // BTW, the suffixes "NStar", "NPrimeB", and "NPrimeF" correspond to variables in the formal
   // writeup of this scheme.
   ```
   
   Is it possible to recommend or supply a link to the writeup?  Perhaps I could include this in the PR amendments as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-java] davecromberge merged pull request #331: Correct minor java doc typos

Posted by GitBox <gi...@apache.org>.
davecromberge merged pull request #331:
URL: https://github.com/apache/incubator-datasketches-java/pull/331


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-java] leerho commented on a change in pull request #331: Correct minor java doc typos

Posted by GitBox <gi...@apache.org>.
leerho commented on a change in pull request #331:
URL: https://github.com/apache/incubator-datasketches-java/pull/331#discussion_r477541450



##########
File path: src/main/java/org/apache/datasketches/BinomialBoundsN.java
##########
@@ -225,9 +225,9 @@ else if (theta < ((numSamplesI) / 360.0)) {  // empirically-determined threshold
    * This must be an integer value of 1, 2 or 3.
    * @param noDataSeen this is normally false. However, in the case where you have zero samples
    * and a theta &lt; 1.0, this flag enables the distinction between a virgin case when no actual
-   * data has been seen and the case where the estimate may be zero but an upper error bound may
+   * data has been seen and the case where the estimate may be zero but a lower error bound may
    * still exist.

Review comment:
       Your change here is incorrect.  When a valid estimate is zero (say, due to an intersection), the lower bound will, by definition be zero, because it cannot be negative, and an upper bound may exist that is > zero (if theta < 1.0).   The sketch is essentially saying "There is a distribution of values that could represent the answer, but the best estimate from that distribution is zero." 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-java] leerho commented on a change in pull request #331: Correct minor java doc typos

Posted by GitBox <gi...@apache.org>.
leerho commented on a change in pull request #331:
URL: https://github.com/apache/incubator-datasketches-java/pull/331#discussion_r477550347



##########
File path: src/main/java/org/apache/datasketches/BinomialBoundsN.java
##########
@@ -225,9 +225,9 @@ else if (theta < ((numSamplesI) / 360.0)) {  // empirically-determined threshold
    * This must be an integer value of 1, 2 or 3.
    * @param noDataSeen this is normally false. However, in the case where you have zero samples
    * and a theta &lt; 1.0, this flag enables the distinction between a virgin case when no actual
-   * data has been seen and the case where the estimate may be zero but an upper error bound may
+   * data has been seen and the case where the estimate may be zero but a lower error bound may
    * still exist.

Review comment:
       David, thank you for finding this typo!  Make the above correction and you should be able to merge this PR yourself!
   
   As for your question about the reference to an external writeup:  
   
   The person who supported us with much of the math theory and brilliant code implementations for 7 years was Kevin Lang, who tragically and suddenly passed away last year.  I will have to do some digging to see if we even have that reference. :( 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-java] davecromberge commented on a change in pull request #331: Correct minor java doc typos

Posted by GitBox <gi...@apache.org>.
davecromberge commented on a change in pull request #331:
URL: https://github.com/apache/incubator-datasketches-java/pull/331#discussion_r478498938



##########
File path: src/main/java/org/apache/datasketches/BinomialBoundsN.java
##########
@@ -225,9 +225,9 @@ else if (theta < ((numSamplesI) / 360.0)) {  // empirically-determined threshold
    * This must be an integer value of 1, 2 or 3.
    * @param noDataSeen this is normally false. However, in the case where you have zero samples
    * and a theta &lt; 1.0, this flag enables the distinction between a virgin case when no actual
-   * data has been seen and the case where the estimate may be zero but an upper error bound may
+   * data has been seen and the case where the estimate may be zero but a lower error bound may
    * still exist.

Review comment:
       Thanks for clearing that up Lee, it makes sense for a sketch that estimates cardinalities to have those constraints on the error bounds - especially since the numbers refer to cardinalities.
   
   I'm very sorry to hear about the tragic loss of your gifted colleague... I will leave out the reference in that case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-java] leerho commented on a change in pull request #331: Correct minor java doc typos

Posted by GitBox <gi...@apache.org>.
leerho commented on a change in pull request #331:
URL: https://github.com/apache/incubator-datasketches-java/pull/331#discussion_r478521055



##########
File path: src/main/java/org/apache/datasketches/BinomialBoundsN.java
##########
@@ -225,9 +225,9 @@ else if (theta < ((numSamplesI) / 360.0)) {  // empirically-determined threshold
    * This must be an integer value of 1, 2 or 3.
    * @param noDataSeen this is normally false. However, in the case where you have zero samples
    * and a theta &lt; 1.0, this flag enables the distinction between a virgin case when no actual
-   * data has been seen and the case where the estimate may be zero but an upper error bound may
+   * data has been seen and the case where the estimate may be zero but a lower error bound may
    * still exist.

Review comment:
       I still don't see the correction.  Line 228 should read "... may be zero but an upper error bound ..."
   
   The javadoc description for the parameter _noDataSeen_ must be identical for both getLowerBound() and getUpperBound().




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org