You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/06/11 23:09:47 UTC

[GitHub] [incubator-datasketches-java] AlexanderSaydakov commented on a change in pull request #321: Improvements in Theta Sketch to (hopefully) eliminate the

AlexanderSaydakov commented on a change in pull request #321:
URL: https://github.com/apache/incubator-datasketches-java/pull/321#discussion_r439118597



##########
File path: src/test/java/org/apache/datasketches/theta/EmptyTest.java
##########
@@ -111,45 +105,6 @@ public void checkPsampling() {
     assertEquals(sk1.compact().toByteArray().length, 8);
   }
 
-  //These 3 tests reproduce a failure mode where an "old" empty sketch of 8 bytes without

Review comment:
       why these tests are removed?

##########
File path: src/main/java/org/apache/datasketches/theta/Sketch.java
##########
@@ -677,15 +677,14 @@ private static final Sketch heapifyFromMemory(final Memory srcMem, final long se
           throw new SketchesArgumentException(
               "Corrupted: COMPACT family sketch image must have Read-Only flag set");
         }
-        final boolean empty = PreambleUtil.isEmpty(srcMem); //also checks memCap < 16
-        if (empty) { //empty flag or < 16 bytes
+        final boolean empty = PreambleUtil.isEmpty(srcMem);
+        if (empty) { //TODO

Review comment:
       not clear what to do

##########
File path: src/main/java/org/apache/datasketches/theta/UnionImpl.java
##########
@@ -431,6 +426,10 @@ private void processVer3(final Memory skMem) {
   //has seedHash and p, could have 0 entries & theta,
   // can only be compact, ordered, size >= 8
   private void processVer2(final Memory skMem) {
+    final int famId = extractFamilyID(skMem);
+    if (famId != 3) {

Review comment:
       should this refer to a constant in Family?

##########
File path: src/main/java/org/apache/datasketches/theta/PreambleUtil.java
##########
@@ -462,6 +468,20 @@ static boolean isEmpty(final Memory mem) {
     return emptyFlag || emptyCap;
   }
 
+  static boolean isSingleItem(final Memory mem) {
+    // Flags byte must be LittleEndian, ReadOnly, Not Empty, Compact, Ordered = 11010 = 0x1A.
+    // Flags mask will be 0x1F.
+    // SingleItem flag may not be set due to a historical bug, so we can't depend on it for now.
+    // However, if the above flags are correct, preLongs == 1, SerVer >= 3, FamilyID == 3,
+    // and the hash seed matches (not done here), it is virtually guaranteed that we have a
+    // SingleItem Sketch.
+    final boolean preLongs = extractPreLongs(mem) == 1;
+    final boolean serVer = extractSerVer(mem) >= 3;
+    final boolean famId = extractFamilyID(mem) == 3; //compact

Review comment:
       should this refer to a constant defined in Family?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org