You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/06/09 16:23:32 UTC

[GitHub] [incubator-datasketches-java] davecromberge commented on a change in pull request #320: Renaming

davecromberge commented on a change in pull request #320:
URL: https://github.com/apache/incubator-datasketches-java/pull/320#discussion_r436833977



##########
File path: src/main/java/org/apache/datasketches/tuple/AnotB.java
##########
@@ -83,31 +83,41 @@ public void notB(final Sketch<S> skB) {
     //skB is not empty
     final long thetaLongB = skB.getThetaLong();
     thetaLong_ = Math.min(thetaLong_, thetaLongB);
-    //Build hashtable and removes keys of skB >= theta
+
+    //Build hashtable and removes hashes of skB >= theta
     final int countB = skB.getRetainedEntries();
-    final long[] hashTableKeysB = convertToHashTable(skB.keys_, countB, thetaLong_);
+    CompactSketch<S> cskB = null;
+    QuickSelectSketch<S> qskB = null;
+    final long[] hashTableB;
+    if (skB instanceof CompactSketch) {
+      cskB = (CompactSketch<S>) skB;
+      hashTableB = convertToHashTable(cskB.getHashArr(), countB, thetaLong_);
+    } else {
+      qskB = (QuickSelectSketch<S>) skB;
+      hashTableB = convertToHashTable(qskB.getHashTable(), countB, thetaLong_);

Review comment:
       It appears that the quick select sketches' keys are already in hash table format, does this conversion merely prune out values greater than `thetaLong_`?

##########
File path: src/main/java/org/apache/datasketches/tuple/QuickSelectSketch.java
##########
@@ -388,96 +399,102 @@ boolean isInSamplingMode() {
   }
 
   void setThetaLong(final long theta) {
-    theta_ = theta;
+    thetaLong_ = theta;
   }
 
   void setEmpty(final boolean value) {
-    isEmpty_ = value;
+    empty_ = value;
   }
 
   SummaryFactory<S> getSummaryFactory() {
     return summaryFactory_;
   }
 
-  int findOrInsert(final long key) {
-    final int index = HashOperations.hashSearchOrInsert(keys_, lgCurrentCapacity_, key);
+  int findOrInsert(final long hash) {
+    final int index = HashOperations.hashSearchOrInsert(hashTable_, lgCurrentCapacity_, hash);
     if (index < 0) {
       count_++;
     }
     return index;
   }
 
-  S find(final long key) {
-    final int index = HashOperations.hashSearch(keys_, lgCurrentCapacity_, key);
-    if (index == -1) { return null; }
-    return summaries_[index];
-  }
+  //  S find(final long hash) {
+  //    final int index = HashOperations.hashSearch(hashTable_, lgCurrentCapacity_, hash);
+  //    if (index == -1) { return null; }
+  //    return summaryTable_[index];
+  //  }

Review comment:
       Did you intend to exclude this?

##########
File path: src/main/java/org/apache/datasketches/tuple/Sketch.java
##########
@@ -33,10 +33,10 @@
 
   protected static final byte PREAMBLE_LONGS = 1;
 
-  long[] keys_;
-  S[] summaries_;
-  long theta_;
-  boolean isEmpty_ = true;
+  //long[] hashArr_;
+  //S[] summaryArr_;

Review comment:
       You may have intended to remove these variables.

##########
File path: src/main/java/org/apache/datasketches/tuple/AnotB.java
##########
@@ -252,44 +273,45 @@ public void notB(final org.apache.datasketches.theta.Sketch skB) {
     final CompactSketch<S> cskA = (skA instanceof CompactSketch)
         ? (CompactSketch<S>)skA
         : ((QuickSelectSketch<S>)skA).compact();
-    final long[] keysA = cskA.keys_;
-    final S[] summariesA = cskA.summaries_;
+    final long[] hashArrA = cskA.getHashArr();
+    final S[] summaryArrA = cskA.getSummaryArr();
     final int countA = cskA.getRetainedEntries();
 
     if (skB.isEmpty()) {
-      return new CompactSketch<>(keysA, summariesA, thetaLongA, empty);
+      return new CompactSketch<>(hashArrA, summaryArrA, thetaLongA, empty);
     }
     //skB is not empty
     final long thetaLongB = skB.getThetaLong();
     final long thetaLong = Math.min(thetaLongA, thetaLongB);
-    //Build hashtable and removes keys of skB >= theta
     final int countB = skB.getRetainedEntries();
-    final long[] hashTableKeysB =
+
+    //Build/rebuild hashtable and removes hashes of skB >= thetaLong
+    final long[] hashTableB = //this works for all theta sketches

Review comment:
       `//this works for all theta sketches` - it would be nice to add a little more context on what `this` is, whilst reading the documentation I couldn't figure it out.

##########
File path: src/main/java/org/apache/datasketches/tuple/AnotB.java
##########
@@ -83,31 +83,41 @@ public void notB(final Sketch<S> skB) {
     //skB is not empty
     final long thetaLongB = skB.getThetaLong();
     thetaLong_ = Math.min(thetaLong_, thetaLongB);
-    //Build hashtable and removes keys of skB >= theta
+
+    //Build hashtable and removes hashes of skB >= theta
     final int countB = skB.getRetainedEntries();
-    final long[] hashTableKeysB = convertToHashTable(skB.keys_, countB, thetaLong_);
+    CompactSketch<S> cskB = null;
+    QuickSelectSketch<S> qskB = null;
+    final long[] hashTableB;
+    if (skB instanceof CompactSketch) {
+      cskB = (CompactSketch<S>) skB;
+      hashTableB = convertToHashTable(cskB.getHashArr(), countB, thetaLong_);
+    } else {
+      qskB = (QuickSelectSketch<S>) skB;
+      hashTableB = convertToHashTable(qskB.getHashTable(), countB, thetaLong_);

Review comment:
       Your optimization is a smart idea and could add up over many difference operations, thanks for the explanation 👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org