You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/06/09 16:13:00 UTC

[GitHub] [incubator-datasketches-java] leerho commented on a change in pull request #320: Renaming

leerho commented on a change in pull request #320:
URL: https://github.com/apache/incubator-datasketches-java/pull/320#discussion_r436877733



##########
File path: src/main/java/org/apache/datasketches/tuple/Sketch.java
##########
@@ -33,10 +33,10 @@
 
   protected static final byte PREAMBLE_LONGS = 1;
 
-  long[] keys_;
-  S[] summaries_;
-  long theta_;
-  boolean isEmpty_ = true;
+  //long[] hashArr_;
+  //S[] summaryArr_;

Review comment:
       Good catch. Thanks!

##########
File path: src/main/java/org/apache/datasketches/tuple/AnotB.java
##########
@@ -83,31 +83,41 @@ public void notB(final Sketch<S> skB) {
     //skB is not empty
     final long thetaLongB = skB.getThetaLong();
     thetaLong_ = Math.min(thetaLong_, thetaLongB);
-    //Build hashtable and removes keys of skB >= theta
+
+    //Build hashtable and removes hashes of skB >= theta
     final int countB = skB.getRetainedEntries();
-    final long[] hashTableKeysB = convertToHashTable(skB.keys_, countB, thetaLong_);
+    CompactSketch<S> cskB = null;
+    QuickSelectSketch<S> qskB = null;
+    final long[] hashTableB;
+    if (skB instanceof CompactSketch) {
+      cskB = (CompactSketch<S>) skB;
+      hashTableB = convertToHashTable(cskB.getHashArr(), countB, thetaLong_);
+    } else {
+      qskB = (QuickSelectSketch<S>) skB;
+      hashTableB = convertToHashTable(qskB.getHashTable(), countB, thetaLong_);

Review comment:
       Yes. All Set operations must apply the "Theta Rule" whereby the resulting theta of the operation is the Min(skA, skB).  This rule is applied on line 85, where thetaLong_ is the current theta of the internal data retained by the AnotB operator for stateful operations.  Because the new thetaLong_ is potentially lower than the thetaLong of skB, we must filter out all hash values >= this new theta.  
   
   Even though skB is in HashTable form, Open Address hash tables do not allow deletions, which means a new hash table may need to be reconstructed from the valid remaining hashes.  Whether skB is Compact or QS, we want to construct an accurate local hashTableB that we use for searching for matches.  
   
   On closer look, I did decide that a simple optimization could be added such that if the new thetaLong_ >= thetaLongB, we can just grab a reference to the skB hashTable. Since it is only a local copy and not modified, we don't need to copy it. :)

##########
File path: src/main/java/org/apache/datasketches/tuple/AnotB.java
##########
@@ -252,44 +273,45 @@ public void notB(final org.apache.datasketches.theta.Sketch skB) {
     final CompactSketch<S> cskA = (skA instanceof CompactSketch)
         ? (CompactSketch<S>)skA
         : ((QuickSelectSketch<S>)skA).compact();
-    final long[] keysA = cskA.keys_;
-    final S[] summariesA = cskA.summaries_;
+    final long[] hashArrA = cskA.getHashArr();
+    final S[] summaryArrA = cskA.getSummaryArr();
     final int countA = cskA.getRetainedEntries();
 
     if (skB.isEmpty()) {
-      return new CompactSketch<>(keysA, summariesA, thetaLongA, empty);
+      return new CompactSketch<>(hashArrA, summaryArrA, thetaLongA, empty);
     }
     //skB is not empty
     final long thetaLongB = skB.getThetaLong();
     final long thetaLong = Math.min(thetaLongA, thetaLongB);
-    //Build hashtable and removes keys of skB >= theta
     final int countB = skB.getRetainedEntries();
-    final long[] hashTableKeysB =
+
+    //Build/rebuild hashtable and removes hashes of skB >= thetaLong
+    final long[] hashTableB = //this works for all theta sketches

Review comment:
       "this" refers to the following convert operation. Changed to make it clearer.  Because there are so many different flavors of Theta sketches, we need a method that works for all of them.

##########
File path: src/main/java/org/apache/datasketches/tuple/QuickSelectSketch.java
##########
@@ -388,96 +399,102 @@ boolean isInSamplingMode() {
   }
 
   void setThetaLong(final long theta) {
-    theta_ = theta;
+    thetaLong_ = theta;
   }
 
   void setEmpty(final boolean value) {
-    isEmpty_ = value;
+    empty_ = value;
   }
 
   SummaryFactory<S> getSummaryFactory() {
     return summaryFactory_;
   }
 
-  int findOrInsert(final long key) {
-    final int index = HashOperations.hashSearchOrInsert(keys_, lgCurrentCapacity_, key);
+  int findOrInsert(final long hash) {
+    final int index = HashOperations.hashSearchOrInsert(hashTable_, lgCurrentCapacity_, hash);
     if (index < 0) {
       count_++;
     }
     return index;
   }
 
-  S find(final long key) {
-    final int index = HashOperations.hashSearch(keys_, lgCurrentCapacity_, key);
-    if (index == -1) { return null; }
-    return summaries_[index];
-  }
+  //  S find(final long hash) {
+  //    final int index = HashOperations.hashSearch(hashTable_, lgCurrentCapacity_, hash);
+  //    if (index == -1) { return null; }
+  //    return summaryTable_[index];
+  //  }

Review comment:
       I did a search and it is not used anywhere.  So yes I will remove this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org