You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2021/10/20 12:49:13 UTC

[GitHub] [datasketches-java] davecromberge commented on a change in pull request #369: Fix Mikhail's bug

davecromberge commented on a change in pull request #369:
URL: https://github.com/apache/datasketches-java/pull/369#discussion_r732706242



##########
File path: src/main/java/org/apache/datasketches/theta/AnotBimpl.java
##########
@@ -119,17 +121,22 @@ public CompactSketch aNotB(final Sketch skA, final Sketch skB, final boolean dst
     }
     //Both skA & skB are not null
 
+    final long minThetaLong = Math.min(skA.getThetaLong(), skB.getThetaLong());
+
     if (skA.isEmpty()) { return skA.compact(dstOrdered, dstMem); }
+    //A is not Empty
     checkSeedHashes(skA.getSeedHash(), seedHash_);
 
-    if (skB.isEmpty()) { return skA.compact(dstOrdered, dstMem); }
+    if (skB.isEmpty() && skB.getRetainedEntries() == 0) {
+      return skA.compact(dstOrdered, dstMem);
+   }

Review comment:
       My understanding is that a sketch with no retained entries might be the result of an operation such as intersection between two disjoint sets where either set might or might not be non empty.
   Does the `isEmpty` check apply only to a sketch that has no entries, and if so, why would it also be necessary to check the retained entries?  
   In the context of the issue report, it appears that this this additional check is necessary because the skB operand is the result of a disjoint intersection (where there are no retained entries).  

##########
File path: src/main/java/org/apache/datasketches/tuple/AnotB.java
##########
@@ -116,14 +117,23 @@ public void setA(final Sketch<S> skA) {
       return;
     }
     //skA is not empty
-    empty_ = false;
-    thetaLong_ = skA.getThetaLong();
 
     //process A
+    empty_ = false;
+    thetaLong_ = skA.getThetaLong();
     final DataArrays<S> da = getDataArraysA(skA);
+
     hashArr_ = da.hashArr;
-    summaryArr_ = da.summaryArr;
+    hashArr_ = (hashArr_ == null) ? new long[0] : hashArr_;
     curCount_ = hashArr_.length;
+
+    summaryArr_ = da.summaryArr;
+    if (summaryArr_ == null) {
+      final SummaryFactory<S> sumFact = ((QuickSelectSketch<S>)skA).getSummaryFactory();
+      final S summary = sumFact.newSummary();
+      final Class<S> summaryType = (Class<S>)summary.getClass();
+      summaryArr_ = (S[]) Array.newInstance(summaryType, 0);
+    }

Review comment:
       What is your opinion on defaulting the summary array on `DataArrays<S>` within the getDataArraysA method?

##########
File path: src/main/java/org/apache/datasketches/tuple/Intersection.java
##########
@@ -107,14 +107,19 @@ public void intersect(final Sketch<S> tupleSketch) {
     if (tupleSketch == null) { throw new SketchesArgumentException("Sketch must not be null"); }
     final boolean firstCall = firstCall_;
     firstCall_ = false;
+    final boolean emptyIn = tupleSketch.isEmpty();
+    if (empty_ || emptyIn) { //empty rule
+      //Because of the definition of null above and the Empty Rule (which is OR), empty_ must be true.
+      //Whatever the current internal state, we make our local empty.
+      resetToEmpty();
+      return;

Review comment:
       Does the "Empty Rule" specifically apply to stateful set intersections, or is it defined formally in the Theta Sketch Framework?

##########
File path: src/test/java/org/apache/datasketches/tuple/aninteger/MikhailsBugTuple.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.tuple.aninteger;
+
+import org.apache.datasketches.tuple.AnotB;
+import org.apache.datasketches.tuple.CompactSketch;
+import org.apache.datasketches.tuple.Intersection;
+import org.testng.annotations.Test;
+
+/**
+ * Issue #368, from Mikhail Lavrinovich 12 OCT 2021
+ * The failure was AnotB(estimating {<1.0,1,F}, Intersect(estimating{<1.0,1,F}, newDegenerative{<1.0,0,T},
+ * Which should be equal to AnotB(estimating{<1.0,1,F}, new{1.0,0,T} = estimating{<1.0, 1, F}. The AnotB
+ * threw a null pointer exception because it was not properly handling sketches with zero entries.
+ */

Review comment:
       I've seen the term degenerate applied in the resource dictionary to sketches that are not in estimation mode.  If this is the case, why are the sketches in the example not all degenerative?

##########
File path: src/main/java/org/apache/datasketches/tuple/AnotB.java
##########
@@ -143,18 +153,28 @@ public void setA(final Sketch<S> skA) {
    *
    * @param skB The incoming Tuple sketch for the second (or following) argument <i>B</i>.
    */
+  @SuppressWarnings("unchecked")
   public void notB(final Sketch<S> skB) {
-    if (empty_ || skB == null || skB.isEmpty() || hashArr_ == null) { return; }
+    if (empty_ || skB == null || skB.isEmpty()) { return; }

Review comment:
       Why are the retained entries for skB not checked here? (it is checked in the stateless aNotB)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org