You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/20 08:27:55 UTC

[GitHub] [lucene] javanna opened a new pull request, #11793: Prvent PointValues from returning null for ghost fields

javanna opened a new pull request, #11793:
URL: https://github.com/apache/lucene/pull/11793

   getPointValues may currently return null for unknown fields or fields that don't index points. It can happen that a field no longer has points for any document in a segment after delete+merge, which causes field info to think that the field is there and has points, yet when calling getPointValues null is returned.
   
   With this change, we prevent getPointValues from returning null for ghost fields, it will instead return an empty instance of PointValues.
   
   Disclaimer: this is an attempt around addressing potential issues with ghost fields when retrieving point values. It has to be said that most places that call getPointValues today have null checks, and don't consult field info upfront, hence they would not be exposed to the issue that FieldExistsQuery triggers, as field info says there are points and then the retrieved point values is null. One quick-fix would be to add a null check to FieldExistsQuery, though it feels like we should try to prevent this kind of situation from happening again in the future? getPointValues can still return null, but it shouldn't for ghost fields?
   
   Relates to #11393


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #11793:
URL: https://github.com/apache/lucene/pull/11793#issuecomment-1254593758

   Test failures suggest CheckIndex needs to have its expectations adjusted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] javanna commented on pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
javanna commented on PR #11793:
URL: https://github.com/apache/lucene/pull/11793#issuecomment-1289134578

   @jpountz I addressed the check index issue, and pushed an update around the packed values that are null in the empty point values instance. That scenario is a bit tricky and ghost fields are not extensively tested, I wonder if more testing is a thing we'd like to do. I am not entirely sure if things may go wrong when the packed values are retrieved with `PointTree`. That should be the `intersect` path and I could not yet determine if there may be NPEs there too with ghost fields. I am though under impression that we assumed before that `PointValues` is not null, hence in the worst case we would still have an NPE but in a different place.
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] javanna closed pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
javanna closed pull request #11793: Prevent PointValues from returning null for ghost fields
URL: https://github.com/apache/lucene/pull/11793


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] javanna commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
javanna commented on code in PR #11793:
URL: https://github.com/apache/lucene/pull/11793#discussion_r976257408


##########
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##########
@@ -104,28 +104,28 @@ public abstract class NumericLeafComparator implements LeafFieldComparator {
 
     public NumericLeafComparator(LeafReaderContext context) throws IOException {
       this.docValues = getNumericDocValues(context, field);
-      this.pointValues = canSkipDocuments ? context.reader().getPointValues(field) : null;
-      if (pointValues != null) {
-        FieldInfo info = context.reader().getFieldInfos().fieldInfo(field);
-        if (info == null || info.getPointDimensionCount() == 0) {
-          throw new IllegalStateException(
-              "Field "
-                  + field
-                  + " doesn't index points according to FieldInfos yet returns non-null PointValues");
-        } else if (info.getPointDimensionCount() > 1) {
-          throw new IllegalArgumentException(
-              "Field " + field + " is indexed with multiple dimensions, sorting is not supported");
-        } else if (info.getPointNumBytes() != bytesCount) {
-          throw new IllegalArgumentException(
-              "Field "
-                  + field
-                  + " is indexed with "
-                  + info.getPointNumBytes()
-                  + " bytes per dimension, but "
-                  + NumericComparator.this
-                  + " expected "
-                  + bytesCount);
-        }
+      FieldInfo info = context.reader().getFieldInfos().fieldInfo(field);
+      if (info == null || info.getPointDimensionCount() == 0) {
+        throw new IllegalStateException(

Review Comment:
   yes this is wrong, I pushed a fix: the idea is that we don't need to check consistency between FieldInfo and getPointValues here, hence the first exception can go away. We check fieldinfo and we let that drive our decisions, and retrieve points based on what field info says.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #11793:
URL: https://github.com/apache/lucene/pull/11793#issuecomment-1310334991

   Apologies @javanna, but after looking more at your changes, I'm getting worried that this change is harder than I had anticipated. I was optimistically hoping that never returning null PointValues would automatically help save some NullPointerExceptions but looking at your change it looks like there are many places where we just need to replace these null checks with other checks if we want to keep the logic correct, so things wouldn't magically work on segments that have ghost fields unless we explicitly test ghost fields. So I would suggest closing this PR, apologies for the time you spent on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] javanna commented on pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
javanna commented on PR #11793:
URL: https://github.com/apache/lucene/pull/11793#issuecomment-1252119563

   I pushed an update. I have removed null checks from consumers that were already checking field info. In most cases we already check field info for compatibility, hence we can expect point values to never be null. Where we instead we only check if field info is null, we also need to add a check that point dimension count is > 0 to replace the point values null check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] javanna commented on pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
javanna commented on PR #11793:
URL: https://github.com/apache/lucene/pull/11793#issuecomment-1310370411

   Agreed @jpountz I think it was a good experiment to spend some time on, and I have also been thinking along the same lines, that the changes I ended up making were not solving the problem like we initially thought. No problem at all. I will open a follow-up PR to clarify the current behaviour and expectations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #11793:
URL: https://github.com/apache/lucene/pull/11793#discussion_r975074845


##########
lucene/core/src/java/org/apache/lucene/index/PointValues.java:
##########
@@ -467,4 +467,90 @@ public final long estimateDocCount(IntersectVisitor visitor) {
 
   /** Returns the total number of documents that have indexed at least one point. */
   public abstract int getDocCount();
+
+  public static PointValues empty(
+      int numDimensions, int numIndexDimensions, int bytesPerDimension) {
+    return new PointValues() {
+      @Override
+      public PointTree getPointTree() {
+        return new PointTree() {
+          @Override
+          public PointTree clone() {
+            throw new UnsupportedOperationException();
+          }
+
+          @Override
+          public boolean moveToChild() {
+            return false;
+          }
+
+          @Override
+          public boolean moveToSibling() {
+            return false;
+          }
+
+          @Override
+          public boolean moveToParent() {
+            return false;
+          }
+
+          @Override
+          public byte[] getMinPackedValue() {
+            return new byte[0];

Review Comment:
   The contract is to return `null` if `size` is 0.



##########
lucene/core/src/java/org/apache/lucene/index/PointValues.java:
##########
@@ -467,4 +467,90 @@ public final long estimateDocCount(IntersectVisitor visitor) {
 
   /** Returns the total number of documents that have indexed at least one point. */
   public abstract int getDocCount();
+
+  public static PointValues empty(
+      int numDimensions, int numIndexDimensions, int bytesPerDimension) {
+    return new PointValues() {
+      @Override
+      public PointTree getPointTree() {
+        return new PointTree() {
+          @Override
+          public PointTree clone() {
+            throw new UnsupportedOperationException();
+          }
+
+          @Override
+          public boolean moveToChild() {
+            return false;
+          }
+
+          @Override
+          public boolean moveToSibling() {
+            return false;
+          }
+
+          @Override
+          public boolean moveToParent() {
+            return false;
+          }
+
+          @Override
+          public byte[] getMinPackedValue() {
+            return new byte[0];
+          }
+
+          @Override
+          public byte[] getMaxPackedValue() {
+            return new byte[0];

Review Comment:
   and likewise here



##########
lucene/core/src/java/org/apache/lucene/index/PointValues.java:
##########
@@ -467,4 +467,90 @@ public final long estimateDocCount(IntersectVisitor visitor) {
 
   /** Returns the total number of documents that have indexed at least one point. */
   public abstract int getDocCount();
+
+  public static PointValues empty(

Review Comment:
   Add javadocs?



##########
lucene/core/src/java/org/apache/lucene/index/PointValues.java:
##########
@@ -467,4 +467,90 @@ public final long estimateDocCount(IntersectVisitor visitor) {
 
   /** Returns the total number of documents that have indexed at least one point. */
   public abstract int getDocCount();
+
+  public static PointValues empty(
+      int numDimensions, int numIndexDimensions, int bytesPerDimension) {
+    return new PointValues() {
+      @Override
+      public PointTree getPointTree() {
+        return new PointTree() {
+          @Override
+          public PointTree clone() {
+            throw new UnsupportedOperationException();

Review Comment:
   Should we `return this` here to make this implementation well behaved?



##########
lucene/core/src/java/org/apache/lucene/index/PointValues.java:
##########
@@ -467,4 +467,90 @@ public final long estimateDocCount(IntersectVisitor visitor) {
 
   /** Returns the total number of documents that have indexed at least one point. */
   public abstract int getDocCount();
+
+  public static PointValues empty(
+      int numDimensions, int numIndexDimensions, int bytesPerDimension) {
+    return new PointValues() {
+      @Override
+      public PointTree getPointTree() {
+        return new PointTree() {
+          @Override
+          public PointTree clone() {
+            throw new UnsupportedOperationException();
+          }
+
+          @Override
+          public boolean moveToChild() {
+            return false;
+          }
+
+          @Override
+          public boolean moveToSibling() {
+            return false;
+          }
+
+          @Override
+          public boolean moveToParent() {
+            return false;
+          }
+
+          @Override
+          public byte[] getMinPackedValue() {
+            return new byte[0];
+          }
+
+          @Override
+          public byte[] getMaxPackedValue() {
+            return new byte[0];
+          }
+
+          @Override
+          public long size() {
+            return 0;
+          }
+
+          @Override
+          public void visitDocIDs(IntersectVisitor visitor) {}
+
+          @Override
+          public void visitDocValues(IntersectVisitor visitor) {}
+        };
+      }
+
+      @Override
+      public byte[] getMinPackedValue() {
+        return null;
+      }
+
+      @Override
+      public byte[] getMaxPackedValue() {
+        return null;
+      }
+
+      @Override
+      public int getNumDimensions() {
+        return 0;

Review Comment:
   return `numDimensions` here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #11793:
URL: https://github.com/apache/lucene/pull/11793#discussion_r975300662


##########
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##########
@@ -104,28 +104,28 @@ public abstract class NumericLeafComparator implements LeafFieldComparator {
 
     public NumericLeafComparator(LeafReaderContext context) throws IOException {
       this.docValues = getNumericDocValues(context, field);
-      this.pointValues = canSkipDocuments ? context.reader().getPointValues(field) : null;
-      if (pointValues != null) {
-        FieldInfo info = context.reader().getFieldInfos().fieldInfo(field);
-        if (info == null || info.getPointDimensionCount() == 0) {
-          throw new IllegalStateException(
-              "Field "
-                  + field
-                  + " doesn't index points according to FieldInfos yet returns non-null PointValues");
-        } else if (info.getPointDimensionCount() > 1) {
-          throw new IllegalArgumentException(
-              "Field " + field + " is indexed with multiple dimensions, sorting is not supported");
-        } else if (info.getPointNumBytes() != bytesCount) {
-          throw new IllegalArgumentException(
-              "Field "
-                  + field
-                  + " is indexed with "
-                  + info.getPointNumBytes()
-                  + " bytes per dimension, but "
-                  + NumericComparator.this
-                  + " expected "
-                  + bytesCount);
-        }
+      FieldInfo info = context.reader().getFieldInfos().fieldInfo(field);
+      if (info == null || info.getPointDimensionCount() == 0) {
+        throw new IllegalStateException(

Review Comment:
   This doesn't look correct to me as it would throw an exception if the field doesn't exist, which should be supported. Previously it would only fail if the field would not exist AND the reader provides point values?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] javanna commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
javanna commented on code in PR #11793:
URL: https://github.com/apache/lucene/pull/11793#discussion_r975046441


##########
lucene/core/src/java/org/apache/lucene/index/PointValues.java:
##########
@@ -467,4 +467,91 @@ public final long estimateDocCount(IntersectVisitor visitor) {
 
   /** Returns the total number of documents that have indexed at least one point. */
   public abstract int getDocCount();
+
+  private static final PointTree EMPTY_POINT_TREE =
+      new PointTree() {
+        @Override
+        public PointTree clone() {
+          throw new UnsupportedOperationException();

Review Comment:
   there are not test failures caused by this but I have been wondering if this is acceptable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #11793:
URL: https://github.com/apache/lucene/pull/11793#discussion_r975049657


##########
lucene/core/src/java/org/apache/lucene/index/PointValues.java:
##########
@@ -467,4 +467,91 @@ public final long estimateDocCount(IntersectVisitor visitor) {
 
   /** Returns the total number of documents that have indexed at least one point. */
   public abstract int getDocCount();
+
+  private static final PointTree EMPTY_POINT_TREE =
+      new PointTree() {
+        @Override
+        public PointTree clone() {
+          throw new UnsupportedOperationException();
+        }
+
+        @Override
+        public boolean moveToChild() {
+          return false;
+        }
+
+        @Override
+        public boolean moveToSibling() {
+          return false;
+        }
+
+        @Override
+        public boolean moveToParent() {
+          return false;
+        }
+
+        @Override
+        public byte[] getMinPackedValue() {
+          return new byte[0];
+        }
+
+        @Override
+        public byte[] getMaxPackedValue() {
+          return new byte[0];
+        }
+
+        @Override
+        public long size() {
+          return 0;
+        }
+
+        @Override
+        public void visitDocIDs(IntersectVisitor visitor) {}
+
+        @Override
+        public void visitDocValues(IntersectVisitor visitor) {}
+      };
+
+  public static final PointValues EMPTY =

Review Comment:
   One issue that the other PR regarding ghost fields and terms highlighted is that we should tune this empty instance so that it returns correct values for field metadata such as numbers of dimensions. Otherwise any code that would be checking the number of dimensions via this empty instance could be retrieving wrong numbers. See e.g. `checkValidPointValues` in `PointRangeQuery`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] javanna commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
javanna commented on code in PR #11793:
URL: https://github.com/apache/lucene/pull/11793#discussion_r975048307


##########
lucene/core/src/java/org/apache/lucene/index/PointValues.java:
##########
@@ -467,4 +467,91 @@ public final long estimateDocCount(IntersectVisitor visitor) {
 
   /** Returns the total number of documents that have indexed at least one point. */
   public abstract int getDocCount();
+
+  private static final PointTree EMPTY_POINT_TREE =
+      new PointTree() {
+        @Override
+        public PointTree clone() {
+          throw new UnsupportedOperationException();
+        }
+
+        @Override
+        public boolean moveToChild() {
+          return false;
+        }
+
+        @Override
+        public boolean moveToSibling() {
+          return false;
+        }
+
+        @Override
+        public boolean moveToParent() {
+          return false;
+        }
+
+        @Override
+        public byte[] getMinPackedValue() {
+          return new byte[0];
+        }
+
+        @Override
+        public byte[] getMaxPackedValue() {
+          return new byte[0];
+        }
+
+        @Override
+        public long size() {
+          return 0;
+        }
+
+        @Override
+        public void visitDocIDs(IntersectVisitor visitor) {}
+
+        @Override
+        public void visitDocValues(IntersectVisitor visitor) {}
+      };
+
+  public static final PointValues EMPTY =
+      new PointValues() {
+        @Override
+        public PointTree getPointTree() {
+          return EMPTY_POINT_TREE;
+        }
+
+        @Override
+        public byte[] getMinPackedValue() {
+          return null;
+        }
+
+        @Override
+        public byte[] getMaxPackedValue() {
+          return null;

Review Comment:
   returning null here does not feel great, yet I did not find a better way that fits in the existing code. I tried the empty array but it required changes in how the return value is consumed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #11793:
URL: https://github.com/apache/lucene/pull/11793#issuecomment-1252371627

   I merged your other PR that adds a null check in FieldExistsQuery, we should now be able to remove this null check with this change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] javanna commented on pull request #11793: Prevent PointValues from returning null for ghost fields

Posted by GitBox <gi...@apache.org>.
javanna commented on PR #11793:
URL: https://github.com/apache/lucene/pull/11793#issuecomment-1308433716

   @jpountz would you have time to take another look at this please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org