You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/06/15 10:27:29 UTC

[GitHub] [lucene] jimczi opened a new pull request #185: CombinedFieldQuery can fail with an exception when document is missing fields

jimczi opened a new pull request #185:
URL: https://github.com/apache/lucene/pull/185


   This change fixes a bug in `MultiNormsLeafSimScorer` that assumes that each field should have a norm for every term/document.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #185:
URL: https://github.com/apache/lucene/pull/185#discussion_r669839811



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiNormsLeafSimScorer.java
##########
@@ -62,6 +67,14 @@
           weightList.add(field.weight);
         }
       }
+
+      if (normsList.isEmpty() == false && normsList.size() != normFields.size()) {
+        throw new IllegalArgumentException(
+            getClass().getSimpleName()
+                + " requires norms to be consistent across fields: some fields cannot"
+                + " have norms enabled, while others have norms disabled");

Review comment:
       Doing more research, I don't think it's possible to detect if norms are disabled upfront: `LeafReader#getNormsValues` could be `null` if a field uses norms but happens to be missing from a segment. Using field infos would have the same problem.
   
   I opted to just document this requirement (as we do for the fact norms must be additive and encoded in a certain way). If the calling application has its own notion of schema, it can enforce the norms requirements upfront.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #185:
URL: https://github.com/apache/lucene/pull/185#discussion_r669839811



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiNormsLeafSimScorer.java
##########
@@ -62,6 +67,14 @@
           weightList.add(field.weight);
         }
       }
+
+      if (normsList.isEmpty() == false && normsList.size() != normFields.size()) {
+        throw new IllegalArgumentException(
+            getClass().getSimpleName()
+                + " requires norms to be consistent across fields: some fields cannot"
+                + " have norms enabled, while others have norms disabled");

Review comment:
       Doing more research, I don't think it's possible to detect if norms are disabled upfront: `LeafReader#getNormsValues` could be `null` a field uses norms but happens to be missing from a segment. I opted to just document this requirement (as we do for the fact norms must be additive and encoded in a certain way). If the calling application has its own notion of schema, it can enforce the norms requirements upfront.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #185:
URL: https://github.com/apache/lucene/pull/185#discussion_r669839811



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiNormsLeafSimScorer.java
##########
@@ -62,6 +67,14 @@
           weightList.add(field.weight);
         }
       }
+
+      if (normsList.isEmpty() == false && normsList.size() != normFields.size()) {
+        throw new IllegalArgumentException(
+            getClass().getSimpleName()
+                + " requires norms to be consistent across fields: some fields cannot"
+                + " have norms enabled, while others have norms disabled");

Review comment:
       Doing more research, I don't think it's possible to detect if norms are disabled upfront: `LeafReader#getNormsValues` could be `null` if a field uses norms but happens to be missing from a segment. I opted to just document this requirement (as we do for the fact norms must be additive and encoded in a certain way). If the calling application has its own notion of schema, it can enforce the norms requirements upfront.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on pull request #185:
URL: https://github.com/apache/lucene/pull/185#issuecomment-880102478


   @jpountz would you be able to take another look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jimczi commented on a change in pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jimczi commented on a change in pull request #185:
URL: https://github.com/apache/lucene/pull/185#discussion_r651810484



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiNormsLeafSimScorer.java
##########
@@ -80,9 +80,7 @@
   }
 
   private long getNormValue(int doc) throws IOException {
-    if (norms != null) {
-      boolean found = norms.advanceExact(doc);
-      assert found;
+    if (norms != null && norms.advanceExact(doc)) {

Review comment:
       We allow a StringField and a TextField to be used at the moment. We could disallow that upfront (all fields or none have norms enabled) because the old logic would fail with weird exceptions without this change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #185:
URL: https://github.com/apache/lucene/pull/185#discussion_r654110030



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java
##########
@@ -85,7 +86,7 @@ public void testToString() {
     assertEquals("CombinedFieldQuery((foo title^3.0)(bar baz))", builder.build().toString());
   }
 
-  public void testSameScore() throws IOException {
+  public void testSparseFieldSameScore() throws IOException {

Review comment:
       It's a little unclear to me what we aim to test with the 'same score' checks. The `testCopyField` test seems to best capture the core behavior -- we could have a version of `testCopyField` where one field is sparse?

##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiNormsLeafSimScorer.java
##########
@@ -80,9 +80,7 @@
   }
 
   private long getNormValue(int doc) throws IOException {
-    if (norms != null) {
-      boolean found = norms.advanceExact(doc);
-      assert found;
+    if (norms != null && norms.advanceExact(doc)) {

Review comment:
       +1 from me to disallow it upfront, the combined score won't be meaningful if there is a mix of norms being disabled/ enabled.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #185:
URL: https://github.com/apache/lucene/pull/185#discussion_r658097724



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiNormsLeafSimScorer.java
##########
@@ -62,6 +67,14 @@
           weightList.add(field.weight);
         }
       }
+
+      if (normsList.isEmpty() == false && normsList.size() != normFields.size()) {
+        throw new IllegalArgumentException(
+            getClass().getSimpleName()
+                + " requires norms to be consistent across fields: some fields cannot"
+                + " have norms enabled, while others have norms disabled");

Review comment:
       Properly throwing an error would require a more invasive change, since we rewrite the query very early if scores aren't needed: https://github.com/apache/lucene/blob/9942d59f0dc92db28d1dd79bf321cd2a05cdb6d1/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L262-L265 However I could refactor this part in `MultiNormsLeafSimScorer` just to be clearer/ more solid?
   
   This also made me notice -- I'm using the fact that `LeafReader#getNormsValues` is `null` to detect that norms were disabled. This seems accurate but I couldn't confirm it's really part of the method contract. I wonder if checking field infos is better.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jimczi commented on a change in pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jimczi commented on a change in pull request #185:
URL: https://github.com/apache/lucene/pull/185#discussion_r657762626



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiNormsLeafSimScorer.java
##########
@@ -62,6 +67,14 @@
           weightList.add(field.weight);
         }
       }
+
+      if (normsList.isEmpty() == false && normsList.size() != normFields.size()) {
+        throw new IllegalArgumentException(
+            getClass().getSimpleName()
+                + " requires norms to be consistent across fields: some fields cannot"
+                + " have norms enabled, while others have norms disabled");

Review comment:
       It's kind of an edge case but should we also throw the exception if scoring is not needed ? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani merged pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jtibshirani merged pull request #185:
URL: https://github.com/apache/lucene/pull/185


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on pull request #185:
URL: https://github.com/apache/lucene/pull/185#issuecomment-867185368


   @jimczi asked if I could finish the PR -- I pushed some changes:
   * Add a check that either all fields or no fields have norms enabled
   * Rework the testing strategy
   
   Let me know if it still looks okay or if you have other comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on pull request #185:
URL: https://github.com/apache/lucene/pull/185#issuecomment-881794847


   I tried out your idea. I opted to do it on a global basis in `createWeight` to ensure we always validate this, even when the query is rewritten.
   
   I'm going to merge, but feel free to give post merge feedback if anything seems off.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #185:
URL: https://github.com/apache/lucene/pull/185#discussion_r658097724



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiNormsLeafSimScorer.java
##########
@@ -62,6 +67,14 @@
           weightList.add(field.weight);
         }
       }
+
+      if (normsList.isEmpty() == false && normsList.size() != normFields.size()) {
+        throw new IllegalArgumentException(
+            getClass().getSimpleName()
+                + " requires norms to be consistent across fields: some fields cannot"
+                + " have norms enabled, while others have norms disabled");

Review comment:
       Properly throwing an error would require a more invasive change, since we rewrite the query very early if scores aren't needed: https://github.com/apache/lucene/blob/9942d59f0dc92db28d1dd79bf321cd2a05cdb6d1/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L262-L265. However I could refactor this part in `MultiNormsLeafSimScorer` just to be clearer/ more solid?
   
   This also made me notice -- I'm using the fact that `LeafReader#getNormsValues` is `null` to detect that norms were disabled. This seems accurate but I couldn't confirm it's really part of the method contract. I wonder if checking field infos is better.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #185: LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #185:
URL: https://github.com/apache/lucene/pull/185#discussion_r651759489



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiNormsLeafSimScorer.java
##########
@@ -80,9 +80,7 @@
   }
 
   private long getNormValue(int doc) throws IOException {
-    if (norms != null) {
-      boolean found = norms.advanceExact(doc);
-      assert found;
+    if (norms != null && norms.advanceExact(doc)) {

Review comment:
       I believe that the old logic was correct?
   
   The only case I can think of when it might fail is when some fields are indexed with norms and other fields without norms, but I'm not sure we should even try to support this case with `CombinedFieldQuery`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org