You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/11/16 04:22:12 UTC

[GitHub] [lucene] zacharymorn opened a new pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

zacharymorn opened a new pull request #444:
URL: https://github.com/apache/lucene/pull/444


   # Description
   
   Updated field-weight used in CombinedFieldQuery scoring calculation
   
   # Tests
   
   1. Added a new test from https://github.com/apache/lucene/pull/418
   2. Run `./gradlew clean; ./gradlew check -Pvalidation.git.failOnModified=false`
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zacharymorn commented on pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on pull request #444:
URL: https://github.com/apache/lucene/pull/444#issuecomment-997621234


   > Will backport this to 8.11 and 8x.
   
   Hi @jimczi @jtibshirani @mikemccand, just to confirm as I saw there was a thread on the proper handling of `branch_8x`, this change should be backported to branches  `branch_8_11` (version `8.11.2`), `branch_9_0` (version `9.0.1`) & `branch_9x` (version `9.1.0`), but not to branch `branch_8x` right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zacharymorn commented on a change in pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on a change in pull request #444:
URL: https://github.com/apache/lucene/pull/444#discussion_r750880606



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java
##########
@@ -165,6 +169,117 @@ public void testSameScore() throws IOException {
     dir.close();
   }
 
+  public void testSameScoreAndCollectionBetweenCompleteAndTopScores() throws IOException {

Review comment:
       This test was actually developed in another related PR https://github.com/apache/lucene/pull/418, and it would generate duplicated field-weight pairs in `fields` object to trigger the condition. As the existing tests don't currently trigger that condition, I thought this would be a good test to put in first for both PRs. 
   
   I guess if we were to have a focused test for this PR, it would still look something very similar to this test (and `testCopyField`), maybe something like the following, and it just ensures the test run through without assertion exception. The delta between the tests are just I removed some doc content randomizations and result comparison between TOP_SCORE and COMPLETE collection.
   
   ```
    public void testShouldRunWithoutAssertionException() throws IOException {
       int numDocs =
           randomBoolean()
               ? atLeast(1000)
               : atLeast(128 * 8 * 8 * 3); // make sure some terms have skip data
       int numMatchDoc = randomIntBetween(200, 500);
       int numHits = atMost(100);
       int boost1 = Math.max(1, random().nextInt(5));
       int boost2 = Math.max(1, random().nextInt(5));
   
       Directory dir = newDirectory();
       Similarity similarity = randomCompatibleSimilarity();
   
       IndexWriterConfig iwc = new IndexWriterConfig();
       iwc.setSimilarity(similarity);
       RandomIndexWriter w = new RandomIndexWriter(random(), dir, iwc);
   
       // adding potentially matching doc
       for (int i = 0; i < numMatchDoc; i++) {
         Document doc = new Document();
   
         int freqA = random().nextInt(20) + 1;
         if (randomBoolean()) {
           for (int j = 0; j < freqA; j++) {
             doc.add(new TextField("a", "foo", Store.NO));
           }
         }
   
         freqA = random().nextInt(20) + 1;
         if (randomBoolean()) {
           for (int j = 0; j < freqA; j++) {
             doc.add(new TextField("a", "foo" + j, Store.NO));
           }
         }
   
         freqA = random().nextInt(20) + 1;
         if (randomBoolean()) {
           for (int j = 0; j < freqA; j++) {
             doc.add(new TextField("a", "zoo", Store.NO));
           }
         }
   
         int freqB = random().nextInt(20) + 1;
         if (randomBoolean()) {
           for (int j = 0; j < freqB; j++) {
             doc.add(new TextField("b", "zoo", Store.NO));
           }
         }
   
         freqB = random().nextInt(20) + 1;
         if (randomBoolean()) {
           for (int j = 0; j < freqB; j++) {
             doc.add(new TextField("b", "zoo" + j, Store.NO));
           }
         }
   
         int freqC = random().nextInt(20) + 1;
         for (int j = 0; j < freqC; j++) {
           doc.add(new TextField("c", "bla" + j, Store.NO));
         }
         w.addDocument(doc);
       }
   
       IndexReader reader = w.getReader();
       IndexSearcher searcher = newSearcher(reader);
       searcher.setSimilarity(similarity);
   
       CombinedFieldQuery query =
           new CombinedFieldQuery.Builder()
               .addField("a", (float) boost1)
               .addField("b", (float) boost2)
               .addTerm(new BytesRef("foo"))
               .addTerm(new BytesRef("zoo"))
               .build();
   
       TopScoreDocCollector completeCollector =
           TopScoreDocCollector.create(numHits, null, Integer.MAX_VALUE); 
       searcher.search(query, completeCollector);
   
       reader.close();
       w.close();
       dir.close();
     }
   ```
   
   I guess I'm fine either way? If you prefer a more focused test, I can replace it with the above one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #444:
URL: https://github.com/apache/lucene/pull/444#discussion_r750528027



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java
##########
@@ -165,6 +169,117 @@ public void testSameScore() throws IOException {
     dir.close();
   }
 
+  public void testSameScoreAndCollectionBetweenCompleteAndTopScores() throws IOException {

Review comment:
       Could you explain how this test relates to the fix? Would it make sense to have a more targeted test similar to `testCopyField`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zacharymorn commented on pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on pull request #444:
URL: https://github.com/apache/lucene/pull/444#issuecomment-973796460


   Will backport this to 8.11 and 8x.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zacharymorn commented on a change in pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on a change in pull request #444:
URL: https://github.com/apache/lucene/pull/444#discussion_r750881073



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiNormsLeafSimScorer.java
##########
@@ -61,7 +63,14 @@
     if (needsScores) {
       final List<NumericDocValues> normsList = new ArrayList<>();
       final List<Float> weightList = new ArrayList<>();
+      final Set<String> duplicateCheckingSet = new HashSet<>();
       for (FieldAndWeight field : normFields) {
+        assert duplicateCheckingSet.contains(field.field) == false
+            : "There is duplicated field ["
+                + field.field
+                + "] used to construct MultiNormsLeafSimScorer";
+        duplicateCheckingSet.add(field.field);

Review comment:
       Ah yes. I assume you meant `assert duplicateCheckingSet.add(field.field)` and have updated it accordingly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zacharymorn commented on a change in pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on a change in pull request #444:
URL: https://github.com/apache/lucene/pull/444#discussion_r751910253



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java
##########
@@ -165,6 +169,117 @@ public void testSameScore() throws IOException {
     dir.close();
   }
 
+  public void testSameScoreAndCollectionBetweenCompleteAndTopScores() throws IOException {

Review comment:
       Sounds good. I've replaced the test with a more focused one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jimczi commented on a change in pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

Posted by GitBox <gi...@apache.org>.
jimczi commented on a change in pull request #444:
URL: https://github.com/apache/lucene/pull/444#discussion_r749986210



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiNormsLeafSimScorer.java
##########
@@ -61,7 +63,14 @@
     if (needsScores) {
       final List<NumericDocValues> normsList = new ArrayList<>();
       final List<Float> weightList = new ArrayList<>();
+      final Set<String> duplicateCheckingSet = new HashSet<>();
       for (FieldAndWeight field : normFields) {
+        assert duplicateCheckingSet.contains(field.field) == false
+            : "There is duplicated field ["
+                + field.field
+                + "] used to construct MultiNormsLeafSimScorer";
+        duplicateCheckingSet.add(field.field);

Review comment:
       Could be `assert duplicateCheckingSet.add(field.field) == false` ? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

Posted by GitBox <gi...@apache.org>.
jpountz commented on pull request #444:
URL: https://github.com/apache/lucene/pull/444#issuecomment-1005865609


   Correct, changes should no longer be backported to `branch_8x`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #444:
URL: https://github.com/apache/lucene/pull/444#discussion_r751461104



##########
File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java
##########
@@ -165,6 +169,117 @@ public void testSameScore() throws IOException {
     dir.close();
   }
 
+  public void testSameScoreAndCollectionBetweenCompleteAndTopScores() throws IOException {

Review comment:
       I think starting with this more focused test makes sense for this PR. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zacharymorn merged pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

Posted by GitBox <gi...@apache.org>.
zacharymorn merged pull request #444:
URL: https://github.com/apache/lucene/pull/444


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zacharymorn commented on pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on pull request #444:
URL: https://github.com/apache/lucene/pull/444#issuecomment-971181479


   Thanks @jimczi @jtibshirani for the review and feedback!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org