You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2019/10/02 07:35:56 UTC

[GitHub] [lucene-solr] jimczi commented on a change in pull request #913: LUCENE-8995: TopSuggestDocsCollector#collect should be able to signal rejection

jimczi commented on a change in pull request #913: LUCENE-8995: TopSuggestDocsCollector#collect should be able to signal rejection
URL: https://github.com/apache/lucene-solr/pull/913#discussion_r330405030
 
 

 ##########
 File path: lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestPrefixCompletionQuery.java
 ##########
 @@ -253,6 +258,61 @@ public void testDocFiltering() throws Exception {
     iw.close();
   }
 
+  /**
+   * Test that the correct amount of documents are collected if using a collector that also rejects documents.
+   */
+  public void testCollectorThatRejects() throws Exception {
+    // use synonym analyzer to have multiple paths to same suggested document. This mock adds "dog" as synonym for "dogs"
+    Analyzer analyzer = new MockSynonymAnalyzer();
+    RandomIndexWriter iw = new RandomIndexWriter(random(), dir, iwcWithSuggestField(analyzer, "suggest_field"));
+    List<Entry> expectedResults = new ArrayList<Entry>();
+
+    for (int docCount = 10; docCount > 0; docCount--) {
+      Document document = new Document();
+      String value = "ab" + docCount + " dogs";
+      document.add(new SuggestField("suggest_field", value, docCount));
+      expectedResults.add(new Entry(value, docCount));
+      iw.addDocument(document);
+    }
+
+    if (rarely()) {
+      iw.commit();
+    }
+
+    DirectoryReader reader = iw.getReader();
+    SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader);
+
+    PrefixCompletionQuery query = new PrefixCompletionQuery(analyzer, new Term("suggest_field", "ab"));
+    int topN = 5;
+
+    // use a TopSuggestDocsCollector that rejects results with duplicate docIds
+    TopSuggestDocsCollector collector = new TopSuggestDocsCollector(topN, false) {
+
+      private Set<Integer> seenDocIds = new HashSet<>();
+
+      @Override
+      public boolean collect(int docID, CharSequence key, CharSequence context, float score) throws IOException {
+          int globalDocId = docID + docBase;
+          boolean collected = false;
+          if (seenDocIds.contains(globalDocId) == false) {
+              super.collect(docID, key, context, score);
+              seenDocIds.add(globalDocId);
+              collected = true;
+          }
+          return collected;
+      }
+    };
+
+    indexSearcher.suggest(query, collector);
+    assertSuggestions(collector.get(), expectedResults.subList(0, topN).toArray(new Entry[0]));
+
+    // TODO expecting true here, why false?
 
 Review comment:
   it seems that NRTSuggesterBuilder#maxAnalyzedPathsPerOutput is not computed correctly. From what I understand it records the number of suggestions with the same analyzed form but the comment says that it should be the highest number of analyzed paths we saw for any input surface form. So imo this is a bug, it's exactly related to this change so we should probably open a new issue for this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org