Posted to dev@lucene.apache.org by "Arcadius Ahouansou (JIRA)" <ji...@apache.org> on 2015/07/15 17:07:04 UTC

[jira] [Created] (LUCENE-6680) BlendedInfixSuggester dedup bug

Arcadius Ahouansou created LUCENE-6680:
------------------------------------------

             Summary: BlendedInfixSuggester dedup bug
                 Key: LUCENE-6680
                 URL: https://issues.apache.org/jira/browse/LUCENE-6680
             Project: Lucene - Core
          Issue Type: Bug
    Affects Versions: 5.2.1
            Reporter: Arcadius Ahouansou



I expect the following test to pass, but it's failing in the latest Lucene 5.2.1: 

{{code}}

public void testBlendedInfixSuggesterDedupsOnWeightTitleAndPayload() throws Exception {

    // Only the payload is different
    Input[] inputDocuments = new Input[]{
        new Input("lend me your ear", 7, new BytesRef("uid1")),
        new Input("lend me your ear", 7, new BytesRef("uid2")),
    };

    Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
    BlendedInfixSuggester suggester = new BlendedInfixSuggester(newDirectory(), a, a,
        AnalyzingInfixSuggester.DEFAULT_MIN_PREFIX_CHARS,
        BlendedInfixSuggester.BlenderType.POSITION_RECIPROCAL, 10, false);

    InputArrayIterator inputArrayIterator = new InputArrayIterator(inputDocuments);
    suggester.build(inputArrayIterator);

    List<Lookup.LookupResult> results = suggester.lookup(TestUtil.stringToCharSequence("ear", random()), 10, true, true);

    suggester.close();
    a.close();

    // Both inputs should come back, since their payloads differ
    assertEquals(2, results.size());
}

{{code}}

This test fails because BlendedInfixSuggester internally stores its results in a TreeSet, and the corresponding Comparator only compares text and weight, so results that differ only in payload are collapsed into one.

The point here is that if two ingested documents have the same title and weight but different payloads, they are two different things, and folding them into a single document means losing the payload information.
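
To illustrate the collapsing behaviour outside of Lucene, here is a minimal, standalone sketch. It does not use the actual BlendedInfixSuggester code: the class DedupSketch, its Result type, and both comparators are hypothetical stand-ins, with payloads modelled as plain strings rather than BytesRef. It only shows that a TreeSet treats two elements as duplicates whenever its Comparator returns 0, and that adding the payload as a tie-breaker keeps both entries.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class DedupSketch {

    /** Hypothetical stand-in for a lookup result: text, weight and payload. */
    static final class Result {
        final String key;
        final long value;
        final String payload;

        Result(String key, long value, String payload) {
            this.key = key;
            this.value = value;
            this.payload = payload;
        }
    }

    // Compares weight, then text. The payload is ignored, so two results that
    // differ only in payload compare as equal.
    static final Comparator<Result> TEXT_AND_WEIGHT =
        Comparator.comparingLong((Result r) -> r.value)
                  .thenComparing((Result r) -> r.key);

    // Same ordering, with the payload as a final tie-breaker (an assumed fix,
    // not necessarily the one adopted in Lucene).
    static final Comparator<Result> TEXT_WEIGHT_AND_PAYLOAD =
        TEXT_AND_WEIGHT.thenComparing((Result r) -> r.payload);

    /** Returns how many inputs survive insertion into a TreeSet using cmp. */
    static int countDistinct(List<Result> inputs, Comparator<Result> cmp) {
        TreeSet<Result> set = new TreeSet<>(cmp);
        set.addAll(inputs); // elements for which cmp returns 0 are dropped
        return set.size();
    }

    public static void main(String[] args) {
        List<Result> inputs = Arrays.asList(
            new Result("lend me your ear", 7, "uid1"),
            new Result("lend me your ear", 7, "uid2"));

        System.out.println(countDistinct(inputs, TEXT_AND_WEIGHT));         // prints 1
        System.out.println(countDistinct(inputs, TEXT_WEIGHT_AND_PAYLOAD)); // prints 2
    }
}
```

With the text+weight comparator the second input is silently discarded, which matches the failure seen in the test above; with the payload included in the comparison, both inputs are retained.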


