Posted to issues@lucene.apache.org by "Chongchen Chen (Jira)" <ji...@apache.org> on 2019/10/10 00:01:24 UTC

[jira] [Issue Comment Deleted] (LUCENE-8985) SynonymGraphFilter cannot handle input stream with tokens filtered.

     [ https://issues.apache.org/jira/browse/LUCENE-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chongchen Chen updated LUCENE-8985:
-----------------------------------
    Comment: was deleted

(was: [~janhoy] I found two test cases in TestSynonymGraphFilter that I cannot understand.

{code:java}
  public void testBasicKeepOrigTwoOutputs() throws Exception {
    SynonymMap.Builder b = new SynonymMap.Builder();
    add(b, "a b", "x y", true);
    add(b, "a b", "m n o", true);

    Analyzer a = getAnalyzer(b, true);
    assertAnalyzesTo(a,
                     "c a b d",
                     new String[] {"c", "x", "m", "a", "y", "n", "o", "b", "d"},
                     new int[]    { 0,   2,   2,   2,   2,   2,   2,   4,   6},
                     new int[]    { 1,   5,   5,   3,   5,   5,   5,   5,   7},
                     new String[] {"word", "SYNONYM", "SYNONYM", "word", "SYNONYM", "SYNONYM", "SYNONYM", "word", "word"},
                     new int[]    { 1,   1,   0,   0,   1,   1,   1,   1,   1},
                     new int[]    { 1,   1,   2,   4,   4,   1,   2,   1,   1}); // I think posLengths should be {1, 1, 1, 1, 2, 1, 1, 2, 1}, because the longest synonym's length is 3.
    a.close();
  }

  public void testBasicNoKeepOrigTwoOutputs() throws Exception {
    SynonymMap.Builder b = new SynonymMap.Builder();
    add(b, "a b", "x y", false);
    add(b, "a b", "m n o", false);

    Analyzer a = getAnalyzer(b, true);
    assertAnalyzesTo(a,
                     "c a b d",
                     new String[] {"c", "x", "m", "y", "n", "o", "d"},
                     new int[]    { 0,   2,   2,   2,   2,   2,   6},
                     new int[]    { 1,   5,   5,   5,   5,   5,   7},
                     new String[] {"word", "SYNONYM", "SYNONYM", "SYNONYM", "SYNONYM", "SYNONYM", "word"},
                     new int[]    { 1,   1,   0,   1,   1,   1,   1},
                     new int[]    { 1,   1,   2,   3,   1,   1,   1}); // I think posLengths should be {1, 1, 1, 2, 1, 1, 1}, because the longest synonym's length is 3.
    a.close();
  }
{code}
Why are the posLengths those numbers? Am I misunderstanding something?


)
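
One way to read the expected values in the two tests quoted above is to treat each token as an arc in a graph: it starts at the running sum of the position increments and ends posLength positions later. The sketch below is only an illustration under that assumption. TokenGraphArcs and arcs are hypothetical names, not part of Lucene or of TestSynonymGraphFilter, and the arrays are copied from the quoted tests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not a Lucene class.
class TokenGraphArcs {

    // Returns one "term:from->to" entry per token.
    static List<String> arcs(String[] terms, int[] posIncs, int[] posLengths) {
        List<String> out = new ArrayList<>();
        int pos = -1; // position before the first token
        for (int i = 0; i < terms.length; i++) {
            pos += posIncs[i];             // start node: running sum of position increments
            int end = pos + posLengths[i]; // end node: start node plus position length
            out.add(terms[i] + ":" + pos + "->" + end);
        }
        return out;
    }

    public static void main(String[] args) {
        // Expected values copied from testBasicKeepOrigTwoOutputs.
        String[] terms = {"c", "x", "m", "a", "y", "n", "o", "b", "d"};
        int[] posIncs = {1, 1, 0, 0, 1, 1, 1, 1, 1};
        int[] posLengths = {1, 1, 2, 4, 4, 1, 2, 1, 1};
        System.out.println(arcs(terms, posIncs, posLengths));
        // prints [c:0->1, x:1->2, m:1->3, a:1->5, y:2->6, n:3->4, o:4->6, b:5->6, d:6->7]
    }
}
```

Read this way, the three alternatives each form their own path from node 1 to node 6: x, y (1->2->6), m, n, o (1->3->4->6) and a, b (1->5->6), with d continuing from node 6. The same calculation on testBasicNoKeepOrigTwoOutputs gives x, y (1->2->5) and m, n, o (1->3->4->5). With the values proposed in the comments, m in the second test would end at node 2, where y starts, and no token would end at node 3, where n starts, so the two synonyms would no longer be separate paths.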

> SynonymGraphFilter cannot handle input stream with tokens filtered.
> -------------------------------------------------------------------
>
>                 Key: LUCENE-8985
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8985
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Chongchen Chen
>            Assignee: Jan Høydahl
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: SGF_SF_interaction.patch.txt
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> [~janhoy] found the bug.
> In an analyzer that contains, e.g., a StopFilter, tokens are removed from the stream and each is replaced with a “hole”. SynonymGraphFilter does not preserve these holes but removes them, which causes certain phrase queries to fail.
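
The effect of a lost hole on token positions can be sketched with a toy model. This is not Lucene code: HoleDemo, Tok, removeStopWords, dropHoles and positions are hypothetical names, and dropHoles only imitates the symptom described in the issue (an increment larger than 1 collapsing to 1); it is not the actual SynonymGraphFilter logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Toy model of position increments; none of these types are Lucene APIs.
class HoleDemo {

    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Removes stop words and adds their increments to the next kept token,
    // so the removed token leaves a "hole" in the positions.
    static List<Tok> removeStopWords(List<Tok> in, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (Tok t : in) {
            if (stopWords.contains(t.term)) {
                skipped += t.posInc;
            } else {
                out.add(new Tok(t.term, t.posInc + skipped));
                skipped = 0;
            }
        }
        return out;
    }

    // ASSUMPTION: imitates the reported symptom by capping every increment at 1.
    static List<Tok> dropHoles(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(new Tok(t.term, Math.min(t.posInc, 1)));
        }
        return out;
    }

    // Absolute position of each token: running sum of the increments.
    static List<String> positions(List<Tok> in) {
        List<String> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : in) {
            pos += t.posInc;
            out.add(t.term + "@" + pos);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> input = Arrays.asList(new Tok("quick", 1), new Tok("the", 1), new Tok("fox", 1));
        List<Tok> stopped = removeStopWords(input, Collections.singleton("the"));
        System.out.println(positions(stopped));            // prints [quick@0, fox@2]
        System.out.println(positions(dropHoles(stopped))); // prints [quick@0, fox@1]
    }
}
```

If one side (index or query) keeps the hole and the other drops it, a phrase that expects "fox" two positions after "quick" no longer lines up with tokens that are only one position apart.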



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
