Posted to issues@lucene.apache.org by "Chongchen Chen (Jira)" <ji...@apache.org> on 2019/10/10 00:01:24 UTC
[jira] [Issue Comment Deleted] (LUCENE-8985) SynonymGraphFilter cannot handle input stream with tokens filtered.
[ https://issues.apache.org/jira/browse/LUCENE-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chongchen Chen updated LUCENE-8985:
-----------------------------------
Comment: was deleted
(was: [~janhoy] I found two test cases in TestSynonymGraphFilter that I do not understand.
{code:java}
public void testBasicKeepOrigTwoOutputs() throws Exception {
  SynonymMap.Builder b = new SynonymMap.Builder();
  add(b, "a b", "x y", true);
  add(b, "a b", "m n o", true);
  Analyzer a = getAnalyzer(b, true);
  assertAnalyzesTo(a,
      "c a b d",
      new String[] {"c", "x", "m", "a", "y", "n", "o", "b", "d"}, // output terms
      new int[] { 0, 2, 2, 2, 2, 2, 2, 4, 6}, // start offsets
      new int[] { 1, 5, 5, 3, 5, 5, 5, 5, 7}, // end offsets
      new String[] {"word", "SYNONYM", "SYNONYM", "word", "SYNONYM", "SYNONYM", "SYNONYM", "word", "word"}, // types
      new int[] { 1, 1, 0, 0, 1, 1, 1, 1, 1}, // posIncrements
      // posLengths; I think these should be {1, 1, 1, 1, 2, 1, 1, 2, 1}, because the longest synonym's length is 3
      new int[] { 1, 1, 2, 4, 4, 1, 2, 1, 1});
  a.close();
}

public void testBasicNoKeepOrigTwoOutputs() throws Exception {
  SynonymMap.Builder b = new SynonymMap.Builder();
  add(b, "a b", "x y", false);
  add(b, "a b", "m n o", false);
  Analyzer a = getAnalyzer(b, true);
  assertAnalyzesTo(a,
      "c a b d",
      new String[] {"c", "x", "m", "y", "n", "o", "d"}, // output terms
      new int[] { 0, 2, 2, 2, 2, 2, 6}, // start offsets
      new int[] { 1, 5, 5, 5, 5, 5, 7}, // end offsets
      new String[] {"word", "SYNONYM", "SYNONYM", "SYNONYM", "SYNONYM", "SYNONYM", "word"}, // types
      new int[] { 1, 1, 0, 1, 1, 1, 1}, // posIncrements
      // posLengths; I think these should be {1, 1, 1, 2, 1, 1, 1}, because the longest synonym's length is 3
      new int[] { 1, 1, 2, 3, 1, 1, 1});
  a.close();
}
{code}
Why are the posLengths those numbers? Am I misunderstanding something?
)
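One way to reason about the posLengths question is to treat each token as an arc in a graph: it leaves the node given by the running sum of posIncrements and arrives at that node plus its posLength. The sketch below is plain Java with no Lucene dependency. The class `PosLengthGraph` and its `arcs` method are hypothetical names invented for this illustration, and it assumes positions start at -1 before the first increment is applied. The input values are copied from testBasicKeepOrigTwoOutputs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: turns parallel token arrays into graph arcs.
final class PosLengthGraph {

    // Returns one "term: from -> to" string per token.
    static List<String> arcs(String[] terms, int[] posIncrements, int[] posLengths) {
        List<String> result = new ArrayList<>();
        int position = -1; // assumption: a first token with increment 1 lands on node 0
        for (int i = 0; i < terms.length; i++) {
            position += posIncrements[i];       // node the token leaves from
            int end = position + posLengths[i]; // node the token arrives at
            result.add(terms[i] + ": " + position + " -> " + end);
        }
        return result;
    }

    public static void main(String[] args) {
        // Values copied from testBasicKeepOrigTwoOutputs.
        String[] terms = {"c", "x", "m", "a", "y", "n", "o", "b", "d"};
        int[] posIncrements = {1, 1, 0, 0, 1, 1, 1, 1, 1};
        int[] posLengths = {1, 1, 2, 4, 4, 1, 2, 1, 1};
        arcs(terms, posIncrements, posLengths).forEach(System.out::println);
    }
}
```

Under this reading the expected values describe three parallel paths from node 1 to node 6: x then y through node 2, m then n then o through nodes 3 and 4, and a then b through node 5, with d continuing from node 6 to node 7. That would suggest each posLength counts graph nodes spanned rather than the word count of the longest synonym.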
> SynonymGraphFilter cannot handle input stream with tokens filtered.
> -------------------------------------------------------------------
>
> Key: LUCENE-8985
> URL: https://issues.apache.org/jira/browse/LUCENE-8985
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Chongchen Chen
> Assignee: Jan Høydahl
> Priority: Major
> Fix For: 8.3
>
> Attachments: SGF_SF_interaction.patch.txt
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> [~janhoy] found the bug.
> In an analyzer with e.g. StopFilter, where tokens are removed from the stream and replaced with a “hole”, SynonymGraphFilter does not preserve these holes but removes them, causing certain phrase queries to fail.
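The effect of dropping a hole can be illustrated with a toy model. This is not Lucene code: the class `HoleDemo`, its `positions` method, and the sample phrase are all assumptions made for illustration. It only mimics how a removed stop word either adds to the next token's position increment (hole preserved) or is ignored (hole dropped).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model, not Lucene code: compares keeping holes with dropping them.
final class HoleDemo {

    // Absolute positions of the tokens that survive stop-word removal.
    static List<Integer> positions(String[] words, Set<String> stopWords, boolean preserveHoles) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        int skipped = 0; // stop words removed since the last emitted token
        for (String word : words) {
            if (stopWords.contains(word)) {
                skipped++; // the removed token leaves a hole
                continue;
            }
            // A preserved hole widens the increment; a dropped hole does not.
            position += preserveHoles ? 1 + skipped : 1;
            result.add(position);
            skipped = 0;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"wizard", "of", "oz"}; // hypothetical sample input
        Set<String> stop = Set.of("of");
        System.out.println(positions(words, stop, true));  // [0, 2]
        System.out.println(positions(words, stop, false)); // [0, 1]
    }
}
```

If one side of a search sees positions 0 and 2 while the other sees 0 and 1, a phrase match that requires the same gap between the two terms cannot succeed, which is the kind of mismatch the issue describes.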
--
This message was sent by Atlassian Jira
(v8.3.4#803005)