Posted to oak-issues@jackrabbit.apache.org by "Vikas Saurabh (JIRA)" <ji...@apache.org> on 2016/11/08 04:29:59 UTC
[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words
in synonym definition can give more results than expected
[ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646457#comment-15646457 ]
Vikas Saurabh commented on OAK-4804:
------------------------------------
Btw, there's an ugly workaround that solves some of the cases.
Assuming the index definition only covers full text aggregated on the top-level node, and we have the following nodes:
* /AB -> contains {{"A B"}}
* /AXB -> contains {{"A X B"}}
* /CD -> contains {{"C D"}}
* /E -> contains {{E}}
* /F -> contains {{F}}
then a synonym definition of
{noformat}
A B=>A,B
C D=>C,D
A B,C D=>E,F
E,F
{noformat}
would give the following results:
||query text||result||comment||
|E|/E, /F, /AB, /CD| |
|A B|/AB, /AXB| |
|A|/AB, /AXB| |
|"A B"|/AB, /AXB, /E, /F, /CD|possibly unexpected|
|C D|/CD| |
|D|/CD| |
|"C D"|/CD, /E, /F, /AB|possibly unexpected|
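To make it easier to see where the "possibly unexpected" rows may come from, here is a minimal sketch of how the four rules above could be read. It assumes Solr-style synonym syntax ({{=>}} for an explicit mapping, a plain comma list for equivalent terms that expand to each other, and repeated left-hand sides being merged). The class {{SynonymRulesSketch}} and its methods are hypothetical names for illustration only; this is not the parser Oak or Lucene actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: a tiny reader for Solr-style synonym rules.
public class SynonymRulesSketch {

    // Splits "A B,C D" into ["A B", "C D"], trimming whitespace around each term.
    static List<String> splitTerms(String side) {
        List<String> terms = new ArrayList<>();
        for (String term : side.split(",")) {
            String trimmed = term.trim();
            if (!trimmed.isEmpty()) {
                terms.add(trimmed);
            }
        }
        return terms;
    }

    // Builds input-term -> output-terms, merging rules that share an input term.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> mapping = new LinkedHashMap<>();
        for (String line : lines) {
            String[] sides = line.split("=>", 2);
            List<String> inputs = splitTerms(sides[0]);
            // Without "=>", the terms are treated as equivalent:
            // each one maps to the whole list (assumed expand behaviour).
            List<String> outputs = sides.length == 2 ? splitTerms(sides[1]) : inputs;
            for (String input : inputs) {
                mapping.computeIfAbsent(input, k -> new ArrayList<>()).addAll(outputs);
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList("A B=>A,B", "C D=>C,D", "A B,C D=>E,F", "E,F");
        for (Map.Entry<String, List<String>> e : parse(rules).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        // prints:
        // A B -> [A, B, E, F]
        // C D -> [C, D, E, F]
        // E -> [E, F]
        // F -> [E, F]
    }
}
```

Under these assumptions, {{"A B"}} and {{"C D"}} both end up mapped to {{E}} and {{F}}, which is consistent with the phrase queries in the table also returning /E, /F and the other phrase node.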
> Synonym analyzer with multiple words in synonym definition can give more results than expected
> ----------------------------------------------------------------------------------------------
>
> Key: OAK-4804
> URL: https://issues.apache.org/jira/browse/OAK-4804
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: lucene
> Reporter: Vikas Saurabh
> Assignee: Vikas Saurabh
> Priority: Minor
>
> Setting up synonyms such as {{"FTW, For the win"}} would also return documents which contain all of {{"For", "the", "win"}}.
> Test case:
> {noformat}
> @Test
> public void fulltextSearchWithPhraseSynonymAnalyzer() throws Exception {
>     Tree idx = createFulltextIndex(root.getTree("/"), "test");
>     TestUtil.useV2(idx);
>     Tree anl = idx.addChild(LuceneIndexConstants.ANALYZERS).addChild(LuceneIndexConstants.ANL_DEFAULT);
>     anl.addChild(LuceneIndexConstants.ANL_TOKENIZER).setProperty(LuceneIndexConstants.ANL_NAME, "Standard");
>     Tree synFilter = anl.addChild(LuceneIndexConstants.ANL_FILTERS).addChild("Synonym");
>     synFilter.setProperty("synonyms", "syn.txt");
>     synFilter.addChild("syn.txt").addChild(JCR_CONTENT).setProperty(JCR_DATA, "FTW, For the win");
>     Tree test = root.getTree("/").addChild("test");
>     test.addChild("1").setProperty("foo", "FTW");
>     test.addChild("2").setProperty("foo", "For the win");
>     test.addChild("3").setProperty("foo", "For gods sake, this is not the way to win it");
>     root.commit();
>     assertQuery("select * from [nt:base] where CONTAINS(*, 'FTW') AND ISDESCENDANTNODE('/test')",
>         asList("/test/1", "/test/2")); // current (failing) result is ["/test/1", "/test/2", "/test/3"]
> }
> {noformat}
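The over-matching in the test case above can be reproduced with a toy model that does not use Lucene at all. In this sketch, the "loose" mode treats the multi-word synonym as a set of words that must all occur somewhere in the document (the behaviour the issue describes), while the "phrase" mode requires the words to be adjacent and in order (what the test expects). The class {{SynonymOverMatchSketch}} and its methods are hypothetical and only approximate the analyzer's behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: a toy model of the over-matching, not Lucene's logic.
public class SynonymOverMatchSketch {

    // Lower-cases the text and splits it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // The three documents created by the test case.
    static Map<String, String> docs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("/test/1", "FTW");
        docs.put("/test/2", "For the win");
        docs.put("/test/3", "For gods sake, this is not the way to win it");
        return docs;
    }

    // Returns paths matching the term directly or through its multi-word synonym.
    static List<String> search(String term, String synonymPhrase, boolean requirePhrase) {
        List<String> termTokens = tokenize(term);
        List<String> phraseTokens = tokenize(synonymPhrase);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docs().entrySet()) {
            List<String> docTokens = tokenize(e.getValue());
            boolean direct = docTokens.containsAll(termTokens);
            boolean viaSynonym = requirePhrase
                    // words must be adjacent and in order
                    ? Collections.indexOfSubList(docTokens, phraseTokens) >= 0
                    // words may appear anywhere in the document
                    : docTokens.containsAll(phraseTokens);
            if (direct || viaSynonym) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("FTW", "For the win", false)); // [/test/1, /test/2, /test/3]
        System.out.println(search("FTW", "For the win", true));  // [/test/1, /test/2]
    }
}
```

In this model /test/3 only matches in loose mode, because it contains "for", "the" and "win" but never as a contiguous phrase.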
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)