Posted to oak-issues@jackrabbit.apache.org by "Vikas Saurabh (JIRA)" <ji...@apache.org> on 2016/11/08 04:29:59 UTC

[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected

    [ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646457#comment-15646457 ] 

Vikas Saurabh commented on OAK-4804:
------------------------------------

Btw, there's an ugly workaround that solves some of these cases.

Assuming the index definition only covers full text aggregated on the top-level node, and that we have the following nodes:
* /AB -> contains {{"A B"}}
* /AXB -> contains {{"A X B"}}
* /CD -> contains {{"C D"}}
* /E -> contains {{E}}
* /F -> contains {{F}}

then a synonym definition of
{noformat}
A B=>A,B
C D=>C,D
A B,C D=>E,F
E,F
{noformat}
would give the following results:
||query text||result||comment||
|E|/E, /F, /AB, /CD| |
|A B|/AB, /AXB| |
|A|/AB, /AXB| |
|"A B"|/AB, /AXB, /E, /F, /CD|possibly unexpected|
|C D|/CD| |
|D|/CD| |
|"C D"|/CD, /E, /F, /AB|possibly unexpected|
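
For anyone less familiar with the synonym file format used in the definition above, here is a minimal sketch of how its two rule forms can be read, assuming the default Solr-style format: an {{lhs=>rhs}} line replaces each left-hand phrase with all right-hand terms, while a plain comma-separated line declares equivalents that (with expansion on) all map to each other. The {{SynonymRules}} class and {{parse}} method are hypothetical illustrations, not Lucene's {{SolrSynonymParser}}; escaping and the non-expanding mode are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified reader for Solr-style synonym rules (NOT Lucene's SolrSynonymParser).
class SynonymRules {

    /** Maps each (possibly multi-word) phrase to the phrases it is rewritten to. */
    static Map<String, Set<String>> parse(String text) {
        Map<String, Set<String>> rules = new LinkedHashMap<>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int arrow = line.indexOf("=>");
            if (arrow >= 0) {
                // Explicit mapping: every left-hand phrase is replaced by all right-hand phrases.
                List<String> rhs = phrases(line.substring(arrow + 2));
                for (String lhs : phrases(line.substring(0, arrow))) {
                    rules.computeIfAbsent(lhs, k -> new LinkedHashSet<>()).addAll(rhs);
                }
            } else {
                // Equivalence group (assumed expand=true): each phrase maps to the whole group.
                List<String> group = phrases(line);
                for (String phrase : group) {
                    rules.computeIfAbsent(phrase, k -> new LinkedHashSet<>()).addAll(group);
                }
            }
        }
        return rules;
    }

    // Splits one side of a rule on commas, trimming whitespace around each phrase.
    private static List<String> phrases(String side) {
        List<String> out = new ArrayList<>();
        for (String part : side.split(",")) {
            String phrase = part.trim();
            if (!phrase.isEmpty()) {
                out.add(phrase);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String syn = "A B=>A,B\nC D=>C,D\nA B,C D=>E,F\nE,F";
        // prints: A B -> [A, B, E, F] / C D -> [C, D, E, F] / E -> [E, F] / F -> [E, F]
        parse(syn).forEach((lhs, rhs) -> System.out.println(lhs + " -> " + rhs));
    }
}
```

Read this way, {{A B}} is rewritten to A, B, E and F, and {{C D}} to C, D, E and F. If the same map is applied both at indexing time and when analysing a quoted query, that overlap is one plausible reading of why the quoted rows in the table also return /E, /F and the other pair.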

> Synonym analyzer with multiple words in synonym definition can give more results than expected
> ----------------------------------------------------------------------------------------------
>
>                 Key: OAK-4804
>                 URL: https://issues.apache.org/jira/browse/OAK-4804
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>            Priority: Minor
>
> Setting up synonyms such as {{"FTW, For the win"}} would also return documents which contain all of {{"For", "the", "win"}}.
> Test case:
> {noformat}
>     @Test
>     public void fulltextSearchWithPhraseSynonymAnalyzer() throws Exception {
>         Tree idx = createFulltextIndex(root.getTree("/"), "test");
>         TestUtil.useV2(idx);
>         Tree anl = idx.addChild(LuceneIndexConstants.ANALYZERS).addChild(LuceneIndexConstants.ANL_DEFAULT);
>         anl.addChild(LuceneIndexConstants.ANL_TOKENIZER).setProperty(LuceneIndexConstants.ANL_NAME, "Standard");
>         Tree synFilter = anl.addChild(LuceneIndexConstants.ANL_FILTERS).addChild("Synonym");
>         synFilter.setProperty("synonyms", "syn.txt");
>         synFilter.addChild("syn.txt").addChild(JCR_CONTENT).setProperty(JCR_DATA, "FTW, For the win");
>         Tree test = root.getTree("/").addChild("test");
>         test.addChild("1").setProperty("foo", "FTW");
>         test.addChild("2").setProperty("foo", "For the win");
>         test.addChild("3").setProperty("foo", "For gods sake, this is not the way to win it");
>         root.commit();
>         assertQuery("select * from [nt:base] where CONTAINS(*, 'FTW') AND ISDESCENDANTNODE('/test')",
>                 asList("/test/1", "/test/2")); // currently fails: actual result is ["/test/1", "/test/2", "/test/3"]
>     }
> {noformat}
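
To make the failing result of the test above concrete, here is a toy model in plain Java (no Lucene, no Oak) of two ways an expanded query could be matched. It assumes the query {{FTW}} is expanded into the alternatives {{ftw}} and {{for the win}}. Matching the multi-word alternative as a bag of terms reproduces the failing result ({{/test/3}} contains "for", "the" and "win", just not adjacently), while requiring consecutive terms gives the expected result. {{PhraseSynonymDemo}}, {{tokenize}} and {{search}} are hypothetical names, and this is not how Oak or Lucene actually build the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model: contrasts bag-of-terms vs phrase matching of a multi-word synonym.
class PhraseSynonymDemo {

    // Lowercases and splits on runs of non-letter/non-digit characters (a rough stand-in for
    // a standard tokenizer followed by lowercasing).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Bag of terms: every token of the alternative occurs somewhere in the document.
    static boolean matchesAllTerms(List<String> doc, List<String> alt) {
        return doc.containsAll(alt);
    }

    // Phrase: the tokens of the alternative occur consecutively and in order.
    static boolean matchesPhrase(List<String> doc, List<String> alt) {
        return Collections.indexOfSubList(doc, alt) >= 0;
    }

    // Returns the paths of documents matching at least one alternative.
    static List<String> search(List<String> docs, List<List<String>> alternatives, boolean phrase) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            List<String> doc = tokenize(docs.get(i));
            for (List<String> alt : alternatives) {
                if (phrase ? matchesPhrase(doc, alt) : matchesAllTerms(doc, alt)) {
                    hits.add("/test/" + (i + 1));
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "FTW", "For the win", "For gods sake, this is not the way to win it");
        // Query 'FTW' expanded with the synonym group "FTW, For the win".
        List<List<String>> alternatives = Arrays.asList(tokenize("FTW"), tokenize("For the win"));
        System.out.println(search(docs, alternatives, false)); // [/test/1, /test/2, /test/3]
        System.out.println(search(docs, alternatives, true));  // [/test/1, /test/2]
    }
}
```

In this model the only difference between the two results is whether term positions are enforced, so keeping the multi-word side of the synonym as a phrase would be one way to exclude {{/test/3}}.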



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)