Posted to dev@lucene.apache.org by "Bill Steele (JIRA)" <ji...@apache.org> on 2013/11/05 22:43:18 UTC

[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

    [ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814281#comment-13814281 ] 

Bill Steele commented on SOLR-5379:
-----------------------------------

We found this patch much more useful for multi-word synonyms.  We ran some tests with a synonym set such as:

seabiscuit, sea biscuit, sea biscit

A search on the following query:

seabiscuit article

returned matches with the following terms:

Sea biscit article
Sea biscuit article
Seabiscuit article
Biscuit Sea article
Sea article
Biscit article

With this patch, the same search query returned only the following terms:

Sea biscit article
Sea biscuit article
Seabiscuit article
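
As a rough illustration of the narrower result set, the sketch below treats each synonym variant as a phrase and accepts a title only if it contains one of those phrases plus the remaining query term. It is a toy model in plain Java and does not use Lucene or Solr; the class and method names (PhraseSynonymDemo, matches, filter) are hypothetical, and the actual patch may build its query differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: plain Java, no Lucene or Solr classes involved.
class PhraseSynonymDemo {

    // Lower-case the text and split it on whitespace.
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // True if `phrase` occurs in `doc` as a contiguous, in-order run of tokens.
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        return Collections.indexOfSubList(doc, phrase) >= 0;
    }

    // A title matches when it contains the other query term and at least
    // one synonym variant as a whole phrase (an OR over phrase matches).
    static boolean matches(String title, List<String> variants, String otherTerm) {
        List<String> doc = tokens(title);
        if (!doc.contains(otherTerm.toLowerCase(Locale.ROOT))) {
            return false;
        }
        for (String variant : variants) {
            if (containsPhrase(doc, tokens(variant))) {
                return true;
            }
        }
        return false;
    }

    // Keep only the titles that match, preserving input order.
    static List<String> filter(List<String> titles, List<String> variants, String otherTerm) {
        List<String> kept = new ArrayList<>();
        for (String title : titles) {
            if (matches(title, variants, otherTerm)) {
                kept.add(title);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> variants = Arrays.asList("seabiscuit", "sea biscuit", "sea biscit");
        List<String> titles = Arrays.asList(
                "Sea biscit article", "Sea biscuit article", "Seabiscuit article",
                "Biscuit Sea article", "Sea article", "Biscit article");
        for (String title : filter(titles, variants, "article")) {
            System.out.println(title);
        }
    }
}
```

Run against the six titles from the first list, this model keeps only the three titles that contain a complete synonym variant, which is the same set reported with the patch.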



> Query-time multi-word synonym expansion
> ---------------------------------------
>
>                 Key: SOLR-5379
>                 URL: https://issues.apache.org/jira/browse/SOLR-5379
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Nguyen Manh Tien
>              Labels: multi-word, queryparser, synonym
>             Fix For: 4.5.1, 4.6
>
>         Attachments: quoted.patch, synonym-expander.patch
>
>
> When handling synonyms at query time, Solr fails to work with multi-word synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query on whitespace, so it splits a multi-word term into separate terms before passing them to the synonym filter. The synonym filter therefore cannot recognize the multi-word term and expand it.
> - Second, if the synonym filter expands a term into multiple terms that include a multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to handle the synonyms. But MultiPhraseQuery does not work when the terms have different numbers of words.
> For the first problem, we can quote every multi-word synonym in the user query so that the Lucene query parser does not split it. A related JIRA issue is https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with an appropriate BooleanQuery of SHOULD clauses, each containing a PhraseQuery, when the token stream contains a multi-word synonym.
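
The quoting step proposed for the first problem could look roughly like the sketch below. It is a minimal, standalone illustration and not the code in quoted.patch: the class MultiWordQuoter and the method quoteMultiWord are hypothetical names, and the sketch assumes the multi-word synonyms are already available as a simple list of strings. It wraps each multi-word synonym found in the raw query in double quotes, so that a whitespace-splitting parser would see it as a single phrase.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or Lucene.
class MultiWordQuoter {

    // Wrap every multi-word synonym that appears in the query in double quotes.
    static String quoteMultiWord(String query, List<String> synonyms) {
        // Keep only the synonyms that consist of more than one word.
        List<String> multi = new ArrayList<>();
        for (String synonym : synonyms) {
            String trimmed = synonym.trim();
            if (trimmed.split("\\s+").length > 1) {
                multi.add(trimmed);
            }
        }
        if (multi.isEmpty()) {
            return query; // nothing to quote
        }
        // Try longer synonyms first so they win over shorter overlapping ones.
        multi.sort(Comparator.comparingInt(String::length).reversed());

        // Build one alternation; words may be separated by any whitespace.
        List<String> alternatives = new ArrayList<>();
        for (String phrase : multi) {
            List<String> words = new ArrayList<>();
            for (String word : phrase.split("\\s+")) {
                words.add(Pattern.quote(word));
            }
            alternatives.add(String.join("\\s+", words));
        }
        // The lookarounds skip matches inside longer words and matches that
        // already sit directly next to a double quote.
        Pattern pattern = Pattern.compile(
                "(?<![\\w\"])(?:" + String.join("|", alternatives) + ")(?![\\w\"])",
                Pattern.CASE_INSENSITIVE);
        return pattern.matcher(query).replaceAll("\"$0\"");
    }

    public static void main(String[] args) {
        List<String> synonyms = new ArrayList<>();
        synonyms.add("seabiscuit");
        synonyms.add("sea biscuit");
        synonyms.add("sea biscit");
        System.out.println(quoteMultiWord("sea biscuit article", synonyms));
    }
}
```

A single pass over one combined pattern is used here so that text quoted for one synonym is never re-scanned for another.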


