Posted to dev@lucene.apache.org by "Steve Rowe (JIRA)" <ji...@apache.org> on 2013/01/30 19:33:16 UTC

[jira] [Comment Edited] (SOLR-4381) Query-time multi-word synonym expansion

    [ https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566732#comment-13566732 ] 

Steve Rowe edited comment on SOLR-4381 at 1/30/13 6:32 PM:
-----------------------------------------------------------

bq. As it turns out, I'm having problems getting it to work with Solr 4.1.0 (due to [this bug|https://github.com/healthonnet/hon-lucene-synonyms/issues/4]), although 3.5.0 - 4.0.0 all work nicely.

I commented on the bug with more details, but basically you need to call reset() before using any tokenstream.
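
To make the reset() point concrete, here is a minimal sketch of the consumer workflow: reset(), then an incrementToken() loop, then end() and close(). The method names mirror Lucene's TokenStream, but SketchTokenStream, term() and TokenStreamContractSketch are hypothetical stand-ins written for illustration so the example runs without Lucene on the classpath. They are not Lucene classes and not the code from the linked bug.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, NOT Lucene's TokenStream: it only mimics the reset-before-consume contract. */
final class SketchTokenStream implements AutoCloseable {
    private final String[] tokens;
    private int next = -1;      // -1 means reset() has not been called yet
    private String current;

    SketchTokenStream(String text) {
        String trimmed = text.trim();
        // Whitespace tokenization is an assumption made purely for this sketch.
        this.tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Must be called before the first incrementToken(). */
    void reset() {
        next = 0;
    }

    /** Advances to the next token; returns false when the stream is exhausted. */
    boolean incrementToken() {
        if (next < 0) {
            throw new IllegalStateException("reset() must be called before incrementToken()");
        }
        if (next >= tokens.length) {
            return false;
        }
        current = tokens[next++];
        return true;
    }

    /** Hypothetical accessor for the current token text. */
    String term() {
        return current;
    }

    /** End-of-stream hook; nothing to do in this sketch. */
    void end() {
    }

    @Override
    public void close() {
        next = -1;              // the stream needs reset() again before reuse
    }
}

final class TokenStreamContractSketch {
    /** Consumes a stream in the order reset(), incrementToken() loop, end(), close(). */
    static List<String> consume(SketchTokenStream ts) {
        List<String> terms = new ArrayList<>();
        try (SketchTokenStream stream = ts) {
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(stream.term());
            }
            stream.end();
        }
        return terms;
    }
}
```

In this sketch, calling incrementToken() without a prior reset() fails fast with an IllegalStateException. How a real TokenStream behaves when reset() is skipped depends on the implementation and the Lucene version.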
                
      was (Author: steve_rowe):
    bq. As it turns out, I'm having problems getting it to work with Solr 4.1.0 (due to [this bug|https://github.com/healthonnet/hon-lucene-synonyms/issues/4), although 3.5.0 - 4.0.0 all work nicely.

I commented on the bug with more details, but basically you need to call reset() before using any tokenstream.
                  
> Query-time multi-word synonym expansion
> ---------------------------------------
>
>                 Key: SOLR-4381
>                 URL: https://issues.apache.org/jira/browse/SOLR-4381
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Nolan Lawson
>            Priority: Minor
>              Labels: multi-word, queryparser, synonyms
>             Fix For: 4.2, 5.0
>
>         Attachments: SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory] caution that index-time synonym expansion should be preferred to query-time synonym expansion, due to the way multi-word synonyms are treated and how IDF values can be boosted artificially. But query-time expansion should have huge benefits, given that changes to the synonyms don't require re-indexing, the index size stays the same, and the IDF values for the documents don't get permanently altered.
> The proposed solution is to move the synonym expansion logic out of the analysis chain (whether query-time or index-time) and into a new QueryParser.  See the attached patch for an implementation.
> The core Lucene functionality is untouched.  Instead, the EDismaxQParser is extended, and synonym expansion is done on-the-fly.  Queries are parsed into a lattice (i.e. all possible synonym combinations), while individual components of the query are still handled by the EDismaxQParser itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained, so it invites experimentation and improvement.  And I think it fits in well with the merry band of misfit query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and [the GitHub page for the code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently fixes SOLR-3390 (highlighting problems with multi-word synonyms) and LUCENE-4499 (better support for multi-word synonyms).
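
The "lattice" idea in the description above (all possible synonym combinations for a query) can be illustrated with a small standalone sketch. This is not the code from the attached patch. SynonymLatticeSketch, expand() and the sample synonym map are hypothetical, and a real parser would presumably hand each expansion to the underlying query parser rather than return plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only: enumerates every path through a synonym lattice. */
final class SynonymLatticeSketch {

    /**
     * Returns the original query plus every variant in which a matching
     * (possibly multi-word) phrase is replaced by one of its synonyms.
     */
    static List<String> expand(List<String> tokens, Map<List<String>, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        walk(tokens, 0, synonyms, new ArrayDeque<>(), out);
        return out;
    }

    private static void walk(List<String> tokens, int pos,
                             Map<List<String>, List<String>> synonyms,
                             Deque<String> path, List<String> out) {
        if (pos == tokens.size()) {
            out.add(String.join(" ", path));    // one complete path through the lattice
            return;
        }
        // Branch 1: keep the original token at this position.
        path.addLast(tokens.get(pos));
        walk(tokens, pos + 1, synonyms, path, out);
        path.removeLast();

        // Branch 2: replace any synonym phrase that starts at this position.
        for (Map.Entry<List<String>, List<String>> entry : synonyms.entrySet()) {
            List<String> phrase = entry.getKey();
            int end = pos + phrase.size();
            if (phrase.isEmpty() || end > tokens.size()) {
                continue;                       // an empty phrase would never advance
            }
            if (tokens.subList(pos, end).equals(phrase)) {
                for (String alternative : entry.getValue()) {
                    path.addLast(alternative);
                    walk(tokens, end, synonyms, path, out);
                    path.removeLast();
                }
            }
        }
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch.
        Map<List<String>, List<String>> synonyms = new LinkedHashMap<>();
        synonyms.put(Arrays.asList("dog"), Arrays.asList("hound", "canine"));
        synonyms.put(Arrays.asList("dog", "food"), Arrays.asList("kibble"));
        for (String variant : expand(Arrays.asList("dog", "food", "bowl"), synonyms)) {
            System.out.println(variant);
        }
    }
}
```

With the sample map, the query "dog food bowl" yields four variants: the original, "hound food bowl", "canine food bowl", and "kibble bowl". The multi-word entry consumes two tokens at once, a replacement that token-by-token expansion could not make.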
