Posted to dev@lucene.apache.org by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2013/03/17 04:36:14 UTC

[jira] [Commented] (SOLR-4381) Query-time multi-word synonym expansion

    [ https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604491#comment-13604491 ] 

Otis Gospodnetic commented on SOLR-4381:
----------------------------------------

bq. In general, I agree with you that some rapid iteration outside of the Solr core would probably be a better approach than outright integration. Please consider my "merge request" withdrawn; I'll let the code incubate for a bit, and then look into integration later.

Has that time come by any chance?

                
> Query-time multi-word synonym expansion
> ---------------------------------------
>
>                 Key: SOLR-4381
>                 URL: https://issues.apache.org/jira/browse/SOLR-4381
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Nolan Lawson
>            Priority: Minor
>              Labels: multi-word, queryparser, synonyms
>             Fix For: 4.3
>
>         Attachments: SOLR-4381-2.patch, SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory] caution that index-time synonym expansion should be preferred to query-time synonym expansion, due to the way multi-word synonyms are treated and how IDF values can be boosted artificially. But query-time expansion has significant advantages: changes to the synonyms don't require re-indexing, the index size stays the same, and the IDF values for the documents don't get permanently altered.
> The proposed solution is to move the synonym expansion logic out of the analysis chain (either query- or index-time) and into a new QueryParser. See the attached patch for an implementation.
> The core Lucene functionality is untouched. Instead, the EDismaxQParser is extended, and synonym expansion is done on the fly. Queries are parsed into a lattice (i.e. all possible synonym combinations), while individual components of the query are still handled by the EDismaxQParser itself.
> It's not an ideal solution by any stretch, but it's nice and self-contained, so it invites experimentation and improvement. And I think it fits in well with the merry band of misfit query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and [the Github page for the code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently fixes SOLR-3390 (highlighting problems with multi-word synonyms) and LUCENE-4499 (better support for multi-word synonyms).
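To make the "lattice of all possible synonym combinations" idea concrete, here is a minimal Python sketch that enumerates every variant of a tokenized query under a set of multi-word synonym rules. It is a hypothetical illustration, not code from the attached patch or from hon-lucene-synonyms: the `expand` function, the tuple-keyed synonym map, and the sample synonyms are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Phrase = Tuple[str, ...]


def expand(tokens: Sequence[str],
           synonyms: Dict[Phrase, List[Phrase]]) -> List[Phrase]:
    """Enumerate every query variant reachable by replacing non-overlapping
    (possibly multi-word) synonym matches. The unmodified query comes first;
    duplicates are dropped while preserving order."""
    # Longest left-hand side in the (hypothetical) synonym map.
    max_len = max((len(k) for k in synonyms), default=0)
    results: List[Phrase] = []
    seen = set()

    def walk(i: int, prefix: Phrase) -> None:
        if i == len(tokens):
            if prefix not in seen:
                seen.add(prefix)
                results.append(prefix)
            return
        # Branch 1: keep the token at position i unchanged.
        walk(i + 1, prefix + (tokens[i],))
        # Branch 2: if a synonym rule matches tokens[i:i+span],
        # substitute each alternative and skip past the matched span.
        for span in range(1, max_len + 1):
            key = tuple(tokens[i:i + span])
            if len(key) == span and key in synonyms:
                for alt in synonyms[key]:
                    walk(i + span, prefix + alt)

    walk(0, ())
    return results


if __name__ == "__main__":
    # Hypothetical synonym rules: one multi-word, one single-word.
    rules = {
        ("dog", "bite"): [("canine", "bite"), ("dog", "attack")],
        ("treatment",): [("therapy",)],
    }
    for variant in expand(["dog", "bite", "treatment"], rules):
        print(" ".join(variant))
    # dog bite treatment
    # dog bite therapy
    # canine bite treatment
    # canine bite therapy
    # dog attack treatment
    # dog attack therapy
```

Because multi-word matches are substituted as whole spans, "dog bite" is never split into an unrelated "canine" + "attack" mix, which is the kind of problem that token-by-token expansion in the analysis chain can cause. In the approach described above, each variant would then be parsed by the EDismaxQParser and the resulting sub-queries combined; how they are combined and weighted is not shown here and is left as an assumption.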

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
