Posted to dev@lucene.apache.org by "Hemant Verma (JIRA)" <ji...@apache.org> on 2013/06/05 10:43:20 UTC

[jira] [Comment Edited] (SOLR-4381) Query-time multi-word synonym expansion

    [ https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675702#comment-13675702 ] 

Hemant Verma edited comment on SOLR-4381 at 6/5/13 8:42 AM:
------------------------------------------------------------

While using this patch I found one scenario in which it does not work properly.
My synonyms list contains the following entries:
       pepsi,pepsico,pbg
       outsourcing,rpo,offshoring

The synonym expansion differs when any of these words is prefixed with a stopword.

Search Keyword       Expanded Result
--------------       ---------------
pepsi                pepsi, pepsico, pbg
pbg                  pepsi, pepsico, pbg
the pepsi            pepsi, pepsico
the pbg              pepsi, pbg
outsourcing          outsourc, offshor, rpo
the outsourcing      outsourc, offshor

The expanded results above show that when a keyword from the synonym list is prefixed with a stopword, the expansion misses some of its synonyms.
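
The expected behaviour can be written down as a small check. This is only a toy model in Python, not code from the patch: the names SYNONYM_GROUPS, STOPWORDS and expand are hypothetical, the stopword list is an assumption, and stemming (which produces "outsourc" and "offshor" in the results above) is left out.

```python
# Toy model of the expected expansion; NOT the SOLR-4381 implementation.
# All names below are hypothetical and exist only for illustration.
SYNONYM_GROUPS = [
    ["pepsi", "pepsico", "pbg"],
    ["outsourcing", "rpo", "offshoring"],
]
STOPWORDS = {"the"}  # assumed stopword list


def expand(query):
    """Return the synonyms expected for a query, skipping stopwords."""
    expanded = []
    for token in query.lower().split():
        if token in STOPWORDS:
            continue
        # Use the synonym group containing the token, or the token itself.
        group = next((g for g in SYNONYM_GROUPS if token in g), [token])
        for term in group:
            if term not in expanded:
                expanded.append(term)
    return expanded
```

Under this model, expand("the pepsi") and expand("pepsi") return the same three terms, whereas the patch drops one of them in the stopword-prefixed case.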
                
> Query-time multi-word synonym expansion
> ---------------------------------------
>
>                 Key: SOLR-4381
>                 URL: https://issues.apache.org/jira/browse/SOLR-4381
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Nolan Lawson
>            Priority: Minor
>              Labels: multi-word, queryparser, synonyms
>             Fix For: 4.4
>
>         Attachments: SOLR-4381-2.patch, SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory] caution that index-time synonym expansion should be preferred to query-time synonym expansion, due to the way multi-word synonyms are treated and how IDF values can be boosted artificially. But query-time expansion should have huge benefits, given that changes to the synonyms don't require re-indexing, the index size stays the same, and the IDF values for the documents don't get permanently altered.
> The proposed solution is to move the synonym expansion logic out of the analysis chain (either query- or index-time) and into a new QueryParser.  See the attached patch for an implementation.
> The core Lucene functionality is untouched.  Instead, the EDismaxQParser is extended, and synonym expansion is done on-the-fly.  Queries are parsed into a lattice (i.e. all possible synonym combinations), while individual components of the query are still handled by the EDismaxQParser itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained, so it invites experimentation and improvement.  And I think it fits in well with the merry band of misfit query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and [the Github page for the code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently fixes SOLR-3390 (highlighting problems with multi-word synonyms) and LUCENE-4499 (better support for multi-word synonyms).
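
The "lattice" mentioned in the quoted description (all possible synonym combinations of a query) can be illustrated with a short Python sketch. This is a simplified model under stated assumptions, not the parser from the patch: the SYNONYMS table and the synonym_lattice function are hypothetical, and only single-token query terms are looked up, although their replacements may be multi-word.

```python
from itertools import product

# Hypothetical synonym table for illustration (not taken from the patch).
SYNONYMS = {
    "dog": ["dog", "hound", "canis familiaris"],
    "bite": ["bite", "nibble"],
}


def synonym_lattice(query):
    """Enumerate every combination of per-token alternatives."""
    # Each token contributes its list of alternatives, or itself if unknown.
    alternatives = [SYNONYMS.get(token, [token]) for token in query.split()]
    # The Cartesian product walks every path through the lattice.
    return [" ".join(path) for path in product(*alternatives)]
```

For the query "dog bite" this yields six variants (three alternatives for "dog" times two for "bite"). According to the description, each such component is then still handled by the EDismaxQParser itself.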
