Posted to dev@lucene.apache.org by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2012/04/22 23:59:34 UTC

[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms

    [ https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259278#comment-13259278 ] 

Jan Høydahl commented on SOLR-3390:
-----------------------------------

This happens because the multi-word synonym is inserted starting at the position of the original term, so its later words share positions with the terms that follow. After analysis we have no way to tell whether a query matched the synonym or the original term, because that information is lost.
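To make the overlap concrete, here is a toy Python model of flat token positions. It is only a sketch, not Solr's implementation: the helper names (index_positions, highlight) are hypothetical, tokenization is a plain whitespace split, and it assumes the synonym words are stacked on consecutive positions starting at the original term.

```python
# Toy model of flat token positions; NOT Solr's actual implementation.
# Synonym entry taken from the report: "dns, domain name system".
SYNONYMS = {"dns": ["domain", "name", "system"]}


def index_positions(text):
    """Map each token position to the set of terms stored at that position."""
    tokens = text.lower().split()
    positions = {}
    for pos, tok in enumerate(tokens):
        positions.setdefault(pos, set()).add(tok)
        # Assumption: synonym words occupy consecutive positions starting
        # at the original term, overlapping whatever words follow it.
        for offset, syn in enumerate(SYNONYMS.get(tok, [])):
            positions.setdefault(pos + offset, set()).add(syn)
    return positions


def highlight(text, query):
    """Wrap in <em> every original word whose position holds the query term."""
    positions = index_positions(text)
    out = []
    for pos, word in enumerate(text.split()):
        if query.lower() in positions[pos]:
            out.append("<em>" + word + "</em>")
        else:
            out.append(word)
    return " ".join(out)


print(highlight("A sample dns entry explaining the details", "name"))
print(highlight("A sample dns entry explaining the details", "system"))
```

In this model "name" lands on the position of "entry" and "system" on the position of "explaining", which is the same wrong highlighting the reporter describes.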

This case would be solved by encoding term positions as a graph, so that the synonym node "domain name system" occupies the same position as the original node "dns". This, however, would be a major change.
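One hypothetical way to picture such a graph encoding, sketched in Python. Nothing here is an existing Solr or Lucene API: the Edge type, build_graph and highlight_graph are made-up names for illustration, and tokenization is again a plain whitespace split. Each token is an edge between nodes, and the synonym words form a side path that leaves and rejoins the main path at the same nodes as "dns".

```python
from dataclasses import dataclass

# Synonym entry taken from the report: "dns, domain name system".
SYNONYMS = {"dns": ["domain", "name", "system"]}


@dataclass(frozen=True)
class Edge:
    term: str      # token text
    start: object  # node the token leaves from
    end: object    # node the token arrives at
    source: int    # index of the original word this token derives from


def build_graph(text):
    """Build token edges; synonym paths span exactly one original word."""
    words = text.split()
    edges = []
    for i, word in enumerate(words):
        tok = word.lower()
        edges.append(Edge(tok, i, i + 1, i))
        syns = SYNONYMS.get(tok, [])
        # Interior nodes of the synonym path are private side nodes, so the
        # whole path leaves node i and rejoins the main path at node i + 1.
        nodes = [i] + [(i, k) for k in range(1, len(syns))] + [i + 1]
        for k, syn in enumerate(syns):
            edges.append(Edge(syn, nodes[k], nodes[k + 1], i))
    return words, edges


def highlight_graph(text, query):
    """Highlight the original word behind every edge matching the query."""
    words, edges = build_graph(text)
    hits = {e.source for e in edges if e.term == query.lower()}
    return " ".join(
        "<em>" + w + "</em>" if i in hits else w for i, w in enumerate(words)
    )


print(highlight_graph("A sample dns entry explaining the details", "name"))
```

Because every word of the synonym path stays attached to the span of "dns", a match on "name" or "system" highlights "dns" rather than the words that follow it.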
                
> Highlighting issue with multi-word synonyms causes to highlight the wrong terms
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-3390
>                 URL: https://issues.apache.org/jira/browse/SOLR-3390
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter, query parsers
>    Affects Versions: 3.6
>         Environment: Windows 7. (Development machine, not the server) 
>            Reporter: Rahul Babulal
>              Labels: highlighter, multi-word, solr, synonyms
>
> I am using Solr 3.6, and when I have multi-word synonyms, the highlighting results have the wrong word highlighted.
> If I have the below entry in the synonyms file:
> dns, domain name system 
> If I index something like: "A sample dns entry explaining the details".
> Searching for "name" (without quotes), I get the following in the highlight results/snippets: "A sample dns <em>entry</em> explaining the details". (The token "entry" overlaps with the token "name" in analysis.jsp.)
> Searching for "system" (without quotes), I get the following in the highlight results/snippets: "A sample dns entry <em>explaining</em> the details". (The token "explaining" overlaps with the token "system" in analysis.jsp.)
> Here is my schema field Type:
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <charFilter class="solr.HTMLStripCharFilterFactory"/>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.PorterStemFilterFactory"/>        
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>     </fieldType>
