Posted to solr-dev@lucene.apache.org by "Mike Klaas (JIRA)" <ji...@apache.org> on 2008/06/05 07:05:45 UTC

[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values

    [ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602541#action_12602541 ] 

Mike Klaas commented on SOLR-556:
---------------------------------

Ah, I see what the problem is: although tokens from different values cannot appear in the same fragment (due to the semantics of MultiValuedTokenFilter), the non-token text (typically punctuation) from different values can bleed into the same fragment, since Lucene's highlighter can only start a new fragment on a token boundary.
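
To make the bleed concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not Lucene's or Solr's actual code: the function name is hypothetical, tokens are simply runs of word characters, values are concatenated without a separator, and the text between two tokens is assumed to be emitted together with the token that follows it.

```python
import re

def fragment_on_token_boundaries(values):
    """Toy model: aim for one fragment per value, but only allow a
    fragment break immediately before a token (hypothetical helper)."""
    text = "".join(values)

    # Tokenize each value separately and shift offsets into the
    # concatenated text, so no token ever spans two values.
    tokens = []  # (start, end, value_index)
    offset = 0
    for idx, value in enumerate(values):
        for m in re.finditer(r"\w+", value):
            tokens.append((offset + m.start(), offset + m.end(), idx))
        offset += len(value)

    fragments = []
    current = []
    last_end = 0        # end offset of the previous token
    prev_value = None   # value index of the previous token
    for start, end, value_idx in tokens:
        # A break can only happen here, right before a token.
        if prev_value is not None and value_idx != prev_value:
            fragments.append("".join(current))
            current = []
        # Assumption: non-token text since the previous token goes
        # into whichever fragment is current when this token is emitted.
        current.append(text[last_end:start])
        current.append(text[start:end])
        last_end = end
        prev_value = value_idx

    current.append(text[last_end:])  # trailing non-token text
    fragments.append("".join(current))
    return fragments

print(fragment_on_token_boundaries(["alpha beta.", "(gamma) delta"]))
# prints ['alpha beta', '.(gamma) delta']
```

The "." that ends the first value shows up at the start of the second fragment: the tokens stay separated, but the punctuation crosses the value boundary.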

Unfortunately, SOLR-553 was committed a day after you submitted your patch and rearranged the code slightly, so the patch no longer applies. Could you sync the patch with trunk? I think the basic approach is sound.

> Highlighting of multi-valued fields returns snippets which span multiple different values
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-556
>                 URL: https://issues.apache.org/jira/browse/SOLR-556
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 1.3
>         Environment: Tomcat 5.5
>            Reporter: Lars Kotthoff
>            Assignee: Mike Klaas
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: solr-highlight-multivalued-example.xml, solr-highlight-multivalued.patch
>
>
> When highlighting multi-valued fields, the highlighter sometimes returns snippets which span multiple values, e.g. with values "foo" and "bar" and search term "ba" the highlighter will create the snippet "foo<em>ba</em>r". Furthermore, it sometimes returns shorter snippets than it should, e.g. with value "foobar" and search term "oo" it will create the snippet "<em>oo</em>" regardless of hl.fragsize.
> I have been unable to determine the real cause of this, or indeed what actually goes on at all. To reproduce the problem, I've used the following steps:
> * Create an index with multi-valued fields; one document should have at least 3 values for these fields (in my case, strings of between 5 and 15 Japanese characters, although as far as I can tell plain old ASCII should produce the same effect).
> * Search for part of a value in such a field with highlighting enabled. The additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, hl.mergeContiguous=true (changing the parameters does not seem to have any effect on the result, though).
> * The highlighted snippets should show the effects described above.
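
For anyone trying to reproduce this, the search step above might be issued with a request URL like the one built below. This is a sketch only: the base URL, the /select handler path, the field name "tags", and the helper name are assumptions for illustration and do not come from the report. Only the hl.fragsize, hl.requireFieldMatch and hl.mergeContiguous values are taken from it; hl and hl.fl are the usual parameters for switching highlighting on and choosing the field.

```python
from urllib.parse import urlencode

def build_highlight_url(base_url, field, term):
    """Build a Solr select URL with the highlighting parameters from
    the report (hypothetical helper; no request is actually sent)."""
    params = {
        "q": f"{field}:{term}",        # search for part of a value
        "hl": "true",                  # enable highlighting
        "hl.fl": field,                # field to highlight
        "hl.fragsize": 70,
        "hl.requireFieldMatch": "true",
        "hl.mergeContiguous": "true",
    }
    return f"{base_url}/select?{urlencode(params)}"

# Assumed local Tomcat deployment and field name.
print(build_highlight_url("http://localhost:8080/solr", "tags", "ba"))
```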
