Posted to solr-user@lucene.apache.org by Sreehareesh Kaipravan Meethaleveetil <sm...@sapient.com> on 2013/09/04 13:47:15 UTC

Solr highlighting fragment issue

Hi,
I'm having some issues with Solr search results (using Solr 1.4). I have enabled highlighting of the searched text (hl=true) and set the fragment size to 500 (hl.fragsize=500) in the search query.
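For reference, a request carrying the two parameters mentioned above could be built as follows. This is a minimal sketch. The endpoint URL is an assumption (Solr 1.4's example server listened on port 8983 with the /solr/select handler, but a real deployment may differ), and only q, hl and hl.fragsize are set. The helper name highlight_query_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Solr 1.4 example-server defaults; change host, port and path
# to match your own installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

def highlight_query_url(term: str, fragsize: int = 500) -> str:
    """Build a select URL that searches for term with highlighting enabled."""
    params = {
        "q": term,
        "hl": "true",             # enable highlighting
        "hl.fragsize": fragsize,  # requested fragment size, in characters
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)

print(highlight_query_url("grandfather"))
# http://localhost:8983/solr/select?q=grandfather&hl=true&hl.fragsize=500
```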
Below is a screenshot of the results shown when I searched for the term 'grandfather' (2 results are displayed).
Now I have a couple of problems with this.

1. In the search results, the keyword appears inconsistently, sometimes toward the start of the text and sometimes toward the end. I'd like to control the number of characters appearing before and after the keyword match (the highlighted term). More specifically, I'd like the keyword match to appear somewhere around the middle of the resulting text.

2. The total number of characters appearing in the search result never equals the fragment size I specified (500 characters). It varies considerably (for example, 408 or 520).
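One way to put numbers on both problems is to measure each returned snippet: its visible length once the highlight markup is removed, and how far into it the first match starts. The sketch below assumes Solr's default <em> and </em> highlight markers (hl.simple.pre / hl.simple.post); if a request overrides them, PRE and POST must change too. The function name snippet_stats is hypothetical.

```python
from typing import Optional

# Assumption: default highlight markers; adjust if hl.simple.pre/post are overridden.
PRE, POST = "<em>", "</em>"

def snippet_stats(snippet: str) -> dict:
    """Visible length of a highlighted snippet and where its first match starts."""
    plain = snippet.replace(PRE, "").replace(POST, "")
    # No markup precedes the first PRE, so its index is also its offset in `plain`.
    first = snippet.find(PRE)
    relative: Optional[float] = None
    if first >= 0 and plain:
        relative = round(first / len(plain), 2)
    return {
        "length": len(plain),
        "match_offset": first if first >= 0 else None,
        "relative_position": relative,  # 0.0 = start, 0.5 = middle, 1.0 = end
    }

print(snippet_stats("my <em>grandfather</em> was a farmer"))
# {'length': 27, 'match_offset': 3, 'relative_position': 0.11}
```

A relative_position near 0.5 would mean the match is centered; the length can be compared directly with hl.fragsize.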
Please share your thoughts on how to achieve both of these.
[Screenshot of the search results for 'grandfather'; image not preserved]
Thanks & Regards,
Sreehareesh KM

RE: Solr highlighting fragment issue

Posted by Bryan Loofbourrow <bl...@knowledgemosaic.com>.

I can’t see your screenshot, but it doesn’t really matter.



If I remember correctly how this works, I think you’re going to have a
hard time getting where you want to go. In your position, I would push
back on both of those requirements rather than try to solve the problem.



For (1), the issue is that, IIRC, the highlighter breaks your documents
into fragments BEFORE it knows where the matches are. I think you’d have
to rework the algorithm quite seriously to get the result you want.
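A toy model of this kind of fragmenting may help. It is a simplified sketch, not Lucene's actual Fragmenter code: it cuts the text at token boundaries roughly every fragsize characters without looking at the query, and only afterwards picks the fragment with the most matches, so the match can land anywhere inside the chosen fragment, including at its very start. The names grid_fragments and best_fragment are hypothetical.

```python
import re
from typing import List

def grid_fragments(text: str, fragsize: int) -> List[str]:
    """Cut text into ~fragsize-character fragments at token boundaries.

    Simplified model (not Lucene's code): boundaries depend only on character
    offsets, never on where the query terms occur.
    """
    fragments: List[str] = []
    frag_start = 0
    limit = fragsize
    for tok in re.finditer(r"\S+", text):
        # A token is never split: if it would end past the limit, the current
        # fragment closes just before it and a new one starts at the token.
        if tok.end() > limit and tok.start() > frag_start:
            fragments.append(text[frag_start:tok.start()])
            frag_start = tok.start()
            limit = frag_start + fragsize
    fragments.append(text[frag_start:])
    return fragments

def best_fragment(fragments: List[str], term: str) -> str:
    """Choose the fragment containing the most occurrences of term (case-insensitive)."""
    return max(fragments, key=lambda f: f.lower().count(term.lower()))

text = "one two three grandfather four five six seven eight nine"
frags = grid_fragments(text, 20)
print([len(f) for f in frags])                    # [14, 17, 21, 4]
print(repr(best_fragment(frags, "grandfather")))  # 'grandfather four '
```

With a 20-character target, the chosen fragment happens to begin with the matched word, and none of the fragments is exactly 20 characters long.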



For (2), you may well be able to tune the fragmenter to get closer to
your desired number of characters, either by writing your own or by using
the available regex options and the like. But getting an exact number of
characters does not seem reasonable, because I’m pretty sure there is a
constraint that a matching term must appear in its entirety in one
fragment, and also that fragments are sometimes concatenated. Imagine, for
example, a matched phrase that starts in one fragment and ends in another.
That goes back to the first point.
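The phrase case can be illustrated with a small, hypothetical helper: given fragments already cut at token boundaries, it returns the shortest run of adjacent fragments that contains the whole phrase, concatenated. This only illustrates the argument; it is not how Lucene's highlighter actually merges fragments, and the fragment list is made-up data of the kind a fragmenter aiming for about 20 characters might produce.

```python
from typing import List, Optional

def fragment_for_phrase(fragments: List[str], phrase: str) -> Optional[str]:
    """Return the shortest run of adjacent fragments containing phrase, joined."""
    n = len(fragments)
    for width in range(1, n + 1):  # try single fragments first, then pairs, ...
        for i in range(n - width + 1):
            merged = "".join(fragments[i:i + width])
            if phrase in merged:
                return merged
    return None

# Made-up fragments, each cut at a word boundary near a 20-character target.
fragments = ["one two three ", "grandfather four ", "five six seven eight ", "nine"]
print([len(f) for f in fragments])  # [14, 17, 21, 4]
merged = fragment_for_phrase(fragments, "four five")
print(repr(merged), len(merged))    # 'grandfather four five six seven eight ' 38
```

No single fragment hits the 20-character target exactly, and the one that has to hold the phrase ends up nearly twice that size.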



So if you absolutely must have both of these (and the second one is
strange, since it implies that your fragments will often start and end in
the middle of words), then I guess you would need to rewrite the
fragmenting algorithm to drive fragmenting from the matches.
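Driving the snippet from the match could look roughly like this hypothetical client-side helper, which assumes you have the full stored field text rather than Solr's highlighter output. It centers a window of size characters on the first match where the text allows it. The optional snap_to_words flag trims the window back to whole words, which shows the trade-off described above: you can have an exact length or whole words, but not always both.

```python
from typing import Optional

def centered_snippet(text: str, term: str, size: int = 500,
                     snap_to_words: bool = False) -> Optional[str]:
    """Return up to `size` characters of text, centered on the first match where possible.

    Hypothetical client-side helper working on the full field text.
    Case-insensitive; assumes lowercasing does not change the text's length
    (true for ASCII).
    """
    pos = text.lower().find(term.lower())
    if pos < 0:
        return None
    mid = pos + len(term) // 2
    start = max(0, mid - size // 2)
    end = min(len(text), start + size)
    start = max(0, end - size)  # near the end of the text, slide the window left
    if snap_to_words:
        # Move start forward to a word boundary, but never past the match.
        if start > 0 and not text[start - 1].isspace():
            nxt = text.find(" ", start, pos)
            if nxt != -1:
                start = nxt + 1
        # Move end back to a word boundary, but never into the match.
        if end < len(text) and not text[end].isspace():
            prev = text.rfind(" ", pos + len(term), end)
            if prev != -1:
                end = prev
    return text[start:end]

text = "aaaa bbbb cccc grandfather dddd eeee ffff"
print(repr(centered_snippet(text, "grandfather", 21)))  # 'cccc grandfather dddd'
print(repr(centered_snippet(text, "grandfather", 19)))  # 'ccc grandfather ddd'
print(repr(centered_snippet(text, "grandfather", 19, snap_to_words=True)))  # 'grandfather'
```

With size 19 the exact-length window cuts words in half; snapping to word boundaries keeps whole words but shrinks the snippet to 11 characters.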



-- Bryan