Posted to solr-user@lucene.apache.org by Juan Carlos Serrano <jc...@gmail.com> on 2014/02/19 16:53:07 UTC

Exact fragment length in highlighting

Hello everybody,

I'm using Solr 4.6.1, and I'd like to know if there's a way to control
exactly the number of characters in a fragment used in highlights. If I use
hl.fragsize=70, the length of the fragments that I get varies, and I often
get results that are 90 characters long.

Regards and thanks in advance,

Juan Carlos

Re: Exact fragment length in highlighting

Posted by Jason Hellman <jh...@innoventsolutions.com>.
Juan,

Pay close attention to the boundary scanner you’re employing:

http://wiki.apache.org/solr/HighlightingParameters#hl.boundaryScanner

You can set the type explicitly (hl.bs.type) with options such as CHARACTER, WORD, SENTENCE, and LINE. The default is WORD (as the wiki indicates), and I presume this is what you are using.
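
As a rough illustration of the parameters mentioned above, here is a minimal Python sketch that only builds a request URL. Several things in it are assumptions, not something stated in this thread: the host and core name (localhost:8983, collection1), the field name "content", that the field is indexed with term vectors, positions, and offsets so the FastVectorHighlighter can be used, and that solrconfig.xml defines a boundary scanner named "breakIterator" (as the stock example config does). The helper name build_highlight_query is hypothetical.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, query, field, fragsize=70, bs_type="WORD"):
    """Build a Solr select URL with highlighting parameters (illustrative only)."""
    params = {
        "q": query,
        "wt": "json",
        "hl": "true",
        "hl.fl": field,                         # field to highlight (assumed name)
        "hl.fragsize": str(fragsize),           # target fragment size, not a hard limit
        "hl.useFastVectorHighlighter": "true",  # boundary scanners apply to this highlighter
        "hl.boundaryScanner": "breakIterator",  # assumes a scanner with this name in solrconfig.xml
        "hl.bs.type": bs_type,                  # CHARACTER, WORD, SENTENCE, or LINE
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host, core, and field names.
url = build_highlight_query(
    "http://localhost:8983/solr/collection1/select",
    "content:highlight",
    "content",
)
print(url)
```

Switching bs_type between CHARACTER and WORD and comparing the returned fragment lengths is probably the quickest way to see how much the scanner moves the fragment edges away from hl.fragsize.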

Be careful about using explicit characters.  I had an interesting case of highlight returns that looked like this:

> This is a highlight
> Here is another highlight
> Yes, another one, etc…

It was a bit maddening trying to figure out why “>” was in the highlight. It turned out it was XML content, and the character boundary clipped the trailing “>” based on the boundary rules.

In any case, you should be able to achieve a pretty flexible result depending on what you’re really after with the right combination of settings.

Jason

Re: Exact fragment length in highlighting

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Juan,

Are you counting the number of characters in the snippet as rendered in HTML?

I think the pre and post strings (HTML markup that is not displayed) are causing that difference.
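
One way to check this is to compare the raw fragment length with the length after removing the markers. The following is a small Python sketch; it assumes the default hl.simple.pre and hl.simple.post values (<em> and </em>), and the function name visible_length is only illustrative.

```python
def visible_length(fragment, pre="<em>", post="</em>"):
    """Length of a highlight fragment with the pre/post markers removed."""
    return len(fragment.replace(pre, "").replace(post, ""))


fragment = "This is a <em>highlight</em>"
raw = len(fragment)                 # counts the markup too
visible = visible_length(fragment)  # counts only the displayed text
print(raw, visible)
```

If the visible length is close to hl.fragsize while the raw length is not, the markers explain the difference; if both are well above 70, the boundary scanner is the more likely cause.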

Ahmet
