
Posted to solr-user@lucene.apache.org by Zisis Tachtsidis <zi...@runbox.com> on 2015/01/20 17:45:11 UTC

PostingsHighlighter highlighted snippet size (fragsize)

Hi all,

I'm using SolrCloud 4.10.0 and trying to incorporate
PostingsSolrHighlighter. One issue I'm having is that "hl.fragsize" appears
to have no effect with PostingsSolrHighlighter. How can I limit
the size of the highlighted text? I get highlighted results, but the
snippet size varies and can be quite large in some cases (>1000 chars). Note
that limiting the snippet size works for me with hl.fragsize and the default
Solr highlighter.

The field I want highlighting on is defined as
<field name="highlighted_text" type="text_en" indexed="true" stored="true"
storeOffsetsWithPositions="true" multiValued="true" />

"text_en" is the default definition. I've even tried using only
StandardTokenizer (no filters) for index/query chains to avoid issues
described at https://issues.apache.org/jira/browse/LUCENE-4641.

The highlighter is defined as follows in solrconfig.xml (all other
highlight components are commented out)
<searchComponent name="highlight" class="solr.HighlightComponent">
   <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
</searchComponent>

My search query looks like
/select?q=highlighted_text:introduction&wt=json&indent=true
&hl=true&hl.fl=highlighted_text&hl.simple.pre=<em>&hl.simple.post=</em>



--
View this message in context: http://lucene.472066.n3.nabble.com/PostingsHighlighter-highlighted-snippet-size-fragsize-tp4180634.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: PostingsHighlighter highlighted snippet size (fragsize)

Posted by Zisis Tachtsidis <zi...@runbox.com>.

It seems I have found a solution.

By default, PostingsHighlighter uses Java's SENTENCE BreakIterator, so it
breaks the text into one fragment per sentence.
In my text_en analysis chain, though, I was using a filter that lowercases
the input, and this seems to interfere with the logic of the SENTENCE
BreakIterator. Removing the filter did the trick.
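
To make the sentence-based fragmenting concrete, here is a minimal sketch
that uses only the JDK's java.text.BreakIterator, with no Solr or Lucene
classes. The class name SentenceFragments and the sample text are made up for
illustration, and Locale.ROOT is an assumption about the locale in use. It
shows that the fragment length follows the sentence length, which is why
snippet sizes vary. It also prints the boundaries for a lowercased copy of
the text so the two can be compared; where those breaks fall depends on the
JDK's rules, so nothing is asserted about them here.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceFragments {

    // Split text into fragments at the boundaries reported by the
    // JDK sentence BreakIterator (Locale.ROOT is an assumption).
    static List<String> split(String text) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        List<String> fragments = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            fragments.add(text.substring(start, end));
        }
        return fragments;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, not taken from the index discussed above.
        String text = "Solr is a search server. It is built on Lucene. "
                + "Highlighting returns the passages that matched the query.";

        System.out.println("Original case:");
        for (String fragment : split(text)) {
            System.out.println("  " + fragment.length() + " chars: " + fragment.trim());
        }

        // Same text lowercased, printed only for comparison.
        System.out.println("Lowercased:");
        for (String fragment : split(text.toLowerCase(Locale.ROOT))) {
            System.out.println("  " + fragment.length() + " chars: " + fragment.trim());
        }
    }
}
```

A long sentence therefore produces a long fragment, regardless of any
fragsize-style limit.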

Apart from that, there is a new issue. I'm trying to search on one field
and highlight another, and this does not seem to work even if I use the
exact same analyzers for both fields. I get the correct results in the
highlighting section, but nothing is highlighted. Digging deeper, I've found
inside PostingsHighlighter.highlightFieldsAsObjects() (line 393 in version
4.10.3) that the fields to be highlighted are (I guess) the intersection of
the fields of the query terms (the fields used in the search query) and the
set of fields to be highlighted (defined by the hl.fl param). So, unless the
search query uses the field to be highlighted, I get no highlight.
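
The behaviour described above can be illustrated with a simplified sketch.
This is not the Lucene code; it is plain Java collections that mimic the
filtering I believe is happening. The class FieldTermFilter, the QueryTerm
type, and the "title" field are all hypothetical names introduced for the
example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldTermFilter {

    // Hypothetical stand-in for a query term: a (field, text) pair.
    static final class QueryTerm {
        final String field;
        final String text;

        QueryTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // For each field requested for highlighting, keep only the query terms
    // that target that same field. A field with no matching terms ends up
    // with an empty list, i.e. nothing to highlight.
    static Map<String, List<String>> termsPerField(List<QueryTerm> queryTerms,
                                                   List<String> highlightFields) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String field : highlightFields) {
            List<String> texts = new ArrayList<>();
            for (QueryTerm term : queryTerms) {
                if (term.field.equals(field)) {
                    texts.add(term.text);
                }
            }
            result.put(field, texts);
        }
        return result;
    }

    public static void main(String[] args) {
        // Search on a hypothetical "title" field, highlight "highlighted_text".
        List<QueryTerm> query = Arrays.asList(new QueryTerm("title", "introduction"));
        System.out.println(termsPerField(query, Arrays.asList("highlighted_text")));

        // Search and highlight on the same field.
        List<QueryTerm> sameField = Arrays.asList(new QueryTerm("highlighted_text", "introduction"));
        System.out.println(termsPerField(sameField, Arrays.asList("highlighted_text")));
    }
}
```

In the first case the highlight field ends up with no terms, which matches
what I'm seeing; in the second it keeps the term "introduction".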



--
View this message in context: http://lucene.472066.n3.nabble.com/PostingsHighlighter-highlighted-snippet-size-fragsize-tp4180634p4182596.html