Posted to solr-user@lucene.apache.org by "O. Klein" <kl...@octoweb.nl> on 2011/09/22 16:09:25 UTC

Snippets and Boundaryscanner in Highlighter

I'm testing the new BoundaryScanner in the highlighter, but I can't get it to
show more than 1 snippet.

<str name="f.content_text.hl.snippets">2</str>

Bug or am I doing something wrong?

--
View this message in context: http://lucene.472066.n3.nabble.com/Snippets-and-Boundaryscanner-in-Highlighter-tp3358898p3358898.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Snippets and Boundaryscanner in Highlighter

Posted by "O. Klein" <kl...@octoweb.nl>.
OK, I found the problem was in our new interface.

Your feedback made me look deeper. Thanks.


Re: Snippets and Boundaryscanner in Highlighter

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(11/09/23 20:03), O. Klein wrote:
> The regex fragmenter showed that there was enough content to show multiple
> snippets.
>
> The number of snippets has no effect with any of the breakIterator types.
> Only fragsize has an effect.
>
> Or does this highlighter not support multiple snippets?

This highlighter supports multiple snippets (as I showed in my first reply).

koji
-- 
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/

Re: Snippets and Boundaryscanner in Highlighter

Posted by "O. Klein" <kl...@octoweb.nl>.
The regex fragmenter showed that there was enough content to show multiple
snippets.

The number of snippets has no effect with any of the breakIterator types.
Only fragsize has an effect.

Or does this highlighter not support multiple snippets?

Re: Snippets and Boundaryscanner in Highlighter

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(11/09/23 8:57), O. Klein wrote:
> The content_text field is filled with text from PDFs, so this is not the
> problem. Besides, the regex fragmenter gives back multiple snippets as
> expected.

This doesn't show that BoundaryScanner has a bug. Highlighter's fragmenter
and FVH's FragmentsBuilder are totally different code.

> Have you tested whether a BoundaryScanner of type LINE gives back multiple
> snippets with your content?

No, I haven't. Do you mean the LINE type causes the problem? Can you get two snippets
if you use a WORD type BreakIteratorBoundaryScanner?

You can implement your own BoundaryScanner instead, if you think
the LINE BreakIterator doesn't work as you expect.
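
To see what the different BreakIterator types consider a boundary, it may
help to run them outside Solr first. The sketch below assumes that the
hl.bs.type values map onto the standard java.text.BreakIterator factory
methods; the class name, helper name and sample text are made up for
illustration. Note that the JDK's line instance reports positions where a
line may be wrapped, which is not necessarily the same as "one segment per
newline-terminated line".

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of Solr or Lucene.
class BreakIteratorDemo {

    // Splits the text into the consecutive segments delimited by the
    // boundaries that the given BreakIterator reports.
    static List<String> segments(BreakIterator it, String text) {
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample text is an assumption; substitute a piece of your own content.
        String text = "first line of text\nsecond line of text";
        System.out.println("LINE:     " + segments(BreakIterator.getLineInstance(Locale.ROOT), text));
        System.out.println("WORD:     " + segments(BreakIterator.getWordInstance(Locale.ROOT), text));
        System.out.println("SENTENCE: " + segments(BreakIterator.getSentenceInstance(Locale.ROOT), text));
    }
}
```

If the LINE segments turn out much shorter than the lines you expect, a
custom BoundaryScanner that looks for newline characters may be closer to
what you want.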

koji

Re: Snippets and Boundaryscanner in Highlighter

Posted by "O. Klein" <kl...@octoweb.nl>.
The content_text field is filled with text from PDFs, so this is not the
problem. Besides, the regex fragmenter gives back multiple snippets as
expected.

Have you tested whether a BoundaryScanner of type LINE gives back multiple
snippets with your content?


Re: Snippets and Boundaryscanner in Highlighter

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(11/09/23 7:59), O. Klein wrote:
> Thanks for your answer, but you are not using the BoundaryScanner

No. Whether or not you specify a BoundaryScanner, one is used implicitly,
because BaseFragmentsBuilder always uses it (SimpleBoundaryScanner is the default).

Try indexing a long text and highlighting terms at the very beginning and the very end of it:

q=A B

<doc>
<field name="content_text">A ... very looong text ... B</field>
</doc>
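
A test document of that shape can be generated rather than typed by hand.
This is only a sketch: the class and method names are made up, the "id"
field is an assumption (it is the unique key in the Solr example schema),
and the filler word is arbitrary. The id is inserted without XML escaping,
so keep it to plain letters and digits.

```java
// Hypothetical helper class, not part of Solr.
class LongDocSketch {

    // Builds a Solr XML <add> message whose content_text starts with "A",
    // ends with "B", and has fillerWords copies of "filler" in between.
    static String buildAddXml(String id, int fillerWords) {
        StringBuilder body = new StringBuilder("A");
        for (int i = 0; i < fillerWords; i++) {
            body.append(" filler");
        }
        body.append(" B");
        return "<add><doc>"
                + "<field name=\"id\">" + id + "</field>"
                + "<field name=\"content_text\">" + body + "</field>"
                + "</doc></add>";
    }

    public static void main(String[] args) {
        // 500 filler words keeps the two query terms far apart.
        System.out.println(buildAddXml("long-text-1", 500));
    }
}
```

Post the printed XML to your update handler, commit, and then query with
q=A B and the highlighting parameters you are testing.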

koji

Re: Snippets and Boundaryscanner in Highlighter

Posted by "O. Klein" <kl...@octoweb.nl>.
Thanks for your answer, but you are not using the BoundaryScanner

<str name="f.content_text.hl.boundaryScanner">breakIterator</str>
<str name="f.content_text.hl.bs.type">LINE</str>

was the config I used and with

<str name="f.content_text.hl.snippets">2</str>

I expect to see 2 lines, but I only see one.


Re: Snippets and Boundaryscanner in Highlighter

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(11/09/22 23:09), O. Klein wrote:
> I'm testing the new BoundaryScanner in the highlighter, but I can't get it to
> show more than 1 snippet.
>
> <str name="f.content_text.hl.snippets">2</str>
>
> Bug or am I doing something wrong?

I think your content_text is too short to produce more than one snippet.

Try the following with the Solr example (I'm using trunk):

1.
http://localhost:8983/solr/select?q=SD+AND+battery&fq=&fl=includes&hl=on&hl.fl=includes&hl.useFastVectorHighlighter=true&hl.snippets=2

You request 2 snippets, but Solr returns only 1:

<lst name="highlighting">
   <lst name="9885A004">
     <arr name="includes">
       <str>32MB <em>SD</em> card, USB cable, AV cable, <em>battery</em> </str>
     </arr>
   </lst>
</lst>

2.
http://localhost:8983/solr/select?q=SD+AND+battery&fq=&fl=includes&hl=on&hl.fl=includes&hl.useFastVectorHighlighter=true&hl.snippets=2&hl.fragsize=18

Now you request 2 snippets with a shorter fragsize, so Solr can return 2 snippets:

<lst name="highlighting">
   <lst name="9885A004">
     <arr name="includes">
       <str>32MB <em>SD</em> card, USB cable</str>
       <str>cable, <em>battery</em> </str>
     </arr>
   </lst>
</lst>
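
The difference between the two responses can be illustrated with a toy
model. This is a deliberately simplified sketch and not the actual FVH
code: it assumes each fragment is a window of fragsize characters that
starts a few characters before the first hit it contains, and that hits
fitting into the current window do not open a new fragment. The class
name, method names and MARGIN value are assumptions for illustration.

```java
// Toy model only; the real logic lives in Lucene's FragListBuilder classes.
class FragmentCountSketch {

    // Assumed number of characters a window may start before its first hit.
    static final int MARGIN = 6;

    // Returns {startOffset, endOffset} of the first occurrence of term in text.
    static int[] hit(String text, String term) {
        int start = text.indexOf(term);
        return new int[] {start, start + term.length()};
    }

    // Counts the windows of fragSize characters needed to cover all hits.
    // The hits must be sorted by start offset.
    static int countFragments(int[][] hits, int fragSize) {
        int count = 0;
        int windowEnd = 0;
        for (int[] h : hits) {
            if (count > 0 && h[1] <= windowEnd) {
                continue; // hit fits into the current fragment
            }
            int windowStart = Math.max(windowEnd, h[0] - MARGIN);
            windowEnd = windowStart + fragSize;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "32MB SD card, USB cable, AV cable, battery";
        int[][] hits = {hit(text, "SD"), hit(text, "battery")};
        // 100 is the default hl.fragsize; 18 is the value used in request 2.
        System.out.println("fragsize=100: " + countFragments(hits, 100) + " fragment(s)");
        System.out.println("fragsize=18:  " + countFragments(hits, 18) + " fragment(s)");
    }
}
```

In this model both hits fall into a single 100-character window, so
hl.snippets=2 has nothing more to return, while an 18-character window
cannot span both hits and a second fragment is created.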

koji