Posted to solr-user@lucene.apache.org by Sasank Mudunuri <sa...@gmail.com> on 2010/11/10 03:08:06 UTC

Highlighter - multiple instances of term being combined

I'm finding that if a keyword appears in a field multiple times very close
together, it will get highlighted as a phrase even though there are other
terms between the two instances. So this search:

http://localhost:8983/solr/select/?

hl=true&
hl.snippets=1&
q=residue&
hl.fragsize=0&
mergeContiguous=false&
indent=on&
hl.usePhraseHighlighter=false&
debugQuery=on&
hl.fragmenter=gap&
hl.highlightMultiTerm=false
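
For anyone wanting to reproduce the request from a script, here is a minimal Python sketch that assembles the same URL. The parameter names and values are copied verbatim from the request above; the helper name build_url is hypothetical, and the host and port are simply the local defaults shown in the URL.

```python
from urllib.parse import urlencode

# Parameters copied as-is from the request in the post.
params = [
    ("hl", "true"),
    ("hl.snippets", "1"),
    ("q", "residue"),
    ("hl.fragsize", "0"),
    ("mergeContiguous", "false"),
    ("indent", "on"),
    ("hl.usePhraseHighlighter", "false"),
    ("debugQuery", "on"),
    ("hl.fragmenter", "gap"),
    ("hl.highlightMultiTerm", "false"),
]


def build_url(base="http://localhost:8983/solr/select/"):
    """Join the base URL and the encoded query string (hypothetical helper)."""
    return base + "?" + urlencode(params)


url = build_url()
print(url)
```

Keeping the parameters in a list makes it easy to flip one option at a time and compare the highlighting output between runs.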

Highlights as:
What does "low-<em>residue" mean? Like low-residue</em> diet?

Trying to get it to highlight as:
What does "low-<em>residue</em>" mean? Like low-<em>residue</em> diet?

I've tried playing with various combinations of mergeContiguous,
highlightMultiTerm, and usePhraseHighlighter, but they all yield the same
output.
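
To pin down the desired result, here is a minimal Python sketch that wraps each occurrence of the term separately. This is not how Solr's highlighter works internally: it is a plain regular-expression substitution that ignores stemming, synonyms, and stop words, and the function name highlight_each is hypothetical. It is only meant as a reference to compare Solr's output against.

```python
import re


def highlight_each(text, term, pre="<em>", post="</em>"):
    """Wrap every whole-word occurrence of `term` in its own pre/post tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


text = 'What does "low-residue" mean? Like low-residue diet?'
result = highlight_each(text, "residue")
print(result)
```

Each match gets its own opening and closing tag, so no words between two matches are ever swept into a single highlighted span.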

For reference, the field type uses StandardTokenizerFactory along with
SynonymFilterFactory, StopFilterFactory, StandardFilterFactory, and
SnowballFilterFactory. I've confirmed that the intermediate words don't
appear in either the synonym list or the stop word list. I can post the
full definition if helpful.

Any pointers as to how to debug this would be greatly appreciated!
sasank

Re: Highlighter - multiple instances of term being combined

Posted by Sasank Mudunuri <sa...@gmail.com>.
Ahh, this confirms it: the analyzers are pulling things apart properly.
There are two instances of the query keyword with words between them. But
from your last comment, it sounds like the system isn't trying to do any
sort of phrase highlighting and is just hitting a weird edge case? I'm
seeing this behavior fairly often, so I assumed there must be some option
that says: if two highlighted words are sufficiently close together,
highlight them as a single phrase.

On Tue, Nov 9, 2010 at 7:11 PM, Lance Norskog <go...@gmail.com> wrote:

> Have you looked at solr/admin/analysis.jsp? This is the 'Analysis' link
> off the main Solr admin page. It will show you how text is broken up
> for both the indexing and query processes. You might get some insight
> into how these words are torn apart and assigned positions. Trying
> the different Analyzers and options might get you there.
>
> But to be frank, highlighting is a tough problem and has always had a
> lot of edge cases.

Re: Highlighter - multiple instances of term being combined

Posted by Lance Norskog <go...@gmail.com>.
Have you looked at solr/admin/analysis.jsp? This is the 'Analysis' link
off the main Solr admin page. It will show you how text is broken up
for both the indexing and query processes. You might get some insight
into how these words are torn apart and assigned positions. Trying
the different Analyzers and options might get you there.

But to be frank, highlighting is a tough problem and has always had a
lot of edge cases.

-- 
Lance Norskog
goksron@gmail.com