Posted to solr-user@lucene.apache.org by Matt Mitchell <go...@gmail.com> on 2009/04/28 03:24:41 UTC

highlighting html content

Hi,

I've been looking around but can't seem to find any clear instruction on how
to do this... I'm storing html content and would like to enable highlighting
on the html content. The problem is that the search can sometimes match html
element names or attributes, and when the highlighter adds the highlight
tags there, the resulting html is malformed.

I've been toying with setting custom pre/post delimiters and then removing
them in the client, but I thought I'd ask the list before I go too far with
that idea :)

Thanks,
Matt

Re: highlighting html content

Posted by Matt Mitchell <go...@gmail.com>.
Hi Christian,

I decided to do something very similar. How do you handle cases where the
highlighting is inside of html/xml tags though? I'm getting stuff like this:

?q=jackson

<entry type="song" author="Michael <em>Jackson</em>">Bad by Michael
<em>Jackson</em></entry>

I wrote a regular expression to take care of the html/xml problem
(highlighting inside of a tag). I'd be interested in seeing your approach,
and others', even if it's also a regular expression.
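
For illustration only, here is a minimal Python sketch of one way to handle
this. It is not the regular expression mentioned above. It assumes the
placeholder delimiters [solr:hl] and [/solr:hl] from Christian's message, and
the function name is made up. The idea is to find each tag and drop any
delimiters that landed inside it, leaving the ones in the text untouched:

```python
import re

# Assumed placeholder delimiters (taken from Christian's message).
HL_PRE = "[solr:hl]"
HL_POST = "[/solr:hl]"

# Matches one html/xml tag. The placeholders contain no angle brackets,
# so a tag with a highlight inside it still matches as a single tag.
TAG_RE = re.compile(r"<[^<>]*>")


def strip_highlights_inside_tags(snippet):
    """Remove highlight delimiters that fall inside a tag."""
    def clean(match):
        return match.group(0).replace(HL_PRE, "").replace(HL_POST, "")
    return TAG_RE.sub(clean, snippet)


snippet = ('<entry type="song" author="Michael [solr:hl]Jackson[/solr:hl]">'
           'Bad by Michael [solr:hl]Jackson[/solr:hl]</entry>')
cleaned = strip_highlights_inside_tags(snippet)
```

This simple pattern will misfire if an attribute value contains a literal
< or >, so it is only a starting point.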

Matt

On Tue, Apr 28, 2009 at 3:21 AM, Christian Vogler <christian.vogler@gmail.com> wrote:

> Hi Matt,
>
> On Tue, Apr 28, 2009 at 4:24 AM, Matt Mitchell <go...@gmail.com> wrote:
> > I've been toying with setting custom pre/post delimiters and then removing
> > them in the client, but I thought I'd ask the list before I go too far with
> > that idea :)
>
> This is what I do. I define the custom highlight delimiters as
> [solr:hl] and [/solr:hl], and then replace them with
> <em class="highlight"> and </em> in the search results.
>
> It is simple to implement, and effective.
>
> Best regards
> - Christian
>

Re: highlighting html content

Posted by Christian Vogler <ch...@gmail.com>.
Hi Matt,

On Tue, Apr 28, 2009 at 4:24 AM, Matt Mitchell <go...@gmail.com> wrote:
> I've been toying with setting custom pre/post delimiters and then removing
> them in the client, but I thought I'd ask the list before I go too far with
> that idea :)

This is what I do. I define the custom highlight delimiters as
[solr:hl] and [/solr:hl], and then replace them with
<em class="highlight"> and </em> in the search results.

It is simple to implement, and effective.
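
As a rough sketch in Python (the function name is hypothetical, and it
assumes the delimiters are set with Solr's hl.simple.pre and hl.simple.post
request parameters), the replace step could look like this:

```python
# Assumed request parameters:
#   hl.simple.pre=[solr:hl]  hl.simple.post=[/solr:hl]
HL_PRE = "[solr:hl]"
HL_POST = "[/solr:hl]"


def render_highlights(snippet):
    """Swap the placeholder delimiters for real html highlight tags."""
    return (snippet
            .replace(HL_PRE, '<em class="highlight">')
            .replace(HL_POST, "</em>"))


rendered = render_highlights("Bad by Michael [solr:hl]Jackson[/solr:hl]")
```

Because the placeholders are not valid html, they cannot be confused with
markup that is already in the stored content.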

Best regards
- Christian