Posted to java-user@lucene.apache.org by okayndc <bo...@gmail.com> on 2012/04/05 19:34:10 UTC
HTML tags and Lucene highlighting
Hello,
I currently use Lucene version 3.0 and will probably need to upgrade to a
more current version soon.
The problem I have is that when I test a search for an HTML tag (ex.
<strong>), Lucene returns the highlighted HTML tag, which is what I DO NOT
want. Is there a way to "filter" HTML tags?
I have read up on HTMLStripCharFilter (packaged with Solr) and wondered if
this is the way to go?
Any help will be greatly appreciated,
Thanks
Re: HTML tags and Lucene highlighting
Posted by okayndc <bo...@gmail.com>.
I want to retain the formatted HTML in a result, but want to ignore (or
filter out) HTML tags in a search, if that makes sense.
On Thu, Apr 5, 2012 at 3:44 PM, Steven A Rowe <sa...@syr.edu> wrote:
> okayndc,
>
> A field configured to use HTMLStripCharFilter as part of its index-time
> analyzer will strip out HTML tags before index terms are created by the
> tokenizer, so HTML tags will not be put into the index. As a result,
> queries for HTML tags cannot match the original documents' HTML tags (in
> the field configured to use HTMLStripCharFilter, anyway).
>
> So HTMLStripCharFilter should do what you want.
>
> Steve
RE: HTML tags and Lucene highlighting
Posted by Steven A Rowe <sa...@syr.edu>.
okayndc,
A field configured to use HTMLStripCharFilter as part of its index-time analyzer will strip out HTML tags before index terms are created by the tokenizer, so HTML tags will not be put into the index. As a result, queries for HTML tags cannot match the original documents' HTML tags (in the field configured to use HTMLStripCharFilter, anyway).
So HTMLStripCharFilter should do what you want.
Steve
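To make the mechanism described above concrete, here is a minimal plain-Java sketch of the idea. It is not the Solr HTMLStripCharFilter itself: the class TagStripSketch, its regex-based stripTags, and the toy tokenize method are hypothetical stand-ins, and the real filter handles far more (entities, comments, offset correction). It only shows why a tag name such as "strong" stops being an index term once tags are stripped before tokenization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-in for the idea behind HTMLStripCharFilter.
// ASSUMPTION: this is an illustration only, not the Lucene/Solr class.
final class TagStripSketch {

    // Crude tag removal: anything shaped like <...> becomes a space.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Toy tokenizer: lowercase, then split on runs of non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Terms that would be indexed, with or without tag stripping.
    static List<String> indexTerms(String html, boolean strip) {
        return tokenize(strip ? stripTags(html) : html);
    }

    public static void main(String[] args) {
        String stored = "<strong>Lucene</strong> highlights matches";

        // Without stripping, the tag name is indexed like any other word.
        System.out.println(indexTerms(stored, false));
        // prints [strong, lucene, strong, highlights, matches]

        // With stripping, only the visible text produces terms.
        System.out.println(indexTerms(stored, true));
        // prints [lucene, highlights, matches]

        // A query for "<strong>" is tokenized to the term "strong".
        String queryTerm = tokenize("<strong>").get(0);
        System.out.println(indexTerms(stored, false).contains(queryTerm)); // prints true
        System.out.println(indexTerms(stored, true).contains(queryTerm));  // prints false
    }
}
```

Stripping applies only to the analysis chain. A stored field still holds the original HTML, so the formatted markup can be returned in results while the tags themselves are no longer searchable.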
Re: HTML tags and Lucene highlighting
Posted by okayndc <bo...@gmail.com>.
Hello,
I want to ignore HTML tags within a search. I should not be able to
search for an HTML tag (ex. <strong>) and get back the highlighted HTML tag
(ex. <span class="highlighted"><strong></span>) in a result set.
Thanks
On Thu, Apr 5, 2012 at 3:24 PM, Steven A Rowe <sa...@syr.edu> wrote:
> Hi okayndc,
>
> What *do* you want?
>
> Steve
RE: HTML tags and Lucene highlighting
Posted by Steven A Rowe <sa...@syr.edu>.
Hi okayndc,
What *do* you want?
Steve
Re: HTML tags and Lucene highlighting
Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(12/04/06 2:34), okayndc wrote:
> Hello,
>
> I currently use Lucene version 3.0...probably need to upgrade to a more
> current version soon.
> The problem that I have is when I test search for an HTML tag (ex.
> <strong>), Lucene returns
> the highlighted HTML tag ~ which is what I DO NOT want. Is there a way to
> "filter" HTML tags?
> I have read up on HTMLStripCharFilter (packaged with Solr) and wondered if
> this is the way to go?
>
> Any help will be greatly appreciated,
> Thanks
>
There is a way to encode HTML tags:
https://builds.apache.org/job/Lucene-3.x/javadoc/contrib-highlighter/org/apache/lucene/search/highlight/SimpleHTMLEncoder.html
koji
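As a rough illustration of what encoding means here, the following plain-Java sketch escapes HTML special characters in a fragment before the highlight markup is added. It is not SimpleHTMLEncoder itself: the class HtmlEncodeSketch and its encode and highlight methods are hypothetical stand-ins, and the span markup is borrowed from the example earlier in the thread. In the contrib highlighter an Encoder is, as far as I know, handed to the Highlighter alongside the Formatter and Scorer; check the javadoc linked above for the version in use.

```java
// Plain-Java stand-in for the idea behind an HTML-encoding highlighter Encoder.
// ASSUMPTION: this is an illustration only, not the Lucene SimpleHTMLEncoder class.
final class HtmlEncodeSketch {

    // Escapes the characters that would otherwise be read as markup.
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Wraps the first occurrence of term in a highlight span and encodes
    // the fragment text around it, so only the span is live markup.
    static String highlight(String fragment, String term) {
        int start = fragment.indexOf(term);
        if (start < 0) {
            return encode(fragment);
        }
        int end = start + term.length();
        return encode(fragment.substring(0, start))
                + "<span class=\"highlighted\">" + encode(term) + "</span>"
                + encode(fragment.substring(end));
    }

    public static void main(String[] args) {
        System.out.println(highlight("Use <strong> for emphasis", "strong"));
        // prints Use &lt;<span class="highlighted">strong</span>&gt; for emphasis
    }
}
```

Note the trade-off: encoding makes every tag in the fragment display as literal text, so it keeps the page well-formed but does not preserve the original formatting in the result.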
--
Query Log Visualizer for Apache Solr
http://soleami.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org