Posted to java-user@lucene.apache.org by Cool Coder <te...@yahoo.com> on 2007/10/25 20:18:26 UTC

HTML analyzer

Is there an analyzer that can be configured to skip URL text, i.e. the contents of href=""? Maybe I need some sort of filter with a regular expression, so that the searcher skips any text that matches the expressions in the filter. I am not sure whether this is possible. I would appreciate any suggestions or input.
   
  - BR

 __________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

Re: HTML analyzer

Posted by Cool Coder <te...@yahoo.com>.
Thanks, Karl, for your input. There is already a built-in HTML strip reader, HTMLStripReader in Solr, which I am currently using to strip all HTML tags before creating the index. This also solved my earlier problem with the highlighter, which was highlighting text inside HTML tags. For example, I was searching for "net", one result contained something like http://sdjkkjsd.net, and the highlighter converted it to http://sjhdnjkshn.<b>net</b>.
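To illustrate why stripping tags before indexing keeps href URLs out of the index, here is a minimal sketch in plain Java. It is not Solr's HTMLStripReader and does not use any Lucene or Solr API; the class name TagStripSketch, the regular expression, and the example.net URL are assumptions made up for illustration. A crude regex like this will mishandle comments, scripts, and malformed markup, so treat it only as a picture of the idea.

```java
import java.util.regex.Pattern;

// Illustrative only: a crude regex-based tag stripper, NOT Solr's HTMLStripReader.
final class TagStripSketch {
    // Matches anything that looks like an HTML tag, including its attributes.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Replaces each tag with a space so that words on either side stay separate,
    // then collapses runs of whitespace and trims the ends.
    static String stripTags(String html) {
        String noTags = TAG.matcher(html).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // The URL lives only inside the href attribute, so it is removed with the tag.
        String html = "<a href=\"http://example.net/page\">Download page</a>";
        System.out.println(stripTags(html));
    }
}
```

Note that this only removes URLs that sit inside tag attributes. A URL that appears in the visible text of the page would still be indexed and could still be highlighted.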
   
  -BR

Karl Wettin <ka...@gmail.com> wrote:
  
On 25 Oct 2007, at 20:18, Cool Coder wrote:

> Is there any analyzer that can be configured

All of them can be.

<http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/TokenFilter.html>

I suggest you take a look at the code of any of them, 
StandardAnalyzer for instance.



-- 
karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




Re: HTML analyzer

Posted by Karl Wettin <ka...@gmail.com>.
On 25 Oct 2007, at 20:18, Cool Coder wrote:

> Is there any analyzer that can be configured

All of them can be.

<http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/TokenFilter.html>

I suggest you take a look at the code of any of them,  
StandardAnalyzer for instance.
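As a rough picture of the kind of filtering asked about, here is a sketch in plain Java. It deliberately does not use the Lucene API: in Lucene, a TokenFilter subclass wraps another TokenStream and decides which tokens to pass on, whereas this stand-in just works on a list of strings. The class name RegexTokenFilterSketch, the SKIP pattern, and the sample text are all assumptions for illustration, and the whitespace split is only a stand-in for a real tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java sketch of the filtering idea; it does not use the Lucene API.
final class RegexTokenFilterSketch {
    // Hypothetical pattern: skip tokens that are URLs or href attributes.
    private static final Pattern SKIP =
            Pattern.compile("(?i)(https?://\\S+|href=\\S*)");

    // Returns the tokens that do not match SKIP, in their original order.
    static List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // matches() requires the whole token to match the pattern.
            if (!SKIP.matcher(token).matches()) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "see href=\"http://example.net\" or http://example.net for Lucene";
        // Whitespace splitting stands in for a real tokenizer here.
        System.out.println(filter(Arrays.asList(text.split("\\s+"))));
    }
}
```

In a real analyzer the same check would sit inside the filter's token loop, so that matching tokens never reach the index and therefore can never be searched or highlighted.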



-- 
karl
