Posted to java-user@lucene.apache.org by Cool Coder <te...@yahoo.com> on 2007/10/24 23:27:27 UTC

Highlighter and href fields

Is there any way to stop the highlighter from highlighting text that is part of an href/URL? The problem occurs when the field content is a URL that contains the query term, e.g. my search is for .net and the field has the value http://jkjsd.net. After applying the highlighter, it becomes http://jkjsd<b>.net</b>, which is a broken URL. Can I filter such matches out?
   
  - BR


Re: Highlighter and href fields

Posted by Cool Coder <te...@yahoo.com>.
OK, I understand now that I have a lot of work ahead of me.
  >2. Use an Analyzer that recognizes URLs. That way you won't get partial
  BTW, do you know of any analyzer that can recognize URLs?
   
  - BR

Mark Miller <ma...@gmail.com> wrote:
  Nothing in the Highlighter per se will help you there. I see two 
options off the top of my head:

1. Break the text up before feeding it to the highlighter, feed it everything 
but the URL parts, and then stitch the pieces back together, much as you might 
if highlighting an XML doc. Ugly, though.

2. Use an Analyzer that recognizes URLs. That way you won't get partial 
URL matches like .net. Each URL would be a single token, and only a 
search for the entire URL would match it. Even if you already indexed 
with a different Analyzer, you could use this special Analyzer just for 
highlighting: it would act exactly the same as your indexing Analyzer, 
but would parse any URL as a single token. Of course, if you are using 
TokenSource, this is not an option.

- Mark

Cool Coder wrote:
> Is there any way to stop the highlighter from highlighting text that is part of an href/URL? The problem occurs when the field content is a URL that contains the query term, e.g. my search is for .net and the field has the value http://jkjsd.net. After applying the highlighter, it becomes http://jkjsd<b>.net</b>, which is a broken URL. Can I filter such matches out?
> 
> - BR

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




Re: Highlighter and href fields

Posted by Mark Miller <ma...@gmail.com>.
Nothing in the Highlighter per se will help you there. I see two 
options off the top of my head:

1. Break the text up before feeding it to the highlighter, feed it everything 
but the URL parts, and then stitch the pieces back together, much as you might 
if highlighting an XML doc. Ugly, though.

2. Use an Analyzer that recognizes URLs. That way you won't get partial 
URL matches like .net. Each URL would be a single token, and only a 
search for the entire URL would match it. Even if you already indexed 
with a different Analyzer, you could use this special Analyzer just for 
highlighting: it would act exactly the same as your indexing Analyzer, 
but would parse any URL as a single token. Of course, if you are using 
TokenSource, this is not an option.
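
A rough sketch of option 1, in plain Java with no Lucene dependency. Everything 
named here is hypothetical: the class UrlSafeHighlight, the very loose URL 
pattern, and highlightPlain, which only stands in for whatever real highlighter 
call you use. The idea is just to show the split / highlight / stitch flow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlSafeHighlight {

    // Assumption: a deliberately loose URL pattern (scheme, then anything up
    // to the next whitespace). Real content may need something stricter.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    // Placeholder for the real highlighting step. It wraps every literal,
    // case-insensitive occurrence of term in <b>...</b>.
    static String highlightPlain(String text, String term) {
        Matcher m = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE)
                           .matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start())
               .append("<b>").append(m.group()).append("</b>");
            last = m.end();
        }
        return out.append(text, last, text.length()).toString();
    }

    // Split the text at URLs, highlight only the non-URL pieces, and stitch
    // everything back together with the URLs copied through untouched.
    static String highlightOutsideUrls(String text, String term) {
        Matcher m = URL.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(highlightPlain(text.substring(last, m.start()), term));
            out.append(m.group());
            last = m.end();
        }
        out.append(highlightPlain(text.substring(last), term));
        return out.toString();
    }
}
```

With this sketch, highlightOutsideUrls("See http://jkjsd.net for .net tips", ".net") 
leaves http://jkjsd.net alone and only wraps the standalone .net in <b> tags.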

- Mark

Cool Coder wrote:
> Is there any way to stop the highlighter from highlighting text that is part of an href/URL? The problem occurs when the field content is a URL that contains the query term, e.g. my search is for .net and the field has the value http://jkjsd.net. After applying the highlighter, it becomes http://jkjsd<b>.net</b>, which is a broken URL. Can I filter such matches out?
>    
>   - BR
