Posted to java-user@lucene.apache.org by Harini Raghavan <ha...@insideview.com> on 2006/01/09 19:16:58 UTC

highlighting phrases

Hi All,

I am using the highlighter package to highlight my search results. The 
query I am passing to the Highlighter is:
+(Content:"Apple Computer" Content:"Apple Comp") +(Title:"Apple Computer" Title:"Apple Comp")
But the Highlighter also highlights standalone occurrences of the terms
'Computer'/'Comp'. Does anyone know how to make sure
that only phrases are highlighted, not just the individual terms?

Thanks,
Harini

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: highlighting phrases

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Harini - you won't find a custom analyzer that does exactly what  
you've described, but building custom analyzers is pretty  
straightforward.  You can learn a lot about it by looking at the  
pieces within Lucene's source code or the examples (and text) from  
Lucene in Action.

Reading from an external file and aggregating tokens shouldn't be too  
difficult.
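
A minimal sketch of the aggregation step, in plain Java with no Lucene dependency. The class name PhraseAggregator and its methods are hypothetical, invented for illustration; in a real analyzer this logic would sit inside a custom TokenFilter wrapped around the tokenizer's output. It assumes the external file has already been read into a list of lines, one company name per line, and it merges the longest matching run of words into a single token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Lucene class): merges known multi-word
// names into single tokens so they are not split word by word.
class PhraseAggregator {
    private final List<List<String>> phrases = new ArrayList<>();

    // Each entry is one phrase, e.g. one line of the external names file.
    PhraseAggregator(List<String> phraseLines) {
        for (String line : phraseLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                phrases.add(Arrays.asList(trimmed.split("\\s+")));
            }
        }
    }

    // Returns a new token list in which every known phrase is one token.
    List<String> aggregate(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int best = 0; // length of the longest phrase starting at i
            for (List<String> p : phrases) {
                if (p.size() > best && matchesAt(tokens, i, p)) {
                    best = p.size();
                }
            }
            if (best > 0) {
                out.add(String.join(" ", tokens.subList(i, i + best)));
                i += best;
            } else {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    // Case-insensitive comparison of the phrase against tokens[start..].
    private static boolean matchesAt(List<String> tokens, int start, List<String> phrase) {
        if (start + phrase.size() > tokens.size()) {
            return false;
        }
        for (int k = 0; k < phrase.size(); k++) {
            if (!tokens.get(start + k).equalsIgnoreCase(phrase.get(k))) {
                return false;
            }
        }
        return true;
    }
}
```

With the phrases "Apple Computer" and "Apple Comp" loaded, the tokens [Apple, Computer, sells, a, computer] come back as [Apple Computer, sells, a, computer]. The same aggregation would have to be applied at both index time and query time for the merged tokens to match.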

	Erik


On Jan 11, 2006, at 8:37 AM, Harini Raghavan wrote:

> Hi Erik,
>
> I had a look at the SpansExtractor class by Mark, which can convert any
> Query to spans. But I think the analyzer that converts the text into a
> TokenStream is ultimately what matters more. I am using
> the StandardAnalyzer, and it seems to return a stream of Tokens where
> each token is a single word with some positional information. But I want
> some words (company names, which I can read from an external file) not to
> be broken into separate tokens. So what I need to work on first is a
> custom analyzer. Are there any existing Analyzer implementations like
> this available?
>
> Thanks,
> Harini


Re: highlighting phrases

Posted by Harini Raghavan <ha...@insideview.com>.
Hi Erik,

I had a look at the SpansExtractor class by Mark, which can convert any
Query to spans. But I think the analyzer that converts the text into a
TokenStream is ultimately what matters more. I am using
the StandardAnalyzer, and it seems to return a stream of Tokens where
each token is a single word with some positional information. But I want
some words (company names, which I can read from an external file) not to
be broken into separate tokens. So what I need to work on first is a
custom analyzer. Are there any existing Analyzer implementations like
this available?

Thanks,
Harini

Erik Hatcher wrote:

>
> On Jan 9, 2006, at 1:16 PM, Harini Raghavan wrote:
>
>> I am using the highlighter package to highlight my search results.
>> The query I am passing to the Highlighter is:
>> +(Content:"Apple Computer" Content:"Apple Comp") +(Title:"Apple Computer" Title:"Apple Comp")
>> But the Highlighter also highlights standalone occurrences of the terms
>> 'Computer'/'Comp'. Does anyone know how to make sure
>> that only phrases are highlighted, not just the individual terms?
>
>
> This is not currently implemented in the Highlighter in
> contrib/highlighter.  Implementing this has been discussed (by converting a
> Query to a SpanQuery and leveraging its precise range for highlighting).
>
> Patches to achieve this level of precise highlighting would be very
> welcome!
>
>     Erik


Re: highlighting phrases

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jan 9, 2006, at 1:16 PM, Harini Raghavan wrote:
> I am using the highlighter package to highlight my search results.
> The query I am passing to the Highlighter is:
> +(Content:"Apple Computer" Content:"Apple Comp") +(Title:"Apple Computer" Title:"Apple Comp")
> But the Highlighter also highlights standalone occurrences of the terms
> 'Computer'/'Comp'. Does anyone know how to make sure
> that only phrases are highlighted, not just the individual terms?

This is not currently implemented in the Highlighter in
contrib/highlighter.  Implementing this has been discussed (by converting a
Query to a SpanQuery and leveraging its precise range for highlighting).

Patches to achieve this level of precise highlighting would be very
welcome!
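
To make the idea of range-based highlighting concrete, here is a rough sketch in plain Java. It does not use Lucene or SpanQuery; the class PhraseOnlyHighlighter and its highlight method are hypothetical names invented for this example. It records the character offsets of each word, then marks only the range covered by consecutive words that match the whole phrase, so a lone 'Computer' is left untouched. The <B> tags are an arbitrary choice of markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration (not the contrib Highlighter): wraps only
// full phrase matches in <B> tags, never the individual terms.
class PhraseOnlyHighlighter {
    private static final Pattern WORD = Pattern.compile("\\w+");

    static String highlight(String text, String phrase) {
        String[] terms = phrase.trim().split("\\s+");

        // Tokenize the text, keeping start/end character offsets per word.
        List<String> words = new ArrayList<>();
        List<int[]> offsets = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
            offsets.add(new int[] {m.start(), m.end()});
        }

        StringBuilder out = new StringBuilder();
        int copied = 0; // how much of the original text has been emitted
        int i = 0;
        while (i < words.size()) {
            if (matchesAt(words, i, terms)) {
                int start = offsets.get(i)[0];
                int end = offsets.get(i + terms.length - 1)[1];
                out.append(text, copied, start);
                out.append("<B>").append(text, start, end).append("</B>");
                copied = end;
                i += terms.length;
            } else {
                i++;
            }
        }
        out.append(text.substring(copied));
        return out.toString();
    }

    // True if the phrase terms occur consecutively starting at word index start.
    private static boolean matchesAt(List<String> words, int start, String[] terms) {
        if (start + terms.length > words.size()) {
            return false;
        }
        for (int k = 0; k < terms.length; k++) {
            if (!words.get(start + k).equalsIgnoreCase(terms[k])) {
                return false;
            }
        }
        return true;
    }
}
```

For example, highlight("Apple Computer makes a computer.", "Apple Computer") marks the leading "Apple Computer" and leaves the trailing "computer" alone. Note that this sketch treats words as adjacent even when punctuation separates them; a span-based implementation working on token positions would need to decide how to handle that.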

	Erik

