Posted to java-user@lucene.apache.org by Pierre Van Ingelandt <pv...@inforama.fr> on 2006/09/05 15:21:56 UTC

Highlighting "really" found terms

Hello,

After a search, I need to highlight only the terms that "really"
match the query.
For instance:
1/ I search for docs with toto and titi in the SAME sentence (using
SpanNotQuery(SpanNearQuery({"toto","titi"}, 99999), "."))
2/ Then I try to highlight the "toto" and "titi" occurrences that were
found (I use the QueryScorer from the highlighter package)

The problem is that it highlights ALL the titi and toto terms in the
documents (even if they are not in the same sentence).
Is there a way to highlight only the terms that really matched?
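
To make the desired behavior concrete, here is a minimal plain-Java sketch
that does not use Lucene at all. It assumes that "." is the only sentence
delimiter (mirroring the "." excluded by the SpanNotQuery above), that the
query terms are given in lower case, and that matches are wrapped in <B>
tags. The class and method names are hypothetical and are not part of the
highlighter package.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not a Lucene class.
final class SameSentenceHighlighter {
    private static final Pattern WORD = Pattern.compile("\\w+");

    /** Wraps query terms in <B>..</B>, but only inside sentences that contain ALL of the terms. */
    static String highlight(String text, Set<String> lowerCaseTerms) {
        StringBuilder out = new StringBuilder();
        int start = 0;
        while (start < text.length()) {
            // Assumption: a sentence ends at the next '.' (or at the end of the text).
            int dot = text.indexOf('.', start);
            int end = (dot < 0) ? text.length() : dot + 1;
            String sentence = text.substring(start, end);
            out.append(containsAll(sentence, lowerCaseTerms)
                    ? mark(sentence, lowerCaseTerms)
                    : sentence);
            start = end;
        }
        return out.toString();
    }

    /** True if every query term occurs at least once in the sentence. */
    private static boolean containsAll(String sentence, Set<String> terms) {
        Set<String> seen = new HashSet<>();
        Matcher m = WORD.matcher(sentence);
        while (m.find()) {
            seen.add(m.group().toLowerCase(Locale.ROOT));
        }
        return seen.containsAll(terms);
    }

    /** Copies the sentence, wrapping each word that is a query term. */
    private static String mark(String sentence, Set<String> terms) {
        StringBuilder sb = new StringBuilder();
        Matcher m = WORD.matcher(sentence);
        int last = 0;
        while (m.find()) {
            sb.append(sentence, last, m.start());
            String word = m.group();
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                sb.append("<B>").append(word).append("</B>");
            } else {
                sb.append(word);
            }
            last = m.end();
        }
        sb.append(sentence, last, sentence.length());
        return sb.toString();
    }
}
```

Calling SameSentenceHighlighter.highlight(text, Set.of("toto", "titi"))
leaves a sentence that contains only one of the two terms untouched. A
real solution would have to work from the span positions of the query
rather than re-splitting the text, so treat this only as a description of
the expected output.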

Thanks a lot !

Pierre


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Highlighting "really" found terms

Posted by Karel Tejnora <ka...@tejnora.cz>.
Not for now, but I'd like to contribute span support soon.

Karel
> An alternative highlighter implementation was recently contributed here:
>    http://issues.apache.org/jira/browse/LUCENE-644?page=all
> I've not had the time to study this alternative in detail (I hope to soon) so I can't say if it will do Spans correctly. 
>   




Re: Highlighting "really" found terms

Posted by Shane <lu...@my-family.us>.
Is your objective to avoid highlighting matching tokens which are not in 
a phrase?  I recently received the request to avoid highlighting single 
tokens which appear in the hit (vs. sequences of matched tokens).
I have just completed a partial rewrite of getBestTextFragments to 
allow this.  The caller can now specify the minimum number of tokens 
that have to appear in a sequence before they are highlighted (the 
default is 1, which replicates the current behavior).

I haven't done a whole lot of testing as I finished the code last night, 
but if you are interested I have made the code available (along with a 
patch file) at http://my-family.us/highlighter.  To set the minimum 
sequence size, just call setMinTokenSequence(int) after creating the 
Highlighter object.
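
The minimum-sequence idea can be illustrated with a small plain-Java
sketch. This is not the patch itself and does not use Lucene: it assumes
whitespace-separated tokens, lower-case query terms, and <B> tags around
each highlighted token. The class name is hypothetical; only the
parameter name is borrowed from setMinTokenSequence(int).

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: not the patched Highlighter.
final class MinSequenceHighlighter {

    /** Highlights only runs of at least minTokenSequence consecutive matching tokens. */
    static String highlight(String text, Set<String> lowerCaseTerms, int minTokenSequence) {
        String[] tokens = text.split(" ");
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < tokens.length) {
            // Find the end (exclusive) of the run of matching tokens starting at i.
            int j = i;
            while (j < tokens.length
                    && lowerCaseTerms.contains(tokens[j].toLowerCase(Locale.ROOT))) {
                j++;
            }
            if (j == i) {
                // tokens[i] is not a query term: copy it unchanged.
                append(out, tokens[i]);
                i++;
            } else {
                // The run covers tokens[i..j); mark it only if it is long enough.
                boolean longEnough = (j - i) >= minTokenSequence;
                for (int k = i; k < j; k++) {
                    append(out, longEnough ? "<B>" + tokens[k] + "</B>" : tokens[k]);
                }
                i = j;
            }
        }
        return out.toString();
    }

    /** Appends a token, separating it from the previous one with a single space. */
    private static void append(StringBuilder out, String token) {
        if (out.length() > 0) {
            out.append(' ');
        }
        out.append(token);
    }
}
```

With a minimum of 1 every matching token is marked, which corresponds to
the default described above. With a minimum of 2, isolated matches are
left alone and only adjacent matches are marked.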

Shane

Harini Raghavan wrote:
> I have a requirement to highlight phrases. I came across a reference 
> to this alternate highlighter implementation. But I am unable to see 
> the source files for the same. Can someone please point me to it?
>
> Thanks,
> Harini
>
> mark harwood wrote:
>
>> See here for a thread reviewing the challenges and possible solutions 
>> associated with this problem:
>>   http://www.mail-archive.com/java-user@lucene.apache.org/msg02543.html
>>
>> An alternative highlighter implementation was recently contributed here:
>>   http://issues.apache.org/jira/browse/LUCENE-644?page=all
>> I've not had the time to study this alternative in detail (I hope to 
>> soon) so I can't say if it will do Spans correctly.
>> Cheers
>> Mark
>



Re: Highlighting "really" found terms

Posted by mark harwood <ma...@yahoo.co.uk>.
See here for a thread reviewing the challenges and possible solutions associated with this problem:
   http://www.mail-archive.com/java-user@lucene.apache.org/msg02543.html

An alternative highlighter implementation was recently contributed here:
   http://issues.apache.org/jira/browse/LUCENE-644?page=all
I've not had the time to study this alternative in detail (I hope to soon) so I can't say if it will do Spans correctly. 

Cheers
Mark


