Posted to java-user@lucene.apache.org by Pierre Van Ingelandt <pv...@inforama.fr> on 2006/09/05 15:21:56 UTC
Highlighting "really" found terms
Hello,
After a search, I need to highlight only the terms that "really"
match the query.
For instance:
1/ I search for docs with toto and titi in the SAME sentence (using
SpanNotQuery(SpanNearQuery({"toto","titi"}, 99999), "."))
2/ Then I try to highlight the "toto" and "titi" occurrences found (I use the
QueryScorer from the highlight package)
The problem is that it highlights ALL the titi and toto terms in the
documents (even if they are not in the same sentence).
Is there a way to highlight only the terms really found?
Thanks a lot!
Pierre
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
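To make the desired behaviour concrete, here is a minimal plain-Java sketch of "highlight a term only if it sits in a sentence that contains every query term". It does not use Lucene and is not how the Highlighter or QueryScorer work internally; the class name, the method name, and the choice of a "." token as the sentence boundary are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class SentenceSpanHighlight {

    /**
     * Returns the token positions of query terms that lie in a sentence
     * containing every query term. A "." token ends a sentence (assumption).
     */
    static List<Integer> positionsToHighlight(String[] tokens, Set<String> terms) {
        List<Integer> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= tokens.length; i++) {
            boolean sentenceEnd = i == tokens.length || tokens[i].equals(".");
            if (!sentenceEnd) {
                continue;
            }
            // The current sentence is tokens[start, i).
            Set<String> seen = new HashSet<>();
            List<Integer> hits = new ArrayList<>();
            for (int j = start; j < i; j++) {
                if (terms.contains(tokens[j])) {
                    seen.add(tokens[j]);
                    hits.add(j);
                }
            }
            // Keep the hits only if the sentence contains all query terms.
            if (seen.size() == terms.size()) {
                result.addAll(hits);
            }
            start = i + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = "toto likes titi . toto is alone . titi too".split(" ");
        // Only positions 0 and 2 qualify; the later toto and titi are in
        // sentences that lack the other term.
        System.out.println(positionsToHighlight(tokens, Set.of("toto", "titi"))); // prints [0, 2]
    }
}
```

A term-based scorer would instead mark positions 0, 2, 4 and 8 here, which is the behaviour described in the question.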
Re: Highlighting "really" found terms
Posted by Karel Tejnora <ka...@tejnora.cz>.
Not for now, but I'd like to contribute span support soon.
Karel
> An alternative highlighter implementation was recently contributed here:
> http://issues.apache.org/jira/browse/LUCENE-644?page=all
> I've not had the time to study this alternative in detail (I hope to soon) so I can't say if it will do Spans correctly.
>
Re: Highlighting "really" found terms
Posted by Shane <lu...@my-family.us>.
Is your objective to avoid highlighting matching tokens which are not in
a phrase? I recently received the request to avoid highlighting single
tokens which appear in the hit (vs. sequences of matched tokens).
I have just completed a partial rewrite of getBestTextFragments to
allow this. The calling object can now specify the minimum number of
tokens (the default is 1, which replicates the current functionality)
that must appear in a sequence before the tokens are highlighted.
I haven't done a whole lot of testing as I finished the code last night,
but if you are interested I have made the code available (along with a
patch file) at http://my-family.us/highlighter. To set the minimum
sequence size, just call setMinTokenSequence(int) after creating the
Highlighter object.
Shane
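The minimum-sequence idea described above can be sketched in a few lines of plain Java. This is not Shane's patch and not Highlighter code; only the setMinTokenSequence(int) name comes from the message, and the class and method below are hypothetical. The sketch assumes each token has already been flagged as matching the query or not.

```java
import java.util.Arrays;

// Hypothetical illustration, not the patched Highlighter.
class MinSequenceFilter {

    /**
     * Returns a copy of matched in which a token stays flagged only if it
     * belongs to a run of at least minTokenSequence consecutive matches.
     */
    static boolean[] filter(boolean[] matched, int minTokenSequence) {
        boolean[] out = new boolean[matched.length];
        int i = 0;
        while (i < matched.length) {
            if (!matched[i]) {
                i++;
                continue;
            }
            int runStart = i;
            // Advance to the end of the current run of matching tokens.
            while (i < matched.length && matched[i]) {
                i++;
            }
            if (i - runStart >= minTokenSequence) {
                Arrays.fill(out, runStart, i, true);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] matched = {true, false, true, true, false};
        // With a minimum of 2, the isolated match at index 0 is dropped.
        System.out.println(Arrays.toString(filter(matched, 2)));
    }
}
```

With a minimum of 1 the input comes back unchanged, which corresponds to the default behaviour mentioned above.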
Harini Raghavan wrote:
> I have a requirement to highlight phrases. I came across a reference
> to this alternate highlighter implementation. But I am unable to see
> the source files for the same. Can someone please point me to it?
>
> Thanks,
> Harini
>
> mark harwood wrote:
>
>> See here for a thread reviewing the challenges and possible solutions
>> associated with this problem:
>> http://www.mail-archive.com/java-user@lucene.apache.org/msg02543.html
>>
>> An alternative highlighter implementation was recently contributed here:
>> http://issues.apache.org/jira/browse/LUCENE-644?page=all
>> I've not had the time to study this alternative in detail (I hope to
>> soon) so I can't say if it will do Spans correctly.
>> Cheers
>> Mark
Re: Highlighting "really" found terms
Posted by mark harwood <ma...@yahoo.co.uk>.
See here for a thread reviewing the challenges and possible solutions associated with this problem:
http://www.mail-archive.com/java-user@lucene.apache.org/msg02543.html
An alternative highlighter implementation was recently contributed here:
http://issues.apache.org/jira/browse/LUCENE-644?page=all
I've not had the time to study this alternative in detail (I hope to soon) so I can't say if it will do Spans correctly.
Cheers
Mark