Posted to java-user@lucene.apache.org by Raghavendra Prabhu <rr...@gmail.com> on 2006/03/22 06:41:19 UTC

lucene highlighter

Hi guys,

Can anyone tell me how to get the best fragments using the highlighter?

The query has two terms: term1 and term2.

The search result displays only term1 in the highlighted text, even though
term2 is also there. How can I adjust the Lucene highlighter to make sure
that each term is displayed at least once in the query result?

Rgds
Prabhu

Which field has a hit?

Posted by Frank Kunemann <fr...@innosystec.de>.
Hi again,

Is there a way to find out which fields of a document have a hit?
My problem is that, in my case, a Lucene document consists of many different
files that belong together. Each file has its own content field, but I don't
store the content, to keep the index as small as possible. Therefore, when
highlighting, I have to go through the files one by one until a given number
of fragments has been highlighted or no more content fields (and therefore
files) are available. This would be much faster if I knew which fields had
hits, and how many.


Cheers,

Frank


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lucene highlighter

Posted by Raghavendra Prabhu <rr...@gmail.com>.
Hi Mark,

Currently both terms have the same score (weight).

As you mentioned, I would want that score to decrease, so that during the
next run, when the second fragment is selected, term1 has less weight and
term2, which has not been selected yet, has more weight.

Thanks
Rgds
Prabhu

On 3/22/06, mark harwood <ma...@yahoo.co.uk> wrote:
> If this is the case and you really want to ensure that
> term2 gets shown then you can use a custom Scorer
> implementation that influences the highlighter
> according to your preferences. Such an implementation
> could, for example, score fragments that are merely
> repetitions of the same "hits" (ie your term1) with a
> decreasing value. This would then allow the fragments
> with term2 to be considered more strongly for
> selection.

Re: lucene highlighter

Posted by mark harwood <ma...@yahoo.co.uk>.
>> How can I adjust the Lucene highlighter to make sure
>> that each term is displayed at least once in the query
>> result


First, some basic things to sanity-check:

* A classic problem: are you using compatible
analyzers for tokenizing the query and the document
content (both at index time and at highlight time)?
Term2 may not be produced at all.

* Are you selecting only one fragment, using a
fragmenter implementation that means term1 and term2
don't happen to fall within the scope of that single
fragment?
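A minimal sketch of the second check, assuming a fragmenter that cuts text into fixed-size chunks by word count (a simplification: Lucene's SimpleFragmenter works on a character budget). The class and method names are hypothetical and nothing here uses the Lucene API. When the two terms sit far apart, no single fragment contains both, so asking for just one fragment can only ever show one of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of fixed-size fragmenting by word count; not Lucene's API.
public class FragmentScopeDemo {

    // Split the text into consecutive fragments of at most 'size' words each.
    static List<List<String>> fragment(String text, int size) {
        List<String> words = Arrays.asList(text.toLowerCase().split("\\s+"));
        List<List<String>> fragments = new ArrayList<>();
        for (int i = 0; i < words.size(); i += size) {
            fragments.add(new ArrayList<>(words.subList(i, Math.min(i + size, words.size()))));
        }
        return fragments;
    }

    // Return the query terms that occur in the given fragment, in query order.
    static List<String> matchedTerms(List<String> fragment, List<String> queryTerms) {
        List<String> found = new ArrayList<>();
        for (String term : queryTerms) {
            if (fragment.contains(term)) {
                found.add(term);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "term1 appears early in this document while term2 only shows up much later";
        List<String> query = Arrays.asList("term1", "term2");
        for (List<String> f : fragment(text, 5)) {
            System.out.println(f + " -> " + matchedTerms(f, query));
        }
    }
}
```

Running it prints three fragments: the first matches only term1, the second only term2 and the third neither, so a single-fragment summary would show at most one of the two terms.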

If both of these checks turn out OK, I suspect what is
happening is that term2 is weighted significantly less
than term1 (based on idf and query boosts), and the
highlighter may be repeatedly selecting multiple
fragments with term1 in preference to any fragments
that contain only the lower-scoring term2.
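To make the suspected selection problem concrete, here is a small standalone Java sketch. It assumes a simplified scoring model in which a fragment's score is the sum of the weights of the distinct query terms it contains, and it uses made-up weights (term1 = 2.0, term2 = 0.5) standing in for what idf and query boosts would produce. It is not Lucene's Scorer; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical model of plain top-N fragment selection; not the Lucene highlighter API.
public class PlainFragmentSelection {

    // Assumed scoring model: sum of the weights of the distinct query terms in the fragment.
    static double score(List<String> fragment, Map<String, Double> weights) {
        double total = 0.0;
        for (String term : new HashSet<>(fragment)) {
            total += weights.getOrDefault(term, 0.0);
        }
        return total;
    }

    // Indices of the n highest-scoring fragments; ties keep document order (List.sort is stable).
    static List<Integer> topN(List<List<String>> fragments, Map<String, Double> weights, int n) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < fragments.size(); i++) {
            order.add(i);
        }
        order.sort((a, b) -> Double.compare(score(fragments.get(b), weights),
                                            score(fragments.get(a), weights)));
        return new ArrayList<>(order.subList(0, Math.min(n, order.size())));
    }

    public static void main(String[] args) {
        // Made-up weights standing in for idf and query boosts.
        Map<String, Double> weights = new HashMap<>();
        weights.put("term1", 2.0);
        weights.put("term2", 0.5);

        List<List<String>> fragments = Arrays.asList(
                Arrays.asList("term1", "is", "mentioned", "here"),
                Arrays.asList("another", "term1", "sentence"),
                Arrays.asList("only", "term2", "lives", "here"));

        System.out.println(topN(fragments, weights, 2)); // prints [0, 1]
    }
}
```

Both selected fragments contain term1; the fragment holding term2 never makes the cut, even though it is the only place term2 appears.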

If this is the case, and you really want to ensure that
term2 gets shown, you can use a custom Scorer
implementation that influences the highlighter
according to your preferences. Such an implementation
could, for example, give a decreasing score to
fragments that merely repeat the same "hits" (i.e.
your term1). This would then allow the fragments
with term2 to be considered more strongly for
selection.
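One way the "decreasing value" idea could work, sketched as standalone Java: pick fragments greedily, and after each pick multiply the weight of every query term in the chosen fragment by a decay factor. It assumes a fragment's score is the sum of the current weights of the distinct query terms it contains, and it uses made-up weights (term1 = 2.0, term2 = 0.5). This is not an implementation of Lucene's Scorer interface; in the real highlighter the same logic would live inside the custom Scorer described above. The class, method and `decay` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "repeated hits score less"; not the Lucene Scorer API.
public class DecayingFragmentSelection {

    // Assumed scoring model: sum of the current weights of the distinct query terms in the fragment.
    static double score(List<String> fragment, Map<String, Double> weights) {
        double total = 0.0;
        for (String term : new HashSet<>(fragment)) {
            total += weights.getOrDefault(term, 0.0);
        }
        return total;
    }

    // Greedily pick up to n fragments. After each pick, multiply the weight of every
    // query term in the chosen fragment by 'decay' (0 < decay < 1), so fragments that
    // only repeat already-shown hits score lower in later rounds.
    static List<Integer> select(List<List<String>> fragments, Map<String, Double> initialWeights,
                                int n, double decay) {
        Map<String, Double> weights = new HashMap<>(initialWeights); // leave caller's map untouched
        List<Integer> chosen = new ArrayList<>();
        int rounds = Math.min(n, fragments.size());
        while (chosen.size() < rounds) {
            int best = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < fragments.size(); i++) {
                if (chosen.contains(i)) {
                    continue; // each fragment can be picked only once
                }
                double s = score(fragments.get(i), weights);
                if (s > bestScore) { // strict '>' keeps the earliest fragment on ties
                    best = i;
                    bestScore = s;
                }
            }
            chosen.add(best);
            for (String term : new HashSet<>(fragments.get(best))) {
                if (weights.containsKey(term)) {
                    weights.put(term, weights.get(term) * decay);
                }
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new HashMap<>();
        weights.put("term1", 2.0); // made-up weights standing in for idf and boosts
        weights.put("term2", 0.5);

        List<List<String>> fragments = Arrays.asList(
                Arrays.asList("term1", "is", "mentioned", "here"),
                Arrays.asList("another", "term1", "sentence"),
                Arrays.asList("only", "term2", "lives", "here"));

        System.out.println(select(fragments, weights, 2, 0.1)); // prints [0, 2]
    }
}
```

With decay = 0.1, term1's weight drops to 0.2 after the first pick, so the fragment containing only term2 (weight 0.5) wins the second round. With a milder decay of 0.5, term1 still weighs 1.0 and a second term1 fragment is chosen instead, so the decay factor is a tuning knob rather than a fixed constant.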


Hope this helps
Mark


