Posted to java-user@lucene.apache.org by Paul Taylor <pa...@fastmail.fm> on 2010/01/12 12:20:04 UTC

Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost) ?


Why is this, and how much is it (in plain English), please?

thanks Paul



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost) ?

Posted by Erick Erickson <er...@gmail.com>.
I'd *strongly* recommend getting a copy of Luke, opening your index
with it and playing around. The "explain" tab will show you a *lot*
about how scoring works.

Erick

On Tue, Jan 12, 2010 at 8:16 AM, Paul Taylor <pa...@fastmail.fm> wrote:

> Benjamin Heilbrunn wrote:
>
>> This is because matches in short fields (few terms) are typically more
>> significant than matches in long fields (many terms).
>>
>> Imagine the case with two fields named "title" and "content"
>> representing the title and the content of books.
>> If you match three search terms in a five-term title, this is a better
>> hit than if you match those three search terms in the content of the
>> book.
>>
>> The length normalization factor is calculated by your Similarity
>> implementation in the method
>> public float lengthNorm(String fieldName, int numTokens)
>>
>> Does that help you?
>>
>>
>>
>
> Yes, thanks, it does; I was just getting there. Is it based purely on
> matching a field with fewer terms, rather than on the percentage of terms
> in the field that matched?
> I.e., if you match three search terms in a five-term field, would this be
> better than if you match four search terms in a six-term field?
>
>
> Do you know the answer to my second post?
> I.e., what does the default lengthNorm return for a single-term field
> (compared to a field indexed with no norms, where I assume a value of 1.0)?
>
>
> Paul

Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost) ?

Posted by Paul Taylor <pa...@fastmail.fm>.
Benjamin Heilbrunn wrote:
> This is because matches in short fields (few terms) are typically more
> significant than matches in long fields (many terms).
>
> Imagine the case with two fields named "title" and "content"
> representing the title and the content of books.
> If you match three search terms in a five-term title, this is a better
> hit than if you match those three search terms in the content of the
> book.
>
> The length normalization factor is calculated by your Similarity
> implementation in the method
> public float lengthNorm(String fieldName, int numTokens)
>
> Does that help you?
>
>   

Yes, thanks, it does; I was just getting there. Is it based purely on
matching a field with fewer terms, rather than on the percentage of terms
in the field that matched?
I.e., if you match three search terms in a five-term field, would this be
better than if you match four search terms in a six-term field?


Do you know the answer to my second post?
I.e., what does the default lengthNorm return for a single-term field
(compared to a field indexed with no norms, where I assume a value of 1.0)?

Paul



Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost) ?

Posted by Benjamin Heilbrunn <be...@gmail.com>.
This is because matches in short fields (few terms) are typically more
significant than matches in long fields (many terms).

Imagine the case with two fields named "title" and "content"
representing the title and the content of books.
If you match three search terms in a five-term title, this is a better
hit than if you match those three search terms in the content of the
book.

The length normalization factor is calculated by your Similarity
implementation in the method
public float lengthNorm(String fieldName, int numTokens)

Does that help you?
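
To put a number on "how much": a minimal sketch, assuming the stock
DefaultSimilarity of the Lucene 2.9/3.0 line, where lengthNorm is
1 / sqrt(numTokens). The class name LengthNormSketch is made up for this
example and it does not call Lucene at all; it only reproduces the
arithmetic, so check your own Similarity before relying on it.

```java
public class LengthNormSketch {

    // Assumption: mirrors the default formula 1 / sqrt(numTokens).
    // fieldName is ignored here, as it is in the default implementation;
    // a custom Similarity may use it to treat fields differently.
    public static float lengthNorm(String fieldName, int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    public static void main(String[] args) {
        // Field lengths taken from the examples discussed in this thread,
        // plus a longer "content"-style field for contrast.
        int[] fieldLengths = {1, 5, 6, 100};
        for (int numTokens : fieldLengths) {
            System.out.println(numTokens + " token(s) -> "
                    + lengthNorm("title", numTokens));
        }
    }
}
```

Under that assumption a single-term field gets 1.0, a five-term field
about 0.447 and a six-term field about 0.408. The method only sees the
field name and the token count, so it knows nothing about how many query
terms matched. Note also that the index stores each norm in a single
byte, so the value used at search time is a rounded version of these
numbers.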

2010/1/12 Paul Taylor <pa...@fastmail.fm>:
>
>
> Why is this, and how much is it (in plain English), please?
>
> thanks Paul
