Posted to java-user@lucene.apache.org by Matt Savona <ma...@gmail.com> on 2016/03/09 19:23:16 UTC

disable field length normalization on specific fields?

Hi all,

I am trying to understand if the following is possible:

I would like to have several fields in my index which are boosted at index
time. Because they are to be boosted at index time, their field type
requires omitNorms(false).

However, I do not want field length normalization to affect the scoring of
these fields. For example, finding the term 'baseball' (1:5 words) should
score exactly the same as (1:100 words).

There are other fields in my index which are not boosted, so
omitNorms(true) is acceptable on them. However, I do not want to broadly
disable length normalization on every single field (I have at least one
where I require it). Thus, I am not certain a custom Similarity class is
appropriate.

Is it possible to simply disable length normalization on a field-by-field
basis, while still allowing index-time boosting?

Thank you in advance!

- Matt

Re: disable field length normalization on specific fields?

Posted by Chris Hostetter <ho...@fucit.org>.

Yep, just use a customized similarity that doesn't include a length factor 
when computing the norm.

If you are currently using TFIDFSimilarity (or one of its subclasses) 
then the computeNorm method delegates to a lengthNorm method, and you 
can override that to return "1" for fields with a certain name regardless 
of the length.
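
As a rough illustration of that per-field override, here is a minimal, 
self-contained Java sketch. It does not use Lucene itself: the class name, 
the field names ("title", "tags", "body") and the plain (field, numTerms, 
boost) parameters are hypothetical stand-ins for what a real lengthNorm 
override would read from FieldInvertState (getName(), getLength(), 
getBoost()). One assumption worth checking against your Lucene version: in 
the 5.x/6.x default similarity, as far as I recall, the index-time boost is 
multiplied into the value lengthNorm returns, so this sketch returns the 
boost rather than a literal 1 for the exempt fields in order to keep the 
boost.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch only: models the arithmetic of a per-field lengthNorm override.
// Not a Lucene class; names and fields here are illustrative assumptions.
final class PerFieldLengthNorm {

    // Hypothetical fields for which length should not affect the norm.
    static final Set<String> NO_LENGTH_NORM =
            new HashSet<>(Arrays.asList("title", "tags"));

    // For exempt fields, ignore the term count and keep only the boost.
    // For all other fields, apply the classic boost * 1/sqrt(numTerms).
    static float lengthNorm(String field, int numTerms, float boost) {
        if (NO_LENGTH_NORM.contains(field)) {
            return boost;
        }
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }
}
```

With this, lengthNorm("title", 5, 2.0f) and lengthNorm("title", 100, 2.0f) 
give the same value, while "body" still scores shorter fields higher. In a 
real TFIDFSimilarity subclass the same branch would sit inside the 
overridden lengthNorm(FieldInvertState) method.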

If you are currently using something else -- like BM25Similarity perhaps 
-- you'll probably have to override the computeNorm method and 
write a slightly longer calculation based on whatever logic is in the 
computeNorm method you are currently using -- look for usages of 
FieldInvertState.getLength() and remove/replace that with a fixed value.




-Hoss
http://www.lucidworks.com/
