Posted to java-user@lucene.apache.org by Mossaab Bagdouri <ba...@yahoo.fr.INVALID> on 2016/11/09 19:25:04 UTC

Isn't fieldLength in BM25 supposed to be an integer?

Hi,

On Lucene 6.2.1, I have the following explain output for a document that
contains two words. I'm wondering why the value of fieldLength is not 2.

A related question was posted on S.O. two years ago:
http://stackoverflow.com/questions/22194920

23.637165 = sum of:
  10.065297 = weight(title:googl in 401658357) [BM25Similarity], result of:
    10.065297 = score(doc=401658357,freq=1.0 = termFreq=1.0
), product of:
      7.3866553 = idf(docFreq=414179, docCount=668609139)
      1.3626325 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        7.3254013 = avgFieldLength
        2.56 = fieldLength
  13.571868 = weight(title:hangout in 401658357) [BM25Similarity], result of:
    13.571868 = score(doc=401658357,freq=1.0 = termFreq=1.0
), product of:
      9.960035 = idf(docFreq=31592, docCount=668609139)
      1.3626325 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        7.3254013 = avgFieldLength
        2.56 = fieldLength

Regards,
Mossaab
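The tfNorm lines above can be reproduced from the listed inputs. Lucene 6.x explains BM25 with the length normalization tfNorm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)), and the score is idf * tfNorm. The standalone Java sketch below plugs in the printed values. The class and method names are hypothetical, not Lucene API. It confirms that the document was scored with fieldLength = 2.56 rather than 2, and it also computes what tfNorm would have been with the exact two-word length.

```java
// Hypothetical helper (not Lucene API): recomputes the "googl" clause of the
// explain output from its printed inputs, using the BM25 tf normalization
// tfNorm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)).
public class Bm25ExplainCheck {

    static double tfNorm(double freq, double k1, double b,
                         double fieldLength, double avgFieldLength) {
        // Length normalization: below 1 for fields shorter than average.
        double lengthNorm = 1 - b + b * fieldLength / avgFieldLength;
        return freq * (k1 + 1) / (freq + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75, avgFieldLength = 7.3254013;
        double idf = 7.3866553; // idf printed for title:googl

        double printed = tfNorm(1.0, k1, b, 2.56, avgFieldLength); // length Lucene used
        double exact = tfNorm(1.0, k1, b, 2.0, avgFieldLength);    // true two-word length

        System.out.printf("tfNorm with fieldLength=2.56: %.7f%n", printed);
        System.out.printf("tfNorm with fieldLength=2:    %.7f%n", exact);
        System.out.printf("score = idf * tfNorm:         %.6f%n", idf * printed);
    }
}
```

The first line should agree with the printed 1.3626325 to about six decimal places. It is not exact because the inputs are themselves rounded floats. The second value is somewhat higher, since a shorter field is penalized less.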

Re: Isn't fieldLength in BM25 supposed to be an integer?

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Mossaab,

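Probably due to the encodeNormValue/decodeNormValue transformation of the document length. The length is stored lossily in a single byte at index time, so the value decoded at search time is only an approximation of the true length.

Please see the aforementioned methods in BM25Similarity.java.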

Ahmet
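Here is where 2.56 comes from. In Lucene 6.x, BM25Similarity.encodeNormValue stores SmallFloat.floatToByte315(boost / sqrt(fieldLength)) as a one-byte norm. decodeNormValue then maps that byte back through a table of 1 / (f * f). The sketch below re-implements that round trip in plain Java, so it runs without the Lucene jar. It makes a few assumptions:

- The helper names encodeLength/decodeLength are hypothetical.
- The boost is assumed to be 1.0.
- The bit manipulation is intended to mirror SmallFloat's 315 format from that release rather than call it.

```java
// Standalone sketch of the lossy one-byte length norm used by BM25Similarity in
// Lucene 6.x. It re-implements SmallFloat's "315" format (3 mantissa bits counting
// the implicit leading 1, exponent zero point 15) instead of calling Lucene.
public class NormRoundTrip {

    // Mirrors SmallFloat.floatToByte315: truncates the float to its top bits.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: largest representable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Mirrors SmallFloat.byte315ToFloat.
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Hypothetical helper, index time: 1/sqrt(length), boost assumed 1.0, in one byte.
    static byte encodeLength(int fieldLength) {
        return floatToByte315(1.0f / (float) Math.sqrt(fieldLength));
    }

    // Hypothetical helper, search time: the length BM25 sees, 1/(f*f) of the decoded
    // float. Not valid for byte 0, which Lucene's table special-cases.
    static float decodeLength(byte norm) {
        float f = byte315ToFloat(norm);
        return 1.0f / (f * f);
    }

    public static void main(String[] args) {
        for (int len = 1; len <= 4; len++) {
            System.out.println(len + " -> " + decodeLength(encodeLength(len)));
        }
    }
}
```

Running it prints "1 -> 1.0", "2 -> 2.56", "3 -> 4.0" and "4 -> 4.0". For a length of 2, 1/sqrt(2) ≈ 0.7071 is truncated to 0.625, and 1 / 0.625² = 2.56, which matches the explain output. Lengths 3 and 4 decode to the same value, so nearby lengths can become indistinguishable.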




