Posted to user@nutch.apache.org by Ole-Martin Mørk <ol...@gmail.com> on 2009/10/09 11:03:28 UTC

Scoring when using solrindex

Hi. We are using Nutch with a Solr backend. I have some questions about the
field boost used by Nutch when indexing documents. I can't find the numbers
anywhere, but it seems that Nutch is not using the default values.

When the document is indexed by Nutch, I get this result when searching for
the URL:

0.0014793393 = fieldWeight(url:"super secret url" in 22), product of:
  1.0 = tf(phraseFreq=1.0)
  32.31666 = idf(url: www=7327 host=321 com=7327 something=2456
something=2 something=44 704290075=1)
  4.5776367E-5 = fieldNorm(field=url, doc=22)


After retrieving the document from the Solr index and writing it back with
the default field boost of 1.0, I get these values:

9.874598 = fieldWeight(url:"super secret url" in 0), product of:
  1.0 = tf(phraseFreq=1.0)
  31.598713 = idf(url: www=7328 host=322 com=7328 something =2457
something =3 something =45 704290075=2)
  0.3125 = fieldNorm(field=url, doc=0)
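
As a sanity check, each fieldWeight above is simply the product of the
three factors listed under it. A minimal Python sketch (the function name
field_weight is only illustrative, not a Lucene or Solr API):

```python
def field_weight(tf, idf, field_norm):
    """Product of the three factors shown in the explain output."""
    return tf * idf * field_norm

# Values copied from the two explain outputs above
nutch_weight = field_weight(1.0, 32.31666, 4.5776367e-5)
rewritten_weight = field_weight(1.0, 31.598713, 0.3125)

print(nutch_weight, rewritten_weight)
```

Since tf is identical and idf differs only slightly, almost all of the
difference between the two scores comes from fieldNorm.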

As you can see, fieldNorm has changed significantly. The fieldNorm is
calculated using this formula:
document boost * field boost * (1/sqrt(terms in field))

The document boost is the same in both cases.
The number of terms in the field is the same.
The only thing that may have changed is the field boost.
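
The reasoning above can be sketched in Python. Assuming the formula holds
exactly and that document boost and term count really are identical in
both cases, the ratio of the two fieldNorm values would equal the ratio of
the field boosts. The function and variable names are illustrative only,
and the result is rough: as far as I know, Lucene stores each norm in a
single byte, which loses precision.

```python
import math

def field_norm(doc_boost, field_boost, num_terms):
    """fieldNorm per the formula above, before any lossy encoding in the index."""
    return doc_boost * field_boost * (1.0 / math.sqrt(num_terms))

# fieldNorm values copied from the two explain outputs
norm_nutch = 4.5776367e-5   # document as indexed by Nutch
norm_rewritten = 0.3125     # document written back with field boost 1.0

# With equal document boost and term count, everything except the
# field boost cancels out of the ratio.
boost_ratio = norm_nutch / norm_rewritten
print(f"implied Nutch field boost relative to 1.0: {boost_ratio:.3e}")
```

Under these assumptions, the boost applied when Nutch indexed the document
would be several thousand times smaller than 1.0.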

So the question is: What index-time field boost does Nutch use?

--
Ole-Martin Mørk
http://twitter.com/olemartin
http://flickr.com/olemartin