Posted to solr-user@lucene.apache.org by Siddhant Goel <si...@gmail.com> on 2010/03/07 14:55:55 UTC

Question about fieldNorms

Hi everyone,

Is the fieldNorm calculation altered by the omitNorms factor? I saw on this
page (http://old.nabble.com/Question-about-fieldNorm-td17782701.html) the
formula for calculation of fieldNorms (fieldNorm =
fieldBoost/sqrt(numTermsForField)).

Does this mean that for a document containing a string like "A B C D E" in
its field, the fieldNorm would be boost/sqrt(5), and for another document
containing the string "A B C" in the same field, the fieldNorm would be
boost/sqrt(3)?

If yes, then is *this* what omitNorms affects?

Thanks,

-- 
- Siddhant

Re: Question about fieldNorms

Posted by Siddhant Goel <si...@gmail.com>.
Wonderful! That explains it. Thanks a lot!

Regards,

-- 
- Siddhant

Re: Question about fieldNorms

Posted by Jay Hill <ja...@gmail.com>.
Yes. If omitNorms=true, no lengthNorm calculation is done, the fieldNorm
value is 1.0, and the length of the field in question is not a factor in
the score.

To see an example of this, you can do a quick test. Add two "text" fields
and set omitNorms="true" on one of them:

   <field name="foo" type="text" indexed="true" stored="true"/>
   <field name="bar" type="text" indexed="true" stored="true" omitNorms="true"/>

Index a doc with the same value for both fields:
  <field name="foo">1 2 3 4 5</field>
  <field name="bar">1 2 3 4 5</field>

Set &debugQuery=true and run two queries: &q=foo:5 and &q=bar:5

In the "explain" section of the debug output, note that the fieldNorm value
for the "foo" query is this:

    0.4375 = fieldNorm(field=foo, doc=1)

and the value for the "bar" query is this:

    1.0 = fieldNorm(field=bar, doc=1)
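
For anyone who wants to script the two debug queries, here is a minimal sketch that only builds the request URLs. The base URL (localhost:8983/solr/select) and the helper name debug_query_url are assumptions for illustration, not something from this thread; adjust them to your own install.

```python
from urllib.parse import urlencode

# Hypothetical base URL (an assumption): change host, port, and path as needed.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def debug_query_url(field: str, term: str) -> str:
    """Build a select URL that asks Solr to include score explanations."""
    params = {"q": f"{field}:{term}", "debugQuery": "true"}
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


# One query per field, so the two fieldNorm lines can be compared.
for field in ("foo", "bar"):
    print(debug_query_url(field, "5"))
```

Open each printed URL in a browser (or fetch it with any HTTP client) and look for the fieldNorm line in the "explain" part of the response.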

A simplified description of how the fieldNorm value is calculated:

    fieldNorm = lengthNorm * documentBoost * documentFieldBoosts

and the lengthNorm is calculated like this:

    lengthNorm = 1/(numTermsInField)**.5

[Note that the value is encoded as a single byte, so there is some precision
loss. That is why the five-term "foo" field shows 0.4375 rather than
1/sqrt(5), which is roughly 0.447.]
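
To make the precision loss concrete, here is a rough Python sketch of the lengthNorm formula followed by a one-byte round trip. The encode and decode functions are modeled on my understanding of Lucene's SmallFloat.floatToByte315 and byte315ToFloat, so treat them as an approximation of what the index stores, not as the library's code. All function names here are made up for the example.

```python
import math
import struct


def float_to_byte315(f: float) -> int:
    """Squeeze a float into one byte (assumed to mirror Lucene's norm encoding)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw float32 bits
    small = bits >> (24 - 3)
    zero_point = (63 - 15) << 3
    if small <= zero_point:
        return 0 if bits <= 0 else 1  # zero/negative, or underflow to smallest
    if small >= zero_point + 0x100:
        return 255  # overflow: clamp to the largest value
    return small - zero_point


def byte315_to_float(b: int) -> float:
    """Expand the stored byte back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_length_norm(num_terms: int) -> float:
    """lengthNorm = 1/sqrt(numTermsInField), after the lossy byte round trip."""
    exact = 1.0 / math.sqrt(num_terms)
    return byte315_to_float(float_to_byte315(exact))


for n in (1, 3, 5):
    print(n, 1.0 / math.sqrt(n), stored_length_norm(n))
```

For five terms the round trip yields 0.4375, matching the fieldNorm reported for the "foo" field in the debug output.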

When omitNorms=true, no norm calculation is done, so fieldNorm will always
be 1.0 on those fields.
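
Applied to the two strings from the original question, the effect looks roughly like this. The sketch assumes plain whitespace tokenization (a real "text" analyzer may drop or split terms) and ignores the single-byte encoding; field_norm is a hypothetical helper, not a Solr or Lucene API.

```python
import math


def field_norm(text: str, boost: float = 1.0, omit_norms: bool = False) -> float:
    """Simplified fieldNorm: boost / sqrt(number of terms), or 1.0 without norms."""
    if omit_norms:
        # No norm is stored, so neither field length nor index-time boost counts.
        return 1.0
    num_terms = len(text.split())  # assumption: whitespace tokenization
    return boost / math.sqrt(num_terms)


for text in ("A B C D E", "A B C"):
    print(text, field_norm(text), field_norm(text, omit_norms=True))
```

With norms enabled the shorter field gets the larger value; with omitNorms=true both documents get 1.0, so field length no longer separates them.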

You can also use the Luke utility to view the document in the index, and it
will show that there is a norm value for the foo field, but not the bar
field.

-Jay
http://www.lucidimagination.com

