Posted to solr-user@lucene.apache.org by Fuad Efendi <fu...@efendi.ca> on 2008/07/24 15:48:30 UTC

Unsure about omitNorms, termVectors...

Hi,

It's unclear... found in schema.xml:


omitNorms: (expert) set to true to omit the norms associated with
        this field (this disables length normalization and index-time
        boosting for the field, and saves some memory).  Only full-text
        fields or fields that need an index-time boost need norms.
termVectors: [false] set to true to store the term vector for a given field.
        When using MoreLikeThis, fields used for similarity should be stored for
        best performance.


Questions:

omitNorms: do I need norms for full-text fields even if I don't need
index-time boosting? I don't want to boost text where a keyword is repeated
several times. Is my understanding correct?

termVectors: do I need it for MoreLikeThis only?

What are the memory requirements for warming up Lucene caches if I use
term vectors and norms?


Thanks,
Fuad

Re: Unsure about omitNorms, termVectors...

Posted by Chris Hostetter <ho...@fucit.org>.

: > omitNorms: do I need norms for full-text fields even if I don't need index-time
: > boosting? I don't want to boost text where a keyword is repeated several times. Is
: > my understanding correct?

if you set omitNorms="true" then you not only lose index-time doc/field
boosting, but you also lose lengthNorms -- it won't matter how long a
field is, if a term occurs once in a 5 term field value it will score the
same as if it appears once in a 5000 term field value.
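
To make the 5-term versus 5000-term comparison concrete, here is a minimal sketch of the length-norm factor. It assumes the 1/sqrt(numTerms) formula used by Lucene's DefaultSimilarity and does not call Lucene itself; the class and method names are made up for illustration. Real norms also fold in index-time boosts and are stored as a single lossy byte, so actual values are only approximately these.

```java
// Illustration only: reproduces the 1/sqrt(numTerms) length-norm formula of
// Lucene's DefaultSimilarity without depending on Lucene itself.
class LengthNormSketch {

    // Length norm for a field value containing numTerms terms.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        double shortField = lengthNorm(5);    // 5-term field value
        double longField = lengthNorm(5000);  // 5000-term field value
        System.out.println("norm, 5 terms:    " + shortField);
        System.out.println("norm, 5000 terms: " + longField);
        // With omitNorms="true" this factor drops out of the score,
        // so a single match in either field would be weighted the same.
        System.out.println("ratio: " + (shortField / longField));
    }
}
```

With norms enabled, the single match in the short field is weighted roughly 30 times more heavily than the same match in the long field.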

if you don't want docs to score higher when the word is repeated, omitNorms
won't help you -- you'll need a custom similarity where you override the
tf() method.
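
As a rough sketch of what overriding tf() changes, the standalone class below compares the default behaviour (the square root of the raw term frequency, as in Lucene's DefaultSimilarity) with a flattened variant that counts any number of occurrences once. The class and method names are hypothetical and nothing here depends on Lucene; in a real setup the flattened logic would presumably live in a DefaultSimilarity subclass that overrides tf(float freq) and is registered in schema.xml.

```java
// Illustration only: the two tf() behaviours side by side, no Lucene needed.
class TfSketch {

    // Default behaviour: score grows with the square root of the term frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Flattened behaviour: a term either matches or it does not;
    // repeating it does not raise the score.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        float[] freqs = {1f, 4f, 9f};
        for (float freq : freqs) {
            System.out.println("freq=" + freq
                    + " default=" + defaultTf(freq)
                    + " flat=" + flatTf(freq));
        }
    }
}
```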

: > What are the memory requirements for warming up Lucene caches if I use term
: > vectors and norms?
: 
: I don't believe Term Vectors are cached anywhere, other than via the OS.  I'd
: have to go dig around for norms info, or maybe someone else can chime in.

norms take one byte per doc per field (for each field that has norms).
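
A back-of-the-envelope estimate based on that rule, as a sketch: the document and field counts are invented for illustration, and the class and method names are hypothetical.

```java
// Illustration only: estimates norms memory from "one byte per doc per field".
class NormsMemorySketch {

    // Bytes needed for norms: one byte per document for every field with norms.
    static long normsBytes(long numDocs, int fieldsWithNorms) {
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L;  // hypothetical index size
        int fieldsWithNorms = 3;     // hypothetical number of fields keeping norms
        long bytes = normsBytes(numDocs, fieldsWithNorms);
        System.out.println(bytes + " bytes, about "
                + (bytes / (1024.0 * 1024.0)) + " MiB");
    }
}
```

For these made-up figures the norms come to 30,000,000 bytes, so setting omitNorms="true" on fields that need neither length normalization nor index-time boosts saves one byte per document for each such field.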


-Hoss

Re: Unsure about omitNorms, termVectors...

Posted by Grant Ingersoll <gs...@apache.org>.

On Jul 24, 2008, at 9:48 AM, Fuad Efendi wrote:

> Hi,
>
> It's unclear... found in schema.xml:
>
>
> omitNorms: (expert) set to true to omit the norms associated with
>       this field (this disables length normalization and index-time
>       boosting for the field, and saves some memory).  Only full-text
>       fields or fields that need an index-time boost need norms.
> termVectors: [false] set to true to store the term vector for a given field.
>       When using MoreLikeThis, fields used for similarity should be stored for
>       best performance.
>
>
> Questions:
>
> omitNorms: do I need norms for full-text fields even if I don't need
> index-time boosting? I don't want to boost text where a keyword is
> repeated several times. Is my understanding correct?

I'm not sure what you are asking. Do you mean you don't want term
frequency factored in, or that you don't want length normalization and
document/field boosting factored in?

>
>
> termVectors: do I need it for MoreLikeThis only?

Term vectors can help speed up MLT, but they are not required. If they are
not available, then MLT has to re-analyze the field.

>
>
> What are the memory requirements for warming up Lucene caches if I use
> term vectors and norms?

I don't believe Term Vectors are cached anywhere, other than via the  
OS.  I'd have to go dig around for norms info, or maybe someone else  
can chime in.

-Grant