You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Tobias Hill <To...@citerus.se> on 2008/03/05 08:02:23 UTC

NO_NORM and TOKENIZED

Hi,

I am quite new to the Lucene API. I find the Field-constructor
unintuitive. Maybe I have misunderstood it. Let's find out...

It can be used either as:
new Field("field", "data", Store.NO, TOKENIZED)

or:
new Field("field", "data", Store.NO, NO_NORM)


As I understand it NO_NORM and TOKENIZED are not settings for
a one-dimensional behaviour - on the contrary they are rather
orthogonal.

I.e. it is quite likely that I would want _both_ TOKENIZED and NO_NORM.
This is especially true for fields that are of approx. equal and short length
over the doc-space.

- Am I right in my reasoning (which means that the API is a bit unclear)?
Or
- Have I misunderstood something fundamental about TOKENIZED and NO_NORM?

Thankful for any feedback on this,
Tobias

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: NO_NORM and TOKENIZED

Posted by Michael McCandless <lu...@mikemccandless.com>.

NO_NORMS means "do index the field as a single token (ie, do not  
tokenize the field), and, do not store norms for it".

Mike

On Mar 5, 2008, at 5:20 AM, <sp...@gmx.eu> <sp...@gmx.eu> wrote:

> Hm, what exactly does NO_NORM mean?
>
> Thank you
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: NO_NORM and TOKENIZED

Posted by sp...@gmx.eu.

Hm, what exactly does NO_NORM mean?

Thank you


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: NO_NORM and TOKENIZED

Posted by Michael McCandless <lu...@mikemccandless.com>.

Correct, they are logically orthogonal, and I agree the API is  
somewhat confusing since "NO_NORMS" is mixing up two things.

To get a tokenized field without norms you can create the field with  
Index.TOKENIZED, and then call setOmitNorms(true).

Note that norms "spread" during merges, so, if you really want  
NO_NORMS for a given field X then every doc in the index must have  
its field X indexed with NO_NORMS.  Ie, build a clean index if you  
decide to turn off norms for field X.

Mike

Tobias Hill wrote:

> Hi,
>
> I am quite new to the Lucene API. I find the Field-constructor
> unintuitive. Maybe I have misunderstood it. Let's find out...
>
> It can be used either as:
> new Field("field", "data", Store.NO, TOKENIZED)
>
> or:
> new Field("field", "data", Store.NO, NO_NORM)
>
>
> As I understand it NO_NORM and TOKENIZED are not settings for
> a one-dimensional behaviour - on the contrary they are rather
> orthogonal.
>
> I.e. it is quite likely that I would want _both_ TOKENIZED and  
> NO_NORM.
> This is especially true for fields that are of approx. equal and  
> short length
> over the doc-space.
>
> - Am I right in my reasoning (which means that the API is a bit  
> unclear)?
> Or
> - Have I misunderstood something fundamental about TOKENIZED and  
> NO_NORM?
>
> Thankful for any feedback on this,
> Tobias
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org