Posted to java-user@lucene.apache.org by Wei Wang <we...@gmail.com> on 2013/04/10 09:34:34 UTC

IntField question

IntField inherits from Field class a function called setByteValue().
However, if we call it, it gives an error message:

java.lang.IllegalArgumentException: cannot change value type from Integer
to Byte

1. If this is not allowed for IntField, and there is no ByteField, how is
setByteValue() meant to be used?

2. Will IntField automatically detect that the value range is small and use
less space? I understand DocValuesField can save space by using a
variable-length codec, but I am not sure about IntField.

Thanks.

Re: IntField question

Posted by Wei Wang <we...@gmail.com>.
Thanks for the clarification. Very helpful.

On Wed, Apr 10, 2013 at 8:19 AM, Adrien Grand <jp...@gmail.com> wrote:

> Hi,
>
> On Wed, Apr 10, 2013 at 4:59 PM, Wei Wang <we...@gmail.com> wrote:
> > Okay. Since there is no ByteField, setByteValue will never be used. It
> > seems like a dead function.
>
> Right, Lucene doesn't have byte or short fields.
>
> > That makes sense. If we don't need positional info (virtually all terms are
> > at the same position), can we control this for IntField or any other Field?
>
> You can configure a FieldType so that its postings lists only contain
> matching documents without any positional information[1]. This is the
> case by default on numeric fields (in particular IntField).
>
> [1]
> https://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/FieldInfo.IndexOptions.html#DOCS_ONLY
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: IntField question

Posted by Adrien Grand <jp...@gmail.com>.
Hi,

On Wed, Apr 10, 2013 at 4:59 PM, Wei Wang <we...@gmail.com> wrote:
> Okay. Since there is no ByteField, setByteValue will never be used. It
> seems like a dead function.

Right, Lucene doesn't have byte or short fields.

> That makes sense. If we don't need positional info (virtually all terms are
> at the same position), can we control this for IntField or any other Field?

You can configure a FieldType so that its postings lists only contain
matching documents without any positional information[1]. This is the
case by default on numeric fields (in particular IntField).

[1] https://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/FieldInfo.IndexOptions.html#DOCS_ONLY
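
As a rough illustration of the FieldType configuration described above, here is a minimal sketch written against the Lucene 4.2 API that the link refers to. The class name DocsOnlyExample and the field names "tag" and "count" are made up for the example, and it needs lucene-core on the classpath to compile.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.FieldInfo.IndexOptions;

// Hypothetical helper class; only the Lucene types above come from the library.
final class DocsOnlyExample {
    static Document build() {
        // Start from a stock text type and drop frequencies and positions,
        // so the postings lists record matching documents only.
        FieldType docsOnly = new FieldType(TextField.TYPE_NOT_STORED);
        docsOnly.setIndexOptions(IndexOptions.DOCS_ONLY);
        docsOnly.freeze(); // prevent further changes to the type

        Document doc = new Document();
        doc.add(new Field("tag", "lucene", docsOnly));
        // Per the reply above, numeric fields such as IntField already
        // index documents only by default, so no custom type is needed here.
        doc.add(new IntField("count", 42, Field.Store.NO));
        return doc;
    }
}
```

Note that phrase queries rely on positions, so a field indexed this way should only be searched with queries that do not need them.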

-- 
Adrien



Re: IntField question

Posted by Wei Wang <we...@gmail.com>.
Hi,

On Wed, Apr 10, 2013 at 2:45 AM, Adrien Grand <jp...@gmail.com> wrote:

> Hi,
>
> On Wed, Apr 10, 2013 at 9:34 AM, Wei Wang <we...@gmail.com> wrote:
> > IntField inherits from Field class a function called setByteValue().
> > However, if we call it, it gives an error message:
> >
> > java.lang.IllegalArgumentException: cannot change value type from Integer
> > to Byte
> >
> > 1. If this is not allowed for IntField, and there is no ByteField, how is
> > setByteValue() meant to be used?
>
> The rule is that if your Field instances wrap an object whose type is
> XXX, you should only use the setXXXValue setter. Other setters will
> throw an exception instead of performing automatic type conversions in
> order to detect programming errors. This is why setByteValue threw an
> exception on your IntField.
>

Okay. Since there is no ByteField, setByteValue will never be used. It
seems like a dead function.

>
> > 2. Will IntField automatically detect that the value range is small and use
> > less space? I understand DocValuesField can save space by using a
> > variable-length codec, but I am not sure about IntField.
>
> They are very different:
>  - A DocValues field stores one value per document ID.
>  - An indexed field only stores distinct values, and associates with
> every distinct value the list of document IDs that contain this value
> (this is called a postings list).
>
> Indexed values are not compressed but the postings lists are, and the
> compression ratio is better when postings lists are dense (with the
> current default postings format at least). This makes indexed fields
> (such as IntField) use less space when the number of distinct values
> is small.
>

That makes sense. If we don't need positional info (virtually all terms are
at the same position), can we control this for IntField or any other Field?

>
> --
> Adrien

Re: IntField question

Posted by Adrien Grand <jp...@gmail.com>.
Hi,

On Wed, Apr 10, 2013 at 9:34 AM, Wei Wang <we...@gmail.com> wrote:
> IntField inherits from Field class a function called setByteValue().
> However, if we call it, it gives an error message:
>
> java.lang.IllegalArgumentException: cannot change value type from Integer
> to Byte
>
> 1. If this is not allowed for IntField, and there is no ByteField, how is
> setByteValue() meant to be used?

The rule is that if your Field instances wrap an object whose type is
XXX, you should only use the setXXXValue setter. Other setters will
throw an exception instead of performing automatic type conversions in
order to detect programming errors. This is why setByteValue threw an
exception on your IntField.
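
To make the rule concrete, here is a toy model of a field whose setters check the wrapped type. It is not Lucene's source code: TypedField and its methods are hypothetical stand-ins, and the error message is simply modelled on the one quoted above.

```java
// Toy stand-in for a Field that wraps a single value (not Lucene's class).
final class TypedField {
    private Object value;

    TypedField(Object initialValue) {
        this.value = initialValue;
    }

    // Allowed only when the field already wraps an Integer.
    void setIntValue(int v) {
        requireType(Integer.class);
        value = v;
    }

    // Allowed only when the field already wraps a Byte.
    void setByteValue(byte v) {
        requireType(Byte.class);
        value = v;
    }

    Object value() {
        return value;
    }

    // Reject the call instead of silently converting between types.
    private void requireType(Class<?> expected) {
        if (!expected.isInstance(value)) {
            throw new IllegalArgumentException("cannot change value type from "
                    + value.getClass().getSimpleName() + " to " + expected.getSimpleName());
        }
    }
}
```

In this sketch, a field created with new TypedField(Integer.valueOf(1)) accepts setIntValue(42) but throws IllegalArgumentException on setByteValue((byte) 1), which mirrors what happens with IntField.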

> 2. Will IntField automatically detect that the value range is small and use
> less space? I understand DocValuesField can save space by using a
> variable-length codec, but I am not sure about IntField.

They are very different:
 - A DocValues field stores one value per document ID.
 - An indexed field only stores distinct values, and associates with
every distinct value the list of document IDs that contain this value
(this is called a postings list).

Indexed values are not compressed but the postings lists are, and the
compression ratio is better when postings lists are dense (with the
current default postings format at least). This makes indexed fields
(such as IntField) use less space when the number of distinct values
is small.
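
The two layouts can be sketched with plain Java collections. This is only a conceptual model, not how Lucene stores data on disk: StorageLayouts and its methods are hypothetical, and the gaps() step shows delta encoding as one common reason dense postings lists compress well, which is an assumption about the mechanism rather than something stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only; the names here are made up for illustration.
final class StorageLayouts {
    // Doc-values style: slot i holds the value of document i.
    static int[] docValues(int[] valuePerDoc) {
        return valuePerDoc.clone();
    }

    // Indexed style: each distinct value maps to the ascending list of
    // document IDs that contain it (its postings list).
    static Map<Integer, List<Integer>> postings(int[] valuePerDoc) {
        Map<Integer, List<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < valuePerDoc.length; docId++) {
            index.computeIfAbsent(valuePerDoc[docId], k -> new ArrayList<>()).add(docId);
        }
        return index;
    }

    // Differences between consecutive doc IDs (the first gap is measured
    // from 0). Dense lists give small gaps, which need fewer bits to encode.
    static int[] gaps(List<Integer> sortedDocIds) {
        int[] out = new int[sortedDocIds.size()];
        int previous = 0;
        for (int i = 0; i < out.length; i++) {
            out[i] = sortedDocIds.get(i) - previous;
            previous = sortedDocIds.get(i);
        }
        return out;
    }
}
```

For five documents with values {7, 7, 3, 7, 3}, docValues keeps all five entries, while postings keeps just two distinct values: 3 with documents [2, 4] and 7 with documents [0, 1, 3].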

-- 
Adrien
