Posted to java-user@lucene.apache.org by Christian Reuschling <ch...@gmail.com> on 2012/02/15 12:57:40 UTC

Empty numeric field

Hi all,

for certain reasons, we need empty numeric field values (to ensure
that the list of values for a field always has the same length). When
a value is not present, we tried adding an empty String Fieldable
instead, which seemed to work for searching.
However, when we sort on this field, we run into exceptions.

Is there any way to store empty numeric fields in a document or in
the index?

Thanks

Chris

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Empty numeric field

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi again,

I just have to remind you that sorting on multi-valued fields is not supported by Lucene! This has nothing to do with numeric fields; it simply does not work and may throw different exceptions depending on the version you use.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


Re: Empty numeric field

Posted by Christian Reuschling <ch...@gmail.com>.
Uwe, thank you very much. This sounds like the best solution!


RE: Empty numeric field

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

Thanks for the explanation. I almost expected that it has to do with "stored fields". It's easy to fix:

> Ah OK, I know what you mean. We have to read out the stored field values later.
> A field can have multiple (stored) values (several
> document.add(fieldable) invocations for one field). Furthermore, some of the
> field values are logically related to each other. Since Lucene offers no way
> to define relationships between documents, and no key-based, table-join-like
> features, we took this approach as a kind of approximation:
> 
> Example document:  (*fieldName: value1, value2, ..)
>      * personId: personId1, personId2, personId3, personIdN
>      * personName: personName1, personName2, personName3, personNameN
>      * personAge: personAge1, personAge2, personAge3, personAgeN
>      * tagId: tagId1, tagId2, tagId3, tagIdN
>      * tagLabel: tagLabel1, tagLabel2, tagLabel3, tagLabelN
>      *
> 
> If, for example, no age is known for person 2, we have to insert an empty
> entry so that all data for person 3 still sits in the third stored value of
> each field in this document:
> 
>      * personName: personName1, personName2, personName3, personNameN
>      * personAge: personAge1, EMPTYENTRY, personAge3, personAgeN
> 
> Since you can't create a numeric field instance with a null value (that would be
> wonderful), we insert a string Fieldable. In this example the values
> personAge1, personAge3, and personAgeN are NumericFields, and
> EMPTYENTRY is a standard string field with a zero-length string value "". This
> leads to exceptions when we sort on the personAge field.

Then add those zero-length string fields with Field.Index.NO - they don't have to be indexed. For your use case Field.Store.YES is enough!
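
One rough way to picture this suggestion is a toy model in plain Java rather than Lucene code: every value added to a field carries a "stored" and an "indexed" flag, mirroring the Store.YES and Field.Index.NO settings named above. The class ToyField and all of its methods are invented for this sketch. The point it tries to show is that the placeholder keeps its position among the stored values but never appears among the indexed ones.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a multi-valued document field. Not Lucene code. */
final class ToyField {

    /** One value added to the field, together with its two flags. */
    static final class Value {
        final String text;
        final boolean stored;   // kept for retrieval, like Store.YES
        final boolean indexed;  // visible to search and sort; false is like Field.Index.NO

        Value(String text, boolean stored, boolean indexed) {
            this.text = text;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    private final List<Value> values = new ArrayList<Value>();

    void add(String text, boolean stored, boolean indexed) {
        values.add(new Value(text, stored, indexed));
    }

    /** Stored values in insertion order, i.e. what is read back later. */
    List<String> storedValues() {
        List<String> out = new ArrayList<String>();
        for (Value v : values) {
            if (v.stored) {
                out.add(v.text);
            }
        }
        return out;
    }

    /** Indexed values only, i.e. what search and sort code gets to see. */
    List<String> indexedValues() {
        List<String> out = new ArrayList<String>();
        for (Value v : values) {
            if (v.indexed) {
                out.add(v.text);
            }
        }
        return out;
    }

    /** Builds a personAge field; a null age becomes a stored-only empty placeholder. */
    static ToyField personAges(Integer... ages) {
        ToyField field = new ToyField();
        for (Integer age : ages) {
            if (age == null) {
                field.add("", true, false);           // placeholder: stored, not indexed
            } else {
                field.add(age.toString(), true, true); // real value: stored and indexed
            }
        }
        return field;
    }
}
```

With the ages 31, unknown, 27, the stored values come back as "31", "", "27", while only "31" and "27" are visible to anything that works on indexed values.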

> We also thought about NaN or negative infinity as a placeholder, but these
> exist only for Float and Double. For Long and Integer, only MIN_VALUE and
> MAX_VALUE are available. Furthermore, we would have to convert this value back
> into null or something empty, whereas the empty string field integrates
> seamlessly, as it should (apart from the exceptions ;) ). But maybe there is no
> option other than using the max/min values?


Re: Empty numeric field

Posted by Christian Reuschling <ch...@gmail.com>.
Ah OK, I know what you mean. We have to read out the stored field
values later. A field can have multiple (stored) values (several
document.add(fieldable) invocations for one field). Furthermore, some
of the field values are logically related to each other. Since Lucene
offers no way to define relationships between documents, and no
key-based, table-join-like features, we took this approach as a kind
of approximation:

Example document:  (*fieldName: value1, value2, ..)
     * personId: personId1, personId2, personId3, personIdN
     * personName: personName1, personName2, personName3, personNameN
     * personAge: personAge1, personAge2, personAge3, personAgeN
     * tagId: tagId1, tagId2, tagId3, tagIdN
     * tagLabel: tagLabel1, tagLabel2, tagLabel3, tagLabelN
     *

If, for example, no age is known for person 2, we have to insert an
empty entry so that all data for person 3 still sits in the third
stored value of each field in this document:

     * personName: personName1, personName2, personName3, personNameN
     * personAge: personAge1, EMPTYENTRY, personAge3, personAgeN

Since you can't create a numeric field instance with a null value
(that would be wonderful), we insert a string Fieldable. In this
example the values personAge1, personAge3, and personAgeN are
NumericFields, and EMPTYENTRY is a standard string field with a
zero-length string value "". This leads to exceptions when we sort
on the personAge field.
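
The alignment requirement described above can be sketched in plain Java. The sketch assumes that the stored values of each field come back as a String array in insertion order; the names AlignedValues, Person and zip are hypothetical, and the empty string plays the role of EMPTYENTRY.

```java
import java.util.ArrayList;
import java.util.List;

/** Rebuilds per-person records from parallel lists of stored values. */
final class AlignedValues {

    static final class Person {
        final String id;
        final String name;
        final Integer age; // null when no age is known

        Person(String id, String name, Integer age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    /** Index i of every array must describe the same person. */
    static List<Person> zip(String[] ids, String[] names, String[] ages) {
        if (ids.length != names.length || ids.length != ages.length) {
            // Without a placeholder for the missing age, the lists drift apart.
            throw new IllegalArgumentException("field value lists differ in length");
        }
        List<Person> people = new ArrayList<Person>();
        for (int i = 0; i < ids.length; i++) {
            // The empty placeholder is turned back into "no value".
            Integer age = ages[i].isEmpty() ? null : Integer.valueOf(ages[i]);
            people.add(new Person(ids[i], names[i], age));
        }
        return people;
    }
}
```

If the placeholder for person 2 were simply left out, the age list would be one entry short and person 3 would silently pick up the wrong age; the length check in this sketch turns that into an error instead.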

We also thought about NaN or negative infinity as a placeholder, but
these exist only for Float and Double. For Long and Integer, only
MIN_VALUE and MAX_VALUE are available. Furthermore, we would have to
convert this value back into null or something empty, whereas the
empty string field integrates seamlessly, as it should (apart from the
exceptions ;) ). But maybe there is no option other than using the
max/min values?
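
The placeholder idea and the conversion back to null could look roughly like this in plain Java. This is only a sketch: the class Placeholders and its method names are made up, and picking Integer.MIN_VALUE as the "missing" marker is an assumption that only holds if that number never occurs as real data.

```java
/** Sentinel values standing in for "no value", plus the conversion back. */
final class Placeholders {

    /** Assumed never to occur as a real value. */
    static final int MISSING_INT = Integer.MIN_VALUE;

    static int encodeInt(Integer value) {
        return value == null ? MISSING_INT : value.intValue();
    }

    static Integer decodeInt(int raw) {
        return raw == MISSING_INT ? null : Integer.valueOf(raw);
    }

    static double encodeDouble(Double value) {
        return value == null ? Double.NaN : value.doubleValue();
    }

    static Double decodeDouble(double raw) {
        // NaN never compares equal to itself, so == cannot be used here.
        return Double.isNaN(raw) ? null : Double.valueOf(raw);
    }
}
```

One consequence of this choice is that the marker takes part in any numeric ordering: with Integer.MIN_VALUE, entries without a value come first in an ascending sort, with Integer.MAX_VALUE they would come last.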



RE: Empty numeric field

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

This looks like an XY problem (http://www.perlmonks.org/index.pl?node_id=542341). Maybe you should first explain to us why you need that. In Lucene, fields have no "equal length" or anything like that; numeric fields in particular are tokenized and consist of several separately indexed tokens. So what do you mean by equal length? Why must this "length" be identical?
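
To give a feel for what "several separately indexed tokens" can mean, here is a simplified model in plain Java. It is not Lucene's actual term encoding: the class TrieTokens, the "shift:prefix" text format and the step of 4 bits used in the example are assumptions made for this sketch. It only illustrates the general idea that one number is indexed at several precisions.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of indexing one int at several precisions. */
final class TrieTokens {

    /** Returns one "shift:prefix" token per precision level for a 32-bit value. */
    static List<String> tokens(int value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        // Flip the sign bit so that the unsigned bit order matches signed int order.
        int sortable = value ^ 0x80000000;
        List<String> out = new ArrayList<String>();
        for (int shift = 0; shift < 32; shift += precisionStep) {
            // Each token drops the lowest 'shift' bits, so it covers a wider range.
            out.add(shift + ":" + Integer.toHexString(sortable >>> shift));
        }
        return out;
    }
}
```

For the value 42 and a step of 4 bits, this model produces eight tokens, from "0:8000002a" (full precision) down to "28:8" (only the top four bits). A zero-length string has no such representation, which fits the observation that a single field value corresponds to several tokens rather than to one entry of fixed "length".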

My only suggestion is to index a "fake" placeholder value (like -1, infinity, NaN). If you only need it in the "stored" fields, just store it but don't index it.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

