Posted to java-user@lucene.apache.org by Christian Reuschling <ch...@gmail.com> on 2011/11/02 20:19:04 UTC

Numeric field min max values

Hi,

maybe it is an easy question - I searched the lucene-user
archive, but sadly didn't find an answer :(

I am currently changing our field logic from string to numeric fields.
Until now, I found the min and max values of a field by
iterating over the field with a TermEnum
(termEnum = reader.terms(new Term(strFieldName, ""));).

Now, in the case of a numeric field, I get some strange field values
such as "$)A M`" - I guess this could be a low-precision token from the
field trie?

Is there a special way to iterate over numeric field values? Or is
it possible to get the trie and ask it for the min and max
values? Or is there another (util) class?

Thanks for all answers!

Chris

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Numeric field min max values

Posted by Christian Reuschling <ch...@gmail.com>.
Thank you very much - this really helps me a lot!




Re: Numeric field min max values

Posted by Christoph Kaser <ch...@iconparc.de>.
Hi Chris,

Here is some code we use to obtain the int values from the TermEnum:

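         // Needs java.util.HashSet, org.apache.lucene.index.Term,
         // org.apache.lucene.index.TermEnum and org.apache.lucene.util.NumericUtils
         HashSet<Integer> ints = new HashSet<Integer>();
         TermEnum te = reader.terms(new Term(fieldName, ""));
         try {
             do {
                 Term term = te.term();
                 // Stop when the enum is exhausted or has moved on to another field
                 if (term == null || !fieldName.equals(term.field()))
                     break;
                 String val = term.text();

                 // See the FieldCache implementation: NumericFields add some
                 // lower-precision values that are only needed for range querying.
                 // They sort after the full-precision terms, so we can stop at the first one.
                 final int shift = val.charAt(0) - NumericUtils.SHIFT_START_INT;
                 if (shift > 0 && shift <= 31)
                     break;

                 ints.add(NumericUtils.prefixCodedToInt(val));
             } while (te.next());
         } finally {
             te.close();
         }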

Hope that helps,

Christoph Kaser
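
Since the original question was only about the min and max values, the loop Christoph posted may not need to collect every value. The following is a minimal, self-contained sketch of that idea, not Lucene code: it assumes the full-precision (shift 0) terms of a field sort in numeric order and come before all lower-precision terms, so the first and the last shift-0 term would be the min and max. The class NumericMinMaxSketch and its encode, decode, indexTerms and minMax helpers are hypothetical stand-ins modeled on how the Lucene 3.x prefix coding is understood to work; the constant 0x60 is assumed to match NumericUtils.SHIFT_START_INT. A TreeSet plays the role of the sorted term dictionary.

```java
import java.util.TreeSet;

final class NumericMinMaxSketch {
    // Assumption: same value as NumericUtils.SHIFT_START_INT in Lucene 3.x
    static final char SHIFT_START_INT = 0x60;

    // Hypothetical stand-in for the int prefix coding: first char carries the shift,
    // the remaining chars carry the sortable bits, 7 bits per char, most significant first.
    static String encode(int value, int shift) {
        int nChars = (31 - shift) / 7 + 1;
        char[] buf = new char[nChars + 1];
        buf[0] = (char) (SHIFT_START_INT + shift);
        int sortableBits = (value ^ 0x80000000) >>> shift;
        for (int i = nChars; i >= 1; i--) {
            buf[i] = (char) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return new String(buf);
    }

    // Inverse of encode(); for shift > 0 the dropped low bits come back as zeros.
    static int decode(String term) {
        int shift = term.charAt(0) - SHIFT_START_INT;
        int sortableBits = 0;
        for (int i = 1; i < term.length(); i++) {
            sortableBits = (sortableBits << 7) | term.charAt(i);
        }
        return (sortableBits << shift) ^ 0x80000000;
    }

    // Simulates the sorted term dictionary of one numeric field.
    static TreeSet<String> indexTerms(int precisionStep, int... values) {
        TreeSet<String> terms = new TreeSet<String>();
        for (int v : values) {
            for (int shift = 0; shift < 32; shift += precisionStep) {
                terms.add(encode(v, shift));
            }
        }
        return terms;
    }

    // Returns {min, max} over the full-precision terms, or null if there are none.
    static int[] minMax(Iterable<String> sortedTerms) {
        String first = null;
        String last = null;
        for (String term : sortedTerms) {
            if (term.charAt(0) - SHIFT_START_INT != 0) {
                break; // first lower-precision term: no full-precision terms follow
            }
            if (first == null) {
                first = term;
            }
            last = term;
        }
        return first == null ? null : new int[] { decode(first), decode(last) };
    }

    public static void main(String[] args) {
        int[] mm = minMax(indexTerms(4, 1000, -5, 42, 1320264000));
        System.out.println("min=" + mm[0] + " max=" + mm[1]);
    }
}
```

Against a real index the same idea would mean decoding only the first term of the field and the last term seen before the shift check triggers, instead of filling a HashSet.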



-- 
Dipl.-Inf. Christoph Kaser

IconParc GmbH
Sophienstrasse 1
80333 München

www.iconparc.de

Tel +49 -89- 15 90 06 - 21
Fax +49 -89- 15 90 06 - 49

Geschäftsleitung: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer. HRB
121830, Amtsgericht München






RE: Numeric field min max values

Posted by Uwe Schindler <uw...@thetaphi.de>.
This is caused by the lower-precision terms that NumericField adds to allow fast NumericRangeQuery. You have to filter those values out by looking at the first few bits, which contain the precision.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
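
To illustrate what Uwe describes, here is a minimal, self-contained sketch of why decoding a lower-precision term can produce a value such as 0 that no document contains. It is not Lucene's implementation: the class PrefixCodedDemo and its encode, shiftOf and decode helpers are hypothetical, modeled on how the Lucene 3.x int prefix coding is understood to work, and the constant 0x60 is assumed to match NumericUtils.SHIFT_START_INT. The first char of a term holds the shift (the number of low bits that were dropped), so a term with a shift greater than 0 decodes to the original value with its low bits zeroed.

```java
final class PrefixCodedDemo {
    // Assumption: same value as NumericUtils.SHIFT_START_INT in Lucene 3.x
    static final char SHIFT_START_INT = 0x60;

    // Hypothetical stand-in for the encoder: shift in the first char,
    // then the sortable bits in 7-bit chunks, most significant first.
    static String encode(int value, int shift) {
        int nChars = (31 - shift) / 7 + 1;
        char[] buf = new char[nChars + 1];
        buf[0] = (char) (SHIFT_START_INT + shift);
        int sortableBits = (value ^ 0x80000000) >>> shift;
        for (int i = nChars; i >= 1; i--) {
            buf[i] = (char) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return new String(buf);
    }

    // The precision check: 0 means a full-precision term, anything else is a trie helper term.
    static int shiftOf(String term) {
        return term.charAt(0) - SHIFT_START_INT;
    }

    // Decodes any term; for shift > 0 the dropped low bits come back as zeros.
    static int decode(String term) {
        int shift = shiftOf(term);
        int sortableBits = 0;
        for (int i = 1; i < term.length(); i++) {
            sortableBits = (sortableBits << 7) | term.charAt(i);
        }
        return (sortableBits << shift) ^ 0x80000000;
    }

    public static void main(String[] args) {
        int value = 1000;
        // With a precision step of 4, one value produces terms for shifts 0, 4, ..., 28.
        for (int shift = 0; shift < 32; shift += 4) {
            String term = encode(value, shift);
            System.out.println("shift=" + shiftOf(term) + " decodes to " + decode(term));
        }
    }
}
```

In this sketch the value 1000 decodes to 1000 at shift 0, to 992 at shift 4 and to 0 at shift 28, which would explain a '0' term that appears to match every document with a small positive value.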




Re: Numeric field min max values

Posted by Christian Reuschling <ch...@gmail.com>.
hm - I noticed that when I iterate with TermEnum and decode the
value with prefixCodedToInt (..), I get correct values, but I also get
values that are not values of this field anywhere in the index.
E.g. in the number-encoded field with the timestamps I also get a '0'
as a term - but all documents have a correct timestamp.
I also noticed that Luke shows the same values, even when
the correct decoder is selected. Luke also offers the option to
'browse term docs', and says that every document is a '0' term
document.

Does anyone have an idea?

best

Chris



Re: Numeric field min max values

Posted by Christian Reuschling <ch...@gmail.com>.
Thank you very much! This exactly solves my problem




Re: Numeric field min max values

Posted by Ian Lea <ia...@gmail.com>.
I can't answer most of the questions, but org.apache.lucene.util.NumericUtils
has prefixCodedToInt, prefixCodedToLong, etc. methods that will convert the
encoded value (what you are seeing, I presume) to int or long or whatever.
Maybe that will help.


--
Ian.

