Posted to solr-user@lucene.apache.org by Jay Hill <ja...@gmail.com> on 2009/12/05 01:38:53 UTC

Sanity check on numeric types and which of them to use

Looking at the example version of schema.xml there seems to be some
confusion on which numeric field types are best used in different
situations. What confused me was that the type of "int" is now set to a
TrieIntField, but with a precisionStep of 0:
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
omitNorms="true" positionIncrementGap="0"/>
the "tint" type is set up as a TrieIntField with a precisionStep of 8:
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
omitNorms="true" positionIncrementGap="0"/>
the "sint" type is unchanged:
    <fieldType name="sint" class="solr.SortableIntField"
sortMissingLast="true" omitNorms="true"/>
and the old IntField is now of type "pint":
    <fieldType name="pint" class="solr.IntField" omitNorms="true"/>

It's obvious that the "tint" type would be preferred for range queries. But
these questions come to mind:
1) Is there any benefit to using the "int" type as a TrieIntField w/
precisionStep=0 over the "pint" type for simple ints that won't be sorted or
range queried?
2) In 1.4, what type is now most efficient for sorting?
3) The only reason to use a "sint" field is for backward compatibility
and/or to use sortMissingFirst/sortMissingLast, correct?

-Jay

Re: Sanity check on numeric types and which of them to use

Posted by wojtekpia <wo...@hotmail.com>.


> 3) The only reason to use a "sint" field is for backward compatibility
> and/or to use sortMissingFirst/sortMissingLast, correct?
> 

I'm using sint so I can facet and sort facets numerically. 

Re: Sanity check on numeric types and which of them to use

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sat, Dec 5, 2009 at 7:02 AM, Marc Sturlese <ma...@gmail.com> wrote:
>
> And what about:
> <fieldtype name="sint" class="solr.SortableIntField"
> sortMissingLast="true"/>
> vs.
> <fieldtype name="bcdint" class="solr.BCDIntField" sortMissingLast="true"/>
>
> What is the difference between the two? Is bcdint always better?
> Thanks in advance

BCDInt was a very early attempt at a sortable int type that didn't go
through binary. It went directly from base 10 (the actual string
representation) to a sortable base 10000 (a value below 10K fits in a
single char, which saves memory in the fieldCache), and it also had no
size limit. It's no longer referenced in any example schemas, and it
doesn't have support for function queries.

-Yonik
http://www.lucidimagination.com

Re: Sanity check on numeric types and which of them to use

Posted by Marc Sturlese <ma...@gmail.com>.
And what about:
<fieldtype name="sint" class="solr.SortableIntField"
sortMissingLast="true"/>
vs.
<fieldtype name="bcdint" class="solr.BCDIntField" sortMissingLast="true"/>

What is the difference between the two? Is bcdint always better?
Thanks in advance





Re: Sanity check on numeric types and which of them to use

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Dec 4, 2009 at 7:38 PM, Jay Hill <ja...@gmail.com> wrote:
> 1) Is there any benefit to using the "int" type as a TrieIntField w/
> precisionStep=0 over the "pint" type for simple ints that won't be sorted or
> range queried?

No.  But given that people could throw in a random range query and
have it work correctly with a trie based int (vs a plain int), seems
reason enough to prefer it.

> 2) In 1.4, what type is now most efficient for sorting?

trie and plain should be pretty equivalent (trie might be slightly
faster to uninvert the first time).  Both take up less memory in the
field cache than sint.

> 3) The only reason to use a "sint" field is for backward compatibility
> and/or to use sortMissingFirst/sortMissingLast, correct?

I believe so.
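
To make the three answers above concrete, here is a rough sketch of how
they might translate into a 1.4 schema.xml. The fieldType definitions are
the ones quoted from the example schema earlier in this thread; the field
names (category_id, price_cents, legacy_rank) are made up purely for
illustration, and the split between them is an assumption about a typical
use case, not a recommendation from the example schema itself.

```xml
<!-- Illustrative fragment only; field names are hypothetical. -->
<types>
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
             omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
             omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="sint" class="solr.SortableIntField"
             sortMissingLast="true" omitNorms="true"/>
</types>

<fields>
  <!-- Simple lookups and sorting; an occasional range query still
       returns correct results because the type is trie based. -->
  <field name="category_id" type="int" indexed="true" stored="true"/>

  <!-- Values that are range queried regularly. -->
  <field name="price_cents" type="tint" indexed="true" stored="true"/>

  <!-- Kept as sint only for backward compatibility or because
       documents without a value must sort last. -->
  <field name="legacy_rank" type="sint" indexed="true" stored="true"/>
</fields>
```

Changing the type of an existing field would normally require reindexing,
since the indexed term format differs between these classes.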

-Yonik
http://www.lucidimagination.com