You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2009/07/04 21:27:10 UTC
Re: Trie vs long string for sorting
: My data are library call numbers, normalized to be comparable, resulting in
: (maximum) 21-character strings of the form "RK 052180H359~999~999"
:
: Now, these are fine -- they work for sorting and ranges and the whole thing,
: but right now I can't use them because I've got two or three for each of my
: 6M documents and on a 32-bit machine I run out of heap.
:
: Another option would be to turn them into longs (using roughly 56 bits of
: the 64 bit space) and use a trie type. Is there any sort of a win involved
: there?
I don't think Trie fields can be used for sorting (because they result in
multiple terms per doc) but i could be wrong about that, smarter people
then me may have done something cool with the TreiField that i'm not aware
of.
As a general rule: if you have character data that fits a rigid enough set
of constraints that you can encode any legal value into a single
numberic value (either int, or long) such that they still sort properly,
sorting on those encoded values is going to be more memory efficient (and
probably just as fast) as sorting on the string values.
-Hoss
Re: Trie vs long string for sorting
Posted by Mark Miller <ma...@gmail.com>.
Trie has a custom parser that can load the FieldCache for sorting. Its
basically a built in type now, that supports fieldcache, sorting, stored
fields, etc.
On Sat, Jul 4, 2009 at 3:27 PM, Chris Hostetter <ho...@fucit.org>wrote:
>
> : My data are library call numbers, normalized to be comparable, resulting
> in
> : (maximum) 21-character strings of the form "RK 052180H359~999~999"
> :
> : Now, these are fine -- they work for sorting and ranges and the whole
> thing,
> : but right now I can't use them because I've got two or three for each of
> my
> : 6M documents and on a 32-bit machine I run out of heap.
> :
> : Another option would be to turn them into longs (using roughly 56 bits of
> : the 64 bit space) and use a trie type. Is there any sort of a win
> involved
> : there?
>
> I don't think Trie fields can be used for sorting (because they result in
> multiple terms per doc) but i could be wrong about that, smarter people
> then me may have done something cool with the TreiField that i'm not aware
> of.
>
> As a general rule: if you have character data that fits a rigid enough set
> of constraints that you can encode any legal value into a single
> numberic value (either int, or long) such that they still sort properly,
> sorting on those encoded values is going to be more memory efficient (and
> probably just as fast) as sorting on the string values.
>
>
> -Hoss
>
>
--
--
- Mark
http://www.lucidimagination.com