Posted to dev@lucene.apache.org by Yonik Seeley <yo...@lucidimagination.com> on 2009/02/06 23:00:33 UTC

TrieRange

I've taken a quick peek at TrieUtils/TrieRangeQuery - nice work Uwe!

One general comment is that it seems like TrieUtils tries to do a
little too much for users (and in the process covers up some
functionality).
For example, whether to encode values in two fields and exactly what
those fields are named, seems like it should be under developer
control.  Likewise, developers should be in control of creating and
adding fields to documents and setting other properties like
omitTerms, omitNorms, etc.

Encoding a slice per character makes the code simpler, but increases
the size of the index... but perhaps not enough to worry about in
practice?

What's the synchronization for... a mistaken remnant from a previous draft?

-Yonik



RE: TrieRange

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Yonik,

> On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> > The field names could be changed, sure; the small performance optimization
> > is in TrieRangeFilter: the splitting of the range is done in a way that does
> > not seek back and forth in the Term list, only forward. This is only possible
> > if the helper field is ordered *after* the original one.
> 
> This should not matter unless the terms are near each other.
> That won't be the case with two different fields in a sufficiently sized index.

This is only a minimal optimization, suitable for very large indexes. The
problem is: if you have many terms at the highest precision (a lot of different
double values), seeking is more costly if you jump from higher to lower
precisions. But as I noted, there is no problem for the code; the names of
the fields do not matter. The only optimization depends on whether the helper
field comes before or after the main one.

Uwe




RE: TrieRange

Posted by Uwe Schindler <uw...@thetaphi.de>.
> On Sat, Feb 7, 2009 at 12:29 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> > This is only a minimal optimization, suitable for very large indexes. The
> > problem is: if you have many terms at the highest precision (a lot of
> > different double values), seeking is more costly if you jump from higher to
> > lower precisions.
> 
> That's my point... in very large indexes this should not result in any
> difference at all on average, because the terms would be nowhere near
> each other.

OK.

-- I will prepare a new TrieRangeFilter implementation, just taking the String[]
fieldnames, the sortableLong, and the precisionStep.

And I think you are right. One could completely remove the "storing" API.
If one wants to add stored fields, he could use NumberUtils.

> As an example: in a very big index, one wants to independently collect
> all documents that match "apple" and all documents that match "zebra",
> which term you seek to first should not matter.

OK, I agree :)

Uwe




Re: TrieRange

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sat, Feb 7, 2009 at 12:29 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> This is only a minimal optimization, suitable for very large indexes. The
> problem is: if you have many terms at the highest precision (a lot of different
> double values), seeking is more costly if you jump from higher to lower
> precisions.

That's my point... in very large indexes this should not result in any
difference at all on average, because the terms would be nowhere near
each other.

As an example: in a very big index, one wants to independently collect
all documents that match "apple" and all documents that match "zebra",
which term you seek to first should not matter.

-Yonik



Re: TrieRange

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> The field names could be changed, sure; the small performance optimization
> is in TrieRangeFilter: the splitting of the range is done in a way that does
> not seek back and forth in the Term list, only forward. This is only possible
> if the helper field is ordered *after* the original one.

This should not matter unless the terms are near each other.
That won't be the case with two different fields in a sufficiently sized index.

-Yonik



Re: TrieRange

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sat, Feb 7, 2009 at 12:26 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> To optimize index space, one would want to "right justify" the encoded
>> number for any bit range to minimize variation on the left - this
>> plays into lucene's prefix compression.

The prototype code I just posted in JIRA does this.  For example, if
we are encoding the bits 0xffffffffffffffff with a precision of only 8
bits, and using 7 bits per char, then it stores

0x01 0x7f
  instead of
0x7f 0x70

This means that a whole sequence of these values would take up closer
to 1 byte of data instead of 2 in the index.
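
To make that concrete, a rough sketch of such a right-justified encoder
(illustrative only, not the actual JIRA prototype; names are made up):

// Encode the significant high bits of a value right-justified into 7-bit
// characters, so the leading characters stay small and are shared by many
// values (which plays into the prefix compression).
static String encodeRightJustified(long value, int precisionBits) {
  long bits = value >>> (64 - precisionBits); // keep only the top precisionBits bits
  int numChars = (precisionBits + 6) / 7;     // 7 data bits per character
  char[] chars = new char[numChars];
  for (int i = numChars - 1; i >= 0; i--) {
    chars[i] = (char) (bits & 0x7f);          // lowest 7 bits go into the last char
    bits >>>= 7;
  }
  return new String(chars);
}

// encodeRightJustified(0xffffffffffffffffL, 8) yields the two chars 0x01 0x7f.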

> I am not sure if this is the right way. Lucene's prefix compression is also
> good for seeking fast to the term. If thousands of terms, only varying in
> the last bits (because all bits before are zero), must be scanned to get to
> the right one, it would be less performant.

Every 128th term is stored in full in memory and a binary search is
used to find the closest lower term.  A linear scan is done from
there.

If anything it should be slightly faster to iterate over a more compact index.

-Yonik



RE: TrieRange

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Yonik,

> > An optimization might be to remove
> > the lower 0 bits from the string, but it would not be needed. The strings
> > are unique for one precision (no difference between 0-bits there or not).
> 
> Yes, one would certainly want to remove trailing bits that were
> insignificant.
> 
> To optimize index space, one would want to "right justify" the encoded
> number for any bit range to minimize variation on the left - this
> plays into lucene's prefix compression.

I am not sure if this is the right way. Lucene's prefix compression is also
good for seeking fast to the term. If thousands of terms, only varying in
the last bits (because all bits before are zero), must be scanned to get to
the right one, it would be less performant. I would pack all bits to the
beginning to optimize the prefix usage. Maybe the index gets bigger, but
TrieRangeFilter needs fast seeking to the right terms.

Uwe




Re: TrieRange

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> An optimization might be to remove
> the lower 0 bits from the string, but it would not be needed. The strings
> are unique for one precision (no difference between 0-bits there or not).

Yes, one would certainly want to remove trailing bits that were insignificant.

To optimize index space, one would want to "right justify" the encoded
number for any bit range to minimize variation on the left - this
plays into lucene's prefix compression.

For example: say we want to encode 7 bits per character (so each
character takes up only one byte in UTF-8), but we have 9 bits
of data to encode.

The two characters could be encoded like this (where x is a data bit):
xxxxxxxx xx000000
Or this:
000000xx xxxxxxxx

The latter is more efficient in index space since many more values
will share the same leading bits.

-Yonik



Re: TrieRange

Posted by Doug Cutting <cu...@apache.org>.
Ryan McKinley wrote:
> If someone is 
> using /trunk, they should be ready for changes like this to happen.

+1

Doug



Re: TrieRange

Posted by Ryan McKinley <ry...@gmail.com>.
>
> This change would require a complete API change (as you proposed); the only
> problem is that I pointed a lot of people to the new (unreleased) code. They
> would have to reindex their docs and change their code. So my question: is
> anybody using TrieRangeQuery at the moment?
>

I don't pretend to speak for anyone using TrieRangeQuery at the  
moment, but...

If there is an improvement that can be made before a release, I think
that is a good idea.  From the sound of it, this seems like a pretty
good direction.

For the people already using TrieRangeQuery, they can continue to use  
it by sticking with the nightly build they are using now.  If someone  
is using /trunk, they should be ready for changes like this to happen.

ryan




RE: TrieRange

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Yonik,

I combine both mails into one:

> >> Encoding a slice per character makes the code simpler, but increases
> >> the size of the index... but perhaps not enough to worry about in
> >> practice?
> >
> > This is correct. For 2bit and 4bit there is a lot of overhead by this, but
> > there is no way around it (any ideas how to fix this?). But 8bit is the most
> > compact one. There needs to be more testing and benchmarking.
> 
> Separate bit slicing and String encoding.... they are independent.
> If a,b,c,d are prefix codes designating precision, and w,x,y,z are
> each 2 bits of the number, then
> 
> ax
> bxy
> cxyz
> dxyzw
> 
> Everything after the prefix can be encoded in a single character in each
> case.
> 
> Lucene's prefix encoding of the index will remove some of the
> redundancy... but only for numbers that are packed very close together.

This is a really great idea; too bad I did not get it before Christmas.
This change would require an almost complete rewrite of TrieUtils and partly
of TrieRangeFilter. The idea would be to change the following:

- For encoding the unsigned longs, use the same encoding as Solr's
NumberUtils does (because of the trie encoded fields it must be structured a
little differently, but the encoding would be identical, only the Java code
different), which is even more compact than the current 8bit encoding.
This would be a good place to merge NumberUtils back into Lucene. This
encoding would be used for the main field (indexed, and optionally stored). The
encoding is always the same, even for 8bit, 4bit and 2bit. This makes decoding
of stored trie encoded fields and sorting simpler (you only need one
en-/decoder). No more need for the current auto-detect decoders.

- The additional trie encoded lower precision slices (stored in an extra
field; there is no possibility anymore to store them in the same field, more on
that later) are no longer generated on the character/string side; they are
generated before encoding. The first character still specifies the precision
(slice length). The encoded value comes after that, but the lower precision
is not generated by stripping chars from the right; it is just a binary
operation (ANDing with a mask) on the original unsigned long (just remove
the lower bits by setting them to 0). The manipulated long is then encoded in the
same way, prefixed with the length char. An optimization might be to remove
the lower 0 bits from the string, but it would not be needed. The strings
are unique for one precision (no difference between 0-bits there or not).
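
As a rough sketch of that masking step (illustrative only, not the final code):

// Clear the lowest "shift" bits so the lower-precision value can be fed
// through exactly the same encoder as the full-precision value.
static long lowerPrecision(long sortableBits, int shift) {
  return sortableBits & (~0L << shift); // AND with a mask that zeroes the low bits
}

// e.g. mask away 8, 16, 24, ... low bits to get the successive slices.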

- TrieRangeFilter would do the same as before, but with some changes: it
would not split the range using the string encoded field; it would also work on
the unsigned long, ANDing the bits. The only things to change are
increment/decrementTrieCoded() from TrieUtils, which would then have to operate
on the longs, but this can be done easily. These methods are needed for fitting
the ranges against each other and for exclusive ranges.

I will think about it for some more days and will write a patch/rewrite and
test a little bit. The tests in TestTrieRangeQuery have to pass; if they do,
all is OK :)

This change would require a complete API change (as you proposed); the only
problem is that I pointed a lot of people to the new (unreleased) code. They
would have to reindex their docs and change their code. So my question: is
anybody using TrieRangeQuery at the moment?

> > The encoding of the values
> > into two different field names does the trick for the whole range query.
> > Removing the code that generates the field in exactly that way would remove
> > the idea behind TrieRangeFilter.
> 
> Allowing the ability to specify the exact name of both fields (or
> specify both names as the same)?

With the change above, there is no way to use the same field name (currently
the two fields can be differentiated using the character value of the first
char: if <0x100 it's a prefixed field, if >=0x100 it's a raw value). But this
could be changed if the highest precision were also prefixed in the index (not
for the stored field, but for the indexed ones). If I kept the highest
precision without a prefix, the fields could not be merged, but a traditional
RangeQuery using the NumberUtils encoding would work!!! (and the test case
comparing both...).

> > The only thing that could be changed would
> > be to make the suffix on the helper field variable. There is a lot of
> > optimization behind this (see the TrieRangeFilter comments in the central
> > splitRange method). This depends on the order, and so on the names, of the
> > field and its helper.
> 
> Are you saying that the actual field names cause a predictable
> performance difference?  This should not be the case.

The field names could be changed, sure; the small performance optimization
is in TrieRangeFilter: the splitting of the range is done in a way that does
not seek back and forth in the Term list, only forward. This is only possible
if the helper field is ordered *after* the original one. But this can be
detected by a string comparison of the field names, and the order of splitting
is then changed.

> > Very early versions of the trie algorithm were using an extra
> > field name for each slice, as you call it; I removed it completely later (no
> > helper field at all), but then trie fields could not be sorted any more. The
> > extra helper field is the middle way.
> >
> > Maybe there should be more documentation about that by a more native
> > speaker.
> 
> I understand how it works and how one would need to configure it such
> that it be sortable if needed - but my point was really much more
> about allowing people to do things differently if needed.

Propose an API to generate the document fields, which should be simple for
beginners (like now), but flexible!

One possibility would be to generate the main field and the helper fields
separately: the helper fields without special configuration (no norms, no
store, no tf), the main field with all possibilities. Maybe I will split the
method into two, one that only generates the helper field (with any name), and
the other one like now (for easy usage). If somebody uses the first one (helper
only), he needs to generate the main field with all options (norms, tf, ...)
himself (in a compatible way).
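
To make the idea concrete, one purely hypothetical shape of such a split
(these signatures are just an illustration, not an existing or proposed API):

import org.apache.lucene.document.Field;
import org.apache.lucene.document.Fieldable;

// Hypothetical factory splitting helper-field creation from main-field creation.
public interface TrieFieldFactory {
  // helper field only: caller picks the name; no norms, no store, no tf
  Fieldable createHelperField(String helperFieldName, long trieCodedValue);
  // main field: caller keeps full control over storage and the other options
  Fieldable createMainField(String fieldName, long trieCodedValue, Field.Store store);
}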

> >> For example, whether to encode values in two fields and exactly what
> >> those fields are named, seems like it should be under developer
> >> control.  Likewise, developers should be in control of creating and
> >> adding fields to documents and setting other properties like
> >> omitTerms, omitNorms, etc.
> >
> > In my opinion there is no sense in having norms or such things on trie
> > fields. They should only be queried using TrieRangeFilter/Query; for that,
> > norms and TF are not needed (as they are numerical values, what do you need
> > the norms for?).
> 
> norms: they also fold in index time boosts.

Currently norms are switched off for trie fields; this should optimize
indexing time. Why switch them on for trie fields?

> What if someone wanted to put payloads or use some other future
> indexing method on the terms?

See above.

> It's more about not forcing decisions on all developers.
> There would be no way for me to incorporate the Trie stuff into Solr
> as it stands - I'd need to develop custom code that duplicated code
> from TrieUtils because not enough flexibility is exposed.  I'm not
> averse to doing so - just pointing out the downsides.

Wait for my next code proposal, and let's look into it. Give me some time;
this is not so simple to change, but it would be effective. I do not like all
your proposals (like switching norms/tf on/off), but payloads may be good.
Maybe you could just throw an IllegalArgumentException in Solr when somebody
tries to modify norms/tf for numeric fields that are trie encoded.

I hope you like the new ideas. What I would not like is a fork of
TrieRangeQuery in Solr; this would make maintenance harder. Let's make it
better in Lucene Contrib (and that before 2.9, because after that I cannot
change the API/encoding anymore).

Uwe




Re: TrieRange

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Actually, I think we can totally do away with "variants" of 2,4,8 bits
and make it completely generic, able to support any slice size from 1
to 63 bits.

I'll work up some prototype code and post it in the original TrieUtils
JIRA issue.

-Yonik


On Sat, Feb 7, 2009 at 9:58 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
>>> I understand how it works and how one would need to configure it such
>>> that it be sortable if needed - but my point was really much more
>>> about allowing people to do things differently if needed.
>>
>> Propose an API to generate the document fields, which should be simple for
>> beginners (like now), but flexible!
>
> Simplicity for beginners can always be layered on top if enough power
> and flexibility is exposed (my bias).
>
> // the first entry is full precision, the remaining entries have less
> String[] trieCode(long sortableBits)
>
> long getSortableBits(long value)
> long getSortableBits(double value)
> long getSortableBits(Date date)
>
> // example usage:
> String[] terms = trieCode(getSortableBits(1.4142135d))
>
> This API concentrates on bit manipulation and term generation
> (Strings) w/o getting into things like Fields and Documents.
>
> -Yonik
>



Re: TrieRange

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> I understand how it works and how one would need to configure it such
>> that it be sortable if needed - but my point was really much more
>> about allowing people to do things differently if needed.
>
> Propose an API to generate the document fields, which should be simple for
> beginners (like now), but flexible!

Simplicity for beginners can always be layered on top if enough power
and flexibility is exposed (my bias).

// the first entry is full precision, the remaining entries have less
String[] trieCode(long sortableBits)

long getSortableBits(long value)
long getSortableBits(double value)
long getSortableBits(Date date)

// example usage:
String[] terms = trieCode(getSortableBits(1.4142135d))

This API concentrates on bit manipulation and term generation
(Strings) w/o getting into things like Fields and Documents.
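
As a sketch of what getSortableBits(double) could do (this is just an
assumption about the implementation, the usual bit-twiddling trick, not a
final API):

// Map a double onto a long whose unsigned order matches the natural numeric
// order, so that range splitting can work on plain bits.
static long getSortableBits(double value) {
  long bits = Double.doubleToRawLongBits(value);
  // positives: set the sign bit so they sort above all negatives;
  // negatives: flip all bits so larger magnitudes sort lower
  return bits >= 0 ? bits ^ 0x8000000000000000L : ~bits;
}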

-Yonik



Re: TrieRange

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Feb 6, 2009 at 6:18 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> Encoding a slice per character makes the code simpler, but increases
>> the size of the index... but perhaps not enough to worry about in
>> practice?
>
> This is correct. For 2bit and 4bit there is a lot of overhead by this, but
> there is no way around it (any ideas how to fix this?). But 8bit is the most
> compact one. There needs to be more testing and benchmarking.

Separate bit slicing and String encoding.... they are independent.
If a,b,c,d are prefix codes designating precision, and w,x,y,z are
each 2 bits of the number, then

ax
bxy
cxyz
dxyzw

Everything after the prefix can be encoded in a single character in each case.

Lucene's prefix encoding of the index will remove some of the
redundancy... but only for numbers that are packed very close together.
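
A tiny sketch of the idea for an 8-bit value and 2-bit slices (illustrative
only, not actual TrieUtils code):

// One term per precision: a prefix char designating how many 2-bit slices
// follow, then all remaining data bits packed into a single character.
static String[] slicedTerms(int value8bit) {
  String[] terms = new String[4];
  for (int slices = 1; slices <= 4; slices++) {
    int significantBits = value8bit >>> (8 - 2 * slices); // keep the top 2*slices bits
    char prefix = (char) ('a' + slices - 1);              // 'a'..'d' designate the precision
    terms[slices - 1] = "" + prefix + (char) significantBits;
  }
  return terms;
}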

-Yonik



Re: TrieRange

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Feb 6, 2009 at 6:18 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> The encoding of the values
> into two different field names does the trick for the whole range query.
> Removing the code that generates the field in exactly that way would remove
> the idea behind TrieRangeFilter.

Allowing the ability to specify the exact name of both fields (or
specify both names as the same)?

> The only thing that could be changed would
> be to make the suffix on the helper field variable. There is a lot of
> optimization behind this (see the TrieRangeFilter comments in the central
> splitRange method). This depends on the order, and so on the names, of the field
> and its helper.

Are you saying that the actual field names cause a predictable
performance difference?  This should not be the case.

> Very early versions of the trie algorithm were using an extra
> field name for each slice, as you call it; I removed it completely later (no
> helper field at all), but then trie fields could not be sorted any more. The
> extra helper field is the middle way.
>
> Maybe there should be more documentation about that by a more native
> speaker.

I understand how it works and how one would need to configure it such
that it be sortable if needed - but my point was really much more
about allowing people to do things differently if needed.

>> For example, whether to encode values in two fields and exactly what
>> those fields are named, seems like it should be under developer
>> control.  Likewise, developers should be in control of creating and
>> adding fields to documents and setting other properties like
>> omitTerms, omitNorms, etc.
>
> In my opinion there is no sense in having norms or such things on trie
> fields. They should only be queried using TrieRangeFilter/Query; for that,
> norms and TF are not needed (as they are numerical values, what do you need
> the norms for?).

norms: they also fold in index time boosts.
What if someone wanted to put payloads or use some other future
indexing method on the terms?

It's more about not forcing decisions on all developers.
There would be no way for me to incorporate the Trie stuff into Solr
as it stands - I'd need to develop custom code that duplicated code
from TrieUtils because not enough flexibility is exposed.  I'm not
averse to doing so - just pointing out the downsides.

-Yonik



RE: TrieRange

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Yonik,

> I've taken a quick peek at TrieUtils/TrieRangeQuery - nice work Uwe!

Thank you :-)

Some words first: I made the original "TrieRangeQuery" around the middle of
2005, when Lucene 1.4.3 was on the market, as closed source for PANGAEA
(www.pangaea.de). At that time there was no ConstantScoreQuery around. The
special RangeQuery was just a workaround that split a Range into smaller
RangeQueries with different precision, connected by a BooleanQuery in
rewrite(). I was then able to set the static clause limitation in BooleanQuery
to about 4000, and all numerical ranges worked. At that time I used fixed-point
decimals, which I indexed using these trie methods.

During the work on the project panFMP (PANGAEA Framework for Metadata
Portals, www.panfmp.org), I released it as open source. Then I read a lot of
discussions about range queries, geographical searches and so on, and
colleagues who also use Lucene asked me to contribute it directly to Lucene so
it could be used outside of panFMP. Older versions of this query can still be
retrieved from the panFMP SVN. So there was a lot of optimization work
involved, because we run such queries very often at PANGAEA.

> One general comment is that it seems like TrieUtils tries to do a
> little too much for users (and in the process covers up some
> functionality).

The question came from Otis after New Year, too. The encoding of the values
into two different field names does the trick for the whole range query.
Removing the code that generates the field in exactly that way would remove
the idea behind TrieRangeFilter. The only thing that could be changed would
be to make the suffix on the helper field variable. There is a lot of
optimization behind this (see the TrieRangeFilter comments in the central
splitRange method). This depends on the order, and so on the names, of the field
and its helper. Very early versions of the trie algorithm were using an extra
field name for each slice, as you call it; I removed it completely later (no
helper field at all), but then trie fields could not be sorted any more. The
extra helper field is the middle way.

Maybe there should be more documentation about that by a more native
speaker.

> For example, whether to encode values in two fields and exactly what
> those fields are named, seems like it should be under developer
> control.  Likewise, developers should be in control of creating and
> adding fields to documents and setting other properties like
> omitTerms, omitNorms, etc.

In my opinion there is no sense in having norms or such things on trie
fields. They should only be queried using TrieRangeFilter/Query; for that,
norms and TF are not needed (as they are numerical values, what do you need
the norms for?).

> Encoding a slice per character makes the code simpler, but increases
> the size of the index... but perhaps not enough to worry about in
> practice?

This is correct. For 2bit and 4bit there is a lot of overhead by this, but
there is no way around it (any ideas how to fix this?). But 8bit is the most
compact one. There needs to be more testing and benchmarking.

> What's the synchronization for... a mistaken remnant from a previous
> draft?

You mean the synchronization around StringBuffer? The original code was Java
1.5 and used StringBuilder. As StringBuffer synchronizes on each method
call, it is more performant to synchronize the whole block that uses the
StringBuffer (or to use Java 1.5). This way the mutex is locked once, and the
method calls do not need to synchronize on each call (because the lock is
already held). See e.g.
http://allu.wordpress.com/2006/08/20/string-buffer-is-thread-safe/, but
there is more of this around. When we switch to Java 1.5, we can remove all
usage of StringBuffer and replace it by StringBuilder. There may be other
places in Lucene that could be optimized like this (a lot of toString() and
the + operator, but toString() is mostly used for debugging, so it is of no
use there).
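
As a minimal sketch of that pattern (names here are illustrative, not the
actual TrieUtils code):

// Hold the StringBuffer's monitor for the whole block so the individual
// append() calls do not each have to re-acquire it (Java 1.4 style; with
// Java 1.5 one would simply use an unsynchronized StringBuilder).
static String buildTerm(char precisionPrefix, String encodedValue) {
  StringBuffer sb = new StringBuffer();
  synchronized (sb) {              // one lock acquisition for all appends
    sb.append(precisionPrefix);
    sb.append(encodedValue);
  }
  return sb.toString();
}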

To conclude:
If, some time in the future, the idea behind TrieRangeFilter should make its
way into the core, the whole notion of "Term" would have to be changed. The
paper mentioned in the original JIRA issue (LUCENE-1470) handles this
[http://webcourse.cs.technion.ac.il/236620/Winter2006-2007/hw/WCFiles/paramsearch.pdf]
- I did not know it before. But Lucene is Lucene; if you want to change it and
make this more compact, many parts of Lucene's core must be rewritten.

Uwe

