Posted to java-user@lucene.apache.org by eks dev <ek...@yahoo.co.uk> on 2005/11/09 18:34:06 UTC

A lot of short documents, optimal query?

Hi all,

Can somebody please suggest a way (or ways) to
optimize the execution time of the query below (or to use
some of the trunk BooleanScorers)? I am probably missing
something obvious.

Use Case:
Here I have names of people, with query expansion for
individual tokens (not using FuzzyQuery), that should
be found in certain ZIP codes.
The collection contains approx. 30 million rather short
documents with only two fields, name (2-4 tokens) and
ZIP code.

My thought was to index only a 2-char prefix to get rid
of the prefix part of the query. Sounds plausible?

Also worth mentioning: I would like to get all hits
(currently using HitCollector) and the score is not
needed. Is there any way to avoid scoring (and would that
help at all)?

Before adding the ZIPS:12* part of the query, Lucene worked
like a charm, well under 1 second on a 25 million document
collection! Now it has jumped into the 10 second range.

Trunk is ok for me.

Thanks a lot! 
eks dev

(
+(
(+raimonds +marschan) 
(+raimonds +marschol) 
(+raimonds +marschel) 
(+raimonds +marschalfr) 
(+raimonds +marschalek) 
(+raimonds +marscha) 
...
) 
+(ZIPS:22* ZIPS:21* ZIPS:20* ZIPS:23* ZIPS:245*
ZIPS:246* ZIPS:247* ZIPS:240* ZIPS:241* ZIPS:242*
ZIPS:243* ZIPS:254* ZIPS:253* ZIPS:255* ZIPS:256*
ZIPS:257* ZIPS:295* ZIPS:296* ZIPS:273* ZIPS:274*
ZIPS:275* ZIPS:276* ZIPS:192* ZIPS:190*)
)


		

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: A lot of short documents, optimal query?

Posted by Paul Elschot <pa...@xs4all.nl>.
On Friday 11 November 2005 23:04, Chris Hostetter wrote:
> 
> : Wouldn't it make sense to have a BooleanFilter,
> : TermFilter, MultiTermFilter, RangeFilter... family to
> : "mirror" the xxxQuery world with the same idioms and
> : interfaces? Is this the direction already taken in
> : Lucene development (an alternative would be to
> : parametrize the existing Query world)? As I see it
> : functionally, at the moment filters (and their
> : combination) are the only way to use a fast "pure
> : boolean" model.
> :
> : Does what I just said make any sense?
> 
> It makes perfect sense, and you have grasped a lot of the possibilities.
> While making a Filter variant of every Query class is my gut
> instinct, there has in fact been discussion about generalizing Queries so
> that they can have a "non-scoring" mode.  These issues have all been
> mentioned in LUCENE-383 ...
> 
> 	http://issues.apache.org/jira/browse/LUCENE-383
> 
> One of the big reasons why it might make sense to use Queries instead of
> Filters even if you don't care about scoring is when you have a large set
> of very restrictive conditions (i.e. a BooleanQuery consisting of many
> TermQueries).  The BooleanScorer can make good decisions to skip over
> large sets of documents -- sometimes ignoring sub queries entirely -- when
> one sub query only matches a few documents, because of the flexibility of
> the Scorer API.
> 
> The Filter API on the other hand doesn't have this flexibility.  There is
> no way for a ChainedFilter/BooleanFilter to know that it can skip over
> one of its sub filters, or ask one of its sub filters to only look at
> certain documents.

This FilteredQuery will skip over documents that do not pass the filter:
http://issues.apache.org/jira/browse/LUCENE-330
even when it is nested in itself.

Regards,
Paul Elschot




Re: A lot of short documents, optimal query?

Posted by Chris Hostetter <ho...@fucit.org>.
: 2. Filters are perfect for what they do well,
: filtering. But using them to reimplement
: BooleanQuery, mirroring everything with filters, would
: introduce a lot of redundancy. BooleanQuery for example
: does a lot of cool optimizations, short-circuiting
: expressions... and more or less the same things would
: have to be reimplemented using filters.

More specifically: the Filter API won't let you write code that can
short-circuit.  ChainedFilter has no way of telling any of its sub-Filters
"you can skip ahead to document #n" the way BooleanQuery can with its
sub-Queries.  For arbitrary search criteria entered by users, this means
building up a complex hierarchy of Filters can be less efficient than
building up a complex hierarchy of Queries -- even if you don't care about
scores.
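
To make the "skip ahead to document #n" point concrete, here is a minimal sketch in plain Java with no Lucene classes. DocIdCursor and its skipTo method are hypothetical stand-ins for the skipping ability of a Scorer, and sorted int arrays stand in for posting lists. When one side matches only a few documents, the other side jumps directly to them instead of visiting every document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical cursor over a sorted array of document numbers. */
class DocIdCursor {
    private final int[] docs; // sorted ascending, no duplicates
    private int pos = 0;

    DocIdCursor(int[] sortedDocs) { this.docs = sortedDocs; }

    /** Current document, or Integer.MAX_VALUE once the cursor is exhausted. */
    int doc() { return pos < docs.length ? docs[pos] : Integer.MAX_VALUE; }

    /** Moves to the first document >= target via binary search on the remainder. */
    int skipTo(int target) {
        int i = Arrays.binarySearch(docs, pos, docs.length, target);
        pos = i >= 0 ? i : -i - 1;
        return doc();
    }
}

class SkippingIntersection {
    /** Intersects two sorted lists; each side skips to the other's current document. */
    static List<Integer> intersect(int[] a, int[] b) {
        DocIdCursor x = new DocIdCursor(a);
        DocIdCursor y = new DocIdCursor(b);
        List<Integer> hits = new ArrayList<>();
        int dx = x.doc(), dy = y.doc();
        while (dx != Integer.MAX_VALUE && dy != Integer.MAX_VALUE) {
            if (dx == dy) {
                hits.add(dx);
                dx = x.skipTo(dx + 1);
                dy = y.skipTo(dy + 1);
            } else if (dx < dy) {
                dx = x.skipTo(dy);
            } else {
                dy = y.skipTo(dx);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] rare = {7, 900_000};          // a very restrictive clause
        int[] common = new int[1_000_000];  // a clause matching every document
        for (int i = 0; i < common.length; i++) common[i] = i;
        System.out.println(intersect(rare, common)); // prints [7, 900000]
    }
}
```

A filter that has already materialized a full bitset cannot benefit from this: the work of filling the bitset is done before the combination step ever sees the other clauses.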

But for things that you can cache and reuse, Filters kick ass.

: 3. ConstantScoreQuery:
: I am a bit unsure here, but this looks like a bridge
: that enables Filters to enter "regular" Query world.

Correct.  ConstantScoreQuery is just a way to leverage the advantages of a
Filter in situations where you need a Query.  ConstantScoreRangeQuery is the
prime example -- you can subclass QueryParser to use it.

: 1. Make a TermFilter for all unique, high frequency
	...
: 2. wrap those TermFilters in ConstantScoreQuery,
:
: 3. combine this inside BooleanQuery as before (Boolean
: mix of term queries and ConstantScoreQueries)

Unless those terms are very frequently used in searches (in various
combinations) you may be better off just using regular TermQueries.

Even with the BooleanQuery being smart, and knowing when to skip docs
based on its sub-queries, what I said before about not being able to
short-circuit a Filter still applies if those Filters are inside
ConstantScoreQueries.



-Hoss




Re: A lot of short documents, optimal query?

Posted by eks dev <ek...@yahoo.co.uk>.
Hi Hoss, 
 
Good to hear that, I felt a bit fuzzy trying to grasp
all the possibilities.

I've read the discussion of Doug's proposal for
implementing non-scoring Query features,
ConstantScoreQuery, and Paul's FilteredQuery patch.

In summary, the options to avoid scoring are:

1. There is a consensus that Doug's proposal would be
the best way to proceed, but it will take some time until
we get there.

2. Filters are perfect for what they do well,
filtering. But using them to reimplement
BooleanQuery, mirroring everything with filters, would
introduce a lot of redundancy. BooleanQuery for example
does a lot of cool optimizations, short-circuiting
expressions... and more or less the same things would
have to be reimplemented using filters.

3. ConstantScoreQuery:
I am a bit unsure here, but this looks like a bridge
that enables Filters to enter the "regular" Query world.

4. Paul's patch on FilteredQuery. This is "just" an
optimization to avoid unnecessary scoring for docs
that do not pass the Filter (a rather smart one, though).

So, back to my use case:

You are totally right, ZIP codes are best done with a
SetFilter (or PrefixFilter), no doubt about this one.
And they were actually the problem, so the solution is
already here.

But now that I have learned about ConstantScoreQuery I
started thinking about the following option:

The problem:
The first part of the query (the part about the field
"name" in my example) combines term queries in
many strange ways using BooleanQuery, so using
ChainedFilter would make things not so easy to read,
generalize and get right.

So, what would you say about the following:

1. Make a TermFilter for all unique, high-frequency
terms in my query (I have frequency info during
construction of the query). Of course, simple
caching at the TermFilter level is really easy.

2. Wrap those TermFilters in ConstantScoreQuery.

3. Combine this inside BooleanQuery as before (a Boolean
mix of term queries and ConstantScoreQueries).

The ZIPS field goes into a SetFilter.

Did I already say "thank you!" for staying with me
while I ask dumb questions? :) And yes, if you get
close to Hanover, a good German beer on me is a sure
thing.






Re: A lot of short documents, optimal query?

Posted by Chris Hostetter <ho...@fucit.org>.
: Wouldn't it make sense to have a BooleanFilter,
: TermFilter, MultiTermFilter, RangeFilter... family to
: "mirror" the xxxQuery world with the same idioms and
: interfaces? Is this the direction already taken in
: Lucene development (an alternative would be to
: parametrize the existing Query world)? As I see it
: functionally, at the moment filters (and their
: combination) are the only way to use a fast "pure
: boolean" model.
:
: Does what I just said make any sense?

It makes perfect sense, and you have grasped a lot of the possibilities.
While making a Filter variant of every Query class is my gut
instinct, there has in fact been discussion about generalizing Queries so
that they can have a "non-scoring" mode.  These issues have all been
mentioned in LUCENE-383 ...

	http://issues.apache.org/jira/browse/LUCENE-383

One of the big reasons why it might make sense to use Queries instead of
Filters even if you don't care about scoring is when you have a large set
of very restrictive conditions (i.e. a BooleanQuery consisting of many
TermQueries).  The BooleanScorer can make good decisions to skip over
large sets of documents -- sometimes ignoring sub queries entirely -- when
one sub query only matches a few documents, because of the flexibility of
the Scorer API.

The Filter API on the other hand doesn't have this flexibility.  There is
no way for a ChainedFilter/BooleanFilter to know that it can skip over
one of its sub filters, or ask one of its sub filters to only look at
certain documents.


I suggested the full Filter approach for your situation based on the
following information...
  1) you didn't care about scoring
  2) you were using Range/Prefix queries on the ZIP field that could
     easily exceed practical clause limits in BooleanQuery.
  3) your restrictions on the ZIP field looked like they could be cached
     individually and the results reused across many searches
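
As a rough illustration of point 2, the sketch below estimates how many term clauses the ZIPS prefixes from the original query could expand to. It assumes 5-digit, digits-only ZIP codes and that every possible code occurs in the index, so the result is an upper bound and not a measurement. The class and method names are made up for this example. For comparison, the default clause limit of BooleanQuery in Lucene releases of this period is 1024.

```java
class PrefixExpansionEstimate {
    /** Upper bound on distinct digit-only terms of length zipLength starting with prefix. */
    static long maxTermsForPrefix(String prefix, int zipLength) {
        long n = 1;
        for (int i = prefix.length(); i < zipLength; i++) n *= 10;
        return n;
    }

    /** Sums the bound over all prefixes; valid as a total because no prefix contains another. */
    static long maxTerms(String[] prefixes, int zipLength) {
        long total = 0;
        for (String p : prefixes) total += maxTermsForPrefix(p, zipLength);
        return total;
    }

    public static void main(String[] args) {
        String[] prefixes = {
            "22", "21", "20", "23", "245", "246", "247", "240", "241", "242",
            "243", "254", "253", "255", "256", "257", "295", "296", "273", "274",
            "275", "276", "192", "190"
        };
        // 4 two-digit prefixes * 1000 + 20 three-digit prefixes * 100
        System.out.println(maxTerms(prefixes, 5)); // prints 6000
    }
}
```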



-Hoss




Re: Filter to support DocNrSkipper interface

Posted by Morus Walter <mo...@googlemail.com>.
On Fri, 23 Dec 2005 11:01:52 +0000 (GMT)
eks dev <ek...@yahoo.co.uk> wrote:

> To put it another way, Filter forces us to use BitSet,
> which is a rather inefficient way to store a few
> documents from a big collection.

I cannot comment on your suggestion, but I think the current filter
should probably be replaced by a more general solution.
(I haven't had the time to read the Lucene mailing list for quite some
time, so this has probably been discussed before.)

The sort extensions already provide a cached list of document values. So
if I'd like to filter and sort for some field (e.g. date) it would look
much more natural to me to use that list for filtering than create a
list of acceptable documents stored as a bitset or a sparse list. 

So IMHO a filter (as seen by the searcher) should just provide an
interface to query whether some document is accepted or
refused by the filter. So the interface would look like

public interface Filter {
	boolean filter(int docno);
}

The current type of bitset based filters could implement this interface
as well as other types of filter.
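
A minimal sketch of how two such implementations might look, in plain Java. The interface is renamed DocFilter here only to avoid confusion with Lucene's existing Filter class; BitSetDocFilter, RangeDocFilter and the valueByDoc array are hypothetical names, with the array standing in for a per-document value list such as the one cached for sorting.

```java
import java.util.BitSet;

/** The proposed per-document interface, under a different (hypothetical) name. */
interface DocFilter {
    boolean filter(int docno);
}

/** Bitset-backed implementation: accepts the documents whose bit is set. */
class BitSetDocFilter implements DocFilter {
    private final BitSet bits;
    BitSetDocFilter(BitSet bits) { this.bits = bits; }
    public boolean filter(int docno) { return bits.get(docno); }
}

/** Value-backed implementation: checks a cached per-document value against a range. */
class RangeDocFilter implements DocFilter {
    private final long[] valueByDoc; // valueByDoc[docno] = e.g. a date as yyyymmdd
    private final long min, max;

    RangeDocFilter(long[] valueByDoc, long min, long max) {
        this.valueByDoc = valueByDoc;
        this.min = min;
        this.max = max;
    }

    public boolean filter(int docno) {
        long v = valueByDoc[docno];
        return v >= min && v <= max;
    }
}

class DocFilterDemo {
    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(1);
        bits.set(3);
        DocFilter byBits = new BitSetDocFilter(bits);

        long[] dates = {20051101L, 20051109L, 20051223L, 20051111L};
        DocFilter byDate = new RangeDocFilter(dates, 20051105L, 20051130L);

        // Both filters are consulted per document; no intermediate bitset is built for the date range.
        for (int doc = 0; doc < dates.length; doc++) {
            if (byBits.filter(doc) && byDate.filter(doc)) {
                System.out.println("accepted doc " + doc);
            }
        }
    }
}
```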

One could probably go one step further and allow the filter to
modify the score as well (e.g. to allow for a scoring that degrades the
score of a hit depending on its age).

Actually I just did an extension of IndexSearcher that provides that
kind of filtering for a distance filter based on geographic coordinates.

Of course such a change would not be compatible, so my suggestion goes
far beyond the suggested change.

Morus



Filter to support DocNrSkipper interface

Posted by eks dev <ek...@yahoo.co.uk>.
Hi,

Would it be OK to add one method to the Filter class that
returns the DocNrSkipper interface from Paul's "Compact
sparse Filter" in JIRA issue LUCENE-328?

This would be the first step towards:
- smooth integration of compact representations of the
underlying BitSet in Filter (VInt and sorted int[]).
They are often faster for and/or operations.
- a ChainedFilter (see contrib from Hoss) enhancement
that operates on DocNrSkipper (see And(Or)DocNrSkipper
in Paul's work)

Compatibility problems do not exist; only the BitSet has
to be constructed in the bits() method, the same as today.

The reasoning that justifies effort in this direction
is that the distribution of tokens in a typical collection
is perfect for these 3 representations of bit vectors
(very low frequency tokens in a sorted int[], very high
frequency tokens in VInt and the rest in a BitSet).

To put it another way, Filter forces us to use BitSet,
which is a rather inefficient way to store a few
documents from a big collection.
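
For readers unfamiliar with the VInt representation mentioned above, here is a small self-contained sketch. It is not the code from LUCENE-328; VIntDocList is a made-up class that stores ascending document numbers as gaps, each gap written as a variable-length integer with 7 payload bits per byte. For scale, a bitset sized for 30 million documents occupies roughly 3.75 MB however few bits are set.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class VIntDocList {
    /** Encodes ascending doc numbers as gaps, each gap as a variable-length int. */
    static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocs) {
            int gap = doc - prev;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80); // low 7 bits, high bit means "more bytes follow"
                gap >>>= 7;
            }
            out.write(gap);
            prev = doc;
        }
        return out.toByteArray();
    }

    /** Decodes count doc numbers from bytes produced by encode. */
    static int[] decode(byte[] bytes, int count) {
        int[] docs = new int[count];
        int pos = 0, prev = 0;
        for (int i = 0; i < count; i++) {
            int gap = 0, shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap;
            docs[i] = prev;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] docs = {3, 4, 1000, 29_999_999};
        byte[] packed = encode(docs);
        System.out.println(packed.length + " bytes as VInt gaps, "
                + docs.length * 4 + " bytes as int[]");
        System.out.println(Arrays.toString(decode(packed, docs.length)));
    }
}
```

The gap encoding pays off most for high-frequency terms, where consecutive document numbers are close together and most gaps fit in a single byte.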

Any feedback appreciated; it could easily be that I
overlooked something essential.

Cheers, e.


	
	
		


Re: A lot of short documents, optimal query?

Posted by eks dev <ek...@yahoo.co.uk>.
Everything is perfect with your suggestion; scoring is
not needed. I am also going to try the approach with
ChainedFilter, but for this I need to think a bit more
about how to get it right. The Query in the example is
just one variation on the same topic and there are a
few more cases I need to cover (thanks to your
suggestions, I now see a clean way to avoid scoring).

Now a view from 10k miles, please correct me if I am
wrong.

Filters are a kind of "parallel world" to the complete
"normal" Query (PrefixQuery, TermQuery...) world that
evades the scoring overhead, and ChainedFilter combines
them together. One could say "a kind of BooleanQuery
without scoring".

Wouldn't it make sense to have a BooleanFilter,
TermFilter, MultiTermFilter, RangeFilter... family to
"mirror" the xxxQuery world with the same idioms and
interfaces? Is this the direction already taken in
Lucene development (an alternative would be to
parametrize the existing Query world)? As I see it
functionally, at the moment filters (and their
combination) are the only way to use a fast "pure
boolean" model.

Does what I just said make any sense?



	
	
		


Re: A lot of short documents, optimal query?

Posted by Chris Hostetter <ho...@fucit.org>.
: - What is the purpose of hasCode and equals methods in
: XxxFilter? (this is a question about actual usage in
: Lucene, not java elementary :)

You mean hashCode, right? ... Those methods are generally important for
hashing, which makes them key for effective caching in most cases.
CachingWrapperFilter doesn't need them because each instance only caches
one Filter -- but if you're trying to write a good, reusable Filter,
you'll want to implement them (for the same reasons you want to
implement them any time you're trying to write a good reusable class).
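
A small sketch of why equals and hashCode matter for caching, in plain Java. PrefixSpec and BitSetCache are hypothetical names, not Lucene classes, and compute() is a placeholder for the expensive walk over the index. Two separately constructed but equal keys hit the same cache entry only because equals and hashCode are defined on the key's contents.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical value object describing a prefix restriction on one field. */
final class PrefixSpec {
    final String field;
    final String prefix;

    PrefixSpec(String field, String prefix) {
        this.field = field;
        this.prefix = prefix;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PrefixSpec)) return false;
        PrefixSpec other = (PrefixSpec) o;
        return field.equals(other.field) && prefix.equals(other.prefix);
    }

    @Override public int hashCode() { return Objects.hash(field, prefix); }
}

class BitSetCache {
    private final Map<PrefixSpec, BitSet> cache = new HashMap<>();
    int computations = 0; // counts how often the expensive step actually ran

    /** Returns the cached bits for spec, computing them only on the first request. */
    BitSet get(PrefixSpec spec) {
        return cache.computeIfAbsent(spec, s -> {
            computations++;
            return compute(s);
        });
    }

    /** Placeholder for the expensive index walk; returns an empty BitSet in this sketch. */
    private BitSet compute(PrefixSpec spec) { return new BitSet(); }

    public static void main(String[] args) {
        BitSetCache c = new BitSetCache();
        c.get(new PrefixSpec("ZIPS", "22"));
        c.get(new PrefixSpec("ZIPS", "22")); // equal key, served from the cache
        c.get(new PrefixSpec("ZIPS", "245"));
        System.out.println(c.computations);  // prints 2
    }
}
```

Without the two overrides, each new PrefixSpec instance would be a distinct key and every lookup would recompute.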


: - What would be recommended usage of the SetFilter?
: a) to use search(Query, SetFilter, HitCollector)
: or
: b) search(Query, HitCollector) + I do AND inside
: hitCollector

My suggestion about writing a SetFilter was so you could completely bypass
the "search" -- which involves Weighting and Scoring -- and instead
just call the bits method directly.  If I remember correctly, I suggested
it because you said you didn't care about score at all, so you can do
something like this (off the top of my head, not 100% certain this is
what you want)...

  Set lnames = ...;
  /* you need to write SetFilter */
  Filter l = new SetFilter("name",lnames);
  /* this is a poor man's "TermFilter" unless you want to write one */
  Filter f = new RangeFilter("name","raimonds", "raimonds",true,true);
  Filter zips = new ChainedFilter(new Filter[] {
     new RangeFilter("ZIP","20000","20999",true,true),
     ...
     new RangeFilter("ZIP","24500","24599",true,true),
     ...
  }, ChainedFilter.OR);
  Filter main = new ChainedFilter(new Filter[] { f, l, zips },
                                  ChainedFilter.AND);
  /* r is your IndexReader */
  BitSet results = main.bits(r);
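
What the OR and AND chaining boils down to can be sketched with java.util.BitSet alone. This is not the ChainedFilter source; BitSetChain and the tiny document numbers are invented for illustration, with each BitSet standing for the bits one filter would return.

```java
import java.util.BitSet;

class BitSetChain {
    /** Union of all sets, as an OR chain would produce. */
    static BitSet or(BitSet... sets) {
        BitSet result = new BitSet();
        for (BitSet s : sets) result.or(s);
        return result;
    }

    /** Intersection of all sets, as an AND chain would produce. */
    static BitSet and(BitSet first, BitSet... rest) {
        BitSet result = (BitSet) first.clone(); // leave the inputs untouched
        for (BitSet s : rest) result.and(s);
        return result;
    }

    /** Convenience factory for the example data. */
    static BitSet of(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) b.set(d);
        return b;
    }

    public static void main(String[] args) {
        BitSet firstName = of(1, 4, 7, 9);           // docs matching the first name
        BitSet lastNames = of(4, 5, 9);              // docs matching any expanded last name
        BitSet zips = or(of(4, 8), of(2, 9), of(6)); // docs in any of the ZIP ranges
        BitSet results = and(firstName, lastNames, zips);
        System.out.println(results); // prints {4, 9}

        // Walk the hits without any scoring.
        for (int doc = results.nextSetBit(0); doc >= 0; doc = results.nextSetBit(doc + 1)) {
            System.out.println("hit: " + doc);
        }
    }
}
```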


Thinking about your use case a little more, those RangeFilters would make
more sense as "PrefixFilters" ...

Hey Yonik, are you ever going to add your PrefixFilter and
ConstantScorePrefixQuery to LUCENE-383 ?




-Hoss




Re: A lot of short documents, optimal query?

Posted by eks dev <ek...@yahoo.co.uk>.
Thanks Hoss,

I've looked into it and you were absolutely right,
it could not be simpler.

Two quick ones on the same topic (questions for my
personal education):

- What is the purpose of hasCode and equals methods in
XxxFilter? (this is a question about actual usage in
Lucene, not java elementary :)

- What would be recommended usage of the SetFilter?
a) to use search(Query, SetFilter, HitCollector)
or
b) search(Query, HitCollector) + I do AND inside
hitCollector

Option a) could be faster because inside the collect()
method of HitCollector there is no way to skip to the
first set bit of the SetFilter (the bitset of the Query is
not visible).
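
A sketch of what option b) amounts to, in plain Java. SimpleCollector is a stand-in with the same method shape as HitCollector.collect(int doc, float score); FilteringCollector and the simulated loop over query matches are invented for this example. The point is that collect() runs once for every query match, so the bitset check can only reject documents after the query has already done the work of finding them.

```java
import java.util.BitSet;

/** Stand-in with the same method shape as HitCollector.collect(int doc, float score). */
abstract class SimpleCollector {
    abstract void collect(int doc, float score);
}

/** Option b): do the AND with the allowed set inside the collector. */
class FilteringCollector extends SimpleCollector {
    private final BitSet allowed;
    final BitSet hits = new BitSet();
    int calls = 0; // how many times the searcher called collect()

    FilteringCollector(BitSet allowed) { this.allowed = allowed; }

    @Override void collect(int doc, float score) {
        calls++;
        if (allowed.get(doc)) hits.set(doc);
    }

    public static void main(String[] args) {
        BitSet allowed = new BitSet();
        allowed.set(3);
        allowed.set(8);
        FilteringCollector c = new FilteringCollector(allowed);

        // Simulate a query that matches docs 0..9; the score is ignored.
        for (int doc = 0; doc < 10; doc++) c.collect(doc, 1.0f);

        System.out.println(c.hits + " after " + c.calls + " collect() calls");
        // prints {3, 8} after 10 collect() calls
    }
}
```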

When we are done with testing (it takes a while), I will
post the results. Also, if somebody is interested in
SetFilter, I see no problem in posting it back.

Again, many thanks for the useful tip!





Re: A lot of short documents, optimal query?

Posted by Chris Hostetter <ho...@fucit.org>.
: (
: +(
: (+raimonds +marschan)
: (+raimonds +marschol)
: (+raimonds +marschel)
: (+raimonds +marschalfr)
: (+raimonds +marschalek)
: (+raimonds +marscha)
: ...
: )
: +(ZIPS:22* ZIPS:21* ZIPS:20* ZIPS:23* ZIPS:245*
: ZIPS:246* ZIPS:247* ZIPS:240* ZIPS:241* ZIPS:242*
: ZIPS:243* ZIPS:254* ZIPS:253* ZIPS:255* ZIPS:256*
: ZIPS:257* ZIPS:295* ZIPS:296* ZIPS:273* ZIPS:274*
: ZIPS:275* ZIPS:276* ZIPS:192* ZIPS:190*)
: )

Independent of how short/long your documents are, using RangeFilters on
your ZIPS field is going to be more efficient than PrefixQueries ... I'd
bet money it will even be more efficient than making a two character
prefix_ZIPS field and doing a TermQuery on it -- and there's no reason not
to use a Filter if you don't care about the score.

Take a look at RangeFilter in SVN; even if you are using 1.4.3 it should
be compatible.  Also take a look at ChainedFilter as a way to compose lots
of individual RangeFilters...

http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/misc/ChainedFilter.java
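
To turn a prefix such as ZIPS:245* into the inclusive bounds a range-style filter needs, something like the following sketch would do. It assumes fixed-width, digits-only ZIP codes (5 digits in the demo), which the thread does not state explicitly; ZipPrefixRange is a made-up helper, not part of Lucene.

```java
class ZipPrefixRange {
    /** Smallest code of the given length that starts with prefix. */
    static String lower(String prefix, int zipLength) { return pad(prefix, zipLength, '0'); }

    /** Largest code of the given length that starts with prefix. */
    static String upper(String prefix, int zipLength) { return pad(prefix, zipLength, '9'); }

    private static String pad(String prefix, int length, char fill) {
        StringBuilder sb = new StringBuilder(prefix);
        while (sb.length() < length) sb.append(fill);
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String p : new String[] {"20", "245"}) {
            System.out.println(p + "* -> [" + lower(p, 5) + ", " + upper(p, 5) + "]");
        }
        // prints:
        // 20* -> [20000, 20999]
        // 245* -> [24500, 24599]
    }
}
```

Because the codes are fixed width, comparing them as strings gives the same order as comparing them as numbers, which is what makes a term range equivalent to the prefix.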

You can probably get additional speed-ups by using Filters on whatever
your default name search field is; google for "lucene Hoss SetFilter" to
see a previous discussion where I suggested something similar.

-Hoss

