You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by "Aigner, Thomas" <TA...@WescoDist.com> on 2006/01/30 19:51:15 UTC

Number Searches vs Character

I am curious what would be the difference between searching for a number
verses a character.

I have a large index consisting of a few fields (So index would look
something like:  " 123123123 my description my catalog"
	
Searching for 12* is much slower than searching for de*
I don't have any issues searching for 3 or more characters.. just 1 or 2
and the wildcard.
	Stored this way: doc.add(Field.Text("allindex", all,true));
	Using 1.4.3

Any reason why that would be and how I can help speed that up?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Number Searches vs Character

Posted by Chris Hostetter <ho...@fucit.org>.
PrefixQuery is implimented as a BooleanQuery using term expansion.  what
that means is that a prefix query on a common prefix is much more
expensive then a prefix query on a less common prefix.  not just in terms
of hte number of documents that match, but because of the number of terms
that match the prefix.

assuming "pq" is your PrefixQuery object, take a look at the output of
calling pq.rewrite(yourReader).toString() and compare the difference
between your 12* and your de* queries ... i'm guessing you'll find that
the 12* approach is a lot bigger.

if you poke arround you'll find mention of a ConstanctScoreRangeQuery ...
using the ideas from that, you can impliment a much faster version of
PrefixQuery that doesn't score documents based on term frequency ... which
may be ok depending on your needs.


: Date: Mon, 30 Jan 2006 13:51:15 -0500
: From: "Aigner, Thomas" <TA...@WescoDist.com>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Number Searches vs Character
:
:
: I am curious what would be the difference between searching for a number
: verses a character.
:
: I have a large index consisting of a few fields (So index would look
: something like:  " 123123123 my description my catalog"
:
: Searching for 12* is much slower than searching for de*
: I don't have any issues searching for 3 or more characters.. just 1 or 2
: and the wildcard.
: 	Stored this way: doc.add(Field.Text("allindex", all,true));
: 	Using 1.4.3
:
: Any reason why that would be and how I can help speed that up?
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org
:



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org