Posted to java-user@lucene.apache.org by Florian Hanke <fl...@ergon.ch> on 2006/03/17 13:44:53 UTC

Appending * to each search term

Hello all,

I'd like to append a * to each search term in a query (in effect
creating a WildcardQuery per term), so that a query entered as e.g.
"term1 AND term2" is effectively modified to "term1* AND term2*".
Rewriting the search string myself is not very elegant, of course. I'm
thinking that overriding QueryParser#get(Boolean etc.)Query is the
intended way to go, given how the class is designed. Still, extracting
terms and injecting them back in while operating on specific Query
classes does not seem like the right approach.
Can anyone suggest a nicer alternative?

Thanks very much and have a nice day,
    Florian

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Appending * to each search term

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Interestingly, the last two consulting jobs I've had dealt with this
very issue - having user-entered terms interpreted as partial strings
to match against any indexed term.  Care must be taken to avoid the
classic TooManyClauses exception or a more insidious OutOfMemoryError.

If you use PrefixQuery for all unadorned terms in QueryParser, you
risk someone typing "a" and triggering one of the above problems,
depending on how many terms you have in your index.
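
To make the risk concrete, here is a rough sketch in plain Java (no
Lucene involved) that counts how many indexed terms a prefix would
expand to. It assumes a prefix query is expanded into one clause per
matching term and that the clause limit is 1024, which I believe is the
BooleanQuery default; the class, the method names and the sample terms
are made up for illustration.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixExpansion {
    // Assumption: mirrors the default maximum clause count of BooleanQuery.
    static final int ASSUMED_MAX_CLAUSES = 1024;

    // Counts the terms that start with the given prefix. In natural String
    // order these terms lie in the range [prefix, prefix + '\uffff').
    static int countMatches(SortedSet<String> terms, String prefix) {
        return terms.subSet(prefix, prefix + Character.MAX_VALUE).size();
    }

    // True if expanding the prefix would exceed the assumed clause limit.
    static boolean exceedsLimit(SortedSet<String> terms, String prefix) {
        return countMatches(terms, prefix) > ASSUMED_MAX_CLAUSES;
    }

    // Hypothetical term dictionary: 1500 terms under "a", 10 under "b".
    static SortedSet<String> sampleTerms() {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 1500; i++) terms.add("a" + i);
        for (int i = 0; i < 10; i++) terms.add("b" + i);
        return terms;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = sampleTerms();
        for (String prefix : new String[] {"a", "b"}) {
            System.out.println(prefix + "* expands to " + countMatches(terms, prefix)
                    + " terms, over limit: " + exceedsLimit(terms, prefix));
        }
    }
}
```

With this sample data the single-letter prefix "a" expands well past
the assumed limit, while "b" stays far below it.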

There are techniques to handle "starts with" or even "contains" type
substring queries more efficiently: with suitably clever tokenization
at index time, such searches can be answered by much cheaper TermQuery
queries.

If "starts with" queries are the only kind you need to worry about,
and not "contains", then consider indexing with prefix tokens.
For example, 'cat' could be indexed as 'cat', 'ca', and 'c'.  When
someone types 'ca', you issue a TermQuery for 'ca' and get a match.
The index size will grow, perhaps dramatically, but your searches will
be much faster and more efficient.
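
The prefix-token idea can be sketched in plain Java as follows. This
is only an illustration of the token expansion, not a Lucene
TokenFilter; the class and method names are invented, and a Set stands
in for the index so that an exact lookup plays the role of a TermQuery.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrefixTokens {
    // Returns the term followed by each shorter prefix, longest first,
    // e.g. "cat" yields "cat", "ca", "c".
    static List<String> prefixes(String term) {
        List<String> out = new ArrayList<>();
        for (int len = term.length(); len >= 1; len--) {
            out.add(term.substring(0, len));
        }
        return out;
    }

    // Stand-in for indexing: stores every prefix token of every term.
    static Set<String> index(List<String> terms) {
        Set<String> indexed = new HashSet<>();
        for (String term : terms) {
            indexed.addAll(prefixes(term));
        }
        return indexed;
    }

    // Stand-in for a TermQuery: a single exact lookup, no term expansion.
    static boolean matches(Set<String> indexed, String userInput) {
        return indexed.contains(userInput);
    }
}
```

Indexing "cat" this way stores three tokens instead of one, which is
where the index growth comes from; in exchange, looking up 'ca' is one
exact match rather than an expansion over all terms starting with 'ca'.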

I plan to provide more documentation, examples, and TokenFilter(s) to  
deal with this common scenario in the future.

	Erik


On Mar 17, 2006, at 7:51 AM, Eric Jain wrote:
> Florian Hanke wrote:
>> I'd like to append an * (create a WildcardQuery) to each search  
>> term in a query, such that a query that is entered as e.g. "term1  
>> AND term2" is modified (effectively) to "term1* AND term2*".
>> Parsing the search string is not very elegant (of course). I'm  
>> thinking that overriding QueryParser#get(Boolean etc.)Query is the  
>> way to go, the way it's designed. But still, extracting terms and  
>> injecting them back in while operating on specific Query classes  
>> does not seem the way to go.
>> Can anyone perhaps suggest a nice alternative?
>
> Perhaps you could subclass the QueryParser and override the  
> getFieldQuery method:
>
> protected Query getFieldQuery(String field, String term) {
>   return new PrefixQuery(new Term(field, term));
> }
>




Re: Appending * to each search term

Posted by Florian Hanke <fl...@ergon.ch>.
Thank you very much - that did the trick! :)

On 17.03.2006 at 13:51, Eric Jain wrote:

> Perhaps you could subclass the QueryParser and override the  
> getFieldQuery method:
>
> protected Query getFieldQuery(String field, String term) {
>   return new PrefixQuery(new Term(field, term));
> }




Re: Appending * to each search term

Posted by Eric Jain <Er...@isb-sib.ch>.
Florian Hanke wrote:
> I'd like to append an * (create a WildcardQuery) to each search term in 
> a query, such that a query that is entered as e.g. "term1 AND term2" is 
> modified (effectively) to "term1* AND term2*".
> Parsing the search string is not very elegant (of course). I'm thinking 
> that overriding QueryParser#get(Boolean etc.)Query is the way to go, the 
> way it's designed. But still, extracting terms and injecting them back 
> in while operating on specific Query classes does not seem the way to go.
> Can anyone perhaps suggest a nice alternative?

Perhaps you could subclass the QueryParser and override the getFieldQuery 
method:

protected Query getFieldQuery(String field, String term) {
   return new PrefixQuery(new Term(field, term));
}
