Posted to solr-user@lucene.apache.org by Andrew May <am...@ingenta.com> on 2006/08/18 21:17:23 UTC

Queries with wildcards

Hi,

I figure I'm probably being stupid, but I can't seem to get wildcard queries (using the standard
request handler) to work.

For example, using the latest build (Aug 18) and the example documents, a search for 
Enterprise matches the SOLR1000 document, but a search for Enter* does not.

I guess I'd kind of assumed that wildcards would work (they're part of the linked-to Lucene
query syntax), but I'd never actually tested it (or not with anything other than
brain-dead searches that probably worked due to stemming rather than wildcards).

So, does it work, and I'm just totally hopeless/incompetent (it is after all a Friday 
afternoon)? Or is there some good reason why wildcards don't work?

Thanks,

Andrew

Re: Queries with wildcards

Posted by Andrew May <am...@ingenta.com>.
Chris Hostetter wrote:
> : For example, using the latest build (Aug 18) and the example documents, a search for
> : Enterprise matches the SOLR1000 document, but a search for Enter* does not.
> 
> try searching for:    enter*

Ah-ha!

> 
> ...this is a somewhat long-standing annoyance with Lucene that exists
> because there's really no good way to deal with it -- when using
> wildcards, the Lucene QueryParser does not analyze the input -- if you ask
> for a wildcard search on Enter*, a PrefixQuery is constructed with that
> exact prefix, case and all.  But in this case, the default search field is
> "text", which uses the LowerCaseFilter -- so you'll never get a match on a
> prefix with an uppercase character.
> 
> 
> In general, wildcard queries are "hard" and only make sense on fields that
> have very simplistic index-time analyzers (like WhitespaceAnalyzer)
> -- even then you might want to use the LowerCaseFilter and override the
> QueryParser's getPrefixQuery and getWildcardQuery methods to do things like
> lowercase the input string for certain fields so you don't get annoying
> situations like enter* not matching Enterprise.
> 

Thanks, I think I understand now. Since I'm already doing some processing of the user input
before passing the query on to Solr, I can convert the query to lowercase at that point.
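A minimal sketch of that client-side step, in Java; the class name WildcardLowercaser is hypothetical and not part of Solr. One caveat: lowercasing the entire query string would also lowercase the AND, OR and NOT operators, which the Lucene query syntax only recognizes in uppercase, so this sketch lowercases only space-separated tokens that contain * or ?, and keeps any "field:" prefix as typed.

```java
import java.util.Locale;

// Hypothetical client-side helper (not part of Solr): lowercases only the
// wildcard terms of a Lucene-syntax query string. Operators such as AND/OR/NOT,
// field names, and plain terms are left untouched. Splitting on single spaces
// is a simplification: quoted phrases and escaped characters are not handled.
class WildcardLowercaser {

    static String lowercaseWildcardTerms(String query) {
        String[] tokens = query.split(" ", -1); // -1 keeps empty tokens, so spacing survives the join
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = lowercaseIfWildcard(tokens[i]);
        }
        return String.join(" ", tokens);
    }

    private static String lowercaseIfWildcard(String token) {
        if (token.indexOf('*') < 0 && token.indexOf('?') < 0) {
            return token; // not a wildcard term: Solr's analyzer will handle it
        }
        int colon = token.indexOf(':'); // -1 when there is no explicit field
        // Keep an explicit "field:" prefix as typed; only the term is lowercased.
        String field = token.substring(0, colon + 1);
        String term = token.substring(colon + 1);
        return field + term.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(lowercaseWildcardTerms("Enter* AND name:Ent*prise"));
        // prints: enter* AND name:ent*prise
    }
}
```

Non-wildcard terms such as Enterprise are passed through unchanged, since Solr already lowercases them through the field's analyzer.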

-Andrew

Re: Queries with wildcards

Posted by Chris Hostetter <ho...@fucit.org>.
: For example, using the latest build (Aug 18) and the example documents, a search for
: Enterprise matches the SOLR1000 document, but a search for Enter* does not.

try searching for:    enter*

...this is a somewhat long-standing annoyance with Lucene that exists
because there's really no good way to deal with it -- when using
wildcards, the Lucene QueryParser does not analyze the input -- if you ask
for a wildcard search on Enter*, a PrefixQuery is constructed with that
exact prefix, case and all.  But in this case, the default search field is
"text", which uses the LowerCaseFilter -- so you'll never get a match on a
prefix with an uppercase character.
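To make the mismatch concrete, here is a toy model in plain Java (not Lucene code; the class name and sample text are made up): index-time analysis is reduced to whitespace splitting plus lowercasing, standing in for the "text" field's LowerCaseFilter, while the prefix is compared against the stored terms exactly as typed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Toy model, not Lucene code: terms are lowercased at index time (standing in
// for the LowerCaseFilter), but the query prefix is compared exactly as typed.
class UnanalyzedPrefixDemo {

    // Index-time "analysis": split on whitespace, then lowercase each token.
    static List<String> indexTerms(String text) {
        return Arrays.stream(text.split("\\s+"))
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Query-time prefix matching with no analysis, so case matters.
    static boolean prefixMatches(List<String> terms, String prefix) {
        return terms.stream().anyMatch(t -> t.startsWith(prefix));
    }

    public static void main(String[] args) {
        List<String> terms = indexTerms("Enterprise search server");
        System.out.println(prefixMatches(terms, "Enter")); // false: like Enter*
        System.out.println(prefixMatches(terms, "enter")); // true:  like enter*
    }
}
```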

The reason that the QueryParser doesn't attempt to analyze the input you
give it when doing a PrefixQuery is that it might get analyzed in a
completely different way than the words that prefix "logically" matches
on.  Consider, for example, using a Porter stemmer on "enterprise" -- that
produces "enterpris" ... but if you asked for a prefix search on
"enterpris*" and the query parser analyzed "enterpris", then the
PorterStemmer would produce "enterpri".
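The stemming example can be traced with a deliberately tiny stand-in for the Porter stemmer. The toyStem method below is hypothetical and is not the Porter algorithm: it applies only two suffix rules loosely modeled on Porter's plural-"s" and final-"e" steps, without the real algorithm's conditions, which happens to be enough to reproduce the two results quoted above.

```java
import java.util.Locale;

// Hypothetical toy stemmer, NOT the Porter algorithm: two suffix rules loosely
// modeled on Porter's plural-"s" and final-"e" steps, with none of Porter's
// conditions. Just enough to reproduce the "enterprise" example.
class StemmedPrefixDemo {

    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("s") && !w.endsWith("ss")) {
            w = w.substring(0, w.length() - 1); // "enterpris" -> "enterpri"
        }
        if (w.endsWith("e")) {
            w = w.substring(0, w.length() - 1); // "enterprise" -> "enterpris"
        }
        return w;
    }

    public static void main(String[] args) {
        String indexed = toyStem("enterprise");         // term stored in the index
        String typedPrefix = "enterpris";               // the user asked for enterpris*
        String analyzedPrefix = toyStem(typedPrefix);   // what an analyzing parser would use
        System.out.println(indexed);         // enterpris
        System.out.println(analyzedPrefix);  // enterpri
    }
}
```

An analyzing parser would therefore search for the prefix enterpri rather than the enterpris the user typed.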

The problem gets even worse when dealing with mid-word wildcards like
"Ent*prise" ... how can the QueryParser even approach trying to analyze
that input? The * certainly isn't part of the text; should it split it up
into two words, analyze them separately, and then rejoin them with a
star in the middle?
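Since analysis is skipped, the wildcard pattern is effectively compared term by term against what is already in the index. The sketch below roughly models that in plain Java (the class name is hypothetical): it translates * and ? into a regular expression, quotes the literal pieces, and tests the pattern against terms that a lowercasing analyzer would have stored.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Rough model of wildcard matching: the pattern is tested against each
// indexed term as-is; no analyzer ever sees the pattern.
class WildcardTermMatcher {

    // '*' matches any run of characters, '?' exactly one; everything else is literal.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                regex.append(Pattern.quote(literal.toString()));
                literal.setLength(0);
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString());
    }

    static boolean anyTermMatches(List<String> indexedTerms, String wildcard) {
        Pattern p = toRegex(wildcard);
        return indexedTerms.stream().anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        // Terms as a lowercasing analyzer would have stored them.
        List<String> terms = Arrays.asList("enterprise", "search", "server");
        System.out.println(anyTermMatches(terms, "Ent*prise")); // false
        System.out.println(anyTermMatches(terms, "ent*prise")); // true
    }
}
```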

In general, wildcard queries are "hard" and only make sense on fields that
have very simplistic index-time analyzers (like WhitespaceAnalyzer)
-- even then you might want to use the LowerCaseFilter and override the
QueryParser's getPrefixQuery and getWildcardQuery methods to do things like
lowercase the input string for certain fields so you don't get annoying
situations like enter* not matching Enterprise.
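A stdlib-only sketch of that suggestion, under stated assumptions: the class name and the set of lowercased field names are hypothetical. In an actual QueryParser subclass, the same check would go inside overrides of getPrefixQuery(field, termStr) and getWildcardQuery(field, termStr) before delegating to the superclass; the Lucene dependency is left out here so the example runs on its own.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

// Hypothetical normalizer: in a real QueryParser subclass this logic would run
// inside the getPrefixQuery/getWildcardQuery overrides before calling super.
class FieldAwareWildcardNormalizer {

    private final Set<String> lowercasedFields;

    FieldAwareWildcardNormalizer(Set<String> lowercasedFields) {
        this.lowercasedFields = lowercasedFields;
    }

    // Lowercase the raw prefix/wildcard text only for fields whose index-time
    // analysis includes a LowerCaseFilter; leave other fields as typed.
    String normalize(String field, String termStr) {
        return lowercasedFields.contains(field)
                ? termStr.toLowerCase(Locale.ROOT)
                : termStr;
    }

    public static void main(String[] args) {
        // Assumed schema: only "text" is lowercased at index time, "id" is not.
        FieldAwareWildcardNormalizer n =
                new FieldAwareWildcardNormalizer(Collections.singleton("text"));
        System.out.println(n.normalize("text", "Enter*")); // enter*
        System.out.println(n.normalize("id", "SOLR10*"));   // SOLR10*
    }
}
```

Keeping the decision per field matters because a field indexed without lowercasing (an id, say) would stop matching if its wildcard input were lowercased too.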


-Hoss