Posted to java-user@lucene.apache.org by Daniel Naber <lu...@danielnaber.de> on 2006/12/04 00:08:31 UTC

Phrase queries with wildcards

Hi,

Lucene's phrase queries don't support wildcards and I'm thinking about the 
best way to "fix" this. One way would be to change QueryParser so that it 
builds a MultiPhraseQuery when it encounters a wildcard inside a phrase. 
However, to expand the wildcard the QueryParser needs an IndexReader.

The other way might be to add a rewrite to PhraseQuery that returns a 
MultiPhraseQuery with the terms expanded. QueryParser would need to be 
modified so that the "*" isn't already lost when it's added to the phrase 
query.
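
Either way, the wildcard term has to be expanded into the index terms it
matches, giving one set of alternatives per phrase position. A minimal
sketch of that expansion step follows. It assumes a plain sorted set as a
stand-in for the index's term dictionary and handles only a trailing "*";
the class and method names are hypothetical and are not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration; not part of Lucene.
final class PhraseWildcardExpander {

    // Expands each phrase term into the list of index terms it may match.
    // Only a trailing "*" (prefix wildcard) is handled in this sketch.
    static List<List<String>> expand(List<String> phraseTerms,
                                     NavigableSet<String> indexTerms) {
        List<List<String>> positions = new ArrayList<>();
        for (String term : phraseTerms) {
            List<String> alternatives = new ArrayList<>();
            if (term.endsWith("*")) {
                String prefix = term.substring(0, term.length() - 1);
                // The set is sorted, so all matches form one contiguous run
                // starting at the first term >= prefix.
                for (String candidate : indexTerms.tailSet(prefix, true)) {
                    if (!candidate.startsWith(prefix)) {
                        break;
                    }
                    alternatives.add(candidate);
                }
            } else {
                alternatives.add(term);
            }
            positions.add(alternatives);
        }
        return positions;
    }

    public static void main(String[] args) {
        NavigableSet<String> indexTerms = new TreeSet<>(
                Arrays.asList("apple", "apply", "banana", "pie", "pies"));
        // prints [[apple, apply], [pie]]
        System.out.println(expand(Arrays.asList("app*", "pie"), indexTerms));
    }
}
```

Each inner list corresponds to the alternatives that would be added at one
position of a multi-term phrase query.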

The second solution looks better to me. Has anybody already solved this 
problem and would be willing to share their experience?

Regards
 Daniel

-- 
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Phrase queries with wildcards

Posted by Chris Hostetter <ho...@fucit.org>.
: best way to "fix" this. One way would be to change QueryParser so that it
: builds a MultiPhraseQuery when it encounters a wildcard inside a phrase.
: However, to expand the wildcard the QueryParser needs an IndexReader.
:
: The other way might be to add a rewrite to PhraseQuery that returns a
: MultiPhraseQuery with the terms expanded. QueryParser would need to be
: modified so that the "*" isn't already lost when it's added to the phrase
: query.

I would suggest a hybrid approach: write a new WildCardPhraseQuery that
rewrites to a MultiPhraseQuery. That way QP doesn't need an IndexReader, and
PhraseQuery doesn't need to start doing anything special with terms that
have "*" in them (which would break backwards compatibility).
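
A rough sketch of that deferral pattern, without any Lucene dependency: the
object is built from the raw phrase terms alone, and the expansion happens
only when rewrite is called with something that can look up index terms.
The class name and the prefixLookup function (a stand-in for enumerating
terms through a reader) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the suggested query class; not Lucene code.
final class WildcardPhraseSketch {
    private final List<String> rawTerms; // a term may end in "*"

    // Construction needs no index access, so a parser can build this directly.
    WildcardPhraseSketch(List<String> rawTerms) {
        this.rawTerms = new ArrayList<>(rawTerms);
    }

    // Called later, once an index is available. prefixLookup maps a prefix
    // to the index terms that start with it.
    List<List<String>> rewrite(Function<String, List<String>> prefixLookup) {
        List<List<String>> positions = new ArrayList<>();
        for (String term : rawTerms) {
            if (term.endsWith("*")) {
                String prefix = term.substring(0, term.length() - 1);
                positions.add(prefixLookup.apply(prefix));
            } else {
                positions.add(Arrays.asList(term));
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        WildcardPhraseSketch query =
                new WildcardPhraseSketch(Arrays.asList("quick", "bro*"));
        List<String> indexTerms = Arrays.asList("brother", "brown", "quick");
        // Toy lookup: linear scan over a fixed term list.
        List<List<String>> expanded = query.rewrite(prefix -> {
            List<String> hits = new ArrayList<>();
            for (String t : indexTerms) {
                if (t.startsWith(prefix)) {
                    hits.add(t);
                }
            }
            return hits;
        });
        // prints [[quick], [brother, brown]]
        System.out.println(expanded);
    }
}
```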

I don't have the code for QueryParser in front of me right now, but as I
recall the one annoying thing in all of this is that there is no easy way
to subclass QP and replace the type of query that gets used when QP
encounters a "quoted" phrase. getFieldQuery is used for both single 'words'
and for quoted strings, and the default slop is then added afterwards if
the resulting query is an instance of PhraseQuery.

Maybe the existing getFieldQuery methods should be deprecated and replaced
with new getQuotedQuery and getWordQuery methods?
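
To make the idea concrete, here is a toy parser with the two hooks split
out, so a subclass can change only what happens for quoted phrases. The
method names follow the suggestion above; the parser itself, its string
"queries", and the subclass are purely hypothetical and do not reflect the
real QueryParser code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy parser; not Lucene's QueryParser. Queries are modeled as strings.
class TinyParser {
    // Matches either a double-quoted phrase or a bare word.
    private static final Pattern TOKEN =
            Pattern.compile("\"([^\"]*)\"|(\\S+)");

    List<String> parse(String input) {
        List<String> clauses = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                clauses.add(getQuotedQuery(m.group(1)));
            } else {
                clauses.add(getWordQuery(m.group(2)));
            }
        }
        return clauses;
    }

    // Separate hooks: one for bare words, one for quoted phrases.
    protected String getWordQuery(String word) {
        return "term(" + word + ")";
    }

    protected String getQuotedQuery(String phrase) {
        return "phrase(" + phrase + ")";
    }
}

// Overrides only the quoted-phrase hook; word handling is untouched.
class WildcardAwareParser extends TinyParser {
    @Override
    protected String getQuotedQuery(String phrase) {
        return phrase.contains("*")
                ? "wildcardPhrase(" + phrase + ")"
                : super.getQuotedQuery(phrase);
    }

    public static void main(String[] args) {
        // prints [term(foo), wildcardPhrase(quick bro*), term(bar)]
        System.out.println(
                new WildcardAwareParser().parse("foo \"quick bro*\" bar"));
    }
}
```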


-Hoss

