Posted to solr-user@lucene.apache.org by Avlesh Singh <av...@gmail.com> on 2010/01/08 06:15:13 UTC

Understanding the query parser

I am using Solr 1.3.
I have an index with a field called "name". It is of type "text"
(unmodified, stock text field from Solr).

My query
field:foo-bar
is parsed as a phrase query
field:"foo bar"

I was rather expecting it to be parsed as
field:(foo bar)
or
field:foo field:bar

Is there an expectation mismatch? Can I make it work as I expect it to?

Cheers
Avlesh

Re: Understanding the query parser

Posted by Avlesh Singh <av...@gmail.com>.
Thanks, Erik, for responding.
Hoss explained the behavior with nice corollaries here:
http://www.lucidimagination.com/search/document/8bc351d408f24cf6/tokenizer_question

Cheers
Avlesh

On Tue, Jan 12, 2010 at 2:21 AM, Erik Hatcher <er...@gmail.com> wrote:

Re: Understanding the query parser

Posted by Erik Hatcher <er...@gmail.com>.
On Jan 11, 2010, at 1:33 PM, Avlesh Singh wrote:

>>
>> It is in the source code of QueryParser's getFieldQuery(String field,
>> String queryText) method, line #660. If numTokens > 1 it returns a
>> PhraseQuery.
>>
> That's exactly the question. It would be nice to hear from someone as
> to why it is that way.

Suppose you indexed "Foo Bar".  It'd get indexed as two tokens [foo]  
followed by [bar].  Then someone searches for foo-bar, which would get  
analyzed into two tokens also.  A PhraseQuery is the most logical  
thing for it to turn into, no?

What's the alternative?

Of course it's tricky business, though: it's impossible to do the right
thing for all cases within SolrQueryParser. Thankfully, it is
pleasantly subclassable, and this method can be overridden.
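To make the point concrete, here is a minimal, stdlib-only Java sketch of the rule discussed in this thread (one analyzed token gives a term query, more than one gives a phrase query) and of overriding it in a subclass. It does not use Lucene. The class names, the crude `analyze` helper (which lowercases and splits on whitespace and hyphens as a stand-in for the field's real query analyzer), and the rendering of queries as strings are all hypothetical. In real code the override would go on a subclass of SolrQueryParser, in getFieldQuery(String field, String queryText).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the parser; it returns query strings,
// not Lucene Query objects.
class SketchQueryParser {
    // Crude stand-in for the field's query analyzer (an assumption):
    // lowercase the text, then split on runs of whitespace or hyphens.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[\\s-]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // The rule discussed in this thread: one token yields a term query,
    // more than one token yields a phrase query.
    String getFieldQuery(String field, String queryText) {
        List<String> tokens = analyze(queryText);
        if (tokens.isEmpty()) {
            return null; // analyzer removed everything (e.g. only delimiters)
        }
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0);
        }
        return field + ":\"" + String.join(" ", tokens) + "\"";
    }
}

// Hypothetical override: emit one optional clause per token,
// i.e. field:foo field:bar, instead of a phrase.
class OrTermsQueryParser extends SketchQueryParser {
    @Override
    String getFieldQuery(String field, String queryText) {
        List<String> tokens = analyze(queryText);
        if (tokens.size() <= 1) {
            return super.getFieldQuery(field, queryText); // keep default for 0 or 1 token
        }
        List<String> clauses = new ArrayList<>();
        for (String t : tokens) {
            clauses.add(field + ":" + t);
        }
        return String.join(" ", clauses);
    }
}
```

With the stock rule, `name:foo-bar` comes out as `name:"foo bar"`; with the override it becomes `name:foo name:bar`. The override trades precision for recall: it also matches documents where foo and bar are far apart or reversed, which is exactly why no single choice is right for all cases.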

	Erik


Re: Understanding the query parser

Posted by Avlesh Singh <av...@gmail.com>.
>
> It is in the source code of QueryParser's getFieldQuery(String field,
> String queryText) method, line #660. If numTokens > 1 it returns a
> PhraseQuery.
>
That's exactly the question. It would be nice to hear from someone as to
why it is that way.

Cheers
Avlesh

On Mon, Jan 11, 2010 at 5:10 PM, Ahmet Arslan <io...@yahoo.com> wrote:

Re: Understanding the query parser

Posted by Ahmet Arslan <io...@yahoo.com>.
> I am running into the same issue. I have tried to replace my
> WhitespaceTokenizerFactory with a PatternTokenizerFactory with pattern
> (\s+|-), but I still seem to get a phrase query. Why is that?

It is in the source code of QueryParser's getFieldQuery(String field, String queryText) method (line #660): if numTokens > 1, it returns a PhraseQuery.

Modifications in the analysis phase (CharFilterFactory, TokenizerFactory, TokenFilterFactory) won't change this behavior. Something must be done before the analysis phase.

But I think in your case you can still get a match, even with a PhraseQuery, by modifying the parameters of WordDelimiterFilterFactory.
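A quick way to see why the PatternTokenizerFactory change quoted above cannot help: whatever the tokenizer is, foo-bar still comes out as more than one token, and the token count is what the parser branches on. The sketch below approximates a PatternTokenizerFactory configured with (\s+|-) using String.split; it is a stand-in, not the Solr class, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Approximates a PatternTokenizerFactory configured with (\s+|-)
// using String.split; it is not the Solr class itself.
class PatternSplitDemo {
    static String[] tokenize(String text) {
        return Arrays.stream(text.split("(\\s+|-)"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    // The parser only looks at how many tokens come back: more than one
    // means it builds a phrase query, whichever tokenizer produced them.
    static boolean wouldBecomePhrase(String text) {
        return tokenize(text).length > 1;
    }
}
```

Since `tokenize("foo-bar")` yields `[foo, bar]`, the numTokens > 1 branch still fires, so the fix has to happen before the parser sees the string.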


      

Re: Understanding the query parser

Posted by rswart <rj...@gmail.com>.
I am running into the same issue. I have tried to replace my
WhitespaceTokenizerFactory with a PatternTokenizerFactory with pattern
(\s+|-), but I still seem to get a phrase query. Why is that?




Ahmet Arslan wrote:

-- 
View this message in context: http://old.nabble.com/Understanding-the-query-parser-tp27071483p27107523.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Understanding the query parser

Posted by Ahmet Arslan <io...@yahoo.com>.
> I am using Solr 1.3.
> I have an index with a field called "name". It is of type "text"
> (unmodified, stock text field from Solr).
> 
> My query
> field:foo-bar
> is parsed as a phrase query
> field:"foo bar"
> 
> I was rather expecting it to be parsed as
> field:(foo bar)
> or
> field:foo field:bar
> 
> Is there an expectation mismatch? Can I make it work as I expect it to?

If the query analyzer produces two or more tokens from a single query token, QueryParser constructs a PhraseQuery, so this behavior is expected.

Without writing custom code it seems impossible to alter this behavior.

Modifying QueryParser to change this behavior would be troublesome.
I think the easiest way is to replace '-' with whitespace before the analysis phase, probably on the client side, or else in a custom RequestHandler.
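Here is a minimal client-side sketch of that idea, assuming you control the raw query string before it is sent to Solr. It rewrites a fielded term whose word characters are joined by hyphens, so name:foo-bar becomes name:(foo bar). The class name, method name, and regex are hypothetical, and the sketch only handles that simple field:term form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side rewrite: field:foo-bar -> field:(foo bar).
// Only handles fielded terms made of word characters joined by hyphens.
class HyphenQueryRewriter {
    private static final Pattern FIELDED_HYPHEN_TERM =
            Pattern.compile("(\\w+):(\\w+(?:-\\w+)+)");

    static String rewrite(String query) {
        Matcher m = FIELDED_HYPHEN_TERM.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String terms = m.group(2).replace('-', ' ');
            // quoteReplacement guards against '$' or '\' in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + ":(" + terms + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, `rewrite("name:foo-bar-baz AND type:book")` gives `name:(foo bar baz) AND type:book`, and a leading NOT such as `-name:foo` is left alone. It would also rewrite values such as `date:2010-01-08`, so in practice you would restrict it to the fields where this is wanted. With the default OR operator, `name:(foo bar)` matches either term. Use `name:(foo AND bar)` if both are required.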

Maybe you can call qp.setPhraseSlop(Integer.MAX_VALUE) so that
field:foo-bar and field:(foo AND bar) become virtually equivalent.
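To see why a huge phrase slop makes the two queries nearly equivalent, here is a simplified two-term model of sloppy phrase matching. It assumes the usual Lucene approach: shift each term's document position back by its offset in the phrase, then require the shifted positions to be no more than slop apart. This is an illustration for two terms only, not Lucene's SloppyPhraseScorer, and the class name is hypothetical.

```java
// Simplified two-term model of sloppy phrase matching (illustration only).
class SloppyPhraseModel {
    /**
     * True if the phrase "first second" matches a document where the first
     * term is at position firstPos and the second at secondPos, given a slop.
     * Each position is shifted back by the term's offset in the phrase
     * (0 for the first term, 1 for the second); the match succeeds when the
     * two shifted positions are at most slop apart.
     */
    static boolean matches(int firstPos, int secondPos, int slop) {
        long distance = Math.abs((long) secondPos - 1L - firstPos);
        return distance <= slop;
    }
}
```

With slop 0, only adjacent, in-order terms match (`matches(0, 1, 0)`). Reversed terms need slop 2 (`matches(1, 0, 2)`). With `Integer.MAX_VALUE`, any document containing both terms matches, just as with `foo AND bar`. What remains different is ranking: Lucene's DefaultSimilarity weights a sloppy match by 1/(distance + 1), so closer occurrences still score higher.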

Hope this helps.