Posted to java-user@lucene.apache.org by Brian Hurt <bh...@gmail.com> on 2010/12/13 20:10:03 UTC

The logic of QueryParser

I just encountered an unexpected behavior in the query parser.  If you pass
in a query with multiple terms, like "cat hat", the query that is
returned uses an OR between the two term searches instead of an AND.  That
is, the query will return all documents with the given field containing
either "cat" or "hat".  Now, I know about phrase queries, using "\"cat
hat\"", and I know about +, "+cat +hat".  So there are ways to work around
the problem; the behavior was just unintuitive for me and several others.  I
was just wondering what the logic was for defaulting to OR instead of AND.

I have googled the mailing list archives and didn't find anything.  But if
this has been discussed to death, please just point me to the threads in the
archive rather than stirring up some old flame war.  Or just tell me what
to google for (the terms I've tried haven't yielded anything useful).
Thanks.

Brian

Re: The logic of QueryParser

Posted by Robert Muir <rc...@gmail.com>.

On Mon, Dec 13, 2010 at 2:10 PM, Brian Hurt <bh...@gmail.com> wrote:
> I just encountered an unexpected behavior in query parser.  So, if you pass
> in a query that is multiple terms, like "cat hat", the query that is
> returned uses an or between the two term searches, instead of an and.  That
> is, the query will return all documents with the given field containing
> either "cat" or "hat".  Now, I know about phrase queries, using "\"cat
> hat\"", and I know about +, "+cat +hat".  So there are ways to work around
> the problem- the behavior was just unintuitive for me and several others.  I
> was just wondering what the logic was for defaulting to or instead of and.
>
> I have googled the mailing list archives and didn't find anything.  But if
> this has been discussed to death, please just point me to the threads in the
> archive rather than stirring up some old flame war.  Or just tell me what
> to google for (the terms I've tried haven't yielded anything useful).
> Thanks.
>

Well, it's not quite a pure OR query, since it also incorporates
Similarity.coord(), which boosts documents that contain more of the
query terms.
But to understand the default, imagine a more natural query of "where
is the cat in the hat".
The default OR query will still give good results, including boosting
documents that contain both 'cat' and 'hat', but with AND you would
get nothing if any of those low-value terms were for some reason not
in that document.
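
To make the coord() point concrete, here is a minimal sketch in plain Java.
It is a toy model, not Lucene code: the class name, helper methods, and
sample documents are all made up for illustration, and real Lucene scoring
combines the coord factor with other factors that are ignored here. It only
shows how a "coordinated OR" still matches and ranks documents that a strict
AND would reject.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of coordinated OR vs. strict AND matching; not Lucene code. */
final class CoordinatedOrSketch {

    /** Splits text into a set of lower-cased terms (very naive tokenizer). */
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.trim().toLowerCase().split("\\s+")));
    }

    /** Number of distinct query terms that also occur in the document. */
    static int overlap(Set<String> queryTerms, Set<String> docTerms) {
        int count = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** Fraction of query terms matched: more matched terms, bigger factor. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** OR semantics: the document matches if it contains at least one query term. */
    static boolean matchesOr(Set<String> queryTerms, Set<String> docTerms) {
        return overlap(queryTerms, docTerms) > 0;
    }

    /** AND semantics: the document matches only if it contains every query term. */
    static boolean matchesAnd(Set<String> queryTerms, Set<String> docTerms) {
        return overlap(queryTerms, docTerms) == queryTerms.size();
    }

    public static void main(String[] args) {
        Set<String> query = terms("where is the cat in the hat");
        // Hypothetical documents, chosen only to illustrate the difference.
        List<String> docs = Arrays.asList(
            "the cat wore a hat",
            "a hat on a shelf",
            "dogs chase balls");
        for (String doc : docs) {
            Set<String> docTerms = terms(doc);
            int matched = overlap(query, docTerms);
            System.out.println(doc
                + " | OR=" + matchesOr(query, docTerms)
                + " AND=" + matchesAnd(query, docTerms)
                + " coord=" + coord(matched, query.size()));
        }
    }
}
```

In this toy data set no document satisfies AND, because none contains
"where", "is", or "in", while OR still matches the first two documents and
the coord factor puts the one containing both 'cat' and 'hat' ahead.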

However, if your queries are more restricted, maybe you want to either:

1) adjust Similarity.coord() to make this boost better for your app
(for example, maybe only give a boost if overlap == maxOverlap, and
maybe play with the amount of boost)
 or
2) set your QueryParser's default operator to AND with the
.setDefaultOperator() method, but realize this could exclude very
relevant results that happen to be missing some useless keywords.
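
As a rough illustration of option 1, the sketch below compares two coord
policies as plain Java functions. The class name, method names, and the
partialFactor parameter are hypothetical. In a real Lucene 3.x application
this logic would presumably go into a Similarity subclass that overrides
coord(int overlap, int maxOverlap); the "proportional" version is meant to
mirror what the default implementation does, as far as I know.

```java
/** Plain-Java sketch of two coord() policies; names here are hypothetical. */
final class CoordPolicies {

    /** Proportional policy: the factor grows with each additional matched term. */
    static float proportional(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /**
     * All-or-penalty policy: full score only when every query term matched
     * (overlap == maxOverlap); any partial match is scaled by partialFactor.
     */
    static float allOrPenalty(int overlap, int maxOverlap, float partialFactor) {
        return overlap == maxOverlap ? 1.0f : partialFactor;
    }

    public static void main(String[] args) {
        int maxOverlap = 3;          // number of terms in the query
        float partialFactor = 0.25f; // assumed tuning knob, adjust per application
        for (int overlap = 1; overlap <= maxOverlap; overlap++) {
            System.out.println("overlap=" + overlap
                + " proportional=" + proportional(overlap, maxOverlap)
                + " allOrPenalty=" + allOrPenalty(overlap, maxOverlap, partialFactor));
        }
    }
}
```

A partialFactor close to 1 behaves almost like a plain OR, while a value
close to 0 pushes partial matches to the bottom without excluding them the
way option 2 (an AND default) would.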

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: The logic of QueryParser

Posted by Robert Muir <rc...@gmail.com>.

On Mon, Dec 13, 2010 at 3:16 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Mon, Dec 13, 2010 at 3:07 PM, Robert Muir <rc...@gmail.com> wrote:
>> On Mon, Dec 13, 2010 at 3:04 PM, Yonik Seeley
>> <yo...@lucidimagination.com> wrote:
>>> I think of the Lucene QueryParser like SQL. SQL is text based and also
>>> meant for human entered text - but for either very expert users, or
>>> programmatically created queries.  You normally don't want to pass
>>> text from a search box directly to an SQL database or to the Lucene
>>> QueryParser.
>>>
>>
>> Then why does Solr use it by default?
>
> Because it's a decent default?  It was also the only choice when Solr
> was first created.  I don't see a compelling reason to change that.

Right, for the same reason it's a good way for a lot of Lucene apps to
get started quickly.

>
> Solr fits about the same place a database does in many applications...
> it's certainly not meant for users to query directly.  There's
> normally a web application that handles interaction with the user and
> creates/submits queries to Solr.
>

Sure, but just because Solr "fits the same place as a database" for
some apps doesn't suddenly make the QueryParser "not for human
consumption".

There's definitely some human-oriented stuff in there that I'm not a
huge fan of (believe it or not, the whole way that it's localized, for
one), but that doesn't mean it should be treated like SQL. For lots of
apps it works just fine as a start.

And this is even more reason to have the default as a coordinated
OR query: it works OK for both short and long queries, for both
synthetic languages where AND seems like it would make sense, and for
more analytic languages where AND as a default is completely stupid.

Re: The logic of QueryParser

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Mon, Dec 13, 2010 at 3:07 PM, Robert Muir <rc...@gmail.com> wrote:
> On Mon, Dec 13, 2010 at 3:04 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> I think of the Lucene QueryParser like SQL. SQL is text based and also
>> meant for human entered text - but for either very expert users, or
>> programmatically created queries.  You normally don't want to pass
>> text from a search box directly to an SQL database or to the Lucene
>> QueryParser.
>>
>
> Then why does Solr use it by default?

Because it's a decent default?  It was also the only choice when Solr
was first created.  I don't see a compelling reason to change that.

Solr fits about the same place a database does in many applications...
it's certainly not meant for users to query directly.  There's
normally a web application that handles interaction with the user and
creates/submits queries to Solr.

-Yonik
http://www.lucidimagination.com

Re: The logic of QueryParser

Posted by Robert Muir <rc...@gmail.com>.

On Mon, Dec 13, 2010 at 3:04 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> I think of the Lucene QueryParser like SQL. SQL is text based and also
> meant for human entered text - but for either very expert users, or
> programmatically created queries.  You normally don't want to pass
> text from a search box directly to an SQL database or to the Lucene
> QueryParser.
>

Then why does Solr use it by default?

Re: The logic of QueryParser

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Mon, Dec 13, 2010 at 2:51 PM, Robert Muir <rc...@gmail.com> wrote:
> On Mon, Dec 13, 2010 at 2:43 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Mon, Dec 13, 2010 at 2:10 PM, Brian Hurt <bh...@gmail.com> wrote:
>>>  I was just wondering what the logic was for defaulting to or instead of and.
>>
>> Largely historical.  I think the original rationale was that it
>> probably fit better with the traditional vector space model.
>> There's also not a good reason to change the default, given that
>> QueryParser isn't meant for end users.
>>
> That's pretty misleading, Yonik.
>
> "In other words, the query parser is designed for human-entered text,
> not for program-generated text."
> http://lucene.apache.org/java/3_0_3/queryparsersyntax.html

*shrugs*, I didn't recall that phrase... but I'm not clear if you
disagree with what I'm saying, or if you just think that it's
inconsistent with the documentation.

I think of the Lucene QueryParser like SQL. SQL is text based and also
meant for human entered text - but for either very expert users, or
programmatically created queries.  You normally don't want to pass
text from a search box directly to an SQL database or to the Lucene
QueryParser.

-Yonik
http://www.lucidimagination.com
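
One way to read "programmatically created queries" is an application layer
that turns raw search-box text into query syntax before it reaches the
parser. The sketch below is a minimal, hypothetical example of that idea in
plain Java: it escapes characters that are special in the query syntax and
prefixes each term with +, producing the "+cat +hat" form mentioned at the
start of the thread. The class and method names are made up, and the
special-character list is an assumption that should be checked against the
query syntax documentation for your Lucene version (Lucene also ships its
own QueryParser.escape() helper).

```java
/** Hypothetical helper that builds an all-terms-required query string. */
final class SearchBoxQueryBuilder {

    // Characters assumed to be special in the query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Prefixes every special character in the term with a backslash. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Turns "cat hat" into "+cat +hat", escaping each term along the way. */
    static String requireAllTerms(String userInput) {
        StringBuilder query = new StringBuilder();
        for (String term : userInput.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input yields an empty query string
            }
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append('+').append(escape(term));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(requireAllTerms("cat hat"));
        System.out.println(requireAllTerms("cat (hat"));
    }
}
```

This sketch does not handle the keyword operators (AND, OR, NOT) appearing
as user-typed words, so a real application would need to deal with those
separately.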

Re: The logic of QueryParser

Posted by Robert Muir <rc...@gmail.com>.

On Mon, Dec 13, 2010 at 2:43 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Mon, Dec 13, 2010 at 2:10 PM, Brian Hurt <bh...@gmail.com> wrote:
>>  I was just wondering what the logic was for defaulting to or instead of and.
>
> Largely historical.  I think the original rationale was that it
> probably fit better with the traditional vector space model.
> There's also not a good reason to change the default, given that
> QueryParser isn't meant for end users.
>

That's pretty misleading, Yonik.

"In other words, the query parser is designed for human-entered text,
not for program-generated text."
http://lucene.apache.org/java/3_0_3/queryparsersyntax.html

Re: The logic of QueryParser

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Mon, Dec 13, 2010 at 2:10 PM, Brian Hurt <bh...@gmail.com> wrote:
>  I was just wondering what the logic was for defaulting to or instead of and.

Largely historical.  I think the original rationale was that it
probably fit better with the traditional vector space model.
There's also not a good reason to change the default, given that
QueryParser isn't meant for end users.

-Yonik
http://lucidimagination.com

Re: The logic of QueryParser

Posted by Brian Hurt <bh...@gmail.com>.

On Mon, Dec 13, 2010 at 2:35 PM, Ahmet Arslan <io...@yahoo.com> wrote:

> > I have googled the mailing list archives and didn't find anything.  But if
> > this has been discussed to death, please just point me to the threads in the
> > archive rather than stirring up some old flame war.  Or just tell me what
> > to google for (the terms I've tried haven't yielded anything useful).
>
> org.apache.lucene.queryParser.QueryParser.setDefaultOperator()
>
Even better.  I had missed that.  Thank you very much!

Brian

Re: The logic of QueryParser

Posted by Ahmet Arslan <io...@yahoo.com>.

> I have googled the mailing list archives and didn't find anything.  But if
> this has been discussed to death, please just point me to the threads in the
> archive rather than stirring up some old flame war.  Or just tell me what
> to google for (the terms I've tried haven't yielded anything useful).


org.apache.lucene.queryParser.QueryParser.setDefaultOperator()


      
