Posted to dev@lucene.apache.org by "Burton-West, Tom" <tb...@umich.edu> on 2011/12/01 18:51:06 UTC

re: LUCENE-167 and Solr default handling of Boolean operators is broken

The default query parser in Solr does not handle precedence of Boolean operators in the way most people expect.

"A AND B OR C" gets interpreted as "A AND (B OR C)". There are numerous other examples in the JIRA ticket for LUCENE-167, in this article on the wiki (http://wiki.apache.org/lucene-java/BooleanQuerySyntax), and in this blog post: http://robotlibrarian.billdueber.com/solr-and-boolean-operators/

This issue was reported in 2003, but the fix does not seem to have made it into the default query parser for either Lucene or Solr.
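
To make the difference concrete, here is a minimal sketch in plain Java. It uses no Lucene classes, and the class and method names are made up for illustration. It treats A, B and C as "the document matches this term" flags, evaluates the conventional grouping "(A AND B) OR C" and the reported grouping "A AND (B OR C)" over all eight combinations, and lists the combinations on which the two disagree.

```java
import java.util.ArrayList;
import java.util.List;

class PrecedenceDemo {
    // Conventional precedence: AND binds tighter than OR.
    static boolean expected(boolean a, boolean b, boolean c) {
        return (a && b) || c;
    }

    // The grouping reported above for the default parser (an assumption
    // taken from this message, not verified against Lucene here).
    static boolean reported(boolean a, boolean b, boolean c) {
        return a && (b || c);
    }

    // Lists every assignment of A, B, C on which the two groupings disagree.
    static List<String> disagreements() {
        List<String> out = new ArrayList<>();
        boolean[] values = {false, true};
        for (boolean a : values) {
            for (boolean b : values) {
                for (boolean c : values) {
                    if (expected(a, b, c) != reported(a, b, c)) {
                        out.add("A=" + a + " B=" + b + " C=" + c);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints:
        // A=false B=false C=true
        // A=false B=true C=true
        for (String s : disagreements()) {
            System.out.println(s);
        }
    }
}
```

The two groupings disagree exactly when C matches and A does not, so under the reported interpretation a document that matches C but not A would be missed. Adding explicit parentheses to the query sidesteps the ambiguity.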

It appears that LUCENE-167 was closed in 2009 based on the assumption that the query parser in LUCENE-1823 would become the default Lucene query parser. However, LUCENE-1823 seems to have gotten bogged down and is not yet resolved. I do see that there is a precedence query parser in LUCENE-1937, which was committed to contrib in the 3x branch (http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/queryparser/src/java/org/apache/lucene/queryParser/precedence/package.html?view=co).
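
For readers unfamiliar with what "precedence" means for a parser, the following is a minimal recursive-descent sketch in plain Java. It is not the contrib precedence query parser and does not use any Lucene API; the class name `PrecedenceSketch` and its methods are hypothetical. It assumes the conventional ordering NOT > AND > OR, handles only bare terms, the three operators, and parentheses, and does minimal error handling. It returns the query with every operation explicitly parenthesized so the grouping is visible.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class PrecedenceSketch {
    private final Deque<String> tokens;

    PrecedenceSketch(String query) {
        // Pad parentheses with spaces so they split off as their own tokens.
        String spaced = query.replace("(", " ( ").replace(")", " ) ").trim();
        tokens = new ArrayDeque<>(Arrays.asList(spaced.split("\\s+")));
    }

    // Returns the query with every operation explicitly parenthesized.
    static String group(String query) {
        PrecedenceSketch p = new PrecedenceSketch(query);
        String result = p.parseOr();
        if (!p.tokens.isEmpty()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.peek());
        }
        return result;
    }

    // or := and ("OR" and)*      (lowest precedence)
    private String parseOr() {
        String left = parseAnd();
        while ("OR".equals(tokens.peek())) {
            tokens.poll();
            left = "(" + left + " OR " + parseAnd() + ")";
        }
        return left;
    }

    // and := not ("AND" not)*
    private String parseAnd() {
        String left = parseNot();
        while ("AND".equals(tokens.peek())) {
            tokens.poll();
            left = "(" + left + " AND " + parseNot() + ")";
        }
        return left;
    }

    // not := "NOT" not | primary  (highest precedence)
    private String parseNot() {
        if ("NOT".equals(tokens.peek())) {
            tokens.poll();
            return "(NOT " + parseNot() + ")";
        }
        return parsePrimary();
    }

    // primary := term | "(" or ")"
    private String parsePrimary() {
        String t = tokens.poll();
        if (t == null) {
            throw new IllegalArgumentException("Unexpected end of query");
        }
        if ("(".equals(t)) {
            String inner = parseOr();
            if (!")".equals(tokens.poll())) {
                throw new IllegalArgumentException("Missing closing parenthesis");
            }
            return inner;
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(group("A AND B OR C")); // prints ((A AND B) OR C)
    }
}
```

Each operator gets its own grammar rule, and a rule for a lower-precedence operator calls the rule for the next higher one, which is what makes AND bind tighter than OR without any explicit parentheses in the input.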

Would it be possible to use the contrib 3x precedence query parser in Solr?
Would this require modifying the LuceneQParserPlugin, and if so, would it make sense to open a JIRA issue?

Are there any plans to make the precedence query parser the default for either Lucene or Solr?

If not, are there any plans to make it more prominent in the documentation that the default Lucene query parser has issues with precedence?


A bit more background below

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
----------------------------------------------------

More Background

There were some concerns about breaking backward compatibility, but in a mailing list post in 2005 Yonik Seeley said:
"The current behavior is so surprising that I doubt that no one is
relying on it." (http://www.mail-archive.com/java-user@lucene.apache.org/msg00018.html)

and Doug Cutting said: "+1. Fixing operator precedence seems to me like an acceptable incompatibility. The change needs to be well documented in release notes, and the old QueryParser should be available, deprecated, for a time for back-compatibility."
(http://www.mail-archive.com/java-user@lucene.apache.org/msg00037.html)




RE: LUCENE-167 and Solr default handling of Boolean operators is broken

Posted by "Burton-West, Tom" <tb...@umich.edu>.
Thanks Yonik,

Should I open a Solr JIRA issue?

Tom

-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
Sent: Thursday, December 01, 2011 1:16 PM
To: dev@lucene.apache.org
Subject: Re: LUCENE-167 and Solr default handling of Boolean operators is broken

Whew, that was a while ago - didn't remember even commenting on the
issue, but it still makes sense (double-negative aside... boy I hate
re-reading things I wrote too quickly ;-)

The old precedence query parser had issues IIRC.  The precedence query
parser based on the flexible queryparser framework in contrib isn't
that Solr friendly (i.e. Solr has a lot of hooks into the current
standard query parser and moving would probably be both error prone
and difficult).

SolrCloud is consuming my time right now, but I might be able to take a
look to see if this is easy to fix in another month or so (if no one
beats me to it).  Since it's a major release, we may be able to just
fix it in trunk w/o having to keep the old behavior.

-Yonik
http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org




Re: LUCENE-167 and Solr default handling of Boolean operators is broken

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Whew, that was a while ago - didn't remember even commenting on the
issue, but it still makes sense (double-negative aside... boy I hate
re-reading things I wrote too quickly ;-)

The old precedence query parser had issues IIRC.  The precedence query
parser based on the flexible queryparser framework in contrib isn't
that Solr friendly (i.e. Solr has a lot of hooks into the current
standard query parser and moving would probably be both error prone
and difficult).

SolrCloud is consuming my time right now, but I might be able to take a
look to see if this is easy to fix in another month or so (if no one
beats me to it).  Since it's a major release, we may be able to just
fix it in trunk w/o having to keep the old behavior.

-Yonik
http://www.lucidimagination.com



