Posted to java-user@lucene.apache.org by Jim Swainston <ji...@googlemail.com> on 2011/08/03 15:26:20 UTC

Grouping Clauses to Preserve Order of Boolean Precedence

Hi,

I'm having trouble thinking of a way to effectively group clauses to form
sub queries. For example, I need to handle the following query:

Marketing AND Smith OR Davies.

Lucene is currently parsing this as  +Marketing +Smith Davies meaning that
results where only the term Davies is found are not returned. I want to be
able to apply the order of Boolean precedence so that this query is treated
as:

(Marketing AND Smith) OR Davies.

The QueryParser will correctly parse the above query as (+Marketing +Smith)
Davies meaning that results where only Davies is found will be returned.

Is there a way within Lucene to apply this order of precedence? Currently
the only way I can think of is to manually add the brackets in the correct
place. However, this would become more difficult as the number of clauses
increases so I'm not sure how scalable this method would be.

How would I work out where to place the brackets if the query was something
like:

Marketing AND Smith OR Davies OR Management OR Business AND
Science.........?

Can anyone suggest an effective way to group clauses so that the order of
Boolean precedence is preserved?

Thanks very much.

Jim

Re: Grouping Clauses to Preserve Order of Boolean Precedence

Posted by Jim Swainston <ji...@googlemail.com>.
Brilliant, that looks perfect. We're currently using an older version of
Lucene in which this was an experimental class. Looks like we should
upgrade.

Thanks

Jim

On 5 August 2011 02:10, Trejkaz <tr...@trypticon.org> wrote:

> On Fri, Aug 5, 2011 at 1:57 AM, Jim Swainston
> <ji...@googlemail.com> wrote:
> > So if the Text input is:
> >
> > Marketing AND Smith OR Davies
> >
> > I want my program to work out that this should be grouped as the following
> > (as AND has higher precedence than OR):
> >
> > (Marketing AND Smith) OR Davies.
> >
> > I'm effectively looking for an algorithm that will properly group any number
> > of terms......
>
> Have you tried using PrecedenceQueryParser instead of the standard one?
>
>
> http://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/queryParser/precedence/PrecedenceQueryParser.html
>
> It probably does the right thing.
>
> (Related but not part of the answer: I solved the issue here by making
> it a parser-level feature, since I was making my own parser anyway.
> When we get the AST from the parser, the precedence has already been
> figured out.)
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Grouping Clauses to Preserve Order of Boolean Precedence

Posted by Trejkaz <tr...@trypticon.org>.
On Fri, Aug 5, 2011 at 1:57 AM, Jim Swainston
<ji...@googlemail.com> wrote:
> So if the Text input is:
>
> Marketing AND Smith OR Davies
>
> I want my program to work out that this should be grouped as the following
> (as AND has higher precedence than OR):
>
> (Marketing AND Smith) OR Davies.
>
> I'm effectively looking for an algorithm that will properly group any number
> of terms......

Have you tried using PrecedenceQueryParser instead of the standard one?

http://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/queryParser/precedence/PrecedenceQueryParser.html

It probably does the right thing.

(Related but not part of the answer: I solved the issue here by making
it a parser-level feature, since I was making my own parser anyway.
When we get the AST from the parser, the precedence has already been
figured out.)
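
For anyone who wants to see what "precedence at the parser level" can look like, here is a minimal sketch in plain Java with no Lucene dependency. It is not the parser described above and not what PrecedenceQueryParser does internally; the class name PrecedenceGrouper and its group method are made up for illustration. It assumes whitespace-separated single-word terms, only the AND and OR operators, and no brackets, NOT, or phrases in the input. The grammar is the usual two-level one: an OR expression is a list of AND expressions, so AND binds tighter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: adds brackets so that AND binds tighter than OR. */
final class PrecedenceGrouper {

    private final String[] tokens;
    private int pos = 0;

    private PrecedenceGrouper(String input) {
        // Assumption: terms and operators are separated by whitespace.
        this.tokens = input.trim().split("\\s+");
    }

    /** Returns the query with every multi-term AND group wrapped in brackets. */
    static String group(String input) {
        PrecedenceGrouper parser = new PrecedenceGrouper(input);
        String result = parser.parseOr();
        if (parser.pos != parser.tokens.length) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens[parser.pos]);
        }
        return result;
    }

    // orExpr := andExpr ("OR" andExpr)*
    private String parseOr() {
        List<String> parts = new ArrayList<>();
        parts.add(parseAnd());
        while (pos < tokens.length && tokens[pos].equals("OR")) {
            pos++; // consume OR
            parts.add(parseAnd());
        }
        return String.join(" OR ", parts);
    }

    // andExpr := term ("AND" term)*
    private String parseAnd() {
        List<String> terms = new ArrayList<>();
        terms.add(parseTerm());
        while (pos < tokens.length && tokens[pos].equals("AND")) {
            pos++; // consume AND
            terms.add(parseTerm());
        }
        // A lone term needs no brackets; an AND group always gets them.
        return terms.size() == 1 ? terms.get(0) : "(" + String.join(" AND ", terms) + ")";
    }

    private String parseTerm() {
        if (pos >= tokens.length || tokens[pos].equals("AND") || tokens[pos].equals("OR")) {
            throw new IllegalArgumentException("Expected a term at token index " + pos);
        }
        return tokens[pos++];
    }

    public static void main(String[] args) {
        System.out.println(group("Marketing AND Smith OR Davies"));
        System.out.println(group("Marketing AND Smith OR Davies OR Management OR Business AND Science"));
    }
}
```

The bracketed string it returns could then be handed to the standard QueryParser, which already honours explicit grouping. Building the query objects directly from the two parse methods would avoid the round trip through a string, but that is left out to keep the sketch short.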

TX


Re: Grouping Clauses to Preserve Order of Boolean Precedence

Posted by Jim Swainston <ji...@googlemail.com>.
My apologies, Hoss, perhaps I should have been clearer. I'm trying to
programmatically build a BooleanQuery from text input. I want the
BooleanQuery that is built to have the correct structure based on the
precedence rules of Boolean Logic.

So if the Text input is:

Marketing AND Smith OR Davies

I want my program to work out that this should be grouped as the following
(as AND has higher precedence than OR):

(Marketing AND Smith) OR Davies.

I'm effectively looking for an algorithm that will properly group any number
of terms......

Thanks

Jim

On 4 August 2011 16:47, Chris Hostetter <ho...@fucit.org> wrote:

>
> : But the query parser doesn't seem to do that for me with the input Marketing
> : AND Smith OR Davies. The query parser gives me 3 clauses. 1 must clause for
>
> i didn't say the QueryParser would do that with *that* input
>
> You asked...
>
> : > : Thanks Ian. How would you achieve the logic of the below query using
> : > : BooleanQuery and BooleanClause.Occur? How would you achieve the grouping
> : > : effect?
> : > :
> : > : (Marketing AND Smith) OR Davies
>
> ...and i said...
>
> : > The same way the query parser does: that's a BooleanQuery (A) with two
> : > "SHOULD" clauses, the first of which is a nested BooleanQuery (B) (with
> : > two "MUST" clauses (X child of B) Marketing, and (Y child of B) Smith),
> : > and the 2nd of which (C, child of A) is a query for Davies.
>
> ...which is exactly what the QueryParser would do given *that* input (with
> parens)
>
> If you want to programmatically build up a query with the structure you are
> describing, nested BooleanQuery objects are how you do it.
>
>
> -Hoss

Re: Grouping Clauses to Preserve Order of Boolean Precedence

Posted by Chris Hostetter <ho...@fucit.org>.
: But the query parser doesn't seem to do that for me with the input Marketing
: AND Smith OR Davies. The query parser gives me 3 clauses. 1 must clause for

i didn't say the QueryParser would do that with *that* input

You asked...

: > : Thanks Ian. How would you achieve the logic of the below query using
: > : BooleanQuery and BooleanClause.Occur? How would you achieve the grouping
: > : effect?
: > :
: > : (Marketing AND Smith) OR Davies

...and i said...

: > The same way the query parser does: that's a BooleanQuery (A) with two
: > "SHOULD" clauses, the first of which is a nested BooleanQuery (B) (with
: > two "MUST" clauses (X child of B) Marketing, and (Y child of B) Smith),
: > and the 2nd of which (C, child of A) is a query for Davies.

...which is exactly what the QueryParser would do given *that* input (with 
parens)

If you want to programmatically build up a query with the structure you are
describing, nested BooleanQuery objects are how you do it.


-Hoss


Re: Grouping Clauses to Preserve Order of Boolean Precedence

Posted by Jim Swainston <ji...@googlemail.com>.
But the query parser doesn't seem to do that for me with the input Marketing
AND Smith OR Davies. The query parser gives me 3 clauses. 1 must clause for
the term Marketing, 1 must clause for the term Smith and 1 should clause for
the term Davies. e.g. +Marketing +Smith SHOULD Davies. What I would like the
query parser to be doing is recognising the order of Boolean precedence so
that it automatically gives me the nested query you describe e.g. SHOULD
(+Marketing +Smith) SHOULD Davies.

If Lucene can't do this does anyone know of any algorithms for generating
these nested queries so that they follow the order of Boolean precedence?

On 4 August 2011 02:05, Chris Hostetter <ho...@fucit.org> wrote:

>
> : Thanks Ian. How would you achieve the logic of the below query using
> : BooleanQuery and BooleanClause.Occur? How would you achieve the grouping
> : effect?
> :
> : (Marketing AND Smith) OR Davies
>
> The same way the query parser does: that's a BooleanQuery (A) with two
> "SHOULD" clauses, the first of which is a nested BooleanQuery (B) (with
> two "MUST" clauses (X child of B) Marketing, and (Y child of B) Smith),
> and the 2nd of which (C, child of A) is a query for Davies.
>
>
> -Hoss

Re: Grouping Clauses to Preserve Order of Boolean Precedence

Posted by Chris Hostetter <ho...@fucit.org>.
: Thanks Ian. How would you achieve the logic of the below query using
: BooleanQuery and BooleanClause.Occur? How would you achieve the grouping
: effect?
: 
: (Marketing AND Smith) OR Davies

The same way the query parser does: that's a BooleanQuery (A) with two 
"SHOULD" clauses, the first of which is a nested BooleanQuery (B) (with 
two "MUST" clauses (X child of B) Marketing, and (Y child of B) Smith), 
and the 2nd of which (C, child of A) is a query for Davies.
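
As a rough illustration of that A/B/C structure, the sketch below builds it by hand. It assumes the Lucene 3.x API that this thread links to, where BooleanQuery has a public no-argument constructor and an add(Query, Occur) method; later major versions moved to a builder, so it would need adjusting there. The class name NestedQueryExample, the field name "body", and the lower-cased term text are all made up for the example. The terms would have to match whatever the index analyzer actually produced.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

/** Hypothetical example class; assumes Lucene 3.x on the classpath. */
final class NestedQueryExample {

    // "body" is an assumed field name, not something taken from the thread.
    private static final String FIELD = "body";

    /** Builds (Marketing AND Smith) OR Davies as nested BooleanQuery objects. */
    static BooleanQuery marketingAndSmithOrDavies() {
        // B: the nested query; both of its clauses are required.
        BooleanQuery inner = new BooleanQuery();
        inner.add(new TermQuery(new Term(FIELD, "marketing")), Occur.MUST); // X
        inner.add(new TermQuery(new Term(FIELD, "smith")), Occur.MUST);     // Y

        // A: the outer query; with only SHOULD clauses, matching either one is enough.
        BooleanQuery outer = new BooleanQuery();
        outer.add(inner, Occur.SHOULD);                                       // B
        outer.add(new TermQuery(new Term(FIELD, "davies")), Occur.SHOULD);   // C
        return outer;
    }
}
```

Because the outer query has no MUST clauses, a document containing only Davies still matches, which is the behaviour the flat +Marketing +Smith Davies form loses.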


-Hoss


Re: Grouping Clauses to Preserve Order of Boolean Precedence

Posted by Jim Swainston <ji...@googlemail.com>.
Thanks Ian. How would you achieve the logic of the below query using
BooleanQuery and BooleanClause.Occur? How would you achieve the grouping
effect?

(Marketing AND Smith) OR Davies

Thanks a lot.

Jim



On 3 August 2011 14:54, Ian Lea <ia...@gmail.com> wrote:

> I don't think there is an easy way.  Brackets are the official way to
> do it with the query parser:
> http://lucene.apache.org/java/3_3_0/queryparsersyntax.html#Grouping
>
> For anything non-trivial I prefer to build up queries in code using
> BooleanQuery.  That way it is comparatively easy to build in whatever
> logic you need with BooleanClause.Occur.
>
>
> There are alternative parsers in contrib.  They might have more
> support for grouping clauses.
>
>
>
> --
> Ian.
>
>
> On Wed, Aug 3, 2011 at 2:26 PM, Jim Swainston
> <ji...@googlemail.com> wrote:
> > Hi,
> >
> > I'm having trouble thinking of a way to effectively group clauses to form
> > sub queries. For example, I need to handle the following query:
> >
> > Marketing AND Smith OR Davies.
> >
> > Lucene is currently parsing this as  +Marketing +Smith Davies meaning that
> > results where only the term Davies is found are not returned. I want to be
> > able to apply the order of Boolean precedence so that this query is treated
> > as:
> >
> > (Marketing AND Smith) OR Davies.
> >
> > The QueryParser will correctly parse the above query as (+Marketing +Smith)
> > Davies meaning that results where only Davies is found will be returned.
> >
> > Is there a way within Lucene to apply this order of precedence? Currently
> > the only way I can think of is to manually add the brackets in the correct
> > place. However, this would become more difficult as the number of clauses
> > increases so I'm not sure how scalable this method would be.
> >
> > How would I work out where to place the brackets if the query was something
> > like:
> >
> > Marketing AND Smith OR Davies OR Management OR Business AND
> > Science.........?
> >
> > Can anyone suggest an effective way to group clauses so that the order of
> > Boolean precedence is preserved?
> >
> > Thanks very much.
> >
> > Jim

Re: Grouping Clauses to Preserve Order of Boolean Precedence

Posted by Ian Lea <ia...@gmail.com>.
I don't think there is an easy way.  Brackets are the official way to
do it with the query parser:
http://lucene.apache.org/java/3_3_0/queryparsersyntax.html#Grouping

For anything non-trivial I prefer to build up queries in code using
BooleanQuery.  That way it is comparatively easy to build in whatever
logic you need with BooleanClause.Occur.


There are alternative parsers in contrib.  They might have more
support for grouping clauses.



--
Ian.


On Wed, Aug 3, 2011 at 2:26 PM, Jim Swainston
<ji...@googlemail.com> wrote:
> Hi,
>
> I'm having trouble thinking of a way to effectively group clauses to form
> sub queries. For example, I need to handle the following query:
>
> Marketing AND Smith OR Davies.
>
> Lucene is currently parsing this as  +Marketing +Smith Davies meaning that
> results where only the term Davies is found are not returned. I want to be
> able to apply the order of Boolean precedence so that this query is treated
> as:
>
> (Marketing AND Smith) OR Davies.
>
> The QueryParser will correctly parse the above query as (+Marketing +Smith)
> Davies meaning that results where only Davies is found will be returned.
>
> Is there a way within Lucene to apply this order of precedence? Currently
> the only way I can think of is to manually add the brackets in the correct
> place. However, this would become more difficult as the number of clauses
> increases so I'm not sure how scalable this method would be.
>
> How would I work out where to place the brackets if the query was something
> like:
>
> Marketing AND Smith OR Davies OR Management OR Business AND
> Science.........?
>
> Can anyone suggest an effective way to group clauses so that the order of
> Boolean precedence is preserved?
>
> Thanks very much.
>
> Jim