You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Daniel Naber <da...@t-online.de> on 2004/08/20 00:48:46 UTC

API cleanup: BooleanQuery.add()

BooleanQuery.add() currently takes two boolean values. That's difficult to 
use as these two parameters can easily be confused. Also, there's a 
runtime error if one uses true for both parameters. Thus this method's API 
should be redesigned. It has been discussed here already:

http://www.mail-archive.com/lucene-user%40jakarta.apache.org/msg08479.html

I suggest the attached patch. Instead of Doug's suggestion of "Occur" my 
patch uses "Operator" and instead of MUST/MUST_NOT/SHOULD it uses 
AND/NOT/OR for the enumeration. I think SHOULD would be difficult to 
understand, but I'm not happy with OR either. The problem is that it's 
okay to have a BooleanQuery with only one Term, and OR would mean the same 
as AND in that case.

Please post your suggestions for better names.

Regards
 Daniel

-- 
http://www.danielnaber.de

Re: API cleanup: BooleanQuery.add()

Posted by Daniel Naber <da...@t-online.de>.
On Monday 23 August 2004 19:38, Doug Cutting wrote:

> > Does this mean we're limited
> > in the changes that we can apply? So is it okay to deprecate (and
> > later remove) the current constructor that takes two booleans and add
> > one that takes the Occur object?
>
> Yes, I don't see a problem with that.

Okay, here's a new version of the patch. It adds an enumeration and 
deprecates the add(query, boolean, boolean) method. It also deprecates the 
public fields of BooleanClause and adds appropriate get/set methods 
instead. I'll commit this in the next days unless someone sees a problem 
with it.

Regards
 Daniel

-- 
http://www.danielnaber.de

Re: API cleanup: BooleanQuery.add()

Posted by Paul Elschot <pa...@xs4all.nl>.
On Monday 23 August 2004 19:38, Doug Cutting wrote:
> Daniel Naber wrote:
...
>
> >>BooleanQuery is really a misnomer, since it is not the traditional tree
> >>of boolean operators.
> >
> > Then maybe we should rename it for Lucene 2.0? Do you have a better name?
>
> Hmm... we could call it something like CompoundQuery, CombinedQuery or
> ComposedQuery, since it is the only primitive query type that combines
> other types of queries.

Those C...'s are perhaps too general, one could also see a SpanQuery
as C...'ed.
I was thinking about OccurrenceQuery or perhaps CombinedOccurrenceQuery.
But I can live with BooleanQuery.

> We could also add classes for true boolean queries, AndQuery, OrQuery
> and AndNotQuery, all of which rewrite to CompoundQuery.  AndQuery and
> OrQuery could be n-ary, rather than binary.  Would there be much point
> in this?

That would be nice for query languages having such facilities. 
It's difficult to combine the current query language with those operators,
at least for me it was when implementing the Surround language posted
earlier. One point of that language is this line of java code:

public class DistanceQuery extends ComposedQuery implements DistanceSubQuery

which is why I'm not in favour of using ComposedQuery for boolean purposes
only. Searching a DistanceQuery is implemented by a Lucene SpanQuery btw.

OTOH, in the Surround language ComposedQuery is not a subclass of
Lucene's Query, it only represents that the input to the query parser
consisted of some operator with subqueries/clauses.

Perhaps one should distinguish parsed queries (AndQuery, OrQuery etc.)
from searchable queries (BooleanQuery, SpanQuery) by
putting them in separate packages.


Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: API cleanup: BooleanQuery.add()

Posted by Doug Cutting <cu...@apache.org>.
Daniel Naber wrote:
> Okay. I noticed that my suggested patch only changes BooleanQuery, but 
> BooleanClause is public, too. So it should be changed as well. However, it 
> implements Serializable and I don't know why.

BooleanClause implements Serializable so that a BooleanQuery, which 
encapsulates BooleanClauses, can be passed to a RemoteSearchable.  In 
general, the internal structures of a query, public or not, must be 
serializable.

> Does this mean we're limited 
> in the changes that we can apply? So is it okay to deprecate (and later 
> remove) the current constructor that takes two booleans and add one that 
> takes the Occur object? 

Yes, I don't see a problem with that.

>>BooleanQuery is really a misnomer, since it is not the traditional tree
>>of boolean operators.
> 
> Then maybe we should rename it for Lucene 2.0? Do you have a better name?

Hmm... we could call it something like CompoundQuery, CombinedQuery or 
ComposedQuery, since it is the only primitive query type that combines 
other types of queries.

We could also add classes for true boolean queries, AndQuery, OrQuery 
and AndNotQuery, all of which rewrite to CompoundQuery.  AndQuery and 
OrQuery could be n-ary, rather than binary.  Would there be much point 
in this?

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: API cleanup: BooleanQuery.add()

Posted by Daniel Naber <da...@t-online.de>.
On Friday 20 August 2004 23:48, Doug Cutting wrote:

> I still prefer Occur.MUST, Occur.SHOULD and Occur.MUST_NOT.

Okay. I noticed that my suggested patch only changes BooleanQuery, but 
BooleanClause is public, too. So it should be changed as well. However, it 
implements Serializable and I don't know why. Does this mean we're limited 
in the changes that we can apply? So is it okay to deprecate (and later 
remove) the current constructor that takes two booleans and add one that 
takes the Occur object? 

> BooleanQuery is really a misnomer, since it is not the traditional tree
> of boolean operators.

Then maybe we should rename it for Lucene 2.0? Do you have a better name?

Regards
 Daniel

-- 
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: API cleanup: BooleanQuery.add()

Posted by Doug Cutting <cu...@apache.org>.
I still prefer Occur.MUST, Occur.SHOULD and Occur.MUST_NOT. 
BooleanQuery is really a misnomer, since it is not the traditional tree 
of boolean operators.  Rather it is designed to directly support the 
plus and minus operators popularized by web search engines.  (It also, 
  felicitously, naturallly permits some bitmask-based optimizations that 
would be considerably more complicated using a traditional boolean 
operator tree.)

These operators are often and accurately described as affecting whether 
terms must occur, should occur or must not occur.  AND, OR and NOT are 
confusing, since their boolean semantics are not the same as their 
English semantics.  Also, Lucene's prohibited is really AND_NOT, not 
NOT.  Such things are frequently a source of confusion from folks trying 
to translate a traditional boolean query tree to a Lucene BooleanQuery. 
  I'd rather make it clearer that this is not a traditional boolean 
query, but rather a different way to achieve the same ends.

Doug

Daniel Naber wrote:
> BooleanQuery.add() currently takes two boolean values. That's difficult to 
> use as these two parameters can easily be confused. Also, there's a 
> runtime error if one uses true for both parameters. Thus this method's API 
> should be redesigned. It has been discussed here already:
> 
> http://www.mail-archive.com/lucene-user%40jakarta.apache.org/msg08479.html
> 
> I suggest the attached patch. Instead of Doug's suggestion of "Occur" my 
> patch uses "Operator" and instead of MUST/MUST_NOT/SHOULD it uses 
> AND/NOT/OR for the enumeration. I think SHOULD would be difficult to 
> understand, but I'm not happy with OR either. The problem is that it's 
> okay to have a BooleanQuery with only one Term, and OR would mean the same 
> as AND in that case.
> 
> Please post your suggestions for better names.
> 
> Regards
>  Daniel
> 
> 
> 
> ------------------------------------------------------------------------
> 
> Index: BooleanQuery.java
> ===================================================================
> RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/search/BooleanQuery.java,v
> retrieving revision 1.24
> diff -u -r1.24 BooleanQuery.java
> --- BooleanQuery.java	14 May 2004 12:56:29 -0000	1.24
> +++ BooleanQuery.java	17 Aug 2004 22:20:47 -0000
> @@ -25,6 +25,33 @@
>    */
>  public class BooleanQuery extends Query {
>  
> +  public static final class Operator {
> +    
> +    private String name;
> +    
> +    private Operator() {
> +      // typesafe enum pattern, no public constructor
> +    }
> +    
> +    private Operator(String name) {
> +      // typesafe enum pattern, no public constructor
> +      this.name = name;
> +    }
> +    
> +    public String toString() {
> +      return name;
> +    }
> +
> +    /** Use this operator for terms that <i>must</i> appear in the matching document. */
> +    public static final Operator AND = new Operator("AND");
> +    /** Use this operator for terms of which at least one <i>must</i> appear in the 
> +     * matching document. */
> +    public static final Operator OR = new Operator("OR");
> +    /** Use this operator for terms that <i>must not</i> appear in the matching document. */
> +    public static final Operator NOT = new Operator("NOT");
> +    
> +  }
> +  
>    /**
>     * Default value is 1024.  Use <code>org.apache.lucene.maxClauseCount</code>
>     * system property to override.
> @@ -65,14 +92,31 @@
>     * It is an error to specify a clause as both <code>required</code> and
>     * <code>prohibited</code>.
>     *
> -   * @see #getMaxClauseCount()
> +   * @deprecated use {@link #add(Query, Operator)} instead
>     */
>    public void add(Query query, boolean required, boolean prohibited) {
>      add(new BooleanClause(query, required, prohibited));
>    }
>  
>    /** Adds a clause to a boolean query.
> -    * @see #getMaxClauseCount()
> +   *
> +   * @throws TooManyClauses if the new number of clauses exceeds the maximum clause number
> +   * @see #getMaxClauseCount()
> +   */
> +  public void add(Query query, Operator operator) {
> +    if (operator == Operator.AND)
> +      add(new BooleanClause(query, true, false));
> +    else if (operator == Operator.OR)
> +      add(new BooleanClause(query, false, false));
> +    else if (operator == Operator.NOT)
> +      add(new BooleanClause(query, false, true));
> +    else
> +      throw new IllegalArgumentException("Unknown operator " + operator);
> +  }
> +
> +  /** Adds a clause to a boolean query.
> +   * @throws TooManyClauses if the new number of clauses exceeds the maximum clause number
> +   * @see #getMaxClauseCount()
>     */
>    public void add(BooleanClause clause) {
>      if (clauses.size() >= maxClauseCount)
> 
> 
> 
> ------------------------------------------------------------------------
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: API cleanup: BooleanQuery.add()

Posted by Incze Lajos <in...@axelero.hu>.
On Fri, Aug 20, 2004 at 01:32:57PM -0400, Erik Hatcher wrote:
> Looks ok to me.  I don't really have a preference between OR or SHOULD.
> 
> 	Erik
> 
> On Aug 19, 2004, at 6:48 PM, Daniel Naber wrote:
> 
> >
> >BooleanQuery.add() currently takes two boolean values. That's  
> >difficult to

What about booleanQuery.add(clause) and booleanQuery.add(clause, op)
(defaulting the 1st one to "should/or" and with must/n't op)?

incze

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: API cleanup: BooleanQuery.add()

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Looks ok to me.  I don't really have a preference between OR or SHOULD.

	Erik

On Aug 19, 2004, at 6:48 PM, Daniel Naber wrote:

>
> BooleanQuery.add() currently takes two boolean values. That's  
> difficult to
> use as these two parameters can easily be confused. Also, there's a
> runtime error if one uses true for both parameters. Thus this method's  
> API
> should be redesigned. It has been discussed here already:
>
> http://www.mail-archive.com/lucene-user%40jakarta.apache.org/ 
> msg08479.html
>
> I suggest the attached patch. Instead of Doug's suggestion of "Occur"  
> my
> patch uses "Operator" and instead of MUST/MUST_NOT/SHOULD it uses
> AND/NOT/OR for the enumeration. I think SHOULD would be difficult to
> understand, but I'm not happy with OR either. The problem is that it's
> okay to have a BooleanQuery with only one Term, and OR would mean the  
> same
> as AND in that case.
>
> Please post your suggestions for better names.
>
> Regards
>  Daniel
>
> -- 
> http://www.danielnaber.de
> <boolean.diff>--------------------------------------------------------- 
> ------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org