You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Daniel Naber <da...@t-online.de> on 2004/08/20 00:48:46 UTC
API cleanup: BooleanQuery.add()
BooleanQuery.add() currently takes two boolean values. That's difficult to
use as these two parameters can easily be confused. Also, there's a
runtime error if one uses true for both parameters. Thus this method's API
should be redesigned. It has been discussed here already:
http://www.mail-archive.com/lucene-user%40jakarta.apache.org/msg08479.html
I suggest the attached patch. Instead of Doug's suggestion of "Occur" my
patch uses "Operator" and instead of MUST/MUST_NOT/SHOULD it uses
AND/NOT/OR for the enumeration. I think SHOULD would be difficult to
understand, but I'm not happy with OR either. The problem is that it's
okay to have a BooleanQuery with only one Term, and OR would mean the same
as AND in that case.
Please post your suggestions for better names.
Regards
Daniel
--
http://www.danielnaber.de
Re: API cleanup: BooleanQuery.add()
Posted by Daniel Naber <da...@t-online.de>.
On Monday 23 August 2004 19:38, Doug Cutting wrote:
> > Does this mean we're limited
> > in the changes that we can apply? So is it okay to deprecate (and
> > later remove) the current constructor that takes two booleans and add
> > one that takes the Occur object?
>
> Yes, I don't see a problem with that.
Okay, here's a new version of the patch. It adds an enumeration and
deprecates the add(query, boolean, boolean) method. It also deprecates the
public fields of BooleanClause and adds appropriate get/set methods
instead. I'll commit this in the next days unless someone sees a problem
with it.
Regards
Daniel
--
http://www.danielnaber.de
Re: API cleanup: BooleanQuery.add()
Posted by Paul Elschot <pa...@xs4all.nl>.
On Monday 23 August 2004 19:38, Doug Cutting wrote:
> Daniel Naber wrote:
...
>
> >>BooleanQuery is really a misnomer, since it is not the traditional tree
> >>of boolean operators.
> >
> > Then maybe we should rename it for Lucene 2.0? Do you have a better name?
>
> Hmm... we could call it something like CompoundQuery, CombinedQuery or
> ComposedQuery, since it is the only primitive query type that combines
> other types of queries.
Those C...'s are perhaps too general, one could also see a SpanQuery
as C...'ed.
I was thinking about OccurrenceQuery or perhaps CombinedOccurrenceQuery.
But I can live with BooleanQuery.
> We could also add classes for true boolean queries, AndQuery, OrQuery
> and AndNotQuery, all of which rewrite to CompoundQuery. AndQuery and
> OrQuery could be n-ary, rather than binary. Would there be much point
> in this?
That would be nice for query languages having such facilities.
It's difficult to combine the current query language with those operators,
at least for me it was when implementing the Surround language posted
earlier. One point of that language is this line of java code:
public class DistanceQuery extends ComposedQuery implements DistanceSubQuery
which is why I'm not in favour of using ComposedQuery for boolean purposes
only. Searching a DistanceQuery is implemented by a Lucene SpanQuery btw.
OTOH, in the Surround language ComposedQuery is not a subclass of
Lucene's Query, it only represents that the input to the query parser
consisted of some operator with subqueries/clauses.
Perhaps one should distinguish parsed queries (AndQuery, OrQuery etc.)
from searchable queries (BooleanQuery, SpanQuery) by
putting them in separate packages.
Regards,
Paul Elschot
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: API cleanup: BooleanQuery.add()
Posted by Doug Cutting <cu...@apache.org>.
Daniel Naber wrote:
> Okay. I noticed that my suggested patch only changes BooleanQuery, but
> BooleanClause is public, too. So it should be changed as well. However, it
> implements Serializable and I don't know why.
BooleanClause implements Serializable so that a BooleanQuery, which
encapsulates BooleanClauses, can be passed to a RemoteSearchable. In
general, the internal structures of a query, public or not, must be
serializable.
> Does this mean we're limited
> in the changes that we can apply? So is it okay to deprecate (and later
> remove) the current constructor that takes two booleans and add one that
> takes the Occur object?
Yes, I don't see a problem with that.
>>BooleanQuery is really a misnomer, since it is not the traditional tree
>>of boolean operators.
>
> Then maybe we should rename it for Lucene 2.0? Do you have a better name?
Hmm... we could call it something like CompoundQuery, CombinedQuery or
ComposedQuery, since it is the only primitive query type that combines
other types of queries.
We could also add classes for true boolean queries, AndQuery, OrQuery
and AndNotQuery, all of which rewrite to CompoundQuery. AndQuery and
OrQuery could be n-ary, rather than binary. Would there be much point
in this?
Doug
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: API cleanup: BooleanQuery.add()
Posted by Daniel Naber <da...@t-online.de>.
On Friday 20 August 2004 23:48, Doug Cutting wrote:
> I still prefer Occur.MUST, Occur.SHOULD and Occur.MUST_NOT.
Okay. I noticed that my suggested patch only changes BooleanQuery, but
BooleanClause is public, too. So it should be changed as well. However, it
implements Serializable and I don't know why. Does this mean we're limited
in the changes that we can apply? So is it okay to deprecate (and later
remove) the current constructor that takes two booleans and add one that
takes the Occur object?
> BooleanQuery is really a misnomer, since it is not the traditional tree
> of boolean operators.
Then maybe we should rename it for Lucene 2.0? Do you have a better name?
Regards
Daniel
--
http://www.danielnaber.de
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: API cleanup: BooleanQuery.add()
Posted by Doug Cutting <cu...@apache.org>.
I still prefer Occur.MUST, Occur.SHOULD and Occur.MUST_NOT.
BooleanQuery is really a misnomer, since it is not the traditional tree
of boolean operators. Rather it is designed to directly support the
plus and minus operators popularized by web search engines. (It also,
felicitously, naturallly permits some bitmask-based optimizations that
would be considerably more complicated using a traditional boolean
operator tree.)
These operators are often and accurately described as affecting whether
terms must occur, should occur or must not occur. AND, OR and NOT are
confusing, since their boolean semantics are not the same as their
English semantics. Also, Lucene's prohibited is really AND_NOT, not
NOT. Such things are frequently a source of confusion from folks trying
to translate a traditional boolean query tree to a Lucene BooleanQuery.
I'd rather make it clearer that this is not a traditional boolean
query, but rather a different way to achieve the same ends.
Doug
Daniel Naber wrote:
> BooleanQuery.add() currently takes two boolean values. That's difficult to
> use as these two parameters can easily be confused. Also, there's a
> runtime error if one uses true for both parameters. Thus this method's API
> should be redesigned. It has been discussed here already:
>
> http://www.mail-archive.com/lucene-user%40jakarta.apache.org/msg08479.html
>
> I suggest the attached patch. Instead of Doug's suggestion of "Occur" my
> patch uses "Operator" and instead of MUST/MUST_NOT/SHOULD it uses
> AND/NOT/OR for the enumeration. I think SHOULD would be difficult to
> understand, but I'm not happy with OR either. The problem is that it's
> okay to have a BooleanQuery with only one Term, and OR would mean the same
> as AND in that case.
>
> Please post your suggestions for better names.
>
> Regards
> Daniel
>
>
>
> ------------------------------------------------------------------------
>
> Index: BooleanQuery.java
> ===================================================================
> RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/search/BooleanQuery.java,v
> retrieving revision 1.24
> diff -u -r1.24 BooleanQuery.java
> --- BooleanQuery.java 14 May 2004 12:56:29 -0000 1.24
> +++ BooleanQuery.java 17 Aug 2004 22:20:47 -0000
> @@ -25,6 +25,33 @@
> */
> public class BooleanQuery extends Query {
>
> + public static final class Operator {
> +
> + private String name;
> +
> + private Operator() {
> + // typesafe enum pattern, no public constructor
> + }
> +
> + private Operator(String name) {
> + // typesafe enum pattern, no public constructor
> + this.name = name;
> + }
> +
> + public String toString() {
> + return name;
> + }
> +
> + /** Use this operator for terms that <i>must</i> appear in the matching document. */
> + public static final Operator AND = new Operator("AND");
> + /** Use this operator for terms of which at least one <i>must</i> appear in the
> + * matching document. */
> + public static final Operator OR = new Operator("OR");
> + /** Use this operator for terms that <i>must not</i> appear in the matching document. */
> + public static final Operator NOT = new Operator("NOT");
> +
> + }
> +
> /**
> * Default value is 1024. Use <code>org.apache.lucene.maxClauseCount</code>
> * system property to override.
> @@ -65,14 +92,31 @@
> * It is an error to specify a clause as both <code>required</code> and
> * <code>prohibited</code>.
> *
> - * @see #getMaxClauseCount()
> + * @deprecated use {@link #add(Query, Operator)} instead
> */
> public void add(Query query, boolean required, boolean prohibited) {
> add(new BooleanClause(query, required, prohibited));
> }
>
> /** Adds a clause to a boolean query.
> - * @see #getMaxClauseCount()
> + *
> + * @throws TooManyClauses if the new number of clauses exceeds the maximum clause number
> + * @see #getMaxClauseCount()
> + */
> + public void add(Query query, Operator operator) {
> + if (operator == Operator.AND)
> + add(new BooleanClause(query, true, false));
> + else if (operator == Operator.OR)
> + add(new BooleanClause(query, false, false));
> + else if (operator == Operator.NOT)
> + add(new BooleanClause(query, false, true));
> + else
> + throw new IllegalArgumentException("Unknown operator " + operator);
> + }
> +
> + /** Adds a clause to a boolean query.
> + * @throws TooManyClauses if the new number of clauses exceeds the maximum clause number
> + * @see #getMaxClauseCount()
> */
> public void add(BooleanClause clause) {
> if (clauses.size() >= maxClauseCount)
>
>
>
> ------------------------------------------------------------------------
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: API cleanup: BooleanQuery.add()
Posted by Incze Lajos <in...@axelero.hu>.
On Fri, Aug 20, 2004 at 01:32:57PM -0400, Erik Hatcher wrote:
> Looks ok to me. I don't really have a preference between OR or SHOULD.
>
> Erik
>
> On Aug 19, 2004, at 6:48 PM, Daniel Naber wrote:
>
> >
> >BooleanQuery.add() currently takes two boolean values. That's
> >difficult to
What about booleanQuery.add(clause) and booleanQuery.add(clause, op)
(defaulting the 1st one to "should/or" and with must/n't op)?
incze
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: API cleanup: BooleanQuery.add()
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Looks ok to me. I don't really have a preference between OR or SHOULD.
Erik
On Aug 19, 2004, at 6:48 PM, Daniel Naber wrote:
>
> BooleanQuery.add() currently takes two boolean values. That's
> difficult to
> use as these two parameters can easily be confused. Also, there's a
> runtime error if one uses true for both parameters. Thus this method's
> API
> should be redesigned. It has been discussed here already:
>
> http://www.mail-archive.com/lucene-user%40jakarta.apache.org/
> msg08479.html
>
> I suggest the attached patch. Instead of Doug's suggestion of "Occur"
> my
> patch uses "Operator" and instead of MUST/MUST_NOT/SHOULD it uses
> AND/NOT/OR for the enumeration. I think SHOULD would be difficult to
> understand, but I'm not happy with OR either. The problem is that it's
> okay to have a BooleanQuery with only one Term, and OR would mean the
> same
> as AND in that case.
>
> Please post your suggestions for better names.
>
> Regards
> Daniel
>
> --
> http://www.danielnaber.de
> <boolean.diff>---------------------------------------------------------
> ------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org