Posted to dev@lucene.apache.org by Peter Carlson <ca...@bookandhammer.com> on 2002/05/15 15:57:50 UTC

PLEASE REVIEW: Updated Query Parser Syntax

Hi,
I have updated the query parser syntax document and put it up on the website
(without any links).

http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

Please review it and send feedback.

Current questions:

The current QueryParser supports range searches, but can you put a date into
the query parser? Are range searches only used for searching dates?

--Peter


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Can NOT be used with a single Term?

Posted by Eugene Gluzberg <dr...@apache.org>.
Oops. :(

Could you check the test into CVS? I was just about to add it to
TestQueryParser.

----- Original Message -----
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>
Sent: Thursday, May 16, 2002 11:54 AM
Subject: Re: Can NOT be used with a single Term?


> I just tried:
>  -foo
>  NOT foo
>
> Neither worked.
>
> Otis
>
> --- Peter Carlson <ca...@bookandhammer.com> wrote:
> > According to the jGuru FAQ:
> >
> >
> > Is it possible to find all documents in the index that do not contain
> > a certain term? In other words, is it possible to make a query such as
> > 'NOT <term>'?
> >
> >
> > No, it is not possible to do that with Lucene. Lucene could be
> > modified to do that, but such queries would be very slow.
> > cf. http://marc.theaimsgroup.com/?t=100455365600001&r=1&w=2
> >
> >
> > --Peter
> >
> > On 5/16/02 7:41 AM, "Eugene Gluzberg" <dr...@apache.org> wrote:
> >
> > > Wait a sec,
> > >
> > > NOT can be used with only one term.
> > >
> > > NOT "bye bye"
> > > is a legal query
> > >
> > > so is:
> > > -"bye bye"
> > >
> > >
> > > Either will find all documents which do not contain the phrase "bye bye"
> > >
> > > I will write a test.




Re: Can NOT be used with a single Term?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I just tried:
 -foo
 NOT foo

Neither worked.

Otis

--- Peter Carlson <ca...@bookandhammer.com> wrote:
> According to the jGuru FAQ:
> 
> 
> Is it possible to find all documents in the index that do not contain
> a certain term? In other words, is it possible to make a query such as
> 'NOT <term>'?
> 
> 
> No, it is not possible to do that with Lucene. Lucene could be
> modified to do that, but such queries would be very slow.
> cf. http://marc.theaimsgroup.com/?t=100455365600001&r=1&w=2
> 
> 
> --Peter
> 
> On 5/16/02 7:41 AM, "Eugene Gluzberg" <dr...@apache.org> wrote:
> 
> > Wait a sec,
> > 
> > NOT can be used with only one term.
> > 
> > NOT "bye bye"
> > is a legal query
> > 
> > so is:
> > -"bye bye"
> > 
> > 
> > Either will find all documents which do not contain the phrase "bye bye"
> > 
> > I will write a test.


Can NOT be used with a single Term?

Posted by Peter Carlson <ca...@bookandhammer.com>.
According to the jGuru FAQ:


Is it possible to find all documents in the index that do not contain a
certain term? In other words, is it possible to make a query such as 'NOT
<term>'? 


No, it is not possible to do that with Lucene. Lucene could be modified to
do that, but such queries would be very slow.
cf. http://marc.theaimsgroup.com/?t=100455365600001&r=1&w=2


--Peter
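A small sketch can make the FAQ's claim that a stand-alone 'NOT <term>' would be slow more concrete. The class and method names are hypothetical and this is not Lucene's API. The sketch assumes that postings are a sorted array of integer document ids and that every id in [0, maxDoc) is a live document. Under that reading, the answer is the complement of the term's postings. Building it means visiting every document id in the index, so the work grows with the size of the index rather than with the number of documents that contain the term.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Lucene's API): evaluate a stand-alone NOT <term>
// as the complement of the term's postings over all document ids.
// Assumes postings are sorted ascending, contain ids in [0, maxDoc),
// and that every id in that range is a live (undeleted) document.
class ComplementSketch {
    static List<Integer> notTerm(int[] postings, int maxDoc) {
        List<Integer> result = new ArrayList<>();
        int p = 0; // cursor into the sorted postings
        // The loop touches every document id, however rare the term is.
        for (int doc = 0; doc < maxDoc; doc++) {
            if (p < postings.length && postings[p] == doc) {
                p++;             // doc contains the term: exclude it
            } else {
                result.add(doc); // doc lacks the term: include it
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] postingsForTerm = {1, 4};                  // docs containing the term
        System.out.println(notTerm(postingsForTerm, 6)); // prints [0, 2, 3, 5]
    }
}
```

For a rare term, almost the whole index comes back as hits. That is one plausible reading of why the FAQ calls such queries slow, though the FAQ itself does not spell out the mechanism.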

On 5/16/02 7:41 AM, "Eugene Gluzberg" <dr...@apache.org> wrote:

> Wait a sec,
> 
> NOT can be used with only one term.
> 
> NOT "bye bye"
> is a legal query
> 
> so is:
> -"bye bye"
> 
> 
> Either will find all documents which do not contain the phrase "bye bye"
> 
> I will write a test.




Re: PLEASE REVIEW: Updated Query Parser Syntax

Posted by Eugene Gluzberg <dr...@apache.org>.
Wait a sec,

NOT can be used with only one term.

NOT "bye bye"
is a legal query

so is:
-"bye bye"


Either will find all documents which do not contain the phrase "bye bye"

I will write a test.

----- Original Message -----
From: "Peter Carlson" <ca...@bookandhammer.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>
Sent: Wednesday, May 15, 2002 4:14 PM
Subject: Re: PLEASE REVIEW: Updated Query Parser Syntax


> Good point. I do think there needs to be more clarification that NOT cannot
> be used as the only term and find any results. That is, you can say
>
>     NOT "bye bye" hello
>
> But you cannot say
>
>     NOT "bye bye"
>
> However, I think you might be wrong in describing the way NOT works with
> AND and OR.
>
> hello OR NOT "bye bye"
>
> will find all documents that have hello AND do not have "bye bye". That is,
> NOT always works by subtracting its matches from the other search results.
> It never subtracts from the complete set of documents in the index.
>
> So in practice
> hello AND NOT "bye bye" (+hello -"bye bye")
> hello OR NOT "bye bye" (hello -"bye bye")
> hello NOT "bye bye" (hello -"bye bye")
> NOT "bye bye" hello (-"bye bye" hello)
>
> are all equivalent because there is just one term being found and one phrase
> being removed. These are not equivalent in the general case.
>
> Does that make sense?
>
> --Peter
>
>
> On 5/15/02 8:26 AM, "Eugene Gluzberg" <dr...@apache.org> wrote:
>
> > Sorry, I should have been more clear.
> >
> > As far as I understand, NOT is a unary operator, and applies to the
> > term that follows it.
> > So:
> > NOT "bye bye"
> > is a valid query and will find all documents that do not have the phrase
> > "bye bye"
> >
> > If you want to find all documents that have the set difference between
> > hello and "bye bye" you will have to use the query:
> >
> > hello AND NOT "bye bye"
> >
> > Also it follows that,
> > hello NOT "bye bye" is equivalent to:
> > hello OR NOT "bye bye"
> >
> > So the query hello NOT "bye bye" will find all documents that either have
> > hello OR do not have "bye bye"
> >




Re: PLEASE REVIEW: Updated Query Parser Syntax

Posted by Peter Carlson <ca...@bookandhammer.com>.
Good point. I do think there needs to be more clarification that NOT cannot
be used as the only term and find any results. That is, you can say

    NOT "bye bye" hello

But you cannot say

    NOT "bye bye"

However, I think you might be wrong in describing the way NOT works with
AND and OR.

hello OR NOT "bye bye"

will find all documents that have hello AND do not have "bye bye". That is,
NOT always works by subtracting its matches from the other search results.
It never subtracts from the complete set of documents in the index.

So in practice
hello AND NOT "bye bye" (+hello -"bye bye")
hello OR NOT "bye bye" (hello -"bye bye")
hello NOT "bye bye" (hello -"bye bye")
NOT "bye bye" hello (-"bye bye" hello)

are all equivalent because there is just one term being found and one phrase
being removed. These are not equivalent in the general case.

Does that make sense?

--Peter
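Peter's description says that prohibited clauses only subtract from what the other clauses matched. A small set-based model can check this. It is a hypothetical sketch, not Lucene code, and it ignores scoring. The toy index maps each term or phrase to the set of document ids that contain it:

- Required (+) clauses are intersected.
- Optional clauses are unioned when there is no required clause.
- Prohibited (- or NOT) clauses are removed afterwards.

The model reproduces both points in the message: the single-term forms agree, a purely negative query finds nothing, and with two positive terms the required and optional forms diverge.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical set-based model of boolean clauses as described in the thread.
// It only decides which documents match; scoring is ignored. Not Lucene code.
class ProhibitedClauseModel {
    // Toy index: term or phrase -> ids of the documents that contain it.
    static Map<String, Set<Integer>> sampleIndex() {
        return Map.of(
            "hello",   Set.of(1, 2, 3),
            "world",   Set.of(2, 4),
            "bye bye", Set.of(3, 4));
    }

    static Set<Integer> search(Map<String, Set<Integer>> index,
                               List<String> required,
                               List<String> optional,
                               List<String> prohibited) {
        Set<Integer> result = new TreeSet<>();
        if (!required.isEmpty()) {
            // Every required (+) clause must match: intersect their postings.
            // In this model, optional clauses then do not change the match set.
            result.addAll(postings(index, required.get(0)));
            for (String t : required.subList(1, required.size())) {
                result.retainAll(postings(index, t));
            }
        } else {
            // No required clause: a document must match some optional clause.
            for (String t : optional) {
                result.addAll(postings(index, t));
            }
        }
        // Prohibited (- / NOT) clauses only remove candidates found above;
        // they never add documents, so a purely negative query matches nothing.
        for (String t : prohibited) {
            result.removeAll(postings(index, t));
        }
        return result;
    }

    private static Set<Integer> postings(Map<String, Set<Integer>> index, String t) {
        return index.getOrDefault(t, Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> idx = sampleIndex();
        List<String> none = List.of();
        List<String> bye = List.of("bye bye");
        // One positive term, one prohibited phrase: both forms agree.
        System.out.println(search(idx, List.of("hello"), none, bye)); // +hello -"bye bye" -> [1, 2]
        System.out.println(search(idx, none, List.of("hello"), bye)); // hello -"bye bye"  -> [1, 2]
        // Only a prohibited clause: nothing matches.
        System.out.println(search(idx, none, none, bye));             // -"bye bye"        -> []
        // General case: the required and optional forms differ.
        System.out.println(search(idx, List.of("hello", "world"), none, bye)); // [2]
        System.out.println(search(idx, none, List.of("hello", "world"), bye)); // [1, 2]
    }
}
```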


On 5/15/02 8:26 AM, "Eugene Gluzberg" <dr...@apache.org> wrote:

> Sorry, I should have been more clear.
> 
> As far as I understand, NOT is a unary operator, and applies to the
> term that follows it.
> So:
> NOT "bye bye"
> is a valid query and will find all documents that do not have the phrase
> "bye bye"
> 
> If you want to find all documents that have the set difference between
> hello and "bye bye" you will have to use the query:
> 
> hello AND NOT "bye bye"
> 
> Also it follows that,
> hello NOT "bye bye" is equivalent to:
> hello OR NOT "bye bye"
> 
> So the query hello NOT "bye bye" will find all documents that either have
> hello OR do not have "bye bye"
> 




Re: PLEASE REVIEW: Updated Query Parser Syntax

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I believe you could simplify things by suggesting the alternative
syntax:

        +"Microsoft Word" -"Microsoft Excel"

I think this will get you all documents that have the first phrase, but
not the second one.

Otis
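Otis's suggested form relies only on the + and - prefixes. The following hypothetical sketch splits a query written with those prefixes, bare words and double-quoted phrases into (occurrence, text) clauses. It is not Lucene's QueryParser, which also handles AND/OR/NOT, fields, grouping and escaping. The output shows the shape that +"Microsoft Word" -"Microsoft Excel" reduces to: one required phrase and one prohibited phrase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene's QueryParser: turn a query that uses only
// +/- prefixes, bare words and double-quoted phrases into a list of clauses.
class PrefixClauseParser {
    enum Occur { REQUIRED, PROHIBITED, OPTIONAL }

    static final class Clause {
        final Occur occur;
        final String text;

        Clause(Occur occur, String text) {
            this.occur = occur;
            this.text = text;
        }

        @Override
        public String toString() {
            return occur + ":" + text;
        }
    }

    static List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<>();
        int i = 0;
        int n = query.length();
        while (i < n) {
            char c = query.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            // A leading + marks the clause required, a leading - prohibited.
            Occur occur = Occur.OPTIONAL;
            if (c == '+' || c == '-') {
                occur = (c == '+') ? Occur.REQUIRED : Occur.PROHIBITED;
                i++;
            }
            String text;
            if (i < n && query.charAt(i) == '"') {
                // Phrase: everything up to the closing quote (or end of input).
                int end = query.indexOf('"', i + 1);
                if (end < 0) {
                    end = n;
                }
                text = query.substring(i + 1, end);
                i = Math.min(end + 1, n);
            } else {
                // Bare word: everything up to the next whitespace.
                int start = i;
                while (i < n && !Character.isWhitespace(query.charAt(i))) {
                    i++;
                }
                text = query.substring(start, i);
            }
            if (!text.isEmpty()) { // skip a dangling + or -
                clauses.add(new Clause(occur, text));
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(parse("+\"Microsoft Word\" -\"Microsoft Excel\""));
        // prints [REQUIRED:Microsoft Word, PROHIBITED:Microsoft Excel]
    }
}
```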


--- Eugene Gluzberg <dr...@apache.org> wrote:
> Sorry, I should have been more clear.
> 
> As far as I understand, NOT is a unary operator, and applies to the
> term that follows it.
> So:
> NOT "bye bye"
> is a valid query and will find all documents that do not have the
> phrase "bye bye"
> 
> If you want to find all documents that have the set difference
> between hello and "bye bye" you will have to use the query:
> 
> hello AND NOT "bye bye"
> 
> Also it follows that,
> hello NOT "bye bye" is equivalent to:
> hello OR NOT "bye bye"
> 
> So the query hello NOT "bye bye" will find all documents that either
> have hello OR do not have "bye bye"
> 
> In your description you said:
>   The NOT operator excludes documents that contain the term after NOT. This
> is equivalent to a difference using sets. For example to search for
> documents that contain "Microsoft Word" but not "Microsoft Excel":
> 
>        "Microsoft Word" NOT "Microsoft Excel"
> 
> The correct way of doing that would be:
> "Microsoft Word" AND NOT "Microsoft Excel"
> 
> ----- Original Message -----
> From: "Peter Carlson" <ca...@bookandhammer.com>
> To: "Lucene Developers List" <lu...@jakarta.apache.org>
> Sent: Wednesday, May 15, 2002 10:43 AM
> Subject: Re: PLEASE REVIEW: Updated Query Parser Syntax
> 
> 
> > Do you think this should be stated more directly as an option?
> >
> > It seems like the "OR NOT" is more confusing.
> >
> > Or are you making the point that this is not "AND NOT" meaning that
> > "Microsoft Word" is not required?
> >
> > --Peter
> >
> >
> > On 5/15/02 7:30 AM, "Eugene Gluzberg" <dr...@apache.org> wrote:
> >
> > >
> > > "Microsoft Word" NOT "Microsoft Excel"
> > >
> > >
> > > My understanding of the query parser is that this query would be the same as:
> > > "Microsoft Word" OR NOT "Microsoft Excel"
> > >
> > > Same with the -
> > >


Re: PLEASE REVIEW: Updated Query Parser Syntax

Posted by Eugene Gluzberg <dr...@apache.org>.
Sorry, I should have been more clear.

As far as I understand, NOT is a unary operator, and applies to the
term that follows it.
So:
NOT "bye bye"
is a valid query and will find all documents that do not have the phrase
"bye bye"

If you want to find all documents that have the set difference between
hello and "bye bye" you will have to use the query:

hello AND NOT "bye bye"

Also it follows that,
hello NOT "bye bye" is equivalent to:
hello OR NOT "bye bye"

So the query hello NOT "bye bye" will find all documents that either have
hello OR do not have "bye bye"

In your description you said:
  The NOT operator excludes documents that contain the term after NOT. This
is equivalent to a difference using sets. For example to search for
documents that contain "Microsoft Word" but not "Microsoft Excel":

       "Microsoft Word" NOT "Microsoft Excel"

The correct way of doing that would be:
"Microsoft Word" AND NOT "Microsoft Excel"

----- Original Message -----
From: "Peter Carlson" <ca...@bookandhammer.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>
Sent: Wednesday, May 15, 2002 10:43 AM
Subject: Re: PLEASE REVIEW: Updated Query Parser Syntax


> Do you think this should be stated more directly as an option?
>
> It seems like the "OR NOT" is more confusing.
>
> Or are you making the point that this is not "AND NOT" meaning that
> "Microsoft Word" is not required?
>
> --Peter
>
>
> On 5/15/02 7:30 AM, "Eugene Gluzberg" <dr...@apache.org> wrote:
>
> >
> > "Microsoft Word" NOT "Microsoft Excel"
> >
> >
> > My understanding of the query parser is that this query would be the same as:
> > "Microsoft Word" OR NOT "Microsoft Excel"
> >
> > Same with the -
> >




Re: PLEASE REVIEW: Updated Query Parser Syntax

Posted by Peter Carlson <ca...@bookandhammer.com>.
Do you think this should be stated more directly as an option?

It seems like the "OR NOT" is more confusing.

Or are you making the point that this is not "AND NOT" meaning that
"Microsoft Word" is not required?

--Peter


On 5/15/02 7:30 AM, "Eugene Gluzberg" <dr...@apache.org> wrote:

> 
> "Microsoft Word" NOT "Microsoft Excel"
> 
> 
> My understanding of the query parser is that this query would be the same as:
> "Microsoft Word" OR NOT "Microsoft Excel"
> 
> Same with the -
> 




Re: PLEASE REVIEW: Updated Query Parser Syntax

Posted by Eugene Gluzberg <dr...@apache.org>.
 "Microsoft Word" NOT "Microsoft Excel"


My understanding of the query parser is that this query would be the same as:
 "Microsoft Word" OR NOT "Microsoft Excel"

Same with the -



----- Original Message -----
From: "Peter Carlson" <ca...@bookandhammer.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>
Sent: Wednesday, May 15, 2002 9:57 AM
Subject: PLEASE REVIEW: Updated Query Parser Syntax


> Hi,
> I have updated the query parser syntax document and put it up on the website
> (without any links).
>
> http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
>
> Please review it and send feedback.
>
> Current questions:
>
> The current QueryParser supports range searches, but can you put a date into
> the query parser? Are range searches only used for searching dates?
>
> --Peter

