Posted to dev@lucene.apache.org by Peter Keegan <pe...@gmail.com> on 2007/02/01 21:57:33 UTC

bad queryparser bug

I have discovered a serious bug in QueryParser. The following query:
contents:sales && contents:marketing || contents:industrial &&
contents:sales

is parsed as:
+contents:sales +contents:marketing +contents:industrial +contents:sales

The same parsed query occurs even with parentheses:
(contents:sales && contents:marketing) || (contents:industrial &&
contents:sales)

Is there any way around this bug?

Thanks,
Peter

Re: bad queryparser bug

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 1, 2007, at 5:03 PM, Peter Keegan wrote:
> OK, I see that I'm not the first to discover this behavior of  
> QueryParser.
> Can anyone vouch for the integrity of the PrecedenceQueryParser here:
>
> http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/queryParser/precedence/

PrecedenceQueryParser was my tinkering attempt to make the parser
handle precedence more logically.  I don't recall the exact issues that
occur, though a JIRA issue was just filed with one:

    <https://issues.apache.org/jira/browse/LUCENE-792>
    "NOT foo AND baz" is parsed as "-(+foo +baz)" instead of "-foo +baz".
    (I'm setting parser.setDefaultOperator(PrecedenceQueryParser.AND_OPERATOR)
    but the issue applies otherwise too.)

I believe the test case points out some potential issues.  In other
words, PrecedenceQueryParser is a work in progress that I am no longer
working on myself.  Improvements to it are welcome.  Query parsing is
tricky business!

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: bad queryparser bug

Posted by Mark Miller <ma...@gmail.com>.
There is a ton of discussion on this if you search the lucene user list
(QueryParser and precedence and the 'binary' operators). I have seen
many mentions of the precedence parser still having open issues but no
mention of what those issues are.

Peter Keegan wrote:
> OK, I see that I'm not the first to discover this behavior of 
> QueryParser.
> Can anyone vouch for the integrity of the PrecedenceQueryParser here:
>
> http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/queryParser/precedence/ 
>
>
> Thanks,
> Peter
>
> On 2/1/07, Peter Keegan <pe...@gmail.com> wrote:
>>
>> Correction:
>>
>> The query parser produces the correct query with the parenthesis.
>> But, I'm still looking for a fix for this. I could use some advice on
>> where to look in QueryParser to fix this.
>>
>> Thanks,
>> Peter
>>
>> On 2/1/07, Peter Keegan <pe...@gmail.com> wrote:
>> >
>> > I have discovered a serious bug in QueryParser. The following query:
>> > contents:sales && contents:marketing || contents:industrial &&
>> > contents:sales
>> >
>> > is parsed as:
>> > +contents:sales +contents:marketing +contents:industrial +contents:sales
>> >
>> >
>> > The same parsed query occurs even with parenthesis:
>> > (contents:sales && contents:marketing) || (contents:industrial &&
>> > contents:sales)
>> >
>> > Is there any way around this bug?
>> >
>> > Thanks,
>> > Peter
>> >
>> >
>>
>


Re: bad queryparser bug

Posted by Peter Keegan <pe...@gmail.com>.
OK, I see that I'm not the first to discover this behavior of QueryParser.
Can anyone vouch for the integrity of the PrecedenceQueryParser here:

http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/queryParser/precedence/

Thanks,
Peter

On 2/1/07, Peter Keegan <pe...@gmail.com> wrote:
>
> Correction:
>
> The query parser produces the correct query with the parenthesis.
> But, I'm still looking for a fix for this. I could use some advice on
> where to look in QueryParser to fix this.
>
> Thanks,
> Peter
>
> On 2/1/07, Peter Keegan <pe...@gmail.com> wrote:
> >
> > I have discovered a serious bug in QueryParser. The following query:
> > contents:sales && contents:marketing || contents:industrial &&
> > contents:sales
> >
> > is parsed as:
> > +contents:sales +contents:marketing +contents:industrial +contents:sales
> >
> >
> > The same parsed query occurs even with parenthesis:
> > (contents:sales && contents:marketing) || (contents:industrial &&
> > contents:sales)
> >
> > Is there any way around this bug?
> >
> > Thanks,
> > Peter
> >
> >
>

Re: bad queryparser bug

Posted by Peter Keegan <pe...@gmail.com>.
> (If i could go back in time and stop the AND/OR/NOT/&&/|| "aliases" from
> being added to the QueryParser -- i would)

Yes, this is the cause of the confusion. Our users are accustomed to the
boolean logic syntax from a legacy search engine (also common to many other
engines). We'll have to convert their queries into native QueryParser syntax
where possible.

Sorry for the cross post.

Thanks,
Peter
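
A minimal sketch of such a conversion, assuming flat input with no
parentheses, quoted phrases, or NOT, and assuming the parser's default
operator is OR. The class and method names (NativeSyntaxSketch,
toNativeSyntax) are hypothetical; nothing here is part of Lucene. It gives
&& higher precedence than || by turning each AND group into a parenthesized
group of required (+) clauses:

```java
import java.util.ArrayList;
import java.util.List;

public class NativeSyntaxSketch {

    // Hypothetical helper: rewrites "a && b || c && d" as "(+a +b) (+c +d)".
    // Assumes no parentheses, quoted phrases, or NOT in the input.
    static String toNativeSyntax(String expr) {
        List<String> groups = new ArrayList<>();
        // Split on || first, so that && binds more tightly.
        for (String orPart : expr.trim().split("\\s*\\|\\|\\s*")) {
            String[] terms = orPart.trim().split("\\s*&&\\s*");
            if (terms.length == 1) {
                // A lone term stays optional.
                groups.add(terms[0]);
            } else {
                // Every term of an AND group becomes a required clause.
                StringBuilder sb = new StringBuilder("(");
                for (int i = 0; i < terms.length; i++) {
                    if (i > 0) {
                        sb.append(' ');
                    }
                    sb.append('+').append(terms[i]);
                }
                groups.add(sb.append(')').toString());
            }
        }
        // Groups are joined by whitespace, which means OR only when the
        // parser's default operator is OR (an assumption of this sketch).
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        System.out.println(toNativeSyntax(
            "contents:sales && contents:marketing || contents:industrial && contents:sales"));
    }
}
```

For the query at the top of this thread the sketch yields two parenthesized
groups of + clauses, which is the grouping that was reported to parse
correctly when written with explicit parentheses.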

On 2/2/07, Chris Hostetter <ho...@fucit.org> wrote:
>
>
> : The query parser produces the correct query with the parenthesis.
> : But, I'm still looking for a fix for this. I could use some advice on
> where
> : to look in QueryParser to fix this.
>
> the best advice i can give you: don't use the binary operators.
>
>   * Lucene is not a boolean logic system
>   * BooleanQuery does not implement boolean logic
>   * QueryParser is not a boolean language parser
>
> (If i could go back in time and stop the AND/OR/NOT/&&/|| "aliases" from
> being added to the QueryParser -- i would)
>
>
>
> -Hoss
>

Re: bad queryparser bug

Posted by Chris Hostetter <ho...@fucit.org>.
: The query parser produces the correct query with the parenthesis.
: But, I'm still looking for a fix for this. I could use some advice on where
: to look in QueryParser to fix this.

the best advice i can give you: don't use the binary operators.

  * Lucene is not a boolean logic system
  * BooleanQuery does not implement boolean logic
  * QueryParser is not a boolean language parser

(If i could go back in time and stop the AND/OR/NOT/&&/|| "aliases" from
being added to the QueryParser -- i would)



-Hoss
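
To illustrate the point about BooleanQuery, here is a simplified model of
how a flat list of required (+), prohibited (-), and optional clauses is
matched. It is a sketch of the matching rule as I understand it, not Lucene
code: OccurModel and matches are hypothetical names, terms are treated as
opaque strings, and scoring is ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class OccurModel {

    // Hypothetical model: evaluates whitespace-separated clauses of the form
    // "+term" (required), "-term" (prohibited), or "term" (optional)
    // against the set of terms a document contains.
    static boolean matches(String query, Set<String> docTerms) {
        boolean sawRequired = false;
        boolean optionalHit = false;
        for (String clause : query.trim().split("\\s+")) {
            if (clause.startsWith("+")) {
                sawRequired = true;
                if (!docTerms.contains(clause.substring(1))) {
                    return false; // a required term is missing
                }
            } else if (clause.startsWith("-")) {
                if (docTerms.contains(clause.substring(1))) {
                    return false; // a prohibited term is present
                }
            } else if (docTerms.contains(clause)) {
                optionalHit = true;
            }
        }
        // With a required clause present, optional clauses only influence
        // ranking. Without one, at least one optional clause must match.
        return sawRequired || optionalHit;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("sales", "marketing"));
        // All four clauses required: the document lacks "industrial".
        System.out.println(matches("+sales +marketing +industrial +sales", doc));
        // Optional clauses: any one match is enough.
        System.out.println(matches("sales industrial", doc));
    }
}
```

There is no AND or OR anywhere in this model, only per-clause flags, which
is why a chain of binary operators does not map onto it cleanly.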



Re: bad queryparser bug

Posted by Peter Keegan <pe...@gmail.com>.
Correction:

The query parser produces the correct query with the parentheses.
But, I'm still looking for a fix for this. I could use some advice on where
to look in QueryParser to fix this.

Thanks,
Peter
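
A rough model of why the flat query comes out with every clause required,
assuming the parser handles operators clause by clause with no precedence:
an && makes both the clause before it and the clause after it required,
while || leaves the clause optional. This is a simplified sketch, not the
QueryParser source; FlatClauseModel and parseFlat are hypothetical names,
and the input must be whitespace-separated with no parentheses.

```java
import java.util.ArrayList;
import java.util.List;

public class FlatClauseModel {

    // Hypothetical model of precedence-free clause handling.
    // Input tokens are terms, "&&", or "||", separated by whitespace.
    static String parseFlat(String query) {
        List<String> terms = new ArrayList<>();
        List<Boolean> required = new ArrayList<>();
        String conj = null; // operator seen just before the current term
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("&&") || token.equals("||")) {
                conj = token;
                continue;
            }
            if ("&&".equals(conj)) {
                // AND also promotes the previous clause to required.
                if (!required.isEmpty()) {
                    required.set(required.size() - 1, true);
                }
                terms.add(token);
                required.add(true);
            } else {
                // First term, or a term introduced by OR: optional for now.
                terms.add(token);
                required.add(false);
            }
            conj = null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(required.get(i) ? "+" : "").append(terms.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(parseFlat(
            "contents:sales && contents:marketing || contents:industrial && contents:sales"));
    }
}
```

In this model the || between the two AND pairs has no lasting effect: the
clause after it starts out optional and is then promoted by the following
&&, which reproduces the all-required output reported above.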

On 2/1/07, Peter Keegan <pe...@gmail.com> wrote:
>
> I have discovered a serious bug in QueryParser. The following query:
> contents:sales && contents:marketing || contents:industrial &&
> contents:sales
>
> is parsed as:
> +contents:sales +contents:marketing +contents:industrial +contents:sales
>
> The same parsed query occurs even with parenthesis:
> (contents:sales && contents:marketing) || (contents:industrial &&
> contents:sales)
>
> Is there any way around this bug?
>
> Thanks,
> Peter
>
>

Re: bad queryparser bug

Posted by Chris Hostetter <ho...@fucit.org>.
please do not cross post questions about using the Lucene API to both the
user and dev mailing lists -- the user list is the correct place to ask
questions about behavior you are seeing that you think may be a bug.




-Hoss



Re: trouble with permissions?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Miles Efron wrote:
> I really don't know why os x could have induced those kinds of 
> filesystem issues.  i assumed that since i had switched over to the 
> intel architecture that perhaps something was going on with the 
> JVM...everything involved in the process was mac; local filesystem, etc.
> 
> but i'm fairly sure that the trunk code has fixed the problem.  i ran 
> two 'offending' bits of code and checked their results.  not only did 
> they finish (quite a feat today), but they did so correctly.

OK I will keep my fingers crossed that there isn't another issue
lurking :)

Mike


Re: trouble with permissions?

Posted by Miles Efron <me...@ibiblio.org>.
I really don't know why os x could have induced those kinds of  
filesystem issues.  i assumed that since i had switched over to the  
intel architecture that perhaps something was going on with the  
JVM...everything involved in the process was mac; local filesystem, etc.

but i'm fairly sure that the trunk code has fixed the problem.  i ran  
two 'offending' bits of code and checked their results.  not only did  
they finish (quite a feat today), but they did so correctly.

-Miles

On Feb 1, 2007, at 4:19 PM, Michael McCandless wrote:

> Miles Efron wrote:
>
>> You rule.  Swapping out the nightly build seems to have fixed the  
>> problem... tried it on two problematic cases and both worked.
>
> Phew!
>
>> For the record, I'm running mac os 10.4.8.
>
> Uh-oh, I can't explain why you would hit these errors on OS X 10.4.8;
> we have only seen these on Windows.
>
> Are you sure switching to trunk has fixed it?  Lockless commits make
> Lucene "write once" so this works around a number of file system
> "quirks".  Still it'd be good to get to your root cause.
>
> Is the index stored on a remote (Windows CIFS) mount?  Or is it stored
> on a local (Mac OS HFS+) drive?
>
>> Do you know if the lockless commits will be included in the next  
>> stable release?
>
> Yes, this will be included in 2.1 -- I think 2.1 will be released soon
> (there have been discussions on the dev list to get the release process
> started soon).
>
> Mike


Re: trouble with permissions?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Miles Efron wrote:

> You rule.  Swapping out the nightly build seems to have fixed the 
> problem... tried it on two problematic cases and both worked.

Phew!

> For the record, I'm running mac os 10.4.8.

Uh-oh, I can't explain why you would hit these errors on OS X 10.4.8;
we have only seen these on Windows.

Are you sure switching to trunk has fixed it?  Lockless commits make
Lucene "write once" so this works around a number of file system
"quirks".  Still it'd be good to get to your root cause.

Is the index stored on a remote (Windows CIFS) mount?  Or is it stored
on a local (Mac OS HFS+) drive?

> Do you know if the lockless commits will be included in the next stable 
> release?

Yes, this will be included in 2.1 -- I think 2.1 will be released soon
(there have been discussions on the dev list to get the release process
started soon).

Mike
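
A minimal sketch of the "write once" idea, under stated assumptions: each
commit goes to a new file named by an increasing generation number, so no
existing file is ever overwritten or renamed. The segments_N naming with a
base-36 generation is how I recall the lockless-commit layout and should be
treated as an assumption; WriteOnceCommitSketch and its methods are
hypothetical and not Lucene API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteOnceCommitSketch {

    // Hypothetical naming scheme: "segments_" plus the generation in base 36.
    static String fileNameForGeneration(long generation) {
        return "segments_" + Long.toString(generation, Character.MAX_RADIX);
    }

    // Writes a commit file for the given generation.
    // CREATE_NEW fails if the file already exists, so an earlier commit
    // is never overwritten and no rename step is needed.
    static Path commit(Path dir, long generation, String payload) throws IOException {
        Path target = dir.resolve(fileNameForGeneration(generation));
        return Files.write(target,
                payload.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("write-once-sketch");
        System.out.println(commit(dir, 1, "first commit").getFileName());
        System.out.println(commit(dir, 2, "second commit").getFileName());
    }
}
```

The contrast is with a scheme that writes deleteable.new and then renames
it over an existing file, which is the step that fails with "Cannot
overwrite" in the report quoted in this thread.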


Re: trouble with permissions?

Posted by Miles Efron <me...@ibiblio.org>.
Mike,

You rule.  Swapping out the nightly build seems to have fixed the  
problem... tried it on two problematic cases and both worked.

For the record, I'm running mac os 10.4.8.

Do you know if the lockless commits will be included in the next  
stable release?

Thanks so much!
-Miles

On Feb 1, 2007, at 3:33 PM, Michael McCandless wrote:

> Miles Efron wrote:
>> i seem to be having a problem analogous to this one (no answer
>> that i see):
>>
>>     http://www.gossamer-threads.com/lists/lucene/java-user/32268?search_string=cannot%20overwrite;#32268
>>
>> trouble is, i just put lucene on my new macbook pro and am having
>> the problem that if i build a large index, i get an I/O error due
>> to something like
>>
>>     java.io.IOException: Cannot overwrite: /data/reuters/indexes/reuters/deleteable.new
>>
>> same code worked fine on my previous machine (still running on a
>> G4 powerbook and a linux machine).  sometimes it has trouble
>> writing the segments file instead...
>>
>> has anyone seen and solved this problem?  thoughts on what might
>> be behind it?
>
> Are you running Windows on your macbook pro?
>
> There are known issues like this, but only on Windows, eg:
>
>   http://issues.apache.org/jira/browse/LUCENE-665
>
> We believe such cases are now fixed by lockless commits, on the trunk
> of Lucene (which is not yet released).  If you could try the trunk
> (but beware that API, file formats, can change) and see if this still
> happens that'd be great!
>
> Mike


Re: trouble with permissions?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Miles Efron wrote:
> i seem to be having a problem analogous to this one (no answer that i see):
> 
>     http://www.gossamer-threads.com/lists/lucene/java-user/32268?search_string=cannot%20overwrite;#32268 
> 
> 
> trouble is, i just put lucene on my new macbook pro and am having the 
> problem that if i build a large index, i get an I/O error due to 
> something like
> 
>     java.io.IOException: Cannot overwrite: /data/reuters/indexes/reuters/deleteable.new
> 
> same code worked fine on my previous machine (still running on a G4 
> powerbook and a linux machine).  sometimes it has trouble writing the 
> segments file instead...
> 
> has anyone seen and solved this problem?  thoughts on what might be 
> behind it?

Are you running Windows on your macbook pro?

There are known issues like this, but only on Windows, eg:

   http://issues.apache.org/jira/browse/LUCENE-665

We believe such cases are now fixed by lockless commits, on the trunk
of Lucene (which is not yet released).  If you could try the trunk
(but beware that API, file formats, can change) and see if this still
happens that'd be great!

Mike


trouble with permissions?

Posted by Miles Efron <me...@ibiblio.org>.
i seem to be having a problem analogous to this one (no answer that i  
see):

	http://www.gossamer-threads.com/lists/lucene/java-user/32268?search_string=cannot%20overwrite;#32268

trouble is, i just put lucene on my new macbook pro and am having the  
problem that if i build a large index, i get an I/O error due to  
something like

	java.io.IOException: Cannot overwrite: /data/reuters/indexes/reuters/deleteable.new

same code worked fine on my previous machine (still running on a G4  
powerbook and a linux machine).  sometimes it has trouble writing the  
segments file instead...

has anyone seen and solved this problem?  thoughts on what might be  
behind it?

thanks,
-Miles

On Feb 1, 2007, at 2:57 PM, Peter Keegan wrote:

> I have discovered a serious bug in QueryParser. The following query:
> contents:sales && contents:marketing || contents:industrial &&
> contents:sales
>
> is parsed as:
> +contents:sales +contents:marketing +contents:industrial +contents:sales
>
> The same parsed query occurs even with parenthesis:
> (contents:sales && contents:marketing) || (contents:industrial &&
> contents:sales)
>
> Is there any way around this bug?
>
> Thanks,
> Peter