Posted to dev@lucene.apache.org by Peter Keegan <pe...@gmail.com> on 2007/02/01 21:57:33 UTC
bad queryparser bug
I have discovered a serious bug in QueryParser. The following query:
contents:sales && contents:marketing || contents:industrial &&
contents:sales
is parsed as:
+contents:sales +contents:marketing +contents:industrial +contents:sales
The same parsed query occurs even with parentheses:
(contents:sales && contents:marketing) || (contents:industrial &&
contents:sales)
Is there any way around this bug?
Thanks,
Peter
Re: bad queryparser bug
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 1, 2007, at 5:03 PM, Peter Keegan wrote:
> OK, I see that I'm not the first to discover this behavior of
> QueryParser.
> Can anyone vouch for the integrity of the PrecedenceQueryParser here:
>
> http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/queryParser/precedence/
PrecedenceQueryParser was my tinkering attempt to make QueryParser handle
precedence more logically. I don't recall the exact issues that occur,
though a JIRA issue was just filed with one:
<https://issues.apache.org/jira/browse/LUCENE-792>
"NOT foo AND baz" is parsed as "-(+foo +baz)" instead of "-foo +baz".
(I'm setting parser.setDefaultOperator(PrecedenceQueryParser.AND_OPERATOR),
but the issue applies otherwise too.)
I believe the test case points out some potential issues. In other
words, PrecedenceQueryParser is a work in progress that I am no longer
working on myself. Improvements to it are welcome. Query parsing is
tricky business!
Erik
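As an illustration of what handling precedence logically could mean here, the sketch below is a minimal recursive-descent rewriter, assuming NOT binds tighter than AND and AND binds tighter than OR. It is not PrecedenceQueryParser and it does not use the Lucene API; the class PrecedenceSketch and the method toPrefixSyntax are hypothetical names. It only rewrites a boolean-style string into the +/- prefix syntax, and it expects operators to be separated from terms by whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: not PrecedenceQueryParser, no Lucene API involved.
class PrecedenceSketch {
    // A parsed sub-expression, already rendered in +/- prefix syntax.
    private static final class Node {
        final String text;
        final boolean compound; // several clauses: needs parentheses when nested
        final boolean negated;  // introduced by NOT
        Node(String text, boolean compound, boolean negated) {
            this.text = text;
            this.compound = compound;
            this.negated = negated;
        }
        String wrapped() { return compound ? "(" + text + ")" : text; }
    }

    private final String[] tokens;
    private int pos = 0;

    private PrecedenceSketch(String query) {
        // Pad parentheses so that a whitespace split returns them as tokens.
        String padded = query.replace("(", " ( ").replace(")", " ) ").trim();
        tokens = padded.isEmpty() ? new String[0] : padded.split("\\s+");
    }

    static String toPrefixSyntax(String query) {
        PrecedenceSketch p = new PrecedenceSketch(query);
        Node root = p.parseOr();
        if (p.pos < p.tokens.length) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens[p.pos]);
        }
        return root.negated ? "-" + root.wrapped() : root.text;
    }

    private boolean peek(String a, String b) {
        return pos < tokens.length && (tokens[pos].equals(a) || tokens[pos].equals(b));
    }

    // Join sibling clauses; requiredPrefix is "+" under AND and "" under OR.
    private static Node join(List<Node> parts, String requiredPrefix) {
        if (parts.size() == 1) return parts.get(0);
        StringBuilder sb = new StringBuilder();
        for (Node n : parts) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.negated ? "-" : requiredPrefix).append(n.wrapped());
        }
        return new Node(sb.toString(), true, false);
    }

    // orExpr := andExpr (("OR" | "||") andExpr)*    lowest precedence
    private Node parseOr() {
        List<Node> parts = new ArrayList<>();
        parts.add(parseAnd());
        while (peek("OR", "||")) { pos++; parts.add(parseAnd()); }
        return join(parts, "");
    }

    // andExpr := unary (("AND" | "&&") unary)*      binds tighter than OR
    private Node parseAnd() {
        List<Node> parts = new ArrayList<>();
        parts.add(parseUnary());
        while (peek("AND", "&&")) { pos++; parts.add(parseUnary()); }
        return join(parts, "+");
    }

    // unary := ("NOT" | "!") unary | primary        binds tightest
    private Node parseUnary() {
        if (peek("NOT", "!")) {
            pos++;
            Node inner = parseUnary();
            return new Node(inner.text, inner.compound, !inner.negated);
        }
        return parsePrimary();
    }

    // primary := "(" orExpr ")" | TERM
    private Node parsePrimary() {
        if (pos >= tokens.length) throw new IllegalArgumentException("unexpected end of query");
        String tok = tokens[pos++];
        if (tok.equals("(")) {
            Node inner = parseOr();
            if (pos >= tokens.length || !tokens[pos++].equals(")")) {
                throw new IllegalArgumentException("missing ')'");
            }
            return inner;
        }
        if (tok.equals(")") || tok.equals("AND") || tok.equals("&&")
                || tok.equals("OR") || tok.equals("||")) {
            throw new IllegalArgumentException("expected a term but found: " + tok);
        }
        return new Node(tok, false, false);
    }

    public static void main(String[] args) {
        System.out.println(toPrefixSyntax("NOT foo AND baz"));
        System.out.println(toPrefixSyntax(
            "contents:sales && contents:marketing || contents:industrial && contents:sales"));
    }
}
```

Under these assumptions "NOT foo AND baz" comes out as "-foo +baz", and the query from the original report comes out as "(+contents:sales +contents:marketing) (+contents:industrial +contents:sales)". A NOT directly under OR has no exact equivalent in the prefix syntax, and the sketch makes no attempt to handle that case faithfully.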
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: bad queryparser bug
Posted by Mark Miller <ma...@gmail.com>.
There is a ton of discussion on this if you search the lucene user list
(QueryParser and precedence and the 'binary' operators). I have seen
many mentions of the precedence parser still having open issues but no
mention of what those issues are.
Peter Keegan wrote:
> OK, I see that I'm not the first to discover this behavior of
> QueryParser.
> Can anyone vouch for the integrity of the PrecedenceQueryParser here:
>
> http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/queryParser/precedence/
>
>
> Thanks,
> Peter
Re: bad queryparser bug
Posted by Peter Keegan <pe...@gmail.com>.
OK, I see that I'm not the first to discover this behavior of QueryParser.
Can anyone vouch for the integrity of the PrecedenceQueryParser here:
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/queryParser/precedence/
Thanks,
Peter
Re: bad queryparser bug
Posted by Peter Keegan <pe...@gmail.com>.
> (If i could go back in time and stop the AND/OR/NOT/&&/|| "aliases" from
> being added to the QueryParser -- i would)
Yes, this is the cause of the confusion. Our users are accustomed to the
boolean logic syntax from a legacy search engine (also common to many other
engines). We'll have to convert their queries into native QueryParser syntax
where possible.
Sorry for the cross post.
Thanks,
Peter
Re: bad queryparser bug
Posted by Chris Hostetter <ho...@fucit.org>.
: The query parser produces the correct query with the parentheses.
: But, I'm still looking for a fix for this. I could use some advice on where
: to look in QueryParser to fix this.
the best advice i can give you: don't use the binary operators.
* Lucene is not a boolean logic system
* BooleanQuery does not implement boolean logic
* QueryParser is not a boolean language parser
(If i could go back in time and stop the AND/OR/NOT/&&/|| "aliases" from
being added to the QueryParser -- i would)
-Hoss
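One way to see why the binary operators do not behave like boolean logic is to model the parse as a single left-to-right pass, which is how the classic QueryParser appears to treat them when the default operator is OR: AND marks the clause before it and the clause after it as required, OR leaves the new clause optional, and nothing is grouped. The sketch below is a simplified model under that assumption, not Lucene code; FlatClauseModel and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a flat, left-to-right pass over the clauses.
// Not Lucene code: all names here are hypothetical.
class FlatClauseModel {
    enum Conj { NONE, AND, OR }
    enum Occur { SHOULD, MUST }

    static final class Clause {
        final String term;
        Occur occur;
        Clause(String term, Occur occur) {
            this.term = term;
            this.occur = occur;
        }
    }

    private final List<Clause> clauses = new ArrayList<>();

    // Add one term together with the operator that introduced it.
    FlatClauseModel add(Conj conj, String term) {
        if (conj == Conj.AND && !clauses.isEmpty()) {
            // AND reaches back and makes the previous clause required ...
            clauses.get(clauses.size() - 1).occur = Occur.MUST;
        }
        // ... and makes the new clause required; OR leaves it optional.
        clauses.add(new Clause(term, conj == Conj.AND ? Occur.MUST : Occur.SHOULD));
        return this;
    }

    // Render the flat clause list in +/- prefix syntax.
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) sb.append(' ');
            if (c.occur == Occur.MUST) sb.append('+');
            sb.append(c.term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String parsed = new FlatClauseModel()
            .add(Conj.NONE, "contents:sales")
            .add(Conj.AND, "contents:marketing")
            .add(Conj.OR, "contents:industrial")
            .add(Conj.AND, "contents:sales")
            .render();
        // prints: +contents:sales +contents:marketing +contents:industrial +contents:sales
        System.out.println(parsed);
    }
}
```

Run on the query from the original report, the model gives the same flat result, because the second && reaches back and makes contents:industrial required. Explicit parentheses or the +/- prefixes avoid relying on this pass.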
Re: bad queryparser bug
Posted by Peter Keegan <pe...@gmail.com>.
Correction:
The query parser produces the correct query with the parentheses.
But, I'm still looking for a fix for this. I could use some advice on where
to look in QueryParser to fix this.
Thanks,
Peter
On 2/1/07, Peter Keegan <pe...@gmail.com> wrote:
>
> I have discovered a serious bug in QueryParser. The following query:
> contents:sales && contents:marketing || contents:industrial &&
> contents:sales
>
> is parsed as:
> +contents:sales +contents:marketing +contents:industrial +contents:sales
>
> The same parsed query occurs even with parentheses:
> (contents:sales && contents:marketing) || (contents:industrial &&
> contents:sales)
>
> Is there any way around this bug?
>
> Thanks,
> Peter
>
>
Re: bad queryparser bug
Posted by Chris Hostetter <ho...@fucit.org>.
please do not cross post questions about using the Lucene API to both the
user and dev mailing lists -- the user list is the correct place to ask
questions about behavior you are seeing that you think may be a bug.
-Hoss
Re: trouble with permissions?
Posted by Michael McCandless <lu...@mikemccandless.com>.
Miles Efron wrote:
> I really don't know why os x could have induced those kinds of
> filesystem issues. i assumed that since i had switched over to the
> intel architecture that perhaps something was going on with the
> JVM...everything involved in the process was mac; local filesystem, etc.
>
> but i'm fairly sure that the trunk code has fixed the problem. i ran
> two 'offending' bits of code and checked their results. not only did
> they finish (quite a feat today), but they did so correctly.
OK I will keep my fingers crossed that there isn't another issue
lurking :)
Mike
Re: trouble with permissions?
Posted by Miles Efron <me...@ibiblio.org>.
I really don't know why os x could have induced those kinds of
filesystem issues. i assumed that since i had switched over to the
intel architecture that perhaps something was going on with the
JVM...everything involved in the process was mac; local filesystem, etc.
but i'm fairly sure that the trunk code has fixed the problem. i ran
two 'offending' bits of code and checked their results. not only did
they finish (quite a feat today), but they did so correctly.
-Miles
Re: trouble with permissions?
Posted by Michael McCandless <lu...@mikemccandless.com>.
Miles Efron wrote:
> You rule. Swapping out the nightly build seems to have fixed the
> problem... tried it on two problematic cases and both worked.
Phew!
> For the record, I'm running mac os 10.4.8.
Uh-oh, I can't explain why you would hit these errors on OS X 10.4.8;
we have only seen these on Windows.
Are you sure switching to trunk has fixed it? Lockless commits make
Lucene "write once", so this works around a number of file system
"quirks". Still, it'd be good to get to your root cause.
Is the index stored on a remote (Windows CIFS) mount? Or is it stored
on a local (Mac OS HFS+) drive?
> Do you know if the lockless commits will be included in the next stable
> release?
Yes, this will be included in 2.1 -- I think 2.1 will be released soon
(there have been discussions on the dev list about getting the release
process started soon).
Mike
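To make the "write once" idea concrete, here is a minimal sketch of a commit scheme in which every commit goes to a fresh, generation-numbered file and no existing file is ever renamed over or rewritten. It is an illustration under stated assumptions, not Lucene's implementation: the class WriteOnceCommits and its methods are hypothetical, and the segments_N naming with a base-36 generation is only modeled on the trunk's file naming.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch of a write-once commit scheme; not Lucene's implementation.
class WriteOnceCommits {
    private static final String PREFIX = "segments_";

    // Highest generation present in dir, or 0 if there is no commit yet.
    static long latestGeneration(Path dir) throws IOException {
        long max = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, PREFIX + "*")) {
            for (Path f : files) {
                String suffix = f.getFileName().toString().substring(PREFIX.length());
                max = Math.max(max, Long.parseLong(suffix, Character.MAX_RADIX));
            }
        }
        return max;
    }

    // Write a new commit under a fresh name instead of replacing an old file.
    static Path commit(Path dir, String contents) throws IOException {
        long next = latestGeneration(dir) + 1;
        Path target = dir.resolve(PREFIX + Long.toString(next, Character.MAX_RADIX));
        // CREATE_NEW fails if the file already exists, so nothing is overwritten.
        Files.write(target, contents.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("write-once-demo");
        commit(dir, "first commit");
        Path second = commit(dir, "second commit");
        System.out.println(second.getFileName()); // prints: segments_2
    }
}
```

A reader in such a scheme opens the file with the highest generation, so a writer never has to replace a file that may still be open elsewhere, which is the kind of operation that fails with "Cannot overwrite" on some file systems.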
Re: trouble with permissions?
Posted by Miles Efron <me...@ibiblio.org>.
Mike,
You rule. Swapping out the nightly build seems to have fixed the
problem... tried it on two problematic cases and both worked.
For the record, I'm running mac os 10.4.8.
Do you know if the lockless commits will be included in the next
stable release?
Thanks so much!
-Miles
Re: trouble with permissions?
Posted by Michael McCandless <lu...@mikemccandless.com>.
Miles Efron wrote:
> i seem to be having a problem analogous to this one (no answer that i see):
>
> http://www.gossamer-threads.com/lists/lucene/java-user/32268?search_string=cannot%20overwrite;#32268
>
>
> trouble is, i just put lucene on my new macbook pro and am having the
> problem that if i build a large index, i get an I/O error due to
> something like
>
> java.io.IOException: Cannot overwrite:
> /data/reuters/indexes/reuters/deleteable.new
>
> same code worked fine on my previous machine (still running on a G4
> powerbook and a linux machine). sometimes it has trouble writing the
> segments file instead...
>
> has anyone seen and solved this problem? thoughts on what might be
> behind it?
Are you running Windows on your macbook pro?
There are known issues like this, but only on Windows, e.g.:
http://issues.apache.org/jira/browse/LUCENE-665
We believe such cases are now fixed by lockless commits, on the trunk
of Lucene (which is not yet released). If you could try the trunk
(but beware that the API and file formats can change) and see if this
still happens, that'd be great!
Mike
trouble with permissions?
Posted by Miles Efron <me...@ibiblio.org>.
i seem to be having a problem analogous to this one (no answer that i
see):
http://www.gossamer-threads.com/lists/lucene/java-user/32268?search_string=cannot%20overwrite;#32268
trouble is, i just put lucene on my new macbook pro and am having the
problem that if i build a large index, i get an I/O error due to
something like
java.io.IOException: Cannot overwrite: /data/reuters/indexes/reuters/deleteable.new
same code worked fine on my previous machine (still running on a G4
powerbook and a linux machine). sometimes it has trouble writing the
segments file instead...
has anyone seen and solved this problem? thoughts on what might be
behind it?
thanks,
-Miles