Posted to solr-user@lucene.apache.org by Lance Lance <go...@gmail.com> on 2007/07/13 02:58:28 UTC

Question on query syntax

Are there any known bugs in the syntax parser? We're using lucene-2.2.0 and
Solr 1.2.
 
We have documents with searchable text and a field 'collection'.
 
This query works as expected, finding everything except for collections
'pile1' and 'pile2'.
 
    text -(collection:pile1 OR collection:pile2)
 
When we apply De Morgan's Law, we get 0 records:
 
    text (-collection:pile1 AND -collection:pile2)
 
This should return all records, but it returns nothing:
 
    text (-collection:pile1 OR -collection:pile2)
 
Thanks,
 
Lance
 

Re: Question on query syntax

Posted by Chris Hostetter <ho...@fucit.org>.
: Solr can process a query that has the NOT operator ("-") at its head.
: If Solr finds one, it automatically adds a MatchAllDocsQuery
: in front of that query, as follows:

that's not strictly true ... Solr doesn't *add* a MatchAllDocsQuery if the
query is entirely prohibitive; instead, Solr executes a MatchAllDocsQuery
and then filters that by the DocSet returned by the "absolute value" of
the original query.

the end result should be functionally equivalent, but this approach caches
better (both "-text:foo" and "text:foo" are cached the same) ... the
downside is that the debugging info for purely prohibitive queries is
currently incorrect (see SOLR-119)
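
As a rough illustration of the caching point above, here is a toy model in Python. It is not Solr's code: a DocSet is modeled as a plain Python set, and the names ALL_DOCS, INDEX, positive_docset and search are made up for this sketch. It only shows why "-text:foo" and "text:foo" can share one cache entry when the negative query is answered as "all docs minus the DocSet of the positive query".

```python
# Toy model only: document ids are ints and a "DocSet" is a Python set.
ALL_DOCS = {1, 2, 3, 4, 5}        # stands in for the result of a match-all query
INDEX = {"text:foo": {2, 4}}      # hypothetical postings for one positive query

docset_cache = {}                 # positive query string -> DocSet
index_lookups = 0                 # counts how often the "index" is actually consulted


def positive_docset(query):
    """Return the DocSet of a positive query, computing it at most once."""
    global index_lookups
    if query not in docset_cache:
        index_lookups += 1
        docset_cache[query] = set(INDEX.get(query, set()))
    return docset_cache[query]


def search(query):
    """Run a single-clause query; a leading '-' marks it as purely prohibitive."""
    if query.startswith("-"):
        # Match everything, then filter out the "absolute value" of the query.
        return ALL_DOCS - positive_docset(query[1:])
    return positive_docset(query)


print(sorted(search("text:foo")))    # [2, 4]
print(sorted(search("-text:foo")))   # [1, 3, 5]
print(index_lookups)                 # 1, because both queries used the same cache entry
```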



-Hoss


Re: Question on query syntax

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Lance,

I think you are right. I ran into the same problem before.

 > -(collection:pile1 OR collection:pile2)

Solr can process a query that has the NOT operator ("-") at its head.
If Solr finds one, it automatically adds a MatchAllDocsQuery
in front of that query, as follows:

MatchAllDocsQuery -(collection:pile1 OR collection:pile2)

Then Lucene can process this query properly.

However, Solr doesn't add a MatchAllDocsQuery if the query doesn't
have the NOT operator at its head.

To avoid this problem, you can add "*:*" at the front of your query:

(*:* -collection:pile1 AND -collection:pile2)
(*:* -collection:pile1 OR -collection:pile2)

Thank you,

Koji




Re: Question on query syntax

Posted by Mike Klaas <mi...@gmail.com>.
On 12-Jul-07, at 5:58 PM, Lance Lance wrote:

> Are there any known bugs in the syntax parser? We're using lucene-2.2.0 and
> Solr 1.2.
>
> We have documents with searchable text and a field 'collection'.
>
> This query works as expected, finding everything except for collections
> 'pile1' and 'pile2'.
>
>     text -(collection:pile1 OR collection:pile2)
>
> When we apply De Morgan's Law, we get 0 records:
>
>     text (-collection:pile1 AND -collection:pile2)
>
> This should return all records, but it returns nothing:
>
>     text (-collection:pile1 OR -collection:pile2)

Lucene's "boolean" operators are not true boolean operators.
Instead, every clause is one of:

OPTIONAL
REQUIRED
PROHIBITED

For a query (or a parenthesized subquery) to match, all REQUIRED
clauses must match, no PROHIBITED clause may match, and if there
are no REQUIRED clauses, at least one OPTIONAL clause must match.  You
cannot have only PROHIBITED clauses.

Now, the syntax for these is, respectively, no prefix, +, and -, and each
can be applied to an entire subquery using brackets:

+hello -(goodbye -night)

returns docs that have hello, and do not have (goodbye without night)
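
The matching rule and the +hello -(goodbye -night) example above can be sketched as a small Python model. This is only an illustration of the rule as described here, not Lucene's implementation: Clause, SubQuery and matches are names invented for the sketch, and a document is reduced to a set of terms.

```python
from dataclasses import dataclass
from typing import List, Set, Union

OPTIONAL, REQUIRED, PROHIBITED = "optional", "required", "prohibited"


@dataclass
class Clause:
    occur: str                        # OPTIONAL, REQUIRED or PROHIBITED
    query: Union[str, "SubQuery"]     # a single term or a bracketed subquery


@dataclass
class SubQuery:
    clauses: List[Clause]


def matches(query, doc: Set[str]) -> bool:
    """Apply the OPTIONAL/REQUIRED/PROHIBITED rule to one document."""
    if isinstance(query, str):
        return query in doc
    hits = {OPTIONAL: [], REQUIRED: [], PROHIBITED: []}
    for clause in query.clauses:
        hits[clause.occur].append(matches(clause.query, doc))
    if any(hits[PROHIBITED]):
        return False                  # a PROHIBITED clause matched
    if hits[REQUIRED]:
        return all(hits[REQUIRED])    # every REQUIRED clause must match
    return any(hits[OPTIONAL])        # otherwise at least one OPTIONAL clause


# +hello -(goodbye -night)
q = SubQuery([
    Clause(REQUIRED, "hello"),
    Clause(PROHIBITED, SubQuery([
        Clause(OPTIONAL, "goodbye"),
        Clause(PROHIBITED, "night"),
    ])),
])

print(matches(q, {"hello"}))                      # True
print(matches(q, {"hello", "goodbye"}))           # False: goodbye without night
print(matches(q, {"hello", "goodbye", "night"}))  # True

# A subquery with only PROHIBITED clauses never matches in this model.
only_prohibited = SubQuery([Clause(PROHIBITED, "pile1")])
print(matches(only_prohibited, {"pile2"}))        # False
```

The last line is the case from the original question: a bracketed group that contains nothing but prohibited clauses has no positive clause to select documents with, so under this rule it matches nothing.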

In Lucene, AND/OR/NOT are syntactic sugar that translates clauses to
the above form.  However, they imperfectly match people's (rational)
expectations of how boolean operators work.  Also, brackets _create
subqueries_, not just group operators.  I suggest that AND and OR
never be used programmatically, if possible.

Try these alternatives:

docs (must) containing 'text' that do not match (col=pile1 or col=pile2)
>     text -(collection:pile1 collection:pile2)

same as above
>     text -collection:pile1 -collection:pile2

docs (must) containing 'text' that (must) match (col=pile1 or col=pile2)
>     +text +(collection:pile1 collection:pile2)

Note that in the last example, the + before text is necessary because
otherwise text would be optional and not required (as there are other
required clauses).
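
If queries are generated by a program, one way to follow the advice above is to emit only the +/- prefix form. The helper below is a minimal sketch with made-up names (build_query, any_of), not part of Solr or Lucene. It assumes the parser's default operator is OR, so that bare clauses inside brackets are optional, and it does no escaping of special characters.

```python
def any_of(*clauses):
    """Group clauses in brackets; with an OR default operator, any one may match."""
    return "(" + " ".join(clauses) + ")"


def build_query(required=(), prohibited=(), optional=()):
    """Build a query string using only '+', '-' and bare clauses (no AND/OR/NOT)."""
    parts = [f"+{clause}" for clause in required]
    parts += [f"-{clause}" for clause in prohibited]
    parts += list(optional)
    return " ".join(parts)


# docs that must contain 'text' and must not be in pile1 or pile2
print(build_query(required=["text"],
                  prohibited=["collection:pile1", "collection:pile2"]))
# +text -collection:pile1 -collection:pile2

# docs that must contain 'text' and must be in pile1 or pile2
print(build_query(required=["text",
                            any_of("collection:pile1", "collection:pile2")]))
# +text +(collection:pile1 collection:pile2)
```

The first result marks text with an explicit +, which selects the same documents as the unprefixed form shown earlier, since an optional clause must match when there are no required ones.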

-Mike