Posted to solr-user@lucene.apache.org by Simon Wistow <si...@thegestalt.org> on 2011/04/25 22:27:49 UTC

Negative OR in fq field not working as expected

I have a field 'type' that has several values. If it's type 'foo' then 
it also has a field 'restriction_id'.

What I want is a filter query which says "either it's not a 'foo' or if 
it is then it has the restriction '1'"

I expect two matches - one of type 'bar' and one of type 'foo' 

Neither

 fq=(-type:foo OR restriction_id:1)
 fq={!dismax q.op=OR}-type:foo restriction_id:1

produces any results.

 fq=restriction_id:1

gets the 'foo' typed result.

 fq=type:bar 

gets the 'bar' typed result.

Either of these

  fq=type:[* TO *] OR (type:foo AND restriction_id:1)
  fq=type:(bar OR quux OR fleeg) OR restriction_id:1

do work but are very, very slow to the point of unusability (our indexes 
are pretty large).

Searching round it seems like other people have experienced similar 
issues and the answer has been "Lucene just doesn't work like that"

"When dealing with Lucene people are strongly encouraged to think in 
terms of MUST, MUST_NOT and SHOULD (which are represented in the query 
parser as the prefixes "+", "-" and the default) instead of in terms of 
AND, OR, and NOT ... Lucene's Boolean Queries (and thus Lucene's 
QueryParser) is not a strict Boolean Logic system, so it's best not to 
try and think of it like one."

  http://wiki.apache.org/lucene-java/BooleanQuerySyntax

Am I just out of luck? Might edismax help here?

Simon

Re: Negative OR in fq field not working as expected

Posted by Jonathan Rochkind <ro...@jhu.edu>.

Yeah, I do the (*:* AND -type:foo) OR something:else

thing on my own pretty big index, and it's not slow at all.  At least no 
slower than doing any other "X OR Y" where X and Y both include lots of 
results.

Pre-warming the field cache for, in this case, the 'type' field may 
help. Same as it would if 'X' were just "type:bar" (not negated) where 
"type:bar" matched about the same number of documents as "-type:foo" 
does in your case.  In general, there's nothing special that should make 
that slow, it's a pretty ordinary query, really. Just using weird syntax 
to get around Lucene query parser issues.
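
If you build your filters in client code, the rewrite is easy to automate. 
A minimal Python sketch follows; the helper names negate() and any_of() 
are made up for illustration and are not part of any Solr client library.

```python
from urllib.parse import urlencode


def negate(clause: str) -> str:
    """Wrap a clause as an explicit 'everything except this' subquery."""
    return f"(*:* AND -{clause})"


def any_of(*clauses: str) -> str:
    """OR together clauses that are already well-formed on their own."""
    return "(" + " OR ".join(clauses) + ")"


# Build the filter from this thread: "not a foo, or restricted to 1".
fq = any_of(negate("type:foo"), "restriction_id:1")
print(fq)

# URL-encode it as request parameters (q=*:* is only a placeholder here).
print(urlencode({"q": "*:*", "fq": fq}))
```

The point is only that every negated clause gets its own "*:*" to 
subtract from, so you never hand the parser a pure negative inside an OR.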

[Obligatory mention: This may have nothing to do with your issue, but I 
have found occasions where not having enough RAM allocated to Solr 1.4.1 
can make things terribly slow, even though there is no OutOfMemory error 
or other error in the logs. Especially if you are doing faceting and/or 
StatsComponent.  Exacerbated if you are using the default JVM GC 
strategies instead of picking some of the concurrent strategies.]

On 4/25/2011 5:02 PM, Yonik Seeley wrote:
> On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow<si...@thegestalt.org>  wrote:
>> On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:
>>> This is what I do instead, to rewrite the query to mean the same thing but
>>> not give the lucene query parser trouble:
>>>
>>> fq=( (*:* AND -type:foo) OR restriction_id:1)
>>>
>>> "*:*" means "everything", so (*:* AND -type:foo) means the same thing as
>>> just "-type:foo", but can get around the lucene query parsers troubles.
>>>
>>> So that might work for you.
>> Thanks for confirming my suspicions.
>>
>> Unfortunately I've tried that as well and, whilst it works
>> it's also unbelievably slow (~30s query time).
> It really shouldn't be that slow... how many documents are in your
> index, and how many match -type:foo?
>
> bq. Would writing my own Query Parser help here?
>
> Nope.  That's just syntax.
>
> If filters of the form ( (*:* AND -type:foo) OR restriction_id:1)
> are much slower (to the point where it causes you problems) and
> filters of the form
> type:foo OR restriction_id:1
> are fast, then you could index the negation of the type field as well
> (if you know all the types)
>
> For instance, in a doc, index two type fields:
> type:bar
> type_not:foo
>
> Or if "type" is multi-valued, you could index both foo and NOT_foo in
> the same field.
>
> Then you could express the filter as type_not:foo OR restriction_id:1
> or
> type:NOT_foo OR restriction_id:1
>
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
>

Re: Negative OR in fq field not working as expected

Posted by Simon Wistow <si...@thegestalt.org>.

On Mon, Apr 25, 2011 at 05:02:12PM -0400, Yonik Seeley said:
> It really shouldn't be that slow... how many documents are in your
> index, and how many match -type:foo?

Total number of docs is 161,000,000

 type:foo  39,000,000
-type:foo 122,200,000 
 type:bar 90,000,000

We're aware it's large and we're in the process of splitting the index 
up, but I was just hoping that there was a workaround I could use in 
order to reclaim some performance.

Re: Negative OR in fq field not working as expected

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow <si...@thegestalt.org> wrote:
> On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:
>> This is what I do instead, to rewrite the query to mean the same thing but
>> not give the lucene query parser trouble:
>>
>> fq=( (*:* AND -type:foo) OR restriction_id:1)
>>
>> "*:*" means "everything", so (*:* AND -type:foo) means the same thing as
>> just "-type:foo", but can get around the lucene query parsers troubles.
>>
>> So that might work for you.
>
> Thanks for confirming my suspicions.
>
> Unfortunately I've tried that as well and, whilst it works
> it's also unbelievably slow (~30s query time).

It really shouldn't be that slow... how many documents are in your
index, and how many match -type:foo?

bq. Would writing my own Query Parser help here?

Nope.  That's just syntax.

If filters of the form ( (*:* AND -type:foo) OR restriction_id:1)
are much slower (to the point where it causes you problems) and
filters of the form
type:foo OR restriction_id:1
are fast, then you could index the negation of the type field as well
(if you know all the types)

For instance, in a doc, index two type fields:
type:bar
type_not:foo

Or if "type" is multi-valued, you could index both foo and NOT_foo in
the same field.

Then you could express the filter as type_not:foo OR restriction_id:1
or
type:NOT_foo OR restriction_id:1
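
A rough sketch of the index-time side in Python, assuming the full set 
of types is known up front and that "type_not" is declared as a 
multi-valued field. KNOWN_TYPES and with_negated_types() are invented 
names for illustration, not anything Solr provides.

```python
# Assumption: the complete list of type values is known at index time.
KNOWN_TYPES = {"foo", "bar", "quux", "fleeg"}


def with_negated_types(doc, known_types=KNOWN_TYPES):
    """Return a copy of doc with a 'type_not' field listing every known
    type that the document does NOT have."""
    value = doc["type"]
    doc_types = set(value) if isinstance(value, list) else {value}
    enriched = dict(doc)
    enriched["type_not"] = sorted(known_types - doc_types)
    return enriched


doc = with_negated_types({"id": "42", "type": "bar"})
print(doc["type_not"])
```

With documents enriched like this before they are sent to the indexer, 
the filter contains only positive clauses.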

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco

Re: Negative OR in fq field not working as expected

Posted by Simon Wistow <si...@thegestalt.org>.

On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:
> This is what I do instead, to rewrite the query to mean the same thing but 
> not give the lucene query parser trouble:
> 
> fq=( (*:* AND -type:foo) OR restriction_id:1)
> 
> "*:*" means "everything", so (*:* AND -type:foo) means the same thing as 
> just "-type:foo", but can get around the lucene query parsers troubles.
> 
> So that might work for you.

Thanks for confirming my suspicions.

Unfortunately I've tried that as well and, whilst it works 
it's also unbelievably slow (~30s query time).

Would writing my own Query Parser help here?

Simon

Re: Negative OR in fq field not working as expected

Posted by Jonathan Rochkind <ro...@jhu.edu>.

The solr 'lucene' query parser (that's being used there, in an fq) 
sometimes has trouble with "pure negative" clauses in an OR.

Even though it can handle "pure negative" queries like "-type:foo", it 
has trouble with pure negative in an OR like you are doing. At least in 
1.4.1, don't know if it's been improved in 3.1.  I _think_ you may have 
a case it has trouble with.

This is what I do instead, to rewrite the query to mean the same thing 
but not give the lucene query parser trouble:

fq=( (*:* AND -type:foo) OR restriction_id:1)

"*:*" means "everything", so (*:* AND -type:foo) means the same thing as 
just "-type:foo", but can get around the lucene query parser's troubles.

So that might work for you.

Dismax has even WORSE problems with "pure negative", with no easy way to 
get around em, so switching to dismax is probably not helpful there.
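
To illustrate what I think is going on, here is a toy Python model using 
plain sets. It assumes the parser turns "-type:foo OR restriction_id:1" 
into a single boolean query with one optional clause and one prohibited 
clause, and that a prohibited clause can only remove documents that a 
positive clause already matched. The three sample documents and the 
boolean_query() helper are invented for the example; this is not Lucene 
code.

```python
# Three made-up documents; only 'foo' docs carry a restriction_id.
docs = {
    1: {"type": "foo", "restriction_id": "1"},
    2: {"type": "foo", "restriction_id": "2"},
    3: {"type": "bar"},
}


def matching(field, value):
    """Ids of documents whose field equals value."""
    return {i for i, d in docs.items() if d.get(field) == value}


def boolean_query(must=(), should=(), must_not=()):
    """Toy boolean query: positive clauses select documents,
    prohibited clauses only subtract from that selection."""
    if must:
        hits = set.intersection(*must)
    elif should:
        hits = set.union(*should)
    else:
        hits = set()  # nothing positive, so nothing to subtract from
    for excluded in must_not:
        hits -= excluded
    return hits


all_docs = set(docs)                      # stands in for *:*
foo = matching("type", "foo")
restricted = matching("restriction_id", "1")

# -type:foo OR restriction_id:1, read as SHOULD + MUST_NOT in one query
naive = boolean_query(should=[restricted], must_not=[foo])

# (*:* AND -type:foo) OR restriction_id:1
not_foo = boolean_query(must=[all_docs], must_not=[foo])
rewritten = boolean_query(should=[not_foo, restricted])

print(naive)      # empty: the only restriction_id:1 doc is a foo
print(rewritten)  # the 'bar' doc plus the restricted 'foo' doc
```

In this model the naive form finds nothing because every document with 
restriction_id:1 is also a foo, which matches what you are seeing.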

On 4/25/2011 4:27 PM, Simon Wistow wrote:
> I have a field 'type' that has several values. If it's type 'foo' then
> it also has a field 'restriction_id'.
>
> What I want is a filter query which says "either it's not a 'foo' or if
> it is then it has the restriction '1'"
>
> I expect two matches - one of type 'bar' and one of type 'foo'
>
> Neither
>
>   fq=(-type:foo OR restriction_id:1)
>   fq={!dismax q.op=OR}-type:foo restriction_id:1
>
> produce any results.
>
>   fq=restriction_id:1
>
> gets the 'foo' typed result.
>
>   fq=type:bar
>
> get the 'bar' typed result.
>
> Either of these
>
>    fq=type:[* TO *] OR (type:foo AND restriction_id:1)
>    fq=type:(bar OR quux OR fleeg) OR restriction_id:1
>
> do work but are very, very slow to the point of unusability (our indexes
> are pretty large).
>
> Searching round it seems like other people have experienced similar
> issues and the answer has been "Lucene just doesn't work like that"
>
> "When dealing with Lucene people are strongly encouraged to think in
> terms of MUST, MUST_NOT and SHOULD (which are represented in the query
> parser as the prefixes "+", "-" and the default) instead of in terms of
> AND, OR, and NOT ... Lucene's Boolean Queries (and thus Lucene's
> QueryParser) is not a strict Boolean Logic system, so it's best not to
> try and think of it like one."
>
>    http://wiki.apache.org/lucene-java/BooleanQuerySyntax
>
> Am I just out of luck? Might edismax help here?
>
> Simon