Posted to solr-user@lucene.apache.org by Simon Wistow <si...@thegestalt.org> on 2011/04/25 22:27:49 UTC
Negative OR in fq field not working as expected
I have a field 'type' that has several values. If it's type 'foo' then
it also has a field 'restriction_id'.
What I want is a filter query which says "either it's not a 'foo' or if
it is then it has the restriction '1'"
I expect two matches - one of type 'bar' and one of type 'foo'
Neither
fq=(-type:foo OR restriction_id:1)
fq={!dismax q.op=OR}-type:foo restriction_id:1
produce any results.
fq=restriction_id:1
gets the 'foo' typed result.
fq=type:bar
gets the 'bar' typed result.
Either of these
fq=type:[* TO *] OR (type:foo AND restriction_id:1)
fq=type:(bar OR quux OR fleeg) OR restriction_id:1
do work but are very, very slow to the point of unusability (our indexes
are pretty large).
Searching round it seems like other people have experienced similar
issues and the answer has been "Lucene just doesn't work like that"
"When dealing with Lucene people are strongly encouraged to think in
terms of MUST, MUST_NOT and SHOULD (which are represented in the query
parser as the prefixes "+", "-" and the default) instead of in terms of
AND, OR, and NOT ... Lucene's Boolean Queries (and thus Lucene's
QueryParser) is not a strict Boolean Logic system, so it's best not to
try and think of it like one."
http://wiki.apache.org/lucene-java/BooleanQuerySyntax
Am I just out of luck? Might edismax help here?
Simon
Re: Negative OR in fq field not working as expected
Posted by Jonathan Rochkind <ro...@jhu.edu>.
Yeah, I do the (*:* AND -type:foo) OR something:else
thing on my own pretty big index, and it's not slow at all. At least no
slower than doing any other "X OR Y" where X and Y both include lots of
results.
Pre-warming the field cache for, in this case, the 'type' field may
help. Same as it would if 'X' were just "type:bar" (not negated) where
"type:bar" matched about the same number of documents as "-type:foo"
does in your case. In general, there's nothing special that should make
that slow, it's a pretty ordinary query, really. Just using weird syntax
to get around lucene query parser issues.
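For anyone who wants to try the warm-up idea, here is a minimal sketch of building the request a warm-up script could send once after a new searcher opens, so the filter has already been computed before real traffic arrives. The base URL, the /select path and the helper name build_select_url are assumptions for illustration, not something taken from this thread; adjust them to your own setup.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The filter discussed in this thread.
FILTER = "(*:* AND -type:foo) OR restriction_id:1"


def build_select_url(base_url, fq, q="*:*", rows=0):
    """Build a select URL with q, fq and rows URL-encoded (hypothetical helper)."""
    params = [("q", q), ("fq", fq), ("rows", str(rows))]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Assumed host and core path; rows=0 because only the filter matters here.
url = build_select_url("http://localhost:8983/solr", FILTER)

# Decode the fq parameter again to confirm the encoding round-trips.
decoded = parse_qs(urlsplit(url).query)["fq"][0]

print(url)
print(decoded)
```

Encoding the parameter matters because the filter contains spaces, colons and parentheses; a badly encoded fq can fail in ways that look like a parser problem.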
[Obligatory mention: This may have nothing to do with your issue, but I
have found occasions where not having enough RAM allocated to Solr 1.4.1
can make things terribly slow, even though there is no OutOfMemory error
or other error in the logs. Especially if you are doing faceting and/or
StatsComponent. Exacerbated if you are using the default JVM GC
strategies instead of picking some of the concurrent strategies.]
On 4/25/2011 5:02 PM, Yonik Seeley wrote:
> On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow<si...@thegestalt.org> wrote:
>> On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:
>>> This is what I do instead, to rewrite the query to mean the same thing but
>>> not give the lucene query parser trouble:
>>>
>>> fq=( (*:* AND -type:foo) OR restriction_id:1)
>>>
>>> "*:*" means "everything", so (*:* AND -type:foo) means the same thing as
>>> just "-type:foo", but can get around the lucene query parsers troubles.
>>>
>>> So that might work for you.
>> Thanks for confirming my suspicions.
>>
>> Unfortunately I've tried that as well and, whilst it works
>> it's also unbelievably slow (~30s query time).
> It really shouldn't be that slow... how many documents are in your
> index, and how many match -type:foo?
>
> bq. Would writing my own Query Parser help here?
>
> Nope. That's just syntax.
>
> If filters of the form ( (*:* AND -type:foo) OR restriction_id:1)
> are much slower (to the point where it causes you problems) and
> filters of the form
> type:foo OR restriction_id:1
> are fast, then you could index the negation of the type field as well
> (if you know all the types)
>
> For instance, in a doc, index two type fields:
> type:bar
> type_not:foo
>
> Or if "type" is multi-valued, you could index both foo and NOT_foo in
> the same field.
>
> Then you could express the filter as type_not:foo OR restriction_id:1
> or
> type:NOT_foo OR restriction_id:1
>
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
>
Re: Negative OR in fq field not working as expected
Posted by Simon Wistow <si...@thegestalt.org>.
On Mon, Apr 25, 2011 at 05:02:12PM -0400, Yonik Seeley said:
> It really shouldn't be that slow... how many documents are in your
> index, and how many match -type:foo?
Total number of docs is 161,000,000
type:foo 39,000,000
-type:foo 122,200,000
type:bar 90,000,000
We're aware it's large and we're in the process of splitting the index
up but I was just hoping that there was a workaround I could use in
order to reclaim some performance.
Re: Negative OR in fq field not working as expected
Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow <si...@thegestalt.org> wrote:
> On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:
>> This is what I do instead, to rewrite the query to mean the same thing but
>> not give the lucene query parser trouble:
>>
>> fq=( (*:* AND -type:foo) OR restriction_id:1)
>>
>> "*:*" means "everything", so (*:* AND -type:foo) means the same thing as
>> just "-type:foo", but can get around the lucene query parsers troubles.
>>
>> So that might work for you.
>
> Thanks for confirming my suspicions.
>
> Unfortunately I've tried that as well and, whilst it works
> it's also unbelievably slow (~30s query time).
It really shouldn't be that slow... how many documents are in your
index, and how many match -type:foo?
bq. Would writing my own Query Parser help here?
Nope. That's just syntax.
If filters of the form ( (*:* AND -type:foo) OR restriction_id:1)
are much slower (to the point where it causes you problems) and
filters of the form
type:foo OR restriction_id:1
are fast, then you could index the negation of the type field as well
(if you know all the types)
For instance, in a doc, index two type fields:
type:bar
type_not:foo
Or if "type" is multi-valued, you could index both foo and NOT_foo in
the same field.
Then you could express the filter as type_not:foo OR restriction_id:1
or
type:NOT_foo OR restriction_id:1
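A minimal sketch of the index-time side of this, assuming documents are plain dicts before they are sent to Solr and that the complete list of types is known up front. The list KNOWN_TYPES, the helper name with_negated_types and the sample document are hypothetical.

```python
# Assumed complete list of type values; the real list depends on your data.
KNOWN_TYPES = ["foo", "bar", "quux", "fleeg"]


def with_negated_types(doc, known_types=KNOWN_TYPES):
    """Return a copy of doc with a multi-valued type_not field added.

    type_not holds every known type the document is NOT, so a filter
    such as type_not:foo becomes a purely positive term lookup.
    """
    enriched = dict(doc)
    enriched["type_not"] = [t for t in known_types if t != doc.get("type")]
    return enriched


# Hypothetical document of type 'bar'.
doc = with_negated_types({"id": "42", "type": "bar"})
print(doc["type_not"])  # ['foo', 'quux', 'fleeg']
```

The cost is a larger index and a reindex whenever a new type is introduced, since existing documents would be missing the new value in type_not.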
-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco
Re: Negative OR in fq field not working as expected
Posted by Simon Wistow <si...@thegestalt.org>.
On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:
> This is what I do instead, to rewrite the query to mean the same thing but
> not give the lucene query parser trouble:
>
> fq=( (*:* AND -type:foo) OR restriction_id:1)
>
> "*:*" means "everything", so (*:* AND -type:foo) means the same thing as
> just "-type:foo", but can get around the lucene query parsers troubles.
>
> So that might work for you.
Thanks for confirming my suspicions.
Unfortunately I've tried that as well and, whilst it works
it's also unbelievably slow (~30s query time).
Would writing my own Query Parser help here?
Simon
Re: Negative OR in fq field not working as expected
Posted by Jonathan Rochkind <ro...@jhu.edu>.
The solr 'lucene' query parser (that's being used there, in an fq)
sometimes has trouble with "pure negative" clauses in an OR.
Even though it can handle "pure negative" queries like "-type:foo", it
has trouble with pure negative in an OR like you are doing. At least in
1.4.1, don't know if it's been improved in 3.1. I _think_ you may have
a case it has trouble with.
This is what I do instead, to rewrite the query to mean the same thing
but not give the lucene query parser trouble:
fq=( (*:* AND -type:foo) OR restriction_id:1)
"*:*" means "everything", so (*:* AND -type:foo) means the same thing as
just "-type:foo", but can get around the lucene query parser's troubles.
So that might work for you.
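To show why the rewrite behaves differently, here is a toy model of MUST / SHOULD / MUST_NOT clauses over Python sets of document ids. It is only a sketch of the semantics as they are usually described, not Lucene's implementation, and the three sample documents and the helper names (term, bool_query) are made up for illustration.

```python
# Hypothetical sample index: one 'bar' doc and two 'foo' docs.
DOCS = {
    1: {"type": "bar"},
    2: {"type": "foo", "restriction_id": "1"},
    3: {"type": "foo", "restriction_id": "2"},
}
ALL_DOCS = set(DOCS)  # plays the role of *:*


def term(field, value):
    """Ids of documents whose field equals value."""
    return {doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value}


def bool_query(must=(), should=(), must_not=()):
    """Combine clause results as sets of matching ids.

    MUST clauses are intersected; if there are none, SHOULD clauses are
    unioned. MUST_NOT clauses only subtract from what already matched,
    so a query with nothing but MUST_NOT clauses matches nothing.
    """
    if must:
        matched = set.intersection(*map(set, must))
    elif should:
        matched = set.union(*map(set, should))
    else:
        matched = set()
    for excluded in must_not:
        matched -= excluded
    return matched


# -type:foo OR restriction_id:1 ends up as one SHOULD and one MUST_NOT clause.
naive = bool_query(should=[term("restriction_id", "1")],
                   must_not=[term("type", "foo")])

# (*:* AND -type:foo) gives the negation something to subtract from.
not_foo = bool_query(must=[ALL_DOCS], must_not=[term("type", "foo")])
rewritten = bool_query(should=[not_foo, term("restriction_id", "1")])

print(sorted(naive))      # []
print(sorted(rewritten))  # [1, 2]
```

In this model the naive form first selects the restriction_id:1 document and then removes it again because it is a 'foo', which matches the empty result reported above.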
Dismax has even WORSE problems with "pure negative", with no easy way to
get around em, so switching to dismax is probably not helpful there.
On 4/25/2011 4:27 PM, Simon Wistow wrote:
> I have a field 'type' that has several values. If it's type 'foo' then
> it also has a field 'restriction_id'.
>
> What I want is a filter query which says "either it's not a 'foo' or if
> it is then it has the restriction '1'"
>
> I expect two matches - one of type 'bar' and one of type 'foo'
>
> Neither
>
> fq=(-type:foo OR restriction_id:1)
> fq={!dismax q.op=OR}-type:foo restriction_id:1
>
> produce any results.
>
> fq=restriction_id:1
>
> gets the 'foo' typed result.
>
> fq=type:bar
>
> gets the 'bar' typed result.
>
> Either of these
>
> fq=type:[* TO *] OR (type:foo AND restriction_id:1)
> fq=type:(bar OR quux OR fleeg) OR restriction_id:1
>
> do work but are very, very slow to the point of unusability (our indexes
> are pretty large).
>
> Searching round it seems like other people have experienced similar
> issues and the answer has been "Lucene just doesn't work like that"
>
> "When dealing with Lucene people are strongly encouraged to think in
> terms of MUST, MUST_NOT and SHOULD (which are represented in the query
> parser as the prefixes "+", "-" and the default) instead of in terms of
> AND, OR, and NOT ... Lucene's Boolean Queries (and thus Lucene's
> QueryParser) is not a strict Boolean Logic system, so it's best not to
> try and think of it like one."
>
> http://wiki.apache.org/lucene-java/BooleanQuerySyntax
>
> Am I just out of luck? Might edismax help here?
>
> Simon