Posted to solr-user@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2017/05/01 14:19:14 UTC

Re: After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

On 4/26/2017 1:04 PM, Andy C wrote:
> I'm looking at upgrading the version of Solr used with our application from
> 5.3 to 6.5.
>
> Having an issue with a change in the behavior of one of the filter queries
> we generate.
>
> The field "ctindex" is only present in a subset of documents. It basically
> contains a user id. For those documents where it is present, I only want
> documents returned where the ctindex value matches the id of the user
> performing the search. Documents with no ctindex value should be returned
> as well.
>
> This is implemented through a filter query that excludes documents that
> contain some other value in the ctindex field: fq=(-ctindex:({* TO "MyId"}
> OR {"MyId" TO *}))

I am surprised that this works in 5.3.  The crux of the problem is that
fully negative query clauses do not actually work.

Here's the best-performing query that gives you the results you want:

fq=ctindex:myId OR (*:* -ctindex:[* TO *])

The *:* is needed in the second clause to give the query a starting
point of all documents, from which is subtracted all documents where
ctindex has a value.  Without the "all docs" starting point, you are
subtracting from nothing, which yields nothing.
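
As a rough illustration of the "starting point" idea, here is a toy
model in Python that treats each clause as a set of document ids.  It
is only a sketch of the set arithmetic, not how Solr or Lucene evaluate
queries internally, and the three sample documents are made up.

```python
# Toy model only: made-up documents, with ctindex present on some of them.
docs = [
    {"id": 1, "ctindex": "myId"},
    {"id": 2, "ctindex": "otherId"},
    {"id": 3},
]

all_ids = {d["id"] for d in docs}                              # *:*
has_ctindex = {d["id"] for d in docs if "ctindex" in d}        # ctindex:[* TO *]
mine = {d["id"] for d in docs if d.get("ctindex") == "myId"}   # ctindex:myId

# (*:* -ctindex:[* TO *]): start from every document, then subtract.
no_ctindex = all_ids - has_ctindex

# (-ctindex:[* TO *]) with no starting point: subtracting from nothing.
negative_only = set() - has_ctindex

wanted = mine | no_ctindex      # ctindex:myId OR (*:* -ctindex:[* TO *])
broken = mine | negative_only   # ctindex:myId OR (-ctindex:[* TO *])
```

In this model "wanted" contains documents 1 and 3, while "broken"
contains only document 1, because the negative-only clause contributes
nothing.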

You may notice that this query works perfectly, and wonder why:

fq=-ctindex:[* TO *]

This works because on such a simple query, Solr is able to detect that
it is fully negated, so it implicitly adds the *:* starting point for
you.  As soon as you add any kind of complexity (multiple clauses,
parentheses, etc.), that detection no longer works.
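
One way to avoid depending on that detection is to have the client
always write the *:* explicitly.  The helper below is a minimal sketch
of that; the function names are hypothetical, and it only builds the
parameter string, it does not talk to Solr.

```python
from urllib.parse import urlencode, parse_qs


def quote_term(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def owner_or_unset_filter(field: str, user_id: str) -> str:
    """Match docs where `field` equals user_id or has no value at all.

    The negative clause always gets an explicit *:* starting point, so
    it does not rely on any implicit handling of negative-only queries.
    """
    return f"{field}:{quote_term(user_id)} OR (*:* -{field}:[* TO *])"


fq = owner_or_unset_filter("ctindex", "myId")

# URL-encode the request parameters; q=*:* is just a placeholder query.
params = urlencode({"q": "*:*", "fq": fq})

# Decoding the query string gives back the original filter.
decoded_fq = parse_qs(params)["fq"][0]
```

Quoting the user id is a design choice here, so that ids containing
spaces or query syntax characters stay a single term.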

Thanks,
Shawn


Re: After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/1/2017 9:19 AM, Andy C wrote:
> You state that the best-performing query that gives the desired results is:
>> fq=ctindex:myId OR (*:* -ctindex:[* TO *])
> Is this because there is some sort of optimization invoked when you use
> [* TO *], or just because a single range will be more efficient than
> multiple ranges ORed together?

There are fewer query clauses, so it takes less time.  The "all values"
range *might* perform faster than a range with a specific endpoint,
although I'm not familiar enough with the code to say for sure.

> I was considering generating an additional field "ctindex_populated" that
> would contain true or false depending on whether a ctindex value is
> present. And then changing the filter query to:
>
> fq=ctindex_populated:false OR ctindex:myId
>
> Would this be more efficient than your proposed filter query?

Yes.  Probably a lot more efficient.  Boolean fields only have two
possible values, so queries on those fields tend to be extremely fast.
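
For illustration, the flag could be set on the client side before
documents are sent for indexing.  This is only a sketch in Python: the
helper name is hypothetical, documents are plain dicts, and it assumes
ctindex_populated is defined as a boolean field in the schema.

```python
def with_populated_flag(doc: dict, field: str = "ctindex") -> dict:
    """Return a copy of doc with `<field>_populated` set to True or False."""
    value = doc.get(field)
    flagged = dict(doc)
    # Treat a missing field, an empty string, and an empty list as "no value".
    flagged[f"{field}_populated"] = value not in (None, "", [])
    return flagged


docs = [{"id": "1", "ctindex": "myId"}, {"id": "2"}]
flagged_docs = [with_populated_flag(d) for d in docs]

# Filter query from the message above, usable once every doc has the flag.
fq = "ctindex_populated:false OR ctindex:myId"
```

Note that the flag has to be written on every document, including the
ones without a ctindex value, otherwise ctindex_populated:false has
nothing to match.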

Thanks,
Shawn


Re: After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

Posted by Andy C <an...@gmail.com>.
Thanks for the response Shawn.

Adding "*:*" in front of my filter query does indeed resolve the issue. It
seems odd to me that the fully negated query does work if I don't set
q.op=AND. I guess this must be "adding complexity". Actually, I just
discovered that simply removing the extraneous outer parentheses
[ fq=-ctindex:({* TO "MyId"} OR {"MyId" TO *}) ] also resolves the issue.

You state that the best-performing query that gives the desired results is:

> fq=ctindex:myId OR (*:* -ctindex:[* TO *])

Is this because there is some sort of optimization invoked when you use
[* TO *], or just because a single range will be more efficient than
multiple ranges ORed together?

I was considering generating an additional field "ctindex_populated" that
would contain true or false depending on whether a ctindex value is
present. And then changing the filter query to:

fq=ctindex_populated:false OR ctindex:myId

Would this be more efficient than your proposed filter query?

Thanks again,
- Andy -
