Posted to solr-user@lucene.apache.org by Andy C <an...@gmail.com> on 2017/04/26 19:04:21 UTC

After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

I'm looking at upgrading the version of Solr used with our application from
5.3 to 6.5.

Having an issue with a change in the behavior of one of the filter queries
we generate.

The field "ctindex" is only present in a subset of documents. It basically
contains a user id. For those documents where it is present, I only want
documents returned where the ctindex value matches the id of the user
performing the search. Documents with no ctindex value should be returned
as well.

This is implemented through a filter query that excludes documents that
contain some other value in the ctindex field: fq=(-ctindex:({* TO "MyId"}
OR {"MyId" TO *}))

In 6.5 if q.op=AND I always get 0 results returned when the fq is used.
This wasn't the case in 5.3. If I remove the q.op parameter (or set it to
OR) I get the expected results.

I can reproduce this in the Solr Admin UI. If I enable debugQuery, the
parsed_filter_queries output is different with q.op=AND and with no q.op
parameter:

For q.op=AND I see: ["+(-(SolrRangeQuery(ctindex:{* TO MyId})
SolrRangeQuery(ctindex:{MyId TO *})))"]

With no q.op set I get: ["-(SolrRangeQuery(ctindex:{* TO MyId})
SolrRangeQuery(ctindex:{MyId TO *}))"]

In 5.3 I always get the same parsed_filter_queries output regardless of the
q.op setting: ["-(ctindex:{* TO MyId} ctindex:{MyId TO *})"]

Any idea what is going on, or how to make the behavior of this filter query
independent of the q.op setting?
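
For anyone who wants to reproduce the comparison outside the Admin UI, here is a minimal sketch that builds the two request URLs, one with q.op=AND and one without. The host, port, core name ("mycore"), and the match-all main query are assumptions for illustration; only the fq value comes from the message above.

```python
from urllib.parse import urlencode

# The filter query from the message above, unchanged.
FQ = '(-ctindex:({* TO "MyId"} OR {"MyId" TO *}))'

# Hypothetical endpoint: host, port, and core name are placeholders.
BASE = "http://localhost:8983/solr/mycore/select"


def select_params(q_op=None):
    """Build the URL-encoded query string; q.op is left out when q_op is None."""
    # q=*:* is an assumed match-all main query so only the fq limits results.
    params = [("q", "*:*"), ("fq", FQ), ("debugQuery", "true")]
    if q_op is not None:
        params.append(("q.op", q_op))
    return urlencode(params)


# Print both URLs so the parsed_filter_queries debug output can be compared.
for op in (None, "AND"):
    print(f"{BASE}?{select_params(op)}")
```

Requesting both URLs and comparing the parsed_filter_queries entry in the debug section should show the difference described above.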

More details:
- Using the standard query parser
- The fieldType of the ctindex field is "string"
- I upgraded to 6.5 by copying my 5.3 config files over, updating the
schema version to 1.6 in the schema.xml, updating the luceneMatchVersion to
6.5.0 in the solrconfig.xml, and building a brand new index.

Thanks,
- Andy -

Re: After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/1/2017 9:19 AM, Andy C wrote:
> You state that the best performing query that gives the desired results is:
>> fq=ctindex:myId OR (*:* -ctindex:[* TO *])
> Is this because there is some sort of optimization invoked when you use [* TO
> *], or just because a single range will be more efficient than multiple
> ranges ORed together?

There are fewer query clauses, so it takes less time.  The "all values"
range *might* perform faster than a range with a specific endpoint,
although I'm not familiar enough with the code to say for sure.

> I was considering generating an additional field "ctindex_populated" that
> would contain true or false depending on whether a ctindex value is
> present. And then changing the filter query to:
>
> fq=ctindex_populated:false OR ctindex:myId
>
> Would this be more efficient than your proposed filter query?

Yes.  Probably a lot more efficient.  Boolean fields only have two
possible values, so queries on those fields tend to be extremely fast.
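
A minimal sketch of how the extra field could be filled in before documents are sent to Solr, together with the matching filter string. The function names are hypothetical, documents are modelled as plain dicts, and treating an empty ctindex the same as a missing one is an assumption.

```python
def with_populated_flag(doc):
    """Return a copy of doc with ctindex_populated derived from ctindex."""
    out = dict(doc)
    # Assumption: a missing or empty ctindex both count as "not populated".
    out["ctindex_populated"] = bool(doc.get("ctindex"))
    return out


def user_filter(user_id):
    """Build the fq value for one user, quoting the id as a phrase."""
    # Escape backslashes and double quotes so the id stays inside the quotes.
    escaped = user_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'ctindex_populated:false OR ctindex:"{escaped}"'


docs = [{"id": "1", "ctindex": "MyId"}, {"id": "2"}]
prepared = [with_populated_flag(d) for d in docs]
print(prepared)
print(user_filter("MyId"))
```

Because the filter contains no negative clause, it does not depend on the pure-negative handling discussed elsewhere in this thread.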

Thanks,
Shawn


Re: After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

Posted by Andy C <an...@gmail.com>.
Thanks for the response Shawn.

Adding "*:*" in front of my filter query does indeed resolve the issue. It
seems odd to me that the fully negated query does work if I don't set
q.op=AND. I guess this must be "adding complexity". Actually, I just
discovered that simply removing the extraneous outer parentheses
[ fq=-ctindex:({* TO "MyId"} OR {"MyId" TO *}) ] also resolves the issue.

You state that the best performing query that gives the desired results is:

> fq=ctindex:myId OR (*:* -ctindex:[* TO *])

Is this because there is some sort of optimization invoked when you use [* TO
*], or just because a single range will be more efficient than multiple
ranges ORed together?

I was considering generating an additional field "ctindex_populated" that
would contain true or false depending on whether a ctindex value is
present. And then changing the filter query to:

fq=ctindex_populated:false OR ctindex:myId

Would this be more efficient than your proposed filter query?

Thanks again,
- Andy -

On Mon, May 1, 2017 at 10:19 AM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 4/26/2017 1:04 PM, Andy C wrote:
> > I'm looking at upgrading the version of Solr used with our application from
> > 5.3 to 6.5.
> >
> > Having an issue with a change in the behavior of one of the filter queries
> > we generate.
> >
> > The field "ctindex" is only present in a subset of documents. It basically
> > contains a user id. For those documents where it is present, I only want
> > documents returned where the ctindex value matches the id of the user
> > performing the search. Documents with no ctindex value should be returned
> > as well.
> >
> > This is implemented through a filter query that excludes documents that
> > contain some other value in the ctindex field: fq=(-ctindex:({* TO "MyId"}
> > OR {"MyId" TO *}))
>
> I am surprised that this works in 5.3.  The crux of the problem is that
> fully negative query clauses do not actually work.
>
> Here's the best-performing query that gives you the results you want:
>
> fq=ctindex:myId OR (*:* -ctindex:[* TO *])
>
> The *:* is needed in the second clause to give the query a starting
> point of all documents, from which is subtracted all documents where
> ctindex has a value.  Without the "all docs" starting point, you are
> subtracting from nothing, which yields nothing.
>
> You may notice that this query works perfectly, and wonder why:
>
> fq=-ctindex:[* TO *]
>
> This works because on such a simple query, Solr is able to detect that
> it is fully negated, so it implicitly adds the *:* starting point for
> you.  As soon as you implement any kind of complexity (multiple clauses,
> parentheses, etc) that detection doesn't work.
>
> Thanks,
> Shawn
>
>

Re: After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/26/2017 1:04 PM, Andy C wrote:
> I'm looking at upgrading the version of Solr used with our application from
> 5.3 to 6.5.
>
> Having an issue with a change in the behavior of one of the filter queries
> we generate.
>
> The field "ctindex" is only present in a subset of documents. It basically
> contains a user id. For those documents where it is present, I only want
> documents returned where the ctindex value matches the id of the user
> performing the search. Documents with no ctindex value should be returned
> as well.
>
> This is implemented through a filter query that excludes documents that
> contain some other value in the ctindex field: fq=(-ctindex:({* TO "MyId"}
> OR {"MyId" TO *}))

I am surprised that this works in 5.3.  The crux of the problem is that
fully negative query clauses do not actually work.

Here's the best-performing query that gives you the results you want:

fq=ctindex:myId OR (*:* -ctindex:[* TO *])

The *:* is needed in the second clause to give the query a starting
point of all documents, from which is subtracted all documents where
ctindex has a value.  Without the "all docs" starting point, you are
subtracting from nothing, which yields nothing.
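
The "subtracting from nothing" point can be illustrated with a toy set model. This is only a sketch of the idea, not how Lucene evaluates queries internally: documents are plain ints, each clause is the set of ids it matches, and the sample data is made up.

```python
# Toy index: document id -> ctindex value (None means the field is absent).
docs = {1: "MyId", 2: "OtherId", 3: None, 4: "MyId", 5: None}

all_docs = set(docs)                                        # *:*
has_value = {d for d, v in docs.items() if v is not None}   # ctindex:[* TO *]
mine = {d for d, v in docs.items() if v == "MyId"}          # ctindex:MyId


def boolean_clause(positive, negative):
    """Union the positive clause sets, then subtract each negative set."""
    base = set().union(*positive) if positive else set()
    for excluded in negative:
        base -= excluded
    return base


# (-ctindex:[* TO *]) with no starting point: nothing to subtract from.
pure_negative = boolean_clause([], [has_value])
# (*:* -ctindex:[* TO *]): start from every document, then subtract.
no_value = boolean_clause([all_docs], [has_value])
# ctindex:MyId OR (*:* -ctindex:[* TO *])
result = mine | no_value

print(pure_negative)   # set()
print(sorted(no_value))  # [3, 5]
print(sorted(result))    # [1, 3, 4, 5]
```

In this model the nested pure-negative clause contributes no documents at all, which matches the zero-result symptom, while the anchored version returns the documents without a ctindex value.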

You may notice that this query works perfectly, and wonder why:

fq=-ctindex:[* TO *]

This works because on such a simple query, Solr is able to detect that
it is fully negated, so it implicitly adds the *:* starting point for
you.  As soon as you implement any kind of complexity (multiple clauses,
parentheses, etc) that detection doesn't work.
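
Since the automatic detection only covers the simplest case, one option is to add the *:* starting point explicitly when the filter string is generated. The helper names below are hypothetical, and the check is deliberately naive: it only looks for a leading "-" and is meant for single-term clauses, not arbitrary query syntax.

```python
def anchor_negative(clause):
    """Wrap a clause that starts with '-' as (*:* -...); leave others as-is."""
    clause = clause.strip()
    # Naive check: assumes the clause is a single negated term or range.
    if clause.startswith("-"):
        return f"(*:* {clause})"
    return clause


def or_filter(*clauses):
    """Join clauses with OR after anchoring any purely negative ones."""
    return " OR ".join(anchor_negative(c) for c in clauses)


fq = or_filter("ctindex:myId", "-ctindex:[* TO *]")
print(fq)  # ctindex:myId OR (*:* -ctindex:[* TO *])
```

This reproduces the filter suggested above without relying on Solr to spot the fully negated clause.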

Thanks,
Shawn