Posted to solr-user@lucene.apache.org by Chris Dempsey <cd...@gmail.com> on 2020/07/09 18:04:51 UTC

Multiple fq vs combined fq performance

Hi all! In a collection where we have ~54 million documents, we've noticed
that a query with the following filters:

"fq":["{!cache=false}_class:taggedTickets",
      "{!cache=false}taggedTickets_ticketId:1000000241",
      "{!cache=false}companyId:22476"]

for which debugQuery shows:

"parsed_filter_queries":[
  "{!cache=false}_class:taggedTickets",
  "{!cache=false}IndexOrDocValuesQuery(taggedTickets_ticketId:[1000000241 TO 1000000241])",
  "{!cache=false}IndexOrDocValuesQuery(companyId:[22476 TO 22476])"
]

runs in roughly 450ms, but if we remove `{!cache=false}companyId:22476` it
drops to ~5ms (it's important to note that `taggedTickets_ticketId` is
globally unique).

If we change the fqs to:

"fq":["{!cache=false}_class:taggedTickets",
      "{!cache=false}+companyId:22476 +taggedTickets_ticketId:1000000241"]

for which debugQuery shows:

"parsed_filter_queries":[
   "{!cache=false}_class:taggedTickets",
   "{!cache=false}+IndexOrDocValuesQuery(companyId:[22476 TO 22476]) +IndexOrDocValuesQuery(taggedTickets_ticketId:[1000000241 TO 1000000241])"
]

we get the correct result back in ~5ms.

My current thought is that in the slow scenario Solr is still running
`{!cache=false}IndexOrDocValuesQuery(companyId:[22476 TO 22476])` even
though it "has the answer" from the first two fqs.

Am I off base, or am I misunderstanding how multiple `fq`s are processed?
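To reproduce the comparison, both variants can be sent as repeated `fq` parameters; Solr intersects the results of all `fq` parameters, so the two requests should return the same documents. The sketch below (Python, standard library only) just builds the two `/select` URLs with `debugQuery=true`. The host, the collection name `tickets`, and the `q=*:*` main query are assumptions, since the thread does not show them.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint: the thread names neither the host nor the collection.
SELECT_URL = "http://localhost:8983/solr/tickets/select"

# Slow variant: three non-cached filters, each sent as its own fq parameter.
SEPARATE_FQS = [
    "{!cache=false}_class:taggedTickets",
    "{!cache=false}taggedTickets_ticketId:1000000241",
    "{!cache=false}companyId:22476",
]

# Fast variant: the two numeric clauses folded into one fq with required (+) clauses.
COMBINED_FQS = [
    "{!cache=false}_class:taggedTickets",
    "{!cache=false}+companyId:22476 +taggedTickets_ticketId:1000000241",
]


def select_url(fqs, q="*:*"):
    """Build a /select URL with one repeated fq parameter per filter and debugQuery on."""
    params = [("q", q)]  # q=*:* is an assumption; the thread does not show the main query
    params += [("fq", fq) for fq in fqs]
    # debugQuery=true makes Solr include parsed_filter_queries in the debug section.
    params.append(("debugQuery", "true"))
    # urlencode percent-encodes each value, so "+" becomes %2B and spaces become "+".
    return SELECT_URL + "?" + urlencode(params)


def fq_values(url):
    """Decode the fq parameters back out of a URL (handy for checking the encoding)."""
    return parse_qs(urlsplit(url).query)["fq"]


if __name__ == "__main__":
    print("separate:", select_url(SEPARATE_FQS))
    print("combined:", select_url(COMBINED_FQS))
```

Encoding the values explicitly matters here because an unescaped `+` in `+companyId:22476` would otherwise be decoded by the server as a space.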

Re: Multiple fq vs combined fq performance

Posted by Tomás Fernández Löbbe <to...@gmail.com>.

All non-cached filters will be executed together (leapfrogging between them)
and will be sorted by filter cost (I guess that, since you aren't setting a
cost, the order of the input matters). You can try setting a cost on your
filters (lower than 100, so that they don't become post filters).
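The leapfrog execution described above can be pictured with a toy model. Each filter is a sorted list of matching doc IDs. Filters are sorted by cost (ties keep their input order), the cheapest one leads, and every other filter is asked to advance to the lead's current doc. This is a simplified Python illustration, not Solr's or Lucene's actual code; the `Filter` class, the step counter, and the doc IDs are hypothetical. It only models iteration order. It does not model how expensive it is to build each filter's iterator in the first place.

```python
from bisect import bisect_left

NO_MORE_DOCS = 2**31 - 1  # Lucene's end-of-iteration sentinel (Integer.MAX_VALUE)


class Filter:
    """Toy non-cached filter: sorted matching doc IDs plus a user-assigned cost."""

    def __init__(self, name, doc_ids, cost=0):
        self.name = name
        self.docs = sorted(doc_ids)
        self.cost = cost
        self.i = 0       # position of the current doc in self.docs
        self.steps = 0   # how many times this filter was moved forward

    def current(self):
        return self.docs[self.i] if self.i < len(self.docs) else NO_MORE_DOCS

    def advance(self, target):
        """Skip to the first matching doc >= target using binary search."""
        self.steps += 1
        self.i = bisect_left(self.docs, target, self.i)
        return self.current()

    def next_doc(self):
        self.steps += 1
        self.i += 1
        return self.current()


def leapfrog(filters):
    """Intersect filters: the cheapest one leads, the others advance to the lead's doc."""
    ordered = sorted(filters, key=lambda f: f.cost)  # stable: equal costs keep input order
    lead, others = ordered[0], ordered[1:]
    hits = []
    doc = lead.current()
    while doc != NO_MORE_DOCS:
        for f in others:
            found = f.advance(doc)
            if found != doc:
                # f has no match at doc; jump the lead forward to where f landed.
                doc = lead.advance(found)
                break
        else:
            hits.append(doc)  # every filter matched this doc
            doc = lead.next_doc()
    return hits


def make_filters(costs=(0, 0, 0)):
    """Hypothetical 100,000-doc index in which only doc 4230 passes all three filters."""
    class_cost, company_cost, ticket_cost = costs
    return [
        Filter("_class:taggedTickets", range(0, 100_000, 3), class_cost),
        Filter("companyId:22476", range(0, 100_000, 10), company_cost),
        Filter("taggedTickets_ticketId:1000000241", [4230], ticket_cost),
    ]


if __name__ == "__main__":
    for label, costs in (("input order", (0, 0, 0)), ("ticket ID cheapest", (2, 2, 1))):
        filters = make_filters(costs)
        hits = leapfrog(filters)
        print(label, hits, {f.name: f.steps for f in filters})
```

With these made-up costs, the unique ticket-ID filter leads, so each filter is moved forward exactly once. With equal costs, the dense `_class` filter leads instead, and the loop does more leapfrogging before it finishes.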

One other thing, though: I guess you are using Point fields? If you
typically query for a single value like in this example (vs. ranges), you
may want to use string fields for those. See
https://issues.apache.org/jira/browse/SOLR-11078.
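One way to switch a field to a string type is the Schema API's `replace-field` command. This assumes the collection uses a managed schema and has a StrField type named `string`, as in Solr's `_default` configset. The Python sketch below mainly builds the JSON body. The collection URL `http://localhost:8983/solr/tickets/schema` is hypothetical, and existing documents must be reindexed after a field's type changes.

```python
import json
import urllib.request

# Hypothetical collection URL; adjust host and collection name.
SCHEMA_URL = "http://localhost:8983/solr/tickets/schema"


def replace_with_string(field_name, stored=True):
    """Schema API body that redefines field_name as a single-valued string field."""
    return {
        "replace-field": {
            "name": field_name,
            "type": "string",   # assumes a StrField type named "string" exists
            "indexed": True,    # exact-match lookups then use the inverted index
            "docValues": True,
            "stored": stored,
        }
    }


def post_schema_change(body, url=SCHEMA_URL):
    """POST one schema command; only call this on a collection you plan to reindex."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the bodies only; nothing is sent to a server here.
    for field in ("companyId", "taggedTickets_ticketId"):
        print(json.dumps(replace_with_string(field)))
```

Because the field names stay the same, filters such as `companyId:22476` would not need to change after reindexing.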




On Fri, Jul 10, 2020 at 7:51 AM Chris Dempsey <cd...@gmail.com> wrote:


Re: Multiple fq vs combined fq performance

Posted by Chris Dempsey <cd...@gmail.com>.

Thanks for the suggestion, Alex. It doesn't appear that
IndexOrDocValuesQuery (at least in Solr 7.7.1) supports the PostFilter
interface. I've tried various cost values on each of the fqs, and none of
them changes the QTime.

So, after digging around a bit: even though
`{!cache=false}taggedTickets_ticketId:1000000241` matches one and only one
document in the collection, that doesn't matter for the other two fqs, which
continue to look over the index of the collection. Correct?

On Thu, Jul 9, 2020 at 4:24 PM Alexandre Rafalovitch <ar...@gmail.com>
wrote:


Re: Multiple fq vs combined fq performance

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

I _think_ it will run all 3 and then do index hopping. But if you know one
fq is super expensive, you could assign it a cost. A value of 100 or more
will then try to use PostFilter and apply the query on top of the results
from the other queries.


https://lucene.apache.org/solr/guide/8_4/common-query-parameters.html#cache-parameter
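As a sketch of this suggestion, the `cost` local param can be written next to `cache=false` in each `fq`. The helpers below are hypothetical and only build strings; the cost values are illustrative. According to the linked page, a filter runs as a post filter only when `cache=false` is set, `cost` is 100 or more, and the query type implements Solr's `PostFilter` interface. `IndexOrDocValuesQuery` is a Lucene query that does not implement that interface, so filters like these stay ordinary filters whatever cost they get.

```python
def with_local_params(query, cache=True, cost=None):
    """Prefix a filter query with Solr local params, e.g. {!cache=false cost=5}field:value."""
    params = []
    if not cache:
        params.append("cache=false")
    if cost is not None:
        params.append("cost=" + str(cost))
    if not params:
        return query
    return "{!" + " ".join(params) + "}" + query


def may_post_filter(cache, cost):
    """True when cache=false and cost >= 100; the query must still implement PostFilter."""
    return (not cache) and cost is not None and cost >= 100


# Hypothetical costs, listed cheapest first, all below 100 so none becomes a post filter.
FILTERS = [
    ("taggedTickets_ticketId:1000000241", 1),
    ("_class:taggedTickets", 10),
    ("companyId:22476", 20),
]

if __name__ == "__main__":
    for query, cost in FILTERS:
        assert not may_post_filter(cache=False, cost=cost)
        print(with_local_params(query, cache=False, cost=cost))
```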

Hope it helps,
    Alex.

On Thu., Jul. 9, 2020, 2:05 p.m. Chris Dempsey, <cd...@gmail.com> wrote:
