Posted to solr-user@lucene.apache.org by suresh pendap <su...@gmail.com> on 2017/08/31 17:35:41 UTC

query with wild card with AND taking lot of time

Hello everybody,

We are seeing that the query below is running very slowly, taking almost 4
seconds to finish:


[<shard7_replica1>] webapp=/solr path=/select
params={df=_text_&distrib=false&fl=id&shards.purpose=4&start=0&fsv=true&sort=modified_dtm+desc&shard.url=http://<host1>:8983/solr/flat_product_index_shard7_replica1/%7Chttp://<host2>:8983/solr/flat_product_index_shard7_replica2/%7Chttp://<host3>:8983/solr/flat_product_index_shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+AND+abstract_or_primary_product_id:*+AND+(gtin:<numericValue>)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW=1504196301534&isShard=true&timeAllowed=25000&wt=javabin}
hits=0 status=0 QTime=3663


It seems like the abstract_or_primary_product_id:* clause is contributing
to the overall response time. It also seems that the
abstract_or_primary_product_id:* clause is not adding any value to the
query criteria and can be safely removed. Is my understanding correct?

I would like to know whether the order of the clauses in the AND query
affects the response time of the query.

For example: f1:3 AND f2:10 AND f3:* vs. f3:* AND f1:3 AND f2:10

Doesn't Lucene/Solr pick the optimal query execution plan?

Is there any way to look at the query execution plan generated by Lucene?

Regards
Suresh

Re: query with wild card with AND taking lot of time

Posted by David Hastings <ha...@gmail.com>.

A field:* query always takes a long time and should be avoided if at all
possible. Solr/Lucene is still going to try to rank the documents based on
that, even though there's nothing to really rank: every single document
where that field is not empty will have the same score for that part of the
ranking. I don't know what the purpose of adding that clause is in your case.


Re: query with wild card with AND taking lot of time

Posted by Josh Lincoln <jo...@gmail.com>.

The closest thing to an execution plan that I know of is debug=true. That'll
show the timings of some of the components.
I also find it useful to add echoParams=all when troubleshooting. That'll
show every param Solr is using for the request, including params set in
solrconfig.xml and not passed in the request. This can help explain the
debug output (e.g. which query parser is being used, whether fields are being
expanded through field aliases, etc.).
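A minimal Python sketch of assembling such a troubleshooting request. The host, the collection name (flat_product_index) and the GTIN value are hypothetical placeholders; only debug=true and echoParams=all come from the suggestion above.

```python
from urllib.parse import urlencode

def build_debug_url(base_url, query, extra_params=()):
    """Build a /select URL with debug output and parameter echoing turned on."""
    params = [
        ("q", query),
        ("debug", "true"),      # debug info such as the parsed query and component timings
        ("echoParams", "all"),  # echo every param, including defaults from solrconfig.xml
    ]
    params.extend(extra_params)
    return base_url + "/select?" + urlencode(params)

# Hypothetical host, collection, and GTIN value.
url = build_debug_url(
    "http://localhost:8983/solr/flat_product_index",
    "product_identifier_type:DOTCOM_OFFER AND gtin:12345",
    [("rows", "11")],
)
print(url)
```

The parsed query in the debug section is often the most useful part, since it shows what the query parser actually turned a clause like field:* into.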


Re: query with wild card with AND taking lot of time

Posted by suresh pendap <su...@gmail.com>.

Thanks, Lincoln, for your suggestions. They were very helpful.

I am still curious as to why the original query is taking so long. It is
something that Lucene should ideally have optimized.
Is there any way to see the execution plan used by Lucene?

Thanks
Suresh



Re: query with wild card with AND taking lot of time

Posted by Shawn Heisey <ap...@elyograg.org>.

On 9/1/2017 5:24 PM, Walter Underwood wrote:
> Hmm. Solr really should convert an fq of “a AND b” to separate “a” and “b” fq filters. That should be a simple special-case rewrite. It might take less time to implement than explaining it to everyone.
>
> Well, I guess then we’d have to explain how it wasn’t really necessary to send separate fq params…

I don't think it's a good idea for Solr to attempt optimizations like
this for everything.  Sure, there are plenty of times that people ask
Solr to do something, not realizing that what they have asked for is a
bad idea, but what about the person who *does* know exactly what they
have asked Solr to do, and actually did intend to do it that way? 
Turning one filter query into five filter queries does sound like a good
idea, but those filters will all run in parallel, which, depending on the
use case, might overwhelm CPU resources.

This is probably going to sound insane, but I like the fact that Solr
gives me plenty of rope with which to hang myself.  It means I have full
access to all the power of the system.  I know that Solr will do exactly
what I tell it to do, even if I don't actually understand all the
instructions I've given it.

Thanks,
Shawn

Re: query with wild card with AND taking lot of time

Posted by Dave <ha...@gmail.com>.

My other concern would be your p's and q's. If you start mixing in Boolean logic and Solr's weak respect for it, it could be unpredictable.

> On Sep 3, 2017, at 5:43 PM, Phil Scadden <P....@gns.cri.nz> wrote:
> 
> 5 seems a reasonable limit to me. After that revert to slow.

RE: query with wild card with AND taking lot of time

Posted by Phil Scadden <P....@gns.cri.nz>.

5 seems a reasonable limit to me. After that, revert to the slower, unsplit behavior.

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Saturday, 2 September 2017 12:01 p.m.
To: solr-user <so...@lucene.apache.org>
Subject: Re: query with wild card with AND taking lot of time

How far would you take that? Say you had 100 terms joined by AND (ridiculous I know, just sayin' ). Then you'd chew up 100 entries in the filterCache.

Notice: This email and any attachments are confidential and may not be used, published or redistributed without the prior written consent of the Institute of Geological and Nuclear Sciences Limited (GNS Science). If received in error please destroy and immediately notify GNS Science. Do not copy or disclose the contents.

Re: query with wild card with AND taking lot of time

Posted by Erick Erickson <er...@gmail.com>.

How far would you take that? Say you had 100 terms joined by AND
(ridiculous, I know, just sayin'). Then you'd chew up 100 entries in
the filterCache.

On Fri, Sep 1, 2017 at 4:24 PM, Walter Underwood <wu...@wunderwood.org> wrote:
> Hmm. Solr really should convert an fq of “a AND b” to separate “a” and “b” fq filters. That should be a simple special-case rewrite. It might take less time to implement than explaining it to everyone.
>
> Well, I guess then we’d have to explain how it wasn’t really necessary to send separate fq params…

Re: query with wild card with AND taking lot of time

Posted by Walter Underwood <wu...@wunderwood.org>.

Hmm. Solr really should convert an fq of “a AND b” to separate “a” and “b” fq filters. That should be a simple special-case rewrite. It might take less time to implement than explaining it to everyone.

Well, I guess then we’d have to explain how it wasn’t really necessary to send separate fq params…

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
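The special-case rewrite proposed here could look roughly like the following Python sketch. It is hypothetical and not something Solr implements: it splits only a flat chain of top-level AND clauses, leaves anything with parentheses, quotes, OR, NOT or local params alone, and falls back to the single original fq past a clause limit (max_clauses=5, the cap suggested elsewhere in this thread) so one request cannot flood the filterCache.

```python
import re

def split_and_filter(fq, max_clauses=5):
    """Hypothetical rewrite of one fq 'a AND b AND c' into separate fq values.

    Returns a list of fq strings. Anything that is not a flat chain of
    top-level AND clauses is returned unchanged as a single-element list.
    """
    # Bail out on constructs this naive splitter cannot reason about.
    if fq.lstrip().startswith("{!") or re.search(r'[()"]|\bOR\b|\bNOT\b|&&|\|\|', fq):
        return [fq]
    clauses = [c.strip() for c in re.split(r"\s+AND\s+", fq)]
    # Too many clauses (or an empty one): keep the original single filter.
    if len(clauses) > max_clauses or not all(clauses):
        return [fq]
    return clauses

print(split_and_filter("featured:Y AND instock:Y"))  # ['featured:Y', 'instock:Y']
```

Falling back to the unsplit fq keeps the behavior identical for anything the splitter does not understand.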


> On Sep 1, 2017, at 2:01 PM, Erick Erickson <er...@gmail.com> wrote:
> 
> Shawn:
> 
> See: https://issues.apache.org/jira/browse/SOLR-7219
> 
> Try fq=filter(foo) filter(bar) filter(baz)
> 
> Patches to docs welcome ;)....

Re: query with wild card with AND taking lot of time

Posted by Erick Erickson <er...@gmail.com>.

Shawn:

See: https://issues.apache.org/jira/browse/SOLR-7219

Try fq=filter(foo) filter(bar) filter(baz)

Patches to docs welcome ;)....
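A Python sketch of the filter(...) syntax Erick points to (SOLR-7219): each filter(...) clause is cached on its own in the filterCache, while the clauses are still combined with Boolean operators inside a single fq. The helper name is hypothetical, and explicit OR is used so the result does not depend on the default operator (q.op).

```python
from urllib.parse import urlencode

def or_of_cached_filters(clauses):
    """Combine clauses into one fq, wrapping each in filter(...) so it is cached separately."""
    return " OR ".join(f"filter({c})" for c in clauses)

fq = or_of_cached_filters(["featured:Y", "instock:Y"])
print(fq)  # filter(featured:Y) OR filter(instock:Y)

# Request parameters for a hypothetical match-all query restricted by that fq.
query_string = urlencode([("q", "*:*"), ("fq", fq)])
```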


Re: query with wild card with AND taking lot of time

Posted by Mikhail Khludnev <mk...@apache.org>.

Shawn, you are welcome:

http://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html
Support for a special filter(…) syntax to...
https://issues.apache.org/jira/browse/SOLR-7219


-- 
Sincerely yours
Mikhail Khludnev

Re: query with wild card with AND taking lot of time

Posted by Shawn Heisey <ap...@elyograg.org>.

On 9/1/2017 8:13 AM, Alexandre Rafalovitch wrote:
> You can OR cachable filter queries in the latest Solr. There is a special
> (filter) syntax for that.

This is actually possible?  If so, I didn't see anything come across the
dev list about it.

I opened an issue for it, didn't know anything had been implemented. 
After I opened the issue, I discovered that I was merely the latest to
do so, it had been requested before.

Can you point to the relevant part of the reference guide and the Jira
issue where the change was committed?

Thanks,
Shawn

Re: query with wild card with AND taking lot of time

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

You can OR cacheable filter queries in the latest Solr. There is a special
filter(...) syntax for that.

Regards,
   Alex



Re: query with wild card with AND taking lot of time

Posted by Josh Lincoln <jo...@gmail.com>.

As I understand it, using a different fq for each clause makes the
resultant caches more likely to be used in future requests.

For the query
fq=first:bob AND last:smith
a subsequent query for
fq=first:tim AND last:smith
won't be able to use the fq cache from the first query.

However, if the first query was
fq=first:bob
fq=last:smith
and subsequently
fq=first:tim
fq=last:smith
then the second query will at least benefit from the last:smith cache.

Because fq clauses are always ANDed, this does not work for ORed clauses.

I suppose if some conditions are frequently used together, it may be better
to put them in the same fq so there's only one cache entry, e.g. if an
e-commerce site regularly queried for featured:Y AND instock:Y.
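A toy Python model of the reuse argument above, assuming a cache keyed by the exact filter string. Solr's real filterCache keys on the parsed query and stores document sets with an eviction policy, so this only illustrates the hit/miss pattern, not the implementation.

```python
class ToyFilterCache:
    """Toy cache keyed by the exact fq string (not Solr's implementation)."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, fq):
        if fq in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[fq] = f"<docset for {fq}>"  # placeholder for a document set
        return self.entries[fq]

def run(cache, fqs):
    for fq in fqs:
        cache.lookup(fq)

combined = ToyFilterCache()
run(combined, ["first:bob AND last:smith"])
run(combined, ["first:tim AND last:smith"])

split = ToyFilterCache()
run(split, ["first:bob", "last:smith"])
run(split, ["first:tim", "last:smith"])

print(combined.hits, combined.misses)  # 0 2
print(split.hits, split.misses)        # 1 3
```

The split version gets one hit but also holds three cache entries instead of two, which is the trade-off raised later in the thread about many ANDed terms filling the filterCache.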


Re: query with wild card with AND taking lot of time

Posted by David Hastings <ha...@gmail.com>.

>
> 2) Because all your clauses are more like filters and are ANDed together,
> you'll likely get better performance by putting them _each_ in an fq
> E.g.
> fq=product_identifier_type:DOTCOM_OFFER
> fq=abstract_or_primary_product_id:[* TO *]


Why is this the case? Is it just better to have no logic operators in the
filter queries?




Re: query with wild card with AND taking lot of time

Posted by Josh Lincoln <jo...@gmail.com>.

Suresh,
Two things I noticed.
1) If your intent is to only match records where there's something,
anything, in abstract_or_primary_product_id, you should use
fieldname:[* TO *], but that will exclude records where that field is
empty/missing. If you want to match records even if that field is
empty/missing, then you should remove that clause entirely.
2) Because all your clauses are more like filters and are ANDed together,
you'll likely get better performance by putting them _each_ in an fq,
e.g.
fq=product_identifier_type:DOTCOM_OFFER
fq=abstract_or_primary_product_id:[* TO *]
fq=gtin:<numericValue>
fq=-product_class_type:BUNDLE
fq=-hasProduct:N
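Applied to the original request, that advice might look like this Python sketch of the parameters (not a tested configuration). q=*:* is an assumption for a filter-only search, the GTIN value is a hypothetical stand-in for <numericValue>, and the negations from the original q (-product_class_type:BUNDLE, -hasProduct:N) are kept so the filters are intended to match the same documents.

```python
from urllib.parse import urlencode

gtin = "00012345678905"  # hypothetical value standing in for <numericValue>

# One fq per ANDed clause; urlencode emits a separate fq= for each list entry.
params = [
    ("q", "*:*"),
    ("fq", "product_identifier_type:DOTCOM_OFFER"),
    ("fq", "abstract_or_primary_product_id:[* TO *]"),
    ("fq", f"gtin:{gtin}"),
    ("fq", "-product_class_type:BUNDLE"),
    ("fq", "-hasProduct:N"),
    ("sort", "modified_dtm desc"),
    ("fl", "id"),
    ("rows", "11"),
]
query_string = urlencode(params)
print(query_string)
```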
