Posted to solr-user@lucene.apache.org by Gael Jourdan-Weil <ga...@kelkoogroup.com> on 2020/01/13 10:31:36 UTC

Querying multiple pages for same keyword at same time

Hello,

We are experiencing some performance issues on Solr that seem related to requests querying multiple pages of results for the same keyword at the same time.
For instance, querying 10 pages of results (with 50 or 100 results per page) in the same second for a given keyword, and doing the same for other keywords at the same time.

The performance issues we observe are high CPU usage and sharply increasing response times.

This doesn't seem related to the number of requests itself, because we can handle many more requests per second when there are no such requests.

Do you think this makes sense and can be explained by the way Solr works?

Environment: SolrCloud 7.6.0

Gaël

RE: Querying multiple pages for same keyword at same time

Posted by Gael Jourdan-Weil <ga...@kelkoogroup.com>.
Indeed, with a maximum of 1K docs to be manipulated, I don't expect issues.
We are looking at other avenues to understand our issues.

Regards,
Gaël

Re: Querying multiple pages for same keyword at same time

Posted by Erick Erickson <er...@gmail.com>.
Conceptually, asking for docs 900-1000 works something like this. Solr (well, Lucene actually) has to keep a sorted list 1,000 items long of scores and doc IDs, because you can’t know in advance whether doc N+1 will be in the list, or where. So the list manipulation is what takes the extra time. For even 1,000 docs, that shouldn’t be very much overhead; when it gets up into the tens of thousands (or, as I’ve seen, millions) it’s _very_ noticeable.

With the example you’ve talked about, I doubt this is really a problem.
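The sorted-list idea above can be illustrated with a small sketch. This is not Lucene's actual collector code; it is a minimal Python model, assuming a bounded min-heap and made-up names (`top_n`, `page`, random scores), that shows why serving docs 900-1000 still means ranking the top 1,000 entries.

```python
import heapq
import random

def top_n(scored_docs, n):
    """Return the n best (score, doc_id) pairs, best first, keeping at most n entries."""
    heap = []  # min-heap: heap[0] is the weakest entry currently kept
    for doc_id, score in scored_docs:
        entry = (score, doc_id)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new doc beats the weakest kept entry, so it takes its place.
            heapq.heapreplace(heap, entry)
    return sorted(heap, reverse=True)

def page(scored_docs, start, rows):
    # To serve rows start..start+rows, the top (start + rows) entries are ranked first.
    return top_n(scored_docs, start + rows)[start:start + rows]

random.seed(0)
# Hypothetical index: 10,000 matching docs with random scores.
docs = [(doc_id, random.random()) for doc_id in range(10_000)]
last_page = page(docs, start=900, rows=100)
```

Every matching doc is compared against the weakest entry in the list, and the list grows with `start + rows`, which is why the overhead stays small at 1,000 entries and becomes noticeable at much larger offsets.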

FWIW,
Erick



RE: Querying multiple pages for same keyword at same time

Posted by Gael Jourdan-Weil <ga...@kelkoogroup.com>.
Ok I understand better.
Solr does not "read" the 1 to 900 docs to retrieve 901 to 1000 but it still needs to compute some stuff (docset intersection or something like that, right?) and sort, which is costly, and then "read" the docs.

> Are those 10 requests happening simultaneously, or consecutively?  If 
> it's simultaneous, then they won't benefit from Solr caching.  Because 
> Solr can cache certain things, it would probably be faster to make 10 
> consecutive requests than 10 simultaneous.

The 10 requests are simultaneous, which I think explains the issues we encounter. If they were consecutive, I'd indeed expect them to benefit from the cache.

> What are you trying to accomplish when you make these queries?  If we 
> understand that, perhaps we can come up with something better.

Actually, we are exposing a search engine, and this behavior comes from some of our clients.
It's not something we do deliberately or encourage.
But before discussing it with them, we wanted to understand a bit better what in Solr explains those response times.

Regards,
Gaël


Re: Querying multiple pages for same keyword at same time

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/13/2020 11:53 AM, Gael Jourdan-Weil wrote:
> Just to clarify something, we are not returning 1000 docs per request, we are only returning 100.
> We get 10 requests to Solr querying for docs 1 to 100, then 101 to 200, ... until 901 to 1000.
> But all that in the exact same second.
> 
> But I understand that to retrieve docs 901 to 1000, Solr needs to first get and sort the first 900 docs, so the request to get 901 to 1000 is as costly as asking for 1 to 1000 directly?
> If the sort applies on an indexed field (isn't it mandatory?), why does Solr need to read the first 900 docs?

In order to get the 10th page, it must sort to determine the IDs for the 
top 1000, skip 900 of them, and then retrieve the last 100.  So the 
query portion (not counting document retrieval) for page 10 has nearly 
the same cost as asking for all 1000 in the same request.

Asking for the first 100 involves only the top 100 documents.  Then 
because the request for the next 100 must obtain the top 200, it is a 
little bit slower.  The third request must obtain the top 300, so it's 
slower again.  And so on.
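The arithmetic behind this can be made concrete with a short sketch. It assumes 1-based page numbers and Solr's usual `start`/`rows` paging parameters; the helper name `ranked_entries` is made up for illustration.

```python
def ranked_entries(page_number, rows):
    """Number of top-ranked entries needed to serve a 1-based page."""
    start = (page_number - 1) * rows  # value of the start parameter
    return start + rows

rows = 100
# Entries ranked for each of the ten pages requested separately.
per_page = [ranked_entries(p, rows) for p in range(1, 11)]
total_paged = sum(per_page)
# Entries ranked when all 1,000 docs are requested in one go.
single_request = ranked_entries(1, 1000)

print(per_page)
print(total_paged, single_request)
```

Under these assumptions the ten separate page requests rank 5,500 entries in total, against 1,000 for a single request covering the same documents.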

Are those 10 requests happening simultaneously, or consecutively?  If 
it's simultaneous, then they won't benefit from Solr caching.  Because 
Solr can cache certain things, it would probably be faster to make 10 
consecutive requests than 10 simultaneous.

What are you trying to accomplish when you make these queries?  If we 
understand that, perhaps we can come up with something better.

Thanks,
Shawn

RE: Querying multiple pages for same keyword at same time

Posted by Gael Jourdan-Weil <ga...@kelkoogroup.com>.
Thanks for your answer Erick.

Just to clarify something, we are not returning 1000 docs per request, we are only returning 100.
We get 10 requests to Solr querying for docs 1 to 100, then 101 to 200, ... until 901 to 1000.
But all that in the exact same second.

But I understand that to retrieve docs 901 to 1000, Solr needs to first get and sort the first 900 docs, so the request to get 901 to 1000 is as costly as asking for 1 to 1000 directly?
If the sort applies on an indexed field (isn't it mandatory?), why does Solr need to read the first 900 docs?

Regards,
Gaël



Re: Querying multiple pages for same keyword at same time

Posted by Erick Erickson <er...@gmail.com>.
To return stored values, Lucene must
1> read the stored values from disk
2> decompress a minimum 16K block
3> assemble the return packet.

So if you’re returning 500-1,000 documents per request, it may just be the above set of steps. Solr was never designed to _return_ large result sets. Search them, yes, but not return them. So if this never happens when you only return a few docs, this is probably your problem.
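As a rough upper bound on the decompression step, the sketch below assumes the worst case in which every returned document sits in a different compressed block. The 16K figure comes from the steps above; the function name and the worst-case assumption are illustrative only.

```python
BLOCK_KB = 16  # minimum block size quoted in the steps above

def worst_case_decompressed_kb(docs_returned, block_kb=BLOCK_KB):
    # Assumption: no two returned documents share a compressed block.
    return docs_returned * block_kb

one_page_kb = worst_case_decompressed_kb(100)
ten_pages_kb = 10 * worst_case_decompressed_kb(100)
one_big_request_kb = worst_case_decompressed_kb(1000)
```

In this model the decompression cost depends only on how many documents are returned, not on how deep the page is, so it is separate from the cost of ranking.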

There are two ways of making this less work for Solr; both depend on returning only docValues="true" fields.
1> return only docValues fields. See useDocValuesAsStored.
2> use the /export handler.
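For the second option, a request to the /export handler might be built as sketched below. The host, the collection name `products`, the query and the field names are all hypothetical. The sketch assumes the handler is given `q`, `sort` and `fl` parameters and that the sort and returned fields have docValues enabled.

```python
from urllib.parse import urlencode

def export_url(base_url, collection, query, sort, fields):
    """Build a URL for the /export handler of a collection."""
    params = {
        "q": query,
        "sort": sort,             # field used for ordering the exported docs
        "fl": ",".join(fields),   # fields to return, docValues fields only
    }
    return f"{base_url}/{collection}/export?{urlencode(params)}"

# Hypothetical host, collection, query and fields.
url = export_url(
    "http://localhost:8983/solr",
    "products",
    "title:shoes",
    "id asc",
    ["id", "price"],
)
print(url)
```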

Best,
Erick
