Posted to dev@lucene.apache.org by Toke Eskildsen <te...@statsbiblioteket.dk> on 2015/09/29 10:00:53 UTC

Sanity checking in Solr

Yesterday I helped solve a performance problem, triggered by issuing
requests with rows=2147483647 on an index with 3M documents.

In this concrete case the fix was easy, as it was possible to lower this
to rows=10. But it had stumped the person asking for weeks - the typical
number of hits was 0 or 1, so he had assumed that the large number in
rows did not have a performance impact.


This got me thinking: What about adding a debug=sanity option to Solr
requests? It could inspect the concrete request as well as the index
layout and issue warnings where appropriate. Checks could be

* rows > X
* facet.limit > X
* facet.limit=-1 and unique values in facet field > X
* facet.method=enum and unique values in facet field > X
* (filterCache_size * maxDoc/8) > (X * heap_size) - see the sketch below
* facet.field=A and A is a StrField without DocValues
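
To make the filterCache check concrete, here is a rough standalone
sketch of how it could work. Everything in it - the class name, the
cache size, the 50% threshold - is made up for illustration; this is
not actual Solr code:

import java.util.ArrayList;
import java.util.List;

public class FilterCacheSanityCheck {
  public static void main(String[] args) {
    long maxDoc = 3_000_000;     // documents in the index
    long filterCacheSize = 512;  // configured filterCache entry limit
    long heapBytes = Runtime.getRuntime().maxMemory();
    double x = 0.5;              // assumed X: warn above half the heap

    List<String> warnings = new ArrayList<>();

    // Each cached filter can hold a bitset of maxDoc bits, i.e.
    // maxDoc/8 bytes, so a full cache may occupy
    // filterCacheSize * maxDoc/8 bytes in the worst case.
    long worstCaseBytes = filterCacheSize * (maxDoc / 8);
    if (worstCaseBytes > x * heapBytes) {
      warnings.add("Potential problem: filterCache can grow to "
          + worstCaseBytes + " bytes, more than half of the heap");
    }

    warnings.forEach(System.out::println);
  }
}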

I am sure we can come up with more. My point is that some parts of
troubleshooting Solr performance problems are easily definable and can
be fully automated. Of course some of these will be false positives, but
such is the nature of looking for warning signs.

As this would be primarily for people not familiar with the inner
workings of Solr, some explanations would be needed:

# Potential problem: rows=2147483647
# Explanation: Specifying a number larger than 10,000 for rows can lead 
# to high CPU load and slow response times, even if the number of hits
# in the search result is low.
# Technical: A high row count makes Solr allocate min(rows, maxDoc) 
# ScoreDoc objects temporarily, which can trigger excessive garbage
# collection.
# Alternative: Use pagination
# (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results)
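
For reference, deep paging with cursorMark (available since Solr 4.7
and documented at the link above) replaces large rows values. The
field names below are just examples; the sort must end on the
uniqueKey field:

  q=foo&rows=10&sort=timestamp desc,id asc&cursorMark=*

Each follow-up request then passes the nextCursorMark value returned
in the previous response instead of *.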


- Toke Eskildsen, State and University Library, Denmark





Re: Sanity checking in Solr

Posted by Erick Erickson <er...@gmail.com>.
Yeah, this would be totally cool. Personally, if it fits into the
debug component, I'd advocate adding it automatically when
debug=all/true, assuming it isn't all that expensive at run time.
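
For example (hypothetical behaviour, and the collection name is made
up), a request like

  /solr/collection1/select?q=*:*&rows=2147483647&debug=all

would then return the sanity warnings alongside the existing timing
and explain debug sections.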


On Tue, Sep 29, 2015 at 11:26 PM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
> On Tue, 2015-09-29 at 13:30 +0100, Upayavira wrote:
>> Could it be added to the debug component? That seems like a natural
>> place for it. It could, as you say, look for standard things that
>> might make a query perform badly, and report them in a new <sanity>
>> element, or such.
>
> Agreed. I'll take a closer look at how the debug mechanism ties into
> Solr. If sanity checking fits well, I'll try and make a proof of concept
> and a JIRA.
>
> - Toke Eskildsen, State and University Library, Denmark



Re: Sanity checking in Solr

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2015-09-29 at 13:30 +0100, Upayavira wrote:
> Could it be added to the debug component? That seems like a natural
> place for it. It could, as you say, look for standard things that
> might make a query perform badly, and report them in a new <sanity>
> element, or such.

Agreed. I'll take a closer look at how the debug mechanism ties into
Solr. If sanity checking fits well, I'll try and make a proof of concept
and a JIRA.

- Toke Eskildsen, State and University Library, Denmark






Re: Sanity checking in Solr

Posted by Upayavira <uv...@odoko.co.uk>.
Could it be added to the debug component? That seems like a natural
place for it. It could, as you say, look for standard things that
might make a query perform badly, and report them in a new <sanity>
element, or such.
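
Purely as a sketch of how that could render in the XML response - none
of this exists yet, and the names are invented:

  <lst name="debug">
    <lst name="sanity">
      <str name="rows">rows=2147483647 may cause excessive allocation;
        consider pagination</str>
    </lst>
  </lst>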

Upayavira

On Tue, Sep 29, 2015, at 01:24 PM, Mikhail Khludnev wrote:
> Hi Toke! What a cool idea!
>
> On Tue, Sep 29, 2015 at 11:00 AM, Toke Eskildsen
> <te...@statsbiblioteket.dk> wrote:
>> Yesterday I helped solve a performance problem, triggered by issuing
>> requests with rows=2147483647 on an index with 3M documents.
>>
>> In this concrete case the fix was easy, as it was possible to lower this
>> to rows=10. But it had stumped the person asking for weeks - the typical
>> number of hits was 0 or 1, so he had assumed that the large number in
>> rows did not have a performance impact.
>>
>> This got me thinking: What about adding a debug=sanity option to Solr
>> requests? It could inspect the concrete request as well as the index
>> layout and issue warnings where appropriate. Checks could be
>>
>> * rows > X
>> * facet.limit > X
>> * facet.limit=-1 and unique values in facet field > X
>> * facet.method=enum and unique values in facet field > X
>> * (filterCache_size * maxDoc/8) > (X * heap_size)
>> * facet.field=A and A is a StrField without DocValues
>>
>> I am sure we can come up with more. My point is that some parts of
>> troubleshooting Solr performance problems are easily definable and can
>> be fully automated. Of course some of these will be false positives, but
>> such is the nature of looking for warning signs.
>>
>> As this would be primarily for people not familiar with the inner
>> workings of Solr, some explanations would be needed:
>>
>> # Potential problem: rows=2147483647
>> # Explanation: Specifying a number larger than 10,000 for rows can lead
>> # to high CPU load and slow response times, even if the number of hits
>> # in the search result is low.
>> # Technical: A high row count makes Solr allocate min(rows, maxDoc)
>> # ScoreDoc objects temporarily, which can trigger excessive garbage
>> # collection.
>> # Alternative: Use pagination
>> # (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results)
>>
>> - Toke Eskildsen, State and University Library, Denmark
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics

Re: Sanity checking in Solr

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hi Toke!
What a cool idea!

On Tue, Sep 29, 2015 at 11:00 AM, Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> Yesterday I helped solve a performance problem, triggered by issuing
> requests with rows=2147483647 on an index with 3M documents.
>
> In this concrete case the fix was easy, as it was possible to lower this
> to rows=10. But it had stumped the person asking for weeks - the typical
> number of hits was 0 or 1, so he had assumed that the large number in
> rows did not have a performance impact.
>
>
> This got me thinking: What about adding a debug=sanity option to Solr
> requests? It could inspect the concrete request as well as the index
> layout and issue warnings where appropriate. Checks could be
>
> * rows > X
> * facet.limit > X
> * facet.limit=-1 and unique values in facet field > X
> * facet.method=enum and unique values in facet field > X
> * (filterCache_size * maxDoc/8) > (X * heap_size)
> * facet.field=A and A is a StrField without DocValues
>
> I am sure we can come up with more. My point is that some parts of
> troubleshooting Solr performance problems are easily definable and can
> be fully automated. Of course some of these will be false positives, but
> such is the nature of looking for warning signs.
>
> As this would be primarily for people not familiar with the inner
> workings of Solr, some explanations would be needed:
>
> # Potential problem: rows=2147483647
> # Explanation: Specifying a number larger than 10,000 for rows can lead
> # to high CPU load and slow response times, even if the number of hits
> # in the search result is low.
> # Technical: A high row count makes Solr allocate min(rows, maxDoc)
> # ScoreDoc objects temporarily, which can trigger excessive garbage
> # collection.
> # Alternative: Use pagination
> # (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results)
>
>
> - Toke Eskildsen, State and University Library, Denmark
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
