Posted to issues@solr.apache.org by "Jan Høydahl (Jira)" <ji...@apache.org> on 2021/04/15 11:54:00 UTC

[jira] [Commented] (SOLR-15252) Solr should log WARN log when a query requests huge rows number

    [ https://issues.apache.org/jira/browse/SOLR-15252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322120#comment-17322120 ] 

Jan Høydahl commented on SOLR-15252:
------------------------------------

See the PR for a first attempt. In this first round I only log a WARN in Solr whenever rows > 100,000. The log message is
{quote}Very high 'rows' parameter detected. This may lead to performance and memory problems. Consider pagination, see [https://solr.apache.org/guide/pagination-of-results.html]. This warning will be muted for 60s.
{quote}
This log will appear once per node, then be muted for 60s, then appear once again, and so on. So on a busy system it will be quite annoying, and people will hopefully change their habits :) I agree that 10,000 is an anti-pattern as well, but for small documents it may work very well and have no GC consequences at all. Once you get into the hundreds of thousands or millions, though, the risk of all kinds of issues is much higher, so this is a tradeoff.
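Roughly, the check amounts to something like the sketch below. This is not the actual PR code; the class name, field names and threshold constant are made up for illustration, but it shows the intent: a per-node WARN that is muted for 60 seconds between emissions.
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Hypothetical sketch: warn at most once per 60s window when 'rows' is very high. */
public class HighRowsWarner {
  private static final Logger log = LoggerFactory.getLogger(HighRowsWarner.class);
  private static final int ROWS_WARN_THRESHOLD = 100_000;           // assumed threshold
  private static final long MUTE_NANOS = TimeUnit.SECONDS.toNanos(60);

  // Timestamp of the last emitted warning; static field => one mute window per node (JVM).
  // Initialized one window in the past so the very first offending request does warn.
  private static final AtomicLong lastWarn = new AtomicLong(System.nanoTime() - MUTE_NANOS);

  public static void maybeWarn(int rows) {
    if (rows <= ROWS_WARN_THRESHOLD) {
      return;
    }
    long now = System.nanoTime();
    long prev = lastWarn.get();
    // Only one thread wins the CAS per mute window, so the log stays quiet for 60s afterwards.
    if (now - prev >= MUTE_NANOS && lastWarn.compareAndSet(prev, now)) {
      log.warn("Very high 'rows' parameter detected. This may lead to performance and memory problems. "
          + "Consider pagination, see https://solr.apache.org/guide/pagination-of-results.html. "
          + "This warning will be muted for 60s.");
    }
  }
}
{code}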

Is there already a Jira for making Lucene smarter so it does not pre-allocate arrays for huge responses? Because right now this warning serves two purposes: one is the wasteful RAM usage even if you only get a few hits, which causes GC pauses; the other is the anti-pattern of not using paging. If Lucene gets smarter, we can re-phrase this warning.
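For reference, the alternative the log message points clients to is cursor-based paging. A minimal SolrJ sketch (the collection name, page size and sort field here are assumptions for illustration, not anything mandated by the PR) would look roughly like this:
{code:java}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorPagingExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(500);                               // small pages instead of rows=10000000
      q.setSort(SolrQuery.SortClause.asc("id"));    // cursorMark requires a sort including the uniqueKey
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = client.query("collection1", q);
        for (SolrDocument doc : rsp.getResults()) {
          // process doc
        }
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) {
          break;                                    // cursor did not advance => all results fetched
        }
        cursor = next;
      }
    }
  }
}
{code}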

 

> Solr should log WARN log when a query requests huge rows number
> ---------------------------------------------------------------
>
>                 Key: SOLR-15252
>                 URL: https://issues.apache.org/jira/browse/SOLR-15252
>             Project: Solr
>          Issue Type: Improvement
>          Components: query
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>            Priority: Major
>
> We have all seen it - clients that use Integer.MAX_VALUE or 10000000 as the rows parameter, just to make sure they get all possible results. This of course leads to long GC pauses, since Lucene allocates an array up front to hold the results.
> Solr should either log a WARN when it encounters a value above a certain threshold, such as 100k (at that point you should use cursorMark instead), or simply respond with a 400 error and offer a system property or query parameter folks can use to override it if they know what they are doing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org