Posted to solr-dev@lucene.apache.org by "Lars Kotthoff (JIRA)" <ji...@apache.org> on 2008/04/04 09:35:26 UTC

[jira] Created: (SOLR-534) Return all query results with parameter rows=-1

Return all query results with parameter rows=-1
-----------------------------------------------

                 Key: SOLR-534
                 URL: https://issues.apache.org/jira/browse/SOLR-534
             Project: Solr
          Issue Type: New Feature
          Components: search
    Affects Versions: 1.3
         Environment: Tomcat 5.5
            Reporter: Lars Kotthoff
            Priority: Minor
         Attachments: solr-all-results.patch

The searcher should return all results matching a query when the parameter rows=-1 is given.

I know that it is a bad idea to do this in general, but as it explicitly requires a special parameter, people using this feature will be aware of what they are doing. The main use case for this feature is probably debugging, but in some cases one might actually need to retrieve all results, for example because they are to be merged with results from different sources.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-534) Return all query results with parameter rows=-1

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-534:
-------------------------------

    Attachment: solr-all-results.patch

Patch adding the feature. If rows is negative, all results are returned.



[jira] Commented: (SOLR-534) Return all query results with parameter rows=-1

Posted by "Walter Underwood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832351#action_12832351 ] 

Walter Underwood commented on SOLR-534:
---------------------------------------

-1

This adds a denial of service vulnerability to Solr. One query can use lots of CPU or memory, or even crash the server.

This could also take out an entire distributed system.

If this is added, we MUST add a config option to disable it.

Let's take this back to the mailing list and find out why they believe all results are needed. There must be a better way to solve this.



[jira] Commented: (SOLR-534) Return all query results with parameter rows=-1

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585675#action_12585675 ] 

Hoss Man commented on SOLR-534:
-------------------------------

+0

rows=$REALLY_BIG_NUMBER works just as well, makes people just as aware of what they are doing, and helps protect people who *think* they are aware of what they are doing but really aren't (i.e.: "I know this query will never return more than a thousand things, so I'll use rows=-1 to get them all at once" ... if you know it will never contain more than a thousand, use rows=1000, assert numFound<=1000, and eliminate the risk of crashing Solr, hosing your net, or crashing your client).
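
The explicit-bound approach described above can be sketched on the client side. This is only an illustration, not code from the attached patch or from Solr itself: the helper names `build_select_params` and `docs_or_fail` and the exception `TruncatedResultError` are hypothetical, and the response is assumed to be an already-parsed JSON body (wt=json) with a top-level "response" object carrying "numFound" and "docs".

```python
from urllib.parse import urlencode


class TruncatedResultError(Exception):
    """Hypothetical error: Solr matched more documents than were requested."""


def build_select_params(query, max_rows):
    """Build the query string with an explicit upper bound on rows."""
    if max_rows < 0:
        raise ValueError("max_rows must be non-negative")
    return urlencode({"q": query, "rows": max_rows, "wt": "json"})


def docs_or_fail(parsed_response, max_rows):
    """Return the docs, failing loudly if the result set was cut off.

    Assumes the request used start=0, so numFound > max_rows means
    some matching documents are missing from this response.
    """
    body = parsed_response["response"]
    num_found = body["numFound"]
    if num_found > max_rows:
        raise TruncatedResultError(
            f"numFound={num_found} exceeds rows={max_rows}; results are incomplete"
        )
    return body["docs"]
```

With a check like this, a client that asks for rows=1000 and gets numFound=1002 stops with an exception instead of silently processing a truncated list.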



[jira] Commented: (SOLR-534) Return all query results with parameter rows=-1

Posted by "Lisa Carter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796982#action_12796982 ] 

Lisa Carter commented on SOLR-534:
----------------------------------

I would argue that REALLY_BIG_NUMBER is actually significantly MORE dangerous than a crash. 

Here's why: A crash at least lets the programmer know something went wrong. Missing data is a silent failure. 

1) If the result set is too large for the client, it will run out of memory and generate an exception. The programmer will immediately know they did something wrong.

2) If the result set is too large for the network (unlikely) this will disconnect and fail. The programmer will immediately know they did something wrong.

3) If the result set is too large for Solr, Solr should not crash but rather return a page with the standard error handler "result set too large"/"out of memory". The programmer will immediately know they did something wrong. Solr sure as heck better be checking this already--you never know when you'll run into bizarre low memory conditions; allocations should ALWAYS be checked for.

But if you use the REALLY_BIG_NUMBER approach, the same bad programmer who never thought he would get back more than 1000 records will never check whether the result set contains more than 1000 records either. If the programmer was expecting the complete result set and the database now contains 1002 records instead of 999, they will not know there is a problem... the last records in the set are simply truncated. The programmer who wrote the code may not be the person maintaining the application, which is quite common in production environments. The maintenance person may not know for weeks or months that a problem even exists!

The -1 approach ensures immediate, loud failure.

The REALLY_BIG_NUMBER ensures only silent failure.

While it's impossible to idiot-proof everything, loud failure is always preferable to silent failure. Barking loudly saves the poor soul who maintains the idiot's code a lot of heartache.




[jira] Commented: (SOLR-534) Return all query results with parameter rows=-1

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832333#action_12832333 ] 

Hoss Man commented on SOLR-534:
-------------------------------

bq. But if you use the REALLY_BIG_NUMBER approach, the same bad programmer who never thought he would get back more than 1000 records will never check whether the result set contains more than 1000 records either.

If we're going to assume the programmer doesn't check the actual number found, then why assume that the programmer pays attention to anything in the response at all? 

If you think it's likely that programmers will write code that only looks at the docList to iterate over all the docs in a response and doesn't notice that the numFound at the top of the docList is higher than the number asked for, then why do you assume that same programmer would be smart enough to check if an error message is returned when they ask for "all" rows and Solr can't provide them?

Bottom line: we can't protect programmers from all possible forms of stupidity, but we can make them be explicit about exactly what they want -- if they want 100, they ask for 100; if they want 10000, they ask for 10000; if they want "all", they have to specify how big they think "all" is.

bq. Solr sure as heck better be checking this already--you never know when you'll run into bizarre low memory conditions; allocations should ALWAYS be checked for.

This isn't as easy as it may sound in Java ... the APIs available to test for the amount of memory available are limited, and even if the JVM has the resources to allocate a 10,000,000 item PriorityQueue when computing the results, that doesn't mean doing so won't eat up all the available RAM, causing some later (extremely tiny) allocation to trigger an OOM --- but if you've got a suggestion to help prevent OOM in situations like this, by all means, patches welcome.
