Posted to solr-user@lucene.apache.org by eg...@gmx.de on 2010/02/10 13:45:46 UTC

How to not limit maximum number of documents?

Hi all,

I'm working with Solr 1.4 and noticed that Solr limits the number of documents returned in a response. This limit can be changed with the common query parameter 'rows'.

In my scenario it is very important that the response contains ALL documents in the index! I played around with the 'rows' parameter but couldn't find a way to do it.

I was not able to find any hint in the mailing list.
Thanks a lot in advance.

Cheers,
Egon

Re: How to not limit maximum number of documents?

Posted by Walter Underwood <wu...@wunderwood.org>.
Solr will not do this efficiently. Getting all rows will be very slow. Adding a parameter will not make it fast.

Why do you want to do this?

wunder



Re: How to not limit maximum number of documents?

Posted by Chris Hostetter <ho...@fucit.org>.
: Okay. So we have to leave this question open for now. There might be 
: other (more advanced) users who can answer it. The solution we found 
: is certainly not a good one.

The question really isn't "open", it's a FAQ...

http://wiki.apache.org/solr/FAQ#How_can_I_get_ALL_the_matching_documents_back.3F_..._How_can_I_return_an_unlimited_number_of_rows.3F


-Hoss


Re: How to not limit maximum number of documents?

Posted by eg...@gmx.de.
Okay. So we have to leave this question open for now. There might be other (more advanced) users who can answer it. The solution we found is certainly not a good one.

In the meantime, I will look for a way to submit a feature request. :)




RE: How to not limit maximum number of documents?

Posted by st...@bt.com.
Yes, I tried q=<query>&rows=-1 the other day and gave up.

But as you say, it wouldn't help anyway, because you might get
a) timeouts, because you have to wait a 'long' time for the large set of results to be returned
b) exceptions, because you're retrieving too much data to be passed around the system



Regards
Stefan Maric 
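
One way to sidestep both problems is to pull the results in fixed-size batches instead of one huge response. The following is only a sketch in Python: `fetch_all`, `fetch_page` and the batch size are hypothetical names and values, and `fake_fetch_page` stands in for a real Solr request that would pass `start` and `rows` and read `numFound` from the response.

```python
def fetch_all(fetch_page, batch_size=500):
    """Yield every matching document, requesting batch_size documents at a time.

    fetch_page(start, rows) is a caller-supplied (hypothetical) function that
    runs the query with those 'start' and 'rows' values and returns a tuple
    (num_found, docs).
    """
    start = 0
    while True:
        num_found, docs = fetch_page(start, batch_size)
        for doc in docs:
            yield doc
        start += batch_size
        # Stop once the next offset is past the last match, or if a batch
        # comes back empty (for example because the index shrank meanwhile).
        if start >= num_found or not docs:
            break


# Stand-in for a real Solr call: an in-memory "index" of 53 documents.
INDEX = [{"id": i} for i in range(53)]


def fake_fetch_page(start, rows):
    return len(INDEX), INDEX[start:start + rows]


all_docs = list(fetch_all(fake_fetch_page, batch_size=10))
print(len(all_docs))
```

Each batch is a separate query, so the overall result can shift if the index changes between requests, and the total time is still proportional to the number of documents.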


Re: How to not limit maximum number of documents?

Posted by eg...@gmx.de.
Setting the 'rows' parameter to a number larger than the number of documents available requires that you know how many are available. That's what I intended to retrieve via the LukeRequestHandler.

Anyway, nice approach Stefan. I'm afraid I forgot this 'numFound' aspect. :)
But still, it feels like a hack. Originally I was looking for something more like:

q=<query>&rows=-1

Which leaves the API to do the job (efficiently!). :)
The question is:
Does Solr support something like this? Or should we write a feature request?

Cheers,
Egon




Re: How to not limit maximum number of documents?

Posted by Ron Chan <rc...@i-tao.com>.
Just set rows to a very large number, larger than the number of documents available.

It is also useful to set the fl parameter to just the fields you need, to avoid memory problems if each document contains a lot of information.
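
As a rough illustration of such a request, here is a small Python sketch that builds the URL. Only the `q`, `rows` and `fl` parameters come from this thread; the host, port, field names, the row count and the `build_select_url` helper are assumptions made up for the example.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, max_rows, fields):
    """Build a select URL asking for up to max_rows documents,
    returning only the listed fields."""
    params = {
        "q": query,
        "rows": max_rows,         # larger than the number of documents expected
        "fl": ",".join(fields),   # restrict the fields returned per document
    }
    return base_url + "/select/?" + urlencode(params)


# Hypothetical host, port and field names.
url = build_select_url("http://localhost:8983/solr", "*:*", 1000000, ["id", "title"])
print(url)
```

The response still has to be built and transferred in one piece, so this only suits result sets that comfortably fit in memory on both ends.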



RE: How to not limit maximum number of documents?

Posted by st...@bt.com.
Egon

If you first run your query with q=<query>&rows=0

then you get back an indication of the total number of docs:
<result name="response" numFound="53" start="0"/>

Now your app can query again to get the first n rows and manage forward/backward traversal of the results with subsequent queries.



Regards
Stefan Maric 
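
Reading that count out of the XML response could look roughly like the Python sketch below. The `parse_num_found` helper is hypothetical, and the sample string is just the `<result>` element from this message wrapped in a `<response>` root; a real response contains more (a response header, for instance).

```python
import xml.etree.ElementTree as ET


def parse_num_found(response_xml):
    """Return the numFound attribute of the <result name="response"> element."""
    root = ET.fromstring(response_xml)
    # iter() also visits the root itself, so a bare <result> element works too.
    for result in root.iter("result"):
        if result.get("name") == "response":
            return int(result.get("numFound"))
    raise ValueError('no <result name="response"> element found')


sample = '<response><result name="response" numFound="53" start="0"/></response>'
total = parse_num_found(sample)
print(total)  # 53
```

The returned value can then be used as the 'rows' value, or as the upper bound for paging, in the follow-up query.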


Re: How to not limit maximum number of documents?

Posted by eg...@gmx.de.
Hi Stefan,

you are right. I noticed this page-based result handling too. For web pages it is handy to maintain a results-per-page parameter together with an offset to browse result pages. Both can be done with Solr's 'start' and 'rows' parameters.
But as I don't use Solr in a web context, it's important for me to get all results in one go.

While waiting for answers I was working on a workaround and came across the LukeRequestHandler (http://wiki.apache.org/solr/LukeRequestHandler). It lets you query the index and obtain meta information about it. I found a parameter in the response called 'numDocs' which seems to contain the current number of documents in the index.

So I was thinking about first asking for the number of documents via the LukeRequestHandler and then setting the 'rows' parameter to this value. However, this is quite expensive, as one front-end query always leads to two back-end queries. So I'm still searching for a better way to do this!

Cheers,
Egon
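
For illustration only, the first of those two steps might be sketched in Python as below. It assumes the LukeRequestHandler response was requested as JSON (wt=json) and that 'numDocs' sits under an "index" section; the sample string is made up and heavily trimmed, and `parse_num_docs` is a hypothetical helper.

```python
import json


def parse_num_docs(luke_json):
    """Return index.numDocs from a LukeRequestHandler response (assumed JSON shape)."""
    return int(json.loads(luke_json)["index"]["numDocs"])


# Trimmed, made-up sample; a real response carries many more entries.
sample = '{"responseHeader": {"status": 0}, "index": {"numDocs": 53, "maxDoc": 60}}'
rows = parse_num_docs(sample)
print(rows)
```

Note that 'numDocs' counts every document in the index, not just those matching a query, so it is only an upper bound for 'rows'.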




RE: How to not limit maximum number of documents?

Posted by st...@bt.com.
I was just thinking along similar lines

As far as I can tell you can use the parameters start & rows in combination to control the retrieval of query results

So
http://<host>:<port>/solr/select/?q=<query>
will retrieve up to 10 results, 1..10

http://<host>:<port>/solr/select/?q=<query>&start=10&rows=10
will retrieve results 11..20 ('start' is a zero-based offset)

So it is up to your application to control result traversal/pagination


Question - does this mean that
http://<host>:<port>/solr/select/?q=<query>&start=10&rows=10
runs the query a 2nd time?

And so on for each further page?


Regards
Stefan Maric
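
The page arithmetic is easy to get off by one, because 'start' is a zero-based offset rather than a result number. A small Python sketch, with `page_params` as a hypothetical helper and 10 rows per page as an assumed default:

```python
def page_params(page, rows_per_page=10):
    """Return the 'start' and 'rows' values for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 begins at offset 0, page 2 at offset rows_per_page, and so on.
    return {"start": (page - 1) * rows_per_page, "rows": rows_per_page}


for page in (1, 2, 3):
    print(page, page_params(page))
```

With these values, page 2 is requested with start=10&rows=10 and covers results 11..20.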