Posted to solr-user@lucene.apache.org by Paul Tomblin <pt...@xcski.com> on 2009/11/03 01:49:48 UTC

SolrJ looping until I get all the results

If I want to do a query and only return X number of rows at a time,
but I want to keep querying until I get all the rows, how do I do that?
Can I just keep advancing query.setStart(...) and then checking if
server.query(query) returns any rows?  Or is there a better way?

Here's what I'm thinking:

final static int MAX_ROWS = 100;   // rows requested per query
int start = 0;
query.setRows(MAX_ROWS);
while (true)
{
   QueryResponse resp = solrChunkServer.query(query);
   SolrDocumentList docs = resp.getResults();
   if (docs.size() == 0)
     break;                        // an empty page means we are done
   ....
   start += MAX_ROWS;              // advance to the next page
   query.setStart(start);
}
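
A minimal, self-contained sketch of the same loop, with one change: it
also stops once the offset reaches the total hit count, which saves the
final empty request. Page, PagingSketch, fetchAll and fakeFetch are
hypothetical names, and an in-memory list stands in for the Solr index so
the example runs without SolrJ. In SolrJ the total hit count should be
available from SolrDocumentList.getNumFound().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical stand-in for one page of results (SolrDocumentList in SolrJ).
class Page {
    final List<String> docs;   // documents returned in this page
    final long numFound;       // total hits for the whole query

    Page(List<String> docs, long numFound) {
        this.docs = docs;
        this.numFound = numFound;
    }
}

class PagingSketch {
    static final int MAX_ROWS = 100;

    // fetch.apply(start, rows) is an assumed stand-in for:
    //   query.setStart(start); query.setRows(rows); server.query(query).getResults()
    static List<String> fetchAll(BiFunction<Integer, Integer, Page> fetch) {
        List<String> names = new ArrayList<>();
        int start = 0;
        while (true) {
            Page page = fetch.apply(start, MAX_ROWS);
            if (page.docs.isEmpty()) {
                break;                      // nothing left to read
            }
            names.addAll(page.docs);        // process this chunk here
            start += page.docs.size();
            if (start >= page.numFound) {
                break;                      // last page reached, skip the empty query
            }
        }
        return names;
    }

    // Fake index: returns the slice [start, start + rows) of an in-memory list.
    static Page fakeFetch(List<String> index, int start, int rows) {
        int from = Math.min(start, index.size());
        int to = Math.min(start + rows, index.size());
        return new Page(new ArrayList<>(index.subList(from, to)), index.size());
    }
}
```

Each Page goes out of scope at the end of an iteration, so only the
accumulated names stay reachable between requests.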



-- 
http://www.linkedin.com/in/paultomblin
http://careers.stackoverflow.com/ptomblin

Re: SolrJ looping until I get all the results

Posted by Avlesh Singh <av...@gmail.com>.
>
> This isn't a search, this is a search and destroy.  Basically I need the
> file names of all the documents that I've indexed in Solr so that I can
> delete them.
>
Okay. I am sure you are aware of the "fl" parameter, which restricts the
fields returned in a response. If you only need limited info, it might be a
good idea to use this parameter.
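
As a rough illustration of what "fl" looks like next to the paging
parameters, the sketch below builds the query string of a Solr select
request by hand. FlParamSketch and selectParams are hypothetical names,
and "filename" is an assumed field name, since the thread does not say
what the field is called. In SolrJ, SolrQuery.setFields(...) should set
the same parameter.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FlParamSketch {
    // Builds "q=...&fl=...&start=...&rows=..." with URL-encoded values.
    static String selectParams(String q, String fields, int start, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);                          // the query itself
        params.put("fl", fields);                    // comma-separated fields to return
        params.put("start", Integer.toString(start)); // offset of the first row
        params.put("rows", Integer.toString(rows));   // page size

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(e.getKey() + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }
}
```

For example, selectParams("*:*", "filename", 0, 100) asks for the first
100 matches and only the assumed filename field of each.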

Cheers
Avlesh

On Tue, Nov 3, 2009 at 7:23 AM, Paul Tomblin <pt...@xcski.com> wrote:

> On Mon, Nov 2, 2009 at 8:47 PM, Avlesh Singh <av...@gmail.com> wrote:
> >>
> >> I was doing it that way, but I manipulate the documents and put the
> >> new objects into a different list.
> >> Because I basically have two times the number of documents in lists,
> >> I'm running out of memory.  So I figured if I do it 1000 documents at
> >> a time, the SolrDocumentList will get garbage collected at least.
> >>
> > You are right w.r.t. all that, but I am surprised that you would need ALL
> > the documents from the index for a search requirement.
>
> This isn't a search, this is a search and destroy.  Basically I need
> the file names of all the documents that I've indexed in Solr so that
> I can delete them.

Re: SolrJ looping until I get all the results

Posted by Paul Tomblin <pt...@xcski.com>.
On Mon, Nov 2, 2009 at 8:47 PM, Avlesh Singh <av...@gmail.com> wrote:
>>
>> I was doing it that way, but I manipulate the documents and put the
>> new objects into a different list.
>> Because I basically have two times the number of documents in lists,
>> I'm running out of memory.  So I figured if I do it 1000 documents at
>> a time, the SolrDocumentList will get garbage collected at least.
>>
> You are right w.r.t. all that, but I am surprised that you would need ALL
> the documents from the index for a search requirement.

This isn't a search, this is a search and destroy.  Basically I need
the file names of all the documents that I've indexed in Solr so that
I can delete them.

-- 
http://www.linkedin.com/in/paultomblin
http://careers.stackoverflow.com/ptomblin

Re: SolrJ looping until I get all the results

Posted by Avlesh Singh <av...@gmail.com>.
>
> I was doing it that way, but I manipulate the documents and put the
> new objects into a different list.
> Because I basically have two times the number of documents in lists,
> I'm running out of memory.  So I figured if I do it 1000 documents at
> a time, the SolrDocumentList will get garbage collected at least.
>
You are right w.r.t. all that, but I am surprised that you would need ALL
the documents from the index for a search requirement.

Cheers
Avlesh

On Tue, Nov 3, 2009 at 7:13 AM, Paul Tomblin <pt...@xcski.com> wrote:

> On Mon, Nov 2, 2009 at 8:40 PM, Avlesh Singh <av...@gmail.com> wrote:
> >>
> >> final static int MAX_ROWS = 100;
> >> int start = 0;
> >> query.setRows(MAX_ROWS);
> >> while (true)
> >> {
> >>   QueryResponse resp = solrChunkServer.query(query);
> >>   SolrDocumentList docs = resp.getResults();
> >>   if (docs.size() == 0)
> >>     break;
> >>   ....
> >>  start += MAX_ROWS;
> >>  query.setStart(start);
> >> }
> >>
> > Yes. It will work as you think. But are you sure that you want to do this?
> > How many documents do you have in the index? If the number is in an
> > acceptable range, why not simply do a query.setRows(Integer.MAX_VALUE) once?
>
> I was doing it that way, but I manipulate the documents and put the
> new objects into a different list.
> Because I basically have two times the number of documents in lists,
> I'm running out of memory.  So I figured if I do it 1000 documents at
> a time, the SolrDocumentList will get garbage collected at least.

Re: SolrJ looping until I get all the results

Posted by Paul Tomblin <pt...@xcski.com>.
On Mon, Nov 2, 2009 at 8:40 PM, Avlesh Singh <av...@gmail.com> wrote:
>>
>> final static int MAX_ROWS = 100;
>> int start = 0;
>> query.setRows(MAX_ROWS);
>> while (true)
>> {
>>   QueryResponse resp = solrChunkServer.query(query);
>>   SolrDocumentList docs = resp.getResults();
>>   if (docs.size() == 0)
>>     break;
>>   ....
>>  start += MAX_ROWS;
>>  query.setStart(start);
>> }
>>
> Yes. It will work as you think. But are you sure that you want to do this?
> How many documents do you have in the index? If the number is in an
> acceptable range, why not simply do a query.setRows(Integer.MAX_VALUE) once?

I was doing it that way, but I manipulate the documents and put the
new objects into a different list.
Because I basically have two times the number of documents in lists,
I'm running out of memory.  So I figured if I do it 1000 documents at
a time, the SolrDocumentList will get garbage collected at least.



-- 
http://www.linkedin.com/in/paultomblin
http://careers.stackoverflow.com/ptomblin

Re: SolrJ looping until I get all the results

Posted by Avlesh Singh <av...@gmail.com>.
>
> final static int MAX_ROWS = 100;
> int start = 0;
> query.setRows(MAX_ROWS);
> while (true)
> {
>   QueryResponse resp = solrChunkServer.query(query);
>   SolrDocumentList docs = resp.getResults();
>   if (docs.size() == 0)
>     break;
>   ....
>  start += MAX_ROWS;
>  query.setStart(start);
> }
>
Yes. It will work as you think. But are you sure that you want to do this?
How many documents do you have in the index? If the number is in an
acceptable range, why not simply do a query.setRows(Integer.MAX_VALUE) once?


Cheers
Avlesh

On Tue, Nov 3, 2009 at 6:19 AM, Paul Tomblin <pt...@xcski.com> wrote:

> If I want to do a query and only return X number of rows at a time,
> but I want to keep querying until I get all the rows, how do I do that?
> Can I just keep advancing query.setStart(...) and then checking if
> server.query(query) returns any rows?  Or is there a better way?
>
> Here's what I'm thinking
>
> final static int MAX_ROWS = 100;
> int start = 0;
> query.setRows(MAX_ROWS);
> while (true)
> {
>   QueryResponse resp = solrChunkServer.query(query);
>   SolrDocumentList docs = resp.getResults();
>   if (docs.size() == 0)
>     break;
>   ....
>  start += MAX_ROWS;
>  query.setStart(start);
> }

Re: SolrJ looping until I get all the results

Posted by Mck <mi...@semb.wever.org>.
On Mon, 2009-11-02 at 19:49 -0500, Paul Tomblin wrote:
> Here's what I'm thinking
> 
> final static int MAX_ROWS = 100;
> int start = 0;
> query.setRows(MAX_ROWS);
> while (true)
> {
>    QueryResponse resp = solrChunkServer.query(query);
>    SolrDocumentList docs = resp.getResults();
>    if (docs.size() == 0)
>      break;
>    ....
>   start += MAX_ROWS;
>   query.setStart(start);
> } 

Why not read the total number of hits after the first limited fetch, and
then get all the remaining documents on the second fetch?

Example code (see the do-while loop)
http://sesat.no/projects/sesat-kernel/xref/no/sesat/search/query/token/SolrTokenEvaluator.html#237
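
The two-fetch idea can be sketched roughly as follows. This is not the
code behind the link above; TwoFetchSketch, fetch and fetchInTwoRequests
are hypothetical names, and an in-memory list stands in for the Solr
index. In SolrJ the hit count would presumably come from
resp.getResults().getNumFound().

```java
import java.util.ArrayList;
import java.util.List;

class TwoFetchSketch {
    // Fake index: returns the slice [start, start + rows) of an in-memory list.
    static List<String> fetch(List<String> index, int start, int rows) {
        int from = Math.min(start, index.size());
        int to = (int) Math.min((long) start + rows, index.size());
        return new ArrayList<>(index.subList(from, to));
    }

    static List<String> fetchInTwoRequests(List<String> index, int firstRows) {
        // First request: a limited page, which also reveals the total hit count.
        List<String> first = fetch(index, 0, firstRows);
        long numFound = index.size();   // stand-in for the reported total

        List<String> all = new ArrayList<>(first);
        int remaining = (int) (numFound - first.size());
        if (remaining > 0) {
            // Second request: everything that the first page did not cover.
            all.addAll(fetch(index, first.size(), remaining));
        }
        return all;
    }
}
```

This needs only two round trips, but the second response holds all the
remaining documents at once, so it does not reduce peak memory the way
fixed-size chunks do.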

~mck



-- 
"This above all: to thine own self be true. It must follow that you
cannot then be false to any man." Shakespeare 
| semb.wever.org | sesat.no | finn.no |