Posted to solr-user@lucene.apache.org by James liu <li...@gmail.com> on 2007/05/14 10:35:13 UTC

Question: Pagination with multi index box

If I use multiple index boxes, how do I paginate correctly when sorting by score?

For example, I want to query "search" across 60 index boxes and sort by score.

I don't know the number found on each index box, since each holds different
content.

To guarantee 10 pages that are correctly sorted by score, I think each Solr
request needs start=0 and rows=100 (10 results per page).

60*100=6000 results, which I then sort, caching the top 100.

This is very slow, although it does guarantee 10 pages correctly sorted by score.


Any ideas on how to fix this?

It should be fast and correct.



-- 
regards
jl

Re: Question: Pagination with multi index box

Posted by James liu <li...@gmail.com>.
Maybe exactly correct sorting is not very important for full-text search.


2007/5/15, James liu <li...@gmail.com>:
>
>
>
> 2007/5/15, Mike Klaas <mi...@gmail.com>:
> >
> > On 14-May-07, at 8:55 PM, James liu wrote:
> >
> > > thks for your detail answer.
> > >
> > > but u ignore "sorted by score"
> > >
> > > p1, p2,p1,p1,p3,p4,p1,p1
> > >
> > > maybe their max score is lower than from p19,p20.
> > >
> >
> > I'm not ignoring it: I'm implying that the above is the correct
> > descending score-sorted order.  You have to perform that sort manually.
>
>
> i mean merged results(from 60 p) and sort it, not solr's sort.
> every result from box have been  sorted by score.
>
>
> > so it will not sorted by score correctly.
> > >
> > > and if user click page 2 to see, how to show data?
> > >
> > > p1 start from 10 or query other partitions?
> >
> > Assemble results 1 through 20, then display 11-20 to the user.
>
>
> for example, i wanna query "solr"
>
> p1 have 100 results which score is bigger than 80
>
> p2 have 100 results which score is smaller than 20
>
> so if i use rows=10, score not correct.
>
> if i wanna promise 10 pages which sort by score correctly.
>
> so i have to get 100(rows=100) results from every box.
>
> and merge results, sort it, finallay get top 100 results.
>
> but it will very slow.
>
>
> i don't know other search how to solve it? maybe they not sort by score
> very correctly.
>
>
>
>
> -Mike
> >
> > >
> > > 2007/5/15, Mike Klaas <mike.klaas@gmail.com >:
> > >>
> > >> On 14-May-07, at 6:49 PM, James liu wrote:
> > >>
> > >> > 2007/5/15, Mike Klaas <mi...@gmail.com>:
> > >> >>
> > >> >> On 14-May-07, at 1:35 AM, James liu wrote:
> > >> >>
> > >> >> When you get up to 60 partitions, you should make it a multi stage
> > >> >> process.  Assuming your partitions are disjoint and evenly
> > >> >> distributed, estimate the number of documents that will appear
> > >> in the
> > >> >> final result from each.
> > >> >
> > >> >
> > >> > yes, partitions distrbuted.
> > >> >
> > >> >
> > >> > Double or triple that (and put a minimum
> > >> >> threshold), try to assemble the number of documents you
> > >> require, and
> > >> >> if one partition "runs out" of docs before it is done, request
> > >> a new
> > >> >> round.
> > >> >
> > >> >
> > >> > i dont' know what u mean "runs out"
> > >>
> > >> Say you request 5 docs from each of 60 partitions, and are interested
> >
> > >> in docs 1-10.  If, sorted by score, the docs come from:
> > >>
> > >> p1, p2, p1, p1, p3, p4, p1, p1
> > >>
> > >> Then p1 has "run out" at n=8, and there is no way to be sure if the
> > >> remaining two needed docs come from p1 or somewhere else.  So you
> > >> have to now request at least two additional documents from p1.
> > >>
> > >> > one user request will generate 60 partitions request.
> > >> >
> > >> > they work in parallel。
> > >> >
> > >> > so i don't know every partion's status before they done.
> > >>
> > >> Normally, you would wait for them to finish, and execute a subsequent
> >
> > >> request if more docs are needed.
> > >>
> > >> -Mike
> > >
> > >
> > >
> > >
> > > --
> > > regards
> > > jl
> >
> >
>
>
> --
> regards
> jl




-- 
regards
jl

Re: Question: Pagination with multi index box

Posted by Mike Klaas <mi...@gmail.com>.
On 14-May-07, at 10:05 PM, James liu wrote:

> 2007/5/15, Mike Klaas <mi...@gmail.com>:
>>
>> I'm not ignoring it: I'm implying that the above is the correct
>> descending score-sorted order.  You have to perform that sort  
>> manually.
>
>
> i mean merged results(from 60 p) and sort it, not solr's sort.
> every result from box have been  sorted by score.

Yep, me too.

>
>> so it will not sorted by score correctly.
>> >
>> > and if user click page 2 to see, how to show data?
>> >
>> > p1 start from 10 or query other partitions?
>>
>> Assemble results 1 through 20, then display 11-20 to the user.
>
>
> for example, i wanna query "solr"
>
> p1 have 100 results which score is bigger than 80
>
> p2 have 100 results which score is smaller than 20
>
> so if i use rows=10, score not correct.
>
> if i wanna promise 10 pages which sort by score correctly.
>
> so i have to get 100(rows=100) results from every box.
>
> and merge results, sort it, finallay get top 100 results.
>
> but it will very slow.
>
>
> i don't know other search how to solve it? maybe they not sort by  
> score very
> correctly.

Hmm, I feel as though we are going in circles.

If you want to cache the top 100 documents for a query, there is  
essentially no efficient means of accumulating these results in one  
request--as you note, to be sure of having the top 100 documents, 100  
documents from each partition must be requested.

Your options are essentially:

1) Request a smaller number of documents, and accept some
inaccuracies (for instance, if you request 10 docs from each partition,
then the first page is guaranteed to be correct, but page 10 probably
won't be quite right).

2) Request a smaller number of documents and attempt to assemble the
top 100 docs.  If you can't, then request more documents from the
partitions that were exhausted soonest.
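
A rough sketch of option 2, under some loud assumptions: each partition is simulated as an in-memory list of scores sorted in descending order, `top_n` and `batch` are made-up names, and `len(partitions[name])` stands in for the numFound that a real box would report. It is an illustration of the idea, not how Solr does it.

```python
def top_n(partitions, n, batch):
    """Assemble the global top n, fetching `batch` hits at a time.

    partitions: dict of name -> list of scores sorted descending
                (a stand-in for one index box; hypothetical data model).
    """
    # Round 1: a small request to every partition.
    fetched = {name: hits[:batch] for name, hits in partitions.items()}
    while True:
        merged = sorted(
            ((score, name) for name, hits in fetched.items() for score in hits),
            reverse=True,
        )[:n]
        used = {name: 0 for name in fetched}
        for _, name in merged:
            used[name] += 1
        # A partition is exhausted if every hit fetched from it made the
        # top n and it still has unfetched hits (numFound is larger).
        exhausted = [
            name for name, hits in fetched.items()
            if used[name] == len(hits) and len(hits) < len(partitions[name])
        ]
        if not exhausted:
            return merged
        # Next round: ask only the exhausted partitions for more.
        for name in exhausted:
            fetched[name] = partitions[name][:len(fetched[name]) + batch]
```

With one partition scoring above 80 and another below 20, only the first one is asked again in later rounds; the low-scoring partition is never asked for more than it needs to prove it cannot contribute.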

Keep in mind also that the scores across independent solr partitions  
are comparable, but not exact, due to idf differences.  The relative  
exactitude of page 10 results might not be too important.

-Mike

Re: Question: Pagination with multi index box

Posted by James liu <li...@gmail.com>.
2007/5/15, Mike Klaas <mi...@gmail.com>:
>
> On 14-May-07, at 8:55 PM, James liu wrote:
>
> > thks for your detail answer.
> >
> > but u ignore "sorted by score"
> >
> > p1, p2,p1,p1,p3,p4,p1,p1
> >
> > maybe their max score is lower than from p19,p20.
> >
>
> I'm not ignoring it: I'm implying that the above is the correct
> descending score-sorted order.  You have to perform that sort manually.


I mean merging the results (from the 60 partitions) and sorting them myself, not Solr's sort.
The results from each box are already sorted by score.


> so it will not sorted by score correctly.
> >
> > and if user click page 2 to see, how to show data?
> >
> > p1 start from 10 or query other partitions?
>
> Assemble results 1 through 20, then display 11-20 to the user.


For example, say I want to query "solr".

p1 has 100 results with scores above 80.

p2 has 100 results with scores below 20.

So if I use rows=10, the score order is not correct.

If I want to guarantee 10 pages correctly sorted by score,

I have to get 100 results (rows=100) from every box,

then merge the results, sort them, and finally take the top 100.

But that will be very slow.


I don't know how other search engines solve this. Maybe they don't sort by
score exactly.




-Mike
>
> >
> > 2007/5/15, Mike Klaas <mi...@gmail.com>:
> >>
> >> On 14-May-07, at 6:49 PM, James liu wrote:
> >>
> >> > 2007/5/15, Mike Klaas <mi...@gmail.com>:
> >> >>
> >> >> On 14-May-07, at 1:35 AM, James liu wrote:
> >> >>
> >> >> When you get up to 60 partitions, you should make it a multi stage
> >> >> process.  Assuming your partitions are disjoint and evenly
> >> >> distributed, estimate the number of documents that will appear
> >> in the
> >> >> final result from each.
> >> >
> >> >
> >> > yes, partitions distrbuted.
> >> >
> >> >
> >> > Double or triple that (and put a minimum
> >> >> threshold), try to assemble the number of documents you
> >> require, and
> >> >> if one partition "runs out" of docs before it is done, request
> >> a new
> >> >> round.
> >> >
> >> >
> >> > i dont' know what u mean "runs out"
> >>
> >> Say you request 5 docs from each of 60 partitions, and are interested
> >> in docs 1-10.  If, sorted by score, the docs come from:
> >>
> >> p1, p2, p1, p1, p3, p4, p1, p1
> >>
> >> Then p1 has "run out" at n=8, and there is no way to be sure if the
> >> remaining two needed docs come from p1 or somewhere else.  So you
> >> have to now request at least two additional documents from p1.
> >>
> >> > one user request will generate 60 partitions request.
> >> >
> >> > they work in parallel。
> >> >
> >> > so i don't know every partion's status before they done.
> >>
> >> Normally, you would wait for them to finish, and execute a subsequent
> >> request if more docs are needed.
> >>
> >> -Mike
> >
> >
> >
> >
> > --
> > regards
> > jl
>
>


-- 
regards
jl

Re: Question: Pagination with multi index box

Posted by Mike Klaas <mi...@gmail.com>.
On 14-May-07, at 8:55 PM, James liu wrote:

> thks for your detail answer.
>
> but u ignore "sorted by score"
>
> p1, p2,p1,p1,p3,p4,p1,p1
>
> maybe their max score is lower than from p19,p20.
>

I'm not ignoring it: I'm implying that the above is the correct  
descending score-sorted order.  You have to perform that sort manually.

> so it will not sorted by score correctly.
>
> and if user click page 2 to see, how to show data?
>
> p1 start from 10 or query other partitions?

Assemble results 1 through 20, then display 11-20 to the user.
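
As a small illustration of that last point, assuming `merged` is the globally sorted list you have already assembled (`page_slice` is a hypothetical helper name):

```python
def page_slice(merged, page, per_page=10):
    """Return the hits shown on a 1-based page of an already merged list."""
    start = (page - 1) * per_page
    return merged[start:start + per_page]
```

For page 2, `merged` must hold at least results 1 through 20; the slice then yields results 11-20.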

-Mike

>
> 2007/5/15, Mike Klaas <mi...@gmail.com>:
>>
>> On 14-May-07, at 6:49 PM, James liu wrote:
>>
>> > 2007/5/15, Mike Klaas <mi...@gmail.com>:
>> >>
>> >> On 14-May-07, at 1:35 AM, James liu wrote:
>> >>
>> >> When you get up to 60 partitions, you should make it a multi stage
>> >> process.  Assuming your partitions are disjoint and evenly
>> >> distributed, estimate the number of documents that will appear  
>> in the
>> >> final result from each.
>> >
>> >
>> > yes, partitions distrbuted.
>> >
>> >
>> > Double or triple that (and put a minimum
>> >> threshold), try to assemble the number of documents you  
>> require, and
>> >> if one partition "runs out" of docs before it is done, request  
>> a new
>> >> round.
>> >
>> >
>> > i dont' know what u mean "runs out"
>>
>> Say you request 5 docs from each of 60 partitions, and are interested
>> in docs 1-10.  If, sorted by score, the docs come from:
>>
>> p1, p2, p1, p1, p3, p4, p1, p1
>>
>> Then p1 has "run out" at n=8, and there is no way to be sure if the
>> remaining two needed docs come from p1 or somewhere else.  So you
>> have to now request at least two additional documents from p1.
>>
>> > one user request will generate 60 partitions request.
>> >
>> > they work in parallel。
>> >
>> > so i don't know every partion's status before they done.
>>
>> Normally, you would wait for them to finish, and execute a subsequent
>> request if more docs are needed.
>>
>> -Mike
>
>
>
>
> -- 
> regards
> jl


Re: Question: Pagination with multi index box

Posted by James liu <li...@gmail.com>.
Thanks for your detailed answer.

But you are ignoring "sorted by score":

p1, p2, p1, p1, p3, p4, p1, p1

Maybe their max score is lower than the scores from p19 or p20,

so it will not be sorted by score correctly.

And if the user clicks page 2, how do I show the data?

Should p1 start from 10, or should I query the other partitions?


2007/5/15, Mike Klaas <mi...@gmail.com>:
>
> On 14-May-07, at 6:49 PM, James liu wrote:
>
> > 2007/5/15, Mike Klaas <mi...@gmail.com>:
> >>
> >> On 14-May-07, at 1:35 AM, James liu wrote:
> >>
> >> When you get up to 60 partitions, you should make it a multi stage
> >> process.  Assuming your partitions are disjoint and evenly
> >> distributed, estimate the number of documents that will appear in the
> >> final result from each.
> >
> >
> > yes, partitions distrbuted.
> >
> >
> > Double or triple that (and put a minimum
> >> threshold), try to assemble the number of documents you require, and
> >> if one partition "runs out" of docs before it is done, request a new
> >> round.
> >
> >
> > i dont' know what u mean "runs out"
>
> Say you request 5 docs from each of 60 partitions, and are interested
> in docs 1-10.  If, sorted by score, the docs come from:
>
> p1, p2, p1, p1, p3, p4, p1, p1
>
> Then p1 has "run out" at n=8, and there is no way to be sure if the
> remaining two needed docs come from p1 or somewhere else.  So you
> have to now request at least two additional documents from p1.
>
> > one user request will generate 60 partitions request.
> >
> > they work in parallel。
> >
> > so i don't know every partion's status before they done.
>
> Normally, you would wait for them to finish, and execute a subsequent
> request if more docs are needed.
>
> -Mike




-- 
regards
jl

Re: Question: Pagination with multi index box

Posted by Mike Klaas <mi...@gmail.com>.
On 14-May-07, at 6:49 PM, James liu wrote:

> 2007/5/15, Mike Klaas <mi...@gmail.com>:
>>
>> On 14-May-07, at 1:35 AM, James liu wrote:
>>
>> When you get up to 60 partitions, you should make it a multi stage
>> process.  Assuming your partitions are disjoint and evenly
>> distributed, estimate the number of documents that will appear in the
>> final result from each.
>
>
> yes, partitions distrbuted.
>
>
> Double or triple that (and put a minimum
>> threshold), try to assemble the number of documents you require, and
>> if one partition "runs out" of docs before it is done, request a new
>> round.
>
>
> i dont' know what u mean "runs out"

Say you request 5 docs from each of 60 partitions, and are interested  
in docs 1-10.  If, sorted by score, the docs come from:

p1, p2, p1, p1, p3, p4, p1, p1

Then p1 has "run out" at n=8, and there is no way to be sure if the  
remaining two needed docs come from p1 or somewhere else.  So you  
have to now request at least two additional documents from p1.
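
Here is a sketch of how "running out" could be detected while merging. It assumes each partition's batch is a list of (score, doc_id) pairs in descending score order; `merge_until_exhausted`, `batches` and `batch_size` are hypothetical names for illustration only.

```python
import heapq

def merge_until_exhausted(batches, needed, batch_size):
    """Merge per-partition batches by descending score.

    Stops early when a partition that returned a full batch has had all
    of its hits consumed, since its next (unseen) hit might outrank
    whatever the other partitions have left.
    Returns (merged, exhausted_partition_or_None).
    """
    heap = [(-hits[0][0], name, 0) for name, hits in batches.items() if hits]
    heapq.heapify(heap)
    merged = []
    while heap and len(merged) < needed:
        _, name, idx = heapq.heappop(heap)
        merged.append((name, batches[name][idx]))
        nxt = idx + 1
        if nxt < len(batches[name]):
            heapq.heappush(heap, (-batches[name][nxt][0], name, nxt))
        elif len(batches[name]) == batch_size and len(merged) < needed:
            return merged, name  # this partition has "run out"
    return merged, None
```

Everything merged before the stop is already in its final position; only the remaining slots need another request to the exhausted partition.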

> one user request will generate 60 partitions request.
>
> they work in parallel。
>
> so i don't know every partion's status before they done.

Normally, you would wait for them to finish, and execute a subsequent  
request if more docs are needed.

-Mike

Re: Question: Pagination with multi index box

Posted by James liu <li...@gmail.com>.
For example, say I want to query "lucene", and its numFound is 234300,

and the results should be sorted by score.

If you were doing this, how would you paginate and sort by score?


2007/5/15, Mike Klaas <mi...@gmail.com>:
>
>
> On 14-May-07, at 7:15 PM, James liu wrote:
>
> > if i set rows=(page-1)*10,,,it will lose more result which fits query.
> >
> > how to set start when pagination.
>
> I'm not sure I understand the question.
>
> When combining results from partitions, you can't use startAt.



If I can't use startAt, how do I choose rows so that the user can still find the results?


 You
> must always assemble the docs from 0 to N for each partition (whether
> through one request or multiple).


If rows is bigger, it will be slow; if it is smaller, it will lose data and the
score order will not be correct.

-Mike
>
> >
> >
> > 2007/5/15, James liu <li...@gmail.com>:
> >>
> >>
> >>
> >> 2007/5/15, Mike Klaas <mi...@gmail.com>:
> >> >
> >> > On 14-May-07, at 1:35 AM, James liu wrote:
> >> >
> >> > > if use multi index box, how to pagination with sort by score
> >> > > correctly?
> >> > >
> >> > > for example, i wanna query "search" with 60 index box and sort by
> >> > > score.
> >> > >
> >> > > i don't know the num found from every index box which have
> >> different
> >> > > content.
> >> > >
> >> > > if promise 10 page with sort score correctly, i think solr 's
> >> start
> >> > > is 0,
> >> > > and rows is 100.(10 result per page)
> >> > >
> >> > > 60*100=6000, sort it and get top 100 to cache.
> >> >
> >> > > it is very slove although it promise 10 page with sort score
> >> > > correctly.
> >> >
> >> > With few index partitions, you it is sufficient to ask for startAt
> >> > +numNeeded docs from each partition and sort globally.  Normally if
> >> > you wanted 10 for the first page, you would ask for 10 from each
> >> > server and cache the remainder.  It is better to ask for more later
> >> > if the user asks for page ten.
> >> >
> >> >
> >> > When you get up to 60 partitions, you should make it a multi stage
> >> > process.  Assuming your partitions are disjoint and evenly
> >> > distributed, estimate the number of documents that will appear
> >> in the
> >> > final result from each.
> >>
> >>
> >> yes, partitions distrbuted.
> >>
> >>
> >>  Double or triple that (and put a minimum
> >> > threshold), try to assemble the number of documents you require,
> >> and
> >> > if one partition "runs out" of docs before it is done, request a
> >> new
> >> > round.
> >>
> >>
> >> i dont' know what u mean "runs out"
> >>
> >> one user request will generate 60 partitions request.
> >>
> >> they work in parallel。
> >>
> >> so i don't know every partion's status before they done.
> >>
> >>
> >> To promise 10 page result sorted by score correctly, the only way
> >> seems to
> >> get 100 results(rows=100) from each partitioin. but it very slow.
> >>
> >> now i wanna find a way to get result sorted by score correctly and
> >> search
> >> fast.
> >>
> >>
> >> -Mike
> >> >
> >>
> >> Thks Mike. But it not i want.
> >>
> >>
> >> --
> >> regards
> >> jl
> >
> >
> >
> >
> > --
> > regards
> > jl
>
>


-- 
regards
jl

Re: Question: Pagination with multi index box

Posted by Mike Klaas <mi...@gmail.com>.
On 14-May-07, at 7:15 PM, James liu wrote:

> if i set rows=(page-1)*10,,,it will lose more result which fits query.
>
> how to set start when pagination.

I'm not sure I understand the question.

When combining results from partitions, you can't use startAt.  You  
must always assemble the docs from 0 to N for each partition (whether  
through one request or multiple).
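
To make that concrete, a sketch of the parameters each partition would receive for a given page. `q`, `start`, `rows` and `fl` are standard Solr query parameters; the helper name `partition_query` is made up, and it assumes the simple single-round case.

```python
from urllib.parse import urlencode

def partition_query(query, page, per_page=10):
    """Build the query string sent to every partition for a 1-based page.

    start is always 0: the offset is applied only after the global merge,
    so each partition must return its top page * per_page hits.
    """
    rows = page * per_page
    return urlencode({"q": query, "start": 0, "rows": rows, "fl": "id,score"})
```

For page 2 this asks every partition for rows=20 with start=0, and the front end discards the first ten merged hits before display.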

-Mike

>
>
> 2007/5/15, James liu <li...@gmail.com>:
>>
>>
>>
>> 2007/5/15, Mike Klaas <mi...@gmail.com>:
>> >
>> > On 14-May-07, at 1:35 AM, James liu wrote:
>> >
>> > > if use multi index box, how to pagination with sort by score
>> > > correctly?
>> > >
>> > > for example, i wanna query "search" with 60 index box and sort by
>> > > score.
>> > >
>> > > i don't know the num found from every index box which have  
>> different
>> > > content.
>> > >
>> > > if promise 10 page with sort score correctly, i think solr 's  
>> start
>> > > is 0,
>> > > and rows is 100.(10 result per page)
>> > >
>> > > 60*100=6000, sort it and get top 100 to cache.
>> >
>> > > it is very slove although it promise 10 page with sort score
>> > > correctly.
>> >
>> > With few index partitions, you it is sufficient to ask for startAt
>> > +numNeeded docs from each partition and sort globally.  Normally if
>> > you wanted 10 for the first page, you would ask for 10 from each
>> > server and cache the remainder.  It is better to ask for more later
>> > if the user asks for page ten.
>> >
>> >
>> > When you get up to 60 partitions, you should make it a multi stage
>> > process.  Assuming your partitions are disjoint and evenly
>> > distributed, estimate the number of documents that will appear  
>> in the
>> > final result from each.
>>
>>
>> yes, partitions distrbuted.
>>
>>
>>  Double or triple that (and put a minimum
>> > threshold), try to assemble the number of documents you require,  
>> and
>> > if one partition "runs out" of docs before it is done, request a  
>> new
>> > round.
>>
>>
>> i dont' know what u mean "runs out"
>>
>> one user request will generate 60 partitions request.
>>
>> they work in parallel。
>>
>> so i don't know every partion's status before they done.
>>
>>
>> To promise 10 page result sorted by score correctly, the only way  
>> seems to
>> get 100 results(rows=100) from each partitioin. but it very slow.
>>
>> now i wanna find a way to get result sorted by score correctly and  
>> search
>> fast.
>>
>>
>> -Mike
>> >
>>
>> Thks Mike. But it not i want.
>>
>>
>> --
>> regards
>> jl
>
>
>
>
> -- 
> regards
> jl


Re: Question: Pagination with multi index box

Posted by James liu <li...@gmail.com>.
If I set rows=(page-1)*10, it will lose more results that match the query.

How should I set start when paginating?



2007/5/15, James liu <li...@gmail.com>:
>
>
>
> 2007/5/15, Mike Klaas <mi...@gmail.com>:
> >
> > On 14-May-07, at 1:35 AM, James liu wrote:
> >
> > > if use multi index box, how to pagination with sort by score
> > > correctly?
> > >
> > > for example, i wanna query "search" with 60 index box and sort by
> > > score.
> > >
> > > i don't know the num found from every index box which have different
> > > content.
> > >
> > > if promise 10 page with sort score correctly, i think solr 's start
> > > is 0,
> > > and rows is 100.(10 result per page)
> > >
> > > 60*100=6000, sort it and get top 100 to cache.
> >
> > > it is very slove although it promise 10 page with sort score
> > > correctly.
> >
> > With few index partitions, you it is sufficient to ask for startAt
> > +numNeeded docs from each partition and sort globally.  Normally if
> > you wanted 10 for the first page, you would ask for 10 from each
> > server and cache the remainder.  It is better to ask for more later
> > if the user asks for page ten.
> >
> >
> > When you get up to 60 partitions, you should make it a multi stage
> > process.  Assuming your partitions are disjoint and evenly
> > distributed, estimate the number of documents that will appear in the
> > final result from each.
>
>
> yes, partitions distrbuted.
>
>
>  Double or triple that (and put a minimum
> > threshold), try to assemble the number of documents you require, and
> > if one partition "runs out" of docs before it is done, request a new
> > round.
>
>
> i dont' know what u mean "runs out"
>
> one user request will generate 60 partitions request.
>
> they work in parallel。
>
> so i don't know every partion's status before they done.
>
>
> To promise 10 page result sorted by score correctly, the only way seems to
> get 100 results(rows=100) from each partitioin. but it very slow.
>
> now i wanna find a way to get result sorted by score correctly and search
> fast.
>
>
> -Mike
> >
>
> Thks Mike. But it not i want.
>
>
> --
> regards
> jl




-- 
regards
jl

Re: Question: Pagination with multi index box

Posted by James liu <li...@gmail.com>.
2007/5/15, Mike Klaas <mi...@gmail.com>:
>
> On 14-May-07, at 1:35 AM, James liu wrote:
>
> > if use multi index box, how to pagination with sort by score
> > correctly?
> >
> > for example, i wanna query "search" with 60 index box and sort by
> > score.
> >
> > i don't know the num found from every index box which have different
> > content.
> >
> > if promise 10 page with sort score correctly, i think solr 's start
> > is 0,
> > and rows is 100.(10 result per page)
> >
> > 60*100=6000, sort it and get top 100 to cache.
>
> > it is very slove although it promise 10 page with sort score
> > correctly.
>
> With few index partitions, you it is sufficient to ask for startAt
> +numNeeded docs from each partition and sort globally.  Normally if
> you wanted 10 for the first page, you would ask for 10 from each
> server and cache the remainder.  It is better to ask for more later
> if the user asks for page ten.
>
>
> When you get up to 60 partitions, you should make it a multi stage
> process.  Assuming your partitions are disjoint and evenly
> distributed, estimate the number of documents that will appear in the
> final result from each.


Yes, the partitions are distributed.


 Double or triple that (and put a minimum
> threshold), try to assemble the number of documents you require, and
> if one partition "runs out" of docs before it is done, request a new
> round.


I don't know what you mean by "runs out".

One user request generates 60 partition requests.

They run in parallel.

So I don't know each partition's status before they are done.


To guarantee 10 pages of results correctly sorted by score, the only way seems
to be getting 100 results (rows=100) from each partition, but that is very slow.

Now I want to find a way to get results that are correctly sorted by score and
still search fast.


-Mike
>

Thanks, Mike, but that is not what I want.


-- 
regards
jl

Re: Question: Pagination with multi index box

Posted by Mike Klaas <mi...@gmail.com>.
On 14-May-07, at 1:35 AM, James liu wrote:

> if use multi index box, how to pagination with sort by score  
> correctly?
>
> for example, i wanna query "search" with 60 index box and sort by  
> score.
>
> i don't know the num found from every index box which have different
> content.
>
> if promise 10 page with sort score correctly, i think solr 's start  
> is 0,
> and rows is 100.(10 result per page)
>
> 60*100=6000, sort it and get top 100 to cache.

> it is very slove although it promise 10 page with sort score  
> correctly.

With few index partitions, it is sufficient to ask for startAt+numNeeded
docs from each partition and sort globally.  Normally if  
you wanted 10 for the first page, you would ask for 10 from each  
server and cache the remainder.  It is better to ask for more later  
if the user asks for page ten.
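
A minimal sketch of that global sort, assuming each partition has already returned its top startAt+numNeeded hits as (score, doc_id) pairs in descending score order. `merge_page` is a hypothetical helper, not a Solr API.

```python
import heapq
from itertools import islice

def merge_page(partition_results, start, rows):
    """K-way merge of per-partition hit lists, then cut out one page.

    partition_results: list of lists of (score, doc_id), each sorted by
    descending score and each holding up to start + rows hits.
    """
    # heapq.merge expects ascending input, so merge on the negated score.
    merged = heapq.merge(*partition_results, key=lambda hit: -hit[0])
    return list(islice(merged, start, start + rows))
```

Because the inputs are already sorted, the merge does not need to re-sort everything, and `islice` stops as soon as the requested page is filled.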


When you get up to 60 partitions, you should make it a multi stage  
process.  Assuming your partitions are disjoint and evenly  
distributed, estimate the number of documents that will appear in the  
final result from each.  Double or triple that (and put a minimum  
threshold), try to assemble the number of documents you require, and  
if one partition "runs out" of docs before it is done, request a new  
round.
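
As a sketch of the first-round estimate, assuming hits are spread evenly over the partitions; the function name and the default `factor` and `minimum` values are illustrative guesses, not recommendations.

```python
def rows_per_partition(needed, num_partitions, factor=3, minimum=5):
    """Rows to request from each partition in the first round.

    The expected share per partition is needed / num_partitions; it is
    padded by `factor` and never allowed to drop below `minimum`.
    """
    # Integer ceiling of needed * factor / num_partitions.
    padded = (needed * factor + num_partitions - 1) // num_partitions
    return max(minimum, padded)
```

For 100 documents over 60 partitions with the defaults, this asks each partition for 5 rows (300 in total) instead of 100 rows each (6000 in total).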

-Mike