Posted to java-user@lucene.apache.org by ro...@earthlink.net on 2009/02/19 04:29:56 UTC

what's the best practice for getting "next page" of hits?

R2.4

So, I may well be missing something here, but: I use 

<pseudoCode>IndexSearcher.search(someQuery, null, count, new Sort());</pseudoCode>

to get an instance of TopFieldDocs (the "Hits" is deprecated). So far, all fine; I get a bunch of documents. Now, what is the Lucene-best-practice for getting the *next* batch of size "count"? (Didn't see this discussed anywhere, but maybe I missed it.) 

a) I could guess that my users will never want more than "N*count", for some value of N, request that right up front, and do all my own "paging" using the one TopFieldDocs instance; 

b) I could assume that (a) will be an inefficient memory and time hog, and when the user clicks "Next" (or whatever), then ... (with i starting at "1") get a new TopFieldDocs with "(++i)*count", and out of that discard the first "i*count" items? In the limit (as i => N) that uses up just as much time and memory, but does so lazily (better); 

c) some compromise of (a) and (b), where I get M*count, do my own paging, and when the user asks for the (i+1)==(M+1)-th batch, then get another M*count (maybe faster, but also maybe bigger amortized memory footprint); 

d) something else? (I'd hope for something like a search() method with some parameter saying, in effect, "such and such a range of hits" ...) 

thanks,
Paul 




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: what's the best practice for getting "next page" of hits?

Posted by Erick Erickson <er...@gmail.com>.
The best practice is, well, "It Depends" (tm). First off, I wouldn't do any
caching of results unless and until you were reasonably certain that
you had performance issues, so (b) would be my first choice. And if
you *did* start to see performance issues, I'd look first at why the queries
were expensive rather than at caching. And I'd make certain, by mining the
query logs, that you were getting a lot of requests for pages 2-N. There's
no point in putting in a caching scheme if only 10% of your queries are
for subsequent pages. Or even 50%.

The thing to remember is that every search/sort *must* score and/or sort
all the matching documents to catch the case that the very last
document in the index is the best match. So a method that returned
only matches N through N+pagesize would save only the
time/memory needed to copy matches 0 through N, and each
ScoreDoc is just an int and a float. You can copy a LOT of
ScoreDocs around before you notice...
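
To put a rough number on that copying cost, here is a minimal sketch of the
arithmetic. The class and method names (ScoreDocCost, discardedPayloadBytes)
are hypothetical and not part of Lucene. It counts only the int doc id and
the float score per hit, so it ignores JVM object overhead, and a sorted
search probably carries extra sort values per hit as well; treat the result
as a lower bound, not a measurement.

```java
// Hypothetical helper, not part of Lucene: estimates the field payload of the
// hits that option (b) fetches and then throws away for a given page.
final class ScoreDocCost {

    // One hit carries a 4-byte int doc id and a 4-byte float score.
    // Assumption: JVM object headers and any sort values are not counted.
    static final int PAYLOAD_BYTES = Integer.BYTES + Float.BYTES;

    /** Payload bytes of the (pageNo - 1) * count hits discarded for a 1-based page. */
    static long discardedPayloadBytes(int pageNo, int count) {
        return (long) (pageNo - 1) * count * PAYLOAD_BYTES;
    }

    public static void main(String[] args) {
        // Page 100 at 10 hits per page discards 990 hits.
        System.out.println(discardedPayloadBytes(100, 10) + " bytes");
    }
}
```

Under these assumptions, even page 100 at 10 hits per page discards only
7,920 bytes of payload, which is small next to the cost of running the query.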

What a caching scheme *would* save is re-executing the query. But long
before I went to a caching scheme, I'd try to understand why my queries
were slow. Especially when you couple that with the fact that the
overwhelming majority of users don't page very far into the result set
before changing the query.

From the eXtreme Programming people: "Do the simplest thing
that could possibly work". I add the addendum "Then *measure* to see
what the problems are before 'fixing' anything".

FWIW
Erick



Re: what's the best practice for getting "next page" of hits?

Posted by Joel Halbert <jo...@su3analytics.com>.
Out of interest, if the index is entirely in memory (using a RAMDirectory), is
there any significant difference in performance between options (a) and
(b) as outlined in the original post?

Rgs,
Joel

-----Original Message-----
From: Ganesh <em...@yahoo.co.in>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org, rolarenfan@earthlink.net
Subject: Re: what's the best practice for getting "next page" of hits?
Date: Thu, 19 Feb 2009 10:48:02 +0530

Your solution (b) is better than doing your own paging.

For every page, run the search and collect the top (pageno * count) results,
discard the first ((pageno - 1) * count), and display the remaining count
results to the user. This is fast and efficient.

Regards
Ganesh


Re: what's the best practice for getting "next page" of hits?

Posted by Ganesh <em...@yahoo.co.in>.
Your solution (b) is better than doing your own paging.

For every page, run the search and collect the top (pageno * count) results,
discard the first ((pageno - 1) * count), and display the remaining count
results to the user. This is fast and efficient.
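
The discard step can be sketched as below. This is only an illustration:
PageSlicer and page() are hypothetical names, not Lucene API, and the String
array stands in for the scoreDocs array of the TopFieldDocs that the search
call from the original post would return when asked for pageNo * count hits.
Page numbers are assumed to start at 1.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: picks one page out of the top hits.
final class PageSlicer {

    /**
     * Returns the hits for 1-based page pageNo, given the top (pageNo * count)
     * hits in rank order. The last page may be short or empty.
     */
    static <T> T[] page(T[] topHits, int pageNo, int count) {
        if (pageNo < 1 || count < 1) {
            throw new IllegalArgumentException("pageNo and count must be >= 1");
        }
        // Discard the first (pageNo - 1) * count hits ...
        int from = Math.min((pageNo - 1) * count, topHits.length);
        // ... and keep at most count hits after that.
        int to = Math.min(from + count, topHits.length);
        return Arrays.copyOfRange(topHits, from, to);
    }

    public static void main(String[] args) {
        // Stand-in for the hits of a search asked for pageNo * count = 6 results.
        String[] topHits = {"doc0", "doc1", "doc2", "doc3", "doc4", "doc5"};
        System.out.println(Arrays.toString(page(topHits, 3, 2)));
    }
}
```

Clamping both indexes to the array length means that a page past the end of
the results comes back empty instead of throwing.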

Regards
Ganesh
