You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Антон Кириллов <an...@gmail.com> on 2009/12/21 13:26:07 UTC

Lucene' results paging

Hi, All! I have some problems with Lucene's search process and it's
results, so I hope You could help me.

First one: how should I split results by pages? Now I get search
results in such way:

TopDocs topDocs = is.search(finalQuery, 100000) //For example

and after that I get the needed results in such way:

//for example startPage = 20, endPage = 40
for(int j=startPage; j<stopPage; j++){

doc[j-startPage] = is.doc(topDocs.scoreDocs[j].doc);
}

I think this is a bad approach. How should I optimize my code to make
search faster? Is there any possibility to set start and stop pages in
search methods?

The second one:
After the search is completed and the results are not sorted are all
the results stored in search engine? I mean when I set the max number
of docs in method search(finalQuery, 10) equal 10. would Lucene find
all relevant docs, then sort them by relevance and select first ten
after that? Or does Lucene store some specific information in indices
which allows select first 10 most relevant docs without sorting all
million (for example) relevant pages?

Thanks in advance.
-- 
Anton Kirillov

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Lucene' results paging

Posted by Uwe Schindler <uw...@thetaphi.de>.
Ist inside TopDocs class as a getter method.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Антон Кириллов [mailto:antonv.kirillov@gmail.com]
> Sent: Tuesday, December 22, 2009 1:05 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene' results paging
> 
> Thanks for your answer. But how should I get the total nubmer of
> search results in this case?
> 
> On Mon, Dec 21, 2009 at 6:51 PM, Ian Lea <ia...@gmail.com> wrote:
> > Hi
> >
> >
> > The standard approach to paging is just to do the search again and
> > pick out the docs you want, along the lines you outline.  You cannot
> > pass start/end info to any search methods.
> >
> > When you set max doc to, say, 10, lucene will find the 10 highest
> > scoring docs and return them.  There is no point in passing a number
> > higher than you need.  So if you are displaying 10 hits per page, call
> > search(query, 10) for the first page, ..., 20 for the second and pick
> > out the last 10, and so on.
> >
> > An alternative approach is for you to cache the search results
> > yourself.  That way you can avoid any subsequent searches.  But that
> > has it's own overhead and complexity, and most people most of the time
> > don't get much past the first page.
> >
> >
> > http://wiki.apache.org/lucene-
> java/LuceneFAQ#How_do_I_implement_paging.2C_i.e._showing_result_from_1-
> 10.2C_11-20_etc.3F
> >
> > --
> > Ian.
> >
> >
> > On Mon, Dec 21, 2009 at 12:26 PM, Антон Кириллов
> > <an...@gmail.com> wrote:
> >> Hi, All! I have some problems with Lucene's search process and it's
> >> results, so I hope You could help me.
> >>
> >> First one: how should I split results by pages? Now I get search
> >> results in such way:
> >>
> >> TopDocs topDocs = is.search(finalQuery, 100000) //For example
> >>
> >> and after that I get the needed results in such way:
> >>
> >> //for example startPage = 20, endPage = 40
> >> for(int j=startPage; j<stopPage; j++){
> >>
> >> doc[j-startPage] = is.doc(topDocs.scoreDocs[j].doc);
> >> }
> >>
> >> I think this is a bad approach. How should I optimize my code to make
> >> search faster? Is there any possibility to set start and stop pages in
> >> search methods?
> >>
> >> The second one:
> >> After the search is completed and the results are not sorted are all
> >> the results stored in search engine? I mean when I set the max number
> >> of docs in method search(finalQuery, 10) equal 10. would Lucene find
> >> all relevant docs, then sort them by relevance and select first ten
> >> after that? Or does Lucene store some specific information in indices
> >> which allows select first 10 most relevant docs without sorting all
> >> million (for example) relevant pages?
> >>
> >> Thanks in advance.
> >> --
> >> Anton Kirillov
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> 
> 
> --
> С уважением, Кириллов Антон
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene' results paging

Posted by Антон Кириллов <an...@gmail.com>.
Thanks for your answer. But how should I get the total nubmer of
search results in this case?

On Mon, Dec 21, 2009 at 6:51 PM, Ian Lea <ia...@gmail.com> wrote:
> Hi
>
>
> The standard approach to paging is just to do the search again and
> pick out the docs you want, along the lines you outline.  You cannot
> pass start/end info to any search methods.
>
> When you set max doc to, say, 10, lucene will find the 10 highest
> scoring docs and return them.  There is no point in passing a number
> higher than you need.  So if you are displaying 10 hits per page, call
> search(query, 10) for the first page, ..., 20 for the second and pick
> out the last 10, and so on.
>
> An alternative approach is for you to cache the search results
> yourself.  That way you can avoid any subsequent searches.  But that
> has it's own overhead and complexity, and most people most of the time
> don't get much past the first page.
>
>
> http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_implement_paging.2C_i.e._showing_result_from_1-10.2C_11-20_etc.3F
>
> --
> Ian.
>
>
> On Mon, Dec 21, 2009 at 12:26 PM, Антон Кириллов
> <an...@gmail.com> wrote:
>> Hi, All! I have some problems with Lucene's search process and it's
>> results, so I hope You could help me.
>>
>> First one: how should I split results by pages? Now I get search
>> results in such way:
>>
>> TopDocs topDocs = is.search(finalQuery, 100000) //For example
>>
>> and after that I get the needed results in such way:
>>
>> //for example startPage = 20, endPage = 40
>> for(int j=startPage; j<stopPage; j++){
>>
>> doc[j-startPage] = is.doc(topDocs.scoreDocs[j].doc);
>> }
>>
>> I think this is a bad approach. How should I optimize my code to make
>> search faster? Is there any possibility to set start and stop pages in
>> search methods?
>>
>> The second one:
>> After the search is completed and the results are not sorted are all
>> the results stored in search engine? I mean when I set the max number
>> of docs in method search(finalQuery, 10) equal 10. would Lucene find
>> all relevant docs, then sort them by relevance and select first ten
>> after that? Or does Lucene store some specific information in indices
>> which allows select first 10 most relevant docs without sorting all
>> million (for example) relevant pages?
>>
>> Thanks in advance.
>> --
>> Anton Kirillov
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



-- 
С уважением, Кириллов Антон

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene' results paging

Posted by Ian Lea <ia...@gmail.com>.
Hi


The standard approach to paging is just to do the search again and
pick out the docs you want, along the lines you outline.  You cannot
pass start/end info to any search methods.

When you set max doc to, say, 10, lucene will find the 10 highest
scoring docs and return them.  There is no point in passing a number
higher than you need.  So if you are displaying 10 hits per page, call
search(query, 10) for the first page, ..., 20 for the second and pick
out the last 10, and so on.

An alternative approach is for you to cache the search results
yourself.  That way you can avoid any subsequent searches.  But that
has it's own overhead and complexity, and most people most of the time
don't get much past the first page.


http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_implement_paging.2C_i.e._showing_result_from_1-10.2C_11-20_etc.3F

--
Ian.


On Mon, Dec 21, 2009 at 12:26 PM, Антон Кириллов
<an...@gmail.com> wrote:
> Hi, All! I have some problems with Lucene's search process and it's
> results, so I hope You could help me.
>
> First one: how should I split results by pages? Now I get search
> results in such way:
>
> TopDocs topDocs = is.search(finalQuery, 100000) //For example
>
> and after that I get the needed results in such way:
>
> //for example startPage = 20, endPage = 40
> for(int j=startPage; j<stopPage; j++){
>
> doc[j-startPage] = is.doc(topDocs.scoreDocs[j].doc);
> }
>
> I think this is a bad approach. How should I optimize my code to make
> search faster? Is there any possibility to set start and stop pages in
> search methods?
>
> The second one:
> After the search is completed and the results are not sorted are all
> the results stored in search engine? I mean when I set the max number
> of docs in method search(finalQuery, 10) equal 10. would Lucene find
> all relevant docs, then sort them by relevance and select first ten
> after that? Or does Lucene store some specific information in indices
> which allows select first 10 most relevant docs without sorting all
> million (for example) relevant pages?
>
> Thanks in advance.
> --
> Anton Kirillov
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org