
Posted to java-user@lucene.apache.org by "Hiller, Dean x66079" <de...@broadridge.com> on 2011/06/19 19:51:37 UTC

looks like no allowing of paging without counting entire result set?

More on the below issue. We have perhaps 10 million or 100 million matching entries, but this new 3.x Lucene appears to go over all the entries that match instead of just holding a cursor into the index. The more I look at the code, the more it looks like that is not possible.

I am wondering how the old Hits object, which was deprecated and removed, worked. It looks like I could stop asking it for more results, and it would work better: instead of counting all the activities that matched in my 10-million or 100-million result set, it would just return the first 100, then the second 100, and then I could cut off, which would be far more performant.

Should I just use 2.9 instead?  But then 3.x doesn't seem to support this?

Thanks,
Dean

From: Hiller, Dean x66079
Sent: Sunday, June 19, 2011 11:29 AM
To: 'java-user@lucene.apache.org'
Subject: how to do simple search paging results of 100 each? and query syntax question

On the page
http://lucene.apache.org/java/3_0_3/queryparsersyntax.html#Range%20Searches

there are range searches. How do I specify everything from the date 20020101 to the end of time?

Next, I am temporarily using Lucene in a noSQL solution (to switch to Solr later, after the prototype), so I am just indexing basic columns; there is no need for "top search results", etc.

When I look at IndexSearcher and its list of methods, I am not sure how I can grab the first 100 results, then the second 100 results (if I need them), then the third 100 results (again, if needed).

I see a TopScoreDocCollector.create method, but the IndexSearcher.search(Query, Collector) method states that you should only call it if you need ALL the results. I definitely don't need all of them, but I do need to page through the results, and I typically exit around the third page. This is not a web app, so ideally I want a reference held into the index tree so it can keep giving me the next 100 results.

Thanks,
Dean

This message and any attachments are intended only for the use of the addressee and
may contain information that is privileged and confidential. If the reader of the 
message is not the intended recipient or an authorized representative of the
intended recipient, you are hereby notified that any dissemination of this
communication is strictly prohibited. If you have received this communication in
error, please notify us immediately by e-mail and delete the message and any
attachments from your system.

Re: looks like no allowing of paging without counting entire result set?

Posted by Erick Erickson <er...@gmail.com>.

<<< that if the first page took 3 seconds to come up, the second page
took 3 seconds + x seconds>>>

This is really suspicious. What all are you trying to do in your
process? I'm starting to guess that Lucene isn't the performance
problem here, assuming reasonably-sized pages (e.g. < thousands).

If all you're doing is matching terms, not scoring, using wildcards,
and all that, you might get some joy from TermDocs or similar.
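To make the no-scoring idea concrete, here is a minimal stdlib-only sketch, assuming the doc IDs that match one term are already available as an ascending array (a stand-in for a posting list). `PostingCursor` is a hypothetical class, not Lucene's `TermDocs` API; it only illustrates that, without scoring, a forward-only cursor can hand out a fixed number of IDs at a time, each page costs only its own size, and the caller can simply stop.

```java
import java.util.Arrays;

// Hypothetical forward-only cursor over the doc IDs matching one term.
// It mimics iterating a posting list without scoring; it is NOT Lucene's TermDocs API.
class PostingCursor {
    private final int[] docIds; // ascending doc IDs that match the term
    private int pos = 0;        // next unread position; kept between pages

    PostingCursor(int[] docIds) {
        this.docIds = docIds;
    }

    // Returns up to pageSize doc IDs following the previous page,
    // or an empty array once the postings are exhausted.
    int[] nextPage(int pageSize) {
        int end = Math.min(pos + pageSize, docIds.length);
        int[] page = Arrays.copyOfRange(docIds, pos, end);
        pos = end;
        return page;
    }

    boolean hasMore() {
        return pos < docIds.length;
    }

    public static void main(String[] args) {
        PostingCursor cursor = new PostingCursor(new int[] {2, 5, 7, 11, 13, 17, 19});
        // Read two pages of three, then stop without touching the rest.
        System.out.println(Arrays.toString(cursor.nextPage(3))); // [2, 5, 7]
        System.out.println(Arrays.toString(cursor.nextPage(3))); // [11, 13, 17]
    }
}
```

The cursor's state (`pos`) is what makes page N as cheap as page 1; the trade-off is that the caller must hold on to the object between pages.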

Best
Erick

On Mon, Jun 20, 2011 at 9:44 AM, Hiller, Dean  x66079
<de...@broadridge.com> wrote:
> One more note: we hit a big performance problem in that if the first page took 3 seconds to come up, the second page took 3 seconds + x seconds to come up. This was the major problem we hit. Our client is not a web app but automated software, so the timings on the second page really need to be in the 0 seconds + x seconds range.
>
> So deep paging may happen when there are no matches in our system, because the automated software has to go through all the results until it pairs up the record that just came in.
>
> The main issue is that we have nothing to do with search; we are trying to use Lucene as a plain indexing library for the typical RDBMS indexing use cases.
>
> Dean

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: looks like no allowing of paging without counting entire result set?

Posted by "Hiller, Dean x66079" <de...@broadridge.com>.

One more note: we hit a big performance problem in that if the first page took 3 seconds to come up, the second page took 3 seconds + x seconds to come up. This was the major problem we hit. Our client is not a web app but automated software, so the timings on the second page really need to be in the 0 seconds + x seconds range.

So deep paging may happen when there are no matches in our system, because the automated software has to go through all the results until it pairs up the record that just came in.

The main issue is that we have nothing to do with search; we are trying to use Lucene as a plain indexing library for the typical RDBMS indexing use cases.

Dean

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Monday, June 20, 2011 6:15 AM
To: java-user@lucene.apache.org
Subject: Re: looks like no allowing of paging without counting entire result set?

re: 20020101 to the end of time: use a clause like [2002-01-01 TO *]

About paging... Yes, you have to start all over again for each search. The basic
problem is that you have to score every document on each search, because the
last document scored might be the highest-scoring one.

But let's back up a step: can you tell us what higher-level problem you're
trying to solve? *Why* do you want to do "deep paging"? Do you care about
scoring the documents, or do you just want to look at all of them that match?

One solution would be to use a Collector that collects as many documents as
you ever want to return; you can then use that list to "page". But that
requires a stateful connection, which may be appropriate for your problem...

Best
Erick


RE: looks like no allowing of paging without counting entire result set?

Posted by "Hiller, Dean x66079" <de...@broadridge.com>.

The noSQL world flips indexing upside down. Instead of the database doing it for you, you do it yourself, and this turns out to be a huge advantage in noSQL when the data is huge. I need to create an index on my activity table's account, security, and activityDate columns, with one index for each account instead of one huge billion-entry index. In an RDBMS, this would be one huge billion-node index tree; in my database, it is 8,000,000 index trees instead, which performs far better.

In short, I do not need scoring at all. I just need a basic RDBMS-style indexing library where, ideally, I can exit once I am 3 pages in, or keep paging all the way to the end if I need to.

I have been looking at the Collector, and it seems like a hack, but I think I can use a lock in the collector to pause it and prevent Lucene from reading in all the rows. From the other thread, I can then design a cursor.release() that tells my collector to throw an exception so no more reading is done.
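A minimal single-threaded sketch of the exception-based stop described above. `DocCallback`, `StoppableCollector`, `StopCollectingException`, and the `search` loop are all hypothetical stand-ins, not Lucene's `Collector` API, and the blocking hand-off between threads is left out; only the `release()` flag (volatile, so another thread could set it) and the unwinding exception are shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback that a search loop feeds every matching doc ID into.
// It is NOT Lucene's Collector API.
interface DocCallback {
    void collect(int docId);
}

// Unchecked exception used purely to unwind out of the search loop early.
class StopCollectingException extends RuntimeException {
    StopCollectingException() {
        super(null, null, false, false); // skip the stack trace: this is control flow
    }
}

class StoppableCollector implements DocCallback {
    private final int limit;                        // most docs we want (> 0)
    private final List<Integer> docs = new ArrayList<>();
    private volatile boolean released = false;      // may be set from another thread

    StoppableCollector(int limit) {
        this.limit = limit;
    }

    // A consumer that no longer needs results calls this (possibly from another thread).
    void release() {
        released = true;
    }

    @Override
    public void collect(int docId) {
        if (released) {
            throw new StopCollectingException();    // consumer is gone: stop reading
        }
        docs.add(docId);
        if (docs.size() >= limit) {
            throw new StopCollectingException();    // got enough: stop reading
        }
    }

    List<Integer> docs() {
        return docs;
    }
}

class EarlyTerminationDemo {
    // Hypothetical search loop: feeds matching docs in order to the callback and
    // returns how many docs it visited before finishing or being told to stop.
    static int search(int[] matches, DocCallback callback) {
        int visited = 0;
        try {
            for (int doc : matches) {
                visited++;
                callback.collect(doc);
            }
        } catch (StopCollectingException e) {
            // Expected: the collector asked the loop to stop early.
        }
        return visited;
    }
}
```

Throwing out of collect() is less of a hack than it looks: Lucene's own TimeLimitingCollector stops a search the same way, by throwing TimeExceededException.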

I only wish there were a lower-level API in Lucene, one I may be missing, that deals with just the index trees.

Thanks,
Dean


Re: looks like no allowing of paging without counting entire result set?

Posted by Erick Erickson <er...@gmail.com>.

re: 20020101 to the end of time: use a clause like [2002-01-01 TO *]
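Why an open-ended bound works on a date field, as a small stdlib sketch: assuming the dates are indexed as zero-padded yyyyMMdd strings, string order equals chronological order, so "20020101 to the end of time" is simply every term at or above the lower bound. The TreeSet below is only a stand-in for the sorted term dictionary, and `OpenEndedDateRange` is a hypothetical helper, not Lucene code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class OpenEndedDateRange {
    // Returns every term >= lowerBound, i.e. the range [lowerBound TO *].
    // Assumes all terms are zero-padded yyyyMMdd strings, so String order == date order.
    static List<String> fromDateOnward(NavigableSet<String> sortedTerms, String lowerBound) {
        return new ArrayList<>(sortedTerms.tailSet(lowerBound, true)); // true = inclusive
    }

    public static void main(String[] args) {
        // Stand-in for the sorted term dictionary of a date field.
        NavigableSet<String> terms = new TreeSet<>(
            Arrays.asList("19991231", "20011231", "20020101", "20050615", "20110619"));
        System.out.println(fromDateOnward(terms, "20020101"));
        // prints [20020101, 20050615, 20110619]
    }
}
```

Because the comparison is purely on strings, the bound is safest when written in the same fixed-width format as the indexed terms.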

About paging... Yes, you have to start all over again for each search. The basic
problem is that you have to score every document on each search, because the
last document scored might be the highest-scoring one.

But let's back up a step: can you tell us what higher-level problem you're
trying to solve? *Why* do you want to do "deep paging"? Do you care about
scoring the documents, or do you just want to look at all of them that match?

One solution would be to use a Collector that collects as many documents as
you ever want to return; you can then use that list to "page". But that
requires a stateful connection, which may be appropriate for your problem...
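A minimal sketch of that stateful approach, with hypothetical names (not a Lucene class): one search feeds every matching doc ID to `collect`, the object keeps at most `maxDocs` of them, and later page requests are served from the kept list without searching again. On its own this still visits every match during that single search; stopping early would need something extra, such as throwing out of the collector.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical per-session result holder: one search fills it,
// later page requests are answered from memory without searching again.
class CachedResultPages {
    private final List<Integer> docIds = new ArrayList<>();
    private final int maxDocs; // most results we will ever page through

    CachedResultPages(int maxDocs) {
        this.maxDocs = maxDocs;
    }

    // Called once per matching doc during the single search; docs past the cap are ignored.
    void collect(int docId) {
        if (docIds.size() < maxDocs) {
            docIds.add(docId);
        }
    }

    // Zero-based page number; returns an empty list once past the end.
    List<Integer> page(int pageNumber, int pageSize) {
        int from = Math.min(pageNumber * pageSize, docIds.size());
        int to = Math.min(from + pageSize, docIds.size());
        return Collections.unmodifiableList(docIds.subList(from, to));
    }
}
```

The object is the "stateful connection": it has to live as long as the client keeps paging.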

Best
Erick

On Sun, Jun 19, 2011 at 2:39 PM, Hiller, Dean  x66079
<de...@broadridge.com> wrote:
> "It supports it like 2.9, but not using the Hits API. As described above, to
> show results 991 to 1000 request the top-1000 results and display the last
> 10 :-)"
>
> Bear with me, I am a little confused, so let me throw some stuff down here and think out loud...
> So I basically have to request the top 100, then make another request for the next 100, and so on, which seems like it would start all over from scratch each time and be a performance hit, correct? I would think the optimal way would be for search to return an object that maintains a cursor into the index tree until I close it, so I can keep asking for the next 100. It sounds like the new API doesn't do that? Maybe the old one didn't either, but from the client's perspective, I thought the Hits object might just maintain that pointer.
>
> NOTE: I am not doing anything close to search, just basic column indexing like an RDBMS would do for us, except we don't have an RDBMS. Our old RDBMS system has scaled up to the point of being too costly (3 terabytes). We are now scaling out with noSQL and trying to replace the RDBMS before the costs exceed what the customers pay us.
>
> BIG NOTE: I think back to Hibernate here: if you use select * from xx where yyy with setMaxResults and setFirstResult(index), it gets slower and slower as you page further in, BUT if you instead use ScrollableResults, it maintains a cursor, and the speed NEVER gets slower as you page into the results.
>
> Maybe I am using the wrong library, but from what I understand, a lot of noSQL users of HBase are starting to use Solr. Should I perhaps be using a different indexing library?
>
> Thanks,
> Dean

RE: looks like no allowing of paging without counting entire result set?

Posted by "Hiller, Dean x66079" <de...@broadridge.com>.

"It supports it like 2.9, but not using the Hits API. As described above, to
show results 991 to 1000 request the top-1000 results and display the last
10 :-)"

Bear with me, I am a little confused, so let me throw some stuff down here and think out loud...
So I basically have to request the top 100, then make another request for the next 100, and so on, which seems like it would start all over from scratch each time and be a performance hit, correct? I would think the optimal way would be for search to return an object that maintains a cursor into the index tree until I close it, so I can keep asking for the next 100. It sounds like the new API doesn't do that? Maybe the old one didn't either, but from the client's perspective, I thought the Hits object might just maintain that pointer.

NOTE: I am not doing anything close to search, just basic column indexing like an RDBMS would do for us, except we don't have an RDBMS. Our old RDBMS system has scaled up to the point of being too costly (3 terabytes). We are now scaling out with noSQL and trying to replace the RDBMS before the costs exceed what the customers pay us.

BIG NOTE: I think back to Hibernate here: if you use select * from xx where yyy with setMaxResults and setFirstResult(index), it gets slower and slower as you page further in, BUT if you instead use ScrollableResults, it maintains a cursor, and the speed NEVER gets slower as you page into the results.

Maybe I am using the wrong library, but from what I understand, a lot of noSQL users of HBase are starting to use Solr. Should I perhaps be using a different indexing library?

Thanks,
Dean


-----Original Message-----
From: Uwe Schindler [mailto:uwe@thetaphi.de] 
Sent: Sunday, June 19, 2011 12:16 PM
To: java-user@lucene.apache.org
Subject: RE: looks like no allowing of paging without counting entire result set?

> I am wondering how the old Hits object, which was deprecated and removed,
> worked. It looks like I could stop asking it for more results, and it would
> work better: instead of counting all the activities that matched in my
> 10-million or 100-million result set, it would just return the first 100,
> then the second 100, and then I could cut off, which would be far more
> performant.

Hits did exactly what you described before. It got as many results as needed
to show the nth page. So when showing the page for results 20 to 30, it
fetches at least 30 results.

In general, full-text search engines only collect the top-scoring results.
This is, e.g., one reason why Google limits the maximum page you can go to.

> Should I just use 2.9 instead?  But then 3.x doesn't seem to support this?

It supports it like 2.9, but not using the Hits API. As described above, to
show results 991 to 1000 request the top-1000 results and display the last
10 :-)

Uwe



RE: looks like no allowing of paging without counting entire result set?

Posted by Uwe Schindler <uw...@thetaphi.de>.

> I am wondering how the old Hits object, which was deprecated and removed,
> worked. It looks like I could stop asking it for more results, and it would
> work better: instead of counting all the activities that matched in my
> 10-million or 100-million result set, it would just return the first 100,
> then the second 100, and then I could cut off, which would be far more
> performant.

Hits did exactly what you described before. It got as many results as needed
to show the nth page. So when showing the page for results 20 to 30, it
fetches at least 30 results.

In general, full-text search engines only collect the top-scoring results.
This is, e.g., one reason why Google limits the maximum page you can go to.

> Should I just use 2.9 instead?  But then 3.x doesn't seem to support this?

It supports it like 2.9, but not using the Hits API. As described above, to
show results 991 to 1000 request the top-1000 results and display the last
10 :-)
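A stdlib-only sketch of that top-N approach, roughly what asking search(query, 1000) for the top 1000 and showing the last 10 amounts to. The `Hit` class and `page` method are hypothetical, not Lucene's `TopDocs`/`ScoreDoc` API: to return ranks [start, start + pageSize), keep the best start + pageSize hits in a bounded min-heap, then slice off the tail. Every matching document still has to be offered to the heap, since the last one scored could be the best, so every page costs a full pass over the matches, and deeper pages also keep a larger heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNPaging {
    // A scored hit (hypothetical; not Lucene's ScoreDoc).
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Rank order: higher score first, ties broken by lower doc ID.
    static final Comparator<Hit> RANK =
        Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparingInt(h -> h.doc);

    // Returns the hits ranked [start, start + pageSize), e.g. start = 990 and
    // pageSize = 10 for "results 991 to 1000".
    static List<Hit> page(float[] scores, int start, int pageSize) {
        int n = start + pageSize;
        // Reversed rank order makes the heap head the worst hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(RANK.reversed());
        for (int doc = 0; doc < scores.length; doc++) {
            // Every matching doc is offered: the last one scored could be the best.
            heap.offer(new Hit(doc, scores[doc]));
            if (heap.size() > n) {
                heap.poll(); // evict the current worst, keeping only the top n
            }
        }
        List<Hit> top = new ArrayList<>(heap);
        top.sort(RANK); // best first
        return top.subList(Math.min(start, top.size()), top.size());
    }
}
```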

Uwe

