
Posted to solr-user@lucene.apache.org by Fuad Efendi <fu...@efendi.ca> on 2009/12/24 17:09:08 UTC

SOLR Performance Tuning: Pagination

I used pagination for a while until I found this...


I have a filtered query ID:[* TO *] returning 20 million results (no
faceting), and pagination always seemed to be fast. However, it is fast only
with low values of start (e.g., start=12345). Queries like start=28838540 take
40-60 seconds and can even cause an OutOfMemoryError.

I use highlighting, faceting on the non-tokenized "Country" field, and the standard handler.


It even seems to be a bug...


Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay

Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search

Re: SOLR Performance Tuning: Pagination

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Si si, that issue.
 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
> From: Peter Wolanin <pe...@acquia.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, January 7, 2010 9:27:04 PM
> Subject: Re: SOLR Performance Tuning: Pagination
> 
> Great - this issue?  https://issues.apache.org/jira/browse/LUCENE-2127
> 
> Sounds like it would be a real win for lucene.
> 
> -Peter

Re: SOLR Performance Tuning: Pagination

Posted by Peter Wolanin <pe...@acquia.com>.

Great - this issue?  https://issues.apache.org/jira/browse/LUCENE-2127

Sounds like it would be a real win for lucene.

-Peter

On Thu, Jan 7, 2010 at 4:12 PM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> Peter - Aaron just commented on a recent Solr issue (reading large result sets) and mentioned his patch.
> So far he has 2 x +1 from Grant and me to stick his patch in JIRA.



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wolanin@acquia.com

Re: SOLR Performance Tuning: Pagination

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Peter - Aaron just commented on a recent Solr issue (reading large result sets) and mentioned his patch.
So far he has 2 x +1 from Grant and me to stick his patch in JIRA.

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
> From: Peter Wolanin <pe...@acquia.com>
> To: solr-user@lucene.apache.org
> Sent: Sun, January 3, 2010 3:37:01 PM
> Subject: Re: SOLR Performance Tuning: Pagination
> 
> At the NOVA Apache Lucene/Solr Meetup last May, one of the speakers
> from Near Infinity (Aaron McCurry I think) mentioned that he had a
> patch for lucene that enabled unlimited depth memory-efficient paging.
> Is anyone in contact with him?
> 
> -Peter

Re: SOLR Performance Tuning: Pagination

Posted by Peter Wolanin <pe...@acquia.com>.

At the NOVA Apache Lucene/Solr Meetup last May, one of the speakers
from Near Infinity (Aaron McCurry, I think) mentioned that he had a
patch for Lucene that enabled unlimited-depth, memory-efficient paging.
Is anyone in contact with him?
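For readers unfamiliar with the idea, here is a minimal Python sketch of one generic way to page with bounded memory: keyset ("search after") paging. It is not Aaron's patch, which this thread does not describe, and `page_after` and `iterate_all` are hypothetical names. Instead of an offset, the caller passes the last (score, doc id) it saw, and the collector keeps only `rows` entries, so memory stays constant however deep the page is. Ties are assumed to break on ascending doc id.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

Hit = Tuple[float, int]  # (score, doc id)


def rank_key(hit: Hit) -> Tuple[float, int]:
    """Larger key = better rank: higher score first, then lower doc id."""
    score, doc = hit
    return (score, -doc)


def page_after(scored_docs: Iterable[Hit], after: Optional[Hit], rows: int) -> List[Hit]:
    """Return the next `rows` hits ranked strictly after `after` (None = first page)."""
    bound = rank_key(after) if after is not None else None
    heap: List[Tuple[float, int]] = []  # min-heap of at most `rows` keys
    for hit in scored_docs:
        k = rank_key(hit)
        if bound is not None and k >= bound:
            continue  # ranked at or before the last hit already returned
        if len(heap) < rows:
            heapq.heappush(heap, k)
        elif k > heap[0]:
            heapq.heapreplace(heap, k)  # evict the current worst kept hit
    return [(score, -neg_doc) for score, neg_doc in sorted(heap, reverse=True)]


def iterate_all(scored_docs: List[Hit], rows: int) -> Iterator[List[Hit]]:
    """Walk every page in order, carrying the last hit forward as the cursor."""
    after: Optional[Hit] = None
    while True:
        page = page_after(scored_docs, after, rows)
        if not page:
            return
        yield page
        after = page[-1]
```

The trade-off in this sketch: memory is O(rows) per request, but every page still scans all matching documents, and the client must walk pages in order rather than jump straight to page N.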

-Peter

On Thu, Dec 24, 2009 at 11:27 AM, Grant Ingersoll <gs...@apache.org> wrote:
> Yeah, deep pagination in Lucene/Solr can be problematic due to the Priority Queue management.  See http://issues.apache.org/jira/browse/LUCENE-2127 and the linked discussion on java-dev.



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wolanin@acquia.com

Re: SOLR Performance Tuning: Pagination

Posted by Grant Ingersoll <gs...@apache.org>.

On Dec 24, 2009, at 11:09 AM, Fuad Efendi wrote:

> I used pagination for a while till found this...
> 
> 
> I have filtered query ID:[* TO *] returning 20 millions results (no
> faceting), and pagination always seemed to be fast. However, fast only with
> low values for start=12345. Queries like start=28838540 take 40-60 seconds,
> and even cause OutOfMemoryException.

Yeah, deep pagination in Lucene/Solr can be problematic due to the Priority Queue management.  See http://issues.apache.org/jira/browse/LUCENE-2127 and the linked discussion on java-dev.
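A minimal Python sketch of the priority-queue issue, assuming a simplified collector that only sees (score, doc id) pairs. It is an illustration, not Lucene's actual collector code, and `top_page` is a hypothetical name: to return hits [start, start + rows), a bounded min-heap must hold start + rows entries, because nothing can be discarded until it is known not to be in that prefix.

```python
import heapq
from typing import Iterable, List, Tuple


def top_page(scored_docs: Iterable[Tuple[float, int]], start: int, rows: int) -> List[Tuple[float, int]]:
    """Return hits [start, start + rows) by descending score, ties by ascending doc id."""
    capacity = start + rows  # the queue size grows with the page depth
    heap: List[Tuple[float, int]] = []  # min-heap keyed on (score, -doc); root = worst kept hit
    for score, doc in scored_docs:
        key = (score, -doc)
        if len(heap) < capacity:
            heapq.heappush(heap, key)
        elif key > heap[0]:
            heapq.heapreplace(heap, key)  # evict the worst kept hit
    ranked = sorted(heap, reverse=True)  # best first
    # Only the last `rows` entries are returned; the first `start` were kept just to be skipped.
    return [(score, -neg_doc) for score, neg_doc in ranked[start:start + rows]]
```

With start=28838540 and rows=10, the heap in this sketch would have to hold 28838550 entries, which is consistent with the memory pressure Fuad reported.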


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search

Re: SOLR Performance Tuning: Pagination

Posted by Paul Rosen <pa...@performantsoftware.com>.

This is similar to a problem I've been having. It's pretty easy to just
limit the user to 200 pages when the results are sorted by relevance,
but if they are sorted alphabetically, then that doesn't work. It would
be nice if there were a "limit=2000" parameter to the Solr call that caps
the results by relevance before the sort is applied.
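A rough Python sketch of the behaviour Paul describes, assuming in-memory (doc id, score, title) records. As far as this thread shows, Solr has no such `limit` parameter; `relevance_capped_sort`, `Doc`, and `limit` are hypothetical names. The idea: keep only the `limit` most relevant hits, then sort that subset alphabetically, so the page count stays bounded under any sort.

```python
import heapq
from typing import List, NamedTuple


class Doc(NamedTuple):
    doc_id: int
    score: float
    title: str


def relevance_capped_sort(docs: List[Doc], limit: int) -> List[Doc]:
    """Keep the `limit` most relevant docs, then order that subset by title."""
    # Step 1: cap by relevance (higher score wins, ties go to the lower doc id).
    best = heapq.nlargest(limit, docs, key=lambda d: (d.score, -d.doc_id))
    # Step 2: apply the user's requested sort only to the capped subset.
    return sorted(best, key=lambda d: (d.title, d.doc_id))
```

For example, with limit=2000 and 10 rows per page, a client paging over `relevance_capped_sort(docs, 2000)` could never go beyond 200 pages, matching the cap already used for relevance-sorted results.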

[It also made me jump through hoops when I wrote some unit tests for the 
indexing.]

>> -----Original Message-----
>> From: Walter Underwood [mailto:wunder@wunderwood.org]
>> Sent: December-24-09 1:51 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SOLR Performance Tuning: Pagination
>>
>> Some bots will do that, too. Maybe badly written ones, but we saw that at
>> Netflix. It was causing search timeouts just before a peak traffic period,
>> so we set a page limit in the front end, something like 200 pages.
>>
>> It makes sense for that to be very slow, because a request for hit
>> 28838540 means that Solr has to calculate the relevance for 28838540 + 10
>> documents.
>>
>> Fuad: Why are you benchmarking this? What user is looking at 20M
>> documents?
>>
>> wunder

RE: SOLR Performance Tuning: Pagination

Posted by Fuad Efendi <fu...@efendi.ca>.

Grant, Erik, Walter, and SOLR,

Thank you so much for very prompt responses (with links!)

From time to time I try to share...


Happy Holidays!!!!!!!




> -----Original Message-----
> From: Walter Underwood [mailto:wunder@wunderwood.org]
> Sent: December-24-09 1:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR Performance Tuning: Pagination
> 
> Some bots will do that, too. Maybe badly written ones, but we saw that at
> Netflix. It was causing search timeouts just before a peak traffic period,
> so we set a page limit in the front end, something like 200 pages.
> 
> It makes sense for that to be very slow, because a request for hit
> 28838540 means that Solr has to calculate the relevance for 28838540 + 10
> documents.
> 
> Fuad: Why are you benchmarking this? What user is looking at 20M
> documents?
> 
> wunder

Re: SOLR Performance Tuning: Pagination

Posted by Grant Ingersoll <gs...@apache.org>.

On Dec 24, 2009, at 1:51 PM, Walter Underwood wrote:

> Some bots will do that, too. Maybe badly written ones, but we saw that at Netflix. It was causing search timeouts just before a peak traffic period, so we set a page limit in the front end, something like 200 pages.
> 
> It makes sense for that to be very slow, because a request for hit 28838540 means that Solr has to calculate the relevance for 28838540 + 10 documents.
> 
> Fuad: Why are you benchmarking this? What user is looking at 20M documents? 
> 

20M may be a bit much, but 500K to 1M is not out of the realm of possibility for clients that do downstream analysis.

RE: SOLR Performance Tuning: Pagination

Posted by Fuad Efendi <fu...@efendi.ca>.

Hi Walter, you are right: it was mostly robots (Googlebot, Yahoo! Slurp,
etc.).

I have friendly URLs like:
http://www.tokenizer.org/USA/?page=7 (30 million docs, 3 million pages)
http://www.tokenizer.org/www.newegg.com/
http://www.tokenizer.org/www.newegg.com/?sort=link&dir=asc&q=Opteron

And even this:
http://www.tokenizer.org/AMD/Opteron/8350/

I disabled processing for URLs with no query parameter (empty results), but
I should really limit pagination programmatically. Fortunately,
http://www.tokenizer.org/?q=USA returns only 50k documents (the search doesn't
use the "Country" field). But some queries may return a huge number of
documents (it is better to tune the "stop-word" list).
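A minimal Python sketch of limiting pagination programmatically for friendly URLs like ?page=7, assuming a 1-based page parameter, 10 rows per page, and a 200-page cap (the figure Walter mentions for Netflix). `page_to_start`, `MAX_PAGES`, and `ROWS` are hypothetical names. Out-of-range or malformed pages map to None so the front end can answer without ever sending a deep `start` to Solr.

```python
from typing import Optional

MAX_PAGES = 200  # hypothetical cap, echoing the ~200-page front-end limit in this thread
ROWS = 10        # hypothetical page size (Solr's rows parameter)


def page_to_start(page_param: Optional[str], rows: int = ROWS, max_pages: int = MAX_PAGES) -> Optional[int]:
    """Translate a 1-based ?page=N value into Solr's `start`, or None if it should be refused."""
    if page_param is None:
        return 0  # no page parameter: first page
    try:
        page = int(page_param)
    except ValueError:
        return None  # malformed, e.g. ?page=abc
    if page < 1 or page > max_pages:
        return None  # too deep (or nonsensical): never forward this to Solr
    return (page - 1) * rows
```

For example, `page_to_start("7")` gives start=60, while `page_to_start("5000")` returns None, which the caller might turn into a 404 for crawlers.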

-Fuad


> -----Original Message-----
> From: Walter Underwood [mailto:wunder@wunderwood.org]
> Sent: December-24-09 1:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR Performance Tuning: Pagination
> 
> Some bots will do that, too. Maybe badly written ones, but we saw that at
> Netflix. It was causing search timeouts just before a peak traffic period,
> so we set a page limit in the front end, something like 200 pages.
> 
> It makes sense for that to be very slow, because a request for hit
> 28838540 means that Solr has to calculate the relevance for 28838540 + 10
> documents.
> 
> Fuad: Why are you benchmarking this? What user is looking at 20M
> documents?
> 
> wunder

Re: SOLR Performance Tuning: Pagination

Posted by Walter Underwood <wu...@wunderwood.org>.

Some bots will do that, too. Maybe badly written ones, but we saw that at Netflix. It was causing search timeouts just before a peak traffic period, so we set a page limit in the front end, something like 200 pages.

It makes sense for that to be very slow, because a request for hit 28838540 means that Solr has to calculate the relevance for 28838540 + 10 documents.

Fuad: Why are you benchmarking this? What user is looking at 20M documents? 

wunder

On Dec 24, 2009, at 10:44 AM, Erik Hatcher wrote:

> 
> On Dec 24, 2009, at 11:36 AM, Walter Underwood wrote:
>> When do users do a query like that? --wunder
> 
> Well, SolrEntityProcessor "users" do :)
> 
>  http://issues.apache.org/jira/browse/SOLR-1499
>  (which by the way I plan on polishing and committing over the holidays)
> 
> 	Erik

Re: SOLR Performance Tuning: Pagination

Posted by Erik Hatcher <er...@gmail.com>.

On Dec 24, 2009, at 11:36 AM, Walter Underwood wrote:
> When do users do a query like that? --wunder

Well, SolrEntityProcessor "users" do :)

   http://issues.apache.org/jira/browse/SOLR-1499
   (which by the way I plan on polishing and committing over the holidays)

	Erik




RE: SOLR Performance Tuning: Pagination

Posted by Fuad Efendi <fu...@efendi.ca>.

Not users... robots! Slurp/Yahoo, Googlebot, etc.

I had friendly URLs for queries with filters, like http://.../USA/, showing all
documents from SOLR with country=USA, with pagination; I have disabled them now.
But URLs like http://.../?q=USA are still dangerous; I need to limit
pagination programmatically.



> -----Original Message-----
> From: Walter Underwood [mailto:wunder@wunderwood.org]
> Sent: December-24-09 11:37 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR Performance Tuning: Pagination
> 
> When do users do a query like that? --wunder

Re: SOLR Performance Tuning: Pagination

Posted by Joe Calderon <ca...@gmail.com>.

fwiw, when implementing distributed search I ran into a similar
problem, but then I noticed even Google doesn't let you go past page
1000; it's easier to just set a limit on start.
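A simplified Python model of why deep pages hurt more in distributed search, assuming each shard returns its own ranked (score, doc id) list and a coordinator merges them. It ignores everything except the ranking step, breaks score ties by shard and then doc id (an assumption), and all names are hypothetical. Because any single shard might own the entire requested page, each shard must send start + rows hits.

```python
import heapq
import itertools
from typing import List, Tuple

ShardHit = Tuple[float, int]   # (score, doc id) as seen inside one shard
Hit = Tuple[float, int, int]   # (score, shard index, doc id) at the coordinator


def rank_key(hit: Hit) -> Tuple[float, int, int]:
    """Best first: higher score, then lower shard index, then lower doc id."""
    score, shard, doc = hit
    return (-score, shard, doc)


def shard_top(hits: List[ShardHit], shard: int, n: int) -> List[Hit]:
    """One shard's reply: its own best n hits, already in rank order."""
    tagged = [(score, shard, doc) for score, doc in hits]
    return heapq.nsmallest(n, tagged, key=rank_key)


def distributed_page(shards: List[List[ShardHit]], start: int, rows: int) -> Tuple[List[Hit], int]:
    """Return the requested page plus the total number of hits the shards had to send."""
    n = start + rows  # any one shard might hold the whole page, so each sends start + rows
    replies = [shard_top(hits, i, n) for i, hits in enumerate(shards)]
    sent = sum(len(r) for r in replies)
    merged = heapq.merge(*replies, key=rank_key)  # each reply is already sorted by rank_key
    return list(itertools.islice(merged, start, start + rows)), sent
```

With 3 shards, start=20 and rows=10, the shards together send 90 hits to produce a page of 10; at start=28838540, each shard would have to send 28838550.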

On Thu, Dec 24, 2009 at 8:36 AM, Walter Underwood <wu...@wunderwood.org> wrote:
> When do users do a query like that? --wunder

Re: SOLR Performance Tuning: Pagination

Posted by Walter Underwood <wu...@wunderwood.org>.

When do users do a query like that? --wunder

On Dec 24, 2009, at 8:09 AM, Fuad Efendi wrote:

> I used pagination for a while till found this...
> 
> 
> I have filtered query ID:[* TO *] returning 20 millions results (no
> faceting), and pagination always seemed to be fast. However, fast only with
> low values for start=12345. Queries like start=28838540 take 40-60 seconds,
> and even cause OutOfMemoryException.