Posted to solr-user@lucene.apache.org by "Petersen, Robert" <ro...@buy.com> on 2012/12/08 02:10:05 UTC

star searches with high page number requests taking long times

Hi guys,


Sometimes we get a bot crawling the search function on our retail web site. The eBay crawler loves to do this (Request.UserAgent: Terapeakbot). They just do a star search and then iterate through page after page. I've noticed that when they get to higher page numbers, like page 9000, the searches take more than 20 seconds. Is this expected behavior? We're requesting standard facets with the search as well as boosting by function query. Our index is almost 15 million docs now and we're on Solr 3.6.1. This isn't causing any errors at the Solr layer, but our web layer times out the search after 20 seconds and logs the exception.



Thanks

Robi


Re: star searches with high page number requests taking long times

Posted by Walter Underwood <wu...@wunderwood.org>.
I put in a 50 page limit when I was at Netflix.  --wunder

On Dec 8, 2012, at 2:26 PM, Petersen, Robert wrote:

> We have a limit in place to restrict searches to the first ten thousand pages. I am going to try to get that number reduced!  I'm thinking even as low as page fifty should be the limit. What human (with a wallet) would even go as deep as fifty pages?  :)

--
Walter Underwood
wunder@wunderwood.org




Re: star searches with high page number requests taking long times

Posted by "Petersen, Robert" <ro...@buy.com>.
We have a limit in place to restrict searches to the first ten thousand pages. I am going to try to get that number reduced!  I'm thinking even as low as page fifty should be the limit. What human (with a wallet) would even go as deep as fifty pages?  :)

Sent from my iGizmo


On Dec 8, 2012, at 10:21 AM, "Otis Gospodnetic" <ot...@gmail.com> wrote:

> It is common practice not to allow drilling deep in search results.


Re: star searches with high page number requests taking long times

Posted by Otis Gospodnetic <ot...@gmail.com>.
It is common practice not to allow drilling deep into search results.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm
On Dec 8, 2012 10:27 AM, "Jack Krupansky" <ja...@basetechnology.com> wrote:

> What exactly is the common practice - is there a free, downloadable search
> component that does that or at least a "blueprint" for "recommended best
> practice"? What limit is common? (I know Google limits you to the top 1,000
> results.)
>
> -- Jack Krupansky

Re: star searches with high page number requests taking long times

Posted by Jack Krupansky <ja...@basetechnology.com>.
What exactly is the common practice - is there a free, downloadable search 
component that does that or at least a "blueprint" for "recommended best 
practice"? What limit is common? (I know Google limits you to the top 1,000 
results.)

-- Jack Krupansky

-----Original Message----- 
From: Otis Gospodnetic
Sent: Saturday, December 08, 2012 7:25 AM
To: solr-user@lucene.apache.org
Subject: Re: star searches with high page number requests taking long times

Hi Robert,

You should just prevent deep paging. Humans with wallets don't do that, so
you will not lose anything by doing that. It's common practice.

Otis


Re: star searches with high page number requests taking long times

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Robert,

You should just prevent deep paging. Humans with wallets don't do that, so
you will not lose anything by doing that. It's common practice.
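
One way to enforce such a limit in the web layer, sketched in Python. `start` and `rows` are the standard Solr paging parameters; `MAX_PAGE`, `paging_params` and the page size of 10 are assumptions for illustration, not anything from Robert's setup.

```python
MAX_PAGE = 50  # hypothetical cap; 50 pages is one value mentioned in this thread


def paging_params(requested_page, rows=10, max_page=MAX_PAGE):
    """Translate a 1-based page number into Solr `start`/`rows`, clamped to max_page.

    Anything that is not a usable number falls back to page 1.
    """
    try:
        page = int(requested_page)
    except (TypeError, ValueError):
        page = 1

    # Clamp into [1, max_page] so a crawler asking for page 9000 gets page 50.
    page = max(1, min(page, max_page))

    return {"start": (page - 1) * rows, "rows": rows}


if __name__ == "__main__":
    print(paging_params(3))
    print(paging_params(9000))
```

Clamping silently is only one choice. Returning an error page or a "refine your search" message for out-of-range pages would work as well and makes the limit visible to the caller.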

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm
On Dec 7, 2012 8:10 PM, "Petersen, Robert" <ro...@buy.com> wrote:

> Hi guys,
>
>
> Sometimes we get a bot crawling our search function on our retail web
> site.  The ebay crawler loves to do this (Request.UserAgent: Terapeakbot).
>  They just do a star search and then iterate through page after page.  I've
> noticed that when they get to higher page numbers like page 9000, the
> searches are taking more than 20 seconds.  Is this expected behavior?
>  We're requesting standard facets with the search as well as incorporating
> boosting by function query.  Our index is almost 15 million docs now and
> we're on Solr 3.6.1, this isn't causing any errors to occur at the solr
> layer but our web layer times out the search after 20 seconds and logs the
> exception.
>
>
>
> Thanks
>
> Robi
>
>

Re: star searches with high page number requests taking long times

Posted by Aloke Ghoshal <al...@gmail.com>.
Hi Robert,

You could look at pageDoc & pageScore to improve things for deep paging
(http://wiki.apache.org/solr/CommonQueryParameters#pageDoc_and_pageScore).
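
The general idea behind resuming from the last hit of the previous page can be sketched as follows. This is a generic Python illustration, not Solr's implementation of pageDoc/pageScore (check the wiki page for the actual parameter semantics and version support); `next_page` and the (doc_id, score) pair layout are assumptions.

```python
import heapq


def rank_key(pair):
    """Sort key for a (doc_id, score) pair: smaller key means better rank."""
    doc_id, score = pair
    return (-score, doc_id)  # highest score first, ties broken by lowest doc_id


def next_page(scored_docs, rows, after=None):
    """Return the next `rows` hits that rank strictly after `after`.

    scored_docs: iterable of (doc_id, score) for every matching document.
    after: the last (doc_id, score) pair of the previous page, or None for page 1.
    """
    if after is None:
        candidates = scored_docs
    else:
        after_key = rank_key(after)
        # Skip everything that was already shown; nothing before it is kept.
        candidates = (p for p in scored_docs if rank_key(p) > after_key)

    # Only `rows` candidates are retained, no matter how deep the page is.
    return heapq.nsmallest(rows, candidates, key=rank_key)


if __name__ == "__main__":
    docs = [(doc_id, (doc_id * 37) % 101) for doc_id in range(200)]
    first = next_page(docs, rows=10)
    second = next_page(docs, rows=10, after=first[-1])
    print(first[-1], second[0])
```

Every matching document is still visited, but the collector keeps only one page worth of hits instead of everything up to the requested offset.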

Regards,
Aloke

On Sat, Dec 8, 2012 at 8:08 AM, Upayavira <uv...@odoko.co.uk> wrote:

> Yes, expected.
>
> When it does a search for the first, say, 10 results, it must scan
> through all docs, recording just the highest ten scoring ones.
>
> To find documents 1000 to 1010, it must scan through all docs, recording
> the best scoring 1010 documents, and then discard the first 1000. This
> is much more expensive. Try it on google, they won't let you go beyond
> around 900 pages or such (or is it 900 results?)
>
> Upayavira

Re: star searches with high page number requests taking long times

Posted by Upayavira <uv...@odoko.co.uk>.
Yes, expected.

When it does a search for the first, say, 10 results, it must scan
through all matching docs, recording just the ten highest-scoring ones.

To find documents 1000 to 1010, it must scan through all matching docs,
recording the best-scoring 1010 documents, and then discard the first 1000.
This is much more expensive. Try it on Google: they won't let you go beyond
around 900 pages or so (or is it 900 results?).
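
A rough sketch of the cost described above, in Python rather than anything Solr-specific. `page_of_results` and `candidates_retained` are made-up names, and 10 rows per page is an assumption (the thread does not say what page size the site uses).

```python
import heapq


def page_of_results(scored_docs, start, rows):
    """Return the (doc_id, score) pairs ranked start .. start+rows-1, best first.

    scored_docs: iterable of (doc_id, score) for every matching document.
    """
    needed = start + rows  # the collector has to keep this many candidates

    # Highest score first; ties broken by lowest doc_id.
    top = heapq.nlargest(needed, scored_docs, key=lambda p: (p[1], -p[0]))

    # Everything before `start` was collected only to be thrown away.
    return top[start:]


def candidates_retained(page, rows=10):
    """Candidates kept to serve a 1-based page number (rows=10 is an assumption)."""
    return page * rows


if __name__ == "__main__":
    docs = [(doc_id, (doc_id * 37) % 101) for doc_id in range(1000)]
    print(page_of_results(docs, start=0, rows=3))
    print(candidates_retained(1), candidates_retained(9000))  # 10 90000
```

Under that assumed page size, page 1 keeps 10 candidates while page 9000 keeps 90,000, and all but the last 10 are discarded.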

Upayavira

On Sat, Dec 8, 2012, at 01:10 AM, Petersen, Robert wrote:
> Hi guys,
> 
> 
> Sometimes we get a bot crawling our search function on our retail web
> site.  The ebay crawler loves to do this (Request.UserAgent:
> Terapeakbot).  They just do a star search and then iterate through page
> after page.  I've noticed that when they get to higher page numbers like
> page 9000, the searches are taking more than 20 seconds.  Is this
> expected behavior?  We're requesting standard facets with the search as
> well as incorporating boosting by function query.  Our index is almost 15
> million docs now and we're on Solr 3.6.1, this isn't causing any errors
> to occur at the solr layer but our web layer times out the search after
> 20 seconds and logs the exception.
> 
> 
> 
> Thanks
> 
> Robi
>