Posted to solr-user@lucene.apache.org by jame vaalet <ja...@gmail.com> on 2011/08/10 15:09:32 UTC

paging size in SOLR

hi,
I want to retrieve all the data from Solr (say 10,000 IDs), and my page size
is 1,000.
How do I get the pages back one after another? Do I have to start at 0 and
increment the "start" value by the page size on each iteration?
In that case, am I querying the index 10 times instead of once, or will the
result be cached somewhere after the first query for the subsequent pages?


JAME VAALET

RE: paging size in SOLR

Posted by Jonathan Rochkind <ro...@jhu.edu>.
I would imagine the performance penalties of deep paging would ALSO be there if you just ask for all 10,000 rows at once, instead of in, say, 100-row paged batches. Yes? No?

-----Original Message-----
From: simon [mailto:mtnest46@gmail.com] 
Sent: Wednesday, August 10, 2011 10:44 AM
To: solr-user@lucene.apache.org
Subject: Re: paging size in SOLR

Worth remembering that there are some performance penalties with deep
paging if you use the page-by-page approach. It may not be too much of a
problem if you really are only looking to retrieve 10K docs.

-Simon


Re: paging size in SOLR

Posted by Erick Erickson <er...@gmail.com>.
1> I don't know; where is it coming from? It looks like you've done a stats
call on a freshly opened server.

2> 512 entries (i.e., results for 512 queries). Each entry holds
<queryResultWindowSize>
doc IDs.

Best
Erick

On Fri, Aug 19, 2011 at 5:33 AM, jame vaalet <ja...@gmail.com> wrote:
> 1. What does this specify?
>
> <queryResultCache class="solr.LRUCache"
>                   size="${queryResultCacheSize:0}"
>                   initialSize="${queryResultCacheInitialSize:0}"
>                   autowarmCount="${queryResultCacheRows:0}" />
>
> 2. When I say queryResultCacheSize : 512, does it mean that 512 queries
> can be cached, or that 512 bytes are reserved for caching?
>
> Can someone please give me an answer?
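As a back-of-the-envelope illustration of 2> (assuming the
queryResultWindowSize of 50 mentioned later in the thread), a full cache
would hold

    512 entries x 50 doc IDs per entry = 25,600 cached doc IDs

plus the cached queries themselves: capacity is counted in entries, not
bytes.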

Re: paging size in SOLR

Posted by jame vaalet <ja...@gmail.com>.
1. What does this specify?

<queryResultCache class="solr.LRUCache"
                  size="${queryResultCacheSize:0}"
                  initialSize="${queryResultCacheInitialSize:0}"
                  autowarmCount="${queryResultCacheRows:0}" />

2. When I say queryResultCacheSize : 512, does it mean that 512 queries can
be cached, or that 512 bytes are reserved for caching?

Can someone please give me an answer?






-- 

-JAME

Re: paging size in SOLR

Posted by Erick Erickson <er...@gmail.com>.
Yep.

On 14 August 2011, jame vaalet <ja...@gmail.com> wrote:

> my queryResultCache size = 0 and queryResultWindowSize = 50
> does this mean that I am not caching any results?

Re: paging size in SOLR

Posted by jame vaalet <ja...@gmail.com>.
My queryResultCache size = 0 and queryResultWindowSize = 50.
Does this mean that I am not caching any results?

On 14 August 2011 18:27, Erick Erickson <er...@gmail.com> wrote:

> As many results will be cached as you ask for. See solrconfig.xml,
> the queryResultCache. This cache is essentially a map from queries
> to result document IDs. The number of doc IDs cached for
> each query is controlled by queryResultWindowSize in
> solrconfig.xml.
>
> Best
> Erick



-- 

-JAME

Re: paging size in SOLR

Posted by Erick Erickson <er...@gmail.com>.
As many results will be cached as you ask for. See solrconfig.xml,
the queryResultCache. This cache is essentially a map from queries
to result document IDs. The number of doc IDs cached for
each query is controlled by queryResultWindowSize in
solrconfig.xml.

Best
Erick
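A minimal sketch of how these two settings fit together in the <query>
section of solrconfig.xml (the sizes below are illustrative, not
recommendations):

  <!-- capacity is counted in entries, one per cached query, not bytes -->
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="128"/>

  <!-- how many doc IDs to collect and cache per query entry -->
  <queryResultWindowSize>50</queryResultWindowSize>

With these values, asking for rows=10 at start=0 fetches and caches the top
50 doc IDs for that query, so pages 1 through 5 of size 10 can be served
from the cache without re-running the search.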


Re: paging size in SOLR

Posted by jame vaalet <ja...@gmail.com>.
Thanks Erick ... that means it depends upon the memory allocated to the
JVM.

Going back to the queryResultCache factor, I have a doubt:
say I have 10 threads with 10 different queries, and each of them in
parallel is searching the same index (multi-sharded), with millions of docs
in it. Each of the queries has a large number of results, so they all have
to be paged. Which threads' (queries') result sets will be cached, so that
subsequent pages can be retrieved quickly?




-- 

-JAME

Re: paging size in SOLR

Posted by Erick Erickson <er...@gmail.com>.
There isn't an "optimum" page size that I know of; it'll vary with lots of
stuff, not the least of which is whatever limits your servlet container
imposes.

But I suspect you can get quite a few (1000s) back without too much
trouble, and you can always use the JSON response writer to pack more rows
into each page with less overhead.

You pretty much have to try it and see.

Best
Erick
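For instance, a single page request using the JSON response writer might
look like this (host, core, and field names are placeholders):

  http://localhost:8983/solr/select?q=*:*&fl=id&start=2000&rows=1000&wt=json

start/rows select the page (here, the third page of 1,000), fl limits each
document to the id field, and wt=json selects the JSON response writer.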


Re: paging size in SOLR

Posted by jame vaalet <ja...@gmail.com>.
Speaking of page sizes, what is the optimum page size to retrieve each
time? I understand it depends upon the data you are fetching back from each
hit document ... but let's say whenever a document is a hit, I am fetching
back 100 bytes worth of data from it (along with the Solr response
statements). This makes 100 * x bytes worth of data in each page, if x is
the page size. What is the optimum value of this x that Solr can return
each time without running into exceptions?




-- 

-JAME

Re: paging size in SOLR

Posted by Erick Erickson <er...@gmail.com>.
Jame:

You control the number via settings in solrconfig.xml, so it's
up to you.

Jonathan:
Hmmm, that seems right; after all, the "deep paging" penalty is really
about keeping a large sorted array in memory.... but at least you only
pay it once per 10,000 rows, rather than 100 times (assuming a page size
of 100)...

Best
Erick


Re: paging size in SOLR

Posted by jame vaalet <ja...@gmail.com>.
When you say queryResultCache, does it cache the top-n results for only the
last query, or for more than one query?





-- 

-JAME

Re: paging size in SOLR

Posted by simon <mt...@gmail.com>.
Worth remembering that there are some performance penalties with deep
paging if you use the page-by-page approach. It may not be too much of a
problem if you really are only looking to retrieve 10K docs.

-Simon


Re: paging size in SOLR

Posted by Erick Erickson <er...@gmail.com>.
Well, if you really want to, you can specify start=0 and rows=10000 and
get them all back at once.

You can do page-by-page by incrementing the "start" parameter as you
indicated.

You can keep from re-executing the search by sizing your queryResultCache
appropriately, but this affects all searches, so it might be an issue.

Best
Erick

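A minimal sketch of the page-by-page loop discussed in this thread,
assuming a SolrJ client (the class names match 4.x-era SolrJ, where the
HTTP client is HttpSolrServer; earlier releases call it
CommonsHttpSolrServer; the URL, query, and field name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class PageThroughResults {
    public static void main(String[] args) throws Exception {
        // placeholder URL; point this at your own Solr instance
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        final int pageSize = 1000;

        SolrQuery query = new SolrQuery("*:*"); // placeholder query
        query.setFields("id");                  // fetch only the IDs, as in the original question
        query.setRows(pageSize);

        int start = 0;
        long numFound;
        do {
            query.setStart(start);              // advance the paging window
            QueryResponse rsp = server.query(query);
            SolrDocumentList page = rsp.getResults();
            numFound = page.getNumFound();      // total matches, not the page size
            for (SolrDocument doc : page) {
                System.out.println(doc.getFieldValue("id"));
            }
            start += pageSize;                  // next page
        } while (start < numFound);

        server.shutdown();
    }
}

Note that each iteration is a separate search: unless the queryResultCache
window covers the requested range, Solr re-executes and re-sorts the query
for every page, which is exactly the deep-paging cost discussed above.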