Posted to solr-user@lucene.apache.org by "michael.boom" <my...@yahoo.com> on 2013/11/04 14:43:42 UTC

Performance of "rows" and "start" parameters

I saw that some time ago there was a JIRA ticket discussing this, but I still
found no relevant information on how to deal with it.

When working with a big number of docs (70M in my case), I'm using
start=0&rows=30 in my requests.
For the first request the query time is OK, the next one is visibly slower, the
third slower still, and so on until I get huge query times of up to
140 secs after a few hundred requests. My tests were done with SolrMeter at
a rate of 1000 qpm. The same thing happens at 100 qpm, though.
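For reference, the request shape described above boils down to a URL like the following; the host, core name, and keyword are placeholders (only start/rows come from the thread):

```python
from urllib.parse import urlencode

# Hypothetical host, core, and keyword -- this is just the shape of the
# request SolrMeter would be firing at the /select handler.
params = {"q": "some keyword", "start": 0, "rows": 30, "wt": "json"}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```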

Is there a best practice for this situation, or maybe an
explanation of why the query time increases from request to request?

Thanks!



-----
Thanks,
Michael
--
View this message in context: http://lucene.472066.n3.nabble.com/Performance-of-rows-and-start-parameters-tp4099194.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Performance of "rows" and "start" parameters

Posted by Michael Della Bitta <mi...@appinions.com>.
Whoops, looks like I misdiagnosed this one.

Just to add: you might want to make sure lazy field loading is enabled, too.
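For reference, in Solr 4.x that's a solrconfig.xml switch; the stock example config ships with it enabled, so this is mostly a thing to verify:

```xml
<!-- solrconfig.xml, inside the <query> section -->
<enableLazyFieldLoading>true</enableLazyFieldLoading>
```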
On Nov 5, 2013 7:21 AM, "Erick Erickson" <er...@gmail.com> wrote:

> As long as start=0, this is _not_ the deep paging problem.

Re: Performance of "rows" and "start" parameters

Posted by Erick Erickson <er...@gmail.com>.
As long as start=0, this is _not_ the deep paging problem.

Raymond's comments are well taken. Try restricting the
returned fields to only id. If you have large fields, Solr 4.1+
automatically compresses the data so you might be seeing
lots of time spent in decompression, that'd be my first guess.
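A quick way to test the decompression theory (host, core, and keyword are placeholders as before) is to rerun the same load with the field list cut down to the id:

```python
from urllib.parse import urlencode

# Same query shape, but fl=id means Solr returns only the id field, taking
# stored-field decompression out of the measurement.
params = {"q": "some keyword", "start": 0, "rows": 30, "fl": "id", "wt": "json"}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```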

And it's important to look at the QTime return in the responses;
I forget whether SolrMeter reports that time or total time. That's
the time spent searching, exclusive of loading the documents
into the return packet.
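To make that distinction concrete, here is a sketch with a made-up response body: QTime is what Solr reports inside responseHeader, while the time a client (or SolrMeter) perceives also includes fetching, decompressing, and shipping the stored fields:

```python
import json
import time

# Made-up, trimmed Solr JSON response -- numbers are illustrative only.
raw = '{"responseHeader": {"status": 0, "QTime": 15}, "response": {"numFound": 70000000, "docs": []}}'

start = time.monotonic()
resp = json.loads(raw)  # stands in for the full HTTP round trip
wall_ms = (time.monotonic() - start) * 1000.0

# QTime: server-side search time in ms, exclusive of loading stored fields
# and of network transfer -- so it can sit far below the wall-clock time.
qtime_ms = resp["responseHeader"]["QTime"]
print(qtime_ms, wall_ms)
```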

It looks like you're pegging the CPU (and the decompression
might be why) and getting into a resource-starved situation.

Best,
Erick


On Tue, Nov 5, 2013 at 6:47 AM, Raymond Wiker <rw...@gmail.com> wrote:

> Are you restricting the set of fields that you return from the queries? If
> not, it could be that you are returning fields that are potentially very
> large, and may affect query performance that way.

Re: Performance of "rows" and "start" parameters

Posted by Raymond Wiker <rw...@gmail.com>.
Are you restricting the set of fields that you return from the queries? If
not, it could be that you are returning fields that are potentially very
large, and may affect query performance that way.


On Tue, Nov 5, 2013 at 11:38 AM, michael.boom <my...@yahoo.com> wrote:

> Thank you!
>
> I suspect that maybe my box was too small.

Re: Performance of "rows" and "start" parameters

Posted by "michael.boom" <my...@yahoo.com>.
Thank you!

I suspect that maybe my box was too small.
I'm upgrading my machines to more CPU & RAM and let's see how it goes from
there.

Would limiting the number of returned fields to a smaller value make
any improvement?
The behaviour I noticed was that:
at start=0&rows=10 avg qtime after 200 queries was about 15ms
at start=0&rows=20 avg qtime after 200 queries was about 20ms
at start=0&rows=30 avg qtime after 200 queries was about 250ms and slowly
increasing.
at start=0&rows=50 avg qtime after 200 queries was about 1400ms and increasing
really fast.

Tests were made using SolrMeter with a set of keywords, each request
having start=0&rows=N specified (N being one of the values above). So,
no deep paging, always requesting the first N results, sorted by score.

I will try again this scenario on the bigger boxes, and come back.



-----
Thanks,
Michael
--
View this message in context: http://lucene.472066.n3.nabble.com/Performance-of-rows-and-start-parameters-tp4099194p4099370.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Performance of "rows" and "start" parameters

Posted by Erick Erickson <er...@gmail.com>.
bq: start=0&rows=30

Let's see the start and rows parameters for a few of
your queries, because on the surface this makes
no sense. If you're always starting at 0, this
shouldn't be happening....

And you say "the second query is visibly slower". You're
talking about the "deep paging" problem, which you shouldn't
notice until your start parameter is at least up in the
thousands, perhaps 10s of thousands.

So unless you're incrementing the start parameter way up
there, there's something else going on.....

You should be seeing this reflected in your QTimes BTW, if
not then you're seeing something else, perhaps just
too much happening on the box...

FWIW,
Erick


On Mon, Nov 4, 2013 at 11:01 AM, Michael Della Bitta <
michael.della.bitta@appinions.com> wrote:

> The query time increases because in order to calculate the set of documents
> that belongs in page N, you must first calculate all the pages prior to
> page N, and this information is not stored in between requests.

Re: Performance of "rows" and "start" parameters

Posted by Michael Della Bitta <mi...@appinions.com>.
The query time increases because in order to calculate the set of documents
that belong in page N, you must first calculate all the pages prior to
page N, and this information is not stored between requests.
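As a toy illustration of that cost (the collector idea only, not Solr internals): to hand back page N, the ranker keeps a top-(start+rows) heap over every match, so the work per request grows with start even though only rows documents come back:

```python
import heapq
import random

def top_docs(scores, start, rows):
    # Keep the best start+rows scores over *all* matches, then discard the
    # first `start` of them -- that discarded work is what paging pays for.
    best = heapq.nlargest(start + rows, scores)
    return best[start:start + rows]

random.seed(42)
scores = [random.random() for _ in range(100_000)]  # one score per matching doc

page_1 = top_docs(scores, start=0, rows=30)       # heap of 30
page_100 = top_docs(scores, start=2970, rows=30)  # heap of 3000, same full scan
print(len(page_1), len(page_100))
```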

Two ways of speeding this stuff up are to request bigger pages, and/or use
filter queries over some sort of orderable field in your index to do the
paging. So for example, if you have a timestamp field in your index, and
your data represents 100 days, doing 100 queries, one for each day, is much
better than doing 100 queries using start/rows.
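A sketch of that idea, assuming a hypothetical `timestamp` field (the field name and date range are invented for illustration):

```python
from datetime import date, timedelta

def daily_filter_queries(first_day, days):
    # One fq per day, using Solr range syntax: [inclusive TO exclusive}.
    fqs = []
    for i in range(days):
        lo = first_day + timedelta(days=i)
        hi = lo + timedelta(days=1)
        fqs.append(f"timestamp:[{lo.isoformat()}T00:00:00Z TO {hi.isoformat()}T00:00:00Z}}")
    return fqs

# 100 days of data -> 100 bounded queries instead of paging one giant result.
fqs = daily_filter_queries(date(2013, 8, 1), 100)
print(fqs[0])
```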


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions<https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Mon, Nov 4, 2013 at 8:43 AM, michael.boom <my...@yahoo.com> wrote:

> I saw that some time ago there was a JIRA ticket discussing this, but still I
> found no relevant information on how to deal with it.

Re: Performance of "rows" and "start" parameters

Posted by Bill Bell <bi...@gmail.com>.
Do you want to look through them all? Have you considered the Lucene API? Not sure if that is better, but it might be.

Bill Bell
Sent from mobile


> On Nov 4, 2013, at 6:43 AM, "michael.boom" <my...@yahoo.com> wrote:
> 
> I saw that some time ago there was a JIRA ticket discussing this, but still I
> found no relevant information on how to deal with it.