Posted to solr-user@lucene.apache.org by mike anderson <sa...@gmail.com> on 2009/11/02 22:34:20 UTC

field queries seem slow

I took a look through my Solr logs this weekend and noticed that the longest
queries were on particular fields, like "author:albert einstein". Is this a
result consistent with other setups out there? If not, is there a trick to
make these go faster? I've read up on filter queries and use those when
applicable, but they don't really solve all my problems.

If anybody wants to take a shot at it but needs to see my solrconfig, etc.,
just let me know.

Cheers,
Mike

Re: field queries seem slow

Posted by Lance Norskog <go...@gmail.com>.
Restarting Solr clears out all caching.

Doing a commit used to drop all of the caches for new requests, but it
no longer does this.

On Linux you can clear the kernel's disk buffer cache with a special
hook. You echo '1' into a /proc/something and this tells the kernel to
drop its caches. Sorry, don't remember the exact command.
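
For reference, the hook described above is most likely
/proc/sys/vm/drop_caches. Below is a minimal Python sketch, assuming Linux
and root privileges. The function name and the `path` and `sync_first`
parameters are made up for illustration, so the function can be tried on a
scratch file first. It only affects the kernel's caches, not Solr's own.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"  # Linux only; writing requires root


def drop_os_caches(level=1, path=DROP_CACHES, sync_first=True):
    """Ask the kernel to drop clean caches.

    level 1 = page cache, 2 = dentries and inodes, 3 = both.
    `path` is a parameter only so the function can be tried on a scratch file.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    if sync_first:
        os.sync()  # flush dirty pages first; only clean pages can be dropped
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

On a benchmarking box this would be called (as root) between runs, after
stopping or restarting Solr, so that the first query really does hit disk.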

On Thu, Nov 5, 2009 at 10:09 AM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> Hi,
>
> There is no way that I know of to clear Solr's caches (query, document, filter caches).
> FieldCache is a Lucene thing and it's also something you can't clear, as far as I know.
>
> Slowness on start could be due to:
>
>  * the OS has not cached the index yet (would be the case if your Solr was down for a while and its index got displaced from the OS buffers)
>  * a sort query is run for the first time, so the FieldCache is not populated yet
>  * an expensive query is run for the first time, so its results and hits are not yet cached in Solr's caches
>  * ...
>
> Otis
>
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>



-- 
Lance Norskog
goksron@gmail.com

Re: field queries seem slow

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

There is no way that I know of to clear Solr's caches (query, document, filter caches).
FieldCache is a Lucene thing and it's also something you can't clear, as far as I know.

Slowness on start could be due to:

 * the OS has not cached the index yet (would be the case if your Solr was down for a while and its index got displaced from the OS buffers)
 * a sort query is run for the first time, so the FieldCache is not populated yet
 * an expensive query is run for the first time, so its results and hits are not yet cached in Solr's caches
 * ...

Otis

--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



----- Original Message ----
> From: mike anderson <sa...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, November 5, 2009 11:34:59 AM
> Subject: Re: field queries seem slow
> 
> On production our servers are restarted very rarely (once a month). But this
> raises a question, what does it take to clear the cache? On my benchmarking
> platform I've been simply restarting the server as a method of starting
> fresh. Is there a cache file I could delete to make sure I'm getting
> unbiased results? Second of all, is there an internal cache for sort fields
> separate from the cache for queries and filters which has settings found in
> the solrconfig.xml file?
> 
> I did a test as you suggested to determine if that type of query is always
> slow or just when it starts up; it seems that it is only slow when it starts
> up. However, it seems to be slow at startup both with and without sorting.
> (I'm still trying to figure out how to do good benchmarking with one
> independent variable, so it's possible that this result is inconsistent.)
> 
> for reference, my query is looking like this (+/- sort field):
> 
> http://10.0.20.174:8986/solr/select?mlt=false&rows=10&shards=localhost:8986/solr,localhost:8986/solr,localhost:8986/solr&q=abbrev_authors%3A%22Gallinger+S%22
> 
> I like the suggestion on date resolution, we definitely don't need second
> accuracy (which it is now), and in fact I think we'll just start stamping
> documents with year/week and then sort by that.
> 
> 
> thanks for all your help!
> 
> Cheers,
> Mike


Re: field queries seem slow

Posted by mike anderson <sa...@gmail.com>.
On production our servers are restarted very rarely (once a month). But this
raises a question, what does it take to clear the cache? On my benchmarking
platform I've been simply restarting the server as a method of starting
fresh. Is there a cache file I could delete to make sure I'm getting
unbiased results? Second of all, is there an internal cache for sort fields
separate from the cache for queries and filters which has settings found in
the solrconfig.xml file?

I did a test as you suggested to determine if that type of query is always
slow or just when it starts up; it seems that it is only slow when it starts
up. However, it seems to be slow at startup both with and without sorting.
(I'm still trying to figure out how to do good benchmarking with one
independent variable, so it's possible that this result is inconsistent.)
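
One way to keep a single independent variable is to time the first ("cold")
call separately from repeated ("warm") calls of exactly the same query. The
sketch below is only an illustration: `time_cold_and_warm` and `run_query`
are hypothetical names, and `run_query` stands for any zero-argument
function that fetches one Solr URL.

```python
import statistics
import time


def time_cold_and_warm(run_query, repeats=5):
    """Time one 'cold' call, then `repeats` 'warm' calls of the same query.

    `run_query` is any zero-argument callable, e.g. a function that
    fetches a Solr URL. Returns (cold_seconds, median_warm_seconds).
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    start = time.perf_counter()
    run_query()
    cold = time.perf_counter() - start

    warm = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        warm.append(time.perf_counter() - start)
    return cold, statistics.median(warm)
```

Running this once for the URL with the sort parameter and once for the same
URL without it, each after a fresh restart, gives two (cold, warm) pairs
that differ in only one variable.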

For reference, my query looks like this (with or without the sort field):

http://10.0.20.174:8986/solr/select?mlt=false&rows=10&shards=localhost:8986/solr,localhost:8986/solr,localhost:8986/solr&q=abbrev_authors%3A%22Gallinger+S%22

I like the suggestion on date resolution, we definitely don't need second
accuracy (which it is now), and in fact I think we'll just start stamping
documents with year/week and then sort by that.
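
A year/week stamp of this kind could be computed at index time roughly as
follows. This is a sketch under assumptions: the function name is made up,
and it uses ISO week numbering, which the thread does not specify.

```python
from datetime import datetime


def year_week_key(ts):
    """Return a sortable integer such as 200945 for ISO week 45 of 2009."""
    iso_year, iso_week, _ = ts.isocalendar()
    # Use the ISO year, not ts.year, so days around New Year sort correctly.
    return iso_year * 100 + iso_week
```

Note that 1 January 2010 falls in ISO week 53 of 2009, so its key is 200953
rather than 201001; the keys still sort in chronological order.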


thanks for all your help!

Cheers,
Mike



On Wed, Nov 4, 2009 at 2:07 PM, Erick Erickson <er...@gmail.com> wrote:

> By readers, I meant your searchers. Perhaps you were shutting
> down your servers?
>
> The warming isn't to pre-load authors; it's to pre-populate sort fields
> in particular, which are then kept in caches. There is considerable
> overhead in loading the sort field the first time you sort by it. So,
> my question was really based on the chance that "over the
> weekend" corresponded to "the first queries after the server
> restarted", or "the first query after the underlying index searchers
> were (re)opened".
>
> The real question comes down to whether the same form of query
> (i.e. searching for different values on the same fields with the
> same kind of sort) is slow all the time or just when things start up.
>
> How fine is the resolution for your dates? Assuming that the sorting
> is the issue, if you are storing dates in the millisecond range, that's
> probably 20M dates that have to be loaded to sort. You might
> want to think about a coarser resolution if this has any relevance.
>
> HTH
> Erick

Re: field queries seem slow

Posted by Erick Erickson <er...@gmail.com>.
By readers, I meant your searchers. Perhaps you were shutting
down your servers?

The warming isn't to pre-load authors; it's to pre-populate sort fields
in particular, which are then kept in caches. There is considerable
overhead in loading the sort field the first time you sort by it. So,
my question was really based on the chance that "over the
weekend" corresponded to "the first queries after the server
restarted", or "the first query after the underlying index searchers
were (re)opened".
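
To illustrate the kind of warming query meant here: it does not need to
match any particular author, it only needs to sort on the field in
question. The Python sketch below builds such a URL. The function name, the
base URL and the field name `pub_date` are placeholders, not taken from
this thread. Solr can also fire queries like this itself through the
firstSearcher/newSearcher listeners in solrconfig.xml.

```python
from urllib.parse import urlencode


def warmup_url(base_url, sort_field, direction="desc"):
    """Build a cheap select URL whose only job is to exercise one sort field."""
    params = {
        "q": "*:*",                            # match everything
        "rows": 1,                             # the results themselves are not needed
        "sort": f"{sort_field} {direction}",   # forces the sort field to be loaded
    }
    return f"{base_url}/select?{urlencode(params)}"
```

Fetching `warmup_url("http://localhost:8983/solr", "pub_date")` once after a
restart would pay the sort-field loading cost before real users arrive.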

The real question comes down to whether the same form of query
(i.e. searching for different values on the same fields with the
same kind of sort) is slow all the time or just when things start up.

How fine is the resolution for your dates? Assuming that the sorting
is the issue, if you are storing dates in the millisecond range, that's
probably 20M dates that have to be loaded to sort. You might
want to think about a coarser resolution if this has any relevance.
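
The effect of resolution on the number of distinct sort values can be
checked offline before reindexing. A small Python sketch, with made-up
function names, that truncates timestamps and counts what is left:

```python
from datetime import datetime, timedelta


def truncate(ts, resolution):
    """Round a datetime down to 'second', 'hour' or 'day' resolution."""
    if resolution == "second":
        return ts.replace(microsecond=0)
    if resolution == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if resolution == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown resolution: {resolution}")


def distinct_values(timestamps, resolution):
    """Count how many distinct sort values remain after truncation."""
    return len({truncate(ts, resolution) for ts in timestamps})
```

Running `distinct_values` over a sample of real document dates at each
resolution shows how much a coarser stamp would shrink the set of values
that has to be loaded for sorting.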

HTH
Erick

On Wed, Nov 4, 2009 at 1:54 PM, mike anderson <sa...@gmail.com> wrote:

> Erick, we are doing a sort by date first, and then by score. I'm not sure
> what you mean by readers.
>
> Since we have nearly 6M authors attached to our 20M documents I'm not sure
> that autowarming would help that much (especially since we have very little
> overlap in what users are searching for). But maybe it would?
>
> Lance, I was just being a bit lazy. thanks though.
>
> -mike

Re: field queries seem slow

Posted by mike anderson <sa...@gmail.com>.
Erick, we are doing a sort by date first, and then by score. I'm not sure
what you mean by readers.

Since we have nearly 6M authors attached to our 20M documents I'm not sure
that autowarming would help that much (especially since we have very little
overlap in what users are searching for). But maybe it would?

Lance, I was just being a bit lazy. thanks though.

-mike


On Mon, Nov 2, 2009 at 10:27 PM, Lance Norskog <go...@gmail.com> wrote:

> This searches author:albert and (default text field): einstein. This
> may not be what you expect?

Re: field queries seem slow

Posted by Lance Norskog <go...@gmail.com>.
This searches author:albert and (default text field): einstein. This
may not be what you expect?
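
To keep both words in the author field, the value has to be quoted as a
phrase: author:"albert einstein". A small Python sketch of building such a
clause on the client side; the helper name is made up for illustration.

```python
from urllib.parse import quote_plus


def field_phrase(field, phrase):
    """Return field:"phrase" with embedded quotes and backslashes escaped."""
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


query = field_phrase("author", "albert einstein")  # author:"albert einstein"
encoded = quote_plus(query)                        # URL-encoded form for the q= parameter
```

Without the quotes, only "albert" is tied to the author field and
"einstein" goes to the default search field.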

On Mon, Nov 2, 2009 at 2:30 PM, Erick Erickson <er...@gmail.com> wrote:
> Hmmmm, are you sorting? And have your readers been reopened? Is the
> second query of that sort also slow? If the answer to this last question is
> "no", have you tried some autowarming queries?
>
> Best
> Erick



-- 
Lance Norskog
goksron@gmail.com

Re: field queries seem slow

Posted by Erick Erickson <er...@gmail.com>.
Hmmmm, are you sorting? And have your readers been reopened? Is the
second query of that sort also slow? If the answer to this last question is
"no", have you tried some autowarming queries?

Best
Erick

On Mon, Nov 2, 2009 at 4:34 PM, mike anderson <sa...@gmail.com> wrote:

> I took a look through my Solr logs this weekend and noticed that the longest
> queries were on particular fields, like "author:albert einstein". Is this a
> result consistent with other setups out there? If not, is there a trick to
> make these go faster? I've read up on filter queries and use those when
> applicable, but they don't really solve all my problems.
>
> If anybody wants to take a shot at it but needs to see my solrconfig, etc
> just let me know.
>
> Cheers,
> Mike
>