Posted to solr-user@lucene.apache.org by Luis Cappa Banda <lu...@gmail.com> on 2013/11/05 18:16:53 UTC

Replication: slow first query after replication.

Hi guys!

I have a master-slave replication setup (Solr 4.1) with a 30-second
polling interval. New documents are indexed continuously, so there is
always new data to replicate every 30 seconds. My test index is not huge:
just 5M documents.

I have noticed that a simple "q=*:*" query can be very slow (up to
10 seconds of QTime). After that first slow query, subsequent "q=*:*"
queries are much quicker. I suspect that cache warm-up after replication
has something to do with this behavior, but maybe an index rebuild is
also involved.

Question time:

*1.* How can I warm up the caches again? Is there any searcher setting in
solrconfig.xml that can be configured to run after replication events?

*2.* My system needs to query the slaves continuously. If there is a way
to warm up and reload the caches, some queries will still see slow
response times until the reload has finished, won't they?

*3.* After a replication has completed, does Solr perform any index
rebuild operation that slows down query responses, or is this poor
performance just due to caches?

*4.* My system always queries the most recently indexed documents (I
filter by document date), and I don't use "fq" for those queries.
In this scenario, would you recommend disabling caches?

Thank you very much in advance!

Best,

-- 
- Luis Cappa

Re: Replication: slow first query after replication.

Posted by Luis Cappa Banda <lu...@gmail.com>.
Correction: in question 1, "against" should read "again". :-)


-- 
- Luis Cappa

Re: Replication: slow first query after replication.

Posted by Shawn Heisey <so...@elyograg.org>.
On 11/5/2013 10:45 PM, Luis Cappa wrote:
> I have seen that when I disable replication and run queries, response times are good. Interesting... I can't see the solution, then, because short replication intervals are needed so that the slaves almost always have 'fresh' documents to search, but this apparently slows down the first queries because of cache warm-up. There must be a solution for this scenario; I think it should be very common. Do you think that disabling caches will improve this?

If disabling replication keeps your query times good, then I would say
that Solr's caches may be the only reason that you're seeing those good
query times.  With nothing changing the index, the caches remain valid.
When the index changes, the caches are invalidated and Solr creates new
empty caches, which is why autowarming can be important.
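As a rough illustration of what autowarming configuration can look like, the Python sketch below builds part of a solrconfig.xml `<query>` section with the standard library's `xml.etree.ElementTree`. It assumes the Solr 4.x element names `filterCache`, `queryResultCache`, the `autowarmCount` attribute, and a `newSearcher` listener using `solr.QuerySenderListener`; the cache sizes, the autowarm count and the warming query are illustrative values, not tuned recommendations, so compare against the example solrconfig.xml shipped with your Solr version. `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_query_section(autowarm_count: int,
                        warm_queries: list[dict[str, str]]) -> ET.Element:
    """Build a <query> element with autowarmed caches and a newSearcher listener.

    Sizes and counts are illustrative assumptions, not recommendations.
    """
    query = ET.Element("query")

    # autowarmCount: how many of the old searcher's most recently used cache
    # entries are regenerated against the new searcher after the index changes.
    for name, cls in (("filterCache", "solr.FastLRUCache"),
                      ("queryResultCache", "solr.LRUCache")):
        ET.SubElement(query, name, {
            "class": cls,
            "size": "512",
            "initialSize": "512",
            "autowarmCount": str(autowarm_count),
        })

    # Explicit warming queries that QuerySenderListener sends to each new searcher.
    listener = ET.SubElement(query, "listener", {
        "event": "newSearcher",
        "class": "solr.QuerySenderListener",
    })
    queries = ET.SubElement(listener, "arr", {"name": "queries"})
    for params in warm_queries:
        lst = ET.SubElement(queries, "lst")
        for key, value in params.items():
            ET.SubElement(lst, "str", {"name": key}).text = value
    return query


if __name__ == "__main__":
    section = build_query_section(
        autowarm_count=32,
        # Hypothetical warming query: the same "q=*:*" that was slow in the thread.
        warm_queries=[{"q": "*:*", "rows": "10"}],
    )
    ET.indent(section)  # pretty-print; Python 3.9+
    print(ET.tostring(section, encoding="unicode"))
```

A warming query that matches what users actually send (here, the slow "q=*:*") is what makes the first real query after replication fast; a larger `autowarmCount` trades longer warm-up time for fuller caches.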

I don't think that disabling the caches will help, but you can always
try it.

Thanks,
Shawn


Re: Replication: slow first query after replication.

Posted by Luis Cappa <lu...@gmail.com>.
Hello, Shawn!

I have seen that when I disable replication and run queries, response times are good. Interesting... I can't see the solution, then, because short replication intervals are needed so that the slaves almost always have 'fresh' documents to search, but this apparently slows down the first queries because of cache warm-up. There must be a solution for this scenario; I think it should be very common. Do you think that disabling caches will improve this?

Thanks a lot!


- Luis Cappa

> On 05/11/2013, at 23:29, Shawn Heisey <so...@elyograg.org> wrote:
> 
> I suspect that you may be running into a situation where you don't have enough OS disk cache for your index.  When you replicate, the new data that has just been replicated pushes existing data out of the cache.  The first query you run afterwards is slow, and it populates the *Solr* caches (not the same thing as the OS disk cache), making later queries fast.  You should be able to configure autowarming on your Solr caches to help with this, but be aware that autowarming can be time-consuming, and if you have replications happening potentially every 30 seconds, you may find that your autowarming is taking more time than that.  This can lead to other problems, such as several new searchers warming at the same time.
> 
> If the amount of disk space taken up by those 5 million documents is significantly larger than the amount of memory available on the server that is not allocated directly to programs like Solr itself, then the only true solution will be to add memory to the server.
> 
> Thanks,
> Shawn
> 

Re: Replication: slow first query after replication.

Posted by Shawn Heisey <so...@elyograg.org>.
I suspect that you may be running into a situation where you don't have
enough OS disk cache for your index.  When you replicate, the new data
that has just been replicated pushes existing data out of the cache.
The first query you run afterwards is slow, and it populates the *Solr*
caches (not the same thing as the OS disk cache), making later queries
fast.  You should be able to configure autowarming on your Solr caches
to help with this, but be aware that autowarming can be time-consuming,
and if you have replications happening potentially every 30 seconds,
you may find that your autowarming is taking more time than that.  This
can lead to other problems, such as several new searchers warming at
the same time.
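The timing concern above is simple arithmetic: if warming a new searcher takes longer than the polling interval, the next replication can open another searcher before the previous one has finished warming. The Python sketch below estimates the worst case under stated assumptions: every poll brings a change (true with continuous indexing), the warm-up duration is one you measure yourself (Solr reports a `warmupTime` in its searcher and cache statistics), and the limit of 2 is assumed from the `maxWarmingSearchers` value in the Solr 4.x example solrconfig.xml. The function names are hypothetical.

```python
import math


def warming_searchers_at_once(warmup_seconds: float,
                              poll_interval_seconds: float) -> int:
    """Worst-case number of searchers warming concurrently.

    Assumes a new searcher opens on every poll and each one takes
    warmup_seconds to warm (a simplifying assumption).
    """
    if poll_interval_seconds <= 0:
        raise ValueError("poll interval must be positive")
    if warmup_seconds <= 0:
        return 0
    return math.ceil(warmup_seconds / poll_interval_seconds)


def warming_overlaps(warmup_seconds: float, poll_interval_seconds: float,
                     max_warming_searchers: int = 2) -> bool:
    """True if warming would exceed the allowed number of warming searchers.

    The default of 2 is an assumption taken from the example configuration.
    """
    concurrent = warming_searchers_at_once(warmup_seconds, poll_interval_seconds)
    return concurrent > max_warming_searchers


if __name__ == "__main__":
    # With the thread's 30-second poll: 10 s of warm-up fits, 75 s does not.
    print(warming_overlaps(10, 30))
    print(warming_overlaps(75, 30))
```

If the check fails, the usual levers are a longer polling interval or a smaller `autowarmCount`, both of which trade freshness or cache coverage for stability.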

If the amount of disk space taken up by those 5 million documents is 
significantly larger than the amount of memory available on the server 
that is not allocated directly to programs like Solr itself, then the 
only true solution will be to add memory to the server.
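The sizing rule in the paragraph above can be checked roughly: add up the size of the index directory on disk and compare it with the memory left after the Solr JVM heap and other programs. The Python sketch below does that; the memory figures and the 1.5x threshold standing in for "significantly larger" are hypothetical placeholders, not values from the thread, and the function names are illustrative.

```python
import os


def index_size_bytes(index_dir: str) -> int:
    """Total size of all regular files under index_dir (e.g. a core's data/index)."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def page_cache_shortfall(index_bytes: int, total_ram_bytes: int,
                         allocated_bytes: int, threshold: float = 1.5) -> bool:
    """True if the index is much larger than the RAM left for the OS disk cache.

    allocated_bytes: memory taken by programs such as the Solr JVM heap.
    threshold: hypothetical stand-in for "significantly larger".
    """
    free_for_cache = max(total_ram_bytes - allocated_bytes, 0)
    return index_bytes > threshold * free_for_cache


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Hypothetical server: 16 GiB RAM, 6 GiB allocated, so 10 GiB for the OS cache.
    print(page_cache_shortfall(20 * GiB, 16 * GiB, 6 * GiB))  # True: 20 > 15
```

In practice you would pass `index_size_bytes()` the slave core's index directory and read the memory figures from the operating system; the point of the sketch is only the comparison itself.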

Thanks,
Shawn