Posted to solr-user@lucene.apache.org by gustavonasu <gu...@gmail.com> on 2013/04/23 19:27:47 UTC

Autocommit and replication have been slowing down

Hi,

    We recently migrated from Solr 1.4 to 3.6.1. In the new version we have
noticed that after some hours (around 8) autocommit starts taking much longer
to execute. We configured autocommit with maxDocs=50 and maxTime=10000ms, but
it is taking a few (3-5) minutes for documents to be indexed (I measured this
by watching docsPending on the Update Stats page and refreshing it. Is there
another way to verify that information?).
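
    For reference, the autocommit block in our solrconfig.xml looks roughly
like this (maxDocs and maxTime are the values mentioned above; the rest of the
update handler section is stock):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>50</maxDocs>       <!-- commit after 50 pending docs -->
        <maxTime>10000</maxTime>    <!-- or after 10 seconds, whichever comes first -->
      </autoCommit>
    </updateHandler>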

    A similar problem has been happening with replication. We configured the
pollInterval to 60s, but replication takes several minutes to complete. You can
see the timeElapsed value (around 6 minutes) in the Replication Stats below.
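
    On the slave side, the replication handler is configured more or less
like this (masterUrl and pollInterval match the values shown in the stats
below):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master:9090/solr/replication</str>
        <str name="pollInterval">00:00:60</str>   <!-- HH:mm:ss -->
      </lst>
    </requestHandler>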

    After a server restart, indexing works as expected for some hours before
the problem comes back.

    Our solrconfig.xml is almost the default; we only increased a few
parameters for filterCache, queryResultCache and queryResultWindowSize.

    Has anyone ever had the same problem?

    Does anyone have a hint or a direction on where to start?

*** Update Handlers
name: 	 updateHandler  
class: 	 org.apache.solr.update.DirectUpdateHandler2  
version: 	 1.0  
description: 	 Update handler that efficiently directly updates the on-disk
main lucene index  
stats: 	commits : 1085 
autocommit maxDocs : 50 
autocommit maxTime : 10000ms 
autocommits : 1085 
optimizes : 0 
rollbacks : 0 
expungeDeletes : 0 
docsPending : 18 
adds : 18 
deletesById : 5 
deletesByQuery : 0 
errors : 0 
cumulative_adds : 6294 
cumulative_deletesById : 5397 
cumulative_deletesByQuery : 0 
cumulative_errors : 0 

*** Replication Stats
stats: 	handlerStart : 1366654495647 
requests : 0 
errors : 0 
timeouts : 0 
totalTime : 0 
avgTimePerRequest : NaN 
avgRequestsPerSecond : 0.0 
indexSize : 2.29 GB 
indexVersion : 1354902172888 
generation : 121266 
indexPath : /opt/solr/data/index.20130418170401 
isMaster : false 
isSlave : true 
masterUrl : http://master:9090/solr/replication 
pollInterval : 00:00:60 
isPollingDisabled : false 
isReplicating : true 
timeElapsed : 376 
bytesDownloaded : 35835 
downloadSpeed : 95 
previousCycleTimeInSeconds : 0 
indexReplicatedAt : Tue Apr 23 13:44:52 BRT 2013 
confFilesReplicatedAt : Mon Mar 18 10:27:00 BRT 2013 
replicationFailedAt : Mon Apr 22 08:05:00 BRT 2013 
timesFailed : 6 
timesIndexReplicated : 45318 
lastCycleBytesDownloaded : 35835 
timesConfigReplicated : 3 
confFilesReplicated : [schema.xml] 

Thanks,
Gustavo Nasu




Re: Autocommit and replication have been slowing down

Posted by gustavonasu <gu...@gmail.com>.
Hi Shawn,

Thanks for the lesson! I really appreciate your help.

I'll figure out a way to use that knowledge to solve my problem.

Best Regards




Re: Autocommit and replication have been slowing down

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/23/2013 3:44 PM, gustavonasu wrote:
> If I understand correctly, autowarmCount is the number of cache entries that
> are reused for new searches. I guess that this isn't the problem, because once
> the commits counter increases under "UPDATE HANDLERS" (admin UI) I can see the
> new docs in the search results.

Autowarming is the process of using entries in the cache on the old 
searcher to warm up the caches on the new searcher.  If one of the most 
recently used entries in your old filterCache (before commit) is 
"inStock:true" then a search for that filter will be executed on the new 
searcher and the results stored in the new filterCache.  It has nothing 
to do with being able to see recently added or deleted documents in 
searches.
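
As a concrete example (the host, port and query term here are just 
placeholders), suppose a user sends:

  http://localhost:8983/solr/select?q=ipod&fq=inStock:true

The result of the fq=inStock:true filter is stored in the filterCache.  During 
autowarming, that same filter is simply re-executed against the new searcher, 
so the entry is already populated before real queries arrive.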

> Unfortunately I can't increase the Java heap on the servers right now, so I
> was thinking of changing some configurations to free some memory. For example,
> we could decrease the maxBufferedDocs value. Do you know whether that would be
> effective?

One of the biggest RAM consumers in Solr is Lucene's FieldCache, but you 
can't configure that.  If that's becoming a problem, you have to change 
the nature of your queries so they require less memory.

There are two main configuration points you can use to control Solr RAM 
usage - ramBufferSizeMB and the various Solr caches.  The 
maxBufferedDocs option is deprecated, replaced by ramBufferSizeMB.  I 
would just remove it.  Take a look at a recent Solr version like 4.2.1 and 
look at its example solrconfig.xml for collection1.

The default ramBufferSizeMB in newer Solr versions is 100.  Unless you 
are indexing incredibly large documents, this is plenty, and you 
probably won't make things go any faster by increasing it.  Decreasing 
it is one way of reducing Solr's RAM usage.  The previous default was 32.
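
On 3.x, ramBufferSizeMB lives in the <indexDefaults> (and <mainIndex>) section 
of solrconfig.xml; in 4.x it moved to <indexConfig>.  Something roughly like 
this, with maxBufferedDocs simply gone:

  <indexDefaults>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <!-- no maxBufferedDocs element; ramBufferSizeMB replaces it -->
  </indexDefaults>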

For Solr's caches, you normally don't need very big numbers.  If you 
have tons of RAM to spare, making them large might provide some benefit. 
What I am saying below is only general advice; there are sometimes 
very good reasons for making things much larger.  If you do choose to 
change the numbers, be sure to keep an eye on your query times to make 
sure they are still acceptable.  It all depends on what kind of queries 
your users send.

The documentCache offers questionable benefits.  It may be better to let 
the operating system cache your stored fields.  You probably don't need 
more than a thousand or two entries in the documentCache, and I've seen 
some people saying that a value of zero might be preferable.  I'm not 
sure I would completely disable it without a LOT of testing.

The queryResultCache can really help Solr perform well, but if it's too 
big, then it will just eat RAM without providing any actual benefit. 
Having more than a couple thousand entries here is probably unnecessary. 
Mine is set to 512.  Also look for the queryResultWindowSize parameter.
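
In solrconfig.xml terms, a modest setup along those lines might look something 
like this, inside the <query> section (the autowarmCount values are only 
illustrative, not recommendations):

  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="32"/>
  <documentCache class="solr.LRUCache"
                 size="1024" initialSize="1024" autowarmCount="0"/>
  <queryResultWindowSize>20</queryResultWindowSize>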

The filterCache is amazing when it comes to performance enhancement, but 
also a major source of autowarming headaches.  I have this set to a max 
size of 64, with an autowarmCount of 4.  I've got very complex filters, 
and even warming only four entries, it still sometimes takes 20-30 
seconds.  The filterCache does get used for some things besides filter 
queries, but normally you don't need very many entries here.
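
For reference, mine looks roughly like this (the cache class is the one from 
the example config; initialSize is just set to match size):

  <filterCache class="solr.FastLRUCache"
               size="64" initialSize="64" autowarmCount="4"/>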

Thanks,
Shawn


Re: Autocommit and replication have been slowing down

Posted by gustavonasu <gu...@gmail.com>.
Hi Shawn,

Thanks for the answer.

If I understand correctly, autowarmCount is the number of cache entries that
are reused for new searches. I guess that this isn't the problem, because once
the commits counter increases under "UPDATE HANDLERS" (admin UI) I can see the
new docs in the search results.

Unfortunately I can't increase the Java heap on the servers right now, so I
was thinking of changing some configurations to free some memory. For example,
we could decrease the maxBufferedDocs value. Do you know whether that would be
effective?

Best Regards




Re: Autocommit and replication have been slowing down

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/23/2013 11:27 AM, gustavonasu wrote:
>      We recently migrated from Solr 1.4 to 3.6.1. In the new version we have
> noticed that after some hours (around 8) autocommit starts taking much longer
> to execute. We configured autocommit with maxDocs=50 and maxTime=10000ms, but
> it is taking a few (3-5) minutes for documents to be indexed (I measured this
> by watching docsPending on the Update Stats page and refreshing it. Is there
> another way to verify that information?).

I'm not sure I follow every detail of your question, but I'll attempt to 
answer what I can.  Usually 
if your commits are taking a really long time, it means you're running 
into one of two problems:

1) It is taking a really long time to autowarm your Solr caches.  In 
most cases, it is the filterCache that takes the time, but not always. 
You can see how long it takes to warm the entire searcher as well as 
each individual cache in the Statistics page of the admin UI.  To fix 
this, you have to reduce the autowarmCount on your caches, reduce the 
complexity of your queries and filters, or both.
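
autowarmCount is just an attribute on each cache definition in solrconfig.xml, 
so a quick experiment is to dial it down (even to zero) and see whether commit 
times improve, for example:

  <filterCache class="solr.FastLRUCache"
               size="512" initialSize="512" autowarmCount="0"/>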

2) Your Java heap is getting exhausted and Java is spending too much 
time doing full garbage collections so it can keep working.  Eventually 
this problem will result in OOM (Out of Memory) errors in your Solr log. 
To fix this, raise your max heap, which is the -Xmx java option when 
starting your servlet container.  Raising the java heap might also 
require that you add physical RAM to your server.
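
Where exactly -Xmx goes depends on how you start Solr.  With the example Jetty 
setup it would be something like the line below (2g is only an example figure, 
not a sizing recommendation); with Tomcat you would set it in JAVA_OPTS or 
CATALINA_OPTS instead:

  java -Xmx2g -jar start.jar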

On version 3.6, I believe that an index update/commit that results in 
segment merging will wait for that merging to complete.  If you do a lot 
of indexing, eventually you will run into a very large merge, and that 
can take a lot of time.  This would not explain why every autoCommit is 
taking a long time, though - it would only explain one out of dozens or 
hundreds.

>      A similar problem has been happening with replication. We configured the
> pollInterval to 60s, but replication takes several minutes to complete. You can
> see the timeElapsed value (around 6 minutes) in the Replication Stats below.

If you optimize your index, or do enough index updates so that a large 
merge takes place, then a very large portion of your index will be 
comprised of brand new files, and if your index is large, that can take 
a long time to replicate.  It is also possible for the java heap problem 
(mentioned above) to cause this.

Thanks,
Shawn