Posted to solr-user@lucene.apache.org by gouthsmsimhadri <go...@gmail.com> on 2015/01/24 04:40:31 UTC

Replicas fall into recovery mode right after update

I'm working with a cluster of SolrCloud servers configured with 10 shards
and 4 replicas per shard in a stress environment.
The planned production configuration is 10 shards and 15 replicas per shard.

Current commit settings are as follows:

        <autoSoftCommit>
            <maxDocs>500000</maxDocs>
            <maxTime>180000</maxTime>
        </autoSoftCommit>

        <autoCommit>
            <maxDocs>2000000</maxDocs>
            <maxTime>180000</maxTime>
            <openSearcher>false</openSearcher>
        </autoCommit>


The application needs to index approximately 90 million docs, which are
indexed in two ways:
a) Full indexing. It takes 4 hours to index 90 million docs, and the rate of
docs coming to the searcher is around 6000 per second.
b) Incremental indexing. It takes an hour to index delta changes. There are
roughly 3 million changes, and the rate of docs coming to the searchers is
2500 per second.

I use two collections, for example collection1 and collection2.
Each collection has a system configuration of 12 GB of available RAM and a
quad-core Intel(R) Xeon(R) CPU X5570 @ 2.93GHz.

Full indexing is always performed on a collection that is not serving live
traffic. Once the job is completed, we swap collections so that the
collection with the latest data serves traffic and the other is inactive.

The other mode, incremental indexing, is always performed on the collection
that is serving live traffic.

The problem is that about 10 minutes after indexing is triggered, the
replicas go into recovery mode. This happens on all the shards. After about
20 minutes or more, the rest of the replicas start to fall into recovery
mode. After about half an hour, all replicas except the leader are in
recovery mode.

I cannot throttle the indexing load as that will increase our overall
indexing time. So to overcome this issue, I remove all the replicas before
the indexing is started and then add them after the indexing completes.

The behavior (replicas falling into recovery mode) in incremental indexing
mode is troublesome, as I cannot remove replicas during incremental indexing
since the collection serves live traffic. I tried to throttle the speed at
which documents are indexed, but with no success: the cluster still goes
into recovery.

If I leave the cluster as is, the indexing eventually completes and the
cluster also recovers after a while. But since this collection is serving
live traffic, I cannot let these replicas go into recovery mode, since it
also degrades search performance (based on the tests performed).

I tried different commit settings, such as the following:
a) No auto soft commit, no auto hard commit, and a commit triggered at the
end of indexing
b) No auto soft commit, auto hard commit enabled, and a commit at the end of
indexing
c) Auto soft commit enabled, no auto hard commit
d) Auto soft commit enabled, auto hard commit enabled
e) Different frequency settings for the commits above

Unfortunately, all of the above yield the same behavior. The replicas still
go into recovery.
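
For reference, variant (b) above might be expressed in solrconfig.xml roughly
as in the sketch below. This is only an illustration of how such a variant
can be written, not the configuration actually used and not a suggested fix:
the 60000 ms interval and the updateLog block are assumptions added for the
example.

```xml
<!-- solrconfig.xml sketch; the maxTime value is illustrative only -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- SolrCloud replicas rely on the transaction log during recovery -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- hard commit: flushes to disk without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- soft commit disabled: a maxTime of -1 turns the time trigger off -->
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With a setup like this, document visibility depends on the explicit commit
that the indexing client sends at the end of the run.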

I have increased the ZooKeeper timeout from 30 seconds to 5 minutes, and the
problem persists.
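
As a sketch of what that change can look like, assuming a Solr 4.x release
that uses the newer-style solr.xml (the Solr version and the place where the
timeout was actually changed are not stated here, and older releases use a
different solr.xml layout):

```xml
<!-- solr.xml sketch (newer-style format); other settings omitted -->
<solr>
  <solrcloud>
    <!-- ZooKeeper session timeout in milliseconds: 300000 = 5 minutes -->
    <int name="zkClientTimeout">${zkClientTimeout:300000}</int>
  </solrcloud>
</solr>
```

The ${zkClientTimeout:300000} form means a zkClientTimeout system property,
if set, overrides the 300000 default written in the file.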

Is there any setting that would fix this issue?




-----
 -goutham
--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Replicas-fall-into-recovery-mode-right-after-update-tp4181706.html

Re: Replicas fall into recovery mode right after update

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
What version of Solr are you using? What GC parameters are you using? Do
you have GC logs enabled? Look at full GC times in those logs and see
what's happening. This particular problem is usually because replicas
cannot accept the rate of updates and they fall back to recovery state. You
should also check the leader logs to find what kind of exceptions are being
logged.

Also, do multiple shards share the same disk? If yes, then creating so many
shards might not help because the disk will become a bottleneck.



-- 
Regards,
Shalin Shekhar Mangar.

Re: Replicas fall into recovery mode right after update

Posted by Nishanth S <ni...@gmail.com>.
Can you tell us what version of Solr you are using and what causes your
replicas to go into recovery?
