Posted to solr-user@lucene.apache.org by Hendrik Haddorp <he...@gmx.net> on 2019/01/25 08:38:56 UTC

SolrCloud recovery

Hi,

I have a SolrCloud with many collections. When I restart an instance and 
the replicas are recovering, I noticed that the number of replicas recovering 
at any one point is usually around 5. This causes recovery to take 
rather long. Is there a configuration option that controls how many 
replicas can recover in parallel?

thanks,
Hendrik

Re: SolrCloud recovery

Posted by Hendrik Haddorp <he...@gmx.net>.
On a system with about 1600 collections, each having one shard and a 
replication factor of two, it took around an hour to recover completely 
after an instance restart. The setup used HDFS for storage, and we 
are using Solr 7.4 at the moment. The Overseer queue management helped 
us a lot! Before that, Solr could easily spiral out of control if the queue 
grew too fast. I haven't checked the logs to see what the recovery does. Is 
there anything specific to look for?

During the recovery one can see Solr going over the replicas one 
by one, never really working on more than about 5 replicas at a time, 
often fewer. The recovery also seems to proceed in alphabetical order. I 
believe that used to be different in older versions. I will give the 
coreLoadThreads setting a try.
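To put a number on how many replicas are actually recovering at once (the ~5 observed above), one option is to poll the Collections API CLUSTERSTATUS action during a restart and count replicas per state. This is a minimal Python sketch, assuming a node reachable at the hypothetical base URL http://localhost:8983/solr; the helper names are illustrative and not part of Solr.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Hypothetical base URL of any live Solr node; adjust for your cluster.
SOLR_URL = "http://localhost:8983/solr"


def fetch_cluster_status(base_url: str = SOLR_URL) -> dict:
    """Fetch the Collections API CLUSTERSTATUS response as a dict."""
    url = f"{base_url}/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url) as resp:
        return json.load(resp)


def count_replica_states(status: dict) -> Counter:
    """Count replicas per state (active, recovering, down, ...) across all collections."""
    counts = Counter()
    collections = status.get("cluster", {}).get("collections", {})
    for coll in collections.values():
        for shard in coll.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                # Each replica entry carries its recorded state as a string.
                counts[replica.get("state", "unknown")] += 1
    return counts
```

Calling `count_replica_states(fetch_cluster_status())` every few seconds while an instance restarts should show whether the `recovering` count really stays around 5. With ~1600 collections the CLUSTERSTATUS response can be large, so polling very frequently adds some load of its own.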

Hendrik



Re: SolrCloud recovery

Posted by Erick Erickson <er...@gmail.com>.
That's just _loading_, recovery happens later so I'd
be surprised if this really made a difference, but you
never know.

I'm more interested in _why_ recovery takes so long,
and why recovery happens in the first place. It's normal
for replicas, when starting up, to go from down->recovering->active;
that's just part of the normal cycle. But the recovering state
should be relatively short absent having to replicate the
index from the leader.

If active indexing is going on, then the replicas may have to
copy their index down from the leader. Does this happen
on a system that is not indexing?

What version of Solr? All the state changes go through
the Overseer, and there were some very significant improvements
in Solr 6.6+, see:
https://issues.apache.org/jira/browse/SOLR-10265

And can you put a number to "rather long"? There's a built-in
3-minute wait for leader election if there's no leader for
a slice. That's not relevant if the replica in recovery
belongs to a shard that already has a leader, but if you
restart your entire cluster it can come into play.
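Since the leader-election wait only matters for shards that currently have no leader, it can help to list those shards during a restart. A small sketch, assuming `status` is the parsed JSON of `/admin/collections?action=CLUSTERSTATUS&wt=json` and that leader replicas carry a `leader` flag of `"true"` as in Solr's cluster state; the function name is hypothetical. (The wait itself is, as far as I know, controlled by the `leaderVoteWait` setting in the `<solrcloud>` section of solr.xml.)

```python
def shards_without_leader(status: dict) -> list:
    """Return (collection, shard) pairs where no replica is flagged as leader."""
    missing = []
    collections = status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            replicas = shard.get("replicas", {}).values()
            # Cluster state stores the flag as the string "true"; str() also
            # tolerates a JSON boolean in case a response uses one.
            has_leader = any(
                str(r.get("leader", "")).lower() == "true" for r in replicas
            )
            if not has_leader:
                missing.append((coll_name, shard_name))
    return missing
```

An empty result means every shard already has a leader, so the leader-election wait should not be what slows recovery down.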

Best,
Erick

On Fri, Jan 25, 2019 at 3:32 AM Hendrik Haddorp <he...@gmx.net> wrote:
>
> Thanks, that sounds good. Didn't know that parameter.
>
> On 25.01.2019 11:23, Vadim Ivanov wrote:
> >   You can try to tweak solr.xml
> >
> >
> > coreLoadThreads
> > Specifies the number of threads that will be assigned to load cores in parallel.
> >
> > https://lucene.apache.org/solr/guide/7_6/format-of-solr-xml.html

Re: SolrCloud recovery

Posted by Hendrik Haddorp <he...@gmx.net>.
Thanks, that sounds good. Didn't know that parameter.

On 25.01.2019 11:23, Vadim Ivanov wrote:
>   You can try to tweak solr.xml
>
>
> coreLoadThreads
> Specifies the number of threads that will be assigned to load cores in parallel.
>
> https://lucene.apache.org/solr/guide/7_6/format-of-solr-xml.html


RE: SolrCloud recovery

Posted by Vadim Ivanov <va...@spb.ntk-intourist.ru>.
You can try to tweak solr.xml:

coreLoadThreads: Specifies the number of threads that will be assigned to load cores in parallel.

https://lucene.apache.org/solr/guide/7_6/format-of-solr-xml.html
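For reference, coreLoadThreads is set as an `<int name="coreLoadThreads">` entry directly under the root `<solr>` element of solr.xml. A minimal Python sketch that updates (or adds) that entry, assuming a solr.xml with a plain `<solr>` root; the function name is hypothetical. As Erick notes in this thread, the setting governs core loading rather than recovery, so it may not change recovery time at all.

```python
import xml.etree.ElementTree as ET


def set_core_load_threads(solr_xml: str, threads: int) -> str:
    """Return solr.xml text with <int name="coreLoadThreads"> set under <solr>.

    Note: ElementTree drops XML comments and the XML declaration on this
    round-trip, so diff the output before replacing a real solr.xml.
    """
    root = ET.fromstring(solr_xml)
    if root.tag != "solr":
        raise ValueError("expected a <solr> root element")
    # Reuse an existing direct-child entry if present, otherwise append one.
    for elem in root.findall("int"):
        if elem.get("name") == "coreLoadThreads":
            break
    else:
        elem = ET.SubElement(root, "int", name="coreLoadThreads")
    elem.text = str(threads)
    return ET.tostring(root, encoding="unicode")
```

Other entries, including `${...}` property placeholders in element text, are kept as plain text.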



RE: SolrCloud recovery

Posted by Vadim Ivanov <va...@spb.ntk-intourist.ru>.
You can try to tweak solr.xml
