Posted to dev@lucene.apache.org by Per Steffensen <st...@designware.dk> on 2013/08/14 15:10:47 UTC
Two hour recovery in non-replica setup?
Hi
We have a fairly large SolrCloud installation where we continuously
index a lot of documents. Users are doing searches against the system
from time to time.
From time to time our Solr nodes lose their ZooKeeper connection; at
least we guess that's what happens. It then takes two hours before a Solr
that has lost its ZK connection has its shards active again. I can't
imagine what it needs to do for two hours. We are not using replicas, so
syncing data among replicas is not it. We haven't dived into the problem
yet, and we do not know where the time is spent: whether it is between
the moment the connection is lost and the moment Solr "realizes" it, or
between the moment it "realizes" it and the moment the shards are
declared active again.
We will dive into the problem, but before doing that I wanted to ask
here whether this is a known problem and, if so, whether it has been
fixed. FYI, we are using Solr 4.0.0.
Regards, Per Steffensen
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Two hour recovery in non-replica setup?
Posted by Per Steffensen <st...@designware.dk>.
Hi
Thanks, Mark. I am pretty sure SOLR-4899 is not it, but our problem
might have been solved after 4.0.0 anyway. We will consider whether to
upgrade to the latest 4.x and/or dive into the problem to see if we can
figure out what happens.
Regards, Per Steffensen
On 8/14/13 4:13 PM, Mark Miller wrote:
> There have been hundreds of fixes since 4.0 - many around zookeeper integration. Really hard to say what it might be - I can say nothing should take that long by design.
>
> A pretty important, somewhat recent bug found was https://issues.apache.org/jira/browse/SOLR-4899: when reconnecting after ZooKeeper expiration, we need to be willing to wait forever, not for 30 seconds.
>
> Don't know that it has anything to do with what you are seeing, or if perhaps that didn't exist in 4.0 - but worth looking at.
>
> - Mark
Re: Two hour recovery in non-replica setup?
Posted by Mark Miller <ma...@gmail.com>.
There have been hundreds of fixes since 4.0 - many around zookeeper integration. Really hard to say what it might be - I can say nothing should take that long by design.
A pretty important, somewhat recent bug found was https://issues.apache.org/jira/browse/SOLR-4899: when reconnecting after ZooKeeper expiration, we need to be willing to wait forever, not for 30 seconds.
Don't know that it has anything to do with what you are seeing, or if perhaps that didn't exist in 4.0 - but worth looking at.
- Mark
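For readers unfamiliar with the distinction Mark draws for SOLR-4899 (waiting forever versus giving up after 30 seconds), here is a minimal sketch of the two retry policies. It is not Solr's or ZooKeeper's actual code: the class name ReconnectSketch, the method retryUntilConnected, and the convention that a negative timeout means "wait forever" are all assumptions made for illustration.

```java
import java.util.concurrent.Callable;

public class ReconnectSketch {

    /**
     * Hypothetical helper: calls connect until it reports success.
     * timeoutMillis >= 0 bounds the total time spent retrying;
     * a negative timeoutMillis means "wait forever".
     */
    public static boolean retryUntilConnected(Callable<Boolean> connect,
                                              long timeoutMillis,
                                              long pauseMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try {
                if (connect.call()) {
                    return true; // connected
                }
            } catch (InterruptedException e) {
                throw e; // let the caller handle interruption
            } catch (Exception e) {
                // Any other failure counts as an unsuccessful attempt.
            }
            // Bounded policy: give up once the time budget is used up.
            if (timeoutMillis >= 0 && System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pauseMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Stand-in for a real connection attempt: succeeds on the 5th call overall.
        Callable<Boolean> flaky = () -> ++attempts[0] >= 5;

        // Zero budget: one attempt, then the caller is left disconnected.
        System.out.println("bounded:   " + retryUntilConnected(flaky, 0, 1));
        // Unbounded: keeps retrying until the connection comes back.
        System.out.println("unbounded: " + retryUntilConnected(flaky, -1, 1));
    }
}
```

With a bounded policy, a node that fails to reconnect within the budget stays disconnected until something else intervenes; the unbounded policy keeps trying for as long as the outage lasts. Whether this has any bearing on the two-hour delay described above is not established in this thread.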