Posted to solr-user@lucene.apache.org by lboutros <bo...@gmail.com> on 2014/02/13 09:22:20 UTC

SolrCloud Zookeeper disconnection/reconnection

Dear all,

We are currently using Solr 4.3.1 in production (with SolrCloud).

We are encountering much the same problem as the one described in this older post:

http://lucene.472066.n3.nabble.com/SolrCloud-CloudSolrServer-Zookeeper-disconnects-and-re-connects-with-heavy-memory-usage-consumption-td4026421.html

Sometimes some nodes are disconnected from ZooKeeper and then try to
reconnect. Recovery takes a long time because our warming process is quite
long. Because of this long warming, the node is disconnected again just
after recovery, and so on, sometimes until an OOM.

We have already increased the ZK timeout, but it is not enough.

We are thinking of migrating to at least Solr 4.6.1 (perhaps 4.7 will be out
before the end of the migration :) ).

I know that a lot of SolrCloud bugs have been fixed since Solr 4.3.1.

But can we be sure that this problem will be resolved? Or can it still
occur with the latest Solr version? (I know this is not an easy
question ;) )

It seems that this fix:

Deadlock while trying to recover after a ZK session expiry:
https://issues.apache.org/jira/browse/SOLR-5615

is a good step toward addressing our current problem.

But do you think it will be enough?

One last thing: I don't know whether it has already been addressed by a fix,
but if there are no updates between the disconnection and the reconnection,
the recovery process should not do anything more than reconnect, I mean:
no replication, no tlog replay and no warming. Is that the case?

Ludovic.



-----
Jouve
France.
--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Zookeeper-disconnection-reconnection-tp4117101.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud Zookeeper disconnection/reconnection

Posted by "Ramkumar R. Aiyengar" <an...@gmail.com>.
Start with http://wiki.apache.org/solr/SolrPerformanceProblems which has a
section on GC tuning and a link to some example settings.
On 16 Feb 2014 21:19, "lboutros" <bo...@gmail.com> wrote:

> Thanks a lot for your answer.
>
> Is there a web page, on the wiki for instance, where we could find some JVM
> settings or recommendations that we should use for Solr with some index
> configurations?
>
> Ludovic.

Re: SolrCloud Zookeeper disconnection/reconnection

Posted by lboutros <bo...@gmail.com>.
Thanks a lot for your answer.

Is there a web page, on the wiki for instance, where we could find some JVM
settings or recommendations that we should use for Solr with some index
configurations?

Ludovic.

Re: SolrCloud Zookeeper disconnection/reconnection

Posted by "Ramkumar R. Aiyengar" <an...@gmail.com>.
Ludovic, recent Solr changes won't do much to prevent ZK session expiry.
You might want to enable GC logging on both Solr and ZooKeeper to check for
long pauses and tune accordingly.
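
As a rough illustration of checking GC logs for pauses, the sketch below
scans log lines for the "Total time for which application threads were
stopped" entries that a Java 7/8 HotSpot JVM writes when started with
-XX:+PrintGCApplicationStoppedTime, and reports those at or above a
threshold. The function name, the sample lines and the 15 s threshold are
assumptions made for illustration; none of them come from Solr itself.

```python
import re

# Line format written by HotSpot (Java 7/8) with -XX:+PrintGCApplicationStoppedTime.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: (\d+(?:\.\d+)?) seconds"
)


def long_pauses(log_lines, threshold_s):
    """Return (line_number, pause_seconds) for every pause >= threshold_s."""
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        match = STOPPED_RE.search(line)
        if match:
            pause = float(match.group(1))
            if pause >= threshold_s:
                hits.append((lineno, pause))
    return hits


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "12.345: Total time for which application threads were stopped: 0.0412000 seconds",
        "98.765: [Full GC ...]",
        "99.001: Total time for which application threads were stopped: 17.2500000 seconds",
    ]
    # Threshold chosen to match a 15 s ZK session timeout (an assumption).
    for lineno, pause in long_pauses(sample, threshold_s=15.0):
        print(f"line {lineno}: stopped for {pause:.2f}s")
```

A pause close to or above the ZK session timeout is a likely candidate for
having caused an expiry, so the threshold would normally be set at or a bit
below that timeout.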

The patch you mention (SOLR-5615) fixes a situation in which the cloud can
get into a bad state during recovery after a session expiry. Recovery after
a session expiry is unavoidable, but as you guessed, it should be quick if
there aren't too many updates.

4.6.1 also has SOLR-5577 which will prevent updates from unnecessarily
stalling when you are disconnected from ZK for a short while.

These changes (and probably others) should thus help the cloud behave
better on ZK expiry, and for that reason I would encourage you to upgrade.
The expiry problem itself, however, has to be dealt with by ensuring that
ZK and Solr don't pause for too long and by choosing an appropriate session
timeout (the default, btw, will be raised from 15s to 30s in Solr 4.7
onwards).
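
One detail that may matter when choosing the session timeout: a ZooKeeper
server clamps the timeout a client requests to the range between its
minSessionTimeout and maxSessionTimeout, which default to 2 x tickTime and
20 x tickTime. The sketch below only reproduces that arithmetic; the
function name and the tickTime of 2000 ms used as a default are assumptions
for illustration, so check the values in your own zoo.cfg.

```python
def negotiated_timeout_ms(requested_ms, tick_time_ms=2000,
                          min_session_ms=None, max_session_ms=None):
    """Clamp a requested session timeout the way a ZooKeeper server does.

    When not set explicitly, the bounds follow ZooKeeper's documented
    defaults: 2 * tickTime (minimum) and 20 * tickTime (maximum).
    """
    lo = 2 * tick_time_ms if min_session_ms is None else min_session_ms
    hi = 20 * tick_time_ms if max_session_ms is None else max_session_ms
    return max(lo, min(requested_ms, hi))


if __name__ == "__main__":
    # Hypothetical requested timeouts, in milliseconds.
    for requested in (15000, 30000, 60000):
        print(requested, "->", negotiated_timeout_ms(requested))
```

Under these assumed defaults, a request above 40 s is silently reduced to
40 s, so raising the timeout on the Solr side alone may have less effect
than expected.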