Posted to solr-user@lucene.apache.org by zzT <zi...@gmail.com> on 2014/05/21 11:23:03 UTC

SolrCloud : node recovery fails with "No registered leader was found"

My SolrCloud configuration has a single shard and 2 Solr servers, so one
acts as the leader and the other as a replica.

Through a series of events (*), I've ended up with one Solr server in
"Active" status as the shard leader, while the other is stuck in "Recovery
failed" status and cannot recover no matter what. It retries every 600
seconds and logs the following error:

ERROR org.apache.solr.cloud.RecoveryStrategy [RecoveryThread] - Error while trying to recover. core=sample:org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: sample slice: shard1
        at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:531)
        at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:514)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:345)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:247)

Does anyone have any idea as to why the replica cannot locate the leader?
What is the proposed solution in this case? 

(*) Sorry that I can't provide more details, but in case it helps, here is
the sequence of events:
-> SolrCloud fails to start because of write.lock files in the index folders
-> Shut down the servers and remove the write.lock files
-> Restart the ZooKeeper ensemble
-> Restart the Solr servers
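
For anyone repeating the write.lock cleanup step above, here is a minimal
Python sketch of how the leftover lock files could be located before
deleting them. It is not part of Solr; the function names are made up for
illustration, and it assumes the Solr servers are already shut down (as in
the steps above), since removing the lock of a running core is unsafe.

```python
from pathlib import Path


def find_write_locks(solr_home):
    """Return every write.lock file found under solr_home, sorted by path."""
    return sorted(p for p in Path(solr_home).rglob("write.lock") if p.is_file())


def remove_write_locks(solr_home, dry_run=True):
    """Report leftover write.lock files; delete them only if dry_run is False.

    Assumption: Solr is stopped, so any lock still on disk is stale.
    """
    locks = find_write_locks(solr_home)
    for lock in locks:
        if not dry_run:
            lock.unlink()
    return locks
```

Calling remove_write_locks() with the path of your own Solr home only lists
the candidates; pass dry_run=False once the list looks right.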




--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-node-recovery-fails-with-No-registered-leader-was-found-tp4137331.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud : node recovery fails with "No registered leader was found"

Posted by yann180 <ya...@yahoo.com>.
Hi guys,

just wondering if any solution was found for this?

I have a similar problem - Solr 4.7.2, 2-server cloud, single replicated
shard.

At random times one of the servers dies with the same message as in the
title of this thread.

I was hoping there might be a solution? (upgrading Solr is not practical for
me because of the JDK 1.7 requirement).

Thanks

Yann

Re: SolrCloud : node recovery fails with "No registered leader was found"

Posted by heaven <ah...@gmail.com>.
Seeing the same thing after a crash of one ZK node (out of 5):
{code}
org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: crm-prod slice: shard1
	at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:545)
	at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:528)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:250)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:982)
	at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
	at org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:349)
	at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:278)
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:744)
{code}
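
The error text has the same shape in both traces posted in this thread, so
it can be matched in a Solr log to see which collections and shards are
affected and how often. The following is only a sketch: the pattern is
derived from the two messages quoted here, other Solr versions may word it
differently, and the function name is made up for illustration.

```python
import re

# Built from the message text quoted in this thread. Whitespace is matched
# loosely because mail clients and loggers wrap the line in different places.
LEADER_ERROR = re.compile(
    r"No\s+registered\s+leader\s+was\s+found\s+after\s+waiting\s+for\s+"
    r"(?P<wait_ms>\d+)ms\s*,\s*collection:\s*(?P<collection>\S+)\s+"
    r"slice:\s*(?P<slice>\S+)"
)


def find_leader_errors(log_text):
    """Return (collection, slice, wait_ms) for each matching error in the log."""
    return [
        (m.group("collection"), m.group("slice"), int(m.group("wait_ms")))
        for m in LEADER_ERROR.finditer(log_text)
    ]
```

Feeding it the contents of a Solr log file gives one tuple per occurrence,
which can then be counted per shard.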

Re: SolrCloud : node recovery fails with "No registered leader was found"

Posted by zzT <zi...@gmail.com>.
I'm using Solr 4.7.2.

Here are a few things I left out. Before reaching the "one leader, one
failed to recover" state, the shard had no leader at all and both nodes were
in "recovery failed" mode. A bit of tinkering with clusterstate.json
"forced" one of them to be the leader, but that didn't change a thing. The
error message "No registered leader was found after waiting" was the same
before and after the tinkering.

The other thing I found out is that the ZK ensemble had been down for a
while. In the Solr log, at startup, I found the following:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
NoNode for /collections/sample/leaders/shard1

It seems, though, that the Solr servers could not communicate with ZK even
after it was functional again.
Restarting the servers once more fixed the issue, and the failing server
managed to "recover" the index successfully.
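
In case it helps with diagnosing a similar state, here is a rough Python
sketch that reads a saved copy of clusterstate.json and reports, per shard,
which replica is flagged as leader and what state each replica is in. The
JSON layout (collection -> "shards" -> "replicas", with "leader": "true" on
the leader) is my assumption about the Solr 4.x format, and the replica
names in the example are invented. It only inspects clusterstate.json; it
says nothing about whether the /collections/sample/leaders/shard1 znode
exists.

```python
import json


def shard_leaders(clusterstate, collection):
    """Map each shard name to (leader replica or None, {replica: state})."""
    report = {}
    shards = clusterstate[collection]["shards"]
    for shard_name, shard in shards.items():
        leader = None
        states = {}
        for replica_name, replica in shard.get("replicas", {}).items():
            states[replica_name] = replica.get("state")
            # Assumed format: the leader carries the string "true".
            if str(replica.get("leader", "")).lower() == "true":
                leader = replica_name
        report[shard_name] = (leader, states)
    return report


# Hypothetical clusterstate.json content mirroring the state described above.
EXAMPLE = json.loads("""
{"sample": {"shards": {"shard1": {"state": "active", "replicas": {
    "core_node1": {"state": "active", "leader": "true"},
    "core_node2": {"state": "recovery_failed"}}}}}}
""")

for name, (leader, states) in shard_leaders(EXAMPLE, "sample").items():
    print(name, "leader:", leader, "replicas:", states)
```

A shard whose leader comes back as None corresponds to the "no leader for
the shard" situation I had before editing clusterstate.json.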

Re: SolrCloud : node recovery fails with "No registered leader was found"

Posted by Erick Erickson <er...@gmail.com>.
What version of Solr?
