Posted to solr-user@lucene.apache.org by Markus Jelsma <ma...@openindex.io> on 2015/03/08 13:36:30 UTC

4.10.4 - nodes up, shard without leader

Hello - I stumbled upon an issue I've never seen before: a shard with all nodes up and running but no leader. This is on 4.10.4. One of the two nodes emits the following error log entries:

2015-03-08 05:25:49,095 WARN [solr.cloud.ElectionContext] - [Thread-136] - : cancelElection did not find election node to remove /overseer_elect/election/93434598784958483-178.21.116.225:8080_solr-n_0000000246
2015-03-08 05:25:49,121 WARN [solr.cloud.ElectionContext] - [Thread-136] - : cancelElection did not find election node to remove /collections/oi/leader_elect/shard3/election/93434598784958483-178.21.116.225:8080_solr_oi_h-n_0000000043
2015-03-08 05:25:49,220 ERROR [solr.update.UpdateLog] - [Thread-136] - : Error inspecting tlog tlog{file=/opt/solr/cores/oi_c/data/tlog/tlog.0000000000000001394 refcount=2}
java.nio.channels.ClosedChannelException
        at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:679)
        at org.apache.solr.update.ChannelFastInputStream.readWrappedStream(TransactionLog.java:784)
        at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
        at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:125)
        at java.io.InputStream.read(InputStream.java:101)
        at org.apache.solr.update.TransactionLog.endsWithCommit(TransactionLog.java:218)
        at org.apache.solr.update.UpdateLog.recoverFromLog(UpdateLog.java:800)
        at org.apache.solr.cloud.ZkController.register(ZkController.java:841)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:277)
        at org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
2015-03-08 05:25:49,225 ERROR [solr.update.UpdateLog] - [Thread-136] - : Error inspecting tlog tlog{file=/opt/solr/cores/oi_c/data/tlog/tlog.0000000000000001471 refcount=2}
java.nio.channels.ClosedChannelException
        at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:679)
        at org.apache.solr.update.ChannelFastInputStream.readWrappedStream(TransactionLog.java:784)
        at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
        at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:125)
        at java.io.InputStream.read(InputStream.java:101)
        at org.apache.solr.update.TransactionLog.endsWithCommit(TransactionLog.java:218)
        at org.apache.solr.update.UpdateLog.recoverFromLog(UpdateLog.java:800)
        at org.apache.solr.cloud.ZkController.register(ZkController.java:841)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:277)
        at org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
2015-03-08 12:21:04,438 WARN [solr.cloud.RecoveryStrategy] - [zkCallback-2-thread-28] - : Stopping recovery for core=oi_h coreNodeName=178.21.116.225:8080_solr_oi_h

The other node makes a mess in the logs:

2015-03-08 05:25:46,020 WARN [solr.cloud.RecoveryStrategy] - [zkCallback-2-thread-20] - : Stopping recovery for core=oi_c coreNodeName=194.145.201.190:8080_solr_oi_c
2015-03-08 05:26:08,670 ERROR [solr.cloud.ShardLeaderElectionContext] - [zkCallback-2-thread-19] - : There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
        at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:146)
        at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:317)
        at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
        at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
        at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
        at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
        at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:210)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/openindex/leaders/shard3
        at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:40)
        at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:134)
        ... 11 more
Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/oi/leaders/shard3
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:463)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:460)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:417)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:404)
        at org.apache.solr.cloud.ShardLeaderElectionContextBase$1.execute(ElectionContext.java:138)
        at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:34)
        ... 12 more

2015-03-08 05:26:08,682 WARN [solr.cloud.ElectionContext] - [zkCallback-2-thread-19] - : cancelElection did not find election node to remove /collections/openindex/leader_elect/shard3/election/93434598784958480-194.145.201.190:8080_solr_oi_c-n_0000000044
2015-03-08 05:26:12,692 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Error while trying to recover. core=oi_c:org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: oi slice: shard3
        at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
        at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:330)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)

... and then a continuous mess like below

2015-03-08 05:26:12,692 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... (0) core=oi_c
2015-03-08 05:26:26,703 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Error while trying to recover. core=oi_c:org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: oi slice: shard3
        at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
        at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:330)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)

I haven't seen this before. The cluster was unavailable for indexing, but search still worked as expected. Is there an open issue for this that I can't seem to find?

Markus

Re: 4.10.4 - nodes up, shard without leader

Posted by "Ramkumar R. Aiyengar" <an...@gmail.com>.
The update log replay issue looks like
https://issues.apache.org/jira/browse/SOLR-6583
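
As a side note, the ClosedChannelException in those traces is just the JDK's standard behaviour when a FileChannel is read after it has been closed; the interesting part is what closed the channel while UpdateLog.recoverFromLog was still inspecting the tlog. A minimal stand-alone reproduction (plain java.nio, not Solr code; the file path is made up) shows the same top frames:

import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ClosedChannelDemo {
    public static void main(String[] args) throws Exception {
        // Made-up stand-in for a tlog file, only for illustration.
        Path tlog = Paths.get("/tmp/tlog.0000000000000000001");
        Files.write(tlog, new byte[] {1, 2, 3});

        FileChannel channel = FileChannel.open(tlog, StandardOpenOption.READ);
        channel.close(); // something else closed the channel, as in the reported trace

        try {
            // Any read after close fails with ClosedChannelException.
            channel.read(ByteBuffer.allocate(16));
        } catch (ClosedChannelException e) {
            e.printStackTrace();
        }
    }
}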

Re: 4.10.4 - nodes up, shard without leader

Posted by Mark Miller <ma...@gmail.com>.
Interesting bug.

First there is the already closed transaction log. That by itself deserves
a look. I'm not even positive we should be replaying the log when
reconnecting from a ZK disconnect, but even if we do, this should never
happen.

Beyond that there seems to be some race. Because of the log trouble, we try
to cancel the election - but we don't find the ephemeral election node yet
for some reason, and so we just assume it's fine, no node there to remove
(well, we WARN, because it is a little unexpected). Then that ephemeral
node materializes, I guess, and the new leader can't register because the
old leader won't give up the throne. We don't try to force the new leader
because that may just hide bugs and cause data loss, so no leader is
elected.
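
A minimal sketch of that collision using the bare ZooKeeper client (not Solr's election code; the connect string and payloads are made up, and both creates share one session here just to keep the demo short - in the real scenario they come from two different nodes):

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class StaleLeaderNodeDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Connect string is made up; point it at any test ensemble.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // Create the parent chain (persistent) so the demo also runs on an empty ensemble.
        for (String p : new String[] {"/collections", "/collections/oi", "/collections/oi/leaders"}) {
            if (zk.exists(p, false) == null) {
                zk.create(p, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            }
        }

        String leaderPath = "/collections/oi/leaders/shard3";

        // "Old leader": its session is still alive, so its ephemeral node stays put.
        zk.create(leaderPath, "old leader".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // "New leader" tries to register on the same path before the stale node
        // is gone -> NodeExists, exactly as in the log above.
        try {
            zk.create(leaderPath, "new leader".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        } catch (KeeperException.NodeExistsException e) {
            System.out.println("could not register as leader: " + e.getMessage());
        }
        zk.close();
    }
}

The stale ephemeral node only disappears once its owner's session expires or it is deleted explicitly, so until then every attempt by the new candidate to register on that path fails with NodeExists.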

I'd guess there are two JIRA issues to resolve here.

- Mark
