Posted to dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2015/10/22 09:04:27 UTC
[jira] [Reopened] (SOLR-8069) Ensure that only the valid ZooKeeper
registered leader can put a replica into Leader Initiated Recovery.
[ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shalin Shekhar Mangar reopened SOLR-8069:
-----------------------------------------
There's a reproducible failure in the test added by SOLR-8075, caused by an AssertionError from one of the asserts added in this issue.
{code}
1 tests failed.
FAILED: org.apache.solr.cloud.LeaderInitiatedRecoveryOnShardRestartTest.testRestartWithAllInLIR
Error Message:
Captured an uncaught exception in thread: Thread[id=43491, name=coreZkRegister-5997-thread-1, state=RUNNABLE, group=TGRP-LeaderInitiatedRecoveryOnShardRestartTest]
Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=43491, name=coreZkRegister-5997-thread-1, state=RUNNABLE, group=TGRP-LeaderInitiatedRecoveryOnShardRestartTest]
Caused by: java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([7F78F76DDF75FAD1]:0)
at org.apache.solr.cloud.ZkController.updateLeaderInitiatedRecoveryState(ZkController.java:2133)
at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:434)
at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:197)
at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:157)
at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:346)
at org.apache.solr.cloud.ZkController.joinElection(ZkController.java:1113)
at org.apache.solr.cloud.ZkController.register(ZkController.java:926)
at org.apache.solr.cloud.ZkController.register(ZkController.java:881)
at org.apache.solr.core.ZkContainer$2.run(ZkContainer.java:183)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}
The assertion leaderCd != null fails because ShardLeaderElectionContext.runLeaderProcess calls ZkController.updateLeaderInitiatedRecoveryState with a null core descriptor. That call is by design: if you are marking a replica as 'active', you don't necessarily need to be the leader.
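The condition described above could be expressed roughly as follows. This is a minimal sketch, not the actual ZkController code: the names LirGuardSketch, ReplicaState, LeaderDescriptor and mayUpdateLirState are hypothetical, and it assumes only that a leader core descriptor is required when a replica is being put into LIR, not when it is being marked active.

```java
/** Hypothetical sketch; names do not mirror Solr's actual classes. */
class LirGuardSketch {

    /** Illustrative subset of replica states (an assumption for this sketch). */
    enum ReplicaState { ACTIVE, DOWN, RECOVERING }

    /** Stand-in for a leader's core descriptor; it only carries a name here. */
    static final class LeaderDescriptor {
        final String coreNodeName;

        LeaderDescriptor(String coreNodeName) {
            this.coreNodeName = coreNodeName;
        }
    }

    /**
     * Returns true if the LIR state update may proceed.
     * Marking a replica ACTIVE is allowed without a leader descriptor;
     * any other state requires a non-null one.
     */
    static boolean mayUpdateLirState(ReplicaState newState, LeaderDescriptor leaderCd) {
        if (newState == ReplicaState.ACTIVE) {
            return true; // caller does not need to be the leader
        }
        return leaderCd != null; // only a known leader may put a replica into LIR
    }

    public static void main(String[] args) {
        LeaderDescriptor leader = new LeaderDescriptor("core_node1");
        System.out.println(mayUpdateLirState(ReplicaState.ACTIVE, null));
        System.out.println(mayUpdateLirState(ReplicaState.DOWN, null));
        System.out.println(mayUpdateLirState(ReplicaState.DOWN, leader));
    }
}
```

With a guard of this shape, the call from runLeaderProcess with a null descriptor would pass when the new state is 'active', while a null descriptor would still be rejected for any other state.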
> Ensure that only the valid ZooKeeper registered leader can put a replica into Leader Initiated Recovery.
> --------------------------------------------------------------------------------------------------------
>
> Key: SOLR-8069
> URL: https://issues.apache.org/jira/browse/SOLR-8069
> Project: Solr
> Issue Type: Bug
> Reporter: Mark Miller
> Assignee: Mark Miller
> Priority: Critical
> Fix For: 5.4, Trunk
>
> Attachments: SOLR-8069.patch, SOLR-8069.patch
>
>
> I've seen this twice now. Need to work on a test.
> When some issues hit all the replicas at once, you can end up in a situation where the rightful leader was put or put itself into LIR. Even on restart, this rightful leader won't take leadership and you have to manually clear the LIR nodes.
> It seems that if all the replicas participate in election on startup, LIR should just be cleared.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)