Posted to dev@lucene.apache.org by "Commit Tag Bot (JIRA)" <ji...@apache.org> on 2013/03/22 17:19:19 UTC
[jira] [Commented] (SOLR-3993) SolrCloud leader election on single node stucks the initialization
[ https://issues.apache.org/jira/browse/SOLR-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610562#comment-13610562 ]
Commit Tag Bot commented on SOLR-3993:
--------------------------------------
[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1408323
SOLR-3993: If multiple SolrCores for a shard coexist on a node, on cluster restart, leader election would stall until timeout, waiting to see all of the replicas come up.
> SolrCloud leader election on single node stucks the initialization
> ------------------------------------------------------------------
>
> Key: SOLR-3993
> URL: https://issues.apache.org/jira/browse/SOLR-3993
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.0
> Environment: Windows 7, Tomcat 6
> Reporter: Alexey Kudinov
> Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> setup:
> 1 node, 4 cores, 2 shards.
> 15 documents indexed.
> problem:
> init stage times out.
> probable cause:
> According to the init flow, cores are initialized one by one synchronously.
> The main thread waits in ShardLeaderElectionContext.waitForReplicasToComeUp until the retry threshold is reached, while the replica cores are not yet initialized. In other words, the other replicas have no chance to come up in the meantime.
> stack trace:
> Thread [main] (Suspended)
> owns: HashMap<K,V> (id=3876)
> owns: StandardContext (id=3877)
> owns: HashMap<K,V> (id=3878)
> owns: StandardHost (id=3879)
> owns: StandardEngine (id=3880)
> owns: Service[] (id=3881)
> Thread.sleep(long) line: not available [native method]
> ShardLeaderElectionContext.waitForReplicasToComeUp(boolean, String) line: 298
> ShardLeaderElectionContext.runLeaderProcess(boolean) line: 143
> LeaderElector.runIamLeaderProcess(ElectionContext, boolean) line: 152
> LeaderElector.checkIfIamLeader(int, ElectionContext, boolean) line: 96
> LeaderElector.joinElection(ElectionContext) line: 262
> ZkController.joinElection(CoreDescriptor, boolean) line: 733
> ZkController.register(String, CoreDescriptor, boolean, boolean) line: 566
> ZkController.register(String, CoreDescriptor) line: 532
> CoreContainer.registerInZk(SolrCore) line: 709
> CoreContainer.register(String, SolrCore, boolean) line: 693
> CoreContainer.load(String, InputSource) line: 535
> CoreContainer.load(String, File) line: 356
> CoreContainer$Initializer.initialize() line: 308
> SolrDispatchFilter.init(FilterConfig) line: 107
> ApplicationFilterConfig.getFilter() line: 295
> ApplicationFilterConfig.setFilterDef(FilterDef) line: 422
> ApplicationFilterConfig.<init>(Context, FilterDef) line: 115
> StandardContext.filterStart() line: 4072
> StandardContext.start() line: 4726
> StandardHost(ContainerBase).addChildInternal(Container) line: 799
> StandardHost(ContainerBase).addChild(Container) line: 779
> StandardHost.addChild(Container) line: 601
> HostConfig.deployDescriptor(String, File, String) line: 675
> HostConfig.deployDescriptors(File, String[]) line: 601
> HostConfig.deployApps() line: 502
> HostConfig.start() line: 1317
> HostConfig.lifecycleEvent(LifecycleEvent) line: 324
> LifecycleSupport.fireLifecycleEvent(String, Object) line: 142
> StandardHost(ContainerBase).start() line: 1065
> StandardHost.start() line: 840
> StandardEngine(ContainerBase).start() line: 1057
> StandardEngine.start() line: 463
> StandardService.start() line: 525
> StandardServer.start() line: 754
> Catalina.start() line: 595
> NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
> NativeMethodAccessorImpl.invoke(Object, Object[]) line: not available
> DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: not available
> Method.invoke(Object, Object...) line: not available
> Bootstrap.start() line: 289
> Bootstrap.main(String[]) line: 414
>
> After a while, the session times out and the following exception appears:
> Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=0 timeoutin=-95
> Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Was waiting for replicas to come up, but they are taking too long - assuming they won't come back till later
> Oct 25, 2012 1:16:56 PM org.apache.solr.common.SolrException log
> SEVERE: Errir checking for the number of election participants:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/collection1/leader_elect/shard2/election
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
> at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:227)
> at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:224)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
> at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:224)
> at org.apache.solr.cloud.ShardLeaderElectionContext.waitForReplicasToComeUp(ElectionContext.java:276)
> at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:143)
> at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:152)
> at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:96)
> at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:262)
> at org.apache.solr.cloud.ZkController.joinElection(ZkController.java:733)
> at org.apache.solr.cloud.ZkController.register(ZkController.java:566)
> at org.apache.solr.cloud.ZkController.register(ZkController.java:532)
> at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:709)
> at org.apache.solr.core.CoreContainer.register(CoreContainer.java:693)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:535)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
> at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
> at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
> at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
> at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
> at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
> at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
> at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
> at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
> at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
> at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
> at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
> at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
> at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
> at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
> at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
> at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
> at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
> at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
> at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
> at org.apache.catalina.core.StandardService.start(StandardService.java:525)
> at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
> at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
> at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
> Followed by:
> Oct 25, 2012 1:17:27 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
> SEVERE: Recovery failed - trying again... core=collection1
> Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to recover. core=collection1
> Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to recover. core=collection1:org.apache.solr.common.SolrException: No registered leader was found, collection:collection1 slice:shard1
> at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:413)
> at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:399)
> at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:318)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
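
The probable cause described in the report can be illustrated with a small, self-contained Java sketch. It does not use Solr or ZooKeeper and is not the committed fix. The class LeaderWaitSketch, its methods, and the CountDownLatch standing in for the "have all replicas come up?" check are hypothetical names chosen for illustration. The sketch only shows why a wait performed on the same thread that loads the remaining cores can end only by timing out, and that the same wait completes when the cores register concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class LeaderWaitSketch {

    // Hypothetical stand-in for the wait in waitForReplicasToComeUp:
    // announce this core, then block until every expected replica has
    // announced itself or the timeout elapses. Returns false on timeout.
    public static boolean announceAndWait(CountDownLatch replicasUp, long timeoutMs)
            throws InterruptedException {
        replicasUp.countDown();
        return replicasUp.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Cores are registered one by one on the calling thread, as in the
    // reported init flow. Returns how many waits ended by timing out.
    public static int registerSequentially(int cores, long timeoutMs)
            throws InterruptedException {
        CountDownLatch replicasUp = new CountDownLatch(cores);
        int timedOut = 0;
        for (int i = 0; i < cores; i++) {
            // The next core cannot start until this call returns.
            if (!announceAndWait(replicasUp, timeoutMs)) {
                timedOut++;
            }
        }
        return timedOut;
    }

    // Each core registers on its own thread, so the replicas can see
    // each other before the timeout. Returns how many waits timed out.
    public static int registerConcurrently(int cores, long timeoutMs)
            throws Exception {
        CountDownLatch replicasUp = new CountDownLatch(cores);
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < cores; i++) {
                Callable<Boolean> task = () -> announceAndWait(replicasUp, timeoutMs);
                results.add(pool.submit(task));
            }
            int timedOut = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    timedOut++;
                }
            }
            return timedOut;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sequential, timed-out waits: " + registerSequentially(2, 50));
        System.out.println("concurrent, timed-out waits: " + registerConcurrently(2, 2000));
    }
}
```

With two cores, the sequential run should report one timed-out wait, because the first core waits for a replica that cannot start until that wait returns. The concurrent run should report none.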