Posted to dev@lucene.apache.org by "Dragos C (JIRA)" <ji...@apache.org> on 2016/03/03 13:53:18 UTC

[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

    [ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177771#comment-15177771 ] 

Dragos C commented on SOLR-3274:
--------------------------------

The same thing happened to me on Solr 5.5.0 with the default setup. I just unzipped the distribution, started Solr in cloud mode (shards: 1, replication factor: 1) with a 6GB memory limit (out of 20GB), Java 1.8.0_73 x64 on Windows Server 2008 R2 Standard, and uploaded the core configuration files. The setup is one ZooKeeper and one Solr instance. I added some documents, but, as Per Steffensen mentioned, CPU usage was barely around 70% (with occasional spikes above that). After a while I _always_ get an HTTP 503 status, and the reply from Solr is "Cannot talk to ZooKeeper - Updates are disabled.".
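For reference, the indexing side is nothing special - roughly the following SolrJ sketch (not the actual tool; the collection name "CORE", the ZooKeeper address and the field values are placeholders):

{code}
import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

public class PushDocs {
    public static void main(String[] args) throws IOException, SolrServerException {
        // One ZooKeeper, one Solr node, collection with 1 shard / replicationFactor=1.
        try (CloudSolrClient client = new CloudSolrClient("localhost:2181")) {
            client.setDefaultCollection("CORE");
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                client.add(doc);
            }
            client.commit();
        } catch (SolrException e) {
            // After a while every update comes back as HTTP 503:
            // "Cannot talk to ZooKeeper - Updates are disabled."
            System.err.println(e.code() + ": " + e.getMessage());
        }
    }
}
{code}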

Solr log:

2016-03-03 12:34:20.902 INFO  (qtp1450821318-4031) [c:CORE s:shard1 r:core_node1 x:CORE_shard1_replica1] o.a.s.u.p.LogUpdateProcessorFactory [CORE_shard1_replica1]  webapp=/solr path=/update params={}{} 0 0
2016-03-03 12:34:20.902 ERROR (qtp1450821318-4031) [c:CORE s:shard1 r:core_node1 x:CORE_shard1_replica1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
	at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1469)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:667)
	at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
	at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:94)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)
	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:670)
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
	at org.eclipse.jetty.server.Server.handle(Server.java:499)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
	at java.lang.Thread.run(Unknown Source)

I have an automated tool that generates the XML documents to be pushed. After this error starts appearing, a while later I also start receiving 404 responses.
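Since the tool produces ready-made <add> XML, the push itself can be as simple as the following sketch (again placeholders, not the actual tool):

{code}
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStreamBase;

public class PushXml {
    public static void main(String[] args) throws Exception {
        // Ready-made <add> XML, as produced by the generator tool.
        String xml = "<add><doc><field name=\"id\">doc-1</field></doc></add>";

        try (CloudSolrClient client = new CloudSolrClient("localhost:2181")) {
            client.setDefaultCollection("CORE");
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update");
            req.addContentStream(new ContentStreamBase.StringStream(xml));
            req.process(client);   // fails with 503 once updates are disabled
            client.commit();
        }
    }
}
{code}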



> ZooKeeper related SolrCloud problems
> ------------------------------------
>
>                 Key: SOLR-3274
>                 URL: https://issues.apache.org/jira/browse/SOLR-3274
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.0-ALPHA
>         Environment: Any
>            Reporter: Per Steffensen
>
> Same setup as in SOLR-3273. Well, if I have to tell the entire truth, we have 7 Solr servers running 28 slices of the same collection (collA) - every slice has one replica (two shards in all - leader + replica) - 56 cores in total (8 shards on each Solr instance). But anyways...
> Besides the problem reported in SOLR-3273, the system seems to run fine under high load for several hours, but eventually errors like the ones shown below start to occur. I might be wrong, but they all seem to indicate some kind of instability in the collaboration between Solr and ZooKeeper. I have to say that I haven't been there to check ZooKeeper "at the moment those exceptions occur", but basically I don't believe the exceptions occur because ZooKeeper is not running stably - at least when I go and check ZooKeeper through other "channels" (e.g. my Eclipse ZK plugin) it always accepts my connection and generally seems to be doing fine.
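> For what it's worth, a quick way to double-check ZooKeeper independently of Solr is its "ruok" four-letter-word command - a healthy server answers "imok" on its client port. A minimal sketch (host and port are placeholders):
> {code}
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.net.Socket;
>
> public class ZkRuok {
>     public static void main(String[] args) throws Exception {
>         try (Socket s = new Socket("localhost", 2181)) {
>             OutputStream out = s.getOutputStream();
>             out.write("ruok".getBytes("US-ASCII"));
>             out.flush();
>             InputStream in = s.getInputStream();
>             byte[] buf = new byte[16];
>             int n = in.read(buf);
>             // Prints "imok" if the server is up and serving requests.
>             System.out.println(n > 0 ? new String(buf, 0, n, "US-ASCII") : "(no reply)");
>         }
>     }
> }
> {code}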
> Exception 1) Often the first error we see in solr.log is something like this
> {code}
> Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
>         at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
>         at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>         at org.mortbay.jetty.Server.handle(Server.java:326)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>         at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
> I believe this error basically occurs because SolrZkClient.isConnected reports false, which means that its internal "keeper.getState" does not return ZooKeeper.States.CONNECTED. I'm pretty sure that it had been CONNECTED for a long time, since this error only starts occurring after several hours of processing without the problem showing. But why is it suddenly not connected anymore?!
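> The check in question boils down to roughly this (paraphrased, not the exact source):
> {code}
> // SolrZkClient, simplified: "connected" means the wrapped ZooKeeper handle
> // is in the CONNECTED state right now.
> public boolean isConnected() {
>   return keeper != null && keeper.getState() == ZooKeeper.States.CONNECTED;
> }
>
> // DistributedUpdateProcessor.zkCheck(), roughly: refuse the update if not connected.
> if (!zkController.getZkClient().isConnected()) {
>   throw new SolrException(ErrorCode.SERVICE_UNAVAILABLE,
>       "Cannot talk to ZooKeeper - Updates are disabled.");
> }
> {code}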
> Exception 2) We also see errors like the following, and if I'm not mistaken, they start occurring shortly after "Exception 1)" (above) shows up for the first time
> {code}
> Mar 22, 2012 5:07:26 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: 
>         at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
>         at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> Please note that the exception says "no servers hosting shard: <blank>". Looking at the code, a "shard"-string was actually supposed to be written at <blank>. Basically this means that HttpShardHandler.submit was called with an empty "shard"-string parameter. But who does this? CoreAdminHandler.handleDistribUrlAction, or SearchHandler.handleRequestBody, or SyncStrategy, or PeerSync, or... I don't know, and maybe it is not that relevant, because I guess they all get the "shard"-string from ZooKeeper. Again, something pointing in the direction of unstable collaboration between Solr and ZooKeeper.
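> For reference, the guard producing this message looks roughly like this (paraphrased, not the exact source):
> {code}
> // HttpShardHandler.submit(...), simplified: "shard" is a |-separated list of
> // replica URLs derived from the cluster state held in ZooKeeper. If that list
> // comes out empty, there is nowhere to send the request - and since the shard
> // string itself is empty here, the message ends with the blank seen above.
> List<String> urls = getURLs(shard);
> if (urls.size() == 0) {
>   throw new SolrException(ErrorCode.SERVICE_UNAVAILABLE,
>       "no servers hosting shard: " + shard);
> }
> {code}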
> Exception 3) We also see exceptions like this
> {code}
> Mar 25, 2012 3:05:38 PM org.apache.solr.common.cloud.ZkStateReader$3 process
> WARNING: ZooKeeper watch triggered, but Solr cannot talk to ZK
> Mar 25, 2012 3:05:38 PM org.apache.solr.cloud.LeaderElector$1 process
> WARNING: 
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/collA/leader_elect/slice26/election
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>         at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
>         at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:266)
>         at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:263)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
>         at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:263)
>         at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:92)
>         at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:57)
>         at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:121)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
> {code}
> Maybe this will be usable for some bug-fixing or for making the code more stable. I know 4.0 is not stable/released yet, and that we therefore should expect these kinds of errors at the moment. So this is not negative criticism - just a report of issues observed when using SolrCloud features under high load for several days. Any feedback is more than welcome.
> Regards, Per Steffensen



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org