Posted to issues@lucene.apache.org by "Andy Throgmorton (Jira)" <ji...@apache.org> on 2021/03/09 00:11:00 UTC
[jira] [Resolved] (SOLR-15228) Single host in a bad state can block collection creation for the cluster with autoscaling enabled
[ https://issues.apache.org/jira/browse/SOLR-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Throgmorton resolved SOLR-15228.
-------------------------------------
Resolution: Duplicate
I guess Jira made another bug when I hit refresh?
> Single host in a bad state can block collection creation for the cluster with autoscaling enabled
> -------------------------------------------------------------------------------------------------
>
> Key: SOLR-15228
> URL: https://issues.apache.org/jira/browse/SOLR-15228
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: AutoScaling
> Affects Versions: 8.2
> Reporter: Andy Throgmorton
> Priority: Minor
>
> We configured a SolrCloud cluster (running 8.2) with this cluster autoscaling policy:
> {noformat}
> {
> "set-cluster-preferences":[
> {
> "minimize":"cores",
> "precision":5
> },
> {
> "maximize":"freedisk",
> "precision":25
> },
> {
> "minimize":"sysLoadAvg",
> "precision":10
> }],
> "set-cluster-policy":[
> {
> "replica": "<2",
> "node": "#ANY"
> }],
> "set-trigger": {
> "name":".auto_add_replicas",
> "event":"nodeLost",
> "waitFor":"10m",
> "enabled":true,
> "actions":[
> {
> "name":"auto_add_replicas_plan",
> "class":"solr.AutoAddReplicasPlanAction"},
> {
> "name":"execute_plan",
> "class":"solr.ExecutePlanAction"}]
> }
> }{noformat}
> A node was rebooted at one point, and when that node came back, it had trouble establishing a connection with ZK when it was initializing the CoreContainer. As a result, it returns 404s for (I think?) all admin requests.
> Now, any call to create a collection in that cluster throws an error, with this stack trace:
> {noformat}
> 2021-03-04 12:47:03.615 ERROR (OverseerThreadFactory-141-thread-4-processing-n:HOST_REDACTED:8983_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: COLLECTON_REDACTED operation: create failed:org.apache.solr.common.SolrException: Error getting replica locations : unable to get autoscaling policy session
> at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:195)
> at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
> at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.solr.common.SolrException: unable to get autoscaling policy session
> at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getReplicaLocations(PolicyHelper.java:129)
> at org.apache.solr.cloud.api.collections.Assign.getPositionsUsingPolicy(Assign.java:382)
> at org.apache.solr.cloud.api.collections.Assign$PolicyBasedAssignStrategy.assign(Assign.java:630)
> at org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:410)
> at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:190)
> ... 6 more
> Caused by: org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: Error getting remote info
> at org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:78)
> at org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(SolrClientNodeStateProvider.java:139)
> at org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(SolrClientNodeStateProvider.java:128)
> at org.apache.solr.client.solrj.cloud.autoscaling.Row.<init>(Row.java:71)
> at org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.<init>(Policy.java:575)
> at org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:396)
> at org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:358)
> at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.createSession(PolicyHelper.java:492)
> at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.get(PolicyHelper.java:457)
> at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSession(PolicyHelper.java:513)
> at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getReplicaLocations(PolicyHelper.java:127)
> ... 10 more
> Caused by: org.apache.solr.common.SolrException: Error getting remote info
> at org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$AutoScalingSnitch.getRemoteInfo(SolrClientNodeStateProvider.java:364)
> at org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:76)
> ... 20 more
> Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://HOSTNAME_REDACTED:8983/solr: Expected mime type application/octet-stream but got text/html. <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
> <title>Error 404 Not Found</title>
> </head>
> <body><h2>HTTP ERROR 404</h2>
> <p>Problem accessing /solr/admin/metrics. Reason:
> <pre> Not Found</pre></p><h3>Caused by:</h3><pre>javax.servlet.ServletException: javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down.
> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:168)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.server.Server.handle(Server.java:505)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:427)
> at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:321)
> at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
> at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down.
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:369)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
> ... 21 more
> </pre>
> <h3>Caused by:</h3><pre>javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down.
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:369)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.server.Server.handle(Server.java:505)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:427)
> at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:321)
> at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> at org.eclipse.jetty....{noformat}
> I looked through the Solr code and to me it looks like:
> * Client asks to create collection (CreateCollectionCmd)
> * PolicyHelper.getReplicaLocations tries to build a map of where every replica is
> * To do that, it creates a SessionRef, which needs to populate its cache first
> * SessionRef attempts to collect all metrics, [including metrics from every node|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Policy.java#L583] (one {{Row}} per node)
> * SolrClientNodeStateProvider$AutoScalingSnitch.getRemoteInfo makes the remote call
> ** It will retry on certain errors (see below), but not for this error ({{HttpSolrClient$RemoteSolrException}}), which bubbles up and fails the request
> ** [https://github.com/apache/lucene-solr/blob/branch_8_8/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310-L338]
>
> I realize this autoscaling code is gone in 9.x, but I wanted to at least report this issue for documentation purposes, in case others see it.
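
The call chain described in the quoted report can be illustrated with a small, self-contained sketch. This is not Solr's code: the class NodeValuesSketch, the exception RemoteInfoException, and the methods collectStrict, collectTolerant, and fakeFetch are all hypothetical names made up for the example. It only shows the general pattern the report describes, where values are fetched from every node in a loop and a single unhandled exception aborts the whole operation. The tolerant variant is an assumption about one possible mitigation, not something Solr 8.x is known to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class NodeValuesSketch {

    /** Hypothetical stand-in for a failed remote metrics call (e.g. a 404 from one node). */
    static class RemoteInfoException extends RuntimeException {
        RemoteInfoException(String message) {
            super(message);
        }
    }

    /** Fetches values for every node; the first failure propagates and aborts the whole call. */
    static Map<String, Map<String, Object>> collectStrict(
            List<String> nodes, Function<String, Map<String, Object>> fetch) {
        Map<String, Map<String, Object>> rows = new LinkedHashMap<>();
        for (String node : nodes) {
            rows.put(node, fetch.apply(node)); // an exception here fails the request for all nodes
        }
        return rows;
    }

    /** Assumed alternative: record nodes whose fetch failed and continue with the rest. */
    static Map<String, Map<String, Object>> collectTolerant(
            List<String> nodes, Function<String, Map<String, Object>> fetch, List<String> skipped) {
        Map<String, Map<String, Object>> rows = new LinkedHashMap<>();
        for (String node : nodes) {
            try {
                rows.put(node, fetch.apply(node));
            } catch (RemoteInfoException e) {
                skipped.add(node); // the unhealthy node is left out of placement decisions
            }
        }
        return rows;
    }

    /** Fake fetcher: one hard-coded node behaves like the host in a bad state. */
    static Map<String, Object> fakeFetch(String node) {
        if (node.equals("node2:8983_solr")) {
            throw new RemoteInfoException("404 from " + node + "/admin/metrics");
        }
        Map<String, Object> values = new HashMap<>();
        values.put("cores", 3);
        return values;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1:8983_solr", "node2:8983_solr", "node3:8983_solr");

        try {
            collectStrict(nodes, NodeValuesSketch::fakeFetch);
            System.out.println("strict: ok");
        } catch (RemoteInfoException e) {
            System.out.println("strict: failed (" + e.getMessage() + ")");
        }

        List<String> skipped = new ArrayList<>();
        Map<String, Map<String, Object>> rows =
                collectTolerant(nodes, NodeValuesSketch::fakeFetch, skipped);
        System.out.println("tolerant: " + rows.keySet() + ", skipped " + skipped);
    }
}
```

With the strict variant, one unhealthy node makes the whole collection step fail, which matches the symptom in the report. Whether skipping a node is acceptable depends on the policy being evaluated, so the tolerant variant should be read as a discussion aid only.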