Posted to solr-user@lucene.apache.org by Joe Obernberger <jo...@gmail.com> on 2018/08/10 19:20:35 UTC

Solr 7.1 nodes shutting down

Hi All - I'm having an issue that seems to be related to the machine
being under high CPU load.  Occasionally a node will fall out of the
SolrCloud cluster.  It will be using 200% CPU and show the following
exception:

2018-08-10 15:36:43.416 INFO  (qtp1908316405-203450) [c:models s:shard3 
r:core_node17 x:models_shard3_replica_n14] o.a.s.s.HttpSolrCall Unable 
to write response, client closed connection or we are shutting down
org.eclipse.jetty.io.EofException: Closed
         at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:659)
         at 
org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:55)
         at 
org.apache.solr.response.QueryResponseWriterUtil$1.write(QueryResponseWriterUtil.java:54)
         at java.io.OutputStream.write(OutputStream.java:116)
         at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
         at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
         at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
         at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
         at org.apache.solr.util.FastWriter.flush(FastWriter.java:140)
         at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:154)
         at 
org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:96)
         at 
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:73)
         at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
         at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:789)
         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)
         at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
         at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
         at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
         at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
         at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
         at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
         at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
         at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
         at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
         at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
         at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
         at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
         at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
         at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
         at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at org.eclipse.jetty.server.Server.handle(Server.java:530)
         at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
         at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
         at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
         at 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
         at 
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
         at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
         at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
         at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
         at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
         at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
         at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
         at java.lang.Thread.run(Thread.java:748)

This is followed by a bunch of exceptions such as:


2018-08-10 19:15:58.989 ERROR (qtp1908316405-209211) [c:UNCLASS 
s:shard23 r:core_node47 x:UNCLASS_shard23_replica_n44] 
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: 
ClusterState says we are the leader 
(http://triton:9100/solr/UNCLASS_shard23_replica_n44), but locally we 
don't think so. Request came from null
         at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:571)
         at 
org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:324)
         at 
org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:259)
         at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:614)
         at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
         at 
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
         at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:188)
         at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:144)
         at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:311)
         at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
         at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:130)
         at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:276)
         at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
         at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
         at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:195)
         at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:109)
         at 
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
         at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
         at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
         at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
         at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:517)
         at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
         at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
         at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
         at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
         at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
         at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
         at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
         at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
         at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
         at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
         at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
         at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
         at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
         at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
         at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
         at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at org.eclipse.jetty.server.Server.handle(Server.java:530)
         at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
         at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
         at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
         at 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
         at 
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
         at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
         at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
         at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
         at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
         at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
         at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
         at java.lang.Thread.run(Thread.java:748)

and:

2018-08-10 19:14:10.401 ERROR (qtp1908316405-209211) [c:UNCLASS 
s:shard23 r:core_node47 x:UNCLASS_shard23_replica_n44] 
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: No 
registered leader was found after waiting for 4000ms , collection: 
UNCLASS slice: shard23 saw 
state=DocCollection(UNCLASS//collections/UNCLASS/state.json/3828)={

Any ideas on what to try?  I've been trying to figure this out for a
couple of days now, but it's very intermittent.

Thank you!

-Joe


Re: Solr 7.1 nodes shutting down

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/10/2018 1:20 PM, Joe Obernberger wrote:
> Hi All - having an issue that seems to be related to the machine being 
> under a high CPU load.  Occasionally a node will fall out of the solr 
> cloud cluster.  It will be using 200% CPU and show the following 
> exception:
>
> 2018-08-10 15:36:43.416 INFO  (qtp1908316405-203450) [c:models 
> s:shard3 r:core_node17 x:models_shard3_replica_n14] 
> o.a.s.s.HttpSolrCall Unable to write response, client closed 
> connection or we are shutting down
> org.eclipse.jetty.io.EofException: Closed

EofException means that the TCP connection was closed before Solr could
finish writing the response. Because the timeout that can cause such a
disconnection is typically configured for either 50 or 60 seconds,
something *extreme* must have happened for that timeout to be exceeded.

With no real information to go on, I would guess that you're having 
extreme GC pauses, probably from your heap being too small.
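
One way to test the GC-pause guess is to scan the JVM's GC log for long
stop-the-world pauses. The sketch below is only an illustration and rests
on assumptions not stated in this thread: that the JVM is Java 8 HotSpot
started with -XX:+PrintGCApplicationStoppedTime, so the GC log contains
"Total time for which application threads were stopped" lines. The
function name long_pauses, the 5-second threshold, and the sample lines
are all hypothetical.

```python
import re

# Line printed by Java 8 HotSpot when -XX:+PrintGCApplicationStoppedTime is set
# (assumption: that flag is enabled on the Solr JVM).
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"(?P<secs>\d+[.,]\d+) seconds"
)


def long_pauses(lines, threshold_secs=5.0):
    """Return (line_number, pause_seconds) for each stop longer than the threshold."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = STOPPED.search(line)
        if match is None:
            continue  # not a "threads were stopped" line
        # Some locales print a decimal comma; normalise before converting.
        secs = float(match.group("secs").replace(",", "."))
        if secs > threshold_secs:
            hits.append((lineno, secs))
    return hits


# Made-up sample lines for illustration only; not taken from the poster's logs.
sample = [
    "2018-08-10T15:36:01.100-0400: 120.500: Total time for which application "
    "threads were stopped: 0.0021340 seconds, Stopping threads took: 0.0000410 seconds",
    "2018-08-10T15:36:43.000-0400: 162.400: Total time for which application "
    "threads were stopped: 41.7305120 seconds, Stopping threads took: 0.0001200 seconds",
]

for lineno, secs in long_pauses(sample):
    print(f"line {lineno}: application threads stopped for {secs:.1f} s")
```

To run it against a real log, pass an open file instead of the sample
list, e.g. long_pauses(open(path)) where path points at the GC log. Pauses
in the tens of seconds close to the times of the exceptions would support
the small-heap theory; if there are none, the cause is probably elsewhere.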

If that's not it, figuring out the problem is going to be an involved 
process that could take a while.  You might want to hang out in the IRC 
channel for a more interactive chat.

Thanks,
Shawn