Posted to solr-user@lucene.apache.org by Salih Sen <sa...@dilisim.com> on 2017/04/03 13:52:42 UTC

Solr Cloud 6.5.0 Replicas go down while indexing

Hi,

We have a three-server setup, each server having 756 GB of RAM, 48 cores,
4 SSDs (each hosting three Solr instances) and a dedicated mechanical
disk for ZooKeeper (3 ZK instances in total). Each Solr instance has 31 GB of
heap allocated to it. In total we have 36 Solr instances and 3
ZooKeeper instances (with 1 GB of heap). The servers are connected to each
other over a 10 Gbit network.
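A quick check of the memory budget described above. This assumes the 36 Solr instances are split evenly (4 SSDs × 3 instances = 12 per server) and that each server runs one of the three ZooKeeper instances with a 1 GB heap; neither split is stated explicitly.

```latex
\begin{aligned}
\text{Solr instances per server} &= 4 \times 3 = 12, \qquad 3 \times 12 = 36 \text{ in total} \\
\text{Solr heap per server} &= 12 \times 31\,\text{GB} = 372\,\text{GB} \\
\text{RAM outside JVM heaps per server} &= 756 - 372 - 1 = 383\,\text{GB}
\end{aligned}
```

Roughly half of each server's RAM is therefore left for the operating system's page cache, which Lucene relies on to keep index files in memory (this sketch ignores off-heap JVM overhead).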

We set the auto hard commit to 15 seconds and 10000 docs, and the soft commit
to 60000 ms and 5000 docs, in order to avoid soft committing too often and to
avoid indexing bottlenecks. We also set -DzkClientTimeout=90000.
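For reference, the settings described above would look roughly like this. It is a sketch that assumes the standard <updateHandler> section of solrconfig.xml and the <solrcloud> section of solr.xml. The soft-commit document count of 5000 is one reading of the figures above, and <openSearcher>false</openSearcher> is an assumed (commonly used) choice that this thread does not state.

```xml
<!-- solrconfig.xml, inside <updateHandler class="solr.DirectUpdateHandler2"> -->
<autoCommit>
  <maxTime>15000</maxTime>            <!-- hard commit every 15 s ... -->
  <maxDocs>10000</maxDocs>            <!-- ... or after 10000 docs, whichever comes first -->
  <openSearcher>false</openSearcher>  <!-- assumption: flush to disk without opening a new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>            <!-- new docs become visible every 60 s ... -->
  <maxDocs>5000</maxDocs>             <!-- ... or after 5000 docs (assumed reading of "5000") -->
</autoSoftCommit>

<!-- solr.xml, inside <solrcloud>: the system property -DzkClientTimeout=90000
     overrides the 30000 ms fallback given after the colon -->
<int name="zkClientTimeout">${zkClientTimeout:30000}</int>
```

One point worth checking: the ZooKeeper server clamps a client's requested session timeout to the range [minSessionTimeout, maxSessionTimeout], which default to 2 × tickTime and 20 × tickTime. Assuming tickTime=2000 (the value in ZooKeeper's sample zoo.cfg), a 90000 ms zkClientTimeout would be negotiated down to 40000 ms, which matches the "negotiated timeout 40000" entries in the ZK leader log quoted later in this thread. If so, maxSessionTimeout would also have to be raised in zoo.cfg for the larger value to take effect.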

But replicas still seem to go down randomly while indexing. Do you have
any suggestions for preventing this?

ERROR - 2017-04-03 12:24:02.503; [   ]
org.apache.solr.cloud.OverseerCollectionMessageHandler; Error from shard:
http://192.168.30.33:9132/solr
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at: http://192.168.30.33:9132/solr
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:621)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at
org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at
org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:515)
... 12 more
ERROR - 2017-04-03 12:27:11.631; [c:doc s:shard3 r:core_node22
x:doc_shard3_replica3]
org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient;
error
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at
org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
ERROR - 2017-04-03 12:27:11.633; [c:doc s:shard3 r:core_node22
x:doc_shard3_replica3]
org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient;
error
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at
org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
ERROR - 2017-04-03 12:27:11.645; [c:doc s:shard3 r:core_node22
x:doc_shard3_replica3]
org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient;
error
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at
org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

*Salih Şen*
M: +90 533 131 17 07
E: salih@dilisim.com
W: www.dilisim.com
Skype: slhsen

Re: Solr Cloud 6.5.0 Replicas go down while indexing

Posted by Michael Joyner <mi...@newsrx.com>.
Try increasing the number of connections your ZooKeeper allows to a very
large number.


On 04/04/2017 09:02 AM, Salih Sen wrote:
> Hi,
>
> One of the replicas went down again today, somehow disabling all
> updates to the cluster for half an hour with the error message "Cannot
> talk to ZooKeeper - Updates are disabled."
>
> The ZK leader was on the same server as the Solr instance, so I doubt it
> has anything to do with the network (at least between Solr and the ZK
> leader node). Restarting the ZK leader seems to resolve the issue, and
> the cluster accepts updates again.
>
>
> == Solr Node
> WARN  - 2017-04-04 11:49:14.414; [   ] 
> org.apache.solr.common.cloud.ConnectionManager; Watcher 
> org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name: 
> ZooKeeperConnection Watcher:192.168.30.32:2181,192.168.30.33:2181,192.168.30.24:2181
> got event WatchedEvent state:Disconnected type:None path:null path: null type: None
> WARN  - 2017-04-04 11:49:15.723; [   ] 
> org.apache.solr.common.cloud.ConnectionManager; zkClient has disconnected
> WARN  - 2017-04-04 11:49:15.727; [   ] 
> org.apache.solr.common.cloud.ConnectionManager; Watcher 
> org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name: 
> ZooKeeperConnection Watcher:192.168.30.32:2181,192.168.30.33:2181,192.168.30.24:2181
> got event WatchedEvent state:Expired type:None path:null path: null type: None
> WARN  - 2017-04-04 11:49:15.727; [   ] 
> org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper 
> session was expired. Attempting to reconnect to recover relationship 
> with ZooKeeper...
> WARN  - 2017-04-04 11:49:15.728; [   ] 
> org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection 
> expired - starting a new one...
> ERROR - 2017-04-04 11:49:22.040; [c:doc s:shard6 r:core_node27 
> x:doc_shard6_replica1] org.apache.solr.common.SolrException; 
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - 
> Updates are disabled.
>         at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1739)
>         at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:703)
>         at 
> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
>         at 
> org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
>         at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
>         at 
> org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
>         at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
>         at 
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
>         at 
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
>         at 
> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
>         at 
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
>         at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>         at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
>         at 
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
>         at 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>         at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>         at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>         at org.eclipse.jetty.server.Server.handle(Server.java:534)
>
> ERROR - 2017-04-04 11:50:13.798; [   ] 
> org.apache.solr.common.SolrException; 
> null:org.apache.solr.common.SolrException: Error trying to proxy 
> request for url: http://192.168.30.24:9141/solr/doc/select
>         at 
> org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:659)
>         at 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:513)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>         at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>         at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>         at org.eclipse.jetty.server.Server.handle(Server.java:534)
>         at 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>         at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>         at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>         at 
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>         at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>         at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>         at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>         at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>         at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>         at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.eclipse.jetty.io.EofException
>         at 
> org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:199)
>         at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:420)
>         at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:313)
>         at 
> org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:140)
>         at 
> org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:744)
>         at 
> org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
>         at 
> org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224)
>         at 
> org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:518)
>         at 
> org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:724)
>         at 
> org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:775)
>         at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:235)
>         at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:219)
>         at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:496)
>         at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2147)
>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:2102)
>         at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2123)
>         at 
> org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:655)
>
>
> === ZK Leader
> 2017-04-04 14:48:46,327 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
> socket connection from /192.168.30.24:57990
> 2017-04-04 14:48:46,499 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
> socket connection from /192.168.30.24:57994
> 2017-04-04 14:48:50,005 [myid:3] - INFO 
>  [SessionTracker:ZooKeeperServer@347] - Expiring session 
> 0x15b14ba8a8e0054, timeout of 40000ms exceeded
> 2017-04-04 14:48:50,005 [myid:3] - INFO  [ProcessThread(sid:3 
> cport:-1)::PrepRequestProcessor@494] - Processed session termination 
> for sessionid: 0x15b14ba8a8e0054
> 2017-04-04 14:48:59,821 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting
> to renew session 0x15b14ba8a8e004b at /192.168.30.24:57990
> 2017-04-04 14:48:59,822 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established
> session 0x15b14ba8a8e004b with negotiated timeout 40000 for client
> /192.168.30.24:57990
> 2017-04-04 14:48:59,822 [myid:3] - WARN
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of
> stream exception
> EndOfStreamException: Unable to read additional data from client 
> sessionid 0x15b14ba8a8e004b, likely client has closed socket
>         at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>         at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-04-04 14:48:59,827 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
> connection for client /192.168.30.24:57990 which had sessionid 0x15b14ba8a8e004b
> 2017-04-04 14:48:59,827 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting
> to renew session 0x15b14ba8a8e0050 at /192.168.30.24:57994
> 2017-04-04 14:48:59,827 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established
> session 0x15b14ba8a8e0050 with negotiated timeout 40000 for client
> /192.168.30.24:57994
> 2017-04-04 14:49:17,455 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
> socket connection from /192.168.30.24:58082
> 2017-04-04 14:49:17,667 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
> socket connection from /192.168.30.32:56600
> 2017-04-04 14:49:17,667 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting
> to renew session 0x15b14ba8a8e0043 at /192.168.30.32:56600
> 2017-04-04 14:49:17,681 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established
> session 0x15b14ba8a8e0043 with negotiated timeout 40000 for client
> /192.168.30.32:56600
> 2017-04-04 14:49:22,040 [myid:3] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting
> to establish new session at /192.168.30.24:58082
> 2017-04-04 14:49:22,051 [myid:3] - INFO
>  [CommitProcessor:3:ZooKeeperServer@617] - Established session
> 0x35ad61c452c00d3 with negotiated timeout 40000 for client
> /192.168.30.24:58082
> 2017-04-04 14:49:28,659 [myid:3] - INFO  [ProcessThread(sid:3 
> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException 
> when processing sessionid:0x35ad61c452c00d3 type:delete cxid:0xe 
> zxid:0x700004c25 txntype:-1 reqpath:n/a Error 
> Path:/overseer_elect/election/97694608339632212-192.168.30.24:9133_solr-n_0000000380 
> Error:KeeperErrorCode = NoNode for 
> /overseer_elect/election/97694608339632212-192.168.30.24:9133_solr-n_0000000380
> 2017-04-04 14:49:28,675 [myid:3] - INFO  [ProcessThread(sid:3 
> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException 
> when processing sessionid:0x35ad61c452c00d3 type:create cxid:0x13 
> zxid:0x700004c27 txntype:-1 reqpath:n/a Error Path:/overseer 
> Error:KeeperErrorCode = NodeExists for /overseer
>
>
> == ZK Follower 1
> 2017-04-04 14:48:45,570 [myid:1] - WARN
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of
> stream exception
> EndOfStreamException: Unable to read additional data from client 
> sessionid 0x15b14ba8a8e004b, likely client has closed socket
>         at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>         at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-04-04 14:48:45,587 [myid:1] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
> connection for client /192.168.30.24:39820 which had sessionid 0x15b14ba8a8e004b
> 2017-04-04 14:48:45,587 [myid:1] - WARN
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of
> stream exception
> EndOfStreamException: Unable to read additional data from client 
> sessionid 0x15b14ba8a8e0050, likely client has closed socket
>         at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>         at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-04-04 14:48:45,589 [myid:1] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
> connection for client /192.168.30.24:40132 which had sessionid 0x15b14ba8a8e0050
> 2017-04-04 14:48:48,351 [myid:1] - WARN
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of
> stream exception
> EndOfStreamException: Unable to read additional data from client 
> sessionid 0x15b14ba8a8e0054, likely client has closed socket
>         at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>         at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-04-04 14:48:48,352 [myid:1] - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
> connection for client /192.168.30.24:40212 which had sessionid 0x15b14ba8a8e0054
> 2017-04-04 15:24:03,034 [myid:1] - WARN 
>  [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken 
> for id 3, my id = 1, error =
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> 2017-04-04 15:24:03,053 [myid:1] - WARN 
>  [RecvWorker:3:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
> 2017-04-04 15:24:03,093 [myid:1] - WARN 
>  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception 
> when following the leader
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
>         at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>         at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
>
> On 4 April 2017 at 10:36:14, Salih Sen (salih@dilisim.com) wrote:
>
>> Hi,
>>
>> Sorry for the hurried initial mail. Here are some corrections and
>> further explanation:
>>
>> The problem I described previously happened before we set the
>> zkClientTimeout value, so it was 30000 when it happened.
>>
>> autoCommit maxTime value is 15000 and autoSoftCommit maxTime is 60000.
>>
>> We recently removed maxDocs values from autoCommit settings and it 
>> seems more stable so far and has better response time.
>>
>> I can't find these values in the Solr logs, probably because the
>> logging level is currently WARN, but we left them at their defaults, so I
>> think they're set to the values in solr.xml:
>> <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
>> <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
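Both values above use solr.xml's `${name:default}` property syntax: the number after the colon applies only when no Java system property of that name is set (for example with `-DdistribUpdateSoTimeout=...` on the startup command line). Below is a simplified sketch of that fallback rule, assuming plain system-property lookup only; the class and method names are hypothetical and this is not Solr's own substitution code.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a simplified model of ${name:default} substitution.
class PropertySubstitution {
    // Matches ${name} or ${name:default}; group 2 (the default) may be absent.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String template, Properties props) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            if (value == null) {
                value = m.group(2); // no property set: fall back to the inline default
            }
            if (value == null) {
                throw new IllegalArgumentException("No value or default for " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties none = new Properties();
        System.out.println(resolve("${distribUpdateSoTimeout:600000}", none)); // 600000

        Properties overridden = new Properties();
        overridden.setProperty("distribUpdateSoTimeout", "900000");
        System.out.println(resolve("${distribUpdateSoTimeout:600000}", overridden)); // 900000
    }
}
```

In a real JVM the lookup source would be `System.getProperties()`. For scale, 600000 ms is ten minutes and 60000 ms is one minute.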
>>
>>
>> We have 12 replicas using default routing. All commits and queries
>> go to a single node because of the dummy client we use.
>> Documents are sent in JSON format. I don't know the exact document
>> size; they are mostly news-article sized, though with lots of
>> dynamic fields.
>>
>> Sematext SPM currently shows "Added Docs Rate" as ~1.70k/sec for the
>> server that is receiving updates.
>>
>> Once the problem starts, multiple replicas go down (not
>> necessarily the one receiving the update request from the client) and
>> the cluster starts returning errors to update requests.
>>
>>
>> We saw entries like the following in the ZooKeeper logs, which is why
>> we thought it might be related to the zkClientTimeout value.
>>
>> 2017-04-03 09:13:03,040 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.32:36420 which had sessionid 0x25ad61c4507008c
>> 2017-04-03 09:27:02,078 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>> EndOfStreamException: Unable to read additional data from client sessionid 0x15b14ba8a8e0026, likely client has closed socket
>> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>> at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>> at java.lang.Thread.run(Thread.java:745)
>> 2017-04-03 09:27:02,079 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.32:35636 which had sessionid 0x15b14ba8a8e0026
>> 2017-04-03 09:35:19,362 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.30.32:37970
>>
>>
>>
>>
>> On 3 April 2017 at 18:01:15, Erick Erickson (erickerickson@gmail.com) wrote:
>>
>>> bq: We set Auto hardcommit time to 15sec and 10000 docs, and soft
>>> commit to 60000 sec and 5000 seconds
>>>
>>> Just a sanity check: the commit intervals are in milliseconds. Your
>>> units look mixed up above, though I'm guessing it's just a typo. I
>>> usually don't use maxDocs because it's unpredictable. Say you're
>>> indexing at a furious rate. If you are indexing at 5,000 docs a second
>>> (and assuming the above was supposed to be soft committing every 60
>>> seconds or 5,000 docs), you'll still be autocommitting every second.
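The arithmetic above, as a small sketch: when both maxTime and maxDocs are set, whichever limit is reached first triggers the commit, so the effective interval is the smaller of maxTime and the time it takes to accumulate maxDocs documents at the current rate. The class name is hypothetical; it only models the reasoning and does not call Solr.

```java
// Hypothetical helper modelling "whichever limit is reached first" for autocommits.
class CommitInterval {
    /**
     * @param maxTimeMs  the maxTime setting in milliseconds
     * @param maxDocs    the maxDocs setting (<= 0 means "not set")
     * @param docsPerSec the sustained indexing rate
     * @return the effective time between automatic commits, in milliseconds
     */
    static long effectiveIntervalMs(long maxTimeMs, long maxDocs, double docsPerSec) {
        if (maxDocs <= 0 || docsPerSec <= 0) {
            return maxTimeMs; // only the time limit applies
        }
        long docsLimitMs = Math.round(maxDocs / docsPerSec * 1000.0);
        return Math.min(maxTimeMs, docsLimitMs);
    }

    public static void main(String[] args) {
        // Soft commit every 60 s or 5,000 docs, indexing at 5,000 docs/s.
        System.out.println(effectiveIntervalMs(60_000, 5_000, 5_000.0)); // 1000
        // Hard commit every 15 s or 10,000 docs at the same rate.
        System.out.println(effectiveIntervalMs(15_000, 10_000, 5_000.0)); // 2000
        // maxDocs removed: only maxTime applies.
        System.out.println(effectiveIntervalMs(60_000, 0, 5_000.0)); // 60000
    }
}
```

The third case shows why dropping maxDocs makes the commit cadence predictable: it no longer depends on the indexing rate.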
>>>
>>> While that could be related, it's not particularly germane to your
>>> timeout. My guess is that you're getting these errors on the leader?
>>> What do you have in solr.xml for:
>>>
>>> distribUpdateConnTimeout and distribUpdateSoTimeout
>>>
>>> Those are likely the timeouts that matter. And how big are your
>>> documents? The scenario I'm thinking of is that the leader sends the
>>> update to the replica and the replica takes longer to respond than the
>>> timeouts above allow.
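If the replica is too slow, the leader's side of that request fails with `java.net.SocketTimeoutException: Read timed out`, which matches the errors in the quoted logs. A self-contained sketch that reproduces the exception with a local socket that accepts a connection but never answers; the connect and read timeouts are only analogous to distribUpdateConnTimeout and distribUpdateSoTimeout, and the class name and timeout values are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo: a local stand-in for a leader waiting on a replica that never responds.
class ReadTimeoutDemo {

    /** Returns true if reading from a peer that never responds times out. */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The "replica" completes TCP connections via its backlog but never writes anything.
        try (ServerSocket silentReplica = new ServerSocket(0, 50, loopback);
             Socket leaderSide = new Socket()) {
            // Limit on establishing the connection (cf. distribUpdateConnTimeout).
            leaderSide.connect(new InetSocketAddress(loopback, silentReplica.getLocalPort()), connectTimeoutMs);
            // Limit on waiting for response bytes (cf. distribUpdateSoTimeout).
            leaderSide.setSoTimeout(readTimeoutMs);
            try {
                leaderSide.getInputStream().read(); // blocks until data, EOF, or timeout
                return false;
            } catch (SocketTimeoutException e) {
                System.out.println("Caught SocketTimeoutException: " + e.getMessage());
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readTimesOut(1000, 200));
    }
}
```

The connection itself succeeds quickly; only the wait for a response exceeds its limit, which is the distinction between the two settings.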
>>>
>>> BTW, it can be useful on startup to look at your solr.log. The
>>> _actual_ values for all the timeouts are printed out, including any
>>> sysvars you've used.
>>>
>>> And how are you indexing? Mostly I'm wondering how fast you're sending
>>> docs to each leader and how.
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, Apr 3, 2017 at 6:52 AM, Salih Sen <salih@dilisim.com> wrote:
>>> > Hi,
>>> >
>>> > We have a three-server setup, with each server having 756G RAM, 48 cores,
>>> > 4 SSDs (each having three Solr instances on it) and a dedicated mechanical
>>> > disk for ZooKeeper (3 ZK instances total). Each Solr instance has 31G of
>>> > heap space allocated to it. In total we have 36 Solr instances and 3
>>> > ZooKeeper instances (with 1G heap space). The servers also have a 10Gig
>>> > network between them.
>>> >
>>> > We set Auto hardcommit time to 15sec and 10000 docs, and soft commit to
>>> > 60000 sec and 5000 seconds in order to avoid soft committing too much and
>>> > to avoid indexing bottlenecks. We also set -DzkClientTimeout=90000.
>>> >
>>> > But it seems replicas still randomly go down while indexing. Do you have any
>>> > suggestions to prevent this situation?
>>> >
>>> > ERROR - 2017-04-03 12:24:02.503; [ ]
>>> > org.apache.solr.cloud.OverseerCollectionMessageHandler; Error from shard:
>>> > http://192.168.30.33:9132/solr
>>> > org.apache.solr.client.solrj.SolrServerException: Timeout occured while
>>> > waiting response from server at: http://192.168.30.33:9132/solr
>>> > at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:621)
>>> > at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
>>> > at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
>>> > at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
>>> > at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:163)
>>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> > at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>>> > at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> > at java.lang.Thread.run(Thread.java:745)
>>> > Caused by: java.net.SocketTimeoutException: Read timed out
>>> > at java.net.SocketInputStream.socketRead0(Native Method)
>>> > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>> > at java.net.SocketInputStream.read(SocketInputStream.java:171)
>>> > at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>> > at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
>>> > at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
>>> > at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
>>> > at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
>>> > at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
>>> > at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
>>> > at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
>>> > at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
>>> > at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
>>> > at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
>>> > at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
>>> > at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
>>> > at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
>>> > at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
>>> > at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
>>> > at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>>> > at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>>> > at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:515)
>>> > ... 12 more
>>> > ERROR - 2017-04-03 12:27:11.631; [c:doc s:shard3 r:core_node22 x:doc_shard3_replica3]
>>> > org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient; error
>>> > java.net.SocketTimeoutException: Read timed out
>>> > at java.net.SocketInputStream.socketRead0(Native Method)
>>> > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>> > at java.net.SocketInputStream.read(SocketInputStream.java:171)
>>> > at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>> > at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
>>> > at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
>>> > at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
>>> > at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
>>> > at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
>>> > at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
>>> > at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
>>> > at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
>>> > at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
>>> > at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
>>> > at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
>>> > at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
>>> > at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
>>> > at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
>>> > at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
>>> > at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>>> > at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>>> > at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
>>> > at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
>>> > at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>>> > at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> > at java.lang.Thread.run(Thread.java:745)
>>> > ERROR - 2017-04-03 12:27:11.633; [c:doc s:shard3 r:core_node22 x:doc_shard3_replica3]
>>> > org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient; error
>>> > java.net.SocketTimeoutException: Read timed out
>>> > at java.net.SocketInputStream.socketRead0(Native Method)
>>> > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>> > at java.net.SocketInputStream.read(SocketInputStream.java:171)
>>> > at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>> > at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
>>> > at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
>>> > at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
>>> > at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
>>> > at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
>>> > at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
>>> > at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
>>> > at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
>>> > at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
>>> > at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
>>> > at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
>>> > at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
>>> > at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
>>> > at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
>>> > at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
>>> > at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>>> > at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>>> > at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
>>> > at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
>>> > at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>>> > at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> > at java.lang.Thread.run(Thread.java:745)
>>> > ERROR - 2017-04-03 12:27:11.645; [c:doc s:shard3 r:core_node22 x:doc_shard3_replica3]
>>> > org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient; error
>>> > java.net.SocketTimeoutException: Read timed out
>>> > at java.net.SocketInputStream.socketRead0(Native Method)
>>> > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>> > at java.net.SocketInputStream.read(SocketInputStream.java:171)
>>> > at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>> > at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
>>> > at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
>>> > at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
>>> > at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
>>> > at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
>>> > at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
>>> > at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
>>> > at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
>>> > at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
>>> > at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
>>> > at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
>>> > at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
>>> > at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
>>> > at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
>>> > at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
>>> > at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>>> > at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>>> > at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
>>> > at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
>>> > at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>>> > at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> > at java.lang.Thread.run(Thread.java:745)



Re: Solr Cloud 6.5.0 Replicas go down while indexing

Posted by Salih Sen <sa...@dilisim.com>.
Hi,

One of the replicas went down again today, somehow disabling all updates to
the cluster with the error message "Cannot talk to ZooKeeper - Updates are
disabled." for half an hour.

The ZK leader was on the same server as the Solr instance, so I doubt it has
anything to do with the network (at least between Solr and the ZK leader
node). Restarting the ZK leader seems to resolve the issue, and the cluster
accepts updates again.


== Solr Node
WARN  - 2017-04-04 11:49:14.414; [   ]
org.apache.solr.common.cloud.ConnectionManager; Watcher
org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name:
ZooKeeperConnection Watcher:192.168.30.32:2181,192.168.30.33:2181,
192.168.30.24:2181 got event WatchedEvent state:Disconnected type:None
path:null path: null type: None
WARN  - 2017-04-04 11:49:15.723; [   ]
org.apache.solr.common.cloud.ConnectionManager; zkClient has disconnected
WARN  - 2017-04-04 11:49:15.727; [   ]
org.apache.solr.common.cloud.ConnectionManager; Watcher
org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name:
ZooKeeperConnection Watcher:192.168.30.32:2181,192.168.30.33:2181,
192.168.30.24:2181 got event WatchedEvent state:Expired type:None path:null
path: null type: None
WARN  - 2017-04-04 11:49:15.727; [   ]
org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper
session was expired. Attempting to reconnect to recover relationship with
ZooKeeper...
WARN  - 2017-04-04 11:49:15.728; [   ]
org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection expired
- starting a new one...
ERROR - 2017-04-04 11:49:22.040; [c:doc s:shard6 r:core_node27
x:doc_shard6_replica1] org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
are disabled.
        at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1739)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:703)
        at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
        at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
        at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
        at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
        at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
        at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:534)


ERROR - 2017-04-04 11:50:13.798; [   ]
org.apache.solr.common.SolrException;
null:org.apache.solr.common.SolrException: Error trying to proxy request
for url: http://192.168.30.24:9141/solr/doc/select
        at org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:659)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:513)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:534)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.eclipse.jetty.io.EofException
        at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:199)
        at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:420)
        at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:313)
        at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:140)
        at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:744)
        at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
        at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224)
        at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:518)
        at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:724)
        at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:775)
        at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:235)
        at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:219)
        at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:496)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2147)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:2102)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2123)
        at org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:655)


=== ZK Leader
2017-04-04 14:48:46,327 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /192.168.30.24:57990
2017-04-04 14:48:46,499 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /192.168.30.24:57994
2017-04-04 14:48:50,005 [myid:3] - INFO  [SessionTracker:ZooKeeperServer@347]
- Expiring session 0x15b14ba8a8e0054, timeout of 40000ms exceeded
2017-04-04 14:48:50,005 [myid:3] - INFO  [ProcessThread(sid:3
cport:-1)::PrepRequestProcessor@494] - Processed session termination for
sessionid: 0x15b14ba8a8e0054
2017-04-04 14:48:59,821 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew
session 0x15b14ba8a8e004b at /192.168.30.24:57990
2017-04-04 14:48:59,822 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established session
0x15b14ba8a8e004b with negotiated timeout 40000 for client /
192.168.30.24:57990
2017-04-04 14:48:59,822 [myid:3] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x15b14ba8a8e004b, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2017-04-04 14:48:59,827 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
client /192.168.30.24:57990 which had sessionid 0x15b14ba8a8e004b
2017-04-04 14:48:59,827 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew
session 0x15b14ba8a8e0050 at /192.168.30.24:57994
2017-04-04 14:48:59,827 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established session
0x15b14ba8a8e0050 with negotiated timeout 40000 for client /
192.168.30.24:57994
2017-04-04 14:49:17,455 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /192.168.30.24:58082
2017-04-04 14:49:17,667 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /192.168.30.32:56600
2017-04-04 14:49:17,667 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew
session 0x15b14ba8a8e0043 at /192.168.30.32:56600
2017-04-04 14:49:17,681 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established session
0x15b14ba8a8e0043 with negotiated timeout 40000 for client /
192.168.30.32:56600
2017-04-04 14:49:22,040 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish
new session at /192.168.30.24:58082
2017-04-04 14:49:22,051 [myid:3] - INFO
 [CommitProcessor:3:ZooKeeperServer@617] - Established session
0x35ad61c452c00d3 with negotiated timeout 40000 for client /
192.168.30.24:58082
2017-04-04 14:49:28,659 [myid:3] - INFO  [ProcessThread(sid:3
cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when
processing sessionid:0x35ad61c452c00d3 type:delete cxid:0xe
zxid:0x700004c25 txntype:-1 reqpath:n/a Error
Path:/overseer_elect/election/97694608339632212-192.168.30.24:9133_solr-n_0000000380
Error:KeeperErrorCode = NoNode for
/overseer_elect/election/97694608339632212-192.168.30.24:9133_solr-n_0000000380
2017-04-04 14:49:28,675 [myid:3] - INFO  [ProcessThread(sid:3
cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when
processing sessionid:0x35ad61c452c00d3 type:create cxid:0x13
zxid:0x700004c27 txntype:-1 reqpath:n/a Error Path:/overseer
Error:KeeperErrorCode = NodeExists for /overseer


== ZK Follower 1
2017-04-04 14:48:45,570 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15b14ba8a8e004b, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2017-04-04 14:48:45,587 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.24:39820 which had sessionid 0x15b14ba8a8e004b
2017-04-04 14:48:45,587 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15b14ba8a8e0050, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2017-04-04 14:48:45,589 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.24:40132 which had sessionid 0x15b14ba8a8e0050
2017-04-04 14:48:48,351 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15b14ba8a8e0054, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2017-04-04 14:48:48,352 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.24:40212 which had sessionid 0x15b14ba8a8e0054
2017-04-04 15:24:03,034 [myid:1] - WARN  [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken for id 3, my id = 1, error =
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
2017-04-04 15:24:03,053 [myid:1] - WARN  [RecvWorker:3:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
2017-04-04 15:24:03,093 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
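
A note on the "negotiated timeout 40000" entries above: a ZooKeeper server does not simply accept the session timeout a client asks for. It clamps the request into the range between minSessionTimeout and maxSessionTimeout, which default to 2 and 20 times tickTime. Assuming this ensemble runs with tickTime=2000 (the value in ZooKeeper's sample zoo.cfg; the real zoo.cfg is not shown in this thread) and does not override those two settings, a requested zkClientTimeout of 90000 would be cut down to 40000, which matches these logs. The Java sketch below only models that clamping rule; it is not ZooKeeper's own code, and the class name is made up for illustration.

```java
// Models the session-timeout clamping a ZooKeeper server applies to client requests.
// ASSUMPTION: minSessionTimeout/maxSessionTimeout are not set in zoo.cfg, so the
// defaults of 2 x tickTime and 20 x tickTime apply. tickTime=2000 below is hypothetical.
public class ZkSessionTimeout {

    /** Returns the session timeout (ms) the server would grant for a requested value. */
    static int negotiate(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;   // default minSessionTimeout
        int maxMs = 20 * tickTimeMs;  // default maxSessionTimeout
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000; // hypothetical: check tickTime in the real zoo.cfg
        for (int requestedMs : new int[] {30000, 90000}) {
            System.out.println("requested " + requestedMs + " ms -> negotiated "
                    + negotiate(requestedMs, tickTimeMs) + " ms");
        }
    }
}
```

If that assumption holds, getting a session timeout above 40 seconds would also require raising maxSessionTimeout in zoo.cfg on every ZooKeeper node, not only zkClientTimeout on the Solr side.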





*Salih Şen*
M: +90 533 131 17 07
E: salih@dilisim.com
W: www.dilisim.com
Skype: slhsen

On 4 April 2017 at 10:36:14, Salih Sen (salih@dilisim.com) wrote:

Hi,

Sorry for the hurried initial mail; here are some corrections and further
explanation:

The problem I described previously was happening before we set the
zkClientTimeout value, so it was 30000 at the time.

The autoCommit maxTime value is 15000 and the autoSoftCommit maxTime is 60000.

We recently removed the maxDocs values from the autoCommit settings, and so
far the cluster seems more stable and has better response times.

I can’t seem to find these values in the Solr logs, probably because the
logging level is currently WARN, but we left them at their defaults, so I
think they’re set to the values in solr.xml:
<int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
<int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
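
For reference, the ${name:default} syntax in solr.xml takes the value of the Java system property called name if one was set at startup (for example -DdistribUpdateSoTimeout=900000) and the value after the colon otherwise, so with nothing overridden these two resolve to 600000 ms and 60000 ms. A simplified Java sketch of that substitution rule follows; it is illustrative only, not Solr's implementation, and the 900000 override is a hypothetical value.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of ${name:default} substitution in solr.xml: use the property if it
// is set, otherwise the text after the colon. Illustrative only, not Solr's own code.
// Pass System.getProperties() to resolve against real -D flags.
public class SolrXmlPlaceholders {

    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    static String resolve(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String name = m.group(1);
            String fallback = m.group(2); // null if the placeholder has no ":default"
            String value = props.getProperty(name, fallback);
            if (value == null) {
                throw new IllegalArgumentException("No value or default for " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String soTimeout =
                "<int name=\"distribUpdateSoTimeout\">${distribUpdateSoTimeout:600000}</int>";

        // Nothing passed on the command line: the default after the colon is used.
        System.out.println(resolve(soTimeout, new Properties()));

        // Hypothetical -DdistribUpdateSoTimeout=900000 at startup.
        Properties sysProps = new Properties();
        sysProps.setProperty("distribUpdateSoTimeout", "900000");
        System.out.println(resolve(soTimeout, sysProps));
    }
}
```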


We have 12 replicas using default routing. All commits and queries go to a
single node because of the dummy client we use. Documents are sent in JSON
format. I don’t know the exact document size; they are mostly news-article
sized, though with lots of dynamic fields.

Sematext SPM currently shows “Added Docs Rate” as ~1.70k/sec for the server
that is receiving updates.

Once the problem starts, multiple replicas go down (not necessarily the one
receiving the update requests from the client) and the cluster starts
returning errors for update requests.


We saw entries like the following in the ZooKeeper logs, which is why we
thought it might be related to the zkClientTimeout value.

2017-04-03 09:13:03,040 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.32:36420 which had sessionid 0x25ad61c4507008c
2017-04-03 09:27:02,078 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15b14ba8a8e0026, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2017-04-03 09:27:02,079 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.32:35636 which had sessionid 0x15b14ba8a8e0026
2017-04-03 09:35:19,362 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.30.32:37970




On 3 April 2017 at 18:01:15, Erick Erickson (erickerickson@gmail.com) wrote:

bq: We set Auto hardcommit time to 15sec and 10000 docs, and soft
commit to 60000 sec and 5000 seconds

Just a sanity check: the commit intervals are in milliseconds, and your
units look mixed up above, though I'm guessing that's just a typo. I
usually don't use maxDocs because it's unpredictable. Say you're
indexing at a furious rate: if you're indexing at 5,000 docs a second
(and assuming the above was supposed to mean soft committing every 60
seconds or every 5,000 docs), you'll still be autocommitting every second.
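
The arithmetic above can be written down directly: when both maxTime and maxDocs are set, a commit fires on whichever trigger is reached first, so the interval is roughly the smaller of maxTime and maxDocs divided by the indexing rate. A minimal Java sketch, assuming a steady indexing rate (real traffic is burstier, which is exactly the unpredictability described above); the class and method names are made up for illustration.

```java
// Approximates the time between automatic commits when maxTime and/or maxDocs are set.
// ASSUMPTION: a constant indexing rate into the core; real traffic is bursty.
public class CommitInterval {

    /** Approximate seconds between commits; maxDocs <= 0 means "not configured". */
    static double secondsBetweenCommits(double maxTimeMs, long maxDocs, double docsPerSecond) {
        if (docsPerSecond <= 0) {
            throw new IllegalArgumentException("indexing rate must be positive");
        }
        double byTime = maxTimeMs / 1000.0;
        if (maxDocs <= 0) {
            return byTime; // maxDocs not configured: only maxTime applies
        }
        double byDocs = maxDocs / docsPerSecond;
        return Math.min(byTime, byDocs); // whichever trigger is reached first wins
    }

    public static void main(String[] args) {
        // The example above: soft commit every 60 s or 5,000 docs, at 5,000 docs/s.
        System.out.println(secondsBetweenCommits(60_000, 5_000, 5_000)); // 1.0
        // Same rate with maxDocs removed: back to the 60 s maxTime.
        System.out.println(secondsBetweenCommits(60_000, 0, 5_000));     // 60.0
    }
}
```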

While that could be related, it's not particularly germane to your
timeout. My guess is that you're getting these errors on the leader?
What do you have in solr.xml for:

distribUpdateConnTimeout and distribUpdateSoTimeout

Those are likely the timeouts that matter. And how big are your
documents? The scenario I'm thinking of is that the leader sends the
update to the replica and the replica takes longer to respond than
the timeouts above allow.
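
The "java.net.SocketTimeoutException: Read timed out" in the logs is what a socket read timeout produces: the connection itself was established (establishing it is what distribUpdateConnTimeout bounds for update distribution), but no response arrived within the read timeout (distribUpdateSoTimeout for leader-to-replica updates). The Java sketch below reproduces the same exception with plain sockets: a local server accepts the connection but never answers. Calling the two ends "leader" and "replica" is only an analogy, Solr uses an HTTP client rather than raw sockets, and the 200 ms timeout is an arbitrary value chosen so the demo finishes quickly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Reproduces "SocketTimeoutException: Read timed out": the peer accepts the TCP
// connection (the OS completes the handshake via the listen backlog) but never writes.
// ASSUMPTION: plain sockets stand in for Solr's HTTP client; timeouts are illustrative.
public class ReadTimeoutDemo {

    /** Returns true if a read on a silent connection times out after soTimeoutMs. */
    static boolean readTimesOut(int soTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket slowReplica = new ServerSocket(0, 1, loopback);
             Socket leader = new Socket(loopback, slowReplica.getLocalPort())) {
            leader.setSoTimeout(soTimeoutMs); // read timeout, analogous to an "SoTimeout"
            try {
                leader.getInputStream().read(); // blocks: the "replica" never responds
                return false;
            } catch (SocketTimeoutException e) {
                System.out.println("Caught: " + e);
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```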

BTW, it can be useful on startup to look at your solr.log. The
_actual_ values for all the timeouts are printed out, including any
sysvars you've used.

And how are you indexing? Mostly I'm wondering how fast you're sending
docs to each leader and how.

Best,
Erick

On Mon, Apr 3, 2017 at 6:52 AM, Salih Sen <sa...@dilisim.com> wrote:
> Hi,
>
> We have a three-server setup, each server having 756G RAM, 48 cores,
> 4 SSDs (each hosting three Solr instances) and a dedicated mechanical
> disk for ZooKeeper (3 ZK instances total). Each Solr instance has 31G of
> heap space allocated to it. In total we have 36 Solr instances and 3
> ZooKeeper instances (with 1G heap space each). The servers are connected
> by a 10Gig network.
>
> We set Auto hardcommit time to 15sec and 10000 docs, and soft commit to
> 60000 sec and 5000 seconds in order to avoid soft committing too much and
> to avoid indexing bottlenecks. We also set -DzkClientTimeout=90000.
>
> But it seems replicas still randomly go down while indexing. Do you have
> any suggestions to prevent this situation?
>
> ERROR - 2017-04-03 12:24:02.503; [ ] org.apache.solr.cloud.OverseerCollectionMessageHandler; Error from shard: http://192.168.30.33:9132/solr
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://192.168.30.33:9132/solr
> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:621)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
> at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:163)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
> at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
> at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
> at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
> at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
> at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
> at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
> at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
> at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
> at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
> at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
> at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:515)
> ... 12 more
> ERROR - 2017-04-03 12:27:11.631; [c:doc s:shard3 r:core_node22 x:doc_shard3_replica3] org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient; error
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
> at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
> at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
> at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
> at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
> at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
> at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
> at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
> at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
> at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
> at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
> at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
> at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> ERROR - 2017-04-03 12:27:11.633; [c:doc s:shard3 r:core_node22 x:doc_shard3_replica3] org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient; error
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
> at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
> at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
> at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
> at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
> at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
> at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
> at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
> at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
> at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
> at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
> at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
> at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> ERROR - 2017-04-03 12:27:11.645; [c:doc s:shard3 r:core_node22 x:doc_shard3_replica3] org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient; error
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
> at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
> at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
> at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
> at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
> at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
> at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
> at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
> at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
> at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
> at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
> at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
> at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)

Re: Solr Cloud 6.5.0 Replicas go down while indexing

Posted by Salih Sen <sa...@dilisim.com>.
Hi,

Sorry for the hurried initial mail; here are some corrections and further
explanation:

The problem I described previously was happening before we set the
zkClientTimeout value, so it was 30000 at the time.

The autoCommit maxTime value is 15000 and the autoSoftCommit maxTime is 60000.

We recently removed the maxDocs values from the autoCommit settings, and so
far the cluster seems more stable and has better response times.

I can’t seem to find these values in the Solr logs, probably because the
logging level is currently WARN, but we left them at their defaults, so I
think they’re set to the values in solr.xml:
<int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
<int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>


We have 12 replicas using default routing. All commits and queries go to a
single node because of the dummy client we use. Documents are sent in JSON
format. I don’t know the exact document size; they are mostly news-article
sized, though with lots of dynamic fields.

Sematext SPM currently shows “Added Docs Rate” as ~1.70k/sec for the server
that is receiving updates.

Once the problem starts, multiple replicas go down (not necessarily the one
receiving the update requests from the client) and the cluster starts
returning errors for update requests.


We saw entries like the following in the ZooKeeper logs, which is why we
thought it might be related to the zkClientTimeout value.

2017-04-03 09:13:03,040 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.32:36420 which had sessionid 0x25ad61c4507008c
2017-04-03 09:27:02,078 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15b14ba8a8e0026, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2017-04-03 09:27:02,079 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.32:35636 which had sessionid 0x15b14ba8a8e0026
2017-04-03 09:35:19,362 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.30.32:37970




On 3 April 2017 at 18:01:15, Erick Erickson (erickerickson@gmail.com) wrote:

bq: We set Auto hardcommit time to 15sec and 10000 docs, and soft
commit to 60000 sec and 5000 seconds

Just a sanity check: the commit intervals are in milliseconds, and your
units look mixed up above, though I'm guessing that's just a typo. I
usually don't use maxDocs because it's unpredictable. Say you're
indexing at a furious rate: if you're indexing at 5,000 docs a second
(and assuming the above was supposed to mean soft committing every 60
seconds or every 5,000 docs), you'll still be autocommitting every second.

While that could be related, it's not particularly germane to your
timeout. My guess is that you're getting these errors on the leader?
What do you have in solr.xml for:

distribUpdateConnTimeout and distribUpdateSoTimeout

Those are likely the timeouts that matter. And how big are your
documents? The scenario I'm thinking of is that the leader sends the
update to the replica and the replica takes longer to respond than
the timeouts above allow.

BTW, it can be useful on startup to look at your solr.log. The
_actual_ values for all the timeouts are printed out, including any
sysvars you've used.

And how are you indexing? Mostly I'm wondering how fast you're sending
docs to each leader and how.

Best,
Erick

On Mon, Apr 3, 2017 at 6:52 AM, Salih Sen <sa...@dilisim.com> wrote:
> Hi,
>
> We have a three-server setup, each server having 756G RAM, 48 cores,
> 4 SSDs (each hosting three Solr instances) and a dedicated mechanical
> disk for ZooKeeper (3 ZK instances total). Each Solr instance has 31G of
> heap space allocated to it. In total we have 36 Solr instances and 3
> ZooKeeper instances (with 1G heap space each). The servers are connected
> by a 10Gig network.
>
> We set Auto hardcommit time to 15sec and 10000 docs, and soft commit to
> 60000 sec and 5000 seconds in order to avoid soft committing too much and
> to avoid indexing bottlenecks. We also set -DzkClientTimeout=90000.
>
> But it seems replicas still randomly go down while indexing. Do you have
> any suggestions to prevent this situation?
>
> ERROR - 2017-04-03 12:24:02.503; [ ] org.apache.solr.cloud.OverseerCollectionMessageHandler; Error from shard: http://192.168.30.33:9132/solr
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://192.168.30.33:9132/solr
> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:621)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
> at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:163)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
> at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
> at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
> at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
> at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
> at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
> at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
> at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
> at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
> at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
> at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
> at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:515)
> ... 12 more
> ERROR - 2017-04-03 12:27:11.631; [c:doc s:shard3 r:core_node22 x:doc_shard3_replica3] org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient; error
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
> at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
> at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
> at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
> at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
> at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
> at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
> at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
> at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
> at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
> at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
> at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
> at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> ERROR - 2017-04-03 12:27:11.633; [c:doc s:shard3 r:core_node22
> x:doc_shard3_replica3]
>
org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient;

> error
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at
>
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)

> at
>
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)

> at
>
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)

> at
>
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)

> at
>
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)

> at
>
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)

> at
>
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)

> at
>
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)

> at
>
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)

> at
>
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)

> at
>
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)

> at
>
org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)

> at
>
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)

> at
>
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)

> at
>
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)

> at
>
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)

> at
>
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)

> at
>
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)

> at
>
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)

> at
>
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)

> at
>
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)

> at
>
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

> at
>
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

> at java.lang.Thread.run(Thread.java:745)
> ERROR - 2017-04-03 12:27:11.645; [c:doc s:shard3 r:core_node22
> x:doc_shard3_replica3]
>
org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient;

> error
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at
>
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)

> at
>
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)

> at
>
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)

> at
>
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)

> at
>
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)

> at
>
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)

> at
>
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)

> at
>
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)

> at
>
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)

> at
>
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)

> at
>
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)

> at
>
org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)

> at
>
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)

> at
>
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)

> at
>
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)

> at
>
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)

> at
>
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)

> at
>
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)

> at
>
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)

> at
>
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)

> at
>
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)

> at
>
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

> at
>
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

> at java.lang.Thread.run(Thread.java:745)
> W
>
>
>
>
> Salih Şen
> M: +90 533 131 17 07
> E: salih@dilisim.com
> W: www.dilisim.com
> Skype: slhsen

Re: Solr Cloud 6.5.0 Replicas go down while indexing

Posted by Erick Erickson <er...@gmail.com>.
bq: We set Auto hardcommit time to 15sec and 10000 docs, and soft
commit to 60000 sec and 5000 seconds

Just a sanity check: the commit intervals are in milliseconds. Your units
look mixed up above, though I'm guessing that's just a typo. I usually
don't use maxDocs because it's unpredictable. Say you're indexing at a
furious rate: if you are indexing at 5,000 docs a second (and assuming
the above was supposed to be soft committing every 60 seconds or 5,000
docs), you'll still be autocommitting every second.

While that could be related, it's not particularly germane to your
timeout. My guess is that you're getting these errors on the leader; is
that right? What do you have in solr.xml for:

distribUpdateConnTimeout and distribUpdateSoTimeout

Those are likely the timeouts that matter. And how big are your
documents? The scenario I'm thinking of is that the leader sends the
update to the replica and the replica takes longer to respond than the
timeouts above allow.

BTW, it can be useful to look at your solr.log after startup. The
_actual_ values for all the timeouts are printed there, including any
you've set with sysvars.
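As a rough illustration of how those placeholders turn into actual values, the Java sketch below reads the `<solrcloud>` section of a solr.xml fragment and resolves `${name:default}` entries against a map of system properties. The fragment is modeled on the stock Solr 6.x solr.xml, but its defaults (30000, 600000 and 60000 ms) are assumptions to check against your own file, and the class is hypothetical. It only handles placeholders that make up a whole value; the startup log remains the authoritative source.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class SolrXmlTimeouts {

    // Assumed fragment modeled on a stock Solr 6.x solr.xml; check your own file.
    static final String SAMPLE =
          "<solr><solrcloud>"
        + "<int name=\"zkClientTimeout\">${zkClientTimeout:30000}</int>"
        + "<int name=\"distribUpdateSoTimeout\">${distribUpdateSoTimeout:600000}</int>"
        + "<int name=\"distribUpdateConnTimeout\">${distribUpdateConnTimeout:60000}</int>"
        + "</solrcloud></solr>";

    // Matches a whole value of the form ${name} or ${name:default}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    /** Resolves one value: the system property if set, else the inline default, else the literal text. */
    static String resolve(String raw, Map<String, String> sysProps) {
        String text = raw.trim();
        Matcher m = PLACEHOLDER.matcher(text);
        if (!m.matches()) {
            return text;
        }
        String fromProp = sysProps.get(m.group(1));
        if (fromProp != null) {
            return fromProp;
        }
        return m.group(2) != null ? m.group(2) : "";
    }

    /** Every named child element of <solrcloud>, with placeholders resolved. */
    static Map<String, String> solrcloudSettings(String xml, Map<String, String> sysProps) {
        Map<String, String> out = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList clouds = doc.getElementsByTagName("solrcloud");
            if (clouds.getLength() == 0) {
                return out;
            }
            NodeList children = clouds.item(0).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    Element e = (Element) n;
                    out.put(e.getAttribute("name"), resolve(e.getTextContent(), sysProps));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse solr.xml", e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulates a zkClientTimeout=90000 system property, as the original post seems to describe.
        Map<String, String> sysProps = Collections.singletonMap("zkClientTimeout", "90000");
        solrcloudSettings(SAMPLE, sysProps).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```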

And how are you indexing? Mostly I'm wondering how fast you're sending
docs to each leader, and by what method.

Best,
Erick

On Mon, Apr 3, 2017 at 6:52 AM, Salih Sen <sa...@dilisim.com> wrote:
> Hi,
>
> We have a three-server setup, with each server having 756G RAM, 48 cores,
> 4 SSDs (each with three Solr instances on it) and a dedicated mechanical
> disk for ZooKeeper (3 ZK instances total). Each Solr instance has 31G of
> heap space allocated to it. In total we have 36 Solr instances and 3
> ZooKeeper instances (with 1G heap space). The servers also have a 10Gig
> network between them.
>
> We set Auto hardcommit time to 15sec and 10000 docs, and soft commit to
> 60000 sec and 5000 seconds in order to avoid soft committing too much and
> to avoid indexing bottlenecks. We also set DzkClientTimeout=90000.
>
> But it seems replicas still randomly go down while indexing. Do you have any
> suggestions to prevent this situation?
<snip>

Re: Solr Cloud 6.5.0 Replicas go down while indexing

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/3/2017 7:52 AM, Salih Sen wrote:
> We have a three-server setup, with each server having 756G RAM, 48
> cores, 4 SSDs (each with three Solr instances on it) and a dedicated
> mechanical disk for ZooKeeper (3 ZK instances total). Each Solr
> instance has 31G of heap space allocated to it. In total we have
> 36 Solr instances and 3 ZooKeeper instances (with 1G heap space). The
> servers also have a 10Gig network between them.

You haven't described your index(es).  How many collections in the
cloud?  How many shards for each?  How many replicas for each shard? 
How many docs in each collection?  How much *total* index data is on
each of those systems?  To determine this, add up the size of the solr
home in all of the Solr instances that exist on that server.  With this
information, we can make an educated guess about whether the setup you
have engineered is reasonably correct for the scale of your data.
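A minimal Java sketch of that measurement, roughly what `du -s` over each Solr home would report: it walks each directory tree and sums regular-file sizes. The paths are hypothetical command-line arguments. Because Lucene can delete segment files during merges, files that vanish mid-walk are counted as zero; for a stable number, measure while indexing is idle.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SolrHomeSize {

    /** Size of one file, treating a file deleted mid-walk (e.g. by a merge) as 0 bytes. */
    private static long sizeOrZero(Path file) {
        try {
            return Files.size(file);
        } catch (NoSuchFileException e) {
            return 0L;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Total bytes of all regular files under one directory tree. */
    static long treeSize(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(SolrHomeSize::sizeOrZero)
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Total bytes across every Solr home on one server. */
    static long totalSize(List<Path> solrHomes) {
        return solrHomes.stream().mapToLong(SolrHomeSize::treeSize).sum();
    }

    public static void main(String[] args) {
        // Hypothetical usage: java SolrHomeSize.java /var/solr/node1 /var/solr/node2 ...
        List<Path> homes = Stream.of(args).map(Paths::get).collect(Collectors.toList());
        long bytes = totalSize(homes);
        System.out.printf("%d bytes (%.1f GiB)%n", bytes, bytes / (1024.0 * 1024 * 1024));
    }
}
```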

It sounds like you have twelve Solr instances per server, with each one
using a 31GB heap.  That's 372GB of memory JUST for Solr heaps.  Unless
you're dealing with terabytes of index data and hundreds of millions (or
billions) of documents, I cannot imagine needing that many Solr
instances per server or that much heap memory.
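The heap arithmetic, as a small Java sketch using the figures from the original post (36 instances, 3 servers, 31GB heap each, 756GB RAM per server); the class name is hypothetical. Whatever RAM remains after the heaps is all that is left for the operating system and everything else on the machine.

```java
final class HeapBudget {

    /** GB of RAM reserved for Solr heaps on one server, assuming instances are spread evenly. */
    static int heapPerServerGb(int solrInstancesTotal, int servers, int heapGbPerInstance) {
        if (solrInstancesTotal % servers != 0) {
            throw new IllegalArgumentException("instances are not spread evenly across servers");
        }
        return (solrInstancesTotal / servers) * heapGbPerInstance;
    }

    public static void main(String[] args) {
        int heap = heapPerServerGb(36, 3, 31); // 12 instances x 31 GB
        int ramPerServer = 756;
        System.out.println("Solr heaps per server: " + heap + " GB");                     // 372 GB
        System.out.println("Left for everything else: " + (ramPerServer - heap) + " GB"); // 384 GB
    }
}
```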

Have you increased the maximum number of processes that the user
running Solr can have?  12 instances of Solr will mean a LOT of
threads, and on most operating systems, each thread counts against the
user process limit.  Some operating systems might have a separate
configuration for thread limits, but I do know that Linux does not; it
counts threads as processes.
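A minimal, Linux-only Java sketch of that check: it reads the soft "Max processes" limit from `/proc/self/limits` (run it as the user that runs Solr) and compares it with an estimated thread total for twelve instances. The per-instance thread count is a hypothetical input, so count the threads in your own Solr JVMs instead, and the estimate ignores any other processes the same user runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.OptionalLong;

final class ThreadHeadroom {

    /**
     * Soft "Max processes" limit from Linux /proc/<pid>/limits text.
     * Empty means "unlimited" or "line not found".
     */
    static OptionalLong maxProcessesSoftLimit(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith("Max processes")) {
                // Columns: "Max" "processes" <soft> <hard> <units>
                String soft = line.trim().split("\\s+")[2];
                return "unlimited".equals(soft) ? OptionalLong.empty() : OptionalLong.of(Long.parseLong(soft));
            }
        }
        return OptionalLong.empty();
    }

    /** True if the estimated thread total fits under the per-user limit (Linux counts threads as processes). */
    static boolean fitsUnderLimit(long threadsPerInstance, int instances, OptionalLong limit) {
        return !limit.isPresent() || threadsPerInstance * instances <= limit.getAsLong();
    }

    public static void main(String[] args) throws IOException {
        Path limitsFile = Paths.get("/proc/self/limits"); // Linux only
        List<String> lines = Files.exists(limitsFile)
            ? Files.readAllLines(limitsFile)
            : Collections.<String>emptyList();
        OptionalLong limit = maxProcessesSoftLimit(lines);
        // Hypothetical per-instance thread count; pass your measured value as the first argument.
        long threadsPerInstance = args.length > 0 ? Long.parseLong(args[0]) : 1_000;
        System.out.println("limit=" + (limit.isPresent() ? String.valueOf(limit.getAsLong()) : "unlimited")
            + ", fits=" + fitsUnderLimit(threadsPerInstance, 12, limit));
    }
}
```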

> We set Auto hardcommit time to 15sec and 10000 docs, and soft commit
> to 60000 sec and 5000 seconds in order to avoid soft committing too
> much and to avoid indexing bottlenecks. We also
> set DzkClientTimeout=90000.

Side issue: It's generally preferable to use only one of maxDocs or
maxTime, and maxTime will usually result in more predictable behavior,
so I recommend removing the maxDocs settings on autoCommit and
autoSoftCommit.  I doubt this will have any effect on the problem you're
experiencing; it's just something I noticed.  I recommend a maxTime of
60000 (one minute) for autoCommit, with openSearcher set to false, and a
maxTime of at least 120000 (two minutes) for autoSoftCommit.  If these
seem excessively high to you, go with 30000 and 60000.
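For reference, a small Java sketch that renders those recommendations as the `<autoCommit>` and `<autoSoftCommit>` elements that go inside `<updateHandler>` in solrconfig.xml: maxTime only, with openSearcher=false on the hard commit. The helper is hypothetical; the values are the one- and two-minute suggestions above, in milliseconds.

```java
final class AutoCommitConfig {

    /** Renders commit settings that use maxTime only (no maxDocs). */
    static String render(long hardCommitMs, long softCommitMs) {
        if (hardCommitMs <= 0 || softCommitMs <= 0) {
            throw new IllegalArgumentException("maxTime values must be positive milliseconds");
        }
        return String.join("\n",
            "<autoCommit>",
            "  <maxTime>" + hardCommitMs + "</maxTime>",
            "  <openSearcher>false</openSearcher>", // hard commit flushes to disk without opening a searcher
            "</autoCommit>",
            "<autoSoftCommit>",
            "  <maxTime>" + softCommitMs + "</maxTime>", // soft commit controls document visibility
            "</autoSoftCommit>");
    }

    public static void main(String[] args) {
        System.out.println(render(60_000, 120_000)); // one-minute hard, two-minute soft
        System.out.println(render(30_000, 60_000));  // the shorter alternative
    }
}
```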

On zkClientTimeout, unless you have increased the ZK server tickTime,
you'll find that you can't actually define a zkClientTimeout that high.
By default, the maximum is 20*tickTime.  A typical tickTime value is
2000, which means that the usual maximum value for zkClientTimeout is 40 seconds.
The error you've reported doesn't look related to zkClientTimeout, so
increasing that beyond 30 seconds is probably unnecessary.  The default
values for Zookeeper server tuning have been worked on by the ZK
developers for years.  I wouldn't mess with tickTime without a REALLY
good reason.
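The session-timeout negotiation can be sketched in Java as a clamp. This assumes ZooKeeper's default bounds of 2*tickTime (minSessionTimeout) and 20*tickTime (maxSessionTimeout); a server configured with explicit bounds would clamp differently, and the class name is hypothetical.

```java
final class ZkSessionTimeout {

    /** Session timeout a ZooKeeper server grants, assuming the default min/max session bounds. */
    static int negotiated(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;  // default minSessionTimeout
        int max = 20 * tickTimeMs; // default maxSessionTimeout
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        System.out.println(negotiated(90_000, 2_000)); // 40000: the 90 s request is capped
        System.out.println(negotiated(30_000, 2_000)); // 30000: within bounds
    }
}
```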

Another side issue: Putting Zookeeper data on a mechanical disk when
there are SSDs available seems like a mistake to me.  Zookeeper is even
more sensitive to disk performance than Solr is.

> But it seems replicas still randomly go down while indexing. Do you
> have any suggestions to prevent this situation?
<snip>
> Caused by: java.net.SocketTimeoutException: Read timed out

This error says that a TCP connection (HTTP on port 9132) from one Solr
server to another hit the socket timeout -- there was no activity on the
connection for whatever the timeout is set to.  Usually a problem like
this has one of two causes:

1) A *serious* performance issue with Solr resulting in an incredibly
long processing time.  Most performance issues are memory-related.
2) The socket timeout has been set to a very low value.
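The mechanism behind "Read timed out" can be reproduced locally in a few lines of Java: a listening socket that never reads or writes (the kernel completes the TCP handshake from the backlog) and a client with a short SO_TIMEOUT. This is a minimal sketch, not Solr's code, and the 200 ms value is arbitrary. From the sender's side, a Solr node stalled by cause 1 looks just like this silent listener.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutDemo {

    /** Returns true if a read on an idle connection fails with SocketTimeoutException. */
    static boolean readTimesOut(int soTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Stand-in for a stalled node: accepts connections at the TCP level but never answers.
        try (ServerSocket silent = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silent.getLocalPort())) {
            client.setSoTimeout(soTimeoutMs); // max time a single read() may block with no data
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // got data or EOF: no timeout
            } catch (SocketTimeoutException e) {
                return true; // the same exception type reported in the Solr log
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTimesOut(200));
    }
}
```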

In a later message on the thread, you indicated that the configured
socket timeout is ten minutes.  This should be plenty, which makes me
think option number one above is what we are dealing with.  The
information I asked for in the first paragraph of this reply is required
for any deeper insight.

Are there other errors in the Solr logfile that you haven't included? 
It seems likely that this is not the only problem Solr has encountered.

Thanks,
Shawn