You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2018/04/27 18:01:00 UTC

[jira] [Comment Edited] (SOLR-11881) Connection Reset Causing LIR

    [ https://issues.apache.org/jira/browse/SOLR-11881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456842#comment-16456842 ] 

Mark Miller edited comment on SOLR-11881 at 4/27/18 6:00 PM:
-------------------------------------------------------------

It's been a while since I've thought about this, mind is begining to churn again.

Does this even help? We stream updates from leader to the replica, and streaming cannot be retried, you'd have to buffer the stream or something. It gets a can't retry exception.


was (Author: markrmiller@gmail.com):
It's been a while since I've thought about this, mind is begin to churn again.

Does this even help? We stream updates from leader to the client, and streaming cannot be retried, you'd have to buffer the stream or something. It gets a can't retry exception.

> Connection Reset Causing LIR
> ----------------------------
>
>                 Key: SOLR-11881
>                 URL: https://issues.apache.org/jira/browse/SOLR-11881
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>            Priority: Major
>         Attachments: SOLR-11881.patch, SOLR-11881.patch
>
>
> We can see that a connection reset is causing LIR.
> If a leader -> replica update get's a connection like this the leader will initiate LIR
> {code}
> 2018-01-08 17:39:16.980 ERROR (qtp1010030701-468988) [c:person s:shard7_1 r:core_node56 x:person_shard7_1_replica1] o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on replica https://host08.domain:8985/solr/person_shard7_1_replica2/
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:210)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
>         at sun.security.ssl.InputRecord.read(InputRecord.java:503)
>         at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
>         at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
>         at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
>         at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
>         at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543)
>         at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409)
>         at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
>         at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
>         at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
>         at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
>         at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:312)
>         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> From https://issues.apache.org/jira/browse/SOLR-6931 Mark says "On a heavy working SolrCloud cluster, even a rare response like this from a replica can cause a recovery and heavy cluster disruption" . 
> Looking at SOLR-6931 we added a http retry handler but we only retry on GET requests. Updates are POST requests {{ConcurrentUpdateSolrClient#sendUpdateStream}}
> Update requests between the leader and replica should be retry-able since they have been versioned. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org