Posted to solr-user@lucene.apache.org by neoman <ha...@gmail.com> on 2013/09/12 16:18:42 UTC

Solr cloud shard goes down after SocketException in another shard

Exception in shard1 (solr01-prod) primary:
<09/12/13 13:56:46:635|http-bio-8080-exec-66|ERROR|apache.solr.servlet.SolrDispatchFilter|null:ClientAbortException: java.net.SocketException: Broken pipe
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:406)
        at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:342)
        at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:431)
        at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:419)
        at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:91)
        at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
        at org.apache.solr.common.util.FastOutputStream.write(FastOutputStream.java:95)
        at org.apache.solr.common.util.JavaBinCodec.writeStr(JavaBinCodec.java:470)
        at org.apache.solr.common.util.JavaBinCodec.writePrimitive(JavaBinCodec.java:545)
        at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:232)
        at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:149)
        at org.apache.solr.common.util.JavaBinCodec.writeSolrDocument(JavaBinCodec.java:320)
        at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:257)
        at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:149)
        at org.apache.solr.common.util.JavaBinCodec.writeArray(JavaBinCodec.java:427)
        at org.apache.solr.common.util.JavaBinCodec.writeSolrDocumentList(JavaBinCodec.java:356)


Exception in shard1 (solr08-prod) secondary:

<09/12/13 13:56:46:729|http-bio-8080-exec-50|ERROR|apache.solr.core.SolrCore|org.apache.solr.common.SolrException: ClusterState says we are the leader (http://solr08-prod:8080/solr/aq-core), but locally we don't think so. Request came from http://solr03-prod.phneaz:8080/solr/aq-core/
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:381)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:243)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)

Our configuration:
Solr 4.4, Tomcat 7, 3 shards
Thanks for your help.



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-cloud-shard-goes-down-after-SocketException-in-another-shard-tp4089576.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Solr cloud shard goes down after SocketException in another shard

Posted by Greg Walters <gw...@sherpaanalytics.com>.
Neoman,

Make sure that solr08-prod (or whichever node is the elected leader at the time) isn't doing a stop-the-world garbage collection that lasts long enough for the ZooKeeper connection to time out. I've seen that in my cluster when I didn't have parallel GC enabled and my "zkClientTimeout" in solr.xml was too low.
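
For reference, a rough sketch of where those two knobs live (values and paths are illustrative, not recommendations). With the newer-style solr.xml the timeout sits in the <solrcloud> section:

    <solr>
      <solrcloud>
        <!-- how long a node can be unresponsive before its ZooKeeper session expires -->
        <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
      </solrcloud>
    </solr>

and long collections are easiest to confirm by turning on GC logging in Tomcat's bin/setenv.sh (standard HotSpot flags for the Java 6/7 JVMs of that era; the log path is made up):

    # log every collection with timestamps so stop-the-world pauses show up clearly
    CATALINA_OPTS="$CATALINA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/tomcat7/gc.log"

If a single pause in gc.log runs longer than zkClientTimeout, ZooKeeper will expire that node's session and you can see exactly this kind of leader-mismatch error.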

Thanks,
Greg


RE: Solr cloud shard goes down after SocketException in another shard

Posted by Greg Walters <gw...@sherpaanalytics.com>.
Neoman,

I've got ours set at 45 seconds:

<int name="zkClientTimeout">${zkClientTimeout:45000}</int>
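
That ${zkClientTimeout:45000} form is Solr's property substitution: it uses the zkClientTimeout system property when one is set and falls back to 45000 ms otherwise, so the value can also be changed at startup without editing solr.xml. A sketch, with an illustrative value:

    # passed to the JVM running Tomcat, e.g. in bin/setenv.sh
    CATALINA_OPTS="$CATALINA_OPTS -DzkClientTimeout=60000"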



Re: Solr cloud shard goes down after SocketException in another shard

Posted by neoman <ha...@gmail.com>.
Thanks, Greg. Currently we have 60 seconds (we reduced it recently). I may
have to reduce it again. Can you please share your timeout value?



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-cloud-shard-goes-down-after-SocketException-in-another-shard-tp4089576p4089582.html
Sent from the Solr - User mailing list archive at Nabble.com.