Posted to solr-user@lucene.apache.org by Zap Org <za...@gmail.com> on 2016/04/21 06:06:09 UTC

complete cluster shutdown

I have 5 ZooKeeper and 2 Solr machines, and after a month or two the whole
cluster shut down. I don't know why. The logs I get in ZooKeeper are attached
below; otherwise I don't get any errors. All of this runs on Linux VMs.

2016-03-11 16:50:18,159 [myid:5] - WARN  [SyncThread:5:FileTxnLog@334] -
fsync-ing the write ahead log in SyncThread:5 took 7268ms which will
adversely effect operation latency. See the ZooKeeper troubleshooting guide
2016-03-11 16:50:18,161 [myid:5] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2185:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x4535f00ee370001, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2016-03-11 16:50:18,163 [myid:5] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2185:NIOServerCnxn@1007] - Closed socket connection for
client /localhost which had sessionid 0x4535f00ee370001
2016-03-11 16:50:18,166 [myid:5] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2185:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x2535ef744dd0005, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)

Re: complete cluster shutdown

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/20/2016 10:06 PM, Zap Org wrote:
> I have 5 ZooKeeper and 2 Solr machines, and after a month or two the whole
> cluster shut down. I don't know why. The logs I get in ZooKeeper are attached
> below; otherwise I don't get any errors. All of this runs on Linux VMs.
>
> 2016-03-11 16:50:18,159 [myid:5] - WARN  [SyncThread:5:FileTxnLog@334] -
> fsync-ing the write ahead log in SyncThread:5 took 7268ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2016-03-11 16:50:18,161 [myid:5] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2185:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x4535f00ee370001, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)

You'll need to further describe exactly what "whole cluster shutdown"
means.  I cannot tell from the logs, and there are very few situations I
can imagine where Solr would just die.  I will need to know which
version of Solr you are using.  If zookeeper is separate from Solr, that
version will also be needed.

The logs you have included are all WARN and INFO entries (no ERROR), and
they say that the zookeeper client disconnected.  Assuming that this
zookeeper is only used for this one SolrCloud, the zookeeper client
might be Solr, an instance of CloudSolrClient, or it might be the zkcli
script.
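
A quick way to confirm that a ZooKeeper log has no ERROR entries, and to see
how slow the fsync calls were, is to tally the log levels and pull out the
fsync durations. The following is only a sketch: it assumes the entry format
seen in the excerpt quoted above, and the function name summarize_zk_log is
made up for illustration.

```python
import re
from collections import Counter

# Start of an entry, e.g.
# "2016-03-11 16:50:18,159 [myid:5] - WARN  [SyncThread:5:FileTxnLog@334] - ..."
ENTRY_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[myid:\d*\] - (?P<level>[A-Z]+)\s"
)
# Slow-fsync warning text; captures the duration in milliseconds.
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in \S+ took (?P<ms>\d+)ms")


def summarize_zk_log(lines):
    """Return (Counter of log levels, list of fsync durations in ms)."""
    levels = Counter()
    fsync_ms = []
    for line in lines:
        entry = ENTRY_RE.match(line)
        if entry:
            levels[entry.group("level")] += 1
        # Searched separately so it still matches if the message was wrapped
        # onto its own line, as in the mailed excerpt.
        fsync = FSYNC_RE.search(line)
        if fsync:
            fsync_ms.append(int(fsync.group("ms")))
    return levels, fsync_ms


if __name__ == "__main__":
    sample = [
        "2016-03-11 16:50:18,159 [myid:5] - WARN  [SyncThread:5:FileTxnLog@334] - "
        "fsync-ing the write ahead log in SyncThread:5 took 7268ms which will "
        "adversely effect operation latency. See the ZooKeeper troubleshooting guide",
        "2016-03-11 16:50:18,163 [myid:5] - INFO  [NIOServerCxn.Factory:"
        "0.0.0.0/0.0.0.0:2185:NIOServerCnxn@1007] - Closed socket connection for "
        "client /localhost which had sessionid 0x4535f00ee370001",
    ]
    levels, fsync_ms = summarize_zk_log(sample)
    print(dict(levels), fsync_ms)
```

Stack-trace lines and other continuation lines do not start with a timestamp,
so they are simply not counted as entries.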

One of the later log entries said "/localhost" which suggests that this
is not set up the way I would recommend setting up a production
SolrCloud deployment.  I recommend each Solr running on a separate
machine using the same port number, each Zookeeper running on a separate
machine using the same port number, and everything using an identical
zkHost string.  In that setup, Zookeeper and Solr might share machines,
but none of the machines will be running more than one of each kind of
process.  If you are running that kind of setup, you will never be using
"localhost" or "127.0.0.1" for connecting to zookeeper.
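
To illustrate the zkHost advice, the sketch below parses a zkHost string and
flags any loopback entries. The host names (zk1.example.com and so on) and the
helper names parse_zk_host and loopback_hosts are hypothetical; the only thing
assumed from ZooKeeper itself is the usual "host:port,host:port/chroot" shape
of the string and the default client port 2181.

```python
LOOPBACK_NAMES = {"localhost", "127.0.0.1"}


def parse_zk_host(zk_host):
    """Split a zkHost string into ([(host, port), ...], chroot)."""
    servers, slash, chroot = zk_host.partition("/")
    pairs = []
    for server in servers.split(","):
        server = server.strip()
        if ":" in server:
            host, _, port = server.rpartition(":")
        else:
            host, port = server, "2181"  # ZooKeeper's default client port
        pairs.append((host, int(port)))
    # slash + chroot is "" when the string has no chroot part.
    return pairs, slash + chroot


def loopback_hosts(zk_host):
    """Return the hosts in zk_host that point at the local machine."""
    pairs, _ = parse_zk_host(zk_host)
    return [host for host, _ in pairs if host.lower() in LOOPBACK_NAMES]


if __name__ == "__main__":
    # Hypothetical string that every Solr node and client would share.
    shared = "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr"
    print(parse_zk_host(shared))
    print(loopback_hosts("localhost:2185,zk2.example.com:2185"))
```

Running a check like this against the zkHost value on every node is one way to
spot a node that was configured differently from the rest.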

There are no Solr logs included here, so if something is happening with
Solr, I cannot tell what it is.

Thanks,
Shawn


Re: complete cluster shutdown

Posted by John Bickerstaff <jo...@johnbickerstaff.com>.
I guess errors like "fsync-ing the write ahead log in SyncThread:5 took
7268ms which will adversely effect operation latency."

and: "likely client has closed socket"

make me wonder if something went wrong in terms of running out of disk
space for logs (thus giving your OS no space for necessary functions)  or
if you ran into memory issues, or if something changed your network /
firewall settings to prevent communication on ports that used to work...?

I'm not an expert on the code, but those kinds of external problems are where
I'd start looking if I saw errors like this.
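
A rough sketch of such external checks: how much disk space is left on a
volume, and whether a ZooKeeper port still accepts connections and answers.
The path, host and timeout are placeholders, and 2185 is just the port that
appears in the logs. The second check sends ZooKeeper's "ruok" four-letter
command, which a healthy server answers with "imok"; newer ZooKeeper releases
may require that command to be whitelisted, so treat that part as an
assumption about the setup.

```python
import shutil
import socket


def disk_free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free (0.0-1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def zk_ruok(host, port, timeout=3.0):
    """Return True if host:port answers the 'ruok' command with 'imok'.

    Any connection problem (refused, timed out, blocked by a firewall)
    is reported as False rather than raised.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            while True:
                data = sock.recv(16)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
    except OSError:
        return False
    return b"".join(chunks) == b"imok"


if __name__ == "__main__":
    # Placeholder values; point these at the real data directory and nodes.
    print("free fraction:", round(disk_free_fraction("/"), 3))
    print("zookeeper ok:", zk_ruok("localhost", 2185))
```

A False result from zk_ruok does not say why the server is unreachable, only
that it is worth checking the process, the network and the firewall next.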

Were all the VMs up and running, or were they down too?
