You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@airavata.apache.org by "Christie, Marcus Aaron" <ma...@iu.edu> on 2017/05/26 20:49:23 UTC

Zookeeper log filling up disk

Dev,

This message is just to document what happened today in the SGRC dev environment.  But if you have any insight on what caused this to happen, please share.

TLDR: Zookeeper log fills disk, Logstash is spamming ZK with requests, not sure what caused it but have reconfigured ZK logging to rotate files and not fill up disk.

So today in our dev environment the Zookeeper server’s log file filled up the disk. The size was about 190GB.  It wasn’t being rotated so it has possibly been growing for a while. On the other hand I saw in the log file that there were several messages a second, that looked like this:

2017-05-26 11:42:35,070 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /127.0.0.1:46462 (no session established for client)
2017-05-26 11:42:35,070 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:46464
2017-05-26 11:42:35,071 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: Unreasonable length = 1684371039
2017-05-26 11:42:35,071 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /127.0.0.1:46464 (no session established for client)
2017-05-26 11:42:35,071 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:46466
2017-05-26 11:42:35,071 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: Unreasonable length = 1684371039
2017-05-26 11:42:35,071 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /127.0.0.1:46466 (no session established for client)
2017-05-26 11:42:35,072 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:46468
2017-05-26 11:42:35,072 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: Unreasonable length = 1684371039
2017-05-26 11:42:35,072 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /127.0.0.1:46468 (no session established for client)
…


I shut down api-orch and gfac and Kafka (which is just pushing log messages to Logstash).  I then deleted the ./data directory in the Zookeeper installation and restarted it. Still getting those error messages above.

Eventually I found that Logstash was apparently trying to make Zookeeper connections. So I shut it down as well.  Once I shut down Logstash the error messages stop in the Zookeeper log.

It’s hard to say whether the problem was
1. Logstash inappropriately sending too large messages to Zookeeper
2. Or, Zookeeper’s log file fills up disk space causing Zookeeper’s database to become corrupted. Once the disk fills up, all sorts of weird behavior can start to manifest.

I’ve reconfigured the Zookeeper logging to log to a rotated file, that rotates at 10MB and keeps a max of 10 rotated log files.  That should prevent running out of disk space. Used this [1] as a resource.  I created issue AIRAVATA-2411 [2] to incorporate this into our Ansible scripts.


Thanks,

Marcus


[1] https://community.hortonworks.com/content/supportkb/49091/zookeeperout-file-keeps-growing-until-restarted.html
[2] https://issues.apache.org/jira/browse/AIRAVATA-2411

Re: Zookeeper log filling up disk

Posted by "Christie, Marcus Aaron" <ma...@iu.edu>.

It occurs to me that this may be related to the large number of TIME_WAIT connections we’ve seen before [1].  Since this error was creating and closing several connections per second it ended up generating lots of connections in TIME_WAIT.

I just thought I would mention this in case it comes up again. I don’t think we looked at Kafka/Logstash last time the TIME_WAIT problem occurred.

[1] https://issues.apache.org/jira/browse/AIRAVATA-2321

On May 26, 2017, at 4:47 PM, Christie, Marcus Aaron <ma...@iu.edu>> wrote:

Dev,

This message is just to document what happened today in the SGRC dev environment.  But if you have any insight on what caused this to happen, please share.

TLDR: Zookeeper log fills disk, Logstash is spamming ZK with requests, not sure what caused it but have reconfigured ZK logging to rotate files and not fill up disk.

So today in our dev environment the Zookeeper server’s log file filled up the disk. The size was about 190GB.  It wasn’t being rotated so it has possibly been growing for a while. On the other hand I saw in the log file that there were several messages a second, that looked like this:

2017-05-26 11:42:35,070 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /127.0.0.1:46462 (no session established for client)
2017-05-26 11:42:35,070 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:46464
2017-05-26 11:42:35,071 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: Unreasonable length = 1684371039
2017-05-26 11:42:35,071 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /127.0.0.1:46464 (no session established for client)
2017-05-26 11:42:35,071 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:46466
2017-05-26 11:42:35,071 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: Unreasonable length = 1684371039
2017-05-26 11:42:35,071 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /127.0.0.1:46466 (no session established for client)
2017-05-26 11:42:35,072 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:46468
2017-05-26 11:42:35,072 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: Unreasonable length = 1684371039
2017-05-26 11:42:35,072 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /127.0.0.1:46468 (no session established for client)
…


I shut down api-orch and gfac and Kafka (which is just pushing log messages to Logstash).  I then deleted the ./data directory in the Zookeeper installation and restarted it. Still getting those error messages above.

Eventually I found that Logstash was apparently trying to make Zookeeper connections. So I shut it down as well.  Once I shut down Logstash the error messages stop in the Zookeeper log.

It’s hard to say whether the problem was
1. Logstash inappropriately sending too large messages to Zookeeper
2. Or, Zookeeper’s log file fills up disk space causing Zookeeper’s database to become corrupted. Once the disk fills up, all sorts of weird behavior can start to manifest.

I’ve reconfigured the Zookeeper logging to log to a rotated file, that rotates at 10MB and keeps a max of 10 rotated log files.  That should prevent running out of disk space. Used this [1] as a resource.  I created issue AIRAVATA-2411 [2] to incorporate this into our Ansible scripts.


Thanks,

Marcus


[1] https://community.hortonworks.com/content/supportkb/49091/zookeeperout-file-keeps-growing-until-restarted.html
[2] https://issues.apache.org/jira/browse/AIRAVATA-2411