You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Sergio Bilello <la...@gmail.com> on 2019/10/14 21:34:49 UTC
Cassadra node join problem
Problem:
The cassandra node does not work even after restart throwing this exception:
WARN [Thread-83069] 2019-10-11 16:13:23,713 CustomTThreadPoolServer.java:125 - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Socket closed
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:109) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:60) ~[libthrift-0.9.2.jar:0.9.2]
at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:113) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:134) [apache-cassandra-3.11.4.jar:3.11.4]
The CPU Load goes to 50 and it becomes unresponsive.
Node configuration:
OS: Linux 4.16.13-1.el7.elrepo.x86_64 #1 SMP Wed May 30 14:31:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
This is a working node that does not have the recommended settings but it is working and it is one of the first node in the cluster
cat /proc/23935/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 122422 122422 processes
Max open files 65536 65536 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 122422 122422 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
I tried to bootstrap a new node that joins the existing cluster.
The disk space used is around 400GB SSD over 885GB available
At my first attempt, the node failed and got restarted over and over by systemctl that does not
honor the limits configuration specified and thrown
Caused by: java.nio.file.FileSystemException: /mnt/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/md-52-big-Index.db: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[na:1.8.0_161]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_161]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_161]
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) ~[na:1.8.0_161]
at java.nio.channels.FileChannel.open(FileChannel.java:287) ~[na:1.8.0_161]
at java.nio.channels.FileChannel.open(FileChannel.java:335) ~[na:1.8.0_161]
at org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:104) ~[apache-cassandra-3.11.4.jar:3.11.4]
.. 20 common frames omitted
^C
I fixed the above by stopping cassandra, cleaning commitlog, saved_caches, hints and data directory and restarting it and getting the PID and run the 2 commands below
sudo prlimit -n1048576 -p <JAVA_PID>
sudo prlimit -u32768 -p <CASSANDRA_PID>
because at the beginning the node didn't even joint the cluster. it was reported by UJ.
After fixing the max open file problem, The node from UpJoining passed to the status UpNormal
The node joined the cluster but after a while, it started to throw
WARN [Thread-83069] 2019-10-11 16:13:23,713 CustomTThreadPoolServer.java:125 - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Socket closed
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:109) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:60) ~[libthrift-0.9.2.jar:0.9.2]
at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:113) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:134) [apache-cassandra-3.11.4.jar:3.11.4]
I compared cassandra.yaml, limits.conf but it looks like that it does not help. I don't know how the current nodes are working since they don't have the recommended cassandra limits.
Any suggestions on the possible culprit?
Please let me know
Thanks
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org