Posted to commits@cassandra.apache.org by "Sam Tunnicliffe (JIRA)" <ji...@apache.org> on 2016/03/24 12:47:25 UTC

[jira] [Updated] (CASSANDRA-11093) process restarts are failing because native port and jmx ports are in use

     [ https://issues.apache.org/jira/browse/CASSANDRA-11093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-11093:
----------------------------------------
       Resolution: Fixed
    Reproduced In: 3.0.5, 3.5  (was: 3.x)
           Status: Resolved  (was: Patch Available)

Alright, CI looks OK (one unit test failed, but it looks unrelated: it passes locally and has previously failed on Jenkins at least once in the past few days), so I've committed to 3.0 in {{3a244d24b8c66e2e6e2664f71e9972b7827ae5f4}} and merged to 3.5 and trunk.

> process restarts are failing because native port and jmx ports are in use
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11093
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11093
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Configuration
>         Environment: PROD
>            Reporter: varun
>            Priority: Minor
>              Labels: lhf
>
> A process restart should take care of this automatically, but it does not, and that is a problem.
> The ports are considered in use even after the process has quit, died, or been killed, as long as a socket is still in the TIME_WAIT state of the TCP FSM (http://tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF-2.htm).
> tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 192.168.1.2:9160 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 10.130.128.131:58263 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 10.130.128.131:58262 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:9042 :::* LISTEN 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57191 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57190 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37105 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42562 ::ffff:127.0.0.1:7199 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:57190 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57198 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37106 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:57197 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57191 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57198 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57197 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42567 ::ffff:127.0.0.1:7199 TIME_WAIT -
> I had to write a restart handler that calls netstat and waits until all the TIME_WAIT sockets have expired before starting the node back up. This happened on 26 of the 56 nodes when a rolling restart was performed. The issue was mostly around JMX port 7199. A second rolling restart was done on those 26 nodes to remediate the JMX port issue; in that restart, one node had the issue where port 9042 was considered in use after the restart, and the process died after a short time.
> What needs to be done for the native port 9042 and JMX port 7199 is to create the underlying TCP socket with SO_REUSEADDR. This eases the restriction and allows the port to be bound by a process even if there are sockets for that port still open in the TCP FSM, as long as no other process is listening on that port. There is a Java method available to set this option in java.net.Socket: https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%29
> native port 9042: https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> JMX port 7199: https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40
> Looking at the code itself, this option is already being set on the thrift port (9160 by default) and the internode communication ports, unencrypted (7000 by default) and SSL encrypted (7001 by default).
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> This needs to be set on the native and JMX ports as well.
> References:
> https://unix.stackexchange.com/questions/258379/when-is-a-port-considered-being-used/258380?noredirect=1
> https://stackoverflow.com/questions/23531558/allow-restarting-java-application-with-jmx-monitoring-enabled-immediately
> https://docs.oracle.com/javase/8/docs/technotes/guides/rmi/socketfactory/
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%29
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40
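
For illustration only, here is a minimal sketch of the SO_REUSEADDR approach the description asks for. It is not the committed patch and not Cassandra code: the class name ReuseAddressSketch and the helper bindWithReuseAddress are hypothetical. The description links java.net.Socket; the sketch uses java.net.ServerSocket, which offers the same setReuseAddress(boolean) method for listening sockets. The option has to be set before the socket is bound, so the sketch creates an unbound socket first.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration; names are not taken from the Cassandra code base.
final class ReuseAddressSketch {

    // Creates a listening socket with SO_REUSEADDR enabled, so that old
    // connections lingering in TIME_WAIT do not block the bind.
    static ServerSocket bindWithReuseAddress(int port) throws IOException {
        ServerSocket server = new ServerSocket(); // unbound: no port is claimed yet
        try {
            server.setReuseAddress(true);         // must be called before bind()
            server.bind(new InetSocketAddress(port));
            return server;
        } catch (IOException e) {
            server.close();                       // do not leak the socket on failure
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port; a real node would pass 9042 or 7199.
        try (ServerSocket server = bindWithReuseAddress(0)) {
            System.out.println("bound to port " + server.getLocalPort()
                    + ", SO_REUSEADDR=" + server.getReuseAddress());
        }
    }
}
```

As the description notes, the bind still fails if another process is actively listening on the port; the option only relaxes the check for sockets left over from closed connections.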



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)