Posted to commits@cassandra.apache.org by "Sam Tunnicliffe (JIRA)" <ji...@apache.org> on 2016/03/10 17:33:40 UTC

[jira] [Commented] (CASSANDRA-11093) process restarts are failing because native port and jmx ports are in use

    [ https://issues.apache.org/jira/browse/CASSANDRA-11093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189503#comment-15189503 ] 

Sam Tunnicliffe commented on CASSANDRA-11093:
---------------------------------------------

AFAICT, the {{SO_REUSEADDR}} option only needs to be set explicitly (by C*) in {{RMIServerSocketFactoryImpl}}. Both the Java NIO and Netty native epoll {{ServerSocket}} implementations set the reuse address flag to true by default: [jdk|http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/nio/ch/Net.java#l389] & [netty|https://github.com/netty/netty/blob/netty-4.0.23.Final/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollServerSocketChannelConfig.java#L42]. The native protocol server always uses one of these two implementations, so its sockets already have address reuse enabled.
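Whether a given JDK really enables reuse on NIO server channels can be probed directly. The sketch below (the class and method names are hypothetical) reads the default value of {{StandardSocketOptions.SO_REUSEADDR}} on a freshly opened {{ServerSocketChannel}}, then enables it explicitly before binding to an ephemeral port. The printed default is platform-dependent, so treat it as a probe of one environment rather than a guarantee.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

// Hypothetical probe class, not part of Cassandra.
final class NioReuseAddressProbe {

    /** Default SO_REUSEADDR value of a freshly opened, unbound NIO server channel. */
    static boolean defaultReuseAddress() {
        try (ServerSocketChannel ch = ServerSocketChannel.open()) {
            return ch.getOption(StandardSocketOptions.SO_REUSEADDR);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Enables SO_REUSEADDR before binding to {@code port} (0 = ephemeral), then reads it back. */
    static boolean bindWithReuse(int port) {
        try (ServerSocketChannel ch = ServerSocketChannel.open()) {
            ch.setOption(StandardSocketOptions.SO_REUSEADDR, true); // must happen before bind()
            ch.bind(new InetSocketAddress(port));
            return ch.getOption(StandardSocketOptions.SO_REUSEADDR);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default SO_REUSEADDR:  " + defaultReuseAddress());
        System.out.println("after explicit enable: " + bindWithReuse(0));
    }
}
```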

FTR, there is also an undocumented bug in the version of Netty we currently use which causes a {{java.lang.UnsatisfiedLinkError}} when querying {{SO_REUSEADDR}} on the native epoll transport, so the new {{ServerTest}} fails when run under Linux. This bug is present in the latest released version of Netty 4.0 (4.0.34.Final), but has since been fixed on the 4.0 branch.

As far as the RMI server goes, on my Linux box running JDK 1.8.0_74 the {{ServerSocket}} created by the default factory already has reuse enabled, but this default value is documented as undefined, so setting it explicitly here seems reasonable. [~Gerrrr], what are the details of the environment where you're seeing this? Also, can you confirm that the change to {{RMIServerSocketFactoryImpl}} is sufficient to fix your problem?
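A minimal sketch of what setting the flag explicitly in an RMI server socket factory could look like. {{ReuseAddressServerSocketFactory}} is a hypothetical name standing in for Cassandra's {{RMIServerSocketFactoryImpl}}, whose actual code may differ (for example in which address it binds to); the sketch relies only on the standard {{java.rmi.server.RMIServerSocketFactory}} interface and {{java.net.ServerSocket}}. The socket is created unbound because the Javadoc leaves the effect of changing {{SO_REUSEADDR}} after bind undefined.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.rmi.server.RMIServerSocketFactory;

/**
 * Hypothetical stand-in for RMIServerSocketFactoryImpl: enables SO_REUSEADDR
 * explicitly instead of relying on the (undefined) platform default.
 */
final class ReuseAddressServerSocketFactory implements RMIServerSocketFactory {

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        // Create the socket unbound so the option is set before bind().
        ServerSocket socket = new ServerSocket();
        try {
            socket.setReuseAddress(true);
            socket.bind(new InetSocketAddress(port)); // wildcard address, for illustration only
            return socket;
        } catch (IOException e) {
            socket.close(); // don't leak the descriptor if bind fails
            throw e;
        }
    }

    // The RMIServerSocketFactory contract asks for equals/hashCode so RMI can
    // recognise functionally equivalent factories; this one is stateless.
    @Override
    public boolean equals(Object obj) {
        return obj != null && obj.getClass() == getClass();
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}
```

Such a factory would be handed to whatever creates the JMX/RMI listening sockets, for example as the server socket factory argument of {{LocateRegistry.createRegistry(port, csf, ssf)}}.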


> process restarts are failing because native port and jmx ports are in use
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11093
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11093
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Configuration
>         Environment: PROD
>            Reporter: varun
>            Priority: Minor
>              Labels: lhf
>
> A process restart should take care of this automatically, but it does not, and that is a problem.
> The ports are considered in use even after the process has quit, died, or been killed, as long as a socket on that port is still in the TIME_WAIT state of the TCP FSM (http://tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF-2.htm).
> tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 192.168.1.2:9160 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 10.130.128.131:58263 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 10.130.128.131:58262 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:9042 :::* LISTEN 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57191 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57190 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37105 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42562 ::ffff:127.0.0.1:7199 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:57190 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57198 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37106 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:57197 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57191 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57198 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57197 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42567 ::ffff:127.0.0.1:7199 TIME_WAIT -
> I had to write a restart handler that runs netstat and waits until all TIME_WAIT states have expired before starting the node back up. This happened on 26 of the 56 nodes when a rolling restart was performed. The issue was mostly around JMX port 7199. Another rolling restart was then done on those 26 nodes to remediate the JMX port issue; during that restart, one node found port 9042 still considered in use, and the process died after a while.
> What needs to be done for the native port 9042 and JMX port 7199 is to create the underlying TCP socket with SO_REUSEADDR. This relaxes the restriction and allows a process to bind the port even if sockets on that port are still present in the TCP FSM, as long as no other process is listening on it. Java exposes this option through java.net.Socket: https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%29
> native port 9042: https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> JMX port 7199: https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40
> Looking at the code itself, this option is already set on the Thrift port (default 9160) and on the internode communication ports, both unencrypted (default 7000) and SSL-encrypted (default 7001).
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> It needs to be set on the native and JMX ports as well.
> References:
> https://unix.stackexchange.com/questions/258379/when-is-a-port-considered-being-used/258380?noredirect=1
> https://stackoverflow.com/questions/23531558/allow-restarting-java-application-with-jmx-monitoring-enabled-immediately
> https://docs.oracle.com/javase/8/docs/technotes/guides/rmi/socketfactory/
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%29
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40
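The reporter's workaround, a restart handler that checks netstat and waits for lingering TIME_WAIT sockets before starting the node, could be sketched as below. The class and method names are hypothetical, and the parser assumes netstat-style columns (proto, Recv-Q, Send-Q, local address, foreign address, state) as in the output quoted in the issue. It conservatively treats a TIME_WAIT socket on either end of a connection as blocking, and it only answers whether the port is clear; polling and starting the node are left to the surrounding script.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper modelled on the restart handler described in the issue.
final class TimeWaitCheck {

    /** Port part of "host:port"; lastIndexOf also handles "::ffff:127.0.0.1:7199". */
    static String portOf(String address) {
        return address.substring(address.lastIndexOf(':') + 1);
    }

    /** True if any TCP socket in TIME_WAIT has {@code port} as its local or foreign port. */
    static boolean hasTimeWait(List<String> netstatLines, int port) {
        String wanted = Integer.toString(port);
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, blank lines and anything that is not a TCP socket row.
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue;
            }
            boolean timeWait = cols[5].equals("TIME_WAIT");
            if (timeWait && (portOf(cols[3]).equals(wanted) || portOf(cols[4]).equals(wanted))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A few rows copied from the netstat output in the issue.
        List<String> lines = Arrays.asList(
                "tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 30099/java",
                "tcp 0 0 192.168.1.2:9160 0.0.0.0:* LISTEN 30099/java",
                "tcp 0 0 10.130.128.131:58263 10.130.128.131:9042 TIME_WAIT -",
                "tcp 0 0 ::ffff:127.0.0.1:42562 ::ffff:127.0.0.1:7199 TIME_WAIT -");
        for (int port : new int[] {7199, 9042, 9160}) {
            System.out.println(port + " has TIME_WAIT sockets: " + hasTimeWait(lines, port));
        }
    }
}
```

A restart script would re-run netstat and call this in a loop until it returns false; with SO_REUSEADDR set on the listening sockets, that wait should no longer be necessary.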



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)