Posted to commits@cassandra.apache.org by "Farzad Panahi (JIRA)" <ji...@apache.org> on 2016/07/28 01:14:21 UTC
[jira] [Commented] (CASSANDRA-9630) Killing cassandra process results in unclosed connections
[ https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396756#comment-15396756 ]
Farzad Panahi commented on CASSANDRA-9630:
------------------------------------------
I am experiencing a similar issue.
Cassandra version: 3.0.8
Environment: Amazon EC2
Error Case:
When I restart the Cassandra service on a node, once the node comes back up it sees some or all of the other nodes as DN (Down/Normal), even though the other nodes see it as UN (Up/Normal).
Here is the output of netstat and nodetool status for this error case:
1. Right after stopping the Cassandra service on node 10.4.68.222:
{code}
--------------------------------------
ip-10-4-54-176
tcp 0 0 10.4.54.176:51268 10.4.68.222:7000 TIME_WAIT
tcp 0 0 10.4.54.176:56135 10.4.68.222:7000 TIME_WAIT
tcp 1 0 10.4.54.176:43697 10.4.68.222:7000 CLOSE_WAIT
tcp 0 0 10.4.54.176:52372 10.4.68.222:7000 TIME_WAIT
--------------------------------------
--------------------------------------
ip-10-4-54-177
tcp 0 0 10.4.54.177:56960 10.4.68.222:7000 TIME_WAIT
tcp 0 0 10.4.54.177:54539 10.4.68.222:7000 TIME_WAIT
tcp 0 0 10.4.54.177:32823 10.4.68.222:7000 TIME_WAIT
tcp 1 0 10.4.54.177:48985 10.4.68.222:7000 CLOSE_WAIT
--------------------------------------
--------------------------------------
ip-10-4-68-222
tcp 0 0 10.4.68.222:7000 10.4.54.176:43697 FIN_WAIT2
tcp 0 0 10.4.68.222:7000 10.4.54.177:48985 FIN_WAIT2
tcp 0 0 10.4.68.222:7000 10.4.68.222:54419 TIME_WAIT
tcp 0 0 10.4.68.222:7000 10.4.43.65:43197 FIN_WAIT2
tcp 0 0 10.4.68.222:7000 10.4.68.221:44149 FIN_WAIT2
tcp 0 0 10.4.68.222:7000 10.4.68.222:41302 TIME_WAIT
tcp 0 0 10.4.68.222:7000 10.4.43.66:54321 FIN_WAIT2
--------------------------------------
--------------------------------------
ip-10-4-68-221
tcp 0 0 10.4.68.221:49599 10.4.68.222:7000 TIME_WAIT
tcp 0 0 10.4.68.221:55033 10.4.68.222:7000 TIME_WAIT
tcp 0 0 10.4.68.221:51628 10.4.68.222:7000 TIME_WAIT
tcp 1 0 10.4.68.221:44149 10.4.68.222:7000 CLOSE_WAIT
--------------------------------------
--------------------------------------
ip-10-4-43-66
tcp 0 0 10.4.43.66:55930 10.4.68.222:7000 TIME_WAIT
tcp 1 0 10.4.43.66:54321 10.4.68.222:7000 CLOSE_WAIT
tcp 0 0 10.4.43.66:60968 10.4.68.222:7000 TIME_WAIT
tcp 0 0 10.4.43.66:49087 10.4.68.222:7000 TIME_WAIT
--------------------------------------
--------------------------------------
ip-10-4-43-65
tcp 1 0 10.4.43.65:43197 10.4.68.222:7000 CLOSE_WAIT
tcp 0 0 10.4.43.65:36467 10.4.68.222:7000 TIME_WAIT
tcp 0 0 10.4.43.65:53317 10.4.68.222:7000 TIME_WAIT
tcp 0 0 10.4.43.65:54897 10.4.68.222:7000 TIME_WAIT
--------------------------------------
{code}
2. A little later, with the Cassandra service on node 10.4.68.222 still stopped:
{code}
--------------------------------------
ip-10-4-54-176
tcp 1 0 10.4.54.176:43697 10.4.68.222:7000 CLOSE_WAIT
--------------------------------------
--------------------------------------
ip-10-4-54-177
--------------------------------------
--------------------------------------
ip-10-4-68-222
--------------------------------------
--------------------------------------
ip-10-4-68-221
tcp 1 0 10.4.68.221:44149 10.4.68.222:7000 CLOSE_WAIT
--------------------------------------
--------------------------------------
ip-10-4-43-66
tcp 1 0 10.4.43.66:54321 10.4.68.222:7000 CLOSE_WAIT
--------------------------------------
--------------------------------------
ip-10-4-43-65
tcp 1 0 10.4.43.65:43197 10.4.68.222:7000 CLOSE_WAIT
--------------------------------------
{code}
3. After starting the Cassandra service on node 10.4.68.222 again:
{code}
--------------------------------------
ip-10-4-54-176
tcp 0 0 10.4.54.176:42460 10.4.68.222:7000 ESTABLISHED
tcp 1 303403 10.4.54.176:43697 10.4.68.222:7000 CLOSE_WAIT
tcp 0 0 10.4.54.176:42109 10.4.68.222:7000 ESTABLISHED
--------------------------------------
--------------------------------------
ip-10-4-54-177
tcp 0 0 10.4.54.177:43687 10.4.68.222:7000 ESTABLISHED
tcp 0 0 10.4.54.177:56107 10.4.68.222:7000 ESTABLISHED
tcp 0 0 10.4.54.177:39426 10.4.68.222:7000 ESTABLISHED
--------------------------------------
--------------------------------------
ip-10-4-68-222
tcp 0 0 10.4.68.222:7000 0.0.0.0:* LISTEN
tcp 0 0 10.4.68.222:7000 10.4.54.176:42109 ESTABLISHED
tcp 0 0 10.4.68.222:7000 10.4.54.177:43687 ESTABLISHED
tcp 0 0 10.4.68.222:7000 10.4.54.176:42460 ESTABLISHED
tcp 0 0 10.4.68.222:7000 10.4.43.66:55168 ESTABLISHED
tcp 0 0 10.4.68.222:7000 10.4.43.65:60239 ESTABLISHED
tcp 0 0 10.4.68.222:7000 10.4.54.177:39426 ESTABLISHED
tcp 0 0 10.4.68.222:7000 10.4.43.65:43480 ESTABLISHED
tcp 0 0 10.4.68.222:7000 10.4.68.221:54490 ESTABLISHED
tcp 0 0 10.4.68.222:7000 10.4.68.221:59771 ESTABLISHED
tcp 0 0 10.4.68.222:7000 10.4.54.177:56107 ESTABLISHED
tcp 0 0 10.4.68.222:7000 10.4.43.66:55581 ESTABLISHED
--------------------------------------
--------------------------------------
ip-10-4-68-221
tcp 0 0 10.4.68.221:54490 10.4.68.222:7000 ESTABLISHED
tcp 0 0 10.4.68.221:59771 10.4.68.222:7000 ESTABLISHED
tcp 1 304316 10.4.68.221:44149 10.4.68.222:7000 CLOSE_WAIT
--------------------------------------
--------------------------------------
ip-10-4-43-66
tcp 1 322344 10.4.43.66:54321 10.4.68.222:7000 CLOSE_WAIT
tcp 0 0 10.4.43.66:55581 10.4.68.222:7000 ESTABLISHED
tcp 0 0 10.4.43.66:55168 10.4.68.222:7000 ESTABLISHED
--------------------------------------
--------------------------------------
ip-10-4-43-65
tcp 1 376331 10.4.43.65:43197 10.4.68.222:7000 CLOSE_WAIT
tcp 0 0 10.4.43.65:43480 10.4.68.222:7000 ESTABLISHED
tcp 0 0 10.4.43.65:60239 10.4.68.222:7000 ESTABLISHED
--------------------------------------
{code}
4. Output of nodetool status on all nodes after starting the Cassandra service on node 10.4.68.222:
{code}
ip-10-4-54-176
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.4.54.176  127.67 GB  256     47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.43.65   124.19 GB  256     46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256     50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
UN  10.4.43.66   141.94 GB  256     52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
UN  10.4.68.221  137.12 GB  256     50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256     52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
--------------------------------------
ip-10-4-54-177
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.4.54.176  127.63 GB  256     47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.43.65   124.19 GB  256     46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256     50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
UN  10.4.43.66   141.94 GB  256     52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
UN  10.4.68.221  137.12 GB  256     50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256     52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
--------------------------------------
ip-10-4-68-222
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
DN  10.4.54.176  127.63 GB  256     47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
DN  10.4.43.65   124.19 GB  256     46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256     50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
DN  10.4.43.66   141.94 GB  256     52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
DN  10.4.68.221  137.12 GB  256     50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256     52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
--------------------------------------
ip-10-4-68-221
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.4.54.176  127.63 GB  256     47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.43.65   124.19 GB  256     46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256     50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
UN  10.4.43.66   141.94 GB  256     52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
UN  10.4.68.221  137.12 GB  256     50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256     52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
--------------------------------------
ip-10-4-43-66
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.4.54.176  127.67 GB  256     47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.43.65   124.19 GB  256     46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256     50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
UN  10.4.43.66   141.95 GB  256     52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
UN  10.4.68.221  137.12 GB  256     50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256     52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
--------------------------------------
ip-10-4-43-65
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.4.54.176  127.67 GB  256     47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.43.65   124.19 GB  256     46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256     50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
UN  10.4.43.66   141.94 GB  256     52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
UN  10.4.68.221  137.12 GB  256     50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256     52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
{code}
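Comparing snapshots 3 and 4 suggests a pattern. The four peers that still held a stale CLOSE_WAIT socket to 10.4.68.222:7000 after the restart (10.4.54.176, 10.4.68.221, 10.4.43.66 and 10.4.43.65) are exactly the four that the restarted node reports as DN. By contrast, 10.4.54.177, whose CLOSE_WAIT socket had already disappeared by snapshot 2, is reported as UN. The stale sockets' Send-Q (bytes not yet acknowledged by the remote host) also grew from 0 to roughly 300 to 380 KB, which suggests the peers kept writing to the old connection. The Python sketch below checks that correlation on the data pasted above. It assumes the six-column netstat layout shown (Proto, Recv-Q, Send-Q, local address, foreign address, state). The helpers `close_wait_peers` and `down_nodes` are hypothetical names, not part of any Cassandra tooling.

```python
def close_wait_peers(netstat_lines, port=7000):
    """Map local IP -> Send-Q for CLOSE_WAIT sockets whose remote port is `port`.

    Assumes the six-column netstat layout used above:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    peers = {}
    for line in netstat_lines:
        fields = line.split()
        if len(fields) != 6 or fields[0] != "tcp":
            continue  # skip hostnames, separator lines and anything else
        _, _recv_q, send_q, local, remote, state = fields
        if state == "CLOSE_WAIT" and remote.rsplit(":", 1)[1] == str(port):
            peers[local.rsplit(":", 1)[0]] = int(send_q)
    return peers


def down_nodes(status_lines):
    """Return the addresses of rows marked DN in `nodetool status` output."""
    down = set()
    for line in status_lines:
        fields = line.split()
        if fields and fields[0] == "DN":
            down.add(fields[1])
    return down


# A subset of the step 3 netstat lines, copied from the peers' output above.
step3 = """\
tcp 1 303403 10.4.54.176:43697 10.4.68.222:7000 CLOSE_WAIT
tcp 0 0 10.4.54.177:43687 10.4.68.222:7000 ESTABLISHED
tcp 1 304316 10.4.68.221:44149 10.4.68.222:7000 CLOSE_WAIT
tcp 1 322344 10.4.43.66:54321 10.4.68.222:7000 CLOSE_WAIT
tcp 1 376331 10.4.43.65:43197 10.4.68.222:7000 CLOSE_WAIT
""".splitlines()

# Step 4: nodetool status rows as printed on the restarted node 10.4.68.222.
step4 = """\
DN 10.4.54.176 127.63 GB 256 47.5% 7163bf77-2fef-4e33-81c1-0e61038dece1 1b
DN 10.4.43.65 124.19 GB 256 46.2% 80265afb-8beb-4887-a696-fc9b75956894 1a
UN 10.4.54.177 136.06 GB 256 50.7% b9010e24-4e92-4212-8a17-65892ea9ff66 1b
DN 10.4.43.66 141.94 GB 256 52.3% b00fdf10-1075-4953-8a96-caf375221684 1a
DN 10.4.68.221 137.12 GB 256 50.7% 37479ec3-7b6d-4537-975c-f9d95e92ee1d 1d
UN 10.4.68.222 141.89 GB 256 52.7% 8df87657-c39b-405a-ba54-d60b577c1429 1d
""".splitlines()

stale = close_wait_peers(step3)
down = down_nodes(step4)
print(sorted(stale))
print(set(stale) == down)  # True for the data above
```

This is a correlation on a single restart, not proof of causation. However, it is consistent with the CLOSE_WAIT explanation in the issue description.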
> Killing cassandra process results in unclosed connections
> ---------------------------------------------------------
>
> Key: CASSANDRA-9630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9630
> Project: Cassandra
> Issue Type: Bug
> Components: Distributed Metadata, Streaming and Messaging
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Minor
> Fix For: 3.x
>
>
> After upgrading Cassandra from 2.0.12 to 2.0.15, whenever we killed a Cassandra process (with SIGTERM), some other nodes kept a connection to the killed node in the CLOSE_WAIT state on port 7000 for about 5-20 minutes.
> So, when we started the killed node again, the other nodes could not establish a handshake with it because of the connections in the CLOSE_WAIT state, and the nodes remained DOWN to each other until the initial connection expired.
> The problem did not happen if I ran nodetool disablegossip before killing the node.
> I was able to fix this issue by reverting the CASSANDRA-8336 commits (including CASSANDRA-9238). After reverting them, Cassandra closes connections correctly when killed with -TERM, but leaves connections in the CLOSE_WAIT state if I run nodetool disablethrift before killing the nodes.
> I did not try to reproduce the problem in a clean environment.
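The FIN_WAIT2/CLOSE_WAIT pairs in the first netstat snapshot are a TCP half-close. One example is 10.4.68.222:7000 in FIN_WAIT2 against 10.4.54.176:43697 in CLOSE_WAIT. The stopped node sent its FIN and the peer's kernel acknowledged it, but the peer process never closed its end. On Linux, an orphaned FIN_WAIT2 socket is dropped after `net.ipv4.tcp_fin_timeout` (60 seconds by default). This is consistent with the FIN_WAIT2 entries having disappeared by the second snapshot. A CLOSE_WAIT socket, by contrast, generally persists until the owning process closes it or a later write on it fails. The generic loopback sketch below shows that the only signal the peer application receives is EOF. It is not Cassandra code, and `demo_half_close` is a hypothetical name.

```python
import socket


def demo_half_close() -> bytes:
    """Show what a peer sees after the other end closes a TCP connection.

    `stopped` plays the stopped node (the accepting side on port 7000 above):
    closing it sends a FIN and moves it to FIN_WAIT2. `peer` plays a node that
    still holds its end: from the moment the FIN arrives until peer.close()
    is called, the kernel keeps `peer` in CLOSE_WAIT.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen(1)
    peer = socket.create_connection(listener.getsockname())
    stopped, _ = listener.accept()
    listener.close()

    stopped.close()  # FIN sent; `peer` now sits in CLOSE_WAIT

    # The application only learns about the FIN by reading: recv() returns b"".
    # Code that never reads, or ignores EOF, leaves the socket in CLOSE_WAIT.
    peer.settimeout(2.0)
    data = peer.recv(1024)

    peer.close()  # closing our end lets both sides finish the shutdown
    return data


if __name__ == "__main__":
    print(demo_half_close())  # prints b''
```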
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)