Posted to commits@cassandra.apache.org by "C. Scott Andreas (JIRA)" <ji...@apache.org> on 2018/12/10 17:56:00 UTC

[jira] [Commented] (CASSANDRA-14924) Cassandra nodes become unreachable to each other

    [ https://issues.apache.org/jira/browse/CASSANDRA-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715238#comment-16715238 ] 

C. Scott Andreas commented on CASSANDRA-14924:
----------------------------------------------

Hi [~venksta], thanks for your report. This bug tracker is primarily used by contributors to the Apache Cassandra project for development of the database itself. Could you reach out to the user mailing list or the public IRC channel for support? A member of the community may be able to help.

Here's a page with information on the best channels for support: http://cassandra.apache.org/community/

> Cassandra nodes become unreachable to each other
> ------------------------------------------------
>
>                 Key: CASSANDRA-14924
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14924
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: ventsislav
>            Priority: Critical
>
> I have 3 Elassandra nodes running in Docker containers.
> The containers were created like this:
> {code:bash}
> # Host 10.0.0.1
> docker run --name elassandra-node-1 --net=host -e CASSANDRA_SEEDS="10.0.0.1" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest
> # Host 10.0.0.2
> docker run --name elassandra-node-2 --net=host -e CASSANDRA_SEEDS="10.0.0.1,10.0.0.2" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest
> # Host 10.0.0.3
> docker run --name elassandra-node-3 --net=host -e CASSANDRA_SEEDS="10.0.0.1,10.0.0.2,10.0.0.3" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest{code}
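> Since the containers run with --net=host, inter-node traffic goes straight over the host network, so the first thing worth ruling out is basic port reachability between the hosts. A minimal check, assuming Cassandra's default ports (7000 for storage/gossip, 9042 for CQL) and that nc is available on the hosts:
> {code:bash}
> # Run from host 10.0.0.1: test whether a TCP connection to each peer's
> # gossip (7000) and CQL (9042) port succeeds. nc -zv only probes, sends no data.
> for host in 10.0.0.2 10.0.0.3; do
>   for port in 7000 9042; do
>     nc -zv "$host" "$port"
>   done
> done{code}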
> The cluster worked fine for a couple of days after it was created; both the Elasticsearch and Cassandra sides were perfect.
> Currently, however, all Cassandra nodes have become unreachable to each other.
> Nodetool status on every node looks like this:
> {code}
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
> DN  10.0.0.3  11.95 GiB  8       100.0%            7652f66e-194e-4886-ac10-0fc21ac8afeb  r1
> DN  10.0.0.2  11.92 GiB  8       100.0%            b91fa129-1dd0-4cf8-be96-9c06b23daac6  r1
> UN  10.0.0.1  11.9 GiB   8       100.0%            5c1afcff-b0aa-4985-a3cc-7f932056c08f  r1{code}
> Here UN is the current host, 10.0.0.1.
> The output is the same on all the other nodes: each node sees only itself as UN and the other two as DN.
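> To see what each node's gossip layer actually believes about its peers (rather than just the UN/DN summary), nodetool gossipinfo can be run on every node and the outputs compared; a sketch:
> {code:bash}
> # Dump this node's gossip state; the STATUS and SCHEMA lines per endpoint
> # show what this node currently believes about each peer.
> nodetool gossipinfo
> # The failure detector's per-endpoint view, for comparison:
> nodetool failuredetector{code}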
> Nodetool describecluster on 10.0.0.1 reports:
> {code}
> Cluster Information:
>     Name: BD Storage
>     Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>     DynamicEndPointSnitch: enabled
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         24fa5e55-3935-3c0e-9808-99ce502fe98d: [10.0.0.1]
>
>         UNREACHABLE: [10.0.0.2, 10.0.0.3]{code}
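> The Schema versions section above already shows the split; the same information can also be read directly from the system tables, which is a handy cross-check. A minimal sketch, assuming cqlsh can connect to the local node:
> {code:bash}
> # The local node's own schema version:
> cqlsh 10.0.0.1 -e "SELECT schema_version FROM system.local;"
> # The schema versions this node has recorded for its peers:
> cqlsh 10.0.0.1 -e "SELECT peer, schema_version FROM system.peers;"{code}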
> When attached to the first node, it just keeps repeating these log lines:
> {code}
> 2018-12-09 07:47:32,927 WARN [OptionalTasks:1] org.apache.cassandra.auth.CassandraRoleManager.setupDefaultRole(CassandraRoleManager.java:361) CassandraRoleManager skipped default role setup: some nodes were not ready
> 2018-12-09 07:47:32,927 INFO [OptionalTasks:1] org.apache.cassandra.auth.CassandraRoleManager$4.run(CassandraRoleManager.java:400) Setup task failed with error, rescheduling
> 2018-12-09 07:47:32,980 INFO [HANDSHAKE-/10.0.0.2] org.apache.cassandra.net.OutboundTcpConnection.lambda$handshakeVersion$1(OutboundTcpConnection.java:561) Handshaking version with /10.0.0.2
> 2018-12-09 07:47:32,980 INFO [HANDSHAKE-/10.0.0.3] org.apache.cassandra.net.OutboundTcpConnection.lambda$handshakeVersion$1(OutboundTcpConnection.java:561) Handshaking version with /10.0.0.3{code}
> After a while, when one of the nodes is restarted:
> {code}
> 2018-12-09 07:52:21,972 WARN [MigrationStage:1] org.apache.cassandra.service.MigrationTask.runMayThrow(MigrationTask.java:67) Can't send schema pull request: node /10.0.0.2 is down.{code}
> Tried so far:
>  * Restarting all containers at the same time
>  * Restarting the containers one after another
>  * Restarting Cassandra inside every container (service cassandra restart)
>  * nodetool disablegossip followed by nodetool enablegossip
>  * nodetool repair, which fails with: Repair command #1 failed with error Endpoint not alive: /10.0.0.2
> It seems that the nodes' schema versions have diverged, but I still don't understand why the nodes are marked as down to each other.
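> For what it's worth, once the nodes can actually reach each other again, a common way to clear lingering schema disagreement is to pick one node as the reference and reset the schema on the others; a hedged sketch (resetlocalschema needs at least one live peer to pull from):
> {code:bash}
> # On each disagreeing node (NOT on the chosen reference node): drop the
> # local schema and re-pull it from a live peer.
> nodetool resetlocalschema
> # Then confirm that all nodes converge on a single schema version:
> nodetool describecluster{code}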


