Posted to commits@cassandra.apache.org by "Jörn Heissler (JIRA)" <ji...@apache.org> on 2017/11/20 04:39:01 UTC
[jira] [Comment Edited] (CASSANDRA-14012) Document gossip protocol

    [ https://issues.apache.org/jira/browse/CASSANDRA-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258811#comment-16258811 ] 

Jörn Heissler edited comment on CASSANDRA-14012 at 11/20/17 4:38 AM:
---------------------------------------------------------------------

tl;dr: local DNS name as broadcast address, DNS resolver answered with 127.0.1.1.

And here's the story of me being stupid:

Cassandra configuration looks like this:
{noformat}
broadcast_rpc_address: cassandra1
{noformat}

I tried to join this node to an existing cluster, replacing an old "cassandra1" with a different IP address. It didn't work; the logs indicated some kind of communication problem, but nothing helpful.

So I started a network sniffer (ngrep) to analyze the problem. The new node connected to the seeds and sent some messages that included the bytes {noformat}7f 00 01 01{noformat} among lots of other bytes. I didn't know the protocol, so I couldn't make much sense of any of this. Those 4 bytes didn't draw my attention either (in retrospect they should have...). What I was able to guess was that my new node sent the same message multiple times but never got a response, aside from an initial connection ack of some kind. I also looked at the communication between two old nodes: gossip connections appear to be unidirectional. For two nodes to communicate, two connections are needed, each established by the respective sender.
So I wondered whether the initial connection to the seed would also be 2x unidirectional and, if so, how the seed node would learn the IP address of my new node. I couldn't find the new IP address in the packet dump, which I thought strange.
I asked on IRC, and they suggested that it could be related to my broadcast address.
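
As a rough illustration of why those 4 bytes mattered: read as a big-endian IPv4 address, they spell out 127.0.1.1, which is a loopback address. The sketch below only decodes a hex dump like the one from ngrep; the helper name decode_ipv4 is made up for this example and has nothing to do with Cassandra's own code or with the actual gossip message layout.

```python
import ipaddress


def decode_ipv4(hex_bytes):
    """Turn a space-separated hex dump such as '7f 00 01 01' into an IPv4 address.

    Hypothetical helper for illustration only.
    """
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    # IPv4Address accepts 4 packed bytes in network (big-endian) order.
    return ipaddress.IPv4Address(raw)


addr = decode_ipv4("7f 00 01 01")
print(addr, addr.is_loopback)  # 127.0.1.1 True
```

Any address in 127.0.0.0/8 is loopback, so a remote seed receiving it has no way to connect back to the sender.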

Some time later I ran tcpdump to check whether Cassandra would try to resolve the broadcast address. It did, and my DNS resolver answered with two addresses: 127.0.1.1 and the real one. The cause was an entry for 127.0.1.1 in /etc/hosts, which dnsmasq picked up. I fixed it and Cassandra joined my new node to the cluster. And my Cassandra nodes lived happily ever after.
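
A check along these lines could catch the same mistake before starting a node. This is a minimal sketch, assuming the system resolver (via getaddrinfo) sees the same answers the JVM would, which may not hold on every setup. The function names are invented for this example.

```python
import ipaddress
import socket


def resolved_addresses(hostname):
    """Return the sorted, de-duplicated addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def loopback_addresses(addresses):
    """Return the subset of addresses that are loopback (127.0.0.0/8 or ::1)."""
    result = []
    for text in addresses:
        # Drop an IPv6 zone suffix such as '%eth0' before parsing.
        if ipaddress.ip_address(text.split("%")[0]).is_loopback:
            result.append(text)
    return result


def report(hostname):
    """Print a warning if hostname resolves to any loopback address."""
    try:
        addresses = resolved_addresses(hostname)
    except socket.gaierror as exc:
        print(f"{hostname}: resolution failed: {exc}")
        return
    bad = loopback_addresses(addresses)
    if bad:
        print(f"{hostname}: resolves to loopback {bad} (all answers: {addresses})")
    else:
        print(f"{hostname}: ok, resolves to {addresses}")
```

Calling report("cassandra1") on the affected machine would have been expected to flag 127.0.1.1 next to the real address.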

For clarity, `is a bad broadcast address` isn't printed anywhere; that's only how I describe this issue.



> Document gossip protocol
> ------------------------
>
>                 Key: CASSANDRA-14012
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14012
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jörn Heissler
>            Priority: Minor
>              Labels: Documentation
>
> I had an issue today with two nodes communicating with each other; there's a flaw in my configuration (wrong broadcast address).
> I saw a little bit of traffic on port 7000, but I couldn't understand it for lack of documentation.
> With documentation I would have understood my issue very quickly (7f 00 01 01 is a bad broadcast address!). But I didn't recognize those 4 bytes as the bc address.
> Could you please document the gossip protocol?


