Posted to commits@cassandra.apache.org by "Vincent White (JIRA)" <ji...@apache.org> on 2018/12/03 04:59:00 UTC

[jira] [Comment Edited] (CASSANDRA-14192) netstats information mismatch between senders and receivers

    [ https://issues.apache.org/jira/browse/CASSANDRA-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344515#comment-16344515 ] 

Vincent White edited comment on CASSANDRA-14192 at 12/3/18 4:58 AM:
--------------------------------------------------------------------

This is because we now use RangeAwareSSTableWriter to write out the incoming streams to disk. Its getFilename method returns just the keyspace/table rather than a complete filename (since it can write out more than one file during its existence). This confuses the map of receivingFiles/sendingFiles in SessionInfo, which is keyed on the output filename.
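
To illustrate the general failure mode, here is a minimal, self-contained sketch. It is not Cassandra's code: the class FilenameKeyedProgress, the nested Progress type, the trackedFiles helper, and the file names are all hypothetical. It only shows that a map keyed on the reported name tracks fewer entries once several files report the same name; it does not attempt to reproduce the exact counts in the netstats output.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilenameKeyedProgress {

    /** Hypothetical stand-in for a per-file progress record (not Cassandra's class). */
    static final class Progress {
        final String fileName;
        final long bytesDone;
        final long bytesTotal;

        Progress(String fileName, long bytesDone, long bytesTotal) {
            this.fileName = fileName;
            this.bytesDone = bytesDone;
            this.bytesTotal = bytesTotal;
        }
    }

    /**
     * Stores each progress event in a map keyed on the reported file name
     * and returns how many distinct files the map ends up tracking.
     */
    static int trackedFiles(List<Progress> events) {
        Map<String, Progress> byName = new HashMap<>();
        for (Progress p : events) {
            // An event whose name is already present overwrites the earlier entry.
            byName.put(p.fileName, p);
        }
        return byName.size();
    }

    public static void main(String[] args) {
        // Sender side (assumed): every file reports its own complete name.
        List<Progress> sent = List.of(
                new Progress("ks/tbl/file-1-Data.db", 100, 100),
                new Progress("ks/tbl/file-2-Data.db", 100, 100),
                new Progress("ks/tbl/file-3-Data.db", 100, 100));

        // Receiver side (assumed): the writer reports only keyspace/table,
        // so all three incoming files share one key.
        List<Progress> received = List.of(
                new Progress("ks/tbl", 100, 100),
                new Progress("ks/tbl", 100, 100),
                new Progress("ks/tbl", 100, 100));

        System.out.println("sender tracks " + trackedFiles(sent) + " files");
        System.out.println("receiver tracks " + trackedFiles(received) + " files");
    }
}
```

In this sketch the sender-side map holds one entry per file, while the receiver-side map collapses to a single entry, so any file count derived from the map's size understates what was actually received.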

I have been planning an update to netstats to correctly output this information again. I'll update this ticket when I have something useful.


was (Author: vincentwhite):
This is because we now use RangeAwareSSTableWriter to write out the incoming streams to disk. Its getFilename method returns just the keyspace/table rather than a complete filename (since it can write out more than one file during it's existence). This confuses the map of receivingFiles/sendingFiles in SessionInfo which is keyed on the output filename. 

I have been planning an update to netstats to correctly output this information again. I'll update this ticket when I have someone useful.

> netstats information mismatch between senders and receivers
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-14192
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14192
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability
>            Reporter: Jonathan Ballet
>            Assignee: Vincent White
>            Priority: Minor
>
> When adding a new node to an existing cluster, the {{netstats}} command, called while the node is joining, shows different statistics on the node receiving the data and on the nodes sending the data.
> Receiving node:
> {code}
> Mode: JOINING
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
>     /172.20.13.184
>     /172.20.30.7
>         Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total
>             [...]
>     /172.20.40.128
>     /172.20.16.45
>         Receiving 405 files, 38.3 GiB total. Already received 86 files, 6.02 GiB total
>             [...]
>     /172.20.9.63
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed   Dropped
> Large messages                  n/a         0              0         0
> Small messages                  n/a         0          11121         0
> Gossip messages                 n/a         0          32690         0
> {code}
> Sending node 1:
> {code}
> Mode: NORMAL
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
>     /172.20.21.19
>         Sending 433 files, 36.64 GiB total. Already sent 433 files, 36.64 GiB total
>             [...]
> Read Repair Statistics:
> Attempted: 680832
> Mismatch (Blocking): 716
> Mismatch (Background): 279
> Pool Name                    Active   Pending      Completed   Dropped
> Large messages                  n/a         2         123307         4
> Small messages                  n/a         2      637010302       509
> Gossip messages                 n/a        23         798851     11535
> {code}
> Sending node 2:
> {code}
> Mode: NORMAL
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
>     /172.20.21.19
>         Sending 405 files, 38.3 GiB total. Already sent 405 files, 38.3 GiB total
>             [...]
> Read Repair Statistics:
> Attempted: 84967
> Mismatch (Blocking): 17568
> Mismatch (Background): 3078
> Pool Name                    Active   Pending      Completed   Dropped
> Large messages                  n/a         2          17818         2
> Small messages                  n/a         2      126082304       507
> Gossip messages                 n/a        34         202810     11725
> {code}
> In this case, the join process has been running for a while and the sending nodes seem to say they have already sent everything. This output stays the same for a while, though (maybe ~15% of the total joining time).
> However, once the sending nodes have sent everything, the receiving node's values stay like this until it moves from this state to the {{NORMAL}} state (so there is visibly no catching up from ~86 files to ~405 files, for example; it goes directly from the state shown above to {{NORMAL}}).
> This makes tracking the progress of the join process more difficult than necessary, because we need to compare and deduce the actual state from both the receiving node's values and the sending nodes' values, which are both "not correct" (the sending nodes say everything has been sent but stay in this state for a long time, while the receiving node says it still needs to download a lot of files/data before finishing).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
