Posted to commits@cassandra.apache.org by "Hayato Shimizu (JIRA)" <ji...@apache.org> on 2013/06/18 17:17:20 UTC

[jira] [Comment Edited] (CASSANDRA-5632) Cross-DC bandwidth-saving broken

    [ https://issues.apache.org/jira/browse/CASSANDRA-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686809#comment-13686809 ] 

Hayato Shimizu edited comment on CASSANDRA-5632 at 6/18/13 3:17 PM:
--------------------------------------------------------------------

The patch fixes the issue of bandwidth-saving.

However, two regressions appear to have been introduced.

1. Secondary DC coordinator selection by the primary DC coordinator is not evenly distributed across the available nodes in the secondary DC (a small illustrative sketch follows below).
2. When inserting a row via cqlsh with EACH_QUORUM or ALL consistency and tracing enabled, an RPC timeout occurs from a node that cannot be identified in the trace output.

Trace output is attached for a 6-node cluster with a DC1:3, DC2:3 replication factor configuration. The network-topology configuration is also attached for clarity.
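
To make point 1 concrete, here is a purely illustrative Java sketch (hypothetical class and method names, not Cassandra's actual selection code) contrasting a fixed forwarding target with an evenly distributed one:

{code}
import java.net.InetAddress;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration of remote-DC forwarding-target selection.
public final class ForwardTargetSketch
{
    // Always picking the first replica routes every forwarded mutation
    // through the same node -- the bottleneck behaviour reported in point 1.
    static InetAddress pickFirst(List<InetAddress> remoteReplicas)
    {
        return remoteReplicas.get(0);
    }

    // Picking a replica at random spreads the forwarding load evenly
    // across the secondary DC.
    static InetAddress pickRandom(List<InetAddress> remoteReplicas)
    {
        return remoteReplicas.get(ThreadLocalRandom.current().nextInt(remoteReplicas.size()));
    }
}
{code}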
                
      was (Author: hayato.shimizu):
    The patch fixes the issue of bandwidth-saving.

However, two regressions appear to have been introduced.

1. The secondary DC coordinator node is always the same node, which introduces a bottleneck in the secondary DC.
2. When inserting a row via cqlsh with EACH_QUORUM or ALL consistency and tracing enabled, an RPC timeout occurs from a node that cannot be identified in the trace output.

Trace output is attached for a 6-node cluster with a DC1:3, DC2:3 replication factor configuration. The network-topology configuration is also attached for clarity.
                  
> Cross-DC bandwidth-saving broken
> --------------------------------
>
>                 Key: CASSANDRA-5632
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5632
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.0
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.2.6
>
>         Attachments: 5632.txt, cassandra-topology.properties, fix_patch_bug.log
>
>
> We group messages by destination as follows to avoid sending multiple messages to a remote datacenter:
> {code}
>         // Multimap that holds onto all the messages and addresses meant for a specific datacenter
>         Map<String, Multimap<Message, InetAddress>> dcMessages
> {code}
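> As a rough, self-contained illustration (simplified types, hypothetical class name, assumes Guava on the classpath; not the actual Cassandra code), the grouping only works if every replica in a DC is added under the same message object:
> {code}
> import java.net.InetAddress;
> import java.util.HashMap;
> import java.util.Map;
>
> import com.google.common.collect.HashMultimap;
> import com.google.common.collect.Multimap;
>
> // Illustrative sketch: reusing one message object as the Multimap key makes
> // all replicas of a remote DC collapse under a single entry, so the message
> // can be sent across the WAN once per DC and forwarded locally.
> public final class DcGroupingSketch
> {
>     static <M> Map<String, Multimap<M, InetAddress>> groupByDc(M message, Map<InetAddress, String> replicaToDc)
>     {
>         Map<String, Multimap<M, InetAddress>> dcMessages = new HashMap<String, Multimap<M, InetAddress>>();
>         for (Map.Entry<InetAddress, String> entry : replicaToDc.entrySet())
>         {
>             Multimap<M, InetAddress> messages = dcMessages.get(entry.getValue());
>             if (messages == null)
>             {
>                 messages = HashMultimap.create();
>                 dcMessages.put(entry.getValue(), messages);
>             }
>             // The shared key object is what produces the grouping.
>             messages.put(message, entry.getKey());
>         }
>         return dcMessages;
>     }
> }
> {code}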
> When we cleaned out the MessageProducer stuff for 2.0, this code
> {code}
>                     Multimap<Message, InetAddress> messages = dcMessages.get(dc);
> ...
>                     messages.put(producer.getMessage(Gossiper.instance.getVersion(destination)), destination);
> {code}
> turned into
> {code}
>                     Multimap<MessageOut, InetAddress> messages = dcMessages.get(dc);
> ...
>                     messages.put(rm.createMessage(), destination);
> {code}
> Thus, we weren't actually grouping anything anymore -- each destination replica was stored under a separate Message key, unlike under the old CachingMessageProducer.
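> The shape of a fix (a minimal sketch using a hypothetical helper; the attached 5632.txt is the authoritative patch) is to create the MessageOut once per mutation and reuse it as the Multimap key:
> {code}
> // Sketch only, hypothetical helper -- not the committed patch.
> private static void addDestinations(Multimap<MessageOut, InetAddress> messages, RowMutation rm, Iterable<InetAddress> destinations)
> {
>     MessageOut message = rm.createMessage();       // create once per mutation
>     for (InetAddress destination : destinations)
>         messages.put(message, destination);        // shared key, so replicas group under one entry
> }
> {code}
> With a shared key the per-DC send path again sees one message with multiple destinations, restoring the single cross-DC hop.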
