Posted to commits@cassandra.apache.org by "Anubhav Kale (JIRA)" <ji...@apache.org> on 2016/06/17 19:41:05 UTC

[jira] [Commented] (CASSANDRA-4663) Streaming sends one file at a time serially.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336787#comment-15336787 ] 

Anubhav Kale commented on CASSANDRA-4663:
-----------------------------------------

I made a change to RangeStreamer to create multiple StreamSessions per host (splitting the token ranges into as many chunks as there are sockets). I saw a performance improvement (time-wise) of ~33%.
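
To illustrate the splitting idea only, here is a minimal sketch in Python rather than Cassandra's Java. It is not the actual RangeStreamer change: the `split_ranges` function, the tuple representation of a token range, and the round-robin assignment are all assumptions made for the example, and the real patch may divide the ranges differently.

```python
from typing import List, Tuple

# Hypothetical representation of a token range as (left, right) bounds.
TokenRange = Tuple[int, int]


def split_ranges(ranges: List[TokenRange], connections: int) -> List[List[TokenRange]]:
    """Distribute token ranges round-robin into at most `connections` chunks.

    Each returned chunk would be handed to its own stream session.
    Round-robin assignment is an assumption for illustration.
    """
    if connections < 1:
        raise ValueError("connections must be >= 1")
    # Never create more chunks than there are ranges to stream.
    chunk_count = min(connections, len(ranges))
    chunks: List[List[TokenRange]] = [[] for _ in range(chunk_count)]
    for index, token_range in enumerate(ranges):
        chunks[index % chunk_count].append(token_range)
    return chunks


if __name__ == "__main__":
    ranges = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
    for session_id, chunk in enumerate(split_ranges(ranges, 2)):
        print(session_id, chunk)
```

With one connection this yields a single chunk holding every range, which corresponds to the current one-session-per-host behavior.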

Since the same code is used for bootstrap and nodetool rebuild, it will help in both cases. The one side effect that operators need to be aware of is the number of SSTables created on the destination, which grows with the number of splits.

I suggest we add a -par option to the nodetool rebuild command and let operators provide the number of connections. For bootstrap, we can provide a yaml setting that defaults to 1. (If we do decide to add a yaml setting, do I need to worry about breaking compatibility across versions?)

If that makes sense, I will create a patch for trunk. 

> Streaming sends one file at a time serially. 
> ---------------------------------------------
>
>                 Key: CASSANDRA-4663
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4663
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Priority: Minor
>
> This is not fast enough when someone is using SSDs and perhaps a 10G link. We should try to create multiple connections and send multiple files in parallel. 
> The current approach underutilizes the link (even 1G).
> This change will improve the bootstrapping time of a node. 


