Posted to commits@cassandra.apache.org by "Jason Brown (JIRA)" <ji...@apache.org> on 2017/08/16 22:15:00 UTC

[jira] [Commented] (CASSANDRA-12229) Move streaming to non-blocking IO and netty (streaming 2.1)

    [ https://issues.apache.org/jira/browse/CASSANDRA-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129518#comment-16129518 ] 

Jason Brown commented on CASSANDRA-12229:
-----------------------------------------

I've posted a few fixes for correcting streaming preview checks and KeepAlive messages. These were uncovered by running the dtests, which are now passing.

To answer [~aweisberg]'s question about performance, here is the testing methodology I've been using:

- bring up a two-node cluster
- run stress, on another machine, for 10 minutes, with 500 threads. In my environment, this ended up with about 12GB of data per node.
- bring up a third node, on a new machine, and let it bootstrap

To measure how long streaming took, I've basically just taken the timestamp difference between these lines:
- {code}StreamResultFuture.java:90 - [Stream #c1785760-82cf-11e7-8381-a3a1a6cf0d28] Executing streaming plan for Bootstrap{code} (start)
- {code}StorageService.java:1458 - Bootstrap completed!{code} (end)
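The timestamp difference can also be computed with a small script rather than by hand. The following is only a sketch: the two marker strings come from the log lines quoted above, but the timestamp layout ({{yyyy-MM-dd HH:mm:ss,SSS}}) and the helper names {{parse_ts}} and {{bootstrap_duration}} are assumptions made for illustration, so adjust the pattern to match the actual logging configuration.

```python
import re
from datetime import datetime, timedelta

# ASSUMPTION: each log line carries a timestamp such as "2017-08-16 21:00:00,000".
# The layout is not shown in the comment above; change the pattern if it differs.
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")

# Marker strings taken from the two log lines quoted above.
START_MARKER = "Executing streaming plan for Bootstrap"
END_MARKER = "Bootstrap completed!"


def parse_ts(line):
    """Return the first timestamp found in the line, or None if there is none."""
    match = TS_PATTERN.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")


def bootstrap_duration(lines):
    """Return the time between the first start marker and the next end marker.

    Returns None if either marker (or its timestamp) is missing.
    """
    start = None
    end = None
    for line in lines:
        if start is None and START_MARKER in line:
            start = parse_ts(line)
        elif start is not None and END_MARKER in line:
            end = parse_ts(line)
            break
    if start is None or end is None:
        return None
    return end - start
```

It could be fed the joining node's log, for example {{bootstrap_duration(open(path))}} where {{path}} points at whichever log file the third node writes.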

Bootstrapping and streaming on trunk averaged about 3:00 minutes in duration. The netty-based code averaged about 2:25 minutes. So basically it's about a 15-20% improvement in overall streaming times.
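
As a sanity check on that percentage, the arithmetic from the two averages can be written out as below. The helper names {{to_seconds}} and {{percent_reduction}} are made up for this sketch; the only inputs are the 3:00 and 2:25 averages quoted above.

```python
def to_seconds(mm_ss):
    """Convert an "M:SS" string such as "2:25" into whole seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_reduction(baseline, candidate):
    """Percentage by which candidate is shorter than baseline."""
    return (baseline - candidate) / baseline * 100.0


trunk = to_seconds("3:00")  # average on trunk
netty = to_seconds("2:25")  # average with the netty-based code
reduction = percent_reduction(trunk, netty)
print(f"{reduction:.1f}% less wall-clock time")
```

With these two averages the reduction lands at the upper end of the 15-20% range.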



> Move streaming to non-blocking IO and netty (streaming 2.1)
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-12229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12229
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>             Fix For: 4.0
>
>
> As followup work to CASSANDRA-8457, we need to move streaming to use netty.
> Streaming 2.0 (CASSANDRA-5286) brought many good improvements to how files are transferred between nodes in a cluster. However, the low-level details of the current streaming implementation do not line up nicely with a non-blocking model, so I think this is a good time to review some of those details and add in additional goodness. The current implementation assumes a sequential or "single threaded" approach to the sending of stream messages as well as the transfer of files. In short, after several iterative prototypes, I propose the following:
> 1) use a single bi-directional connection (instead of requiring two sockets & two threads)
> 2) send the "non-file" {{StreamMessage}}s (basically anything not {{OutboundFileMessage}}) via the normal internode messaging. This will require a slight bit more management of the session (the ability to look up a {{StreamSession}} from a static function on {{StreamManager}}), but we have most of the pieces we need for this already.
> 3) switch to a non-blocking IO model (facilitated via netty)
> 4) Allow files to be streamed in parallel (CASSANDRA-4663) - this should just be a thing already
> 5) If the entire sstable is to be streamed, transfer all the components of the sstable (primary index, bloom filter, stats, and so on) in addition to the DATA component. This way we can avoid the CPU and GC pressure from deserializing the stream into objects. File streaming then amounts to a block-level transfer.
> Note: The progress/results of CASSANDRA-11303 will need to be reflected here, as well.


