Posted to commits@cassandra.apache.org by "Jason Brown (JIRA)" <ji...@apache.org> on 2016/12/09 13:34:58 UTC

[jira] [Commented] (CASSANDRA-8457) nio MessagingService

    [ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735327#comment-15735327 ] 

Jason Brown commented on CASSANDRA-8457:
----------------------------------------

Ok, [~slebresne], next round of review is ready :) I'm currently ferreting out a regression with {{TriggersTest}}, but that shouldn't hold things up any further.

bq. {{MessageOutHandler.SwappingByteBufDataOutputStreamPlus}} - I'm wondering if it's not premature optimization

Yeah, you may be correct here. This is one of the first parts I wrote, and clearly I was worried about the "pack multiple messages into a single flush" issue. It's possible that a simpler implementation, where each message gets its own buffer, might just be best.

However, one counterargument is that when we have a very large message to serialize, think tens of MBs, we would allocate that one large buffer, wait for the entire message to be serialized into it, and only then send/flush it. {{OutboundTcpConnection}} and my current branch will buffer a subset of that data (say 64k) and send/flush that, rather than wait for it all to be serialized. I'm not sure if that's compelling enough at this time, or if we should consider it follow-up work for a separate ticket. Either way, I've removed {{MessageOutHandler.SwappingByteBufDataOutputStreamPlus}}.
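To illustrate the trade-off, here's a minimal sketch of flushing in ~64k pieces while the rest of the message is still being serialized. The class and method names are hypothetical (not from the branch), and the payload is modeled as a plain {{InputStream}} just to keep the example self-contained:

{code:java}
import java.io.IOException;
import java.io.InputStream;

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;

final class ChunkedWriteSketch
{
    private static final int CHUNK_SIZE = 64 * 1024;

    // Copy the serialized payload to the channel in ~64k pieces so earlier bytes hit
    // the wire while later bytes are still being produced, instead of allocating one
    // buffer sized to the whole (possibly tens-of-MB) message.
    static void writeInChunks(ChannelHandlerContext ctx, InputStream serializedPayload) throws IOException
    {
        byte[] chunk = new byte[CHUNK_SIZE];
        int read;
        while ((read = serializedPayload.read(chunk)) != -1)
        {
            ByteBuf buf = ctx.alloc().buffer(read);
            buf.writeBytes(chunk, 0, read);
            ctx.writeAndFlush(buf); // flush this piece now rather than waiting for the full message
        }
    }
}
{code}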

bq. OMC.enqueue and background queue
Unfortunately, there are a few things that keep it from being that simple, assuming we don't want to block threads. The biggest is that we can't write general messages to the channel until the internode messaging handshake completes successfully: it's a three-way handshake, and sending general messages in the middle of it will break it. Further, it's not so clean or obvious how to get the {{ChannelFuture}} instance without *some* degree of blocking (some of which we might be able to avoid with some clever tricks @normanm showed me). Ultimately, we need a single {{Channel}} instance, and guaranteeing there's only one would require blocking.

Another reason I opted for the backlog queue was to keep message delivery to a given peer as close to ordered as possible: in other words, to write as many backlogged messages to the channel as possible before sending the new message. For the vast majority of cassandra's functionality, message ordering is irrelevant. However, I'm wondering how sensitive repair and cluster membership changes might be to this potential edge case of message reordering.
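To make the enqueue/backlog interplay concrete, here's a minimal sketch under the assumptions above. Class and member names are illustrative, {{Object}} stands in for the queued message type, and the real code also has to deal with races between enqueue and handshake completion, which I've elided:

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import io.netty.channel.Channel;

final class BacklogSketch
{
    private final Queue<Object> backlog = new ConcurrentLinkedQueue<>();
    private volatile Channel channel; // null until the three-way handshake succeeds

    void enqueue(Object message)
    {
        Channel ch = channel;
        if (ch == null)
        {
            backlog.add(message); // handshake still in flight; don't touch the channel yet
            return;
        }
        drainBacklog(ch);         // preserve ordering: older messages go out first
        ch.writeAndFlush(message);
    }

    void handshakeComplete(Channel ch)
    {
        channel = ch;
        drainBacklog(ch);
    }

    private void drainBacklog(Channel ch)
    {
        Object queued;
        while ((queued = backlog.poll()) != null)
            ch.write(queued);
        ch.flush();
    }
}
{code}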

bq. I'd expect {{enqueue()}} (which might warrant a rename) to just do a {{ctx.write()}}

Makes sense, done.

That being said, I've left the code in newly-named {{#sendMessage}} and {{#finishHandshake}} a bit tentative, with a note that you and I should hammer this part out further :)

bq. ... flushing ...

I really like what you've done wrt flushing in {{FlushingHandler}} and {{CoalescingStrategies}}. I think that code got twisted and complex because I was trying to maintain both blocking and non-blocking behaviors in {{CoalescingStrategies}}. Thus, I've eliminated {{CoalescingMessageOutHandler}} and brought in your {{FlushingHandler}} and {{CoalescingStrategies}} changes. One thing to point out: when you schedule a task on a netty executor, it executes on the same IO thread as the current context. Thus, in {{FlushingHandler}} you don't have to worry about any concurrency between a new message arriving and the scheduled task executing, as they run on the same thread. Accordingly, I've removed your (well-documented!) javadoc comments and the volatile keyword on the {{scheduledFlush}} member field.
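A minimal sketch of that thread-confinement point (handler and field names are illustrative, not the actual {{FlushingHandler}} code, and the 10-microsecond window is an arbitrary stand-in for the coalescing strategy's delay):

{code:java}
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;

class ScheduledFlushSketch extends ChannelOutboundHandlerAdapter
{
    // Plain field, no volatile: it is only ever read/written from the channel's event loop.
    private ScheduledFuture<?> scheduledFlush;

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
    {
        ctx.write(msg, promise);

        if (scheduledFlush == null)
        {
            // ctx.executor() is the channel's event loop, i.e. the same thread that runs
            // this write(), so the scheduled task cannot race with new message arrivals.
            scheduledFlush = ctx.executor().schedule(() ->
            {
                scheduledFlush = null;
                ctx.flush();
            }, 10, TimeUnit.MICROSECONDS);
        }
    }
}
{code}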

I agree we can address "targeted benchmarking ... in a separate ticket", as I think getting the essential flush and coalesce behavior right is what matters most here.

bq. Regarding dropped messages, the current implementation was going through the {{MessagingService.incrementDroppedMessages()}} ...

Hmm, {{OutboundTcpConnection}} has its own [counter for connection-specific dropped messages|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L136], which I've maintained in {{OutboundMessagingConnection}}. It seems like {{OutboundTcpConnection}} should probably be reporting its dropped messages to {{MessagingService.incrementDroppedMessages()}} in addition to what it currently does. If you think that's legit, I'll fix it in this patch and open a separate ticket to fix it in 3.0+. wdyt?
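Roughly what I have in mind, as a hedged sketch rather than an excerpt (the exact {{incrementDroppedMessages}} overload may differ, and the surrounding class is simplified):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

final class DroppedMessageAccountingSketch
{
    private final AtomicLong droppedMessageCount = new AtomicLong(); // connection-specific counter, as today

    void noteDroppedMessage(MessageOut<?> message)
    {
        droppedMessageCount.incrementAndGet();                             // existing per-connection accounting
        MessagingService.instance().incrementDroppedMessages(message.verb); // proposed addition: feed global metrics too
    }
}
{code}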

bq. InboundHandshakeHandler#handshakeHandlerChannelHandlerName

Removed this. I think I needed it with the earlier version of netty? Either way, killed it.

bq. In {{OutboundMessagingConnection#writeBacklogToChannel}}, we seem to be sending timed out messages if those are retried

This behavior was changed in CASSANDRA-12192, so I've retained it here.

bq. {{MessagingService.MessageSender}}

Removed

bq. InboundHandshakeHandler#setupMessagingPipeline

Luckily the comment is not relevant anymore; too bad I didn't fix that up when I fixed the code :D 

bq. pull out the fix to {{AnticompactionRequest}} into a separate ticket

Done in CASSANDRA-12934.

> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: netty, performance
>             Fix For: 4.x
>
>
> Thread-per-peer (actually two per peer: one each for incoming and outbound) is a big contributor to context switching, especially for larger clusters.  Let's look at switching to nio, possibly via Netty.


