Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2017/03/15 09:37:42 UTC

[jira] [Commented] (CASSANDRA-8457) nio MessagingService

    [ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925803#comment-15925803 ] 

Sylvain Lebresne commented on CASSANDRA-8457:
---------------------------------------------

Finally got time to look at the new version, sorry it wasn't sooner.

In general, this looks pretty good, and while I have a number of remarks and suggestions, they are somewhat minor.

First, I took a stab at addressing some of my comments/suggestions when it felt simpler than writing a wordy comment, and pushed the resulting branch [here|https://github.com/pcmanus/cassandra/commits/8457-review]. I won't go into too much detail as I tried to explain the rationale for each commit in the commit messages, but I want to mention the 3rd commit as it's the only substantial one. Basically, I felt {{OutboundMessagingConnection}} was a bit dense, and having the exact details of the flushing behavior split across different classes ({{OutboundMessagingConnection}} and {{MessageOutHandler}} mostly) made things slightly less easily understandable. So that commit is mostly my attempt at grouping the flush behavior in a single place. Hopefully I didn't break anything.

Anyway, I tried to break those suggested commits up as much as I could so it's easy to skip any you don't agree with.

Outside of that, a few additional remarks:
* In {{SSLFactory}}, it's confusing what the difference between {{createSslContext}} and {{getSslContext}} is, and when one is supposed to use which.
* In {{OutboundMessagingConnection}}:
** I'd prefer using an {{AtomicReference}} for {{state}} (and an {{AtomicBoolean}} for {{scheduledFlushState}}, but my branch above already does that one). We're not supposed to create new connections all the time, and we don't create that many of them overall, so I doubt saving a few small allocations makes a meaningful difference here, while the field updater makes things slightly less readable (don't get me wrong, field updaters are a good trick, and we already use them in other places, but I prefer keeping them for where they demonstrably make a difference; otherwise it's premature optimization).
** At the beginning of {{sendMessage()}}, shouldn't we check that the connection hasn't been closed?
** The naming of {{MIN_MESSAGES_FOR_COALESCE}}, and more generally the phrasing of the comments on {{otc_coalescing_enough_coalesced_messages}} in {{cassandra.yaml}}, wasn't all that clear to me on first read. Not necessarily new to this patch though.
** We set a connection timeout in {{connect()}}, but we also have the exact same timeout in {{OutboundHandshakeHandler}}. Can't we simplify a bit and remove the latter one?
** In {{connectionTimeout()}}, what happens when a connection attempt times out? It looks like we drop the backlog on the floor without logging any message or attempting to reconnect. Am I missing something?
** In {{finishHandshake}}, in the DISCONNECT/NEGOTIATION_FAILURE case, we don't seem to actively try to reconnect. I suppose we rely on the next message triggering that reconnection? Shouldn't we reconnect more actively? We may have messages in the backlog and we don't know when the next message will come.
* In {{InboundHandshakeHandler.handleMessagingStartResponse()}}, maybe we should check the return value of {{handshakeTimeout.cancel()}} and exit early if it's {{false}}, rather than reading from a closed socket and getting a random error. Also, as we cancel the timeout before setting the state to "complete", I think it would be safer to check for {{if (state == COMPLETE || (handshakeTimeout != null && handshakeTimeout.isCancelled()))}} in {{failHandshake()}} (admittedly this isn't really racy because we're on the netty event loop, but it doesn't feel harmful to future proof in case that changes).
* Regarding the sending and receiving buffer size handling:
** we don't seem to handle the defaults for these buffers the same way for inbound and outbound connections. For outbound connections, we use the default defined in {{OutboundConnectionParams}}, but for inbound ones it appears we just fall back to whatever Netty defaults to (in {{NettyFactory.createInboundChannel}}). Can we consolidate that (we probably want to always use our own defaults)?
** since we use separate connections for inbound and outbound traffic, I assume we barely ever use the receive buffer on outbound connections and vice-versa. That is, the handshake sends messages both ways in both cases, but the messages exchanged are tiny. Would it make sense to hardcode the size of the buffer each side barely uses to something big enough for the handshake but small otherwise? That way, when users configure {{DatabaseDescriptor.getInternodeRecvBufferSize/getInternodeSendBufferSize}}, it only impacts what I assume they truly mean to impact.
** in the {{Builder.build}} method, it feels weird to only check {{sendReceiveBufferSize}} as a precondition (and not {{receiveBufferSize}}). I mean, we're never passing a bad value in practice, but that's just as true of {{sendReceiveBufferSize}}.
* Nit: in {{NettyFactory}}, {{HANDSHAKE_HANDLER_CHANNEL_HANDLER_NAME}} is a mouthful. I'm sure no-one will get confused if we rename to {{HANDSHAKE_HANDLER_NAME}}.
* The patch still has a bunch of TODOs.
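To illustrate the {{AtomicReference}} suggestion above, here's a rough, self-contained sketch of what the {{state}} handling could look like. The {{State}} values and the transition method below are made up for the example, not taken from the patch:

```java
import java.util.concurrent.atomic.AtomicReference;

// Rough sketch only: the State values and transition method are invented
// for illustration, not copied from OutboundMessagingConnection.
public class ConnectionStateSketch
{
    enum State { NOT_READY, CREATING_CONNECTION, READY, CLOSED }

    // One small allocation per connection, in exchange for not needing an
    // AtomicReferenceFieldUpdater and its reflective setup.
    private final AtomicReference<State> state = new AtomicReference<>(State.NOT_READY);

    // The CAS transition reads the same as with a field updater.
    boolean markCreating()
    {
        return state.compareAndSet(State.NOT_READY, State.CREATING_CONNECTION);
    }

    public static void main(String[] args)
    {
        ConnectionStateSketch conn = new ConnectionStateSketch();
        System.out.println(conn.markCreating()); // true: first transition wins the CAS
        System.out.println(conn.markCreating()); // false: state is no longer NOT_READY
    }
}
```

Since we create connections rarely, the extra object is noise; the readability win is what matters.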
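And a minimal sketch of the {{handshakeTimeout.cancel()}} early-exit suggested above, using a plain JDK {{ScheduledFuture}} as a stand-in for netty's scheduled timeout task (all names here are illustrative, not the actual {{InboundHandshakeHandler}} code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Illustrative only: stands in for the scheduled handshake timeout in
// InboundHandshakeHandler; none of these names come from the patch.
public class HandshakeTimeoutSketch
{
    public static void main(String[] args)
    {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        ScheduledFuture<?> handshakeTimeout =
            timer.schedule(() -> System.out.println("handshake timed out"), 1, TimeUnit.HOURS);

        // cancel() returns false if the timeout task already ran (or was
        // already cancelled); in that case the socket is being torn down, so
        // exit early instead of reading from it and getting a random error.
        if (!handshakeTimeout.cancel(false))
        {
            System.out.println("lost the race with the timeout; aborting");
            timer.shutdown();
            return;
        }
        System.out.println("timeout cancelled; safe to proceed with the handshake");
        timer.shutdown();
    }
}
```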

> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: netty, performance
>             Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big contributor to context switching, especially for larger clusters.  Let's look at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)