Posted to commits@cassandra.apache.org by "Jason Brown (JIRA)" <ji...@apache.org> on 2016/12/02 19:28:58 UTC

[jira] [Comment Edited] (CASSANDRA-12966) Gossip thread slows down when using batch commit log

    [ https://issues.apache.org/jira/browse/CASSANDRA-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716014#comment-15716014 ] 

Jason Brown edited comment on CASSANDRA-12966 at 12/2/16 7:28 PM:
------------------------------------------------------------------

[~spodxx@gmail.com] thanks for taking a look.

bq. there are now two updateToken versions, one blocking and one asynchronous. Maybe async methods should be named differently

At a minimum, renaming {{updateTokens(final InetAddress, final Collection<Token>)}} to {{updatePeerTokens}} (or something similar) makes sense to differentiate it from {{updateTokens(Collection<Token>)}}. Not sure I'm a fan of adding the "Async" suffix to a method's name to indicate an aspect of its behavior, but there is precedent for that in the code base. Alternatively, we could add a boolean to the method signature that lets the caller block on the result; not sure I like that so much, either. wdyt?
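For illustration only, here is a minimal Java sketch of the two naming options above. It assumes a hypothetical `PeerTokenStore` class (not Cassandra's actual `SystemKeyspace` code) with tokens simplified to strings. Option A gives the asynchronous peer update its own name, `updatePeerTokens`, and returns a `Future` so a caller can still wait if it must. Option B keeps the overloaded name and adds a `block` flag. The `persist` method is a hypothetical placeholder for a write that waits on the commit log sync.

```java
import java.net.InetAddress;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the peers-table writer; tokens are simplified to Strings.
class PeerTokenStore {
    // One daemon background thread: peer updates are applied in submission order
    // and never block the caller (e.g. the Gossip thread).
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "peer-token-writer");
        t.setDaemon(true);
        return t;
    });
    final Map<InetAddress, Collection<String>> peers = new ConcurrentHashMap<>();
    volatile Collection<String> localTokens = List.of();

    // Local node's tokens: stays blocking.
    void updateTokens(Collection<String> tokens) {
        persist();
        localTokens = tokens;
    }

    // Option A: a distinct name for the asynchronous peer update.
    // Returns a Future so a caller that needs the write to be durable can still wait.
    Future<?> updatePeerTokens(InetAddress peer, Collection<String> tokens) {
        return writer.submit(() -> {
            persist();
            peers.put(peer, tokens);
        });
    }

    // Option B: keep one name and add a flag that lets the caller block on the result.
    void updateTokens(InetAddress peer, Collection<String> tokens, boolean block) {
        Future<?> result = updatePeerTokens(peer, tokens);
        if (!block) {
            return;
        }
        try {
            result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException("peer token update failed", e.getCause());
        }
    }

    // Placeholder for a write that waits on the commit log sync in batch mode.
    private void persist() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A single-threaded executor is used here so that successive updates for the same peer are applied in the order they were submitted; a pool with several threads could reorder them.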


was (Author: jasobrown):
[~spodxx@gmail.com] thanks for taking a look.

bq. there are now two updateToken versions, one blocking and one asynchronous. Maybe async methods should be named differently

At a minimum, renaming {{updateTokens(final InetAddress, final Collection<Token>)}} to {{updatePeerTokens}} (or something similar) makes sense to differentiate it from {{updateTokens(Collection<Token>)}}. Not sure I'm a fan of adding the "Async" suffix to a method's name to indicate an aspect of its behavior, but there is precedent for that in the code base. wdyt?

> Gossip thread slows down when using batch commit log
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12966
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12966
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>
> When using batch commit log mode, the Gossip thread slows down when updating the peers table after a node bounces. This is because we perform a bunch of updates to the peers table via {{SystemKeyspace.updatePeerInfo}}, which is a synchronized method. How long each of those individual updates takes depends on how busy the system is at the time with respect to write traffic. If the system is largely quiescent, each update will be relatively quick (just waiting for the fsync). If the system is getting a lot of writes, and depending on the commitlog_sync_batch_window_in_ms, each of the Gossip thread's updates can get stuck in the backlog, which causes the Gossip thread to stop processing. We have observed in large clusters that a rolling restart triggers and exacerbates this behavior.
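The stall described above can be reproduced with a toy model. This is a sketch under stated assumptions: `PeersTableModel` and `GossipStallDemo` are hypothetical names, and each write is assumed to wait one full sync window before returning, which approximates a busy node rather than a quiescent one. Because the method is `synchronized` and the caller applies updates one after another, the waits add up linearly: ten updates with a 20 ms window hold the caller for at least 200 ms.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of a synchronized peers-table writer in batch commit log mode.
// Assumption: every write waits a full sync window before it is acknowledged
// (a worst case; real latency depends on load and commitlog_sync_batch_window_in_ms).
class PeersTableModel {
    private final long syncWindowMs;
    private int rowsWritten;

    PeersTableModel(long syncWindowMs) {
        this.syncWindowMs = syncWindowMs;
    }

    // Same shape as described in the issue: synchronized, and it does not return
    // until the (simulated) commit log sync has completed. The arguments are
    // unused in this model; they only mirror a peer/column/value update.
    synchronized void updatePeerInfo(String peer, String column, String value) {
        rowsWritten++;
        awaitBatchSync();
    }

    synchronized int rowsWritten() {
        return rowsWritten;
    }

    // Sleeping while holding the monitor models the lock being held across the sync wait.
    private void awaitBatchSync() {
        try {
            Thread.sleep(syncWindowMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class GossipStallDemo {
    // Applies one peers-table update per peer on the calling thread, as a gossip
    // handler might after a node bounces, and returns how long the caller was held.
    static long applyOnCallingThread(PeersTableModel table, int updates) {
        long start = System.nanoTime();
        for (int i = 0; i < updates; i++) {
            table.updatePeerInfo("10.0.0." + i, "tokens", "token-" + i);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        PeersTableModel table = new PeersTableModel(20);
        long heldMs = applyOnCallingThread(table, 10);
        // Serialized waits add up: roughly 10 updates x 20 ms.
        System.out.println("caller blocked for " + heldMs + " ms");
    }
}
```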



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)