Posted to commits@cassandra.apache.org by "Jason Brown (JIRA)" <ji...@apache.org> on 2018/01/18 20:46:03 UTC

[jira] [Updated] (CASSANDRA-14174) Remove GossipDigestSynVerbHandler#doSort()

     [ https://issues.apache.org/jira/browse/CASSANDRA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Brown updated CASSANDRA-14174:
------------------------------------
    Description: 
I have personally tripped up on this function a couple of times over the years, believing that it contributes to bugs in some way or another. While I have not found that (necessarily!) to be the case, I feel this function is completely useless in the grand scope of things.

Going back through the mists of time (that is, {{git log}}), it appears this function was part of the original code drop from Facebook when they open-sourced Cassandra. Looking at the {{#doSort()}} method, all it does is sort the incoming list of {{GossipDigest}} s by the difference between the remote node's maxValue for a given peer and the local node's maxValue.
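
To make the sort concrete, here is a minimal sketch of that kind of ordering. It is not the actual {{#doSort()}} code: the {{Digest}} class, the {{sortByGap}} helper, the string endpoints, and the choice to treat an unknown peer as local value 0 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class DigestSortSketch {
    /** Hypothetical stand-in for a gossip digest: a peer and the highest value the remote node has for it. */
    static final class Digest {
        final String endpoint;
        final int maxValue;

        Digest(String endpoint, int maxValue) {
            this.endpoint = endpoint;
            this.maxValue = maxValue;
        }
    }

    /**
     * Returns a new list ordered by descending gap between the local and remote
     * maxValue for each peer. Peers unknown locally are treated as value 0
     * (an assumption of this sketch).
     */
    static List<Digest> sortByGap(List<Digest> remote, Map<String, Integer> localMax) {
        List<Digest> sorted = new ArrayList<>(remote); // a fresh allocation on every call
        sorted.sort(Comparator.comparingInt(
                (Digest d) -> Math.abs(localMax.getOrDefault(d.endpoint, 0) - d.maxValue)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Digest> remote = List.of(
                new Digest("10.0.0.1", 12),
                new Digest("10.0.0.2", 90),
                new Digest("10.0.0.3", 40));
        Map<String, Integer> localMax = Map.of("10.0.0.1", 10, "10.0.0.2", 15);

        // Gaps are 2, 75 and 40, so the order becomes 10.0.0.2, 10.0.0.3, 10.0.0.1.
        for (Digest d : sortByGap(remote, localMax)) {
            System.out.println(d.endpoint + " " + d.maxValue);
        }
    }
}
```

Even in this toy form, each call copies the list and runs a comparison sort, which is the kind of per-message work the proposal is about.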

The only universe where this is actually an optimization is the one described in the [Scuttlebutt paper|https://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf] (upon which Cassandra's Gossip anti-entropy reconciliation is based). The end of section 3.2 describes ordering the incoming digests so that, in the case where you do not return all of the differences (because you are optimizing for the return message size), you can gather the differences for the peers which are most out of sync. The ordering implemented in Cassandra is the second ordering described in the paper, called "scuttle depth".

As we always send all differences between two nodes (message size be damned), this optimization, borrowed from the paper, is largely irrelevant for Cassandra's purposes.
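
A toy illustration of why the ordering stops mattering once everything is sent. This is not Cassandra code: {{peersCovered}} and the {{cap}} parameter (a hypothetical reply-size limit, counted in peers) are assumptions made for this sketch.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeltaCapSketch {
    /**
     * Visits peers in the given order and returns those whose differences
     * would fit in a reply limited to {@code cap} peers.
     */
    static Set<String> peersCovered(List<String> orderedPeers, int cap) {
        Set<String> covered = new LinkedHashSet<>();
        for (String peer : orderedPeers) {
            if (covered.size() >= cap) {
                break; // reply is full; remaining peers are skipped
            }
            covered.add(peer);
        }
        return covered;
    }

    public static void main(String[] args) {
        List<String> sorted = List.of("b", "c", "a");   // most out-of-sync peer first
        List<String> unsorted = List.of("a", "b", "c"); // arrival order

        // With a cap, the ordering decides which peers make it into the reply.
        System.out.println(peersCovered(sorted, 2).equals(peersCovered(unsorted, 2)));

        // Without a cap, both orderings cover exactly the same peers.
        System.out.println(peersCovered(sorted, Integer.MAX_VALUE)
                .equals(peersCovered(unsorted, Integer.MAX_VALUE)));
    }
}
```

The first comparison is false and the second is true: with no limit on the reply, sorting changes only the iteration order, not what is sent.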

Thus, I propose we remove this method for the following gains:
 - less garbage created
 - less CPU (sure, it's mostly trivial; see next point)
 - less time spent on unnecessary functionality on the *single threaded* gossip stage.

  was:
I have personally tripped up on this function a couple of times over the years, believing that it contributes to bugs in some way or another. While I have not found that (necessarily!) to be the case, I feel this function is completely useless in the grand scope of things.

Going back through the mists of time (that is, {{git log}}), it appears this function was part of the original code drop from Facebook when they open sourced cassandra. Looking at the {{#doSort()}} method, all it does is sort the incoming list of \{{GossipDigest}} s by the difference between the remote node's maxValue for a given peer and the local nodes' maxValue.

The only universe where is actually an optimization is if you go back and read the [Scuttlebutt paper|https://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf] (upon which cassandra's Gossip anti-reconcilliation is based). The end of section 3.2 describes ordering of the incoming digests such that, in the case where you do not return all of the differences (because you are optimizing for the return message size), you can gather the differences for the peers which are most of out sync. The ordering implemented in cassandra is the second ordering described in the paper, called "scuttle depth".

As we always send all differences between two nodes (message size be damned), this optimization, borrowed from the paper, is largely irrelevant for Cassandra's purposes.

Thus, I propose we remove this method for the following gains:
 - less garbage created
 - less CPU (sure, it's mostly trivial; see next point)
 - less time spent on unnecessary functionality on the *single threaded* gossip stage.


> Remove GossipDigestSynVerbHandler#doSort()
> ------------------------------------------
>
>                 Key: CASSANDRA-14174
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14174
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>             Fix For: 4.x
>
>
> I have personally tripped up on this function a couple of times over the years, believing that it contributes to bugs in some way or another. While I have not found that (necessarily!) to be the case, I feel this function is completely useless in the grand scope of things.
> Going back through the mists of time (that is, {{git log}}), it appears this function was part of the original code drop from Facebook when they open-sourced Cassandra. Looking at the {{#doSort()}} method, all it does is sort the incoming list of {{GossipDigest}} s by the difference between the remote node's maxValue for a given peer and the local node's maxValue.
> The only universe where this is actually an optimization is the one described in the [Scuttlebutt paper|https://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf] (upon which Cassandra's Gossip anti-entropy reconciliation is based). The end of section 3.2 describes ordering the incoming digests so that, in the case where you do not return all of the differences (because you are optimizing for the return message size), you can gather the differences for the peers which are most out of sync. The ordering implemented in Cassandra is the second ordering described in the paper, called "scuttle depth".
> As we always send all differences between two nodes (message size be damned), this optimization, borrowed from the paper, is largely irrelevant for Cassandra's purposes.
> Thus, I propose we remove this method for the following gains:
>  - less garbage created
>  - less CPU (sure, it's mostly trivial; see next point)
>  - less time spent on unnecessary functionality on the *single threaded* gossip stage.


