Posted to commits@cassandra.apache.org by "Blake Eggleston (JIRA)" <ji...@apache.org> on 2014/11/03 17:06:34 UTC

[jira] [Commented] (CASSANDRA-6246) EPaxos

    [ https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194659#comment-14194659 ] 

Blake Eggleston commented on CASSANDRA-6246:
--------------------------------------------

I have an initial implementation here: https://github.com/bdeggleston/cassandra/compare/CASSANDRA-6246?expand=1

It’s still pretty rough; I just wanted to get it to a point where we could get a feel for the performance advantages and decide whether the additional complexity is worth it. There’s also none of the instance gc / optimized failure recovery we’ve been talking about.

I did some performance comparisons over the weekend. The tldr is that epaxos is 10% to 11.5x faster than classic paxos, depending on the workload.

To test, I used a cluster of 3 m3.xlarge instances in us-east, and a 4th instance executing queries against the cluster. Each C* node was in a different AZ. Commit log and data directories were on different disks.

There were 2 tests, each running 10k queries against the cluster. The first test measured throughput using queries that wouldn’t contend with each other. Each query inserted a row for a different partition. The second test measured performance under contention, where every query contended for the same partition. 

Each test was run with 1, 5, & 10 concurrent client requests.
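
The client side of both tests boils down to something like the sketch below (the keyspace, table, and driver setup here are just placeholders for illustration, not the actual test code): each query is a conditional insert, with the uncontended run targeting a distinct partition per query and the contended run hammering a single partition.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class LwtWorkload
{
    public static void main(String[] args) throws Exception
    {
        final boolean contended = Boolean.parseBoolean(args[0]);  // same partition for every query?
        final int concurrency = Integer.parseInt(args[1]);        // 1, 5, or 10 client threads
        final int queries = 10000;

        Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
        final Session session = cluster.connect("perf_ks");       // placeholder keyspace

        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        for (int i = 0; i < queries; i++)
        {
            final int n = i;
            pool.submit(new Runnable()
            {
                public void run()
                {
                    // uncontended: every insert hits a different partition
                    // contended:   every insert races on partition 0
                    int key = contended ? 0 : n;
                    session.execute("INSERT INTO lwt_test (k, v) VALUES (" + key + ", " + n + ") IF NOT EXISTS");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        cluster.close();
    }
}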

With the uncontended workload, epaxos request time is 10-14% faster than the current implementation on average.
See: https://docs.google.com/spreadsheets/d/1olMYCepsE_02bMyfzV0Hke5UKuqoCNNjSIjR9yNs5iI/edit?pli=1#gid=0

With the contended workload, epaxos request time is 4.5x-11.5x faster than the current implementation on average.
See: https://docs.google.com/spreadsheets/d/1olMYCepsE_02bMyfzV0Hke5UKuqoCNNjSIjR9yNs5iI/edit?pli=1#gid=1327463955

There are 2 epaxos sections: regular and cached. At higher levels of contention and request concurrency, the execution algorithm has to visit a lot of unexecuted instances to build its dependency graph. Reading the dependency data and instances out of their tables and deserializing them for each visit slows epaxos down to the point where it’s over twice as slow as classic paxos. By using a guava cache for the instance and dependency data objects, and keeping them around for a few minutes, epaxos is ~30x faster in higher contention/concurrency situations.
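
To give an idea of what the cached variant does, here’s a minimal sketch of that cache (key/value types, expiry window, and sizing below are placeholder values; the actual wiring is in the branch linked above):

import java.util.UUID;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class InstanceCache
{
    // keep recently touched instances in memory for a few minutes so the
    // execution algorithm can rebuild its dependency graph without reading
    // and deserializing them from their tables on every visit
    private final Cache<UUID, Instance> instances = CacheBuilder.newBuilder()
                                                                .expireAfterAccess(5, TimeUnit.MINUTES)
                                                                .maximumSize(100000)
                                                                .build();

    public Instance get(UUID id)
    {
        Instance instance = instances.getIfPresent(id);
        if (instance == null)
        {
            instance = loadFromTable(id);  // slow path: read + deserialize from the instance table
            if (instance != null)
                instances.put(id, instance);
        }
        return instance;
    }

    private Instance loadFromTable(UUID id)
    {
        // placeholder for the existing table read + deserialization
        return null;
    }

    static class Instance { /* dependencies, state, ballot, ... */ }
}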

Some notes on the concurrent contended tests:

* The median query time for epaxos is a little slower than classic paxos for 5 concurrent contended requests. This is because epaxos is now doing an accept phase on a lot of the queries, and because classic paxos doesn’t send commit messages out if the predicate doesn’t apply to the query.
* With concurrent contending queries, 1-2.5% of the classic paxos queries time out and fail. At this level, there are no failing epaxos queries.
* Variance in query times is also much lower with epaxos. With 10 concurrent contending requests, the 95th %ile request time for classic paxos is 23x the median; for epaxos it’s 1.8x.


> EPaxos
> ------
>
>                 Key: CASSANDRA-6246
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Blake Eggleston
>            Priority: Minor
>
> One reason we haven't optimized our Paxos implementation with Multi-paxos is that Multi-paxos requires leader election and hence a period of unavailability when the leader dies.
> EPaxos is a Paxos variant that (1) requires fewer messages than multi-paxos, (2) is particularly useful across multiple datacenters, and (3) allows any node to act as coordinator: http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
> However, there is substantial additional complexity involved if we choose to implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)