Posted to commits@cassandra.apache.org by "Daniel Norberg (JIRA)" <ji...@apache.org> on 2013/05/07 21:55:16 UTC

[jira] [Commented] (CASSANDRA-5422) Binary protocol sanity check

    [ https://issues.apache.org/jira/browse/CASSANDRA-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651238#comment-13651238 ] 

Daniel Norberg commented on CASSANDRA-5422:
-------------------------------------------

The main issues I identified:

* Contention in the driver: per-connection locks are taken for every request.
* Expensive serialization: the ExecuteMessage codec uses multiple layers of ChannelBuffers.
* No write batching: every message results in an expensive syscall.
* Contention in the stress application: it bottlenecks on a shared work queue and spawns one thread per asynchronous worker.
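
To illustrate the write batching point, here is a minimal sketch in plain Java. It is not code from the driver or from the branches linked below: the class name BatchingWriter and its methods are hypothetical, and an OutputStream stands in for the socket. Request threads only enqueue frames, and a single I/O thread drains whatever has accumulated into one write, so many messages can share one syscall.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical illustration of write batching; all names here are assumptions. */
final class BatchingWriter {
    // Lock-free queue: producers never take a per-connection lock.
    private final ConcurrentLinkedQueue<byte[]> pending = new ConcurrentLinkedQueue<>();
    private final OutputStream out;  // stands in for the socket
    private int writeCalls;          // number of calls to out.write (would-be syscalls)

    BatchingWriter(OutputStream out) {
        this.out = out;
    }

    /** Called by request threads: enqueue only, no I/O. */
    void enqueue(byte[] frame) {
        pending.add(frame);
    }

    /** Called by one I/O thread: drain all queued frames into a single write. */
    int flush() {
        ByteArrayOutputStream batch = new ByteArrayOutputStream();
        int frames = 0;
        for (byte[] frame; (frame = pending.poll()) != null; frames++) {
            batch.write(frame, 0, frame.length);
        }
        if (frames > 0) {
            try {
                out.write(batch.toByteArray());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            writeCalls++;
        }
        return frames;
    }

    int writeCalls() {
        return writeCalls;
    }

    public static void main(String[] args) {
        BatchingWriter writer = new BatchingWriter(new ByteArrayOutputStream());
        for (int i = 0; i < 100; i++) {
            writer.enqueue(new byte[] {1, 2, 3});
        }
        int frames = writer.flush();
        System.out.println(frames + " frames in " + writer.writeCalls() + " write call(s)");
    }
}
```

A real implementation would write into a reusable buffer rather than allocating one per flush; the sketch only shows how the number of writes is decoupled from the number of messages.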

After eliminating contention in the driver and the stress application, optimizing serialization, and adding write batching, I get a throughput of 200k+ requests per second on my laptop (four-core 2 GHz i7 MBP) when making asynchronous requests at a concurrency level of 500. This is with request execution and mutation disabled using the above patch, and with both Cassandra and the stress tool running on Java 7. At this throughput the benchmark uses ~60 MB/sec of bandwidth, so server-grade hardware should be able to saturate 1 Gbit Ethernet interfaces, especially with larger payloads.
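
The bandwidth claim can be checked with back-of-the-envelope arithmetic. The sketch below uses the round figures from the paragraph above (200k requests/s, ~60 MB/sec) and makes two assumptions of its own: 1 MB is taken as 10^6 bytes, and Ethernet/TCP framing overhead is ignored. The class and method names are hypothetical.

```java
/** Rough estimate only; the inputs are the round figures quoted in the comment. */
final class BandwidthEstimate {
    /** Average bytes on the wire per request at a given bandwidth and request rate. */
    static double bytesPerRequest(double bytesPerSec, double requestsPerSec) {
        return bytesPerSec / requestsPerSec;
    }

    /** Request rate at which a link of the given bit rate would be full. */
    static double requestsToSaturate(double linkBitsPerSec, double bytesPerRequest) {
        return linkBitsPerSec / 8.0 / bytesPerRequest;
    }

    public static void main(String[] args) {
        double perRequest = bytesPerRequest(60e6, 200_000);      // about 300 bytes
        double saturating = requestsToSaturate(1e9, perRequest); // about 417k requests/s
        System.out.printf("%.0f bytes/request, %.0f requests/s to fill 1 Gbit/s%n",
                perRequest, saturating);
    }
}
```

Under these assumptions, roughly twice the laptop's request rate would fill a 1 Gbit link at the same payload size, and larger payloads would lower that threshold further.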

https://github.com/danielnorberg/java-driver/tree/optimization
https://github.com/danielnorberg/cassandra/tree/transport-benchmark

{noformat}
5/7/13 2:59:54 PM ==============================================================
com.datastax.driver.stress.Reporter:
  latencies:
             count = 352558280
         mean rate = 230848.13 calls/s
     1-minute rate = 223475.90 calls/s
     5-minute rate = 224159.41 calls/s
    15-minute rate = 190931.94 calls/s
               min = 0.27ms
               max = 124.37ms
              mean = 2.16ms
            stddev = 1.63ms
            median = 1.69ms
              75% <= 2.43ms
              95% <= 5.57ms
              98% <= 6.64ms
              99% <= 8.76ms
            99.9% <= 26.57ms

  requests:
             count = 352559217
         mean rate = 230848.12 requests/s
     1-minute rate = 223474.50 requests/s
     5-minute rate = 224159.75 requests/s
    15-minute rate = 190950.27 requests/s
{noformat}

Suggestions for further work:

* Use a uniform histogram instead of the biased one (the default), as the biased histogram takes expensive read-write locks for every update, i.e. for every request. Alternatively, find a way to eliminate the read-write locking in the biased histogram.
* Make StorageProxy non-blocking and use the jsr166e ForkJoinPool instead of a normal ThreadPoolExecutor (TPE) for a nice throughput boost when working with a large volume of small messages.
* Change the protocol to allow more than 128 outstanding requests per connection.
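
To see why the 128-request limit matters, the benchmark numbers can be related through Little's law (throughput = requests in flight / mean latency). The sketch below plugs in the concurrency of 500 and the 2.16 ms mean latency reported above; the class and method names are hypothetical, and the per-connection figure is only an estimate that assumes latency stays the same.

```java
/** Hypothetical helper relating concurrency, latency and the per-connection limit. */
final class ConnectionBudget {
    static final int MAX_IN_FLIGHT_PER_CONNECTION = 128; // limit cited in the list above

    /** Little's law: throughput = requests in flight / mean latency. */
    static double throughput(int inFlight, double meanLatencySeconds) {
        return inFlight / meanLatencySeconds;
    }

    /** Minimum number of connections needed to keep inFlight requests outstanding. */
    static int connectionsFor(int inFlight) {
        return (inFlight + MAX_IN_FLIGHT_PER_CONNECTION - 1) / MAX_IN_FLIGHT_PER_CONNECTION;
    }

    public static void main(String[] args) {
        double meanLatency = 2.16e-3; // seconds, from the Reporter output
        System.out.printf("500 in flight: about %.0f requests/s%n",
                throughput(500, meanLatency));
        System.out.printf("128 in flight: about %.0f requests/s%n",
                throughput(MAX_IN_FLIGHT_PER_CONNECTION, meanLatency));
        System.out.println("connections needed for 500 in flight: " + connectionsFor(500));
    }
}
```

With these inputs the law gives about 231k requests/s at a concurrency of 500, in line with the measured mean rate, while a single connection capped at 128 outstanding requests would top out near 59k requests/s. Sustaining 500 requests in flight therefore takes at least four connections under the current limit.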

When running normally with request execution enabled, I get ~24k rps. Quick profiling indicates that there are some contention points that could be removed, e.g. the ReentrantReadWriteLock (switchLock) in Table. We should be able to optimize the whole stack to the point where a Cassandra node can achieve a sustained rate of 100k+ writes per second.


                
> Binary protocol sanity check
> ----------------------------
>
>                 Key: CASSANDRA-5422
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5422
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>            Reporter: Jonathan Ellis
>            Assignee: Daniel Norberg
>         Attachments: 5422-test.txt
>
>
> With MutationStatement.execute turned into a no-op, I only get about 33k insert_prepared ops/s on my laptop.  That is: this is an upper bound for our performance if Cassandra were infinitely fast, limited by netty handling the protocol + connections.
> This is up from about 13k/s with MS.execute running normally.
> ~40% overhead from netty seems awfully high to me, especially for insert_prepared where the return value is tiny.  (I also used 4-byte column values to minimize that part as well.)
