Posted to commits@cassandra.apache.org by "Dikang Gu (JIRA)" <ji...@apache.org> on 2018/03/06 16:28:00 UTC

[jira] [Updated] (CASSANDRA-13474) Cassandra pluggable storage engine

     [ https://issues.apache.org/jira/browse/CASSANDRA-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dikang Gu updated CASSANDRA-13474:
----------------------------------
    Description: 
Instagram is working on a project to significantly reduce Cassandra's tail latency by implementing a new storage engine, named Rocksandra, on top of RocksDB.

We started with a prototype for the single-column (key-value) use case, and then implemented a full design that supports most of Cassandra's data types and data models, as well as streaming.

After a year of development and testing, we have rolled out Rocksandra to our internal deployments and observed a 3-4X reduction in P99 read latency in general, and a more than 10X reduction for some use cases.

We published a blog post about the wins and the benchmark metrics in an AWS environment: https://engineering.instagram.com/open-sourcing-a-10x-reduction-in-apache-cassandra-tail-latency-d64f86b43589

I think the biggest performance win comes from eliminating most of the Java garbage created by the current read/write path and compactions, which reduces JVM overhead and makes latency more predictable.

We are very excited about the potential performance gain. As the next step, I propose making the Cassandra storage engine pluggable (as in MySQL and MongoDB), and we are very interested in working with the community to provide RocksDB as one storage option with more predictable performance.

Design doc for pluggable storage engine: https://docs.google.com/document/d/1suZlvhzgB6NIyBNpM9nxoHxz_Ri7qAm-UEO8v8AIFsc/edit
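
To illustrate what "pluggable" could mean in practice, here is a minimal Java sketch. It is not taken from the design doc or from Cassandra's code base: the names StorageEngine, InMemoryEngine, and EngineRegistry are hypothetical, and the in-memory map only stands in for a real engine such as one backed by RocksDB. The idea shown is that callers depend on a small interface and the concrete engine is chosen by name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Supplier;

// Hypothetical interface (an assumption for illustration, not Cassandra's API).
interface StorageEngine {
    void put(String key, byte[] value);
    Optional<byte[]> get(String key);
    void delete(String key);
}

// Stand-in engine backed by a sorted in-memory map. A RocksDB-backed
// engine would implement the same interface.
final class InMemoryEngine implements StorageEngine {
    private final ConcurrentSkipListMap<String, byte[]> data = new ConcurrentSkipListMap<>();

    @Override
    public void put(String key, byte[] value) {
        data.put(key, value.clone()); // defensive copy of the caller's array
    }

    @Override
    public Optional<byte[]> get(String key) {
        byte[] value = data.get(key);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }

    @Override
    public void delete(String key) {
        data.remove(key);
    }
}

// Hypothetical registry that maps an engine name (for example, a value
// read from configuration) to a factory for that engine.
final class EngineRegistry {
    private static final Map<String, Supplier<StorageEngine>> FACTORIES = new ConcurrentHashMap<>();

    static void register(String name, Supplier<StorageEngine> factory) {
        FACTORIES.put(name, factory);
    }

    static StorageEngine create(String name) {
        Supplier<StorageEngine> factory = FACTORIES.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown storage engine: " + name);
        }
        return factory.get();
    }
}

public class PluggableEngineSketch {
    public static void main(String[] args) {
        EngineRegistry.register("in-memory", InMemoryEngine::new);

        // The caller only sees the StorageEngine interface.
        StorageEngine engine = EngineRegistry.create("in-memory");
        engine.put("user:1", "alice".getBytes(StandardCharsets.UTF_8));

        String value = engine.get("user:1")
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
                .orElse("<missing>");
        System.out.println(value);
    }
}
```

A real engine interface would need to cover far more than key-value access (partitions, clustering, streaming, compaction hooks); the sketch only shows the selection mechanism.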

  was:
We ran an experiment to switch Cassandra's storage engine to RocksDB.

In the experiment, I built a prototype integrating Cassandra 3.0.12 with RocksDB for the single-column (key-value) use case, shadowed one of our production use cases, and saw a roughly 4-6X drop in P99 read latency during peak time compared to 3.0.12. The P99 latency also became more predictable.

Here is a detailed note with more metrics:

[https://docs.google.com/document/d/1Ztqcu8Jzh4USKoWBgDJQw82DBurQmsV-PmfiJYvu_Dc/edit?usp=sharing]

I think the biggest latency win comes from eliminating most of the Java garbage created by the current read/write path and compactions, which reduces JVM overhead and makes latency more predictable.

We are very excited about the potential performance gain. As the next step, I propose making the Cassandra storage engine pluggable (as in MySQL and MongoDB), and we are very interested in working with the community to provide RocksDB as one storage option with more predictable performance.

Design doc for pluggable storage engine: https://docs.google.com/document/d/1suZlvhzgB6NIyBNpM9nxoHxz_Ri7qAm-UEO8v8AIFsc/edit


> Cassandra pluggable storage engine
> ----------------------------------
>
>                 Key: CASSANDRA-13474
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13474
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Dikang Gu
>            Priority: Major


