Posted to issues@ignite.apache.org by "Igor Rudyak (JIRA)" <ji...@apache.org> on 2016/07/26 21:52:20 UTC

[jira] [Comment Edited] (IGNITE-3588) Cassandra store should use batching in writeAll and deleteAll methods

    [ https://issues.apache.org/jira/browse/IGNITE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394608#comment-15394608 ] 

Igor Rudyak edited comment on IGNITE-3588 at 7/26/16 9:51 PM:
--------------------------------------------------------------

Valentin,

For *writeAll/readAll*, the Cassandra cache store implementation uses async operations (http://www.datastax.com/dev/blog/java-driver-async-queries) and futures, which provide the best performance characteristics.

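For illustration, here is a minimal sketch (not the actual Ignite/CassandraSessionImpl code) of the async-plus-futures approach with the DataStax Java driver. The keyspace, table and column names are made up for this example:

{code:java}
// Sketch only: issue all writes asynchronously, then wait on the futures.
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AsyncWriteSketch {
    public static void writeAll(Session session, Map<Long, String> entries) {
        // Hypothetical keyspace/table/columns, just for illustration.
        PreparedStatement ps =
            session.prepare("INSERT INTO test_keyspace.cache_table (key, value) VALUES (?, ?)");

        List<ResultSetFuture> futures = new ArrayList<>();

        // Fire all mutations without waiting for each one to complete.
        for (Map.Entry<Long, String> e : entries.entrySet()) {
            BoundStatement st = ps.bind(e.getKey(), e.getValue());
            futures.add(session.executeAsync(st));
        }

        // Block until every asynchronous write has completed.
        for (ResultSetFuture f : futures)
            f.getUninterruptibly();
    }
}
{code}
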
Cassandra's BATCH statement is actually quite often an anti-pattern for those coming from the relational world. The BATCH concept in Cassandra is totally different from its relational counterpart and is not meant for optimizing batch/bulk operations. The main purpose of a Cassandra BATCH is to keep denormalized data in sync, for example when you duplicate the same data into several tables (see the sketch after the links below). All other use cases are not recommended for Cassandra batches: 
 - https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.k4xfir8ij
 - http://christopher-batey.blogspot.com/2015/02/cassandra-anti-pattern-misuse-of.html
 - https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
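
As an illustration of that legitimate use case (keeping denormalized tables in sync), here is a minimal sketch with the DataStax Java driver. The tables and columns (users_by_id, users_by_country) are invented for this example and are not part of the Ignite code:

{code:java}
// Sketch only: a LOGGED batch keeping two denormalized tables consistent.
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class DenormalizedSyncSketch {
    public static void saveUser(Session session, String userId, String email, String country) {
        PreparedStatement byId =
            session.prepare("INSERT INTO users_by_id (user_id, email, country) VALUES (?, ?, ?)");
        PreparedStatement byCountry =
            session.prepare("INSERT INTO users_by_country (country, user_id, email) VALUES (?, ?, ?)");

        // LOGGED batch: both inserts are eventually applied atomically,
        // which is what keeps the two denormalized views in sync.
        BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
        batch.add(byId.bind(userId, email, country));
        batch.add(byCountry.bind(country, userId, email));

        session.execute(batch);
    }
}
{code}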

It's also worth mentioning that in the CassandraCacheStore implementation (actually in CassandraSessionImpl) every operation against Cassandra is wrapped in a loop. The reason is that, in case of failure, up to 20 attempts are made to retry the operation, with incrementally increasing timeouts starting from 100ms and specific exception handling logic (Cassandra host unavailability and so on). Thus it provides a quite reliable persistence mechanism. According to load tests, even on a heavily overloaded Cassandra cluster (CPU load > 10 per core) there were no lost writes/reads/deletes, and at most 6 attempts were needed to perform a single operation.
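
To make the retry idea concrete, here is a simplified sketch of such a loop (again, an illustration of the approach described above, not the actual CassandraSessionImpl source; the exception handling is deliberately reduced to one exception type):

{code:java}
// Sketch only: retry a Cassandra operation up to 20 times with an
// incrementally growing sleep, starting from 100 ms.
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.NoHostAvailableException;

public class RetrySketch {
    private static final int MAX_ATTEMPTS = 20;
    private static final long INITIAL_SLEEP_MS = 100;

    public static ResultSet executeWithRetry(Session session, Statement st) {
        long sleep = INITIAL_SLEEP_MS;
        RuntimeException last = null;

        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return session.execute(st);
            }
            catch (NoHostAvailableException e) {
                // Hosts may become reachable again shortly - wait and retry.
                last = e;

                try {
                    Thread.sleep(sleep);
                }
                catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("Interrupted while retrying", ie);
                }

                sleep += INITIAL_SLEEP_MS; // incrementally increase the timeout
            }
        }

        throw new RuntimeException("Failed to execute statement after " + MAX_ATTEMPTS + " attempts", last);
    }
}
{code}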


> Cassandra store should use batching in writeAll and deleteAll methods
> ---------------------------------------------------------------------
>
>                 Key: IGNITE-3588
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3588
>             Project: Ignite
>          Issue Type: Improvement
>          Components: ignite-cassandra
>    Affects Versions: 1.6
>            Reporter: Valentin Kulichenko
>             Fix For: 1.7
>
>
> In the current implementation, the Cassandra store executes all updates one by one when {{writeAll}} or {{deleteAll}} is called.
> We should add an option to use {{BatchStatement}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)