Posted to commits@cassandra.apache.org by "mlowicki (JIRA)" <ji...@apache.org> on 2015/01/16 09:00:55 UTC
[jira] [Created] (CASSANDRA-8636) Inconsistencies between two tables if BATCH used
mlowicki created CASSANDRA-8636:
-----------------------------------
Summary: Inconsistencies between two tables if BATCH used
Key: CASSANDRA-8636
URL: https://issues.apache.org/jira/browse/CASSANDRA-8636
Project: Cassandra
Issue Type: Bug
Environment: Cassandra 2.1.2, cqlengine 0.20.0, Debian Wheezy
Reporter: mlowicki
Two tables:
* The first, *entity*, has a log-like structure: whenever an entity is modified we create a new version of it and insert it into the table with a new mtime, which is part of the compound key. The old version is removed.
* The second, *entity_by_id*, is a manually managed index for *entity*. Given only an id, you can get an entity's basic attributes from *entity_by_id*.
When adding an entity we do two inserts, first into *entity* and then into *entity_by_id*.
When deleting we follow the same order, so the record is removed from the *entity* table first.
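The dual-write pattern above can be sketched with a minimal in-memory stand-in for the two tables; the helper names and key layout are illustrative, not the actual application code:

```python
# In-memory stand-in for the two Cassandra tables, illustrating the
# ordered dual-write pattern described above (hypothetical helpers).
entity = {}        # keyed by (user_id, data_type_id, version, id)
entity_by_id = {}  # manual index, keyed by (user_id, id)

def add_entity(user_id, data_type_id, version, entity_id, attrs):
    # Insert into *entity* first, then into the *entity_by_id* index.
    entity[(user_id, data_type_id, version, entity_id)] = attrs
    entity_by_id[(user_id, entity_id)] = {
        'data_type_id': data_type_id,
        'version': version,
        'parent_id': attrs.get('parent_id'),
    }

def delete_entity(user_id, data_type_id, version, entity_id):
    # Deletes follow the same order: *entity* first, then the index.
    del entity[(user_id, data_type_id, version, entity_id)]
    del entity_by_id[(user_id, entity_id)]
```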
It turned out that these two tables were inconsistent: *entity_by_id* contained ~260 records with no corresponding record in *entity*. The other direction was much worse: ~7000 records in *entity* had no corresponding record in *entity_by_id*, and that number was growing much faster.
We were using LOCAL_QUORUM across two datacenters. We didn't get any exceptions during inserts or deletes. cqlengine's BatchQuery was used.
If BatchQuery is not used:
{code}
 with BatchQuery() as b:
-    entity.batch(b).save()
-    entity_by_id = EntityById.copy_fields_from(entity)
-    entity_by_id.batch(b).save()
+    entity.save()
+    entity_by_id = EntityById.copy_fields_from(entity)
+    entity_by_id.save()
{code}
Everything is fine. We don't get any more inconsistencies. I've checked what cqlengine generates and it seems to work as expected:
{code}
('BEGIN BATCH\n UPDATE sync.entity SET "name" = %(4)s WHERE "user_id" = %(0)s AND "data_type_id" = %(1)s AND "version" = %(2)s AND "id" = %(3)s\n INSERT INTO sync.entity_by_id ("user_id", "id", "parent_id", "deleted", "folder", "data_type_id", "version") VALUES (%(5)s, %(6)s, %(7)s, %(8)s, %(9)s, %(10)s, %(11)s)\nAPPLY BATCH;',)
{code}
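The counts reported above come from cross-checking the two tables. Such a check can be sketched with plain sets of (user_id, id) pairs standing in for the rows (the function name is hypothetical):

```python
def find_inconsistencies(entity_keys, entity_by_id_keys):
    """Cross-check the base table against its manual index.

    Both arguments are sets of (user_id, id) pairs. Returns the index
    entries whose base row is gone, and the base rows missing from the
    index (the ~260 and ~7000 counts reported above, respectively).
    """
    orphaned_index = entity_by_id_keys - entity_keys
    missing_index = entity_keys - entity_by_id_keys
    return orphaned_index, missing_index
```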
Schemas:
{code}
CREATE TABLE entity (
user_id text,
data_type_id int,
version bigint,
id text,
cache_guid text,
client_defined_unique_tag text,
ctime timestamp,
deleted boolean,
folder boolean,
mtime timestamp,
name text,
originator_client_item_id text,
parent_id text,
position blob,
server_defined_unique_tag text,
specifics blob,
PRIMARY KEY (user_id, data_type_id, version, id)
) WITH CLUSTERING ORDER BY (data_type_id ASC, version ASC, id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX index_entity_parent_id ON sync.entity (parent_id);
CREATE TABLE entity_by_id (
user_id text,
id text,
cache_guid text,
data_type_id int,
deleted boolean,
folder boolean,
originator_client_item_id text,
parent_id text,
version bigint,
PRIMARY KEY (user_id, id)
) WITH CLUSTERING ORDER BY (id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX index_entity_by_id_parent_id ON entity_by_id (parent_id);
{code}
Previously we saw many "batch size exceeded" warnings, but the limit has since been raised (*batch_size_warn_threshold_in_kb* is now set to 20), as we sometimes put many KB of data into the *specifics* blob field.
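For reference, that threshold is set in cassandra.yaml; the value below is the one from this report (the stock 2.1 default is lower):

```yaml
# cassandra.yaml -- warn when a logged batch exceeds this size,
# raised here because *specifics* blobs can be many KB
batch_size_warn_threshold_in_kb: 20
```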
We have a similar data model in another project where we see the same issue: no records are missing from *entity*, but a couple of thousand are missing from *entity_by_id*. I'll send more details on this soon.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)