Posted to commits@cassandra.apache.org by "Benjamin Lerer (JIRA)" <ji...@apache.org> on 2016/03/10 09:52:40 UTC
[jira] [Reopened] (CASSANDRA-11314) Inconsistent select count(*)
[ https://issues.apache.org/jira/browse/CASSANDRA-11314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Lerer reopened CASSANDRA-11314:
----------------------------------------
[~mlemnaru] I would just like to double-check the problem. Some things look odd to me, and I want to take the time to look into it properly.
It is better to double-check and find nothing than to miss a real problem that could hit people once they are in production. :-)
> Inconsistent select count(*)
> ----------------------------
>
> Key: CASSANDRA-11314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11314
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Environment: Ubuntu 14.04 LTS
> Reporter: Mircea Lemnaru
> Assignee: Benjamin Lerer
> Attachments: testrun.log, vnodes_and_hosts
>
>
> Hello,
> I currently have this setup:
> Cassandra 3.3 (Community edition downloaded from Datastax) installed on 3 nodes and I have created this table:
> CREATE TABLE billing.collected_data_day (
>     collection_day int,
>     timestamp timestamp,
>     record_id uuid,
>     dimensions map<text, text>,
>     entity_id text,
>     measurements map<text, text>,
>     source_id text,
>     PRIMARY KEY (collection_day, timestamp, record_id)
> ) WITH CLUSTERING ORDER BY (timestamp ASC, record_id ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> This table, as you can see, is partitioned by collection_day, because at the end of each day we need fast access to all the data generated that day. collection_day is the day number counted from 1970.
> We inserted roughly 12 million rows into this table for testing purposes and ran a simple count. As you can see, the results vary:
> cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
> count
> -------
> 55341
> (1 rows)
> cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
> count
> -------
> 55372
> (1 rows)
> cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
> count
> -------
> 55300
> (1 rows)
> cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
> count
> -------
> 55300
> (1 rows)
> cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
> count
> -------
> 55300
> (1 rows)
> cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
> count
> -------
> 55303
> (1 rows)
> cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
> count
> -------
> 55374
> (1 rows)
> I am running the query from the seed node of the Cassandra cluster. As you can see, most of the results vary between runs, and I don't know why. We are not writing anything to the cluster at this time; we are only querying it, and only through this cqlsh session.
> This is very similar to CASSANDRA-8940, but that issue targets 2.1.x.
> Could we be hitting the same issue in 3.3?
> Please let me know what extra information I can provide.
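For readers trying to reproduce the report, the collection_day partition key can be mapped to a calendar date and back. The sketch below is only an illustration and not code from the reporter: it assumes day 0 is 1970-01-01 (the report does not say whether counting starts at 0 or 1), and the helper names day_to_date and date_to_day are hypothetical.

```python
from datetime import date, timedelta

# Assumption: collection_day 0 corresponds to 1970-01-01.
# The report only says "the x day from 1970", so a 1-based scheme is also possible.
EPOCH = date(1970, 1, 1)


def day_to_date(collection_day: int) -> date:
    """Convert a day number counted from 1970-01-01 into a calendar date."""
    return EPOCH + timedelta(days=collection_day)


def date_to_day(d: date) -> int:
    """Convert a calendar date into the day number counted from 1970-01-01."""
    return (d - EPOCH).days


if __name__ == "__main__":
    # The partition queried in the report.
    print(day_to_date(16462))  # 2015-01-27
```

Under that assumption, the partition queried above (collection_day=16462) holds the rows for 2015-01-27.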