Posted to user@cassandra.apache.org by Anton Lebedevich <ma...@gmail.com> on 2015/09/29 19:03:54 UTC
unpredictable sstable size for the same data
Hello.
I've copied a table from one keyspace into another using
spark-cassandra-connector, and the size of the single sstable data file
differs by ~2x: the source Data.db file is ~450 MB, the target ~200 MB.
Both tables were flushed and compacted before measurement, so there
is only one sstable per table; compression is off.
Exporting each table with cqlsh (COPY tbl TO 'file.csv') produces identical CSV files.
The table structure is the same in both keyspaces, there is only one host
in the Cassandra cluster, and the Cassandra version is 2.1.1.
What can cause such a difference in sstable sizes for the same data? I
expected them to be identical.
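The CSV comparison mentioned above can be done byte for byte. A minimal sketch in Python, assuming the two exports were saved under the hypothetical names src.csv and target.csv, and that both exports list rows in the same order:

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(path_a, path_b):
    """True if both files contain exactly the same bytes."""
    return file_digest(path_a) == file_digest(path_b)


# Example call (file names are hypothetical):
# print(same_content("src.csv", "target.csv"))
```

Equal digests only show that the logical rows match; they say nothing about how the rows are laid out on disk.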
nodetool cfstats src.tbl
Keyspace: src
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Flushes: 0
        Table: tbl
        SSTable count: 1
        Space used (live): 496725694
        Space used (total): 496725694
        Space used by snapshots (total): 346576404
        SSTable Compression Ratio: 0.0
        Memtable cell count: 0
        Memtable data size: 0
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 0
        Local write latency: NaN ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 1253448
        Compacted partition minimum bytes: 447
        Compacted partition maximum bytes: 642
        Compacted partition mean bytes: 536
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0
----------------
nodetool cfstats target.tbl
Keyspace: target
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Flushes: 0
        Table: tbl
        SSTable count: 1
        Space used (live): 224972892
        Space used (total): 224972892
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.0
        Memtable cell count: 0
        Memtable data size: 0
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 0
        Local write latency: NaN ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 1253448
        Compacted partition minimum bytes: 180
        Compacted partition maximum bytes: 215
        Compacted partition mean bytes: 215
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0
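To put numbers on the difference, the ratios between the two cfstats outputs can be computed directly. The following is only an illustrative sketch in Python; the dictionary keys are names I made up, and the values are copied from the outputs above:

```python
# Figures copied from the two `nodetool cfstats` outputs (all in bytes).
# The key names are hypothetical labels, not nodetool identifiers.
src = {
    "space_used_live": 496725694,
    "partition_mean": 536,
    "bloom_filter": 1253448,
}
target = {
    "space_used_live": 224972892,
    "partition_mean": 215,
    "bloom_filter": 1253448,
}


def ratios(a, b):
    """Return a[k] / b[k] for every metric present in both dicts."""
    return {k: a[k] / b[k] for k in a if k in b}


r = ratios(src, target)
for name, value in r.items():
    print(f"{name}: {value:.2f}x")
```

The space-used ratio comes out at roughly 2.2x and the mean partition size at roughly 2.5x, while the bloom filter size is the same for both tables. That is consistent with both sstables holding the same number of partitions, with the difference sitting in how many bytes each partition takes on disk.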
CREATE TABLE src.tbl (
    id text PRIMARY KEY,
    props map<text, text>
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
Regards,
Anton Lebedevich.