Posted to user@cassandra.apache.org by Anton Lebedevich <ma...@gmail.com> on 2015/09/29 19:03:54 UTC

unpredictable sstable size for the same data

Hello.

I copied a table from one keyspace into another using
spark-cassandra-connector, and the sizes of the single sstable data
files differ by about 2x: the source Data.db file is ~450 MB, the target
~200 MB. Both tables were flushed and compacted before measurement, each
has only one sstable, and compression is off.
'copy table into file.csv' produces identical csv files for both tables.
The table structure is the same in both keyspaces, there is only one host
in the Cassandra cluster, and the Cassandra version is 2.1.1.
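
The "identical csv files" check can be reproduced by hashing the two exports. This is a minimal sketch in Python, not the exact procedure used here; the function names are illustrative, and it assumes both exports list rows in the same order, since it compares bytes rather than parsed rows.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exports_identical(path_a, path_b):
    """True when both files have byte-for-byte identical content."""
    return file_sha256(path_a) == file_sha256(path_b)
```

For example, exports_identical("src.csv", "target.csv") would return True for two byte-identical exports; both file names are hypothetical.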

What can cause such a difference in sstable sizes for the same data? I
expected them to be identical.


nodetool cfstats src.tbl
Keyspace: src
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Flushes: 0
        Table: tbl
        SSTable count: 1
        Space used (live): 496725694
        Space used (total): 496725694
        Space used by snapshots (total): 346576404
        SSTable Compression Ratio: 0.0
        Memtable cell count: 0
        Memtable data size: 0
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 0
        Local write latency: NaN ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 1253448
        Compacted partition minimum bytes: 447
        Compacted partition maximum bytes: 642
        Compacted partition mean bytes: 536
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

----------------
nodetool cfstats target.tbl
Keyspace: target
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Flushes: 0
        Table: tbl
        SSTable count: 1
        Space used (live): 224972892
        Space used (total): 224972892
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.0
        Memtable cell count: 0
        Memtable data size: 0
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 0
        Local write latency: NaN ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 1253448
        Compacted partition minimum bytes: 180
        Compacted partition maximum bytes: 215
        Compacted partition mean bytes: 215
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0
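
The two cfstats outputs above can be reduced to a few ratios for easier comparison. This is a minimal sketch in Python: the numbers are copied by hand from the outputs, and the dictionary keys are illustrative names, not nodetool fields.

```python
# Figures copied from the two cfstats outputs above (all in bytes).
src = {
    "space_used_live": 496725694,
    "partition_mean": 536,
    "bloom_filter": 1253448,
}
target = {
    "space_used_live": 224972892,
    "partition_mean": 215,
    "bloom_filter": 1253448,
}

# Ratio of total live space on disk, src over target.
size_ratio = src["space_used_live"] / target["space_used_live"]

# Ratio of the mean compacted partition size, src over target.
mean_ratio = src["partition_mean"] / target["partition_mean"]

# Whether both tables report the same bloom filter size.
same_bloom = src["bloom_filter"] == target["bloom_filter"]

print(f"on-disk size ratio (src/target):       {size_ratio:.2f}")
print(f"mean partition size ratio (src/target): {mean_ratio:.2f}")
print(f"bloom filter sizes equal:               {same_bloom}")
```

With these figures both ratios come out above 2, and the bloom filter sizes are equal. The difference therefore shows up in the per-partition size and not only in the file totals.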


CREATE TABLE src.tbl (
    id text PRIMARY KEY,
    props map<text, text>
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Regards,
Anton Lebedevich.