Posted to commits@cassandra.apache.org by "Pavel Yaskevich (JIRA)" <ji...@apache.org> on 2011/07/08 23:43:16 UTC

[jira] [Updated] (CASSANDRA-47) SSTable compression

     [ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-47:
-------------------------------------

    Attachment: snappy-java-1.0.3-rc4.jar
                CASSANDRA-47.patch

The patch introduces CompressedDataFile with Input/Output classes. Snappy is used for compression/decompression because it was faster than ning in tests. Files are split into 4 bytes + 64 KB chunks, where the 4 bytes hold the compressed chunk size. Both the Input and Output classes extend RandomAccessFile, so random I/O works as expected.
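
The chunk layout described above can be sketched roughly as follows. This is not the code from the patch: the class and method names are hypothetical, java.util.zip's Deflater/Inflater stand in for Snappy so that the example needs no extra jar, and it assumes that the 64 KB refers to the uncompressed data held by each chunk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch, not the CompressedDataFile class from the patch.
class ChunkedCompressionSketch {
    // Assumption: each chunk holds up to 64 KB of uncompressed data.
    static final int CHUNK_LENGTH = 64 * 1024;

    // Emits every chunk as [4-byte compressed size][compressed bytes].
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Twice the chunk length leaves ample room for incompressible input.
        byte[] buffer = new byte[CHUNK_LENGTH * 2];
        for (int offset = 0; offset < data.length; offset += CHUNK_LENGTH) {
            int length = Math.min(CHUNK_LENGTH, data.length - offset);
            Deflater deflater = new Deflater(); // stand-in for Snappy
            deflater.setInput(data, offset, length);
            deflater.finish();
            int compressedLength = deflater.deflate(buffer);
            boolean finished = deflater.finished();
            deflater.end();
            if (!finished) {
                throw new IllegalStateException("output buffer too small");
            }
            // 4-byte big-endian prefix holding the compressed chunk size.
            out.write(ByteBuffer.allocate(4).putInt(compressedLength).array(), 0, 4);
            out.write(buffer, 0, compressedLength);
        }
        return out.toByteArray();
    }

    // Walks the chunks in order, using each size prefix to find the next one.
    static byte[] decompress(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[CHUNK_LENGTH];
        while (in.hasRemaining()) {
            int compressedLength = in.getInt();
            Inflater inflater = new Inflater();
            inflater.setInput(file, in.position(), compressedLength);
            try {
                int length = inflater.inflate(chunk);
                if (!inflater.finished()) {
                    throw new IllegalStateException("chunk larger than expected");
                }
                out.write(chunk, 0, length);
            } catch (DataFormatException e) {
                throw new IllegalStateException("corrupt chunk", e);
            } finally {
                inflater.end();
            }
            in.position(in.position() + compressedLength);
        }
        return out.toByteArray();
    }
}
```

Calling ChunkedCompressionSketch.decompress(ChunkedCompressionSketch.compress(bytes)) should return the original bytes. Because every chunk carries its own size prefix, a reader can skip from chunk to chunk without decompressing them, which is the property random I/O relies on.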

All SSTable files are opened using CompressedDataFile.Input. On startup, when SSTableReader.open is called, it first checks whether the data file is already compressed and compresses it if it is not, so users won't have a problem after they upgrade.

The header of the file reserves 8 bytes for the "real data size", so other components of the system that use SSTables, and the SSTables themselves, have no idea that the data file is compressed.
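
A minimal sketch of such a header, assuming the size is stored as a single long at offset 0 and filled in once the body has been written. The class and method names are hypothetical and the patch may lay things out differently.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical sketch of the 8-byte "real data size" header.
class SizeHeaderSketch {
    static final int HEADER_LENGTH = 8;

    // Reserves the header, writes the compressed body, then seeks back
    // and records the uncompressed ("real") size.
    static void write(File file, long realSize, byte[] compressedBody) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);        // start from an empty file
            raf.writeLong(0L);       // placeholder for the 8-byte header
            raf.write(compressedBody);
            raf.seek(0);
            raf.writeLong(realSize); // fill in the real data size
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Callers that only need the logical length read just the header.
    static long readRealSize(File file) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            return raf.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A wrapper that reports readRealSize(...) as its length, instead of the on-disk length, is one way the rest of the system could keep treating the file as uncompressed.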

Streaming of a data file sends decompressed chunks, which makes the transfer easier to maintain, and the receiving party compresses all data before writing it to the backing file (see CompressedDataFile.transfer(...) and the CompressedFileReceiver class).

Tests show a dramatic performance increase when reading 1 million rows created with 1024-byte random values. The current code takes >> 1000 secs to read, but with the current patch it takes only 175 secs. Using a 64 KB buffer, a 1.7 GB file could be compressed into 110 MB (data added using ./bin/stress -n 1000000 -S 1024 -r, where the -r option generates random values).

Writes perform a bit better, by about 5-10%.

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira