Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2011/05/23 04:56:48 UTC

[jira] [Updated] (CASSANDRA-674) New SSTable Format

     [ https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-674:
-------------------------------

    Attachment: 674-v3.tgz

Attaching a new version: v3. I've extracted most of the tasks that can be accomplished independently into [other tickets|https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=issue+in+%28CASSANDRA-2650%2C+CASSANDRA-2679%2C+CASSANDRA-2641%2C+CASSANDRA-2576%2C+CASSANDRA-2062%2C+CASSANDRA-2629%2C+CASSANDRA-2145%2C+CASSANDRA-2398%29+order+by+updated+desc].

Changes from v2:
* Removed Avro
* Switched to type-specific compression via CASSANDRA-2398
* Used type-specific compression for timestamps
* Implemented supercolumn support
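
To illustrate why a type-specific scheme can help with timestamps, here is a minimal sketch that assumes delta encoding. The comment above does not say which encoding the patch or CASSANDRA-2398 actually uses, so the class and method names below are hypothetical and this is not the patch's implementation.

```java
import java.util.Arrays;

// Hypothetical illustration only: delta encoding is an ASSUMED technique,
// not necessarily what the patch or CASSANDRA-2398 does.
final class TimestampDeltaSketch {
    // Store each timestamp as the difference from its predecessor.
    // The first value is stored as a difference from zero, i.e. unchanged.
    public static long[] encode(long[] timestamps) {
        long[] deltas = new long[timestamps.length];
        long previous = 0;
        for (int i = 0; i < timestamps.length; i++) {
            deltas[i] = timestamps[i] - previous;
            previous = timestamps[i];
        }
        return deltas;
    }

    // Rebuild the original timestamps with a running sum over the deltas.
    public static long[] decode(long[] deltas) {
        long[] timestamps = new long[deltas.length];
        long previous = 0;
        for (int i = 0; i < deltas.length; i++) {
            previous += deltas[i];
            timestamps[i] = previous;
        }
        return timestamps;
    }

    public static void main(String[] args) {
        long[] ts = {1306126608000L, 1306126608005L, 1306126608005L, 1306126608012L};
        long[] deltas = encode(ts);
        System.out.println(Arrays.toString(deltas));           // [1306126608000, 5, 0, 7]
        System.out.println(Arrays.equals(ts, decode(deltas))); // true
    }
}
```

Columns written close together in time produce small deltas, which a general-purpose compressor or a variable-length integer encoding can store in far fewer bytes than the raw 64-bit values.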

This revision compresses wide rows very well, and the data file format is essentially finalized. The next steps are to improve read-time performance:
# Incorporate CASSANDRA-2319 to improve wide row access times
** Because the patch removes the column index, every read starts at the beginning of the row and scans forward until it reaches the requested column range. CASSANDRA-2319 would allow random access to a block.
# Store more than one row per block, in order to take full advantage of compression for narrow rows
** This patch adds a Cursor object, which represents a position within a block and within the file. SSTableScanner will need to hold a Cursor between rows and pass it to each IColumnIterator it creates.
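
The second step can be sketched roughly as follows: a scanner keeps one cursor alive across rows, so a row can begin in the middle of a block that another row started. Everything here is a simplified assumption. The Cursor class is a hypothetical stand-in for the one in the patch, the file is modelled as an array of blocks holding "rowKey:column" strings, and nextRow plays the part of a per-row column iterator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are the ones in the 674-v3 patch.
final class CursorSketch {
    /** Stand-in for the patch's Cursor: a position in the file and within a block. */
    static final class Cursor {
        int block;   // index of the current block in the file
        int offset;  // index of the next unread entry in that block
    }

    /** Entries are modelled as "rowKey:column"; the real layout is binary. */
    static String rowKey(String entry) {
        return entry.substring(0, entry.indexOf(':'));
    }

    /** Returns the next unread entry without consuming it, or null at end of file. */
    static String peek(String[][] file, Cursor c) {
        // Move to the next block once the current one is exhausted.
        while (c.block < file.length && c.offset >= file[c.block].length) {
            c.block++;
            c.offset = 0;
        }
        return c.block < file.length ? file[c.block][c.offset] : null;
    }

    /**
     * Reads every entry of the next row and leaves the cursor on the first
     * entry of the following row. Returns null when the file is exhausted.
     */
    static List<String> nextRow(String[][] file, Cursor c) {
        String entry = peek(file, c);
        if (entry == null) return null;
        String key = rowKey(entry);
        List<String> row = new ArrayList<>();
        while (entry != null && rowKey(entry).equals(key)) {
            row.add(entry);
            c.offset++;
            entry = peek(file, c);
        }
        return row;
    }

    public static void main(String[] args) {
        // Block 0 holds all of row "a" and the start of row "b".
        String[][] file = {
            {"a:c1", "a:c2", "b:c1"},
            {"b:c2", "c:c1"}
        };
        Cursor cursor = new Cursor(); // held by the scanner between rows
        for (List<String> row = nextRow(file, cursor); row != null; row = nextRow(file, cursor)) {
            System.out.println(row);
        }
    }
}
```

Because the position lives in the shared cursor rather than in each row iterator, a block that has already been read does not need to be located or decompressed again when the next row begins inside it.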

> New SSTable Format
> ------------------
>
>                 Key: CASSANDRA-674
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-674
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 1.0
>
>         Attachments: 674-v1.diff, 674-v2.tgz, 674-v3.tgz, perf-674-v1.txt, perf-trunk-2f3d2c0e4845faf62e33c191d152cb1b3fa62806.txt
>
>
> Various tickets exist due to limitations in the SSTable file format, including #16, #47 and #328. Attached is a proposed design/implementation of a new file format for SSTables that addresses a few of these limitations.
> This v2 implementation is not ready for serious use: see comments for remaining issues. It is roughly the format described here: http://wiki.apache.org/cassandra/FileFormatDesignDoc 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira