Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2011/02/09 08:59:57 UTC

[jira] Resolved: (CASSANDRA-1092) Add Slice API, and replace CF and SC for compaction reads

     [ https://issues.apache.org/jira/browse/CASSANDRA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood resolved CASSANDRA-1092.
---------------------------------

    Resolution: Invalid

Waaay too invasive.

> Add Slice API, and replace CF and SC for compaction reads
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-1092
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1092
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Critical
>             Fix For: 0.8
>
>         Attachments: 0001-Add-Slice-and-ColumnKey.patch, 0002-Refactor-Scanner-interface-into-filtering-and-filter.patch, 0003-Add-a-Scanner-for-merging-sorted-slice-lists.patch, 0004-Make-CompactionIterator-extend-SliceMergingIterator.patch
>
>
> Currently, we have two read paths for fetching Columns from disk: the io.sstable.SSTableScanner interface, and the db.filter.SSTable*Iterator interfaces. The latter is intended for iterating over the IColumns contained in a single row, while the former iterates over entire rows at once (although SSTableScanner supports returning a db.filter implementation per row).
> While this separation has allowed for highly optimized pushdown filtering in the db.filter classes, the lack of abstraction makes it impossible to reason about changes to the file format, and the current read paths depend on random access into the file. Additionally, the separation of 'row iteration' from 'icolumn iteration' ignores the fact that super columns contain an additional level of columns that could be iterated. Rather than introducing a third level of iterators that deals with iterating over subcolumns, a unified interface for iterating over arbitrarily nested columns would clarify the code, and open the door to many interesting possibilities (see CASSANDRA-998).
> This ticket deals with implementing an initial cut of the unified interface, which reuses the "Scanner" name. The org.apache.cassandra.Scanner interface is essentially an extended iterator, which is further enhanced by org.apache.cassandra.SeekableScanner to add operations that reposition the iterator. By the end of CASSANDRA-998, SeekableScanner will have implementations for the Memtable and SSTables, allowing for uniform iteration of all sources.
> The object that a Scanner iterates over is org.apache.cassandra.Slice, which is immutable, and contains parent deletion Metadata (markedForDeleteAt/localDeletionTime: like a ColumnFamily or SuperColumn). Since only the highest markedForDeleteAt or localDeletionTime matters for nested columns, Slices simplify storage of this data by storing a single value for all parents. The Metadata in a Slice is bounded at each end by an org.apache.cassandra.db.ColumnKey, which is a compound key representing the full path to a column, or a parent boundary.
> The ColumnKeys in a Slice make it possible to delete column name ranges. By convention (in this patch), the ColumnKeys in a Slice always share parents. In the future, if we wanted to support range deletes for rows or supercolumns, it would be trivial to remove that assumption.
> SSTables and Memtables can be abstracted into "sorted lists of Slices" which are individually non-intersecting. Client reads and compactions can use org.apache.cassandra.SliceMergingIterator to merge the Slices from multiple Scanners into a new Scanner which is globally non-intersecting. This process will be at the heart of any read from a ColumnFamilyStore by the end of CASSANDRA-998, but this issue only uses SliceMergingIterator at the core of compaction, by making CompactionIterator a subclass of SliceMergingIterator.
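
The merge described above can be illustrated with a rough sketch. This is not the code from the attached patches: SketchKey, SketchSlice and SketchSliceMerger are hypothetical, simplified stand-ins for ColumnKey, Slice and SliceMergingIterator. The sketch assumes each source is already sorted by start key and that slices from different sources either share identical bounds or do not overlap at all; splitting partially overlapping slices, localDeletionTime, and the columns themselves are left out. It performs a k-way merge with a priority queue and, for slices with identical bounds, keeps the highest markedForDeleteAt.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for ColumnKey: a compound key of row key and column name.
final class SketchKey implements Comparable<SketchKey> {
    final String row;
    final String name;

    SketchKey(String row, String name) {
        this.row = row;
        this.name = name;
    }

    @Override
    public int compareTo(SketchKey other) {
        int byRow = row.compareTo(other.row);
        return byRow != 0 ? byRow : name.compareTo(other.name);
    }
}

// Hypothetical stand-in for Slice: immutable bounds plus one parent deletion timestamp.
final class SketchSlice {
    final SketchKey begin;
    final SketchKey end;
    final long markedForDeleteAt;

    SketchSlice(SketchKey begin, SketchKey end, long markedForDeleteAt) {
        this.begin = begin;
        this.end = end;
        this.markedForDeleteAt = markedForDeleteAt;
    }

    boolean sameBounds(SketchSlice other) {
        return begin.compareTo(other.begin) == 0 && end.compareTo(other.end) == 0;
    }

    // Only the highest deletion timestamp matters, so merging keeps the maximum.
    SketchSlice mergeMetadata(SketchSlice other) {
        return new SketchSlice(begin, end, Math.max(markedForDeleteAt, other.markedForDeleteAt));
    }
}

final class SketchSliceMerger {
    // One cursor per source: the current head slice and the remainder of that source.
    private static final class Cursor {
        SketchSlice head;
        final Iterator<SketchSlice> rest;

        Cursor(Iterator<SketchSlice> rest) {
            this.rest = rest;
            this.head = rest.next();
        }
    }

    // Merges sorted, individually non-intersecting sources into one sorted list.
    static List<SketchSlice> merge(List<List<SketchSlice>> sources) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.head.begin));
        for (List<SketchSlice> source : sources) {
            if (!source.isEmpty()) {
                queue.add(new Cursor(source.iterator()));
            }
        }

        List<SketchSlice> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor cursor = queue.poll();
            SketchSlice next = cursor.head;
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).sameBounds(next)) {
                // Same range seen in another source: collapse into a single slice.
                merged.set(last, merged.get(last).mergeMetadata(next));
            } else {
                merged.add(next);
            }
            if (cursor.rest.hasNext()) {
                cursor.head = cursor.rest.next();
                queue.add(cursor);
            }
        }
        return merged;
    }
}
```

Calling SketchSliceMerger.merge with one source holding slices [a,c] and [d,f] and a second source holding [a,c] would yield two slices, with the [a,c] slice carrying the larger of the two markedForDeleteAt values. An iterator-based version, closer in spirit to a Scanner, could expose the same loop lazily instead of building a list.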

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira