Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org> on 2012/03/07 10:42:59 UTC
[jira] [Commented] (CASSANDRA-2319) Promote row index

    [ https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224147#comment-13224147 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
---------------------------------------------

I've put a version of this issue at https://github.com/pcmanus/cassandra/commits/2319_index_promotion (against current trunk). Unlike the previously attached patches, this doesn't change the file format much. It pretty literally does what the issue title says: it promotes the column index from the data file to the index file. Note that the patch is split into 3 commits that have some form of logical separation, but the code only compiles with all 3 commits applied.

So this removes the column index and bloom filter from the row header in the data file and moves them into the index file, alongside the (key, position) pair. There are a number of choices/details worth mentioning:
* Only wide rows have a column index and bloom filter. So one difference from the current implementation is that skinny rows have no column bloom filter. I figure it's probably not worth the space in the index file in that case (but I'm fine discussing that point).
* The key cache now keeps all the information from the index file for a given row. This means that for wide rows, the column index and bloom filter are cached along with the position. That is imo a good thing, but it does mean the size of a key cache entry is no longer constant (the estimation of the key cache memory size will have to be modified accordingly, but the current patch doesn't do it).
* For wide rows, the index entry also ships with the row deletion times. This is necessary since we won't seek to the beginning of the row anymore.
* In the column index, offsets are relative to the beginning of the row in the data file rather than to the beginning of the index as is the case now.
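
To make the layout above concrete, here is a rough sketch of what an index-file entry could look like. It is an illustration only, not code from the patch: the class and field names (ColumnBlock, PromotedIndexEntry, offsetInRow, markedForDeleteAt, localDeletionTime) are assumptions, column names are plain strings for simplicity, and the bloom filter is left out.

```java
import java.util.List;
import java.util.Collections;

// Hypothetical types for illustration only; none of these names come from the patch.
final class ColumnBlock {
    final String firstName;  // first column name covered by the block
    final String lastName;   // last column name covered by the block
    final long offsetInRow;  // offset relative to the start of the row in the data file
    final long width;        // length of the block in bytes

    ColumnBlock(String firstName, String lastName, long offsetInRow, long width) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.offsetInRow = offsetInRow;
        this.width = width;
    }
}

final class PromotedIndexEntry {
    final String key;
    final long rowPosition;               // position of the row in the data file
    final long markedForDeleteAt;         // row deletion times, kept for wide rows
    final int localDeletionTime;
    final List<ColumnBlock> columnIndex;  // empty for skinny rows

    private PromotedIndexEntry(String key, long rowPosition, long markedForDeleteAt,
                               int localDeletionTime, List<ColumnBlock> columnIndex) {
        this.key = key;
        this.rowPosition = rowPosition;
        this.markedForDeleteAt = markedForDeleteAt;
        this.localDeletionTime = localDeletionTime;
        this.columnIndex = columnIndex;
    }

    // Skinny rows carry only (key, position); the deletion times are placeholders here.
    static PromotedIndexEntry skinny(String key, long rowPosition) {
        return new PromotedIndexEntry(key, rowPosition, Long.MIN_VALUE, Integer.MAX_VALUE,
                                      Collections.<ColumnBlock>emptyList());
    }

    // Wide rows also carry the deletion times and the column index.
    static PromotedIndexEntry wide(String key, long rowPosition, long markedForDeleteAt,
                                   int localDeletionTime, List<ColumnBlock> columnIndex) {
        return new PromotedIndexEntry(key, rowPosition, markedForDeleteAt,
                                      localDeletionTime, columnIndex);
    }

    boolean isWide() {
        return !columnIndex.isEmpty();
    }

    // Block offsets are relative to the row start, so the absolute position
    // in the data file is the row position plus the block offset.
    long absolutePosition(ColumnBlock block) {
        return rowPosition + block.offsetInRow;
    }
}
```

With this shape, a block at offset 4160 in a row stored at position 1000 is read at data file position 5160, without first seeking to the row header.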

Some other implementation points:
* EchoedRow is removed. It would be possible to echo rows following this patch, but we would need to echo the column index too, and that felt complicated enough to be left to a later ticket if we consider it worth it.
* I didn't find a way to implement this patch that wasn't overly complicated or inefficient without using seek() instead of just file marks. So in particular MappedFileDataInput gets a seek() method, even though that method throws an exception if we seek outside the segment (which should never happen).
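
As a minimal sketch of the seek() behaviour described in the last point (this is not the MappedFileDataInput code from the patch): SegmentInput is a made-up class, a plain ByteBuffer stands in for the mapped segment, and the choice of IllegalArgumentException is an assumption, since the exception type is not specified above.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for an input over one mapped segment of a file.
final class SegmentInput {
    private final ByteBuffer buffer;   // stands in for the mapped segment
    private final long segmentOffset;  // file offset at which the segment starts

    SegmentInput(ByteBuffer buffer, long segmentOffset) {
        this.buffer = buffer.duplicate();  // keep our own position, independent of the caller's
        this.segmentOffset = segmentOffset;
    }

    // Move to an absolute file position; refuse positions outside the segment.
    void seek(long filePosition) {
        long inSegment = filePosition - segmentOffset;
        if (inSegment < 0 || inSegment > buffer.limit()) {
            throw new IllegalArgumentException(
                "seek to " + filePosition + " is outside the segment starting at " + segmentOffset);
        }
        buffer.position((int) inSegment);
    }

    // Current absolute position in the file.
    long getFilePointer() {
        return segmentOffset + buffer.position();
    }

    byte readByte() {
        return buffer.get();
    }
}
```

The point of the bounds check is that a caller holding a row position plus a column index offset can jump straight to a block, while a position outside the segment fails loudly instead of reading unrelated bytes.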

I did a short (and honestly not very scientific) benchmark of a time-series-like workload, with a number of threads inserting time series columns in a bunch of rows and other threads reading the tail of those rows (as expected, performance degrades as more sstables are added and improves with compaction). As soon as more than 1 sstable was present, the performance with this patch was around 30-40% better than without the patch. I'll note that the test was very short and with everything on localhost, so again the exact benefits may vary, but the ability to discard sstables based on index info (saving a seek) seems to be a clear boost in that case.
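
To illustrate the kind of check that lets an sstable be discarded from the index alone, here is a small sketch. It is not taken from the patch: BlockRange and SliceCheck are hypothetical names, and column names are modelled as long timestamps to keep the time series case simple.

```java
import java.util.List;

// Hypothetical: the range of column names covered by one column index block.
final class BlockRange {
    final long firstColumn;
    final long lastColumn;

    BlockRange(long firstColumn, long lastColumn) {
        this.firstColumn = firstColumn;
        this.lastColumn = lastColumn;
    }
}

final class SliceCheck {
    // Returns false only when the column index proves that no column of the row
    // in this sstable falls within [sliceStart, sliceEnd].
    static boolean mayContain(List<BlockRange> columnIndex, long sliceStart, long sliceEnd) {
        if (columnIndex.isEmpty()) {
            return true;  // no column index (skinny row): the data file has to be read
        }
        for (BlockRange block : columnIndex) {
            if (block.firstColumn <= sliceEnd && block.lastColumn >= sliceStart) {
                return true;  // the block's range overlaps the requested slice
            }
        }
        return false;  // no overlap: this sstable can be skipped without a data file seek
    }
}
```

For a read of the tail of a row, an older sstable whose blocks all end before the start of the slice gets a false here, which is the case where the seek into the data file is saved.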

I didn't see any noticeable difference (neither good nor bad) on a normal stress run, as should be expected.

Note that this patch paves the way for removing the two-phase compaction of LazilyCompactedRow, but that is left to a follow-up ticket.
                
> Promote row index
> -----------------
>
>                 Key: CASSANDRA-2319
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>              Labels: compression, index, timeseries
>         Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. For a row of appreciable size, the row index ends up directing the third seek (1. index, 2. row index, 3. content) to somewhere near the first column of a scan.
> Since the row index is always used for wide rows, and since it contains information that tells us whether or not the 3rd seek is necessary (the column range or name we are trying to slice may not exist in a given sstable), promoting the row index into the sstable index would allow us to drop the maximum number of seeks for wide rows back to 2, and, more importantly, would allow sstables to be eliminated using only the index.
> An example use case that benefits greatly from this change is time series data in wide rows, where data is appended to the beginning or end of the row. Our existing compaction strategy gets lucky and clusters the oldest data in the oldest sstables: for queries to recently appended data, we would be able to eliminate wide rows using only the sstable index, rather than needing to seek into the data file to determine that it isn't interesting. For narrow rows, this change would have no effect, as they will not reach the threshold for indexing anyway.
> A first cut design for this change would look very similar to the file format design proposed on #674: http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, column names clustered, and offsets clustered and delta encoded.
