Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2014/12/22 16:23:15 UTC

[jira] [Commented] (CASSANDRA-8099) Refactor and modernize the storage engine

    [ https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255824#comment-14255824 ] 

Sylvain Lebresne commented on CASSANDRA-8099:
---------------------------------------------

As an update on this I've (force) pushed to [the branch|https://github.com/pcmanus/cassandra/tree/8099_engine_refactor].

This is still not done, but it compiles, runs and works for basic queries. That said, at least reverse, distinct and secondary index queries (and a bunch of other smaller details) are known not to work (but that should change relatively soon). Also, I'll still need to work on thrift support and general backward compatibility for rolling upgrades. Nonetheless, the general design is here and shouldn't change drastically. Plus it's barely tested at this point, and I need to add more comments.

This is a huge patch. And while this doesn't change how Cassandra works in any fundamental way, this can't really be called a small incremental change. I'm the first to admit that those are not good things in theory, but to be honest, as this tries to clean up the abstractions that are at the core of the storage engine, I don't see how this could be done a whole lot more incrementally. I am convinced that this will make implementing tons of outstanding features a lot easier and is worth it just for that. Anyway, that's my (lame) excuse for this. Also, it's in a single big commit because I had so much back and forth while implementing that my WIP commits were more misleading than anything. At a later point, if that's deemed better for review, I could try to split it into smaller commits of somewhat-related code (though I'm not entirely sure it will help tremendously).


> Refactor and modernize the storage engine
> -----------------------------------------
>
>                 Key: CASSANDRA-8099
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8099
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0
>
>
> The current storage engine (which for this ticket I'll loosely define as "the code implementing the read/write path") is suffering from old age. One of the main problems is that the only structure it deals with is the cell, which completely ignores the higher-level CQL structure that groups cells into (CQL) rows.
> This leads to many inefficiencies, like the fact that during a read we have to group cells multiple times (to count on the replica, then to count on the coordinator, then to produce the CQL resultset) because we forget about the grouping right away each time (so lots of useless cell name comparisons in particular). But beyond inefficiencies, having to manually recreate the CQL structure every time we need it for something is hindering new features and makes the code more complex than it should be.
> Said storage engine also has tons of technical debt. To pick an example, the fact that during range queries we update {{SliceQueryFilter.count}} is pretty hacky and error prone. Or the overly complex lengths {{AbstractQueryPager}} has to go to simply to "remove the last query result".
> So I want to bite the bullet and modernize this storage engine. I propose to do 2 main things:
> # Make the storage engine more aware of the CQL structure. In practice, instead of having partitions be a simple iterable map of cells, it should be an iterable list of rows (each being itself composed of per-column cells, though obviously not exactly the same kind of cell we have today).
> # Make the engine more iterative. What I mean here is that in the read path, we end up reading all cells in memory (we put them in a ColumnFamily object), but there is really no reason to. If instead we were working with iterators all the way through, we could get to a point where we're basically transferring data from disk to the network, and we should be able to reduce GC substantially.
> Please note that such a refactor should provide some performance improvements right off the bat, but that's not its primary goal either. Its primary goal is to simplify the storage engine and add abstractions that are better suited to further optimizations.
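
The two proposals in the description above (rows as the unit of iteration, and iterators all the way through the read path) can be illustrated with a minimal Java sketch. All names here ({{Cell}}, {{Row}}, {{groupIntoRows}}, {{countRows}}) are hypothetical and are not taken from the branch or from Cassandra's code base. The sketch also assumes cells arrive already sorted by clustering key, so the cells of one row are adjacent.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch only; none of these types exist in Cassandra. */
final class RowGroupingSketch {

    /** One value for one column of one CQL row (illustrative). */
    static final class Cell {
        final String clustering;
        final String column;
        final String value;

        Cell(String clustering, String column, String value) {
            this.clustering = clustering;
            this.column = column;
            this.value = value;
        }
    }

    /** A CQL row: a clustering key plus the cells that belong to it. */
    static final class Row {
        final String clustering;
        final List<Cell> cells;

        Row(String clustering, List<Cell> cells) {
            this.clustering = clustering;
            this.cells = cells;
        }
    }

    /**
     * Groups cells into rows on demand. Only one cell is read ahead and only
     * the current row is held in memory. Assumes the input is sorted by
     * clustering key.
     */
    static Iterator<Row> groupIntoRows(final Iterator<Cell> cells) {
        return new Iterator<Row>() {
            // The first cell of the next row, or null when input is exhausted.
            private Cell pending = cells.hasNext() ? cells.next() : null;

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public Row next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                String clustering = pending.clustering;
                List<Cell> rowCells = new ArrayList<>();
                // Consume every adjacent cell that shares the clustering key.
                while (pending != null && pending.clustering.equals(clustering)) {
                    rowCells.add(pending);
                    pending = cells.hasNext() ? cells.next() : null;
                }
                return new Row(clustering, rowCells);
            }
        };
    }

    /** Counts rows up to a limit without materializing the whole partition. */
    static int countRows(Iterator<Row> rows, int limit) {
        int count = 0;
        while (count < limit && rows.hasNext()) {
            rows.next();
            count++;
        }
        return count;
    }
}
```

With this shape, the grouping is done once by whoever produces the {{Iterator<Row>}}, and each consumer (counting, paging, building a result set) works on rows rather than regrouping cells. How the actual patch arranges this is not shown here.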


