Posted to commits@cassandra.apache.org by "Dan Hendry (JIRA)" <ji...@apache.org> on 2014/09/05 18:19:28 UTC

[jira] [Commented] (CASSANDRA-7890) LCS and time series data

    [ https://issues.apache.org/jira/browse/CASSANDRA-7890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123098#comment-14123098 ] 

Dan Hendry commented on CASSANDRA-7890:
---------------------------------------

Note that I am not currently working on this ticket. After a bit of poking around, it does not seem like the implementation would be trivial.

> LCS and time series data
> ------------------------
>
>                 Key: CASSANDRA-7890
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7890
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Dan Hendry
>             Fix For: 3.0
>
>
> Consider the following very typical schema for bucketed time series data:
> {noformat}
> CREATE TABLE user_timeline (
> 	ts_bucket bigint,
> 	username varchar,
> 	ts timeuuid,
> 	data blob,
> 	PRIMARY KEY ((ts_bucket, username), ts));
> {noformat}
> If you have a single Cassandra node (or a cluster where RF = N) and use the ByteOrderedPartitioner, LCS becomes *ridiculously*, *obscenely* efficient. Under a typical workload where data is inserted in order, compaction IO could be reduced to *near zero*, since sstable ranges don't overlap (given a trivial change to LCS so that sstables with no overlap are not rewritten when being promoted into the next level). Better yet, we don't _require_ ordered insertion: even if insertion order is completely random, you still get standard LCS performance characteristics, which are usually acceptable (although I believe there are a few degenerate compaction cases which are not handled in the current implementation). A quick benchmark using vanilla Cassandra 2.0.10 (i.e. without the rewrite optimization) shows a *77% reduction in compaction IO* when switching from the Murmur3Partitioner to the ByteOrderedPartitioner.
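The "trivial change" alluded to above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Cassandra's actual compaction code: the `KeyRange` and `canPromoteWithoutRewrite` names are invented for this example. The idea is simply that an sstable whose key range overlaps nothing already in the target level could be promoted by a metadata change alone, skipping the rewrite.

```java
import java.util.List;

public class LevelPromotion {

    // Simplified stand-in for an sstable's [first, last] key range
    // (inclusive bounds, compared in natural key order).
    static class KeyRange {
        final String first, last;
        KeyRange(String first, String last) { this.first = first; this.last = last; }

        boolean overlaps(KeyRange other) {
            return first.compareTo(other.last) <= 0 && other.first.compareTo(last) <= 0;
        }
    }

    // True when the candidate overlaps no sstable already in the next level,
    // i.e. it could be promoted without being rewritten.
    static boolean canPromoteWithoutRewrite(KeyRange candidate, List<KeyRange> nextLevel) {
        for (KeyRange existing : nextLevel) {
            if (candidate.overlaps(existing)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<KeyRange> level2 = List.of(new KeyRange("a", "f"), new KeyRange("m", "r"));
        System.out.println(canPromoteWithoutRewrite(new KeyRange("g", "l"), level2)); // true: fits in the gap
        System.out.println(canPromoteWithoutRewrite(new KeyRange("e", "h"), level2)); // false: overlaps [a, f]
    }
}
```

With in-order (time-bucketed) insertion and key-ordered sstables, newly flushed sstables tend to fall entirely past the existing ranges, so this check succeeds almost every time, which is where the near-zero compaction IO would come from.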
> The obvious problem is, of course, that using an order-preserving partitioner is a Very Bad idea when N > RF. Using an OPP for time series data ordered by time is utter lunacy.
> It seems to me that one solution is to split apart the roles of the partitioner so that data distribution across the cluster and data ordering on disk can be controlled independently. Ideally, on-disk ordering could be set per CF. I'm curious about the historical choice to order data on disk by token rather than by key. Randomized (hashed-key-ordered) distribution across the cluster is obviously a good idea, but natural key ordering on disk seems like it would have a number of advantages:
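The proposed split can be illustrated concretely. In the sketch below (names and hash function are illustrative stand-ins, not Cassandra internals), cluster placement still derives from a hash of the key, while the on-disk comparator uses natural key order, so the two concerns vary independently:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OrderingSplit {
    // Placement: which node owns the key. Hash-based and randomized,
    // exactly as with Murmur3 today (String.hashCode is a toy stand-in).
    static int placementToken(String key) {
        return key.hashCode();
    }

    // On-disk order: natural key order, decided independently of placement.
    static final Comparator<String> DISK_ORDER = Comparator.naturalOrder();

    public static void main(String[] args) {
        // Time-bucketed partition keys like those in the user_timeline schema.
        List<String> keys = new ArrayList<>(List.of(
                "2014-09:alice", "2014-08:alice", "2014-09:bob"));

        // Current behavior: sstable contents sorted by token, scattering
        // adjacent time buckets across the file.
        keys.sort(Comparator.comparingInt(OrderingSplit::placementToken));
        System.out.println("token order:   " + keys);

        // Proposed behavior: same keys, same placement, but stored in
        // natural order so adjacent time buckets are physically adjacent.
        keys.sort(DISK_ORDER);
        System.out.println("natural order: " + keys); // [2014-08:alice, 2014-09:alice, 2014-09:bob]
    }
}
```

Under natural ordering, recently written time buckets cluster together on disk, which is what makes the non-overlap promotion and the page-cache benefits listed below possible even while the ring distribution stays randomized.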
> * Better read performance and file system page cache efficiency for any workload which accesses certain ranges of row keys more frequently than others (this applies to _many_ use cases beyond time series data).
> * I can't think of a realistic workload where CRUD operations would be noticeably less performant when using natural instead of hash ordering. 
> * Better compression ratios (although probably only for skinny rows).
> * Range based truncation becomes feasible.
> * Ordered range scans might be feasible to implement even with random cluster distribution.
> The only things I can think of which could suffer when using different cluster and disk ordering are bootstrap and repair. Although I have no evidence, the massive potential performance gains certainly still seem to be worth it.
> Thoughts? This approach seems to be fundamentally different from other tickets related to improving time series data (CASSANDRA-6602, CASSANDRA-5561) which focus only on new or modified compaction strategies. By changing data sort order, existing compaction strategies can be made significantly more efficient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)