Posted to commits@cassandra.apache.org by "Tupshin Harper (JIRA)" <ji...@apache.org> on 2014/02/14 03:39:31 UTC

[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

    [ https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901052#comment-13901052 ] 

Tupshin Harper commented on CASSANDRA-6167:
-------------------------------------------

Adding an example of how this could be used for efficient client-side aggregation.

Assume a CQL table with the following structure:
CREATE TABLE t6167 (
  uid text,
  evtts int,
  evtval text,
  PRIMARY KEY (uid, evtts)
) WITH CLUSTERING ORDER BY (evtts DESC)
(In a real system, evtts would probably be a timeuuid to avoid the risk of collisions.)
Assume the data in that table for a single partition looks like this:
 uid | evtts | evtval
-----+-------+--------
   1 |     7 |    0.5
   1 |     6 |   -1.4
   1 |     5 |    0.3
   1 |     4 |   s5.1
   1 |     3 |    1.7
   1 |     2 |    1.3
   1 |     1 |    2.1
(Ignore the monotonically increasing timestamps. They are here only for simplicity. Timeuuids would not behave this way, of course.)


This structure is used to record new float values (summation is used here only as an example of an arbitrary aggregation).

Source events will write values such as 2.1 and 1.3.
At read time, the logic would be as follows:
You would get a slice of the partition from the most recent row (hence the DESC ordering) back to either the beginning of the partition or the most recently written summation value (e.g. s5.1 or s4.5).
With the syntax from option 2 above (and using % as a wildcard), you would get
 SELECT uid, evtts, evtval FROM t6167 WHERE uid='1' AND evtts < NOW() UNTIL PARTITION evtval='s%'
Assuming NOW() >= 8 and the above data, this would return:
 uid | evtts | evtval
-----+-------+--------
   1 |     7 |    0.5
   1 |     6 |   -1.4
   1 |     5 |    0.3
   1 |     4 |   s5.1
At that point, the client would return evtval 4.5 by doing client-side aggregation of those 4 rows (0.5 - 1.4 + 0.3 + 5.1 = 4.5).
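
The read path above can be sketched in Python. This is only an illustration under stated assumptions: UNTIL PARTITION is proposed syntax and does not exist, so the hypothetical slice_until_summary function simulates it in memory, and the hypothetical aggregate function stands in for whatever the client does with the rows. The leading "s" marks a summation row, as in the example data.

```python
from decimal import Decimal

# Rows of the example partition in clustering order (evtts DESC).
PARTITION = [
    (7, "0.5"), (6, "-1.4"), (5, "0.3"), (4, "s5.1"),
    (3, "1.7"), (2, "1.3"), (1, "2.1"),
]


def slice_until_summary(rows, now):
    """Simulate the proposed `evtts < now UNTIL PARTITION evtval='s%'`.

    Walks the partition newest-first and stops right after the first
    summation row, which is included in the result.
    """
    result = []
    for evtts, evtval in rows:
        if evtts >= now:
            continue  # outside the evtts < now restriction
        result.append((evtts, evtval))
        if evtval.startswith("s"):
            break  # termination predicate matched: end the slice here
    return result


def aggregate(rows):
    """Sum raw values plus the value carried by the summation row, if any."""
    total = Decimal("0")
    for _, evtval in rows:
        number = evtval[1:] if evtval.startswith("s") else evtval
        total += Decimal(number)
    return total


rows = slice_until_summary(PARTITION, now=8)
print(len(rows), aggregate(rows))  # 4 4.5
```

Decimal is used instead of float so that the sum of the text values comes out as exactly 4.5.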
Then, optionally, if enough time had elapsed since the last aggregation row to rule out out-of-order delivery (a business rule), that same reader thread would write back a new aggregation value at an appropriate timestamp, lagging behind the current time by the potential out-of-order delivery window.
These writes would be inherently idempotent, and hence free of race conditions, and could easily be tuned for delivery window and aggregation frequency to suit various workloads.
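
A minimal sketch of that optional write-back step, again in Python and again under assumptions: plan_summary_write and delivery_window are hypothetical names, timestamps are the small integers of the example, and a summation row written at timestamp T is taken to cover every row older than T. With integer timestamps a summary could land on an existing event row, so the sketch simply skips that case; timeuuids would not have this problem.

```python
from decimal import Decimal


def plan_summary_write(rows, now, delivery_window):
    """Decide which summation row, if any, the reader should write back.

    `rows` is the slice returned by the read, newest first. Returns an
    (evtts, evtval) tuple to insert, or None when nothing should be written.
    """
    cutoff = now - delivery_window  # rows older than this are considered settled
    if any(evtts == cutoff for evtts, _ in rows):
        return None  # would overwrite an event row (integer-timestamp artifact)
    settled = [(ts, val) for ts, val in rows if ts < cutoff]
    if not settled or settled[0][1].startswith("s"):
        return None  # nothing new has settled since the last summation row
    total = sum(
        (Decimal(val[1:] if val.startswith("s") else val) for _, val in settled),
        Decimal("0"),
    )
    return (cutoff, f"s{total}")


example_rows = [(7, "0.5"), (6, "-1.4"), (5, "0.3"), (4, "s5.1")]
print(plan_summary_write(example_rows, now=10, delivery_window=2))  # (8, 's4.5')
```

Two readers that see the same settled rows and compute the same cutoff produce the same row, which is what makes the write idempotent in this sketch.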

> Add end-slice termination predicate
> -----------------------------------
>
>                 Key: CASSANDRA-6167
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6167
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API, Core
>            Reporter: Tupshin Harper
>            Priority: Minor
>              Labels: ponies
>
> When performing storage-engine slices, it would sometimes be beneficial to have the slice terminate for reasons other than number of columns or min/max cell name.
> Since we are able to look at the contents of each cell as we read it, this is potentially doable with very little overhead. 
> Probably more challenging than the storage-engine implementation itself is coming up with appropriate CQL syntax (Thrift, should we decide to support it, would be trivial).
> Two possibilities are:
> 1) special where function:
> SELECT pk,event from cf WHERE pk IN (1,5,10,11) AND partition_predicate({predicate})
> or a bigger language change, but I think the one I prefer, more like:
> 2) SELECT pk,event from cf where pk IN (1,5,10,11) UNTIL PARTITION event {predicate}
> Neither feels perfect, but I do like the fact that the second one at least clearly states what it is intended to do.
> By using "UNTIL PARTITION", we could re-use the UNTIL keyword to handle other kinds of early-termination of selects that the coordinator might be able to do, such as stop retrieving additional rows from shards after a particular criterion was met.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)