Posted to dev@cassandra.apache.org by graham sanderson <gr...@vast.com> on 2014/06/24 07:22:10 UTC

static columns and TTL - wouldn't it be nice if static columns played nice with a partition whose partition keys have all (TTL) expired

So, I was thinking about a new use case, where an ideal situation would be to have something like

CREATE TABLE series (
	id uuid,
	inserted timeuuid,
	small_thing blob,
	large_static_thing blob static,
	PRIMARY KEY (id, inserted)
);

This is my first use of static columns, but I also want to use TTL (I just built 2.0.8 to play with).

https://issues.apache.org/jira/browse/CASSANDRA-6782 and friends are pretty confusing when it comes to TTL and the row marker, but from my playing, it seems you can at least control the behavior, because you can (re-)INSERT just the primary key values with or without a TTL. (Side note: the docs still say UPDATE and INSERT are identical, which is strictly no longer true.)

So what I really want is the ability to do

INSERT INTO series (id, large_static_thing) VALUES (a, d);

then repeatedly

INSERT INTO series (id, inserted, small_thing) VALUES (a, b, c) USING TTL X;

and have the partition (and the static column) disappear when the last “row” for the partition key is gone.

I can get this behavior if I update large_static_thing every time along with inserting small_thing, but that is exactly what I don’t want to do, because it is large and static.

It sort of seems semantically right that “a special column that is shared by all the rows of the same partition” should at least have an option to expire when all “rows” expire.

It seems like this would be technically feasible (though very much non-trivial) if you had a syntax, say “large_static_thing blob static autoexpiring”, to make the static column an ExpiringColumn, and had any row update with a TTL insert a new OnDiskAtom type (one that contains a TTL but no value) for the static column. These could be reconciled/reduced/compacted or whatever with the ExpiringColumn during read and compaction.
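
To make the reconciliation idea a little more concrete, here is a toy Python model of it. This is not Cassandra code: StaticCell, TtlMarker and reconcile are hypothetical names, and the rule that the latest expiry wins is an assumption about how such atoms might be merged, not something the code base does today.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StaticCell:
    """Hypothetical stand-in for the static column stored as an expiring value."""
    value: bytes
    expires_at: int  # seconds on some shared clock


@dataclass(frozen=True)
class TtlMarker:
    """Hypothetical value-less atom: carries only the expiry of one row write."""
    expires_at: int


def reconcile(cell: StaticCell, markers: Iterable[TtlMarker], now: int) -> Optional[bytes]:
    """Return the static value while it is still live, else None.

    Assumption: the static value should outlive every row, so the latest
    expiry among the cell itself and all markers is the effective one.
    """
    effective = max([cell.expires_at] + [m.expires_at for m in markers])
    if now >= effective:
        return None  # every row has expired, so the static value goes too
    return cell.value
```

In this sketch each TTL'd row write only adds a small marker, so the large value is written once and its lifetime is extended without rewriting it.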

It all sounds a bit over-complicated… so:

1) Does this sound like a useful feature, or is it a me-only use case?
2) Can someone think of a way to model this reasonably efficiently today without using TTL on the static column (and thus having to rewrite it every time)? Not that I’m trying to be abusive, and I haven’t thought this through, but my spider sense makes me think that maybe I can abuse an index on a small expiring column to quickly find empty partition keys.
3) Is it actually simpler to implement than I think in the code base? (This is the first time I’ve peeked at these areas of the code.)
4) If implemented as I suggested above, does that have to be done in a major version?

Thanks,

Graham

Re: static columns and TTL - wouldn't it be nice if static columns played nice with a partition whose partition keys have all (TTL) expired

Posted by graham sanderson <gr...@vast.com>.

Maybe in the short term, just some CQL syntax to detect the case of a static-only row would help.

SELECT id, inserted, large_static_thing FROM series;

This works, and returns a single row with null for inserted if there are no “rows”, but it is a fictitious row (and I must include large_static_thing in order to get id), and of course I’d have to filter for null inserted on the client.

If I were to do manual cleanup, I’d really want a way to select just the partition keys that have only static columns, as detected by the code that makes the fictitious row today. Ideally it would just check for the absence of any column names prefixed by the partition key.
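
For reference, the client-side filter I have in mind is roughly the following Python sketch. It assumes the result rows arrive as dict-like objects keyed by column name (that shape, and the function name static_only_partitions, are assumptions for illustration, not any particular driver's API).

```python
from typing import Any, Iterable, List, Mapping


def static_only_partitions(rows: Iterable[Mapping[str, Any]]) -> List[Any]:
    """Return the ids of partitions that came back as a single fictitious row.

    A partition with no live clustered rows shows up with a null
    clustering column (inserted), so that null is the only signal used.
    """
    return [row["id"] for row in rows if row["inserted"] is None]
```

This only does the filtering; the query feeding it still has to pull large_static_thing for every row, which is the part I would like to avoid.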

Re: static columns and TTL - wouldn't it be nice if static columns played nice with a partition whose partition keys have all (TTL) expired

Posted by graham sanderson <gr...@vast.com>.

Note that, as I think about it, if you had the new OnDiskAtom type with a TTL and no value, then you wouldn’t need anything special about static columns; you’d just need a CQL syntax to update/set the TTL for a column, which might be useful for lots of things.
