Posted to user@cassandra.apache.org by Jimmy Lin <y2...@gmail.com> on 2014/04/22 08:51:27 UTC

fixed size collection possible?

Hi,
I am looking at the collection type support in CQL3, e.g.
http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_list_t.html

We can append or remove elements using the "+" and "-" operators:

UPDATE users
  SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo';

UPDATE users
  SET top_places = top_places - ['riddermark'] WHERE user_id = 'frodo';


Is there a way to keep the list (collection) at a fixed size?

I was thinking about using a TTL to remove older data after a certain time,
but then the list will become too big if the TTL is too long, and if the
TTL is too short I run the risk of having an empty list (if there is
no new activity).

Even if I don't use a collection type and have my own table, I still run
into the same issue.


Any recommendations for handling this type of situation?


thanks

Re: fixed size collection possible?

Posted by Chris Lohfink <cl...@blackbirdit.com>.
It isn’t natively supported, but there are some things you can do if you need it.

A lot depends on how frequently this list is updated. For heavier workloads I would recommend using a custom column family (CF) for this instead of collections. With extremely heavy inserts you would want to add additional partitioning to it as well. As mentioned below, I’d recommend a cleanup MapReduce job to trim it periodically if the risk of TTLs leaving you with zero entries is unacceptable.

Putting the list in its own CF keeps its elements from polluting your users partition. If there are a lot of tombstones and inserts, reading the user could become slow (the partition would behave like a queue, which has horrible performance), so a separate CF at least sections off that badness from the regular user lookups.

CREATE TABLE user_top_places (
  user_id varchar,
  created timeuuid,
  place varchar,
  PRIMARY KEY (user_id, created))
  WITH CLUSTERING ORDER BY (created DESC);

Then, to add a new entry to the front of the “list”:

INSERT INTO user_top_places (user_id, created, place) VALUES ('frodo', now(), 'mordor');

and you can read the last 10 entries:

SELECT * FROM user_top_places WHERE user_id = 'frodo' LIMIT 10;

This will give you the last 10 entries (it allows duplicates, though). Older records will still be around, and disk space could eventually become a problem. If it gets bad, I would recommend a periodic job (Hadoop, for example) to remove excess columns, solely to save disk space. If you can afford the disk, though, you would get better performance by just letting it grow up to a point (provided rows don’t get too large, i.e. >64 MB). If this isn’t very write-heavy, there might be some more clever things you can do...
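As a rough illustration of what such a periodic trim could compute for one partition, here is a minimal Python sketch. It does not talk to Cassandra: the helper names (rows_to_trim, delete_statements) and the integer stand-in for the timeuuid are hypothetical, chosen only for this example. It assumes rows arrive newest first, as the clustering order above returns them, and it builds one DELETE per excess row keyed on the full primary key, with `?` as the bind marker of a prepared statement.

```python
from typing import List, Tuple

# One row of user_top_places for a single user: (created, place).
# `created` is an int here purely as a stand-in for the timeuuid.
Row = Tuple[int, str]

DELETE_CQL = "DELETE FROM user_top_places WHERE user_id = ? AND created = ?"


def rows_to_trim(rows_newest_first: List[Row], keep: int) -> List[Row]:
    """Return the rows beyond the `keep` newest ones (the excess to delete)."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    return rows_newest_first[keep:]


def delete_statements(user_id: str, excess: List[Row]) -> List[Tuple[str, Tuple[str, int]]]:
    """Pair the DELETE statement with bind values for each excess row."""
    return [(DELETE_CQL, (user_id, created)) for created, _place in excess]


if __name__ == "__main__":
    rows = [(5, "mordor"), (4, "shire"), (3, "rivendell")]
    for cql, params in delete_statements("frodo", rows_to_trim(rows, keep=2)):
        print(cql, params)
```

Each delete writes a tombstone, so running a job like this too aggressively brings back the queue-like read cost described earlier; trimming in infrequent batches is one way to keep that in check.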

If avoiding duplicates is more important, you can make “place” part of the primary key, so that it becomes the column name:

CREATE TABLE user_top_places (user_id varchar, place varchar, created timestamp, PRIMARY KEY (user_id, place));
INSERT INTO user_top_places (user_id, place, created) VALUES ('frodo', 'mordor', dateof(now()));

The results won’t be ordered by most recent insert, though, so you may have to do some client-side filtering on the created field to show only the latest.
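The client-side filtering could look something like the Python sketch below. It is an illustration only: latest_places is a hypothetical helper, and the rows are assumed to have already been fetched for one user as (place, created) pairs. Because place is part of the primary key in this table, each place appears at most once, so sorting on created and cutting to the limit is enough.

```python
from datetime import datetime
from typing import List, Tuple

# One row of the de-duplicated table for a single user: (place, created).
Row = Tuple[str, datetime]


def latest_places(rows: List[Row], limit: int = 10) -> List[str]:
    """Return up to `limit` places, most recently written first."""
    newest_first = sorted(rows, key=lambda row: row[1], reverse=True)
    return [place for place, _created in newest_first[:limit]]


if __name__ == "__main__":
    rows = [
        ("mordor", datetime(2014, 4, 22, 8, 0)),
        ("riddermark", datetime(2014, 4, 20, 9, 30)),
        ("shire", datetime(2014, 4, 21, 12, 15)),
    ]
    print(latest_places(rows, limit=2))
```

This reads the whole partition for the user on every lookup, which is reasonable only while the number of distinct places per user stays small.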

---
Chris Lohfink



Re: fixed size collection possible?

Posted by Tupshin Harper <tu...@tupshin.com>.
No, there isn't, though I would like to see such a feature, albeit more
at the CQL partition layer than at the collection layer. This is
sometimes referred to as a capped collection in other databases, and
you might find the history in this ticket interesting. It points to
ways to simulate the behavior client-side.
https://issues.apache.org/jira/browse/CASSANDRA-3929

-Tupshin
