Posted to user@cassandra.apache.org by Kant Kodali <ka...@peernova.com> on 2016/10/23 19:35:03 UTC

is there any problem having too many clustering columns?

Hi All,

Is there any problem with having too many clustering columns? My goal is to
store data in column order, so that for any given partition (primary key)
each non-clustering column (a column that is not part of the primary key)
maps to a new cell underneath that partition, or, in CQL terms, a new row
within the partition. From another thread I heard the sweet spot is about
100 MB per partition, so I would like to include all but one of the columns
as clustering columns, with the one left out as a regular non-clustering
column.

In short, I would do something like this:

create table hello(
    a int,
    b text,
    c int,
    d text,
    e int,
    f bigint,
    g text,
    h text,
    i int,
    body blob,
    primary key(a, b, c, d, e, f, g, h, i)
)

instead of, say, doing something like the one below:

create table hello(
    a int,
    b text,
    c int,
    d text,
    e int,
    f bigint,
    g text,
    h text,
    i int,
    body blob,
    primary key(a, b)
)
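
To illustrate the kind of reads I am hoping the first layout allows (the
values here are made up, and this is just a sketch): with everything in the
clustering key, a single partition can be sliced in order on successive
columns, e.g.

select * from hello where a = 1;
select * from hello where a = 1 and b = 'x';
select * from hello where a = 1 and b = 'x' and c >= 10;

whereas with the second layout, only a and b could appear in the WHERE
clause without a secondary index or ALLOW FILTERING.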

These are just example tables (not my actual ones), but I hope you get the
idea. Please let me know if you see anything wrong with my approach.

Thanks!

Re: is there any problem having too many clustering columns?

Posted by Kant Kodali <ka...@peernova.com>.
That helps! Thanks! I assume you meant "updating one of the columns in the
PRIMARY KEY would require DELETE + INSERT".

Since we don't do updates or deletes on this table, I believe we could
leverage this!



On Sun, Oct 23, 2016 at 12:44 PM, DuyHai Doan <do...@gmail.com> wrote:

Re: is there any problem having too many clustering columns?

Posted by DuyHai Doan <do...@gmail.com>.
There is nothing wrong with your schema, but just remember that because you
set everything except one column as a clustering column, updating those
columns is no longer possible. To "update" the value of one of those
columns you'll need to do a DELETE + INSERT.

Example:

with the normal schema: UPDATE hello SET e = <new_value> WHERE a = xxx AND
b = yyy

with the all-clustering schema:

DELETE FROM hello WHERE a = xxx AND b = yyy AND c = ... AND e = <old_value>
AND f = .... AND i = ...
INSERT INTO hello(a,b,..e,...i) VALUES(.., ..., <new_value>,...)

In terms of the storage engine, you'll create a bunch of tombstones and
duplicate values.
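
Concretely, the "update" of column e might look something like this (a
sketch with made-up values, changing e from 20 to 21; wrapping the pair in
a logged batch is optional but keeps the delete and insert together):

BEGIN BATCH
  DELETE FROM hello WHERE a = 1 AND b = 'x' AND c = 10 AND d = 'y'
    AND e = 20 AND f = 30 AND g = 'p' AND h = 'q' AND i = 40;
  INSERT INTO hello (a, b, c, d, e, f, g, h, i, body)
    VALUES (1, 'x', 10, 'y', 21, 30, 'p', 'q', 40, 0xcafe);
APPLY BATCH;

Note that the DELETE must name the full old clustering key, including the
old value of e, which is why you need to know the old value to replace it.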



On Sun, Oct 23, 2016 at 9:35 PM, Kant Kodali <ka...@peernova.com> wrote: