Posted to user@cassandra.apache.org by Edward Kibardin <in...@gmail.com> on 2012/09/26 23:52:02 UTC

Once again, super columns or composites?

Hi Community,

I know, I know... everyone claims Super Columns are not good enough
and that it is dangerous to use them now.
But from my perspective, they have several very good advantages like:

   1. You are not tied to a fixed schema and can always add more columns
   to a subset of your supercolumns
   2. A SuperColumn is loaded as a whole if you request at least one
   subcolumn, but that is the same as loading a whole composite value to
   get only one sub-value
   3. In supercolumns you can update a single subcolumn without touching
   the other subcolumns; with composites you are unable to update just a
   portion of a composite value.

Maybe I do not understand composites correctly, but with very small
supercolumns (10-15 subcolumns) I still think SuperColumns might be the
best solution for me...
In addition, building supercolumns with SSTableWriter is pretty
straightforward for me, while that is not the case with composites...

Any arguments?

Re: Once again, super columns or composites?

Posted by "Hiller, Dean" <De...@nrel.gov>.
Can you describe your use case in detail? It might be easier to explain a model with composite names.
Later,
Dean


Re: Once again, super columns or composites?

Posted by Edward Kibardin <in...@gmail.com>.
Oh... Sylvain, thanks a lot for such a complete answer.

Yeah, I understand my mistake in what I assumed about composites.
It seems composites are pretty much an advanced version of manually
joining keys into a string column name: <key1>:<key2>
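
One way to see what "advanced" buys you is sort order. Here is a small Python sketch, assuming the second key is an integer (the keys are made up for illustration). Joined strings compare character by character, while a composite name is compared component by component, each with its own type; plain tuples stand in for that behaviour here.

```python
# Column names built by manual string joining: "<key1>:<key2>".
joined = ["user:%d" % n for n in (9, 10, 2)]

# The same names as 2-component tuples, standing in for composite names
# whose second component is compared as an integer.
composite = [("user", n) for n in (9, 10, 2)]

print(sorted(joined))     # string order: "user:10" sorts before "user:2"
print(sorted(composite))  # component order: 2, 9, 10
```

The joined form also needs a separator that can never appear inside a key, which the component-wise form avoids.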

Thanks a lot!
Ed


Re: Once again, super columns or composites?

Posted by Sylvain Lebresne <sy...@datastax.com>.
> But from my understanding, you just can't update composite column, only
> delete and insert... so this may make my update use case much more
> complicated.

Let me try to sum things up.
In regular column families, a column (value) is defined by 2 keys: the
row key and the column name.
In super column families, a column (value) is defined by 3 keys: the
row key, the super column name and the column name.

So a super column is really just the set of columns that share the
same (row key, super column name) pair.

The idea of composite columns is to use regular columns, but simply to
distinguish multiple parts of the column name. Take the example of a
CompositeType with 2 components. In that column family, a column
(value) is defined by 3 keys: the row key, the first component of the
column name and the second component of the column name.

In other words, composites are a *generalization* of super columns and
super columns are the case of composites with 2 components. Except
that super columns are hard-wired in the Cassandra code base in a way
that comes with a number of limitations, the main one being that we
always deserialize a super column (again, which is just a set of
columns) in its entirety when we read it from disk.

So no, it's not true that "you just can't update composite column,
only delete and insert" nor that it is "not possible to add any
sub-column to your composite".
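
To make the three-key picture concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Cassandra client code: the row is a plain dict, and `row`, `put` and the sample keys are hypothetical names chosen for illustration. It only shows that with a 2-component composite column name, each (first component, second component) pair is an independent column that can be written or added on its own.

```python
# Toy model of one row in a regular column family whose column names are
# 2-component composites. This is plain Python, not a Cassandra client.

# One row: composite column name (component1, component2) -> column value.
row = {
    ("user42", "name"): "Ed",
    ("user42", "city"): "London",
    ("user77", "name"): "Dean",
}

def put(row, component1, component2, value):
    """Write a single column; other columns in the row are untouched."""
    row[(component1, component2)] = value

# Update one "sub-column" in place: no delete-and-insert of the group.
put(row, "user42", "city", "Paris")

# Add a brand-new "sub-column" to an existing group: no schema change.
put(row, "user42", "email", "ed@example.org")

# Cassandra keeps columns sorted by name, so columns that share the first
# component end up next to each other; sorted() mimics that ordering.
for name in sorted(row):
    print(name, "=", row[name])
```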

That being said, if you are using the Thrift interface, super columns
do have a few perks currently:
  - the grouping of all the sub-columns composing a super column is
hard-wired in Cassandra. The equivalent for composites, which consists
in grouping all columns having the same value for a given component,
must be done client side. Maybe some client libraries do that for you
but I'm not sure (I don't know for Pycassa for instance).
  - there are a few queries that can be easily done with super columns
that don't translate easily to composites, namely deleting whole super
columns and to a lesser extent querying multiple super columns by name.
That's due to a few limitations that upcoming versions of Cassandra
will solve but it's not the case with currently released versions.
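
What those two perks amount to on the client side can be sketched with a toy model in Python. This is plain in-memory code, not Cassandra or Pycassa calls, and `group_by_first_component` and `delete_group` are hypothetical helper names. Grouping is done by the client, and removing a whole group means removing each of its columns.

```python
from itertools import groupby

# Toy row: composite column name (component1, component2) -> value.
row = {
    ("sc1", "a"): 1,
    ("sc1", "b"): 2,
    ("sc2", "a"): 3,
}

def group_by_first_component(row):
    """Client-side equivalent of reading super columns: collect columns
    that share the first component of their name."""
    names = sorted(row)  # groupby needs equal keys to be adjacent
    return {
        first: {name[1]: row[name] for name in group}
        for first, group in groupby(names, key=lambda name: name[0])
    }

def delete_group(row, first):
    """Emulate deleting a whole super column: drop every column whose
    name starts with the given first component."""
    for name in [n for n in row if n[0] == first]:
        del row[name]

print(group_by_first_component(row))
delete_group(row, "sc1")
print(group_by_first_component(row))
```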

The bottom line is: if you can do without those few perks, then you'd
better use composites since they have fewer limitations. If you can't
really do without these perks and can live with the super column
limitations, then go on, use super columns. (And if you want the perks
without the limitations, wait for Cassandra 1.2 and use CQL3 :D)


> ... and as I know, DynamicComposites is not recommended (and actually not
> supported by Pycassa).

DynamicComposites don't do what you think they do. They do nothing
more than "regular" composites as far as comparing them to SuperColumns
is concerned, except giving you ways to shoot yourself in the foot.

--
Sylvain

Re: Once again, super columns or composites?

Posted by Edward Kibardin <in...@gmail.com>.
Sylvain, thanks for the response!

I have a use case that involves updating 1.5 million values a day.
Currently I'm just creating a new SSTable using SSTableWriter and uploading
these SuperColumns to Cassandra.
But from my understanding, you just can't update composite column, only
delete and insert... so this may make my update use case much more
complicated.
It is also not possible to add any sub-column to your composite, which
means we fall back to the delete-insert case again.
... and as far as I know, DynamicComposites are not recommended (and
actually not supported by Pycassa).

Am I correct?

Ed



Re: Once again, super columns or composites?

Posted by Sylvain Lebresne <sy...@datastax.com>.
When people suggest composites instead of super columns, they mean
composite column 'names', not composite column 'values'. None of the
advantages you cite hold in the case of composite column 'names'.

--
Sylvain
