Posted to user@cassandra.apache.org by David Boxenhorn <da...@lookin2.com> on 2011/02/03 11:28:08 UTC

Do supercolumns have a purpose?

Is there any advantage to using supercolumns
(columnFamilyName[superColumnName[columnName[val]]]) instead of regular
columns with concatenated keys
(columnFamilyName[superColumnName@columnName[val]])?


When I designed my data model, I used supercolumns wherever I needed two
levels of key depth - just because they were there, and I figured that they
must be there for a reason.

Now I see that in 0.7 secondary indexes don't work on supercolumns or
subcolumns (is that right?), which seems to me like a very serious
limitation of supercolumn families.

It raises the question: Is there anything that supercolumn families are good
for?

And here's a related question: Why can't Cassandra implement supercolumn
families as regular column families, internally, and give you that
functionality?
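
For readers comparing the two layouts in the question, here is a minimal sketch in Python. It is a toy in-memory model, not Cassandra client code; the "@" separator comes from the question, while the function names and the sample row are made up for illustration.

```python
# Toy in-memory model of one row; not Cassandra client code.
SEP = "@"  # separator from the question; must never occur inside a super column name


def flatten(super_row):
    """Encode {super_name: {column_name: value}} as {"super@column": value}."""
    flat = {}
    for super_name, columns in super_row.items():
        if SEP in super_name:
            raise ValueError(f"separator {SEP!r} found in {super_name!r}")
        for column_name, value in columns.items():
            flat[f"{super_name}{SEP}{column_name}"] = value
    return flat


def unflatten(flat_row):
    """Rebuild the two-level structure by splitting on the first separator."""
    nested = {}
    for name, value in flat_row.items():
        super_name, column_name = name.split(SEP, 1)
        nested.setdefault(super_name, {})[column_name] = value
    return nested


# Hypothetical sample data: two super columns holding three columns in total.
super_row = {"sc1": {"c1": "v1", "c2": "v2"}, "sc2": {"c1": "v3"}}
flat_row = flatten(super_row)
```

The round trip only works if the separator is reserved, which is one practical cost of the concatenated-key approach.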

Re: Do supercolumns have a purpose?

Posted by Ryan King <ry...@twitter.com>.

On Thu, Feb 3, 2011 at 6:49 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com> wrote:
>>>
>>> The advantage would be to enable secondary indexes on supercolumn
>>> families.
>>
>> Then I suggest opening a ticket for adding secondary indexes to supercolumn
>> families and voting on it.
>
> https://issues.apache.org/jira/browse/CASSANDRA-598

I think we're talking about 2 different indexes here. That ticket was
about indexing in the storage format, but the OP was asking about
secondary indexes.

-ryan

Re: Do supercolumns have a purpose?

Posted by Jonathan Ellis <jb...@gmail.com>.

On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com> wrote:
>>
>> The advantage would be to enable secondary indexes on supercolumn
>> families.
>
> Then I suggest opening a ticket for adding secondary indexes to supercolumn
> families and voting on it.

https://issues.apache.org/jira/browse/CASSANDRA-598

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Do supercolumns have a purpose?

Posted by David Boxenhorn <da...@lookin2.com>.

Well, I am an "actual active developer" and I have "managed to do pretty
nice stuffs with Cassandra" - without secondary indexes so far. But I'm
looking forward to having secondary indexes in my arsenal when new
functional requirements come up, and I'm bummed out that my early design
decision to use supercolumns wherever I could, instead of concatenating keys,
has closed off a whole lot of possibilities. I knew when I started that
secondary indexes were in the future. If I had known that they would be only
for regular column families, I wouldn't have used supercolumn families in the
first place. Now I'm pretty much stuck (too late to go back - we're
launching in March).


On Thu, Feb 3, 2011 at 4:44 PM, Sylvain Lebresne <sy...@datastax.com>wrote:

> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com> wrote:
>
>> The advantage would be to enable secondary indexes on supercolumn
>> families.
>>
>
> Then I suggest opening a ticket for adding secondary indexes to supercolumn
> families and voting on it. This will be 1 or 2 orders of magnitude less work
> than getting rid of super columns internally, and probably a much better
> solution anyway.
>
>
>> I understand from this thread that indexes on supercolumn families are
>> not going to be:
>>
>> http://www.mail-archive.com/user@cassandra.apache.org/msg09527.html
>>
>
> I should maybe let Jonathan answer this one, but the way I understand it is
> that adding secondary indexes to super columns is not a top priority for the
> actual active developers. Not that it will never ever happen. And voting for
> tickets in JIRA is one way to help raise its priority.
>
> In any case, if the goal you're pursuing is adding secondary indexes to
> super columns, then that's the ticket you should open, and if after careful
> consideration it is decided that getting rid of super columns is the best way
> to reach that goal then so be it (spoiler: it is not).
>
>
>> Which, it seems to me, effectively deprecates supercolumn families. (I
>> don't see any of the three problems you brought up as overcoming this
>> problem, except, perhaps, for special cases.)
>>
>
> You're entitled to your opinions obviously, but I doubt everyone shares that
> feeling (I don't, for instance). Before 0.7, there were no secondary indexes
> at all and still a bunch of people managed to do pretty nice stuffs with
> Cassandra. In particular, denormalized views are sometimes (often?)
> preferable to secondary indexes for performance reasons. For that, super
> columns are quite handy.
>
> --
> Sylvain
>
>
>>
>>
>>  On Thu, Feb 3, 2011 at 3:32 PM, Sylvain Lebresne <sy...@datastax.com>wrote:
>>
>>> On Thu, Feb 3, 2011 at 1:33 PM, David Boxenhorn <da...@lookin2.com>wrote:
>>>
>>>> Thanks Sylvain!
>>>>
>>>> Can I vote for internally implementing supercolumn families as regular
>>>> column families? (With a smooth upgrade process that doesn't require
>>>> shutting down a live cluster.)
>>>>
>>>
>>> I forgot to add that I don't know if this makes a lot of sense. That would
>>> be a fairly major refactor (so error prone), you'd still have to deal with
>>> the point I mentioned in my previous mail (for range deletes you would have
>>> to change the on-disk format for instance), and all this for no actual
>>> benefits, even downsides actually (encoded supercolumns will take more space
>>> on disk (and in memory)). Super columns are there and work fairly well, so
>>> what would be the point?
>>>
>>> I'm only saying that 'in theory', super columns are not the super
>>> shiny magical feature that gives you stuff you can't hope to have with only
>>> regular column families. That doesn't make them any less nice.
>>>
>>> That being said, you are free to create whatever ticket you want and vote
>>> for it. Don't expect too much support though :)
>>>
>>>
>>>> What if supercolumn families were supported as regular column families +
>>>> an index (on what used to be supercolumn keys)? Would that solve some
>>>> problems?
>>>>
>>>
>>> You'd still have to remember for each CF if it has this index on what
>>> used to be supercolumn keys and handle those differently. I'm really not
>>> convinced this would make the code cleaner than it is now. And making the
>>> code cleaner is really the only reason I can think of for wanting to get rid
>>> of super columns internally, so ...
>>>
>>>
>>>>
>>>>
>>>> On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne <sy...@datastax.com>wrote:
>>>>
>>>>> > Is there any advantage to using supercolumns
>>>>> > (columnFamilyName[superColumnName[columnName[val]]]) instead of regular
>>>>> > columns with concatenated keys
>>>>> > (columnFamilyName[superColumnName@columnName[val]])?
>>>>> >
>>>>> > When I designed my data model, I used supercolumns wherever I needed two
>>>>> > levels of key depth - just because they were there, and I figured that they
>>>>> > must be there for a reason.
>>>>> >
>>>>> > Now I see that in 0.7 secondary indexes don't work on supercolumns or
>>>>> > subcolumns (is that right?), which seems to me like a very serious
>>>>> > limitation of supercolumn families.
>>>>> >
>>>>> > It raises the question: Is there anything that supercolumn families are good
>>>>> > for?
>>>>>
>>>>> There are a bunch of queries that you cannot do (or can do only less
>>>>> conveniently) if you encode super columns using regular columns with
>>>>> concatenated keys:
>>>>>
>>>>> 1) If you use regular columns with concatenated keys, the count argument
>>>>> counts simple columns. With super columns it counts super columns. It
>>>>> means that you can't do "give me the first 10 super columns of this row".
>>>>>
>>>>> 2) If you need to get x super columns by name, you'll have to issue x
>>>>> get_slice queries (one for each super column). On the client side it
>>>>> sucks. Internally in Cassandra we could do it reasonably well though.
>>>>>
>>>>> 3) You cannot remove entire super columns since there is no support for
>>>>> range deletions.
>>>>>
>>>>> Moreover, the encoding with concatenated keys uses more disk space (and
>>>>> less disk used for the same information means fewer things to read, so it
>>>>> may have a slight impact on read performance too -- it's probably really
>>>>> slight for most usage, but nevertheless).
>>>>>
>>>>> > And here's a related question: Why can't Cassandra implement supercolumn
>>>>> > families as regular column families, internally, and give you that
>>>>> > functionality?
>>>>>
>>>>> For 1) and 2) above, we could deal with those internally fairly easily, I
>>>>> think, and rather well (which means it wouldn't be much worse
>>>>> performance-wise than with the actual implementation of super columns, not
>>>>> that it would be better). For 3), range deletes are harder and would
>>>>> require more significant changes (that doesn't mean that Cassandra will
>>>>> never have it). Even without that, there would be the disk space lost.
>>>>>
>>>>> --
>>>>> Sylvain
>>>>>
>>>>>
>>>>
>>>
>>
>
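
The three limitations Sylvain lists for concatenated keys can be sketched with a toy row held in a Python dict. This is an illustration only: it does not call Cassandra, the "@" separator is the one from the original question, and every function name here is hypothetical.

```python
from itertools import groupby

SEP = "@"  # assumed separator between super column name and column name


def super_name_of(column_name):
    """Recover the emulated super column name from a concatenated name."""
    return column_name.split(SEP, 1)[0]


def first_n_columns(flat_row, n):
    """Point 1: a count of n on a regular row counts simple columns."""
    return sorted(flat_row)[:n]


def first_n_super_columns(flat_row, n):
    """Client-side grouping needed to get the first n emulated super columns."""
    result = []
    for super_name, names in groupby(sorted(flat_row), key=super_name_of):
        if len(result) == n:
            break
        result.append((super_name, list(names)))
    return result


def slice_ranges(super_names):
    """Point 2: one [start, end) name range, so one slice query, per super column."""
    return [(s + SEP, s + chr(ord(SEP) + 1)) for s in super_names]


def remove_super_column(flat_row, super_name):
    """Point 3: without range deletes, each column is found and deleted singly."""
    doomed = [name for name in flat_row if name.startswith(super_name + SEP)]
    for name in doomed:
        del flat_row[name]
    return len(doomed)


# Hypothetical row: super column "a" has two columns, "b" and "c" have one each.
row = {"a@1": "v", "a@2": "v", "b@1": "v", "c@1": "v"}
```

With this row, asking for the first two columns returns only the two columns under "a", while the first two emulated super columns cover three columns; that gap is the count problem described in point 1.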

Re: Do supercolumns have a purpose?

Posted by David Boxenhorn <da...@lookin2.com>.

I agree, that is the way to go. Then each piece of new functionality will
not have to be implemented twice.

On Sat, Feb 12, 2011 at 9:41 AM, Stu Hood <st...@gmail.com> wrote:

> I would like to continue to support super columns, but to slowly convert
> them into "compound column names", since that is all they really are.
>
>
> On Thu, Feb 10, 2011 at 10:16 AM, Frank LoVecchio <fr...@isidorey.com>wrote:
>
>> I've found super column families quite useful when using
>> RandomPartitioner on a low-maintenance cluster (as opposed to
>> Byte/Ordered), e.g. returning ordered data from a TimeUUID comparator type;
>> try doing that with one regular column family and secondary indexes (you
>> could obviously sort on the client side, but that is tedious and not logical
>> for older data).
>>
>> On Thu, Feb 10, 2011 at 12:32 AM, David Boxenhorn <da...@lookin2.com>wrote:
>>
>>> Mike, my problem is that I have a database and codebase that already
>>> uses supercolumns. If I had to do it over, it wouldn't use them, for the
>>> reasons you point out. In fact, I have a feeling that over time supercolumns
>>> will become deprecated de facto, if not de jure. That's why I would like to
>>> see them represented internally as regular columns, with an upgrade path for
>>> backward compatibility.
>>>
>>> I would love to do it myself! (I haven't looked at the code base, but I
>>> don't understand why it should be so hard.) But my employer has other
>>> ideas...
>>>
>>>
>>> On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone <mi...@simplegeo.com> wrote:
>>>
>>>> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn <da...@lookin2.com>wrote:
>>>>
>>>>> Shaun, I agree with you, but marking them as deprecated is not good
>>>>> enough for me. I can't easily stop using supercolumns. I need an upgrade
>>>>> path.
>>>>>
>>>>
>>>> David,
>>>>
>>>> Cassandra is open source and community developed. The right thing to do
>>>> is what's best for the community, which sometimes conflicts with what's best
>>>> for individual users. Such strife should be minimized; it will never be
>>>> eliminated. Luckily, because this is an open source, liberally licensed
>>>> project, if you feel strongly about something you should feel free to add
>>>> whatever features you want yourself. I'm sure other people in your situation
>>>> will thank you for it.
>>>>
>>>> At a minimum I think it would behoove you to re-read some of the
>>>> comments here re: why super columns aren't really needed and take another
>>>> look at your data model and code. I would actually be quite surprised to
>>>> find a use of super columns that could not be trivially converted to normal
>>>> columns. In fact, it should be possible to do at the framework/client
>>>> library layer - you probably wouldn't even need to change any application
>>>> code.
>>>>
>>>> Mike
>>>>
>>>> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net>wrote:
>>>>>
>>>>>>
>>>>>> I'm a newbie here, but, with apologies for my presumptuousness, I
>>>>>> think you should deprecate SuperColumns. They are already distracting you,
>>>>>> and as the years go by the cost of supporting them as you add more and more
>>>>>> functionality is only likely to get worse. It would be better to concentrate
>>>>>> on making the "core" column families better (and I'm sure we can all think
>>>>>> of lots of things we'd like).
>>>>>>
>>>>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>>>>> users like David who are currently using them. But if you mark them clearly
>>>>>> as deprecated and explain why and what to do instead (perhaps putting a bit
>>>>>> of effort into migration tools... or even a "virtual" layer supporting
>>>>>> arbitrary hierarchical data), then you can drop them in a few years (when
>>>>>> you get to 1.0, say), without people feeling betrayed.
>>>>>>
>>>>>> -- Shaun
>>>>>>
>>>>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>>>>
>>>>>> "My main point was to say that I think it is better to create
>>>>>> tickets for what you want, rather than for something else completely
>>>>>> different that would, as a by-product, give you what you want."
>>>>>>
>>>>>> Then let me say what I want: I want supercolumn families to have any
>>>>>> feature that regular column families have.
>>>>>>
>>>>>> My data model is full of supercolumns. I used them, even though I knew
>>>>>> I didn't *have to*, "because they were there", which implied to me that I
>>>>>> was supposed to use them for some good reason. Now I suspect that they will
>>>>>> gradually become less and less functional, as features are added to regular
>>>>>> column families and not supported for supercolumn families.
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <
>>>>>> sylvain@datastax.com> wrote:
>>>>>>
>>>>>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mi...@simplegeo.com>wrote:
>>>>>>>
>>>>>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <
>>>>>>>> sylvain@datastax.com> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <david@lookin2.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>>>>>> families.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>>>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>>>>>>> magnitude less work than getting rid of super columns internally, and
>>>>>>>>> probably a much better solution anyway.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I realize that this is largely subjective, and on such matters code
>>>>>>>> speaks louder than words, but I don't think I agree with you on the issue of
>>>>>>>> which alternative is less work, or even which is a better solution.
>>>>>>>>
>>>>>>>
>>>>>>> You are right, I probably put too much emphasis on that sentence. My
>>>>>>> main point was to say that I think it is better to create tickets for
>>>>>>> what you want, rather than for something else completely different that
>>>>>>> would, as a by-product, give you what you want.
>>>>>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>>>>>> super columns, then there is a good chance this would be less work than
>>>>>>> getting rid of super columns. But to be fair, secondary indexes on super
>>>>>>> columns may not make too much sense without #598, which itself would require
>>>>>>> quite some work, so clearly I spoke a bit quickly.
>>>>>>>
>>>>>>>
>>>>>>>> If the goal is to have a hierarchical model, limiting the depth to
>>>>>>>> two seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>>>>>>> hierarchy?
>>>>>>>>
>>>>>>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>>>>>>> impractical, allowing a depth of two seems inconsistent and
>>>>>>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>>>>>>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>>>>>>> implemented a custom comparator that does the job [1]. Google's Megastore
>>>>>>>> has a similar architecture and goes even further [2].
>>>>>>>>
>>>>>>>> It seems to me that super columns are a historical artifact from
>>>>>>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>>>>>>> posting lists of messages, sharded by user. So that's what they built. In my
>>>>>>>> dealings with the Cassandra code, super columns end up making a mess all
>>>>>>>> over the place when algorithms need to be special cased and branch based on
>>>>>>>> the column/supercolumn distinction.
>>>>>>>>
>>>>>>>> I won't even mention what it does to the thrift interface.
>>>>>>>>
>>>>>>>
>>>>>>> Actually, I agree with you, more than you know. If I were to start
>>>>>>> coding Cassandra now, I wouldn't include super columns (and I would probably
>>>>>>> not go for a depth unlimited hierarchical model either). But it's there and
>>>>>>> I'm not sure getting rid of them fully (meaning, including in thrift) is an
>>>>>>> option (it would be a big compatibility breakage). And (even though I
>>>>>>> certainly thought about this more than once :)) I'm slightly
>>>>>>> less enthusiastic about keeping them in thrift but encoding them as regular
>>>>>>> column families internally: it would still be a lot of work and we would still
>>>>>>> probably end up with nasty tricks to stick to the thrift api.
>>>>>>>
>>>>>>> --
>>>>>>> Sylvain
>>>>>>>
>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>>>>>>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Frank LoVecchio
>> Senior Software Engineer | Isidorey, LLC
>> Google Voice +1.720.295.9179
>> isidorey.com | facebook.com/franklovecchio | franklovecchio.com |
>> rodsandricers.com
>>
>>
>

Re: Do supercolumns have a purpose?

Posted by Stu Hood <st...@gmail.com>.

I would like to continue to support super columns, but to slowly convert
them into "compound column names", since that is all they really are.
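
One way to picture "compound column names" is as tuples that sort component by component. The sketch below is plain Python under that assumption; it is not how Cassandra stores or compares names, and prefix_slice is a hypothetical helper.

```python
import bisect


def prefix_slice(sorted_names, first_component):
    """Return every compound name whose first component equals first_component."""
    # A one-element tuple sorts just before any longer tuple sharing its prefix.
    start = bisect.bisect_left(sorted_names, (first_component,))
    end = start
    while end < len(sorted_names) and sorted_names[end][0] == first_component:
        end += 1
    return sorted_names[start:end]


# Tuples compare component by component, so sorting groups names by prefix.
names = sorted([("b", "y"), ("a", "x"), ("b", "x"), ("c", "z")])

# Nothing in the ordering limits the depth to two components.
deep_names = sorted([("a", "x", "2"), ("a", "x", "1"), ("a", "w", "9")])
```

Reading everything under one first component plays the role of reading one super column, and the same ordering works unchanged for three or more components.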

On Thu, Feb 10, 2011 at 10:16 AM, Frank LoVecchio <fr...@isidorey.com>wrote:

> I've found super column families quite useful when using
> RandomPartitioner on a low-maintenance cluster (as opposed to
> Byte/Ordered), e.g. returning ordered data from a TimeUUID comparator type;
> try doing that with one regular column family and secondary indexes (you
> could obviously sort on the client side, but that is tedious and not logical
> for older data).
>
> On Thu, Feb 10, 2011 at 12:32 AM, David Boxenhorn <da...@lookin2.com>wrote:
>
>> Mike, my problem is that I have a database and codebase that already uses
>> supercolumns. If I had to do it over, it wouldn't use them, for the reasons
>> you point out. In fact, I have a feeling that over time supercolumns will
>> become deprecated de facto, if not de jure. That's why I would like to see
>> them represented internally as regular columns, with an upgrade path for
>> backward compatibility.
>>
>> I would love to do it myself! (I haven't looked at the code base, but I
>> don't understand why it should be so hard.) But my employer has other
>> ideas...
>>
>>
>> On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone <mi...@simplegeo.com> wrote:
>>
>>> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn <da...@lookin2.com>wrote:
>>>
>>>> Shaun, I agree with you, but marking them as deprecated is not good
>>>> enough for me. I can't easily stop using supercolumns. I need an upgrade
>>>> path.
>>>>
>>>
>>> David,
>>>
>>> Cassandra is open source and community developed. The right thing to do
>>> is what's best for the community, which sometimes conflicts with what's best
>>> for individual users. Such strife should be minimized; it will never be
>>> eliminated. Luckily, because this is an open source, liberally licensed
>>> project, if you feel strongly about something you should feel free to add
>>> whatever features you want yourself. I'm sure other people in your situation
>>> will thank you for it.
>>>
>>> At a minimum I think it would behoove you to re-read some of the comments
>>> here re: why super columns aren't really needed and take another look at
>>> your data model and code. I would actually be quite surprised to find a use
>>> of super columns that could not be trivially converted to normal columns. In
>>> fact, it should be possible to do at the framework/client library layer -
>>> you probably wouldn't even need to change any application code.
>>>
>>> Mike
>>>
>>> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net> wrote:
>>>>
>>>>>
>>>>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>>>>> you should deprecate SuperColumns. They are already distracting you, and as
>>>>> the years go by the cost of supporting them as you add more and more
>>>>> functionality is only likely to get worse. It would be better to concentrate
>>>>> on making the "core" column families better (and I'm sure we can all think
>>>>> of lots of things we'd like).
>>>>>
>>>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>>>> users like David who are currently using them. But if you mark them clearly
>>>>> as deprecated and explain why and what to do instead (perhaps putting a bit
>>>>> of effort into migration tools... or even a "virtual" layer supporting
>>>>> arbitrary hierarchical data), then you can drop them in a few years (when
>>>>> you get to 1.0, say), without people feeling betrayed.
>>>>>
>>>>> -- Shaun
>>>>>
>>>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>>>
>>>>> "My main point was to say that I think it is better to create
>>>>> tickets for what you want, rather than for something else completely
>>>>> different that would, as a by-product, give you what you want."
>>>>>
>>>>> Then let me say what I want: I want supercolumn families to have any
>>>>> feature that regular column families have.
>>>>>
>>>>> My data model is full of supercolumns. I used them, even though I knew
>>>>> I didn't *have to*, "because they were there", which implied to me that I
>>>>> was supposed to use them for some good reason. Now I suspect that they will
>>>>> gradually become less and less functional, as features are added to regular
>>>>> column families and not supported for supercolumn families.
>>>>>
>>>>>
>>>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <
>>>>> sylvain@datastax.com> wrote:
>>>>>
>>>>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mi...@simplegeo.com>wrote:
>>>>>>
>>>>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <
>>>>>>> sylvain@datastax.com> wrote:
>>>>>>>
>>>>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com>wrote:
>>>>>>>>
>>>>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>>>>> families.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>>>>>> magnitude less work than getting rid of super columns internally, and
>>>>>>>> probably a much better solution anyway.
>>>>>>>>
>>>>>>>
>>>>>>> I realize that this is largely subjective, and on such matters code
>>>>>>> speaks louder than words, but I don't think I agree with you on the issue of
>>>>>>> which alternative is less work, or even which is a better solution.
>>>>>>>
>>>>>>
>>>>>> You are right, I probably put too much emphasis on that sentence. My
>>>>>> main point was to say that I think it is better to create tickets for
>>>>>> what you want, rather than for something else completely different that
>>>>>> would, as a by-product, give you what you want.
>>>>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>>>>> super columns, then there is a good chance this would be less work than
>>>>>> getting rid of super columns. But to be fair, secondary indexes on super
>>>>>> columns may not make too much sense without #598, which itself would require
>>>>>> quite some work, so clearly I spoke a bit quickly.
>>>>>>
>>>>>>
>>>>>>> If the goal is to have a hierarchical model, limiting the depth to
>>>>>>> two seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>>>>>> hierarchy?
>>>>>>>
>>>>>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>>>>>> impractical, allowing a depth of two seems inconsistent and
>>>>>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>>>>>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>>>>>> implemented a custom comparator that does the job [1]. Google's Megastore
>>>>>>> has a similar architecture and goes even further [2].
>>>>>>>
>>>>>>> It seems to me that super columns are a historical artifact from
>>>>>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>>>>>> posting lists of messages, sharded by user. So that's what they built. In my
>>>>>>> dealings with the Cassandra code, super columns end up making a mess all
>>>>>>> over the place when algorithms need to be special cased and branch based on
>>>>>>> the column/supercolumn distinction.
>>>>>>>
>>>>>>> I won't even mention what it does to the thrift interface.
>>>>>>>
>>>>>>
>>>>>> Actually, I agree with you, more than you know. If I were to start
>>>>>> coding Cassandra now, I wouldn't include super columns (and I would probably
>>>>>> not go for a depth unlimited hierarchical model either). But it's there and
>>>>>> I'm not sure getting rid of them fully (meaning, including in thrift) is an
>>>>>> option (it would be a big compatibility breakage). And (even though I
>>>>>> certainly thought about this more than once :)) I'm slightly
>>>>>> less enthusiastic about keeping them in thrift but encoding them as regular
>>>>>> column families internally: it would still be a lot of work and we would still
>>>>>> probably end up with nasty tricks to stick to the thrift api.
>>>>>>
>>>>>> --
>>>>>> Sylvain
>>>>>>
>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>>>>>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Frank LoVecchio
> Senior Software Engineer | Isidorey, LLC
> Google Voice +1.720.295.9179
> isidorey.com | facebook.com/franklovecchio | franklovecchio.com |
> rodsandricers.com
>
>

Re: Do supercolumns have a purpose?

Posted by Frank LoVecchio <fr...@isidorey.com>.

I've found super column families quite useful when using
RandomPartitioner on a low-maintenance cluster (as opposed to
Byte/Ordered), e.g. returning ordered data from a TimeUUID comparator type;
try doing that with one regular column family and secondary indexes (you
could obviously sort on the client side, but that is tedious and not logical
for older data).
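
As a rough illustration of the time ordering mentioned above: a version 1 UUID embeds a timestamp, and a time-based comparator orders column names by that timestamp rather than by raw bytes. The sketch uses only Python's standard uuid module; it does not talk to Cassandra, and the variable names are made up.

```python
import uuid

# Five version 1 (time-based) UUIDs standing in for column names.
column_names = [uuid.uuid1() for _ in range(5)]

# Ordering by the embedded timestamp, as a time-based comparator would.
by_time = sorted(column_names, key=lambda u: u.time)

# Ordering by raw bytes can differ, because the low bits of the timestamp
# come first in the UUID layout.
by_bytes = sorted(column_names, key=lambda u: u.bytes)
```

Columns kept in timestamp order inside a single row can be read back oldest-first or newest-first without client-side sorting, whichever way row keys are distributed across the cluster.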

On Thu, Feb 10, 2011 at 12:32 AM, David Boxenhorn <da...@lookin2.com> wrote:

> Mike, my problem is that I have a database and codebase that already uses
> supercolumns. If I had to do it over, it wouldn't use them, for the reasons
> you point out. In fact, I have a feeling that over time supercolumns will
> become deprecated de facto, if not de jure. That's why I would like to see
> them represented internally as regular columns, with an upgrade path for
> backward compatibility.
>
> I would love to do it myself! (I haven't looked at the code base, but I
> don't understand why it should be so hard.) But my employer has other
> ideas...
>
>
> On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone <mi...@simplegeo.com> wrote:
>
>> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn <da...@lookin2.com>wrote:
>>
>>> Shaun, I agree with you, but marking them as deprecated is not good
>>> enough for me. I can't easily stop using supercolumns. I need an upgrade
>>> path.
>>>
>>
>> David,
>>
>> Cassandra is open source and community developed. The right thing to do is
>> what's best for the community, which sometimes conflicts with what's best
>> for individual users. Such strife should be minimized; it will never be
>> eliminated. Luckily, because this is an open source, liberally licensed
>> project, if you feel strongly about something you should feel free to add
>> whatever features you want yourself. I'm sure other people in your situation
>> will thank you for it.
>>
>> At a minimum I think it would behoove you to re-read some of the comments
>> here re: why super columns aren't really needed and take another look at
>> your data model and code. I would actually be quite surprised to find a use
>> of super columns that could not be trivially converted to normal columns. In
>> fact, it should be possible to do at the framework/client library layer -
>> you probably wouldn't even need to change any application code.
>>
>> Mike
>>
>> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net> wrote:
>>>
>>>>
>>>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>>>> you should deprecate SuperColumns. They are already distracting you, and as
>>>> the years go by the cost of supporting them as you add more and more
>>>> functionality is only likely to get worse. It would be better to concentrate
>>>> on making the "core" column families better (and I'm sure we can all think
>>>> of lots of things we'd like).
>>>>
>>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>>> users like David who are currently using them. But if you mark them clearly
>>>> as deprecated and explain why and what to do instead (perhaps putting a bit
>>>> of effort into migration tools... or even a "virtual" layer supporting
>>>> arbitrary hierarchical data), then you can drop them in a few years (when
>>>> you get to 1.0, say), without people feeling betrayed.
>>>>
>>>> -- Shaun
>>>>
>>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>>
>>>> "My main point was to say that I think it is better to create tickets
>>>> for what you want, rather than for something else completely different that
>>>> would, as a by-product, give you what you want."
>>>>
>>>> Then let me say what I want: I want supercolumn families to have any
>>>> feature that regular column families have.
>>>>
>>>> My data model is full of supercolumns. I used them, even though I knew
>>>> I didn't *have to*, "because they were there", which implied to me that I
>>>> was supposed to use them for some good reason. Now I suspect that they will
>>>> gradually become less and less functional, as features are added to regular
>>>> column families and not supported for supercolumn families.
>>>>
>>>>
>>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <sylvain@datastax.com
>>>> > wrote:
>>>>
>>>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mi...@simplegeo.com>wrote:
>>>>>
>>>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <
>>>>>> sylvain@datastax.com> wrote:
>>>>>>
>>>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com>wrote:
>>>>>>>
>>>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>>>> families.
>>>>>>>>
>>>>>>>
>>>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>>>>> magnitude less work than getting rid of super columns internally, and
>>>>>>> probably a much better solution anyway.
>>>>>>>
>>>>>>
>>>>>> I realize that this is largely subjective, and on such matters code
>>>>>> speaks louder than words, but I don't think I agree with you on the issue of
>>>>>> which alternative is less work, or even which is a better solution.
>>>>>>
>>>>>
>>>>> You are right, I put probably too much emphase in that sentence. My
>>>>> main point was to say that it's think it is better to create tickets for
>>>>> what you want, rather than for something else completely different that
>>>>> would, as a by-product, give you what you want.
>>>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>>>> super columns, then there is a good chance this would be less work than
>>>>> getting rid of super columns. But to be fair, secondary indexes on super
>>>>> columns may not make too much sense without #598, which itself would require
>>>>> quite some work, so clearly I spoke a bit quickly.
>>>>>
>>>>>
>>>>>> If the goal is to have a hierarchical model, limiting the depth to two
>>>>>> seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>>>>> hierarchy?
>>>>>>
>>>>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>>>>> impractical, allowing a depth of two seems inconsistent and
>>>>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>>>>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>>>>> implemented a custom comparator that does the job [1]. Google's Megastore
>>>>>> has a similar architecture and goes even further [2].
>>>>>>
>>>>>> It seems to me that super columns are a historical artifact from
>>>>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>>>>> posting lists of messages, sharded by user. So that's what they built. In my
>>>>>> dealings with the Cassandra code, super columns end up making a mess all
>>>>>> over the place when algorithms need to be special cased and branch based on
>>>>>> the column/supercolumn distinction.
>>>>>>
>>>>>> I won't even mention what it does to the thrift interface.
>>>>>>
>>>>>
>>>>> Actually, I agree with you, more than you know. If I were to start
>>>>> coding Cassandra now, I wouldn't include super columns (and I would probably
>>>>> not go for a depth unlimited hierarchical model either). But it's there and
>>>>> I'm not sure getting rid of them fully (meaning, including in thrift) is an
>>>>> option (it would be a big compatibility breakage). And (even though I
>>>>> certainly though about this more than once :)) I'm slightly
>>>>> less enthusiastic about keeping them in thrift but encoding them in regular
>>>>> column family internally: it would still be a lot of work but we would still
>>>>> probably end up with nasty tricks to stick to the thrift api.
>>>>>
>>>>> --
>>>>> Sylvain
>>>>>
>>>>>
>>>>>> Mike
>>>>>>
>>>>>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>>>>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>


-- 
Frank LoVecchio

Re: Do supercolumns have a purpose?

Posted by David Boxenhorn <da...@lookin2.com>.

Mike, my problem is that I have a database and codebase that already use
supercolumns. If I had to do it over, it wouldn't use them, for the reasons
you point out. In fact, I have a feeling that over time supercolumns will
become deprecated de facto, if not de jure. That's why I would like to see
them represented internally as regular columns, with an upgrade path for
backward compatibility.
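
To make the idea concrete, here is a rough sketch of what such an upgrade
path could look like at the data level. It assumes a row can be modeled as a
plain Python dict and that "@" never appears in a supercolumn name; the names
flatten_row and unflatten_row are hypothetical and not part of Cassandra or
any client library.

```python
SEP = "@"  # assumed separator; must not occur in supercolumn names


def flatten_row(super_row):
    """Turn {super_name: {sub_name: value}} into {"super@sub": value}."""
    flat = {}
    for super_name, subcolumns in super_row.items():
        if SEP in super_name:
            raise ValueError("separator found in supercolumn name: %r" % super_name)
        for sub_name, value in subcolumns.items():
            flat[super_name + SEP + sub_name] = value
    return flat


def unflatten_row(flat_row):
    """Rebuild the nested view; splits on the first separator only."""
    nested = {}
    for name, value in flat_row.items():
        super_name, sub_name = name.split(SEP, 1)
        nested.setdefault(super_name, {})[sub_name] = value
    return nested
```

Checking that unflatten_row(flatten_row(row)) gives back the original row is
one way a migration tool might verify that nothing was lost.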

I would love to do it myself! (I haven't looked at the code base, but I
don't understand why it should be so hard.) But my employer has other
ideas...


On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone <mi...@simplegeo.com> wrote:

> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn <da...@lookin2.com> wrote:
>
>> Shaun, I agree with you, but marking them as deprecated is not good enough
>> for me. I can't easily stop using supercolumns. I need an upgrade path.
>>
>
> David,
>
> Cassandra is open source and community developed. The right thing to do is
> what's best for the community, which sometimes conflicts with what's best
> for individual users. Such strife should be minimized; it will never be
> eliminated. Luckily, because this is an open source, liberally licensed
> project, if you feel strongly about something you should feel free to add
> whatever features you want yourself. I'm sure other people in your situation
> will thank you for it.
>
> At a minimum I think it would behoove you to re-read some of the comments
> here re: why super columns aren't really needed and take another look at
> your data model and code. I would actually be quite surprised to find a use
> of super columns that could not be trivially converted to normal columns. In
> fact, it should be possible to do at the framework/client library layer -
> you probably wouldn't even need to change any application code.
>
> Mike

RE: Do supercolumns have a purpose?

Posted by Viktor Jevdokimov <Vi...@adform.com>.

SCFs are very useful and I hope they live forever. We need them!


Best regards,

Viktor Jevdokimov
Senior Developer

-----Original Message-----
From: norman.maurer@googlemail.com [mailto:norman.maurer@googlemail.com] On Behalf Of Norman Maurer
Sent: Wednesday, February 09, 2011 20:59
To: user@cassandra.apache.org
Subject: Re: Do supercolumns have a purpose?

I still think super-columns are useful; you just need to be aware of
the limitations...

Bye,
Norman



Re: Do supercolumns have a purpose?

Posted by Norman Maurer <no...@apache.org>.

I still think super-columns are useful; you just need to be aware of
the limitations...

Bye,
Norman


2011/2/9 Mike Malone <mi...@simplegeo.com>:
> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn <da...@lookin2.com> wrote:
>>
>> Shaun, I agree with you, but marking them as deprecated is not good enough
>> for me. I can't easily stop using supercolumns. I need an upgrade path.
>
> David,
> Cassandra is open source and community developed. The right thing to do is
> what's best for the community, which sometimes conflicts with what's best
> for individual users. Such strife should be minimized; it will never be
> eliminated. Luckily, because this is an open source, liberally licensed
> project, if you feel strongly about something you should feel free to add
> whatever features you want yourself. I'm sure other people in your situation
> will thank you for it.
> At a minimum I think it would behoove you to re-read some of the comments
> here re: why super columns aren't really needed and take another look at
> your data model and code. I would actually be quite surprised to find a use
> of super columns that could not be trivially converted to normal columns. In
> fact, it should be possible to do at the framework/client library layer -
> you probably wouldn't even need to change any application code.
> Mike

Re: Do supercolumns have a purpose?

Posted by Mike Malone <mi...@simplegeo.com>.

On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn <da...@lookin2.com> wrote:

> Shaun, I agree with you, but marking them as deprecated is not good enough
> for me. I can't easily stop using supercolumns. I need an upgrade path.
>

David,

Cassandra is open source and community developed. The right thing to do is
what's best for the community, which sometimes conflicts with what's best
for individual users. Such strife should be minimized; it will never be
eliminated. Luckily, because this is an open source, liberally licensed
project, if you feel strongly about something you should feel free to add
whatever features you want yourself. I'm sure other people in your situation
will thank you for it.

At a minimum I think it would behoove you to re-read some of the comments
here re: why super columns aren't really needed and take another look at
your data model and code. I would actually be quite surprised to find a use
of super columns that could not be trivially converted to normal columns. In
fact, it should be possible to do at the framework/client library layer -
you probably wouldn't even need to change any application code.
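
As a rough illustration of the client-layer approach (a sketch under stated
assumptions, not an actual client library): a row is modeled here as a plain
dict of flat column names to values, "@" is assumed never to appear in a
supercolumn name, and flat_name and read_super_column are hypothetical helper
names. Because column names sort together by prefix, one supercolumn can be
read back as a contiguous range.

```python
from bisect import bisect_left

SEP = "@"  # assumed separator; must not occur in supercolumn names


def flat_name(super_name, sub_name):
    """Build the concatenated column name used in a regular column family."""
    if SEP in super_name:
        raise ValueError("separator found in supercolumn name: %r" % super_name)
    return super_name + SEP + sub_name


def read_super_column(row, super_name):
    """Return {sub_name: value} for one 'supercolumn' stored as flat columns.

    row maps flat column names to values. Sorting the names stands in for
    the sorted order a column family keeps; the prefix scan stands in for
    a slice query over that range.
    """
    prefix = super_name + SEP
    names = sorted(row)
    start = bisect_left(names, prefix)  # first name >= prefix
    result = {}
    for name in names[start:]:
        if not name.startswith(prefix):
            break  # past the contiguous run of matching names
        result[name[len(prefix):]] = row[name]
    return result
```

An application that only ever calls helpers like these would not need to know
whether the underlying column family is a super column family or a regular
one.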

Mike

On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net> wrote:
>
>>
>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>> you should deprecate SuperColumns. They are already distracting you, and as
>> the years go by the cost of supporting them as you add more and more
>> functionality is only likely to get worse. It would be better to concentrate
>> on making the "core" column families better (and I'm sure we can all think
>> of lots of things we'd like).
>>
>> Just dropping SuperColumns would be bad for your reputation -- and for
>> users like David who are currently using them. But if you mark them clearly
>> as deprecated and explain why and what to do instead (perhaps putting a bit
>> of effort into migration tools... or even a "virtual" layer supporting
>> arbitrary hierarchical data), then you can drop them in a few years (when
>> you get to 1.0, say), without people feeling betrayed.
>>
>> -- Shaun
>>
>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>
>> "My main point was to say that I think it is better to create tickets
>> for what you want, rather than for something else completely different that
>> would, as a by-product, give you what you want."
>>
>> Then let me say what I want: I want supercolumn families to have any
>> feature that regular column families have.
>>
>> My data model is full of supercolumns. I used them, even though I knew I
>> didn't *have to*, "because they were there", which implied to me that I was
>> supposed to use them for some good reason. Now I suspect that they will
>> gradually become less and less functional, as features are added to regular
>> column families and not supported for supercolumn families.
>>
>>
>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>>
>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mi...@simplegeo.com> wrote:
>>>
>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>>>>
>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com> wrote:
>>>>>
>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>> families.
>>>>>>
>>>>>
>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>>> magnitude less work than getting rid of super columns internally, and
>>>>> probably a much better solution anyway.
>>>>>
>>>>
>>>> I realize that this is largely subjective, and on such matters code
>>>> speaks louder than words, but I don't think I agree with you on the issue of
>>>> which alternative is less work, or even which is a better solution.
>>>>
>>>
>>> You are right, I probably put too much emphasis on that sentence. My main
>>> point was to say that I think it is better to create tickets for what you
>>> want, rather than for something else completely different that would, as a
>>> by-product, give you what you want.
>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>> super columns, then there is a good chance this would be less work than
>>> getting rid of super columns. But to be fair, secondary indexes on super
>>> columns may not make too much sense without #598, which itself would require
>>> quite some work, so clearly I spoke a bit quickly.
>>>
>>>
>>>> If the goal is to have a hierarchical model, limiting the depth to two
>>>> seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>>> hierarchy?
>>>>
>>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>>> impractical, allowing a depth of two seems inconsistent and
>>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>>> implemented a custom comparator that does the job [1]. Google's Megastore
>>>> has a similar architecture and goes even further [2].
>>>>
>>>> It seems to me that super columns are a historical artifact from
>>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>>> posting lists of messages, sharded by user. So that's what they built. In my
>>>> dealings with the Cassandra code, super columns end up making a mess all
>>>> over the place when algorithms need to be special cased and branch based on
>>>> the column/supercolumn distinction.
>>>>
>>>> I won't even mention what it does to the thrift interface.
>>>>
>>>
>>> Actually, I agree with you, more than you know. If I were to start coding
>>> Cassandra now, I wouldn't include super columns (and I would probably not go
>>> for an unlimited-depth hierarchical model either). But they're there and I'm not
>>> sure getting rid of them fully (meaning, including in thrift) is an option
>>> (it would be a big compatibility breakage). And (even though I certainly
>>> thought about this more than once :)) I'm slightly less enthusiastic about
>>> keeping them in thrift but encoding them as regular column families
>>> internally: it would still be a lot of work, and we would probably still end
>>> up with nasty tricks to stick to the thrift API.
>>>
>>> --
>>> Sylvain
>>>
>>>
>>>> Mike
>>>>
>>>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>>>
>>>
>>>
>>
>>
>

Re: Do supercolumns have a purpose?

Posted by David Boxenhorn <da...@lookin2.com>.

Shaun, I agree with you, but marking them as deprecated is not good enough
for me. I can't easily stop using supercolumns. I need an upgrade path.

On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net> wrote:

>
> I'm a newbie here, but, with apologies for my presumptuousness, I think you
> should deprecate SuperColumns. They are already distracting you, and as the
> years go by the cost of supporting them as you add more and more
> functionality is only likely to get worse. It would be better to concentrate
> on making the "core" column families better (and I'm sure we can all think
> of lots of things we'd like).
>
> Just dropping SuperColumns would be bad for your reputation -- and for
> users like David who are currently using them. But if you mark them clearly
> as deprecated and explain why and what to do instead (perhaps putting a bit
> of effort into migration tools... or even a "virtual" layer supporting
> arbitrary hierarchical data), then you can drop them in a few years (when
> you get to 1.0, say), without people feeling betrayed.
>
> -- Shaun
>
> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>
> "My main point was to say that I think it is better to create tickets
> for what you want, rather than for something else completely different that
> would, as a by-product, give you what you want."
>
> Then let me say what I want: I want supercolumn families to have any
> feature that regular column families have.
>
> My data model is full of supercolumns. I used them, even though I knew I
> didn't *have to*, "because they were there", which implied to me that I was
> supposed to use them for some good reason. Now I suspect that they will
> gradually become less and less functional, as features are added to regular
> column families and not supported for supercolumn families.
>
>
> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <sy...@datastax.com>wrote:
>
>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mi...@simplegeo.com> wrote:
>>
>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sy...@datastax.com>wrote:
>>>
>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com>wrote:
>>>>
>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>> families.
>>>>>
>>>>
>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>> magnitude less work than getting rid of super column internally, and
>>>> probably a much better solution anyway.
>>>>
>>>
>>> I realize that this is largely subjective, and on such matters code
>>> speaks louder than words, but I don't think I agree with you on the issue of
>>> which alternative is less work, or even which is a better solution.
>>>
>>
>> You are right, I probably put too much emphasis on that sentence. My main
>> point was to say that I think it is better to create tickets for what you
>> want, rather than for something else completely different that would, as a
>> by-product, give you what you want.
>> Then I suspect that *if* the only goal is to get secondary indexes on
>> super columns, then there is a good chance this would be less work than
>> getting rid of super columns. But to be fair, secondary indexes on super
>> columns may not make too much sense without #598, which itself would require
>> quite some work, so clearly I spoke a bit quickly.
>>
>>
>>> If the goal is to have a hierarchical model, limiting the depth to two
>>> seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>> hierarchy?
>>>
>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>> impractical, allowing a depth of two seems inconsistent and
>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>> implemented a custom comparator that does the job [1]. Google's Megastore
>>> has a similar architecture and goes even further [2].
>>>
>>> It seems to me that super columns are a historical artifact from
>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>> posting lists of messages, sharded by user. So that's what they built. In my
>>> dealings with the Cassandra code, super columns end up making a mess all
>>> over the place when algorithms need to be special cased and branch based on
>>> the column/supercolumn distinction.
>>>
>>> I won't even mention what it does to the thrift interface.
>>>
>>
>> Actually, I agree with you, more than you know. If I were to start coding
>> Cassandra now, I wouldn't include super columns (and I would probably not go
>> for a depth unlimited hierarchical model either). But it's there and I'm not
>> sure getting rid of them fully (meaning, including in thrift) is an option
>> (it would be a big compatibility breakage). And (even though I certainly
>> thought about this more than once :)) I'm slightly less enthusiastic about
>> keeping them in thrift but encoding them in regular column family
>> internally: it would still be a lot of work but we would still probably end
>> up with nasty tricks to stick to the thrift api.
>>
>> --
>> Sylvain
>>
>>
>>> Mike
>>>
>>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>>
>>
>>
>
>

Re: Do supercolumns have a purpose?

Posted by Shaun Cutts <sh...@cuttshome.net>.

I'm a newbie here, but, with apologies for my presumptuousness, I think you should deprecate SuperColumns. They are already distracting you, and as the years go by the cost of supporting them as you add more and more functionality is only likely to get worse. It would be better to concentrate on making the "core" column families better (and I'm sure we can all think of lots of things we'd like).

Just dropping SuperColumns would be bad for your reputation -- and for users like David who are currently using them. But if you mark them clearly as deprecated and explain why and what to do instead (perhaps putting a bit of effort into migration tools... or even a "virtual" layer supporting arbitrary hierarchical data), then you can drop them in a few years (when you get to 1.0, say), without people feeling betrayed.

-- Shaun

On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:

> "My main point was to say that it's think it is better to create tickets for what you want, rather than for something else completely different that would, as a by-product, give you what you want."
> 
> Then let me say what I want: I want supercolumn families to have any feature that regular column families have. 
> 
> My data model is full of supercolumns. I used them, even though I knew it didn't *have to*, "because they were there", which implied to me that I was supposed to use them for some good reason. Now I suspect that they will gradually become less and less functional, as features are added to regular column families and not supported for supercolumn families. 
> 
> 
> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mi...@simplegeo.com> wrote:
> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com> wrote:
> The advantage would be to enable secondary indexes on supercolumn families.
> 
> Then I suggest opening a ticket for adding secondary indexes to supercolumn families and voting on it. This will be 1 or 2 order of magnitude less work than getting rid of super column internally, and probably a much better solution anyway.
> 
> I realize that this is largely subjective, and on such matters code speaks louder than words, but I don't think I agree with you on the issue of which alternative is less work, or even which is a better solution.
> 
> You are right, I put probably too much emphase in that sentence. My main point was to say that it's think it is better to create tickets for what you want, rather than for something else completely different that would, as a by-product, give you what you want.
> Then I suspect that *if* the only goal is to get secondary indexes on super columns, then there is a good chance this would be less work than getting rid of super columns. But to be fair, secondary indexes on super columns may not make too much sense without #598, which itself would require quite some work, so clearly I spoke a bit quickly.
>  
> If the goal is to have a hierarchical model, limiting the depth to two seems arbitrary. Why not go all the way and allow an arbitrarily deep hierarchy?
> 
> If a more sophisticated hierarchical model is deemed unnecessary, or impractical, allowing a depth of two seems inconsistent and unnecessary. It's pretty trivial to overlay a hierarchical model on top of the map-of-sorted-maps model that Cassandra implements. Ed Anuff has implemented a custom comparator that does the job [1]. Google's Megastore has a similar architecture and goes even further [2].
> 
> It seems to me that super columns are a historical artifact from Cassandra's early life as Facebook's inbox storage system. They needed posting lists of messages, sharded by user. So that's what they built. In my dealings with the Cassandra code, super columns end up making a mess all over the place when algorithms need to be special cased and branch based on the column/supercolumn distinction.
> 
> I won't even mention what it does to the thrift interface.
> 
> Actually, I agree with you, more than you know. If I were to start coding Cassandra now, I wouldn't include super columns (and I would probably not go for a depth unlimited hierarchical model either). But it's there and I'm not sure getting rid of them fully (meaning, including in thrift) is an option (it would be a big compatibility breakage). And (even though I certainly though about this more than once :)) I'm slightly less enthusiastic about keeping them in thrift but encoding them in regular column family internally: it would still be a lot of work but we would still probably end up with nasty tricks to stick to the thrift api. 
>  
> --
> Sylvain
> 
> 
> Mike
> 
> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
> 
>

Re: Do supercolumns have a purpose?

Posted by David Boxenhorn <da...@lookin2.com>.

"My main point was to say that I think it is better to create tickets for
what you want, rather than for something else completely different that
would, as a by-product, give you what you want."

Then let me say what I want: I want supercolumn families to have any feature
that regular column families have.

My data model is full of supercolumns. I used them, even though I knew I
didn't *have to*, "because they were there", which implied to me that I was
supposed to use them for some good reason. Now I suspect that they will
gradually become less and less functional, as features are added to regular
column families and not supported for supercolumn families.


On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <sy...@datastax.com> wrote:

> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mi...@simplegeo.com> wrote:
>
>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sy...@datastax.com>wrote:
>>
>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com>wrote:
>>>
>>>> The advantage would be to enable secondary indexes on supercolumn
>>>> families.
>>>>
>>>
>>> Then I suggest opening a ticket for adding secondary indexes to
>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>> magnitude less work than getting rid of super column internally, and
>>> probably a much better solution anyway.
>>>
>>
>> I realize that this is largely subjective, and on such matters code speaks
>> louder than words, but I don't think I agree with you on the issue of which
>> alternative is less work, or even which is a better solution.
>>
>
> You are right, I probably put too much emphasis on that sentence. My main
> point was to say that I think it is better to create tickets for what you
> want, rather than for something else completely different that would, as a
> by-product, give you what you want.
> Then I suspect that *if* the only goal is to get secondary indexes on super
> columns, then there is a good chance this would be less work than getting
> rid of super columns. But to be fair, secondary indexes on super columns may
> not make too much sense without #598, which itself would require quite some
> work, so clearly I spoke a bit quickly.
>
>
>> If the goal is to have a hierarchical model, limiting the depth to two
>> seems arbitrary. Why not go all the way and allow an arbitrarily deep
>> hierarchy?
>>
>> If a more sophisticated hierarchical model is deemed unnecessary, or
>> impractical, allowing a depth of two seems inconsistent and
>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>> implemented a custom comparator that does the job [1]. Google's Megastore
>> has a similar architecture and goes even further [2].
>>
>> It seems to me that super columns are a historical artifact from
>> Cassandra's early life as Facebook's inbox storage system. They needed
>> posting lists of messages, sharded by user. So that's what they built. In my
>> dealings with the Cassandra code, super columns end up making a mess all
>> over the place when algorithms need to be special cased and branch based on
>> the column/supercolumn distinction.
>>
>> I won't even mention what it does to the thrift interface.
>>
>
> Actually, I agree with you, more than you know. If I were to start coding
> Cassandra now, I wouldn't include super columns (and I would probably not go
> for a depth unlimited hierarchical model either). But it's there and I'm not
> sure getting rid of them fully (meaning, including in thrift) is an option
> (it would be a big compatibility breakage). And (even though I certainly
> thought about this more than once :)) I'm slightly less enthusiastic about
> keeping them in thrift but encoding them in regular column family
> internally: it would still be a lot of work but we would still probably end
> up with nasty tricks to stick to the thrift api.
>
> --
> Sylvain
>
>
>> Mike
>>
>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>
>
>

Re: Do supercolumns have a purpose?

Posted by Sylvain Lebresne <sy...@datastax.com>.

On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mi...@simplegeo.com> wrote:

> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sy...@datastax.com>wrote:
>
>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com>wrote:
>>
>>> The advantage would be to enable secondary indexes on supercolumn
>>> families.
>>>
>>
>> Then I suggest opening a ticket for adding secondary indexes to
>> supercolumn families and voting on it. This will be 1 or 2 orders of
>> magnitude less work than getting rid of super column internally, and
>> probably a much better solution anyway.
>>
>
> I realize that this is largely subjective, and on such matters code speaks
> louder than words, but I don't think I agree with you on the issue of which
> alternative is less work, or even which is a better solution.
>

You are right, I probably put too much emphasis on that sentence. My main
point was to say that I think it is better to create tickets for what you
want, rather than for something else completely different that would, as a
by-product, give you what you want.
Then I suspect that *if* the only goal is to get secondary indexes on super
columns, then there is a good chance this would be less work than getting
rid of super columns. But to be fair, secondary indexes on super columns may
not make too much sense without #598, which itself would require quite some
work, so clearly I spoke a bit quickly.

> If the goal is to have a hierarchical model, limiting the depth to two
> seems arbitrary. Why not go all the way and allow an arbitrarily deep
> hierarchy?
>
> If a more sophisticated hierarchical model is deemed unnecessary, or
> impractical, allowing a depth of two seems inconsistent and
> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
> implemented a custom comparator that does the job [1]. Google's Megastore
> has a similar architecture and goes even further [2].
>
> It seems to me that super columns are a historical artifact from
> Cassandra's early life as Facebook's inbox storage system. They needed
> posting lists of messages, sharded by user. So that's what they built. In my
> dealings with the Cassandra code, super columns end up making a mess all
> over the place when algorithms need to be special cased and branch based on
> the column/supercolumn distinction.
>
> I won't even mention what it does to the thrift interface.
>

Actually, I agree with you, more than you know. If I were to start coding
Cassandra now, I wouldn't include super columns (and I would probably not go
for an unlimited-depth hierarchical model either). But they are there and I'm
not sure getting rid of them fully (meaning, including in thrift) is an option
(it would be a big compatibility breakage). And (even though I have certainly
thought about this more than once :)) I'm slightly less enthusiastic about
keeping them in thrift but encoding them as regular column families
internally: it would still be a lot of work and we would probably still end
up with nasty tricks to stick to the thrift api.

--
Sylvain

> Mike
>
> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>

Re: Do supercolumns have a purpose?

Posted by Jonathan Ellis <jb...@gmail.com>.

On Thu, Feb 3, 2011 at 3:35 PM, Mike Malone <mi...@simplegeo.com> wrote:
> It seems to me that super columns are a historical artifact from Cassandra's
> early life as Facebook's inbox storage system. They needed posting lists of
> messages, sharded by user. So that's what they built. In my dealings with
> the Cassandra code, super columns end up making a mess all over the place
> when algorithms need to be special cased and branch based on the
> column/supercolumn distinction.
> I won't even mention what it does to the thrift interface.

+1

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Do supercolumns have a purpose?

Posted by Bill de hÓra <bi...@dehora.net>.

On Thu, 2011-02-03 at 15:35 -0800, Mike Malone wrote:

>  In my dealings with the Cassandra code, super columns end up making a
> mess all over the place when algorithms need to be special cased and
> branch based on the column/supercolumn distinction.
> 
> 
> I won't even mention what it does to the thrift interface.

My observation is similar, in that they (SCFs) make the "type system" in
Cassandra disjoint. This makes me doubt that moving to Avro would
simplify anything for Cassandra users. It also means knock-on effects
such as no common supertype in APIs for languages like Java (so the
surface area of clients like Hector blows up badly when you compare it to
the HBase client). I can't wait to see how CQL fares with SCFs; a sane
query language will be closed under its operations and I doubt it can be
done atm.

That said, I keep finding uses for them, which is irksome; but maybe I'm
being lazy when it comes to modelling and now that secondary indexes are
in, I should pretend SCFs don't exist. 

Bill

Re: Do supercolumns have a purpose?

Posted by Mike Malone <mi...@simplegeo.com>.

On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sy...@datastax.com> wrote:

> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com> wrote:
>
>> The advantage would be to enable secondary indexes on supercolumn
>> families.
>>
>
> Then I suggest opening a ticket for adding secondary indexes to supercolumn
> families and voting on it. This will be 1 or 2 orders of magnitude less work
> than getting rid of super columns internally, and probably a much better
> solution anyway.
>

I realize that this is largely subjective, and on such matters code speaks
louder than words, but I don't think I agree with you on the issue of which
alternative is less work, or even which is a better solution.

If the goal is to have a hierarchical model, limiting the depth to two seems
arbitrary. Why not go all the way and allow an arbitrarily deep hierarchy?

If a more sophisticated hierarchical model is deemed unnecessary, or
impractical, allowing a depth of two seems inconsistent and
unnecessary. It's pretty trivial to overlay a hierarchical model on top of
the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
implemented a custom comparator that does the job [1]. Google's Megastore
has a similar architecture and goes even further [2].
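
A minimal sketch of that overlay, assuming a row is modeled as one sorted
map and that Python tuples stand in for composite column names. The class
and method names (SortedRow, put, slice_prefix) are hypothetical; this is
not Cassandra's API and not Ed Anuff's comparator, only an illustration of
how sorted composite names can answer hierarchical queries at any depth:

```python
import bisect


class SortedRow:
    """A row modeled as one sorted map from composite column name to value."""

    def __init__(self):
        self._names = []   # sorted list of tuple names, e.g. ("address", "city")
        self._values = {}  # composite name -> value

    def put(self, name, value):
        # Keep the name list sorted; tuples compare component by component.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice_prefix(self, prefix):
        """Return all (name, value) pairs whose name starts with `prefix`."""
        start = bisect.bisect_left(self._names, prefix)
        result = []
        for name in self._names[start:]:
            if name[:len(prefix)] != prefix:
                break  # names are sorted, so the matching run has ended
            result.append((name, self._values[name]))
        return result


row = SortedRow()
row.put(("address", "city"), "Paris")
row.put(("address", "zip"), "75001")
row.put(("phone", "home", "number"), "555-0100")
row.put(("phone", "work", "number"), "555-0199")

# Depth two behaves like a super column; depth three needs nothing extra.
address = row.slice_prefix(("address",))
home_phone = row.slice_prefix(("phone", "home"))
```

The same prefix slice works whether the name has two components or five,
which is the sense in which a fixed depth of two looks arbitrary.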

It seems to me that super columns are a historical artifact from Cassandra's
early life as Facebook's inbox storage system. They needed posting lists of
messages, sharded by user. So that's what they built. In my dealings with
the Cassandra code, super columns end up making a mess all over the place
when algorithms need to be special cased and branch based on the
column/supercolumn distinction.

I won't even mention what it does to the thrift interface.

Mike

[1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
[2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf

Re: Do supercolumns have a purpose?

Posted by Sylvain Lebresne <sy...@datastax.com>.

On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com> wrote:

> The advantage would be to enable secondary indexes on supercolumn families.
>

Then I suggest opening a ticket for adding secondary indexes to supercolumn
families and voting on it. This will be 1 or 2 orders of magnitude less work
than getting rid of super columns internally, and probably a much better
solution anyway.


> I understand from this thread that indexes on supercolumn families are not
> going to be:
>
> http://www.mail-archive.com/user@cassandra.apache.org/msg09527.html
>

I should maybe let Jonathan answer this one, but the way I understand it is
that adding secondary indexes to super columns is not a top priority for the
currently active developers. Not that it will never ever happen. And voting
for tickets in JIRA is one way to help raise its priority.

In any case, if the goal you're pursuing is adding secondary indexes to
super columns, then that's the ticket you should open, and if after careful
consideration it is decided that getting rid of super columns is the best way
to reach that goal then so be it (spoiler: it is not).


> Which, it seems to me, effectively deprecates supercolumn families. (I
> don't see any of the three problems you brought up as overcoming this
> problem, except, perhaps, for special cases.)
>

You're entitled to your opinions obviously but I doubt everyone shares that
feeling (I don't for instance). Before 0.7, there were no secondary indexes
at all and still a bunch of people managed to do pretty nice stuff with
Cassandra. In particular denormalized views are sometimes (often?)
preferable to secondary indexes for performance reasons. For that, super
columns are quite handy.

--
Sylvain


>
>
> On Thu, Feb 3, 2011 at 3:32 PM, Sylvain Lebresne <sy...@datastax.com>wrote:
>
>> On Thu, Feb 3, 2011 at 1:33 PM, David Boxenhorn <da...@lookin2.com>wrote:
>>
>>> Thanks Sylvain!
>>>
>>> Can I vote for internally implementing supercolumn families as regular
>>> column families? (With a smooth upgrade process that doesn't require
>>> shutting down a live cluster.)
>>>
>>
>> I forgot to add that I don't know if this make a lot of sense. That would
>> be a fairly major refactor (so error prone), you'd still have to deal with
>> the point I mentioned in my previous mail (for range deletes you would have
>> to change the on-disk format for instance), and all this for no actual
>> benefits, even downsides actually (encoded supercolumn will take more space
>> on-disk (and on-memory)). Super columns are there and work fairly well, so
>> what would be the point ?
>>
>> I'm only just saying that 'in theory', super columns are not the super
>> shiny magical feature that give you stuff you can't hope to have with only
>> regular column family. That doesn't make them any less nice.
>>
>> That being said, you are free to create whatever ticket you want and vote
>> for it. Don't expect too much support though :)
>>
>>
>>> What if supercolumn families were supported as regular column families +
>>> an index (on what used to be supercolumn keys)? Would that solve some
>>> problems?
>>>
>>
>> You'd still have to remember for each CF if it has this index on what used
>> to be supercolumn keys and handle those differently. I'm really not convinced
>> this would make the code cleaner than it is now. And making the code
>> cleaner is really the only reason I can think of for wanting to get rid of
>> super columns internally, so ...
>>
>>
>>>
>>>
>>> On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne <sy...@datastax.com>wrote:
>>>
>>>> > Is there any advantage to using supercolumns
>>>> > (columnFamilyName[superColumnName[columnName[val]]]) instead of
>>>> regular
>>>> > columns with concatenated keys
>>>> > (columnFamilyName[superColumnName@columnName[val]])?
>>>> >
>>>> > When I designed my data model, I used supercolumns wherever I needed
>>>> two
>>>> > levels of key depth - just because they were there, and I figured that
>>>> they
>>>> > must be there for a reason.
>>>> >
>>>> > Now I see that in 0.7 secondary indexes don't work on supercolumns or
>>>> > subcolumns (is that right?), which seems to me like a very serious
>>>> > limitation of supercolumn families.
>>>> >
>>>> > It raises the question: Is there anything that supercolumn families
>>>> are good
>>>> > for?
>>>>
>>>> There is a bunch of queries that you cannot do (or less conveniently) if
>>>> you
>>>> encode super columns using regular columns with concatenated keys:
>>>>
>>>> 1) If you use regular columns with concatenated keys, the count argument
>>>> count simple columns. With super columns it counts super columns. It
>>>> means
>>>> that you can't do "give me the 10 first super columns of this row".
>>>>
>>>> 2) If you need to get x super columns by name, you'll have to issue x
>>>> get_slice query (one of each super column). On the client side it sucks.
>>>> Internally in Cassandra we could do it reasonably well though.
>>>>
>>>> 3) You cannot remove entire super columns since there is no support for
>>>> range
>>>> deletions.
>>>>
>>>> Moreover, the encoding with concatenated keys uses more disk space (and
>>>> less
>>>> disk used for the same information means less things to read so it may
>>>> have
>>>> a slight impact on read performance too -- it's probably really slight
>>>> on most
>>>> usage but nevertheless).
>>>>
>>>> > And here's a related question: Why can't Cassandra implement
>>>> supercolumn
>>>> > families as regular column families, internally, and give you that
>>>> > functionality?
>>>>
>>>> For the 1) and 2) above, we could deal with those internally fairly
>>>> easily I
>>>> think and rather well (which means it wouldn't be much worse
>>>> performance-wise
>>>> than with the actual implementation of super columns, not that it would
>>>> be
>>>> better). For 3), range deletes are harder and would require more
>>>> significant
>>>> changes (that doesn't mean that Cassandra will never have it). Even
>>>> without
>>>> that, there would be the disk space lost.
>>>>
>>>> --
>>>> Sylvain
>>>>
>>>>
>>>
>>
>
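
Point 1 in the list quoted above (the count argument) can be sketched as
follows. This assumes column names are modeled as sorted
(super column, column) tuples; the function names are made up for the
example and are not part of the thrift interface:

```python
from itertools import groupby

# A row whose super columns were flattened into composite names.
flat_row = sorted([
    ("sc1", "a"), ("sc1", "b"), ("sc1", "c"),
    ("sc2", "a"), ("sc2", "b"),
    ("sc3", "a"),
])


def first_columns(row, count):
    """A slice with a count over flat names counts simple columns."""
    return row[:count]


def first_supercolumns(row, count):
    """Grouping on the first name component counts super columns instead."""
    groups = []
    for sc_name, names in groupby(row, key=lambda name: name[0]):
        if len(groups) == count:
            break
        groups.append((sc_name, [name[1] for name in names]))
    return groups


by_column = first_columns(flat_row, 2)
by_supercolumn = first_supercolumns(flat_row, 2)
```

With a count of 2, the flat slice returns two simple columns that both
belong to sc1, while the grouped version returns all of sc1 and sc2. A
client working against flat names cannot know in advance how many simple
columns it must fetch to cover the first N groups.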

Re: Do supercolumns have a purpose?

Posted by David Boxenhorn <da...@lookin2.com>.

The advantage would be to enable secondary indexes on supercolumn families.

I understand from this thread that indexes on supercolumn families are not
going to be:

http://www.mail-archive.com/user@cassandra.apache.org/msg09527.html

Which, it seems to me, effectively deprecates supercolumn families. (I don't
see any of the three problems you brought up as overcoming this problem,
except, perhaps, for special cases.)


On Thu, Feb 3, 2011 at 3:32 PM, Sylvain Lebresne <sy...@datastax.com> wrote:

> On Thu, Feb 3, 2011 at 1:33 PM, David Boxenhorn <da...@lookin2.com> wrote:
>
>> Thanks Sylvain!
>>
>> Can I vote for internally implementing supercolumn families as regular
>> column families? (With a smooth upgrade process that doesn't require
>> shutting down a live cluster.)
>>
>
> I forgot to add that I don't know if this make a lot of sense. That would
> be a fairly major refactor (so error prone), you'd still have to deal with
> the point I mentioned in my previous mail (for range deletes you would have
> to change the on-disk format for instance), and all this for no actual
> benefits, even downsides actually (encoded supercolumn will take more space
> on-disk (and on-memory)). Super columns are there and work fairly well, so
> what would be the point ?
>
> I'm only just saying that 'in theory', super columns are not the super
> shiny magical feature that give you stuff you can't hope to have with only
> regular column family. That doesn't make them any less nice.
>
> That being said, you are free to create whatever ticket you want and vote
> for it. Don't expect too much support though :)
>
>
>> What if supercolumn families were supported as regular column families +
>> an index (on what used to be supercolumn keys)? Would that solve some
>> problems?
>>
>
> You'd still have to remember for each CF if it has this index on what used
> to be supercolumn keys and handle those differently. I'm really not convinced
> this would make the code cleaner than it is now. And making the code
> cleaner is really the only reason I can think of for wanting to get rid of
> super columns internally, so ...
>
>
>>
>>
>> On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne <sy...@datastax.com>wrote:
>>
>>> > Is there any advantage to using supercolumns
>>> > (columnFamilyName[superColumnName[columnName[val]]]) instead of regular
>>> > columns with concatenated keys
>>> > (columnFamilyName[superColumnName@columnName[val]])?
>>> >
>>> > When I designed my data model, I used supercolumns wherever I needed
>>> two
>>> > levels of key depth - just because they were there, and I figured that
>>> they
>>> > must be there for a reason.
>>> >
>>> > Now I see that in 0.7 secondary indexes don't work on supercolumns or
>>> > subcolumns (is that right?), which seems to me like a very serious
>>> > limitation of supercolumn families.
>>> >
>>> > It raises the question: Is there anything that supercolumn families are
>>> good
>>> > for?
>>>
>>> There is a bunch of queries that you cannot do (or less conveniently) if
>>> you
>>> encode super columns using regular columns with concatenated keys:
>>>
>>> 1) If you use regular columns with concatenated keys, the count argument
>>> count simple columns. With super columns it counts super columns. It
>>> means
>>> that you can't do "give me the 10 first super columns of this row".
>>>
>>> 2) If you need to get x super columns by name, you'll have to issue x
>>> get_slice query (one of each super column). On the client side it sucks.
>>> Internally in Cassandra we could do it reasonably well though.
>>>
>>> 3) You cannot remove entire super columns since there is no support for
>>> range
>>> deletions.
>>>
>>> Moreover, the encoding with concatenated keys uses more disk space (and
>>> less
>>> disk used for the same information means less things to read so it may
>>> have
>>> a slight impact on read performance too -- it's probably really slight on
>>> most
>>> usage but nevertheless).
>>>
>>> > And here's a related question: Why can't Cassandra implement
>>> supercolumn
>>> > families as regular column families, internally, and give you that
>>> > functionality?
>>>
>>> For the 1) and 2) above, we could deal with those internally fairly
>>> easily I
>>> think and rather well (which means it wouldn't be much worse
>>> performance-wise
>>> than with the actual implementation of super columns, not that it would be
>>> better). For 3), range deletes are harder and would require more
>>> significant
>>> changes (that doesn't mean that Cassandra will never have it). Even
>>> without
>>> that, there would be the disk space lost.
>>>
>>> --
>>> Sylvain
>>>
>>>
>>
>

Re: Do supercolumns have a purpose?

Posted by Sylvain Lebresne <sy...@datastax.com>.

On Thu, Feb 3, 2011 at 1:33 PM, David Boxenhorn <da...@lookin2.com> wrote:

> Thanks Sylvain!
>
> Can I vote for internally implementing supercolumn families as regular
> column families? (With a smooth upgrade process that doesn't require
> shutting down a live cluster.)
>

I forgot to add that I don't know if this makes a lot of sense. That would be
a fairly major refactor (so error-prone), you'd still have to deal with the
points I mentioned in my previous mail (for range deletes you would have to
change the on-disk format, for instance), and all this for no actual
benefit, and even some downsides (encoded supercolumns will take more space
on disk and in memory). Super columns are there and work fairly well, so
what would be the point?

I'm only saying that, 'in theory', super columns are not a super shiny
magical feature that gives you things you can't hope to have with only regular
column families. That doesn't make them any less nice to have.

That being said, you are free to create whatever ticket you want and vote
for it. Don't expect too much support though :)


> What if supercolumn families were supported as regular column families + an
> index (on what used to be supercolumn keys)? Would that solve some problems?
>

You'd still have to remember, for each CF, whether it has this index on what
used to be supercolumn keys, and handle those CFs differently. I'm really not
convinced this would make the code cleaner than it is now. And making the code
cleaner is really the only reason I can think of for wanting to get rid of
super columns internally, so ...
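
To make the "regular column family + index" idea concrete, here is a rough
Python sketch of a single row. It is only an illustration of the bookkeeping
involved, not how Cassandra stores anything: the IndexedRow class, its method
names, and the "@" separator (borrowed from the notation in the question) are
all hypothetical, and it assumes names never contain the separator.

```python
from bisect import bisect_left


class IndexedRow:
    """Toy row: flat "super@column" entries plus an index of super names.

    Hypothetical model for illustration only; not Cassandra's storage format.
    """

    def __init__(self, sep="@"):
        self.sep = sep
        self.columns = {}      # "super@column" -> value
        self.super_names = []  # sorted, distinct supercolumn names (the index)

    def insert(self, super_name, column, value):
        self.columns[f"{super_name}{self.sep}{column}"] = value
        # The index has to be kept in sync on every write.
        i = bisect_left(self.super_names, super_name)
        if i == len(self.super_names) or self.super_names[i] != super_name:
            self.super_names.insert(i, super_name)

    def first_super_columns(self, count):
        """Return the first `count` supercolumns, grouped by super name."""
        result = {}
        for name in self.super_names[:count]:
            prefix = name + self.sep
            result[name] = {
                key[len(prefix):]: value
                for key, value in self.columns.items()
                if key.startswith(prefix)
            }
        return result

    def remove_super_column(self, super_name):
        """Drop every flat column of one supercolumn, then its index entry."""
        prefix = super_name + self.sep
        for key in [k for k in self.columns if k.startswith(prefix)]:
            del self.columns[key]
        if super_name in self.super_names:
            self.super_names.remove(super_name)


row = IndexedRow()
row.insert("bob", "city", "Oslo")
row.insert("alice", "age", "30")
row.insert("alice", "city", "Paris")
row.insert("carol", "age", "25")
print(row.first_super_columns(2))
```

Every write and delete path has to know about the index, which is the kind of
per-CF special-casing described above.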



Re: Do supercolumns have a purpose?

Posted by David Boxenhorn <da...@lookin2.com>.

Thanks Sylvain!

Can I vote for internally implementing supercolumn families as regular
column families? (With a smooth upgrade process that doesn't require
shutting down a live cluster.)

What if supercolumn families were supported as regular column families + an
index (on what used to be supercolumn keys)? Would that solve some problems?


On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne <sy...@datastax.com>wrote:

> > Is there any advantage to using supercolumns
> > (columnFamilyName[superColumnName[columnName[val]]]) instead of regular
> > columns with concatenated keys
> > (columnFamilyName[superColumnName@columnName[val]])?
> >
> > When I designed my data model, I used supercolumns wherever I needed two
> > levels of key depth - just because they were there, and I figured that they
> > must be there for a reason.
> >
> > Now I see that in 0.7 secondary indexes don't work on supercolumns or
> > subcolumns (is that right?), which seems to me like a very serious
> > limitation of supercolumn families.
> >
> > It raises the question: Is there anything that supercolumn families are good
> > for?
>
> There are a number of queries that you cannot do (or can only do less
> conveniently) if you encode super columns using regular columns with
> concatenated keys:
>
> 1) If you use regular columns with concatenated keys, the count argument
> counts simple columns. With super columns it counts super columns. This means
> that you can't do "give me the first 10 super columns of this row".
>
> 2) If you need to get x super columns by name, you'll have to issue x
> get_slice queries (one for each super column). On the client side it sucks.
> Internally in Cassandra we could do it reasonably well, though.
>
> 3) You cannot remove entire super columns, since there is no support for
> range deletions.
>
> Moreover, the encoding with concatenated keys uses more disk space (and less
> disk used for the same information means fewer things to read, so it may have
> a slight impact on read performance too -- probably really slight for most
> usage, but nevertheless).
>
> > And here's a related question: Why can't Cassandra implement supercolumn
> > families as regular column families, internally, and give you that
> > functionality?
>
> For 1) and 2) above, I think we could deal with those internally fairly
> easily and rather well (meaning it wouldn't be much worse performance-wise
> than the current implementation of super columns, not that it would be
> better). For 3), range deletes are harder and would require more significant
> changes (that doesn't mean Cassandra will never have them). Even without
> that, there would still be the lost disk space.
>
> --
> Sylvain
>
>

Re: Do supercolumns have a purpose?

Posted by Sylvain Lebresne <sy...@datastax.com>.

> Is there any advantage to using supercolumns
> (columnFamilyName[superColumnName[columnName[val]]]) instead of regular
> columns with concatenated keys
> (columnFamilyName[superColumnName@columnName[val]])?
>
> When I designed my data model, I used supercolumns wherever I needed two
> levels of key depth - just because they were there, and I figured that they
> must be there for a reason.
>
> Now I see that in 0.7 secondary indexes don't work on supercolumns or
> subcolumns (is that right?), which seems to me like a very serious
> limitation of supercolumn families.
>
> It raises the question: Is there anything that supercolumn families are good
> for?

There are a number of queries that you cannot do (or can only do less
conveniently) if you encode super columns using regular columns with
concatenated keys:

1) If you use regular columns with concatenated keys, the count argument
counts simple columns. With super columns it counts super columns. This means
that you can't do "give me the first 10 super columns of this row".

2) If you need to get x super columns by name, you'll have to issue x
get_slice queries (one for each super column). On the client side it sucks.
Internally in Cassandra we could do it reasonably well, though.

3) You cannot remove entire super columns, since there is no support for
range deletions.
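
The three points above can be sketched with a toy, in-memory model of one row
in Python. This is only an illustration under stated assumptions: get_slice
here is a hypothetical stand-in that mimics a sorted column slice with a count
limit, not the real Thrift call; the "@" separator comes from the notation in
the question; and names are assumed never to contain the separator.

```python
from bisect import bisect_left

SEP = "@"  # separator from the superColumnName@columnName notation


def make_row(super_columns):
    """Flatten {super: {column: value}} into sorted (name, value) pairs."""
    flat = [(f"{sc}{SEP}{col}", val)
            for sc, cols in super_columns.items()
            for col, val in cols.items()]
    return sorted(flat)


def get_slice(row, start="", finish="", count=None):
    """Toy slice: columns with start <= name <= finish, at most `count`."""
    names = [name for name, _ in row]
    result = []
    for name, value in row[bisect_left(names, start):]:
        if finish and name > finish:
            break
        if count is not None and len(result) == count:
            break
        result.append((name, value))
    return result


def get_super_columns(row, super_names):
    """Point 2: one slice per requested super column."""
    return {sc: get_slice(row, sc + SEP, sc + SEP + "\uffff")
            for sc in super_names}


def delete_super_column(row, super_name):
    """Point 3: without range deletes, read the columns, then delete each."""
    doomed = get_slice(row, super_name + SEP, super_name + SEP + "\uffff")
    for column in doomed:
        row.remove(column)
    return len(doomed)  # number of individual column deletes issued


row = make_row({
    "alice": {"age": "30", "city": "Paris", "email": "alice@example.org"},
    "bob": {"age": "41", "city": "Oslo"},
    "carol": {"age": "25"},
})

# Point 1: count=2 yields two simple columns, both from "alice",
# rather than the first two super columns.
print(get_slice(row, count=2))
```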

Moreover, the encoding with concatenated keys uses more disk space (and less
disk used for the same information means fewer things to read, so it may have
a slight impact on read performance too -- probably really slight for most
usage, but nevertheless).
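
As a back-of-the-envelope illustration of the space point, the Python sketch
below compares only the bytes spent on names in the two encodings. It is a
simplification under stated assumptions: it ignores everything else stored
with a column (values, timestamps, length prefixes, any per-super-column
overhead), and the function names and sample data are made up for the example.

```python
def name_bytes_nested(super_columns):
    """Name bytes when each super column name is stored once."""
    return sum(
        len(sc.encode("utf-8")) + sum(len(c.encode("utf-8")) for c in cols)
        for sc, cols in super_columns.items()
    )


def name_bytes_flat(super_columns, sep="@"):
    """Name bytes when every column name repeats the super column name."""
    return sum(
        len(f"{sc}{sep}{c}".encode("utf-8"))
        for sc, cols in super_columns.items()
        for c in cols
    )


sample = {"user:1001": ["name", "email", "city"]}
print(name_bytes_nested(sample), name_bytes_flat(sample))
```

The gap grows with the length of the super column name and with the number of
subcolumns under it, since the prefix is repeated once per column.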

> And here's a related question: Why can't Cassandra implement supercolumn
> families as regular column families, internally, and give you that
> functionality?

For 1) and 2) above, I think we could deal with those internally fairly
easily and rather well (meaning it wouldn't be much worse performance-wise
than the current implementation of super columns, not that it would be
better). For 3), range deletes are harder and would require more significant
changes (that doesn't mean Cassandra will never have them). Even without
that, there would still be the lost disk space.

--
Sylvain