Posted to dev@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2009/08/24 17:21:43 UTC

Re: Data model names, reloaded

IMO the window for making this kind of change has passed.  We
talked about finalizing the 0.4 API weeks ago, we got a beta out with
it, and it does the job.  The timeline wasn't a surprise to anyone
paying attention to the list.  It's time to move on.

-Jonathan

On Fri, Aug 21, 2009 at 1:36 PM, Evan Weaver<ew...@gmail.com> wrote:
> I think the below scheme successfully avoids the current
> misconceptions, and addresses the issues raised in the previous
> thread.
>
> The names are memorable and short, Anglo-Saxon-style, and take
> advantage of existing database concepts in non-conflicting ways. They
> are not ambiguous or novel. They descend step-by-step from the
> container to the thing contained.
>
> Proposal 2:
>
>  Database
>  Record set
>  Record (w/key)
>  Field set
>  Field
>
> Notes:
>  * Database is the same as in SQL/CouchDB/MongoDB
>  * Record set is based on "record", below. It expresses a container of
> unique rows, without the BigTable baggage (see PS).
>  * Record is the same as row, without the relational baggage.
>  * Field set is based on "field", below, and parallels "record set".
> It expresses a container of unique fields.
>  * Field is the same as in CouchDB, and does not carry the SQL baggage
> of "column", or the relational-theory baggage of "attribute".
>
> I think if we adopted these, we would quickly move from "most
> confusing data model" to "least confusing", based on my research into
> other popular terminology
> (http://markmail.org/thread/6vys3hk774zcrd6v).
>
> Evan
>
> ----
>
> PS. The implementation of column families hasn't changed from
> BigTable, but the use in modeling has. Common Cassandra designs are
> more row-oriented than column-oriented.
>
> With that in mind, keyspace, row, and super-column could also each be
> called column family. They all have sets of related columns in them,
> among other things. Everything but the column itself is some kind of
> "column family". This is a big stumbling block.
>
> I want a new user to be able to look at any level and answer "what is
> the immediate container of this object?" If they can't do that, then
> the term is ambiguous.
>
> --
> Evan Weaver
>

Re: Data model names, reloaded

Posted by Tim Estes <ti...@digitalreasoning.com>.

I for one don't mind having a clear distinction from the relational  
language bias that this change might imply.

Maybe I'm a little too NOSQL, but I think that the current naming does  
avoid confusion with relational concepts that could be  
counterproductive in designing correctly to this type of model.

-- 
Tim Estes
CEO
Digital Reasoning Systems


On Aug 24, 2009, at 10:21 AM, Jonathan Ellis wrote:

> IMO the window for making this kind of change has passed.  We
> talked about finalizing the 0.4 API weeks ago, we got a beta out with
> it, and it does the job.  The timeline wasn't a surprise to anyone
> paying attention to the list.  It's time to move on.
>
> -Jonathan
>
> On Fri, Aug 21, 2009 at 1:36 PM, Evan Weaver<ew...@gmail.com> wrote:
>> I think the below scheme successfully avoids the current
>> misconceptions, and addresses the issues raised in the previous
>> thread.
>>
>> The names are memorable and short, Anglo-Saxon-style, and take
>> advantage of existing database concepts in non-conflicting ways. They
>> are not ambiguous or novel. They descend step-by-step from the
>> container to the thing contained.
>>
>> Proposal 2:
>>
>>  Database
>>  Record set
>>  Record (w/key)
>>  Field set
>>  Field
>>
>> Notes:
>>  * Database is the same as in SQL/CouchDB/MongoDB
>>  * Record set is based on "record", below. It expresses a container of
>> unique rows, without the BigTable baggage (see PS).
>>  * Record is the same as row, without the relational baggage.
>>  * Field set is based on "field", below, and parallels "record set".
>> It expresses a container of unique fields.
>>  * Field is the same as in CouchDB, and does not carry the SQL baggage
>> of "column", or the relational-theory baggage of "attribute".
>>
>> I think if we adopted these, we would quickly move from "most
>> confusing data model" to "least confusing", based on my research into
>> other popular terminology
>> (http://markmail.org/thread/6vys3hk774zcrd6v).
>>
>> Evan
>>
>> ----
>>
>> PS. The implementation of column families hasn't changed from
>> BigTable, but the use in modeling has. Common Cassandra designs are
>> more row-oriented than column-oriented.
>>
>> With that in mind, keyspace, row, and super-column could also each be
>> called column family. They all have sets of related columns in them,
>> among other things. Everything but the column itself is some kind of
>> "column family". This is a big stumbling block.
>>
>> I want a new user to be able to look at any level and answer "what is
>> the immediate container of this object?" If they can't do that, then
>> the term is ambiguous.
>>
>> --
>> Evan Weaver
>>

Re: Data model names, reloaded

Posted by Evan Weaver <ew...@gmail.com>.

In one of the first threads I said I wasn't interested in stalling the
0.4 release. I also said we would write the patches. We first raised
the change with Ellis at the hackfest a month ago; it's taken a
lot of work to get it right.

Plus, we built the Ruby client first, which seemed more important. The
experience of building a comprehensive client helped convince us the
naming was a problem.

Fundamentally there is a disagreement about the cost of the breakage
vs. the benefit of the improvement. If we are open to accepting such a
change for 0.5 then that seems worth discussing. Ellis suggested
taking a vote, so I will do that, and then we can move on.

Evan

On Mon, Aug 24, 2009 at 8:53 AM, Jonathan Ellis<jb...@gmail.com> wrote:
> On Mon, Aug 24, 2009 at 10:50 AM, Toby DiPasquale<co...@gmail.com> wrote:
>> That was not my intent. Evan's provided some good stuff. However, I
>> think your original post would not have incited my post if you'd
>> provided some point at which this could be re-evaluated instead of
>> implying that the subject needed to be dropped altogether. Do you have
>> a suggestion as to when this could be revisited?
>
> The corollary to "it's too late for 0.4" is "it's too late, period."
>
> Every project reaches a point past which it's no longer worth
> revisiting certain fundamental decisions.  IMO Cassandra has passed
> that point for "what do we call a Column."
>
> -Jonathan
>

-- 
Evan Weaver

Re: Data model names, reloaded

Posted by Michael Greene <mi...@gmail.com>.

I agree that this is too late for 0.4.  0.4 is a huge improvement over 0.3,
we've already released a beta, a second beta should be forthcoming, and we
need to get this in users' hands.  I'm not sure I agree with Jonathan's
corollary, as there should always be room for making changes which would
benefit our users, even major API changes (after 1.0, this might be a
different story).
Evan, Ryan, et al. have put a lot of work into this, and their proposals are
sound.  However, there are now several groups using Cassandra in production
or pre-production systems that would like to see a 0.4 release out soon.

Michael

On Mon, Aug 24, 2009 at 11:01 AM, Chris Goffinet <cg...@chrisgoffinet.com> wrote:

> +1 I am going to agree with Jonathan. I have mostly been quiet on this
> thread, just to see how things played out. I stopped reading after a while,
> as I feel we really need to move past this for now, and defer. Improving
> Cassandra's performance, feature set, and stability should be our focus IMHO.
> Right now we have features that make it difficult to deploy into production.
> Until those are resolved and working, major user adoption isn't going to
> help.
>
>
>
> On Aug 24, 2009, at 8:53 AM, Jonathan Ellis wrote:
>
>> On Mon, Aug 24, 2009 at 10:50 AM, Toby DiPasquale<co...@gmail.com> wrote:
>>
>>> That was not my intent. Evan's provided some good stuff. However, I
>>> think your original post would not have incited my post if you'd
>>> provided some point at which this could be re-evaluated instead of
>>> implying that the subject needed to be dropped altogether. Do you have
>>> a suggestion as to when this could be revisited?
>>>
>>
>> The corollary to "it's too late for 0.4" is "it's too late, period."
>>
>> Every project reaches a point past which it's no longer worth
>> revisiting certain fundamental decisions.  IMO Cassandra has passed
>> that point for "what do we call a Column."
>>
>> -Jonathan
>>
>
>

Re: Data model names, reloaded

Posted by Chris Goffinet <cg...@chrisgoffinet.com>.

+1 I am going to agree with Jonathan. I have mostly been quiet on this
thread, just to see how things played out. I stopped reading after a
while, as I feel we really need to move past this for now, and defer.
Improving Cassandra's performance, feature set, and stability should be
our focus IMHO. Right now we have features that make it difficult to
deploy into production. Until those are resolved and working, major
user adoption isn't going to help.


On Aug 24, 2009, at 8:53 AM, Jonathan Ellis wrote:

> On Mon, Aug 24, 2009 at 10:50 AM, Toby DiPasquale<codeslinger@gmail.com> wrote:
>> That was not my intent. Evan's provided some good stuff. However, I
>> think your original post would not have incited my post if you'd
>> provided some point at which this could be re-evaluated instead of
>> implying that the subject needed to be dropped altogether. Do you have
>> a suggestion as to when this could be revisited?
>
> The corollary to "it's too late for 0.4" is "it's too late, period."
>
> Every project reaches a point past which it's no longer worth
> revisiting certain fundamental decisions.  IMO Cassandra has passed
> that point for "what do we call a Column."
>
> -Jonathan

Re: Data model names, reloaded

Posted by Toby DiPasquale <co...@gmail.com>.

On Mon, Aug 24, 2009 at 11:53 AM, Jonathan Ellis<jb...@gmail.com> wrote:
> The corollary to "it's too late for 0.4" is "it's too late, period."
>
> Every project reaches a point past which it's no longer worth
> revisiting certain fundamental decisions.  IMO Cassandra has passed
> that point for "what do we call a Column."

I would urge you to reconsider that position. I think that making the
nomenclature clearer would serve project adoption well in the future.

-- 
Toby DiPasquale

Re: Data model names, reloaded

Posted by Jonathan Ellis <jb...@gmail.com>.

On Mon, Aug 24, 2009 at 10:50 AM, Toby DiPasquale<co...@gmail.com> wrote:
> That was not my intent. Evan's provided some good stuff. However, I
> think your original post would not have incited my post if you'd
> provided some point at which this could be re-evaluated instead of
> implying that the subject needed to be dropped altogether. Do you have
> a suggestion as to when this could be revisited?

The corollary to "it's too late for 0.4" is "it's too late, period."

Every project reaches a point past which it's no longer worth
revisiting certain fundamental decisions.  IMO Cassandra has passed
that point for "what do we call a Column."

-Jonathan

Re: Data model names, reloaded

Posted by Toby DiPasquale <co...@gmail.com>.

On Mon, Aug 24, 2009 at 11:46 AM, Jonathan Ellis<jb...@gmail.com> wrote:
> Again, we've been clear about the direction and the timeline for 0.4.
> This kind of proposal needed to happen a month ago.  It didn't.  That
> may be a shame, but that's how it works, and trying to hold up
> everyone else for your pet feature (without even patches! you'll
> pardon me if the implication seems to be that you expect others to do
> that part for you) is rude.  That's not how OSS should work.

That was not my intent. Evan's provided some good stuff. However, I
think your original post would not have incited my post if you'd
provided some point at which this could be re-evaluated instead of
implying that the subject needed to be dropped altogether. Do you have
a suggestion as to when this could be revisited?

-- 
Toby DiPasquale

Re: Data model names, reloaded

Posted by Curt Micol <as...@gmail.com>.

On Mon, Aug 24, 2009 at 1:16 PM, Michael Greene<mi...@gmail.com> wrote:
> I haven't seen any 'piss and vinegar'.  Personally, I was just trying to
> explain my -1 and suggest that maybe it could be done in the future.  The
> reason that it was timed as such was not a concerted effort from a cabal or
> anything else sinister, but that the thread was brought up on IRC and
> arguments stirred.
> I should have chimed in with my -1 earlier but other issues had consumed my
> attention, my apologies.

My response wasn't directed at any one response; I apologize if it
came off as directed at you, Michael.  It seemed people had strong
feelings one way and those feelings kind of blindsided the list
suddenly.

I wasn't keeping up on IRC today, which would explain the sudden burst
of activity on the list.

-- 
# Curt Micol

Re: Data model names, reloaded

Posted by Michael Greene <mi...@gmail.com>.

I haven't seen any 'piss and vinegar'.  Personally, I was just trying to
explain my -1 and suggest that maybe it could be done in the future.  The
reason that it was timed as such was not a concerted effort from a cabal or
anything else sinister, but that the thread was brought up on IRC and
arguments stirred.
I should have chimed in with my -1 earlier but other issues had consumed my
attention, my apologies.

Michael

On Mon, Aug 24, 2009 at 12:05 PM, Curt Micol <as...@gmail.com> wrote:

> I really don't understand the flurry of aggressive responses.  It's
> okay to say, "I don't think this is a good idea", or even a "-1,
> timing isn't great" without bringing piss and vinegar to the thread.
> Votes are cast on everything else, why not here?
>
> Would've been nice to have an actual discussion rather than silence
> and then a wrecking ball.
>
> --
> # Curt Micol
>

Re: Data model names, reloaded

Posted by Curt Micol <as...@gmail.com>.

I really don't understand the flurry of aggressive responses.  It's
okay to say, "I don't think this is a good idea", or even a "-1,
timing isn't great" without bringing piss and vinegar to the thread.
Votes are cast on everything else, why not here?

Would've been nice to have an actual discussion rather than silence
and then a wrecking ball.

-- 
# Curt Micol

Re: Data model names, reloaded

Posted by Tim Estes <ti...@digitalreasoning.com>.

+1. I can speak to having to educate a large community on some of the
merits of these concepts. Moving to the other names would create a lot
of confusion, such as "why don't we just run an Oracle RAC if we just
need to scale the Database, record sets, and fields?"

With Columns, SuperColumns, and keyspaces, they know they are
working with something different and do need to think a little
differently. I realize this creates some educational overhead, but it
avoids the sense ambiguity that would be introduced by the
proposed shift.

-- 
Tim Estes
CEO
Digital Reasoning Systems


On Aug 24, 2009, at 11:20 AM, Eric Evans wrote:

> On Mon, 2009-08-24 at 08:50 -0700, Ryan King wrote:
>> We have never indicated that we expected others to do the work. I
>> actually have some patches for our first renaming suggestion already,
>> but given the massive size of the change, we thought it prudent to
>> discuss it with others before investing the time in making the change.
>> I've set aside several days this week just to work on patches for
>> this.
>
> To me, it's no consolation that you guys are willing to make the source
> and documentation changes. It doesn't matter *who* makes them, the
> amount of churn is going to be enormous, the proposed changes are very
> destabilizing, and I would argue that the current naming is so
> entrenched that no matter how thorough you think you are being, context
> will be lost.
>
> There are also all sorts of "documentation" that are beyond your control
> to change. Presentation materials, videos, blog postings, etc. will all
> be rendered moot the moment changes like these occur.
>
> That's not to mention all of the current users who will now be forced to
> rewire their brains to understand the new terminology.
>
> Now the argument as I understand it is that the proposed naming is so
> much more succinct, that it will make Cassandra so much easier for
> people to understand, that it warrants all of this cost. That it will be
> worth it in the long term. I disagree. It isn't clear to me that the
> proposed names are *any* better than what we have, let alone that they
> warrant this sort of disruptive change.
>
> -- 
> Eric Evans
> eevans@rackspace.com
>

Re: Data model names, reloaded

Posted by Eric Evans <ee...@rackspace.com>.

On Mon, 2009-08-24 at 08:50 -0700, Ryan King wrote:
> We have never indicated that we expected others to do the work. I
> actually have some patches for our first renaming suggestion already,
> but given the massive size of the change, we thought it prudent to
> discuss it with others before investing the time in making the change.
> I've set aside several days this week just to work on patches for
> this.

To me, it's no consolation that you guys are willing to make the source
and documentation changes. It doesn't matter *who* makes them, the
amount of churn is going to be enormous, the proposed changes are very
destabilizing, and I would argue that the current naming is so
entrenched that no matter how thorough you think you are being, context
will be lost.

There are also all sorts of "documentation" that are beyond your control
to change. Presentation materials, videos, blog postings, etc. will all
be rendered moot the moment changes like these occur.

That's not to mention all of the current users who will now be forced to
rewire their brains to understand the new terminology.

Now the argument as I understand it is that the proposed naming is so
much more succinct, that it will make Cassandra so much easier for
people to understand, that it warrants all of this cost. That it will be
worth it in the long term. I disagree. It isn't clear to me that the
proposed names are *any* better than what we have, let alone that they
warrant this sort of disruptive change.

-- 
Eric Evans
eevans@rackspace.com

Re: Data model names, reloaded

Posted by Ryan King <ry...@twitter.com>.

On Mon, Aug 24, 2009 at 8:46 AM, Jonathan Ellis<jb...@gmail.com> wrote:
> On Mon, Aug 24, 2009 at 10:26 AM, Toby DiPasquale<co...@gmail.com> wrote:
>> That feels to me to be a short-sighted point of view. I'd imagine that
>> it's more important for people to be able to understand the data model
>> than meeting some kind of arbitrary timeline. I, too, find the current
>> naming confusing and would love for this to be improved
>
> I'm going to have to call bullshit on the idea that this is about
> taking the time to get things right on the one hand and "meeting some
> kind of arbitrary timeline" on the other.
>
> Put that way, the choice is obvious!  Except of course that is not a
> fair representation of the tradeoffs.
>
> The release timeline isn't something arbitrary we pulled out of our
> asses.  0.3 has serious issues that 0.4 fixes, including but not
> limited to the API.  (The changelog was recently posted; I won't
> repeat it here.)  Having an updated, stable 0.4 out there will be far
> more valuable to the project than rearranging the deck chairs of
> terminology.  Cassandra is fundamentally a different model than the
> relational one everyone knows and loves.  That's the root of the
> problem with understanding Cassandra: the concepts.  The labels you
> attach to those, not so much.
>
> Again, we've been clear about the direction and the timeline for 0.4.
> This kind of proposal needed to happen a month ago.  It didn't.  That
> may be a shame, but that's how it works, and trying to hold up
> everyone else for your pet feature (without even patches! you'll
> pardon me if the implication seems to be that you expect others to do
> that part for you) is rude.  That's not how OSS should work.

We have never indicated that we expected others to do the work. I
actually have some patches for our first renaming suggestion already,
but given the massive size of the change, we thought it prudent to
discuss it with others before investing the time in making the change.
I've set aside several days this week just to work on patches for
this.

-ryan

Re: Data model names, reloaded

Posted by Sandeep Tata <sa...@gmail.com>.

> terminology.  Cassandra is fundamentally a different model than the
> relational one everyone knows and loves.  That's the root of the
> problem with understanding Cassandra: the concepts.  The labels you
> attach to those, not so much.

I have to agree with this.

Docs & tutorials that highlight the differences from relational
modeling are probably going to be *far* more useful than changes to
names (in the API or in the code).

IMHO articles like
http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/
will be far more useful than a name change, and perhaps no
simpler/easier after a name change.

Re: Data model names, reloaded

Posted by Jonathan Ellis <jb...@gmail.com>.

On Mon, Aug 24, 2009 at 10:26 AM, Toby DiPasquale<co...@gmail.com> wrote:
> That feels to me to be a short-sighted point of view. I'd imagine that
> it's more important for people to be able to understand the data model
> than meeting some kind of arbitrary timeline. I, too, find the current
> naming confusing and would love for this to be improved

I'm going to have to call bullshit on the idea that this is about
taking the time to get things right on the one hand and "meeting some
kind of arbitrary timeline" on the other.

Put that way, the choice is obvious!  Except of course that is not a
fair representation of the tradeoffs.

The release timeline isn't something arbitrary we pulled out of our
asses.  0.3 has serious issues that 0.4 fixes, including but not
limited to the API.  (The changelog was recently posted; I won't
repeat it here.)  Having an updated, stable 0.4 out there will be far
more valuable to the project than rearranging the deck chairs of
terminology.  Cassandra is fundamentally a different model than the
relational one everyone knows and loves.  That's the root of the
problem with understanding Cassandra: the concepts.  The labels you
attach to those, not so much.

Again, we've been clear about the direction and the timeline for 0.4.
This kind of proposal needed to happen a month ago.  It didn't.  That
may be a shame, but that's how it works, and trying to hold up
everyone else for your pet feature (without even patches! you'll
pardon me if the implication seems to be that you expect others to do
that part for you) is rude.  That's not how OSS should work.

-Jonathan

Re: Data model names, reloaded

Posted by Toby DiPasquale <co...@gmail.com>.

On Mon, Aug 24, 2009 at 11:21 AM, Jonathan Ellis<jb...@gmail.com> wrote:
> IMO the window for making this kind of change has passed.  We
> talked about finalizing the 0.4 API weeks ago, we got a beta out with
> it, and it does the job.  The timeline wasn't a surprise to anyone
> paying attention to the list.  It's time to move on.

That feels to me to be a short-sighted point of view. I'd imagine that
it's more important for people to be able to understand the data model
than meeting some kind of arbitrary timeline. I, too, find the current
naming confusing and would love for this to be improved, both for
myself and also to lower the barrier for others to start using
Cassandra.

-- 
Toby DiPasquale

Re: Data model names, reloaded

Posted by mo...@gmail.com.

I agree; even Cassandra 0.3 was very usable.

On Mon, Aug 24, 2009 at 8:21 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> IMO the window for making this kind of change has passed.  We
> talked about finalizing the 0.4 API weeks ago, we got a beta out with
> it, and it does the job.  The timeline wasn't a surprise to anyone
> paying attention to the list.  It's time to move on.
>
> -Jonathan
>
> On Fri, Aug 21, 2009 at 1:36 PM, Evan Weaver<ew...@gmail.com> wrote:
> > I think the below scheme successfully avoids the current
> > misconceptions, and addresses the issues raised in the previous
> > thread.
> >
> > The names are memorable and short, Anglo-Saxon-style, and take
> > advantage of existing database concepts in non-conflicting ways. They
> > are not ambiguous or novel. They descend step-by-step from the
> > container to the thing contained.
> >
> > Proposal 2:
> >
> >  Database
> >  Record set
> >  Record (w/key)
> >  Field set
> >  Field
> >
> > Notes:
> >  * Database is the same as in SQL/CouchDB/MongoDB
> >  * Record set is based on "record", below. It expresses a container of
> > unique rows, without the BigTable baggage (see PS).
> >  * Record is the same as row, without the relational baggage.
> >  * Field set is based on "field", below, and parallels "record set".
> > It expresses a container of unique fields.
> >  * Field is the same as in CouchDB, and does not carry the SQL baggage
> > of "column", or the relational-theory baggage of "attribute".
> >
> > I think if we adopted these, we would quickly move from "most
> > confusing data model" to "least confusing", based on my research into
> > other popular terminology
> > (http://markmail.org/thread/6vys3hk774zcrd6v).
> >
> > Evan
> >
> > ----
> >
> > PS. The implementation of column families hasn't changed from
> > BigTable, but the use in modeling has. Common Cassandra designs are
> > more row-oriented than column-oriented.
> >
> > With that in mind, keyspace, row, and super-column could also each be
> > called column family. They all have sets of related columns in them,
> > among other things. Everything but the column itself is some kind of
> > "column family". This is a big stumbling block.
> >
> > I want a new user to be able to look at any level and answer "what is
> > the immediate container of this object?" If they can't do that, then
> > the term is ambiguous.
> >
> > --
> > Evan Weaver
> >
>
