
Posted to dev@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2015/03/17 22:06:11 UTC

3.0 and the Cassandra release process

Cassandra 2.1 was released in September, which means that if we were on
track with our stated goal of six-month releases, 3.0 would be done about
now.  Instead, we haven't even delivered a beta.  The immediate cause this
time is that we are blocked on 8099
<https://issues.apache.org/jira/browse/CASSANDRA-8099>, but the reality is
that nobody should really be surprised.  Something always comes up -- we've
averaged about nine months between releases since 1.0, with 2.1 taking an
entire year.

We could make theory align with reality by acknowledging, "if nine months
is our 'natural' release schedule, then so be it."  But I think we can do
better.

Broadly speaking, we have two constituencies with Cassandra releases:

First, we have the users who are building or porting an application on
Cassandra.  These users want the newest features to make their job easier.
If 2.1.0 has a few bugs, it's not the end of the world.  They have time to
wait for 2.1.x to stabilize while they write their code.  They would like
to see us deliver on our six-month schedule or even faster.

Second, we have the users who have an application in production.  These
users, or their bosses, want Cassandra to be as stable as possible.
Assuming they deploy on a stable release like 2.0.12, they don't want to
touch it.  They would like to see us release *less* often.  (Because that
means they have to do fewer upgrades while remaining in our backwards
compatibility window.)

With our current "big release every X months" model, these users' needs are
in tension.

We discussed this six months ago, and ended up with this:

> What if we tried a [four month] release cycle, BUT we would guarantee that
> you could do a rolling upgrade until we bump the supermajor version? So 2.0
> could upgrade to 3.0 without having to go through 2.1.  (But to go to 3.1
> or 4.0 you would have to go through 3.0.)
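
As a rough illustration, the rolling-upgrade guarantee quoted above can be read as a simple rule: from any release you may go straight to a later release in the same major line, or to the next major's .0 release, but anything further requires stopping at each intervening .0 release. The Python sketch below encodes that reading. The `(major, minor)` tuple representation and the names `can_upgrade_directly` and `upgrade_path` are hypothetical; this is an interpretation of the quoted rule, not Cassandra's actual upgrade tooling.

```python
# Versions are (major, minor) tuples, e.g. (2, 0) for Cassandra 2.0.
Version = tuple[int, int]


def can_upgrade_directly(src: Version, dst: Version) -> bool:
    """True if a rolling upgrade src -> dst needs no intermediate release."""
    if dst <= src:
        return False
    same_major = dst[0] == src[0]
    next_major_zero = dst == (src[0] + 1, 0)
    return same_major or next_major_zero


def upgrade_path(src: Version, dst: Version) -> list[Version]:
    """Releases to pass through, in order, ending with dst."""
    if dst <= src:
        raise ValueError("target must be newer than source")
    path: list[Version] = []
    current = src
    while not can_upgrade_directly(current, dst):
        # Under this reading, the next major's .0 release is a mandatory stop.
        current = (current[0] + 1, 0)
        path.append(current)
    path.append(dst)
    return path
```

With this reading, `upgrade_path((2, 0), (3, 0))` returns `[(3, 0)]`, while `upgrade_path((2, 0), (3, 1))` returns `[(3, 0), (3, 1)]`, matching "to go to 3.1 or 4.0 you would have to go through 3.0."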

Crucially, I added:

> Whether this is reasonable depends on how fast we can stabilize releases.
> 2.1.0 will be a good test of this.

Unfortunately, even after DataStax hired half a dozen full-time test
engineers, 2.1.0 continued the proud tradition of being unready for
production use, with "wait for .5 before upgrading" once again looking like
a good guideline.

I’m starting to think that the entire model of “write a bunch of new
features all at once and then try to stabilize it for release” is broken.
We’ve been trying that for years and empirically speaking the evidence is
that it just doesn’t work, either from a stability standpoint or even just
shipping on time.

A big reason that it takes us so long to stabilize new releases now is
that, because our major release cycle is so long, it’s super tempting to
slip “just one” new feature into bugfix releases, and I’m as guilty of
that as anyone.

For similar reasons, it’s difficult to do a meaningful freeze with big
feature releases.  A look at 3.0 shows why: we have 8099 coming, but we
also have significant work done (but not finished) on 6230, 7970, 6696, and
6477, all of which are meaningful improvements that address demonstrated
user pain.  So if we keep doing what we’ve been doing, our choices are to
either delay 3.0 further while we finish and stabilize these, or wait
nine months to a year for the next release.  Either way, one of our
constituencies gets disappointed.

So, I’d like to try something different.  I think we were on the right
track with shorter releases with more compatibility.  But I’d like to throw
in a twist.  Intel cuts down on risk with a “tick-tock” schedule for new
architectures and process shrinks instead of trying to do both at once.  We
can do something similar here:

One month releases.  Period.  If it’s not done, it can wait.
*Every other release only accepts bug fixes.*

By itself, one-month releases are going to dramatically reduce the
complexity of testing and debugging new releases -- and bugs that do slip
past us will only affect a smaller percentage of users, avoiding the “big
release has a bunch of bugs no one has seen before and pretty much everyone
is hit by something” scenario.  But by adding in the second rule, I think
we have a real chance to make a quantum leap here: stable, production-ready
releases every two months.

So here is my proposal for 3.0:

We’re just about ready to start serious review of 8099.  When that’s done,
we branch 3.0 and cut a beta and then release candidates.  Whatever isn’t
done by then has to wait; unlike prior betas, we will only accept bug
fixes into 3.0 after branching.

One month after 3.0, we will ship 3.1 (with new features).  At the same
time, we will branch 3.2.  New features in trunk will go into 3.3.  The 3.2
branch will only get bug fixes.  We will maintain backwards compatibility
for all of 3.x; eventually (no less than a year) we will pick a release to
be 4.0, and drop deprecated features and old backwards compatibilities.
Otherwise there will be nothing special about the 4.0 designation.  (Note
that with an “odd releases have new features, even releases only have bug
fixes” policy, 4.0 will actually be *more* stable than 3.11.)
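
The odd/even policy above can be sketched in a few lines of Python. This is a minimal illustration that assumes the rule depends only on the parity of the minor version; `Release` and `monthly_train` are hypothetical names, not part of any Cassandra tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    major: int
    minor: int

    def accepts_new_features(self) -> bool:
        # Odd minor versions ("tick") carry new features;
        # even minor versions ("tock") accept bug fixes only.
        return self.minor % 2 == 1

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


def monthly_train(major: int, first_minor: int,
                  months: int) -> list[tuple[int, str, str]]:
    """(month offset, version, content) for consecutive monthly releases."""
    train = []
    for month in range(months):
        release = Release(major, first_minor + month)
        content = ("new features" if release.accepts_new_features()
                   else "bug fixes only")
        train.append((month, str(release), content))
    return train


for month, version, content in monthly_train(3, 0, 5):
    print(f"month {month}: {version} ({content})")
```

Under this rule 3.0, 3.2, 3.4 and 4.0 come out as bug-fix-only releases while 3.1, 3.3 and 3.11 carry features, so a feature-free release lands every second month, and a 4.0 that follows 3.11 would be a bug-fix release.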

Larger features can continue to be developed in separate branches, the way
8099 is being worked on today, and committed to trunk when ready.  So this
is not saying that we are limited only to features we can build in a single
month.

Some things will have to change with our dev process, for the better.  In
particular, with one month to commit new features, we don’t have room for
committing sloppy work and stabilizing it later.  Trunk has to be stable at
all times.  I asked Ariel Weisberg to put together his thoughts separately
on what worked for his team at VoltDB, and how we can apply that to
Cassandra -- see his email from Friday <http://bit.ly/1MHaOKX>.  (TLDR:
Redefine “done” to include automated tests.  Infrastructure to run tests
against GitHub branches before merging to trunk.  A new test harness for
long-running regression tests.)
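
The redefined notion of "done" amounts to a merge gate in front of trunk. The sketch below is hypothetical: the `Patch` fields and the `is_done` and `merge_to_trunk` helpers are illustrative names, and a real setup would take the test result from a CI run against the feature branch rather than from a boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    ticket: str                 # hypothetical ticket id
    adds_automated_tests: bool  # "done" now includes automated tests
    branch_tests_passed: bool   # outcome of a test run on the feature branch


def is_done(patch: Patch) -> bool:
    """A patch is done only if it ships tests and they pass on its branch."""
    return patch.adds_automated_tests and patch.branch_tests_passed


def merge_to_trunk(patch: Patch, trunk: list[str]) -> None:
    """Record the patch on trunk, refusing anything that is not done."""
    if not is_done(patch):
        raise RuntimeError(f"{patch.ticket} is not done; keeping it off trunk")
    trunk.append(patch.ticket)


trunk: list[str] = []
merge_to_trunk(Patch("EXAMPLE-1", adds_automated_tests=True,
                     branch_tests_passed=True), trunk)
print(trunk)
```

Keeping the check in front of the merge, rather than stabilizing afterwards, is what "trunk has to be stable at all times" asks for.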

I’m optimistic that as we improve our process this way, our even releases
will become increasingly stable.  If so, we can skip sub-minor releases
(3.2.x) entirely, and focus on keeping the release train moving.  In the
meantime, we will continue delivering 2.1.x stability releases.

This won’t be an entirely smooth transition.  In particular, you will have
noticed that 3.1 will get more than a month’s worth of new features while
we stabilize 3.0 as the last of the old way of doing things, so some
patience is in order as we try this out.  By 3.4 and 3.6 later this year we
should have a good idea if this is working, and we can make adjustments as
warranted.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced

Re: 3.0 and the Cassandra release process

Posted by Jacob Rhoden <ja...@me.com>.

Thanks for everyone's hard work and perseverance; Cassandra is truly amazing. It really does make redundancy so much easier, making my life far less stressful (: It is surely this awesomeness that creates the demand for features in the first place, so this is a great problem to have.

Certainly having a product where the user base continually encourages people not to use the current major version is a situation that could be improved.

Doing something to attempt to improve the current process is better than (for example) doing nothing. Modelling a process on another company's proven strategy seems better than making it up as you go.

I suggest that anyone who would -1 this should also include an alternative proposal to change the status quo.

Thanks,
Jacob



______________________________
Sent from iPhone

> On 18 Mar 2015, at 8:06 am, Jonathan Ellis <jb...@gmail.com> wrote:
> [...]

Re: 3.0 and the Cassandra release process

Posted by Gary Dusbabek <gd...@gmail.com>.

+1. This sounds like a step in a better direction.

Gary.

On Tue, Mar 17, 2015 at 4:06 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> [...]

RE: 3.0 and the Cassandra release process

Posted by "Chuck Allen -X (charlall - RANDSTAD NORTH AMERICA LP at Cisco)" <ch...@cisco.com>.

Oh yeah, and BGL4 is now green without any impending risks.

Additionally, the other yellow projects, LWR05 and MTV05, are on a path that will lead to green in the coming weeks.

That's All Folks

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com] 
Sent: Wednesday, April 15, 2015 3:40 AM
To: dev
Subject: Re: 3.0 and the Cassandra release process

[...]

Re: 3.0 and the Cassandra release process

Posted by Jonathan Ellis <jb...@gmail.com>.

Short answer: yes.

Longer answer, pasted from my reply to Jon Haddad elsewhere in the thread:

We are moving away from designating major releases like 3.0 as "special,"
other than as a marker of compatibility.  In fact we are moving away from
major releases entirely, with each release being a much smaller, digestible
unit of change, and the ultimate goal of every even release being
production-quality.

This means that bugs won't pile up and compound each other.  And bugs that
do slip through will affect fewer users.  As 3.x stabilizes, more people
will try out the releases, yielding better quality, yielding even more
people trying them out in a virtuous cycle.

This won't just happen by wishing for it.  I am very serious about
investing the energy we would have spent on backporting fixes to a "stable"
branch into improving our QA process and test coverage.  After a very
short list of in-progress features that may not make the 3.0 cutoff (#6477,
#6696 come to mind) I'm willing to virtually pause new feature development
entirely to make this happen.

On Tue, Apr 14, 2015 at 11:53 PM, Phil Yang <ud...@gmail.com> wrote:

> Hi Jonathan,
>
> How long will tick-tock releases be maintained? Do users have to
> upgrade to a new even release with new features to fix the bugs in an older
> even release?
>
> 2015-04-14 6:28 GMT+08:00 Jonathan Ellis <jb...@gmail.com>:
>
> > [...]
>
> --
> Thanks,
> Phil Yang
>

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced

Re: 3.0 and the Cassandra release process

Posted by Phil Yang <ud...@gmail.com>.

Hi Jonathan,

How long will tick-tock releases be maintained? Do users have to
upgrade to a new even release with new features to fix the bugs in an older
even release?

2015-04-14 6:28 GMT+08:00 Jonathan Ellis <jb...@gmail.com>:

> [...]



-- 
Thanks,
Phil Yang

Re: 3.0 and the Cassandra release process

Posted by Jonathan Ellis <jb...@gmail.com>.

On Tue, Mar 17, 2015 at 4:06 PM, Jonathan Ellis <jb...@gmail.com> wrote:

>
> I’m optimistic that as we improve our process this way, our even releases
> will become increasingly stable.  If so, we can skip sub-minor releases
> (3.2.x) entirely, and focus on keeping the release train moving.  In the
> meantime, we will continue delivering 2.1.x stability releases.
>

The weak point of this plan is the transition from the "big release"
development methodology culminating in 3.0, to the monthly tick-tock
releases.  Since 3.0 needs to go through a beta/release candidate phase,
during which we're going to be serious about not adding new features, that
means that 3.1 will come with multiple months' worth of features, so right
off the bat we're starting from a disadvantage from a stability standpoint.

Recognizing that it will take several months for the tick-tock releases to
stabilize, I would like to ship 3.0.x stability releases concurrently with
3.y tick-tock releases.  This should stabilize 3.0.x faster than tick-tock,
while at the same time hedging our bets such that if we assess tick-tock in
six months and decide it's not delivering on its goals, we're not six
months behind in having a usable set of features that we shipped in 3.0.

So, to summarize:

- New features will *only* go into tick-tock releases.
- Bug fixes will go into tick-tock releases and a 3.0.x branch, which will
be maintained for at least a year.
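Read together with the original proposal's "odd releases have new features, even releases only have bug fixes" rule, this summary works out to a simple routing rule for where a change may land. The following is a minimal sketch under stated assumptions: `Change`, `accepts`, and `branches_for` are hypothetical names for illustration, not Cassandra tooling. It treats every listed 3.y release as still open, and it ignores how long each branch is actually maintained.

```python
from enum import Enum


class Change(Enum):
    """Kind of change being committed (hypothetical classification)."""
    FEATURE = "feature"
    BUGFIX = "bugfix"


def accepts(release: str, change: Change) -> bool:
    """Return True if `change` may land in `release` under the proposed rules.

    Assumptions (illustrative only, not project tooling):
    - "3.0.x" names the long-lived stability branch: bug fixes only.
    - "3.y" names a tick-tock release: odd y accepts new features,
      even y (including 3.0 itself) accepts bug fixes only.
    """
    if release == "3.0.x":
        return change is Change.BUGFIX
    major, minor = (int(part) for part in release.split("."))
    if major != 3:
        raise ValueError(f"not a 3.x tick-tock release: {release!r}")
    if change is Change.BUGFIX:
        return True  # fixes go into every (open) tick-tock release
    return minor % 2 == 1  # features only in odd ("tick") releases


def branches_for(change: Change, releases: list[str]) -> list[str]:
    """Filter `releases` down to those that would accept `change`."""
    return [r for r in releases if accepts(r, change)]


if __name__ == "__main__":
    upcoming = ["3.0.x", "3.1", "3.2", "3.3"]
    print(branches_for(Change.FEATURE, upcoming))  # ['3.1', '3.3']
    print(branches_for(Change.BUGFIX, upcoming))   # ['3.0.x', '3.1', '3.2', '3.3']
```

Under these assumptions a new feature can only target an odd release. A bug fix lands on both the 3.0.x stability line and the tick-tock line, which is the "hedging our bets" property described above.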

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced

Re: 3.0 and the Cassandra release process

Posted by Sylvain Lebresne <sy...@datastax.com>.

+1

On Tue, Mar 17, 2015 at 10:06 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> [...]

Re: 3.0 and the Cassandra release process

Posted by Robert Stupp <sn...@snazy.de>.

+1

I also appreciate Ariel’s effort. The improved CI integration is great: being able to run a large number of tests on different platforms against one's own development branch is a huge improvement.


> On 17.03.2015 at 22:06, Jonathan Ellis <jb...@gmail.com> wrote:
> 
> [...]

—
Robert Stupp
@snazy

Re: 3.0 and the Cassandra release process

Posted by Michael Kjellman <mk...@internalcircle.com>.

❤️ it. +1

-kjellman

> On Mar 17, 2015, at 2:06 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> 
> [...]

Re: 3.0 and the Cassandra release process

Posted by Chris Burroughs <ch...@gmail.com>.

Broadly as a contributor and operator I like the idea of more frequent 
releases off of an always stable master.  First customer ship quality 
all the time [1]!

I'm a little concerned that the specific tick-tock proposal could 
devolve into a 'devodd' style where the 'feature release' becomes a 
thing no one wants to run in production.  However, if master is always 
stable, it doesn't really matter when releases are cut, and if master is
*not* stable, that is a larger problem than the details of the release
cadence.  I say give it a shot.


[1] http://wiki.illumos.org/display/illumos/On+the+Quality+Death+Spiral

Re: 3.0 and the Cassandra release process

Posted by Aleksey Yeschenko <al...@apache.org>.

+1

-- 
AY

On March 17, 2015 at 14:07:03, Jonathan Ellis (jbellis@gmail.com) wrote:

> [...]

Re: 3.0 and the Cassandra release process

Posted by Jason Brown <ja...@gmail.com>.

Hey all,

I had a hallway conversation with some folks here last week, and they
expressed some concerns with this proposal. I will not attempt to summarize
their arguments as I don't believe I could do them ample justice, but I
strongly encouraged those individuals to speak up and be heard on this
thread (I know they are watching!).

Thanks,

-Jason

On Mon, Mar 23, 2015 at 6:32 AM, 曹志富 <ca...@gmail.com> wrote:

> +1
>
> --------------------------------------
> Ranger Tsao
>
> 2015-03-20 22:57 GMT+08:00 Ryan McGuire <ry...@datastax.com>:
>
> > I'm taking notes from the infrastructure doc and wrote down some action
> > items for my team:
> >
> > https://gist.github.com/EnigmaCurry/d53eccb55f5d0986c976
> >
> >
> > --
> >
> > Ryan McGuire
> >
> > Software Engineering Manager in Test | ryan@datastax.com
> >
> > On Thu, Mar 19, 2015 at 1:08 PM, Ariel Weisberg <ariel.weisberg@datastax.com> wrote:
> >
> > > Hi,
> > >
> > > I realized one of the documents we didn't send out was the one on the
> > > infrastructure-side changes I am looking for. This one is maybe a little
> > > rougher, as it was the first one I wrote on the subject.
> > >
> > > https://docs.google.com/document/d/1Seku0vPwChbnH3uYYxon0UO-b6LDtSqluZiH--sWWi0/edit?usp=sharing
> > >
> > > The goal is to have infrastructure that gives developers as close to
> > > immediate feedback as possible on their code before they merge. Feedback
> > > that is delayed until after merging to trunk should come within a day or
> > > two, and there is a product owner (Michael Shuler) responsible for making
> > > sure that issues are addressed quickly.
> > >
> > > QA is going to help by providing developers with better tools for writing
> > > higher-level functional tests that explore all of the functions together,
> > > along with the configuration space, without developers having to do any
> > > work other than plugging in functionality to exercise and then validating
> > > something specific. This kind of harness is hard to get right and to make
> > > reliable and expressive, so they have their work cut out for them.
> > >
> > > It's going to be an iterative process where the tests improve as new work
> > > introduces missing coverage and as bugs/regressions drive the introduction
> > > of new tests. The monthly retrospective (we're planning to hold it on the
> > > first of each month) is also going to help us refine the testing and
> > > development process.
> > >
> > > Ariel
> > >
> > > On Thu, Mar 19, 2015 at 7:23 AM, Jason Brown <ja...@gmail.com>
> > wrote:
> > >
> > > > +1 to this general proposal. I think the time has finally come for us
> > to
> > > > try something new, and this sounds legit. Thanks!
> > > >
> > > > On Thu, Mar 19, 2015 at 12:49 AM, Phil Yang <ud...@gmail.com>
> wrote:
> > > >
> > > > > Can I regard the odd version as the "development preview" and the
> > even
> > > > > version as the "production ready"?
> > > > >
> > > > > IMO, as a database infrastructure project, "stable" is more
> important
> > > > than
> > > > > other kinds of projects. LTS is a good idea, but if we don't
> support
> > > > > non-LTS releases for enough time to fix their bugs, users on
> non-LTS
> > > > > release may have to upgrade a new major release to fix the bugs and
> > may
> > > > > have to handle some new bugs by the new features. I'm afraid that
> > > > > eventually people would only think about the LTS one.
> > > > >
> > > > >
> > > > > 2015-03-19 8:48 GMT+08:00 Pavel Yaskevich <po...@gmail.com>:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > On Wed, Mar 18, 2015 at 3:50 PM, Michael Kjellman <
> > > > > > mkjellman@internalcircle.com> wrote:
> > > > > >
> > > > > > > For most of my life I’ve lived on the software bleeding edge
> both
> > > > > > > personally and professionally. Maybe it’s a personal weakness,
> > but
> > > I
> > > > > > guess
> > > > > > > I get a thrill out of the problem solving aspect?
> > > > > > >
> > > > > > > Recently I came to a bit of an epiphany — the closer I keep to
> > the
> > > > > daily
> > > > > > > build — generally the happier I am on a daily basis. Bugs
> happen,
> > > but
> > > > > for
> > > > > > > the most part (aside from show stopper bugs), pain points for
> > > myself
> > > > > in a
> > > > > > > given daily build can generally can be debugged to 1 or maybe 2
> > > root
> > > > > > > causes, fixed in ~24 hours, and then life is better the next
> day
> > > > again.
> > > > > > In
> > > > > > > comparison, the old waterfall model generally means taking an
> > > > > “official”
> > > > > > > release at some point and waiting for some poor soul (or
> > developer)
> > > > to
> > > > > > > actually run the thing. No matter how good the QA team is,
> until
> > > it’s
> > > > > > > actually used in the real world, most bugs aren’t found.
> > > > > > >
> > > > > > > If you and your organization can wait 24 hours * number of bugs
> > > > > > discovered
> > > > > > > after people actually started using the thing, you end up with
> a
> > > > > “usable
> > > > > > > build” around the holy-grail minor X.X.5 release of Cassandra.
> > > > > > >
> > > > > > > I love the idea of the LTS model Jonathan describes because it
> > > means
> > > > > more
> > > > > > > code can get real testing and “bake” for longer instead of
> > sitting
> > > > > > largely
> > > > > > > unused on some git repository in a datacenter far far away. A
> lot
> > > of
> > > > > code
> > > > > > > has changed between 2.0 and trunk today. The code has diverged
> to
> > > the
> > > > > > point
> > > > > > > that if you write something for 2.0 (as the most stable major
> > > branch
> > > > > > > currently available), merging it forward to 3.0 or after
> > generally
> > > > > means
> > > > > > > rewriting it. If the only thing that comes out of this is a
> > smaller
> > > > > delta
> > > > > > > of LOC between the deployable version/branch and what we can
> > > develop
> > > > > > > against and what QA is focused on I think that’s a massive win.
> > > > > > >
> > > > > > > Something like CASSANDRA-8099 will need 2x the baking time of
> > even
> > > > many
> > > > > > of
> > > > > > > the more risky changes the project has made. While I wouldn’t
> > want
> > > to
> > > > > > run a
> > > > > > > build with CASSANDRA-8099 in it anytime soon, there are now
> > > hundreds
> > > > of
> > > > > > > other changes blocked, most likely many containing new bugs of
> > > their
> > > > > own,
> > > > > > > but have no exposure at all to even the most involved C*
> > > developers.
> > > > > > >
> > > > > > > I really think this will be a huge win for the project and I’m
> > > super
> > > > > > > thankful for Sylvian, Ariel, Jonathan, Aleksey, and Jake for
> > > guiding
> > > > > this
> > > > > > > change to a much more sustainable release model for the entire
> > > > > community.
> > > > > > >
> > > > > > > best,
> > > > > > > kjellman
> > > > > > >
> > > > > > >
> > > > > > > > On Mar 18, 2015, at 3:02 PM, Ariel Weisberg <
> > > > > > ariel.weisberg@datastax.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Keep in mind it is a bug fix release every month and a
> feature
> > > > > release
> > > > > > > every two months.
> > > > > > > >
> > > > > > > > For development that is really a two month cycle with all bug
> > > fixes
> > > > > > > being backported one release. As a developer if you want to get
> > > > > something
> > > > > > > in a release you have two months and you should be sizing
> pieces
> > of
> > > > > large
> > > > > > > tasks so they ship at least every two months.
> > > > > > > >
> > > > > > > > Ariel
> > > > > > > >> On Mar 18, 2015, at 5:58 PM, Terrance Shepherd <
> > > > tscanausa@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > > >>
> > > > > > > >> I like the idea but I agree that every month is a bit
> > > aggressive.
> > > > I
> > > > > > > have no
> > > > > > > >> say but:
> > > > > > > >>
> > > > > > > >> I would say 4 releases a year instead of 12. with 2 months
> of
> > > new
> > > > > > > features
> > > > > > > >> and 1 month of bug squashing per a release. With the 4th
> > quarter
> > > > > just
> > > > > > > bugs.
> > > > > > > >>
> > > > > > > >> I would also proposed 2 year LTS releases for the releases
> > after
> > > > the
> > > > > > 4th
> > > > > > > >> quarter. So everyone could get a new feature release every
> > > quarter
> > > > > and
> > > > > > > the
> > > > > > > >> stability of super major versions for 2 years.
> > > > > > > >>
> > > > > > > >> On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius <
> > > > > > dbrosius@mebigfatguy.com
> > > > > > > >
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >>> It would seem the practical implications of this is that
> > there
> > > > > would
> > > > > > be
> > > > > > > >>> significantly more development on branches, with
> potentially
> > > more
> > > > > > > >>> significant delays on merging these branches. This would
> > imply
> > > to
> > > > > me
> > > > > > > that
> > > > > > > >>> more Jenkins servers would need to be set up to handle
> > > > auto-testing
> > > > > > of
> > > > > > > more
> > > > > > > >>> branches, as if feature work spends more time on external
> > > > branches,
> > > > > > it
> > > > > > > is
> > > > > > > >>> then likely to be be less tested (even if by accident) as
> > less
> > > > > > > developers
> > > > > > > >>> would be working on that branch. Only when a feature was
> > > blessed
> > > > to
> > > > > > > make it
> > > > > > > >>> to the release-tracked branch, would it become exposed to
> the
> > > > > > majority
> > > > > > > of
> > > > > > > >>> developers/testers, etc doing normal
> running/playing/testing.
> > > > > > > >>>
> > > > > > > >>> This isn't to knock the idea in anyway, just wanted to
> > mention
> > > > > what i
> > > > > > > >>> think the outcome would be.
> > > > > > > >>>
> > > > > > > >>> dave
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>>
> > > > > > > >>>>>> On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis <
> > > > > > jbellis@gmail.com>
> > > > > > > >>>>> wrote:
> > > > > > > >>>>>>> Cassandra 2.1 was released in September, which means
> that
> > > if
> > > > we
> > > > > > > were
> > > > > > > >>>>> on
> > > > > > > >>>>>>> track with our stated goal of six month releases, 3.0
> > would
> > > > be
> > > > > > done
> > > > > > > >>>>> about
> > > > > > > >>>>>>> now.  Instead, we haven't even delivered a beta.  The
> > > > immediate
> > > > > > > cause
> > > > > > > >>>>>> this
> > > > > > > >>>>>>> time is blocking for 8099
> > > > > > > >>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-8099
> >,
> > > but
> > > > > the
> > > > > > > >>>>> reality
> > > > > > > >>>>>> is
> > > > > > > >>>>>>> that nobody should really be surprised.  Something
> always
> > > > comes
> > > > > > up
> > > > > > > --
> > > > > > > >>>>>> we've
> > > > > > > >>>>>>> averaged about nine months since 1.0, with 2.1 taking
> an
> > > > entire
> > > > > > > year.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> We could make theory align with reality by
> acknowledging,
> > > "if
> > > > > > nine
> > > > > > > >>>>> months
> > > > > > > >>>>>>> is our 'natural' release schedule, then so be it."
> But I
> > > > think
> > > > > > we
> > > > > > > >>>>> can
> > > > > > > >>>>> do
> > > > > > > >>>>>>> better.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Broadly speaking, we have two constituencies with
> > Cassandra
> > > > > > > releases:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> First, we have the users who are building or porting an
> > > > > > application
> > > > > > > >>>>> on
> > > > > > > >>>>>>> Cassandra.  These users want the newest features to
> make
> > > > their
> > > > > > job
> > > > > > > >>>>>> easier.
> > > > > > > >>>>>>> If 2.1.0 has a few bugs, it's not the end of the world.
> > > They
> > > > > > have
> > > > > > > >>>>> time
> > > > > > > >>>>>> to
> > > > > > > >>>>>>> wait for 2.1.x to stabilize while they write their
> code.
> > > > They
> > > > > > > would
> > > > > > > >>>>> like
> > > > > > > >>>>>>> to see us deliver on our six month schedule or even
> > faster.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Second, we have the users who have an application in
> > > > > production.
> > > > > > > >>>>> These
> > > > > > > >>>>>>> users, or their bosses, want Cassandra to be as stable
> as
> > > > > > possible.
> > > > > > > >>>>>>> Assuming they deploy on a stable release like 2.0.12,
> > they
> > > > > don't
> > > > > > > want
> > > > > > > >>>>> to
> > > > > > > >>>>>>> touch it.  They would like to see us release *less*
> > often.
> > > > > > > (Because
> > > > > > > >>>>> that
> > > > > > > >>>>>>> means they have to do less upgrades while remaining in
> > our
> > > > > > > backwards
> > > > > > > >>>>>>> compatibility window.)
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> With our current "big release every X months" model,
> > these
> > > > > users'
> > > > > > > >>>>> needs
> > > > > > > >>>>>> are
> > > > > > > >>>>>>> in tension.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> We discussed this six months ago, and ended up with
> this:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> What if we tried a [four month] release cycle, BUT we
> > would
> > > > > > > guarantee
> > > > > > > >>>>>> that
> > > > > > > >>>>>>>> you could do a rolling upgrade until we bump the
> > > supermajor
> > > > > > > version?
> > > > > > > >>>>> So
> > > > > > > >>>>>> 2.0
> > > > > > > >>>>>>>> could upgrade to 3.0 without having to go through 2.1.
> > > (But
> > > > > to
> > > > > > go
> > > > > > > >>>>> to
> > > > > > > >>>>>> 3.1
> > > > > > > >>>>>>>> or 4.0 you would have to go through 3.0.)
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Crucially, I added
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Whether this is reasonable depends on how fast we can
> > > > stabilize
> > > > > > > >>>>> releases.
> > > > > > > >>>>>>>> 2.1.0 will be a good test of this.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Unfortunately, even after DataStax hired half a dozen
> > > > full-time
> > > > > > > test
> > > > > > > >>>>>>> engineers, 2.1.0 continued the proud tradition of being
> > > > unready
> > > > > > for
> > > > > > > >>>>>>> production use, with "wait for .5 before upgrading"
> once
> > > > again
> > > > > > > >>>>> looking
> > > > > > > >>>>>> like
> > > > > > > >>>>>>> a good guideline.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> I’m starting to think that the entire model of “write a
> > > bunch
> > > > > of
> > > > > > > new
> > > > > > > >>>>>>> features all at once and then try to stabilize it for
> > > > release”
> > > > > is
> > > > > > > >>>>> broken.
> > > > > > > >>>>>>> We’ve been trying that for years and empirically
> speaking
> > > the
> > > > > > > >>>>> evidence
> > > > > > > >>>>> is
> > > > > > > >>>>>>> that it just doesn’t work, either from a stability
> > > standpoint
> > > > > or
> > > > > > > even
> > > > > > > >>>>>> just
> > > > > > > >>>>>>> shipping on time.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> A big reason that it takes us so long to stabilize new
> > > > releases
> > > > > > now
> > > > > > > >>>>> is
> > > > > > > >>>>>>> that, because our major release cycle is so long, it’s
> > > super
> > > > > > > tempting
> > > > > > > >>>>> to
> > > > > > > >>>>>>> slip in “just one” new feature into bugfix releases,
> and
> > > I’m
> > > > as
> > > > > > > >>>>> guilty
> > > > > > > >>>>> of
> > > > > > > >>>>>>> that as anyone.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> For similar reasons, it’s difficult to do a meaningful
> > > freeze
> > > > > > with
> > > > > > > >>>>> big
> > > > > > > >>>>>>> feature releases.  A look at 3.0 shows why: we have
> 8099
> > > > > coming,
> > > > > > > but
> > > > > > > >>>>> we
> > > > > > > >>>>>>> also have significant work done (but not finished) on
> > 6230,
> > > > > 7970,
> > > > > > > >>>>> 6696,
> > > > > > > >>>>>> and
> > > > > > > >>>>>>> 6477, all of which are meaningful improvements that
> > address
> > > > > > > >>>>> demonstrated
> > > > > > > >>>>>>> user pain.  So if we keep doing what we’ve been doing,
> > our
> > > > > > choices
> > > > > > > >>>>> are
> > > > > > > >>>>> to
> > > > > > > >>>>>>> either delay 3.0 further while we finish and stabilize
> > > these,
> > > > > or
> > > > > > we
> > > > > > > >>>>> wait
> > > > > > > >>>>>>> nine months to a year for the next release.  Either
> way,
> > > one
> > > > of
> > > > > > our
> > > > > > > >>>>>>> constituencies gets disappointed.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> So, I’d like to try something different.  I think we
> were
> > > on
> > > > > the
> > > > > > > >>>>> right
> > > > > > > >>>>>>> track with shorter releases with more compatibility.
> But
> > > I’d
> > > > > > like
> > > > > > > to
> > > > > > > >>>>>> throw
> > > > > > > >>>>>>> in a twist.  Intel cuts down on risk with a “tick-tock”
> > > > > schedule
> > > > > > > for
> > > > > > > >>>>> new
> > > > > > > >>>>>>> architectures and process shrinks instead of trying to
> do
> > > > both
> > > > > at
> > > > > > > >>>>> once.
> > > > > > > >>>>>> We
> > > > > > > >>>>>>> can do something similar here:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> One month releases.  Period.  If it’s not done, it can
> > > wait.
> > > > > > > >>>>>>> *Every other release only accepts bug fixes.*
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> By itself, one-month releases are going to dramatically
> > > > reduce
> > > > > > the
> > > > > > > >>>>>>> complexity of testing and debugging new releases -- and
> > > bugs
> > > > > that
> > > > > > > do
> > > > > > > >>>>> slip
> > > > > > > >>>>>>> past us will only affect a smaller percentage of users,
> > > > > avoiding
> > > > > > > the
> > > > > > > >>>>> “big
> > > > > > > >>>>>>> release has a bunch of bugs no one has seen before and
> > > pretty
> > > > > > much
> > > > > > > >>>>>> everyone
> > > > > > > >>>>>>> is hit by something” scenario.  But by adding in the
> > second
> > > > > > rule, I
> > > > > > > >>>>> think
> > > > > > > >>>>>>> we have a real chance to make a quantum leap here:
> > stable,
> > > > > > > >>>>>> production-ready
> > > > > > > >>>>>>> releases every two months.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> So here is my proposal for 3.0:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> We’re just about ready to start serious review of 8099.
> > > When
> > > > > > > that’s
> > > > > > > >>>>>> done,
> > > > > > > >>>>>>> we branch 3.0 and cut a beta and then release
> candidates.
> > > > > > Whatever
> > > > > > > >>>>> isn’t
> > > > > > > >>>>>>> done by then, has to wait; unlike prior betas, we will
> > only
> > > > > > accept
> > > > > > > >>>>> bug
> > > > > > > >>>>>>> fixes into 3.0 after branching.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> One month after 3.0, we will ship 3.1 (with new
> > features).
> > > > At
> > > > > > the
> > > > > > > >>>>> same
> > > > > > > >>>>>>> time, we will branch 3.2.  New features in trunk will
> go
> > > into
> > > > > > 3.3.
> > > > > > > >>>>> The
> > > > > > > >>>>>> 3.2
> > > > > > > >>>>>>> branch will only get bug fixes.  We will maintain
> > backwards
> > > > > > > >>>>> compatibility
> > > > > > > >>>>>>> for all of 3.x; eventually (no less than a year) we
> will
> > > > pick a
> > > > > > > >>>>> release
> > > > > > > >>>>>> to
> > > > > > > >>>>>>> be 4.0, and drop deprecated features and old backwards
> > > > > > > >>>>> compatibilities.
> > > > > > > >>>>>>> Otherwise there will be nothing special about the 4.0
> > > > > > designation.
> > > > > > > >>>>> (Note
> > > > > > > >>>>>>> that with an “odd releases have new features, even
> > releases
> > > > > only
> > > > > > > have
> > > > > > > >>>>> bug
> > > > > > > >>>>>>> fixes” policy, 4.0 will actually be *more* stable than
> > > 3.11.)
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Larger features can continue to be developed in
> separate
> > > > > > branches,
> > > > > > > >>>>> the
> > > > > > > >>>>>> way
> > > > > > > >>>>>>> 8099 is being worked on today, and committed to trunk
> > when
> > > > > ready.
> > > > > > > So
> > > > > > > >>>>>> this
> > > > > > > >>>>>>> is not saying that we are limited only to features we
> can
> > > > build
> > > > > > in
> > > > > > > a
> > > > > > > >>>>>> single
> > > > > > > >>>>>>> month.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Some things will have to change with our dev process,
> for
> > > the
> > > > > > > better.
> > > > > > > >>>>> In
> > > > > > > >>>>>>> particular, with one month to commit new features, we
> > don’t
> > > > > have
> > > > > > > room
> > > > > > > >>>>> for
> > > > > > > >>>>>>> committing sloppy work and stabilizing it later.  Trunk
> > has
> > > > to
> > > > > be
> > > > > > > >>>>> stable
> > > > > > > >>>>>> at
> > > > > > > >>>>>>> all times.  I asked Ariel Weisberg to put together his
> > > > thoughts
> > > > > > > >>>>>> separately
> > > > > > > >>>>>>> on what worked for his team at VoltDB, and how we can
> > apply
> > > > > that
> > > > > > to
> > > > > > > >>>>>>> Cassandra -- see his email from Friday <
> > > > http://bit.ly/1MHaOKX
> > > > > >.
> > > > > > > >>>>> (TLDR:
> > > > > > > >>>>>>> Redefine “done” to include automated tests.
> > Infrastructure
> > > > to
> > > > > > run
> > > > > > > >>>>> tests
> > > > > > > >>>>>>> against github branches before merging to trunk.  A new
> > > test
> > > > > > > harness
> > > > > > > >>>>> for
> > > > > > > >>>>>>> long-running regression tests.)
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> I’m optimistic that as we improve our process this way,
> > our
> > > > > even
> > > > > > > >>>>> releases
> > > > > > > >>>>>>> will become increasingly stable.  If so, we can skip
> > > > sub-minor
> > > > > > > >>>>> releases
> > > > > > > >>>>>>> (3.2.x) entirely, and focus on keeping the release
> train
> > > > > moving.
> > > > > > > In
> > > > > > > >>>>> the
> > > > > > > >>>>>>> meantime, we will continue delivering 2.1.x stability
> > > > releases.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> This won’t be an entirely smooth transition.  In
> > > particular,
> > > > > you
> > > > > > > will
> > > > > > > >>>>>> have
> > > > > > > >>>>>>> noticed that 3.1 will get more than a month’s worth of
> > new
> > > > > > features
> > > > > > > >>>>> while
> > > > > > > >>>>>>> we stabilize 3.0 as the last of the old way of doing
> > > things,
> > > > so
> > > > > > > some
> > > > > > > >>>>>>> patience is in order as we try this out.  By 3.4 and
> 3.6
> > > > later
> > > > > > this
> > > > > > > >>>>> year
> > > > > > > >>>>>> we
> > > > > > > >>>>>>> should have a good idea if this is working, and we can
> > make
> > > > > > > >>>>> adjustments
> > > > > > > >>>>>> as
> > > > > > > >>>>>>> warranted.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> --
> > > > > > > >>>>>>> Jonathan Ellis
> > > > > > > >>>>>>> Project Chair, Apache Cassandra
> > > > > > > >>>>>>> co-founder, http://www.datastax.com
> > > > > > > >>>>>>> @spyced
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Phil Yang
> > > > >
> > > >
> > >
> >
>

Re: 3.0 and the Cassandra release process

Posted by 曹志富 <ca...@gmail.com>.

+1

--------------------------------------
Ranger Tsao

2015-03-20 22:57 GMT+08:00 Ryan McGuire <ry...@datastax.com>:

> I'm taking notes from the infrastructure doc and wrote down some action
> items for my team:
>
> https://gist.github.com/EnigmaCurry/d53eccb55f5d0986c976
>
>
> --
>
> Ryan McGuire
>
> Software Engineering Manager in Test | ryan@datastax.com
>
>
> On Thu, Mar 19, 2015 at 1:08 PM, Ariel Weisberg <ariel.weisberg@datastax.com> wrote:
>
> > Hi,
> >
> > I realized one of the documents we didn't send out was the infrastructure-side
> > changes I am looking for. This one is maybe a little rougher, as it was the
> > first one I wrote on the subject.
> >
> > https://docs.google.com/document/d/1Seku0vPwChbnH3uYYxon0UO-b6LDtSqluZiH--sWWi0/edit?usp=sharing
> >
> > The goal is to have infrastructure that gives developers as close to
> > immediate feedback as possible on their code before they merge. Feedback
> > that is delayed until after merging to trunk should come within a day or two,
> > and there is a product owner (Michael Shuler) responsible for making sure
> > that issues are addressed quickly.
> >
> > QA is going to help by providing developers with better tools for writing
> > higher-level functional tests that explore all of the functions together,
> > along with the configuration space, without developers having to do any work
> > other than plugging in functionality to exercise and then validating
> > something specific. This kind of harness is hard to get right and to make
> > reliable and expressive, so they have their work cut out for them.
> >
> > It's going to be an iterative process where the tests improve as new work
> > introduces missing coverage and as bugs/regressions drive the introduction
> > of new tests. The monthly retrospective (planned for the first of the
> > month) is also going to help us refine the testing and development
> > process.
> >
> > Ariel
> >
> > On Thu, Mar 19, 2015 at 7:23 AM, Jason Brown <ja...@gmail.com> wrote:
> >
> > > +1 to this general proposal. I think the time has finally come for us to
> > > try something new, and this sounds legit. Thanks!
> > >
> > > On Thu, Mar 19, 2015 at 12:49 AM, Phil Yang <ud...@gmail.com> wrote:
> > >
> > > > Can I regard the odd version as the "development preview" and the even
> > > > version as the "production ready"?
> > > >
> > > > IMO, for a database infrastructure project, being "stable" matters more
> > > > than it does for other kinds of projects. LTS is a good idea, but if we
> > > > don't support non-LTS releases long enough to fix their bugs, users on a
> > > > non-LTS release may have to upgrade to a new major release to get bug
> > > > fixes, and may then have to deal with new bugs introduced by the new
> > > > features. I'm afraid that eventually people would only pay attention to
> > > > the LTS releases.
> > > >
> > > >
> > > > 2015-03-19 8:48 GMT+08:00 Pavel Yaskevich <po...@gmail.com>:
> > > >
> > > > > +1
> > > > >
> > > > > On Wed, Mar 18, 2015 at 3:50 PM, Michael Kjellman <mkjellman@internalcircle.com> wrote:
> > > > >
> > > > > > For most of my life I’ve lived on the software bleeding edge both
> > > > > > personally and professionally. Maybe it’s a personal weakness, but I
> > > > > > guess I get a thrill out of the problem-solving aspect?
> > > > > >
> > > > > > Recently I came to a bit of an epiphany: the closer I keep to the daily
> > > > > > build, the happier I generally am on a daily basis. Bugs happen, but for
> > > > > > the most part (aside from show-stopper bugs), my pain points in a given
> > > > > > daily build can generally be debugged to 1 or maybe 2 root causes, fixed
> > > > > > in ~24 hours, and then life is better again the next day. In comparison,
> > > > > > the old waterfall model generally means taking an “official” release at
> > > > > > some point and waiting for some poor soul (or developer) to actually run
> > > > > > the thing. No matter how good the QA team is, most bugs aren’t found
> > > > > > until the code is actually used in the real world.
> > > > > >
> > > > > > If you and your organization can wait 24 hours * number of bugs
> > > > > > discovered after people actually start using the thing, you end up with a
> > > > > > “usable build” around the holy-grail minor X.X.5 release of Cassandra.
> > > > > >
> > > > > > I love the idea of the LTS model Jonathan describes because it means more
> > > > > > code can get real testing and “bake” for longer instead of sitting largely
> > > > > > unused on some git repository in a datacenter far, far away. A lot of code
> > > > > > has changed between 2.0 and trunk today. The code has diverged to the point
> > > > > > that if you write something for 2.0 (as the most stable major branch
> > > > > > currently available), merging it forward to 3.0 or later generally means
> > > > > > rewriting it. If the only thing that comes out of this is a smaller delta
> > > > > > of LOC between the deployable version/branch, what we can develop against,
> > > > > > and what QA is focused on, I think that’s a massive win.
> > > > > >
> > > > > > Something like CASSANDRA-8099 will need 2x the baking time of even many
> > > > > > of the riskier changes the project has made. While I wouldn’t want to run
> > > > > > a build with CASSANDRA-8099 in it anytime soon, there are now hundreds of
> > > > > > other changes blocked, many most likely containing new bugs of their own,
> > > > > > that have no exposure at all, even to the most involved C* developers.
> > > > > >
> > > > > > I really think this will be a huge win for the project, and I’m super
> > > > > > thankful to Sylvain, Ariel, Jonathan, Aleksey, and Jake for guiding this
> > > > > > change to a much more sustainable release model for the entire community.
> > > > > >
> > > > > > best,
> > > > > > kjellman
> > > > > >
> > > > > >
> > > > > > > On Mar 18, 2015, at 3:02 PM, Ariel Weisberg <ariel.weisberg@datastax.com> wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Keep in mind it is a bug fix release every month and a feature release
> > > > > > > > every two months.
> > > > > > > >
> > > > > > > > For development that is really a two-month cycle, with all bug fixes
> > > > > > > > being backported one release. As a developer, if you want to get
> > > > > > > > something into a release you have two months, and you should be sizing
> > > > > > > > pieces of large tasks so they ship at least every two months.
> > > > > > >
> > > > > > > Ariel
> > > > > > > >> On Mar 18, 2015, at 5:58 PM, Terrance Shepherd <tscanausa@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> I like the idea, but I agree that every month is a bit aggressive. I have
> > > > > > > >> no say, but:
> > > > > > > >>
> > > > > > > >> I would say 4 releases a year instead of 12, with 2 months of new features
> > > > > > > >> and 1 month of bug squashing per release, and with the 4th quarter just
> > > > > > > >> bugs.
> > > > > > > >>
> > > > > > > >> I would also propose 2-year LTS releases for the releases after the 4th
> > > > > > > >> quarter. That way everyone could get a new feature release every quarter
> > > > > > > >> and the stability of super major versions for 2 years.
> > > > > > >>
> > > > > > > >> On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius <dbrosius@mebigfatguy.com> wrote:
> > > > > > > >>
> > > > > > > >>> It would seem the practical implication of this is that there would be
> > > > > > > >>> significantly more development on branches, with potentially more
> > > > > > > >>> significant delays in merging these branches. This would imply to me that
> > > > > > > >>> more Jenkins servers would need to be set up to handle auto-testing of
> > > > > > > >>> more branches, since if feature work spends more time on external
> > > > > > > >>> branches, it is then likely to be less tested (even if by accident), as
> > > > > > > >>> fewer developers would be working on that branch. Only when a feature was
> > > > > > > >>> blessed to make it to the release-tracked branch would it become exposed
> > > > > > > >>> to the majority of developers/testers, etc., doing normal
> > > > > > > >>> running/playing/testing.
> > > > > > > >>>
> > > > > > > >>> This isn't to knock the idea in any way; I just wanted to mention what I
> > > > > > > >>> think the outcome would be.
> > > > > > >>>
> > > > > > >>> dave
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Phil Yang
> > > >
> > >
> >
>

Re: 3.0 and the Cassandra release process

Posted by Ryan McGuire <ry...@datastax.com>.

I'm taking notes from the infrastructure doc and wrote down some action
items for my team:

https://gist.github.com/EnigmaCurry/d53eccb55f5d0986c976


--

Ryan McGuire

Software Engineering Manager in Test | ryan@datastax.com



On Thu, Mar 19, 2015 at 1:08 PM, Ariel Weisberg <ariel.weisberg@datastax.com> wrote:

> Hi,
>
> I realized one of the documents we didn't send out was the infrastructure
> side changes I am looking for. This one is maybe a little rougher as it was
> the first one I wrote on the subject.
>
>
> https://docs.google.com/document/d/1Seku0vPwChbnH3uYYxon0UO-b6LDtSqluZiH--sWWi0/edit?usp=sharing
>
> The goal is to have infrastructure that gives developers as close to
> immediate feedback as possible on their code before they merge. Feedback
> that is delayed until after merging to trunk should come within a day or two, and
> there is a product owner (Michael Shuler) responsible for making sure that
> issues are addressed quickly.
>
> QA is going to help by providing developers with better tools for writing
> higher-level functional tests that explore all of the functions together
> along with the configuration space, without developers having to do any work
> other than plugging in functionality to exercise and then validate
> something specific. This kind of harness is hard to get right and make
> reliable and expressive so they have their work cut out for them.
>
> It's going to be an iterative process where the tests improve as new work
> introduces missing coverage and as bugs/regressions drive the introduction
> of new tests. The monthly retrospective (planning on doing that first of
> the month) is also going to help us refine the testing and development
> process.
>
> Ariel

Re: 3.0 and the Cassandra release process

Posted by Ariel Weisberg <ar...@datastax.com>.

Hi,

I realized one of the documents we didn't send out was the infrastructure
side changes I am looking for. This one is maybe a little rougher as it was
the first one I wrote on the subject.

https://docs.google.com/document/d/1Seku0vPwChbnH3uYYxon0UO-b6LDtSqluZiH--sWWi0/edit?usp=sharing

The goal is to have infrastructure that gives developers as close to
immediate feedback as possible on their code before they merge. Feedback
that is delayed until after merging to trunk should come within a day or two, and
there is a product owner (Michael Shuler) responsible for making sure that
issues are addressed quickly.

QA is going to help by providing developers with better tools for writing
higher-level functional tests that explore all of the functions together,
along with the configuration space, without developers having to do any work
other than plugging in functionality to exercise and then validating
something specific. This kind of harness is hard to get right and to make
reliable and expressive, so they have their work cut out for them.

It's going to be an iterative process in which the tests improve as new work
adds missing coverage and as bugs/regressions drive the introduction of new
tests. The monthly retrospective (planned for the first of the month) is
also going to help us refine the testing and development process.

Ariel

On Thu, Mar 19, 2015 at 7:23 AM, Jason Brown <ja...@gmail.com> wrote:

> +1 to this general proposal. I think the time has finally come for us to
> try something new, and this sounds legit. Thanks!

Re: 3.0 and the Cassandra release process

Posted by Jason Brown <ja...@gmail.com>.

+1 to this general proposal. I think the time has finally come for us to
try something new, and this sounds legit. Thanks!

On Thu, Mar 19, 2015 at 12:49 AM, Phil Yang <ud...@gmail.com> wrote:

> Can I regard the odd version as the "development preview" and the even
> version as the "production ready"?
>
> IMO, for a database infrastructure project, stability is more important
> than for other kinds of projects. LTS is a good idea, but if we don't
> support non-LTS releases long enough to fix their bugs, users on a non-LTS
> release may have to upgrade to a new major release to get bug fixes, and
> may then have to handle new bugs introduced by the new features. I'm afraid
> that eventually people would only think about the LTS one.
>
>
> 2015-03-19 8:48 GMT+08:00 Pavel Yaskevich <po...@gmail.com>:
>
> > +1
> >
> > On Wed, Mar 18, 2015 at 3:50 PM, Michael Kjellman <
> > mkjellman@internalcircle.com> wrote:
> >
> > > For most of my life I’ve lived on the software bleeding edge, both
> > > personally and professionally. Maybe it’s a personal weakness, but I
> > > guess I get a thrill out of the problem-solving aspect?
> > >
> > > Recently I came to a bit of an epiphany: the closer I keep to the daily
> > > build, generally the happier I am on a daily basis. Bugs happen, but for
> > > the most part (aside from show-stopper bugs), pain points for myself in
> > > a given daily build can generally be debugged to 1 or maybe 2 root
> > > causes, fixed in ~24 hours, and then life is better the next day again.
> > > In comparison, the old waterfall model generally means taking an
> > > “official” release at some point and waiting for some poor soul (or
> > > developer) to actually run the thing. No matter how good the QA team is,
> > > until it’s actually used in the real world, most bugs aren’t found.
> > >
> > > If you and your organization can wait 24 hours * number of bugs
> > > discovered after people actually started using the thing, you end up
> > > with a “usable build” around the holy-grail minor X.X.5 release of
> > > Cassandra.
> > >
> > > I love the idea of the LTS model Jonathan describes because it means
> > > more code can get real testing and “bake” for longer instead of sitting
> > > largely unused on some git repository in a datacenter far far away. A
> > > lot of code has changed between 2.0 and trunk today. The code has
> > > diverged to the point that if you write something for 2.0 (as the most
> > > stable major branch currently available), merging it forward to 3.0 or
> > > later generally means rewriting it. If the only thing that comes out of
> > > this is a smaller delta of LOC between the deployable version/branch,
> > > what we can develop against, and what QA is focused on, I think that’s a
> > > massive win.
> > >
> > > Something like CASSANDRA-8099 will need 2x the baking time of even many
> > > of the more risky changes the project has made. While I wouldn’t want to
> > > run a build with CASSANDRA-8099 in it anytime soon, there are now
> > > hundreds of other changes blocked, most likely many containing new bugs
> > > of their own, that have no exposure at all to even the most involved C*
> > > developers.
> > >
> > > I really think this will be a huge win for the project and I’m super
> > > thankful to Sylvain, Ariel, Jonathan, Aleksey, and Jake for guiding this
> > > change to a much more sustainable release model for the entire
> > > community.
> > >
> > > best,
> > > kjellman
> > >
> > >
> > > > On Mar 18, 2015, at 3:02 PM, Ariel Weisberg <
> > > > ariel.weisberg@datastax.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Keep in mind it is a bug fix release every month and a feature
> > > > release every two months.
> > > >
> > > > For development that is really a two-month cycle, with all bug fixes
> > > > being backported one release. As a developer, if you want to get
> > > > something into a release you have two months, and you should be
> > > > sizing pieces of large tasks so they ship at least every two months.
> > > >
> > > > Ariel
> > > >> On Mar 18, 2015, at 5:58 PM, Terrance Shepherd <tscanausa@gmail.com>
> > > >> wrote:
> > > >>
> > > >> I like the idea but I agree that every month is a bit aggressive. I
> > > >> have no say, but:
> > > >>
> > > >> I would say 4 releases a year instead of 12, with 2 months of new
> > > >> features and 1 month of bug squashing per release, and the 4th
> > > >> quarter just bugs.
> > > >>
> > > >> I would also propose 2-year LTS releases for the releases after the
> > > >> 4th quarter. So everyone could get a new feature release every
> > > >> quarter and the stability of super major versions for 2 years.
> > > >>
> > > >> On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius <
> > > >> dbrosius@mebigfatguy.com> wrote:
> > > >>
> > > >>> It would seem the practical implication of this is that there would
> > > >>> be significantly more development on branches, with potentially more
> > > >>> significant delays in merging these branches. This would imply to me
> > > >>> that more Jenkins servers would need to be set up to handle
> > > >>> auto-testing of more branches, because if feature work spends more
> > > >>> time on external branches, it is then likely to be less tested (even
> > > >>> if by accident), as fewer developers would be working on that
> > > >>> branch. Only when a feature was blessed to make it to the
> > > >>> release-tracked branch would it become exposed to the majority of
> > > >>> developers/testers, etc. doing normal running/playing/testing.
> > > >>>
> > > >>> This isn't to knock the idea in any way, just wanted to mention what
> > > >>> I think the outcome would be.
> > > >>>
> > > >>> dave
> --
> Thanks,
> Phil Yang
>
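The “odd releases have new features, even releases only have bug fixes” rule in the proposal quoted above can be sketched as a small classification function. This is a minimal sketch, assuming plain MAJOR.MINOR version strings and the one-release-per-month cadence the proposal describes; `release_kind` and `monthly_schedule` are hypothetical names for illustration, not part of any Cassandra tooling.

```python
# Hypothetical sketch of the proposed "tick-tock" release rule; these helpers
# are illustrative only and are not part of any Cassandra tooling.


def release_kind(version: str) -> str:
    """Classify a MAJOR.MINOR release under the odd/even policy.

    Odd minor versions (3.1, 3.3, ...) may carry new features; even minor
    versions (3.0, 3.2, ..., 4.0) accept bug fixes only.
    """
    _major, minor = (int(part) for part in version.split(".")[:2])
    return "features" if minor % 2 == 1 else "bug fixes only"


def monthly_schedule(major: int, count: int) -> list[tuple[str, str]]:
    """Return the first `count` releases of a series, assuming one per month."""
    versions = [f"{major}.{minor}" for minor in range(count)]
    return [(version, release_kind(version)) for version in versions]


if __name__ == "__main__":
    for month, (version, kind) in enumerate(monthly_schedule(3, 5)):
        print(f"month {month}: {version} -> {kind}")
```

Under this rule 4.0 lands on the bug-fix side, which is why the proposal expects it to be more stable than 3.11 (an odd, feature-carrying release).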

Re: 3.0 and the Cassandra release process

Posted by Phil Yang <ud...@gmail.com>.

Can I regard the odd versions as "development previews" and the even
versions as "production ready"?

IMO, for a database infrastructure project, stability matters more than it
does for other kinds of projects. LTS is a good idea, but if we don't support
non-LTS releases long enough to fix their bugs, users on a non-LTS release
may have to upgrade to a new major release to get the fixes, and may then
have to deal with new bugs introduced by its new features. I'm afraid that
eventually people would only consider the LTS releases.


2015-03-19 8:48 GMT+08:00 Pavel Yaskevich <po...@gmail.com>:

> +1



-- 
Thanks,
Phil Yang

Re: 3.0 and the Cassandra release process

Posted by Pavel Yaskevich <po...@gmail.com>.

+1

On Wed, Mar 18, 2015 at 3:50 PM, Michael Kjellman <
mkjellman@internalcircle.com> wrote:

> For most of my life I’ve lived on the software bleeding edge both
> personally and professionally. Maybe it’s a personal weakness, but I guess
> I get a thrill out of the problem solving aspect?
>
> Recently I came to a bit of an epiphany — the closer I keep to the daily
> build — generally the happier I am on a daily basis. Bugs happen, but for
> the most part (aside from showstopper bugs), pain points for myself in a
> given daily build can generally be debugged to 1 or maybe 2 root
> causes, fixed in ~24 hours, and then life is better the next day again. In
> comparison, the old waterfall model generally means taking an “official”
> release at some point and waiting for some poor soul (or developer) to
> actually run the thing. No matter how good the QA team is, until it’s
> actually used in the real world, most bugs aren’t found.
>
> If you and your organization can wait 24 hours * number of bugs discovered
> after people actually started using the thing, you end up with a “usable
> build” around the holy-grail minor X.X.5 release of Cassandra.
>
> I love the idea of the LTS model Jonathan describes because it means more
> code can get real testing and “bake” for longer instead of sitting largely
> unused on some git repository in a datacenter far far away. A lot of code
> has changed between 2.0 and trunk today. The code has diverged to the point
> that if you write something for 2.0 (as the most stable major branch
> currently available), merging it forward to 3.0 or after generally means
> rewriting it. If the only thing that comes out of this is a smaller delta
> of LOC between the deployable version/branch and what we can develop
> against and what QA is focused on, I think that’s a massive win.
>
> Something like CASSANDRA-8099 will need 2x the baking time of even many of
> the riskier changes the project has made. While I wouldn’t want to run a
> build with CASSANDRA-8099 in it anytime soon, there are now hundreds of
> other blocked changes, most likely many containing new bugs of their own,
> that have no exposure at all to even the most involved C* developers.
>
> I really think this will be a huge win for the project and I’m super
> thankful to Sylvain, Ariel, Jonathan, Aleksey, and Jake for guiding this
> change to a much more sustainable release model for the entire community.
>
> best,
> kjellman

Re: 3.0 and the Cassandra release process

Posted by Michael Kjellman <mk...@internalcircle.com>.

For most of my life I’ve lived on the software bleeding edge both personally and professionally. Maybe it’s a personal weakness, but I guess I get a thrill out of the problem solving aspect?

Recently I came to a bit of an epiphany — the closer I keep to the daily build — generally the happier I am on a daily basis. Bugs happen, but for the most part (aside from showstopper bugs), pain points for myself in a given daily build can generally be debugged to 1 or maybe 2 root causes, fixed in ~24 hours, and then life is better the next day again. In comparison, the old waterfall model generally means taking an “official” release at some point and waiting for some poor soul (or developer) to actually run the thing. No matter how good the QA team is, until it’s actually used in the real world, most bugs aren’t found.

If you and your organization can wait 24 hours * number of bugs discovered after people actually started using the thing, you end up with a “usable build” around the holy-grail minor X.X.5 release of Cassandra.

I love the idea of the LTS model Jonathan describes because it means more code can get real testing and “bake” for longer instead of sitting largely unused on some git repository in a datacenter far far away. A lot of code has changed between 2.0 and trunk today. The code has diverged to the point that if you write something for 2.0 (as the most stable major branch currently available), merging it forward to 3.0 or after generally means rewriting it. If the only thing that comes out of this is a smaller delta of LOC between the deployable version/branch and what we can develop against and what QA is focused on, I think that’s a massive win.

Something like CASSANDRA-8099 will need 2x the baking time of even many of the riskier changes the project has made. While I wouldn’t want to run a build with CASSANDRA-8099 in it anytime soon, there are now hundreds of other blocked changes, most likely many containing new bugs of their own, that have no exposure at all to even the most involved C* developers.

I really think this will be a huge win for the project, and I’m super thankful to Sylvain, Ariel, Jonathan, Aleksey, and Jake for guiding this change to a much more sustainable release model for the entire community.

best,
kjellman

 
> On Mar 18, 2015, at 3:02 PM, Ariel Weisberg <ar...@datastax.com> wrote:
> 
> Hi,
> 
> Keep in mind it is a bug fix release every month and a feature release every two months.
> 
> For development that is really a two month cycle with all bug fixes being backported one release. As a developer if you want to get something in a release you have two months and you should be sizing pieces of large tasks so they ship at least every two months.
> 
> Ariel

Re: 3.0 and the Cassandra release process

Posted by Ariel Weisberg <ar...@datastax.com>.

Hi,

Keep in mind it is a bug fix release every month and a feature release every two months.

For development that is really a two-month cycle, with all bug fixes being backported one release. As a developer, if you want to get something into a release, you have two months, and you should size pieces of large tasks so they ship at least every two months.
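To make that cadence concrete, here is a minimal Python sketch of the release train, assuming exactly one release per month after 3.0 (as in Jonathan's original proposal), with every release taking bug fixes and only odd-numbered releases taking new features. The `Release` type and the function names are hypothetical illustrations, not project tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: str            # e.g. "3.1"
    month: int              # months after 3.0 ships (assumes exactly one release per month)
    accepts_features: bool  # False means the release takes bug fixes only


def release_train(months: int) -> list[Release]:
    """Lay out the first `months` monthly releases after 3.0."""
    return [
        Release(
            version=f"3.{minor}",
            month=minor,
            # Odd minors carry new features plus fixes; even minors take bug fixes only.
            accepts_features=(minor % 2 == 1),
        )
        for minor in range(1, months + 1)
    ]


def feature_window(train: list[Release]) -> int:
    """Months between consecutive feature releases, i.e. the effective dev cycle."""
    feature_months = [r.month for r in train if r.accepts_features]
    gaps = {b - a for a, b in zip(feature_months, feature_months[1:])}
    if len(gaps) != 1:
        raise ValueError("feature releases are not evenly spaced")
    return gaps.pop()


if __name__ == "__main__":
    train = release_train(6)
    for r in train:
        kind = "features + fixes" if r.accepts_features else "bug fixes only"
        print(f"month {r.month}: {r.version} ({kind})")
    print("feature window:", feature_window(train), "months")
```

Laid out over six months, this alternates 3.1 (features) through 3.6 (fixes only) and reports a feature window of 2 months: a fix can ship in any monthly release, while a feature has to be ready for the next odd one.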

Ariel
> On Mar 18, 2015, at 5:58 PM, Terrance Shepherd <ts...@gmail.com> wrote:
> 
> I like the idea, but I agree that every month is a bit aggressive. I have no
> say, but:
>
> I would say 4 releases a year instead of 12, with 2 months of new features
> and 1 month of bug squashing per release, and with the 4th quarter just bugs.
>
> I would also propose 2-year LTS releases for the releases after the 4th
> quarter, so everyone could get a new feature release every quarter and the
> stability of supermajor versions for 2 years.

Re: 3.0 and the Cassandra release process

Posted by Terrance Shepherd <ts...@gmail.com>.

I like the idea, but I agree that every month is a bit aggressive. I have no
say, but:

I would say 4 releases a year instead of 12, with 2 months of new features
and 1 month of bug squashing per release, and with the 4th quarter just bugs.

I would also propose 2-year LTS releases for the releases after the 4th
quarter, so everyone could get a new feature release every quarter and the
stability of supermajor versions for 2 years.
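A rough Python sketch of that quarterly calendar, purely to illustrate the proposal: it assumes calendar-aligned quarters, one release at the end of each quarter, and reads "LTS releases for the releases after the 4th quarter" as the release that closes Q4 getting a 2-year support window. All names here are hypothetical.

```python
LTS_YEARS = 2  # support window suggested for releases that follow a bugs-only Q4


def month_phase(month: int) -> str:
    """Return "features" or "bugs" for calendar month 1-12."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    quarter = (month - 1) // 3           # 0..3
    month_in_quarter = (month - 1) % 3   # 0..2
    if quarter == 3:
        return "bugs"                    # Q4 is bug fixes only
    # Q1-Q3: two months of features, then one month of bug squashing.
    return "features" if month_in_quarter < 2 else "bugs"


def quarterly_releases(year: int) -> list[dict]:
    """One release per quarter; the one closing Q4 is flagged as an LTS."""
    releases = []
    for q in range(1, 5):
        is_lts = q == 4
        releases.append({
            "name": f"{year}-Q{q}",
            "lts": is_lts,
            # Year in which LTS support would end (None for non-LTS releases).
            "supported_until": year + LTS_YEARS if is_lts else None,
        })
    return releases


if __name__ == "__main__":
    print([month_phase(m) for m in range(1, 13)])
    for rel in quarterly_releases(2015):
        print(rel)
```

Under these assumptions, half of the year's twelve months go to new features and half to bug fixing, which is a noticeably more conservative split than the monthly alternation.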

On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius <db...@mebigfatguy.com>
wrote:

> It would seem the practical implication of this is that there would be
> significantly more development on branches, with potentially more
> significant delays in merging these branches. This would imply to me that
> more Jenkins servers would need to be set up to handle auto-testing of more
> branches, since if feature work spends more time on external branches, it
> is then likely to be less tested (even if by accident), as fewer developers
> would be working on that branch. Only when a feature was blessed to make it
> to the release-tracked branch would it become exposed to the majority of
> developers/testers, etc. doing normal running/playing/testing.
>
> This isn't to knock the idea in any way; I just wanted to mention what I
> think the outcome would be.
>
> dave

Re: 3.0 and the Cassandra release process

Posted by Ariel Weisberg <ar...@datastax.com>.

Hi,

Long-lived feature branches are already a thing and, IMO, orthogonal to release frequency. The goal is that developers will implement larger features as smaller tested components that have already shipped. Sometimes this means working in a less destructive fashion so you can always ship a working implementation of everything (which is a mixed bag).

Developers should be able to put their work on trunk faster because they will know before the merge what the impact of their changes will be. That is why we are emphasizing having Jenkins run on all commits (trunk and branch). We want the testing performed on branches to be as close as possible to the testing performed on trunk. Once something is merged to trunk, we want it to be about as tested as it is going to get within a day or two.
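One way to picture "branch testing as close as possible to trunk testing" is a merge gate. The Python sketch below is hypothetical, not the project's actual Jenkins setup: it assumes CI records which branch commit was tested, which trunk commit it was tested on top of, and which suites ran, and it only allows a merge when the exact branch head passed the full trunk suite against the current trunk head. The suite names are illustrative labels.

```python
from dataclasses import dataclass

# Hypothetical suite labels; "long" stands in for a long-running regression harness.
TRUNK_SUITES = frozenset({"unit", "dtest", "long"})


@dataclass(frozen=True)
class CIRun:
    commit: str        # branch head that was tested
    base: str          # trunk commit the branch was tested on top of
    suites: frozenset  # suites that actually ran in this CI run
    passed: bool


def may_merge(branch_head: str, trunk_head: str, runs: list[CIRun]) -> bool:
    """Allow a merge only if the exact branch head passed the full trunk suite
    on top of the current trunk head, so branch testing matches trunk testing."""
    return any(
        run.commit == branch_head
        and run.base == trunk_head
        and run.passed
        and TRUNK_SUITES <= run.suites
        for run in runs
    )
```

With this rule, a green run goes stale as soon as trunk moves: the branch has to be retested on the new trunk head before it can merge, which is what keeps trunk releasable at the cost of extra CI capacity.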

Part of releasing more frequently is getting away from relying on developers/testers running things and moving towards automated testing that exercises the database the same way users do, with the same expectations of correctness. We also have to address the process issues that cause the tests we already have to show, on a regular basis, that trunk is not releasable.

Ariel

> On Mar 18, 2015, at 5:34 PM, Dave Brosius <db...@mebigfatguy.com> wrote:
> 
> It would seem the practical implications of this is that there would be significantly more development on branches, with potentially more significant delays on merging these branches. This would imply to me that more Jenkins servers would need to be set up to handle auto-testing of more branches, as if feature work spends more time on external branches, it is then likely to be be less tested (even if by accident) as less developers would be working on that branch. Only when a feature was blessed to make it to the release-tracked branch, would it become exposed to the majority of developers/testers, etc doing normal running/playing/testing.
> 
> This isn't to knock the idea in any way; I just wanted to mention what I think the outcome would be.
> 
> dave
> 
> 

Re: 3.0 and the Cassandra release process

Posted by Dave Brosius <db...@mebigfatguy.com>.

It would seem the practical implication of this is that there would be
significantly more development on branches, with potentially more
significant delays in merging these branches. This would imply to me
that more Jenkins servers would need to be set up to handle auto-testing
of more branches: if feature work spends more time on external
branches, it is likely to be less tested (even if by accident), as
fewer developers would be working on that branch. Only when a feature
was blessed to make it to the release-tracked branch would it become
exposed to the majority of developers/testers, etc. doing normal
running/playing/testing.

This isn't to knock the idea in any way; I just wanted to mention what I
think the outcome would be.

dave



Re: 3.0 and the Cassandra release process

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

If every other release is a bug fix release, would the versioning go:

3.1.0 <-- feature release
3.1.1 <-- bug fix release

Eventually, it seems like it might be possible to push out bug fix
releases more frequently than once a month?
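Under the original proposal (one-month releases, with 3.0 taking only bug fixes after branching, then odd 3.x releases carrying new features and even 3.x releases accepting only bug fixes), the release type is encoded in the minor number rather than in a patch number. Below is a minimal sketch of that rule, assuming that any sub-minor release such as 3.2.1 is fix-only. The names `release_kind` and `release_train` are hypothetical and not part of any Cassandra tooling.

```python
# Hypothetical helpers sketching the proposed 3.x numbering;
# these are not part of any Cassandra tooling.

def release_kind(version: str) -> str:
    """Classify a 3.x version string under the proposed scheme."""
    major, minor, *patch = (int(part) for part in version.split("."))
    if major != 3:
        raise ValueError("this sketch only covers the 3.x series")
    if patch and patch[0] > 0:
        # Assumption: sub-minor releases such as 3.2.1 only carry fixes.
        return "bugfix"
    if minor == 0:
        # 3.0 is the last release stabilized the old way: branch, beta,
        # release candidates, and only bug fixes after branching.
        return "major"
    # Odd minors carry new features; even minors accept only bug fixes.
    return "feature" if minor % 2 == 1 else "bugfix"


def release_train(count: int) -> list[tuple[str, str]]:
    """List the first `count` monthly releases after 3.0 with their kind."""
    return [(f"3.{n}", release_kind(f"3.{n}")) for n in range(1, count + 1)]


for version, kind in release_train(4):
    print(version, kind)
```

In this sketch the feature/fix split lives in the minor number. The 3.1.0/3.1.1 alternative asked about above would move the same split into the patch number instead, which the sketch does not model.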

On Wed, Mar 18, 2015 at 7:59 AM Josh McKenzie <jo...@datastax.com>
wrote:

> +1
>

Re: 3.0 and the Cassandra release process

Posted by Josh McKenzie <jo...@datastax.com>.

+1

On Wed, Mar 18, 2015 at 7:54 AM, Jake Luciani <ja...@gmail.com> wrote:

> +1
>
> On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> > Cassandra 2.1 was released in September, which means that if we were on
> > track with our stated goal of six month releases, 3.0 would be done about
> > now.  Instead, we haven't even delivered a beta.  The immediate cause
> this
> > time is blocking for 8099
> > <https://issues.apache.org/jira/browse/CASSANDRA-8099>, but the reality
> is
> > that nobody should really be surprised.  Something always comes up --
> we've
> > averaged about nine months since 1.0, with 2.1 taking an entire year.
> >
> > We could make theory align with reality by acknowledging, "if nine months
> > is our 'natural' release schedule, then so be it."  But I think we can do
> > better.
> >
> > Broadly speaking, we have two constituencies with Cassandra releases:
> >
> > First, we have the users who are building or porting an application on
> > Cassandra.  These users want the newest features to make their job
> easier.
> > If 2.1.0 has a few bugs, it's not the end of the world.  They have time
> to
> > wait for 2.1.x to stabilize while they write their code.  They would like
> > to see us deliver on our six month schedule or even faster.
> >
> > Second, we have the users who have an application in production.  These
> > users, or their bosses, want Cassandra to be as stable as possible.
> > Assuming they deploy on a stable release like 2.0.12, they don't want to
> > touch it.  They would like to see us release *less* often.  (Because that
> > means they have to do less upgrades while remaining in our backwards
> > compatibility window.)
> >
> > With our current "big release every X months" model, these users' needs
> are
> > in tension.
> >
> > We discussed this six months ago, and ended up with this:
> >
> > What if we tried a [four month] release cycle, BUT we would guarantee
> that
> >> you could do a rolling upgrade until we bump the supermajor version? So
> 2.0
> >> could upgrade to 3.0 without having to go through 2.1.  (But to go to
> 3.1
> >> or 4.0 you would have to go through 3.0.)
> >>
> >
> > Crucially, I added
> >
> > Whether this is reasonable depends on how fast we can stabilize releases.
> >> 2.1.0 will be a good test of this.
> >>
> >
> > Unfortunately, even after DataStax hired half a dozen full-time test
> > engineers, 2.1.0 continued the proud tradition of being unready for
> > production use, with "wait for .5 before upgrading" once again looking
> like
> > a good guideline.
> >
> > I’m starting to think that the entire model of “write a bunch of new
> > features all at once and then try to stabilize it for release” is broken.
> > We’ve been trying that for years and empirically speaking the evidence is
> > that it just doesn’t work, either from a stability standpoint or even
> just
> > shipping on time.
> >
> > A big reason that it takes us so long to stabilize new releases now is
> > that, because our major release cycle is so long, it’s super tempting to
> > slip in “just one” new feature into bugfix releases, and I’m as guilty of
> > that as anyone.
> >
> > For similar reasons, it’s difficult to do a meaningful freeze with big
> > feature releases.  A look at 3.0 shows why: we have 8099 coming, but we
> > also have significant work done (but not finished) on 6230, 7970, 6696,
> and
> > 6477, all of which are meaningful improvements that address demonstrated
> > user pain.  So if we keep doing what we’ve been doing, our choices are to
> > either delay 3.0 further while we finish and stabilize these, or we wait
> > nine months to a year for the next release.  Either way, one of our
> > constituencies gets disappointed.
> >
> > So, I’d like to try something different.  I think we were on the right
> > track with shorter releases with more compatibility.  But I’d like to
> throw
> > in a twist.  Intel cuts down on risk with a “tick-tock” schedule for new
> > architectures and process shrinks instead of trying to do both at once.
> We
> > can do something similar here:
> >
> > One month releases.  Period.  If it’s not done, it can wait.
> > *Every other release only accepts bug fixes.*
> >
> > By itself, one-month releases are going to dramatically reduce the
> > complexity of testing and debugging new releases -- and bugs that do slip
> > past us will only affect a smaller percentage of users, avoiding the “big
> > release has a bunch of bugs no one has seen before and pretty much
> everyone
> > is hit by something” scenario.  But by adding in the second rule, I think
> > we have a real chance to make a quantum leap here: stable,
> production-ready
> > releases every two months.
> >
> > So here is my proposal for 3.0:
> >
> > We’re just about ready to start serious review of 8099.  When that’s
> done,
> > we branch 3.0 and cut a beta and then release candidates.  Whatever isn’t
> > done by then, has to wait; unlike prior betas, we will only accept bug
> > fixes into 3.0 after branching.
> >
> > One month after 3.0, we will ship 3.1 (with new features).  At the same
> > time, we will branch 3.2.  New features in trunk will go into 3.3.  The
> 3.2
> > branch will only get bug fixes.  We will maintain backwards compatibility
> > for all of 3.x; eventually (no less than a year) we will pick a release
> to
> > be 4.0, and drop deprecated features and old backwards compatibilities.
> > Otherwise there will be nothing special about the 4.0 designation.  (Note
> > that with an “odd releases have new features, even releases only have bug
> > fixes” policy, 4.0 will actually be *more* stable than 3.11.)
> >
> > Larger features can continue to be developed in separate branches, the
> way
> > 8099 is being worked on today, and committed to trunk when ready.  So
> this
> > is not saying that we are limited only to features we can build in a
> single
> > month.
> >
> > Some things will have to change with our dev process, for the better.  In
> > particular, with one month to commit new features, we don’t have room for
> > committing sloppy work and stabilizing it later.  Trunk has to be stable at
> > all times.  I asked Ariel Weisberg to put together his thoughts separately
> > on what worked for his team at VoltDB, and how we can apply that to
> > Cassandra -- see his email from Friday <http://bit.ly/1MHaOKX>.  (TLDR:
> > Redefine “done” to include automated tests.  Infrastructure to run tests
> > against github branches before merging to trunk.  A new test harness for
> > long-running regression tests.)
> >
> > I’m optimistic that as we improve our process this way, our even releases
> > will become increasingly stable.  If so, we can skip sub-minor releases
> > (3.2.x) entirely, and focus on keeping the release train moving.  In the
> > meantime, we will continue delivering 2.1.x stability releases.
> >
> > This won’t be an entirely smooth transition.  In particular, you will have
> > noticed that 3.1 will get more than a month’s worth of new features while
> > we stabilize 3.0 as the last of the old way of doing things, so some
> > patience is in order as we try this out.  By 3.4 and 3.6 later this year we
> > should have a good idea if this is working, and we can make adjustments as
> > warranted.
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
>
>
>
> --
> http://twitter.com/tjake
>



-- 
Joshua McKenzie
DataStax -- The Apache Cassandra Company

Re: 3.0 and the Cassandra release process

Posted by Jake Luciani <ja...@gmail.com>.

+1

On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis <jb...@gmail.com> wrote:



-- 
http://twitter.com/tjake