Posted to dev@cassandra.apache.org by Joshua McKenzie <jm...@apache.org> on 2020/06/16 16:13:39 UTC

[DISCUSS] Considering when to push tickets out of 4.0

I wanted to open up a discussion about optionality of a few tickets for
4.0. The three I'm specifically thinking of here are:
1) CASSANDRA-15146: Transitional TLS server configuration options are overly
complex
2) CASSANDRA-14825: Expose table schema for drivers
3) CASSANDRA-15299: CASSANDRA-13304 follow-up: improve checksumming and
compression in protocol v5-beta

I am *personally* of the opinion that each of these three should be
considered optional for 4.0 and not blockers to cut beta. My reasoning:
1) it's been 4 years, 7 months, 11 days since the release of 3.0.0
2) Alternatively, it's been 3 years, 4 months, 13 days since the release of
3.10.0 (the last time we added new features to the DB)
3) 2 of the 3 tickets involve non-trivial changes to the drivers. The top 5
drivers alone see tens of thousands of aggregate downloads a day; getting
all 5 of those in parity w/the new featureset and to be tested during the
GA phase is going to be very difficult with driver-impacting, significant
protocol changes this late in the alpha cycle (this would argue for them
being pushed to 4.1; including here just to point out the ambivalent PoV
here)
4) If we plan on releasing 4.1 six months after the release of 4.0 (i.e.
calendar scope vs. feature scope - not yet agreed upon but an option), we
would be looking at a relatively trivial delay of the addition of
"nice-to-have" features relative to broader infrastructure adoption cycles.

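For anyone who wants to check the elapsed-time figures in points 1 and 2, here is a minimal sketch. The release dates used (2015-11-05 and 2017-02-03) are assumptions: they are simply the dates implied by counting the quoted intervals back from the date of this post (2020-06-16), not dates stated in this thread. The helper names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d, n):
    """Return d shifted forward by n months, clamping the day to the month length."""
    idx = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def elapsed_ymd(start, end):
    """Return (years, months, days) elapsed between start and end."""
    if end < start:
        raise ValueError("end must not be before start")
    total_months = (end.year - start.year) * 12 + (end.month - start.month)
    # Step back one month if the whole-month shift overshoots the end date.
    if add_months(start, total_months) > end:
        total_months -= 1
    anchor = add_months(start, total_months)
    years, months = divmod(total_months, 12)
    return years, months, (end - anchor).days


posted = date(2020, 6, 16)            # date of this post
assumed_3_0_0 = date(2015, 11, 5)     # assumption: implied by "4 years, 7 months, 11 days"
assumed_3_10 = date(2017, 2, 3)       # assumption: implied by "3 years, 4 months, 13 days"

print(elapsed_ymd(assumed_3_0_0, posted))
print(elapsed_ymd(assumed_3_10, posted))
```
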
I know this is a controversial topic, and I've spoken with many of you that
are working on or reviewing the above tickets - your points of view and
arguments in favor of keeping them in 4.0 definitely resonate with me. That
said, trying to put myself in the shoes of an end user that hasn't seen a
material functionality upgrade in 3+ years and could be testing out and
using zero-copy streaming, audit logging, the new messaging service code,
and the hundreds of bugfixes and almost 300 improvements already in 4.0 - I
think the value in getting this release in my hands would outweigh the
value in getting these three particular features in 4.0 vs. 4.1.

Also, to reiterate, I would personally advocate for these three tickets
being *optional*, meaning if we merge the 1 awaiting review and 5 in
review, then we push them to 4.1.

So - what does everyone else think?

~Josh

Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by Sam Tunnicliffe <sa...@beobal.com>.

> On 17 Jun 2020, at 09:36, Benedict Elliott Smith <be...@apache.org> wrote:
> 
> If these tickets are the only blockers I agree with Scott's assessment.  We could even disable the v5 protocol if we're keen to get it out of the door today, and only enable it once 15299 lands.  I don't personally think the other two tickets would be impossible to land during a beta either, even if they are API affecting - they should be backwards compatible after all.

I feel the same way, though rather than disabling v5 we could just decide that removing its beta status needn't be a requirement for 4.0. As has been mentioned in previous threads, we aren't planning to remove v4, or even v3, support in 4.0 so keeping v5 as a beta protocol version would allow time for the drivers to implement full support before promoting it to a fully supported version. Making such a change, which only modifies the status of the version, seems reasonable in a minor provided that the beta version has been thoroughly validated.  
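
To make that concrete, here is a rough sketch of what "keep v5 as beta, promote it later" could look like on the server side. Everything in it is illustrative: `ProtocolVersion`, `negotiate` and `promote` are hypothetical names rather than Cassandra's actual classes, and the value 0x10 for the client's beta opt-in header flag is an assumption on my part.

```python
from dataclasses import dataclass

USE_BETA_FLAG = 0x10  # assumed header bit a client sets to opt in to a beta version


@dataclass(frozen=True)
class ProtocolVersion:
    number: int
    beta: bool


class UnsupportedVersion(Exception):
    """Raised when the requested version cannot be used for this connection."""


# Hypothetical server-side table: v3 and v4 fully supported, v5 still beta.
SUPPORTED = {
    3: ProtocolVersion(3, beta=False),
    4: ProtocolVersion(4, beta=False),
    5: ProtocolVersion(5, beta=True),
}


def negotiate(requested, header_flags, supported=SUPPORTED):
    """Accept a version only if it is known and, when beta, the client opted in."""
    version = supported.get(requested)
    if version is None:
        raise UnsupportedVersion(f"protocol v{requested} is not supported")
    if version.beta and not (header_flags & USE_BETA_FLAG):
        raise UnsupportedVersion(f"protocol v{requested} is beta; client must opt in")
    return version


def promote(supported, number):
    """Return a new table with the beta status cleared for one version."""
    updated = dict(supported)
    updated[number] = ProtocolVersion(number, beta=False)
    return updated
```

With a table like this, drivers that never set the opt-in flag keep working on v3 or v4, and promoting v5 in a later minor is a one-field change that leaves the wire format untouched.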

As far as I'm aware, neither the Java, Python nor gocql drivers currently support the existing checksumming feature from CASSANDRA-13304. So I'm 100% in agreement with Benedict that we should revert this before beta. The remaining decision is whether we feel it's appropriate and desirable to release v5 without any additional mechanism for ensuring integrity. If so, then we could punt 15299 out of 4.0/v5 entirely; if not, then we either hold off cutting 4.0 beta until 15299 is available or we remove the expectation that v5 will come out of beta in 4.0.

Responding to Mick from earlier in the thread:

> I understand the importance of CASSANDRA-15299. But it hasn't had any
> comments in twelve days, and in this stage of the feature freeze, with
> so few alpha bugs remaining, that's a long time. Sam, can you speak to its
> eta?

That is way too long without any visible progress and I apologise for the radio silence. I have a rather small amount of tidying up to do, but otherwise I think what I have is ready for review and the client facing aspects are stable. I'm still actively working on a test harness and have some bulk renaming to do too, but I don't think that these should hold up the review too much. I'll aim to push an update to the JIRA tomorrow.


> 
>> [Josh] however historically on the project we've had a large number of defects surfaced by a diverse collection of users
>> [Scott] this was in part a case of a pressing need to investigate a potential 3.0 data resurrection issue drawing attention from 4.0
> 
> This is a really common theme with 4.0, whose timeline has been hit primarily because of issues still circulating with the 3.0 line that were never discovered by testing or user reports during beta, RC, or four years of releases.  My personal view, informed by this, is that we _didn't find_ the most serious bugs historically, even with user reports, and we need to be honest with ourselves about this in order to plot a route forwards to high quality releases.  We cannot _depend_ on community feedback for determining release quality; we need a plan to consciously deliver it ourselves.
> 
> 
> On 17/06/2020, 05:12, "Scott Andreas" <sc...@paradoxica.net> wrote:
> 
>    I'll take attribution for the delay in comment on 15299; this was in part a case of a pressing need to investigate a potential 3.0 data resurrection issue drawing attention from 4.0.
> 
>    I agree with the statement that we shouldn't consider protocol V5 ready for finalization in its current form. If we feel that this ticket alone is what delays release of beta and are comfortable with a release note caveating that one V5 ticket remains before the new protocol is finalized, that could be a reasonable compromise.
> 
>    I don't have especially strong feelings re: 15146 and 14825 and think these are reasonable candidates for deferral.
> 
>    ________________________________________
>    From: Joshua McKenzie <jm...@apache.org>
>    Sent: Tuesday, June 16, 2020 4:08 PM
>    To: dev@cassandra.apache.org
>    Subject: Re: [DISCUSS] Considering when to push tickets out of 4.0
> 
>    I completely respect and agree with the need for a drumbeat to change our
>    culture around testing and quality; I also agree we haven't done much to
>    materially change that uniquely to 4.0. The 40_quality_testing epic is our
>    first step in that direction though I have some personal concerns about
>    leaning on bespoke manual testing for quality since we humans are
>    infinitely fallible. :)
> 
>    What elicited that response from me is the claim that we haven't yet tested
>    the software, implicitly invalidating the time and energy the community has
>    put into that thus far. I wouldn't argue that we've adequately tested for a
>    GA release, certainly, but we're discussing beta in this thread. As a
>    project, the advice we have about the testing and usage of the beta is
>    something along the lines of "use this in test/QA and only in cases where
>    minutes of downtime is acceptable." Perhaps we should consider revising the
>    release lifecycle on the wiki if this is something we're not aligned on?
> 
>    To your point above, the problems found to date were largely with 3.0 and
>    found by user report and not by project developer testing. The sooner we
>    can get the 4.0 beta into the hands of the community, the sooner we can get
>    more of those reports while we also work to broaden and deepen our
>    programmatic testing frameworks and platforms. (To acknowledge: I presume
>    that a majority of the user testing that surfaced defects in 3.0 came from
>    one large user's investment of time and resources, however historically on
>    the project we've had a large number of defects surfaced by a diverse
>    collection of users and I'd like to see us move in that direction again for
>    the long-term health of the project. Hence my attempts to move us towards
>    beta and take on an awareness campaign and call to action for the community
>    to engage in testing.)
> 
> 
>    On Tue, Jun 16, 2020 at 6:37 PM Benedict Elliott Smith <be...@apache.org>
>    wrote:
> 
>>> Further, we have thousands of tests across all our suites
>> 
>> I think most here would agree that our testing remains inadequate, and
>> that this (modest, even in pure numerical terms for such a large project)
>> number of often poorly-written unit tests does not really change that fact.
>> 
>> Most of the problems found to date have been found with 3.0, not with 4.0,
>> and found by user report.  We agreed a long time ago that we would aim for
>> 4.0 to be a more stable release than any prior.  Today I think the only
>> reason that might be true is the amount of work invested in fixing problems
>> found in _earlier releases_, not due to verification of 4.0.
>> 
>> I say this not to influence the decision about when and what lands in
>> beta, only to ensure we stay honest with ourselves about our progress on
>> quality.  I hope the software itself is higher quality today, but I do not
>> believe it is honest to (yet) claim that our testing is significantly
>> higher quality than those releases we all agree were inadequate.  There
>> exists some wider external use case testing, but being mostly invisible to
>> the community it is unclear how much broader our coverage is with these
>> included.
>> 
>> On 16/06/2020, 23:08, "David Capwell" <dc...@apple.com.INVALID> wrote:
>> 
>>    Inline
>> 
>>> On Jun 16, 2020, at 2:17 PM, Joshua McKenzie <jm...@apache.org>
>> wrote:
>>> 
>>>> 
>>>> we still produce incorrect results as shown by CASSANDRA-15313;
>> this is a
>>>> correctness issue, so must be a blocker for v5 protocol.
>>> 
>>> That makes complete sense; I'd somehow missed the incorrect results
>> aspect
>>> in trying to get context on the work. I'd be eager to hear about
>> progress
>>> on it as well.
>>> 
>>> Regarding the question of "why would users test if we haven't tested
>> yet",
>>> I respectfully disagree both on the assertion we haven't tested yet
>> as well
>>> as on the distinction between an "us vs. them" in the community.
>> We're all
>>> users and participants in the Cassandra community and ecosystem so
>> anyone
>>> downloading the DB to test it out is just as vital as one of us from
>> the
>>> dev list, committer list, or pmc list testing out the DB.
>> 
>>    I apologize if I came off as discriminatory; I will try to absorb your
>> words carefully; thank you for correcting my behavior.
>> 
>>> While we can
>>> reasonably expect a dev paid full time working on the project with a
>> large
>>> amount of infrastructure doing testing to be crucial to getting a
>> release
>>> out and doing certain kinds of testing, there are literally
>> thousands of
>>> different companies out in the world basing their critical
>> infrastructure
>>> on this project and them testing out their use-cases and migration
>> is just
>>> as critical to this release being ready. It takes a village.
>> 
>>    I do agree that user validation is important for the release, I was
>> mostly trying to question why start here before the testing work in JIRA is
>> complete.  Maybe I am in the wrong, I have been heads down working on data
>> corruption issues in 3.x; I have become more risk averse.
>> 
>>> 
>>> Further, we have thousands of tests across all our suites, hundreds
>> of new
>>> use-case testing that has been done against 4.0 at this point, and
>> 30+%
>>> more bugs fixed in this release than 3.0; the blanket assertion that
>> we
>>> haven't tested 4.0 yet doesn't resonate with me. While we haven't
>> done the
>>> entirety of our final 40 beta phase testing yet, testing is
>> constantly
>>> going on against this codebase by both people on the ML and off.
>>> 
>>> Now, if there are major known glaring issues where we have problems
>> that
>>> would prevent users from actually testing out the beta and kicking
>> the
>>> tires, that's a different story entirely and I'd argue those tickets
>> should
>>> be reflected in the alpha phase (see: CASSANDRA-15299 apparently ;) )
>>> 
>>> Does that make sense?
>> 
>>    I have been meaning to ask this, mostly asking people in Slack and
>> this actually confuses me.
>> 
>>    I was working off the assumption that the fix version meant it was a
>> blocker for that release, and that alpha was special-cased and would have
>> releases even with blocking issues (which is documented in the Release
>> Lifecycle).  When I ask around I hear that this is not correct and that
>> alpha means “blocks beta”, beta means “blocks RC”, etc (is any of this
>> documented, I couldn’t find any last time I was talking to others about
>> this).
>> 
>>    Now, let's say we close alpha and cut a beta release, my understanding
>> is that tickets which block the next beta release are alpha…. So do we
>> still mark them alpha (even though we won’t have an alpha release)?
>> 
>>    This has been confusing me since beta has a lot of work pending… sorry
>> for not bringing this up in a dedicated dev@ thread
>> 
>> 
>>> 
>>> On Tue, Jun 16, 2020 at 4:58 PM Benedict Elliott Smith <
>> benedict@apache.org>
>>> wrote:
>>> 
>>>> So, if it helps matters: I am explicitly -1 the prior version of
>> this work
>>>> due to the technical concerns expressed here and on the ticket.  So
>> we
>>>> either need to revert that patch or incorporate 15299.
>>>> 
>>>> On 16/06/2020, 21:48, "Mick Semb Wever" <mc...@apache.org> wrote:
>>>> 
>>>>> 
>>>>> 2) Alternatively, it's been 3 years, 4 months, 13 days since the
>>>> release of
>>>>> 3.10.0 (the last time we added new features to the DB)
>>>>> 
>>>> 
>>>> 
>>>>   We did tick-tock, pushing feature releases too quickly, and
>> without
>>>>   supporting them for long enough to get stable. And then we've
>> done "a
>>>> la no
>>>>   feature releases" for over 3 years. It feels like the bar went
>> from
>>>> too low
>>>>   to too high.
>>>> 
>>>>   I understand the importance of CASSANDRA-15299. But it hasn't
>> had any
>>>>   comments in twelve days, and in this stage of the feature
>> freeze,
>>>> with
>>>>   so few alpha bugs remaining, that's a long time. Sam, can you
>> speak to
>>>> its
>>>>   eta?
>>>> 
>>>> 
>>>> 
>>>>> 4) If we plan on releasing 4.1 six months after the release of 4.0
>>>> (i.e.
>>>>> calendar scope vs. feature scope - not yet agreed upon but an
>>>> option),
>>>> 
>>>> 
>>>> 
>>>>   I like this. I think it's worth appreciating the different
>>>> perspectives of
>>>>   this community: those involved with private clusters that don't
>> rely on
>>>>   official releases, versus those involved with the public and
>> other
>>>> people's
>>>>   clusters. The latter group needs those official releases much
>> more, but
>>>>   this also ties into putting those users more in focus and
>> figuring out
>>>>   where the bar best sits. This isn't meant to divide, we all care
>> and
>>>> voice
>>>>   for the user, but just to utilise the different strengths
>> brought to
>>>> the
>>>>   table.
>>>> 
>>>> 
>>>>> If we want 4.0.0 out faster, the biggest gains would be to get the
>>>> test
>>>>   plans written up and get more people working on automated
>> testing.
>>>> 
>>>> 
>>>>   Yes, 110%.  Though, as long as this continues to improve, as it
>> has,
>>>> does
>>>>   it need to be a blocker on 4.0?
>>>> 
>>>> 
>>>> 
>>>> 
>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by Benjamin Lerer <be...@datastax.com>.

Just to clarify the status of CASSANDRA-14825:
The latest version of the patch has been reviewed by Dinesh and me. I am
fixing the last details (mainly the documentation), so I expect the patch
to be ready to commit today or tomorrow.

On Wed, Jun 17, 2020 at 10:36 AM Benedict Elliott Smith <be...@apache.org>
wrote:

> If these tickets are the only blockers I agree with Scott's assessment.
> We could even disable the v5 protocol if we're keen to get it out of the
> door today, and only enable it once 15299 lands.  I don't personally think
> the other two tickets would be impossible to land during a beta either,
> even if they are API affecting - they should be backwards compatible after
> all.
>
> > [Josh] however historically on the project we've had a large number of
> defects surfaced by a diverse collection of users
> > [Scott] this was in part a case of a pressing need to investigate a
> potential 3.0 data resurrection issue drawing attention from 4.0
>
> This is a really common theme with 4.0, whose timeline has been hit
> primarily because of issues still circulating with the 3.0 line that were
> never discovered by testing or user reports during beta, RC, or four years
> of releases.  My personal view, informed by this, is that we _didn't find_
> the most serious bugs historically, even with user reports, and we need to
> be honest with ourselves about this in order to plot a route forwards to
> high quality releases.  We cannot _depend_ on community feedback for
> determining release quality; we need a plan to consciously deliver it
> ourselves.
>
>
> On 17/06/2020, 05:12, "Scott Andreas" <sc...@paradoxica.net> wrote:
>
>     I'll take attribution for the delay in comment on 15299; this was in
> part a case of a pressing need to investigate a potential 3.0 data
> resurrection issue drawing attention from 4.0.
>
>     I agree with the statement that we shouldn't consider protocol V5
> ready for finalization in its current form. If we feel that this ticket
> alone is what delays release of beta and are comfortable with a release
> note caveating that one V5 ticket remains before the new protocol is
> finalized, that could be a reasonable compromise.
>
>     I don't have especially strong feelings re: 15146 and 14825 and think
> these are reasonable candidates for deferral.
>
>     ________________________________________
>     From: Joshua McKenzie <jm...@apache.org>
>     Sent: Tuesday, June 16, 2020 4:08 PM
>     To: dev@cassandra.apache.org
>     Subject: Re: [DISCUSS] Considering when to push tickets out of 4.0
>
>     I completely respect and agree with the need for a drumbeat to change
> our
>     culture around testing and quality; I also agree we haven't done much
> to
>     materially change that uniquely to 4.0. The 40_quality_testing epic is
> our
>     first step in that direction though I have some personal concerns about
>     leaning on bespoke manual testing for quality since we humans are
>     infinitely fallible. :)
>
>     What elicited that response from me is the claim that we haven't yet
> tested
>     the software, implicitly invalidating the time and energy the
> community has
>     put into that thus far. I wouldn't argue that we've adequately tested
> for a
>     GA release, certainly, but we're discussing beta in this thread. As a
>     project, the advice we have about the testing and usage of the beta is
>     something along the lines of "use this in test/QA and only in cases
> where
>     minutes of downtime is acceptable." Perhaps we should consider
> revising the
>     release lifecycle on the wiki if this is something we're not aligned
> on?
>
>     To your point above, the problems found to date were largely with 3.0
> and
>     found by user report and not by project developer testing. The sooner
> we
>     can get the 4.0 beta into the hands of the community, the sooner we
> can get
>     more of those reports while we also work to broaden and deepen our
>     programmatic testing frameworks and platforms. (To acknowledge: I
> presume
>     that a majority of the user testing that surfaced defects in 3.0 came
> from
>     one large user's investment of time and resources, however
> historically on
>     the project we've had a large number of defects surfaced by a diverse
>     collection of users and I'd like to see us move in that direction
> again for
>     the long-term health of the project. Hence my attempts to move us
> towards
>     beta and take on an awareness campaign and call to action for the
> community
>     to engage in testing.)
>
>
>     On Tue, Jun 16, 2020 at 6:37 PM Benedict Elliott Smith <
> benedict@apache.org>
>     wrote:
>
>     > > Further, we have thousands of tests across all our suites
>     >
>     > I think most here would agree that our testing remains inadequate,
> and
>     > that this (modest, even in pure numerical terms for such a large
> project)
>     > number of often poorly-written unit tests does not really change
> that fact.
>     >
>     > Most of the problems found to date have been found with 3.0, not
> with 4.0,
>     > and found by user report.  We agreed a long time ago that we would
> aim for
>     > 4.0 to be a more stable release than any prior.  Today I think the
> only
>     > reason that might be true is the amount of work invested in fixing
> problems
>     > found in _earlier releases_, not due to verification of 4.0.
>     >
>     > I say this not to influence the decision about when and what lands in
>     > beta, only to ensure we stay honest with ourselves about our
> progress on
>     > quality.  I hope the software itself is higher quality today, but I
> do not
>     > believe it is honest to (yet) claim that our testing is significantly
>     > higher quality than those releases we all agree were inadequate.
> There
>     > exists some wider external use case testing, but being mostly
> invisible to
>     > the community it is unclear how much broader our coverage is with
> these
>     > included.
>     >
>     > On 16/06/2020, 23:08, "David Capwell" <dc...@apple.com.INVALID>
> wrote:
>     >
>     >     Inline
>     >
>     >     > On Jun 16, 2020, at 2:17 PM, Joshua McKenzie <
> jmckenzie@apache.org>
>     > wrote:
>     >     >
>     >     >>
>     >     >> we still produce incorrect results as shown by
> CASSANDRA-15313;
>     > this is a
>     >     >> correctness issue, so must be a blocker for v5 protocol.
>     >     >
>     >     > That makes complete sense; I'd somehow missed the incorrect
> results
>     > aspect
>     >     > in trying to get context on the work. I'd be eager to hear
> about
>     > progress
>     >     > on it as well.
>     >     >
>     >     > Regarding the question of "why would users test if we haven't
> tested
>     > yet",
>     >     > I respectfully disagree both on the assertion we haven't
> tested yet
>     > as well
>     >     > as on the distinction between an "us vs. them" in the
> community.
>     > We're all
>     >     > users and participants in the Cassandra community and
> ecosystem so
>     > anyone
>     >     > downloading the DB to test it out is just as vital as one of
> us from
>     > the
>     >     > dev list, committer list, or pmc list testing out the DB.
>     >
>     >     I apologize if I came off as discriminatory; I will try to absorb
> your
>     > words carefully; thank you for correcting my behavior.
>     >
>     >     > While we can
>     >     > reasonably expect a dev paid full time working on the project
> with a
>     > large
>     >     > amount of infrastructure doing testing to be crucial to
> getting a
>     > release
>     >     > out and doing certain kinds of testing, there are literally
>     > thousands of
>     >     > different companies out in the world basing their critical
>     > infrastructure
>     >     > on this project and them testing out their use-cases and
> migration
>     > is just
>     >     > as critical to this release being ready. It takes a village.
>     >
>     >     I do agree that user validation is important for the release, I
> was
>     > mostly trying to question why start here before the testing work in
> JIRA is
>     > complete.  Maybe I am in the wrong, I have been heads down working
> on data
>     > corruption issues in 3.x; I have become more risk averse.
>     >
>     >     >
>     >     > Further, we have thousands of tests across all our suites,
> hundreds
>     > of new
>     >     > use-case testing that has been done against 4.0 at this point,
> and
>     > 30+%
>     >     > more bugs fixed in this release than 3.0; the blanket
> assertion that
>     > we
>     >     > haven't tested 4.0 yet doesn't resonate with me. While we
> haven't
>     > done the
>     >     > entirety of our final 40 beta phase testing yet, testing is
>     > constantly
>     >     > going on against this codebase by both people on the ML and
> off.
>     >     >
>     >     > Now, if there are major known glaring issues where we have
> problems
>     > that
>     >     > would prevent users from actually testing out the beta and
> kicking
>     > the
>     >     > tires, that's a different story entirely and I'd argue those
> tickets
>     > should
>     >     > be reflected in the alpha phase (see: CASSANDRA-15299
> apparently ;) )
>     >     >
>     >     > Does that make sense?
>     >
>     >     I have been meaning to ask this, mostly asking people in Slack
> and
>     > this actually confuses me.
>     >
>     >     I was working off the assumption that the fix version meant it
> was a
>     > blocker for that release, and that alpha was special-cased and would have
>     > releases even with blocking issues (which is documented in the
> Release
>     > Lifecycle).  When I ask around I hear that this is not correct and
> that
>     > alpha means “blocks beta”, beta means “blocks RC”, etc (is any of
> this
>     > documented, I couldn’t find any last time I was talking to others
> about
>     > this).
>     >
>     >     Now, let's say we close alpha and cut a beta release, my
> understanding
>     > is that tickets which block the next beta release are alpha…. So do
> we
>     > still mark them alpha (even though we won’t have an alpha release)?
>     >
>     >     This has been confusing me since beta has a lot of work pending…
> sorry
>     > for not bringing this up in a dedicated dev@ thread
>     >
>     >
>     >     >
>     >     > On Tue, Jun 16, 2020 at 4:58 PM Benedict Elliott Smith <
>     > benedict@apache.org>
>     >     > wrote:
>     >     >
>     >     >> So, if it helps matters: I am explicitly -1 the prior version
> of
>     > this work
>     >     >> due to the technical concerns expressed here and on the
> ticket.  So
>     > we
>     >     >> either need to revert that patch or incorporate 15299.
>     >     >>
>     >     >> On 16/06/2020, 21:48, "Mick Semb Wever" <mc...@apache.org>
> wrote:
>     >     >>
>     >     >>>
>     >     >>> 2) Alternatively, it's been 3 years, 4 months, 13 days since
> the
>     >     >> release of
>     >     >>> 3.10.0 (the last time we added new features to the DB)
>     >     >>>
>     >     >>
>     >     >>
>     >     >>    We did tick-tock, pushing feature releases too quickly, and
>     > without
>     >     >>    supporting them for long enough to get stable. And then
> we've
>     > done "a
>     >     >> la no
>     >     >>    feature releases" for over 3 years. It feels like the bar
> went
>     > from
>     >     >> too low
>     >     >>    to too high.
>     >     >>
>     >     >>    I understand the importance of CASSANDRA-15299. But it
> hasn't
>     > had any
>     >     >>    comments in twelve days, and in this stage of the
> feature
>     > freeze,
>     >     >> with
>     >     >>    so few alpha bugs remaining, that's a long time. Sam, can
> you
>     > speak to
>     >     >> its
>     >     >>    eta?
>     >     >>
>     >     >>
>     >     >>
>     >     >>> 4) If we plan on releasing 4.1 six months after the release
> of 4.0
>     >     >> (i.e.
>     >     >>> calendar scope vs. feature scope - not yet agreed upon but an
>     >     >> option),
>     >     >>
>     >     >>
>     >     >>
>     >     >>    I like this. I think it's worth appreciating the different
>     >     >> perspectives of
>     >     >>    this community: those involved with private clusters that
> don't
>     > rely on
>     >     >>    official releases, versus those involved with the public
> and
>     > other
>     >     >> people's
>     >     >>    clusters. The latter group needs those official releases
> much
>     > more, but
>     >     >>    this also ties into putting those users more in focus and
>     > figuring out
>     >     >>    where the bar best sits. This isn't meant to divide, we
> all care
>     > and
>     >     >> voice
>     >     >>    for the user, but just to utilise the different strengths
>     > brought to
>     >     >> the
>     >     >>    table.
>     >     >>
>     >     >>
>     >     >>> If we want 4.0.0 out faster, the biggest gains would be to
> get the
>     >     >> test
>     >     >>    plans written up and get more people working on automated
>     > testing.
>     >     >>
>     >     >>
>     >     >>    Yes, 110%.  Though, as long as this continues to improve,
> as it
>     > has,
>     >     >> does
>     >     >>    it need to be a blocker on 4.0?
>     >     >>

Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by Benedict Elliott Smith <be...@apache.org>.

If these tickets are the only blockers I agree with Scott's assessment.  We could even disable the v5 protocol if we're keen to get it out of the door today, and only enable it once 15299 lands.  I don't personally think the other two tickets would be impossible to land during a beta either, even if they are API affecting - they should be backwards compatible after all.

> [Josh] however historically on the project we've had a large number of defects surfaced by a diverse collection of users
> [Scott] this was in part a case of a pressing need to investigate a potential 3.0 data resurrection issue drawing attention from 4.0

This is a really common theme with 4.0, whose timeline has been hit primarily because of issues still circulating with the 3.0 line that were never discovered by testing or user reports during beta, RC, or four years of releases.  My personal view, informed by this, is that we _didn't find_ the most serious bugs historically, even with user reports, and we need to be honest with ourselves about this in order to plot a route forwards to high quality releases.  We cannot _depend_ on community feedback for determining release quality; we need a plan to consciously deliver it ourselves.


On 17/06/2020, 05:12, "Scott Andreas" <sc...@paradoxica.net> wrote:

    I'll take attribution for the delay in comment on 15299; this was in part a case of a pressing need to investigate a potential 3.0 data resurrection issue drawing attention from 4.0.

    I agree with the statement that we shouldn't consider protocol V5 ready for finalization in its current form. If we feel that this ticket alone is what delays release of beta and are comfortable with a release note caveating that one V5 ticket remains before the new protocol is finalized, that could be a reasonable compromise.

    I don't have especially strong feelings re: 15146 and 14825 and think these are reasonable candidates for deferral.

    ________________________________________
    From: Joshua McKenzie <jm...@apache.org>
    Sent: Tuesday, June 16, 2020 4:08 PM
    To: dev@cassandra.apache.org
    Subject: Re: [DISCUSS] Considering when to push tickets out of 4.0

    I completely respect and agree with the need for a drumbeat to change our
    culture around testing and quality; I also agree we haven't done much to
    materially change that uniquely to 4.0. The 40_quality_testing epic is our
    first step in that direction though I have some personal concerns about
    leaning on bespoke manual testing for quality since we humans are
    infinitely fallible. :)

    What elicited that response from me is the claim that we haven't yet tested
    the software, implicitly invalidating the time and energy the community has
    put into that thus far. I wouldn't argue that we've adequately tested for a
    GA release, certainly, but we're discussing beta in this thread. As a
    project, the advice we have about the testing and usage of the beta is
    something along the lines of "use this in test/QA and only in cases where
    minutes of downtime is acceptable." Perhaps we should consider revising the
    release lifecycle on the wiki if this is something we're not aligned on?

    To your point above, the problems found to date were largely with 3.0 and
    found by user report and not by project developer testing. The sooner we
    can get the 4.0 beta into the hands of the community, the sooner we can get
    more of those reports while we also work to broaden and deepen our
    programmatic testing frameworks and platforms. (To acknowledge: I presume
    that a majority of the user testing that surfaced defects in 3.0 came from
    one large user's investment of time and resources, however historically on
    the project we've had a large number of defects surfaced by a diverse
    collection of users and I'd like to see us move in that direction again for
    the long-term health of the project. Hence my attempts to move us towards
    beta and take on an awareness campaign and call to action for the community
    to engage in testing.)


    On Tue, Jun 16, 2020 at 6:37 PM Benedict Elliott Smith <be...@apache.org>
    wrote:

    > > Further, we have thousands of tests across all our suites
    >
    > I think most here would agree that our testing remains inadequate, and
    > that this (modest, even in pure numerical terms for such a large project)
    > number of often poorly-written unit tests does not really change that fact.
    >
    > Most of the problems found to date have been found with 3.0, not with 4.0,
    > and found by user report.  We agreed a long time ago that we would aim for
    > 4.0 to be a more stable release than any prior.  Today I think the only
    > reason that might be true is the amount of work invested in fixing problems
    > found in _earlier releases_, not due to verification of 4.0.
    >
    > I say this not to influence the decision about when and what lands in
    > beta, only to ensure we stay honest with ourselves about our progress on
    > quality.  I hope the software itself is higher quality today, but I do not
    > believe it is honest to (yet) claim that our testing is significantly
    > higher quality than those releases we all agree were inadequate.  There
    > exists some wider external use case testing, but being mostly invisible to
    > the community it is unclear how much broader our coverage is with these
    > included.
    >
    > On 16/06/2020, 23:08, "David Capwell" <dc...@apple.com.INVALID> wrote:
    >
    >     Inline
    >
    >     > On Jun 16, 2020, at 2:17 PM, Joshua McKenzie <jm...@apache.org>
    > wrote:
    >     >
    >     >>
    >     >> we still produce incorrect results as shown by CASSANDRA-15313;
    > this is a
    >     >> correctness issue, so must be a blocker for v5 protocol.
    >     >
    >     > That makes complete sense; I'd somehow missed the incorrect results
    > aspect
    >     > in trying to get context on the work. I'd be eager to hear about
    > progress
    >     > on it as well.
    >     >
    >     > Regarding the question of "why would users test if we haven't tested
    > yet",
    >     > I respectfully disagree both on the assertion we haven't tested yet
    > as well
    >     > as on the distinction between an "us vs. them" in the community.
    > We're all
    >     > users and participants in the Cassandra community and ecosystem so
    > anyone
    >     > downloading the DB to test it out is just as vital as one of us from
    > the
    >     > dev list, committer list, or pmc list testing out the DB.
    >
    >     I apologize if I came off as discriminatory. I will try to absorb your
    > words carefully; thank you for correcting my behavior.
    >
    >     > While we can
    >     > reasonably expect a dev paid full time working on the project with a
    > large
    >     > amount of infrastructure doing testing to be crucial to getting a
    > release
    >     > out and doing certain kinds of testing, there are literally
    > thousands of
    >     > different companies out in the world basing their critical
    > infrastructure
    >     > on this project and them testing out their use-cases and migration
    > is just
    >     > as critical to this release being ready. It takes a village.
    >
    >     I do agree that user validation is important for the release, I was
    > mostly trying to question why start here before the testing work in JIRA is
    > complete.  Maybe I am in the wrong, I have been heads down working on data
    > corruption issues in 3.x; I have become more risk averse.
    >
    >     >
    >     > Further, we have thousands of tests across all our suites, hundreds
    > of new
    >     > use-case testing that has been done against 4.0 at this point, and
    > 30+%
    >     > more bugs fixed in this release than 3.0; the blanket assertion that
    > we
    >     > haven't tested 4.0 yet doesn't resonate with me. While we haven't
    > done the
    >     > entirety of our final 40 beta phase testing yet, testing is
    > constantly
    >     > going on against this codebase by both people on the ML and off.
    >     >
    >     > Now, if there are major known glaring issues where we have problems
    > that
    >     > would prevent users from actually testing out the beta and kicking
    > the
    >     > tires, that's a different story entirely and I'd argue those tickets
    > should
    >     > be reflected in the alpha phase (see: CASSANDRA-15299 apparently ;) )
    >     >
    >     > Does that make sense?
    >
    >     I have been meaning to ask this, mostly asking people in Slack and
    > this actually confuses me.
    >
    >     I was working off the assumption that the fix version meant it was a
    > blocker for that release, and that alpha was special-cased and would have
    > releases even with blocking issues (which is documented in the Release
    > Lifecycle).  When I ask around I hear that this is not correct and that
    > alpha means “blocks beta”, beta means “blocks RC”, etc. (Is any of this
    > documented? I couldn’t find anything last time I was talking to others
    > about this.)
    >
    >     Now, let’s say we close alpha and cut a beta release. My understanding
    > is that tickets which block the next beta release are alpha…. So do we
    > still mark them alpha (even though we won’t have an alpha release)?
    >
    >     This has been confusing me since beta has a lot of work pending… sorry
    > for not bringing this up in a dedicated dev@ thread.
    >
    >
    >     >
    >     > On Tue, Jun 16, 2020 at 4:58 PM Benedict Elliott Smith <
    > benedict@apache.org>
    >     > wrote:
    >     >
    >     >> So, if it helps matters: I am explicitly -1 the prior version of
    > this work
    >     >> due to the technical concerns expressed here and on the ticket.  So
    > we
    >     >> either need to revert that patch or incorporate 15299.
    >     >>
    >     >> On 16/06/2020, 21:48, "Mick Semb Wever" <mc...@apache.org> wrote:
    >     >>
    >     >>>
    >     >>> 2) Alternatively, it's been 3 years, 4 months, 13 days since the
    >     >> release of
    >     >>> 3.10.0 (the last time we added new features to the DB)
    >     >>>
    >     >>
    >     >>
    >     >>    We did tick-tock, pushing feature releases too quickly, and
    > without
    >     >>    supporting them for long enough to get stable. And then we've
    > done "a
    >     >> la no
    >     >>    feature releases" for over 3 years. It feels like the bar went
    > from
    >     >> too low
    >     >>    to too high.
    >     >>
    >     >>    I understand the importance of CASSANDRA-15299. But it hasn't
    > had any
    >     >>    comments in twelve days, and in this stage of the feature
    > freeze,
    >     >> with
    >     >>    so few alpha bugs remaining, that's a long time. Sam, can you
    > speak to
    >     >> its
    >     >>    eta?
    >     >>
    >     >>
    >     >>
    >     >>> 4) If we plan on releasing 4.1 six months after the release of 4.0
    >     >> (i.e.
    >     >>> calendar scope vs. feature scope - not yet agreed upon but an
    >     >> option),
    >     >>
    >     >>
    >     >>
    >     >>    I like this. I think it's worth appreciating the different
    >     >> perspectives of
    >     >>    this community: those involved with private clusters that don't
    > rely on
    >     >>    official releases, versus those involved with the public and
    > other
    >     >> people's
    >     >>    clusters. The latter group needs those official releases much
    > more, but
    >     >>    this also ties into putting those users more in focus and
    > figuring out
    >     >>    where the bar best sits. This isn't meant to divide, we all care
    > and
    >     >> voice
    >     >>    for the user, but just to utilise the different strengths
    > brought to
    >     >> the
    >     >>    table.
    >     >>
    >     >>
    >     >>> If we want 4.0.0 out faster, the biggest gains would be to get the
    >     >> test
    >     >>    plans written up and get more people working on automated
    > testing.
    >     >>
    >     >>
    >     >>    Yes, 110%.  Though, as long as this continues to improve, as it
    > has,
    >     >> does
    >     >>    it need to be a blocker on 4.0?
    >     >>
    >     >>
    >     >>
    >     >>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by Scott Andreas <sc...@paradoxica.net>.

I'll take attribution for the delay in comment on 15299; this was in part a case of a pressing need to investigate a potential 3.0 data resurrection issue drawing attention from 4.0.

I agree with the statement that we shouldn't consider protocol V5 ready for finalization in its current form. If we feel that this ticket alone is what delays release of beta and are comfortable with a release note caveating that one V5 ticket remains before the new protocol is finalized, that could be a reasonable compromise.

I don't have especially strong feelings re: 15146 and 14825 and think these are reasonable candidates for deferral.

________________________________________
From: Joshua McKenzie <jm...@apache.org>
Sent: Tuesday, June 16, 2020 4:08 PM
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Considering when to push tickets out of 4.0

I completely respect and agree with the need for a drumbeat to change our
culture around testing and quality; I also agree we haven't done much to
materially change that uniquely to 4.0. The 40_quality_testing epic is our
first step in that direction though I have some personal concerns about
leaning on bespoke manual testing for quality since we humans are
infinitely fallible. :)

What elicited that response from me is the claim that we haven't yet tested
the software, implicitly invalidating the time and energy the community has
put into that thus far. I wouldn't argue that we've adequately tested for a
GA release, certainly, but we're discussing beta in this thread. As a
project, the advice we have about the testing and usage of the beta is
something along the lines of "use this in test/QA and only in cases where
minutes of downtime is acceptable." Perhaps we should consider revising the
release lifecycle on the wiki if this is something we're not aligned on?

To your point above, the problems found to date were largely with 3.0 and
found by user report and not by project developer testing. The sooner we
can get the 4.0 beta into the hands of the community, the sooner we can get
more of those reports while we also work to broaden and deepen our
programmatic testing frameworks and platforms. (To acknowledge: I presume
that a majority of the user testing that surfaced defects in 3.0 came from
one large user's investment of time and resources, however historically on
the project we've had a large number of defects surfaced by a diverse
collection of users and I'd like to see us move in that direction again for
the long-term health of the project. Hence my attempts to move us towards
beta and take on an awareness campaign and call to action for the community
to engage in testing.)


On Tue, Jun 16, 2020 at 6:37 PM Benedict Elliott Smith <be...@apache.org>
wrote:

> > Further, we have thousands of tests across all our suites
>
> I think most here would agree that our testing remains inadequate, and
> that this (modest, even in pure numerical terms for such a large project)
> number of often poorly-written unit tests does not really change that fact.
>
> Most of the problems found to date have been found with 3.0, not with 4.0,
> and found by user report.  We agreed a long time ago that we would aim for
> 4.0 to be a more stable release than any prior.  Today I think the only
> reason that might be true is the amount of work invested in fixing problems
> found in _earlier releases_, not due to verification of 4.0.
>
> I say this not to influence the decision about when and what lands in
> beta, only to ensure we stay honest with ourselves about our progress on
> quality.  I hope the software itself is higher quality today, but I do not
> believe it is honest to (yet) claim that our testing is significantly
> higher quality than those releases we all agree were inadequate.  There
> exists some wider external use case testing, but being mostly invisible to
> the community it is unclear how much broader our coverage is with these
> included.
>
> On 16/06/2020, 23:08, "David Capwell" <dc...@apple.com.INVALID> wrote:
>
>     Inline
>
>     > On Jun 16, 2020, at 2:17 PM, Joshua McKenzie <jm...@apache.org>
> wrote:
>     >
>     >>
>     >> we still produce incorrect results as shown by CASSANDRA-15313;
> this is a
>     >> correctness issue, so must be a blocker for v5 protocol.
>     >
>     > That makes complete sense; I'd somehow missed the incorrect results
> aspect
>     > in trying to get context on the work. I'd be eager to hear about
> progress
>     > on it as well.
>     >
>     > Regarding the question of "why would users test if we haven't tested
> yet",
>     > I respectfully disagree both on the assertion we haven't tested yet
> as well
>     > as on the distinction between an "us vs. them" in the community.
> We're all
>     > users and participants in the Cassandra community and ecosystem so
> anyone
>     > downloading the DB to test it out is just as vital as one of us from
> the
>     > dev list, committer list, or pmc list testing out the DB.
>
>     I apologize if I came off as discriminatory. I will try to absorb your
> words carefully; thank you for correcting my behavior.
>
>     > While we can
>     > reasonably expect a dev paid full time working on the project with a
> large
>     > amount of infrastructure doing testing to be crucial to getting a
> release
>     > out and doing certain kinds of testing, there are literally
> thousands of
>     > different companies out in the world basing their critical
> infrastructure
>     > on this project and them testing out their use-cases and migration
> is just
>     > as critical to this release being ready. It takes a village.
>
>     I do agree that user validation is important for the release, I was
> mostly trying to question why start here before the testing work in JIRA is
> complete.  Maybe I am in the wrong, I have been heads down working on data
> corruption issues in 3.x; I have become more risk averse.
>
>     >
>     > Further, we have thousands of tests across all our suites, hundreds
> of new
>     > use-case testing that has been done against 4.0 at this point, and
> 30+%
>     > more bugs fixed in this release than 3.0; the blanket assertion that
> we
>     > haven't tested 4.0 yet doesn't resonate with me. While we haven't
> done the
>     > entirety of our final 40 beta phase testing yet, testing is
> constantly
>     > going on against this codebase by both people on the ML and off.
>     >
>     > Now, if there are major known glaring issues where we have problems
> that
>     > would prevent users from actually testing out the beta and kicking
> the
>     > tires, that's a different story entirely and I'd argue those tickets
> should
>     > be reflected in the alpha phase (see: CASSANDRA-15299 apparently ;) )
>     >
>     > Does that make sense?
>
>     I have been meaning to ask this, mostly asking people in Slack and
> this actually confuses me.
>
>     I was working off the assumption that the fix version meant it was a
> blocker for that release, and that alpha was special-cased and would have
> releases even with blocking issues (which is documented in the Release
> Lifecycle).  When I ask around I hear that this is not correct and that
> alpha means “blocks beta”, beta means “blocks RC”, etc. (Is any of this
> documented? I couldn’t find anything last time I was talking to others
> about this.)
>
>     Now, let’s say we close alpha and cut a beta release. My understanding
> is that tickets which block the next beta release are alpha…. So do we
> still mark them alpha (even though we won’t have an alpha release)?
>
>     This has been confusing me since beta has a lot of work pending… sorry
> for not bringing this up in a dedicated dev@ thread.
>
>
>     >
>     > On Tue, Jun 16, 2020 at 4:58 PM Benedict Elliott Smith <
> benedict@apache.org>
>     > wrote:
>     >
>     >> So, if it helps matters: I am explicitly -1 the prior version of
> this work
>     >> due to the technical concerns expressed here and on the ticket.  So
> we
>     >> either need to revert that patch or incorporate 15299.
>     >>
>     >> On 16/06/2020, 21:48, "Mick Semb Wever" <mc...@apache.org> wrote:
>     >>
>     >>>
>     >>> 2) Alternatively, it's been 3 years, 4 months, 13 days since the
>     >> release of
>     >>> 3.10.0 (the last time we added new features to the DB)
>     >>>
>     >>
>     >>
>     >>    We did tick-tock, pushing feature releases too quickly, and
> without
>     >>    supporting them for long enough to get stable. And then we've
> done "a
>     >> la no
>     >>    feature releases" for over 3 years. It feels like the bar went
> from
>     >> too low
>     >>    to too high.
>     >>
>     >>    I understand the importance of CASSANDRA-15299. But it hasn't
> had any
>     >>    comments in twelve days, and in this stage of the feature
> freeze,
>     >> with
>     >>    so few alpha bugs remaining, that's a long time. Sam, can you
> speak to
>     >> its
>     >>    eta?
>     >>
>     >>
>     >>
>     >>> 4) If we plan on releasing 4.1 six months after the release of 4.0
>     >> (i.e.
>     >>> calendar scope vs. feature scope - not yet agreed upon but an
>     >> option),
>     >>
>     >>
>     >>
>     >>    I like this. I think it's worth appreciating the different
>     >> perspectives of
>     >>    this community: those involved with private clusters that don't
> rely on
>     >>    official releases, versus those involved with the public and
> other
>     >> people's
>     >>    clusters. The latter group needs those official releases much
> more, but
>     >>    this also ties into putting those users more in focus and
> figuring out
>     >>    where the bar best sits. This isn't meant to divide, we all care
> and
>     >> voice
>     >>    for the user, but just to utilise the different strengths
> brought to
>     >> the
>     >>    table.
>     >>
>     >>
>     >>> If we want 4.0.0 out faster, the biggest gains would be to get the
>     >> test
>     >>    plans written up and get more people working on automated
> testing.
>     >>
>     >>
>     >>    Yes, 110%.  Though, as long as this continues to improve, as it
> has,
>     >> does
>     >>    it need to be a blocker on 4.0?
>     >>
>     >>
>     >>
>     >>


Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by Joshua McKenzie <jm...@apache.org>.

I completely respect and agree with the need for a drumbeat to change our
culture around testing and quality; I also agree we haven't done much to
materially change that uniquely to 4.0. The 40_quality_testing epic is our
first step in that direction though I have some personal concerns about
leaning on bespoke manual testing for quality since we humans are
infinitely fallible. :)

What elicited that response from me is the claim that we haven't yet tested
the software, implicitly invalidating the time and energy the community has
put into that thus far. I wouldn't argue that we've adequately tested for a
GA release, certainly, but we're discussing beta in this thread. As a
project, the advice we have about the testing and usage of the beta is
something along the lines of "use this in test/QA and only in cases where
minutes of downtime is acceptable." Perhaps we should consider revising the
release lifecycle on the wiki if this is something we're not aligned on?

To your point above, the problems found to date were largely with 3.0 and
found by user report and not by project developer testing. The sooner we
can get the 4.0 beta into the hands of the community, the sooner we can get
more of those reports while we also work to broaden and deepen our
programmatic testing frameworks and platforms. (To acknowledge: I presume
that a majority of the user testing that surfaced defects in 3.0 came from
one large user's investment of time and resources, however historically on
the project we've had a large number of defects surfaced by a diverse
collection of users and I'd like to see us move in that direction again for
the long-term health of the project. Hence my attempts to move us towards
beta and take on an awareness campaign and call to action for the community
to engage in testing.)


On Tue, Jun 16, 2020 at 6:37 PM Benedict Elliott Smith <be...@apache.org>
wrote:

> > Further, we have thousands of tests across all our suites
>
> I think most here would agree that our testing remains inadequate, and
> that this (modest, even in pure numerical terms for such a large project)
> number of often poorly-written unit tests does not really change that fact.
>
> Most of the problems found to date have been found with 3.0, not with 4.0,
> and found by user report.  We agreed a long time ago that we would aim for
> 4.0 to be a more stable release than any prior.  Today I think the only
> reason that might be true is the amount of work invested in fixing problems
> found in _earlier releases_, not due to verification of 4.0.
>
> I say this not to influence the decision about when and what lands in
> beta, only to ensure we stay honest with ourselves about our progress on
> quality.  I hope the software itself is higher quality today, but I do not
> believe it is honest to (yet) claim that our testing is significantly
> higher quality than those releases we all agree were inadequate.  There
> exists some wider external use case testing, but being mostly invisible to
> the community it is unclear how much broader our coverage is with these
> included.
>
> On 16/06/2020, 23:08, "David Capwell" <dc...@apple.com.INVALID> wrote:
>
>     Inline
>
>     > On Jun 16, 2020, at 2:17 PM, Joshua McKenzie <jm...@apache.org>
> wrote:
>     >
>     >>
>     >> we still produce incorrect results as shown by CASSANDRA-15313;
> this is a
>     >> correctness issue, so must be a blocker for v5 protocol.
>     >
>     > That makes complete sense; I'd somehow missed the incorrect results
> aspect
>     > in trying to get context on the work. I'd be eager to hear about
> progress
>     > on it as well.
>     >
>     > Regarding the question of "why would users test if we haven't tested
> yet",
>     > I respectfully disagree both on the assertion we haven't tested yet
> as well
>     > as on the distinction between an "us vs. them" in the community.
> We're all
>     > users and participants in the Cassandra community and ecosystem so
> anyone
>     > downloading the DB to test it out is just as vital as one of us from
> the
>     > dev list, committer list, or pmc list testing out the DB.
>
>     I apologize if I came off as discriminatory. I will try to absorb your
> words carefully; thank you for correcting my behavior.
>
>     > While we can
>     > reasonably expect a dev paid full time working on the project with a
> large
>     > amount of infrastructure doing testing to be crucial to getting a
> release
>     > out and doing certain kinds of testing, there are literally
> thousands of
>     > different companies out in the world basing their critical
> infrastructure
>     > on this project and them testing out their use-cases and migration
> is just
>     > as critical to this release being ready. It takes a village.
>
>     I do agree that user validation is important for the release, I was
> mostly trying to question why start here before the testing work in JIRA is
> complete.  Maybe I am in the wrong, I have been heads down working on data
> corruption issues in 3.x; I have become more risk averse.
>
>     >
>     > Further, we have thousands of tests across all our suites, hundreds
> of new
>     > use-case testing that has been done against 4.0 at this point, and
> 30+%
>     > more bugs fixed in this release than 3.0; the blanket assertion that
> we
>     > haven't tested 4.0 yet doesn't resonate with me. While we haven't
> done the
>     > entirety of our final 40 beta phase testing yet, testing is
> constantly
>     > going on against this codebase by both people on the ML and off.
>     >
>     > Now, if there are major known glaring issues where we have problems
> that
>     > would prevent users from actually testing out the beta and kicking
> the
>     > tires, that's a different story entirely and I'd argue those tickets
> should
>     > be reflected in the alpha phase (see: CASSANDRA-15299 apparently ;) )
>     >
>     > Does that make sense?
>
>     I have been meaning to ask this, mostly asking people in Slack and
> this actually confuses me.
>
>     I was working off the assumption that the fix version meant it was a
> blocker for that release, and that alpha was special-cased and would have
> releases even with blocking issues (which is documented in the Release
> Lifecycle).  When I ask around I hear that this is not correct and that
> alpha means “blocks beta”, beta means “blocks RC”, etc. (Is any of this
> documented? I couldn’t find anything last time I was talking to others
> about this.)
>
>     Now, let’s say we close alpha and cut a beta release. My understanding
> is that tickets which block the next beta release are alpha…. So do we
> still mark them alpha (even though we won’t have an alpha release)?
>
>     This has been confusing me since beta has a lot of work pending… sorry
> for not bringing this up in a dedicated dev@ thread.
>
>
>     >
>     > On Tue, Jun 16, 2020 at 4:58 PM Benedict Elliott Smith <
> benedict@apache.org>
>     > wrote:
>     >
>     >> So, if it helps matters: I am explicitly -1 the prior version of
> this work
>     >> due to the technical concerns expressed here and on the ticket.  So
> we
>     >> either need to revert that patch or incorporate 15299.
>     >>
>     >> On 16/06/2020, 21:48, "Mick Semb Wever" <mc...@apache.org> wrote:
>     >>
>     >>>
>     >>> 2) Alternatively, it's been 3 years, 4 months, 13 days since the
>     >> release of
>     >>> 3.10.0 (the last time we added new features to the DB)
>     >>>
>     >>
>     >>
>     >>    We did tick-tock, pushing feature releases too quickly, and
> without
>     >>    supporting them for long enough to get stable. And then we've
> done "a
>     >> la no
>     >>    feature releases" for over 3 years. It feels like the bar went
> from
>     >> too low
>     >>    to too high.
>     >>
>     >>    I understand the importance of CASSANDRA-15299. But it hasn't
> had any
>     >>    comments in 12 twelve days, and in this stage of the feature
> freeze,
>     >> with
>     >>    so few alpha bugs remaining, that's a long time. Sam, can you
> speak to
>     >> its
>     >>    eta?
>     >>
>     >>
>     >>
>     >>> 4) If we plan on releasing 4.1 six months after the release of 4.0
>     >> (i.e.
>     >>> calender scope vs. feature scope - not yet agreed upon but an
>     >> option),
>     >>
>     >>
>     >>
>     >>    I like this. I think it's worth appreciating the different
>     >> perspectives of
>     >>    this community: those involved with private clusters that don't
> rely on
>     >>    official releases, versus those involved with the public and
> other
>     >> people's
>     >>    clusters. The latter group needs those official releases much
> more, but
>     >>    this also ties into putting those users more in focus and
> figuring out
>     >>    where the bar best sits. This isn't meant to divide, we all care
> and
>     >> voice
>     >>    for the user, but just to utilise the different strengths
> brought to
>     >> the
>     >>    table.
>     >>
>     >>
>     >>> If we want 4.0.0 out faster, the biggest gains would be to get the
>     >> test
>     >>    plans written up and get more people working on automated
> testing.
>     >>
>     >>
>     >>    Yes, 110%.  Though, as long as this continues to improve, as it
> has,
>     >> does
>     >>    it need to be a blocker on 4.0?
>     >>
>     >>
>     >>
>     >>
> ---------------------------------------------------------------------
>     >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>     >> For additional commands, e-mail: dev-help@cassandra.apache.org
>     >>
>     >>
>
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>     For additional commands, e-mail: dev-help@cassandra.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by Benedict Elliott Smith <be...@apache.org>.

> Further, we have thousands of tests across all our suites

I think most here would agree that our testing remains inadequate, and that this (modest, even in pure numerical terms for such a large project) number of often poorly-written unit tests does not really change that fact.

Most of the problems found to date have been found with 3.0, not with 4.0, and found by user report.  We agreed a long time ago that we would aim for 4.0 to be a more stable release than any prior.  Today I think the only reason that might be true is the amount of work invested in fixing problems found in _earlier releases_, not due to verification of 4.0.

I say this not to influence the decision about when and what lands in beta, only to ensure we stay honest with ourselves about our progress on quality.  I hope the software itself is higher quality today, but I do not believe it is honest to (yet) claim that our testing is significantly higher quality than those releases we all agree were inadequate.  There exists some wider external use case testing, but being mostly invisible to the community it is unclear how much broader our coverage is with these included.

On 16/06/2020, 23:08, "David Capwell" <dc...@apple.com.INVALID> wrote:

    Inline

    > On Jun 16, 2020, at 2:17 PM, Joshua McKenzie <jm...@apache.org> wrote:
    > 
    >> 
    >> we still produce incorrect results as shown by CASSANDRA-15313; this is a
    >> correctness issue, so must be a blocker for v5 protocol.
    > 
    > That makes complete sense; I'd somehow missed the incorrect results aspect
    > in trying to get context on the work. I'd be eager to hear about progress
    > on it as well.
    > 
    > Regarding the question of "why would users test if we haven't tested yet",
    > I respectfully disagree both on the assertion we haven't tested yet as well
    > as on the distinction between an "us vs. them" in the community. We're all
    > users and participants in the Cassandra community and ecosystem so anyone
    > downloading the DB to test it out is just as vital as one of us from the
    > dev list, committer list, or pmc list testing out the DB.

    I apologize if I came off as discriminatory. I will try to absorb your words carefully; thank you for correcting my behavior.

    > While we can
    > reasonably expect a dev paid full time working on the project with a large
    > amount of infrastructure doing testing to be crucial to getting a release
    > out and doing certain kinds of testing, there are literally thousands of
    > different companies out in the world basing their critical infrastructure
    > on this project and them testing out their use-cases and migration is just
    > as critical to this release being ready. It takes a village.

    I do agree that user validation is important for the release; I was mostly trying to question why we would start here before the testing work in JIRA is complete.  Maybe I am in the wrong: I have been heads down working on data corruption issues in 3.x, and I have become more risk averse.

    > 
    > Further, we have thousands of tests across all our suites, hundreds of new
    > use-case testing that has been done against 4.0 at this point, and 30+%
    > more bugs fixed in this release than 3.0; the blanket assertion that we
    > haven't tested 4.0 yet doesn't resonate with me. While we haven't done the
    > entirety of our final 40 beta phase testing yet, testing is constantly
    > going on against this codebase by both people on the ML and off.
    > 
    > Now, if there are major known glaring issues where we have problems that
    > would prevent users from actually testing out the beta and kicking the
    > tires, that's a different story entirely and I'd argue those tickets should
    > be reflected in the alpha phase (see: CASSANDRA-15299 apparently ;) )
    > 
    > Does that make sense?

    I have been meaning to ask about this; I have mostly been asking people in Slack, and it still confuses me.

    I was working off the assumption that the fix version meant the ticket was a blocker for that release, and that alpha was special-cased and would have releases even with blocking issues (which is documented in the Release Lifecycle).  When I ask around, I hear that this is not correct and that alpha means “blocks beta”, beta means “blocks RC”, etc. (Is any of this documented? I couldn’t find anything the last time I was talking to others about this.)

    Now, let’s say we close alpha and cut a beta release. My understanding is that tickets which block the next beta release are alpha… so do we still mark them alpha (even though we won’t have an alpha release)?

    This has been confusing me since beta has a lot of work pending… sorry for not bringing this up in a dedicated dev@ thread.


    > 
    > On Tue, Jun 16, 2020 at 4:58 PM Benedict Elliott Smith <be...@apache.org>
    > wrote:
    > 
    >> So, if it helps matters: I am explicitly -1 the prior version of this work
    >> due to the technical concerns expressed here and on the ticket.  So we
    >> either need to revert that patch or incorporate 15299.
    >> 
    >> On 16/06/2020, 21:48, "Mick Semb Wever" <mc...@apache.org> wrote:
    >> 
    >>> 
    >>> 2) Alternatively, it's been 3 years, 4 months, 13 days since the
    >> release of
    >>> 3.10.0 (the last time we added new features to the DB)
    >>> 
    >> 
    >> 
    >>    We did tick-tock, pushing feature releases too quickly, and without
    >>    supporting them for long enough to get stable. And then we've done "a
    >> la no
    >>    feature releases" for over 3 years. It feels like the bar went from
    >> too low
    >>    to too high.
    >> 
    >>    I understand the importance of CASSANDRA-15299. But it hasn't had any
    >>    comments in 12 twelve days, and in this stage of the feature freeze,
    >> with
    >>    so few alpha bugs remaining, that's a long time. Sam, can you speak to
    >> its
    >>    eta?
    >> 
    >> 
    >> 
    >>> 4) If we plan on releasing 4.1 six months after the release of 4.0
    >> (i.e.
    >>> calender scope vs. feature scope - not yet agreed upon but an
    >> option),
    >> 
    >> 
    >> 
    >>    I like this. I think it's worth appreciating the different
    >> perspectives of
    >>    this community: those involved with private clusters that don't rely on
    >>    official releases, versus those involved with the public and other
    >> people's
    >>    clusters. The latter group needs those official releases much more, but
    >>    this also ties into putting those users more in focus and figuring out
    >>    where the bar best sits. This isn't meant to divide, we all care and
    >> voice
    >>    for the user, but just to utilise the different strengths brought to
    >> the
    >>    table.
    >> 
    >> 
    >>> If we want 4.0.0 out faster, the biggest gains would be to get the
    >> test
    >>    plans written up and get more people working on automated testing.
    >> 
    >> 
    >>    Yes, 110%.  Though, as long as this continues to improve, as it has,
    >> does
    >>    it need to be a blocker on 4.0?
    >> 
    >> 
    >> 
    >> ---------------------------------------------------------------------
    >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
    >> For additional commands, e-mail: dev-help@cassandra.apache.org
    >> 
    >> 


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
    For additional commands, e-mail: dev-help@cassandra.apache.org





Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by David Capwell <dc...@apple.com.INVALID>.

Inline

> On Jun 16, 2020, at 2:17 PM, Joshua McKenzie <jm...@apache.org> wrote:
> 
>> 
>> we still produce incorrect results as shown by CASSANDRA-15313; this is a
>> correctness issue, so must be a blocker for v5 protocol.
> 
> That makes complete sense; I'd somehow missed the incorrect results aspect
> in trying to get context on the work. I'd be eager to hear about progress
> on it as well.
> 
> Regarding the question of "why would users test if we haven't tested yet",
> I respectfully disagree both on the assertion we haven't tested yet as well
> as on the distinction between an "us vs. them" in the community. We're all
> users and participants in the Cassandra community and ecosystem so anyone
> downloading the DB to test it out is just as vital as one of us from the
> dev list, committer list, or pmc list testing out the DB.

I apologize if I came off as discriminatory. I will try to absorb your words carefully; thank you for correcting my behavior.

> While we can
> reasonably expect a dev paid full time working on the project with a large
> amount of infrastructure doing testing to be crucial to getting a release
> out and doing certain kinds of testing, there are literally thousands of
> different companies out in the world basing their critical infrastructure
> on this project and them testing out their use-cases and migration is just
> as critical to this release being ready. It takes a village.

I do agree that user validation is important for the release; I was mostly trying to question why we would start here before the testing work in JIRA is complete.  Maybe I am in the wrong: I have been heads down working on data corruption issues in 3.x, and I have become more risk averse.

> 
> Further, we have thousands of tests across all our suites, hundreds of new
> use-case testing that has been done against 4.0 at this point, and 30+%
> more bugs fixed in this release than 3.0; the blanket assertion that we
> haven't tested 4.0 yet doesn't resonate with me. While we haven't done the
> entirety of our final 40 beta phase testing yet, testing is constantly
> going on against this codebase by both people on the ML and off.
> 
> Now, if there are major known glaring issues where we have problems that
> would prevent users from actually testing out the beta and kicking the
> tires, that's a different story entirely and I'd argue those tickets should
> be reflected in the alpha phase (see: CASSANDRA-15299 apparently ;) )
> 
> Does that make sense?

I have been meaning to ask about this; I have mostly been asking people in Slack, and it still confuses me.

I was working off the assumption that the fix version meant the ticket was a blocker for that release, and that alpha was special-cased and would have releases even with blocking issues (which is documented in the Release Lifecycle).  When I ask around, I hear that this is not correct and that alpha means “blocks beta”, beta means “blocks RC”, etc. (Is any of this documented? I couldn’t find anything the last time I was talking to others about this.)

Now, let’s say we close alpha and cut a beta release. My understanding is that tickets which block the next beta release are alpha… so do we still mark them alpha (even though we won’t have an alpha release)?

This has been confusing me since beta has a lot of work pending… sorry for not bringing this up in a dedicated dev@ thread.


> 
> On Tue, Jun 16, 2020 at 4:58 PM Benedict Elliott Smith <be...@apache.org>
> wrote:
> 
>> So, if it helps matters: I am explicitly -1 the prior version of this work
>> due to the technical concerns expressed here and on the ticket.  So we
>> either need to revert that patch or incorporate 15299.
>> 
>> On 16/06/2020, 21:48, "Mick Semb Wever" <mc...@apache.org> wrote:
>> 
>>> 
>>> 2) Alternatively, it's been 3 years, 4 months, 13 days since the
>> release of
>>> 3.10.0 (the last time we added new features to the DB)
>>> 
>> 
>> 
>>    We did tick-tock, pushing feature releases too quickly, and without
>>    supporting them for long enough to get stable. And then we've done "a
>> la no
>>    feature releases" for over 3 years. It feels like the bar went from
>> too low
>>    to too high.
>> 
>>    I understand the importance of CASSANDRA-15299. But it hasn't had any
>>    comments in 12 twelve days, and in this stage of the feature freeze,
>> with
>>    so few alpha bugs remaining, that's a long time. Sam, can you speak to
>> its
>>    eta?
>> 
>> 
>> 
>>> 4) If we plan on releasing 4.1 six months after the release of 4.0
>> (i.e.
>>> calender scope vs. feature scope - not yet agreed upon but an
>> option),
>> 
>> 
>> 
>>    I like this. I think it's worth appreciating the different
>> perspectives of
>>    this community: those involved with private clusters that don't rely on
>>    official releases, versus those involved with the public and other
>> people's
>>    clusters. The latter group needs those official releases much more, but
>>    this also ties into putting those users more in focus and figuring out
>>    where the bar best sits. This isn't meant to divide, we all care and
>> voice
>>    for the user, but just to utilise the different strengths brought to
>> the
>>    table.
>> 
>> 
>>> If we want 4.0.0 out faster, the biggest gains would be to get the
>> test
>>    plans written up and get more people working on automated testing.
>> 
>> 
>>    Yes, 110%.  Though, as long as this continues to improve, as it has,
>> does
>>    it need to be a blocker on 4.0?
>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
>> 



Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by Joshua McKenzie <jm...@apache.org>.

>
> we still produce incorrect results as shown by CASSANDRA-15313; this is a
> correctness issue, so must be a blocker for v5 protocol.

That makes complete sense; I'd somehow missed the incorrect results aspect
in trying to get context on the work. I'd be eager to hear about progress
on it as well.

Regarding the question of "why would users test if we haven't tested yet",
I respectfully disagree both with the assertion that we haven't tested yet
and with drawing an "us vs. them" distinction in the community. We're all
users and participants in the Cassandra community and ecosystem so anyone
downloading the DB to test it out is just as vital as one of us from the
dev list, committer list, or pmc list testing out the DB. While we can
reasonably expect a dev paid full time working on the project with a large
amount of infrastructure doing testing to be crucial to getting a release
out and doing certain kinds of testing, there are literally thousands of
different companies out in the world basing their critical infrastructure
on this project and them testing out their use-cases and migration is just
as critical to this release being ready. It takes a village.

Further, we have thousands of tests across all our suites, hundreds of new
use cases that have been tested against 4.0 at this point, and 30+% more
bugs fixed in this release than 3.0; the blanket assertion that we haven't
tested 4.0 yet doesn't resonate with me. While we haven't done the entirety
of our final 4.0 beta phase testing yet, testing is constantly going on
against this codebase by people both on the ML and off.

Now, if there are major known glaring issues where we have problems that
would prevent users from actually testing out the beta and kicking the
tires, that's a different story entirely and I'd argue those tickets should
be reflected in the alpha phase (see: CASSANDRA-15299 apparently ;) )

Does that make sense?

On Tue, Jun 16, 2020 at 4:58 PM Benedict Elliott Smith <be...@apache.org>
wrote:

> So, if it helps matters: I am explicitly -1 the prior version of this work
> due to the technical concerns expressed here and on the ticket.  So we
> either need to revert that patch or incorporate 15299.
>
> On 16/06/2020, 21:48, "Mick Semb Wever" <mc...@apache.org> wrote:
>
>     >
>     > 2) Alternatively, it's been 3 years, 4 months, 13 days since the
> release of
>     > 3.10.0 (the last time we added new features to the DB)
>     >
>
>
>     We did tick-tock, pushing feature releases too quickly, and without
>     supporting them for long enough to get stable. And then we've done "a
> la no
>     feature releases" for over 3 years. It feels like the bar went from
> too low
>     to too high.
>
>     I understand the importance of CASSANDRA-15299. But it hasn't had any
>     comments in 12 twelve days, and in this stage of the feature freeze,
> with
>     so few alpha bugs remaining, that's a long time. Sam, can you speak to
> its
>     eta?
>
>
>
>     > 4) If we plan on releasing 4.1 six months after the release of 4.0
> (i.e.
>     > calender scope vs. feature scope - not yet agreed upon but an
> option),
>
>
>
>     I like this. I think it's worth appreciating the different
> perspectives of
>     this community: those involved with private clusters that don't rely on
>     official releases, versus those involved with the public and other
> people's
>     clusters. The latter group needs those official releases much more, but
>     this also ties into putting those users more in focus and figuring out
>     where the bar best sits. This isn't meant to divide, we all care and
> voice
>     for the user, but just to utilise the different strengths brought to
> the
>     table.
>
>
>     > If we want 4.0.0 out faster, the biggest gains would be to get the
> test
>     plans written up and get more people working on automated testing.
>
>
>     Yes, 110%.  Though, as long as this continues to improve, as it has,
> does
>     it need to be a blocker on 4.0?
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by Benedict Elliott Smith <be...@apache.org>.

So, if it helps matters: I am explicitly -1 on the prior version of this work due to the technical concerns expressed here and on the ticket.  So we either need to revert that patch or incorporate 15299.

On 16/06/2020, 21:48, "Mick Semb Wever" <mc...@apache.org> wrote:

    >
    > 2) Alternatively, it's been 3 years, 4 months, 13 days since the release of
    > 3.10.0 (the last time we added new features to the DB)
    >


    We did tick-tock, pushing feature releases too quickly, and without
    supporting them for long enough to get stable. And then we've done "a la no
    feature releases" for over 3 years. It feels like the bar went from too low
    to too high.

    I understand the importance of CASSANDRA-15299. But it hasn't had any
    comments in 12 twelve days, and in this stage of the feature freeze, with
    so few alpha bugs remaining, that's a long time. Sam, can you speak to its
    eta?



    > 4) If we plan on releasing 4.1 six months after the release of 4.0 (i.e.
    > calender scope vs. feature scope - not yet agreed upon but an option),



    I like this. I think it's worth appreciating the different perspectives of
    this community: those involved with private clusters that don't rely on
    official releases, versus those involved with the public and other people's
    clusters. The latter group needs those official releases much more, but
    this also ties into putting those users more in focus and figuring out
    where the bar best sits. This isn't meant to divide, we all care and voice
    for the user, but just to utilise the different strengths brought to the
    table.


    > If we want 4.0.0 out faster, the biggest gains would be to get the test
    plans written up and get more people working on automated testing.


    Yes, 110%.  Though, as long as this continues to improve, as it has, does
    it need to be a blocker on 4.0?




Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by Mick Semb Wever <mc...@apache.org>.

>
> 2) Alternatively, it's been 3 years, 4 months, 13 days since the release of
> 3.10.0 (the last time we added new features to the DB)
>


We did tick-tock, pushing feature releases too quickly, and without
supporting them for long enough to get stable. And then we've done "a la no
feature releases" for over 3 years. It feels like the bar went from too low
to too high.

I understand the importance of CASSANDRA-15299. But it hasn't had any
comments in twelve days, and at this stage of the feature freeze, with
so few alpha bugs remaining, that's a long time. Sam, can you speak to its
ETA?



> 4) If we plan on releasing 4.1 six months after the release of 4.0 (i.e.
> calender scope vs. feature scope - not yet agreed upon but an option),



I like this. I think it's worth appreciating the different perspectives of
this community: those involved with private clusters that don't rely on
official releases, versus those involved with the public and other people's
clusters. The latter group needs those official releases much more, but
this also ties into putting those users more in focus and figuring out
where the bar best sits. This isn't meant to divide, we all care and voice
for the user, but just to utilise the different strengths brought to the
table.


> If we want 4.0.0 out faster, the biggest gains would be to get the test
> plans written up and get more people working on automated testing.


Yes, 110%.  Though, as long as this continues to improve, as it has, does
it need to be a blocker on 4.0?

Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by David Capwell <dc...@apple.com.INVALID>.

inline

> On Jun 16, 2020, at 11:01 AM, Joshua McKenzie <jm...@apache.org> wrote:
> 
>> 
>> CASSANDRA-15299 - this should be a blocker for v5,
> 
> Could you explain a little more about this? I'm missing context.

V5 added checksumming but had two main issues: it lacked a header checksum, and the checksum was computed on the decompressed stream rather than the compressed stream. The second issue is the bigger one, since it was shown to cause Cassandra to crash (see CASSANDRA-15556) and return incorrect results (see CASSANDRA-15313, which fails frequently, showing even more streams where Cassandra returns bad results).  I added a patch to make sure we don’t crash anymore, but we still produce incorrect results as shown by CASSANDRA-15313; this is a correctness issue, so it must be a blocker for the v5 protocol.
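
As a rough illustration of the two fixes described above (a header checksum, and a payload checksum taken over the compressed bytes so corruption is caught before decompression), here is a minimal sketch. It is not the v5 wire format or the CASSANDRA-15299 implementation: the frame layout, the class name ChecksummedFrameSketch, and the use of CRC32 with Deflater/Inflater are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical frame layout (an assumption, NOT the real v5 format):
// [int compressedLen][int uncompressedLen][int headerCrc][compressed bytes][int payloadCrc]
final class ChecksummedFrameSketch {
    private static final int HEADER_LEN = 8; // the two length fields
    private static final int CRC_LEN = 4;

    static int crc32(byte[] data, int offset, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, offset, length);
        return (int) crc.getValue();
    }

    static byte[] encode(byte[] body) {
        // Compress first, so the payload checksum can cover the compressed bytes.
        Deflater deflater = new Deflater();
        deflater.setInput(body);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            compressed.write(chunk, 0, n);
        }
        deflater.end();
        byte[] payload = compressed.toByteArray();

        ByteBuffer frame = ByteBuffer.allocate(HEADER_LEN + CRC_LEN + payload.length + CRC_LEN);
        frame.putInt(payload.length).putInt(body.length);
        frame.putInt(crc32(frame.array(), 0, HEADER_LEN)); // header checksum
        frame.put(payload);
        frame.putInt(crc32(payload, 0, payload.length));   // checksum of the compressed bytes
        return frame.array();
    }

    static byte[] decode(byte[] frame) {
        int payloadStart = HEADER_LEN + CRC_LEN;
        if (frame.length < payloadStart + CRC_LEN) {
            throw new IllegalStateException("frame too short");
        }
        ByteBuffer in = ByteBuffer.wrap(frame);
        int compressedLen = in.getInt();
        int uncompressedLen = in.getInt();
        // Verify the header before trusting either length field.
        if (in.getInt() != crc32(frame, 0, HEADER_LEN)) {
            throw new IllegalStateException("corrupt header");
        }
        if (uncompressedLen < 0 || compressedLen != frame.length - payloadStart - CRC_LEN) {
            throw new IllegalStateException("length mismatch");
        }
        // Verify the compressed bytes before handing them to the decompressor.
        int expected = ByteBuffer.wrap(frame, payloadStart + compressedLen, CRC_LEN).getInt();
        if (expected != crc32(frame, payloadStart, compressedLen)) {
            throw new IllegalStateException("corrupt payload");
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(frame, payloadStart, compressedLen);
            byte[] body = new byte[uncompressedLen];
            int total = 0;
            while (total < body.length) {
                int n = inflater.inflate(body, total, body.length - total);
                if (n == 0) break; // no more output available
                total += n;
            }
            if (total != body.length) {
                throw new IllegalStateException("unexpected decompressed length");
            }
            return body;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

In this sketch, decode(encode(body)) returns the original bytes, while flipping a bit in either the header or the compressed payload makes decode throw before any decompression is attempted.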

> 
> punting these features don’t really get 4.0.0 released any faster.
> 
> GA, no. Beta, where we have a large call to arms to get broad user testing
> and adoption in QA and dev environments, however, I'd contend yes.

I think you and I see this very differently: why should users push to QA if we have not tested yet?  Maybe I missed a lot; I have been ignoring dev@ and JIRA for a while now since I am mostly dealing with corruption issues and performance issues upgrading from 2.x to 3.x, so forgive me if I missed any conversations.  Is there a reason we want users to start testing before we start the testing effort?  Do we expect users to detect correctness issues, or are we looking for crash reports only?  Shouldn’t we at least have a baseline of longevity/upgrade testing done before we ask our users to do that for us?


> The
> sooner we can get this into the hands of the community to test it, the
> better in terms of the stability of the final release IMO.
> 
> 
> 
> On Tue, Jun 16, 2020 at 1:21 PM David Capwell <dc...@apple.com.invalid>
> wrote:
> 
>> CASSANDRA-15146 and CASSANDRA-14825 both can be rolled out and be
>> backwards compatible, so 4.0 vs 4.1 is fine to me
>> CASSANDRA-15299 - this should be a blocker for v5, so if this was punted
>> out of 4.0 for any reason, we should also disable v5 protocol.  About it
>> being a blocker for beta, since this would break the API, it has to be done
>> pre-beta.  Anything which relies on v5 should also be disabled and removed
>> from user facing docs.  Given this, I am more in favor of keeping it in 4.0
>> 
>>> The top 5
>>> drivers alone see tens of thousands of aggregate downloads a day; getting
>>> all 5 of those in parity w/the new featureset and to be tested during the
>>> GA phase is going to be very difficult with driver impacting, significant
>>> protocol changes this late in the alpha cycle (this would argue for them
>>> being pushed to 4.1; including here just to point out the ambivalent PoV
>>> here)
>> 
>> For 14825 it would expose a way for clients to learn the schema without
>> needing to know how we store the schema, which should make it less brittle
>> the next time the implementation changes; given that this would be optional
>> to clients, I don’t see why clients can’t migrate to this at their own rate.
>> 15299 is an interface breaking change, but for a new interface that has
>> not yet been released; given the fact v4 is still supported, clients can
>> migrate over time.
>> 
>> Clients have a decoupled life cycle from Apache Cassandra, and the
>> features in question are optional for clients, so I find it reasonable that
>> clients migrate at their own rate.  As long as we don’t break the existing
>> APIs, I don’t see why client implementations must be a blocker for Apache
>> Cassandra to release.
>> 
>>> That
>>> said, trying to put myself in the shoes of an end user that hasn't seen a
>>> material functionality upgrade in 3+ years and could be testing out and
>>> using zero-copy streaming, audit logging, the new messaging service code,
>>> and the hundreds of bugfixes and almost 300 improvements already in 4.0
>> - I
>>> think the value in getting this release in my hands would outweigh the
>>> value in getting these three particular features in 4.0 vs. 4.1.
>> 
>> The biggest body of work I know of is testing and the fact we are missing
>> a lot of tests currently.  Given this, the biggest effort remaining is
>> working down this and resolving issues found; so punting these features
>> don’t really get 4.0.0 released any faster.  If we want 4.0.0 out faster,
>> the biggest gains would be to get the test plans written up and get more
>> people working on automated testing.
>> 
>> 
>>> On Jun 16, 2020, at 9:13 AM, Joshua McKenzie <jm...@apache.org>
>> wrote:
>>> 
>>> I wanted to open up a discussion about optionality of a few tickets for
>>> 4.0. The three I'm specifically thinking of here are:
>>> 1) CASSANDRA-15146: Transition TLS server configuration options are
>> overly
>>> complex
>>> 2) CASSANDRA-14825: Expose table schema for drivers
>>> 3) CASSANDRA-15299: CASSANDRA-13304 follow-up: improve checksumming and
>>> compression in protocol v5-beta
>>> 
>>> I am *personally* of the opinion that each of these three should be
>>> considered optional for 4.0 and not blockers to cut beta. My reasoning:
>>> 1) it's been 4 years, 7 months, 11 days since the release of 3.0.0
>>> 2) Alternatively, it's been 3 years, 4 months, 13 days since the release
>> of
>>> 3.10.0 (the last time we added new features to the DB)
>>> 3) 2 of the 3 tickets involve non-trivial changes to the drivers. The
>> top 5
>>> drivers alone see tens of thousands of aggregate downloads a day; getting
>>> all 5 of those in parity w/the new featureset and to be tested during the
>>> GA phase is going to be very difficult with driver impacting, significant
>>> protocol changes this late in the alpha cycle (this would argue for them
>>> being pushed to 4.1; including here just to point out the ambivalent PoV
>>> here)
>>> 4) If we plan on releasing 4.1 six months after the release of 4.0 (i.e.
>>> calender scope vs. feature scope - not yet agreed upon but an option), we
>>> would be looking at a relatively trivial delay of the addition of
>>> "nice-to-have" features relative to broader infrastructure adoption
>> cycles.
>>> 
>>> I know this is a controversial topic, and I've spoken with many of you
>> that
>>> are working on or reviewing the above tickets - your points of view and
>>> arguments in favor of keeping them in 4.0 definitely resonate with me.
>> That
>>> said, trying to put myself in the shoes of an end user that hasn't seen a
>>> material functionality upgrade in 3+ years and could be testing out and
>>> using zero-copy streaming, audit logging, the new messaging service code,
>>> and the hundreds of bugfixes and almost 300 improvements already in 4.0
>> - I
>>> think the value in getting this release in my hands would outweigh the
>>> value in getting these three particular features in 4.0 vs. 4.1.
>>> 
>>> Also, to reiterate, I would personally advocate for these three tickets
>>> being *optional*, meaning if we merge the 1 awaiting review and 5 in
>>> review, then we push them to 4.1.
>>> 
>>> So - what does everyone else think?
>>> 
>>> ~Josh

Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by Joshua McKenzie <jm...@apache.org>.

>
> CASSANDRA-15299 - this should be a blocker for v5,

Could you explain a little more about this? I'm missing context.

> punting these features don’t really get 4.0.0 released any faster.

GA, no. Beta, where we have a large call to arms to get broad user testing
and adoption in QA and dev environments, however, I'd contend yes. The
sooner we can get this into the hands of the community to test it, the
better in terms of the stability of the final release IMO.



On Tue, Jun 16, 2020 at 1:21 PM David Capwell <dc...@apple.com.invalid>
wrote:

> CASSANDRA-15146 and CASSANDRA-14825 both can be rolled out and be
> backwards compatible, so 4.0 vs 4.1 is fine to me
> CASSANDRA-15299 - this should be a blocker for v5, so if this was punted
> out of 4.0 for any reason, we should also disable v5 protocol.  About it
> being a blocker for beta, since this would break the API, it has to be done
> pre-beta.  Anything which relies on v5 should also be disabled and removed
> from user facing docs.  Given this, I am more in favor of keeping it in 4.0
>
> > The top 5
> > drivers alone see tens of thousands of aggregate downloads a day; getting
> > all 5 of those in parity w/the new featureset and to be tested during the
> > GA phase is going to be very difficult with driver impacting, significant
> > protocol changes this late in the alpha cycle (this would argue for them
> > being pushed to 4.1; including here just to point out the ambivalent PoV
> > here)
>
> For 14825 it would expose a way for clients to learn the schema without
> needing to know how we store the schema, which should make it less brittle
> the next time the implementation changes; given that this would be optional
> to clients, I don’t see why clients can’t migrate to this at their own rate.
> 15299 is an interface breaking change, but for a new interface that has
> not yet been released; given the fact v4 is still supported, clients can
> migrate over time.
>
> Clients have a decoupled life cycle from Apache Cassandra, and the
> features in question are optional for clients, so I find it reasonable that
> clients migrate at their own rate.  As long as we don’t break the existing
> APIs, I don’t see why client implementations must be a blocker for Apache
> Cassandra to release.
>
> > That
> > said, trying to put myself in the shoes of an end user that hasn't seen a
> > material functionality upgrade in 3+ years and could be testing out and
> > using zero-copy streaming, audit logging, the new messaging service code,
> > and the hundreds of bugfixes and almost 300 improvements already in 4.0
> > - I think the value in getting this release in my hands would outweigh the
> > value in getting these three particular features in 4.0 vs. 4.1.
>
> The biggest body of work I know of is testing and the fact we are missing
> a lot of tests currently.  Given this, the biggest effort remaining is
> working down this and resolving issues found; so punting these features
> don’t really get 4.0.0 released any faster.  If we want 4.0.0 out faster,
> the biggest gains would be to get the test plans written up and get more
> people working on automated testing.
>
>
> > On Jun 16, 2020, at 9:13 AM, Joshua McKenzie <jm...@apache.org> wrote:
> >
> > I wanted to open up a discussion about optionality of a few tickets for
> > 4.0. The three I'm specifically thinking of here are:
> > 1) CASSANDRA-15146: Transition TLS server configuration options are
> > overly complex
> > 2) CASSANDRA-14825: Expose table schema for drivers
> > 3) CASSANDRA-15299: CASSANDRA-13304 follow-up: improve checksumming and
> > compression in protocol v5-beta
> >
> > I am *personally* of the opinion that each of these three should be
> > considered optional for 4.0 and not blockers to cut beta. My reasoning:
> > 1) it's been 4 years, 7 months, 11 days since the release of 3.0.0
> > 2) Alternatively, it's been 3 years, 4 months, 13 days since the release
> > of 3.10.0 (the last time we added new features to the DB)
> > 3) 2 of the 3 tickets involve non-trivial changes to the drivers. The
> > top 5 drivers alone see tens of thousands of aggregate downloads a day; getting
> > all 5 of those in parity w/the new featureset and to be tested during the
> > GA phase is going to be very difficult with driver impacting, significant
> > protocol changes this late in the alpha cycle (this would argue for them
> > being pushed to 4.1; including here just to point out the ambivalent PoV
> > here)
> > 4) If we plan on releasing 4.1 six months after the release of 4.0 (i.e.
> > calendar scope vs. feature scope - not yet agreed upon but an option), we
> > would be looking at a relatively trivial delay of the addition of
> > "nice-to-have" features relative to broader infrastructure adoption
> > cycles.
> >
> > I know this is a controversial topic, and I've spoken with many of you
> > that are working on or reviewing the above tickets - your points of view and
> > arguments in favor of keeping them in 4.0 definitely resonate with me.
> > That said, trying to put myself in the shoes of an end user that hasn't seen a
> > material functionality upgrade in 3+ years and could be testing out and
> > using zero-copy streaming, audit logging, the new messaging service code,
> > and the hundreds of bugfixes and almost 300 improvements already in 4.0
> > - I think the value in getting this release in my hands would outweigh the
> > value in getting these three particular features in 4.0 vs. 4.1.
> >
> > Also, to reiterate, I would personally advocate for these three tickets
> > being *optional*, meaning if we merge the 1 awaiting review and 5 in
> > review, then we push them to 4.1.
> >
> > So - what does everyone else think?
> >
> > ~Josh

Re: [DISCUSS] Considering when to push tickets out of 4.0

Posted by David Capwell <dc...@apple.com.INVALID>.

CASSANDRA-15146 and CASSANDRA-14825 can both be rolled out in a backwards-compatible way, so 4.0 vs 4.1 is fine to me.
CASSANDRA-15299 - this should be a blocker for v5, so if it were punted out of 4.0 for any reason, we should also disable the v5 protocol.  As for it being a blocker for beta: since this would break the API, it has to be done pre-beta.  Anything which relies on v5 should also be disabled and removed from user-facing docs.  Given this, I am more in favor of keeping it in 4.0.

> The top 5
> drivers alone see tens of thousands of aggregate downloads a day; getting
> all 5 of those in parity w/the new featureset and to be tested during the
> GA phase is going to be very difficult with driver impacting, significant
> protocol changes this late in the alpha cycle (this would argue for them
> being pushed to 4.1; including here just to point out the ambivalent PoV
> here)

For 14825 it would expose a way for clients to learn the schema without needing to know how we store the schema, which should make it less brittle the next time the implementation changes; given that this would be optional to clients, I don’t see why clients can’t migrate to this at their own rate.
15299 is an interface-breaking change, but for a new interface that has not yet been released; given that v4 is still supported, clients can migrate over time.

Clients have a decoupled life cycle from Apache Cassandra, and the features in question are optional for clients, so I find it reasonable that clients migrate at their own rate.  As long as we don’t break the existing APIs, I don’t see why client implementations must be a blocker for Apache Cassandra to release.

> That
> said, trying to put myself in the shoes of an end user that hasn't seen a
> material functionality upgrade in 3+ years and could be testing out and
> using zero-copy streaming, audit logging, the new messaging service code,
> and the hundreds of bugfixes and almost 300 improvements already in 4.0 - I
> think the value in getting this release in my hands would outweigh the
> value in getting these three particular features in 4.0 vs. 4.1.

The biggest body of work I know of is testing, and the fact that we are currently missing a lot of tests.  Given this, the biggest remaining effort is working through that backlog and resolving the issues found, so punting these features doesn’t really get 4.0.0 released any faster.  If we want 4.0.0 out faster, the biggest gains would come from getting the test plans written up and getting more people working on automated testing.

> On Jun 16, 2020, at 9:13 AM, Joshua McKenzie <jm...@apache.org> wrote:
> 
> I wanted to open up a discussion about optionality of a few tickets for
> 4.0. The three I'm specifically thinking of here are:
> 1) CASSANDRA-15146: Transition TLS server configuration options are overly
> complex
> 2) CASSANDRA-14825: Expose table schema for drivers
> 3) CASSANDRA-15299: CASSANDRA-13304 follow-up: improve checksumming and
> compression in protocol v5-beta
> 
> I am *personally* of the opinion that each of these three should be
> considered optional for 4.0 and not blockers to cut beta. My reasoning:
> 1) it's been 4 years, 7 months, 11 days since the release of 3.0.0
> 2) Alternatively, it's been 3 years, 4 months, 13 days since the release of
> 3.10.0 (the last time we added new features to the DB)
> 3) 2 of the 3 tickets involve non-trivial changes to the drivers. The top 5
> drivers alone see tens of thousands of aggregate downloads a day; getting
> all 5 of those in parity w/the new featureset and to be tested during the
> GA phase is going to be very difficult with driver impacting, significant
> protocol changes this late in the alpha cycle (this would argue for them
> being pushed to 4.1; including here just to point out the ambivalent PoV
> here)
> 4) If we plan on releasing 4.1 six months after the release of 4.0 (i.e.
> calendar scope vs. feature scope - not yet agreed upon but an option), we
> would be looking at a relatively trivial delay of the addition of
> "nice-to-have" features relative to broader infrastructure adoption cycles.
> 
> I know this is a controversial topic, and I've spoken with many of you that
> are working on or reviewing the above tickets - your points of view and
> arguments in favor of keeping them in 4.0 definitely resonate with me. That
> said, trying to put myself in the shoes of an end user that hasn't seen a
> material functionality upgrade in 3+ years and could be testing out and
> using zero-copy streaming, audit logging, the new messaging service code,
> and the hundreds of bugfixes and almost 300 improvements already in 4.0 - I
> think the value in getting this release in my hands would outweigh the
> value in getting these three particular features in 4.0 vs. 4.1.
> 
> Also, to reiterate, I would personally advocate for these three tickets
> being *optional*, meaning if we merge the 1 awaiting review and 5 in
> review, then we push them to 4.1.
> 
> So - what does everyone else think?
> 
> ~Josh
