
Posted to dev@cassandra.apache.org by Jon Haddad <jo...@jonhaddad.com> on 2020/06/30 19:43:52 UTC

[DISCUSS] Future of MVs

A couple days ago when writing a separate email I came across this DataStax
blog post discussing MVs [1].  Imagine my surprise when I noticed the date
was five years ago...

While at TLP, I helped numerous customers move off of MVs, mostly because
they affected stability of clusters in a horrific way.  The most telling
project involved helping someone create new tables to manage 1GB of data
because the views performed so poorly they made the cluster unresponsive
and unusable.  Despite being around for five years, they've seen very
little improvement that makes them usable for non-trivial, non-laptop
workloads.

Since the original commits, it doesn't look like there's been much work to
improve them, and they're yet another feature I ended up saying "just don't
use".  I haven't heard any plans to improve them in any meaningful way -
either to address their issues with performance or the inability to repair
them.

The original contributor of MVs (Carl Yeksigian) seems to have disappeared
from the project, meaning we have a broken feature without a maintainer,
and no plans to fix it.

As we move forward with the 4.0 release, we should consider this an
opportunity to deprecate materialized views, and remove them in 5.0.  We
should take this opportunity to learn from the mistake and raise the bar
for new features to undergo a much more thorough run through the wringer before
merging.

I'm curious what folks think - am I way off base here?  Am I missing a JIRA
that can magically fix the issues with performance, availability &
correctness?

[1] https://www.datastax.com/blog/2015/06/new-cassandra-30-materialized-views
[2] https://issues.apache.org/jira/browse/CASSANDRA-6477

Re: [DISCUSS] Future of MVs

Posted by Brandon Williams <dr...@gmail.com>.

+1

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Future of MVs

Posted by Blake Eggleston <be...@apple.com.INVALID>.

+1 for deprecation and removal (assuming a credible plan to fix them doesn't materialize)


Re: [DISCUSS] Future of MVs

Posted by Jeff Jirsa <jj...@gmail.com>.

On Tue, Jun 30, 2020 at 1:46 PM Joshua McKenzie <jm...@apache.org>
wrote:

> We're just short of 98 tickets on the component since its original merge
> so at least *some* work has been done to stabilize them. Not to say I'm
> endorsing running them at massive scale today without knowing what you're
> doing, to be clear. They are perhaps our largest loaded gun of a feature of
> self-foot-shooting atm. Zhao did a bunch of work on them internally and
> we've backported much of that to OSS; I've pinged him to chime in here.
>

Probably true.

>
> The "data is orphaned in your view when you lose all base replicas" issue
> is more or less "unsolvable", since a scan of a view to confirm data in the
> base table is so slow you're talking weeks to process and it totally
> trashes your page cache.

"Make the scan faster"
"Make the scan incremental and automatic"
"Make it not blow up your page cache"
"Make losing your base replicas less likely".

There's a concrete, real opportunity with MVs to create integrity
assertions we're missing. A dangling record from an MV that would point to
missing base data is something that could raise alarm bells and signal
JIRAs so we can potentially find and fix more surprise edge cases.
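The integrity assertion Jeff describes (flagging a view row whose base data is missing or no longer matches) can be sketched as a comparison of the two tables. This is a minimal in-memory Python sketch under stated assumptions: the base table is modeled as a dict from base primary key to row, the view as a list of (view key value, base primary key) pairs, and `find_dangling_view_rows` is a hypothetical helper, not a Cassandra API. A real check would have to page through both tables across replicas, which is exactly the cost Josh describes.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model, not a Cassandra API:
#   base: base primary key -> row (column name -> value)
#   view: list of (view key value, base primary key) pairs, i.e. a view
#         keyed by one base column that points back at the base row.
BaseTable = Dict[str, Dict[str, str]]
ViewRows = List[Tuple[str, str]]


def find_dangling_view_rows(
    base: BaseTable, view: ViewRows, view_column: str
) -> List[Tuple[str, str, str]]:
    """Return view rows that no longer agree with the base table.

    A view row is flagged when its base row is gone ("missing base row")
    or when the base row's view column now holds a different value
    ("stale view key").
    """
    dangling = []
    for view_value, base_pk in view:
        row = base.get(base_pk)
        if row is None:
            dangling.append((view_value, base_pk, "missing base row"))
        elif row.get(view_column) != view_value:
            dangling.append((view_value, base_pk, "stale view key"))
    return dangling


if __name__ == "__main__":
    base = {"u1": {"email": "a@x"}, "u2": {"email": "b@x"}}
    view = [("a@x", "u1"), ("old@x", "u2"), ("c@x", "u3")]
    for problem in find_dangling_view_rows(base, view, "email"):
        print(problem)
```

Making the check incremental, as Jeff suggests, would amount to running the same comparison over one slice of the key space at a time instead of in a single pass; that is a design direction, not an existing feature.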

> So from my PoV, I'm against us just voting to deprecate and remove without
> going into more depth into the current state of things and what options are
> on the table, since people will continue to build MVs at the client level
> which, in theory, should have worse correctness and performance
> characteristics than having a clean and well stabilized implementation in
> the coordinator.
>

Yanking features will definitely be painful for users. Leaving it
experimental seems much better for users as long as the
maintenance overhead is tolerable.

Re: [DISCUSS] Future of MVs

Posted by Jeremiah D Jordan <je...@datastax.com>.

> So from my PoV, I'm against us just voting to deprecate and remove without
> going into more depth into the current state of things and what options are
> on the table, since people will continue to build MVs at the client level
> which, in theory, should have worse correctness and performance
> characteristics than having a clean and well stabilized implementation in
> the coordinator.

I agree with Josh here.  Multiple people have put in effort to improve the stability of MVs since they were first put into the code base, and the reasons for having them in the DB have not changed.  Building MV-like tables at the client level is actually harder to get right than doing it in the server.

-Jeremiah
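Why client-level MV-like tables are hard to get right can be made concrete with a deliberately naive, single-process sketch. It assumes a hypothetical users table (user_id -> email) and a client-maintained lookup table (email -> user_id); the function names are illustrative only. Each client does a read-before-write to learn which lookup row to delete, then updates both tables. When two clients read before either writes, a lookup row is orphaned.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model: base maps user_id -> email, and the client-maintained
# "view" maps email -> user_id. Neither is a real Cassandra API.
Base = Dict[str, str]
View = Dict[str, str]


def read_current_email(base: Base, user_id: str) -> Optional[str]:
    # Step 1: read-before-write, to learn which view row must be removed.
    return base.get(user_id)


def write_update(base: Base, view: View, user_id: str,
                 old_email: Optional[str], new_email: str) -> None:
    # Step 2: update the base row.
    base[user_id] = new_email
    # Step 3: delete the view row for the value that was *read*, then
    # insert the view row for the new value.
    if old_email is not None:
        view.pop(old_email, None)
    view[new_email] = user_id


def interleaved_updates() -> Tuple[Base, View]:
    base: Base = {"u1": "old@x"}
    view: View = {"old@x": "u1"}
    # Two clients both read before either one writes.
    seen_by_a = read_current_email(base, "u1")
    seen_by_b = read_current_email(base, "u1")
    write_update(base, view, "u1", seen_by_a, "a@x")
    write_update(base, view, "u1", seen_by_b, "b@x")
    return base, view


if __name__ == "__main__":
    base, view = interleaved_updates()
    orphans = [email for email, uid in view.items() if base.get(uid) != email]
    print(base, view, orphans)
```

Run each client's read and write back-to-back instead and no orphan appears; the bug only shows up under concurrency, which is one reason small, low-traffic tests rarely catch it. Avoiding it client-side needs some form of extra coordination (conditional writes, for example), which is the kind of work Jeremiah argues belongs in the server.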


> On Jun 30, 2020, at 3:45 PM, Joshua McKenzie <jm...@apache.org> wrote:
> 
> We're just short of 98 tickets on the component since its original merge
> so at least *some* work has been done to stabilize them. Not to say I'm
> endorsing running them at massive scale today without knowing what you're
> doing, to be clear. They are perhaps our largest loaded gun of a feature of
> self-foot-shooting atm. Zhao did a bunch of work on them internally and
> we've backported much of that to OSS; I've pinged him to chime in here.
> 
> The "data is orphaned in your view when you lose all base replicas" issue
> is more or less "unsolvable", since a scan of a view to confirm data in the
> base table is so slow you're talking weeks to process and it totally
> trashes your page cache. I think Paulo landed on a "you have to rebuild the
> view if you lose all base data" reality. There's also, I believe, the
> unresolved issue of modeling how much data a base table with one to many
> views will end up taking up in its final form when denormalized. This could
> be vastly improved with something like an "EXPLAIN ANALYZE" for a table
> with views, if you'll excuse the mapping, to show "N bytes in base will
> become M with base + views" or something.
> 
> Last but definitely not least in dumping the state in my head about this,
> there's a bunch of potential for guardrailing people away from self-harm
> with MVs if we decide to go the route of guardrails (link:
> https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
> ).
> 
> So from my PoV, I'm against us just voting to deprecate and remove without
> going into more depth into the current state of things and what options are
> on the table, since people will continue to build MVs at the client level
> which, in theory, should have worse correctness and performance
> characteristics than having a clean and well stabilized implementation in
> the coordinator.
> 
> Having them flagged as experimental for now as we stabilize 4.0 and get
> things out the door *seems* sufficient to me, but if people are widely
> using these out in the wild and ignoring that status and the corresponding
> warning, maybe we consider raising the volume on that warning for 4.0 while
> we figure this out.
> 
> Just my .02.
> 
> ~Josh
> 
> On Tue, Jun 30, 2020 at 4:22 PM Dinesh Joshi <dj...@apache.org> wrote:
> 
>>> On Jun 30, 2020, at 12:43 PM, Jon Haddad <jo...@jonhaddad.com> wrote:
>>> 
>>> As we move forward with the 4.0 release, we should consider this an
>>> opportunity to deprecate materialized views, and remove them in 5.0.  We
>>> should take this opportunity to learn from the mistake and raise the bar
>>> for new features to undergo a much more thorough run through the wringer before
>>> merging.
>> 
>> I'm in favor of marking them as deprecated and removing them in 5.0. If
>> someone steps up and can fix them in 5.0, then we always have the option of
>> accepting the fix.
>> 
>> Dinesh
>> 
>> 
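Josh's "N bytes in base will become M with base + views" idea, in its crudest form, is simple arithmetic. A minimal sketch, assuming each view copies a fixed fraction of every base row (its key columns plus selected columns); the function name and the fractions are hypothetical, and this is not an existing Cassandra feature. Views live in the base table's keyspace, so they share its replication factor.

```python
from typing import Sequence


def estimate_denormalized_bytes(
    base_bytes: float,
    view_fractions: Sequence[float],
    replication_factor: int = 1,
) -> float:
    """Rough "N bytes in base becomes M with base + views" estimate.

    base_bytes: N, the size of one copy of the base table.
    view_fractions: for each view, the assumed fraction of a base row's
        bytes that the view copies; a view selecting every column is ~1.0.
    replication_factor: copies kept of the base table and of each view.

    Ignores compression, per-row overhead and tombstones, so treat the
    result as an order-of-magnitude guide only.
    """
    if base_bytes < 0 or replication_factor < 1:
        raise ValueError("base_bytes must be >= 0 and replication_factor >= 1")
    if any(f < 0 for f in view_fractions):
        raise ValueError("view fractions must be non-negative")
    per_copy = base_bytes + sum(base_bytes * f for f in view_fractions)
    return per_copy * replication_factor


if __name__ == "__main__":
    # e.g. 100 GB of base data, one view of every column, one view of ~20%.
    gb = estimate_denormalized_bytes(100, [1.0, 0.2], replication_factor=3)
    print(f"{gb:.0f} GB across the cluster")
```

Even this crude form makes the point that every full-row view roughly doubles the footprint; a server-side version could measure real row sizes instead of assuming fractions.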



Re: [DISCUSS] Future of MVs

Posted by Benedict Elliott Smith <be...@apache.org>.

I think, just as importantly, we also need to grapple with what went wrong when features landed this way, since these were not isolated occurrences - suggesting structural issues were at play.

I'm not sure if a retrospective is viable with this organisational structure, but we can perhaps engage with it implicitly, in a positive way, by working to create a framework with clear expectations for how features should be delivered - to go hand-in-hand with CEP proposals.  

This framework can then also be applied to existing features considered to be inadequate, as we decide how to move forward with them.


On 30/06/2020, 22:01, "sankalp kohli" <ko...@gmail.com> wrote:

    Hi,
        I think we should revisit all features that require a lot more work to
    make them work properly. Here is how I think we should handle each one of them:

    1. Identify such features and some details of why they are deprecation
    candidates.
    2. Ask the dev list if anyone is willing to work on improving them over the
    next 1 or 2 major releases.
    3. We then move to the user list to find out who is using it and whether
    they are opposed to removing/deprecating it. Assuming few will be using it,
    we need to weigh the tradeoff of keeping it vs. removing it on a
    case-by-case basis.
    4. Deprecate it in the next major, or mark it experimental if #2 and #3
    take it off the deprecation path.
    5. Remove it in the next major.

    For MV, I see this email as step #2. We should move to asking the user list
    next.

    Thanks,
    Sankalp


Re: [DISCUSS] Future of MVs

Posted by "J. D. Jordan" <je...@gmail.com>.

>>> Instead of ripping it out, we could instead disable them in the yaml
>>> with big fat warning comments around it. 


FYI we have already disabled use of materialized views, SASI, and transient replication by default in 4.0

https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L1393
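To check which of these a node has re-enabled, one can scan its cassandra.yaml. A minimal sketch: the flag names below are, to the best of my knowledge, the ones used in the 4.0 yaml (later releases may spell them differently, so check the file shipped with your version), and `enabled_experimental_features` is a hypothetical helper using a naive line-based parse, not a YAML parser.

```python
from typing import List

# Assumed 4.0 flag names; verify against the cassandra.yaml you actually run.
EXPERIMENTAL_FLAGS = (
    "enable_materialized_views",
    "enable_sasi_indexes",
    "enable_transient_replication",
)


def enabled_experimental_features(yaml_text: str) -> List[str]:
    """Return the experimental-feature flags set to true in yaml_text.

    Deliberately naive: only unindented `key: value` lines are considered,
    anything after '#' is treated as a comment, and only a literal `true`
    (case-insensitive) counts as enabled.
    """
    enabled = []
    for raw in yaml_text.splitlines():
        if raw[:1].isspace():
            continue  # skip nested (indented) keys
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key in EXPERIMENTAL_FLAGS and value.strip().lower() == "true":
            enabled.append(key)
    return enabled


if __name__ == "__main__":
    sample = "enable_materialized_views: true\nenable_sasi_indexes: false\n"
    print(enabled_experimental_features(sample))
```

Passing the contents of a node's own conf/cassandra.yaml instead of the sample string lists which of the three it has turned back on.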

> On Jun 30, 2020, at 6:53 PM, joshua.mckenzie@gmail.com wrote:
> 
> I followed up with the clarification about unit and dtests for that reason Dinesh. We test experimental features now.
> 
> If we’re talking about adding experimental features to the 4.0 quality testing effort, how does that differ from just saying “we won’t release until we’ve tested and stabilized these features and they’re no longer experimental”?
> 
> Maybe I’m just misunderstanding something here?
> 
>> On Jun 30, 2020, at 7:12 PM, Dinesh Joshi <dj...@apache.org> wrote:
>> 
>> 
>>> 
>>>> On Jun 30, 2020, at 4:05 PM, Brandon Williams <dr...@gmail.com> wrote:
>>> 
>>> Instead of ripping it out, we could instead disable them in the yaml
>>> with big fat warning comments around it.  That way people already
>>> using them can just enable them again, but it will raise the bar for
>>> new users who ignore/miss the warnings in the logs and just use them.
>> 
>> Not a bad idea. Although, the real issue is that users enable MVs on a 3-node cluster with a few megs of data and conclude that MVs will scale horizontally with the size of their data. This is what causes issues for users who naively roll them out in production and discover that MVs do not scale with their data growth. So whatever we do, the big fat warning should educate the unsuspecting operator.
>> 
>> Dinesh

Re: [DISCUSS] Future of MVs

Posted by Benedict Elliott Smith <be...@apache.org>.

Yep, agreed this is definitely the best route forwards.

On 02/07/2020, 01:10, "Joshua McKenzie" <jm...@apache.org> wrote:

    Plays pretty cleanly into the "have a test plan" we modded in last month. +1

    On Wed, Jul 1, 2020 at 6:43 PM Nate McCall <zz...@gmail.com> wrote:

    > >
    > >
    > >
    > > If so, I propose we set this thread down for now in deference to us
    > > articulating the quality bar we set and how we achieve it for features in
    > > the DB and then retroactively apply them to existing experimental
    > > features.
    > > Should we determine nobody is stepping up to maintain an
    > > experimental feature in a reasonable time frame, we can cross the bridge
    > > of the implications of scale of adoption and the perceived impact on the
    > > user community of deprecation and removal at that time.
    > >
    >
    > We should make sure we back-haul this into the CEP process so new
    > features/large changes have to provide some idea of what the gates are to
    > be production ready.
    >




Re: [DISCUSS] Future of MVs

Posted by Joshua McKenzie <jm...@apache.org>.

Plays pretty cleanly into the "have a test plan" we modded in last month. +1

Re: [DISCUSS] Future of MVs

Posted by Nate McCall <zz...@gmail.com>.

>
>
>
> If so, I propose we set this thread down for now in deference to us
> articulating the quality bar we set and how we achieve it for features in
> the DB and then retroactively apply them to existing experimental features.
> Should we determine nobody is stepping up to maintain an
> experimental feature in a reasonable time frame, we can cross the bridge of
> the implications of scale of adoption and the perceived impact on the user
> community of deprecation and removal at that time.
>

We should make sure we back-haul this into the CEP process so new
features/large changes have to provide some idea of what the gates are to
be production ready.

Re: [DISCUSS] Future of MVs

Posted by David Capwell <dc...@gmail.com>.

+1

On Wed, Jul 1, 2020 at 1:55 PM Jon Haddad <jo...@jonhaddad.com> wrote:

> I think coming up with a formal comprehensive guide for determining if we
> can merge these sorts of high-impact features is a great idea.
>
> I'm also on board with applying the same standard to the experimental
> features.
>
> On Wed, Jul 1, 2020 at 1:45 PM Joshua McKenzie <jm...@apache.org>
> wrote:
>
> > Which questions and how we frame it aside, it's clear we have some
> > foundational thinking to do, articulate, and agree upon as a project
> > before we can reasonably make decisions about deprecation, promotion, or
> > inclusion of features in the project.
> >
> > Is that fair?
> >
> > If so, I propose we set this thread down for now in deference to us
> > articulating the quality bar we set and how we achieve it for features in
> > the DB and then retroactively apply them to existing experimental
> > features.
> > Should we determine nobody is stepping up to maintain an
> > experimental feature in a reasonable time frame, we can cross the bridge
> > of the implications of scale of adoption and the perceived impact on the
> > user community of deprecation and removal at that time.
> >
> > On Wed, Jul 1, 2020 at 9:59 AM Benedict Elliott Smith <benedict@apache.org>
> > wrote:
> >
> > > I humbly suggest these are the wrong questions to ask.  Instead, two
> > > sides of just one question matter: how did we miss these problems, and
> > > what would we have needed to do procedurally to have not missed them.
> > > Whatever it is, we need to do it now to have confidence other things
> > > were not missed, as well as for all future features.
> > >
> > > We should start by producing a list of what we think is necessary for
> > > deploying successful features.  We can then determine what items are
> > > missing that would have been needed to catch a problem.  Obvious things
> > > are:
> > >
> > >   * integration tests at scale
> > >   * integration tests with a variety of extreme workloads
> > >   * integration tests with various cluster topologies
> > >   * data integrity tests as part of the above
> > >   * all of the above as reproducible tests incorporated into the source tree
> > >
> > > We can then ensure Jira accurately represents all of the known issues
> > > with MVs (and other features).  This includes those that are poorly
> > > defined (such as "doesn't scale").
> > >
> > > Then we can look at all issues and ask: would this approach have caught
> > > it, and if not what do we need to add to the guidelines to prevent a
> > > recurrence - and also ensure this problem is unique?  In future we can
> > > ask, for bugs found in features built to these guidelines: why didn't it
> > > catch this bug? Do the guidelines need additional items, or greater
> > > specificity about how to meet given criteria?
> > >
> > > I do not think that data from deployments - even if reliably obtained -
> > > can tell us much besides which problems we prioritise.
> > >
> > > On 01/07/2020, 01:58, "joshua.mckenzie@gmail.com" <joshua.mckenzie@gmail.com> wrote:
> > >
> > >     It would be incredibly helpful for us to have some empirical data and
> > >     agreed-upon terms and benchmarks to help us navigate discussions like
> > >     this:
> > >
> > >       * How widely used is a feature in C* deployments worldwide?
> > >       * What are the primary issues users face when deploying them?
> > >         Scaling them? During failure scenarios?
> > >       * What does the engineering effort to bridge these gaps look like?
> > >         Who will do that? On what time horizon?
> > >       * What does our current test coverage for this feature look like?
> > >       * What shape of defects are arising with the feature? In a specific
> > >         subsection of the module or usage?
> > >       * Do we have an agreed-upon set of standards for labeling a feature
> > >         stable? As experimental? If not, how do we get there?
> > >       * What effort will it take to bridge from where we are to where we
> > >         agree we need to be? On what timeline is this acceptable?
> > >
> > >     I believe these are not only answerable questions, but fundamentally
> > >     the underlying themes our discussion alludes to. They’re also questions
> > >     that apply to a lot more than just MVs and tie into what you’re
> > >     speaking to above, Benedict.
> > >
> > >     > On Jun 30, 2020, at 8:32 PM, sankalp kohli <kohlisankalp@gmail.com> wrote:
> > >     >
> > >     > I see this discussion as several decisions which can be made in
> > >     > small increments.
> > >     >
> > >     > 1. In release cycles, when can we propose a feature to be deprecated
> > >     > or marked experimental? Ideally a new feature should come out
> > >     > experimental if required, but we have several that are candidates
> > >     > now. We can work on integrating this into the release lifecycle doc
> > >     > we already have.
> > >     > 2. What is the process of making an existing feature experimental?
> > >     > How does it affect testing around major releases?
> > >     > 3. What is the process of deprecating/removing an experimental
> > >     > feature? (Assuming experimental features should be deprecated/removed.)
> > >     >
> > >     > Coming to MV, I think we need more data before we can say we
> > >     > should deprecate MV. Here are some of the items which should be part
> > >     > of the deprecation process:
> > >     > 1. Talk to customers who use them and understand what the impact is.
> > >     > Give them a forum to talk about it.
> > >     > 2. Do we have enough resources to bring this feature out of the
> > >     > experimental feature list in the next 1 or 2 major releases? We cannot
> > >     > have too many experimental features in the database. Marking a
> > >     > feature experimental should not be a parking place for a
> > >     > non-functioning feature but a place while we stabilize it.

Re: [DISCUSS] Future of MVs

Posted by Jon Haddad <jo...@jonhaddad.com>.

I think coming up with a formal comprehensive guide for determining if we
can merge these sorts of high-impact features is a great idea.

I'm also on board with applying the same standard to the experimental
features.

On Wed, Jul 1, 2020 at 1:45 PM Joshua McKenzie <jm...@apache.org> wrote:

> Which questions and how we frame it aside, it's clear we have some
> foundational thinking to do, articulate, and agree upon as a project before
> we can reasonably make decisions about deprecation, promotion, or inclusion
> of features in the project.
>
> Is that fair?
>
> If so, I propose we set this thread down for now in deference to us
> articulating the quality bar we set and how we achieve it for features in
> the DB and then retroactively apply them to existing experimental features.
> Should we determine nobody is stepping up to maintain an
> experimental feature in a reasonable time frame, we can cross the bridge of
> the implications of scale of adoption and the perceived impact on the user
> community of deprecation and removal at that time.

Re: [DISCUSS] Future of MVs

Posted by Joshua McKenzie <jm...@apache.org>.

Which questions and how we frame it aside, it's clear we have some
foundational thinking to do, articulate, and agree upon as a project before
we can reasonably make decisions about deprecation, promotion, or inclusion
of features in the project.

Is that fair?

If so, I propose we set this thread down for now in deference to us
articulating the quality bar we set for features in the DB, and how we
achieve it, and then retroactively applying it to existing experimental
features. Should we determine nobody is stepping up to maintain an
experimental feature in a reasonable time frame, we can cross that bridge at
that time, weighing the scale of adoption and the perceived impact of
deprecation and removal on the user community.

On Wed, Jul 1, 2020 at 9:59 AM Benedict Elliott Smith <be...@apache.org> wrote:

> I humbly suggest these are the wrong questions to ask.  Instead, two sides
> of just one question matter: how did we miss these problems, and what would
> we have needed to do procedurally to have not missed it.  Whatever it is,
> we need to do it now to have confidence other things were not missed, as
> well as for all future features.
>
> We should start by producing a list of what we think is necessary for
> deploying successful features.  We can then determine what items are
> missing that would have been needed to catch a problem.  Obvious things
> are:
>
>   * integration tests at scale
>   * integration tests with a variety of extreme workloads
>   * integration tests with various cluster topologies
>   * data integrity tests as part of the above
>   * all of the above as reproducible tests incorporated into the source
> tree
>
> We can then ensure Jira accurately represents all of the known issues with
> MVs (and other features).  This includes those that are poorly defined
> (such as "doesn't scale").
>
> Then we can look at all issues and ask: would this approach have caught
> it, and if not what do we need to add to the guidelines to prevent a
> recurrence - and also ensure this problem is unique?  In future we can ask,
> for bugs found in features built to these guidelines: why didn't it catch
> this bug? Do the guidelines need additional items, or greater specificity
> about how to meet given criteria?
>
> I do not think that data from deployments - even if reliably obtained -
> can tell us much besides which problems we prioritise.

Re: [DISCUSS] Future of MVs

Posted by Benedict Elliott Smith <be...@apache.org>.

I humbly suggest these are the wrong questions to ask.  Instead, two sides of just one question matter: how did we miss these problems, and what would we have needed to do procedurally to have not missed it.  Whatever it is, we need to do it now to have confidence other things were not missed, as well as for all future features.  

We should start by producing a list of what we think is necessary for deploying successful features.  We can then determine what items are missing that would have been needed to catch a problem.  Obvious things are: 

  * integration tests at scale
  * integration tests with a variety of extreme workloads
  * integration tests with various cluster topologies
  * data integrity tests as part of the above
  * all of the above as reproducible tests incorporated into the source tree

We can then ensure Jira accurately represents all of the known issues with MVs (and other features).  This includes those that are poorly defined (such as "doesn't scale").

Then we can look at all issues and ask: would this approach have caught it, and if not what do we need to add to the guidelines to prevent a recurrence - and also ensure this problem is unique?  In future we can ask, for bugs found in features built to these guidelines: why didn't it catch this bug? Do the guidelines need additional items, or greater specificity about how to meet given criteria?

I do not think that data from deployments - even if reliably obtained - can tell us much besides which problems we prioritise.



On 01/07/2020, 01:58, "joshua.mckenzie@gmail.com" <jo...@gmail.com> wrote:

    It would be incredibly helpful for us to have some empirical data and agreed upon terms and benchmarks to help us navigate discussions like this:

      * How widely used is a feature  in C* deployments worldwide?
      * What are the primary issues users face when deploying them? Scaling them? During failure scenarios?
      * What does the engineering effort to bridge these gaps look like? Who will do that? On what time horizon?
      * What does our current test coverage for this feature look like?
      * What shape of defects are arising with the feature? In a specific subsection of the module or usage?
      * Do we have an agreed upon set of standards for labeling a feature stable? As experimental? If not, how do we get there?
      * What effort will it take to bridge from where we are to where we agree we need to be? On what timeline is this acceptable?

    I believe these are not only answerable questions, but fundamentally the underlying themes our discussion alludes to. They’re also questions that apply to a lot more than just MV’s and tie into what you’re speaking to above Benedict.


    > On Jun 30, 2020, at 8:32 PM, sankalp kohli <ko...@gmail.com> wrote:
    > 
    > I see this discussion as several decisions which can be made in small
    > increments.
    > 
    > 1. In release cycles, when can we propose a feature to be deprecated or
    > marked experimental. Ideally a new feature should come out experimental if
    > required but we have several who are candidates now. We can work on
    > integrating this in the release lifecycle doc we already have.
    > 2. What is the process of making an existing feature experimental? How does
    > it affect major releases around testing.
    > 3. What is the process of deprecating/removing an experimental feature.
    > (Assuming experimental features should be deprecated/removed)
    > 
    > Coming to MV, I think we need more data before we can say we
    > should deprecate MV. Here are some of them which should be part of
    > deprecation process
    > 1.Talk to customers who use them and understand what is the impact. Give
    > them a forum to talk about it.
    > 2. Do we have enough resources to bring this feature out of the
    > experimental feature list in next 1 or 2 major releases. We cannot have too
    > many experimental features in the database. Marking a feature experimental
    > should not be a parking place for a non functioning feature but a place
    > while we stabilize it.
    > 
    > 
    > 
    > 
    >> On Tue, Jun 30, 2020 at 4:52 PM <jo...@gmail.com> wrote:
    >> 
    >> I followed up with the clarification about unit and dtests for that reason
    >> Dinesh. We test experimental features now.
    >> 
    >> If we’re talking about adding experimental features to the 40 quality
    >> testing effort, how does that differ from just saying “we won’t release
    >> until we’ve tested and stabilized these features and they’re no longer
    >> experimental”?
    >> 
    >> Maybe I’m just misunderstanding something here?
    >> 
    >>>> On Jun 30, 2020, at 7:12 PM, Dinesh Joshi <dj...@apache.org> wrote:
    >>> 
    >>> 
    >>>> 
    >>>> On Jun 30, 2020, at 4:05 PM, Brandon Williams <dr...@gmail.com> wrote:
    >>>> 
    >>>> Instead of ripping it out, we could instead disable them in the yaml
    >>>> with big fat warning comments around it.  That way people already
    >>>> using them can just enable them again, but it will raise the bar for
    >>>> new users who ignore/miss the warnings in the logs and just use them.
    >>> 
    >>> Not a bad idea. Although, the real issue is that users enable MV on a 3
    >> node cluster with a few megs of data and conclude that MVs will
    >> horizontally scale with the size of data. This is what causes issues for
    >> users who naively roll it out in production and discover that MVs do not
    >> scale with their data growth. So whatever we do, the big fat warning should
    >> educate the unsuspecting operator.
    >>> 
    >>> Dinesh

Re: [DISCUSS] Future of MVs

Posted by Jasonstack Zhao Yang <zh...@gmail.com>.

> I agree with Jeff that there is some stuff to do to address the current MV
> issues and I am willing to focus on making them production ready.

+1

On Wed, 1 Jul 2020 at 15:42, Benjamin Lerer <be...@datastax.com> wrote:

> >
> > "Make the scan faster"
> > "Make the scan incremental and automatic"
> > "Make it not blow up your page cache"
> > "Make losing your base replicas less likely".
> >
> > There's a concrete, real opportunity with MVs to create integrity
> > assertions we're missing. A dangling record from an MV that would point to
> > missing base data is something that could raise alarm bells and signal
> > JIRAs so we can potentially find and fix more surprise edge cases.
> >
>
> I agree with Jeff that there is some stuff to do to address the current MV
> issues and I am willing to focus on making them production ready.

Re: [DISCUSS] Future of MVs

Posted by Benjamin Lerer <be...@datastax.com>.

>
> "Make the scan faster"
> "Make the scan incremental and automatic"
> "Make it not blow up your page cache"
> "Make losing your base replicas less likely".
>
> There's a concrete, real opportunity with MVs to create integrity
> assertions we're missing. A dangling record from an MV that would point to
> missing base data is something that could raise alarm bells and signal
> JIRAs so we can potentially find and fix more surprise edge cases.
>

I agree with Jeff that there is some stuff to do to address the current MV
issues and I am willing to focus on making them production ready.
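
The integrity assertion quoted above (flag any view row that points at base data which is missing) can be illustrated with a small sketch. This is a hypothetical, in-memory model only: `ViewRow`, `find_dangling_view_rows`, and the dictionary layout are invented for illustration and are not part of Cassandra or any driver. A real check would have to read both tables from replicas and cope with in-flight writes and replica disagreement, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory model for illustration only; this is not Cassandra's
# storage engine or any driver API.
BaseKey = Tuple[str, ...]


@dataclass(frozen=True)
class ViewRow:
    view_key: str      # value of the base column the view is keyed on
    base_key: BaseKey  # primary key of the base row this view row was built from


def find_dangling_view_rows(
    base: Dict[BaseKey, Dict[str, str]],
    view: Iterable[ViewRow],
    view_column: str,
) -> List[Tuple[ViewRow, str]]:
    """Return (view row, reason) pairs for view rows not backed by live base data."""
    problems: List[Tuple[ViewRow, str]] = []
    for row in view:
        base_row = base.get(row.base_key)
        if base_row is None:
            # The view row points at a base row that no longer exists.
            problems.append((row, "missing base row"))
        elif base_row.get(view_column) != row.view_key:
            # The base row exists, but its view column now holds a different value.
            problems.append((row, "stale view key"))
    return problems


if __name__ == "__main__":
    base = {("u1",): {"email": "a@example.com"}, ("u2",): {"email": "b@example.com"}}
    view = [
        ViewRow("a@example.com", ("u1",)),    # consistent with the base table
        ViewRow("old@example.com", ("u2",)),  # stale: base email was updated
        ViewRow("c@example.com", ("u3",)),    # dangling: base row was deleted
    ]
    for row, reason in find_dangling_view_rows(base, view, "email"):
        print(reason, row.base_key)
    # prints:
    # stale view key ('u2',)
    # missing base row ('u3',)
```

The sketch checks only one direction; a complete assertion would also look for base rows that have no corresponding view row.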




On Wed, Jul 1, 2020 at 2:58 AM <jo...@gmail.com> wrote:

> It would be incredibly helpful for us to have some empirical data and
> agreed upon terms and benchmarks to help us navigate discussions like this:
>
>   * How widely used is a feature  in C* deployments worldwide?
>   * What are the primary issues users face when deploying them? Scaling
> them? During failure scenarios?
>   * What does the engineering effort to bridge these gaps look like? Who
> will do that? On what time horizon?
>   * What does our current test coverage for this feature look like?
>   * What shape of defects are arising with the feature? In a specific
> subsection of the module or usage?
>   * Do we have an agreed upon set of standards for labeling a feature
> stable? As experimental? If not, how do we get there?
>   * What effort will it take to bridge from where we are to where we agree
> we need to be? On what timeline is this acceptable?
>
> I believe these are not only answerable questions, but fundamentally the
> underlying themes our discussion alludes to. They’re also questions that
> apply to a lot more than just MV’s and tie into what you’re speaking to
> above Benedict.
>
>
> > On Jun 30, 2020, at 8:32 PM, sankalp kohli <ko...@gmail.com>
> wrote:
> >
> > I see this discussion as several decisions which can be made in small
> > increments.
> >
> > 1. In release cycles, when can we propose a feature to be deprecated or
> > marked experimental. Ideally a new feature should come out experimental
> if
> > required but we have several who are candidates now. We can work on
> > integrating this in the release lifecycle doc we already have.
> > 2. What is the process of making an existing feature experimental? How
> does
> > it affect major releases around testing.
> > 3. What is the process of deprecating/removing an experimental feature.
> > (Assuming experimental features should be deprecated/removed)
> >
> > Coming to MV, I think we need more data before we can say we
> > should deprecate MV. Here are some of them which should be part of
> > deprecation process
> > 1.Talk to customers who use them and understand what is the impact. Give
> > them a forum to talk about it.
> > 2. Do we have enough resources to bring this feature out of the
> > experimental feature list in next 1 or 2 major releases. We cannot have
> too
> > many experimental features in the database. Marking a feature
> experimental
> > should not be a parking place for a non functioning feature but a place
> > while we stabilize it.
> >
> >
> >
> >
> >> On Tue, Jun 30, 2020 at 4:52 PM <jo...@gmail.com> wrote:
> >>
> >> I followed up with the clarification about unit and dtests for that
> reason
> >> Dinesh. We test experimental features now.
> >>
> >> If we’re talking about adding experimental features to the 40 quality
> >> testing effort, how does that differ from just saying “we won’t release
> >> until we’ve tested and stabilized these features and they’re no longer
> >> experimental”?
> >>
> >> Maybe I’m just misunderstanding something here?
> >>
> >>>> On Jun 30, 2020, at 7:12 PM, Dinesh Joshi <dj...@apache.org> wrote:
> >>>
> >>>
> >>>>
> >>>> On Jun 30, 2020, at 4:05 PM, Brandon Williams <dr...@gmail.com> wrote:
> >>>>
> >>>> Instead of ripping it out, we could instead disable them in the yaml
> >>>> with big fat warning comments around it.  That way people already
> >>>> using them can just enable them again, but it will raise the bar for
> >>>> new users who ignore/miss the warnings in the logs and just use them.
> >>>
> >>> Not a bad idea. Although, the real issue is that users enable MV on a 3
> >>> node cluster with a few megs of data and conclude that MVs will
> >>> horizontally scale with the size of data. This is what causes issues for
> >>> users who naively roll it out in production and discover that MVs do not
> >>> scale with their data growth. So whatever we do, the big fat warning should
> >>> educate the unsuspecting operator.
> >>>
> >>> Dinesh
> >>>
> >>
> >>
> >>
>
>
>

Re: [DISCUSS] Future of MVs

Posted by jo...@gmail.com.

It would be incredibly helpful for us to have some empirical data and agreed-upon terms and benchmarks to help us navigate discussions like this:

  * How widely used is a feature in C* deployments worldwide?
  * What are the primary issues users face when deploying them? Scaling them? During failure scenarios?
  * What does the engineering effort to bridge these gaps look like? Who will do that? On what time horizon?
  * What does our current test coverage for this feature look like?
  * What shape of defects are arising with the feature? In a specific subsection of the module or usage?
  * Do we have an agreed-upon set of standards for labeling a feature stable? As experimental? If not, how do we get there?
  * What effort will it take to bridge from where we are to where we agree we need to be? On what timeline is this acceptable?

I believe these are not only answerable questions, but fundamentally the underlying themes our discussion alludes to. They’re also questions that apply to a lot more than just MV’s and tie into what you’re speaking to above, Benedict.


> On Jun 30, 2020, at 8:32 PM, sankalp kohli <ko...@gmail.com> wrote:
> 
> I see this discussion as several decisions which can be made in small
> increments.
> 
> 1. In release cycles, when can we propose a feature to be deprecated or
> marked experimental. Ideally a new feature should come out experimental if
> required but we have several who are candidates now. We can work on
> integrating this in the release lifecycle doc we already have.
> 2. What is the process of making an existing feature experimental? How does
> it affect major releases around testing.
> 3. What is the process of deprecating/removing an experimental feature.
> (Assuming experimental features should be deprecated/removed)
> 
> Coming to MV, I think we need more data before we can say we
> should deprecate MV. Here are some of them which should be part of
> deprecation process
> 1.Talk to customers who use them and understand what is the impact. Give
> them a forum to talk about it.
> 2. Do we have enough resources to bring this feature out of the
> experimental feature list in next 1 or 2 major releases. We cannot have too
> many experimental features in the database. Marking a feature experimental
> should not be a parking place for a non functioning feature but a place
> while we stabilize it.
> 
> 
> 
> 
>> On Tue, Jun 30, 2020 at 4:52 PM <jo...@gmail.com> wrote:
>> 
>> I followed up with the clarification about unit and dtests for that reason
>> Dinesh. We test experimental features now.
>> 
>> If we’re talking about adding experimental features to the 40 quality
>> testing effort, how does that differ from just saying “we won’t release
>> until we’ve tested and stabilized these features and they’re no longer
>> experimental”?
>> 
>> Maybe I’m just misunderstanding something here?
>> 
>>>> On Jun 30, 2020, at 7:12 PM, Dinesh Joshi <dj...@apache.org> wrote:
>>> 
>>>
>>>> 
>>>> On Jun 30, 2020, at 4:05 PM, Brandon Williams <dr...@gmail.com> wrote:
>>>> 
>>>> Instead of ripping it out, we could instead disable them in the yaml
>>>> with big fat warning comments around it.  That way people already
>>>> using them can just enable them again, but it will raise the bar for
>>>> new users who ignore/miss the warnings in the logs and just use them.
>>> 
>>> Not a bad idea. Although, the real issue is that users enable MV on a 3
>>> node cluster with a few megs of data and conclude that MVs will
>>> horizontally scale with the size of data. This is what causes issues for
>>> users who naively roll it out in production and discover that MVs do not
>>> scale with their data growth. So whatever we do, the big fat warning should
>>> educate the unsuspecting operator.
>>> 
>>> Dinesh
>>> 
>> 
>> 
>> 


Re: [DISCUSS] Future of MVs

Posted by sankalp kohli <ko...@gmail.com>.

I see this discussion as several decisions which can be made in small
increments.

1. In release cycles, when can we propose a feature to be deprecated or
marked experimental? Ideally a new feature should come out experimental if
required, but we have several that are candidates now. We can work on
integrating this into the release lifecycle doc we already have.
2. What is the process of making an existing feature experimental? How does
it affect major releases around testing?
3. What is the process of deprecating/removing an experimental feature?
(Assuming experimental features should be deprecated/removed.)

Coming to MV, I think we need more data before we can say we
should deprecate MV. Here are some of the items which should be part of the
deprecation process:
1. Talk to customers who use them and understand what the impact is. Give
them a forum to talk about it.
2. Do we have enough resources to bring this feature out of the
experimental feature list in the next 1 or 2 major releases? We cannot have too
many experimental features in the database. Marking a feature experimental
should not be a parking place for a non-functioning feature but a place
while we stabilize it.

On Tue, Jun 30, 2020 at 4:52 PM <jo...@gmail.com> wrote:

> I followed up with the clarification about unit and dtests for that reason
> Dinesh. We test experimental features now.
>
> If we’re talking about adding experimental features to the 40 quality
> testing effort, how does that differ from just saying “we won’t release
> until we’ve tested and stabilized these features and they’re no longer
> experimental”?
>
> Maybe I’m just misunderstanding something here?
>
> > On Jun 30, 2020, at 7:12 PM, Dinesh Joshi <dj...@apache.org> wrote:
> >
> >
> >>
> >> On Jun 30, 2020, at 4:05 PM, Brandon Williams <dr...@gmail.com> wrote:
> >>
> >> Instead of ripping it out, we could instead disable them in the yaml
> >> with big fat warning comments around it.  That way people already
> >> using them can just enable them again, but it will raise the bar for
> >> new users who ignore/miss the warnings in the logs and just use them.
> >
> > Not a bad idea. Although, the real issue is that users enable MV on a 3
> > node cluster with a few megs of data and conclude that MVs will
> > horizontally scale with the size of data. This is what causes issues for
> > users who naively roll it out in production and discover that MVs do not
> > scale with their data growth. So whatever we do, the big fat warning should
> > educate the unsuspecting operator.
> >
> > Dinesh
> >
>
>
>
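Sankalp's three questions above amount to defining a lifecycle for features: stable, experimental, deprecated, removed. What follows is a minimal Python sketch of one possible answer. It encodes the "deprecate in one major, remove in the next" rule proposed earlier in the thread, plus a path back to experimental if someone steps up to fix a deprecated feature. The `Status` names, the transition table and the `transition` helper are all hypothetical. None of this reflects an agreed project policy.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    STABLE = "stable"
    EXPERIMENTAL = "experimental"
    DEPRECATED = "deprecated"
    REMOVED = "removed"


# Hypothetical transition table. DEPRECATED -> EXPERIMENTAL models
# "someone steps up to fix it" before removal happens.
ALLOWED = {
    Status.STABLE: {Status.EXPERIMENTAL, Status.DEPRECATED},
    Status.EXPERIMENTAL: {Status.STABLE, Status.DEPRECATED},
    Status.DEPRECATED: {Status.EXPERIMENTAL, Status.REMOVED},
    Status.REMOVED: set(),
}


def transition(current: Status, target: Status,
               deprecated_in: Optional[int] = None,
               release_major: Optional[int] = None) -> Status:
    """Return `target` if the move is allowed, else raise ValueError.

    Removal additionally requires that the feature was deprecated in an
    earlier major version than the release that removes it.
    """
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    if target is Status.REMOVED:
        if deprecated_in is None or release_major is None:
            raise ValueError("removal needs both deprecated_in and release_major")
        if release_major <= deprecated_in:
            raise ValueError("remove only in a major after the deprecating one")
    return target


# Deprecate in 4.0, remove in 5.0: allowed.
assert transition(Status.DEPRECATED, Status.REMOVED,
                  deprecated_in=4, release_major=5) is Status.REMOVED
```

Making the rules an explicit table has a practical benefit. A process document and a check in release tooling could then share one source of truth, whatever the rules finally end up being.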

Re: [DISCUSS] Future of MVs

Posted by Dinesh Joshi <dj...@apache.org>.

> On Jun 30, 2020, at 4:52 PM, joshua.mckenzie@gmail.com wrote:
> 
> I followed up with the clarification about unit and dtests for that reason Dinesh. We test experimental features now.

I hit send before seeing your clarification. I personally feel that unit tests and dtests may not surface regressions. I'd prefer the user community try out the alpha, beta, and RC releases and report regressions as they find them.

Dinesh

Re: [DISCUSS] Future of MVs

Posted by jo...@gmail.com.

I followed up with the clarification about unit and dtests for that reason Dinesh. We test experimental features now.

If we’re talking about adding experimental features to the 4.0 quality testing effort, how does that differ from just saying “we won’t release until we’ve tested and stabilized these features and they’re no longer experimental”?

Maybe I’m just misunderstanding something here?

> On Jun 30, 2020, at 7:12 PM, Dinesh Joshi <dj...@apache.org> wrote:
> 
>
>> 
>> On Jun 30, 2020, at 4:05 PM, Brandon Williams <dr...@gmail.com> wrote:
>> 
>> Instead of ripping it out, we could instead disable them in the yaml
>> with big fat warning comments around it.  That way people already
>> using them can just enable them again, but it will raise the bar for
>> new users who ignore/miss the warnings in the logs and just use them.
> 
> Not a bad idea. Although, the real issue is that users enable MV on a 3 node cluster with a few megs of data and conclude that MVs will horizontally scale with the size of data. This is what causes issues for users who naively roll it out in production and discover that MVs do not scale with their data growth. So whatever we do, the big fat warning should educate the unsuspecting operator.
> 
> Dinesh
> 


Re: [DISCUSS] Future of MVs

Posted by Dinesh Joshi <dj...@apache.org>.

> On Jun 30, 2020, at 4:05 PM, Brandon Williams <dr...@gmail.com> wrote:
> 
> Instead of ripping it out, we could instead disable them in the yaml
> with big fat warning comments around it.  That way people already
> using them can just enable them again, but it will raise the bar for
> new users who ignore/miss the warnings in the logs and just use them.

Not a bad idea, although the real issue is that users enable MVs on a 3-node cluster with a few megs of data and conclude that MVs will scale horizontally with the size of their data. This is what causes issues for users who naively roll them out in production and discover that MVs do not scale with their data growth. So whatever we do, the big fat warning should educate the unsuspecting operator.

Dinesh

Re: [DISCUSS] Future of MVs

Posted by Brandon Williams <dr...@gmail.com>.

On Tue, Jun 30, 2020 at 5:41 PM <jo...@gmail.com> wrote:
> Given we’re at a place where things like MV’s and sasi are backing production cases (power users one would hope or smaller use cases) I don’t think ripping those features out and further excluding users from the ecosystem is the right move.

Instead of ripping it out, we could instead disable them in the yaml
with big fat warning comments around it.  That way people already
using them can just enable them again, but it will raise the bar for
new users who ignore/miss the warnings in the logs and just use them.
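Below is a minimal sketch of the gate Brandon describes, written in Python for illustration. MV creation is refused unless an operator sets a config flag, and even then a loud warning is logged. The key name mirrors the `enable_materialized_views` flag in cassandra.yaml. Everything else is hypothetical and is not Cassandra's implementation: the dict standing in for the parsed yaml, the `check_create_materialized_view` function and the warning text. Following Dinesh's point, the warning text is the place to say that a small test cluster predicts little about behaviour at production data sizes.

```python
import logging

logger = logging.getLogger("mv_gate")

# Hypothetical stand-in for the parsed cassandra.yaml. Only the key name
# mirrors the real flag; the gate below is an illustration.
DEFAULT_CONFIG = {"enable_materialized_views": False}

MV_WARNING = (
    "Materialized views are experimental and not recommended for production. "
    "Results on a small test cluster do not predict behaviour as data grows."
)


class FeatureDisabledError(Exception):
    """Raised when CREATE MATERIALIZED VIEW is attempted while disabled."""


def check_create_materialized_view(config: dict) -> bool:
    """Gate a CREATE MATERIALIZED VIEW request on the config flag.

    Flag absent or false: refuse, and tell the operator how to opt in.
    Flag true: allow, but log the warning on every creation.
    """
    if not config.get("enable_materialized_views", False):
        raise FeatureDisabledError(
            "Materialized views are disabled. Set enable_materialized_views: "
            "true in cassandra.yaml to opt in. " + MV_WARNING
        )
    logger.warning(MV_WARNING)
    return True
```

Existing users keep working by setting the flag. New users hit the error, which names the flag, before they ever see a log warning. That is the bar-raising Brandon is after.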


Re: [DISCUSS] Future of MVs

Posted by Dinesh Joshi <dj...@apache.org>.

> On Jun 30, 2020, at 3:40 PM, joshua.mckenzie@gmail.com wrote:
> 
> I don’t think we should hold up releases on testing experimental features. Especially with how many of them we have.
> 
> Given we’re at a place where things like MV’s and sasi are backing production cases (power users one would hope or smaller use cases)

Let's back up for a second here. MVs are backing production cases, but we should not spend time testing them for 4.0? That is an inherently contradictory position.

Dinesh

Re: [DISCUSS] Future of MVs

Posted by jo...@gmail.com.

I don’t think we should hold up releases on testing experimental features. Especially with how many of them we have.

Agree re: needing a more quantitative bar for new additions which we can also retroactively apply to experimental features to bring up to speed and eventually graduate. Probably worth separately defining criteria for submission of a feature as experimental while we’re at it.

Given we’re at a place where things like MVs and SASI are backing production cases (power users, one would hope, or smaller use cases), I don’t think ripping those features out and further excluding users from the ecosystem is the right move.

> On Jun 30, 2020, at 6:27 PM, David Capwell <dc...@gmail.com> wrote:
> 
> If that is the case then shouldn't we add MV to "4.0 Quality: Components
> and Test Plans" (CASSANDRA-15536)?  It is currently missing, so adding it
> to the testing road map would be a clear sign that someone is planning to
> champion and own this feature; if people feel that this is a broken
> feature, shouldn't we have tests showing this?  Would be great to see
> traction here.
> 
>> On Tue, Jun 30, 2020 at 3:11 PM Joshua McKenzie <jm...@apache.org>
>> wrote:
>> 
>> Let's forget I said anything about release cadence. That's another thread
>> entirely and a good deep conversation to explore. Don't want to derail.
>> 
>> If there's a question about "is anyone stepping forward to maintain MV's",
>> I can say with certainty that at least one full time contributor I work
>> with will engage and continue to work on and improve this feature going
>> forward. Who precisely that ends up being stands to be seen; that's more
>> fluid, but there are no plans to stop working on it going forward.
>> 
>> On Tue, Jun 30, 2020 at 5:45 PM Benedict Elliott Smith <
>> benedict@apache.org>
>> wrote:
>> 
>>> I don't think we can realistically expect majors, with the deprecation
>>> cycle they entail, to come every six months.  If nothing else, we would
>>> have too many versions to maintain at once.  I personally think all the
>>> project needs on that front is clearer roadmapping at the start of a
>>> release cycle, and we would be fine with 12-18mo release cycles.
>>> 
>>> That's another whole discussion to distract us from 4.0, anyway - though I
>>> think we can tolerate a few slow burn conversations.
>>> 
>>> 
>>> On 30/06/2020, 22:10, "Joshua McKenzie" <jm...@apache.org> wrote:
>>> 
>>>    Seems like a reasonable point of view to me Sankalp. I'd also suggest we
>>>    try to find other sources of data than just the user ML, like searching on
>>>    github for instance. A collection of imperfect metrics beats just one in my
>>>    experience.
>>>
>>>    Though I would ask why we're having this discussion this late in the
>>>    release cycle when we have what, 4 tickets left until cutting beta 1? Seems
>>>    like the kind of thing we could reasonably defer while we focus on getting
>>>    4.0 out, though I'm sympathetic to the "release is cutoff for deprecation"
>>>    argument.
>>>
>>>    If we cadence our majors to calendar (like every 6 months for example)
>>>    instead of scope this would become significantly less of a big issue imo.
>>>
>>>    On Tue, Jun 30, 2020 at 5:01 PM sankalp kohli <kohlisankalp@gmail.com> wrote:
>>> 
>>>> Hi,
>>>>    I think we should revisit all features which require a lot more work to
>>>> make them work. Here is how I think we should do for each one of them
>>>>
>>>> 1. Identify such features and some details of why they are deprecation
>>>> candidates.
>>>> 2. Ask the dev list if anyone is willing to work on improving them over the
>>>> next 1 or 2 major releases.
>>>> 3. We then move to the user list to find who all are using it and if they
>>>> are opposed to removing/deprecating it. Assuming few will be using it, we
>>>> need to see the tradeoff of keeping it vs removing it on a case by case
>>>> basis.
>>>> 4. Deprecate it in the next major or make it experimental if #2 and #3
>>>> removes them from deprecation.
>>>> 5. Remove it in next major
>>>>
>>>> For MV, I see this email as step #2. We should move to asking the user list
>>>> next.
>>>>
>>>> Thanks,
>>>> Sankalp
>>>>
>>>> On Tue, Jun 30, 2020 at 1:46 PM Joshua McKenzie <jmckenzie@apache.org> wrote:
>>>> 
>>>>> We're just short of 98 tickets on the component since it's original merge
>>>>> so at least *some* work has been done to stabilize them. Not to say I'm
>>>>> endorsing running them at massive scale today without knowing what you're
>>>>> doing, to be clear. They are perhaps our largest loaded gun of a feature of
>>>>> self-foot-shooting atm. Zhao did a bunch of work on them internally and
>>>>> we've backported much of that to OSS; I've pinged him to chime in here.
>>>>>
>>>>> The "data is orphaned in your view when you lose all base replicas" issue
>>>>> is more or less "unsolvable", since a scan of a view to confirm data in the
>>>>> base table is so slow you're talking weeks to process and it totally
>>>>> trashes your page cache. I think Paulo landed on a "you have to rebuild the
>>>>> view if you lose all base data" reality. There's also, I believe, the
>>>>> unresolved issue of modeling how much data a base table with one to many
>>>>> views will end up taking up in its final form when denormalized. This could
>>>>> be vastly improved with something like an "EXPLAIN ANALYZE" for a table
>>>>> with views, if you'll excuse the mapping, to show "N bytes in base will
>>>>> become M with base + views" or something.
>>>>>
>>>>> Last but definitely not least in dumping the state in my head about this,
>>>>> there's a bunch of potential for guardrailing people away from self-harm
>>>>> with MV's if we decide to go the route of guardrails (link:
>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
>>>>> ).
>>>>>
>>>>> So from my PoV, I'm against us just voting to deprecate and remove without
>>>>> going into more depth into the current state of things and what options are
>>>>> on the table, since people will continue to build MV's at the client level
>>>>> which, in theory, should have worse correctness and performance
>>>>> characteristics than having a clean and well stabilized implementation in
>>>>> the coordinator.
>>>>>
>>>>> Having them flagged as experimental for now as we stabilize 4.0 and get
>>>>> things out the door *seems* sufficient to me, but if people are widely
>>>>> using these out in the wild and ignoring that status and the corresponding
>>>>> warning, maybe we consider raising the volume on that warning for 4.0 while
>>>>> we figure this out.
>>>>> 
>>>>> Just my .02.
>>>>> 
>>>>> ~Josh
>>>>> 
>>>>> On Tue, Jun 30, 2020 at 4:22 PM Dinesh Joshi <dj...@apache.org> wrote:
>>>>>
>>>>>>> On Jun 30, 2020, at 12:43 PM, Jon Haddad <jo...@jonhaddad.com> wrote:
>>>>>>>
>>>>>>> As we move forward with the 4.0 release, we should consider this an
>>>>>>> opportunity to deprecate materialized views, and remove them in 5.0. We
>>>>>>> should take this opportunity to learn from the mistake and raise the bar
>>>>>>> for new features to undergo a much more thorough run the wringer before
>>>>>>> merging.
>>>>>>
>>>>>> I'm in favor of marking them as deprecated and removing them in 5.0. If
>>>>>> someone steps up and can fix them in 5.0, then we always have the option of
>>>>>> accepting the fix.
>>>>>> 
>>>>>> Dinesh
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
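Josh's quoted wish for an "EXPLAIN ANALYZE"-style report ("N bytes in base will become M with base + views") can be approximated with simple arithmetic. The Python sketch below rests on two assumptions: every base row appears in every view, and view i holds a fixed fraction f_i of a base row's bytes. Under those assumptions, M = RF × N × (1 + Σ f_i). The sketch ignores compression, tombstones, per-partition overhead, and rows dropped by a view's `IS NOT NULL` key restrictions. The function name and parameters are hypothetical.

```python
from typing import Sequence


def estimate_denormalized_bytes(base_bytes: float,
                                view_fractions: Sequence[float],
                                replication_factor: int = 1) -> float:
    """Rough M for "N bytes in base will become M with base + views".

    base_bytes: N, the base table's size for a single replica.
    view_fractions: per view, the assumed fraction (0, 1] of a base row's
        bytes that the view copies (selected columns plus its key).
    replication_factor: copies of base and views across the cluster.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    for f in view_fractions:
        if not 0 < f <= 1:
            raise ValueError(f"view fraction out of range: {f}")
    per_replica = base_bytes * (1 + sum(view_fractions))
    return replication_factor * per_replica
```

Take a hypothetical 100 GiB base table with one view that copies whole rows and one that copies about half of each row. `estimate_denormalized_bytes(100, [1.0, 0.5])` gives 250 GiB per replica, or 750 GiB at RF=3. A laptop-sized test never surfaces numbers like these.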


Re: [DISCUSS] Future of MVs

Posted by Dinesh Joshi <dj...@apache.org>.

> On Jun 30, 2020, at 3:27 PM, David Capwell <dc...@gmail.com> wrote:
> 
> If that is the case then shouldn't we add MV to "4.0 Quality: Components
> and Test Plans" (CASSANDRA-15536)?  It is currently missing, so adding it
> to the testing road map would be a clear sign that someone is planning to
> champion and own this feature; if people feel that this is a broken
> feature, shouldn't we have tests showing this?  Would be great to see
> traction here.

Good point, we should definitely test it to ensure there are no regressions even though it is marked as experimental.

I'd also like to clarify that the feature works for a certain subset of use cases when it is limited to a certain scale. It unfortunately does not scale well with the size of data. I think it is important to call out this distinction. For many users, it's acceptable. For others it is not.

Dinesh

Re: [DISCUSS] Future of MVs

Posted by Nate McCall <zz...@gmail.com>.

On Wed, Jul 1, 2020 at 10:27 AM David Capwell <dc...@gmail.com> wrote:

> If that is the case then shouldn't we add MV to "4.0 Quality: Components
> and Test Plans" (CASSANDRA-15536)?  It is currently missing, so adding it
> to the testing road map would be a clear sign that someone is planning to
> champion and own this feature; if people feel that this is a broken
> feature, shouldn't we have tests showing this?  Would be great to see
> traction here.
>

+1 - Surfacing it like that feels like a good next step to me.

Re: [DISCUSS] Future of MVs

Posted by David Capwell <dc...@gmail.com>.

If that is the case then shouldn't we add MV to "4.0 Quality: Components
and Test Plans" (CASSANDRA-15536)?  It is currently missing, so adding it
to the testing road map would be a clear sign that someone is planning to
champion and own this feature; if people feel that this is a broken
feature, shouldn't we have tests showing this?  Would be great to see
traction here.
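As a hedged illustration of the kind of assertion such tests could make, the Python sketch below models a base table keyed by primary key and a view re-keyed on one column. It reports two kinds of discrepancy: view rows with no matching base row (orphaned) and base rows that never reached the view (missing). It is an in-memory model with hypothetical names, not the dtest framework's API; a real test would read both sides through CQL. As with Cassandra views, the view key also carries the base primary key, and a base row is only expected in the view when the view's key column is non-null.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Row = Dict[str, object]
ViewKey = Tuple[object, Hashable]  # (view key column value, base primary key)


def expected_view_keys(base: Dict[Hashable, Row], view_column: str) -> Set[ViewKey]:
    """View keys implied by the base table: one per row with a non-null key."""
    return {
        (row[view_column], pk)
        for pk, row in base.items()
        if row.get(view_column) is not None
    }


def diff_view(base: Dict[Hashable, Row], view_keys: Iterable[ViewKey],
              view_column: str) -> Tuple[Set[ViewKey], Set[ViewKey]]:
    """Return (orphaned, missing) view keys relative to the base table."""
    expected = expected_view_keys(base, view_column)
    actual = set(view_keys)
    orphaned = actual - expected  # in the view, no longer backed by the base
    missing = expected - actual   # in the base, never reflected in the view
    return orphaned, missing


base = {1: {"email": "a@example.com"}, 2: {"email": "b@example.com"}, 3: {"email": None}}
view = {("a@example.com", 1), ("old@example.com", 2)}
orphaned, missing = diff_view(base, view, "email")
```

Here row 2's email changed but the view still holds the stale entry, so `orphaned` is `{("old@example.com", 2)}` and `missing` is `{("b@example.com", 2)}`. Because the oracle is a pure function of the base table, the same check applies whatever caused the divergence.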

On Tue, Jun 30, 2020 at 3:11 PM Joshua McKenzie <jm...@apache.org>
wrote:

> Let's forget I said anything about release cadence. That's another thread
> entirely and a good deep conversation to explore. Don't want to derail.
>
> If there's a question about "is anyone stepping forward to maintain MV's",
> I can say with certainty that at least one full time contributor I work
> with will engage and continue to work on and improve this feature going
> forward. Who precisely that ends up being stands to be seen; that's more
> fluid, but there are no plans to stop working on it going forward.
>
> On Tue, Jun 30, 2020 at 5:45 PM Benedict Elliott Smith <
> benedict@apache.org>
> wrote:
>
> > I don't think we can realistically expect majors, with the deprecation
> > cycle they entail, to come every six months.  If nothing else, we would
> > have too many versions to maintain at once.  I personally think all the
> > project needs on that front is clearer roadmapping at the start of a
> > release cycle, and we would be fine with 12-18mo release cycles.
> >
> > That's another whole discussion to distract us from 4.0, anyway - though I
> > think we can tolerate a few slow burn conversations.
> >
> >
> > On 30/06/2020, 22:10, "Joshua McKenzie" <jm...@apache.org> wrote:
> >
> >     Seems like a reasonable point of view to me Sankalp. I'd also suggest we
> >     try to find other sources of data than just the user ML, like searching on
> >     github for instance. A collection of imperfect metrics beats just one in my
> >     experience.
> >
> >     Though I would ask why we're having this discussion this late in the
> >     release cycle when we have what, 4 tickets left until cutting beta 1? Seems
> >     like the kind of thing we could reasonably defer while we focus on getting
> >     4.0 out, though I'm sympathetic to the "release is cutoff for deprecation"
> >     argument.
> >
> >     If we cadence our majors to calendar (like every 6 months for example)
> >     instead of scope this would become significantly less of a big issue imo.
> >
> >     On Tue, Jun 30, 2020 at 5:01 PM sankalp kohli <kohlisankalp@gmail.com> wrote:
> >
> >     > Hi,
> >     >     I think we should revisit all features which require a lot more work to
> >     > make them work. Here is how I think we should do for each one of them
> >     >
> >     > 1. Identify such features and some details of why they are deprecation
> >     > candidates.
> >     > 2. Ask the dev list if anyone is willing to work on improving them over the
> >     > next 1 or 2 major releases.
> >     > 3. We then move to the user list to find who all are using it and if they
> >     > are opposed to removing/deprecating it. Assuming few will be using it, we
> >     > need to see the tradeoff of keeping it vs removing it on a case by case
> >     > basis.
> >     > 4. Deprecate it in the next major or make it experimental if #2 and #3
> >     > removes them from deprecation.
> >     > 5. Remove it in next major
> >     >
> >     > For MV, I see this email as step #2. We should move to asking the user list
> >     > next.
> >     >
> >     > Thanks,
> >     > Sankalp
> >     >
> >     > On Tue, Jun 30, 2020 at 1:46 PM Joshua McKenzie <jmckenzie@apache.org> wrote:
> >     >
> >     > > We're just short of 98 tickets on the component since it's original merge
> >     > > so at least *some* work has been done to stabilize them. Not to say I'm
> >     > > endorsing running them at massive scale today without knowing what you're
> >     > > doing, to be clear. They are perhaps our largest loaded gun of a feature of
> >     > > self-foot-shooting atm. Zhao did a bunch of work on them internally and
> >     > > we've backported much of that to OSS; I've pinged him to chime in here.
> >     > >
> >     > > The "data is orphaned in your view when you lose all base replicas" issue
> >     > > is more or less "unsolvable", since a scan of a view to confirm data in the
> >     > > base table is so slow you're talking weeks to process and it totally
> >     > > trashes your page cache. I think Paulo landed on a "you have to rebuild the
> >     > > view if you lose all base data" reality. There's also, I believe, the
> >     > > unresolved issue of modeling how much data a base table with one to many
> >     > > views will end up taking up in its final form when denormalized. This could
> >     > > be vastly improved with something like an "EXPLAIN ANALYZE" for a table
> >     > > with views, if you'll excuse the mapping, to show "N bytes in base will
> >     > > become M with base + views" or something.
> >     > >
> >     > > Last but definitely not least in dumping the state in my head about this,
> >     > > there's a bunch of potential for guardrailing people away from self-harm
> >     > > with MV's if we decide to go the route of guardrails (link:
> >     > > https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
> >     > > ).
> >     > >
> >     > > So from my PoV, I'm against us just voting to deprecate and remove without
> >     > > going into more depth into the current state of things and what options are
> >     > > on the table, since people will continue to build MV's at the client level
> >     > > which, in theory, should have worse correctness and performance
> >     > > characteristics than having a clean and well stabilized implementation in
> >     > > the coordinator.
> >     > >
> >     > > Having them flagged as experimental for now as we stabilize 4.0 and get
> >     > > things out the door *seems* sufficient to me, but if people are widely
> >     > > using these out in the wild and ignoring that status and the corresponding
> >     > > warning, maybe we consider raising the volume on that warning for 4.0 while
> >     > > we figure this out.
> >     > >
> >     > > Just my .02.
> >     > >
> >     > > ~Josh
> >     > >
> >     > > On Tue, Jun 30, 2020 at 4:22 PM Dinesh Joshi <dj...@apache.org> wrote:
> >     > >
> >     > > > > On Jun 30, 2020, at 12:43 PM, Jon Haddad <jo...@jonhaddad.com> wrote:
> >     > > > >
> >     > > > > As we move forward with the 4.0 release, we should consider this an
> >     > > > > opportunity to deprecate materialized views, and remove them in 5.0. We
> >     > > > > should take this opportunity to learn from the mistake and raise the bar
> >     > > > > for new features to undergo a much more thorough run the wringer before
> >     > > > > merging.
> >     > > >
> >     > > > I'm in favor of marking them as deprecated and removing them in 5.0. If
> >     > > > someone steps up and can fix them in 5.0, then we always have the option of
> >     > > > accepting the fix.
> >     > > >
> >     > > > Dinesh
> >     > > >
> >     > > >
> >     > > >
> >     > >
> >     >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
> >
>

Re: [DISCUSS] Future of MVs

Posted by Benedict Elliott Smith <be...@apache.org>.

I think the point is that we need to have a clear plan of action to bring features up to an acceptable standard.  That also implies a need to agree how we determine if a feature has reached an acceptable standard - both going forwards and retrospectively.  For those that don't reach that standard today, we need something like a retrospective CEP to agree how to rectify that.  Then we can figure out if the necessary resources can be mustered, or if we need to consider obsolescence.

I'm not convinced this discussion has to be resolved immediately, but that's how I view the situation.


On 30/06/2020, 23:11, "Joshua McKenzie" <jm...@apache.org> wrote:

    Let's forget I said anything about release cadence. That's another thread
    entirely and a good deep conversation to explore. Don't want to derail.

    If there's a question about "is anyone stepping forward to maintain MV's",
    I can say with certainty that at least one full time contributor I work
    with will engage and continue to work on and improve this feature going
    forward. Who precisely that ends up being stands to be seen; that's more
    fluid, but there are no plans to stop working on it going forward.


Re: [DISCUSS] Future of MVs

Posted by Joshua McKenzie <jm...@apache.org>.

Let's forget I said anything about release cadence. That's another thread
entirely and a good deep conversation to explore. Don't want to derail.

If there's a question about "is anyone stepping forward to maintain MV's",
I can say with certainty that at least one full time contributor I work
with will engage and continue to work on and improve this feature going
forward. Who precisely that ends up being remains to be seen; that's more
fluid, but there are no plans to stop working on it going forward.

On Tue, Jun 30, 2020 at 5:45 PM Benedict Elliott Smith <be...@apache.org>
wrote:

> I don't think we can realistically expect majors, with the deprecation
> cycle they entail, to come every six months.  If nothing else, we would
> have too many versions to maintain at once.  I personally think all the
> project needs on that front is clearer roadmapping at the start of a
> release cycle, and we would be fine with 12-18mo release cycles.
>
> That's another whole discussion to distract us from 4.0, anyway - though I
> think we can tolerate a few slow burn conversations.

Re: [DISCUSS] Future of MVs

Posted by Benedict Elliott Smith <be...@apache.org>.

I don't think we can realistically expect majors, with the deprecation cycle they entail, to come every six months.  If nothing else, we would have too many versions to maintain at once.  I personally think all the project needs on that front is clearer roadmapping at the start of a release cycle, and we would be fine with 12-18mo release cycles.

That's another whole discussion to distract us from 4.0, anyway - though I think we can tolerate a few slow burn conversations.
 

On 30/06/2020, 22:10, "Joshua McKenzie" <jm...@apache.org> wrote:

    Seems like a reasonable point of view to me Sankalp. I'd also suggest we
    try to find other sources of data than just the user ML, like searching on
    github for instance. A collection of imperfect metrics beats just one in my
    experience.

    Though I would ask why we're having this discussion this late in the
    release cycle when we have what, 4 tickets left until cutting beta 1? Seems
    like the kind of thing we could reasonably defer while we focus on getting
    4.0 out, though I'm sympathetic to the "release is cutoff for deprecation"
    argument.

    If we cadence our majors to calendar (like every 6 months for example)
    instead of scope this would become significantly less of a big issue imo.


Re: [DISCUSS] Future of MVs

Posted by Joshua McKenzie <jm...@apache.org>.

Seems like a reasonable point of view to me, Sankalp. I'd also suggest we
try to find other sources of data than just the user ML, like searching
GitHub, for instance. A collection of imperfect metrics beats just one in my
experience.

Though I would ask why we're having this discussion this late in the
release cycle when we have, what, 4 tickets left until cutting beta 1? Seems
like the kind of thing we could reasonably defer while we focus on getting
4.0 out, though I'm sympathetic to the "release is cutoff for deprecation"
argument.

If we cadence our majors to the calendar (every 6 months, for example)
instead of to scope, this would become much less of an issue, imo.

On Tue, Jun 30, 2020 at 5:01 PM sankalp kohli <ko...@gmail.com>
wrote:

> Hi,
>     I think we should revisit all features which require a lot more work to
> make them work. Here is how I think we should do for each one of them
>
> 1. Identify such features and some details of why they are deprecation
> candidates.
> 2. Ask the dev list if anyone is willing to work on improving them over the
> next 1 or 2 major releases.
> 3. We then move to the user list to find who all are using it and if they
> are opposed to removing/deprecating it. Assuming few will be using it, we
> need to see the tradeoff of keeping it vs removing it on a case by case
> basis.
> 4. Deprecate it in the next major or make it experimental if #2 and #3
> removes them from deprecation.
> 5. Remove it in next major
>
> For MV, I see this email as step #2. We should move to asking the user list
> next.
>
> Thanks,
> Sankalp

Re: [DISCUSS] Future of MVs

Posted by sankalp kohli <ko...@gmail.com>.

Hi,
    I think we should revisit all features that require a lot more work to
make them work. Here is how I think we should handle each one of them:

1. Identify such features, with some details on why they are deprecation
candidates.
2. Ask the dev list if anyone is willing to work on improving them over the
next 1 or 2 major releases.
3. Then move to the user list to find out who is using the feature and whether
they are opposed to removing/deprecating it. Assuming few will be using it, we
need to weigh the tradeoff of keeping it vs removing it on a case-by-case
basis.
4. Deprecate it in the next major, or mark it experimental instead if #2 and
#3 take it off the deprecation path.
5. Remove it in the following major.

For MVs, I see this email as step #2. We should move to asking the user list
next.

Thanks,
Sankalp

On Tue, Jun 30, 2020 at 1:46 PM Joshua McKenzie <jm...@apache.org>
wrote:

> We're just short of 98 tickets on the component since it's original merge
> so at least *some* work has been done to stabilize them. Not to say I'm
> endorsing running them at massive scale today without knowing what you're
> doing, to be clear. They are perhaps our largest loaded gun of a feature of
> self-foot-shooting atm. Zhao did a bunch of work on them internally and
> we've backported much of that to OSS; I've pinged him to chime in here.
>
> The "data is orphaned in your view when you lose all base replicas" issue
> is more or less "unsolvable", since a scan of a view to confirm data in the
> base table is so slow you're talking weeks to process and it totally
> trashes your page cache. I think Paulo landed on a "you have to rebuild the
> view if you lose all base data" reality. There's also, I believe, the
> unresolved issue of modeling how much data a base table with one to many
> views will end up taking up in its final form when denormalized. This could
> be vastly improved with something like an "EXPLAIN ANALYZE" for a table
> with views, if you'll excuse the mapping, to show "N bytes in base will
> become M with base + views" or something.
>
> Last but definitely not least in dumping the state in my head about this,
> there's a bunch of potential for guardrailing people away from self-harm
> with MV's if we decide to go the route of guardrails (link:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
> ).
>
> So  from my PoV, I'm against us just voting to deprecate and remove without
> going into more depth into the current state of things and what options are
> on the table, since people will continue to build MV's at the client level
> which, in theory, should have worse correctness and performance
> characteristics than having a clean and well stabilized implementation in
> the coordinator.
>
> Having them flagged as experimental for now as we stabilize 4.0 and get
> things out the door *seems* sufficient to me, but if people are widely
> using these out in the wild and ignoring that status and the corresponding
> warning, maybe we consider raising the volume on that warning for 4.0 while
> we figure this out.
>
> Just my .02.
>
> ~Josh

Re: [DISCUSS] Future of MVs

Posted by Joshua McKenzie <jm...@apache.org>.

We're just short of 98 tickets on the component since its original merge
so at least *some* work has been done to stabilize them. Not to say I'm
endorsing running them at massive scale today without knowing what you're
doing, to be clear. They are perhaps our largest loaded gun of a feature of
self-foot-shooting atm. Zhao did a bunch of work on them internally and
we've backported much of that to OSS; I've pinged him to chime in here.

The "data is orphaned in your view when you lose all base replicas" issue
is more or less "unsolvable", since a scan of a view to confirm data in the
base table is so slow you're talking weeks to process and it totally
trashes your page cache. I think Paulo landed on a "you have to rebuild the
view if you lose all base data" reality. There's also, I believe, the
unresolved issue of modeling how much data a base table with one to many
views will end up taking up in its final form when denormalized. This could
be vastly improved with something like an "EXPLAIN ANALYZE" for a table
with views, if you'll excuse the mapping, to show "N bytes in base will
become M with base + views" or something.

Last but definitely not least in dumping the state in my head about this,
there's a bunch of potential for guardrailing people away from self-harm
with MVs if we decide to go the route of guardrails (link:
https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
).

So from my PoV, I'm against us simply voting to deprecate and remove them
without looking more deeply at the current state of things and the options
on the table, since people will continue to build MVs at the client level,
which in theory should have worse correctness and performance
characteristics than a clean, well-stabilized implementation in the
coordinator.

Having them flagged as experimental for now as we stabilize 4.0 and get
things out the door *seems* sufficient to me, but if people are widely
using these out in the wild and ignoring that status and the corresponding
warning, maybe we consider raising the volume on that warning for 4.0 while
we figure this out.

Just my .02.

~Josh

On Tue, Jun 30, 2020 at 4:22 PM Dinesh Joshi <dj...@apache.org> wrote:

> > On Jun 30, 2020, at 12:43 PM, Jon Haddad <jo...@jonhaddad.com> wrote:
> >
> > As we move forward with the 4.0 release, we should consider this an
> > opportunity to deprecate materialized views, and remove them in 5.0.  We
> > should take this opportunity to learn from the mistake and raise the bar
> > for new features to undergo a much more thorough run the wringer before
> > merging.
>
> I'm in favor of marking them as deprecated and removing them in 5.0. If
> someone steps up and can fix them in 5.0, then we always have the option of
> accepting the fix.
>
> Dinesh

Re: [DISCUSS] Future of MVs

Posted by Jasonstack Zhao Yang <zh...@gmail.com>.

> While at TLP, I helped numerous customers move off of MVs, mostly because
> they affected stability of clusters in a horrific way.  The most telling
> project involved helping someone create new tables to manage 1GB of data
> because the views performed so poorly they made the cluster unresponsive
> and unusable.

The documented way to report bugs is a JIRA ticket that includes the
version and environment:
https://cassandra.apache.org/doc/latest/bugs.html


> As we move forward with the 4.0 release, we should consider this an
> opportunity to deprecate materialized views, and remove them in 5.0.

While the community is focused on 4.0 and unable to review CEPs and
improvements, shouldn't we hold this discussion once the community is ready
to consider CEPs and improvements again?


> We should take this opportunity to learn from the mistake and raise the bar
> for new features to undergo a much more thorough run the wringer before
> merging.

Agreed that we should learn from our mistakes, but there are still users
running MVs. I think it's more responsible to work with those users to
improve MVs for their use cases.


>  Am I missing a JIRA
> that can magically fix the issues with performance, availability &
> correctness?

Is there any formal discussion or analysis showing that these issues are
impossible to fix or improve?

On Wed, 1 Jul 2020 at 04:23, Dinesh Joshi <dj...@apache.org> wrote:

> > On Jun 30, 2020, at 12:43 PM, Jon Haddad <jo...@jonhaddad.com> wrote:
> >
> > As we move forward with the 4.0 release, we should consider this an
> > opportunity to deprecate materialized views, and remove them in 5.0.  We
> > should take this opportunity to learn from the mistake and raise the bar
> > for new features to undergo a much more thorough run the wringer before
> > merging.
>
> I'm in favor of marking them as deprecated and removing them in 5.0. If
> someone steps up and can fix them in 5.0, then we always have the option of
> accepting the fix.
>
> Dinesh

Re: [DISCUSS] Future of MVs

Posted by Dinesh Joshi <dj...@apache.org>.

> On Jun 30, 2020, at 12:43 PM, Jon Haddad <jo...@jonhaddad.com> wrote:
> 
> As we move forward with the 4.0 release, we should consider this an
> opportunity to deprecate materialized views, and remove them in 5.0.  We
> should take this opportunity to learn from the mistake and raise the bar
> for new features to undergo a much more thorough run the wringer before
> merging.

I'm in favor of marking them as deprecated and removing them in 5.0. If someone steps up and can fix them in 5.0, then we always have the option of accepting the fix.

Dinesh