Posted to dev@druid.apache.org by Suneet Saldanha <su...@imply.io> on 2020/06/08 20:49:35 UTC

Feature lifecycle for Druid features

Hi Druid devs!

I've been thinking about our release process and would love to get your
thoughts on how we manage new features.

When a new feature is added, is it first marked as experimental?
How do users know which features are experimental?
How do we ensure that features do not break with each new release?
Should the release manager manually check that each feature works as part
of the release process?
    This doesn't seem like it can scale.
Should integration tests always be required if the feature is being added
to core?

To address these issues, I'd like to propose we introduce a feature
lifecycle for all features so that we can set expectations for users
appropriately - either in the docs, the product, or both. I'd like to propose
something like this:
* Alpha - Known major bugs / performance issues. Incomplete functionality.
Disabled by default (see the sketch after this list).
* Beta - Feature is not yet battle-tested in production. API and
compatibility may change in the future. May not be forward / backward
compatible.
* GA - Feature has appropriate user-facing documentation and testing so
that it won't regress with a version upgrade. Will be forward / backward
compatible for x releases (maybe 4? ~ 1 year)
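
To make "disabled by default" concrete, here is a minimal sketch of how an
alpha feature could be gated behind an opt-in flag. The class and property
names are made up for illustration; Druid commonly binds configuration
classes like this through Jackson:

    import com.fasterxml.jackson.annotation.JsonProperty;

    // Hypothetical config holder for an alpha feature; the names here
    // are illustrative, not an actual Druid property.
    public class AlphaFeatureConfig
    {
      // Alpha: off by default, so users must opt in explicitly.
      @JsonProperty("enabled")
      private boolean enabled = false;

      public boolean isEnabled()
      {
        return enabled;
      }
    }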

I think a model like this will allow us to continue to ship features
quickly while keeping the release quality bar high so that our users can
continue to rely on Druid without worrying about upgrade issues.
I understand that adding integration tests may not always make sense for
early / experimental features when we're uncertain of the API or the
broader use case we're trying to solve. This model would make it clear to
our users which features are still a work in progress, and which ones they
can expect to remain stable for a longer time.

Below is an example of how I think this model can be applied to a new
feature:

This PR adds support for a new feature -
https://github.com/apache/druid/pull/9449

While it has been tested locally, there may be changes that enter Druid
before the 0.19 release that break this feature, or more likely, a
refactoring after 0.19 that breaks something in it. In this example, I
think the feature should be marked as Alpha, since further changes to its
functionality are expected. At this stage, integration tests are not
expected. Once the feature is complete, there should be happy-path
integration tests for it, and it can graduate to Beta. After it has been
running in production for a while, it can graduate to GA once we've added
enough integration tests that we feel confident the feature will continue
to work as long as those tests pass.
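
As a sketch of what a happy-path integration test might look like at the
Beta stage, here's a minimal JUnit 4 example that only checks the feature's
visible behavior against a running cluster. The endpoint is a stand-in
(the Coordinator's /status route filling in for a real feature API); an
actual test would run through the integration-test harness:

    import java.net.HttpURLConnection;
    import java.net.URL;
    import org.junit.Assert;
    import org.junit.Test;

    public class NewFeatureIntegrationTest
    {
      // Stand-in endpoint; a real test would exercise the feature's own
      // API on a cluster brought up by the integration-test harness.
      private static final String STATUS_URL = "http://localhost:8081/status";

      @Test
      public void testHappyPath() throws Exception
      {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(STATUS_URL).openConnection();
        try {
          // Happy path only: the service is up and answers. Edge cases
          // and failure injection can wait for the GA bar.
          Assert.assertEquals(200, conn.getResponseCode());
        }
        finally {
          conn.disconnect();
        }
      }
    }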

I know this is a very long email, but I look forward to hearing your
thoughts on this.
Suneet

Re: Feature lifecycle for Druid features

Posted by Suneet Saldanha <su...@imply.io>.
Hi

Thanks for the suggestions and feedback Gian and Surekha! I've not
forgotten about this thread :)

I've been a little busy recently and haven't had time to fully process the
suggestions and write up responses / next steps. I'll follow up on this
next week.

On Mon, Jun 15, 2020 at 11:55 PM Surekha Saharan <su...@imply.io>
wrote:

Re: Feature lifecycle for Druid features

Posted by Surekha Saharan <su...@imply.io>.
Thanks for starting this discussion; regression testing is essential for
maintaining code quality and happy users. Introducing a feature lifecycle
sounds good to me. I have some questions and comments:

- Right now we have features categorized into "experimental" and "GA". Are
you suggesting we further divide the experimental features into "alpha" and
"beta"?

- We should clearly define the meaning of "alpha", "beta", and "GA"
features to set expectations for both contributors and users. Your "alpha"
definition says "known major bugs", which doesn't sound right; I think
there can be "known limitations or unknown major bugs" in alpha. I agree
with the idea suggested of undocumented "alpha" features and documented
"beta" features.

- For GA features, you suggested forward/backward compatibility for x
releases. I think 2 might be a good number, as I often see folks skip a
version when upgrading; if we decide to do major releases less often in
the future as suggested, 1 is fine too.

- For existing GA features that do not meet the testing criteria, instead
of putting the burden on the developer who next fixes a bug in that area,
we could create GitHub issues for major features, and someone from the
community wanting to understand a feature could potentially help write
integration tests for it. Of course, if a contributor writes those tests
as part of a bug fix or enhancement, that's welcome.

- Not all, but I think a lot of regressions can be prevented by writing
good unit tests for changes related to APIs, serde, etc., while
integration tests are necessary for end-to-end testing and catching
dependency changes.
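
A serde round-trip unit test of this kind is cheap to write. Here's a
minimal sketch with Jackson, using a plain Map as a stand-in for the
feature's real query or spec class:

    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.util.Map;
    import org.junit.Assert;
    import org.junit.Test;

    public class SerdeRoundTripTest
    {
      @Test
      public void testRoundTrip() throws Exception
      {
        ObjectMapper mapper = new ObjectMapper();
        // Stand-in JSON; a real test would deserialize into the
        // feature's own config or query class and compare with equals().
        String json = "{\"type\": \"example\", \"value\": 10}";

        Map<?, ?> fromJson = mapper.readValue(json, Map.class);
        String reserialized = mapper.writeValueAsString(fromJson);

        // JSON -> object -> JSON -> object should be lossless.
        Assert.assertEquals(fromJson, mapper.readValue(reserialized, Map.class));
      }
    }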

By the way, is there a page where we list all the experimental features,
or do we note a feature's experimental nature on its own page, or in the
release notes? Thanks Gian for classifying the current features into
"alpha", "beta", and "GA"; it helps in understanding the definitions.

-Surekha

On Mon, Jun 15, 2020 at 8:40 PM Gian Merlino <gi...@apache.org> wrote:

Re: Feature lifecycle for Druid features

Posted by Gian Merlino <gi...@apache.org>.
IMO the alpha / beta / GA terminology makes sense, and makes things clearer
to users, which is good.

Some thoughts on the specifics of your proposal:

- You're suggesting we commit to a specific number of releases that a GA
feature will be forward / backward compatible for. IMO, our current
commitment (one major release) is okay, but it would be good to strive for
doing this as infrequently as possible. In the future, we may decide to do
major releases less often, which will naturally lengthen the commitment
times.

- I like the idea of phasing in the testing bar as features move from alpha
-> beta -> GA. I think it'd be good to point to examples of features where
the testing is done "right" for each stage. It should help contributors
know what to shoot for.

- Plenty of GA features today do not meet the testing bar you've mentioned,
including some "day 1" features. This is fine — it is a natural consequence
of raising the testing bar over time — but we should have an idea of what
we want to do about this. One possible approach is to require that tests be
added to meet the bar when fixes or changes are made to the feature. But
this leads to situations where a small change can't be made without adding
a mountain of tests. IMO it'd be good to do an amount of new testing
commensurate with the scope of the change. A big refactor to a feature that
doesn't have much testing should involve adding a mountain of tests to it.
But we don't necessarily need to require that for a small bug fix or
enhancement (but it would be great, of course!).

- For "beta" the definition you suggest is all negative ("not battle
tested", "may change", "may not be compatible"). We should include
something positive as well, to illustrate what makes beta better than
alpha. How about "no major known issues" or "no major API changes planned"?

- I would suggest moving the "appropriate user-facing documentation"
requirement to beta rather than GA. In order to have a useful beta testing
period, we need to have good user-facing docs so people can try the feature
out.

- I think we might want to leave some alpha features undocumented, if their
quality or stability level is so low that they won't be useful to people
that aren't developers. The goal would be to avoid clogging up the
user-facing docs with a bunch of half-baked stuff. Too much of that lowers
the perceived quality level of the project.

Now, thinking about specific features, I suggest we classify the current
experimental features in the following way:

- Java 11 support: Beta or GA (depending on how good the test coverage is)
- HTTP remote task runner: Alpha (there aren't integration tests yet)
- Router process: GA
- Indexer process: Alpha or Beta (also depending on how good the test
coverage is)
- Segment locking / minor compaction: Alpha
- Approximate histograms: GA, but deprecated (they are stable and have
plenty of tests, but users should consider switching to DataSketches
quantiles; see the SQL sketch after this list)
- Lookups: Beta
- Kinesis ingestion: GA (now that there are integration tests:
https://github.com/apache/druid/pull/9724)
- Materialized view extension: Alpha
- Moments sketch extension: Alpha
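
To illustrate the approximate-histograms note above, here's roughly what
the suggested switch looks like in Druid SQL. APPROX_QUANTILE comes from
the approximate-histograms extension and APPROX_QUANTILE_DS from
druid-datasketches; the table and column names are made up:

    -- Deprecated path: approximate-histograms quantile.
    SELECT APPROX_QUANTILE("latency", 0.95) FROM "requests";

    -- Suggested replacement: DataSketches quantiles.
    SELECT APPROX_QUANTILE_DS("latency", 0.95) FROM "requests";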

Re: Feature lifecycle for Druid features

Posted by Suneet Saldanha <su...@imply.io>.
Bumping this again.

I think having a point of view on this can be useful to the community.
Here's another example:

The Pac4j and Ranger extensions were added recently as core extensions in
0.18 and 0.19 respectively. As a new Druid user, I'd expect these
extensions to work in production with each new release of Druid,
especially since they are in core. Unfortunately, there are no integration
tests, so it's possible that these extensions break if we do a major
refactoring or introduce some conflicting dependencies.
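
For context, users opt in to these extensions through the extensions load
list in common.runtime.properties, so any breakage shows up as soon as the
extension is enabled. A sketch (extension names as I understand them from
the docs; double-check them for your version):

    # common.runtime.properties
    druid.extensions.loadList=["druid-pac4j", "druid-ranger-security"]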

An agreed feature lifecycle would make it easy for a Druid user to
understand which extensions / features they can rely on in every release,
and which are not yet validated as part of the release, without this being
a large manual burden on the release manager.

Looking forward to hearing thoughts on this,
Suneet
