Posted to dev@iceberg.apache.org by Ryan Blue <bl...@tabular.io> on 2022/01/08 00:37:43 UTC

Re: Iceberg engine version maintenance lifecycle

Sorry for the late reply here!

These look reasonable to me. I think that this will help us reason about
trade-offs next time we have a release issue like the current one. We
should simply mark the 3.2 support as beta and get the release out next
time. I also think that we should not create situations with regressions
like this again. We should probably have copied the old MERGE/UPDATE/DELETE
plans into 3.2 to avoid a regression and then updated them to the new
implementations later, without affecting releases.

Thanks for writing this up, Jack!

Ryan

On Wed, Dec 15, 2021 at 7:18 PM Jack Ye <ye...@gmail.com> wrote:

> Hi everyone,
>
> As a part of the ongoing 0.13.0 release, we are starting to formally
> support multiple engine versions for Spark, Flink and Hive. I think it is
> worth defining a formal process for us to add a new supported version,
> maintain existing versions and deprecate old versions. We briefly touched
> this topic when doing the refactoring, but I think now is a good time to
> formalize it and place it as a part of the Iceberg public documentation. As
> a starter for brainstorming, here is the process I have in mind:
>
> Each engine has the following lifecycle states:
>
> 1. *Beta*: support for an engine version is added, but is still in the
> experimental stage. Maybe the engine version itself is still in preview
> (e.g. Spark 3.0.0-preview), or the new version does not yet have full
> feature compatibility with older versions. This state allows us to
> release support for an engine version without waiting for feature
> parity, shortening the release time.
>
> 2. *Maintained*: an engine version is being actively maintained by the
> community. Users can expect feature parity for most features across all the
> maintained versions. If a feature has to leverage some new engine
> functionalities that older versions don't have, then feature parity is not
> required. For code contributors,
> - New features should always be prioritized first in the latest version
> (the latest version could be a maintained or beta version)
> - For features that could be backported, the contributor is encouraged to
> either also perform backports in separated PRs, or at least create some
> issues to track the backport.
> - If the change is small enough like a few lines, updating all versions at
> once is good enough. Otherwise, using separated PRs for each version is
> recommended.
>
> 3. *Deprecating*: an engine version is no longer actively maintained.
> People who are still interested in the version can backport any necessary
> feature or bug fix from newer versions, but the community will not spend
> effort on achieving feature parity. We recommend that users move to a
> newer version. We expect contributions to the specific version to
> diminish over time until eventually no changes are being added. At
> that point we can move the version to end-of-life.
>
> 4. *End-of-life*: a vote can be initiated to fully remove a deprecating
> version from the Iceberg repo, marking its end of life. I am not sure
> if we should remove all the code, but I think it would help push people
> forward and keep the repository healthy.
>
> With the lifecycle states described above, we will add one doc section under
> each engine to describe the current engine version support status. A PR
> will be needed to perform any state transition, and that could serve as the
> place to discuss if the transition is appropriate or not.
>
> Any thoughts about the process?
>
> Best,
> Jack Ye
>
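
The four lifecycle states proposed above can be read as a small state
machine. The following is only a minimal sketch of that reading, not part of
the proposal: the names EngineVersionState, ALLOWED_TRANSITIONS,
can_transition and requires_vote are hypothetical, and the strictly
forward-only ordering is an assumption (the proposal only says that a PR is
needed for any transition and that end-of-life removal can be put to a vote).

```python
from enum import Enum


class EngineVersionState(Enum):
    """Lifecycle states named in the proposal (class name is hypothetical)."""
    BETA = "beta"
    MAINTAINED = "maintained"
    DEPRECATING = "deprecating"
    END_OF_LIFE = "end-of-life"


# Assumption: states only move forward, one step at a time.
# The proposal does not spell out which transitions are allowed.
ALLOWED_TRANSITIONS = {
    EngineVersionState.BETA: {EngineVersionState.MAINTAINED},
    EngineVersionState.MAINTAINED: {EngineVersionState.DEPRECATING},
    EngineVersionState.DEPRECATING: {EngineVersionState.END_OF_LIFE},
    EngineVersionState.END_OF_LIFE: set(),
}


def can_transition(current, target):
    """Return True if moving from current to target is allowed in this sketch."""
    return target in ALLOWED_TRANSITIONS[current]


def requires_vote(target):
    """In the proposal, only removal at end-of-life is described as needing a vote."""
    return target is EngineVersionState.END_OF_LIFE
```

Under these assumptions, a PR moving a version from beta to maintained would
pass can_transition, while a PR moving a maintained version back to beta
would be flagged for discussion.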


-- 
Ryan Blue
Tabular

Re: Iceberg engine version maintenance lifecycle

Posted by Kyle Bendickson <ky...@tabular.io>.
Thank you Jack for your thoughts.

I'm very much in agreement with you.

I'd like to discuss the beta version further.

Ideally, to me, the beta version is the minimum change set needed to work
as-is with that version of the system. We would create a beta that ignores
new features, optimizations, etc. where possible, while allowing code
changes where APIs have changed (e.g. a changed method signature). For
example, when the new folder is added, we would defer PRs that take
advantage of new features unless the old pathways have been removed.

This seems like it would help determine where breaking changes are
introduced. Given that two systems are changing at once (Iceberg and the
engine), there's no guarantee that the new version of the engine isn't
causing some problem, but it would still give us a reproducible build to
try things out on and better ability to point to a changeset / point in
time or test different versions to determine where the problem came from.

Also, I know several people who now test with the SNAPSHOT version,
especially so they can test newer engine support (as well as for upgrade
preparedness). Many of these people have submitted valuable issues, and
making it easier for them to test as early as possible, when they are
willing, could be advantageous.

If we wanted to automate this somewhat, we might be able to create a GitHub
action that pushes to a SNAPSHOT repo for the beta with a new tag, or
possibly the creation of a new branch.

If you'd like to sync about this, I'd be happy to help contribute a GitHub
Action and scripts to help automate some of the creation of the beta
versions.

Let me know if you'd like to sync on it, particularly on whether we could
automate based on the creation of a tag or of a branch. Then I could make a
PR for the GitHub Action. The natural creation event seems to be the
addition of the new folder / Gradle project, but tagging or branching would
be easier to integrate into GitHub Actions and would give us additional
control over what we determine to be the beta version.

Thanks,
Kyle (kbendick)
