Posted to dev@iceberg.apache.org by Daniel Weeks <dw...@apache.org> on 2022/09/28 23:17:57 UTC

[DISCUSS] Semantic Versioning Guarantees post 1.0

Hey Iceberg Community,

I wanted to raise a discussion thread with respect to how to handle
semantic versioning and deprecations so that we can document the
expectations for changes to the baseline going forward.

The goal is to clarify/formalize for users, contributors, and reviewers the
expectations around changes to Iceberg's public-facing interfaces and what
that means in the context of the 1.0 release and future releases.

Before 1.0, there was an informal policy that all public-facing APIs
require deprecation and backwards compatibility for at least one minor
release.  Earlier discussion around the 1.0 release included stronger major
version guarantees for the iceberg-api module, and Revapi was introduced
with the intent to enforce those additional major version guarantees.  Those
interfaces are the primary ones Iceberg users interact with, so breaking
them in a minor version could be disruptive.
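A minimal Java sketch of what a one-minor-release deprecation cycle can look like. The class, both methods, and the version numbers are hypothetical and are not taken from the Iceberg codebase.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class for illustration only; not part of Iceberg.
final class DeprecationCycleExample {
  private DeprecationCycleExample() {}

  /**
   * Returns the given IDs in ascending order.
   *
   * @deprecated since 1.1.0, will be removed in 1.2.0 (illustrative versions);
   *     use {@link #sortedIds(List)} instead.
   */
  @Deprecated
  static List<Long> orderedIds(List<Long> ids) {
    // The old entry point keeps working by delegating to its replacement.
    return sortedIds(ids);
  }

  /** Replacement introduced in the same release that deprecates orderedIds. */
  static List<Long> sortedIds(List<Long> ids) {
    return ids.stream().sorted().collect(Collectors.toList());
  }
}
```

Callers compiled against the release that deprecates `orderedIds` keep working and get a deprecation warning at compile time, which gives them at least one minor release to migrate before the old method is removed.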

However, this still leaves a large portion of the codebase without a clear
designation of what is "publicly facing" and what is considered
"internal".  During the community sync today, a few different approaches
were discussed, but there seemed to be some support for the proposed
policy below.  The expectation is that some of the central modules would
require a deprecation cycle to ensure stronger guarantees, while other
modules that are less likely to have direct dependencies would still err
toward deprecations; however, if more significant structural changes are
necessary, or solutions are difficult to incorporate without breaking
changes, reviewers/committers could use their discretion to allow
breaking changes.

I've also included some other ideas that are stricter or more flexible
as alternatives.

Please share support, objections, or alternatives that you think clarify
the approach going forward.

*Proposed Policy:*

*Major Version Deprecations Required*
iceberg-api module

*Minor Version Deprecations Required*
iceberg-common
iceberg-core
iceberg-parquet
iceberg-orc

*Minor Version Deprecations Discretionary*
(all modules not referenced above)
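One possible reading of the proposed tiers, written as a hypothetical Java sketch. `VersioningPolicyExample`, `Guarantee`, `guaranteeFor`, and `mayRemove` are illustrative names only. Two rules are assumptions: a major-tier API may be removed only in a later major release, and a minor-tier API may be removed no earlier than the next minor release after the one that deprecated it. The second rule is based on the informal pre-1.0 policy described above.

```java
import java.util.Map;

// Hypothetical encoding of the proposed policy; these types are not part of Iceberg.
final class VersioningPolicyExample {
  private VersioningPolicyExample() {}

  enum Guarantee { MAJOR_DEPRECATION_REQUIRED, MINOR_DEPRECATION_REQUIRED, DISCRETIONARY }

  // Modules named in the proposal; every other module falls through to DISCRETIONARY.
  private static final Map<String, Guarantee> MODULES = Map.of(
      "iceberg-api", Guarantee.MAJOR_DEPRECATION_REQUIRED,
      "iceberg-common", Guarantee.MINOR_DEPRECATION_REQUIRED,
      "iceberg-core", Guarantee.MINOR_DEPRECATION_REQUIRED,
      "iceberg-parquet", Guarantee.MINOR_DEPRECATION_REQUIRED,
      "iceberg-orc", Guarantee.MINOR_DEPRECATION_REQUIRED);

  static Guarantee guaranteeFor(String module) {
    return MODULES.getOrDefault(module, Guarantee.DISCRETIONARY);
  }

  /** Whether an API deprecated in depMajor.depMinor may be removed in remMajor.remMinor. */
  static boolean mayRemove(String module, int depMajor, int depMinor, int remMajor, int remMinor) {
    switch (guaranteeFor(module)) {
      case MAJOR_DEPRECATION_REQUIRED:
        // Assumed rule: removal only in a later major release.
        return remMajor > depMajor;
      case MINOR_DEPRECATION_REQUIRED:
        // Assumed rule: the deprecated API survives until at least the next minor release.
        return remMajor > depMajor || (remMajor == depMajor && remMinor > depMinor);
      default:
        // Left to reviewer/committer discretion; nothing to enforce mechanically.
        return true;
    }
  }
}
```

For example, under these assumptions an iceberg-core API deprecated in 1.1 could be removed in 1.2. An iceberg-api API deprecated in 1.1 would have to wait for 2.0.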

*Alternatives:*

   1. Formalize current policy
      - Minor version guarantees only
      - No major version guarantees
   2. Strict policy
      - Major version guarantees for iceberg-api module
      - Minor version guarantees for all other modules
      - Strongest guarantees, least flexible
   3. Discretionary Policy
      - Major version guarantees for iceberg-api module
      - Discretionary for all other modules
      - Most flexible, weaker guarantees
   4. Annotation-driven policy
      - Experimental/Stable/etc. annotations denote policy at a
      class/interface level
      - Most granular, but difficult to implement/maintain

-Dan

Re: [DISCUSS] Semantic Versioning Guarantees post 1.0

Posted by Daniel Weeks <dw...@apache.org>.
Ok, just to close out this thread.  It looks like there's general
consensus on the approach outlined above, with the addition of the
iceberg-data module.

After looking more closely, iceberg-data and iceberg-common don't really
have much content, but they are used heavily by other areas of the
codebase.  I think the long-term plan is to fold them into the
iceberg-core module, so they should be treated the same way as iceberg-core.

I'll put together a PR to update the contributing section to include
information about semantic versioning.

-Dan

On Fri, Sep 30, 2022 at 10:16 AM Daniel Weeks <da...@gmail.com>
wrote:

> Yufei, great questions.  I'll take a look at iceberg-data and see how
> that's being used/changing, but it might be a good idea to include it as
> well.
>
> I do think there are areas like you mentioned in other modules where we do
> want to try to maintain backwards compatibility, and the goal is always to
> make reasonable attempts to go through a deprecation cycle.
>
> The annotation-based approach raised some concerns because annotations are
> hard to maintain and fall into disrepair over time.  I'm open to
> considering this as a possible extension of the proposed policy if we see
> issues, but there would be a fair amount of work to go through the codebase
> and start marking classes.  I feel we're trying to strike a balance between
> low up-front effort and increased clarity around the guarantees.
>
> I'm interested to hear if others feel strongly about it.
> -Dan

Re: [DISCUSS] Semantic Versioning Guarantees post 1.0

Posted by Daniel Weeks <da...@gmail.com>.
Yufei, great questions.  I'll take a look at iceberg-data and see how
that's being used/changing, but it might be a good idea to include it as
well.

I do think there are areas like you mentioned in other modules where we do
want to try to maintain backwards compatibility, and the goal is always to
make reasonable attempts to go through a deprecation cycle.

The annotation-based approach raised some concerns because annotations are
hard to maintain and fall into disrepair over time.  I'm open to
considering this as a possible extension of the proposed policy if we see
issues, but there would be a fair amount of work to go through the codebase
and start marking classes.  I feel we're trying to strike a balance between
low up-front effort and increased clarity around the guarantees.

I'm interested to hear if others feel strongly about it.
-Dan

On Thu, Sep 29, 2022 at 3:07 PM Yufei Gu <fl...@gmail.com> wrote:

> +1 for the proposed approach. Here is a question and a minor suggestion.
>
> Question: Do we consider the iceberg-data module to have minor version
> compatibility guarantees?
>
> Suggestion: I get that we don’t want modules like spark/flink to carry
> minor and major version compatibility guarantees. It’s simple, and works
> well for both users and developers over time. What if we want a
> compatibility guarantee on certain classes from these modules, e.g., class
> SparkActions? Is there a way to do that? My suggestion is that we can still
> leverage annotations. They don't need to follow the Spark/Hadoop style,
> e.g., @experimental or @stable. Instead, the annotation can just indicate
> whether something is an API or not. Considering we have two levels of
> compatibility guarantees, we may have two annotations like
> @MinorVersionCompatibleAPI and @MajorVersionCompatibleAPI.
>
> Best,
>
> Yufei
>
> `This is not a contribution`

Re: [DISCUSS] Semantic Versioning Guarantees post 1.0

Posted by Yufei Gu <fl...@gmail.com>.
+1 for the proposed approach. Here is a question and a minor suggestion.

Question: Do we consider the iceberg-data module to have minor version
compatibility guarantees?

Suggestion: I get that we don’t want modules like spark/flink to carry
minor and major version compatibility guarantees. It’s simple, and works
well for both users and developers over time. What if we want a
compatibility guarantee on certain classes from these modules, e.g., class
SparkActions? Is there a way to do that? My suggestion is that we can still
leverage annotations. They don't need to follow the Spark/Hadoop style,
e.g., @experimental or @stable. Instead, the annotation can just indicate
whether something is an API or not. Considering we have two levels of
compatibility guarantees, we may have two annotations like
@MinorVersionCompatibleAPI and @MajorVersionCompatibleAPI.

Best,

Yufei

`This is not a contribution`
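The two annotations suggested in this message could be declared roughly as follows. This is a hypothetical sketch. Only the names @MinorVersionCompatibleAPI and @MajorVersionCompatibleAPI come from the suggestion. The retention policy, the targets, and the stand-in class `ExampleSparkActions` are assumptions for illustration and do not exist in Iceberg.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Marks an API that requires a deprecation cycle before changing in a minor release.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.FIELD})
@interface MinorVersionCompatibleAPI {}

// Marks an API that may only break across major releases.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.FIELD})
@interface MajorVersionCompatibleAPI {}

// Hypothetical stand-in for a class in a module whose guarantees are otherwise discretionary.
@MinorVersionCompatibleAPI
final class ExampleSparkActions {
  private ExampleSparkActions() {}

  static ExampleSparkActions get() {
    return new ExampleSparkActions();
  }
}
```

RUNTIME retention is chosen here only so the markers can be inspected via reflection, e.g. `ExampleSparkActions.class.isAnnotationPresent(MinorVersionCompatibleAPI.class)`. Any build-time check would still need its own rules for what each marker enforces.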


On Thu, Sep 29, 2022 at 1:06 PM Ryan Blue <bl...@tabular.io> wrote:

> +1 to the approach outlined here. Thanks, Dan!
>
> --
> Ryan Blue
> Tabular
>

Re: [DISCUSS] Semantic Versioning Guarantees post 1.0

Posted by Ryan Blue <bl...@tabular.io>.
+1 to the approach outlined here. Thanks, Dan!

-- 
Ryan Blue
Tabular