Posted to dev@beam.apache.org by Ahmet Altay <al...@google.com> on 2020/09/17 22:46:53 UTC

Re: Semantic versioning

Did we end up updating the documentation (in [1] and elsewhere)? There was
a (not major, but still) breaking change in Python related to typehints in
the 2.24.0 release [2].

/cc +Udi Meiri <eh...@google.com>

[1] https://beam.apache.org/get-started/downloads/
[2]
https://github.com/apache/beam/pull/12745/commits/222cd448fe0262fcc5557186b58013ec7bf26622

On Thu, Jun 4, 2020 at 5:07 PM Robert Bradshaw <ro...@google.com> wrote:

> That tool looks great; we should use it more often! (In fact, there's a
> pending RC right now :). It looks like we don't generally do too badly,
> but it can help prevent accidental slippage.
>
> As for whether we should provide semantic versioning, that's a really
> difficult question. I don't think Beam is at a point that we can or should
> provide 100% semantic versioning, but exceptions should be few and far
> between, and hopefully appropriately called out. Technically bugfixes can
> be "backwards incompatible" and I think discouraging/disallowing dangerous
> (or unexpected) behavior can also be fair game. @Experimental annotations
> can be useful, but when something has that label for years I think it
> loses its meaning.
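The intent behind @Experimental can be sketched as a runtime warning (a hypothetical Python decorator illustrating the idea, not Beam's actual API; Beam's Java annotation is compile-time metadata with no runtime effect):

```python
import functools
import warnings


def experimental(func):
    """Mark an API as experimental: callers are warned that it may
    change or be removed in any release, semver notwithstanding."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} is experimental and may change in any release",
            FutureWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)
    return wrapper
```

The catch Robert describes is social, not technical: nothing in such a mechanism expires the label, so an API can stay "experimental" for years while users depend on it anyway.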
>
> If we're claiming strict semantic versioning, we should update that, or at
> least add some caveats.
>
>
> On Fri, May 29, 2020 at 9:57 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>> Ismaël, this is awesome!! Can we incorporate it into our processes? No
>> matter what our policy, this is great data. (FWIW also we hit 100%
>> compatibility a lot of the time... so there! :-p)
>>
>> But anyhow I completely agree that strict semver is too hard and not
>> valuable enough. I think we should use semver "in spirit". We "basically"
>> and "mostly" and "most reasonably" expect to not break users when they
>> upgrade minor versions. And patch versions are irrelevant but if we
>> released one, they "should" also be reversible without breakage.
>>
>> To add my own spin on Ismaël's point about treating this as a tradeoff,
>> backwards compatibility is always more vague than it seems at first, and
>> usually prioritized wrong IMO.
>>
>>  - Breaking someone's compile is usually easy to detect (hence can be
>> automated) but also usually easy to fix, so the user burden is minimal
>> except when you are in a diamond dependency (and then semver doesn't
>> help, because transitive deps will not have compatible major
>> versions... see Guava or Protobuf)
>>  - Breaking someone via a runtime error is a bigger deal. It happens
>> often, of course, and we fix it pretty urgently.
>>  - Breaking someone via a performance degradation is just as broken, but
>> hard to repro if it isn't widespread. We may never fix them.
>>  - Breaking someone via giving a wrong answer is the worst possible
>> breakage. It is silent, looks fine, and simply breaks their results.
>>  - Breaking someone via upgrading a transitive dep where there is a
>> backwards-incompatible change somewhere deep is often unfixable.
>>  - Breaking someone via turning down a version of the client library.
>> Cloud providers do this because they cannot afford not to. Services have
>> shorter lifespans than shipped libraries (notably the opposite for
>> build-every-time libraries).
>>  - Breaking someone via keeping transitive deps stable, hence transitive
>> services turn down versions of their client library. This is a direct
>> conflict between "shipped library" style compatibility guarantees and
>> "service" style compatibility guarantees.
>>  - Breaking someone via some field they found via reflection, or
>> sdk/util, or context.stateInternals, or Python where private stuff sort
>> of exists but not really, and <insert other language's encapsulation
>> limitations>. Really, only very strict documentation can define a
>> supported API surface, and we are not even close to having this. And
>> even if we had it, IDEs would autocomplete things we don't want them
>> to. This is pretty hard.
>>
>> Examples where I am not that happy with how Beam went:
>>
>>  - https://s.apache.org/finishing-triggers-drop-data: Because it was a
>> construction-time breaking change, this was pushed back for years, even
>> though leaving it in caused data loss. (based on the StackOverflow and
>> email volume for this, plenty of users had finishing triggers)
>>  - https://issues.apache.org/jira/browse/BEAM-6906: mutable accumulators
>> are unsafe to use unless cloned, given our CombineFn API and standard
>> fusion techniques. We introduced a new API and documented that users should
>> probably use it, leaving the data loss risk in place to avoid a compile
>> time breaking change. It is still there.
>>  - Beam 2.0.0: we waited a long time to have a first "stable" release,
>> and rushed to make all the breaking changes, because we were going to
>> freeze everything forever. It is bad to wait for a long time and also bad
>> to rush in the breakages.
>>  - Beam 0.x: we only get to do it once. That is a mistake. Between 2.x.y
>> and 3.0.0 you need a period to mature the breaking APIs. You need a "0.x"
>> period in between each major version. Our @Experimental tag is one idea.
>> Another is setting up an LTS and making breaking changes in between. LTS
>> would ideally be a _major_ version, not a minor version. Linux alternating
>> versions was an interesting take. (caveat: my experience is as a Linux
>> 2.4/2.6 user while 2.7 was in development, and they may have changed
>> everything since then).
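The mutable-accumulator hazard behind BEAM-6906 can be shown with a toy combiner (a sketch only, not Beam's CombineFn API): if a runner hands the same mutable accumulator to two fused consumers, in-place mutation corrupts the second consumer's result unless each works on its own clone.

```python
import copy


class SumFn:
    """Toy combiner with a mutable accumulator (a plain list)."""
    def create_accumulator(self):
        return [0]

    def add_input(self, acc, value):
        acc[0] += value            # mutates in place
        return acc


def run_fused(fn, shared_acc, inputs, clone=False):
    """Feed the same inputs to two 'fused' consumers sharing shared_acc."""
    results = []
    for _ in range(2):
        acc = copy.deepcopy(shared_acc) if clone else shared_acc
        for v in inputs:
            acc = fn.add_input(acc, v)
        results.append(acc[0])
    return results


fn = SumFn()
# Without cloning, the second consumer sees the first one's mutations.
unsafe = run_fused(fn, fn.create_accumulator(), [1, 2, 3])
safe = run_fused(fn, fn.create_accumulator(), [1, 2, 3], clone=True)
```

The unsafe run yields two different sums from identical inputs, a silent wrong answer of exactly the kind ranked worst in the list above.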
>>
>> All of this will feed into a renewed Beam 3 brainstorm at some point. But
>> now is actually the wrong time for all of that. Right now, even more than
>> usual, what people need is stability and reliability in every form
>> mentioned. We / our users don't have a lot of surplus capacity for making
>> surprise urgent fixes. It would be great to focus entirely on testing and
>> additional static analyses, etc. I think temporarily going super strict on
>> semver, using tools to ensure it, would serve our users well.
>>
>> Kenn
>>
>> On Thu, May 28, 2020 at 11:28 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> Updating our documentation makes sense.
>>>
>>> The backwards compat discussion is an interesting read. One of the
>>> points that they mention is that they like Spark users to be on the latest
>>> Spark. I can say that this is also true for Dataflow where we want users to
>>> be on the latest version of Beam. In Beam, I have seen that backwards
>>> compatibility is hard because the APIs that users use to construct
>>> their pipelines, and what their functions use while the pipeline is
>>> executing, reach into the internals of Beam and/or runners. I was
>>> wondering whether Spark was hitting the same issues in this regard?
>>>
>>> With portability and the no knobs philosophy, I can see that we should
>>> be able to relax which version of a runner is used a lot more from
>>> which version of Beam is used, so we might want to go in a different
>>> direction than what was proposed in the Spark thread, since we may be
>>> able to achieve a greater level of decoupling.
>>>
>>>
>>> On Thu, May 28, 2020 at 9:18 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>
>>>> I am surprised that we are claiming on the Beam website to use
>>>> semantic versioning (semver) [1] in Beam [2]. We have NEVER really
>>>> followed semantic versioning, and we have broken both internal and
>>>> external APIs multiple times (at least for Java), as you can see in
>>>> this analysis of source and binary compatibility between Beam
>>>> versions that I did for 'sdks/java/core' two months ago, at the
>>>> following link:
>>>>
>>>>
>>>> https://cloudflare-ipfs.com/ipfs/QmQSkWYmzerpUjT7fhE9CF7M9hm2uvJXNpXi58mS8RKcNi/
>>>>
>>>> This report was produced by running a script that excludes both
>>>> @Experimental and @Internal annotations, as well as many internal
>>>> packages like 'sdk/util/', 'transforms/reflect/' and 'sdk/testing/',
>>>> among others. For more details on the exclusions, refer to the script
>>>> code:
>>>>
>>>> https://gist.github.com/iemejia/5277fc269c63c4e49f1bb065454a895e
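At its core, the check such a script performs is a set diff over the stable public API surface of two releases; roughly like the following (a toy sketch of the idea, nothing like the real Java bytecode-level analysis):

```python
def incompatible_changes(old_api, new_api, unstable=()):
    """old_api/new_api map symbol name -> signature string.

    Returns the stable symbols that were removed or whose signature
    changed, i.e. the changes that would force a major version bump
    under semver. Symbols listed in `unstable` (e.g. anything marked
    @Experimental or @Internal) are excluded from the check.
    """
    removed = {s for s in old_api if s not in new_api}
    changed = {s for s in old_api
               if s in new_api and old_api[s] != new_api[s]}
    return {s for s in removed | changed if s not in set(unstable)}
```

In a polyglot project like Beam, a check like this would have to run per language, and a non-empty result in any one language would force a major bump for the whole release.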
>>>>
>>>> Respecting semantic versioning is REALLY HARD and a strong commitment
>>>> that may bring both positive and negative impacts to the project; as
>>>> usual, it is all about trade-offs. Semver requires tooling that we do
>>>> not yet have in place to find regressions before releases so we can
>>>> fix them (or bump the major version to respect the semver contract).
>>>> We as a polyglot project need these tools for every supported
>>>> language, and since all our languages live in the same repository and
>>>> are released simultaneously, an incompatible change in one language
>>>> may trigger a full new major version number for the whole project,
>>>> which does not look like a desirable outcome.
>>>>
>>>> For these reasons I think we should soften the claim of using
>>>> semantic versioning and produce our own Beam semantic versioning
>>>> policy that is consistent with our reality, one where we also
>>>> highlight the lack of guarantees for code marked as @Internal and
>>>> @Experimental, as well as for some modules where we may still be
>>>> interested in having the freedom of not guaranteeing stability, like
>>>> runners/core* or any class in the different runners that is not a
>>>> PipelineOptions one.
>>>>
>>>> In general, whatever we decide, we should probably not be too strict,
>>>> but should consider in detail the tradeoffs of the policy. There is
>>>> an ongoing discussion on versioning in the Apache Spark community
>>>> that is really worth the read and proposes an analysis of the costs
>>>> to break an API vs the costs to maintain an API [3]. I think we can
>>>> use it as an inspiration for an initial version.
>>>>
>>>> WDYT?
>>>>
>>>> [1] https://semver.org/
>>>> [2] https://beam.apache.org/get-started/downloads/
>>>> [3]
>>>> https://lists.apache.org/thread.html/r82f99ad8c2798629eed66d65f2cddc1ed196dddf82e8e9370f3b7d32%40%3Cdev.spark.apache.org%3E
>>>>
>>>>
>>>> On Thu, May 28, 2020 at 4:36 PM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> Most of those items are either in APIs marked @Experimental (the
>>>>> definition of Experimental in Beam is that we can make breaking changes to
>>>>> the API) or are changes in a specific runner - not the Beam API.
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Thu, May 28, 2020 at 7:19 AM Ashwin Ramaswami <
>>>>> aramaswamis@gmail.com> wrote:
>>>>>
>>>>>> There's a "Breaking Changes" section on this blogpost:
>>>>>> https://beam.apache.org/blog/beam-2.21.0/ (and really, for earlier
>>>>>> minor versions too)
>>>>>>
>>>>>> Ashwin Ramaswami
>>>>>> Student
>>>>>> *Find me on my:* LinkedIn <https://www.linkedin.com/in/ashwin-r> |
>>>>>> Website <https://epicfaace.github.io/> | GitHub
>>>>>> <https://github.com/epicfaace>
>>>>>>
>>>>>>
>>>>>> On Thu, May 28, 2020 at 10:01 AM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> What did we break?
>>>>>>>
>>>>>>> On Thu, May 28, 2020, 6:31 AM Ashwin Ramaswami <
>>>>>>> aramaswamis@gmail.com> wrote:
>>>>>>>
>>>>>>>> Do we really use semantic versioning? It appears we introduced
>>>>>>>> breaking changes from 2.20.0 -> 2.21.0. If not, we should update the
>>>>>>>> documentation under "API Stability" on this page:
>>>>>>>> https://beam.apache.org/get-started/downloads/
>>>>>>>>
>>>>>>>> What would be a better way to word the way in which we decide
>>>>>>>> version numbering?
>>>>>>>>
>>>>>>>