Posted to dev@trafficcontrol.apache.org by "Gray, Jonathan" <Jo...@comcast.com.INVALID> on 2021/08/31 01:12:51 UTC

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

-1 There are a few reasons I disagree here.

First, I think you’ve correctly identified that using an un-version in the field is problematic.  However, I’d argue that the problem is bigger than having to move all the pieces of the release together in lockstep.  If at any point you really must move any component ahead of an official release and use that unversion, all bets are off when it comes to upgrade safety.  It also raises concerns about upgrade order, timing, and which features are available to which component mid-upgrade.  Today we test all our components as a monoversion, but we rely on the API version promise to carry things forward safely in, theoretically, any order.  API versioning isn’t just about the technical aspects of software compatibility; it is also a language for communicating expectations and change.  If I’m developing against the unversion, I can’t express whether my thing works with yours, whether it will work in concert with the other pieces of the puzzle, or whether it will still work after an upgrade. “This means certain upgrades … would have to be closely coordinated with Traffic Ops” is simplest when you can ask “does the target software version support my current API version, or what common api/software version(s) do I need to get to first” like normal.  Having to define special release upgrade procedures or compare changesets with each other, especially across teams/implementations, is error prone.
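
To make that "like normal" question concrete, here is a minimal Go sketch of the check a deployment tool could run before an upgrade. The `APIVersion` type and the `NewestCommon` function are hypothetical names invented for this illustration; nothing here is existing ATC code.

```go
package main

import (
	"fmt"
	"sort"
)

// APIVersion is a major.minor pair such as 3.1 or 4.0.
type APIVersion struct {
	Major, Minor int
}

func (v APIVersion) String() string { return fmt.Sprintf("%d.%d", v.Major, v.Minor) }

// less reports whether a is older than b.
func less(a, b APIVersion) bool {
	if a.Major != b.Major {
		return a.Major < b.Major
	}
	return a.Minor < b.Minor
}

// NewestCommon returns the newest version present in both lists.
// The boolean is false when the two sides share no version at all.
func NewestCommon(client, server []APIVersion) (APIVersion, bool) {
	served := make(map[APIVersion]bool, len(server))
	for _, v := range server {
		served[v] = true
	}
	var common []APIVersion
	for _, v := range client {
		if served[v] {
			common = append(common, v)
		}
	}
	if len(common) == 0 {
		return APIVersion{}, false
	}
	sort.Slice(common, func(i, j int) bool { return less(common[i], common[j]) })
	return common[len(common)-1], true
}

func main() {
	// Versions the currently deployed client can speak (assumed values).
	client := []APIVersion{{2, 0}, {3, 0}}
	// Versions the target TO release serves (assumed values).
	target := []APIVersion{{3, 0}, {4, 0}}

	if v, ok := NewestCommon(client, target); ok {
		fmt.Println("shared version, upgrade order does not matter:", v)
	} else {
		fmt.Println("no shared version; an intermediate release is needed first")
	}
}
```

An "unstable" label cannot take part in a comparison like this, because the same label means a different payload in each release.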

Second, as a developer of support software around ATC I already assume that the latest version available is unstable, without needing an unversion concept.  I make that assumption because whatever the latest is can still have modifications made to it outside the rules of API versioning, or can simply be more bug-prone than something that’s not being as actively touched.  The driving force behind API changes is that our data model changes when we refactor or improve.  I’m unlikely to willfully upgrade API versions for API version’s sake unless I either need something newer, or I’m forced to for backward compatibility (carrot vs stick).  It’s not a perfect rule of thumb because we do sometimes break the API retroactively like with the move to strong typing or when removing perl before the 1.x api.  The removal of the 1.x API is showing how expensive it truly is to safely remove API versions, and that’s something to be weighed in addition to maintenance cost to the project for those versions.  I think the million-dollar question revolves more around how much/far back we are willing to support.  If it’s only one release at a time, that’s going to drive those 3rd party code maintenance costs up significantly higher as part of just doing business which will slow down deployments even if releases are moving faster.

Third, an unversion just sounds like an opportunity for two developers working on different things but the same struct or endpoint to conflict and not realize it.  Additionally, it puts the burden on somebody at release time to bless a snapshot of whatever is in there as stable or know how to bisect all of what’s happened since the last project release.

Lastly, if the debate is around cutting releases somehow being dependent on API code being merged, I’d argue it’s really about feature completeness/usefulness instead.  Merging half-baked features before they’re ready, just to avoid API versioning issues, is a problem better solved with feature branches.

If maintaining multiple versions of our api is expensive, there are other<https://github.com/rob05c/apiver> solutions<https://elixir-lang.org/> that might be worth investigating to make it cheaper.  Also, if the accumulation is a problem, I’m fine with a standing deprecation of all versions in a release that aren’t api -1 or a previously supported release.  That still means they stick around a little bit, but they’d survive one major upgrade at least in case you are running ahead.

Jonathan G


P.S.  Related historical mailing list conversation references:

  *   Traffic Ops API Semantic Versioning<https://lists.apache.org/thread.html/1a42a2192a81fc4d76639ccd10761b6b73c31345a63715bb8aa86e4e%40%3Cdev.trafficcontrol.apache.org%3E>
  *   Traffic Ops API versioning issues<https://lists.apache.org/thread.html/504b33b9c7b037a3b17a44613b326e0bdefd191cb7e62c0aca9e9515%40%3Cdev.trafficcontrol.apache.org%3E>
  *   Traffic Ops Route Deprecation Strategy<https://lists.apache.org/thread.html/b857afc7b52e72b2e60ebb3eb594b6fa5dd0ed3c9af5a17b58ee4a99%40%3Cdev.trafficcontrol.apache.org%3E>
  *   Deprecate APIv2 and v3<https://lists.apache.org/thread.html/re98819293cc349e9387a335c7a63498fb24eb783c82a2e2bef81b87f%40%3Cdev.trafficcontrol.apache.org%3E>

From: Rawlin Peters <ra...@apache.org>
Date: Friday, August 27, 2021 at 3:19 PM
To: dev@trafficcontrol.apache.org <de...@trafficcontrol.apache.org>
Subject: [EXTERNAL] Proposal: stable vs unstable TO API versions
Hey folks,

I'd like to propose that we start moving towards a TO API development
model where we consider the latest major version of the API
"unstable," while the 2nd latest major version is considered "stable."
What that means is that we would be free to make breaking changes to
the "unstable" version, while the "stable" version would maintain
backwards-compatibility. Eventually, once we feel that the latest
version of the TO API has stabilized, we will declare it "stable" and
deprecate the old stable version.

I see multiple benefits to this:
1. reduce the number of major API versions developers need to support,
making it easier to add new features
2. developers can make incremental changes (breaking and non-breaking)
to the unstable API version in every release without having to
introduce new major or minor versions, making the resulting API much
better overall once it is stabilized
3. reduce the number of unnecessary client upgrades, where the API
version changed but none of the routes the client uses were actually
changed
4. clients that don't need the latest API features don't have to upgrade
5. helps us release more frequently, because we aren't slowed down by
adding unnecessary code for a new TO API major/minor version with
every release
6. gives us more flexibility in what features need to be completed
before we cut a release (because they'd be targeting the unstable API
anyways, we can cut a release without causing a bunch of re-work for
new features that missed the API version bus)

Alas, all good things come at a price. For clients that need to use
the unstable version of the TO API (like Traffic Portal), their
upgrades may need to be closely coordinated with the Traffic Ops
upgrade. For TP, this is nothing new, because it is generally always
upgraded at the same time as TO. However, for other components that
may want to use the unstable API (e.g. `t3c`), this means certain
upgrades (not all, mind you, only those where a route the component
uses is actually broken) would have to be closely coordinated with
Traffic Ops. That said, for `t3c` at least, moving forward with Cache
Config Snapshots (https://github.com/apache/trafficcontrol/pull/4708)
would greatly alleviate that concern, since the snapshot route would
be kept backwards-compatible.

Please let me know what you think of this proposal. If we can come to
a consensus on this, we may be able to declare TO API 3.0 "stable" and
4.0 "unstable," meaning we can avoid a potential 5.0 API version in
whatever release comes after ATC 6.0.

- Rawlin

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

Posted by ocket 8888 <oc...@gmail.com>.

> Not to keep the can of worms open, but could a happy medium or
alternative here be to add a new section to our API responses at the same
level as "response" that we could add new/breaking stuff to in the current
API version?

idk, I don't really think that helps too much. It seems like it shifts the
problem from the `response` of the response being unstable to just the
response being unstable. Sure you can use the stable object, but what does
that gain you over just using the stable API? I guess granularity on a
per-endpoint basis. What do you do for non-breaking changes, just also add
them to the outer response? What about for collections? Like if we add a
`tags` array to servers, then if you request `/servers` do you essentially
get a response with two copies of the data you wanted, but one of the
copies has a single extra field on each element? Or would it have to
include only an identifier and we force clients that want to use it to do
an O(2n) hash mapping on the results to be able to associate the right
extra data with the right original structure? Either way the upgrade path
wouldn't be too much more complex than just changing a number in one place
and then usages for affected things - you only skip that single number. In
the worst case, where you have to do the mapping, upgrading would actually be
harder than just changing the API number.
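
For reference, the "identifier only" variant would push roughly the following onto every client. This is only a sketch in Go: `Server`, `ServerExtra`, `TaggedServer`, and `JoinByID` are made-up names, and the shape of the side section is an assumption, since nobody has specified one.

```go
package main

import "fmt"

// Server stands in for an element of the stable `response` array.
// The fields are illustrative, not the real Traffic Ops server struct.
type Server struct {
	ID       int
	HostName string
}

// ServerExtra stands in for an element of a hypothetical side section that
// carries only an identifier plus the new data (here, the `tags` array).
type ServerExtra struct {
	ID   int
	Tags []string
}

// TaggedServer is what the client actually wants to work with.
type TaggedServer struct {
	Server
	Tags []string
}

// JoinByID makes one pass to index the extras and one pass over the servers.
// Servers with no matching extra keep a nil Tags slice.
func JoinByID(servers []Server, extras []ServerExtra) []TaggedServer {
	tagsByID := make(map[int][]string, len(extras))
	for _, e := range extras {
		tagsByID[e.ID] = e.Tags
	}
	joined := make([]TaggedServer, 0, len(servers))
	for _, s := range servers {
		joined = append(joined, TaggedServer{Server: s, Tags: tagsByID[s.ID]})
	}
	return joined
}

func main() {
	servers := []Server{{ID: 1, HostName: "edge-01"}, {ID: 2, HostName: "mid-01"}}
	extras := []ServerExtra{{ID: 2, Tags: []string{"canary"}}}
	for _, s := range JoinByID(servers, extras) {
		fmt.Println(s.ID, s.HostName, s.Tags)
	}
}
```

Every client that wants the new field needs some version of this join, per endpoint, which is the extra work compared with bumping a version number.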

On Wed, Sep 8, 2021 at 4:07 PM Gray, Jonathan
<Jo...@comcast.com.invalid> wrote:

> > we'll still have saved ourselves from the pain of having yet another
> major TO API version
>
> That seems like the elephant in the room that’s eventually going to need
> to change.
>
> > Not to keep the can of worms open, but could a happy medium or
> alternative here be to add a new section to our API responses at the same
> level as "response" that we could add new/breaking stuff to in the current
> API version?
>
> The main problem we’ve suffered with is the golang json marshaller being
> incredibly strict about what it produces and consumes with strong typing.
> If anything doesn’t match up precisely to the structs on the other side of
> the marshaller, or use incredibly slow reflection, it just throws errors.
> I’d normally be moderately ok with a slushbucket extensions subkey to do
> whatever in, but golang can’t out of the box handle it in a performant way.
>
> Jonathan G
>
> From: Dave Neuman <ne...@apache.org>
> Date: Wednesday, September 8, 2021 at 3:58 PM
> To: dev@trafficcontrol.apache.org <de...@trafficcontrol.apache.org>
> Subject: Re: [EXTERNAL] Proposal: stable vs unstable TO API versions
> Yeah, Jonathan hit the nail on the head, there are some specific things
> coming that we know are breaking changes, and it would be nice to add those
> to 4.0 instead of having to roll yet another API version before 4.0 is even
> released. I am +1 with doing this just for 4.0 and then retrospecting
> after.
>
> Not to keep the can of worms open, but could a happy medium or alternative
> here be to add a new section to our API responses at the same level as
> "response" that we could add new/breaking stuff to in the current API
> version?  That way, clients that don't need or know about the new breaking
> thing could just use the `response` section which is stable and new clients
> can use the `unreleased` (or whatever) section?  It's just json so
> everytime you need to change it you can just add a new field under
> unreleased and then once a feature/whatever is stable and we are ready to
> make a new API version, we just move stuff from `unreleased` up to
> `response`?   Just a thought, trying to come up with something that allows
> us to iterate quickly without potentially causing customer issues and/or
> getting ourselves in a situation where upgrades are broken.
>
> --Dave
>
>
> On Wed, Sep 8, 2021 at 3:40 PM Rawlin Peters <ra...@apache.org> wrote:
>
> > > It seems you have a very specific set of goals with this proposal
> rather
> > than a general practice here, and possibly a very limited duration for
> this
> > proposal
> >
> > I would still love to see this implemented as a general practice (new
> > API versions being considered unstable until officially stabilized),
> > but I'm willing to concede that we just try it out for a release cycle
> > (specifically for the 3 breaking changes we have planned) before
> > having a retrospective to determine if it was a successful practice or
> > not and whether or not we should continue with it. Even if we choose
> > not to continue the practice, we'll still have saved ourselves from
> > the pain of having yet another major TO API version.
> >
> > - Rawlin
> >
>

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

Posted by "Gray, Jonathan" <Jo...@comcast.com.INVALID>.

> we'll still have saved ourselves from the pain of having yet another major TO API version

That seems like the elephant in the room that’s eventually going to need to change.

> Not to keep the can of worms open, but could a happy medium or alternative here be to add a new section to our API responses at the same level as "response" that we could add new/breaking stuff to in the current API version?

The main problem we’ve suffered with is the golang json marshaller being incredibly strict about what it produces and consumes with strong typing.  If anything doesn’t match up precisely to the structs on the other side of the marshaller, or use incredibly slow reflection, it just throws errors.  I’d normally be moderately ok with a slushbucket extensions subkey to do whatever in, but golang can’t out of the box handle it in a performant way.

Jonathan G

From: Dave Neuman <ne...@apache.org>
Date: Wednesday, September 8, 2021 at 3:58 PM
To: dev@trafficcontrol.apache.org <de...@trafficcontrol.apache.org>
Subject: Re: [EXTERNAL] Proposal: stable vs unstable TO API versions
Yeah, Jonathan hit the nail on the head, there are some specific things
coming that we know are breaking changes, and it would be nice to add those
to 4.0 instead of having to roll yet another API version before 4.0 is even
released. I am +1 with doing this just for 4.0 and then retrospecting
after.

Not to keep the can of worms open, but could a happy medium or alternative
here be to add a new section to our API responses at the same level as
"response" that we could add new/breaking stuff to in the current API
version?  That way, clients that don't need or know about the new breaking
thing could just use the `response` section which is stable and new clients
can use the `unreleased` (or whatever) section?  It's just json so
everytime you need to change it you can just add a new field under
unreleased and then once a feature/whatever is stable and we are ready to
make a new API version, we just move stuff from `unreleased` up to
`response`?   Just a thought, trying to come up with something that allows
us to iterate quickly without potentially causing customer issues and/or
getting ourselves in a situation where upgrades are broken.

--Dave

On Wed, Sep 8, 2021 at 3:40 PM Rawlin Peters <ra...@apache.org> wrote:

> > It seems you have a very specific set of goals with this proposal rather
> than a general practice here, and possibly a very limited duration for this
> proposal
>
> I would still love to see this implemented as a general practice (new
> API versions being considered unstable until officially stabilized),
> but I'm willing to concede that we just try it out for a release cycle
> (specifically for the 3 breaking changes we have planned) before
> having a retrospective to determine if it was a successful practice or
> not and whether or not we should continue with it. Even if we choose
> not to continue the practice, we'll still have saved ourselves from
> the pain of having yet another major TO API version.
>
> - Rawlin
>

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

Posted by Dave Neuman <ne...@apache.org>.

Yeah, Jonathan hit the nail on the head, there are some specific things
coming that we know are breaking changes, and it would be nice to add those
to 4.0 instead of having to roll yet another API version before 4.0 is even
released. I am +1 with doing this just for 4.0 and then retrospecting
after.

Not to keep the can of worms open, but could a happy medium or alternative
here be to add a new section to our API responses at the same level as
"response" that we could add new/breaking stuff to in the current API
version?  That way, clients that don't need or know about the new breaking
thing could just use the `response` section which is stable and new clients
can use the `unreleased` (or whatever) section?  It's just json so
everytime you need to change it you can just add a new field under
unreleased and then once a feature/whatever is stable and we are ready to
make a new API version, we just move stuff from `unreleased` up to
`response`?   Just a thought, trying to come up with something that allows
us to iterate quickly without potentially causing customer issues and/or
getting ourselves in a situation where upgrades are broken.
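
As a rough illustration of what that could look like from a Go client, the sketch below reads the stable `response` section and leaves `unreleased` undecoded by holding it as a `json.RawMessage` from the standard `encoding/json` package. The `Envelope` and `Server` types, the `DecodeStable` function, and the sample payload are all assumptions made up for this example; it says nothing about how this would perform at the scale of a real TO response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope mirrors the proposed layout: a stable `response` section plus a
// sibling `unreleased` section. Keeping `unreleased` as json.RawMessage defers
// decoding, so clients that ignore it never need a struct for it.
type Envelope struct {
	Response   json.RawMessage `json:"response"`
	Unreleased json.RawMessage `json:"unreleased,omitempty"`
}

// Server holds only the stable fields; the field names are illustrative.
type Server struct {
	ID       int    `json:"id"`
	HostName string `json:"hostName"`
}

// DecodeStable reads the stable section and reports whether an
// `unreleased` section was present, without interpreting it.
func DecodeStable(body []byte) ([]Server, bool, error) {
	var env Envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, false, err
	}
	var servers []Server
	if err := json.Unmarshal(env.Response, &servers); err != nil {
		return nil, false, err
	}
	return servers, len(env.Unreleased) > 0, nil
}

func main() {
	// Hypothetical payload; the contents of `unreleased` are invented.
	body := []byte(`{
		"response":   [{"id": 1, "hostName": "edge-01"}],
		"unreleased": {"servers": [{"id": 1, "tags": ["canary"]}]}
	}`)
	servers, hasUnreleased, err := DecodeStable(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(servers[0].HostName, hasUnreleased)
}
```

A client that does want the new data would unmarshal `Unreleased` into its own struct, and would have to change that struct each time the section changes.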

--Dave

On Wed, Sep 8, 2021 at 3:40 PM Rawlin Peters <ra...@apache.org> wrote:

> > It seems you have a very specific set of goals with this proposal rather
> than a general practice here, and possibly a very limited duration for this
> proposal
>
> I would still love to see this implemented as a general practice (new
> API versions being considered unstable until officially stabilized),
> but I'm willing to concede that we just try it out for a release cycle
> (specifically for the 3 breaking changes we have planned) before
> having a retrospective to determine if it was a successful practice or
> not and whether or not we should continue with it. Even if we choose
> not to continue the practice, we'll still have saved ourselves from
> the pain of having yet another major TO API version.
>
> - Rawlin
>

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

Posted by Rawlin Peters <ra...@apache.org>.

> It seems you have a very specific set of goals with this proposal rather than a general practice here, and possibly a very limited duration for this proposal

I would still love to see this implemented as a general practice (new
API versions being considered unstable until officially stabilized),
but I'm willing to concede that we just try it out for a release cycle
(specifically for the 3 breaking changes we have planned) before
having a retrospective to determine if it was a successful practice or
not and whether or not we should continue with it. Even if we choose
not to continue the practice, we'll still have saved ourselves from
the pain of having yet another major TO API version.

- Rawlin

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

Posted by "Gray, Jonathan" <Jo...@comcast.com.INVALID>.

It seems you have a very specific set of goals with this proposal rather than a general practice here, and possibly a very limited duration for this proposal.  While yes you’ve thought through t3c in this case and it might be ok if everything works as it should, the general objection is still true.  It’s not possible to know all API consumer behavior.  This is why API versioning practices are so critically important in addition to supported upgrade practices.  My objection stands, but as the implementor and with the additional +1 you have, you’re welcome to ignore and proceed anyway.

Jonathan G

From: Rawlin Peters <ra...@apache.org>
Date: Wednesday, September 8, 2021 at 9:39 AM
To: dev@trafficcontrol.apache.org <de...@trafficcontrol.apache.org>
Subject: Re: [EXTERNAL] Proposal: stable vs unstable TO API versions
I understand your point, Jonathan and Rob, but I think it is a bit
much to assume that every upgrade of components that use the unstable
API will break them entirely and cause a customer impact/outage. For
the 3 things we're planning on breaking (jobs, users, servers), I'm
pretty sure `t3c` is the only component truly affected that could
potentially create bad ATS configs. Let's look at each in turn:

Jobs:
If TO is upgraded first, `t3c` requests will get a 200, but it will
just ignore the new field returned in the JSON, meaning jobs will
still be "refresh" instead of "refetch." No impact here.
If `t3c` is upgraded first, the new field will be empty, meaning `t3c`
should just treat it as "refresh." No impact here either.

Users:
`t3c` does not use this API and will not be impacted.

Servers:
If TO is upgraded first, `t3c` requests will get a 200, but it will
think all server profile IDs = 0 (because the field will be missing
from the response, it will get unmarshalled as the int's "empty"
value). `t3c` will then attempt to get parameters of profile 0 and get
404'd, causing the run to fail, leaving old configs in place. No real
impact here other than not being able to change ATS configs until
`t3c` is upgraded.
If `t3c` is upgraded first, `t3c` requests will get a 200, but since
the resulting profile arrays would be empty, it will cause the `t3c`
run to fail, leaving old configs in place. Again, no real impact other
than not being able to change ATS configs until TO is also upgraded.

As you can see, this single coordinated upgrade would not cause any
real impact. Would you at least be willing to try it out for the next
release cycle in order to get these 3 breaking changes into API 4.0?
That alone would be a major savings, and we'd only have to coordinate
a single upgrade.

- Rawlin

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

Posted by Rawlin Peters <ra...@apache.org>.

I understand your point, Jonathan and Rob, but I think it is a bit
much to assume that every upgrade of components that use the unstable
API will break them entirely and cause a customer impact/outage. For
the 3 things we're planning on breaking (jobs, users, servers), I'm
pretty sure `t3c` is the only component truly affected that could
potentially create bad ATS configs. Let's look at each in turn:

Jobs:
If TO is upgraded first, `t3c` requests will get a 200, but it will
just ignore the new field returned in the JSON, meaning jobs will
still be "refresh" instead of "refetch." No impact here.
If `t3c` is upgraded first, the new field will be empty, meaning `t3c`
should just treat it as "refresh." No impact here either.

Users:
`t3c` does not use this API and will not be impacted.

Servers:
If TO is upgraded first, `t3c` requests will get a 200, but it will
think all server profile IDs = 0 (because the field will be missing
from the response, it will get unmarshalled as the int's "empty"
value). `t3c` will then attempt to get parameters of profile 0 and get
404'd, causing the run to fail, leaving old configs in place. No real
impact here other than not being able to change ATS configs until
`t3c` is upgraded.
If `t3c` is upgraded first, `t3c` requests will get a 200, but since
the resulting profile arrays would be empty, it will cause the `t3c`
run to fail, leaving old configs in place. Again, no real impact other
than not being able to change ATS configs until TO is also upgraded.
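
The "profile ID = 0" behavior is easy to reproduce with Go's `encoding/json`. The sketch below is not `t3c` code: the struct names, the `profileId` and `profileNames` keys, and the sample body are assumptions chosen for illustration. It also shows one way a client could fail earlier, by decoding into a pointer so that a missing key stays distinguishable from a real zero.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldServer is a stand-in for the struct an un-upgraded client decodes into.
// The `profileId` key is an assumption made for this example.
type oldServer struct {
	HostName  string `json:"hostName"`
	ProfileID int    `json:"profileId"`
}

// checkedServer uses a pointer so a missing key stays nil instead of
// silently becoming 0.
type checkedServer struct {
	HostName  string `json:"hostName"`
	ProfileID *int   `json:"profileId"`
}

// DecodeOld shows the behavior described above: a missing key yields 0.
func DecodeOld(body []byte) (oldServer, error) {
	var s oldServer
	err := json.Unmarshal(body, &s)
	return s, err
}

// DecodeChecked fails fast when the key the client depends on is absent.
func DecodeChecked(body []byte) (checkedServer, error) {
	var s checkedServer
	if err := json.Unmarshal(body, &s); err != nil {
		return s, err
	}
	if s.ProfileID == nil {
		return s, fmt.Errorf("server %q: response has no profileId", s.HostName)
	}
	return s, nil
}

func main() {
	// Hypothetical response from an upgraded TO that no longer sends the ID.
	body := []byte(`{"hostName": "edge-01", "profileNames": ["EDGE_TIER"]}`)

	old, _ := DecodeOld(body)
	fmt.Println("old client sees profile ID:", old.ProfileID)

	if _, err := DecodeChecked(body); err != nil {
		fmt.Println("checked client:", err)
	}
}
```

Either way the run fails and old configs stay in place; the pointer variant just fails at decode time instead of on the later 404.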

As you can see, this single coordinated upgrade would not cause any
real impact. Would you at least be willing to try it out for the next
release cycle in order to get these 3 breaking changes into API 4.0?
That alone would be a major savings, and we'd only have to coordinate
a single upgrade.

- Rawlin

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

Posted by "Gray, Jonathan" <Jo...@comcast.com.INVALID>.

I agree with Rob here and I think we’re still missing a key point about the upgrade problems this proposal creates.  Consider the following:


  1.  Beginning state
     *   TO version Q serving API version 12.0 and “unstable”
     *   Component version Q consuming version 12.0
  2.  A bug is found and a fix backported, or a new feature is required in production, which entails an API change
     *   TO version Q.1 serving API version 12.0 and “unstable”
     *   Component version Q.1 now consumes “unstable”
  3.  Time to upgrade to TO version W
     *   Upgrade TO first
         i.    TO version W serving API version 12.0, 13.0, and “unstable”
         ii.   Component on Q.1 previously consuming “unstable” now has a very different payload because “unstable” represents not only the bugfix that was previously needed but all unrelated API changes
         iii.  Therefore mid-upgrade Component Q.1 will be broken until it’s also updated to W and leveraging API 13.0 presumably or W.-1 and some new payload expectation of “unstable”
     *   Upgrade Component first
         i.    TO Version Q.1 serving API version 12.0 and “unstable” as in 2.a
         ii.   Component version W either expects TO to provide version 13.0 or “unstable” using a definition based on TO Version W which is different than what it had been serving in Q.1
         iii.  Therefore mid-upgrade Component W will be broken until TO is also upgraded to TO Version W.
     *   Downgrade Component first
         i.    This doesn’t work because Component version Q has a known issue or required feature impacting enough to justify the work to create and upgrade to Q.1 which requires TO serving API “unstable” as the fix wasn’t in API 12.0.

In either case moving forward you can’t safely upgrade without simultaneously stopping both TO and Component.  Depending on what Component is, that may not be possible without creating a customer impact, and it doesn’t matter if it’s an official ATC supported component or a 3rd party tool.  Introducing the idea of an un-version version here opens a door to a new category of problems.
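
The scenario above can be written down as a small Go model. This is a sketch only: `Release`, `Component`, `Compatible`, and the payload revision numbers are all hypothetical, chosen so that a numbered version keeps one revision for as long as it is served while “unstable” carries whatever the release shipped.

```go
package main

import "fmt"

// Release describes what one TO release serves: for each API label, an opaque
// payload revision. A numbered version keeps its revision for as long as it is
// served; "unstable" carries whatever the release happened to ship.
type Release struct {
	Name   string
	Serves map[string]int
}

// Component describes what one build of a client expects from TO.
type Component struct {
	Name     string
	API      string
	Revision int
}

// Compatible reports whether the component can talk to the release: the label
// must be served and the payload revision must match what the client expects.
func Compatible(c Component, r Release) bool {
	rev, ok := r.Serves[c.API]
	return ok && rev == c.Revision
}

func main() {
	// Revision numbers are invented; only equality between them matters.
	toQ1 := Release{"TO Q.1", map[string]int{"12.0": 1, "unstable": 2}}
	toW := Release{"TO W", map[string]int{"12.0": 1, "13.0": 5, "unstable": 5}}

	pinned := Component{"Component Q", "12.0", 1}
	compQ1 := Component{"Component Q.1", "unstable", 2}
	compW := Component{"Component W", "13.0", 5}

	fmt.Println(Compatible(compQ1, toQ1)) // steady state before the upgrade
	fmt.Println(Compatible(compQ1, toW))  // TO upgraded first
	fmt.Println(Compatible(compW, toQ1))  // Component upgraded first
	fmt.Println(Compatible(pinned, toW))  // a client that stayed on 12.0
}
```

Under these assumptions the program prints true, false, false, true: only the client pinned to a numbered version survives the upgrade in either order.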

Jonathan G


From: Robert O Butts <ro...@apache.org>
Date: Tuesday, September 7, 2021 at 9:58 AM
To: dev@trafficcontrol.apache.org <de...@trafficcontrol.apache.org>
Subject: Re: [EXTERNAL] Proposal: stable vs unstable TO API versions
I'm concerned that using this "unstable" version makes it impossible to
upgrade in-place.

Because if a client (cache config, Traffic Monitor, random ops scripts,
etc) uses it, and a breaking change is made, if you upgrade Traffic Ops
first you'll break all clients, and if you upgrade clients first, they'll
try to talk to TO and get 200's but the data will be malformed.

You could theoretically downgrade all clients to the previous version,
starting from the most-downstream, and then upgrade. But if a production
CDN is using a new feature, the CDN will almost certainly have things
relying on it that will break, either CDN operations or clients using new
features.

Worse, it seems like this isn't obvious. Which makes it a pretty big
footgun, if ATC operators use the "beta" API in their production CDN
without realizing they just made it impossible to upgrade.

On the other hand, I'm not seeing the big development savings. What
features have we added in the past that we added to the API, and then
changed our minds one version later and decided we did it wrong and wanted
to make a breaking change? Since using it makes it impossible to upgrade,
this means all production CDNs will have to wait 2 major versions for new
features. Underlying data changes that require two major versions to add
(like Layered Profiles) are pretty rare; this means for every small,
compatible change, users will have to wait two major versions to use a new
feature in production. That seems like a pretty high cost.


On Tue, Aug 31, 2021 at 10:27 AM Rawlin Peters <ra...@apache.org> wrote:

> For your 1st reason, that is all hinged on whether or not the software
> needs to use the unstable version of the API. That is why you also
> have the choice to stay on the stable version and not have to worry
> about coordinating upgrades. Mind you, upgrades would only need to be
> coordinated in the cases where a component actually uses one of the
> broken APIs in the unstable version. We can easily keep track of
> breaking changes in the changelog in order to call out certain
> upgrades that would need to be coordinated (for any components that
> use the unstable API). Just because that process might be more
> error-prone than keeping the latest API version stable doesn't mean we
> shouldn't do it. It's a small risk that has a huge reward in time
> saved by not having to deal with so many API upgrades.
>
> I think your 2nd reason is actually supporting this proposal:
>
> > The removal of the 1.x API is showing how expensive it truly is to
> safely remove API versions, and that’s something to be weighed in addition
> to maintenance cost to the project for those versions.
>
> The 1.x API removal was a prime example of just how much code was able
> to stay on the stable API version until we decided to remove it. With
> this proposal, all of that code would still be able to remain
> unchanged for a longer period of time than without this proposal,
> saving much unnecessary toil. It also reduces maintenance cost of
> prior versions because in creating fewer new major versions, we will
> have fewer of them to support over time.
>
> > I think the million-dollar question revolves more around how much/far
> back we are willing to support. If it’s only one release at a time, that’s
> going to drive those 3rd party code maintenance costs up significantly
> higher as part of just doing business which will slow down deployments even
> if releases are moving faster.
>
> I don't think so, because we'd be creating fewer major versions to
> remove in the first place, so we wouldn't have to worry about
> upgrading 3rd party code that stays on the stable API version. From
> the lessons learned with the API 1.x removal, the vast majority of 3rd
> party code stays on the stable API version until that version is
> getting removed. So we would be releasing faster *and* deploying
> faster.
>
> For your 3rd reason, developers working on the same route generally
> always have to coordinate changes in some way, and we are usually very
> good about that. That is how it's always been done and will continue
> to be done, unaffected by this proposal. It's not really the release
> manager's responsibility to figure out what has been broken and what
> upgrades need to be coordinated. That is a collective responsibility
> of all ATC developers when making breaking changes. Breaking changes
> should be called out in the changelog, along with any prescribed
> upgrade orders. If this proposal is accepted, I think we should give
> these types of changes their own specific section in the changelog.
>
> For your 4th reason, I don't think we've ever decided to merge
> something that was half-baked just to avoid API versioning issues. A
> PR is already a feature branch and can remain open until ready to
> merge. The problem this proposal solves is when a developer starts
> developing a feature towards e.g. API 4.0, but we just cut a release
> and are now on API 5.0, so that developer then needs to *rework* their
> PR to now target API 5.0. Unnecessary rework decreases productivity
> and makes the feature take longer to get to production and produce
> value for us. This proposal basically extends the runway, so that we
> don't have to make the decision to delay the release if the feature is
> nearly complete in order to avoid that unnecessary rework. We can
> simply cut the release on time and have the new feature land in the
> subsequent release (with no unnecessary rework for the developer).
> Additionally, it is always somewhat disappointing when we have to
> *wait* to start developing a new feature because a release is about to
> be cut in order to avoid unnecessary rework caused by API versioning.
> This proposal would allow that work to start at any point in time
> without adding any unnecessary rework.
>
> For your last point, I know you keep linking to Rob's
> https://github.com/rob05c/apiver library whenever conversations
> related to API versioning come up, but this proposal is mainly
> concerned with major version changes, for which that library was not
> made. Also, I'm not really sure how Elixir would help solve this
> problem.
>
> - Rawlin
>

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

Posted by Robert O Butts <ro...@apache.org>.

I'm concerned that using this "unstable" version makes it impossible to
upgrade in-place.

Because if a client (cache config, Traffic Monitor, random ops scripts,
etc) uses it, and a breaking change is made, if you upgrade Traffic Ops
first you'll break all clients, and if you upgrade clients first, they'll
try to talk to TO and get 200's but the data will be malformed.
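
To make the second failure mode concrete, here is a minimal Python sketch. The field names (`profile`, `profiles`) and both payloads are hypothetical, loosely modeled on a Layered Profiles style of change; they are not the actual TO API. It shows a client written against one shape of an unstable route reading a response in the other shape: the request would still return 200 and the JSON still parses, but the data the client wants is silently missing unless the client checks for it.

```python
import json

# Hypothetical response bodies for one route of an unstable API version.
# Neither payload is the real Traffic Ops format.
BEFORE_BREAK = '{"response": [{"hostName": "edge-01", "profile": "EDGE"}]}'
AFTER_BREAK = '{"response": [{"hostName": "edge-01", "profiles": ["EDGE", "GLOBAL"]}]}'


def old_client_profiles(body: str) -> list:
    """Client code written before the breaking change: expects a single 'profile'."""
    servers = json.loads(body)["response"]
    # .get() hides the mismatch: the JSON parsed fine, but the field is gone.
    return [server.get("profile") for server in servers]


def strict_client_profiles(body: str) -> list:
    """Same client, but failing loudly when the expected field is absent."""
    servers = json.loads(body)["response"]
    for server in servers:
        if "profile" not in server:
            raise ValueError(
                f"unexpected shape for {server.get('hostName')}: no 'profile' field"
            )
    return [server["profile"] for server in servers]


print(old_client_profiles(BEFORE_BREAK))  # old server, old client
print(old_client_profiles(AFTER_BREAK))   # new server, old client
```

The lenient client keeps running with empty data, which is the "200 but malformed" case; the strict variant at least turns the mismatch into an immediate error.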

You could theoretically downgrade all clients to the previous version,
starting from the most-downstream, and then upgrade. But if a production
CDN is using a new feature, the CDN will almost certainly have things
relying on it that will break, either CDN operations or clients using new
features.

Worse, it seems like this isn't obvious. Which makes it a pretty big
footgun, if ATC operators use the "beta" API in their production CDN
without realizing they just made it impossible to upgrade.

On the other hand, I'm not seeing the big development savings. What
features have we added in the past that we added to the API, and then
changed our minds one version later and decided we did it wrong and wanted
to make a breaking change? Since using it makes it impossible to upgrade,
this means all production CDNs will have to wait 2 major versions for new
features. Underlying data changes that require two major versions to add
(like Layered Profiles) are pretty rare; this means for every small,
compatible change, users will have to wait two major versions to use a new
feature in production. That seems like a pretty high cost.


On Tue, Aug 31, 2021 at 10:27 AM Rawlin Peters <ra...@apache.org> wrote:

> For your 1st reason, that is all hinged on whether or not the software
> needs to use the unstable version of the API. That is why you also
> have the choice to stay on the stable version and not have to worry
> about coordinating upgrades. Mind you, upgrades would only need to be
> coordinated in the cases where a component actually uses one of the
> broken APIs in the unstable version. We can easily keep track of
> breaking changes in the changelog in order to call out certain
> upgrades that would need to be coordinated (for any components that
> use the unstable API). Just because that process might be more
> error-prone than keeping the latest API version stable doesn't mean we
> shouldn't do it. It's a small risk that has a huge reward in time
> saved by not having to deal with so many API upgrades.
>
> I think your 2nd reason is actually supporting this proposal:
>
> > The removal of the 1.x API is showing how expensive it truly is to
> safely remove API versions, and that’s something to be weighed in addition
> to maintenance cost to the project for those versions.
>
> The 1.x API removal was a prime example of just how much code was able
> to stay on the stable API version until we decided to remove it. With
> this proposal, all of that code would still be able to remain
> unchanged for a longer period of time than without this proposal,
> saving much unnecessary toil. It also reduces maintenance cost of
> prior versions because, in creating fewer new major versions, we will
> have fewer of them to support over time.
>
> > I think the million-dollar question revolves more around how much/far
> back we are willing to support. If it’s only one release at a time, that’s
> going to drive those 3rd party code maintenance costs up significantly
> higher as part of just doing business which will slow down deployments even
> if releases are moving faster.
>
> I don't think so, because we'd be creating fewer major versions to
> remove in the first place, so we wouldn't have to worry about
> upgrading 3rd party code that stays on the stable API version. From
> the lessons learned with the API 1.x removal, the vast majority of 3rd
> party code stays on the stable API version until that version is
> getting removed. So we would be releasing faster *and* deploying
> faster.
>
> For your 3rd reason, developers working on the same route generally
> always have to coordinate changes in some way, and we are usually very
> good about that. That is how it's always been done and will continue
> to be done, unaffected by this proposal. It's not really the release
> manager's responsibility to figure out what has been broken and what
> upgrades need to be coordinated. That is a collective responsibility
> of all ATC developers when making breaking changes. Breaking changes
> should be called out in the changelog, along with any prescribed
> upgrade orders. If this proposal is accepted, I think we should give
> these types of changes their own specific section in the changelog.
>
> For your 4th reason, I don't think we've ever decided to merge
> something that was half-baked just to avoid API versioning issues. A
> PR is already a feature branch and can remain open until ready to
> merge. The problem this proposal solves is when a developer starts
> developing a feature towards e.g. API 4.0, but we just cut a release
> and are now on API 5.0, so that developer then needs to *rework* their
> PR to now target API 5.0. Unnecessary rework decreases productivity
> and makes the feature take longer to get to production and produce
> value for us. This proposal basically extends the runway, so that we
> don't have to make the decision to delay the release if the feature is
> nearly complete in order to avoid that unnecessary rework. We can
> simply cut the release on time and have the new feature land in the
> subsequent release (with no unnecessary rework for the developer).
> Additionally, it is always somewhat disappointing when we have to
> *wait* to start developing a new feature because a release is about to
> be cut in order to avoid unnecessary rework caused by API versioning.
> This proposal would allow that work to start at any point in time
> without adding any unnecessary rework.
>
> For your last point, I know you keep linking to Rob's
> https://github.com/rob05c/apiver library whenever conversations
> related to API versioning come up, but this proposal is mainly
> concerned with major version changes, for which that library was not
> made. Also, I'm not really sure how Elixir would help solve this
> problem.
>
> - Rawlin
>

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

Posted by Rawlin Peters <ra...@apache.org>.

Replies below:

On Tue, Sep 7, 2021 at 9:37 AM Dave Neuman <ne...@apache.org> wrote:
> I do think that we will probably have to put some thought into when we
> determine an API is "stable" and what that process looks like.  It is a
> little uncomfortable to just leave that as a gut feel type thing, but I
> understand that it is also very hard to put more rules/processes around
> something that is pretty subjective.

I think if our general guideline is to only make breaking changes when
absolutely necessary for a new feature being added (i.e. we can't just
add a new optional field with a default for some reason or adding new
routes that tie into existing routes would make the API too unwieldy),
then we should just look at what we have planned on our roadmaps for
the next 6-12 months or so. If there is anything that sticks out as
needing a breaking API change, then perhaps we hold off on stabilizing
until we get that breaking change into the unstable API. Or, if the
API version has already been unstable for a certain amount of time,
perhaps we would stabilize it even if we have breaking changes on the
roadmap.
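
As an illustration of the non-breaking option mentioned above (a new optional field with a default), here is a minimal Python sketch. The `ExampleRequest` type and its field names are made up for this example and do not correspond to any TO API object; the point is only that payloads from older clients, which omit the new field, keep working unchanged.

```python
from dataclasses import dataclass


@dataclass
class ExampleRequest:
    # Fields that existing clients already send (hypothetical names).
    name: str
    enabled: bool
    # Newly added field: optional, with a default, so older payloads stay valid.
    retry_limit: int = 3


def parse(payload: dict) -> ExampleRequest:
    """Build the request object, filling in the default when the new field is absent."""
    return ExampleRequest(
        name=payload["name"],
        enabled=payload["enabled"],
        retry_limit=payload.get("retryLimit", 3),
    )


old_client_payload = {"name": "demo", "enabled": True}
new_client_payload = {"name": "demo", "enabled": True, "retryLimit": 10}

print(parse(old_client_payload))
print(parse(new_client_payload))
```

A change like this can land in a stable version; it is only when no sensible default exists, or the existing field has to change type or meaning, that a breaking change (and therefore the unstable version) comes into play.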

On Tue, Sep 7, 2021 at 9:58 AM Robert O Butts <ro...@apache.org> wrote:
>
> I'm concerned that using this "unstable" version makes it impossible to
> upgrade in-place.
>
> Because if a client (cache config, Traffic Monitor, random ops scripts,
> etc) uses it, and a breaking change is made, if you upgrade Traffic Ops
> first you'll break all clients, and if you upgrade clients first, they'll
> try to talk to TO and get 200's but the data will be malformed.

I understand your concern about upgrading, but in reality it's still
possible to upgrade components that use the unstable API version. It
will just require more coordination than upgrading components that use
the stable API. Plus, keep in mind, it's not like every single
breaking change to the unstable API automatically breaks every client
of the unstable API. Only clients using the particular route(s) being
broken in the unstable API would require coordination to upgrade.

> Worse, it seems like this isn't obvious. Which makes it a pretty big
> footgun, if ATC operators use the "beta" API in their production CDN
> without realizing they just made it impossible to upgrade.

If we declare a certain API version unstable, ATC operators should
understand the risks of using it, just like there are risks involved
in using the API in general. Using the API to make changes is
generally a last-resort option when making the same changes in the UI
would take much longer. Using the UI is generally the much safer
option since it has a lot more built-in safeties (confirmations, form
validation, etc) than the API, but in the case where ATC operators
absolutely need the new features in the unstable API and can't use the
UI instead, they will have to take that risk.

> On the other hand, I'm not seeing the big development savings.

https://github.com/apache/trafficcontrol/pull/6145 -- 60,000 lines of
code just to add a new major TO API version is a pretty big savings,
and that is not even counting all of the "if version == x"
conditionals that have to clutter the code to handle multiple API
versions. The fewer version-specific conditionals we have to deal with
in the code, the easier it is to develop and the less bug-prone it is.
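
For readers who haven't seen this pattern, the following is a rough Python sketch of the kind of per-version branching I mean. It is not ATC code (TO is written in Go), and the version numbers and field names are invented for the example. Every additional supported major version adds another branch like this to each affected handler.

```python
def render_server(server: dict, major_version: int) -> dict:
    """Shape one stored record for the API major version the client requested."""
    out = {"hostName": server["hostName"]}
    if major_version >= 5:
        # Newest shape: a list of profiles.
        out["profiles"] = list(server["profiles"])
    elif major_version == 4:
        # Older shape: a single profile, so the list has to be downgraded.
        out["profile"] = server["profiles"][0]
    else:
        raise ValueError(f"unsupported API major version: {major_version}")
    return out


stored = {"hostName": "edge-01", "profiles": ["EDGE", "GLOBAL"]}
print(render_server(stored, 5))
print(render_server(stored, 4))
```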

> Since using it makes it impossible to upgrade,
> this means all production CDNs will have to wait 2 major versions for new
> features.

Again, this is a false statement. CDNs will have access to unstable
features via the API immediately upon release, and if certain
components need new changes in the unstable API, their upgrades may
need to be coordinated with the TO upgrade. Since `t3c` uses a large
percentage of the API currently and will most likely need to use the
unstable API, most of its upgrade concerns will be alleviated by the
addition of Cache Config Snapshots. The Cache Config Snapshots API
will generally always be stable in that the JSON snapshot will only
have fields added in a backwards-compatible manner. We should never
make a breaking change to a snapshot, and in general we never really
have (at least for the CRConfig snapshot that I know of). So with
Cache Config Snapshots, `t3c` will always have access to new features
right away and won't have to use the unstable API. Hopefully that
alleviates some of your upgrade concerns with respect to `t3c`. Most
other ATC components use a much smaller percentage of the API and
generally don't always need to use the latest API version.
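
To sketch what "fields only added in a backwards-compatible manner" buys a consumer, here is a small Python example. The snapshot layout and field names are assumptions made for illustration, not the real Cache Config Snapshot or CRConfig format. A reader that picks out only the fields it knows produces the same result from an older snapshot and from a newer one that carries extra fields.

```python
import json

# Fields this (hypothetical) consumer was written against.
KNOWN_FIELDS = ("hostName", "cacheGroup")

# Hypothetical snapshots: the newer one only adds a field, it changes nothing.
OLDER_SNAPSHOT = '{"servers": [{"hostName": "edge-01", "cacheGroup": "east"}]}'
NEWER_SNAPSHOT = (
    '{"servers": [{"hostName": "edge-01", "cacheGroup": "east", "newField": 42}]}'
)


def read_snapshot(body: str) -> list:
    """Extract only the known fields, ignoring anything added later."""
    snapshot = json.loads(body)
    return [
        {field: server[field] for field in KNOWN_FIELDS}
        for server in snapshot["servers"]
    ]


print(read_snapshot(OLDER_SNAPSHOT))
print(read_snapshot(NEWER_SNAPSHOT))
```

As long as snapshot changes stay additive, a consumer built this way does not care which order it and TO are upgraded in.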

- Rawlin

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

Posted by Dave Neuman <ne...@apache.org>.

Another versioning discussion, yay!
In all seriousness, I am +1 for whatever makes development easier as long
as the risk to operations doesn't outweigh the savings.  In other words, if
we feel like this change is going to provide a lot more simplicity for our
development cycle --  including the ability to release quicker -- and we
don't feel the cost is too high operationally then I am all for it.

I understand the concerns about potentially breaking clients that are using
the unstable or latest version of the API, but I think that, for a client
using the latest/unstable version of the API, that is an inherent risk.  I
would also hope that we would invest in the proper amount
of automated testing necessary to catch these types of issues ahead of
time.

Ultimately I think that we have spent a lot of time discussing API
versioning and yet it still comes up as a pain for our development
process.  I am all for any reasonable change that could make that better.
I am also for trying new things and if it doesn't work, we go back to the
way things were.

I do think that we will probably have to put some thought into when we
determine an API is "stable" and what that process looks like.  It is a
little uncomfortable to just leave that as a gut feel type thing, but I
understand that it is also very hard to put more rules/processes around
something that is pretty subjective.

This is just my $.02, I am definitely not actively doing a ton of
development against the API these days, but I do see the pain that the API
brings us and hope we can make that better.

--Dave

On Tue, Aug 31, 2021 at 10:27 AM Rawlin Peters <ra...@apache.org> wrote:

> For your 1st reason, that is all hinged on whether or not the software
> needs to use the unstable version of the API. That is why you also
> have the choice to stay on the stable version and not have to worry
> about coordinating upgrades. Mind you, upgrades would only need to be
> coordinated in the cases where a component actually uses one of the
> broken APIs in the unstable version. We can easily keep track of
> breaking changes in the changelog in order to call out certain
> upgrades that would need to be coordinated (for any components that
> use the unstable API). Just because that process might be more
> error-prone than keeping the latest API version stable doesn't mean we
> shouldn't do it. It's a small risk that has a huge reward in time
> saved by not having to deal with so many API upgrades.
>
> I think your 2nd reason is actually supporting this proposal:
>
> > The removal of the 1.x API is showing how expensive it truly is to
> safely remove API versions, and that’s something to be weighed in addition
> to maintenance cost to the project for those versions.
>
> The 1.x API removal was a prime example of just how much code was able
> to stay on the stable API version until we decided to remove it. With
> this proposal, all of that code would still be able to remain
> unchanged for a longer period of time than without this proposal,
> saving much unnecessary toil. It also reduces maintenance cost of
> prior versions because, in creating fewer new major versions, we will
> have fewer of them to support over time.
>
> > I think the million-dollar question revolves more around how much/far
> back we are willing to support. If it’s only one release at a time, that’s
> going to drive those 3rd party code maintenance costs up significantly
> higher as part of just doing business which will slow down deployments even
> if releases are moving faster.
>
> I don't think so, because we'd be creating fewer major versions to
> remove in the first place, so we wouldn't have to worry about
> upgrading 3rd party code that stays on the stable API version. From
> the lessons learned with the API 1.x removal, the vast majority of 3rd
> party code stays on the stable API version until that version is
> getting removed. So we would be releasing faster *and* deploying
> faster.
>
> For your 3rd reason, developers working on the same route generally
> always have to coordinate changes in some way, and we are usually very
> good about that. That is how it's always been done and will continue
> to be done, unaffected by this proposal. It's not really the release
> manager's responsibility to figure out what has been broken and what
> upgrades need to be coordinated. That is a collective responsibility
> of all ATC developers when making breaking changes. Breaking changes
> should be called out in the changelog, along with any prescribed
> upgrade orders. If this proposal is accepted, I think we should give
> these types of changes their own specific section in the changelog.
>
> For your 4th reason, I don't think we've ever decided to merge
> something that was half-baked just to avoid API versioning issues. A
> PR is already a feature branch and can remain open until ready to
> merge. The problem this proposal solves is when a developer starts
> developing a feature towards e.g. API 4.0, but we just cut a release
> and are now on API 5.0, so that developer then needs to *rework* their
> PR to now target API 5.0. Unnecessary rework decreases productivity
> and makes the feature take longer to get to production and produce
> value for us. This proposal basically extends the runway, so that we
> don't have to make the decision to delay the release if the feature is
> nearly complete in order to avoid that unnecessary rework. We can
> simply cut the release on time and have the new feature land in the
> subsequent release (with no unnecessary rework for the developer).
> Additionally, it is always somewhat disappointing when we have to
> *wait* to start developing a new feature because a release is about to
> be cut in order to avoid unnecessary rework caused by API versioning.
> This proposal would allow that work to start at any point in time
> without adding any unnecessary rework.
>
> For your last point, I know you keep linking to Rob's
> https://github.com/rob05c/apiver library whenever conversations
> related to API versioning come up, but this proposal is mainly
> concerned with major version changes, for which that library was not
> made. Also, I'm not really sure how Elixir would help solve this
> problem.
>
> - Rawlin
>

Re: [EXTERNAL] Proposal: stable vs unstable TO API versions

Posted by Rawlin Peters <ra...@apache.org>.

For your 1st reason, that is all hinged on whether or not the software
needs to use the unstable version of the API. That is why you also
have the choice to stay on the stable version and not have to worry
about coordinating upgrades. Mind you, upgrades would only need to be
coordinated in the cases where a component actually uses one of the
broken APIs in the unstable version. We can easily keep track of
breaking changes in the changelog in order to call out certain
upgrades that would need to be coordinated (for any components that
use the unstable API). Just because that process might be more
error-prone than keeping the latest API version stable doesn't mean we
shouldn't do it. It's a small risk that has a huge reward in time
saved by not having to deal with so many API upgrades.

I think your 2nd reason is actually supporting this proposal:

> The removal of the 1.x API is showing how expensive it truly is to safely remove API versions, and that’s something to be weighed in addition to maintenance cost to the project for those versions.

The 1.x API removal was a prime example of just how much code was able
to stay on the stable API version until we decided to remove it. With
this proposal, all of that code would still be able to remain
unchanged for a longer period of time than without this proposal,
saving much unnecessary toil. It also reduces maintenance cost of
prior versions because, in creating fewer new major versions, we will
have fewer of them to support over time.

> I think the million-dollar question revolves more around how much/far back we are willing to support. If it’s only one release at a time, that’s going to drive those 3rd party code maintenance costs up significantly higher as part of just doing business which will slow down deployments even if releases are moving faster.

I don't think so, because we'd be creating fewer major versions to
remove in the first place, so we wouldn't have to worry about
upgrading 3rd party code that stays on the stable API version. From
the lessons learned with the API 1.x removal, the vast majority of 3rd
party code stays on the stable API version until that version is
getting removed. So we would be releasing faster *and* deploying
faster.

For your 3rd reason, developers working on the same route generally
always have to coordinate changes in some way, and we are usually very
good about that. That is how it's always been done and will continue
to be done, unaffected by this proposal. It's not really the release
manager's responsibility to figure out what has been broken and what
upgrades need to be coordinated. That is a collective responsibility
of all ATC developers when making breaking changes. Breaking changes
should be called out in the changelog, along with any prescribed
upgrade orders. If this proposal is accepted, I think we should give
these types of changes their own specific section in the changelog.

For your 4th reason, I don't think we've ever decided to merge
something that was half-baked just to avoid API versioning issues. A
PR is already a feature branch and can remain open until ready to
merge. The problem this proposal solves is when a developer starts
developing a feature towards e.g. API 4.0, but we just cut a release
and are now on API 5.0, so that developer then needs to *rework* their
PR to now target API 5.0. Unnecessary rework decreases productivity
and makes the feature take longer to get to production and produce
value for us. This proposal basically extends the runway, so that we
don't have to make the decision to delay the release if the feature is
nearly complete in order to avoid that unnecessary rework. We can
simply cut the release on time and have the new feature land in the
subsequent release (with no unnecessary rework for the developer).
Additionally, it is always somewhat disappointing when we have to
*wait* to start developing a new feature because a release is about to
be cut in order to avoid unnecessary rework caused by API versioning.
This proposal would allow that work to start at any point in time
without adding any unnecessary rework.

For your last point, I know you keep linking to Rob's
https://github.com/rob05c/apiver library whenever conversations
related to API versioning come up, but this proposal is mainly
concerned with major version changes, for which that library was not
made. Also, I'm not really sure how Elixir would help solve this
problem.

- Rawlin