Posted to dev@kafka.apache.org by Tom Bentley <tb...@redhat.com> on 2021/12/17 17:09:16 UTC

Re: [DISCUSS] KIP-778 KRaft Upgrades

Hi David,

Thanks for the KIP. Sorry I'm so late in looking at this. I have a few
questions.

1. In the Proposed Changes Overview it says "The controller validates that
the cluster can be safely downgraded to this version (override with
--force)". Should that be "--unsafe"?

2. Also in this section it says "Components reload their state with new
version". The word "components" is a little vague; I think it means
controllers and brokers, right?

3. In the compatibility section it says "A controller running old software
will join the quorum and begin replicating the metadata log. If this
inactive controller encounters a FeatureLevelRecord for *metadata.version*
that it cannot support, it should terminate." I assume "encounters"
includes the snapshot case?

4. "In the unlikely event that an active controller encounters an
unsupported *metadata.version*, it should resign and terminate. ", I'm
unclear on how this could happen, could you elaborate?

5. In the ApiVersions part of the Upgrades section, "Brokers will observe
changes to *metadata.version* as they replicate records from the metadata
log. If a new *metadata.version* is seen, brokers will renegotiate
compatible RPCs with other brokers through the ApiVersions workflow."
Is it possible that broker A does a metadata fetch, sees a changed
metadata.version and attempts renegotiation with broker B before B has seen
the new metadata.version?

6. Why does "kafka-features.sh downgrade" not support "--release"?
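
For reference, the two forms being discussed in the thread below look roughly
like this (flag names are as proposed in the KIP and may change):

    # upgrade can take a whole-release manifest of feature versions
    kafka-features.sh upgrade --release 3.2 --bootstrap-server localhost:9092

    # downgrade apparently has to name each feature and target version explicitly
    kafka-features.sh downgrade --feature metadata.version --version 1 \
        --bootstrap-server localhost:9092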

Kind regards,

Tom

On Tue, Nov 30, 2021 at 10:26 PM Jun Rao <ju...@confluent.io.invalid> wrote:

> Hi, David,
>
> Thanks for the reply. No more questions from me.
>
> Jun
>
> On Tue, Nov 30, 2021 at 1:52 PM David Arthur
> <da...@confluent.io.invalid> wrote:
>
> > Thanks Jun,
> >
> > 30: I clarified the description of the "upgrade" command to:
> >
> > > The FEATURE and VERSION arguments can be repeated to indicate an upgrade of
> > > multiple features in one request. Alternatively, the RELEASE flag can be
> > > given to upgrade to the default versions of the specified release. These
> > > two options, FEATURE and RELEASE, are mutually exclusive. If neither is
> > > given, this command will do nothing.
> >
> >
> > Basically, you must provide either "kafka-features.sh upgrade --release 3.2"
> > or something like "kafka-features.sh upgrade --feature X --version 10"
> >
> > -David
> >
> > On Tue, Nov 30, 2021 at 2:51 PM Jun Rao <ju...@confluent.io.invalid>
> wrote:
> >
> > > Hi, David,
> > >
> > > Thanks for the reply. Just one more minor comment.
> > >
> > > 30. ./kafka-features.sh upgrade: It seems that the release param is
> > > optional. Could you describe the semantic when release is not specified?
> > >
> > > Jun
> > >
> > > On Mon, Nov 29, 2021 at 5:06 PM David Arthur
> > > <da...@confluent.io.invalid> wrote:
> > >
> > > > Jun, I updated the KIP with the "disable" CLI.
> > > >
> > > > For 16, I think you're asking how we can safely introduce the
> > > > initial version of other feature flags in the future. This will probably
> > > > depend on the nature of the feature that the new flag is gating. We can
> > > > probably take a similar approach and say version 1 is backwards compatible
> > > > to some point (possibly an IBP or metadata.version?).
> > > >
> > > > -David
> > > >
> > > > On Fri, Nov 19, 2021 at 3:00 PM Jun Rao <ju...@confluent.io.invalid>
> > > wrote:
> > > >
> > > > > Hi, David,
> > > > >
> > > > > Thanks for the reply. The new CLI sounds reasonable to me.
> > > > >
> > > > > 16.
> > > > > For case C, choosing the latest version sounds good to me.
> > > > > For case B, for metadata.version, we pick version 1 since it just happens
> > > > > that for metadata.version version 1 is backward compatible. How do we make
> > > > > this more general for other features?
> > > > >
> > > > > 21. Do you still plan to change "delete" to "disable" in the CLI?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Nov 18, 2021 at 11:50 AM David Arthur
> > > > > <da...@confluent.io.invalid> wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > The KIP has some changes to the CLI for KIP-584. With Jason's
> > > > suggestion
> > > > > > incorporated, these three commands would look like:
> > > > > >
> > > > > > 1) kafka-features.sh upgrade --release latest
> > > > > > upgrades all known features to their defaults in the latest
> release
> > > > > >
> > > > > > 2) kafka-features.sh downgrade --release 3.x
> > > > > > downgrade all known features to the default versions from 3.x
> > > > > >
> > > > > > 3) kafka-features.sh describe --release latest
> > > > > > Describe the known features of the latest release
> > > > > >
> > > > > > The --upgrade-all and --downgrade-all are replaced by specifying each
> > > > > > feature+version or giving the --release argument. One problem with
> > > > > > --downgrade-all is it's not clear what the target versions should be: the
> > > > > > previous version before the last upgrade -- or the lowest supported
> > > > > > version. Since downgrades will be less common, I think it's better to make
> > > > > > the operator be more explicit about the desired downgrade version (either
> > > > > > through [--feature X --version Y] or [--release 3.1]). Does that seem
> > > > > > reasonable?
> > > > > >
> > > > > > 16. For case C, I think we will want to always use the latest
> > > > > > metadata.version for new clusters (like we do for IBP). We can always
> > > > > > change what we mean by "default" down the road. Also, this will likely
> > > > > > become a bit of information we include in release and upgrade notes with
> > > > > > each release.
> > > > > >
> > > > > > -David
> > > > > >
> > > > > > On Thu, Nov 18, 2021 at 1:14 PM Jun Rao <jun@confluent.io.invalid
> >
> > > > > wrote:
> > > > > >
> > > > > > > Hi, Jason, David,
> > > > > > >
> > > > > > > Just to clarify on the interaction with the end user, the design in
> > > > > > > KIP-584 allows one to do the following.
> > > > > > > (1) kafka-features.sh --upgrade-all --bootstrap-server
> > > > > > > kafka-broker0.prn1:9071 to upgrade all features to the latest version
> > > > > > > known by the tool. The user doesn't need to know a specific feature
> > > > > > > version.
> > > > > > > (2) kafka-features.sh --downgrade-all --bootstrap-server
> > > > > > > kafka-broker0.prn1:9071 to downgrade all features to the latest version
> > > > > > > known by the tool. The user doesn't need to know a specific feature
> > > > > > > version.
> > > > > > > (3) kafka-features.sh --describe --bootstrap-server
> > > > > > > kafka-broker0.prn1:9071 to find out the supported version for each
> > > > > > > feature. This allows the user to upgrade each feature individually.
> > > > > > >
> > > > > > > Most users will be doing (1) and occasionally (2), and won't need to know
> > > > > > > the exact version of each feature.
> > > > > > >
> > > > > > > 16. For case C, what's the default version? Is it always the latest?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Nov 18, 2021 at 8:15 AM David Arthur
> > > > > > > <da...@confluent.io.invalid> wrote:
> > > > > > >
> > > > > > > > Colin, thanks for the detailed response. I understand what
> > you're
> > > > > > saying
> > > > > > > > and I agree with your rationale.
> > > > > > > >
> > > > > > > > > It seems like we could just initialize their cluster.metadata to 1 when the
> > > > > > > > > software is upgraded to 3.2.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Concretely, this means the controller would see that there is no
> > > > > > > > FeatureLevelRecord in the log, and so it would bootstrap one. Normally for
> > > > > > > > cluster initialization, this would be read from meta.properties, but in the
> > > > > > > > case of preview clusters we would need to special case it to initialize the
> > > > > > > > version to 1.
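> > > > > > > >
> > > > > > > > For illustration, the bootstrapped record would conceptually carry something
> > > > > > > > like the following (field names are approximate here; the KIP defines the
> > > > > > > > actual record schema):
> > > > > > > >
> > > > > > > >   FeatureLevelRecord { name: "metadata.version", featureLevel: 1 }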
> > > > > > > >
> > > > > > > > Once the new FeatureLevelRecord has been committed, any nodes (brokers or
> > > > > > > > controllers) that are running the latest software will react to the new
> > > > > > > > metadata.version. This means we will need to make this initial version of 1
> > > > > > > > be backwards compatible to what we have in 3.0 and 3.1 (since some brokers
> > > > > > > > will be on the new software and some on older/preview versions)
> > > > > > > >
> > > > > > > > I agree that auto-upgrading preview users from IBP 3.0 to metadata.version
> > > > > > > > 1 (equivalent to IBP 3.2) is probably fine.
> > > > > > > >
> > > > > > > > Back to Jun's case B:
> > > > > > > >
> > > > > > > > > b. We upgrade from an old version where no metadata.version has been
> > > > > > > > > finalized. In this case, it makes sense to leave metadata.version disabled
> > > > > > > > > since we don't know if all brokers have been upgraded.
> > > > > > > >
> > > > > > > >
> > > > > > > > Instead of leaving it uninitialized, we would initialize it to 1 which
> > > > > > > > would be backwards compatible to 3.0 and 3.1. This would eliminate the need
> > > > > > > > to check a "final IBP" or deal with 3.2+ clusters without a
> > > > > > > > metadata.version set. The downgrade path for 3.2 back to a preview release
> > > > > > > > should be fine since we are saying that metadata.version 1 is compatible
> > > > > > > > with those releases. The FeatureLevelRecord would exist, but it would be
> > > > > > > > ignored (we might need to make sure this actually works in 3.0).
> > > > > > > >
> > > > > > > > For clarity, here are the three workflows of Jun's three cases:
> > > > > > > >
> > > > > > > > A (KRaft 3.2+ to KRaft 3.2+):
> > > > > > > > * User initializes cluster with an explicit metadata.version X, or the tool
> > > > > > > > selects the default
> > > > > > > > * After upgrade, cluster stays at version X until operator upgrades to Y
> > > > > > > >
> > > > > > > > B (KRaft Preview to KRaft 3.2+):
> > > > > > > > * User initializes cluster without metadata.version
> > > > > > > > * After controller leader is upgraded, initializes metadata.version to 1
> > > > > > > > * After the cluster is upgraded, an operator may further upgrade to version Y
> > > > > > > >
> > > > > > > > C (New KRaft 3.2+):
> > > > > > > > * User initializes cluster with an explicit metadata.version X, or the tool
> > > > > > > > selects the default
> > > > > > > >
> > > > > > > >
> > > > > > > > This has been mentioned in this thread, but we should explicitly call out
> > > > > > > > that the absence of metadata.version in meta.properties will be used
> > > > > > > > to identify that we are in case B. After 3.2, we will require that
> > > > > > > > metadata.version is present in meta.properties for new clusters. If users
> > > > > > > > omit the --metadata-version flag from kafka-storage.sh, we should fill in a
> > > > > > > > default.
> > > > > > > >
> > > > > > > > So how about the following changes/clarifications to the KIP:
> > > > > > > >
> > > > > > > > * Starting with 3.2, metadata.version is required in KRaft, IBP is ignored
> > > > > > > > * If meta.properties does not include metadata.version, it indicates this
> > > > > > > > cluster was initialized with a preview release
> > > > > > > > * If upgrading from a preview release to 3.2+, the controller will
> > > > > > > > initialize metadata.version=1
> > > > > > > > * metadata.version=1 is backwards compatible with preview releases
> > > > > > > >
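> > > > > > > > As a rough sketch of what that could look like on disk, a meta.properties
> > > > > > > > written by kafka-storage.sh on 3.2+ might then contain something along these
> > > > > > > > lines (property names and values here are illustrative, not final):
> > > > > > > >
> > > > > > > >   version=1
> > > > > > > >   cluster.id=0Gk7rsTGQVGPnZNDHgHxKw
> > > > > > > >   node.id=1
> > > > > > > >   metadata.version=1
> > > > > > > >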
> > > > > > > > -David
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Nov 18, 2021 at 10:02 AM David Arthur <
> > > > > > david.arthur@confluent.io
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Jason,
> > > > > > > > >
> > > > > > > > > 1/2. You've got it right. The intention is that metadata.version will gate
> > > > > > > > > both RPCs (like IBP does) as well as metadata records. So, when a broker
> > > > > > > > > sees that metadata.version changed, it may start advertising new RPCs and
> > > > > > > > > it will need to refresh the ApiVersions it has for other brokers. I can try
> > > > > > > > > to make this more explicit in the KIP.
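> > > > > > > > >
> > > > > > > > > As a toy sketch of that reaction (hypothetical names, not Kafka's actual
> > > > > > > > > classes), the broker-side handling might look roughly like:
> > > > > > > > >
> > > > > > > > >   import java.util.Map;
> > > > > > > > >   import java.util.concurrent.ConcurrentHashMap;
> > > > > > > > >
> > > > > > > > >   public class PeerApiVersionsTracker {
> > > > > > > > >       private volatile short metadataVersion = 1;
> > > > > > > > >       // nodeId -> negotiated RPC version info (simplified to one value per peer)
> > > > > > > > >       private final Map<Integer, Short> negotiated = new ConcurrentHashMap<>();
> > > > > > > > >
> > > > > > > > >       // Called while replaying FeatureLevelRecords from the metadata log.
> > > > > > > > >       public void onFeatureLevelRecord(String name, short level) {
> > > > > > > > >           if ("metadata.version".equals(name) && level != metadataVersion) {
> > > > > > > > >               metadataVersion = level;
> > > > > > > > >               // Forget prior results so the next request to each peer triggers
> > > > > > > > >               // a fresh ApiVersionsRequest/Response exchange.
> > > > > > > > >               negotiated.clear();
> > > > > > > > >           }
> > > > > > > > >       }
> > > > > > > > >   }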
> > > > > > > > >
> > > > > > > > > 3. Yes, that's the intention of the --dry-run flag. The current
> > > > > > > > > implementation in kafka-features.sh does a dry run on the client side, but
> > > > > > > > > this KIP pushes the validation down to the controller. This will allow us
> > > > > > > > > to have the context needed to do proper validation.
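> > > > > > > > >
> > > > > > > > > So, to check whether a downgrade would be accepted without actually applying
> > > > > > > > > it, the invocation might look something like this (flag spelling as proposed
> > > > > > > > > in the KIP, subject to change):
> > > > > > > > >
> > > > > > > > >   kafka-features.sh downgrade --feature metadata.version --version 1 \
> > > > > > > > >       --bootstrap-server localhost:9092 --dry-run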
> > > > > > > > >
> > > > > > > > > Re: version number complexity -- yes this has come up in offline
> > > > > > > > > discussions. With just one feature flag to manage, it's not so bad, but
> > > > > > > > > explicitly managing even a few would be a burden. I like your suggestion of
> > > > > > > > > the "--release" flag. That could act as a sort of manifest of versions
> > > > > > > > > (i.e., the defaults from that release). We could also support something
> > > > > > > > > like "kafka-features upgrade --release latest" to bring everything to the
> > > > > > > > > highest supported version.
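> > > > > > > > >
> > > > > > > > > A rough sketch of that manifest idea (hypothetical names, not actual Kafka
> > > > > > > > > code), just to make the shape concrete:
> > > > > > > > >
> > > > > > > > >   import java.util.Map;
> > > > > > > > >
> > > > > > > > >   public class ReleaseFeatureDefaults {
> > > > > > > > >       // release -> (feature name -> default finalized level for that release)
> > > > > > > > >       private static final Map<String, Map<String, Short>> DEFAULTS = Map.of(
> > > > > > > > >           "3.2", Map.of("metadata.version", (short) 1),
> > > > > > > > >           "3.3", Map.of("metadata.version", (short) 2));
> > > > > > > > >
> > > > > > > > >       // "kafka-features.sh upgrade --release 3.2" would resolve to these levels.
> > > > > > > > >       public static Map<String, Short> defaultsFor(String release) {
> > > > > > > > >           return DEFAULTS.getOrDefault(release, Map.of());
> > > > > > > > >       }
> > > > > > > > >   }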
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Nov 17, 2021 at 9:36 PM Jason Gustafson
> > > > > > > > <ja...@confluent.io.invalid>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hi Colin/David,
> > > > > > > > >>
> > > > > > > > >> > Like David said, basically it boils down to creating a
> > > feature
> > > > > > flag
> > > > > > > > for
> > > > > > > > >> the new proposed __consumer_offsets version, or using a
> new
> > > > > > > > >> IBP/metadata.version for it. Both approaches have pros and
> > > cons.
> > > > > > Using
> > > > > > > > an
> > > > > > > > >> IBP/metadata.version bump reduces the size of the testing
> > > > matrix.
> > > > > > But
> > > > > > > > >> using
> > > > > > > > >> a feature flag allows people to avoid any bugs or pain
> > > > associated
> > > > > > with
> > > > > > > > the
> > > > > > > > >> change if they don't care about enabling it. This is
> > basically
> > > > the
> > > > > > > > classic
> > > > > > > > >> "should I use a feature flag or not?" discussion and we
> need
> > > to
> > > > > have
> > > > > > > it
> > > > > > > > on
> > > > > > > > >> a case-by-case basis.
> > > > > > > > >>
> > > > > > > > >> I think most users are not going to care to manage
> versions
> > > for
> > > > a
> > > > > > > bunch
> > > > > > > > of
> > > > > > > > >> different features. The IBP today has many shortcomings,
> but
> > > at
> > > > > > least
> > > > > > > > it's
> > > > > > > > >> tied to a version that users understand (i.e. the release
> > > > > version).
> > > > > > > How
> > > > > > > > >> would users know after upgrading to Kafka 3.1, for
> example,
> > > that
> > > > > > they
> > > > > > > > need
> > > > > > > > >> to upgrade the metadata.version to 3  and offsets.version
> > to 4
> > > > (or
> > > > > > > > >> whatever)? It's a lot of overhead trying to understand all
> > of
> > > > the
> > > > > > > > >> potential
> > > > > > > > >> features and what each upgrade actually means to them. I
> am
> > > > > > wondering
> > > > > > > if
> > > > > > > > >> we
> > > > > > > > >> could give them something more convenient which is tied to
> > the
> > > > > > release
> > > > > > > > >> version. For example, maybe we could use a command like
> > > > > > > `kafka-features
> > > > > > > > >> upgrade --release 3.1`, which the broker would then
> > translate
> > > to
> > > > > an
> > > > > > > > >> upgrade
> > > > > > > > >> to the latest versions of the individual features at the
> > time
> > > of
> > > > > the
> > > > > > > 3.1
> > > > > > > > >> release. Basically it's just a static map from release
> > version
> > > > to
> > > > > > > > feature
> > > > > > > > >> versions to make the upgrade process more convenient.
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >> Jason
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Wed, Nov 17, 2021 at 6:20 PM Jason Gustafson <
> > > > > jason@confluent.io
> > > > > > >
> > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >> > A few additional questions:
> > > > > > > > >> >
> > > > > > > > >> > 1. Currently the IBP tells us what version of individual
> > > > > > > inter-broker
> > > > > > > > >> RPCs
> > > > > > > > >> > will be used. I think the plan in this KIP is to use
> > > > ApiVersions
> > > > > > > > request
> > > > > > > > >> > instead to find the highest compatible version (just
> like
> > > > > > clients).
> > > > > > > > Do I
> > > > > > > > >> > have that right?
> > > > > > > > >> >
> > > > > > > > >> > 2. The following wasn't very clear to me:
> > > > > > > > >> >
> > > > > > > > >> > > Brokers will be able to observe changes to
> > > metadata.version
> > > > by
> > > > > > > > >> observing
> > > > > > > > >> > the metadata log, and could then submit a new
> > > > ApiVersionsRequest
> > > > > > to
> > > > > > > > the
> > > > > > > > >> > other Kafka nodes.
> > > > > > > > >> >
> > > > > > > > >> > Is the purpose of submitting new ApiVersions requests to
> > > > update
> > > > > > the
> > > > > > > > >> > features or the RPC versions? Does metadata.version also
> > > > > influence
> > > > > > > the
> > > > > > > > >> > versions that a broker advertises? It would help to have
> > > more
> > > > > > detail
> > > > > > > > >> about
> > > > > > > > >> > this.
> > > > > > > > >> >
> > > > > > > > >> > 3. I imagine users will want to know before performing
> an
> > > > > upgrade
> > > > > > > > >> whether
> > > > > > > > >> > downgrading will be safe. Would the --dry-run flag tell
> > them
> > > > > this?
> > > > > > > > >> >
> > > > > > > > >> > Thanks,
> > > > > > > > >> > Jason
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > On Wed, Nov 17, 2021 at 3:55 PM Colin McCabe <
> > > > > cmccabe@apache.org>
> > > > > > > > >> wrote:
> > > > > > > > >> >
> > > > > > > > >> >> On Wed, Nov 17, 2021, at 11:28, Jason Gustafson wrote:
> > > > > > > > >> >> > Hi David,
> > > > > > > > >> >> >
> > > > > > > > >> >> > Forgive me if this ground has been covered already.
> > > Today,
> > > > we
> > > > > > > have
> > > > > > > > a
> > > > > > > > >> few
> > > > > > > > >> >> > other things that we have latched onto the IBP, such
> as
> > > > > > upgrades
> > > > > > > to
> > > > > > > > >> the
> > > > > > > > >> >> > format of records in __consumer_offsets. I've been
> > > assuming
> > > > > > that
> > > > > > > > >> >> > metadata.version is not covering this. Is that right
> or
> > > is
> > > > > > there
> > > > > > > > some
> > > > > > > > >> >> other
> > > > > > > > >> >> > plan to take care of cases like this?
> > > > > > > > >> >> >
> > > > > > > > >> >>
> > > > > > > > >> >> I think metadata.version could cover changes to things
> > like
> > > > > > > > >> >> __consumer_offsets, if people want it to. Or to put it
> > > > another
> > > > > > way,
> > > > > > > > >> that is
> > > > > > > > >> >> out of scope for this KIP.
> > > > > > > > >> >>
> > > > > > > > >> >> Like David said, basically it boils down to creating a
> > > > feature
> > > > > > flag
> > > > > > > > for
> > > > > > > > >> >> the new proposed __consumer_offsets version, or using a
> > new
> > > > > > > > >> >> IBP/metadata.version for it. Both approaches have pros
> > and
> > > > > cons.
> > > > > > > > Using
> > > > > > > > >> an
> > > > > > > > >> >> IBP/metadata.version bump reduces the size of the
> testing
> > > > > matrix.
> > > > > > > But
> > > > > > > > >> using
> > > > > > > > >> >> a feature flag allows people to avoid any bugs or pain
> > > > > associated
> > > > > > > > with
> > > > > > > > >> the
> > > > > > > > >> >> change if they don't care about enabling it. This is
> > > > basically
> > > > > > the
> > > > > > > > >> classic
> > > > > > > > >> >> "should I use a feature flag or not?" discussion and we
> > > need
> > > > to
> > > > > > > have
> > > > > > > > >> it on
> > > > > > > > >> >> a case-by-case basis.
> > > > > > > > >> >>
> > > > > > > > >> >> I think it's worth calling out that having a 1:1
> mapping
> > > > > between
> > > > > > > IBP
> > > > > > > > >> >> versions and metadata.versions will result in some
> > > > > > > metadata.versions
> > > > > > > > >> that
> > > > > > > > >> >> "don't do anything" (aka they do the same thing as the
> > > > previous
> > > > > > > > >> >> metadata.version). For example, if we change
> > > > StopReplicaRequest
> > > > > > > > again,
> > > > > > > > >> that
> > > > > > > > >> >> will not affect KRaft mode, but probably would require
> an
> > > IBP
> > > > > > bump
> > > > > > > > and
> > > > > > > > >> >> hence a metadata.version bump. I think that's OK. It's
> > not
> > > > that
> > > > > > > > >> different
> > > > > > > > >> >> from updating your IBP and getting support for
> ZStandard,
> > > > when
> > > > > > your
> > > > > > > > >> >> deployment doesn't use ZStandard compression.
> > > > > > > > >> >>
> > > > > > > > >> >> best,
> > > > > > > > >> >> Colin
> > > > > > > > >> >>
> > > > > > > > >> >> > Thanks,
> > > > > > > > >> >> > Jason
> > > > > > > > >> >> >
> > > > > > > > >> >> >
> > > > > > > > >> >> >
> > > > > > > > >> >> > On Wed, Nov 17, 2021 at 10:17 AM Jun Rao
> > > > > > > <jun@confluent.io.invalid
> > > > > > > > >
> > > > > > > > >> >> wrote:
> > > > > > > > >> >> >
> > > > > > > > >> >> >> Hi, Colin,
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> Thanks for the reply.
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> For case b, I am not sure that I understand your
> > > > suggestion.
> > > > > > > Does
> > > > > > > > >> "each
> > > > > > > > >> >> >> subsequent level for metadata.version corresponds to
> > an
> > > > IBP
> > > > > > > > version"
> > > > > > > > >> >> mean
> > > > > > > > >> >> >> that we need to keep IBP forever? Could you describe
> > the
> > > > > > upgrade
> > > > > > > > >> >> process in
> > > > > > > > >> >> >> this case?
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> Jun
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> On Tue, Nov 16, 2021 at 3:45 PM Colin McCabe <
> > > > > > > cmccabe@apache.org>
> > > > > > > > >> >> wrote:
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> > On Tue, Nov 16, 2021, at 15:13, Jun Rao wrote:
> > > > > > > > >> >> >> > > Hi, David, Colin,
> > > > > > > > >> >> >> > >
> > > > > > > > >> >> >> > > Thanks for the reply.
> > > > > > > > >> >> >> > >
> > > > > > > > >> >> >> > > 16. Discussed with David offline a bit. We have
> 3
> > > > cases.
> > > > > > > > >> >> >> > > a. We upgrade from an old version where the
> > > > > > metadata.version
> > > > > > > > has
> > > > > > > > >> >> >> already
> > > > > > > > >> >> >> > > been finalized. In this case it makes sense to
> > stay
> > > > with
> > > > > > > that
> > > > > > > > >> >> feature
> > > > > > > > >> >> >> > > version after the upgrade.
> > > > > > > > >> >> >> >
> > > > > > > > >> >> >> > +1
> > > > > > > > >> >> >> >
> > > > > > > > >> >> >> > > b. We upgrade from an old version where no
> > > > > > metadata.version
> > > > > > > > has
> > > > > > > > >> >> been
> > > > > > > > >> >> >> > > finalized. In this case, it makes sense to leave
> > > > > > > > >> metadata.version
> > > > > > > > >> >> >> > disabled
> > > > > > > > >> >> >> > > since we don't know if all brokers have been
> > > upgraded.
> > > > > > > > >> >> >> >
> > > > > > > > >> >> >> > This is the scenario I was hoping to avoid by
> saying
> > > > that
> > > > > > ALL
> > > > > > > > >> KRaft
> > > > > > > > >> >> >> > clusters have metadata.version of at least 1, and
> > each
> > > > > > > > subsequent
> > > > > > > > >> >> level
> > > > > > > > >> >> >> for
> > > > > > > > >> >> >> > metadata.version corresponds to an IBP version.
> The
> > > > > existing
> > > > > > > > KRaft
> > > > > > > > >> >> >> clusters
> > > > > > > > >> >> >> > in 3.0 and earlier are preview (not for
> production)
> > > so I
> > > > > > think
> > > > > > > > >> this
> > > > > > > > >> >> >> change
> > > > > > > > >> >> >> > is OK for 3.x (given that it affects only KRaft).
> > Then
> > > > IBP
> > > > > > is
> > > > > > > > >> >> irrelevant
> > > > > > > > >> >> >> > for KRaft clusters (the config is ignored,
> possibly
> > > > with a
> > > > > > > WARN
> > > > > > > > or
> > > > > > > > >> >> ERROR
> > > > > > > > >> >> >> > message generated if it is set).
> > > > > > > > >> >> >> >
> > > > > > > > >> >> >> > > c. We are starting from a brand new cluster and
> of
> > > > > course
> > > > > > no
> > > > > > > > >> >> >> > > metadata.version has been finalized. In this
> case,
> > > the
> > > > > KIP
> > > > > > > > says
> > > > > > > > >> it
> > > > > > > > >> >> will
> > > > > > > > >> >> >> > > pick the metadata.version in meta.properties. In
> > the
> > > > > > common
> > > > > > > > >> case,
> > > > > > > > >> >> >> people
> > > > > > > > >> >> >> > > probably won't set the metadata.version in the
> > > > > > > meta.properties
> > > > > > > > >> file
> > > > > > > > >> >> >> > > explicitly. So, it will be useful to put a
> default
> > > > > > (stable)
> > > > > > > > >> version
> > > > > > > > >> >> >> there
> > > > > > > > >> >> >> > > when the meta.properties.
> > > > > > > > >> >> >> >
> > > > > > > > >> >> >> > Hmm. I was assuming that clusters where the admin
> > > didn't
> > > > > > > specify
> > > > > > > > >> any
> > > > > > > > >> >> >> > metadata.version during formatting would get the
> > > latest
> > > > > > > > >> >> metadata.version.
> > > > > > > > >> >> >> > Partly, because this is what we do for IBP today.
> It
> > > > would
> > > > > > be
> > > > > > > > >> good to
> > > > > > > > >> >> >> > clarify this...
> > > > > > > > >> >> >> >
> > > > > > > > >> >> >> > >
> > > > > > > > >> >> >> > > Also, it would be useful to clarify that if a
> > > > > > > > FeatureLevelRecord
> > > > > > > > >> >> exists
> > > > > > > > >> >> >> > for
> > > > > > > > >> >> >> > > metadata.version, the metadata.version in
> > > > > meta.properties
> > > > > > > will
> > > > > > > > >> be
> > > > > > > > >> >> >> > ignored.
> > > > > > > > >> >> >> > >
> > > > > > > > >> >> >> >
> > > > > > > > >> >> >> > Yeah, I agree.
> > > > > > > > >> >> >> >
> > > > > > > > >> >> >> > best,
> > > > > > > > >> >> >> > Colin
> > > > > > > > >> >> >> >
> > > > > > > > >> >> >> > > Thanks,
> > > > > > > > >> >> >> > >
> > > > > > > > >> >> >> > > Jun
> > > > > > > > >> >> >> > >
> > > > > > > > >> >> >> > >
> > > > > > > > >> >> >> > > On Tue, Nov 16, 2021 at 12:39 PM Colin McCabe <
> > > > > > > > >> cmccabe@apache.org>
> > > > > > > > >> >> >> > wrote:
> > > > > > > > >> >> >> > >
> > > > > > > > >> >> >> > >> On Fri, Nov 5, 2021, at 15:18, Jun Rao wrote:
> > > > > > > > >> >> >> > >> > Hi, David,
> > > > > > > > >> >> >> > >> >
> > > > > > > > >> >> >> > >> > Thanks for the reply.
> > > > > > > > >> >> >> > >> >
> > > > > > > > >> >> >> > >> > 16. My first concern is that the KIP picks up
> > > > > > > meta.version
> > > > > > > > >> >> >> > inconsistently
> > > > > > > > >> >> >> > >> > during the deployment. If a new cluster is
> > > started,
> > > > > we
> > > > > > > pick
> > > > > > > > >> up
> > > > > > > > >> >> the
> > > > > > > > >> >> >> > >> highest
> > > > > > > > >> >> >> > >> > version. If we upgrade, we leave the feature
> > > > version
> > > > > > > > >> unchanged.
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> Hi Jun,
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> Thanks again for taking a look.
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> The proposed behavior in KIP-778 is consistent
> > with
> > > > how
> > > > > > it
> > > > > > > > >> works
> > > > > > > > >> >> >> today.
> > > > > > > > >> >> >> > >> Upgrading the software is distinct from
> upgrading
> > > the
> > > > > > IBP.
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> I think it is important to keep these two
> > > operations
> > > > > > > > >> ("upgrading
> > > > > > > > >> >> >> > >> IBP/metadata version" and "upgrading software
> > > > version")
> > > > > > > > >> separate.
> > > > > > > > >> >> If
> > > > > > > > >> >> >> > they
> > > > > > > > >> >> >> > >> are coupled it will create a situation where
> > > software
> > > > > > > > upgrades
> > > > > > > > >> are
> > > > > > > > >> >> >> > >> difficult and dangerous.
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> Consider a situation where you find some bug in
> > > your
> > > > > > > current
> > > > > > > > >> >> software,
> > > > > > > > >> >> >> > and
> > > > > > > > >> >> >> > >> you want to upgrade to new software that fixes
> > the
> > > > bug.
> > > > > > If
> > > > > > > > >> >> upgrades
> > > > > > > > >> >> >> and
> > > > > > > > >> >> >> > IBP
> > > > > > > > >> >> >> > >> bumps are coupled, you can't do this without
> also
> > > > > bumping
> > > > > > > the
> > > > > > > > >> IBP,
> > > > > > > > >> >> >> > which is
> > > > > > > > >> >> >> > >> usually considered a high-risk change. That
> means
> > > > that
> > > > > > > either
> > > > > > > > >> you
> > > > > > > > >> >> have
> > > > > > > > >> >> >> > to
> > > > > > > > >> >> >> > >> make a special build that includes only the fix
> > > > > > > > (time-consuming
> > > > > > > > >> >> and
> > > > > > > > >> >> >> > >> error-prone), live with the bug for longer, or
> be
> > > > very
> > > > > > > > >> >> conservative
> > > > > > > > >> >> >> > about
> > > > > > > > >> >> >> > >> ever introducing new IBP/metadata versions.
> None
> > of
> > > > > those
> > > > > > > are
> > > > > > > > >> >> really
> > > > > > > > >> >> >> > good
> > > > > > > > >> >> >> > >> choices.
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> > Intuitively, it seems that independent of
> how a
> > > > > cluster
> > > > > > > is
> > > > > > > > >> >> deployed,
> > > > > > > > >> >> >> > we
> > > > > > > > >> >> >> > >> > should always pick the same feature version.
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> I think it makes sense to draw a distinction
> > > between
> > > > > > > > upgrading
> > > > > > > > >> an
> > > > > > > > >> >> >> > existing
> > > > > > > > >> >> >> > >> cluster and deploying a new one. What most
> people
> > > > want
> > > > > > out
> > > > > > > of
> > > > > > > > >> >> upgrades
> > > > > > > > >> >> >> > is
> > > > > > > > >> >> >> > >> that things should keep working, but with bug
> > > fixes.
> > > > If
> > > > > > we
> > > > > > > > >> change
> > > > > > > > >> >> >> that,
> > > > > > > > >> >> >> > it
> > > > > > > > >> >> >> > >> just makes people more reluctant to upgrade
> > (which
> > > is
> > > > > > > always
> > > > > > > > a
> > > > > > > > >> >> >> > problem...)
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> > I think we need to think this through in this
> > > KIP.
> > > > My
> > > > > > > > second
> > > > > > > > >> >> concern
> > > > > > > > >> >> >> > is
> > > > > > > > >> >> >> > >> > that as a particular version matures, it's
> > > > > inconvenient
> > > > > > > > for a
> > > > > > > > >> >> user
> > > > > > > > >> >> >> to
> > > > > > > > >> >> >> > >> manually
> > > > > > > > >> >> >> > >> > upgrade every feature version. As long as we
> > > have a
> > > > > > path
> > > > > > > to
> > > > > > > > >> >> achieve
> > > > > > > > >> >> >> > that
> > > > > > > > >> >> >> > >> in
> > > > > > > > >> >> >> > >> > the future, we don't need to address that in
> > this
> > > > > KIP.
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> If people are managing a large number of Kafka
> > > > > clusters,
> > > > > > > they
> > > > > > > > >> will
> > > > > > > > >> >> >> want
> > > > > > > > >> >> >> > to
> > > > > > > > >> >> >> > >> do some sort of A/B testing with IBP/metadata
> > > > versions.
> > > > > > So
> > > > > > > if
> > > > > > > > >> you
> > > > > > > > >> >> have
> > > > > > > > >> >> >> > 1000
> > > > > > > > >> >> >> > >> Kafka clusters, you roll out the new IBP
> version
> > to
> > > > 10
> > > > > of
> > > > > > > > them
> > > > > > > > >> >> and see
> > > > > > > > >> >> >> > how
> > > > > > > > >> >> >> > >> it goes. If that goes well, you roll it out to
> > > more,
> > > > > etc.
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> So, the automation needs to be at the cluster
> > > > > management
> > > > > > > > layer,
> > > > > > > > >> >> not at
> > > > > > > > >> >> >> > the
> > > > > > > > >> >> >> > >> Kafka layer. Each Kafka cluster doesn't know
> how
> > > well
> > > > > > > things
> > > > > > > > >> went
> > > > > > > > >> >> in
> > > > > > > > >> >> >> the
> > > > > > > > >> >> >> > >> other 999 clusters. Automatically upgrading is
> a
> > > bad
> > > > > > thing
> > > > > > > > for
> > > > > > > > >> the
> > > > > > > > >> >> >> same
> > > > > > > > >> >> >> > >> reason Kafka automatically upgrading its own
> > > software
> > > > > > > version
> > > > > > > > >> >> would
> > > > > > > > >> >> >> be a
> > > > > > > > >> >> >> > >> bad thing -- it could lead to a disruption to a
> > > > > sensitive
> > > > > > > > >> cluster
> > > > > > > > >> >> at
> > > > > > > > >> >> >> the
> > > > > > > > >> >> >> > >> wrong time.
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> For people who are just managing one or two
> Kafka
> > > > > > clusters,
> > > > > > > > >> >> >> > automatically
> > > > > > > > >> >> >> > >> upgrading feature versions isn't a big burden
> and
> > > can
> > > > > be
> > > > > > > done
> > > > > > > > >> >> >> manually.
> > > > > > > > >> >> >> > >> This is all consistent with how IBP works
> today.
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> Also, we already have a command-line option to
> > the
> > > > > > feature
> > > > > > > > tool
> > > > > > > > >> >> which
> > > > > > > > >> >> >> > >> upgrades all features to the latest available,
> if
> > > > that
> > > > > is
> > > > > > > > what
> > > > > > > > >> the
> > > > > > > > >> >> >> > >> administrator desires. Perhaps we could add
> > > > > documentation
> > > > > > > > >> saying
> > > > > > > > >> >> that
> > > > > > > > >> >> >> > this
> > > > > > > > >> >> >> > >> should be done as the last step of the upgrade.
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> best,
> > > > > > > > >> >> >> > >> Colin
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> > >> >
> > > > > > > > >> >> >> > >> > 21. "./kafka-features.sh delete": Deleting a
> > > > feature
> > > > > > > seems
> > > > > > > > a
> > > > > > > > >> bit
> > > > > > > > >> >> >> weird
> > > > > > > > >> >> >> > >> > since the logic is always there. Would it be
> > > better
> > > > > to
> > > > > > > use
> > > > > > > > >> >> disable?
> > > > > > > > >> >> >> > >> >
> > > > > > > > >> >> >> > >> > Jun
> > > > > > > > >> >> >> > >> >
> > > > > > > > >> >> >> > >> > On Fri, Nov 5, 2021 at 8:11 AM David Arthur
> > > > > > > > >> >> >> > >> > <da...@confluent.io.invalid> wrote:
> > > > > > > > >> >> >> > >> >
> > > > > > > > >> >> >> > >> >> Colin and Jun, thanks for the additional
> > > comments!
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> Colin:
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> > We've been talking about having an
> automated
> > > RPC
> > > > > > > > >> >> compatibility
> > > > > > > > >> >> >> > checker
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> Do we have a way to mark fields in schemas
> as
> > > > > > > deprecated?
> > > > > > > > It
> > > > > > > > >> >> can
> > > > > > > > >> >> >> > stay in
> > > > > > > > >> >> >> > >> >> the RPC, it just complicates the logic a
> bit.
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> > It would be nice if the active controller
> > > could
> > > > > > > validate
> > > > > > > > >> >> that a
> > > > > > > > >> >> >> > >> majority
> > > > > > > > >> >> >> > >> >> of the quorum could use the proposed
> > > > > metadata.version.
> > > > > > > The
> > > > > > > > >> >> active
> > > > > > > > >> >> >> > >> >> controller should have this information,
> > right?
> > > If
> > > > > we
> > > > > > > > don't
> > > > > > > > >> >> have
> > > > > > > > >> >> >> > recent
> > > > > > > > >> >> >> > >> >> information  from a quorum of voters, we
> > > wouldn't
> > > > be
> > > > > > > > active.
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> I believe we should have this information
> from
> > > the
> > > > > > > > >> >> >> > ApiVersionsResponse.
> > > > > > > > >> >> >> > >> It
> > > > > > > > >> >> >> > >> >> would be good to do this validation to
> avoid a
> > > > > > situation
> > > > > > > > >> where
> > > > > > > > >> >> a
> > > > > > > > >> >> >> > >> >> quorum leader can't be elected due to
> > > > unprocessable
> > > > > > > > records.
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> > Do we need delete as a command separate
> from
> > > > > > > downgrade?
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> I think from an operator's perspective, it
> is
> > > nice
> > > > > to
> > > > > > > > >> >> distinguish
> > > > > > > > >> >> >> > >> between
> > > > > > > > >> >> >> > >> >> changing a feature flag and unsetting it. It
> > > might
> > > > > be
> > > > > > > > >> >> surprising to
> > > > > > > > >> >> >> > an
> > > > > > > > >> >> >> > >> >> operator to see the flag's version set to
> > > nothing
> > > > > when
> > > > > > > > they
> > > > > > > > >> >> >> requested
> > > > > > > > >> >> >> > >> the
> > > > > > > > >> >> >> > >> >> downgrade to version 0 (or less).
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> > it seems like we should spell out that
> > > > > > > metadata.version
> > > > > > > > >> >> begins at
> > > > > > > > >> >> >> > 1 in
> > > > > > > > >> >> >> > >> >> KRaft clusters
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> I added this text:
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> Introduce an IBP version to indicate the
> > lowest
> > > > > > software
> > > > > > > > >> >> version
> > > > > > > > >> >> >> that
> > > > > > > > >> >> >> > >> >> > supports *metadata.version*. Below this
> IBP,
> > > the
> > > > > > > > >> >> >> > *metadata.version* is
> > > > > > > > >> >> >> > >> >> > undefined and will not be examined. At or
> > > above
> > > > > this
> > > > > > > > IBP,
> > > > > > > > >> the
> > > > > > > > >> >> >> > >> >> > *metadata.version* must be *0* for
> ZooKeeper
> > > > > > clusters
> > > > > > > > and
> > > > > > > > >> >> will be
> > > > > > > > >> >> >> > >> >> > initialized as *1* for KRaft clusters.
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> > We probably also want an RPC implemented
> by
> > > both
> > > > > > > brokers
> > > > > > > > >> and
> > > > > > > > >> >> >> > >> controllers
> > > > > > > > >> >> >> > >> >> that will reveal the min and max supported
> > > > versions
> > > > > > for
> > > > > > > > each
> > > > > > > > >> >> >> feature
> > > > > > > > >> >> >> > >> level
> > > > > > > > >> >> >> > >> >> supported by the server
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> This is available in ApiVersionsResponse (we
> > > > include
> > > > > > the
> > > > > > > > >> >> server's
> > > > > > > > >> >> >> > >> supported
> > > > > > > > >> >> >> > >> >> features as well as the cluster's finalized
> > > > > features)
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> --------
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> Jun:
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> 12. I've updated the KIP with AdminClient
> > > changes
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> 14. You're right, it looks like I missed a
> few
> > > > > > sections
> > > > > > > > >> >> regarding
> > > > > > > > >> >> >> > >> snapshot
> > > > > > > > >> >> >> > >> >> generation. I've corrected it
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> 16. This feels more like an enhancement to
> > > > KIP-584.
> > > > > I
> > > > > > > > agree
> > > > > > > > >> it
> > > > > > > > >> >> >> could
> > > > > > > > >> >> >> > be
> > > > > > > > >> >> >> > >> >> useful, but perhaps we could address it
> > > separately
> > > > > > from
> > > > > > > > >> KRaft
> > > > > > > > >> >> >> > upgrades?
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> 20. Indeed snapshots are not strictly
> > necessary
> > > > > during
> > > > > > > an
> > > > > > > > >> >> upgrade,
> > > > > > > > >> >> >> > I've
> > > > > > > > >> >> >> > >> >> reworded this
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> Thanks!
> > > > > > > > >> >> >> > >> >> David
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> On Thu, Nov 4, 2021 at 6:51 PM Jun Rao
> > > > > > > > >> >> <ju...@confluent.io.invalid>
> > > > > > > > >> >> >> > >> wrote:
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> > Hi, David, Jose and Colin,
> > > > > > > > >> >> >> > >> >> >
> > > > > > > > >> >> >> > >> >> > Thanks for the reply. A few more comments.
> > > > > > > > >> >> >> > >> >> >
> > > > > > > > >> >> >> > >> >> > 12. It seems that we haven't updated the
> > > > > AdminClient
> > > > > > > > >> >> accordingly?
> > > > > > > > >> >> >> > >> >> >
> > > > > > > > >> >> >> > >> >> > 14. "Metadata snapshot is generated and
> sent
> > > to
> > > > > the
> > > > > > > > other
> > > > > > > > >> >> >> inactive
> > > > > > > > >> >> >> > >> >> > controllers and to brokers". I thought we
> > > wanted
> > > > > > each
> > > > > > > > >> broker
> > > > > > > > >> >> to
> > > > > > > > >> >> >> > >> generate
> > > > > > > > >> >> >> > >> >> > its own snapshot independently? If only
> the
> > > > > > controller
> > > > > > > > >> >> generates
> > > > > > > > >> >> >> > the
> > > > > > > > >> >> >> > >> >> > snapshot, how do we force other brokers to
> > > pick
> > > > it
> > > > > > up?
> > > > > > > > >> >> >> > >> >> >
> > > > > > > > >> >> >> > >> >> > 16. If a feature version is new, one may
> not
> > > > want
> > > > > to
> > > > > > > > >> enable
> > > > > > > > >> >> it
> > > > > > > > >> >> >> > >> >> immediately
> > > > > > > > >> >> >> > >> >> > after the cluster is upgraded. However,
> if a
> > > > > feature
> > > > > > > > >> version
> > > > > > > > >> >> has
> > > > > > > > >> >> >> > been
> > > > > > > > >> >> >> > >> >> > stable, requiring every user to run a
> > command
> > > to
> > > > > > > upgrade
> > > > > > > > >> to
> > > > > > > > >> >> that
> > > > > > > > >> >> >> > >> version
> > > > > > > > >> >> >> > >> >> > seems inconvenient. One way to improve
> this
> > is
> > > > for
> > > > > > > each
> > > > > > > > >> >> feature
> > > > > > > > >> >> >> to
> > > > > > > > >> >> >> > >> define
> > > > > > > > >> >> >> > >> >> > one version as the default. Then, when we
> > > > upgrade
> > > > > a
> > > > > > > > >> cluster,
> > > > > > > > >> >> we
> > > > > > > > >> >> >> > will
> > > > > > > > >> >> >> > >> >> > automatically upgrade the feature to the
> > > default
> > > > > > > > version.
> > > > > > > > >> An
> > > > > > > > >> >> >> admin
> > > > > > > > >> >> >> > >> could
> > > > > > > > >> >> >> > >> >> > use the tool to upgrade to a version
> higher
> > > than
> > > > > the
> > > > > > > > >> default.
> > > > > > > > >> >> >> > >> >> >
> > > > > > > > >> >> >> > >> >> > 20. "The quorum controller can assist with
> > > this
> > > > > > > process
> > > > > > > > by
> > > > > > > > >> >> >> > generating
> > > > > > > > >> >> >> > >> a
> > > > > > > > >> >> >> > >> >> > metadata snapshot after a metadata.version
> > > > > increase
> > > > > > > has
> > > > > > > > >> been
> > > > > > > > >> >> >> > >> committed to
> > > > > > > > >> >> >> > >> >> > the metadata log. This snapshot will be a
> > > > > convenient
> > > > > > > way
> > > > > > > > >> to
> > > > > > > > >> >> let
> > > > > > > > >> >> >> > broker
> > > > > > > > >> >> >> > >> >> and
> > > > > > > > >> >> >> > >> >> > controller components rebuild their entire
> > > > > in-memory
> > > > > > > > state
> > > > > > > > >> >> >> > following
> > > > > > > > >> >> >> > >> an
> > > > > > > > >> >> >> > >> >> > upgrade." The new version of the software
> > > could
> > > > > read
> > > > > > > > both
> > > > > > > > >> >> the new
> > > > > > > > >> >> >> > and
> > > > > > > > >> >> >> > >> the
> > > > > > > > >> >> >> > >> >> > old version. Is generating a new snapshot
> > > during
> > > > > > > upgrade
> > > > > > > > >> >> needed?
> > > > > > > > >> >> >> > >> >> >
> > > > > > > > >> >> >> > >> >> > Jun
> > > > > > > > >> >> >> > >> >> >
> > > > > > > > >> >> >> > >> >> >
> > > > > > > > >> >> >> > >> >> > On Wed, Nov 3, 2021 at 5:42 PM Colin
> McCabe
> > <
> > > > > > > > >> >> cmccabe@apache.org>
> > > > > > > > >> >> >> > >> wrote:
> > > > > > > > >> >> >> > >> >> >
> > > > > > > > >> >> >> > >> >> > > On Tue, Oct 12, 2021, at 10:34, Jun Rao
> > > wrote:
> > > > > > > > >> >> >> > >> >> > > > Hi, David,
> > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > >> >> >> > >> >> > > > One more comment.
> > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > >> >> >> > >> >> > > > 16. The main reason why KIP-584
> requires
> > > > > > > finalizing
> > > > > > > > a
> > > > > > > > >> >> feature
> > > > > > > > >> >> >> > >> >> manually
> > > > > > > > >> >> >> > >> >> > is
> > > > > > > > >> >> >> > >> >> > > > that in the ZK world, the controller
> > > doesn't
> > > > > > know
> > > > > > > > all
> > > > > > > > >> >> brokers
> > > > > > > > >> >> >> > in a
> > > > > > > > >> >> >> > >> >> > > cluster.
> > > > > > > > >> >> >> > >> >> > > > A broker temporarily down is not
> > > registered
> > > > in
> > > > > > ZK.
> > > > > > > > in
> > > > > > > > >> the
> > > > > > > > >> >> >> KRaft
> > > > > > > > >> >> >> > >> >> world,
> > > > > > > > >> >> >> > >> >> > > the
> > > > > > > > >> >> >> > >> >> > > > controller keeps track of all brokers,
> > > > > including
> > > > > > > > those
> > > > > > > > >> >> that
> > > > > > > > >> >> >> are
> > > > > > > > >> >> >> > >> >> > > temporarily
> > > > > > > > >> >> >> > >> >> > > > down. This makes it possible for the
> > > > > controller
> > > > > > to
> > > > > > > > >> >> >> > automatically
> > > > > > > > >> >> >> > >> >> > > finalize a
> > > > > > > > >> >> >> > >> >> > > > feature---it's safe to do so when all
> > > > brokers
> > > > > > > > support
> > > > > > > > >> >> that
> > > > > > > > >> >> >> > >> feature.
> > > > > > > > >> >> >> > >> >> > This
> > > > > > > > >> >> >> > >> >> > > > will make the upgrade process much
> > simpler
> > > > > since
> > > > > > > no
> > > > > > > > >> >> manual
> > > > > > > > >> >> >> > >> command is
> > > > > > > > >> >> >> > >> >> > > > required to turn on a new feature.
> Have
> > we
> > > > > > > > considered
> > > > > > > > >> >> this?
> > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > >> >> >> > >> >> > > > Thanks,
> > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > >> >> >> > >> >> > > > Jun
> > > > > > > > >> >> >> > >> >> > >
> > > > > > > > >> >> >> > >> >> > > Hi Jun,
> > > > > > > > >> >> >> > >> >> > >
> > > > > > > > >> >> >> > >> >> > > I guess David commented on this point
> > > already,
> > > > > but
> > > > > > > > I'll
> > > > > > > > >> >> comment
> > > > > > > > >> >> >> > as
> > > > > > > > >> >> >> > >> >> well.
> > > > > > > > >> >> >> > >> >> > I
> > > > > > > > >> >> >> > >> >> > > always had the perception that users
> > viewed
> > > > > rolls
> > > > > > as
> > > > > > > > >> >> >> potentially
> > > > > > > > >> >> >> > >> risky
> > > > > > > > >> >> >> > >> >> > and
> > > > > > > > >> >> >> > >> >> > > were looking for ways to reduce the
> risk.
> > > Not
> > > > > > > enabling
> > > > > > > > >> >> features
> > > > > > > > >> >> >> > >> right
> > > > > > > > >> >> >> > >> >> > away
> > > > > > > > >> >> >> > >> >> > > after installing new software seems like
> > one
> > > > way
> > > > > > to
> > > > > > > do
> > > > > > > > >> >> that. If
> > > > > > > > >> >> >> > we
> > > > > > > > >> >> >> > >> had
> > > > > > > > >> >> >> > >> >> a
> > > > > > > > >> >> >> > >> >> > > feature to automatically upgrade during
> a
> > > > roll,
> > > > > > I'm
> > > > > > > > not
> > > > > > > > >> >> sure
> > > > > > > > >> >> >> > that I
> > > > > > > > >> >> >> > >> >> would
> > > > > > > > >> >> >> > >> >> > > recommend that people use it, because if
> > > > > something
> > > > > > > > >> fails,
> > > > > > > > >> >> it
> > > > > > > > >> >> >> > makes
> > > > > > > > >> >> >> > >> it
> > > > > > > > >> >> >> > >> >> > > harder to tell if the new feature is at
> > > fault,
> > > > > or
> > > > > > > > >> something
> > > > > > > > >> >> >> else
> > > > > > > > >> >> >> > in
> > > > > > > > >> >> >> > >> the
> > > > > > > > >> >> >> > >> >> > new
> > > > > > > > >> >> >> > >> >> > > software.
> > > > > > > > >> >> >> > >> >> > >
> > > > > > > > >> >> >> > >> >> > > We already tell users to do a "double
> > roll"
> > > > when
> > > > > > > going
> > > > > > > > >> to
> > > > > > > > >> >> a new
> > > > > > > > >> >> >> > IBP.
> > > > > > > > >> >> >> > >> >> > (Just
> > > > > > > > >> >> >> > >> >> > > to give background to people who haven't
> > > heard
> > > > > > that
> > > > > > > > >> >> phrase, the
> > > > > > > > >> >> >> > >> first
> > > > > > > > >> >> >> > >> >> > roll
> > > > > > > > >> >> >> > >> >> > > installs the new software, and the
> second
> > > roll
> > > > > > > updates
> > > > > > > > >> the
> > > > > > > > >> >> >> IBP).
> > > > > > > > >> >> >> > So
> > > > > > > > >> >> >> > >> >> this
> > > > > > > > >> >> >> > >> >> > > KIP-778 mechanism is basically very
> > similar
> > > to
> > > > > > that,
> > > > > > > > >> >> except the
> > > > > > > > >> >> >> > >> second
> > > > > > > > >> >> >> > >> >> > > thing isn't a roll, but just an upgrade
> > > > command.
> > > > > > So
> > > > > > > I
> > > > > > > > >> think
> > > > > > > > >> >> >> this
> > > > > > > > >> >> >> > is
> > > > > > > > >> >> >> > >> >> > > consistent with what we currently do.
> > > > > > > > >> >> >> > >> >> > >
> > > > > > > > >> >> >> > >> >> > > Also, just like David said, we can
> always
> > > add
> > > > > > > > >> auto-upgrade
> > > > > > > > >> >> >> later
> > > > > > > > >> >> >> > if
> > > > > > > > >> >> >> > >> >> there
> > > > > > > > >> >> >> > >> >> > > is demand...
> > > > > > > > >> >> >> > >> >> > >
> > > > > > > > >> >> >> > >> >> > > best,
> > > > > > > > >> >> >> > >> >> > > Colin
> > > > > > > > >> >> >> > >> >> > >
> > > > > > > > >> >> >> > >> >> > >
> > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > >> >> >> > >> >> > > > On Thu, Oct 7, 2021 at 5:19 PM Jun
> Rao <
> > > > > > > > >> jun@confluent.io
> > > > > > > > >> >> >
> > > > > > > > >> >> >> > wrote:
> > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > >> >> >> > >> >> > > >> Hi, David,
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >> Thanks for the KIP. A few comments
> > below.
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >> 10. It would be useful to describe
> how
> > > the
> > > > > > > > controller
> > > > > > > > >> >> node
> > > > > > > > >> >> >> > >> >> determines
> > > > > > > > >> >> >> > >> >> > > the
> > > > > > > > >> >> >> > >> >> > > >> RPC version used to communicate to
> > other
> > > > > > > controller
> > > > > > > > >> >> nodes.
> > > > > > > > >> >> >> > There
> > > > > > > > >> >> >> > >> >> seems
> > > > > > > > >> >> >> > >> >> > > to
> > > > > > > > >> >> >> > >> >> > > >> be a bootstrap problem. A controller
> > node
> > > > > can't
> > > > > > > > read
> > > > > > > > >> >> the log
> > > > > > > > >> >> >> > and
> > > > > > > > >> >> >> > >> >> > > >> therefore the feature level until a
> > > quorum
> > > > > > leader
> > > > > > > > is
> > > > > > > > >> >> >> elected.
> > > > > > > > >> >> >> > But
> > > > > > > > >> >> >> > >> >> > leader
> > > > > > > > >> >> >> > >> >> > > >> election requires an RPC.
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >> 11. For downgrades, it would be
> useful
> > to
> > > > > > > describe
> > > > > > > > >> how
> > > > > > > > >> >> to
> > > > > > > > >> >> >> > >> determine
> > > > > > > > >> >> >> > >> >> > the
> > > > > > > > >> >> >> > >> >> > > >> downgrade process (generating new
> > > snapshot,
> > > > > > > > >> propagating
> > > > > > > > >> >> the
> > > > > > > > >> >> >> > >> >> snapshot,
> > > > > > > > >> >> >> > >> >> > > etc)
> > > > > > > > >> >> >> > >> >> > > >> has completed. We could block the
> > > > > UpdateFeature
> > > > > > > > >> request
> > > > > > > > >> >> >> until
> > > > > > > > >> >> >> > the
> > > > > > > > >> >> >> > >> >> > > process
> > > > > > > > >> >> >> > >> >> > > >> is completed. However, since the
> > process
> > > > > could
> > > > > > > take
> > > > > > > > >> >> time,
> > > > > > > > >> >> >> the
> > > > > > > > >> >> >> > >> >> request
> > > > > > > > >> >> >> > >> >> > > could
> > > > > > > > >> >> >> > >> >> > > >> time out. Another way is through
> > > > > > DescribeFeature
> > > > > > > > and
> > > > > > > > >> the
> > > > > > > > >> >> >> > server
> > > > > > > > >> >> >> > >> only
> > > > > > > > >> >> >> > >> >> > > >> reports downgraded versions after the
> > > > process
> > > > > > is
> > > > > > > > >> >> completed.
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >> 12. Since we are changing
> > > > > > UpdateFeaturesRequest,
> > > > > > > do
> > > > > > > > >> we
> > > > > > > > >> >> need
> > > > > > > > >> >> >> to
> > > > > > > > >> >> >> > >> >> change
> > > > > > > > >> >> >> > >> >> > > the
> > > > > > > > >> >> >> > >> >> > > >> AdminClient api for updateFeatures
> too?
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >> 13. For the paragraph starting with
> "In
> > > the
> > > > > > > absence
> > > > > > > > >> of
> > > > > > > > >> >> an
> > > > > > > > >> >> >> > >> operator
> > > > > > > > >> >> >> > >> >> > > >> defined value for metadata.version",
> in
> > > > > > KIP-584,
> > > > > > > we
> > > > > > > > >> >> >> described
> > > > > > > > >> >> >> > >> how to
> > > > > > > > >> >> >> > >> >> > > >> finalize features with New cluster
> > > > bootstrap.
> > > > > > In
> > > > > > > > that
> > > > > > > > >> >> case,
> > > > > > > > >> >> >> > it's
> > > > > > > > >> >> >> > >> >> > > >> inconvenient for the users to have to
> > run
> > > > an
> > > > > > > admin
> > > > > > > > >> tool
> > > > > > > > >> >> to
> > > > > > > > >> >> >> > >> finalize
> > > > > > > > >> >> >> > >> >> > the
> > > > > > > > >> >> >> > >> >> > > >> version for each feature. Instead,
> the
> > > > system
> > > > > > > > detects
> > > > > > > > >> >> that
> > > > > > > > >> >> >> the
> > > > > > > > >> >> >> > >> >> > /features
> > > > > > > > >> >> >> > >> >> > > >> path is missing in ZK and thus
> > > > automatically
> > > > > > > > >> finalizes
> > > > > > > > >> >> every
> > > > > > > > >> >> >> > >> feature
> > > > > > > > >> >> >> > >> >> > > with
> > > > > > > > >> >> >> > >> >> > > >> the latest supported version. Could
> we
> > do
> > > > > > > something
> > > > > > > > >> >> similar
> > > > > > > > >> >> >> in
> > > > > > > > >> >> >> > >> the
> > > > > > > > >> >> >> > >> >> > KRaft
> > > > > > > > >> >> >> > >> >> > > >> mode?
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >> 14. After the quorum leader
> generates a
> > > new
> > > > > > > > snapshot,
> > > > > > > > >> >> how do
> > > > > > > > >> >> >> > we
> > > > > > > > >> >> >> > >> >> force
> > > > > > > > >> >> >> > >> >> > > >> other nodes to pick up the new
> > snapshot?
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >> 15. I agree with Jose that it will be
> > > > useful
> > > > > to
> > > > > > > > >> describe
> > > > > > > > >> >> >> when
> > > > > > > > >> >> >> > >> >> > > generating a
> > > > > > > > >> >> >> > >> >> > > >> new snapshot is needed. To me, it
> seems
> > > the
> > > > > new
> > > > > > > > >> >> snapshot is
> > > > > > > > >> >> >> > only
> > > > > > > > >> >> >> > >> >> > needed
> > > > > > > > >> >> >> > >> >> > > >> when incompatible changes are made.
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >> 7. Jose, what control records were
> you
> > > > > > referring?
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >> Thanks,
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >> Jun
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >> On Tue, Oct 5, 2021 at 8:53 AM David
> > > > Arthur <
> > > > > > > > >> >> >> > >> davidarthur@apache.org
> > > > > > > > >> >> >> > >> >> >
> > > > > > > > >> >> >> > >> >> > > >> wrote:
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > > >>> Jose, thanks for the thorough review
> > and
> > > > > > > comments!
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> I am out of the office until next
> > week,
> > > > so I
> > > > > > > > >> probably
> > > > > > > > >> >> won't
> > > > > > > > >> >> >> > be
> > > > > > > > >> >> >> > >> able
> > > > > > > > >> >> >> > >> >> > to
> > > > > > > > >> >> >> > >> >> > > >>> update the KIP until then. Here are
> > some
> > > > > > replies
> > > > > > > > to
> > > > > > > > >> >> your
> > > > > > > > >> >> >> > >> questions:
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> 1. Generate snapshot on upgrade
> > > > > > > > >> >> >> > >> >> > > >>> > > Metadata snapshot is generated
> and
> > > > sent
> > > > > to
> > > > > > > the
> > > > > > > > >> >> other
> > > > > > > > >> >> >> > nodes
> > > > > > > > >> >> >> > >> >> > > >>> > Why does the Active Controller
> need
> > to
> > > > > > > generate
> > > > > > > > a
> > > > > > > > >> new
> > > > > > > > >> >> >> > snapshot
> > > > > > > > >> >> >> > >> >> and
> > > > > > > > >> >> >> > >> >> > > >>> > force a snapshot fetch from the
> > > replicas
> > > > > > > > (inactive
> > > > > > > > >> >> >> > controller
> > > > > > > > >> >> >> > >> and
> > > > > > > > >> >> >> > >> >> > > >>> > brokers) on an upgrade? Isn't
> > writing
> > > > the
> > > > > > > > >> >> >> > FeatureLevelRecord
> > > > > > > > >> >> >> > >> good
> > > > > > > > >> >> >> > >> >> > > >>> > enough to communicate the upgrade
> to
> > > the
> > > > > > > > replicas?
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> You're right, we don't necessarily
> > need
> > > to
> > > > > > > > >> _transmit_ a
> > > > > > > > >> >> >> > >> snapshot,
> > > > > > > > >> >> >> > >> >> > since
> > > > > > > > >> >> >> > >> >> > > >>> each node can generate its own
> > > equivalent
> > > > > > > snapshot
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> 2. Generate snapshot on downgrade
> > > > > > > > >> >> >> > >> >> > > >>> > > Metadata snapshot is generated
> and
> > > > sent
> > > > > to
> > > > > > > the
> > > > > > > > >> >> other
> > > > > > > > >> >> >> > >> inactive
> > > > > > > > >> >> >> > >> >> > > >>> > controllers and to brokers (this
> > > > snapshot
> > > > > > may
> > > > > > > be
> > > > > > > > >> >> lossy!)
> > > > > > > > >> >> >> > >> >> > > >>> > Why do we need to send this
> > downgraded
> > > > > > > snapshot
> > > > > > > > to
> > > > > > > > >> >> the
> > > > > > > > >> >> >> > >> brokers?
> > > > > > > > >> >> >> > >> >> The
> > > > > > > > >> >> >> > >> >> > > >>> > replicas have seen the
> > > > FeatureLevelRecord
> > > > > > and
> > > > > > > > >> >> noticed the
> > > > > > > > >> >> >> > >> >> > downgrade.
> > > > > > > > >> >> >> > >> >> > > >>> > Can we have the replicas each
> > > > > independently
> > > > > > > > >> generate
> > > > > > > > >> >> a
> > > > > > > > >> >> >> > >> downgraded
> > > > > > > > >> >> >> > >> >> > > >>> > snapshot at the offset for the
> > > > downgraded
> > > > > > > > >> >> >> > FeatureLevelRecord?
> > > > > > > > >> >> >> > >> I
> > > > > > > > >> >> >> > >> >> > > assume
> > > > > > > > >> >> >> > >> >> > > >>> > that the active controller will
> > > > guarantee
> > > > > > that
> > > > > > > > all
> > > > > > > > >> >> >> records
> > > > > > > > >> >> >> > >> after
> > > > > > > > >> >> >> > >> >> > the
> > > > > > > > >> >> >> > >> >> > > >>> > FatureLevelRecord use the
> downgraded
> > > > > > version.
> > > > > > > If
> > > > > > > > >> so,
> > > > > > > > >> >> it
> > > > > > > > >> >> >> > would
> > > > > > > > >> >> >> > >> be
> > > > > > > > >> >> >> > >> >> > good
> > > > > > > > >> >> >> > >> >> > > >>> > to mention that explicitly.
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> Similar to above, yes a broker that
> > > > detects
> > > > > a
> > > > > > > > >> >> downgrade via
> > > > > > > > >> >> >> > >> >> > > >>> FeatureLevelRecord could generate
> its
> > > own
> > > > > > > > downgrade
> > > > > > > > >> >> >> snapshot
> > > > > > > > >> >> >> > and
> > > > > > > > >> >> >> > >> >> > reload
> > > > > > > > >> >> >> > >> >> > > >>> its
> > > > > > > > >> >> >> > >> >> > > >>> state from that. This does get a
> > little
> > > > > fuzzy
> > > > > > > when
> > > > > > > > >> we
> > > > > > > > >> >> >> > consider
> > > > > > > > >> >> >> > >> >> cases
> > > > > > > > >> >> >> > >> >> > > where
> > > > > > > > >> >> >> > >> >> > > >>> brokers are on different software
> > > versions
> > > > > and
> > > > > > > > >> could be
> > > > > > > > >> >> >> > >> generating
> > > > > > > > >> >> >> > >> >> a
> > > > > > > > >> >> >> > >> >> > > >>> downgrade snapshot for version X,
> but
> > > > using
> > > > > > > > >> different
> > > > > > > > >> >> >> > versions
> > > > > > > > >> >> >> > >> of
> > > > > > > > >> >> >> > >> >> the
> > > > > > > > >> >> >> > >> >> > > >>> code.
> > > > > > > > >> >> >> > >> >> > > >>> It might be safer to let the
> > controller
> > > > > > generate
> > > > > > > > the
> > > > > > > > >> >> >> > snapshot so
> > > > > > > > >> >> >> > >> >> each
> > > > > > > > >> >> >> > >> >> > > >>> broker (regardless of software
> > version)
> > > > gets
> > > > > > the
> > > > > > > > >> same
> > > > > > > > >> >> >> > records.
> > > > > > > > >> >> >> > >> >> > However,
> > > > > > > > >> >> >> > >> >> > > >>> for
> > > > > > > > >> >> >> > >> >> > > >>> upgrades (or downgrades) we expect
> the
> > > > whole
> > > > > > > > cluster
> > > > > > > > >> >> to be
> > > > > > > > >> >> >> > >> running
> > > > > > > > >> >> >> > >> >> > the
> > > > > > > > >> >> >> > >> >> > > >>> same
> > > > > > > > >> >> >> > >> >> > > >>> software version before triggering
> the
> > > > > > > > >> metadata.version
> > > > > > > > >> >> >> > change,
> > > > > > > > >> >> >> > >> so
> > > > > > > > >> >> >> > >> >> > > perhaps
> > > > > > > > >> >> >> > >> >> > > >>> this isn't a likely scenario.
> > Thoughts?
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> 3. Max metadata version
> > > > > > > > >> >> >> > >> >> > > >>> > >For the first release that
> supports
> > > > > > > > >> >> metadata.version, we
> > > > > > > > >> >> >> > can
> > > > > > > > >> >> >> > >> >> > simply
> > > > > > > > >> >> >> > >> >> > > >>> > initialize metadata.version with
> the
> > > > > current
> > > > > > > > (and
> > > > > > > > >> >> only)
> > > > > > > > >> >> >> > >> version.
> > > > > > > > >> >> >> > >> >> > For
> > > > > > > > >> >> >> > >> >> > > >>> future
> > > > > > > > >> >> >> > >> >> > > >>> > releases, we will need a mechanism
> > to
> > > > > > > bootstrap
> > > > > > > > a
> > > > > > > > >> >> >> > particular
> > > > > > > > >> >> >> > >> >> > version.
> > > > > > > > >> >> >> > >> >> > > >>> This
> > > > > > > > >> >> >> > >> >> > > >>> > could be done using the
> > > meta.properties
> > > > > file
> > > > > > > or
> > > > > > > > >> some
> > > > > > > > >> >> >> > similar
> > > > > > > > >> >> >> > >> >> > > mechanism.
> > > > > > > > >> >> >> > >> >> > > >>> The
> > > > > > > > >> >> >> > >> >> > > >>> > reason we need the allow for a
> > > specific
> > > > > > > initial
> > > > > > > > >> >> version
> > > > > > > > >> >> >> is
> > > > > > > > >> >> >> > to
> > > > > > > > >> >> >> > >> >> > support
> > > > > > > > >> >> >> > >> >> > > >>> the
> > > > > > > > >> >> >> > >> >> > > >>> > use case of starting a Kafka
> cluster
> > > at
> > > > > > > version
> > > > > > > > X
> > > > > > > > >> >> with an
> > > > > > > > >> >> >> > >> older
> > > > > > > > >> >> >> > >> >> > > >>> > metadata.version.
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> I assume that the Active Controller
> > will
> > > > > learn
> > > > > > > the
> > > > > > > > >> >> metadata
> > > > > > > > >> >> >> > >> version
> > > > > > > > >> >> >> > >> >> > of
> > > > > > > > >> >> >> > >> >> > > >>> > the broker through the
> > > > > > > > BrokerRegistrationRequest.
> > > > > > > > >> How
> > > > > > > > >> >> >> will
> > > > > > > > >> >> >> > the
> > > > > > > > >> >> >> > >> >> > Active
> > > > > > > > >> >> >> > >> >> > > >>> > Controller learn about the max
> > > metadata
> > > > > > > version
> > > > > > > > of
> > > > > > > > >> >> the
> > > > > > > > >> >> >> > >> inactive
> > > > > > > > >> >> >> > >> >> > > >>> > controller nodes? We currently
> don't
> > > > send
> > > > > a
> > > > > > > > >> >> registration
> > > > > > > > >> >> >> > >> request
> > > > > > > > >> >> >> > >> >> > from
> > > > > > > > >> >> >> > >> >> > > >>> > the inactive controller to the
> > active
> > > > > > > > controller.
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> This came up during the design, but
> I
> > > > > > neglected
> > > > > > > to
> > > > > > > > >> add
> > > > > > > > >> >> it
> > > > > > > > >> >> >> to
> > > > > > > > >> >> >> > the
> > > > > > > > >> >> >> > >> >> KIP.
> > > > > > > > >> >> >> > >> >> > > We
> > > > > > > > >> >> >> > >> >> > > >>> will need a mechanism for
> determining
> > > the
> > > > > > > > supported
> > > > > > > > >> >> >> features
> > > > > > > > >> >> >> > of
> > > > > > > > >> >> >> > >> >> each
> > > > > > > > >> >> >> > >> >> > > >>> controller similar to how brokers
> use
> > > > > > > > >> >> >> > BrokerRegistrationRequest.
> > > > > > > > >> >> >> > >> >> > > Perhaps
> > > > > > > > >> >> >> > >> >> > > >>> controllers could write a
> > > > FeatureLevelRecord
> > > > > > (or
> > > > > > > > >> >> similar)
> > > > > > > > >> >> >> to
> > > > > > > > >> >> >> > the
> > > > > > > > >> >> >> > >> >> > > metadata
> > > > > > > > >> >> >> > >> >> > > >>> log indicating their supported
> > version.
> > > > > WDYT?
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> Why do you need to bootstrap a
> > > particular
> > > > > > > version?
> > > > > > > > >> >> Isn't
> > > > > > > > >> >> >> the
> > > > > > > > >> >> >> > >> intent
> > > > > > > > >> >> >> > >> >> > > >>> > that the broker will learn the
> > active
> > > > > > metadata
> > > > > > > > >> >> version by
> > > > > > > > >> >> >> > >> reading
> > > > > > > > >> >> >> > >> >> > the
> > > > > > > > >> >> >> > >> >> > > >>> > metadata before unfencing?
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> This bootstrapping is needed for
> when
> > a
> > > > > KRaft
> > > > > > > > >> cluster
> > > > > > > > >> >> is
> > > > > > > > >> >> >> > first
> > > > > > > > >> >> >> > >> >> > > started. If
> > > > > > > > >> >> >> > >> >> > > >>> we don't have this mechanism, the
> > > cluster
> > > > > > can't
> > > > > > > > >> really
> > > > > > > > >> >> do
> > > > > > > > >> >> >> > >> anything
> > > > > > > > >> >> >> > >> >> > > until
> > > > > > > > >> >> >> > >> >> > > >>> the operator finalizes the
> > > > metadata.version
> > > > > > with
> > > > > > > > the
> > > > > > > > >> >> tool.
> > > > > > > > >> >> >> > The
> > > > > > > > >> >> >> > >> >> > > >>> bootstrapping will be done by the
> > > > controller
> > > > > > and
> > > > > > > > the
> > > > > > > > >> >> >> brokers
> > > > > > > > >> >> >> > >> will
> > > > > > > > >> >> >> > >> >> see
> > > > > > > > >> >> >> > >> >> > > this
> > > > > > > > >> >> >> > >> >> > > >>> version as a record (like you say).
> > I'll
> > > > add
> > > > > > > some
> > > > > > > > >> text
> > > > > > > > >> >> to
> > > > > > > > >> >> >> > >> clarify
> > > > > > > > >> >> >> > >> >> > this.
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> 4. Reject Registration - This is
> > related
> > > > to
> > > > > > the
> > > > > > > > >> bullet
> > > > > > > > >> >> >> point
> > > > > > > > >> >> >> > >> above.
> > > > > > > > >> >> >> > >> >> > > >>> > What will be the behavior of the
> > > active
> > > > > > > > controller
> > > > > > > > >> >> if the
> > > > > > > > >> >> >> > >> broker
> > > > > > > > >> >> >> > >> >> > > sends
> > > > > > > > >> >> >> > >> >> > > >>> > a metadata version that is not
> > > > compatible
> > > > > > with
> > > > > > > > the
> > > > > > > > >> >> >> cluster
> > > > > > > > >> >> >> > >> wide
> > > > > > > > >> >> >> > >> >> > > >>> > metadata version?
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> If a broker starts up with a lower
> > > > supported
> > > > > > > > version
> > > > > > > > >> >> range
> > > > > > > > >> >> >> > than
> > > > > > > > >> >> >> > >> the
> > > > > > > > >> >> >> > >> >> > > >>> current
> > > > > > > > >> >> >> > >> >> > > >>> cluster metadata.version, it should
> > log
> > > an
> > > > > > error
> > > > > > > > and
> > > > > > > > >> >> >> > shutdown.
> > > > > > > > >> >> >> > >> This
> > > > > > > > >> >> >> > >> >> > is
> > > > > > > > >> >> >> > >> >> > > in
> > > > > > > > >> >> >> > >> >> > > >>> line with KIP-584.
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> 5. Discover upgrade
> > > > > > > > >> >> >> > >> >> > > >>> > > This snapshot will be a
> convenient
> > > way
> > > > > to
> > > > > > > let
> > > > > > > > >> >> broker
> > > > > > > > >> >> >> and
> > > > > > > > >> >> >> > >> >> > controller
> > > > > > > > >> >> >> > >> >> > > >>> > components rebuild their entire
> > > > in-memory
> > > > > > > state
> > > > > > > > >> >> following
> > > > > > > > >> >> >> > an
> > > > > > > > >> >> >> > >> >> > upgrade.
> > > > > > > > >> >> >> > >> >> > > >>> > Can we rely on the presence of the
> > > > > > > > >> >> FeatureLevelRecord for
> > > > > > > > >> >> >> > the
> > > > > > > > >> >> >> > >> >> > > metadata
> > > > > > > > >> >> >> > >> >> > > >>> > version for this functionality? If
> > so,
> > > > it
> > > > > > > avoids
> > > > > > > > >> >> having
> > > > > > > > >> >> >> to
> > > > > > > > >> >> >> > >> reload
> > > > > > > > >> >> >> > >> >> > the
> > > > > > > > >> >> >> > >> >> > > >>> > snapshot.
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> For upgrades, yes probably since we
> > > won't
> > > > > need
> > > > > > > to
> > > > > > > > >> >> "rewrite"
> > > > > > > > >> >> >> > any
> > > > > > > > >> >> >> > >> >> > > records in
> > > > > > > > >> >> >> > >> >> > > >>> this case. For downgrades, we will
> > need
> > > to
> > > > > > > > generate
> > > > > > > > >> the
> > > > > > > > >> >> >> > snapshot
> > > > > > > > >> >> >> > >> >> and
> > > > > > > > >> >> >> > >> >> > > >>> reload
> > > > > > > > >> >> >> > >> >> > > >>> everything.
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> 6. Metadata version specification
> > > > > > > > >> >> >> > >> >> > > >>> > >  V4(version=4,
> > > > > > isBackwardsCompatible=false,
> > > > > > > > >> >> >> > description="New
> > > > > > > > >> >> >> > >> >> > > metadata
> > > > > > > > >> >> >> > >> >> > > >>> > record type Bar"),
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> Very cool. Do you have plans to
> > generate
> > > > > > Apache
> > > > > > > > >> Kafka
> > > > > > > > >> >> HTML
> > > > > > > > >> >> >> > >> >> > > >>> > documentation for this
> information?
> > > > Would
> > > > > be
> > > > > > > > >> helpful
> > > > > > > > >> >> to
> > > > > > > > >> >> >> > >> display
> > > > > > > > >> >> >> > >> >> > this
> > > > > > > > >> >> >> > >> >> > > >>> > information to the user using the
> > > > > > > > >> kafka-features.sh
> > > > > > > > >> >> and
> > > > > > > > >> >> >> > >> feature
> > > > > > > > >> >> >> > >> >> > RPC?
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> Hm good idea :) I'll add a brief
> > section
> > > > on
> > > > > > > > >> >> documentation.
> > > > > > > > >> >> >> > This
> > > > > > > > >> >> >> > >> >> would
> > > > > > > > >> >> >> > >> >> > > >>> certainly be very useful
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> 7.Downgrade records
> > > > > > > > >> >> >> > >> >> > > >>> > I think we should explicitly
> mention
> > > > that
> > > > > > the
> > > > > > > > >> >> downgrade
> > > > > > > > >> >> >> > >> process
> > > > > > > > >> >> >> > >> >> > will
> > > > > > > > >> >> >> > >> >> > > >>> > downgrade both metadata records
> and
> > > > > > controller
> > > > > > > > >> >> records.
> > > > > > > > >> >> >> In
> > > > > > > > >> >> >> > >> >> KIP-630
> > > > > > > > >> >> >> > >> >> > we
> > > > > > > > >> >> >> > >> >> > > >>> > introduced two control records for
> > > > > > snapshots.
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> Yes, good call. Let me re-read that
> > KIP
> > > > and
> > > > > > > > include
> > > > > > > > >> >> some
> > > > > > > > >> >> >> > >> details.
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> Thanks again for the comments!
> > > > > > > > >> >> >> > >> >> > > >>> -David
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> On Wed, Sep 29, 2021 at 5:09 PM José
> > > > Armando
> > > > > > > > García
> > > > > > > > >> >> Sancio
> > > > > > > > >> >> >> > >> >> > > >>> <js...@confluent.io.invalid>
> wrote:
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>> > One more comment.
> > > > > > > > >> >> >> > >> >> > > >>> >
> > > > > > > > >> >> >> > >> >> > > >>> > 7.Downgrade records
> > > > > > > > >> >> >> > >> >> > > >>> > I think we should explicitly
> mention
> > > > that
> > > > > > the
> > > > > > > > >> >> downgrade
> > > > > > > > >> >> >> > >> process
> > > > > > > > >> >> >> > >> >> > will
> > > > > > > > >> >> >> > >> >> > > >>> > downgrade both metadata records
> and
> > > > > > controller
> > > > > > > > >> >> records.
> > > > > > > > >> >> >> In
> > > > > > > > >> >> >> > >> >> KIP-630
> > > > > > > > >> >> >> > >> >> > we
> > > > > > > > >> >> >> > >> >> > > >>> > introduced two control records for
> > > > > > snapshots.
> > > > > > > > >> >> >> > >> >> > > >>> >
> > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > >> >> >> > >> >> > >
> > > > > > > > >> >> >> > >> >> >
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >> >> --
> > > > > > > > >> >> >> > >> >> -David
> > > > > > > > >> >> >> > >> >>
> > > > > > > > >> >> >> > >>
> > > > > > > > >> >> >> >
> > > > > > > > >> >> >>
> > > > > > > > >> >>
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > -David
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -David
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -David
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -David
> > > >
> > >
> >
> >
> > --
> > -David
> >
>

Re: [DISCUSS] KIP-778 KRaft Upgrades

Posted by David Arthur <da...@confluent.io.INVALID>.
Tom, thanks for taking a look!

1. Yes, you're right, I'll fix that.
2. "Components" refers to broker-side objects like ReplicaManager,
ClientQuotaManager, etc. Controller components (so far) don't need to
reload any state when metadata.version changes.
3. Yes, if a controller reads a FeatureLevelRecord from a snapshot or a
commit event and it is for an unsupported metadata.version, it will
terminate (see the sketch after this list).
4. I think it should be impossible for the active controller to read an
unsupported metadata.version, except in the case of bugs.
5. Yes, that's possible. However, once B sees the new version, it will close
its connections to A and force the ApiVersions exchange to happen again.
This is an area we're looking to improve on in a future KIP.
6. Yes, it should, I'll add.
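
To make (3) a bit more concrete, here is a rough sketch of the kind of check a
controller could run while replaying the metadata log or loading a snapshot.
The class and method names below are made up purely for illustration and do not
correspond to the actual implementation:

    // Illustrative sketch only; names are hypothetical, not real Kafka classes.
    class MetadataVersionValidator {
        private final short minSupported;
        private final short maxSupported;

        MetadataVersionValidator(short minSupported, short maxSupported) {
            this.minSupported = minSupported;
            this.maxSupported = maxSupported;
        }

        // Called for each FeatureLevelRecord seen while replaying the log or
        // loading a snapshot. If the finalized metadata.version is outside the
        // range this software supports, the node exits rather than risk
        // misinterpreting newer record types.
        void checkFinalizedMetadataVersion(String featureName, short finalizedLevel) {
            if (!featureName.equals("metadata.version")) {
                return; // other feature flags are handled elsewhere
            }
            if (finalizedLevel < minSupported || finalizedLevel > maxSupported) {
                System.err.printf(
                    "Unsupported metadata.version %d (supported range [%d, %d]); terminating.%n",
                    finalizedLevel, minSupported, maxSupported);
                System.exit(1);
            }
        }
    }
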

Cheers,
David

On Fri, Dec 17, 2021 at 12:16 PM Tom Bentley <tb...@redhat.com> wrote:

> Hi David,
>
> Thanks for the KIP. Sorry I'm so late in looking at this. I have a few
> questions.
>
> 1. In the Proposed Changes Overview it says "The controller validates that
> the cluster can be safely downgraded to this version (override with
> --force)" I think that be "--unsafe"?
>
> 2. Also in this section it says "Components reload their state with new
> version" the word "components" is a little vague. I think it means
> controllers and brokers, right?
>
> 3. In the compatibility section it says "A controller running old software
> will join the quorum and begin replicating the metadata log. If this
> inactive controller encounters a FeatureLevelRecord for *metadata.version*
> that it cannot support, it should terminate." I assume "encounters"
> includes the snapshot case?
>
> 4. "In the unlikely event that an active controller encounters an
> unsupported *metadata.version*, it should resign and terminate. ", I'm
> unclear on how this could happen, could you elaborate?
>
> 5. In the ApiVersions part of the Upgrades section, "Brokers will observe
> changes to *metadata.version *as they replicate records from the metadata
> log. If a new *metadata.version* is seen, brokers will renegotiate
> compatible RPCs with other brokers through the the ApiVersions workflow."
> Is it possible that broker A does a metadata fetch, sees a changed
> metadata.version and attempts renegotiation with broker B before B has seen
> the metadata.version?
>
> 6. Why does "kafka-features.sh downgrade" not support "--release"?
>
> Kind regards,
>
> Tom
>
> On Tue, Nov 30, 2021 at 10:26 PM Jun Rao <ju...@confluent.io.invalid> wrote:
>
> > Hi, David,
> >
> > Thanks for the reply. No more questions from me.
> >
> > Jun
> >
> > On Tue, Nov 30, 2021 at 1:52 PM David Arthur
> > <da...@confluent.io.invalid> wrote:
> >
> > > Thanks Jun,
> > >
> > > 30: I clarified the description of the "upgrade" command to:
> > >
> > > The FEATURE and VERSION arguments can be repeated to indicate an
> upgrade
> > of
> > > > multiple features in one request. Alternatively, the RELEASE flag can
> > be
> > > > given to upgrade to the default versions of the specified release.
> > These
> > > > two options, FEATURE and RELEASE, are mutually exclusive. If neither
> is
> > > > given, this command will do nothing.
> > >
> > >
> > > Basically, you must provide either "kafka-features.sh upgrade --release
> > > 3.2" or something like "kafka-features.sh upgrade --feature X --version
> > 10"
> > >
> > > -David
> > >
> > > On Tue, Nov 30, 2021 at 2:51 PM Jun Rao <ju...@confluent.io.invalid>
> > wrote:
> > >
> > > > Hi, David,
> > > >
> > > > Thanks for the reply. Just one more minor comment.
> > > >
> > > > 30. ./kafka-features.sh upgrade: It seems that the release param is
> > > > optional. Could you describe the semantic when release is not
> > specified?
> > > >
> > > > Jun
> > > >
> > > > On Mon, Nov 29, 2021 at 5:06 PM David Arthur
> > > > <da...@confluent.io.invalid> wrote:
> > > >
> > > > > Jun, I updated the KIP with the "disable" CLI.
> > > > >
> > > > > For 16, I think you're asking how we can safely introduce the
> > > > > initial version of other feature flags in the future. This will
> > > probably
> > > > > depend on the nature of the feature that the new flag is gating. We
> > can
> > > > > probably take a similar approach and say version 1 is backwards
> > > > compatible
> > > > > to some point (possibly an IBP or metadata.version?).
> > > > >
> > > > > -David
> > > > >
> > > > > On Fri, Nov 19, 2021 at 3:00 PM Jun Rao <ju...@confluent.io.invalid>
> > > > wrote:
> > > > >
> > > > > > Hi, David,
> > > > > >
> > > > > > Thanks for the reply. The new CLI sounds reasonable to me.
> > > > > >
> > > > > > 16.
> > > > > > For case C, choosing the latest version sounds good to me.
> > > > > > For case B, for metadata.version, we pick version 1 since it just
> > > > happens
> > > > > > that for metadata.version version 1 is backward compatible. How
> do
> > we
> > > > > make
> > > > > > this more general for other features?
> > > > > >
> > > > > > 21. Do you still plan to change "delete" to "disable" in the CLI?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Nov 18, 2021 at 11:50 AM David Arthur
> > > > > > <da...@confluent.io.invalid> wrote:
> > > > > >
> > > > > > > Jun,
> > > > > > >
> > > > > > > The KIP has some changes to the CLI for KIP-584. With Jason's
> > > > > suggestion
> > > > > > > incorporated, these three commands would look like:
> > > > > > >
> > > > > > > 1) kafka-features.sh upgrade --release latest
> > > > > > > upgrades all known features to their defaults in the latest
> > release
> > > > > > >
> > > > > > > 2) kafka-features.sh downgrade --release 3.x
> > > > > > > downgrade all known features to the default versions from 3.x
> > > > > > >
> > > > > > > 3) kafka-features.sh describe --release latest
> > > > > > > Describe the known features of the latest release
> > > > > > >
> > > > > > > The --upgrade-all and --downgrade-all are replaced by specific
> > each
> > > > > > > feature+version or giving the --release argument. One problem
> > with
> > > > > > > --downgrade-all is it's not clear what the target versions
> should
> > > be:
> > > > > the
> > > > > > > previous version before the last upgrade -- or the lowest
> > supported
> > > > > > > version. Since downgrades will be less common, I think it's
> > better
> > > to
> > > > > > make
> > > > > > > the operator be more explicit about the desired downgrade
> version
> > > > > (either
> > > > > > > through [--feature X --version Y] or [--release 3.1]). Does
> that
> > > seem
> > > > > > > reasonable?
> > > > > > >
> > > > > > > 16. For case C, I think we will want to always use the latest
> > > > > > > metadata.version for new clusters (like we do for IBP). We can
> > > always
> > > > > > > change what we mean by "default" down the road. Also, this will
> > > > likely
> > > > > > > become a bit of information we include in release and upgrade
> > notes
> > > > > with
> > > > > > > each release.
> > > > > > >
> > > > > > > -David
> > > > > > >
> > > > > > > On Thu, Nov 18, 2021 at 1:14 PM Jun Rao
> <jun@confluent.io.invalid
> > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi, Jason, David,
> > > > > > > >
> > > > > > > > Just to clarify on the interaction with the end user, the
> > design
> > > in
> > > > > > > KIP-584
> > > > > > > > allows one to do the following.
> > > > > > > > (1) kafka-features.sh  --upgrade-all --bootstrap-server
> > > > > > > > kafka-broker0.prn1:9071 to upgrade all features to the latest
> > > > version
> > > > > > > known
> > > > > > > > by the tool. The user doesn't need to know a specific feature
> > > > > version.
> > > > > > > > (2) kafka-features.sh  --downgrade-all --bootstrap-server
> > > > > > > > kafka-broker0.prn1:9071 to downgrade all features to the
> latest
> > > > > version
> > > > > > > > known by the tool. The user doesn't need to know a specific
> > > feature
> > > > > > > > version.
> > > > > > > > (3) kafka-features.sh  --describe --bootstrap-server
> > > > > > > > kafka-broker0.prn1:9071 to find out the supported version for
> > > each
> > > > > > > feature.
> > > > > > > > This allows the user to upgrade each feature individually.
> > > > > > > >
> > > > > > > > Most users will be doing (1) and occasionally (2), and won't
> > need
> > > > to
> > > > > > know
> > > > > > > > the exact version of each feature.
> > > > > > > >
> > > > > > > > 16. For case C, what's the default version? Is it always the
> > > > latest?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Nov 18, 2021 at 8:15 AM David Arthur
> > > > > > > > <da...@confluent.io.invalid> wrote:
> > > > > > > >
> > > > > > > > > Colin, thanks for the detailed response. I understand what
> > > you're
> > > > > > > saying
> > > > > > > > > and I agree with your rationale.
> > > > > > > > >
> > > > > > > > > It seems like we could just initialize their
> cluster.metadata
> > > to
> > > > 1
> > > > > > when
> > > > > > > > the
> > > > > > > > > > software is upgraded to 3.2.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Concretely, this means the controller would see that there
> is
> > > no
> > > > > > > > > FeatureLevelRecord in the log, and so it would bootstrap
> one.
> > > > > > Normally
> > > > > > > > for
> > > > > > > > > cluster initialization, this would be read from
> > > meta.properties,
> > > > > but
> > > > > > in
> > > > > > > > the
> > > > > > > > > case of preview clusters we would need to special case it
> to
> > > > > > initialize
> > > > > > > > the
> > > > > > > > > version to 1.
> > > > > > > > >
> > > > > > > > > Once the new FeatureLevelRecord has been committed, any
> nodes
> > > > > > (brokers
> > > > > > > or
> > > > > > > > > controllers) that are running the latest software will
> react
> > to
> > > > the
> > > > > > new
> > > > > > > > > metadata.version. This means we will need to make this
> > initial
> > > > > > version
> > > > > > > > of 1
> > > > > > > > > be backwards compatible to what we have in 3.0 and 3.1
> (since
> > > > some
> > > > > > > > brokers
> > > > > > > > > will be on the new software and some on older/preview
> > versions)
> > > > > > > > >
> > > > > > > > > I agree that auto-upgrading preview users from IBP 3.0 to
> > > > > > > > metadata.version
> > > > > > > > > 1 (equivalent to IBP 3.2) is probably fine.
> > > > > > > > >
> > > > > > > > > Back to Jun's case B:
> > > > > > > > >
> > > > > > > > > b. We upgrade from an old version where no metadata.version
> > has
> > > > > been
> > > > > > > > > > finalized. In this case, it makes sense to leave
> > > > metadata.version
> > > > > > > > > disabled
> > > > > > > > > > since we don't know if all brokers have been upgraded.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Instead of leaving it uninitialized, we would initialize it
> > to
> > > 1
> > > > > > which
> > > > > > > > > would be backwards compatible to 3.0 and 3.1. This would
> > > > eliminate
> > > > > > the
> > > > > > > > need
> > > > > > > > > to check a "final IBP" or deal with 3.2+ clusters without a
> > > > > > > > > metadata.version set. The downgrade path for 3.2 back to a
> > > > preview
> > > > > > > > release
> > > > > > > > > should be fine since we are saying that metadata.version 1
> is
> > > > > > > compatible
> > > > > > > > > with those releases. The FeatureLevelRecord would exist,
> but
> > it
> > > > > would
> > > > > > > be
> > > > > > > > > ignored (we might need to make sure this actually works in
> > > 3.0).
> > > > > > > > >
> > > > > > > > > For clarity, here are the three workflows of Jun's three
> > cases:
> > > > > > > > >
> > > > > > > > > A (KRaft 3.2+ to KRaft 3.2+):
> > > > > > > > > * User initializes cluster with an explicit
> metadata.version
> > X,
> > > > or
> > > > > > the
> > > > > > > > tool
> > > > > > > > > selects the default
> > > > > > > > > * After upgrade, cluster stays at version X until operator
> > > > upgrades
> > > > > > to
> > > > > > > Y
> > > > > > > > >
> > > > > > > > > B (KRaft Preview to KRaft 3.2+):
> > > > > > > > > * User initializes cluster without metadata.version
> > > > > > > > > * After controller leader is upgraded, initializes
> > > > metadata.version
> > > > > > to
> > > > > > > 1
> > > > > > > > > * After the cluster is upgraded, an operator may further
> > > upgrade
> > > > to
> > > > > > > > version
> > > > > > > > > Y
> > > > > > > > >
> > > > > > > > > C (New KRaft 3.2+):
> > > > > > > > > * User initializes cluster with an explicit
> metadata.version
> > X,
> > > > or
> > > > > > the
> > > > > > > > tool
> > > > > > > > > selects the default
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > This has been mentioned in this thread, but we should
> > > explicitly
> > > > > call
> > > > > > > out
> > > > > > > > > that the absence of metadata.version in meta.properties
> will
> > be
> > > > > used
> > > > > > > > > to identify that we are in case B. After 3.2, we will
> require
> > > > that
> > > > > > > > > metadata.version is present in meta.properties for new
> > > clusters.
> > > > If
> > > > > > > users
> > > > > > > > > omit the --metadata-version flag from kafka-storage.sh, we
> > > should
> > > > > > fill
> > > > > > > > in a
> > > > > > > > > default.
> > > > > > > > >
> > > > > > > > > So how about the following changes/clarifications to the
> KIP:
> > > > > > > > >
> > > > > > > > > * Starting with 3.2, metadata.version is required in KRaft,
> > IBP
> > > > is
> > > > > > > > ignored
> > > > > > > > > * If meta.properties does not include metadata.version, it
> > > > > indicates
> > > > > > > this
> > > > > > > > > cluster was initialized with a preview release
> > > > > > > > > * If upgrading from a preview release to 3.2+, the
> controller
> > > > will
> > > > > > > > > initialize metadata.version=1
> > > > > > > > > * metadata.version=1 is backwards compatible with preview
> > > > releases
> > > > > > > > >
> > > > > > > > > -David
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Nov 18, 2021 at 10:02 AM David Arthur <
> > > > > > > david.arthur@confluent.io
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Jason,
> > > > > > > > > >
> > > > > > > > > > 1/2. You've got it right. The intention is that
> > > > metadata.version
> > > > > > will
> > > > > > > > > gate
> > > > > > > > > > both RPCs (like IBP does) as well as metadata records.
> So,
> > > > when a
> > > > > > > > broker
> > > > > > > > > > sees that metadata.version changed, it may start
> > advertising
> > > > new
> > > > > > RPCs
> > > > > > > > and
> > > > > > > > > > it will need to refresh the ApiVersions it has for other
> > > > > brokers. I
> > > > > > > can
> > > > > > > > > try
> > > > > > > > > > to make this more explicit in the KIP.
> > > > > > > > > >
> > > > > > > > > > 3. Yes, that's the intention of the --dry-run flag. The
> > > current
> > > > > > > > > > implementation in kafka-features.sh does a dry run on the
> > > > client
> > > > > > > side,
> > > > > > > > > but
> > > > > > > > > > this KIP pushes the validation down to the controller.
> This
> > > > will
> > > > > > > allow
> > > > > > > > us
> > > > > > > > > > to have the context needed to do proper validation
> > > > > > > > > >
> > > > > > > > > > Re: version number complexity -- yes this has come up in
> > > > offline
> > > > > > > > > > discussions. With just one feature flag to manage, it's
> not
> > > so
> > > > > bad,
> > > > > > > but
> > > > > > > > > > explicitly managing even a few would be a burden. I like
> > your
> > > > > > > > suggestion
> > > > > > > > > of
> > > > > > > > > > the "--release" flag. That could act as a sort of
> manifest
> > of
> > > > > > > versions
> > > > > > > > > > (i.e., the defaults from that release). We could also
> > support
> > > > > > > something
> > > > > > > > > > like "kafka-features upgrade --release latest" to bring
> > > > > everything
> > > > > > to
> > > > > > > > the
> > > > > > > > > > highest supported version.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Nov 17, 2021 at 9:36 PM Jason Gustafson
> > > > > > > > > <ja...@confluent.io.invalid>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi Colin/David,
> > > > > > > > > >>
> > > > > > > > > >> > Like David said, basically it boils down to creating a
> > > > feature
> > > > > > > flag
> > > > > > > > > for
> > > > > > > > > >> the new proposed __consumer_offsets version, or using a
> > new
> > > > > > > > > >> IBP/metadata.version for it. Both approaches have pros
> and
> > > > cons.
> > > > > > > Using
> > > > > > > > > an
> > > > > > > > > >> IBP/metadata.version bump reduces the size of the
> testing
> > > > > matrix.
> > > > > > > But
> > > > > > > > > >> using
> > > > > > > > > >> a feature flag allows people to avoid any bugs or pain
> > > > > associated
> > > > > > > with
> > > > > > > > > the
> > > > > > > > > >> change if they don't care about enabling it. This is
> > > basically
> > > > > the
> > > > > > > > > classic
> > > > > > > > > >> "should I use a feature flag or not?" discussion and we
> > need
> > > > to
> > > > > > have
> > > > > > > > it
> > > > > > > > > on
> > > > > > > > > >> a case-by-case basis.
> > > > > > > > > >>
> > > > > > > > > >> I think most users are not going to care to manage
> > versions
> > > > for
> > > > > a
> > > > > > > > bunch
> > > > > > > > > of
> > > > > > > > > >> different features. The IBP today has many shortcomings,
> > but
> > > > at
> > > > > > > least
> > > > > > > > > it's
> > > > > > > > > >> tied to a version that users understand (i.e. the
> release
> > > > > > version).
> > > > > > > > How
> > > > > > > > > >> would users know after upgrading to Kafka 3.1, for
> > example,
> > > > that
> > > > > > > they
> > > > > > > > > need
> > > > > > > > > >> to upgrade the metadata.version to 3  and
> offsets.version
> > > to 4
> > > > > (or
> > > > > > > > > >> whatever)? It's a lot of overhead trying to understand
> all
> > > of
> > > > > the
> > > > > > > > > >> potential
> > > > > > > > > >> features and what each upgrade actually means to them. I
> > am
> > > > > > > wondering
> > > > > > > > if
> > > > > > > > > >> we
> > > > > > > > > >> could give them something more convenient which is tied
> to
> > > the
> > > > > > > release
> > > > > > > > > >> version. For example, maybe we could use a command like
> > > > > > > > `kafka-features
> > > > > > > > > >> upgrade --release 3.1`, which the broker would then
> > > translate
> > > > to
> > > > > > an
> > > > > > > > > >> upgrade
> > > > > > > > > >> to the latest versions of the individual features at the
> > > time
> > > > of
> > > > > > the
> > > > > > > > 3.1
> > > > > > > > > >> release. Basically it's just a static map from release
> > > version
> > > > > to
> > > > > > > > > feature
> > > > > > > > > >> versions to make the upgrade process more convenient.
> > > > > > > > > >>
> > > > > > > > > >> Thanks,
> > > > > > > > > >> Jason
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> On Wed, Nov 17, 2021 at 6:20 PM Jason Gustafson <
> > > > > > jason@confluent.io
> > > > > > > >
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > A few additional questions:
> > > > > > > > > >> >
> > > > > > > > > >> > 1. Currently the IBP tells us what version of
> individual
> > > > > > > > inter-broker
> > > > > > > > > >> RPCs
> > > > > > > > > >> > will be used. I think the plan in this KIP is to use
> > > > > ApiVersions
> > > > > > > > > request
> > > > > > > > > >> > instead to find the highest compatible version (just
> > like
> > > > > > > clients).
> > > > > > > > > Do I
> > > > > > > > > >> > have that right?
> > > > > > > > > >> >
> > > > > > > > > >> > 2. The following wasn't very clear to me:
> > > > > > > > > >> >
> > > > > > > > > >> > > Brokers will be able to observe changes to
> > > > metadata.version
> > > > > by
> > > > > > > > > >> observing
> > > > > > > > > >> > the metadata log, and could then submit a new
> > > > > ApiVersionsRequest
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > >> > other Kafka nodes.
> > > > > > > > > >> >
> > > > > > > > > >> > Is the purpose of submitting new ApiVersions requests
> to
> > > > > update
> > > > > > > the
> > > > > > > > > >> > features or the RPC versions? Does metadata.version
> also
> > > > > > influence
> > > > > > > > the
> > > > > > > > > >> > versions that a broker advertises? It would help to
> have
> > > > more
> > > > > > > detail
> > > > > > > > > >> about
> > > > > > > > > >> > this.
> > > > > > > > > >> >
> > > > > > > > > >> > 3. I imagine users will want to know before performing
> > an
> > > > > > upgrade
> > > > > > > > > >> whether
> > > > > > > > > >> > downgrading will be safe. Would the --dry-run flag
> tell
> > > them
> > > > > > this?
> > > > > > > > > >> >
> > > > > > > > > >> > Thanks,
> > > > > > > > > >> > Jason
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > On Wed, Nov 17, 2021 at 3:55 PM Colin McCabe <
> > > > > > cmccabe@apache.org>
> > > > > > > > > >> wrote:
> > > > > > > > > >> >
> > > > > > > > > >> >> On Wed, Nov 17, 2021, at 11:28, Jason Gustafson
> wrote:
> > > > > > > > > >> >> > Hi David,
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > Forgive me if this ground has been covered already.
> > > > Today,
> > > > > we
> > > > > > > > have
> > > > > > > > > a
> > > > > > > > > >> few
> > > > > > > > > >> >> > other things that we have latched onto the IBP,
> such
> > as
> > > > > > > upgrades
> > > > > > > > to
> > > > > > > > > >> the
> > > > > > > > > >> >> > format of records in __consumer_offsets. I've been
> > > > assuming
> > > > > > > that
> > > > > > > > > >> >> > metadata.version is not covering this. Is that
> right
> > or
> > > > is
> > > > > > > there
> > > > > > > > > some
> > > > > > > > > >> >> other
> > > > > > > > > >> >> > plan to take care of cases like this?
> > > > > > > > > >> >> >
> > > > > > > > > >> >>
> > > > > > > > > >> >> I think metadata.version could cover changes to
> things
> > > like
> > > > > > > > > >> >> __consumer_offsets, if people want it to. Or to put
> it
> > > > > another
> > > > > > > way,
> > > > > > > > > >> that is
> > > > > > > > > >> >> out of scope for this KIP.
> > > > > > > > > >> >>
> > > > > > > > > >> >> Like David said, basically it boils down to creating
> a
> > > > > feature
> > > > > > > flag
> > > > > > > > > for
> > > > > > > > > >> >> the new proposed __consumer_offsets version, or
> using a
> > > new
> > > > > > > > > >> >> IBP/metadata.version for it. Both approaches have
> pros
> > > and
> > > > > > cons.
> > > > > > > > > Using
> > > > > > > > > >> an
> > > > > > > > > >> >> IBP/metadata.version bump reduces the size of the
> > testing
> > > > > > matrix.
> > > > > > > > But
> > > > > > > > > >> using
> > > > > > > > > >> >> a feature flag allows people to avoid any bugs or
> pain
> > > > > > associated
> > > > > > > > > with
> > > > > > > > > >> the
> > > > > > > > > >> >> change if they don't care about enabling it. This is
> > > > > basically
> > > > > > > the
> > > > > > > > > >> classic
> > > > > > > > > >> >> "should I use a feature flag or not?" discussion and
> we
> > > > need
> > > > > to
> > > > > > > > have
> > > > > > > > > >> it on
> > > > > > > > > >> >> a case-by-case basis.
> > > > > > > > > >> >>
> > > > > > > > > >> >> I think it's worth calling out that having a 1:1
> > mapping
> > > > > > between
> > > > > > > > IBP
> > > > > > > > > >> >> versions and metadata.versions will result in some
> > > > > > > > metadata.versions
> > > > > > > > > >> that
> > > > > > > > > >> >> "don't do anything" (aka they do the same thing as
> the
> > > > > previous
> > > > > > > > > >> >> metadata.version). For example, if we change
> > > > > StopReplicaRequest
> > > > > > > > > again,
> > > > > > > > > >> that
> > > > > > > > > >> >> will not affect KRaft mode, but probably would
> require
> > an
> > > > IBP
> > > > > > > bump
> > > > > > > > > and
> > > > > > > > > >> >> hence a metadata.version bump. I think that's OK.
> It's
> > > not
> > > > > that
> > > > > > > > > >> different
> > > > > > > > > >> >> from updating your IBP and getting support for
> > ZStandard,
> > > > > when
> > > > > > > your
> > > > > > > > > >> >> deployment doesn't use ZStandard compression.
> > > > > > > > > >> >>
> > > > > > > > > >> >> best,
> > > > > > > > > >> >> Colin
> > > > > > > > > >> >>
> > > > > > > > > >> >> > Thanks,
> > > > > > > > > >> >> > Jason
> > > > > > > > > >> >> >
> > > > > > > > > >> >> >
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > On Wed, Nov 17, 2021 at 10:17 AM Jun Rao
> > > > > > > > <jun@confluent.io.invalid
> > > > > > > > > >
> > > > > > > > > >> >> wrote:
> > > > > > > > > >> >> >
> > > > > > > > > >> >> >> Hi, Colin,
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >> Thanks for the reply.
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >> For case b, I am not sure that I understand your
> > > > > suggestion.
> > > > > > > > Does
> > > > > > > > > >> "each
> > > > > > > > > >> >> >> subsequent level for metadata.version corresponds
> to
> > > an
> > > > > IBP
> > > > > > > > > version"
> > > > > > > > > >> >> mean
> > > > > > > > > >> >> >> that we need to keep IBP forever? Could you
> describe
> > > the
> > > > > > > upgrade
> > > > > > > > > >> >> process in
> > > > > > > > > >> >> >> this case?
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >> Jun
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >> On Tue, Nov 16, 2021 at 3:45 PM Colin McCabe <
> > > > > > > > cmccabe@apache.org>
> > > > > > > > > >> >> wrote:
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >> > On Tue, Nov 16, 2021, at 15:13, Jun Rao wrote:
> > > > > > > > > >> >> >> > > Hi, David, Colin,
> > > > > > > > > >> >> >> > >
> > > > > > > > > >> >> >> > > Thanks for the reply.
> > > > > > > > > >> >> >> > >
> > > > > > > > > >> >> >> > > 16. Discussed with David offline a bit. We
> have
> > 3
> > > > > cases.
> > > > > > > > > >> >> >> > > a. We upgrade from an old version where the
> > > > > > > metadata.version
> > > > > > > > > has
> > > > > > > > > >> >> >> already
> > > > > > > > > >> >> >> > > been finalized. In this case it makes sense to
> > > stay
> > > > > with
> > > > > > > > that
> > > > > > > > > >> >> feature
> > > > > > > > > >> >> >> > > version after the upgrade.
> > > > > > > > > >> >> >> >
> > > > > > > > > >> >> >> > +1
> > > > > > > > > >> >> >> >
> > > > > > > > > >> >> >> > > b. We upgrade from an old version where no
> > > > > > > metadata.version
> > > > > > > > > has
> > > > > > > > > >> >> been
> > > > > > > > > >> >> >> > > finalized. In this case, it makes sense to
> leave
> > > > > > > > > >> metadata.version
> > > > > > > > > >> >> >> > disabled
> > > > > > > > > >> >> >> > > since we don't know if all brokers have been
> > > > upgraded.
> > > > > > > > > >> >> >> >
> > > > > > > > > >> >> >> > This is the scenario I was hoping to avoid by
> > saying
> > > > > that
> > > > > > > ALL
> > > > > > > > > >> KRaft
> > > > > > > > > >> >> >> > clusters have metadata.version of at least 1,
> and
> > > each
> > > > > > > > > subsequent
> > > > > > > > > >> >> level
> > > > > > > > > >> >> >> for
> > > > > > > > > >> >> >> > metadata.version corresponds to an IBP version.
> > The
> > > > > > existing
> > > > > > > > > KRaft
> > > > > > > > > >> >> >> clusters
> > > > > > > > > >> >> >> > in 3.0 and earlier are preview (not for
> > production)
> > > > so I
> > > > > > > think
> > > > > > > > > >> this
> > > > > > > > > >> >> >> change
> > > > > > > > > >> >> >> > is OK for 3.x (given that it affects only
> KRaft).
> > > Then
> > > > > IBP
> > > > > > > is
> > > > > > > > > >> >> irrelevant
> > > > > > > > > >> >> >> > for KRaft clusters (the config is ignored,
> > possibly
> > > > > with a
> > > > > > > > WARN
> > > > > > > > > or
> > > > > > > > > >> >> ERROR
> > > > > > > > > >> >> >> > message generated if it is set).
> > > > > > > > > >> >> >> >
> > > > > > > > > >> >> >> > > c. We are starting from a brand new cluster
> and
> > of
> > > > > > course
> > > > > > > no
> > > > > > > > > >> >> >> > > metadata.version has been finalized. In this
> > case,
> > > > the
> > > > > > KIP
> > > > > > > > > says
> > > > > > > > > >> it
> > > > > > > > > >> >> will
> > > > > > > > > >> >> >> > > pick the metadata.version in meta.properties.
> In
> > > the
> > > > > > > common
> > > > > > > > > >> case,
> > > > > > > > > >> >> >> people
> > > > > > > > > >> >> >> > > probably won't set the metadata.version in the
> > > > > > > > meta.properties
> > > > > > > > > >> file
> > > > > > > > > >> >> >> > > explicitly. So, it will be useful to put a
> > default
> > > > > > > (stable)
> > > > > > > > > >> version
> > > > > > > > > >> >> >> there
> > > > > > > > > >> >> >> > > when the meta.properties.
> > > > > > > > > >> >> >> >
> > > > > > > > > >> >> >> > Hmm. I was assuming that clusters where the
> admin
> > > > didn't
> > > > > > > > specify
> > > > > > > > > >> any
> > > > > > > > > >> >> >> > metadata.version during formatting would get the
> > > > latest
> > > > > > > > > >> >> metadata.version.
> > > > > > > > > >> >> >> > Partly, because this is what we do for IBP
> today.
> > It
> > > > > would
> > > > > > > be
> > > > > > > > > >> good to
> > > > > > > > > >> >> >> > clarify this...
> > > > > > > > > >> >> >> >
> > > > > > > > > >> >> >> > >
> > > > > > > > > >> >> >> > > Also, it would be useful to clarify that if a
> > > > > > > > > FeatureLevelRecord
> > > > > > > > > >> >> exists
> > > > > > > > > >> >> >> > for
> > > > > > > > > >> >> >> > > metadata.version, the metadata.version in
> > > > > > meta.properties
> > > > > > > > will
> > > > > > > > > >> be
> > > > > > > > > >> >> >> > ignored.
> > > > > > > > > >> >> >> > >
> > > > > > > > > >> >> >> >
> > > > > > > > > >> >> >> > Yeah, I agree.
> > > > > > > > > >> >> >> >
> > > > > > > > > >> >> >> > best,
> > > > > > > > > >> >> >> > Colin
> > > > > > > > > >> >> >> >
> > > > > > > > > >> >> >> > > Thanks,
> > > > > > > > > >> >> >> > >
> > > > > > > > > >> >> >> > > Jun
> > > > > > > > > >> >> >> > >
> > > > > > > > > >> >> >> > >
> > > > > > > > > >> >> >> > > On Tue, Nov 16, 2021 at 12:39 PM Colin McCabe
> <
> > > > > > > > > >> cmccabe@apache.org>
> > > > > > > > > >> >> >> > wrote:
> > > > > > > > > >> >> >> > >
> > > > > > > > > >> >> >> > >> On Fri, Nov 5, 2021, at 15:18, Jun Rao wrote:
> > > > > > > > > >> >> >> > >> > Hi, David,
> > > > > > > > > >> >> >> > >> >
> > > > > > > > > >> >> >> > >> > Thanks for the reply.
> > > > > > > > > >> >> >> > >> >
> > > > > > > > > >> >> >> > >> > 16. My first concern is that the KIP picks
> up
> > > > > > > > meta.version
> > > > > > > > > >> >> >> > inconsistently
> > > > > > > > > >> >> >> > >> > during the deployment. If a new cluster is
> > > > started,
> > > > > > we
> > > > > > > > pick
> > > > > > > > > >> up
> > > > > > > > > >> >> the
> > > > > > > > > >> >> >> > >> highest
> > > > > > > > > >> >> >> > >> > version. If we upgrade, we leave the
> feature
> > > > > version
> > > > > > > > > >> unchanged.
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> Hi Jun,
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> Thanks again for taking a look.
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> The proposed behavior in KIP-778 is
> consistent
> > > with
> > > > > how
> > > > > > > it
> > > > > > > > > >> works
> > > > > > > > > >> >> >> today.
> > > > > > > > > >> >> >> > >> Upgrading the software is distinct from
> > upgrading
> > > > the
> > > > > > > IBP.
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> I think it is important to keep these two
> > > > operations
> > > > > > > > > >> ("upgrading
> > > > > > > > > >> >> >> > >> IBP/metadata version" and "upgrading software
> > > > > version")
> > > > > > > > > >> separate.
> > > > > > > > > >> >> If
> > > > > > > > > >> >> >> > they
> > > > > > > > > >> >> >> > >> are coupled it will create a situation where
> > > > software
> > > > > > > > > upgrades
> > > > > > > > > >> are
> > > > > > > > > >> >> >> > >> difficult and dangerous.
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> Consider a situation where you find some bug
> in
> > > > your
> > > > > > > > current
> > > > > > > > > >> >> software,
> > > > > > > > > >> >> >> > and
> > > > > > > > > >> >> >> > >> you want to upgrade to new software that
> fixes
> > > the
> > > > > bug.
> > > > > > > If
> > > > > > > > > >> >> upgrades
> > > > > > > > > >> >> >> and
> > > > > > > > > >> >> >> > IBP
> > > > > > > > > >> >> >> > >> bumps are coupled, you can't do this without
> > also
> > > > > > bumping
> > > > > > > > the
> > > > > > > > > >> IBP,
> > > > > > > > > >> >> >> > which is
> > > > > > > > > >> >> >> > >> usually considered a high-risk change. That
> > means
> > > > > that
> > > > > > > > either
> > > > > > > > > >> you
> > > > > > > > > >> >> have
> > > > > > > > > >> >> >> > to
> > > > > > > > > >> >> >> > >> make a special build that includes only the
> fix
> > > > > > > > > (time-consuming
> > > > > > > > > >> >> and
> > > > > > > > > >> >> >> > >> error-prone), live with the bug for longer,
> or
> > be
> > > > > very
> > > > > > > > > >> >> conservative
> > > > > > > > > >> >> >> > about
> > > > > > > > > >> >> >> > >> ever introducing new IBP/metadata versions.
> > None
> > > of
> > > > > > those
> > > > > > > > are
> > > > > > > > > >> >> really
> > > > > > > > > >> >> >> > good
> > > > > > > > > >> >> >> > >> choices.
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> > Intuitively, it seems that independent of
> > how a
> > > > > > cluster
> > > > > > > > is
> > > > > > > > > >> >> deployed,
> > > > > > > > > >> >> >> > we
> > > > > > > > > >> >> >> > >> > should always pick the same feature
> version.
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> I think it makes sense to draw a distinction
> > > > between
> > > > > > > > > upgrading
> > > > > > > > > >> an
> > > > > > > > > >> >> >> > existing
> > > > > > > > > >> >> >> > >> cluster and deploying a new one. What most
> > people
> > > > > want
> > > > > > > out
> > > > > > > > of
> > > > > > > > > >> >> upgrades
> > > > > > > > > >> >> >> > is
> > > > > > > > > >> >> >> > >> that things should keep working, but with bug
> > > > fixes.
> > > > > If
> > > > > > > we
> > > > > > > > > >> change
> > > > > > > > > >> >> >> that,
> > > > > > > > > >> >> >> > it
> > > > > > > > > >> >> >> > >> just makes people more reluctant to upgrade
> > > (which
> > > > is
> > > > > > > > always
> > > > > > > > > a
> > > > > > > > > >> >> >> > problem...)
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> > I think we need to think this through in this KIP. My second concern
> > > > > > > > > >> >> >> > >> > is that as a particular version matures, it's inconvenient for a user
> > > > > > > > > >> >> >> > >> > to manually upgrade every feature version. As long as we have a path
> > > > > > > > > >> >> >> > >> > to achieve that in the future, we don't need to address that in this
> > > > > > > > > >> >> >> > >> > KIP.
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> If people are managing a large number of Kafka clusters, they will
> > > > > > > > > >> >> >> > >> want to do some sort of A/B testing with IBP/metadata versions. So if
> > > > > > > > > >> >> >> > >> you have 1000 Kafka clusters, you roll out the new IBP version to 10
> > > > > > > > > >> >> >> > >> of them and see how it goes. If that goes well, you roll it out to
> > > > > > > > > >> >> >> > >> more, etc.
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> So, the automation needs to be at the cluster management layer, not at
> > > > > > > > > >> >> >> > >> the Kafka layer. Each Kafka cluster doesn't know how well things went
> > > > > > > > > >> >> >> > >> in the other 999 clusters. Automatically upgrading is a bad thing for
> > > > > > > > > >> >> >> > >> the same reason Kafka automatically upgrading its own software version
> > > > > > > > > >> >> >> > >> would be a bad thing -- it could lead to a disruption to a sensitive
> > > > > > > > > >> >> >> > >> cluster at the wrong time.
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> For people who are just managing one or two Kafka clusters, upgrading
> > > > > > > > > >> >> >> > >> feature versions isn't a big burden and can be done manually. This is
> > > > > > > > > >> >> >> > >> all consistent with how IBP works today.
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> Also, we already have a command-line option to the feature tool which
> > > > > > > > > >> >> >> > >> upgrades all features to the latest available, if that is what the
> > > > > > > > > >> >> >> > >> administrator desires. Perhaps we could add documentation saying that
> > > > > > > > > >> >> >> > >> this should be done as the last step of the upgrade.
> > > > > > > > > >> >> >> > >>
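For reference, "upgrade all features to the latest available" boils down to roughly the
following against the KIP-584 Admin API. This is only a sketch, not the tool's actual
implementation: the bootstrap address is a placeholder and the two-argument FeatureUpdate
constructor is the original KIP-584 one, which may differ in newer releases.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.FeatureMetadata;
    import org.apache.kafka.clients.admin.FeatureUpdate;
    import org.apache.kafka.clients.admin.FinalizedVersionRange;
    import org.apache.kafka.clients.admin.UpdateFeaturesOptions;

    public class UpgradeAllFeatures {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder
            try (Admin admin = Admin.create(props)) {
                // Ask the cluster which features the server supports and which
                // levels are currently finalized.
                FeatureMetadata metadata = admin.describeFeatures().featureMetadata().get();
                Map<String, FeatureUpdate> updates = new HashMap<>();
                metadata.supportedFeatures().forEach((feature, supported) -> {
                    FinalizedVersionRange finalized = metadata.finalizedFeatures().get(feature);
                    short target = supported.maxVersion();
                    // Skip features already finalized at the highest supported level.
                    if (finalized == null || finalized.maxVersionLevel() < target) {
                        updates.put(feature, new FeatureUpdate(target, false));
                    }
                });
                if (!updates.isEmpty()) {
                    admin.updateFeatures(updates, new UpdateFeaturesOptions()).all().get();
                }
            }
        }
    }
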
> > > > > > > > > >> >> >> > >> best,
> > > > > > > > > >> >> >> > >> Colin
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> > >> >
> > > > > > > > > >> >> >> > >> > 21. "./kafka-features.sh delete": Deleting a feature seems a bit weird
> > > > > > > > > >> >> >> > >> > since the logic is always there. Would it be better to use disable?
> > > > > > > > > >> >> >> > >> >
> > > > > > > > > >> >> >> > >> > Jun
> > > > > > > > > >> >> >> > >> >
> > > > > > > > > >> >> >> > >> > On Fri, Nov 5, 2021 at 8:11 AM David Arthur
> > > > > > > > > >> >> >> > >> > <da...@confluent.io.invalid> wrote:
> > > > > > > > > >> >> >> > >> >
> > > > > > > > > >> >> >> > >> >> Colin and Jun, thanks for the additional comments!
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> Colin:
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> > We've been talking about having an automated RPC compatibility checker
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> Do we have a way to mark fields in schemas as deprecated? It can stay
> > > > > > > > > >> >> >> > >> >> in the RPC, it just complicates the logic a bit.
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> > It would be nice if the active controller could validate that a
> > > > > > > > > >> >> >> > >> >> > majority of the quorum could use the proposed metadata.version. The
> > > > > > > > > >> >> >> > >> >> > active controller should have this information, right? If we don't
> > > > > > > > > >> >> >> > >> >> > have recent information from a quorum of voters, we wouldn't be
> > > > > > > > > >> >> >> > >> >> > active.
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> I believe we should have this information from the ApiVersionsResponse.
> > > > > > > > > >> >> >> > >> >> It would be good to do this validation to avoid a situation where a
> > > > > > > > > >> >> >> > >> >> quorum leader can't be elected due to unprocessable records.
> > > > > > > > > >> >> >> > >> >>
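The check being described could look roughly like this. The helper below is hypothetical,
self-contained code rather than actual controller logic; the per-voter ranges are assumed
to come from cached ApiVersions data.

    import java.util.Map;

    // Hypothetical helper: decide whether a proposed metadata.version is safe to
    // commit given what each quorum voter has reported through ApiVersions.
    public final class QuorumVersionCheck {

        public record VersionRange(short min, short max) {
            boolean contains(short v) { return v >= min && v <= max; }
        }

        // voterRanges: voter id -> supported metadata.version range
        public static boolean majoritySupports(Map<Integer, VersionRange> voterRanges,
                                               int quorumSize,
                                               short proposedVersion) {
            long supporters = voterRanges.values().stream()
                .filter(range -> range.contains(proposedVersion))
                .count();
            // Refuse the bump unless a majority of voters could still replay
            // records written at the new version.
            return supporters > quorumSize / 2;
        }
    }
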
> > > > > > > > > >> >> >> > >> >> > Do we need delete as a command separate from downgrade?
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> I think from an operator's perspective, it is nice to distinguish
> > > > > > > > > >> >> >> > >> >> between changing a feature flag and unsetting it. It might be
> > > > > > > > > >> >> >> > >> >> surprising to an operator to see the flag's version set to nothing
> > > > > > > > > >> >> >> > >> >> when they requested the downgrade to version 0 (or less).
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> > it seems like we should spell out that metadata.version begins at 1
> > > > > > > > > >> >> >> > >> >> > in KRaft clusters
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> I added this text:
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> > Introduce an IBP version to indicate the lowest software version that
> > > > > > > > > >> >> >> > >> >> > supports *metadata.version*. Below this IBP, the *metadata.version* is
> > > > > > > > > >> >> >> > >> >> > undefined and will not be examined. At or above this IBP, the
> > > > > > > > > >> >> >> > >> >> > *metadata.version* must be *0* for ZooKeeper clusters and will be
> > > > > > > > > >> >> >> > >> >> > initialized as *1* for KRaft clusters.
> > > > > > > > > >> >> >> > >> >>
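Stated as code, the rule quoted above amounts to something like the following (a
hypothetical helper, purely to restate the intent):

    // Sketch of the initialization rule quoted above (hypothetical helper, not
    // actual broker code). "ibpSupportsMetadataVersion" stands for "the IBP is at
    // or above the first IBP that knows about metadata.version".
    public final class InitialMetadataVersion {
        public static final short UNDEFINED = -1;

        public static short initialLevel(boolean isKRaftCluster, boolean ibpSupportsMetadataVersion) {
            if (!ibpSupportsMetadataVersion) {
                return UNDEFINED;            // below the gating IBP: not examined at all
            }
            return isKRaftCluster ? (short) 1   // KRaft clusters start at 1
                                  : (short) 0;  // ZooKeeper clusters must stay at 0
        }
    }
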
> > > > > > > > > >> >> >> > >> >> > We probably also want an RPC implemented by both brokers and
> > > > > > > > > >> >> >> > >> >> > controllers that will reveal the min and max supported versions for
> > > > > > > > > >> >> >> > >> >> > each feature level supported by the server
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> This is available in ApiVersionsResponse (we include the server's
> > > > > > > > > >> >> >> > >> >> supported features as well as the cluster's finalized features)
> > > > > > > > > >> >> >> > >> >>
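The same two pieces of information can also be read back through the Admin client, which
makes the distinction visible between what a server supports and what the cluster has
finalized. A sketch against the KIP-584 Admin API; the bootstrap address is a placeholder.

    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.FeatureMetadata;

    public class ShowFeatureRanges {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder
            try (Admin admin = Admin.create(props)) {
                FeatureMetadata md = admin.describeFeatures().featureMetadata().get();
                // What the responding server itself could run:
                md.supportedFeatures().forEach((name, range) ->
                    System.out.printf("supported %s: %d..%d%n", name, range.minVersion(), range.maxVersion()));
                // What the cluster has actually finalized:
                md.finalizedFeatures().forEach((name, range) ->
                    System.out.printf("finalized %s: %d..%d%n", name, range.minVersionLevel(), range.maxVersionLevel()));
            }
        }
    }
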
> > > > > > > > > >> >> >> > >> >> --------
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> Jun:
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> 12. I've updated the KIP with AdminClient changes
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> 14. You're right, it looks like I missed a few sections regarding
> > > > > > > > > >> >> >> > >> >> snapshot generation. I've corrected it
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> 16. This feels more like an enhancement to KIP-584. I agree it could
> > > > > > > > > >> >> >> > >> >> be useful, but perhaps we could address it separately from KRaft
> > > > > > > > > >> >> >> > >> >> upgrades?
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> 20. Indeed snapshots are not strictly necessary during an upgrade,
> > > > > > > > > >> >> >> > >> >> I've reworded this
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> Thanks!
> > > > > > > > > >> >> >> > >> >> David
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> On Thu, Nov 4, 2021 at 6:51 PM Jun Rao <ju...@confluent.io.invalid>
> > > > > > > > > >> >> >> > >> >> wrote:
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> > Hi, David, Jose and Colin,
> > > > > > > > > >> >> >> > >> >> >
> > > > > > > > > >> >> >> > >> >> > Thanks for the reply. A few more comments.
> > > > > > > > > >> >> >> > >> >> >
> > > > > > > > > >> >> >> > >> >> > 12. It seems that we haven't updated the AdminClient accordingly?
> > > > > > > > > >> >> >> > >> >> >
> > > > > > > > > >> >> >> > >> >> > 14. "Metadata snapshot is generated and sent to the other inactive
> > > > > > > > > >> >> >> > >> >> > controllers and to brokers". I thought we wanted each broker to
> > > > > > > > > >> >> >> > >> >> > generate its own snapshot independently? If only the controller
> > > > > > > > > >> >> >> > >> >> > generates the snapshot, how do we force other brokers to pick it up?
> > > > > > > > > >> >> >> > >> >> >
> > > > > > > > > >> >> >> > >> >> > 16. If a feature version is new, one may not want to enable it
> > > > > > > > > >> >> >> > >> >> > immediately after the cluster is upgraded. However, if a feature
> > > > > > > > > >> >> >> > >> >> > version has been stable, requiring every user to run a command to
> > > > > > > > > >> >> >> > >> >> > upgrade to that version seems inconvenient. One way to improve this
> > > > > > > > > >> >> >> > >> >> > is for each feature to define one version as the default. Then, when
> > > > > > > > > >> >> >> > >> >> > we upgrade a cluster, we will automatically upgrade the feature to
> > > > > > > > > >> >> >> > >> >> > the default version. An admin could use the tool to upgrade to a
> > > > > > > > > >> >> >> > >> >> > version higher than the default.
> > > > > > > > > >> >> >> > >> >> >
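One way this suggestion could be modeled (purely illustrative, not part of the KIP as
written): each feature definition carries a default level alongside its supported range,
and an upgrade finalizes at least that default.

    // Illustrative only: a feature definition that carries a default level, as
    // suggested above.
    public record FeatureSpec(String name, short minSupported, short maxSupported, short defaultLevel) {

        // Level to finalize during an automatic upgrade: never below the default,
        // never below what is already finalized, and never above the supported max.
        // An explicit operator request (if given) can push it higher.
        public short levelAfterUpgrade(short currentlyFinalized, Short operatorRequested) {
            short target = (short) Math.max(defaultLevel, currentlyFinalized);
            if (operatorRequested != null) {
                target = (short) Math.max(target, operatorRequested);
            }
            return (short) Math.min(target, maxSupported);
        }
    }
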
> > > > > > > > > >> >> >> > >> >> > 20. "The quorum controller can assist with this process by generating
> > > > > > > > > >> >> >> > >> >> > a metadata snapshot after a metadata.version increase has been
> > > > > > > > > >> >> >> > >> >> > committed to the metadata log. This snapshot will be a convenient way
> > > > > > > > > >> >> >> > >> >> > to let broker and controller components rebuild their entire
> > > > > > > > > >> >> >> > >> >> > in-memory state following an upgrade." The new version of the
> > > > > > > > > >> >> >> > >> >> > software could read both the new and the old version. Is generating a
> > > > > > > > > >> >> >> > >> >> > new snapshot during upgrade needed?
> > > > > > > > > >> >> >> > >> >> >
> > > > > > > > > >> >> >> > >> >> > Jun
> > > > > > > > > >> >> >> > >> >> >
> > > > > > > > > >> >> >> > >> >> >
> > > > > > > > > >> >> >> > >> >> > On Wed, Nov 3, 2021 at 5:42 PM Colin McCabe <cmccabe@apache.org> wrote:
> > > > > > > > > >> >> >> > >> >> >
> > > > > > > > > >> >> >> > >> >> > > On Tue, Oct 12, 2021, at 10:34, Jun Rao wrote:
> > > > > > > > > >> >> >> > >> >> > > > Hi, David,
> > > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > > >> >> >> > >> >> > > > One more comment.
> > > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > > >> >> >> > >> >> > > > 16. The main reason why KIP-584 requires finalizing a feature
> > > > > > > > > >> >> >> > >> >> > > > manually is that in the ZK world, the controller doesn't know all
> > > > > > > > > >> >> >> > >> >> > > > brokers in a cluster. A broker temporarily down is not registered
> > > > > > > > > >> >> >> > >> >> > > > in ZK. In the KRaft world, the controller keeps track of all
> > > > > > > > > >> >> >> > >> >> > > > brokers, including those that are temporarily down. This makes it
> > > > > > > > > >> >> >> > >> >> > > > possible for the controller to automatically finalize a
> > > > > > > > > >> >> >> > >> >> > > > feature---it's safe to do so when all brokers support that feature.
> > > > > > > > > >> >> >> > >> >> > > > This will make the upgrade process much simpler since no manual
> > > > > > > > > >> >> >> > >> >> > > > command is required to turn on a new feature. Have we considered
> > > > > > > > > >> >> >> > >> >> > > > this?
> > > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > > >> >> >> > >> >> > > > Thanks,
> > > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > > >> >> >> > >> >> > > > Jun
> > > > > > > > > >> >> >> > >> >> > >
> > > > > > > > > >> >> >> > >> >> > > Hi Jun,
> > > > > > > > > >> >> >> > >> >> > >
> > > > > > > > > >> >> >> > >> >> > > I guess David commented on this point already, but I'll comment as
> > > > > > > > > >> >> >> > >> >> > > well. I always had the perception that users viewed rolls as
> > > > > > > > > >> >> >> > >> >> > > potentially risky and were looking for ways to reduce the risk. Not
> > > > > > > > > >> >> >> > >> >> > > enabling features right away after installing new software seems
> > > > > > > > > >> >> >> > >> >> > > like one way to do that. If we had a feature to automatically
> > > > > > > > > >> >> >> > >> >> > > upgrade during a roll, I'm not sure that I would recommend that
> > > > > > > > > >> >> >> > >> >> > > people use it, because if something fails, it makes it harder to
> > > > > > > > > >> >> >> > >> >> > > tell if the new feature is at fault, or something else in the new
> > > > > > > > > >> >> >> > >> >> > > software.
> > > > > > > > > >> >> >> > >> >> > >
> > > > > > > > > >> >> >> > >> >> > > We already tell users to do a "double roll" when going to a new IBP.
> > > > > > > > > >> >> >> > >> >> > > (Just to give background to people who haven't heard that phrase,
> > > > > > > > > >> >> >> > >> >> > > the first roll installs the new software, and the second roll
> > > > > > > > > >> >> >> > >> >> > > updates the IBP). So this KIP-778 mechanism is basically very
> > > > > > > > > >> >> >> > >> >> > > similar to that, except the second thing isn't a roll, but just an
> > > > > > > > > >> >> >> > >> >> > > upgrade command. So I think this is consistent with what we
> > > > > > > > > >> >> >> > >> >> > > currently do.
> > > > > > > > > >> >> >> > >> >> > >
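In KIP-778 terms, that second step is just a single feature update. A sketch of what it
would look like through the KIP-584 Admin API; the target level shown is only an example
and the bootstrap address is a placeholder.

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.FeatureUpdate;
    import org.apache.kafka.clients.admin.UpdateFeaturesOptions;

    public class FinalizeMetadataVersion {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder
            try (Admin admin = Admin.create(props)) {
                // Step 1 was the rolling software upgrade; step 2 is just this call
                // (or the equivalent kafka-features.sh upgrade command).
                admin.updateFeatures(
                    Map.of("metadata.version", new FeatureUpdate((short) 2, false)),
                    new UpdateFeaturesOptions()
                ).all().get();
            }
        }
    }
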
> > > > > > > > > >> >> >> > >> >> > > Also, just like David said, we can always add auto-upgrade later if
> > > > > > > > > >> >> >> > >> >> > > there is demand...
> > > > > > > > > >> >> >> > >> >> > >
> > > > > > > > > >> >> >> > >> >> > > best,
> > > > > > > > > >> >> >> > >> >> > > Colin
> > > > > > > > > >> >> >> > >> >> > >
> > > > > > > > > >> >> >> > >> >> > >
> > > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > > >> >> >> > >> >> > > > On Thu, Oct 7, 2021 at 5:19 PM Jun Rao <jun@confluent.io> wrote:
> > > > > > > > > >> >> >> > >> >> > > >
> > > > > > > > > >> >> >> > >> >> > > >> Hi, David,
> > > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > > >> >> >> > >> >> > > >> Thanks for the KIP. A few comments below.
> > > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > > >> >> >> > >> >> > > >> 10. It would be useful to describe how the controller node
> > > > > > > > > >> >> >> > >> >> > > >> determines the RPC version used to communicate to other controller
> > > > > > > > > >> >> >> > >> >> > > >> nodes. There seems to be a bootstrap problem. A controller node
> > > > > > > > > >> >> >> > >> >> > > >> can't read the log and therefore the feature level until a quorum
> > > > > > > > > >> >> >> > >> >> > > >> leader is elected. But leader election requires an RPC.
> > > > > > > > > >> >> >> > >> >> > > >>
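To make the bootstrap question concrete: before any metadata can be read, two controllers
can only fall back on plain ApiVersions-style negotiation, i.e. intersect the version
ranges they advertise to each other. A self-contained sketch of that rule (not actual
quorum code):

    import java.util.OptionalInt;

    // Sketch of ApiVersions-style negotiation between two controllers before any
    // feature level is known (hypothetical helper, not actual Raft client code).
    public final class RpcVersionNegotiation {

        // Each side advertises [min, max] for a given RPC; use the highest version
        // both sides understand, or report that no common version exists.
        public static OptionalInt negotiate(int localMin, int localMax, int remoteMin, int remoteMax) {
            int candidate = Math.min(localMax, remoteMax);
            if (candidate < Math.max(localMin, remoteMin)) {
                return OptionalInt.empty();   // incompatible: cannot talk at all
            }
            return OptionalInt.of(candidate);
        }
    }
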
> > > > > > > > > >> >> >> > >> >> > > >> 11. For downgrades, it would be useful to describe how to determine
> > > > > > > > > >> >> >> > >> >> > > >> when the downgrade process (generating the new snapshot, propagating
> > > > > > > > > >> >> >> > >> >> > > >> the snapshot, etc.) has completed. We could block the UpdateFeature
> > > > > > > > > >> >> >> > >> >> > > >> request until the process is completed. However, since the process
> > > > > > > > > >> >> >> > >> >> > > >> could take time, the request could time out. Another way is through
> > > > > > > > > >> >> >> > >> >> > > >> DescribeFeature, where the server only reports downgraded versions
> > > > > > > > > >> >> >> > >> >> > > >> after the process is completed.
> > > > > > > > > >> >> >> > >> >> > > >>
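From a client's point of view, the second option would look roughly like polling
describeFeatures until the finalized level reflects the downgrade. A sketch against the
KIP-584 Admin API; the feature name, target level, and bootstrap address are only examples.

    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.FinalizedVersionRange;

    public class WaitForDowngrade {
        public static void main(String[] args) throws Exception {
            String feature = "metadata.version";   // example feature
            short targetLevel = 1;                 // example target
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder
            try (Admin admin = Admin.create(props)) {
                while (true) {
                    FinalizedVersionRange range = admin.describeFeatures()
                        .featureMetadata().get()
                        .finalizedFeatures().get(feature);
                    // Only treat the downgrade as done once the server reports the
                    // new (lower) finalized level.
                    if (range != null && range.maxVersionLevel() <= targetLevel) {
                        break;
                    }
                    Thread.sleep(1000);
                }
            }
        }
    }
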
> > > > > > > > > >> >> >> > >> >> > > >> 12. Since we are changing UpdateFeaturesRequest, do we need to
> > > > > > > > > >> >> >> > >> >> > > >> change the AdminClient api for updateFeatures too?
> > > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > > >> >> >> > >> >> > > >> 13. For the paragraph starting with "In the absence of an operator
> > > > > > > > > >> >> >> > >> >> > > >> defined value for metadata.version", in KIP-584, we described how to
> > > > > > > > > >> >> >> > >> >> > > >> finalize features with New cluster bootstrap. In that case, it's
> > > > > > > > > >> >> >> > >> >> > > >> inconvenient for the users to have to run an admin tool to finalize
> > > > > > > > > >> >> >> > >> >> > > >> the version for each feature. Instead, the system detects that the
> > > > > > > > > >> >> >> > >> >> > > >> /features path is missing in ZK and thus automatically finalizes
> > > > > > > > > >> >> >> > >> >> > > >> every feature with the latest supported version. Could we do
> > > > > > > > > >> >> >> > >> >> > > >> something similar in the KRaft mode?
> > > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > > >> >> >> > >> >> > > >> 14. After the quorum leader generates a new snapshot, how do we
> > > > > > > > > >> >> >> > >> >> > > >> force other nodes to pick up the new snapshot?
> > > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > > >> >> >> > >> >> > > >> 15. I agree with Jose that it will be useful to describe when
> > > > > > > > > >> >> >> > >> >> > > >> generating a new snapshot is needed. To me, it seems the new
> > > > > > > > > >> >> >> > >> >> > > >> snapshot is only needed when incompatible changes are made.
> > > > > > > > > >> >> >> > >> >> > > >>
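That condition can be phrased in terms of the isBackwardsCompatible flag that appears in
the metadata.version specification later in this thread. A hypothetical sketch of such a
policy, not actual controller code:

    // Hypothetical sketch: a fresh snapshot (a rewrite of existing records) is
    // only required when the transition is not backwards compatible. Downgrades
    // may have to drop or rewrite records, so they always need one.
    public final class SnapshotPolicy {

        public static boolean snapshotRequired(boolean targetIsBackwardsCompatible, boolean isDowngrade) {
            return isDowngrade || !targetIsBackwardsCompatible;
        }
    }
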
> > > > > > > > > >> >> >> > >> >> > > >> 7. Jose, what control records were you referring to?
> > > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > > >> >> >> > >> >> > > >> Thanks,
> > > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > > >> >> >> > >> >> > > >> Jun
> > > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > > >> >> >> > >> >> > > >> On Tue, Oct 5, 2021 at 8:53 AM David Arthur <davidarthur@apache.org>
> > > > > > > > > >> >> >> > >> >> > > >> wrote:
> > > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > > >> >> >> > >> >> > > >>> Jose, thanks for the thorough review and comments!
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> I am out of the office until next week, so I probably won't be able
> > > > > > > > > >> >> >> > >> >> > > >>> to update the KIP until then. Here are some replies to your
> > > > > > > > > >> >> >> > >> >> > > >>> questions:
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> 1. Generate snapshot on upgrade
> > > > > > > > > >> >> >> > >> >> > > >>> > > Metadata snapshot is generated and sent to the other nodes
> > > > > > > > > >> >> >> > >> >> > > >>> > Why does the Active Controller need to generate a new snapshot and
> > > > > > > > > >> >> >> > >> >> > > >>> > force a snapshot fetch from the replicas (inactive controller and
> > > > > > > > > >> >> >> > >> >> > > >>> > brokers) on an upgrade? Isn't writing the FeatureLevelRecord good
> > > > > > > > > >> >> >> > >> >> > > >>> > enough to communicate the upgrade to the replicas?
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> You're right, we don't necessarily need to _transmit_ a snapshot,
> > > > > > > > > >> >> >> > >> >> > > >>> since each node can generate its own equivalent snapshot
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> 2. Generate snapshot on downgrade
> > > > > > > > > >> >> >> > >> >> > > >>> > > Metadata snapshot is generated and sent to the other inactive
> > > > > > > > > >> >> >> > >> >> > > >>> > > controllers and to brokers (this snapshot may be lossy!)
> > > > > > > > > >> >> >> > >> >> > > >>> > Why do we need to send this downgraded snapshot to the brokers? The
> > > > > > > > > >> >> >> > >> >> > > >>> > replicas have seen the FeatureLevelRecord and noticed the
> > > > > > > > > >> >> >> > >> >> > > >>> > downgrade. Can we have the replicas each independently generate a
> > > > > > > > > >> >> >> > >> >> > > >>> > downgraded snapshot at the offset for the downgraded
> > > > > > > > > >> >> >> > >> >> > > >>> > FeatureLevelRecord? I assume that the active controller will
> > > > > > > > > >> >> >> > >> >> > > >>> > guarantee that all records after the FeatureLevelRecord use the
> > > > > > > > > >> >> >> > >> >> > > >>> > downgraded version. If so, it would be good to mention that
> > > > > > > > > >> >> >> > >> >> > > >>> > explicitly.
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> Similar to above, yes a broker that detects a downgrade via
> > > > > > > > > >> >> >> > >> >> > > >>> FeatureLevelRecord could generate its own downgrade snapshot and
> > > > > > > > > >> >> >> > >> >> > > >>> reload its state from that. This does get a little fuzzy when we
> > > > > > > > > >> >> >> > >> >> > > >>> consider cases where brokers are on different software versions and
> > > > > > > > > >> >> >> > >> >> > > >>> could be generating a downgrade snapshot for version X, but using
> > > > > > > > > >> >> >> > >> >> > > >>> different versions of the code. It might be safer to let the
> > > > > > > > > >> >> >> > >> >> > > >>> controller generate the snapshot so each broker (regardless of
> > > > > > > > > >> >> >> > >> >> > > >>> software version) gets the same records. However, for upgrades (or
> > > > > > > > > >> >> >> > >> >> > > >>> downgrades) we expect the whole cluster to be running the same
> > > > > > > > > >> >> >> > >> >> > > >>> software version before triggering the metadata.version change, so
> > > > > > > > > >> >> >> > >> >> > > >>> perhaps this isn't a likely scenario. Thoughts?
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> 3. Max metadata version
> > > > > > > > > >> >> >> > >> >> > > >>> > > For the first release that supports metadata.version, we can
> > > > > > > > > >> >> >> > >> >> > > >>> > > simply initialize metadata.version with the current (and only)
> > > > > > > > > >> >> >> > >> >> > > >>> > > version. For future releases, we will need a mechanism to
> > > > > > > > > >> >> >> > >> >> > > >>> > > bootstrap a particular version. This could be done using the
> > > > > > > > > >> >> >> > >> >> > > >>> > > meta.properties file or some similar mechanism. The reason we
> > > > > > > > > >> >> >> > >> >> > > >>> > > need to allow for a specific initial version is to support the
> > > > > > > > > >> >> >> > >> >> > > >>> > > use case of starting a Kafka cluster at version X with an older
> > > > > > > > > >> >> >> > >> >> > > >>> > > metadata.version.
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> > I assume that the Active Controller will learn the metadata version
> > > > > > > > > >> >> >> > >> >> > > >>> > of the broker through the BrokerRegistrationRequest. How will the
> > > > > > > > > >> >> >> > >> >> > > >>> > Active Controller learn about the max metadata version of the
> > > > > > > > > >> >> >> > >> >> > > >>> > inactive controller nodes? We currently don't send a registration
> > > > > > > > > >> >> >> > >> >> > > >>> > request from the inactive controller to the active controller.
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> This came up during the design, but I neglected to add it to the
> > > > > > > > > >> >> >> > >> >> > > >>> KIP. We will need a mechanism for determining the supported features
> > > > > > > > > >> >> >> > >> >> > > >>> of each controller similar to how brokers use
> > > > > > > > > >> >> >> > >> >> > > >>> BrokerRegistrationRequest. Perhaps controllers could write a
> > > > > > > > > >> >> >> > >> >> > > >>> FeatureLevelRecord (or similar) to the metadata log indicating their
> > > > > > > > > >> >> >> > >> >> > > >>> supported version. WDYT?
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> > Why do you need to bootstrap a particular version? Isn't the intent
> > > > > > > > > >> >> >> > >> >> > > >>> > that the broker will learn the active metadata version by reading
> > > > > > > > > >> >> >> > >> >> > > >>> > the metadata before unfencing?
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> This bootstrapping is needed for when a KRaft cluster is first
> > > > > > > > > >> >> >> > >> >> > > >>> started. If we don't have this mechanism, the cluster can't really
> > > > > > > > > >> >> >> > >> >> > > >>> do anything until the operator finalizes the metadata.version with
> > > > > > > > > >> >> >> > >> >> > > >>> the tool. The bootstrapping will be done by the controller and the
> > > > > > > > > >> >> >> > >> >> > > >>> brokers will see this version as a record (like you say). I'll add
> > > > > > > > > >> >> >> > >> >> > > >>> some text to clarify this.
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>>
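If the initial level did end up coming from meta.properties (the quoted text above only
says "the meta.properties file or some similar mechanism"), reading it at first start
would be straightforward. The property key below is invented purely for illustration.

    import java.io.FileReader;
    import java.util.Properties;

    // Purely hypothetical sketch: read an initial metadata.version at first start.
    // The key name "initial.metadata.version" is not from the KIP.
    public class BootstrapMetadataVersion {
        public static short initialVersion(String metaPropertiesPath) throws Exception {
            Properties props = new Properties();
            try (FileReader reader = new FileReader(metaPropertiesPath)) {
                props.load(reader);
            }
            // Default to 1, the initial KRaft metadata.version, if nothing is set.
            return Short.parseShort(props.getProperty("initial.metadata.version", "1"));
        }
    }
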
> > > > > > > > > >> >> >> > >> >> > > >>> 4. Reject Registration - This is related to the bullet point above.
> > > > > > > > > >> >> >> > >> >> > > >>> > What will be the behavior of the active controller if the broker
> > > > > > > > > >> >> >> > >> >> > > >>> > sends a metadata version that is not compatible with the cluster
> > > > > > > > > >> >> >> > >> >> > > >>> > wide metadata version?
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> If a broker starts up with a lower supported version range than the
> > > > > > > > > >> >> >> > >> >> > > >>> current cluster metadata.version, it should log an error and shut
> > > > > > > > > >> >> >> > >> >> > > >>> down. This is in line with KIP-584.
> > > > > > > > > >> >> >> > >> >> > > >>>
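A sketch of that startup check (hypothetical helper, not the actual registration code):

    // Hypothetical sketch of the check described above: a broker whose supported
    // range cannot accommodate the cluster's finalized metadata.version should
    // refuse to come up rather than misinterpret the metadata log.
    public final class StartupCompatibilityCheck {

        public static void verifyOrExit(short supportedMin, short supportedMax, short clusterFinalized) {
            if (clusterFinalized < supportedMin || clusterFinalized > supportedMax) {
                System.err.printf(
                    "Cluster metadata.version %d is outside this broker's supported range [%d, %d]; shutting down.%n",
                    clusterFinalized, supportedMin, supportedMax);
                System.exit(1);   // in line with KIP-584: log an error and terminate
            }
        }
    }
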
> > > > > > > > > >> >> >> > >> >> > > >>> 5. Discover upgrade
> > > > > > > > > >> >> >> > >> >> > > >>> > > This snapshot will be a convenient way to let broker and
> > > > > > > > > >> >> >> > >> >> > > >>> > > controller components rebuild their entire in-memory state
> > > > > > > > > >> >> >> > >> >> > > >>> > > following an upgrade.
> > > > > > > > > >> >> >> > >> >> > > >>> > Can we rely on the presence of the FeatureLevelRecord for the
> > > > > > > > > >> >> >> > >> >> > > >>> > metadata version for this functionality? If so, it avoids having to
> > > > > > > > > >> >> >> > >> >> > > >>> > reload the snapshot.
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> For upgrades, yes probably since we won't need to "rewrite" any
> > > > > > > > > >> >> >> > >> >> > > >>> records in this case. For downgrades, we will need to generate the
> > > > > > > > > >> >> >> > >> >> > > >>> snapshot and reload everything.
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> 6. Metadata version specification
> > > > > > > > > >> >> >> > >> >> > > >>> > >  V4(version=4, isBackwardsCompatible=false, description="New
> > > > > > > > > >> >> >> > >> >> > > >>> > > metadata record type Bar"),
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> > Very cool. Do you have plans to generate Apache Kafka HTML
> > > > > > > > > >> >> >> > >> >> > > >>> > documentation for this information? Would be helpful to display
> > > > > > > > > >> >> >> > >> >> > > >>> > this information to the user using the kafka-features.sh and
> > > > > > > > > >> >> >> > >> >> > > >>> > feature RPC?
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> Hm good idea :) I'll add a brief section on documentation. This
> > > > > > > > > >> >> >> > >> >> > > >>> would certainly be very useful
> > > > > > > > > >> >> >> > >> >> > > >>>
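The specification above is essentially a Java enum. A sketch of what such a table could
look like, where only the V4 entry is taken from the quoted text and the other entries
are placeholders:

    // Sketch of a metadata.version table in the shape quoted above. Only V4 is
    // from the thread; the rest is illustrative.
    public enum MetadataVersionSpec {
        V1(1, true,  "Initial KRaft metadata.version"),
        V2(2, true,  "Placeholder example"),
        V3(3, true,  "Placeholder example"),
        V4(4, false, "New metadata record type Bar");

        private final int version;
        private final boolean isBackwardsCompatible;
        private final String description;

        MetadataVersionSpec(int version, boolean isBackwardsCompatible, String description) {
            this.version = version;
            this.isBackwardsCompatible = isBackwardsCompatible;
            this.description = description;
        }

        public int version() { return version; }
        public boolean isBackwardsCompatible() { return isBackwardsCompatible; }
        public String description() { return description; }
    }

Generating the HTML documentation asked about here could then be a matter of iterating
over these entries.
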
> > > > > > > > > >> >> >> > >> >> > > >>> 7. Downgrade records
> > > > > > > > > >> >> >> > >> >> > > >>> > I think we should explicitly mention that the downgrade process
> > > > > > > > > >> >> >> > >> >> > > >>> > will downgrade both metadata records and controller records. In
> > > > > > > > > >> >> >> > >> >> > > >>> > KIP-630 we introduced two control records for snapshots.
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> Yes, good call. Let me re-read that KIP and include some details.
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> Thanks again for the comments!
> > > > > > > > > >> >> >> > >> >> > > >>> -David
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> On Wed, Sep 29, 2021 at 5:09 PM José Armando García Sancio
> > > > > > > > > >> >> >> > >> >> > > >>> <js...@confluent.io.invalid> wrote:
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>> > One more comment.
> > > > > > > > > >> >> >> > >> >> > > >>> >
> > > > > > > > > >> >> >> > >> >> > > >>> > 7. Downgrade records
> > > > > > > > > >> >> >> > >> >> > > >>> > I think we should explicitly mention that the downgrade process
> > > > > > > > > >> >> >> > >> >> > > >>> > will downgrade both metadata records and controller records. In
> > > > > > > > > >> >> >> > >> >> > > >>> > KIP-630 we introduced two control records for snapshots.
> > > > > > > > > >> >> >> > >> >> > > >>> >
> > > > > > > > > >> >> >> > >> >> > > >>>
> > > > > > > > > >> >> >> > >> >> > > >>
> > > > > > > > > >> >> >> > >> >> > >
> > > > > > > > > >> >> >> > >> >> >
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >> >> --
> > > > > > > > > >> >> >> > >> >> -David
> > > > > > > > > >> >> >> > >> >>
> > > > > > > > > >> >> >> > >>
> > > > > > > > > >> >> >> >
> > > > > > > > > >> >> >>
> > > > > > > > > >> >>
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > -David
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > -David
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -David
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -David
> > > > >
> > > >
> > >
> > >
> > > --
> > > -David
> > >
> >
>


-- 
-David