Posted to dev@kafka.apache.org by Kowshik Prakasam <kp...@confluent.io> on 2020/04/01 00:29:03 UTC

Re: [DISCUSS] KIP-584: Versioning scheme for features

Hey Boyang,

Thanks for the great feedback! I have updated the KIP based on your
feedback.
Please find my responses to your comments below; look for sentences starting
with "(Kowshik)".


> 1. "When is it safe for the brokers to begin handling EOS traffic" could
be
> converted as "When is it safe for the brokers to start serving new
> Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> context.

(Kowshik): Great point! Done.

> 2. In the *Explanation *section, the metadata version number part seems a
> bit blurred. Could you point a reference to later section that we going to
> store it in Zookeeper and update it every time when there is a feature
> change?

(Kowshik): Great point! Done. I've added a reference in the KIP.


> 3. For the feature downgrade, although it's a Non-goal of the KIP, for
> features such as group coordinator semantics, there is no legal scenario to
> perform a downgrade at all. So having downgrade door open is pretty
> error-prone as human faults happen all the time. I'm assuming as new
> features are implemented, it's not very hard to add a flag during feature
> creation to indicate whether this feature is "downgradable". Could you
> explain a bit more on the extra engineering effort for shipping this KIP
> with downgrade protection in place?

(Kowshik): Great point! I'd agree and disagree here. While I agree that
accidental downgrades can cause problems, I also think downgrades should
sometimes be allowed for emergency reasons (not all downgrades cause issues).
Whether a downgrade is safe is subjective to the feature being downgraded.

To be more strict about feature version downgrades, I have modified the KIP
to propose that we mandate a `--force-downgrade` flag be used in the
UPDATE_FEATURES API and the tooling, whenever a human is downgrading a
finalized feature version. Hopefully this should cover the requirement,
until we find the need for advanced downgrade support.
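
To make the intent concrete, here is a rough, hypothetical sketch of such a
server-side guard (illustrative names only, not actual controller or tooling
code): the finalized max version level may only move backward when the
request explicitly carries the downgrade intent, e.g. the `--force-downgrade`
flag translated into a boolean field on the request.

    // Hypothetical sketch (not Kafka code): reject a downgrade of a finalized
    // feature's max version level unless the caller explicitly forced it.
    final class FeatureUpdateGuard {
        static void validate(long currentMaxVersionLevel,
                             long requestedMaxVersionLevel,
                             boolean forceDowngrade) {
            if (requestedMaxVersionLevel < currentMaxVersionLevel && !forceDowngrade) {
                throw new IllegalArgumentException(
                    "Downgrading max version level from " + currentMaxVersionLevel
                    + " to " + requestedMaxVersionLevel
                    + " requires the force-downgrade flag");
            }
        }
    }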

> 4. "Each broker’s supported dictionary of feature versions will be defined
> in the broker code." So this means in order to restrict a certain feature,
> we need to start the broker first and then send a feature gating request
> immediately, which introduces a time gap and the intended-to-close feature
> could actually serve request during this phase. Do you think we should also
> support configurations as well so that admin user could freely roll up a
> cluster with all nodes complying the same feature gating, without worrying
> about the turnaround time to propagate the message only after the cluster
> starts up?

(Kowshik): This is a great point/question. One of the expectations out of
this KIP, which is already followed in the broker, is the following:
 - Imagine at time T1 the broker starts up and registers its presence in ZK,
   along with advertising its supported features.
 - Imagine at a future time T2 the broker receives the UpdateMetadataRequest
   from the controller, which contains the latest finalized features as seen
   by the controller. The broker validates this data against its supported
   features to make sure there is no mismatch (it will shut down if there is
   an incompatibility).

It is expected that during the time between the two events T1 and T2, the
broker is almost a silent entity in the cluster. It does not add any value
to the cluster, or carry out any important broker activities. By "important",
I mean it is not doing mutations on its persistence, not mutating critical
in-memory state, and won't be serving produce/fetch requests. Note that it
doesn't even know its assigned partitions until it receives the
UpdateMetadataRequest from the controller. Anything the broker is doing up
until this point is neither damaging nor useful.

I’ve clarified the above in the KIP, see this new section:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
.
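
As a rough model of the T2 validation described above (purely illustrative;
the class and method names are made up and this is not actual broker code),
the broker checks each finalized max version level against its own supported
range and shuts down on a mismatch:

    // Illustrative model of the startup check: a finalized max version level
    // outside the broker's supported [min, max] range is an incompatibility,
    // in which case the broker would shut itself down.
    final class SupportedVersionRange {
        private final long minVersion;
        private final long maxVersion;

        SupportedVersionRange(long minVersion, long maxVersion) {
            this.minVersion = minVersion;
            this.maxVersion = maxVersion;
        }

        boolean isIncompatibleWith(long finalizedMaxVersionLevel) {
            return finalizedMaxVersionLevel < minVersion
                || finalizedMaxVersionLevel > maxVersion;
        }
    }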

> 5. "adding a new Feature, updating or deleting an existing Feature", may
be
> I misunderstood something, I thought the features are defined in broker
> code, so admin could not really create a new feature?

(Kowshik): Great point! You understood this right. Here, adding a feature
means we are adding a cluster-wide finalized *max* version for a feature
that was previously never finalized. I have clarified this in the KIP now.

> 6. I think we need a separate error code like FEATURE_UPDATE_IN_PROGRESS to
> reject a concurrent feature update request.

(Kowshik): Great point! I have modified the KIP to add the above (see
'Tooling support -> Admin API changes').
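
As a minimal sketch of how such a rejection could work (hypothetical; the
KIP only specifies the error code, not this implementation), the server
could track a single in-flight update and answer a second concurrent request
with the FEATURE_UPDATE_IN_PROGRESS error:

    import java.util.concurrent.atomic.AtomicBoolean;

    // Hypothetical sketch: at most one feature update is processed at a time;
    // a concurrent request would be answered with FEATURE_UPDATE_IN_PROGRESS.
    final class FeatureUpdateCoordinator {
        private final AtomicBoolean updateInProgress = new AtomicBoolean(false);

        boolean tryBeginUpdate() {
            // Returns false when another update is already in flight, i.e. the
            // caller should respond with the FEATURE_UPDATE_IN_PROGRESS error.
            return updateInProgress.compareAndSet(false, true);
        }

        void endUpdate() {
            updateInProgress.set(false);
        }
    }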

> 7. I think we haven't discussed the alternative solution to pass the
> feature information through Zookeeper. Is that mentioned in the KIP to
> justify why using UpdateMetadata is more favorable?

(Kowshik): Nice question! The broker reads the finalized feature info stored
in ZK only during startup, when it does a validation. When serving
`ApiVersionsRequest`, the broker does not read this info from ZK directly.
I'd imagine the risk is that it can increase the ZK read QPS, which can be a
bottleneck for the system. Today, in Kafka we use the controller to fan out
ZK updates to brokers, and we want to stick to that pattern to avoid the ZK
read bottleneck when serving `ApiVersionsRequest`.

> 8. I was under the impression that user could configure a range of
> supported versions, what's the trade-off for allowing single finalized
> version only?

(Kowshik): Great question! The finalized version of a feature basically
refers to the cluster-wide finalized feature "maximum" version. For example,
if the 'group_coordinator' feature has the finalized version set to 10, then
it means that cluster-wide all versions up to v10 are supported for this
feature. However, note that if some version (ex: v0) gets deprecated for
this feature, then we don't convey that using this scheme (also, supporting
deprecation is a non-goal).

(Kowshik): I've now modified the KIP at all points, referring to finalized
feature "maximum" versions.

> 9. One minor syntax fix: Note that here the "client" here may be a producer

(Kowshik): Great point! Done.


Cheers,
Kowshik


On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <re...@gmail.com>
wrote:

> Hey Kowshik,
>
> thanks for the revised KIP. Got a couple of questions:
>
> 1. "When is it safe for the brokers to begin handling EOS traffic" could be
> converted as "When is it safe for the brokers to start serving new
> Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> context.
>
> 2. In the *Explanation *section, the metadata version number part seems a
> bit blurred. Could you point a reference to later section that we going to
> store it in Zookeeper and update it every time when there is a feature
> change?
>
> 3. For the feature downgrade, although it's a Non-goal of the KIP, for
> features such as group coordinator semantics, there is no legal scenario to
> perform a downgrade at all. So having downgrade door open is pretty
> error-prone as human faults happen all the time. I'm assuming as new
> features are implemented, it's not very hard to add a flag during feature
> creation to indicate whether this feature is "downgradable". Could you
> explain a bit more on the extra engineering effort for shipping this KIP
> with downgrade protection in place?
>
> 4. "Each broker’s supported dictionary of feature versions will be defined
> in the broker code." So this means in order to restrict a certain feature,
> we need to start the broker first and then send a feature gating request
> immediately, which introduces a time gap and the intended-to-close feature
> could actually serve request during this phase. Do you think we should also
> support configurations as well so that admin user could freely roll up a
> cluster with all nodes complying the same feature gating, without worrying
> about the turnaround time to propagate the message only after the cluster
> starts up?
>
> 5. "adding a new Feature, updating or deleting an existing Feature", may be
> I misunderstood something, I thought the features are defined in broker
> code, so admin could not really create a new feature?
>
> 6. I think we need a separate error code like FEATURE_UPDATE_IN_PROGRESS to
> reject a concurrent feature update request.
>
> 7. I think we haven't discussed the alternative solution to pass the
> feature information through Zookeeper. Is that mentioned in the KIP to
> justify why using UpdateMetadata is more favorable?
>
> 8. I was under the impression that user could configure a range of
> supported versions, what's the trade-off for allowing single finalized
> version only?
>
> 9. One minor syntax fix: Note that here the "client" here may be a producer
>
> Boyang
>
> On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org> wrote:
>
> > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > Hi Colin,
> > >
> > > Thanks for the feedback! I've changed the KIP to address your
> > > suggestions.
> > > Please find below my explanation. Here is a link to KIP 584:
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > .
> > >
> > > 1. '__data_version__' is the version of the finalized feature metadata
> > > (i.e. actual ZK node contents), while the '__schema_version__' is the
> > > version of the schema of the data persisted in ZK. These serve
> different
> > > purposes. '__data_version__' is useful mainly to clients during
> reads,
> > > to differentiate between the 2 versions of eventually consistent
> > 'finalized
> > > features' metadata (i.e. larger metadata version is more recent).
> > > '__schema_version__' provides an additional degree of flexibility,
> where
> > if
> > > we decide to change the schema for '/features' node in ZK (in the
> > future),
> > > then we can manage broker roll outs suitably (i.e.
> > > serialization/deserialization of the ZK data can be handled safely).
> >
> > Hi Kowshik,
> >
> > If you're talking about a number that lets you know if data is more or
> > less recent, we would typically call that an epoch, and not a version.
> For
> > the ZK data structures, the word "version" is typically reserved for
> > describing changes to the overall schema of the data that is written to
> > ZooKeeper.  We don't even really change the "version" of those schemas
> that
> > much, since most changes are backwards-compatible.  But we do include
> that
> > version field just in case.
> >
> > I don't think we really need an epoch here, though, since we can just
> look
> > at the broker epoch.  Whenever the broker registers, its epoch will be
> > greater than the previous broker epoch.  And the newly registered data
> will
> > take priority.  This will be a lot simpler than adding a separate epoch
> > system, I think.
> >
> > >
> > > 2. Regarding admin client needing min and max information - you are
> > right!
> > > I've changed the KIP such that the Admin API also allows the user to
> read
> > > 'supported features' from a specific broker. Please look at the section
> > > "Admin API changes".
> >
> > Thanks.
> >
> > >
> > > 3. Regarding the use of `long` vs `Long` - it was not deliberate. I've
> > > improved the KIP to just use `long` at all places.
> >
> > Sounds good.
> >
> > >
> > > 4. Regarding kafka.admin.FeatureCommand tool - you are right! I've
> > updated
> > > the KIP sketching the functionality provided by this tool, with some
> > > examples. Please look at the section "Tooling support examples".
> > >
> > > Thank you!
> >
> >
> > Thanks, Kowshik.
> >
> > cheers,
> > Colin
> >
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cm...@apache.org>
> > wrote:
> > >
> > > > Thanks, Kowshik, this looks good.
> > > >
> > > > In the "Schema" section, do we really need both __schema_version__
> and
> > > > __data_version__?  Can we just have a single version field here?
> > > >
> > > > Shouldn't the Admin(Client) function have some way to get the min and
> > max
> > > > information that we're exposing as well?  I guess we could have min,
> > max,
> > > > and current.  Unrelated: is the use of Long rather than long
> deliberate
> > > > here?
> > > >
> > > > It would be good to describe how the command line tool
> > > > kafka.admin.FeatureCommand will work.  For example the flags that it
> > will
> > > > take and the output that it will generate to STDOUT.
> > > >
> > > > cheers,
> > > > Colin
> > > >
> > > >
> > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > Hi all,
> > > > >
> > > > > I've opened KIP-584 <https://issues.apache.org/jira/browse/KIP-584
> >
> > > > > which
> > > > > is intended to provide a versioning scheme for features. I'd like
> to
> > use
> > > > > this thread to discuss the same. I'd appreciate any feedback on
> this.
> > > > > Here
> > > > > is a link to KIP-584:
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > >  .
> > > > >
> > > > > Thank you!
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi all,

Thank you very much for all the insightful feedback!
How do you feel about the KIP?
Does the scope and the write up look OK to you, and is it time to call a
vote?


Cheers,
Kowshik

On Wed, Apr 15, 2020 at 1:08 PM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hi Jun,
>
> Thank you for the suggestion! I have updated the KIP, please find my
> response below.
>
> > 200. I guess you are saying only when the allowDowngrade field is set,
> the
> > finalized feature version can go backward. Otherwise, it can only go up.
> > That makes sense. It would be useful to make that clear when explaining
> > the usage of the allowDowngrade field. In the validation section, we
> have  "
> > /features' from {"max_version_level": X} to {"max_version_level": X’}",
> it
> > seems that we need to mention Y there.
>
> (Kowshik): Great point! Yes, that is correct. Done, I have updated the
> validations
> section explaining the above. Here is a link to this section:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
>
>
> Cheers,
> Kowshik
>
>
>
>
> On Wed, Apr 15, 2020 at 11:05 AM Jun Rao <ju...@confluent.io> wrote:
>
>> Hi, Kowshik,
>>
>> 200. I guess you are saying only when the allowDowngrade field is set, the
>> finalized feature version can go backward. Otherwise, it can only go up.
>> That makes sense. It would be useful to make that clear when explaining
>> the usage of the allowDowngrade field. In the validation section, we
>> have  "
>> /features' from {"max_version_level": X} to {"max_version_level": X’}", it
>> seems that we need to mention Y there.
>>
>> Thanks,
>>
>> Jun
>>
>> On Wed, Apr 15, 2020 at 10:44 AM Kowshik Prakasam <kprakasam@confluent.io
>> >
>> wrote:
>>
>> > Hi Jun,
>> >
>> > Great question! Please find my response below.
>> >
>> > > 200. My understanding is that If the CLI tool passes the
>> > > '--allow-downgrade' flag when updating a specific feature, then a
>> future
>> > > downgrade is possible. Otherwise, the feature is not downgradable. If
>> so,
>> > I
>> > > was wondering how the controller remembers this since it can be
>> restarted
>> > > over time?
>> >
>> > (Kowshik): The purpose of the flag was to just restrict the user intent
>> for
>> > a specific request.
>> > It seems to me that to avoid confusion, I could call the flag as
>> > `--try-downgrade` instead.
>> > Then this makes it clear, that, the controller just has to consider the
>> ask
>> > from
>> > the user as an explicit request to attempt a downgrade.
>> >
>> > The flag does not act as an override on controller's decision making
>> that
>> > decides whether
>> > a flag is downgradable (these decisions on whether to allow a flag to be
>> > downgraded
>> > from a specific version level, can be embedded in the controller code).
>> >
>> > Please let me know what you think.
>> > Sorry if I misunderstood the original question.
>> >
>> >
>> > Cheers,
>> > Kowshik
>> >
>> >
>> > On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <ju...@confluent.io> wrote:
>> >
>> > > Hi, Kowshik,
>> > >
>> > > Thanks for the reply. Makes sense. Just one more question.
>> > >
>> > > 200. My understanding is that If the CLI tool passes the
>> > > '--allow-downgrade' flag when updating a specific feature, then a
>> future
>> > > downgrade is possible. Otherwise, the feature is not downgradable. If
>> > so, I
>> > > was wondering how the controller remembers this since it can be
>> restarted
>> > > over time?
>> > >
>> > > Jun
>> > >
>> > >
>> > > On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam <
>> kprakasam@confluent.io
>> > >
>> > > wrote:
>> > >
>> > > > Hi Jun,
>> > > >
>> > > > Thanks a lot for the feedback and the questions!
>> > > > Please find my response below.
>> > > >
>> > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field.
>> It
>> > > seems
>> > > > > that field needs to be persisted somewhere in ZK?
>> > > >
>> > > > (Kowshik): Great question! Below is my explanation. Please help me
>> > > > understand,
>> > > > if you feel there are cases where we would need to still persist it
>> in
>> > > ZK.
>> > > >
>> > > > Firstly I have updated my thoughts into the KIP now, under the
>> > > 'guidelines'
>> > > > section:
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
>> > > >
>> > > > The allowDowngrade boolean field is just to restrict the user
>> intent,
>> > and
>> > > > to remind
>> > > > them to double check their intent before proceeding. It should be
>> set
>> > to
>> > > > true
>> > > > by the user in a request, only when the user intent is to forcefully
>> > > > "attempt" a
>> > > > downgrade of a specific feature's max version level, to the provided
>> > > value
>> > > > in
>> > > > the request.
>> > > >
>> > > > We can extend this safeguard. The controller (on it's end) can
>> maintain
>> > > > rules in the code, that, for safety reasons would outright reject
>> > certain
>> > > > downgrades
>> > > > from a specific max_version_level for a specific feature. Such
>> > rejections
>> > > > may
>> > > > happen depending on the feature being downgraded, and from what
>> version
>> > > > level.
>> > > >
>> > > > The CLI tool only allows a downgrade attempt in conjunction with
>> > specific
>> > > > flags and sub-commands. For example, in the CLI tool, if the user
>> uses
>> > > the
>> > > > 'downgrade-all' command, or passes '--allow-downgrade' flag when
>> > > updating a
>> > > > specific feature, only then the tool will translate this ask to
>> setting
>> > > > 'allowDowngrade' field in the request to the server.
>> > > >
>> > > > > 201. UpdateFeaturesResponse has the following top level fields.
>> > Should
>> > > > > those fields be per feature?
>> > > > >
>> > > > >   "fields": [
>> > > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
>> > > > >       "about": "The error code, or 0 if there was no error." },
>> > > > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
>> > > > >       "about": "The error message, or null if there was no
>> error." }
>> > > > >   ]
>> > > >
>> > > > (Kowshik): Great question!
>> > > > As such, the API is transactional, as explained in the sections
>> linked
>> > > > below.
>> > > > Either all provided FeatureUpdate was applied, or none.
>> > > > It's the reason I felt we can have just one error code + message.
>> > > > Happy to extend this if you feel otherwise. Please let me know.
>> > > >
>> > > > Link to sections:
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees
>> > > >
>> > > > > 202. The /features path in ZK has a field min_version_level. Which
>> > API
>> > > > and
>> > > > > tool can change that value?
>> > > >
>> > > > (Kowshik): Great question! Currently this cannot be modified by
>> using
>> > the
>> > > > API or the tool.
>> > > > Feature version deprecation (by raising min_version_level) can be
>> done
>> > > only
>> > > > by the Controller directly. The rationale is explained in this
>> section:
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation
>> > > >
>> > > >
>> > > > Cheers,
>> > > > Kowshik
>> > > >
>> > > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <ju...@confluent.io> wrote:
>> > > >
>> > > > > Hi, Kowshik,
>> > > > >
>> > > > > Thanks for addressing those comments. Just a few more minor
>> comments.
>> > > > >
>> > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field.
>> It
>> > > seems
>> > > > > that field needs to be persisted somewhere in ZK?
>> > > > >
>> > > > > 201. UpdateFeaturesResponse has the following top level fields.
>> > Should
>> > > > > those fields be per feature?
>> > > > >
>> > > > >   "fields": [
>> > > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
>> > > > >       "about": "The error code, or 0 if there was no error." },
>> > > > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
>> > > > >       "about": "The error message, or null if there was no
>> error." }
>> > > > >   ]
>> > > > >
>> > > > > 202. The /features path in ZK has a field min_version_level. Which
>> > API
>> > > > and
>> > > > > tool can change that value?
>> > > > >
>> > > > > Jun
>> > > > >
>> > > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <
>> > > kprakasam@confluent.io
>> > > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Jun,
>> > > > > >
>> > > > > > Thanks for the feedback! I have updated the KIP-584 addressing
>> your
>> > > > > > comments.
>> > > > > > Please find my response below.
>> > > > > >
>> > > > > > > 100.6 You can look for the sentence "This operation requires
>> > ALTER
>> > > on
>> > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
>> > > > > > > KafkaApis.authorize().
>> > > > > >
>> > > > > > (Kowshik): Done. Great point! For the newly introduced
>> > > UPDATE_FEATURES
>> > > > > api,
>> > > > > > I have added a
>> > > > > > requirement that AclOperation.ALTER is required on
>> > > > ResourceType.CLUSTER.
>> > > > > >
>> > > > > > > 110. Keeping the feature version as int is probably fine. I
>> just
>> > > felt
>> > > > > > that
>> > > > > > > for some of the common user interactions, it's more
>> convenient to
>> > > > > > > relate that to a release version. For example, if a user
>> wants to
>> > > > > > downgrade
>> > > > > > > to a release 2.5, it's easier for the user to use the tool
>> like
>> > > "tool
>> > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X
>> > --version
>> > > > 6".
>> > > > > >
>> > > > > > (Kowshik): Great point. Generally, maximum feature version
>> levels
>> > are
>> > > > not
>> > > > > > downgradable after
>> > > > > > they are finalized in the cluster. This is because, as a
>> guideline
>> > > > > bumping
>> > > > > > feature version level usually is used mainly to convey important
>> > > > breaking
>> > > > > > changes.
>> > > > > > Despite the above, there may be some extreme/rare cases where a
>> > user
>> > > > > wants
>> > > > > > to downgrade
>> > > > > > all features to a specific previous release. The user may want
>> to
>> > do
>> > > > this
>> > > > > > just
>> > > > > > prior to rolling back a Kafka cluster to a previous release.
>> > > > > >
>> > > > > > To support the above, I have made a change to the KIP explaining
>> > that
>> > > > the
>> > > > > > CLI tool is versioned.
>> > > > > > The CLI tool internally has knowledge about a map of features to
>> > > their
>> > > > > > respective max
>> > > > > > versions supported by the Broker. The tool's knowledge of
>> features
>> > > and
>> > > > > > their version values,
>> > > > > > is limited to the version of the CLI tool itself i.e. the
>> > information
>> > > > is
>> > > > > > packaged into the CLI tool
>> > > > > > when it is released. Whenever a Kafka release introduces a new
>> > > feature
>> > > > > > version, or modifies
>> > > > > > an existing feature version, the CLI tool shall also be updated
>> > with
>> > > > this
>> > > > > > information,
>> > > > > > Newer versions of the CLI tool will be released as part of the
>> > Kafka
>> > > > > > releases.
>> > > > > >
>> > > > > > Therefore, to achieve the downgrade need, the user just needs to
>> > run
>> > > > the
>> > > > > > version of
>> > > > > > the CLI tool that's part of the particular previous release that
>> > > he/she
>> > > > > is
>> > > > > > downgrading to.
>> > > > > > To help the user with this, there is a new command added to the
>> CLI
>> > > > tool
>> > > > > > called `downgrade-all`.
>> > > > > > This essentially downgrades max version levels of all features
>> in
>> > the
>> > > > > > cluster to the versions
>> > > > > > known to the CLI tool internally.
>> > > > > >
>> > > > > > I have explained the above in the KIP under these sections:
>> > > > > >
>> > > > > > Tooling support (have explained that the CLI tool is versioned):
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
>> > > > > >
>> > > > > > Regular CLI tool usage (please refer to point #3, and see the
>> > tooling
>> > > > > > example)
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
>> > > > > >
>> > > > > > > 110. Similarly, if the client library finds a feature mismatch
>> > with
>> > > > the
>> > > > > > broker,
>> > > > > > > the client likely needs to log some error message for the
>> user to
>> > > > take
>> > > > > > some
>> > > > > > > actions. It's much more actionable if the error message is
>> > "upgrade
>> > > > the
>> > > > > > > broker to release version 2.6" than just "upgrade the broker
>> to
>> > > > feature
>> > > > > > > version 7".
>> > > > > >
>> > > > > > (Kowshik): That's a really good point! If we use ints for
>> feature
>> > > > > versions,
>> > > > > > the best
>> > > > > > message that client can print for debugging is "broker doesn't
>> > > support
>> > > > > > feature version 7", and alongside that print the supported
>> version
>> > > > range
>> > > > > > returned
>> > > > > > by the broker. Then, does it sound reasonable that the user
>> could
>> > > then
>> > > > > > reference
>> > > > > > Kafka release logs to figure out which version of the broker
>> > release
>> > > is
>> > > > > > required
>> > > > > > be deployed, to support feature version 7? I couldn't think of a
>> > > better
>> > > > > > strategy here.
>> > > > > >
>> > > > > > > 120. When should a developer bump up the version of a feature?
>> > > > > >
>> > > > > > (Kowshik): Great question! In the KIP, I have added a section:
>> > > > > 'Guidelines
>> > > > > > on feature versions and workflows'
>> > > > > > providing some guidelines on when to use the versioned feature
>> > flags,
>> > > > and
>> > > > > > what
>> > > > > > are the regular workflows with the CLI tool.
>> > > > > >
>> > > > > > Link to the relevant sections:
>> > > > > > Guidelines:
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
>> > > > > >
>> > > > > > Regular CLI tool usage:
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
>> > > > > >
>> > > > > > Advanced CLI tool usage:
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
>> > > > > >
>> > > > > >
>> > > > > > Cheers,
>> > > > > > Kowshik
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <ju...@confluent.io>
>> wrote:
>> > > > > >
>> > > > > > > Hi, Kowshik,
>> > > > > > >
>> > > > > > > Thanks for the reply. A few more comments.
>> > > > > > >
>> > > > > > > 110. Keeping the feature version as int is probably fine. I
>> just
>> > > felt
>> > > > > > that
>> > > > > > > for some of the common user interactions, it's more
>> convenient to
>> > > > > > > relate that to a release version. For example, if a user
>> wants to
>> > > > > > downgrade
>> > > > > > > to a release 2.5, it's easier for the user to use the tool
>> like
>> > > "tool
>> > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X
>> > --version
>> > > > 6".
>> > > > > > > Similarly, if the client library finds a feature mismatch with
>> > the
>> > > > > > broker,
>> > > > > > > the client likely needs to log some error message for the
>> user to
>> > > > take
>> > > > > > some
>> > > > > > > actions. It's much more actionable if the error message is
>> > "upgrade
>> > > > the
>> > > > > > > broker to release version 2.6" than just "upgrade the broker
>> to
>> > > > feature
>> > > > > > > version 7".
>> > > > > > >
>> > > > > > > 111. Sounds good.
>> > > > > > >
>> > > > > > > 120. When should a developer bump up the version of a feature?
>> > > > > > >
>> > > > > > > Jun
>> > > > > > >
>> > > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <
>> > > > > kprakasam@confluent.io
>> > > > > > >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi Jun,
>> > > > > > > >
>> > > > > > > > I have updated the KIP for the item 111.
>> > > > > > > > I'm in the process of addressing 100.6, and will provide an
>> > > update
>> > > > > > soon.
>> > > > > > > > I think item 110 is still under discussion given we are now
>> > > > > providing a
>> > > > > > > way
>> > > > > > > > to finalize
>> > > > > > > > all features to their latest version levels. In any case,
>> > please
>> > > > let
>> > > > > us
>> > > > > > > > know
>> > > > > > > > how you feel in response to Colin's comments on this topic.
>> > > > > > > >
>> > > > > > > > > 111. To put this in context, when we had IBP, the default
>> > value
>> > > > is
>> > > > > > the
>> > > > > > > > > current released version. So, if you are a brand new user,
>> > you
>> > > > > don't
>> > > > > > > need
>> > > > > > > > > to configure IBP and all new features will be immediately
>> > > > available
>> > > > > > in
>> > > > > > > > the
>> > > > > > > > > new cluster. If you are upgrading from an old version,
>> you do
>> > > > need
>> > > > > to
>> > > > > > > > > understand and configure IBP. I see a similar pattern here
>> > for
>> > > > > > > > > features. From the ease of use perspective, ideally, we
>> > > shouldn't
>> > > > > > > require
>> > > > > > > > a
>> > > > > > > > > new user to have an extra step such as running a bootstrap
>> > > script
>> > > > > > > unless
>> > > > > > > > > it's truly necessary. If someone has a special need (all
>> the
>> > > > cases
>> > > > > > you
>> > > > > > > > > mentioned seem special cases?), they can configure a mode
>> > such
>> > > > that
>> > > > > > > > > features are enabled/disabled manually.
>> > > > > > > >
>> > > > > > > > (Kowshik): That makes sense, thanks for the idea! Sorry if I
>> > > didn't
>> > > > > > > > understand
>> > > > > > > > this need earlier. I have updated the KIP with the approach
>> > that
>> > > > > > whenever
>> > > > > > > > the '/features' node is absent, the controller by default
>> will
>> > > > > > bootstrap
>> > > > > > > > the node
>> > > > > > > > to contain the latest feature levels. Here is the new
>> section
>> > in
>> > > > the
>> > > > > > KIP
>> > > > > > > > describing
>> > > > > > > > the same:
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
>> > > > > > > >
>> > > > > > > > Next, as I explained in my response to Colin's suggestions,
>> we
>> > > are
>> > > > > now
>> > > > > > > > providing a `--finalize-latest-features` flag with the
>> tooling.
>> > > > This
>> > > > > > lets
>> > > > > > > > the sysadmin finalize all features known to the controller
>> to
>> > > their
>> > > > > > > latest
>> > > > > > > > version
>> > > > > > > > levels. Please look at this section (point #3 and the
>> tooling
>> > > > example
>> > > > > > > > later):
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Do you feel this addresses your comment/concern?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Cheers,
>> > > > > > > > Kowshik
>> > > > > > > >
>> > > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io>
>> > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi, Kowshik,
>> > > > > > > > >
>> > > > > > > > > Thanks for the reply. A few more replies below.
>> > > > > > > > >
>> > > > > > > > > 100.6 You can look for the sentence "This operation
>> requires
>> > > > ALTER
>> > > > > on
>> > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
>> > > > > > > > > KafkaApis.authorize().
>> > > > > > > > >
>> > > > > > > > > 110. From the external client/tooling perspective, it's
>> more
>> > > > > natural
>> > > > > > to
>> > > > > > > > use
>> > > > > > > > > the release version for features. If we can use the same
>> > > release
>> > > > > > > version
>> > > > > > > > > for internal representation, it seems simpler (easier to
>> > > > > understand,
>> > > > > > no
>> > > > > > > > > mapping overhead, etc). Is there a benefit with separate
>> > > external
>> > > > > and
>> > > > > > > > > internal versioning schemes?
>> > > > > > > > >
>> > > > > > > > > 111. To put this in context, when we had IBP, the default
>> > value
>> > > > is
>> > > > > > the
>> > > > > > > > > current released version. So, if you are a brand new user,
>> > you
>> > > > > don't
>> > > > > > > need
>> > > > > > > > > to configure IBP and all new features will be immediately
>> > > > available
>> > > > > > in
>> > > > > > > > the
>> > > > > > > > > new cluster. If you are upgrading from an old version,
>> you do
>> > > > need
>> > > > > to
>> > > > > > > > > understand and configure IBP. I see a similar pattern here
>> > for
>> > > > > > > > > features. From the ease of use perspective, ideally, we
>> > > shouldn't
>> > > > > > > > require a
>> > > > > > > > > new user to have an extra step such as running a bootstrap
>> > > script
>> > > > > > > unless
>> > > > > > > > > it's truly necessary. If someone has a special need (all
>> the
>> > > > cases
>> > > > > > you
>> > > > > > > > > mentioned seem special cases?), they can configure a mode
>> > such
>> > > > that
>> > > > > > > > > features are enabled/disabled manually.
>> > > > > > > > >
>> > > > > > > > > Jun
>> > > > > > > > >
>> > > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
>> > > > > > > kprakasam@confluent.io>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Jun,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for the feedback and suggestions. Please find my
>> > > > response
>> > > > > > > below.
>> > > > > > > > > >
>> > > > > > > > > > > 100.6 For every new request, the admin needs to
>> control
>> > who
>> > > > is
>> > > > > > > > allowed
>> > > > > > > > > to
>> > > > > > > > > > > issue that request if security is enabled. So, we
>> need to
>> > > > > assign
>> > > > > > > the
>> > > > > > > > > new
>> > > > > > > > > > > request a ResourceType and possible AclOperations. See
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
>> > > > > > > > > > > as an example.
>> > > > > > > > > >
>> > > > > > > > > > (Kowshik): I don't see any reference to the words
>> > > ResourceType
>> > > > or
>> > > > > > > > > > AclOperations
>> > > > > > > > > > in the KIP. Please let me know how I can use the KIP
>> that
>> > you
>> > > > > > linked
>> > > > > > > to
>> > > > > > > > > > know how to
>> > > > > > > > > > setup the appropriate ResourceType and/or
>> ClusterOperation?
>> > > > > > > > > >
>> > > > > > > > > > > 105. If we change delete to disable, it's better to do
>> > this
>> > > > > > > > > consistently
>> > > > > > > > > > in
>> > > > > > > > > > > request protocol and admin api as well.
>> > > > > > > > > >
>> > > > > > > > > > (Kowshik): The API shouldn't be called 'disable' when
>> it is
>> > > > > > deleting
>> > > > > > > a
>> > > > > > > > > > feature.
>> > > > > > > > > > I've just changed the KIP to use 'delete'. I don't have
>> a
>> > > > strong
>> > > > > > > > > > preference.
>> > > > > > > > > >
>> > > > > > > > > > > 110. The minVersion/maxVersion for features use int64.
>> > > > > Currently,
>> > > > > > > our
>> > > > > > > > > > > release version schema is major.minor.bugfix (e.g.
>> > 2.5.0).
>> > > > It's
>> > > > > > > > > possible
>> > > > > > > > > > > for new features to be included in minor releases too.
>> > > Should
>> > > > > we
>> > > > > > > make
>> > > > > > > > > the
>> > > > > > > > > > > feature versioning match the release versioning?
>> > > > > > > > > >
>> > > > > > > > > > (Kowshik): The release version can be mapped to a set of
>> > > > feature
>> > > > > > > > > versions,
>> > > > > > > > > > and this can be done, for example in the tool (or even
>> > > external
>> > > > > to
>> > > > > > > the
>> > > > > > > > > > tool).
>> > > > > > > > > > Can you please clarify what I'm missing?
>> > > > > > > > > >
>> > > > > > > > > > > 111. "During regular operations, the data in the ZK
>> node
>> > > can
>> > > > be
>> > > > > > > > mutated
>> > > > > > > > > > > only via a specific admin API served only by the
>> > > > controller." I
>> > > > > > am
>> > > > > > > > > > > wondering why can't the controller auto finalize a
>> > feature
>> > > > > > version
>> > > > > > > > > after
>> > > > > > > > > > > all brokers are upgraded? For new users who download
>> the
>> > > > latest
>> > > > > > > > version
>> > > > > > > > > > to
>> > > > > > > > > > > build a new cluster, it's inconvenient for them to
>> have
>> > to
>> > > > > > manually
>> > > > > > > > > > enable
>> > > > > > > > > > > each feature.
>> > > > > > > > > >
>> > > > > > > > > > (Kowshik): I agree that there is a trade-off here, but
>> it
>> > > will
>> > > > > help
>> > > > > > > > > > to decide whether the automation can be thought through
>> in
>> > > the
>> > > > > > future
>> > > > > > > > > > in a follow up KIP, or right now in this KIP. We may
>> invest
>> > > > > > > > > > in automation, but we have to decide whether we should
>> do
>> > it
>> > > > > > > > > > now or later.
>> > > > > > > > > >
>> > > > > > > > > > For the inconvenience that you mentioned, do you think
>> the
>> > > > > problem
>> > > > > > > that
>> > > > > > > > > you
>> > > > > > > > > > mentioned can be  overcome by asking for the cluster
>> > operator
>> > > > to
>> > > > > > run
>> > > > > > > a
>> > > > > > > > > > bootstrap script  when he/she knows that a specific AK
>> > > release
>> > > > > has
>> > > > > > > been
>> > > > > > > > > > almost completely deployed in a cluster for the first
>> time?
>> > > > Idea
>> > > > > is
>> > > > > > > > that
>> > > > > > > > > > the
>> > > > > > > > > > bootstrap script will know how to map a specific AK
>> release
>> > > to
>> > > > > > > > finalized
>> > > > > > > > > > feature versions, and run the `kafka-features.sh` tool
>> > > > > > appropriately
>> > > > > > > > > > against
>> > > > > > > > > > the cluster.
>> > > > > > > > > >
>> > > > > > > > > > Now, coming back to your automation proposal/question.
>> > > > > > > > > > I do see the value of automated feature version
>> > finalization,
>> > > > > but I
>> > > > > > > > also
>> > > > > > > > > > see
>> > > > > > > > > > that this will open up several questions and some
>> risks, as
>> > > > > > explained
>> > > > > > > > > > below.
>> > > > > > > > > > The answers to these depend on the definition of the
>> > > automation
>> > > > > we
>> > > > > > > > choose
>> > > > > > > > > > to build, and how well does it fit into a kafka
>> deployment.
>> > > > > > > > > > Basically, it can be unsafe for the controller to
>> finalize
>> > > > > feature
>> > > > > > > > > version
>> > > > > > > > > > upgrades automatically, without learning about the
>> intent
>> > of
>> > > > the
>> > > > > > > > cluster
>> > > > > > > > > > operator.
>> > > > > > > > > > 1. We would sometimes want to lock feature versions only
>> > when
>> > > > we
>> > > > > > have
>> > > > > > > > > > externally verified
>> > > > > > > > > > the stability of the broker binary.
>> > > > > > > > > > 2. Sometimes only the cluster operator knows that a
>> cluster
>> > > > > upgrade
>> > > > > > > is
>> > > > > > > > > > complete,
>> > > > > > > > > > and new brokers are highly unlikely to join the cluster.
>> > > > > > > > > > 3. Only the cluster operator knows that the intent is to
>> > > deploy
>> > > > > the
>> > > > > > > > same
>> > > > > > > > > > version
>> > > > > > > > > > of the new broker release across the entire cluster
>> (i.e.
>> > the
>> > > > > > latest
>> > > > > > > > > > downloaded version).
>> > > > > > > > > > 4. For downgrades, it appears the controller still needs
>> > some
>> > > > > > > external
>> > > > > > > > > > input
>> > > > > > > > > > (such as the proposed tool) to finalize a feature
>> version
>> > > > > > downgrade.
>> > > > > > > > > >
>> > > > > > > > > > If we have automation, that automation can end up
>> failing
>> > in
>> > > > some
>> > > > > > of
>> > > > > > > > the
>> > > > > > > > > > cases
>> > > > > > > > > > above. Then, we need a way to declare that the cluster
>> is
>> > > "not
>> > > > > > ready"
>> > > > > > > > if
>> > > > > > > > > > the
>> > > > > > > > > > controller cannot automatically finalize some basic
>> > required
>> > > > > > feature
>> > > > > > > > > > version
>> > > > > > > > > > upgrades across the cluster. We need to make the cluster
>> > > > operator
>> > > > > > > aware
>> > > > > > > > > in
>> > > > > > > > > > such a scenario (raise an alert or alike).
>> > > > > > > > > >
>> > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
>> should
>> > be
>> > > 49
>> > > > > > > instead
>> > > > > > > > > of
>> > > > > > > > > > 48.
>> > > > > > > > > >
>> > > > > > > > > > (Kowshik): Done.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Cheers,
>> > > > > > > > > > Kowshik
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <
>> jun@confluent.io>
>> > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi, Kowshik,
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks for the reply. A few more comments below.
>> > > > > > > > > > >
>> > > > > > > > > > > 100.6 For every new request, the admin needs to
>> control
>> > who
>> > > > is
>> > > > > > > > allowed
>> > > > > > > > > to
>> > > > > > > > > > > issue that request if security is enabled. So, we
>> need to
>> > > > > assign
>> > > > > > > the
>> > > > > > > > > new
>> > > > > > > > > > > request a ResourceType and possible AclOperations. See
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
>> > > > > > > > > > > as
>> > > > > > > > > > > an example.
>> > > > > > > > > > >
>> > > > > > > > > > > 105. If we change delete to disable, it's better to do
>> > this
>> > > > > > > > > consistently
>> > > > > > > > > > in
>> > > > > > > > > > > request protocol and admin api as well.
>> > > > > > > > > > >
>> > > > > > > > > > > 110. The minVersion/maxVersion for features use int64.
>> > > > > Currently,
>> > > > > > > our
>> > > > > > > > > > > release version schema is major.minor.bugfix (e.g.
>> > 2.5.0).
>> > > > It's
>> > > > > > > > > possible
>> > > > > > > > > > > for new features to be included in minor releases too.
>> > > Should
>> > > > > we
>> > > > > > > make
>> > > > > > > > > the
>> > > > > > > > > > > feature versioning match the release versioning?
>> > > > > > > > > > >
>> > > > > > > > > > > 111. "During regular operations, the data in the ZK
>> node
>> > > can
>> > > > be
>> > > > > > > > mutated
>> > > > > > > > > > > only via a specific admin API served only by the
>> > > > controller." I
>> > > > > > am
>> > > > > > > > > > > wondering why can't the controller auto finalize a
>> > feature
>> > > > > > version
>> > > > > > > > > after
>> > > > > > > > > > > all brokers are upgraded? For new users who download
>> the
>> > > > latest
>> > > > > > > > version
>> > > > > > > > > > to
>> > > > > > > > > > > build a new cluster, it's inconvenient for them to
>> have
>> > to
>> > > > > > manually
>> > > > > > > > > > enable
>> > > > > > > > > > > each feature.
>> > > > > > > > > > >
>> > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
>> should
>> > be
>> > > 49
>> > > > > > > instead
>> > > > > > > > > of
>> > > > > > > > > > > 48.
>> > > > > > > > > > >
>> > > > > > > > > > > Jun
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
>> > > > > > > > > kprakasam@confluent.io>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hey Jun,
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks a lot for the great feedback! Please note
>> that
>> > the
>> > > > > > design
>> > > > > > > > > > > > has changed a little bit on the KIP, and we now
>> > propagate
>> > > > the
>> > > > > > > > > finalized
>> > > > > > > > > > > > features metadata only via ZK watches (instead of
>> > > > > > > > > UpdateMetadataRequest
>> > > > > > > > > > > > from the controller).
>> > > > > > > > > > > >
>> > > > > > > > > > > > Please find below my response to your
>> > questions/feedback,
>> > > > > with
>> > > > > > > the
>> > > > > > > > > > prefix
>> > > > > > > > > > > > "(Kowshik):".
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
>> > > > > > > > > > > > > 100.1 Since this request waits for responses from
>> > > > brokers,
>> > > > > > > should
>> > > > > > > > > we
>> > > > > > > > > > > add
>> > > > > > > > > > > > a
>> > > > > > > > > > > > > timeout in the request (like createTopicRequest)?
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): Great point! Done. I have added a timeout
>> > > field.
>> > > > > > Note:
>> > > > > > > > we
>> > > > > > > > > no
>> > > > > > > > > > > > longer
>> > > > > > > > > > > > wait for responses from brokers, since the design
>> has
>> > > been
>> > > > > > > changed
>> > > > > > > > so
>> > > > > > > > > > > that
>> > > > > > > > > > > > the
>> > > > > > > > > > > > features information is propagated via ZK.
>> > Nevertheless,
>> > > it
>> > > > > is
>> > > > > > > > right
>> > > > > > > > > to
>> > > > > > > > > > > > have a timeout
>> > > > > > > > > > > > for the request.
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 100.2 The response schema is a bit weird.
>> Typically,
>> > > the
>> > > > > > > response
>> > > > > > > > > > just
>> > > > > > > > > > > > > shows an error code and an error message, instead
>> of
>> > > > > echoing
>> > > > > > > the
>> > > > > > > > > > > request.
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified it to
>> > just
>> > > > > return
>> > > > > > > an
>> > > > > > > > > > error
>> > > > > > > > > > > > code and a message.
>> > > > > > > > > > > > Previously it was not echoing the "request", rather
>> it
>> > > was
>> > > > > > > > returning
>> > > > > > > > > > the
>> > > > > > > > > > > > latest set of
>> > > > > > > > > > > > cluster-wide finalized features (after applying the
>> > > > updates).
>> > > > > > But
>> > > > > > > > you
>> > > > > > > > > > are
>> > > > > > > > > > > > right,
>> > > > > > > > > > > > the additional info is not required, so I have
>> removed
>> > it
>> > > > > from
>> > > > > > > the
>> > > > > > > > > > > response
>> > > > > > > > > > > > schema.
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 100.3 Should we add a separate request to
>> > list/describe
>> > > > the
>> > > > > > > > > existing
>> > > > > > > > > > > > > features?
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): This is already present in the KIP via
>> the
>> > > > > > > > > > 'DescribeFeatures'
>> > > > > > > > > > > > Admin API,
>> > > > > > > > > > > > which, underneath covers uses the
>> ApiVersionsRequest to
>> > > > > > > > list/describe
>> > > > > > > > > > the
>> > > > > > > > > > > > existing features. Please read the 'Tooling support'
>> > > > section.
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a
>> > > single
>> > > > > > > request.
>> > > > > > > > > For
>> > > > > > > > > > > > > DELETE, the version field doesn't make sense. So,
>> I
>> > > guess
>> > > > > the
>> > > > > > > > > broker
>> > > > > > > > > > > just
>> > > > > > > > > > > > > ignores this? An alternative way is to have a
>> > separate
>> > > > > > > > > > > > DeleteFeaturesRequest
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP now
>> to
>> > > > have 2
>> > > > > > > > > separate
>> > > > > > > > > > > > controller APIs
>> > > > > > > > > > > > serving these different purposes:
>> > > > > > > > > > > > 1. updateFeatures
>> > > > > > > > > > > > 2. deleteFeatures
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
>> > > > monotonically
>> > > > > > > > > > increasing
>> > > > > > > > > > > > > version of the metadata for finalized features."
>> I am
>> > > > > > wondering
>> > > > > > > > why
>> > > > > > > > > > the
>> > > > > > > > > > > > > ordering is important?
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): In the latest KIP write-up, it is called
>> > epoch
>> > > > > > > (instead
>> > > > > > > > of
>> > > > > > > > > > > > version), and
>> > > > > > > > > > > > it is just the ZK node version. Basically, this is
>> the
>> > > > epoch
>> > > > > > for
>> > > > > > > > the
>> > > > > > > > > > > > cluster-wide
>> > > > > > > > > > > > finalized feature version metadata. This metadata is
>> > > served
>> > > > > to
>> > > > > > > > > clients
>> > > > > > > > > > > via
>> > > > > > > > > > > > the
>> > > > > > > > > > > > ApiVersionsResponse (for reads). We propagate
>> updates
>> > > from
>> > > > > the
>> > > > > > > > > > > '/features'
>> > > > > > > > > > > > ZK node
>> > > > > > > > > > > > to all brokers, via ZK watches setup by each broker
>> on
>> > > the
>> > > > > > > > > '/features'
>> > > > > > > > > > > > node.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Now here is why the ordering is important:
>> > > > > > > > > > > > ZK watches don't propagate at the same time. As a
>> > result,
>> > > > the
>> > > > > > > > > > > > ApiVersionsResponse
>> > > > > > > > > > > > is eventually consistent across brokers. This can
>> > > introduce
>> > > > > > cases
>> > > > > > > > > > > > where clients see an older lower epoch of the
>> features
>> > > > > > metadata,
>> > > > > > > > > after
>> > > > > > > > > > a
>> > > > > > > > > > > > more recent
>> > > > > > > > > > > > higher epoch was returned at a previous point in
>> time.
>> > We
>> > > > > > expect
>> > > > > > > > > > clients
>> > > > > > > > > > > > to always employ the rule that the latest received
>> > higher
>> > > > > epoch
>> > > > > > > of
>> > > > > > > > > > > metadata
>> > > > > > > > > > > > always trumps an older smaller epoch. Those clients
>> > that
>> > > > are
>> > > > > > > > external
>> > > > > > > > > > to
>> > > > > > > > > > > > Kafka should strongly consider discovering the
>> latest
>> > > > > metadata
>> > > > > > > once
>> > > > > > > > > > > during
>> > > > > > > > > > > > startup from the brokers, and if required refresh
>> the
>> > > > > metadata
>> > > > > > > > > > > periodically
>> > > > > > > > > > > > (to get the latest metadata).
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 100.6 Could you specify the required ACL for this
>> new
>> > > > > > request?
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): What is ACL, and how could I find out
>> which
>> > > one
>> > > > to
>> > > > > > > > > specify?
>> > > > > > > > > > > > Please could you provide me some pointers? I'll be
>> glad
>> > > to
>> > > > > > update
>> > > > > > > > the
>> > > > > > > > > > > > KIP once I know the next steps.
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 101. For the broker registration ZK node, should
>> we
>> > > bump
>> > > > up
>> > > > > > the
>> > > > > > > > > > version
>> > > > > > > > > > > > in
>> > > > > > > > > > > > the json?
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): Great point! Done. I've increased the
>> > version
>> > > in
>> > > > > the
>> > > > > > > > > broker
>> > > > > > > > > > > json
>> > > > > > > > > > > > by 1.
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 102. For the /features ZK node, not sure if we
>> need
>> > the
>> > > > > epoch
>> > > > > > > > > field.
>> > > > > > > > > > > Each
>> > > > > > > > > > > > > ZK node has an internal version field that is
>> > > incremented
>> > > > > on
>> > > > > > > > every
>> > > > > > > > > > > > update.
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK node
>> > > version
>> > > > > > now,
>> > > > > > > > > > instead
>> > > > > > > > > > > of
>> > > > > > > > > > > > explicitly
>> > > > > > > > > > > > incremented epoch.
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 103. "Enabling the actual semantics of a feature
>> > > version
>> > > > > > > > > cluster-wide
>> > > > > > > > > > > is
>> > > > > > > > > > > > > left to the discretion of the logic implementing
>> the
>> > > > > feature
>> > > > > > > (ex:
>> > > > > > > > > can
>> > > > > > > > > > > be
>> > > > > > > > > > > > > done via dynamic broker config)." Does that mean
>> the
>> > > > broker
>> > > > > > > > > > > registration
>> > > > > > > > > > > > ZK
>> > > > > > > > > > > > > node will be updated dynamically when this
>> happens?
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): Not really. The text was just conveying
>> > that a
>> > > > > > broker
>> > > > > > > > > could
>> > > > > > > > > > > > "know" of
>> > > > > > > > > > > > a new feature version, but it does not mean the
>> broker
>> > > > should
>> > > > > > > have
>> > > > > > > > > also
>> > > > > > > > > > > > activated the effects of the feature version.
>> Knowing
>> > vs
>> > > > > > > activation
>> > > > > > > > > > are 2
>> > > > > > > > > > > > separate things,
>> > > > > > > > > > > > and the latter can be achieved by dynamic config. I
>> > have
>> > > > > > reworded
>> > > > > > > > the
>> > > > > > > > > > > text
>> > > > > > > > > > > > to
>> > > > > > > > > > > > make this clear to the reader.
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > 104. UpdateMetadataRequest
>> > 104.1 It would be useful to describe when the feature metadata is included
>> > in the request. My understanding is that it's only included if (1) there
>> > is a change to the finalized feature; (2) broker restart; (3) controller
>> > failover.
>> > 104.2 The new fields have the following versions. Why are the versions 3+
>> > when the top version is bumped to 6?
>> >       "fields":  [
>> >         {"name": "Name", "type": "string", "versions": "3+",
>> >           "about": "The name of the feature."},
>> >         {"name": "Version", "type": "int64", "versions": "3+",
>> >           "about": "The finalized version for the feature."}
>> >       ]
>>
>> (Kowshik): With the new improved design, we have completely eliminated the
>> need to use UpdateMetadataRequest. This is because we now rely on ZK to
>> deliver the notifications for changes to the '/features' ZK node.
>>
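Conceptually, the notification path is just a ZooKeeper watch on the '/features' node. The sketch below uses the raw ZooKeeper client purely for illustration; the real broker would go through Kafka's own ZK client and change-notification machinery:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class FeaturesWatchExample {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });
            Watcher watcher = new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    if (event.getType() == Event.EventType.NodeDataChanged) {
                        System.out.println("'/features' changed; re-read the finalized features");
                    }
                }
            };
            // Registers a one-shot watch; a real client re-registers after each notification.
            zk.getData("/features", watcher, new Stat());
            Thread.sleep(60_000); // keep the session alive long enough to observe an update
            zk.close();
        }
    }
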
>> > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
>> > better to use enable/disable?
>>
>> (Kowshik): For delete, yes, I have changed it so that we instead call it
>> 'disable'. However, 'update' can now refer to either an upgrade or a forced
>> downgrade. Therefore, I have left it the way it is, just calling it
>> 'update'.
>>
>>
>> Cheers,
>> Kowshik
>>
>> On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <jun@confluent.io> wrote:
>>
>> > Hi, Kowshik,
>> >
>> > Thanks for the KIP. Looks good overall. A few comments below.
>> >
>> > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
>> > 100.1 Since this request waits for responses from brokers, should we add a
>> > timeout in the request (like createTopicRequest)?
>> > 100.2 The response schema is a bit weird. Typically, the response just
>> > shows an error code and an error message, instead of echoing the request.
>> > 100.3 Should we add a separate request to list/describe the existing
>> > features?
>> > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
>> > DELETE, the version field doesn't make sense. So, I guess the broker just
>> > ignores this? An alternative way is to have a separate
>> > DeleteFeaturesRequest.
>> > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
>> > version of the metadata for finalized features." I am wondering why the
>> > ordering is important?
>> > 100.6 Could you specify the required ACL for this new request?
>> >
>> > 101. For the broker registration ZK node, should we bump up the version in
>> > the json?
>> >
>> > 102. For the /features ZK node, not sure if we need the epoch field. Each
>> > ZK node has an internal version field that is incremented on every update.
>> >
>> > 103. "Enabling the actual semantics of a feature version cluster-wide is
>> > left to the discretion of the logic implementing the feature (ex: can be
>> > done via dynamic broker config)." Does that mean the broker registration
>> > ZK node will be updated dynamically when this happens?
>> >
>> > 104. UpdateMetadataRequest
>> > 104.1 It would be useful to describe when the feature metadata is included
>> > in the request. My understanding is that it's only included if (1) there
>> > is a change to the finalized feature; (2) broker restart; (3) controller
>> > failover.
>> > 104.2 The new fields have the following versions. Why are the versions 3+
>> > when the top version is bumped to 6?
>> >       "fields":  [
>> >         {"name": "Name", "type": "string", "versions": "3+",
>> >           "about": "The name of the feature."},
>> >         {"name": "Version", "type": "int64", "versions": "3+",
>> >           "about": "The finalized version for the feature."}
>> >       ]
>> >
>> > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
>> > better to use enable/disable?
>> >
>> > Jun
>> >
>> > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmccabe@apache.org> wrote:
>> >
>> > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
>> > > > Hi Colin,
>> > > >
>> > > > Thanks for the feedback! I've changed the KIP to address your
>> > > > suggestions. Please find below my explanation. Here is a link to KIP 584:
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
>> > > >
>> > > > 1. '__data_version__' is the version of the finalized feature metadata
>> > > > (i.e. actual ZK node contents), while the '__schema_version__' is the
>> > > > version of the schema of the data persisted in ZK. These serve different
>> > > > purposes. '__data_version__' is useful mainly to clients during reads,
>> > > > to differentiate between the 2 versions of eventually consistent
>> > > > 'finalized features' metadata (i.e. larger metadata version is more
>> > > > recent). '__schema_version__' provides an additional degree of
>> > > > flexibility, where if we decide to change the schema for the '/features'
>> > > > node in ZK (in the future), then we can manage broker roll outs suitably
>> > > > (i.e. serialization/deserialization of the ZK data can be handled
>> > > > safely).
>> > >
>> > > Hi Kowshik,
>> > >
>> > > If you're talking about a number that lets you know if data is more or
>> > > less recent, we would typically call that an epoch, and not a version.
>> > > For the ZK data structures, the word "version" is typically reserved for
>> > > describing changes to the overall schema of the data that is written to
>> > > ZooKeeper.  We don't even really change the "version" of those schemas
>> > > that much, since most changes are backwards-compatible.  But we do
>> > > include that version field just in case.
>> > >
>> > > I don't think we really need an epoch here, though, since we can just
>> > > look at the broker epoch.  Whenever the broker registers, its epoch will
>> > > be greater than the previous broker epoch.  And the newly registered
>> > > data will take priority.  This will be a lot simpler than adding a
>> > > separate epoch system, I think.
>> > >
>> > > > 2. Regarding admin client needing min and max information - you are
>> > > > right! I've changed the KIP such that the Admin API also allows the
>> > > > user to read 'supported features' from a specific broker. Please look
>> > > > at the section "Admin API changes".
>> > >
>> > > Thanks.
>> > >
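As a sketch of what the read side could eventually look like from the Java Admin client: the method names below are modelled only loosely on the KIP's draft "Admin API changes" section and should be read as placeholders rather than a finalized API.

    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.FeatureMetadata;

    public class DescribeFeaturesExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // Placeholder call modelled on the KIP's draft "Admin API changes":
                // fetch the cluster-wide finalized features and the supported
                // version ranges advertised by the brokers.
                FeatureMetadata metadata = admin.describeFeatures().featureMetadata().get();
                System.out.println("finalized: " + metadata.finalizedFeatures());
                System.out.println("supported: " + metadata.supportedFeatures());
            }
        }
    }
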
>> > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate. I've
>> > > > improved the KIP to just use `long` at all places.
>> > >
>> > > Sounds good.
>> > >
>> > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right! I've
>> > > > updated the KIP sketching the functionality provided by this tool,
>> > > > with some examples. Please look at the section "Tooling support
>> > > > examples".
>> > > >
>> > > > Thank you!
>> > >
>> > > Thanks, Kowshik.
>> > >
>> > > cheers,
>> > > Colin
>> > >
>> > > > > > > > > > > > > > > >
>> > > >
>> > > > Cheers,
>> > > > Kowshik
>> > > >
>> > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cmccabe@apache.org> wrote:
>> > > >
>> > > > > Thanks, Kowshik, this looks good.
>> > > > >
>> > > > > In the "Schema" section, do we really need both __schema_version__ and
>> > > > > __data_version__?  Can we just have a single version field here?
>> > > > >
>> > > > > Shouldn't the Admin(Client) function have some way to get the min and
>> > > > > max information that we're exposing as well?  I guess we could have
>> > > > > min, max, and current.  Unrelated: is the use of Long rather than long
>> > > > > deliberate here?
>> > > > >
>> > > > > It would be good to describe how the command line tool
>> > > > > kafka.admin.FeatureCommand will work.  For example the flags that it
>> > > > > will take and the output that it will generate to STDOUT.
>> > > > >
>> > > > > cheers,
>> > > > > Colin
>> > > > >
>> > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
>> > > > > > Hi all,
>> > > > > >
>> > > > > > I've opened KIP-584 which is intended to provide a versioning scheme
>> > > > > > for features. I'd like to use this thread to discuss the same. I'd
>> > > > > > appreciate any feedback on this. Here is a link to KIP-584:
>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
>> > > > > >
>> > > > > > Thank you!
>> > > > > >
>> > > > > > Cheers,
>> > > > > > Kowshik

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Dhruvil Shah <dh...@confluent.io>.
Hi Kowshik,

Thanks for the KIP, this is exciting!

The KIP includes examples on how operators could use the command line
utility, etc. It would be great to add some high-level details on how the
upgrade workflow changes overall with the addition of feature versions.

- Dhruvil

On Wed, Apr 15, 2020 at 6:29 PM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hi Jun,
>
> Sorry the links were broken in my last response, here are the right links:
>
> 200.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
>
> 110.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Whentouseversionedfeatureflags?
>
>
> Cheers,
> Kowshik
>
> On Wed, Apr 15, 2020 at 6:24 PM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> >
> > Hi Jun,
> >
> > Thanks for the feedback! I have addressed the comments in the KIP.
> >
> > > 200. In the validation section, there is still the text  "*from*
> > > {"max_version_level": X} *to* {"max_version_level": X’}". It seems that
> > > it should say "from X to Y"?
> >
> > (Kowshik): Done. I have reworded it a bit to make it clearer now in this
> > section:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
> >
> > > 110. Could we add that we need to document the bumped version of each
> > > feature in the upgrade section of a release?
> >
> > (Kowshik): Great point! Done, I have mentioned it in #3 of this section:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Whentouseversionedfeatureflags?
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Wed, Apr 15, 2020 at 4:00 PM Jun Rao <ju...@confluent.io> wrote:
> >
> >> Hi, Kowshik,
> >>
> >> Looks good to me now. Just a couple of minor things below.
> >>
> >> 200. In the validation section, there is still the text  "*from*
> >> {"max_version_level":
> >> X} *to* {"max_version_level": X’}". It seems that it should say "from X
> to
> >> Y"?
> >>
> >> 110. Could we add that we need to document the bumped version of each
> >> feature in the upgrade section of a release?
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Wed, Apr 15, 2020 at 1:08 PM Kowshik Prakasam <
> kprakasam@confluent.io>
> >> wrote:
> >>
> >> > Hi Jun,
> >> >
> >> > Thank you for the suggestion! I have updated the KIP, please find my
> >> > response below.
> >> >
> >> > > 200. I guess you are saying only when the allowDowngrade field is
> set,
> >> > the
> >> > > finalized feature version can go backward. Otherwise, it can only go
> >> up.
> >> > > That makes sense. It would be useful to make that clear when
> >> explaining
> >> > > the usage of the allowDowngrade field. In the validation section, we
> >> > have  "
> >> > > /features' from {"max_version_level": X} to {"max_version_level":
> >> X’}",
> >> > it
> >> > > seems that we need to mention Y there.
> >> >
> >> > (Kowshik): Great point! Yes, that is correct. Done, I have updated the
> >> > validations
> >> > section explaining the above. Here is a link to this section:
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
> >> >
> >> >
> >> > Cheers,
> >> > Kowshik
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Apr 15, 2020 at 11:05 AM Jun Rao <ju...@confluent.io> wrote:
> >> >
> >> > > Hi, Kowshik,
> >> > >
> >> > > 200. I guess you are saying only when the allowDowngrade field is
> set,
> >> > the
> >> > > finalized feature version can go backward. Otherwise, it can only go
> >> up.
> >> > > That makes sense. It would be useful to make that clear when
> >> explaining
> >> > > the usage of the allowDowngrade field. In the validation section, we
> >> have
> >> > > "
> >> > > /features' from {"max_version_level": X} to {"max_version_level":
> >> X’}",
> >> > it
> >> > > seems that we need to mention Y there.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jun
> >> > >
> >> > > On Wed, Apr 15, 2020 at 10:44 AM Kowshik Prakasam <
> >> > kprakasam@confluent.io>
> >> > > wrote:
> >> > >
> >> > > > Hi Jun,
> >> > > >
> >> > > > Great question! Please find my response below.
> >> > > >
> >> > > > > 200. My understanding is that If the CLI tool passes the
> >> > > > > '--allow-downgrade' flag when updating a specific feature, then a
> >> > > > > future downgrade is possible. Otherwise, the feature is not
> >> > > > > downgradable. If so, I was wondering how the controller remembers
> >> > > > > this since it can be restarted over time?
> >> > > >
> >> > > > (Kowshik): The purpose of the flag was to just restrict the user
> >> > > > intent for a specific request.
> >> > > > It seems to me that to avoid confusion, I could call the flag
> >> > > > `--try-downgrade` instead.
> >> > > > Then this makes it clear that the controller just has to consider the
> >> > > > ask from the user as an explicit request to attempt a downgrade.
> >> > > >
> >> > > > The flag does not act as an override on the controller's decision
> >> > > > making that decides whether a flag is downgradable (these decisions on
> >> > > > whether to allow a flag to be downgraded from a specific version
> >> > > > level can be embedded in the controller code).
> >> > > >
> >> > > > Please let me know what you think.
> >> > > > Sorry if I misunderstood the original question.
> >> > > >
> >> > > >
> >> > > > Cheers,
> >> > > > Kowshik
> >> > > >
> >> > > >
> >> > > > On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <ju...@confluent.io> wrote:
> >> > > >
> >> > > > > Hi, Kowshik,
> >> > > > >
> >> > > > > Thanks for the reply. Makes sense. Just one more question.
> >> > > > >
> >> > > > > 200. My understanding is that If the CLI tool passes the
> >> > > > > '--allow-downgrade' flag when updating a specific feature, then
> a
> >> > > future
> >> > > > > downgrade is possible. Otherwise, the feature is now
> >> downgradable. If
> >> > > > so, I
> >> > > > > was wondering how the controller remembers this since it can be
> >> > > restarted
> >> > > > > over time?
> >> > > > >
> >> > > > > Jun
> >> > > > >
> >> > > > >
> >> > > > > On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam <
> >> > > kprakasam@confluent.io
> >> > > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hi Jun,
> >> > > > > >
> >> > > > > > Thanks a lot for the feedback and the questions!
> >> > > > > > Please find my response below.
> >> > > > > >
> >> > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field.
> >> > > > > > > It seems that field needs to be persisted somewhere in ZK?
> >> > > > > >
> >> > > > > > (Kowshik): Great question! Below is my explanation. Please help me
> >> > > > > > understand, if you feel there are cases where we would need to
> >> > > > > > still persist it in ZK.
> >> > > > > >
> >> > > > > > Firstly I have updated my thoughts into the KIP now, under the
> >> > > > > > 'guidelines' section:
> >> > > > > >
> >> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> >> > > > > >
> >> > > > > > The allowDowngrade boolean field is just to restrict the user
> >> > > > > > intent, and to remind them to double check their intent before
> >> > > > > > proceeding. It should be set to true by the user in a request, only
> >> > > > > > when the user intent is to forcefully "attempt" a downgrade of a
> >> > > > > > specific feature's max version level, to the provided value in the
> >> > > > > > request.
> >> > > > > >
> >> > > > > > We can extend this safeguard. The controller (on its end) can
> >> > > > > > maintain rules in the code that, for safety reasons, would outright
> >> > > > > > reject certain downgrades from a specific max_version_level for a
> >> > > > > > specific feature. Such rejections may happen depending on the
> >> > > > > > feature being downgraded, and from what version level.
> >> > > > > >
> >> > > > > > The CLI tool only allows a downgrade attempt in conjunction with
> >> > > > > > specific flags and sub-commands. For example, in the CLI tool, if
> >> > > > > > the user uses the 'downgrade-all' command, or passes the
> >> > > > > > '--allow-downgrade' flag when updating a specific feature, only
> >> > > > > > then will the tool translate this ask to setting the
> >> > > > > > 'allowDowngrade' field in the request to the server.
> >> > > > > >
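To make the division of responsibility concrete: the allowDowngrade field only carries user intent, while the controller can keep its own hard-coded safety rules. A hypothetical sketch of such a guard (the feature name, levels, and floor rule are invented for illustration):

    import java.util.Map;

    // Hypothetical controller-side guard: even with allowDowngrade set, certain
    // downgrades can be rejected outright by rules baked into the controller code.
    public class DowngradeGuardExample {
        // Lowest max_version_level the controller will ever downgrade a feature to
        // (illustrative values only).
        static final Map<String, Integer> FLOOR = Map.of("group_coordinator", 2);

        static boolean canFinalize(String feature, int currentLevel, int requestedLevel,
                                   boolean allowDowngrade) {
            if (requestedLevel >= currentLevel) {
                return true;                  // upgrade or no-op: always allowed
            }
            if (!allowDowngrade) {
                return false;                 // a downgrade needs explicit user intent
            }
            // Intent is explicit, but the safety floor still applies.
            return requestedLevel >= FLOOR.getOrDefault(feature, 1);
        }

        public static void main(String[] args) {
            System.out.println(canFinalize("group_coordinator", 3, 2, true));  // true
            System.out.println(canFinalize("group_coordinator", 3, 1, true));  // false: below floor
            System.out.println(canFinalize("group_coordinator", 3, 2, false)); // false: no intent
        }
    }
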
> >> > > > > > > 201. UpdateFeaturesResponse has the following top level fields.
> >> > > > > > > Should those fields be per feature?
> >> > > > > > >
> >> > > > > > >   "fields": [
> >> > > > > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> >> > > > > > >       "about": "The error code, or 0 if there was no error." },
> >> > > > > > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
> >> > > > > > >       "about": "The error message, or null if there was no error." }
> >> > > > > > >   ]
> >> > > > > >
> >> > > > > > (Kowshik): Great question!
> >> > > > > > As such, the API is transactional, as explained in the sections
> >> > > > > > linked below.
> >> > > > > > Either all provided FeatureUpdates are applied, or none.
> >> > > > > > It's the reason I felt we can have just one error code + message.
> >> > > > > > Happy to extend this if you feel otherwise. Please let me know.
> >> > > > > >
> >> > > > > > Link to sections:
> >> > > > > >
> >> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController
> >> > > > > >
> >> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees
> >> > > > > >
> >> > > > > > > 202. The /features path in ZK has a field min_version_level.
> >> > > > > > > Which API and tool can change that value?
> >> > > > > >
> >> > > > > > (Kowshik): Great question! Currently this cannot be modified by
> >> > > > > > using the API or the tool.
> >> > > > > > Feature version deprecation (by raising min_version_level) can be
> >> > > > > > done only by the Controller directly. The rationale is explained
> >> > > > > > in this section:
> >> > > > > >
> >> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation
> >> > > > > >
> >> > > > > >
> >> > > > > > Cheers,
> >> > > > > > Kowshik
> >> > > > > >
> >> > > > > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <ju...@confluent.io>
> >> wrote:
> >> > > > > >
> >> > > > > > > Hi, Kowshik,
> >> > > > > > >
> >> > > > > > > Thanks for addressing those comments. Just a few more minor
> >> > > comments.
> >> > > > > > >
> >> > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade
> >> field.
> >> > It
> >> > > > > seems
> >> > > > > > > that field needs to be persisted somewhere in ZK?
> >> > > > > > >
> >> > > > > > > 201. UpdateFeaturesResponse has the following top level
> >> fields.
> >> > > > Should
> >> > > > > > > those fields be per feature?
> >> > > > > > >
> >> > > > > > >   "fields": [
> >> > > > > > >     { "name": "ErrorCode", "type": "int16", "versions":
> "0+",
> >> > > > > > >       "about": "The error code, or 0 if there was no error."
> >> },
> >> > > > > > >     { "name": "ErrorMessage", "type": "string", "versions":
> >> "0+",
> >> > > > > > >       "about": "The error message, or null if there was no
> >> > error."
> >> > > }
> >> > > > > > >   ]
> >> > > > > > >
> >> > > > > > > 202. The /features path in ZK has a field min_version_level.
> >> > Which
> >> > > > API
> >> > > > > > and
> >> > > > > > > tool can change that value?
> >> > > > > > >
> >> > > > > > > Jun
> >> > > > > > >
> >> > > > > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <
> >> > > > > kprakasam@confluent.io
> >> > > > > > >
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Hi Jun,
> >> > > > > > > >
> >> > > > > > > > Thanks for the feedback! I have updated the KIP-584
> >> addressing
> >> > > your
> >> > > > > > > > comments.
> >> > > > > > > > Please find my response below.
> >> > > > > > > >
> >> > > > > > > > > 100.6 You can look for the sentence "This operation requires
> >> > > > > > > > > ALTER on CLUSTER." in KIP-455. Also, you can check its usage
> >> > > > > > > > > in KafkaApis.authorize().
> >> > > > > > > > >
> >> > > > > > > > (Kowshik): Done. Great point! For the newly introduced
> >> > > > > > > > UPDATE_FEATURES api, I have added a requirement that
> >> > > > > > > > AclOperation.ALTER is required on ResourceType.CLUSTER.
> >> > > > > > > >
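In practice this means the principal invoking the new API needs ALTER on the CLUSTER resource. A sketch of granting that ACL via the Java Admin client (the principal name and bootstrap address are placeholders):

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.acl.AccessControlEntry;
    import org.apache.kafka.common.acl.AclBinding;
    import org.apache.kafka.common.acl.AclOperation;
    import org.apache.kafka.common.acl.AclPermissionType;
    import org.apache.kafka.common.resource.PatternType;
    import org.apache.kafka.common.resource.ResourcePattern;
    import org.apache.kafka.common.resource.ResourceType;

    public class GrantClusterAlterAclExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // ALTER on the single CLUSTER resource, which is named "kafka-cluster".
                AclBinding binding = new AclBinding(
                        new ResourcePattern(ResourceType.CLUSTER, "kafka-cluster", PatternType.LITERAL),
                        new AccessControlEntry("User:feature-admin", "*",
                                AclOperation.ALTER, AclPermissionType.ALLOW));
                admin.createAcls(List.of(binding)).all().get();
            }
        }
    }
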
> >> > > > > > > > > 110. Keeping the feature version as int is probably fine. I
> >> > > > > > > > > just felt that for some of the common user interactions, it's
> >> > > > > > > > > more convenient to relate that to a release version. For
> >> > > > > > > > > example, if a user wants to downgrade to a release 2.5, it's
> >> > > > > > > > > easier for the user to use the tool like "tool --downgrade 2.5"
> >> > > > > > > > > instead of "tool --downgrade --feature X --version 6".
> >> > > > > > > >
> >> > > > > > > > (Kowshik): Great point. Generally, maximum feature version
> >> > > > > > > > levels are not downgradable after they are finalized in the
> >> > > > > > > > cluster. This is because, as a guideline, bumping the feature
> >> > > > > > > > version level is used mainly to convey important breaking
> >> > > > > > > > changes. Despite the above, there may be some extreme/rare
> >> > > > > > > > cases where a user wants to downgrade all features to a
> >> > > > > > > > specific previous release. The user may want to do this just
> >> > > > > > > > prior to rolling back a Kafka cluster to a previous release.
> >> > > > > > > >
> >> > > > > > > > To support the above, I have made a change to the KIP
> >> > > > > > > > explaining that the CLI tool is versioned. The CLI tool
> >> > > > > > > > internally has knowledge about a map of features to their
> >> > > > > > > > respective max versions supported by the Broker. The tool's
> >> > > > > > > > knowledge of features and their version values is limited to
> >> > > > > > > > the version of the CLI tool itself, i.e. the information is
> >> > > > > > > > packaged into the CLI tool when it is released. Whenever a
> >> > > > > > > > Kafka release introduces a new feature version, or modifies an
> >> > > > > > > > existing feature version, the CLI tool shall also be updated
> >> > > > > > > > with this information. Newer versions of the CLI tool will be
> >> > > > > > > > released as part of the Kafka releases.
> >> > > > > > > >
> >> > > > > > > > Therefore, to achieve the downgrade need, the user just needs
> >> > > > > > > > to run the version of the CLI tool that's part of the
> >> > > > > > > > particular previous release that he/she is downgrading to. To
> >> > > > > > > > help the user with this, there is a new command added to the
> >> > > > > > > > CLI tool called `downgrade-all`. This essentially downgrades
> >> > > > > > > > max version levels of all features in the cluster to the
> >> > > > > > > > versions known to the CLI tool internally.
> >> > > > > > > >
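> >> > > > > > > > As a rough sketch (the exact command-line syntax here is illustrative, not
> >> > > > > > > > final), rolling a cluster back to the 2.5 feature levels with the 2.5 CLI
> >> > > > > > > > tool could look like:
> >> > > > > > > >
> >> > > > > > > >   bin/kafka-features.sh --bootstrap-server localhost:9092 downgrade-all
> >> > > > > > > >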
> >> > > > > > > > I have explained the above in the KIP under these sections:
> >> > > > > > > >
> >> > > > > > > > Tooling support (have explained that the CLI tool is versioned):
> >> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> >> > > > > > > >
> >> > > > > > > > Regular CLI tool usage (please refer to point #3, and see the tooling example):
> >> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> >> > > > > > > >
> >> > > > > > > > > 110. Similarly, if the client library finds a feature
> >> > mismatch
> >> > > > with
> >> > > > > > the
> >> > > > > > > > broker,
> >> > > > > > > > > the client likely needs to log some error message for
> the
> >> > user
> >> > > to
> >> > > > > > take
> >> > > > > > > > some
> >> > > > > > > > > actions. It's much more actionable if the error message
> is
> >> > > > "upgrade
> >> > > > > > the
> >> > > > > > > > > broker to release version 2.6" than just "upgrade the
> >> broker
> >> > to
> >> > > > > > feature
> >> > > > > > > > > version 7".
> >> > > > > > > >
> >> > > > > > > > (Kowshik): That's a really good point! If we use ints for feature versions,
> >> > > > > > > > the best message the client can print for debugging is "broker doesn't
> >> > > > > > > > support feature version 7", and alongside that print the supported version
> >> > > > > > > > range returned by the broker. Then, does it sound reasonable that the user
> >> > > > > > > > could then reference the Kafka release logs to figure out which version of
> >> > > > > > > > the broker release is required to be deployed, to support feature version 7?
> >> > > > > > > > I couldn't think of a better strategy here.
> >> > > > > > > >
> >> > > > > > > > > 120. When should a developer bump up the version of a
> >> > feature?
> >> > > > > > > >
> >> > > > > > > > (Kowshik): Great question! In the KIP, I have added a section: 'Guidelines
> >> > > > > > > > on feature versions and workflows' providing some guidelines on when to use
> >> > > > > > > > the versioned feature flags, and what the regular workflows with the CLI
> >> > > > > > > > tool are.
> >> > > > > > > >
> >> > > > > > > > Link to the relevant sections:
> >> > > > > > > > Guidelines:
> >> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> >> > > > > > > >
> >> > > > > > > > Regular CLI tool usage:
> >> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> >> > > > > > > >
> >> > > > > > > > Advanced CLI tool usage:
> >> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Cheers,
> >> > > > > > > > Kowshik
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <jun@confluent.io
> >
> >> > > wrote:
> >> > > > > > > >
> >> > > > > > > > > Hi, Kowshik,
> >> > > > > > > > >
> >> > > > > > > > > Thanks for the reply. A few more comments.
> >> > > > > > > > >
> >> > > > > > > > > 110. Keeping the feature version as int is probably
> fine.
> >> I
> >> > > just
> >> > > > > felt
> >> > > > > > > > that
> >> > > > > > > > > for some of the common user interactions, it's more
> >> > convenient
> >> > > to
> >> > > > > > > > > relate that to a release version. For example, if a user
> >> > wants
> >> > > to
> >> > > > > > > > downgrade
> >> > > > > > > > > to a release 2.5, it's easier for the user to use the
> tool
> >> > like
> >> > > > > "tool
> >> > > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature
> X
> >> > > > --version
> >> > > > > > 6".
> >> > > > > > > > > Similarly, if the client library finds a feature
> mismatch
> >> > with
> >> > > > the
> >> > > > > > > > broker,
> >> > > > > > > > > the client likely needs to log some error message for
> the
> >> > user
> >> > > to
> >> > > > > > take
> >> > > > > > > > some
> >> > > > > > > > > actions. It's much more actionable if the error message
> is
> >> > > > "upgrade
> >> > > > > > the
> >> > > > > > > > > broker to release version 2.6" than just "upgrade the
> >> broker
> >> > to
> >> > > > > > feature
> >> > > > > > > > > version 7".
> >> > > > > > > > >
> >> > > > > > > > > 111. Sounds good.
> >> > > > > > > > >
> >> > > > > > > > > 120. When should a developer bump up the version of a
> >> > feature?
> >> > > > > > > > >
> >> > > > > > > > > Jun
> >> > > > > > > > >
> >> > > > > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <
> >> > > > > > > kprakasam@confluent.io
> >> > > > > > > > >
> >> > > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi Jun,
> >> > > > > > > > > >
> >> > > > > > > > > > I have updated the KIP for the item 111.
> >> > > > > > > > > > I'm in the process of addressing 100.6, and will
> >> provide an
> >> > > > > update
> >> > > > > > > > soon.
> >> > > > > > > > > > I think item 110 is still under discussion given we
> are
> >> now
> >> > > > > > > providing a
> >> > > > > > > > > way
> >> > > > > > > > > > to finalize
> >> > > > > > > > > > all features to their latest version levels. In any
> >> case,
> >> > > > please
> >> > > > > > let
> >> > > > > > > us
> >> > > > > > > > > > know
> >> > > > > > > > > > how you feel in response to Colin's comments on this
> >> topic.
> >> > > > > > > > > >
> >> > > > > > > > > > > 111. To put this in context, when we had IBP, the
> >> default
> >> > > > value
> >> > > > > > is
> >> > > > > > > > the
> >> > > > > > > > > > > current released version. So, if you are a brand new
> >> > user,
> >> > > > you
> >> > > > > > > don't
> >> > > > > > > > > need
> >> > > > > > > > > > > to configure IBP and all new features will be
> >> immediately
> >> > > > > > available
> >> > > > > > > > in
> >> > > > > > > > > > the
> >> > > > > > > > > > > new cluster. If you are upgrading from an old
> version,
> >> > you
> >> > > do
> >> > > > > > need
> >> > > > > > > to
> >> > > > > > > > > > > understand and configure IBP. I see a similar
> pattern
> >> > here
> >> > > > for
> >> > > > > > > > > > > features. From the ease of use perspective, ideally,
> >> we
> >> > > > > shouldn't
> >> > > > > > > > > require
> >> > > > > > > > > > a
> >> > > > > > > > > > > new user to have an extra step such as running a
> >> > bootstrap
> >> > > > > script
> >> > > > > > > > > unless
> >> > > > > > > > > > > it's truly necessary. If someone has a special need
> >> (all
> >> > > the
> >> > > > > > cases
> >> > > > > > > > you
> >> > > > > > > > > > > mentioned seem special cases?), they can configure a
> >> mode
> >> > > > such
> >> > > > > > that
> >> > > > > > > > > > > features are enabled/disabled manually.
> >> > > > > > > > > >
> >> > > > > > > > > > (Kowshik): That makes sense, thanks for the idea!
> Sorry
> >> if
> >> > I
> >> > > > > didn't
> >> > > > > > > > > > understand
> >> > > > > > > > > > this need earlier. I have updated the KIP with the
> >> approach
> >> > > > that
> >> > > > > > > > whenever
> >> > > > > > > > > > the '/features' node is absent, the controller by
> >> default
> >> > > will
> >> > > > > > > > bootstrap
> >> > > > > > > > > > the node
> >> > > > > > > > > > to contain the latest feature levels. Here is the new
> >> > section
> >> > > > in
> >> > > > > > the
> >> > > > > > > > KIP
> >> > > > > > > > > > describing
> >> > > > > > > > > > the same:
> >> > > > > > > > > >
> >> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> >> > > > > > > > > >
> >> > > > > > > > > > Next, as I explained in my response to Colin's
> >> suggestions,
> >> > > we
> >> > > > > are
> >> > > > > > > now
> >> > > > > > > > > > providing a `--finalize-latest-features` flag with the
> >> > > tooling.
> >> > > > > > This
> >> > > > > > > > lets
> >> > > > > > > > > > the sysadmin finalize all features known to the
> >> controller
> >> > to
> >> > > > > their
> >> > > > > > > > > latest
> >> > > > > > > > > > version
> >> > > > > > > > > > levels. Please look at this section (point #3 and the
> >> > tooling
> >> > > > > > example
> >> > > > > > > > > > later):
> >> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> >> > > > > > > > > >
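> >> > > > > > > > > > As a sketch (the flag syntax here is illustrative only), bootstrapping a
> >> > > > > > > > > > brand new cluster could then be a single invocation:
> >> > > > > > > > > >
> >> > > > > > > > > >   bin/kafka-features.sh --bootstrap-server localhost:9092 \
> >> > > > > > > > > >     --finalize-latest-features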
> >> > > > > > > > > >
> >> > > > > > > > > > Do you feel this addresses your comment/concern?
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > Cheers,
> >> > > > > > > > > > Kowshik
> >> > > > > > > > > >
> >> > > > > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <
> >> jun@confluent.io>
> >> > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Hi, Kowshik,
> >> > > > > > > > > > >
> >> > > > > > > > > > > Thanks for the reply. A few more replies below.
> >> > > > > > > > > > >
> >> > > > > > > > > > > 100.6 You can look for the sentence "This operation
> >> > > requires
> >> > > > > > ALTER
> >> > > > > > > on
> >> > > > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage
> in
> >> > > > > > > > > > > KafkaApis.authorize().
> >> > > > > > > > > > >
> >> > > > > > > > > > > 110. From the external client/tooling perspective,
> >> it's
> >> > > more
> >> > > > > > > natural
> >> > > > > > > > to
> >> > > > > > > > > > use
> >> > > > > > > > > > > the release version for features. If we can use the
> >> same
> >> > > > > release
> >> > > > > > > > > version
> >> > > > > > > > > > > for internal representation, it seems simpler
> (easier
> >> to
> >> > > > > > > understand,
> >> > > > > > > > no
> >> > > > > > > > > > > mapping overhead, etc). Is there a benefit with
> >> separate
> >> > > > > external
> >> > > > > > > and
> >> > > > > > > > > > > internal versioning schemes?
> >> > > > > > > > > > >
> >> > > > > > > > > > > 111. To put this in context, when we had IBP, the
> >> default
> >> > > > value
> >> > > > > > is
> >> > > > > > > > the
> >> > > > > > > > > > > current released version. So, if you are a brand new
> >> > user,
> >> > > > you
> >> > > > > > > don't
> >> > > > > > > > > need
> >> > > > > > > > > > > to configure IBP and all new features will be
> >> immediately
> >> > > > > > available
> >> > > > > > > > in
> >> > > > > > > > > > the
> >> > > > > > > > > > > new cluster. If you are upgrading from an old
> version,
> >> > you
> >> > > do
> >> > > > > > need
> >> > > > > > > to
> >> > > > > > > > > > > understand and configure IBP. I see a similar
> pattern
> >> > here
> >> > > > for
> >> > > > > > > > > > > features. From the ease of use perspective, ideally,
> >> we
> >> > > > > shouldn't
> >> > > > > > > > > > require a
> >> > > > > > > > > > > new user to have an extra step such as running a
> >> > bootstrap
> >> > > > > script
> >> > > > > > > > > unless
> >> > > > > > > > > > > it's truly necessary. If someone has a special need
> >> (all
> >> > > the
> >> > > > > > cases
> >> > > > > > > > you
> >> > > > > > > > > > > mentioned seem special cases?), they can configure a
> >> mode
> >> > > > such
> >> > > > > > that
> >> > > > > > > > > > > features are enabled/disabled manually.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Jun
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
> >> > > > > > > > > kprakasam@confluent.io>
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Hi Jun,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Thanks for the feedback and suggestions. Please
> >> find my
> >> > > > > > response
> >> > > > > > > > > below.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > 100.6 For every new request, the admin needs to
> >> > control
> >> > > > who
> >> > > > > > is
> >> > > > > > > > > > allowed
> >> > > > > > > > > > > to
> >> > > > > > > > > > > > > issue that request if security is enabled. So,
> we
> >> > need
> >> > > to
> >> > > > > > > assign
> >> > > > > > > > > the
> >> > > > > > > > > > > new
> >> > > > > > > > > > > > > request a ResourceType and possible
> AclOperations.
> >> > See
> >> > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> >> > > > > > > > > > > > > as an example.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > (Kowshik): I don't see any reference to the words
> >> > > > > ResourceType
> >> > > > > > or
> >> > > > > > > > > > > > AclOperations
> >> > > > > > > > > > > > in the KIP. Please let me know how I can use the
> KIP
> >> > that
> >> > > > you
> >> > > > > > > > linked
> >> > > > > > > > > to
> >> > > > > > > > > > > > know how to
> >> > > > > > > > > > > > setup the appropriate ResourceType and/or
> >> > > ClusterOperation?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > 105. If we change delete to disable, it's better
> >> to
> >> > do
> >> > > > this
> >> > > > > > > > > > > consistently
> >> > > > > > > > > > > > in
> >> > > > > > > > > > > > > request protocol and admin api as well.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > (Kowshik): The API shouldn't be called 'disable'
> >> when
> >> > it
> >> > > is
> >> > > > > > > > deleting
> >> > > > > > > > > a
> >> > > > > > > > > > > > feature.
> >> > > > > > > > > > > > I've just changed the KIP to use 'delete'. I don't
> >> > have a
> >> > > > > > strong
> >> > > > > > > > > > > > preference.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > 110. The minVersion/maxVersion for features use
> >> > int64.
> >> > > > > > > Currently,
> >> > > > > > > > > our
> >> > > > > > > > > > > > > release version schema is major.minor.bugfix
> (e.g.
> >> > > > 2.5.0).
> >> > > > > > It's
> >> > > > > > > > > > > possible
> >> > > > > > > > > > > > > for new features to be included in minor
> releases
> >> > too.
> >> > > > > Should
> >> > > > > > > we
> >> > > > > > > > > make
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > feature versioning match the release versioning?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > (Kowshik): The release version can be mapped to a
> >> set
> >> > of
> >> > > > > > feature
> >> > > > > > > > > > > versions,
> >> > > > > > > > > > > > and this can be done, for example in the tool (or
> >> even
> >> > > > > external
> >> > > > > > > to
> >> > > > > > > > > the
> >> > > > > > > > > > > > tool).
> >> > > > > > > > > > > > Can you please clarify what I'm missing?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > 111. "During regular operations, the data in the
> >> ZK
> >> > > node
> >> > > > > can
> >> > > > > > be
> >> > > > > > > > > > mutated
> >> > > > > > > > > > > > > only via a specific admin API served only by the
> >> > > > > > controller." I
> >> > > > > > > > am
> >> > > > > > > > > > > > > wondering why can't the controller auto
> finalize a
> >> > > > feature
> >> > > > > > > > version
> >> > > > > > > > > > > after
> >> > > > > > > > > > > > > all brokers are upgraded? For new users who
> >> download
> >> > > the
> >> > > > > > latest
> >> > > > > > > > > > version
> >> > > > > > > > > > > > to
> >> > > > > > > > > > > > > build a new cluster, it's inconvenient for them
> to
> >> > have
> >> > > > to
> >> > > > > > > > manually
> >> > > > > > > > > > > > enable
> >> > > > > > > > > > > > > each feature.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > (Kowshik): I agree that there is a trade-off here,
> >> but
> >> > it
> >> > > > > will
> >> > > > > > > help
> >> > > > > > > > > > > > to decide whether the automation can be thought
> >> through
> >> > > in
> >> > > > > the
> >> > > > > > > > future
> >> > > > > > > > > > > > in a follow up KIP, or right now in this KIP. We
> may
> >> > > invest
> >> > > > > > > > > > > > in automation, but we have to decide whether we
> >> should
> >> > do
> >> > > > it
> >> > > > > > > > > > > > now or later.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > For the inconvenience that you mentioned, do you think the problem can be
> >> > > > > > > > > > > > overcome by asking the cluster operator to run a bootstrap script when
> >> > > > > > > > > > > > he/she knows that a specific AK release has been almost completely
> >> > > > > > > > > > > > deployed in a cluster for the first time? The idea is that the bootstrap
> >> > > > > > > > > > > > script will know how to map a specific AK release to finalized feature
> >> > > > > > > > > > > > versions, and run the `kafka-features.sh` tool appropriately against
> >> > > > > > > > > > > > the cluster.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Now, coming back to your automation
> >> proposal/question.
> >> > > > > > > > > > > > I do see the value of automated feature version
> >> > > > finalization,
> >> > > > > > > but I
> >> > > > > > > > > > also
> >> > > > > > > > > > > > see
> >> > > > > > > > > > > > that this will open up several questions and some
> >> > risks,
> >> > > as
> >> > > > > > > > explained
> >> > > > > > > > > > > > below.
> >> > > > > > > > > > > > The answers to these depend on the definition of
> the
> >> > > > > automation
> >> > > > > > > we
> >> > > > > > > > > > choose
> >> > > > > > > > > > > > to build, and how well does it fit into a kafka
> >> > > deployment.
> >> > > > > > > > > > > > Basically, it can be unsafe for the controller to
> >> > > finalize
> >> > > > > > > feature
> >> > > > > > > > > > > version
> >> > > > > > > > > > > > upgrades automatically, without learning about the
> >> > intent
> >> > > > of
> >> > > > > > the
> >> > > > > > > > > > cluster
> >> > > > > > > > > > > > operator.
> >> > > > > > > > > > > > 1. We would sometimes want to lock feature
> versions
> >> > only
> >> > > > when
> >> > > > > > we
> >> > > > > > > > have
> >> > > > > > > > > > > > externally verified
> >> > > > > > > > > > > > the stability of the broker binary.
> >> > > > > > > > > > > > 2. Sometimes only the cluster operator knows that
> a
> >> > > cluster
> >> > > > > > > upgrade
> >> > > > > > > > > is
> >> > > > > > > > > > > > complete,
> >> > > > > > > > > > > > and new brokers are highly unlikely to join the
> >> > cluster.
> >> > > > > > > > > > > > 3. Only the cluster operator knows that the intent
> >> is
> >> > to
> >> > > > > deploy
> >> > > > > > > the
> >> > > > > > > > > > same
> >> > > > > > > > > > > > version
> >> > > > > > > > > > > > of the new broker release across the entire
> cluster
> >> > (i.e.
> >> > > > the
> >> > > > > > > > latest
> >> > > > > > > > > > > > downloaded version).
> >> > > > > > > > > > > > 4. For downgrades, it appears the controller still
> >> > needs
> >> > > > some
> >> > > > > > > > > external
> >> > > > > > > > > > > > input
> >> > > > > > > > > > > > (such as the proposed tool) to finalize a feature
> >> > version
> >> > > > > > > > downgrade.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > If we have automation, that automation can end up
> >> > failing
> >> > > > in
> >> > > > > > some
> >> > > > > > > > of
> >> > > > > > > > > > the
> >> > > > > > > > > > > > cases
> >> > > > > > > > > > > > above. Then, we need a way to declare that the
> >> cluster
> >> > is
> >> > > > > "not
> >> > > > > > > > ready"
> >> > > > > > > > > > if
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > controller cannot automatically finalize some
> basic
> >> > > > required
> >> > > > > > > > feature
> >> > > > > > > > > > > > version
> >> > > > > > > > > > > > upgrades across the cluster. We need to make the
> >> > cluster
> >> > > > > > operator
> >> > > > > > > > > aware
> >> > > > > > > > > > > in
> >> > > > > > > > > > > > such a scenario (raise an alert or alike).
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
> >> > should
> >> > > > be
> >> > > > > 49
> >> > > > > > > > > instead
> >> > > > > > > > > > > of
> >> > > > > > > > > > > > 48.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > (Kowshik): Done.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Cheers,
> >> > > > > > > > > > > > Kowshik
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <
> >> > > jun@confluent.io>
> >> > > > > > > wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > Hi, Kowshik,
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Thanks for the reply. A few more comments below.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > 100.6 For every new request, the admin needs to
> >> > control
> >> > > > who
> >> > > > > > is
> >> > > > > > > > > > allowed
> >> > > > > > > > > > > to
> >> > > > > > > > > > > > > issue that request if security is enabled. So,
> we
> >> > need
> >> > > to
> >> > > > > > > assign
> >> > > > > > > > > the
> >> > > > > > > > > > > new
> >> > > > > > > > > > > > > request a ResourceType and possible
> AclOperations.
> >> > See
> >> > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> >> > > > > > > > > > > > > as
> >> > > > > > > > > > > > > an example.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > 105. If we change delete to disable, it's better
> >> to
> >> > do
> >> > > > this
> >> > > > > > > > > > > consistently
> >> > > > > > > > > > > > in
> >> > > > > > > > > > > > > request protocol and admin api as well.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > 110. The minVersion/maxVersion for features use
> >> > int64.
> >> > > > > > > Currently,
> >> > > > > > > > > our
> >> > > > > > > > > > > > > release version schema is major.minor.bugfix
> (e.g.
> >> > > > 2.5.0).
> >> > > > > > It's
> >> > > > > > > > > > > possible
> >> > > > > > > > > > > > > for new features to be included in minor
> releases
> >> > too.
> >> > > > > Should
> >> > > > > > > we
> >> > > > > > > > > make
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > feature versioning match the release versioning?
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > 111. "During regular operations, the data in the
> >> ZK
> >> > > node
> >> > > > > can
> >> > > > > > be
> >> > > > > > > > > > mutated
> >> > > > > > > > > > > > > only via a specific admin API served only by the
> >> > > > > > controller." I
> >> > > > > > > > am
> >> > > > > > > > > > > > > wondering why can't the controller auto
> finalize a
> >> > > > feature
> >> > > > > > > > version
> >> > > > > > > > > > > after
> >> > > > > > > > > > > > > all brokers are upgraded? For new users who
> >> download
> >> > > the
> >> > > > > > latest
> >> > > > > > > > > > version
> >> > > > > > > > > > > > to
> >> > > > > > > > > > > > > build a new cluster, it's inconvenient for them
> to
> >> > have
> >> > > > to
> >> > > > > > > > manually
> >> > > > > > > > > > > > enable
> >> > > > > > > > > > > > > each feature.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
> >> > should
> >> > > > be
> >> > > > > 49
> >> > > > > > > > > instead
> >> > > > > > > > > > > of
> >> > > > > > > > > > > > > 48.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Jun
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam
> <
> >> > > > > > > > > > > kprakasam@confluent.io>
> >> > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hey Jun,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Thanks a lot for the great feedback! Please
> note
> >> > that
> >> > > > the
> >> > > > > > > > design
> >> > > > > > > > > > > > > > has changed a little bit on the KIP, and we
> now
> >> > > > propagate
> >> > > > > > the
> >> > > > > > > > > > > finalized
> >> > > > > > > > > > > > > > features metadata only via ZK watches (instead
> >> of
> >> > > > > > > > > > > UpdateMetadataRequest
> >> > > > > > > > > > > > > > from the controller).
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Please find below my response to your
> >> > > > questions/feedback,
> >> > > > > > > with
> >> > > > > > > > > the
> >> > > > > > > > > > > > prefix
> >> > > > > > > > > > > > > > "(Kowshik):".
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.
> >> UpdateFeaturesRequest/UpdateFeaturesResponse
> >> > > > > > > > > > > > > > > 100.1 Since this request waits for responses
> >> from
> >> > > > > > brokers,
> >> > > > > > > > > should
> >> > > > > > > > > > > we
> >> > > > > > > > > > > > > add
> >> > > > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > timeout in the request (like
> >> createTopicRequest)?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Great point! Done. I have added a
> >> > timeout
> >> > > > > field.
> >> > > > > > > > Note:
> >> > > > > > > > > > we
> >> > > > > > > > > > > no
> >> > > > > > > > > > > > > > longer
> >> > > > > > > > > > > > > > wait for responses from brokers, since the
> >> design
> >> > has
> >> > > > > been
> >> > > > > > > > > changed
> >> > > > > > > > > > so
> >> > > > > > > > > > > > > that
> >> > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > features information is propagated via ZK.
> >> > > > Nevertheless,
> >> > > > > it
> >> > > > > > > is
> >> > > > > > > > > > right
> >> > > > > > > > > > > to
> >> > > > > > > > > > > > > > have a timeout
> >> > > > > > > > > > > > > > for the request.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.2 The response schema is a bit weird.
> >> > > Typically,
> >> > > > > the
> >> > > > > > > > > response
> >> > > > > > > > > > > > just
> >> > > > > > > > > > > > > > > shows an error code and an error message,
> >> instead
> >> > > of
> >> > > > > > > echoing
> >> > > > > > > > > the
> >> > > > > > > > > > > > > request.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified
> >> it to
> >> > > > just
> >> > > > > > > return
> >> > > > > > > > > an
> >> > > > > > > > > > > > error
> >> > > > > > > > > > > > > > code and a message.
> >> > > > > > > > > > > > > > Previously it was not echoing the "request",
> >> rather
> >> > > it
> >> > > > > was
> >> > > > > > > > > > returning
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > latest set of
> >> > > > > > > > > > > > > > cluster-wide finalized features (after
> applying
> >> the
> >> > > > > > updates).
> >> > > > > > > > But
> >> > > > > > > > > > you
> >> > > > > > > > > > > > are
> >> > > > > > > > > > > > > > right,
> >> > > > > > > > > > > > > > the additional info is not required, so I have
> >> > > removed
> >> > > > it
> >> > > > > > > from
> >> > > > > > > > > the
> >> > > > > > > > > > > > > response
> >> > > > > > > > > > > > > > schema.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.3 Should we add a separate request to
> >> > > > list/describe
> >> > > > > > the
> >> > > > > > > > > > > existing
> >> > > > > > > > > > > > > > > features?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
> >> > > > > > > > > > > > > > Admin API which, under the covers, uses the ApiVersionsRequest to
> >> > > > > > > > > > > > > > list/describe the existing features. Please read the 'Tooling support'
> >> > > > > > > > > > > > > > section.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE
> >> in a
> >> > > > > single
> >> > > > > > > > > request.
> >> > > > > > > > > > > For
> >> > > > > > > > > > > > > > > DELETE, the version field doesn't make
> sense.
> >> > So, I
> >> > > > > guess
> >> > > > > > > the
> >> > > > > > > > > > > broker
> >> > > > > > > > > > > > > just
> >> > > > > > > > > > > > > > > ignores this? An alternative way is to have
> a
> >> > > > separate
> >> > > > > > > > > > > > > > DeleteFeaturesRequest
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Great point! I have modified the
> KIP
> >> now
> >> > > to
> >> > > > > > have 2
> >> > > > > > > > > > > separate
> >> > > > > > > > > > > > > > controller APIs
> >> > > > > > > > > > > > > > serving these different purposes:
> >> > > > > > > > > > > > > > 1. updateFeatures
> >> > > > > > > > > > > > > > 2. deleteFeatures
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have
> "The
> >> > > > > > monotonically
> >> > > > > > > > > > > > increasing
> >> > > > > > > > > > > > > > > version of the metadata for finalized
> >> features."
> >> > I
> >> > > am
> >> > > > > > > > wondering
> >> > > > > > > > > > why
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > ordering is important?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): In the latest KIP write-up, it is
> >> called
> >> > > > epoch
> >> > > > > > > > > (instead
> >> > > > > > > > > > of
> >> > > > > > > > > > > > > > version), and
> >> > > > > > > > > > > > > > it is just the ZK node version. Basically,
> this
> >> is
> >> > > the
> >> > > > > > epoch
> >> > > > > > > > for
> >> > > > > > > > > > the
> >> > > > > > > > > > > > > > cluster-wide
> >> > > > > > > > > > > > > > finalized feature version metadata. This
> >> metadata
> >> > is
> >> > > > > served
> >> > > > > > > to
> >> > > > > > > > > > > clients
> >> > > > > > > > > > > > > via
> >> > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > ApiVersionsResponse (for reads). We propagate
> >> > updates
> >> > > > > from
> >> > > > > > > the
> >> > > > > > > > > > > > > '/features'
> >> > > > > > > > > > > > > > ZK node
> >> > > > > > > > > > > > > > to all brokers, via ZK watches setup by each
> >> broker
> >> > > on
> >> > > > > the
> >> > > > > > > > > > > '/features'
> >> > > > > > > > > > > > > > node.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Now here is why the ordering is important:
> >> > > > > > > > > > > > > > ZK watches don't propagate at the same time.
> As
> >> a
> >> > > > result,
> >> > > > > > the
> >> > > > > > > > > > > > > > ApiVersionsResponse
> >> > > > > > > > > > > > > > is eventually consistent across brokers. This
> >> can
> >> > > > > introduce
> >> > > > > > > > cases
> >> > > > > > > > > > > > > > where clients see an older lower epoch of the
> >> > > features
> >> > > > > > > > metadata,
> >> > > > > > > > > > > after
> >> > > > > > > > > > > > a
> >> > > > > > > > > > > > > > more recent
> >> > > > > > > > > > > > > > higher epoch was returned at a previous point
> in
> >> > > time.
> >> > > > We
> >> > > > > > > > expect
> >> > > > > > > > > > > > clients
> >> > > > > > > > > > > > > > to always employ the rule that the latest
> >> received
> >> > > > higher
> >> > > > > > > epoch
> >> > > > > > > > > of
> >> > > > > > > > > > > > > metadata
> >> > > > > > > > > > > > > > always trumps an older smaller epoch. Those
> >> clients
> >> > > > that
> >> > > > > > are
> >> > > > > > > > > > external
> >> > > > > > > > > > > > to
> >> > > > > > > > > > > > > > Kafka should strongly consider discovering the
> >> > latest
> >> > > > > > > metadata
> >> > > > > > > > > once
> >> > > > > > > > > > > > > during
> >> > > > > > > > > > > > > > startup from the brokers, and if required
> >> refresh
> >> > the
> >> > > > > > > metadata
> >> > > > > > > > > > > > > periodically
> >> > > > > > > > > > > > > > (to get the latest metadata).
> >> > > > > > > > > > > > > >
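> >> > > > > > > > > > > > > > A minimal sketch of that client-side rule (illustrative Java, with a
> >> > > > > > > > > > > > > > hypothetical FeaturesMetadata type, not actual client code):
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >   // Keep the newly received finalized-features metadata only if its
> >> > > > > > > > > > > > > >   // epoch is strictly newer than the cached copy.
> >> > > > > > > > > > > > > >   synchronized void maybeUpdate(FeaturesMetadata incoming) {
> >> > > > > > > > > > > > > >       if (cached == null || incoming.epoch() > cached.epoch()) {
> >> > > > > > > > > > > > > >           cached = incoming;
> >> > > > > > > > > > > > > >       }
> >> > > > > > > > > > > > > >   }
> >> > > > > > > > > > > > > >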
> >> > > > > > > > > > > > > > > 100.6 Could you specify the required ACL for
> >> this
> >> > > new
> >> > > > > > > > request?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): What is ACL, and how could I find
> out
> >> > > which
> >> > > > > one
> >> > > > > > to
> >> > > > > > > > > > > specify?
> >> > > > > > > > > > > > > > Please could you provide me some pointers?
> I'll
> >> be
> >> > > glad
> >> > > > > to
> >> > > > > > > > update
> >> > > > > > > > > > the
> >> > > > > > > > > > > > > > KIP once I know the next steps.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 101. For the broker registration ZK node,
> >> should
> >> > we
> >> > > > > bump
> >> > > > > > up
> >> > > > > > > > the
> >> > > > > > > > > > > > version
> >> > > > > > > > > > > > > > in
> >> > > > > > > > > > > > > > the json?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Great point! Done. I've increased
> the
> >> > > > version
> >> > > > > in
> >> > > > > > > the
> >> > > > > > > > > > > broker
> >> > > > > > > > > > > > > json
> >> > > > > > > > > > > > > > by 1.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if
> we
> >> > need
> >> > > > the
> >> > > > > > > epoch
> >> > > > > > > > > > > field.
> >> > > > > > > > > > > > > Each
> >> > > > > > > > > > > > > > > ZK node has an internal version field that
> is
> >> > > > > incremented
> >> > > > > > > on
> >> > > > > > > > > > every
> >> > > > > > > > > > > > > > update.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK
> >> node
> >> > > > > version
> >> > > > > > > > now,
> >> > > > > > > > > > > > instead
> >> > > > > > > > > > > > > of
> >> > > > > > > > > > > > > > explicitly
> >> > > > > > > > > > > > > > incremented epoch.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a
> >> feature
> >> > > > > version
> >> > > > > > > > > > > cluster-wide
> >> > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > left to the discretion of the logic
> >> implementing
> >> > > the
> >> > > > > > > feature
> >> > > > > > > > > (ex:
> >> > > > > > > > > > > can
> >> > > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > done via dynamic broker config)." Does that
> >> mean
> >> > > the
> >> > > > > > broker
> >> > > > > > > > > > > > > registration
> >> > > > > > > > > > > > > > ZK
> >> > > > > > > > > > > > > > > node will be updated dynamically when this
> >> > happens?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Not really. The text was just
> >> conveying
> >> > > > that a
> >> > > > > > > > broker
> >> > > > > > > > > > > could
> >> > > > > > > > > > > > > > "know" of
> >> > > > > > > > > > > > > > a new feature version, but it does not mean
> the
> >> > > broker
> >> > > > > > should
> >> > > > > > > > > have
> >> > > > > > > > > > > also
> >> > > > > > > > > > > > > > activated the effects of the feature version.
> >> > Knowing
> >> > > > vs
> >> > > > > > > > > activation
> >> > > > > > > > > > > > are 2
> >> > > > > > > > > > > > > > separate things,
> >> > > > > > > > > > > > > > and the latter can be achieved by dynamic
> >> config. I
> >> > > > have
> >> > > > > > > > reworded
> >> > > > > > > > > > the
> >> > > > > > > > > > > > > text
> >> > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > make this clear to the reader.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 104. UpdateMetadataRequest
> >> > > > > > > > > > > > > > > 104.1 It would be useful to describe when
> the
> >> > > feature
> >> > > > > > > > metadata
> >> > > > > > > > > is
> >> > > > > > > > > > > > > > included
> >> > > > > > > > > > > > > > > in the request. My understanding is that
> it's
> >> > only
> >> > > > > > included
> >> > > > > > > > if
> >> > > > > > > > > > (1)
> >> > > > > > > > > > > > > there
> >> > > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > a change to the finalized feature; (2)
> broker
> >> > > > restart;
> >> > > > > > (3)
> >> > > > > > > > > > > controller
> >> > > > > > > > > > > > > > > failover.
> >> > > > > > > > > > > > > > > 104.2 The new fields have the following
> >> versions.
> >> > > Why
> >> > > > > are
> >> > > > > > > the
> >> > > > > > > > > > > > versions
> >> > > > > > > > > > > > > 3+
> >> > > > > > > > > > > > > > > when the top version is bumped to 6?
> >> > > > > > > > > > > > > > >       "fields":  [
> >> > > > > > > > > > > > > > >         {"name": "Name", "type":  "string",
> >> > > > "versions":
> >> > > > > > > > "3+",
> >> > > > > > > > > > > > > > >           "about": "The name of the
> >> feature."},
> >> > > > > > > > > > > > > > >         {"name":  "Version", "type":
> "int64",
> >> > > > > > "versions":
> >> > > > > > > > > "3+",
> >> > > > > > > > > > > > > > >           "about": "The finalized version
> for
> >> the
> >> > > > > > > feature."}
> >> > > > > > > > > > > > > > >       ]
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): With the new improved design, we
> have
> >> > > > > completely
> >> > > > > > > > > > > eliminated
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > need to
> >> > > > > > > > > > > > > > use UpdateMetadataRequest. This is because we
> >> now
> >> > > rely
> >> > > > on
> >> > > > > > ZK
> >> > > > > > > to
> >> > > > > > > > > > > deliver
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > notifications for changes to the '/features'
> ZK
> >> > node.
> >> > > > > > > > > > > > > >
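> >> > > > > > > > > > > > > > For intuition, the '/features' node carries a small JSON payload along
> >> > > > > > > > > > > > > > these lines (shape illustrative only, feature name made up; the ZK node
> >> > > > > > > > > > > > > > version doubles as the epoch):
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >   {
> >> > > > > > > > > > > > > >     "features": {
> >> > > > > > > > > > > > > >       "group_coordinator": { "min_version_level": 1, "max_version_level": 2 }
> >> > > > > > > > > > > > > >     }
> >> > > > > > > > > > > > > >   }
> >> > > > > > > > > > > > > >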
> >> > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using
> >> > > > update/delete,
> >> > > > > > > > perhaps
> >> > > > > > > > > > > it's
> >> > > > > > > > > > > > > > better
> >> > > > > > > > > > > > > > > to use enable/disable?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): For delete, yes, I have changed it so that we instead call it
> >> > > > > > > > > > > > > > 'disable'.
> >> > > > > > > > > > > > > > However for 'update', it can now also refer to either an upgrade or a
> >> > > > > > > > > > > > > > forced downgrade.
> >> > > > > > > > > > > > > > Therefore, I have left it the way it is, calling it just 'update'.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Cheers,
> >> > > > > > > > > > > > > > Kowshik
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <
> >> > > > > jun@confluent.io>
> >> > > > > > > > > wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Hi, Kowshik,
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall. A
> few
> >> > > > comments
> >> > > > > > > below.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.
> >> UpdateFeaturesRequest/UpdateFeaturesResponse
> >> > > > > > > > > > > > > > > 100.1 Since this request waits for responses
> >> from
> >> > > > > > brokers,
> >> > > > > > > > > should
> >> > > > > > > > > > > we
> >> > > > > > > > > > > > > add
> >> > > > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > timeout in the request (like
> >> createTopicRequest)?
> >> > > > > > > > > > > > > > > 100.2 The response schema is a bit weird.
> >> > > Typically,
> >> > > > > the
> >> > > > > > > > > response
> >> > > > > > > > > > > > just
> >> > > > > > > > > > > > > > > shows an error code and an error message,
> >> instead
> >> > > of
> >> > > > > > > echoing
> >> > > > > > > > > the
> >> > > > > > > > > > > > > request.
> >> > > > > > > > > > > > > > > 100.3 Should we add a separate request to
> >> > > > list/describe
> >> > > > > > the
> >> > > > > > > > > > > existing
> >> > > > > > > > > > > > > > > features?
> >> > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE
> >> in a
> >> > > > > single
> >> > > > > > > > > request.
> >> > > > > > > > > > > For
> >> > > > > > > > > > > > > > > DELETE, the version field doesn't make
> sense.
> >> > So, I
> >> > > > > guess
> >> > > > > > > the
> >> > > > > > > > > > > broker
> >> > > > > > > > > > > > > just
> >> > > > > > > > > > > > > > > ignores this? An alternative way is to have
> a
> >> > > > separate
> >> > > > > > > > > > > > > > > DeleteFeaturesRequest
> >> > > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have
> "The
> >> > > > > > monotonically
> >> > > > > > > > > > > > increasing
> >> > > > > > > > > > > > > > > version of the metadata for finalized
> >> features."
> >> > I
> >> > > am
> >> > > > > > > > wondering
> >> > > > > > > > > > why
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > ordering is important?
> >> > > > > > > > > > > > > > > 100.6 Could you specify the required ACL for
> >> this
> >> > > new
> >> > > > > > > > request?
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 101. For the broker registration ZK node,
> >> should
> >> > we
> >> > > > > bump
> >> > > > > > up
> >> > > > > > > > the
> >> > > > > > > > > > > > version
> >> > > > > > > > > > > > > > in
> >> > > > > > > > > > > > > > > the json?
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if
> we
> >> > need
> >> > > > the
> >> > > > > > > epoch
> >> > > > > > > > > > > field.
> >> > > > > > > > > > > > > Each
> >> > > > > > > > > > > > > > > ZK node has an internal version field that
> is
> >> > > > > incremented
> >> > > > > > > on
> >> > > > > > > > > > every
> >> > > > > > > > > > > > > > update.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a
> >> feature
> >> > > > > version
> >> > > > > > > > > > > cluster-wide
> >> > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > left to the discretion of the logic
> >> implementing
> >> > > the
> >> > > > > > > feature
> >> > > > > > > > > (ex:
> >> > > > > > > > > > > can
> >> > > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > done via dynamic broker config)." Does that
> >> mean
> >> > > the
> >> > > > > > broker
> >> > > > > > > > > > > > > registration
> >> > > > > > > > > > > > > > ZK
> >> > > > > > > > > > > > > > > node will be updated dynamically when this
> >> > happens?
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 104. UpdateMetadataRequest
> >> > > > > > > > > > > > > > > 104.1 It would be useful to describe when
> the
> >> > > feature
> >> > > > > > > > metadata
> >> > > > > > > > > is
> >> > > > > > > > > > > > > > included
> >> > > > > > > > > > > > > > > in the request. My understanding is that
> it's
> >> > only
> >> > > > > > included
> >> > > > > > > > if
> >> > > > > > > > > > (1)
> >> > > > > > > > > > > > > there
> >> > > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > a change to the finalized feature; (2)
> broker
> >> > > > restart;
> >> > > > > > (3)
> >> > > > > > > > > > > controller
> >> > > > > > > > > > > > > > > failover.
> >> > > > > > > > > > > > > > > 104.2 The new fields have the following
> >> versions.
> >> > > Why
> >> > > > > are
> >> > > > > > > the
> >> > > > > > > > > > > > versions
> >> > > > > > > > > > > > > 3+
> >> > > > > > > > > > > > > > > when the top version is bumped to 6?
> >> > > > > > > > > > > > > > >       "fields":  [
> >> > > > > > > > > > > > > > >         {"name": "Name", "type":  "string",
> >> > > > "versions":
> >> > > > > > > > "3+",
> >> > > > > > > > > > > > > > >           "about": "The name of the
> >> feature."},
> >> > > > > > > > > > > > > > >         {"name":  "Version", "type":
> "int64",
> >> > > > > > "versions":
> >> > > > > > > > > "3+",
> >> > > > > > > > > > > > > > >           "about": "The finalized version
> for
> >> the
> >> > > > > > > feature."}
> >> > > > > > > > > > > > > > >       ]
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using
> >> > > > update/delete,
> >> > > > > > > > perhaps
> >> > > > > > > > > > > it's
> >> > > > > > > > > > > > > > better
> >> > > > > > > > > > > > > > > to use enable/disable?
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Jun
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik
> >> Prakasam
> >> > <
> >> > > > > > > > > > > > > kprakasam@confluent.io
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Hey Boyang,
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Thanks for the great feedback! I have
> >> updated
> >> > the
> >> > > > KIP
> >> > > > > > > based
> >> > > > > > > > > on
> >> > > > > > > > > > > your
> >> > > > > > > > > > > > > > > > feedback.
> >> > > > > > > > > > > > > > > > Please find my response below for your
> >> > comments,
> >> > > > look
> >> > > > > > for
> >> > > > > > > > > > > sentences
> >> > > > > > > > > > > > > > > > starting
> >> > > > > > > > > > > > > > > > with "(Kowshik)" below.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to
> >> begin
> >> > > > > handling
> >> > > > > > > EOS
> >> > > > > > > > > > > > traffic"
> >> > > > > > > > > > > > > > > could
> >> > > > > > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > > > converted as "When is it safe for the
> >> brokers
> >> > > to
> >> > > > > > start
> >> > > > > > > > > > serving
> >> > > > > > > > > > > > new
> >> > > > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS
> is
> >> not
> >> > > > > > explained
> >> > > > > > > > > > earlier
> >> > > > > > > > > > > > in
> >> > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > context.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): Great point! Done.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 2. In the *Explanation *section, the
> >> metadata
> >> > > > > version
> >> > > > > > > > > number
> >> > > > > > > > > > > part
> >> > > > > > > > > > > > > > > seems a
> >> > > > > > > > > > > > > > > > > bit blurred. Could you point a reference
> >> to
> >> > > later
> >> > > > > > > section
> >> > > > > > > > > > that
> >> > > > > > > > > > > we
> >> > > > > > > > > > > > > > going
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > store it in Zookeeper and update it
> every
> >> > time
> >> > > > when
> >> > > > > > > there
> >> > > > > > > > > is
> >> > > > > > > > > > a
> >> > > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > > change?
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I've added a
> >> > > > reference
> >> > > > > in
> >> > > > > > > the
> >> > > > > > > > > > KIP.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 3. For the feature downgrade, although
> >> it's a
> >> > > > > > Non-goal
> >> > > > > > > of
> >> > > > > > > > > the
> >> > > > > > > > > > > > KIP,
> >> > > > > > > > > > > > > > for
> >> > > > > > > > > > > > > > > > > features such as group coordinator
> >> semantics,
> >> > > > there
> >> > > > > > is
> >> > > > > > > no
> >> > > > > > > > > > legal
> >> > > > > > > > > > > > > > > scenario
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > perform a downgrade at all. So having
> >> > downgrade
> >> > > > > door
> >> > > > > > > open
> >> > > > > > > > > is
> >> > > > > > > > > > > > pretty
> >> > > > > > > > > > > > > > > > > error-prone as human faults happen all
> the
> >> > > time.
> >> > > > > I'm
> >> > > > > > > > > assuming
> >> > > > > > > > > > > as
> >> > > > > > > > > > > > > new
> >> > > > > > > > > > > > > > > > > features are implemented, it's not very
> >> hard
> >> > to
> >> > > > > add a
> >> > > > > > > > flag
> >> > > > > > > > > > > during
> >> > > > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > > creation to indicate whether this
> feature
> >> is
> >> > > > > > > > > "downgradable".
> >> > > > > > > > > > > > Could
> >> > > > > > > > > > > > > > you
> >> > > > > > > > > > > > > > > > > explain a bit more on the extra
> >> engineering
> >> > > > effort
> >> > > > > > for
> >> > > > > > > > > > shipping
> >> > > > > > > > > > > > > this
> >> > > > > > > > > > > > > > > KIP
> >> > > > > > > > > > > > > > > > > with downgrade protection in place?
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): Great point! I'd agree and
> >> disagree
> >> > > > here.
> >> > > > > > > While
> >> > > > > > > > I
> >> > > > > > > > > > > agree
> >> > > > > > > > > > > > > that
> >> > > > > > > > > > > > > > > > accidental
> >> > > > > > > > > > > > > > > > downgrades can cause problems, I also
> think
> >> > > > sometimes
> >> > > > > > > > > > downgrades
> >> > > > > > > > > > > > > should
> >> > > > > > > > > > > > > > > > be allowed for emergency reasons (not all
> >> > > > downgrades
> >> > > > > > > cause
> >> > > > > > > > > > > issues).
> >> > > > > > > > > > > > > > > > It is just subjective to the feature being
> >> > > > > downgraded.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > To be more strict about feature version
> >> > > > downgrades, I
> >> > > > > > > have
> >> > > > > > > > > > > modified
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > KIP
> >> > > > > > > > > > > > > > > > proposing that we mandate a
> >> `--force-downgrade`
> >> > > > flag
> >> > > > > be
> >> > > > > > > > used
> >> > > > > > > > > in
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > UPDATE_FEATURES api
> >> > > > > > > > > > > > > > > > and the tooling, whenever the human is
> >> > > downgrading
> >> > > > a
> >> > > > > > > > > finalized
> >> > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > version.
> >> > > > > > > > > > > > > > > > Hopefully this should cover the
> requirement,
> >> > > until
> >> > > > we
> >> > > > > > > find
> >> > > > > > > > > the
> >> > > > > > > > > > > need
> >> > > > > > > > > > > > > for
> >> > > > > > > > > > > > > > > > advanced downgrade support.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary
> of
> >> > > feature
> >> > > > > > > > versions
> >> > > > > > > > > > will
> >> > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > > defined
> >> > > > > > > > > > > > > > > > > in the broker code." So this means in
> >> order
> >> > to
> >> > > > > > > restrict a
> >> > > > > > > > > > > certain
> >> > > > > > > > > > > > > > > > feature,
> >> > > > > > > > > > > > > > > > > we need to start the broker first and
> then
> >> > > send a
> >> > > > > > > feature
> >> > > > > > > > > > > gating
> >> > > > > > > > > > > > > > > request
> >> > > > > > > > > > > > > > > > > immediately, which introduces a time gap
> >> and
> >> > > the
> >> > > > > > > > > > > > intended-to-close
> >> > > > > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > > could actually serve request during this
> >> > phase.
> >> > > > Do
> >> > > > > > you
> >> > > > > > > > > think
> >> > > > > > > > > > we
> >> > > > > > > > > > > > > > should
> >> > > > > > > > > > > > > > > > also
> >> > > > > > > > > > > > > > > > > support configurations as well so that
> >> admin
> >> > > user
> >> > > > > > could
> >> > > > > > > > > > freely
> >> > > > > > > > > > > > roll
> >> > > > > > > > > > > > > > up
> >> > > > > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > > > cluster with all nodes complying the
> same
> >> > > feature
> >> > > > > > > gating,
> >> > > > > > > > > > > without
> >> > > > > > > > > > > > > > > > worrying
> >> > > > > > > > > > > > > > > > > about the turnaround time to propagate
> the
> >> > > > message
> >> > > > > > only
> >> > > > > > > > > after
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > > > cluster
> >> > > > > > > > > > > > > > > > > starts up?
> >> > > > > > > > > > > > > > > >
> (Kowshik): This is a great point/question. One of the expectations out of
> this KIP, which is already followed in the broker, is the following.
>  - Imagine at time T1 the broker starts up and registers its presence in ZK,
>    along with advertising its supported features.
>  - Imagine at a future time T2 the broker receives the UpdateMetadataRequest
>    from the controller, which contains the latest finalized features as seen
>    by the controller. The broker validates this data against its supported
>    features to make sure there is no mismatch (it will shut down if there is
>    an incompatibility).
>
> It is expected that during the time between the 2 events T1 and T2, the
> broker is almost a silent entity in the cluster. It does not add any value
> to the cluster, or carry out any important broker activities. By "important",
> I mean it is not doing mutations on its persistence, not mutating critical
> in-memory state, and won't be serving produce/fetch requests. Note it doesn't
> even know its assigned partitions until it receives UpdateMetadataRequest
> from the controller. Anything the broker is doing up until this point is
> neither damaging nor useful.
>
> I’ve clarified the above in the KIP, see this new section:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
>
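> To make the T1/T2 validation concrete, here is a rough sketch of the kind of
> check involved (class and method names below are illustrative only, not the
> exact ones from the KIP or the broker code):
>
>   import java.util.Map;
>
>   // Hypothetical sketch of the broker-side compatibility check performed
>   // once the finalized features are known (e.g. at time T2 above).
>   final class FeatureCompatibilitySketch {
>       // The broker's supported range for one feature, e.g. [1, 10].
>       record SupportedVersionRange(short minVersion, short maxVersion) {
>           boolean isCompatibleWith(short finalizedMaxVersionLevel) {
>               return finalizedMaxVersionLevel >= minVersion
>                   && finalizedMaxVersionLevel <= maxVersion;
>           }
>       }
>
>       // True if any finalized feature is unknown to the broker, or its
>       // finalized max version level falls outside the supported range;
>       // in that case the broker would shut itself down.
>       static boolean hasIncompatibility(
>               Map<String, Short> finalizedMaxLevels,
>               Map<String, SupportedVersionRange> supported) {
>           return finalizedMaxLevels.entrySet().stream().anyMatch(e -> {
>               SupportedVersionRange range = supported.get(e.getKey());
>               return range == null || !range.isCompatibleWith(e.getValue());
>           });
>       }
>   }
>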
> > 5. "adding a new Feature, updating or deleting an existing Feature", may be
> > I misunderstood something, I thought the features are defined in broker
> > code, so admin could not really create a new feature?
>
> (Kowshik): Great point! You understood this right. Here adding a feature
> means we are adding a cluster-wide finalized *max* version for a feature that
> was previously never finalized. I have clarified this in the KIP now.
>
> > 6. I think we need a separate error code like FEATURE_UPDATE_IN_PROGRESS to
> > reject a concurrent feature update request.
>
> (Kowshik): Great point! I have modified the KIP adding the above (see
> 'Tooling support -> Admin API changes').
>
> > 7. I think we haven't discussed the alternative solution to pass the
> > feature information through Zookeeper. Is that mentioned in the KIP to
> > justify why using UpdateMetadata is more favorable?
>
> (Kowshik): Nice question! The broker reads finalized feature info stored in
> ZK only during startup, when it does a validation. When serving
> `ApiVersionsRequest`, the broker does not read this info from ZK directly.
> I'd imagine the risk is that it can increase the ZK read QPS, which can be a
> bottleneck for the system. Today, in Kafka we use the controller to fan out
> ZK updates to brokers, and we want to stick to that pattern to avoid the ZK
> read bottleneck when serving `ApiVersionsRequest`.
>
> > 8. I was under the impression that user could configure a range of
> > supported versions, what's the trade-off for allowing single finalized
> > version only?
>
> (Kowshik): Great question! The finalized version of a feature basically
> refers to the cluster-wide finalized feature "maximum" version. For example,
> if the 'group_coordinator' feature has the finalized version set to 10, then
> it means that cluster-wide all versions up to v10 are supported for this
> feature. However, note that if some version (ex: v0) gets deprecated for this
> feature, then we don’t convey that using this scheme (also, supporting
> deprecation is a non-goal).
>
> (Kowshik): I’ve now modified the KIP at all points, referring to finalized
> feature "maximum" versions.
>
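> For illustration of the semantics above (this is only a sketch of the idea,
> not the exact JSON layout from the KIP): a finalized feature would
> conceptually be recorded as 'group_coordinator' -> {"max_version_level": 10},
> and a broker that only supports 'group_coordinator' versions up to 9 would be
> considered incompatible with that cluster-wide level.
>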
> > 9. One minor syntax fix: Note that here the "client" here may be a producer
>
> (Kowshik): Great point! Done.
>
>
> Cheers,
> Kowshik
>
>
> On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <reluctanthero104@gmail.com>
> wrote:
>
> > Hey Kowshik,
> >
> > thanks for the revised KIP. Got a couple of questions:
> >
> > 1. "When is it safe for the brokers to begin handling EOS traffic" could be
> > converted as "When is it safe for the brokers to start serving new
> > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> > context.
> >
> > 2. In the *Explanation *section, the metadata version number part seems a
> > bit blurred. Could you point a reference to later section that we going to
> > store it in Zookeeper and update it every time when there is a feature
> > change?
> >
> > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
> > features such as group coordinator semantics, there is no legal scenario to
> > perform a downgrade at all. So having downgrade door open is pretty
> > error-prone as human faults happen all the time. I'm assuming as new
> > features are implemented, it's not very hard to add a flag during feature
> > creation to indicate whether this feature is "downgradable". Could you
> > explain a bit more on the extra engineering effort for shipping this KIP
> > with downgrade protection in place?
> >
> > 4. "Each broker’s supported dictionary of feature versions will be defined
> > in the broker code." So this means in order to restrict a certain feature,
> > we need to start the broker first and then send a feature gating request
> > immediately, which introduces a time gap and the intended-to-close feature
> > could actually serve request during this phase. Do you think we should also
> > support configurations as well so that admin user could freely roll up a
> > cluster with all nodes complying the same feature gating, without worrying
> > about the turnaround time to propagate the message only after the cluster
> > starts up?
> >
> > 5. "adding a new Feature, updating or deleting an existing Feature", may be
> > I misunderstood something, I thought the features are defined in broker
> > code, so admin could not really create a new feature?
> >
> > 6. I think we need a separate error code like FEATURE_UPDATE_IN_PROGRESS to
> > reject a concurrent feature update request.
> >
> > 7. I think we haven't discussed the alternative solution to pass the
> > feature information through Zookeeper. Is that mentioned in the KIP to
> > justify why using UpdateMetadata is more favorable?
> >
> > 8. I was under the impression that user could configure a range of
> > supported versions, what's the trade-off for allowing single finalized
> > version only?
> >
> > 9. One minor syntax fix: Note that here the "client" here may be a producer
> >
> > Boyang
> >
> > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmccabe@apache.org> wrote:
> >
> > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > Hi Colin,
> > > >
> > > > Thanks for the feedback! I've changed the KIP to address your
> > > > suggestions. Please find below my explanation. Here is a link to
> > > > KIP 584:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features .
> > > >
> > > > 1. '__data_version__' is the version of the finalized feature metadata
> > > > (i.e. actual ZK node contents), while the '__schema_version__' is the
> > > > version of the schema of the data persisted in ZK. These serve
> > > > different purposes. '__data_version__' is useful mainly to clients
> > > > during reads, to differentiate between the 2 versions of eventually
> > > > consistent 'finalized features' metadata (i.e. larger metadata version
> > > > is more recent). '__schema_version__' provides an additional degree of
> > > > flexibility, where if we decide to change the schema for '/features'
> > > > node in ZK (in the future), then we can manage broker roll outs
> > > > suitably (i.e. serialization/deserialization of the ZK data can be
> > > > handled safely).
> > >
> > > Hi Kowshik,
> > >
> > > If you're talking about a number that lets you know if data is more or
> > > less recent, we would typically call that an epoch, and not a version.
> > > For the ZK data structures, the word "version" is typically reserved for
> > > describing changes to the overall schema of the data that is written to
> > > ZooKeeper.  We don't even really change the "version" of those schemas
> > > that much, since most changes are backwards-compatible.  But we do
> > > include that version field just in case.
> > >
> > > I don't think we really need an epoch here, though, since we can just
> > > look at the broker epoch.  Whenever the broker registers, its epoch will
> > > be greater than the previous broker epoch.  And the newly registered
> > > data will take priority.  This will be a lot simpler than adding a
> > > separate epoch system, I think.
> > >
> > > > 2. Regarding admin client needing min and max information - you are
> > > > right! I've changed the KIP such that the Admin API also allows the
> > > > user to read 'supported features' from a specific broker. Please look
> > > > at the section "Admin API changes".
> > >
> > > Thanks.
> > >
> > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate. I've
> > > > improved the KIP to just use `long` at all places.
> > >
> > > Sounds good.
> > >
> > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right! I've
> > > > updated the KIP sketching the functionality provided by this tool,
> > > > with some examples. Please look at the section "Tooling support
> > > > examples".
> > > >
> > > > Thank you!
> > >
> > > Thanks, Kowshik.
> > >
> > > cheers,
> > > Colin
> > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cmccabe@apache.org>
> > > > wrote:
> > > >
> > > > > Thanks, Kowshik, this looks good.
> > > > >
> > > > > In the "Schema" section, do we really need both __schema_version__
> > > > > and __data_version__?  Can we just have a single version field here?
> > > > >
> > > > > Shouldn't the Admin(Client) function have some way to get the min
> > > > > and max information that we're exposing as well?  I guess we could
> > > > > have min, max, and current.  Unrelated: is the use of Long rather
> > > > > than long deliberate here?
> > > > >
> > > > > It would be good to describe how the command line tool
> > > > > kafka.admin.FeatureCommand will work.  For example the flags that it
> > > > > will take and the output that it will generate to STDOUT.
> > > > >
> > > > > cheers,
> > > > > Colin
> > > > >
> > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I've opened KIP-584 which is intended to provide a versioning
> > > > > > scheme for features. I'd like to use this thread to discuss the
> > > > > > same. I'd appreciate any feedback on this. Here is a link to
> > > > > > KIP-584:
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features .
> > > > > >
> > > > > > Thank you!
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Jun,

Sorry the links were broken in my last response, here are the right links:

200. https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
110. https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Whentouseversionedfeatureflags?


Cheers,
Kowshik

On Wed, Apr 15, 2020 at 6:24 PM Kowshik Prakasam <kp...@confluent.io>
wrote:

>
> Hi Jun,
>
> Thanks for the feedback! I have addressed the comments in the KIP.
>
> > 200. In the validation section, there is still the text "*from*
> > {"max_version_level": X} *to* {"max_version_level": X’}". It seems that it
> > should say "from X to Y"?
>
> (Kowshik): Done. I have reworded it a bit to make it clearer now in this
> section:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
>
> > 110. Could we add that we need to document the bumped version of each
> > feature in the upgrade section of a release?
>
> (Kowshik): Great point! Done, I have mentioned it in #3 of this section:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Whentouseversionedfeatureflags?
>
>
> Cheers,
> Kowshik
>
> On Wed, Apr 15, 2020 at 4:00 PM Jun Rao <ju...@confluent.io> wrote:
>
>> Hi, Kowshik,
>>
>> Looks good to me now. Just a couple of minor things below.
>>
>> 200. In the validation section, there is still the text  "*from*
>> {"max_version_level":
>> X} *to* {"max_version_level": X’}". It seems that it should say "from X to
>> Y"?
>>
>> 110. Could we add that we need to document the bumped version of each
>> feature in the upgrade section of a release?
>>
>> Thanks,
>>
>> Jun
>>
>> On Wed, Apr 15, 2020 at 1:08 PM Kowshik Prakasam <kp...@confluent.io>
>> wrote:
>>
>> > Hi Jun,
>> >
>> > Thank you for the suggestion! I have updated the KIP, please find my
>> > response below.
>> >
>> > > 200. I guess you are saying only when the allowDowngrade field is set,
>> > > the finalized feature version can go backward. Otherwise, it can only
>> > > go up. That makes sense. It would be useful to make that clear when
>> > > explaining the usage of the allowDowngrade field. In the validation
>> > > section, we have "/features' from {"max_version_level": X} to
>> > > {"max_version_level": X’}", it seems that we need to mention Y there.
>> >
>> > (Kowshik): Great point! Yes, that is correct. Done, I have updated the
>> > validations section explaining the above. Here is a link to this section:
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
>> >
>> >
>> > Cheers,
>> > Kowshik
>> >
>> >
>> >
>> >
>> > On Wed, Apr 15, 2020 at 11:05 AM Jun Rao <ju...@confluent.io> wrote:
>> >
>> > > Hi, Kowshik,
>> > >
>> > > 200. I guess you are saying only when the allowDowngrade field is set,
>> > the
>> > > finalized feature version can go backward. Otherwise, it can only go
>> up.
>> > > That makes sense. It would be useful to make that clear when
>> explaining
>> > > the usage of the allowDowngrade field. In the validation section, we
>> have
>> > > "
>> > > /features' from {"max_version_level": X} to {"max_version_level":
>> X’}",
>> > it
>> > > seems that we need to mention Y there.
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > > On Wed, Apr 15, 2020 at 10:44 AM Kowshik Prakasam <
>> > kprakasam@confluent.io>
>> > > wrote:
>> > >
>> > > > Hi Jun,
>> > > >
>> > > > Great question! Please find my response below.
>> > > >
>> > > > > 200. My understanding is that if the CLI tool passes the
>> > > > > '--allow-downgrade' flag when updating a specific feature, then a
>> > > > > future downgrade is possible. Otherwise, the feature is not
>> > > > > downgradable. If so, I was wondering how the controller remembers
>> > > > > this since it can be restarted over time?
>> > > >
>> > > > (Kowshik): The purpose of the flag was just to restrict the user
>> > > > intent for a specific request.
>> > > > It seems to me that to avoid confusion, I could call the flag
>> > > > `--try-downgrade` instead.
>> > > > Then it is clear that the controller just has to treat the ask from
>> > > > the user as an explicit request to attempt a downgrade.
>> > > >
>> > > > The flag does not act as an override on the controller's decision
>> > > > making that decides whether a feature is downgradable (these decisions
>> > > > on whether to allow a feature to be downgraded from a specific version
>> > > > level can be embedded in the controller code).
>> > > >
>> > > > Please let me know what you think.
>> > > > Sorry if I misunderstood the original question.
>> > > > Sorry if I misunderstood the original question.
>> > > >
>> > > >
>> > > > Cheers,
>> > > > Kowshik
>> > > >
>> > > >
>> > > > On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <ju...@confluent.io> wrote:
>> > > >
>> > > > > Hi, Kowshik,
>> > > > >
>> > > > > Thanks for the reply. Makes sense. Just one more question.
>> > > > >
>> > > > > 200. My understanding is that if the CLI tool passes the
>> > > > > '--allow-downgrade' flag when updating a specific feature, then a
>> > > > > future downgrade is possible. Otherwise, the feature is not
>> > > > > downgradable. If so, I was wondering how the controller remembers
>> > > > > this since it can be restarted over time?
>> > > > >
>> > > > > Jun
>> > > > >
>> > > > >
>> > > > > On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam <
>> > > kprakasam@confluent.io
>> > > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Jun,
>> > > > > >
>> > > > > > Thanks a lot for the feedback and the questions!
>> > > > > > Please find my response below.
>> > > > > >
>> > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field.
>> > > > > > > It seems that field needs to be persisted somewhere in ZK?
>> > > > > >
>> > > > > > (Kowshik): Great question! Below is my explanation. Please help me
>> > > > > > understand, if you feel there are cases where we would need to
>> > > > > > still persist it in ZK.
>> > > > > >
>> > > > > > Firstly, I have updated my thoughts in the KIP now, under the
>> > > > > > 'guidelines' section:
>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
>> > > > > >
>> > > > > > The allowDowngrade boolean field is just to restrict the user
>> > > > > > intent, and to remind them to double check their intent before
>> > > > > > proceeding. It should be set to true by the user in a request,
>> > > > > > only when the user intent is to forcefully "attempt" a downgrade
>> > > > > > of a specific feature's max version level, to the provided value
>> > > > > > in the request.
>> > > > > >
>> > > > > > We can extend this safeguard. The controller (on its end) can
>> > > > > > maintain rules in the code that, for safety reasons, would
>> > > > > > outright reject certain downgrades from a specific
>> > > > > > max_version_level for a specific feature. Such rejections may
>> > > > > > happen depending on the feature being downgraded, and from what
>> > > > > > version level.
>> > > > > >
>> > > > > > The CLI tool only allows a downgrade attempt in conjunction with
>> > > > > > specific flags and sub-commands. For example, in the CLI tool, if
>> > > > > > the user uses the 'downgrade-all' command, or passes the
>> > > > > > '--allow-downgrade' flag when updating a specific feature, only
>> > > > > > then the tool will translate this ask to setting the
>> > > > > > 'allowDowngrade' field in the request to the server.
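>> > > > > >
>> > > > > > To make this concrete, here is a rough sketch of the kind of
>> > > > > > per-update guard the controller could apply (the names below are
>> > > > > > illustrative only; the actual request fields, error codes and
>> > > > > > safety rules are defined in the KIP):
>> > > > > >
>> > > > > >   // Hypothetical controller-side validation of one feature update.
>> > > > > >   // 'allowDowngrade' mirrors the downgrade intent carried in the request.
>> > > > > >   void validateFeatureUpdate(String feature, short currentMaxLevel,
>> > > > > >                              short requestedMaxLevel, boolean allowDowngrade) {
>> > > > > >       boolean isDowngrade = requestedMaxLevel < currentMaxLevel;
>> > > > > >       if (isDowngrade && !allowDowngrade) {
>> > > > > >           throw new IllegalArgumentException(
>> > > > > >               "Downgrade of '" + feature + "' requires the downgrade intent");
>> > > > > >       }
>> > > > > >       // Even when the intent is set, the controller can refuse downgrades
>> > > > > >       // that its built-in per-feature rules consider unsafe.
>> > > > > >       if (isDowngrade && !isSafeToDowngrade(feature, currentMaxLevel,
>> > > > > >                                             requestedMaxLevel)) {
>> > > > > >           throw new IllegalArgumentException(
>> > > > > >               "Unsafe downgrade of '" + feature + "' rejected");
>> > > > > >       }
>> > > > > >   }
>> > > > > >
>> > > > > >   // Hypothetical hook where the per-feature safety rules would live.
>> > > > > >   boolean isSafeToDowngrade(String feature, short fromLevel, short toLevel) {
>> > > > > >       return false; // default-deny; real rules depend on the feature
>> > > > > >   }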
>> > > > > >
>> > > > > > > 201. UpdateFeaturesResponse has the following top level fields.
>> > > > > > > Should those fields be per feature?
>> > > > > > >
>> > > > > > >   "fields": [
>> > > > > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
>> > > > > > >       "about": "The error code, or 0 if there was no error." },
>> > > > > > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
>> > > > > > >       "about": "The error message, or null if there was no error." }
>> > > > > > >   ]
>> > > > > >
>> > > > > > (Kowshik): Great question!
>> > > > > > As such, the API is transactional, as explained in the sections
>> > > > > > linked below. Either all provided FeatureUpdates are applied, or
>> > > > > > none. It's the reason I felt we can have just one error code +
>> > > > > > message.
>> > > > > > Happy to extend this if you feel otherwise. Please let me know.
>> > > > > >
>> > > > > > Link to sections:
>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController
>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees
>> > > > > >
>> > > > > > > 202. The /features path in ZK has a field min_version_level.
>> > > > > > > Which API and tool can change that value?
>> > > > > >
>> > > > > > (Kowshik): Great question! Currently this cannot be modified by
>> > > > > > using the API or the tool.
>> > > > > > Feature version deprecation (by raising min_version_level) can be
>> > > > > > done only by the Controller directly. The rationale is explained
>> > > > > > in this section:
>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation
>> > > > > >
>> > > > > >
>> > > > > > Cheers,
>> > > > > > Kowshik
>> > > > > >
>> > > > > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <ju...@confluent.io>
>> wrote:
>> > > > > >
>> > > > > > > Hi, Kowshik,
>> > > > > > >
>> > > > > > > Thanks for addressing those comments. Just a few more minor
>> > > comments.
>> > > > > > >
>> > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field.
>> > > > > > > It seems that field needs to be persisted somewhere in ZK?
>> > > > > > >
>> > > > > > > 201. UpdateFeaturesResponse has the following top level fields.
>> > > > > > > Should those fields be per feature?
>> > > > > > >
>> > > > > > >   "fields": [
>> > > > > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
>> > > > > > >       "about": "The error code, or 0 if there was no error." },
>> > > > > > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
>> > > > > > >       "about": "The error message, or null if there was no error." }
>> > > > > > >   ]
>> > > > > > >
>> > > > > > > 202. The /features path in ZK has a field min_version_level.
>> > > > > > > Which API and tool can change that value?
>> > > > > > >
>> > > > > > > Jun
>> > > > > > >
>> > > > > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <
>> > > > > kprakasam@confluent.io
>> > > > > > >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi Jun,
>> > > > > > > >
>> > > > > > > > Thanks for the feedback! I have updated KIP-584 addressing
>> > > > > > > > your comments.
>> > > > > > > > Please find my response below.
>> > > > > > > >
>> > > > > > > > > 100.6 You can look for the sentence "This operation requires
>> > > > > > > > > ALTER on CLUSTER." in KIP-455. Also, you can check its usage
>> > > > > > > > > in KafkaApis.authorize().
>> > > > > > > >
>> > > > > > > > (Kowshik): Done. Great point! For the newly introduced
>> > > > > > > > UPDATE_FEATURES api, I have added a requirement that
>> > > > > > > > AclOperation.ALTER is required on ResourceType.CLUSTER.
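>> > > > > > > >
>> > > > > > > > For illustration, the guard this adds is roughly of the
>> > > > > > > > following shape (the real code path is the KafkaApis.authorize()
>> > > > > > > > usage mentioned above; the method signature below is
>> > > > > > > > illustrative, not the actual API):
>> > > > > > > >
>> > > > > > > >   // Hypothetical sketch: reject UPDATE_FEATURES unless the caller
>> > > > > > > >   // holds ALTER on the CLUSTER resource.
>> > > > > > > >   if (!authorize(session, AclOperation.ALTER, clusterResource)) {
>> > > > > > > >       throw new ClusterAuthorizationException(
>> > > > > > > >           "UPDATE_FEATURES requires ALTER on the CLUSTER resource");
>> > > > > > > >   }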
>> > > > > > > >
>> > > > > > > > > 110. Keeping the feature version as int is probably fine. I
>> > > > > > > > > just felt that for some of the common user interactions, it's
>> > > > > > > > > more convenient to relate that to a release version. For
>> > > > > > > > > example, if a user wants to downgrade to a release 2.5, it's
>> > > > > > > > > easier for the user to use the tool like "tool --downgrade
>> > > > > > > > > 2.5" instead of "tool --downgrade --feature X --version 6".
>> > > > > > > >
>> > > > > > > > (Kowshik): Great point. Generally, maximum feature version
>> > > > > > > > levels are not downgradable after they are finalized in the
>> > > > > > > > cluster. This is because, as a guideline, bumping a feature
>> > > > > > > > version level is used mainly to convey important breaking
>> > > > > > > > changes.
>> > > > > > > > Despite the above, there may be some extreme/rare cases where
>> > > > > > > > a user wants to downgrade all features to a specific previous
>> > > > > > > > release. The user may want to do this just prior to rolling
>> > > > > > > > back a Kafka cluster to a previous release.
>> > > > > > > >
>> > > > > > > > To support the above, I have made a change to the KIP
>> > > > > > > > explaining that the CLI tool is versioned.
>> > > > > > > > The CLI tool internally has knowledge about a map of features
>> > > > > > > > to their respective max versions supported by the Broker. The
>> > > > > > > > tool's knowledge of features and their version values is
>> > > > > > > > limited to the version of the CLI tool itself, i.e. the
>> > > > > > > > information is packaged into the CLI tool when it is released.
>> > > > > > > > Whenever a Kafka release introduces a new feature version, or
>> > > > > > > > modifies an existing feature version, the CLI tool shall also
>> > > > > > > > be updated with this information. Newer versions of the CLI
>> > > > > > > > tool will be released as part of the Kafka releases.
>> > > > > > > >
>> > > > > > > > Therefore, to achieve the downgrade need, the user just needs
>> > > > > > > > to run the version of the CLI tool that's part of the
>> > > > > > > > particular previous release that he/she is downgrading to.
>> > > > > > > > To help the user with this, there is a new command added to
>> > > > > > > > the CLI tool called `downgrade-all`.
>> > > > > > > > This essentially downgrades max version levels of all features
>> > > > > > > > in the cluster to the versions known to the CLI tool
>> > > > > > > > internally.
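>> > > > > > > >
>> > > > > > > > To sketch this idea (the map contents below are made up, and
>> > > > > > > > the exact tool internals are described in the KIP, not here):
>> > > > > > > >
>> > > > > > > >   import java.util.Map;
>> > > > > > > >
>> > > > > > > >   // Hypothetical: the feature -> max version level map packaged into
>> > > > > > > >   // the CLI tool when it is released.
>> > > > > > > >   final Map<String, Short> latestKnownMaxLevels =
>> > > > > > > >       Map.of("group_coordinator", (short) 10);
>> > > > > > > >
>> > > > > > > >   // 'downgrade-all' then conceptually means: for every finalized
>> > > > > > > >   // feature, request that its max version level be set to the tool's
>> > > > > > > >   // known value, with the downgrade intent set on each update.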
>> > > > > > > >
>> > > > > > > > I have explained the above in the KIP under these sections:
>> > > > > > > >
>> > > > > > > > Tooling support (have explained that the CLI tool is versioned):
>> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
>> > > > > > > >
>> > > > > > > > Regular CLI tool usage (please refer to point #3, and see the
>> > > > > > > > tooling example):
>> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
>> > > > > > > >
>> > > > > > > > > 110. Similarly, if the client library finds a feature
>> > > > > > > > > mismatch with the broker, the client likely needs to log some
>> > > > > > > > > error message for the user to take some actions. It's much
>> > > > > > > > > more actionable if the error message is "upgrade the broker
>> > > > > > > > > to release version 2.6" than just "upgrade the broker to
>> > > > > > > > > feature version 7".
>> > > > > > > >
>> > > > > > > > (Kowshik): That's a really good point! If we use ints for
>> > > > > > > > feature versions, the best message that the client can print
>> > > > > > > > for debugging is "broker doesn't support feature version 7",
>> > > > > > > > and alongside that print the supported version range returned
>> > > > > > > > by the broker. Then, does it sound reasonable that the user
>> > > > > > > > could then reference Kafka release logs to figure out which
>> > > > > > > > version of the broker release is required to be deployed, to
>> > > > > > > > support feature version 7? I couldn't think of a better
>> > > > > > > > strategy here.
>> > > > > > > >
>> > > > > > > > > 120. When should a developer bump up the version of a
>> > > > > > > > > feature?
>> > > > > > > >
>> > > > > > > > (Kowshik): Great question! In the KIP, I have added a section:
>> > > > > > > > 'Guidelines on feature versions and workflows' providing some
>> > > > > > > > guidelines on when to use the versioned feature flags, and what
>> > > > > > > > are the regular workflows with the CLI tool.
>> > > > > > > >
>> > > > > > > > Link to the relevant sections:
>> > > > > > > > Guidelines:
>> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
>> > > > > > > >
>> > > > > > > > Regular CLI tool usage:
>> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
>> > > > > > > >
>> > > > > > > > Advanced CLI tool usage:
>> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Cheers,
>> > > > > > > > Kowshik
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <ju...@confluent.io>
>> > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi, Kowshik,
>> > > > > > > > >
>> > > > > > > > > Thanks for the reply. A few more comments.
>> > > > > > > > >
>> > > > > > > > > 110. Keeping the feature version as int is probably fine.
>> I
>> > > just
>> > > > > felt
>> > > > > > > > that
>> > > > > > > > > for some of the common user interactions, it's more
>> > convenient
>> > > to
>> > > > > > > > > relate that to a release version. For example, if a user
>> > wants
>> > > to
>> > > > > > > > downgrade
>> > > > > > > > > to a release 2.5, it's easier for the user to use the tool
>> > like
>> > > > > "tool
>> > > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X
>> > > > --version
>> > > > > > 6".
>> > > > > > > > > Similarly, if the client library finds a feature mismatch
>> > with
>> > > > the
>> > > > > > > > broker,
>> > > > > > > > > the client likely needs to log some error message for the
>> > user
>> > > to
>> > > > > > take
>> > > > > > > > some
>> > > > > > > > > actions. It's much more actionable if the error message is
>> > > > "upgrade
>> > > > > > the
>> > > > > > > > > broker to release version 2.6" than just "upgrade the
>> broker
>> > to
>> > > > > > feature
>> > > > > > > > > version 7".
>> > > > > > > > >
>> > > > > > > > > 111. Sounds good.
>> > > > > > > > >
>> > > > > > > > > 120. When should a developer bump up the version of a
>> > feature?
>> > > > > > > > >
>> > > > > > > > > Jun
>> > > > > > > > >
>> > > > > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <
>> > > > > > > kprakasam@confluent.io
>> > > > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Jun,
>> > > > > > > > > >
>> > > > > > > > > > I have updated the KIP for the item 111.
>> > > > > > > > > > I'm in the process of addressing 100.6, and will
>> provide an
>> > > > > update
>> > > > > > > > soon.
>> > > > > > > > > > I think item 110 is still under discussion given we are
>> now
>> > > > > > > providing a
>> > > > > > > > > way
>> > > > > > > > > > to finalize
>> > > > > > > > > > all features to their latest version levels. In any
>> case,
>> > > > please
>> > > > > > let
>> > > > > > > us
>> > > > > > > > > > know
>> > > > > > > > > > how you feel in response to Colin's comments on this
>> topic.
>> > > > > > > > > >
>> > > > > > > > > > > 111. To put this in context, when we had IBP, the default
>> > > > > > > > > > > value is the current released version. So, if you are a
>> > > > > > > > > > > brand new user, you don't need to configure IBP and all
>> > > > > > > > > > > new features will be immediately available in the new
>> > > > > > > > > > > cluster. If you are upgrading from an old version, you do
>> > > > > > > > > > > need to understand and configure IBP. I see a similar
>> > > > > > > > > > > pattern here for features. From the ease of use
>> > > > > > > > > > > perspective, ideally, we shouldn't require a new user to
>> > > > > > > > > > > have an extra step such as running a bootstrap script
>> > > > > > > > > > > unless it's truly necessary. If someone has a special
>> > > > > > > > > > > need (all the cases you mentioned seem special cases?),
>> > > > > > > > > > > they can configure a mode such that features are
>> > > > > > > > > > > enabled/disabled manually.
>> > > > > > > > > >
>> > > > > > > > > > (Kowshik): That makes sense, thanks for the idea! Sorry if
>> > > > > > > > > > I didn't understand this need earlier. I have updated the
>> > > > > > > > > > KIP with the approach that whenever the '/features' node is
>> > > > > > > > > > absent, the controller by default will bootstrap the node
>> > > > > > > > > > to contain the latest feature levels. Here is the new
>> > > > > > > > > > section in the KIP describing the same:
>> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
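>> > > > > > > > > >
>> > > > > > > > > > As a rough sketch of that bootstrap behavior (the zk calls
>> > > > > > > > > > below are illustrative, not the exact KafkaZkClient API):
>> > > > > > > > > >
>> > > > > > > > > >   // Hypothetical: on becoming controller, create the '/features'
>> > > > > > > > > >   // node with the latest known feature levels if it does not exist.
>> > > > > > > > > >   if (!zkClient.pathExists("/features")) {
>> > > > > > > > > >       zkClient.createFeaturesNode(latestDefaultFinalizedFeatures());
>> > > > > > > > > >   }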
>> > > > > > > > > >
>> > > > > > > > > > Next, as I explained in my response to Colin's suggestions,
>> > > > > > > > > > we are now providing a `--finalize-latest-features` flag
>> > > > > > > > > > with the tooling. This lets the sysadmin finalize all
>> > > > > > > > > > features known to the controller to their latest version
>> > > > > > > > > > levels. Please look at this section (point #3 and the
>> > > > > > > > > > tooling example later):
>> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
>> > > > > > > > > >
>> > > > > > > > > >
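
(As a rough illustration of what finalizing everything to the latest levels
amounts to: the tool asks the controller which features it supports and then
finalizes each at its latest max version level. The Admin method and type names
below are assumptions for the sketch, not the KIP's final API.)

    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: '--finalize-latest-features' behaviour in Admin API terms.
    static void finalizeLatest(FeatureAdmin admin) {   // FeatureAdmin is assumed
        Map<String, Long> latest = new HashMap<>(admin.describeSupportedFeatures());
        admin.updateFeatures(latest);
    }
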
>> > > > > > > > > >
>> > > > > > > > > > Do you feel this addresses your comment/concern?
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Cheers,
>> > > > > > > > > > Kowshik
>> > > > > > > > > >
>> > > > > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <
>> jun@confluent.io>
>> > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi, Kowshik,
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks for the reply. A few more replies below.
>> > > > > > > > > > >
>> > > > > > > > > > > 100.6 You can look for the sentence "This operation
>> > > requires
>> > > > > > ALTER
>> > > > > > > on
>> > > > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
>> > > > > > > > > > > KafkaApis.authorize().
>> > > > > > > > > > >
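
(For readers not familiar with the ACL model being referenced: the check Jun
points at amounts to requiring ALTER on the CLUSTER resource before the new
request is processed. A hedged Java sketch follows; the real broker-side check
in KafkaApis is Scala, and 'authorizer', 'session' and 'clusterResource' here
are assumed stand-ins.)

    // Sketch only: reject the request unless the caller has ALTER on CLUSTER.
    if (!authorizer.authorize(session, AclOperation.ALTER, clusterResource)) {
        // respond with Errors.CLUSTER_AUTHORIZATION_FAILED
    }
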
>> > > > > > > > > > > 110. From the external client/tooling perspective,
>> it's
>> > > more
>> > > > > > > natural
>> > > > > > > > to
>> > > > > > > > > > use
>> > > > > > > > > > > the release version for features. If we can use the
>> same
>> > > > > release
>> > > > > > > > > version
>> > > > > > > > > > > for internal representation, it seems simpler (easier
>> to
>> > > > > > > understand,
>> > > > > > > > no
>> > > > > > > > > > > mapping overhead, etc). Is there a benefit with
>> separate
>> > > > > external
>> > > > > > > and
>> > > > > > > > > > > internal versioning schemes?
>> > > > > > > > > > >
>> > > > > > > > > > > 111. To put this in context, when we had IBP, the
>> default
>> > > > value
>> > > > > > is
>> > > > > > > > the
>> > > > > > > > > > > current released version. So, if you are a brand new
>> > user,
>> > > > you
>> > > > > > > don't
>> > > > > > > > > need
>> > > > > > > > > > > to configure IBP and all new features will be
>> immediately
>> > > > > > available
>> > > > > > > > in
>> > > > > > > > > > the
>> > > > > > > > > > > new cluster. If you are upgrading from an old version,
>> > you
>> > > do
>> > > > > > need
>> > > > > > > to
>> > > > > > > > > > > understand and configure IBP. I see a similar pattern
>> > here
>> > > > for
>> > > > > > > > > > > features. From the ease of use perspective, ideally,
>> we
>> > > > > shouldn't
>> > > > > > > > > > require a
>> > > > > > > > > > > new user to have an extra step such as running a
>> > bootstrap
>> > > > > script
>> > > > > > > > > unless
>> > > > > > > > > > > it's truly necessary. If someone has a special need
>> (all
>> > > the
>> > > > > > cases
>> > > > > > > > you
>> > > > > > > > > > > mentioned seem special cases?), they can configure a
>> mode
>> > > > such
>> > > > > > that
>> > > > > > > > > > > features are enabled/disabled manually.
>> > > > > > > > > > >
>> > > > > > > > > > > Jun
>> > > > > > > > > > >
>> > > > > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
>> > > > > > > > > kprakasam@confluent.io>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi Jun,
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks for the feedback and suggestions. Please
>> find my
>> > > > > > response
>> > > > > > > > > below.
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 100.6 For every new request, the admin needs to
>> > control
>> > > > who
>> > > > > > is
>> > > > > > > > > > allowed
>> > > > > > > > > > > to
>> > > > > > > > > > > > > issue that request if security is enabled. So, we
>> > need
>> > > to
>> > > > > > > assign
>> > > > > > > > > the
>> > > > > > > > > > > new
>> > > > > > > > > > > > > request a ResourceType and possible AclOperations.
>> > See
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
>> > > > > > > > > > > > > as an example.
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): I don't see any reference to the words
>> > > > > ResourceType
>> > > > > > or
>> > > > > > > > > > > > AclOperations
>> > > > > > > > > > > > in the KIP. Please let me know how I can use the KIP
>> > > > > > > > > > > > that you linked to figure out how to set up the
>> > > > > > > > > > > > appropriate ResourceType and/or AclOperation.
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 105. If we change delete to disable, it's better
>> to
>> > do
>> > > > this
>> > > > > > > > > > > consistently
>> > > > > > > > > > > > in
>> > > > > > > > > > > > > request protocol and admin api as well.
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): The API shouldn't be called 'disable'
>> when
>> > it
>> > > is
>> > > > > > > > deleting
>> > > > > > > > > a
>> > > > > > > > > > > > feature.
>> > > > > > > > > > > > I've just changed the KIP to use 'delete'. I don't
>> > have a
>> > > > > > strong
>> > > > > > > > > > > > preference.
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 110. The minVersion/maxVersion for features use
>> > int64.
>> > > > > > > Currently,
>> > > > > > > > > our
>> > > > > > > > > > > > > release version schema is major.minor.bugfix (e.g.
>> > > > 2.5.0).
>> > > > > > It's
>> > > > > > > > > > > possible
>> > > > > > > > > > > > > for new features to be included in minor releases
>> > too.
>> > > > > Should
>> > > > > > > we
>> > > > > > > > > make
>> > > > > > > > > > > the
>> > > > > > > > > > > > > feature versioning match the release versioning?
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): The release version can be mapped to a
>> set
>> > of
>> > > > > > feature
>> > > > > > > > > > > versions,
>> > > > > > > > > > > > and this can be done, for example in the tool (or
>> even
>> > > > > external
>> > > > > > > to
>> > > > > > > > > the
>> > > > > > > > > > > > tool).
>> > > > > > > > > > > > Can you please clarify what I'm missing?
>> > > > > > > > > > > >
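
(A purely hypothetical example of such a mapping, with invented feature names
and version levels, just to make the idea concrete:)

    import java.util.Map;

    // Hypothetical table: AK release -> finalized feature max version levels.
    Map<String, Map<String, Long>> releaseToFeatureLevels = Map.of(
        "2.5", Map.of("group_coordinator", 1L, "transaction_coordinator", 1L),
        "2.6", Map.of("group_coordinator", 2L, "transaction_coordinator", 2L));
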
>> > > > > > > > > > > > > 111. "During regular operations, the data in the
>> ZK
>> > > node
>> > > > > can
>> > > > > > be
>> > > > > > > > > > mutated
>> > > > > > > > > > > > > only via a specific admin API served only by the
>> > > > > > controller." I
>> > > > > > > > am
>> > > > > > > > > > > > > wondering why can't the controller auto finalize a
>> > > > feature
>> > > > > > > > version
>> > > > > > > > > > > after
>> > > > > > > > > > > > > all brokers are upgraded? For new users who
>> download
>> > > the
>> > > > > > latest
>> > > > > > > > > > version
>> > > > > > > > > > > > to
>> > > > > > > > > > > > > build a new cluster, it's inconvenient for them to
>> > have
>> > > > to
>> > > > > > > > manually
>> > > > > > > > > > > > enable
>> > > > > > > > > > > > > each feature.
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): I agree that there is a trade-off here,
>> but
>> > it
>> > > > > will
>> > > > > > > help
>> > > > > > > > > > > > to decide whether the automation should be thought
>> > > > > > > > > > > > through right now in this KIP, or later in a follow-up
>> > > > > > > > > > > > KIP. We may invest in automation, but we have to decide
>> > > > > > > > > > > > whether we should do it now or later.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Regarding the inconvenience that you mentioned, do you
>> > > > > > > > > > > > think it can be overcome by asking the cluster operator
>> > > > > > > > > > > > to run a bootstrap script when he/she knows that a specific
>> AK
>> > > > > release
>> > > > > > > has
>> > > > > > > > > been
>> > > > > > > > > > > > almost completely deployed in a cluster for the
>> first
>> > > time?
>> > > > > > Idea
>> > > > > > > is
>> > > > > > > > > > that
>> > > > > > > > > > > > the
>> > > > > > > > > > > > bootstrap script will know how to map a specific AK
>> > > release
>> > > > > to
>> > > > > > > > > > finalized
>> > > > > > > > > > > > feature versions, and run the `kafka-features.sh`
>> tool
>> > > > > > > > appropriately
>> > > > > > > > > > > > against
>> > > > > > > > > > > > the cluster.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Now, coming back to your automation
>> proposal/question.
>> > > > > > > > > > > > I do see the value of automated feature version
>> > > > finalization,
>> > > > > > > but I
>> > > > > > > > > > also
>> > > > > > > > > > > > see
>> > > > > > > > > > > > that this will open up several questions and some
>> > risks,
>> > > as
>> > > > > > > > explained
>> > > > > > > > > > > > below.
>> > > > > > > > > > > > The answers to these depend on the definition of the
>> > > > > automation
>> > > > > > > we
>> > > > > > > > > > choose
>> > > > > > > > > > > > to build, and how well it fits into a Kafka
>> > > deployment.
>> > > > > > > > > > > > Basically, it can be unsafe for the controller to
>> > > finalize
>> > > > > > > feature
>> > > > > > > > > > > version
>> > > > > > > > > > > > upgrades automatically, without learning about the
>> > intent
>> > > > of
>> > > > > > the
>> > > > > > > > > > cluster
>> > > > > > > > > > > > operator.
>> > > > > > > > > > > > 1. We would sometimes want to lock feature versions
>> > only
>> > > > when
>> > > > > > we
>> > > > > > > > have
>> > > > > > > > > > > > externally verified
>> > > > > > > > > > > > the stability of the broker binary.
>> > > > > > > > > > > > 2. Sometimes only the cluster operator knows that a
>> > > cluster
>> > > > > > > upgrade
>> > > > > > > > > is
>> > > > > > > > > > > > complete,
>> > > > > > > > > > > > and new brokers are highly unlikely to join the
>> > cluster.
>> > > > > > > > > > > > 3. Only the cluster operator knows that the intent
>> is
>> > to
>> > > > > deploy
>> > > > > > > the
>> > > > > > > > > > same
>> > > > > > > > > > > > version
>> > > > > > > > > > > > of the new broker release across the entire cluster
>> > (i.e.
>> > > > the
>> > > > > > > > latest
>> > > > > > > > > > > > downloaded version).
>> > > > > > > > > > > > 4. For downgrades, it appears the controller still
>> > needs
>> > > > some
>> > > > > > > > > external
>> > > > > > > > > > > > input
>> > > > > > > > > > > > (such as the proposed tool) to finalize a feature
>> > version
>> > > > > > > > downgrade.
>> > > > > > > > > > > >
>> > > > > > > > > > > > If we have automation, that automation can end up
>> > failing
>> > > > in
>> > > > > > some
>> > > > > > > > of
>> > > > > > > > > > the
>> > > > > > > > > > > > cases
>> > > > > > > > > > > > above. Then, we need a way to declare that the
>> cluster
>> > is
>> > > > > "not
>> > > > > > > > ready"
>> > > > > > > > > > if
>> > > > > > > > > > > > the
>> > > > > > > > > > > > controller cannot automatically finalize some basic
>> > > > required
>> > > > > > > > feature
>> > > > > > > > > > > > version
>> > > > > > > > > > > > upgrades across the cluster. We need to make the
>> > cluster
>> > > > > > operator
>> > > > > > > > > aware
>> > > > > > > > > > > in
>> > > > > > > > > > > > such a scenario (raise an alert or the like).
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
>> > should
>> > > > be
>> > > > > 49
>> > > > > > > > > instead
>> > > > > > > > > > > of
>> > > > > > > > > > > > 48.
>> > > > > > > > > > > >
>> > > > > > > > > > > > (Kowshik): Done.
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > Cheers,
>> > > > > > > > > > > > Kowshik
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <
>> > > jun@confluent.io>
>> > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Hi, Kowshik,
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Thanks for the reply. A few more comments below.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > 100.6 For every new request, the admin needs to
>> > control
>> > > > who
>> > > > > > is
>> > > > > > > > > > allowed
>> > > > > > > > > > > to
>> > > > > > > > > > > > > issue that request if security is enabled. So, we
>> > need
>> > > to
>> > > > > > > assign
>> > > > > > > > > the
>> > > > > > > > > > > new
>> > > > > > > > > > > > > request a ResourceType and possible AclOperations.
>> > See
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
>> > > > > > > > > > > > > as
>> > > > > > > > > > > > > an example.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > 105. If we change delete to disable, it's better
>> to
>> > do
>> > > > this
>> > > > > > > > > > > consistently
>> > > > > > > > > > > > in
>> > > > > > > > > > > > > request protocol and admin api as well.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > 110. The minVersion/maxVersion for features use
>> > int64.
>> > > > > > > Currently,
>> > > > > > > > > our
>> > > > > > > > > > > > > release version schema is major.minor.bugfix (e.g.
>> > > > 2.5.0).
>> > > > > > It's
>> > > > > > > > > > > possible
>> > > > > > > > > > > > > for new features to be included in minor releases
>> > too.
>> > > > > Should
>> > > > > > > we
>> > > > > > > > > make
>> > > > > > > > > > > the
>> > > > > > > > > > > > > feature versioning match the release versioning?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > 111. "During regular operations, the data in the
>> ZK
>> > > node
>> > > > > can
>> > > > > > be
>> > > > > > > > > > mutated
>> > > > > > > > > > > > > only via a specific admin API served only by the
>> > > > > > controller." I
>> > > > > > > > am
>> > > > > > > > > > > > > wondering why can't the controller auto finalize a
>> > > > feature
>> > > > > > > > version
>> > > > > > > > > > > after
>> > > > > > > > > > > > > all brokers are upgraded? For new users who
>> download
>> > > the
>> > > > > > latest
>> > > > > > > > > > version
>> > > > > > > > > > > > to
>> > > > > > > > > > > > > build a new cluster, it's inconvenient for them to
>> > have
>> > > > to
>> > > > > > > > manually
>> > > > > > > > > > > > enable
>> > > > > > > > > > > > > each feature.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
>> > should
>> > > > be
>> > > > > 49
>> > > > > > > > > instead
>> > > > > > > > > > > of
>> > > > > > > > > > > > > 48.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Jun
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
>> > > > > > > > > > > kprakasam@confluent.io>
>> > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hey Jun,
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Thanks a lot for the great feedback! Please note
>> > that
>> > > > the
>> > > > > > > > design
>> > > > > > > > > > > > > > has changed a little bit on the KIP, and we now
>> > > > propagate
>> > > > > > the
>> > > > > > > > > > > finalized
>> > > > > > > > > > > > > > features metadata only via ZK watches (instead
>> of
>> > > > > > > > > > > UpdateMetadataRequest
>> > > > > > > > > > > > > > from the controller).
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Please find below my response to your
>> > > > questions/feedback,
>> > > > > > > with
>> > > > > > > > > the
>> > > > > > > > > > > > prefix
>> > > > > > > > > > > > > > "(Kowshik):".
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 100.
>> UpdateFeaturesRequest/UpdateFeaturesResponse
>> > > > > > > > > > > > > > > 100.1 Since this request waits for responses
>> from
>> > > > > > brokers,
>> > > > > > > > > should
>> > > > > > > > > > > we
>> > > > > > > > > > > > > add
>> > > > > > > > > > > > > > a
>> > > > > > > > > > > > > > > timeout in the request (like
>> createTopicRequest)?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > (Kowshik): Great point! Done. I have added a
>> > timeout
>> > > > > field.
>> > > > > > > > Note:
>> > > > > > > > > > we
>> > > > > > > > > > > no
>> > > > > > > > > > > > > > longer
>> > > > > > > > > > > > > > wait for responses from brokers, since the
>> design
>> > has
>> > > > > been
>> > > > > > > > > changed
>> > > > > > > > > > so
>> > > > > > > > > > > > > that
>> > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > features information is propagated via ZK.
>> > > > Nevertheless,
>> > > > > it
>> > > > > > > is
>> > > > > > > > > > right
>> > > > > > > > > > > to
>> > > > > > > > > > > > > > have a timeout
>> > > > > > > > > > > > > > for the request.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 100.2 The response schema is a bit weird.
>> > > Typically,
>> > > > > the
>> > > > > > > > > response
>> > > > > > > > > > > > just
>> > > > > > > > > > > > > > > shows an error code and an error message,
>> instead
>> > > of
>> > > > > > > echoing
>> > > > > > > > > the
>> > > > > > > > > > > > > request.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified
>> it to
>> > > > just
>> > > > > > > return
>> > > > > > > > > an
>> > > > > > > > > > > > error
>> > > > > > > > > > > > > > code and a message.
>> > > > > > > > > > > > > > Previously it was not echoing the "request",
>> rather
>> > > it
>> > > > > was
>> > > > > > > > > > returning
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > latest set of
>> > > > > > > > > > > > > > cluster-wide finalized features (after applying
>> the
>> > > > > > updates).
>> > > > > > > > But
>> > > > > > > > > > you
>> > > > > > > > > > > > are
>> > > > > > > > > > > > > > right,
>> > > > > > > > > > > > > > the additional info is not required, so I have
>> > > removed
>> > > > it
>> > > > > > > from
>> > > > > > > > > the
>> > > > > > > > > > > > > response
>> > > > > > > > > > > > > > schema.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 100.3 Should we add a separate request to
>> > > > list/describe
>> > > > > > the
>> > > > > > > > > > > existing
>> > > > > > > > > > > > > > > features?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > (Kowshik): This is already present in the KIP
>> via
>> > the
>> > > > > > > > > > > > 'DescribeFeatures'
>> > > > > > > > > > > > > > Admin API,
>> > > > > > > > > > > > > > which, under the covers, uses the
>> > ApiVersionsRequest
>> > > to
>> > > > > > > > > > list/describe
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > existing features. Please read the 'Tooling
>> > support'
>> > > > > > section.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE
>> in a
>> > > > > single
>> > > > > > > > > request.
>> > > > > > > > > > > For
>> > > > > > > > > > > > > > > DELETE, the version field doesn't make sense.
>> > So, I
>> > > > > guess
>> > > > > > > the
>> > > > > > > > > > > broker
>> > > > > > > > > > > > > just
>> > > > > > > > > > > > > > > ignores this? An alternative way is to have a
>> > > > separate
>> > > > > > > > > > > > > > DeleteFeaturesRequest
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP
>> now
>> > > to
>> > > > > > have 2
>> > > > > > > > > > > separate
>> > > > > > > > > > > > > > controller APIs
>> > > > > > > > > > > > > > serving these different purposes:
>> > > > > > > > > > > > > > 1. updateFeatures
>> > > > > > > > > > > > > > 2. deleteFeatures
>> > > > > > > > > > > > > >
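
(Roughly, the split means two separate request/response pairs instead of one
overloaded call. A sketch of the Admin-facing shape, with assumed names:)

    import java.util.Map;
    import java.util.Set;

    // Sketch only: separate operations instead of mixing ADD_OR_UPDATE and DELETE.
    public interface FeatureAdminSketch {
        // Finalize (add or upgrade) the given features at the given max version levels.
        void updateFeatures(Map<String, Long> targetMaxVersionLevels);
        // Remove the finalized entries for the given features entirely.
        void deleteFeatures(Set<String> featureNames);
    }
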
>> > > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
>> > > > > > monotonically
>> > > > > > > > > > > > increasing
>> > > > > > > > > > > > > > > version of the metadata for finalized
>> features."
>> > I
>> > > am
>> > > > > > > > wondering
>> > > > > > > > > > why
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > ordering is important?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > (Kowshik): In the latest KIP write-up, it is
>> called
>> > > > epoch
>> > > > > > > > > (instead
>> > > > > > > > > > of
>> > > > > > > > > > > > > > version), and
>> > > > > > > > > > > > > > it is just the ZK node version. Basically, this
>> is
>> > > the
>> > > > > > epoch
>> > > > > > > > for
>> > > > > > > > > > the
>> > > > > > > > > > > > > > cluster-wide
>> > > > > > > > > > > > > > finalized feature version metadata. This
>> metadata
>> > is
>> > > > > served
>> > > > > > > to
>> > > > > > > > > > > clients
>> > > > > > > > > > > > > via
>> > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > ApiVersionsResponse (for reads). We propagate
>> > updates
>> > > > > from
>> > > > > > > the
>> > > > > > > > > > > > > '/features'
>> > > > > > > > > > > > > > ZK node
>> > > > > > > > > > > > > > to all brokers, via ZK watches setup by each
>> broker
>> > > on
>> > > > > the
>> > > > > > > > > > > '/features'
>> > > > > > > > > > > > > > node.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Now here is why the ordering is important:
>> > > > > > > > > > > > > > ZK watches don't propagate at the same time. As
>> a
>> > > > result,
>> > > > > > the
>> > > > > > > > > > > > > > ApiVersionsResponse
>> > > > > > > > > > > > > > is eventually consistent across brokers. This
>> can
>> > > > > introduce
>> > > > > > > > cases
>> > > > > > > > > > > > > > where clients see an older lower epoch of the
>> > > features
>> > > > > > > > metadata,
>> > > > > > > > > > > after
>> > > > > > > > > > > > a
>> > > > > > > > > > > > > > more recent
>> > > > > > > > > > > > > > higher epoch was returned at a previous point in
>> > > time.
>> > > > We
>> > > > > > > > expect
>> > > > > > > > > > > > clients
>> > > > > > > > > > > > > > to always employ the rule that metadata received with
>> > > > > > > > > > > > > > a higher epoch always trumps metadata with an older,
>> > > > > > > > > > > > > > smaller epoch. Those
>> clients
>> > > > that
>> > > > > > are
>> > > > > > > > > > external
>> > > > > > > > > > > > to
>> > > > > > > > > > > > > > Kafka should strongly consider discovering the
>> > latest
>> > > > > > > metadata
>> > > > > > > > > once
>> > > > > > > > > > > > > during
>> > > > > > > > > > > > > > startup from the brokers, and if required
>> refresh
>> > the
>> > > > > > > metadata
>> > > > > > > > > > > > > periodically
>> > > > > > > > > > > > > > (to get the latest metadata).
>> > > > > > > > > > > > > >
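
(The "higher epoch trumps lower epoch" rule can be made concrete with a small
client-side sketch; the class and field names are assumed:)

    import java.util.Map;

    // Sketch only: cache the finalized-features snapshot, ignoring stale updates.
    public final class FinalizedFeaturesClientCache {
        private long latestEpoch = -1L;
        private Map<String, Long> finalized = Map.of();

        synchronized void maybeUpdate(long epoch, Map<String, Long> features) {
            if (epoch < latestEpoch) {
                return; // stale: served by a broker whose ZK watch fired late
            }
            latestEpoch = epoch;
            finalized = features;
        }
    }
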
>> > > > > > > > > > > > > > > 100.6 Could you specify the required ACL for
>> this
>> > > new
>> > > > > > > > request?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > (Kowshik): What is ACL, and how could I find out
>> > > which
>> > > > > one
>> > > > > > to
>> > > > > > > > > > > specify?
>> > > > > > > > > > > > > > Please could you provide me some pointers? I'll
>> be
>> > > glad
>> > > > > to
>> > > > > > > > update
>> > > > > > > > > > the
>> > > > > > > > > > > > > > KIP once I know the next steps.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 101. For the broker registration ZK node,
>> should
>> > we
>> > > > > bump
>> > > > > > up
>> > > > > > > > the
>> > > > > > > > > > > > version
>> > > > > > > > > > > > > > in
>> > > > > > > > > > > > > > the json?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > (Kowshik): Great point! Done. I've increased the
>> > > > version
>> > > > > in
>> > > > > > > the
>> > > > > > > > > > > broker
>> > > > > > > > > > > > > json
>> > > > > > > > > > > > > > by 1.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if we
>> > need
>> > > > the
>> > > > > > > epoch
>> > > > > > > > > > > field.
>> > > > > > > > > > > > > Each
>> > > > > > > > > > > > > > > ZK node has an internal version field that is
>> > > > > incremented
>> > > > > > > on
>> > > > > > > > > > every
>> > > > > > > > > > > > > > update.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK
>> node
>> > > > > version
>> > > > > > > > now,
>> > > > > > > > > > > > instead
>> > > > > > > > > > > > > of
>> > > > > > > > > > > > > > explicitly
>> > > > > > > > > > > > > > incremented epoch.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a
>> feature
>> > > > > version
>> > > > > > > > > > > cluster-wide
>> > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > left to the discretion of the logic
>> implementing
>> > > the
>> > > > > > > feature
>> > > > > > > > > (ex:
>> > > > > > > > > > > can
>> > > > > > > > > > > > > be
>> > > > > > > > > > > > > > > done via dynamic broker config)." Does that
>> mean
>> > > the
>> > > > > > broker
>> > > > > > > > > > > > > registration
>> > > > > > > > > > > > > > ZK
>> > > > > > > > > > > > > > > node will be updated dynamically when this
>> > happens?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > (Kowshik): Not really. The text was just
>> conveying
>> > > > that a
>> > > > > > > > broker
>> > > > > > > > > > > could
>> > > > > > > > > > > > > > "know" of
>> > > > > > > > > > > > > > a new feature version, but it does not mean the
>> > > broker
>> > > > > > should
>> > > > > > > > > have
>> > > > > > > > > > > also
>> > > > > > > > > > > > > > activated the effects of the feature version.
>> > Knowing
>> > > > vs
>> > > > > > > > > activation
>> > > > > > > > > > > > are two
>> > > > > > > > > > > > > > separate things,
>> > > > > > > > > > > > > > and the latter can be achieved by dynamic
>> config. I
>> > > > have
>> > > > > > > > reworded
>> > > > > > > > > > the
>> > > > > > > > > > > > > text
>> > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > make this clear to the reader.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 104. UpdateMetadataRequest
>> > > > > > > > > > > > > > > 104.1 It would be useful to describe when the
>> > > feature
>> > > > > > > > metadata
>> > > > > > > > > is
>> > > > > > > > > > > > > > included
>> > > > > > > > > > > > > > > in the request. My understanding is that it's
>> > only
>> > > > > > included
>> > > > > > > > if
>> > > > > > > > > > (1)
>> > > > > > > > > > > > > there
>> > > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > a change to the finalized feature; (2) broker
>> > > > restart;
>> > > > > > (3)
>> > > > > > > > > > > controller
>> > > > > > > > > > > > > > > failover.
>> > > > > > > > > > > > > > > 104.2 The new fields have the following
>> versions.
>> > > Why
>> > > > > are
>> > > > > > > the
>> > > > > > > > > > > > versions
>> > > > > > > > > > > > > 3+
>> > > > > > > > > > > > > > > when the top version is bumped to 6?
>> > > > > > > > > > > > > > >       "fields":  [
>> > > > > > > > > > > > > > >         {"name": "Name", "type":  "string",
>> > > > "versions":
>> > > > > > > > "3+",
>> > > > > > > > > > > > > > >           "about": "The name of the
>> feature."},
>> > > > > > > > > > > > > > >         {"name":  "Version", "type":  "int64",
>> > > > > > "versions":
>> > > > > > > > > "3+",
>> > > > > > > > > > > > > > >           "about": "The finalized version for
>> the
>> > > > > > > feature."}
>> > > > > > > > > > > > > > >       ]
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > (Kowshik): With the new improved design, we have
>> > > > > completely
>> > > > > > > > > > > eliminated
>> > > > > > > > > > > > > the
>> > > > > > > > > > > > > > need to
>> > > > > > > > > > > > > > use UpdateMetadataRequest. This is because we
>> now
>> > > rely
>> > > > on
>> > > > > > ZK
>> > > > > > > to
>> > > > > > > > > > > deliver
>> > > > > > > > > > > > > the
>> > > > > > > > > > > > > > notifications for changes to the '/features' ZK
>> > node.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using
>> > > > update/delete,
>> > > > > > > > perhaps
>> > > > > > > > > > > it's
>> > > > > > > > > > > > > > better
>> > > > > > > > > > > > > > > to use enable/disable?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > (Kowshik): For delete, yes, I have changed it so
>> > that
>> > > > we
>> > > > > > > > instead
>> > > > > > > > > > call
>> > > > > > > > > > > > it
>> > > > > > > > > > > > > > 'disable'.
>> > > > > > > > > > > > > > However for 'update', it can now also refer to
>> > either
>> > > > an
>> > > > > > > > upgrade
>> > > > > > > > > > or a
>> > > > > > > > > > > > > > forced downgrade.
>> > > > > > > > > > > > > > Therefore, I have left it the way it is, simply
>> > > > > > > > > > > > > > calling it 'update'.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Cheers,
>> > > > > > > > > > > > > > Kowshik
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <
>> > > > > jun@confluent.io>
>> > > > > > > > > wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Hi, Kowshik,
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall. A few
>> > > > comments
>> > > > > > > below.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 100.
>> UpdateFeaturesRequest/UpdateFeaturesResponse
>> > > > > > > > > > > > > > > 100.1 Since this request waits for responses
>> from
>> > > > > > brokers,
>> > > > > > > > > should
>> > > > > > > > > > > we
>> > > > > > > > > > > > > add
>> > > > > > > > > > > > > > a
>> > > > > > > > > > > > > > > timeout in the request (like
>> createTopicRequest)?
>> > > > > > > > > > > > > > > 100.2 The response schema is a bit weird.
>> > > Typically,
>> > > > > the
>> > > > > > > > > response
>> > > > > > > > > > > > just
>> > > > > > > > > > > > > > > shows an error code and an error message,
>> instead
>> > > of
>> > > > > > > echoing
>> > > > > > > > > the
>> > > > > > > > > > > > > request.
>> > > > > > > > > > > > > > > 100.3 Should we add a separate request to
>> > > > list/describe
>> > > > > > the
>> > > > > > > > > > > existing
>> > > > > > > > > > > > > > > features?
>> > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE
>> in a
>> > > > > single
>> > > > > > > > > request.
>> > > > > > > > > > > For
>> > > > > > > > > > > > > > > DELETE, the version field doesn't make sense.
>> > So, I
>> > > > > guess
>> > > > > > > the
>> > > > > > > > > > > broker
>> > > > > > > > > > > > > just
>> > > > > > > > > > > > > > > ignores this? An alternative way is to have a
>> > > > separate
>> > > > > > > > > > > > > > > DeleteFeaturesRequest
>> > > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
>> > > > > > monotonically
>> > > > > > > > > > > > increasing
>> > > > > > > > > > > > > > > version of the metadata for finalized
>> features."
>> > I
>> > > am
>> > > > > > > > wondering
>> > > > > > > > > > why
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > ordering is important?
>> > > > > > > > > > > > > > > 100.6 Could you specify the required ACL for
>> this
>> > > new
>> > > > > > > > request?
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 101. For the broker registration ZK node,
>> should
>> > we
>> > > > > bump
>> > > > > > up
>> > > > > > > > the
>> > > > > > > > > > > > version
>> > > > > > > > > > > > > > in
>> > > > > > > > > > > > > > > the json?
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if we
>> > need
>> > > > the
>> > > > > > > epoch
>> > > > > > > > > > > field.
>> > > > > > > > > > > > > Each
>> > > > > > > > > > > > > > > ZK node has an internal version field that is
>> > > > > incremented
>> > > > > > > on
>> > > > > > > > > > every
>> > > > > > > > > > > > > > update.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a
>> feature
>> > > > > version
>> > > > > > > > > > > cluster-wide
>> > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > left to the discretion of the logic
>> implementing
>> > > the
>> > > > > > > feature
>> > > > > > > > > (ex:
>> > > > > > > > > > > can
>> > > > > > > > > > > > > be
>> > > > > > > > > > > > > > > done via dynamic broker config)." Does that
>> mean
>> > > the
>> > > > > > broker
>> > > > > > > > > > > > > registration
>> > > > > > > > > > > > > > ZK
>> > > > > > > > > > > > > > > node will be updated dynamically when this
>> > happens?
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 104. UpdateMetadataRequest
>> > > > > > > > > > > > > > > 104.1 It would be useful to describe when the
>> > > feature
>> > > > > > > > metadata
>> > > > > > > > > is
>> > > > > > > > > > > > > > included
>> > > > > > > > > > > > > > > in the request. My understanding is that it's
>> > only
>> > > > > > included
>> > > > > > > > if
>> > > > > > > > > > (1)
>> > > > > > > > > > > > > there
>> > > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > a change to the finalized feature; (2) broker
>> > > > restart;
>> > > > > > (3)
>> > > > > > > > > > > controller
>> > > > > > > > > > > > > > > failover.
>> > > > > > > > > > > > > > > 104.2 The new fields have the following
>> versions.
>> > > Why
>> > > > > are
>> > > > > > > the
>> > > > > > > > > > > > versions
>> > > > > > > > > > > > > 3+
>> > > > > > > > > > > > > > > when the top version is bumped to 6?
>> > > > > > > > > > > > > > >       "fields":  [
>> > > > > > > > > > > > > > >         {"name": "Name", "type":  "string",
>> > > > "versions":
>> > > > > > > > "3+",
>> > > > > > > > > > > > > > >           "about": "The name of the
>> feature."},
>> > > > > > > > > > > > > > >         {"name":  "Version", "type":  "int64",
>> > > > > > "versions":
>> > > > > > > > > "3+",
>> > > > > > > > > > > > > > >           "about": "The finalized version for
>> the
>> > > > > > > feature."}
>> > > > > > > > > > > > > > >       ]
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using
>> > > > update/delete,
>> > > > > > > > perhaps
>> > > > > > > > > > > it's
>> > > > > > > > > > > > > > better
>> > > > > > > > > > > > > > > to use enable/disable?
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Jun
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik
>> Prakasam
>> > <
>> > > > > > > > > > > > > kprakasam@confluent.io
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Hey Boyang,
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Thanks for the great feedback! I have
>> updated
>> > the
>> > > > KIP
>> > > > > > > based
>> > > > > > > > > on
>> > > > > > > > > > > your
>> > > > > > > > > > > > > > > > feedback.
>> > > > > > > > > > > > > > > > Please find my response below for your
>> > comments,
>> > > > look
>> > > > > > for
>> > > > > > > > > > > sentences
>> > > > > > > > > > > > > > > > starting
>> > > > > > > > > > > > > > > > with "(Kowshik)" below.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to
>> begin
>> > > > > handling
>> > > > > > > EOS
>> > > > > > > > > > > > traffic"
>> > > > > > > > > > > > > > > could
>> > > > > > > > > > > > > > > > be
>> > > > > > > > > > > > > > > > > converted as "When is it safe for the
>> brokers
>> > > to
>> > > > > > start
>> > > > > > > > > > serving
>> > > > > > > > > > > > new
>> > > > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is
>> not
>> > > > > > explained
>> > > > > > > > > > earlier
>> > > > > > > > > > > > in
>> > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > context.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > (Kowshik): Great point! Done.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 2. In the *Explanation *section, the
>> metadata
>> > > > > version
>> > > > > > > > > number
>> > > > > > > > > > > part
>> > > > > > > > > > > > > > > seems a
>> > > > > > > > > > > > > > > > > bit blurred. Could you point a reference
>> to
>> > > later
>> > > > > > > section
>> > > > > > > > > > that
>> > > > > > > > > > > we
>> > > > > > > > > > > > > > going
>> > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > store it in Zookeeper and update it every
>> > time
>> > > > when
>> > > > > > > there
>> > > > > > > > > is
>> > > > > > > > > > a
>> > > > > > > > > > > > > > feature
>> > > > > > > > > > > > > > > > > change?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I've added a
>> > > > reference
>> > > > > in
>> > > > > > > the
>> > > > > > > > > > KIP.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 3. For the feature downgrade, although
>> it's a
>> > > > > > Non-goal
>> > > > > > > of
>> > > > > > > > > the
>> > > > > > > > > > > > KIP,
>> > > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > > features such as group coordinator
>> semantics,
>> > > > there
>> > > > > > is
>> > > > > > > no
>> > > > > > > > > > legal
>> > > > > > > > > > > > > > > scenario
>> > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > perform a downgrade at all. So having
>> > downgrade
>> > > > > door
>> > > > > > > open
>> > > > > > > > > is
>> > > > > > > > > > > > pretty
>> > > > > > > > > > > > > > > > > error-prone as human faults happen all the
>> > > time.
>> > > > > I'm
>> > > > > > > > > assuming
>> > > > > > > > > > > as
>> > > > > > > > > > > > > new
>> > > > > > > > > > > > > > > > > features are implemented, it's not very
>> hard
>> > to
>> > > > > add a
>> > > > > > > > flag
>> > > > > > > > > > > during
>> > > > > > > > > > > > > > > feature
>> > > > > > > > > > > > > > > > > creation to indicate whether this feature
>> is
>> > > > > > > > > "downgradable".
>> > > > > > > > > > > > Could
>> > > > > > > > > > > > > > you
>> > > > > > > > > > > > > > > > > explain a bit more on the extra
>> engineering
>> > > > effort
>> > > > > > for
>> > > > > > > > > > shipping
>> > > > > > > > > > > > > this
>> > > > > > > > > > > > > > > KIP
>> > > > > > > > > > > > > > > > > with downgrade protection in place?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > (Kowshik): Great point! I'd agree and
>> disagree
>> > > > here.
>> > > > > > > While
>> > > > > > > > I
>> > > > > > > > > > > agree
>> > > > > > > > > > > > > that
>> > > > > > > > > > > > > > > > accidental
>> > > > > > > > > > > > > > > > downgrades can cause problems, I also think
>> > > > sometimes
>> > > > > > > > > > downgrades
>> > > > > > > > > > > > > should
>> > > > > > > > > > > > > > > > be allowed for emergency reasons (not all
>> > > > downgrades
>> > > > > > > cause
>> > > > > > > > > > > issues).
>> > > > > > > > > > > > > > > > It is just subjective to the feature being
>> > > > > downgraded.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > To be more strict about feature version
>> > > > downgrades, I
>> > > > > > > have
>> > > > > > > > > > > modified
>> > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > KIP
>> > > > > > > > > > > > > > > > proposing that we mandate a
>> `--force-downgrade`
>> > > > flag
>> > > > > be
>> > > > > > > > used
>> > > > > > > > > in
>> > > > > > > > > > > the
>> > > > > > > > > > > > > > > > UPDATE_FEATURES api
>> > > > > > > > > > > > > > > > and the tooling, whenever the human is
>> > > downgrading
>> > > > a
>> > > > > > > > > finalized
>> > > > > > > > > > > > > feature
>> > > > > > > > > > > > > > > > version.
>> > > > > > > > > > > > > > > > Hopefully this should cover the requirement,
>> > > until
>> > > > we
>> > > > > > > find
>> > > > > > > > > the
>> > > > > > > > > > > need
>> > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > advanced downgrade support.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of
>> > > feature
>> > > > > > > > versions
>> > > > > > > > > > will
>> > > > > > > > > > > > be
>> > > > > > > > > > > > > > > > defined
>> > > > > > > > > > > > > > > > > in the broker code." So this means in
>> order
>> > to
>> > > > > > > restrict a
>> > > > > > > > > > > certain
>> > > > > > > > > > > > > > > > feature,
>> > > > > > > > > > > > > > > > > we need to start the broker first and then
>> > > send a
>> > > > > > > feature
>> > > > > > > > > > > gating
>> > > > > > > > > > > > > > > request
>> > > > > > > > > > > > > > > > > immediately, which introduces a time gap
>> and
>> > > the
>> > > > > > > > > > > > intended-to-close
>> > > > > > > > > > > > > > > > feature
>> > > > > > > > > > > > > > > > > could actually serve request during this
>> > phase.
>> > > > Do
>> > > > > > you
>> > > > > > > > > think
>> > > > > > > > > > we
>> > > > > > > > > > > > > > should
>> > > > > > > > > > > > > > > > also
>> > > > > > > > > > > > > > > > > support configurations as well so that
>> admin
>> > > user
>> > > > > > could
>> > > > > > > > > > freely
>> > > > > > > > > > > > roll
>> > > > > > > > > > > > > > up
>> > > > > > > > > > > > > > > a
>> > > > > > > > > > > > > > > > > cluster with all nodes complying the same
>> > > feature
>> > > > > > > gating,
>> > > > > > > > > > > without
>> > > > > > > > > > > > > > > > worrying
>> > > > > > > > > > > > > > > > > about the turnaround time to propagate the
>> > > > message
>> > > > > > only
>> > > > > > > > > after
>> > > > > > > > > > > the
>> > > > > > > > > > > > > > > cluster
>> > > > > > > > > > > > > > > > > starts up?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > (Kowshik): This is a great point/question.
>> One
>> > of
>> > > > the
>> > > > > > > > > > > expectations
>> > > > > > > > > > > > > out
>> > > > > > > > > > > > > > of
>> > > > > > > > > > > > > > > > this KIP, which is
>> > > > > > > > > > > > > > > > already followed in the broker, is the
>> > following.
>> > > > > > > > > > > > > > > >  - Imagine at time T1 the broker starts up
>> and
>> > > > > > registers
>> > > > > > > > it’s
>> > > > > > > > > > > > > presence
>> > > > > > > > > > > > > > in
>> > > > > > > > > > > > > > > > ZK,
>> > > > > > > > > > > > > > > >    along with advertising it’s supported
>> > > features.
>> > > > > > > > > > > > > > > >  - Imagine at a future time T2 the broker
>> > > receives
>> > > > > the
>> > > > > > > > > > > > > > > > UpdateMetadataRequest
>> > > > > > > > > > > > > > > >    from the controller, which contains the
>> > latest
>> > > > > > > finalized
>> > > > > > > > > > > > features
>> > > > > > > > > > > > > as
>> > > > > > > > > > > > > > > > seen by
>> > > > > > > > > > > > > > > >    the controller. The broker validates this
>> > data
>> > > > > > against
>> > > > > > > > > it’s
>> > > > > > > > > > > > > > supported
>> > > > > > > > > > > > > > > > features to
>> > > > > > > > > > > > > > > >    make sure there is no mismatch (it will
>> > > shutdown
>> > > > > if
>> > > > > > > > there
>> > > > > > > > > is
>> > > > > > > > > > > an
>> > > > > > > > > > > > > > > > incompatibility).
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > It is expected that during the time between
>> > the 2
>> > > > > > events
>> > > > > > > T1
>> > > > > > > > > and
>> > > > > > > > > > > T2,
>> > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > broker is
>> > > > > > > > > > > > > > > > almost a silent entity in the cluster. It
>> does
>> > > not
>> > > > > add
>> > > > > > > any
>> > > > > > > > > > value
>> > > > > > > > > > > to
>> > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > cluster, or carry
>> > > > > > > > > > > > > > > > out any important broker activities. By
>> > > > “important”,
>> > > > > I
>> > > > > > > mean
>> > > > > > > > > it
>> > > > > > > > > > is
>> > > > > > > > > > > > not
>> > > > > > > > > > > > > > > doing
>> > > > > > > > > > > > > > > > mutations
>> > > > > > > > > > > > > > > > on its persistence, not mutating critical
>> > > > in-memory
>> > > > > > > state,
>> > > > > > > > > > won’t
>> > > > > > > > > > > > be
>> > > > > > > > > > > > > > > > serving
>> > > > > > > > > > > > > > > > produce/fetch requests. Note it doesn’t even
>> > know
>> > > > > its
>> > > > > > > > > assigned
>> > > > > > > > > > > > > > > partitions
>> > > > > > > > > > > > > > > > until
>> > > > > > > > > > > > > > > > it receives UpdateMetadataRequest from
>> > > controller.
>> > > > > > > Anything
>> > > > > > > > > the
>> > > > > > > > > > > > > broker
>> > > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > > doing up
>> > > > > > > > > > > > > > > > until this point is neither damaging nor useful.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > I’ve clarified the above in the KIP, see
>> this
>> > new
>> > > > > > > section:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
>> > > > > > > > > > > > > > > > .
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 5. "adding a new Feature, updating or
>> > deleting
>> > > an
>> > > > > > > > existing
>> > > > > > > > > > > > > Feature",
>> > > > > > > > > > > > > > > may
>> > > > > > > > > > > > > > > > be
>> > > > > > > > > > > > > > > > > I misunderstood something, I thought the
>> > > features
>> > > > > are
>> > > > > > > > > defined
>> > > > > > > > > > > in
>> > > > > > > > > > > > > > broker
>> > > > > > > > > > > > > > > > > code, so admin could not really create a
>> new
>> > > > > feature?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > (Kowshik): Great point! You understood this
>> > > right.
>> > > > > Here
>> > > > > > > > > adding
>> > > > > > > > > > a
>> > > > > > > > > > > > > > feature
>> > > > > > > > > > > > > > > > means we are
>> > > > > > > > > > > > > > > > adding a cluster-wide finalized *max*
>> version
>> > > for a
>> > > > > > > feature
>> > > > > > > > > > that
>> > > > > > > > > > > > was
>> > > > > > > > > > > > > > > > previously never finalized.
>> > > > > > > > > > > > > > > > I have clarified this in the KIP now.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 6. I think we need a separate error code
>> like
>> > > > > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
>> > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > reject a concurrent feature update
>> request.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > (Kowshik): Great point! I have modified the
>> KIP
>> > > > > adding
>> > > > > > > the
>> > > > > > > > > > above
>> > > > > > > > > > > > (see
>> > > > > > > > > > > > > > > > 'Tooling support -> Admin API changes').
>> > > > > > > > > > > > > > > >
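
(A minimal sketch of the concurrency guard behind such an error code; the error
name comes from the discussion above, everything else is assumed:)

    import java.util.concurrent.atomic.AtomicBoolean;

    // Sketch only: the controller admits one feature update at a time; concurrent
    // callers would receive FEATURE_UPDATE_IN_PROGRESS and retry later.
    public final class FeatureUpdateGuard {
        private final AtomicBoolean inProgress = new AtomicBoolean(false);

        boolean tryBegin() { return inProgress.compareAndSet(false, true); }

        void end() { inProgress.set(false); }
    }
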
>> > > > > > > > > > > > > > > > > 7. I think we haven't discussed the
>> > alternative
>> > > > > > > solution
>> > > > > > > > to
>> > > > > > > > > > > pass
>> > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > feature information through Zookeeper. Is
>> > that
>> > > > > > > mentioned
>> > > > > > > > in
>> > > > > > > > > > the
>> > > > > > > > > > > > KIP
>> > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > justify why using UpdateMetadata is more
>> > > > favorable?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > (Kowshik): Nice question! The broker reads
>> > > > finalized
>> > > > > > > > feature
>> > > > > > > > > > info
>> > > > > > > > > > > > > > stored
>> > > > > > > > > > > > > > > in
>> > > > > > > > > > > > > > > > ZK,
>> > > > > > > > > > > > > > > > only during startup when it does a
>> validation.
>> > > When
>> > > > > > > serving
>> > > > > > > > > > > > > > > > `ApiVersionsRequest`, the
>> > > > > > > > > > > > > > > > broker does not read this info from ZK
>> > directly.
>> > > > I'd
>> > > > > > > > imagine
>> > > > > > > > > > the
>> > > > > > > > > > > > risk
>> > > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > > that it can increase
>> > > > > > > > > > > > > > > > the ZK read QPS which can be a bottleneck
>> for
>> > the
>> > > > > > system.
>> > > > > > > > > > Today,
>> > > > > > > > > > > in
>> > > > > > > > > > > > > > Kafka
>> > > > > > > > > > > > > > > > we use the
>> > > > > > > > > > > > > > > > controller to fan out ZK updates to brokers
>> and
>> > > we
>> > > > > want
>> > > > > > > to
>> > > > > > > > > > stick
>> > > > > > > > > > > to
>> > > > > > > > > > > > > > that
>> > > > > > > > > > > > > > > > pattern to avoid
>> > > > > > > > > > > > > > > > the ZK read bottleneck when serving
>> > > > > > `ApiVersionsRequest`.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 8. I was under the impression that user
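
(The fan-out pattern described above boils down to serving reads from an
in-memory copy refreshed by the ZK watch, rather than reading ZooKeeper on
every ApiVersionsRequest. A rough sketch with assumed names:)

    import java.util.Map;

    // Sketch only: the read path never touches ZooKeeper; only the watch callback does.
    public final class FinalizedFeatureCacheSketch {
        private volatile Map<String, Long> snapshot = Map.of();

        Map<String, Long> get() {                     // used when serving ApiVersionsRequest
            return snapshot;
        }

        void onFeaturesZNodeChanged(Map<String, Long> latest) {  // ZK watch callback
            snapshot = latest;
        }
    }
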
>> could
>> > > > > > > configure a
>> > > > > > > > > > range
>> > > > > > > > > > > > of
>> > > > > > > > > > > > > > > > > supported versions, what's the trade-off
>> for
>> > > > > allowing
>> > > > > > > > > single
>> > > > > > > > > > > > > > finalized
>> > > > > > > > > > > > > > > > > version only?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > (Kowshik): Great question! The finalized
>> > version
>> > > > of a
>> > > > > > > > feature
>> > > > > > > > > > > > > basically
>> > > > > > > > > > > > > > > > refers to
>> > > > > > > > > > > > > > > > the cluster-wide finalized feature "maximum"
>> > > > version.
>> > > > > > For
>> > > > > > > > > > > example,
>> > > > > > > > > > > > if
>> > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > 'group_coordinator' feature
>> > > > > > > > > > > > > > > > has the finalized version set to 10, then it
>> > > > > > > > > > > > > > > > means that cluster-wide all versions up to v10 are
>> > > > > > > > > > > > > > > > supported for this feature. However, note
>> that
>> > if
>> > > > > some
>> > > > > > > > > version
>> > > > > > > > > > > (ex:
>> > > > > > > > > > > > > v0)
>> > > > > > > > > > > > > > > > gets deprecated
>> > > > > > > > > > > > > > > > for this feature, then we don’t convey that
>> > using
>> > > > > this
>> > > > > > > > scheme
>> > > > > > > > > > > (also
>> > > > > > > > > > > > > > > > supporting deprecation is a non-goal).
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > (Kowshik): I’ve now modified the KIP at all
>> > > points,
>> > > > > > > > referring
>> > > > > > > > > to
>> > > > > > > > > > > > > > finalized
>> > > > > > > > > > > > > > > > feature "maximum" versions.
>> > > > > > > > > > > > > > > >
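
(To make the "maximum version" semantics above concrete: a broker is compatible
only if the cluster-wide finalized max level falls inside its own supported
range. A sketch, with assumed names:)

    // Sketch only: supported range [supportedMin, supportedMax] vs. finalized max level.
    static boolean isCompatible(long supportedMin, long supportedMax, long finalizedMaxLevel) {
        return finalizedMaxLevel >= supportedMin && finalizedMaxLevel <= supportedMax;
    }
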
>> > > > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here
>> the
>> > > > > "client"
>> > > > > > > here
>> > > > > > > > > may
>> > > > > > > > > > > be
>> > > > > > > > > > > > a
>> > > > > > > > > > > > > > > > producer
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > (Kowshik): Great point! Done.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Cheers,
>> > > > > > > > > > > > > > > > Kowshik
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen
>> <
>> > > > > > > > > > > > > > reluctanthero104@gmail.com>
>> > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Hey Kowshik,
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > thanks for the revised KIP. Got a couple
>> of
>> > > > > > questions:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to
>> begin
>> > > > > handling
>> > > > > > > EOS
>> > > > > > > > > > > > traffic"
>> > > > > > > > > > > > > > > could
>> > > > > > > > > > > > > > > > be
>> > > > > > > > > > > > > > > > > converted as "When is it safe for the
>> brokers
>> > > to
>> > > > > > start
>> > > > > > > > > > serving
>> > > > > > > > > > > > new
>> > > > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is
>> not
>> > > > > > explained
>> > > > > > > > > > earlier
>> > > > > > > > > > > > in
>> > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > context.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 2. In the *Explanation *section, the
>> metadata
>> > > > > version
>> > > > > > > > > number
>> > > > > > > > > > > part
>> > > > > > > > > > > > > > > seems a
>> > > > > > > > > > > > > > > > > bit blurred. Could you point a reference
>> to
>> > > later
>> > section that we going to store it in Zookeeper and update it every
>> > time when there is a feature change?
>> >
>> > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
>> > features such as group coordinator semantics, there is no legal scenario
>> > to perform a downgrade at all. So having downgrade door open is pretty
>> > error-prone as human faults happen all the time. I'm assuming as new
>> > features are implemented, it's not very hard to add a flag during feature
>> > creation to indicate whether this feature is "downgradable". Could you
>> > explain a bit more on the extra engineering effort for shipping this KIP
>> > with downgrade protection in place?
>> >
>> > 4. "Each broker’s supported dictionary of feature versions will be defined
>> > in the broker code." So this means in order to restrict a certain feature,
>> > we need to start the broker first and then send a feature gating request
>> > immediately, which introduces a time gap and the intended-to-close feature
>> > could actually serve request during this phase. Do you think we should
>> > also support configurations as well so that admin user could freely roll
>> > up a cluster with all nodes complying the same feature gating, without
>> > worrying about the turnaround time to propagate the message only after the
>> > cluster starts up?
>> >
>> > 5. "adding a new Feature, updating or deleting an existing Feature", may
>> > be I misunderstood something, I thought the features are defined in broker
>> > code, so admin could not really create a new feature?
>> >
>> > 6. I think we need a separate error code like FEATURE_UPDATE_IN_PROGRESS
>> > to reject a concurrent feature update request.
>> >
>> > 7. I think we haven't discussed the alternative solution to pass the
>> > feature information through Zookeeper. Is that mentioned in the KIP to
>> > justify why using UpdateMetadata is more favorable?
>> >
>> > 8. I was under the impression that user could configure a range of
>> > supported versions, what's the trade-off for allowing single finalized
>> > version only?
>> >
>> > 9. One minor syntax fix: Note that here the "client" here may be a producer
>> >
>> > Boyang
>> >
>> > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmccabe@apache.org> wrote:
>> >
>> > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
>> > > > Hi Colin,
>> > > >
>> > > > Thanks for the feedback! I've changed the KIP to address your
>> > > > suggestions. Please find below my explanation. Here is a link to
>> > > > KIP 584:
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features .
>> > > >
>> > > > 1. '__data_version__' is the version of the finalized feature metadata
>> > > > (i.e. actual ZK node contents), while the '__schema_version__' is the
>> > > > version of the schema of the data persisted in ZK. These serve
>> > > > different purposes. '__data_version__' is useful mainly to clients
>> > > > during reads, to differentiate between the 2 versions of eventually
>> > > > consistent 'finalized features' metadata (i.e. larger metadata version
>> > > > is more recent). '__schema_version__' provides an additional degree of
>> > > > flexibility, where if we decide to change the schema for '/features'
>> > > > node in ZK (in the future), then we can manage broker roll outs
>> > > > suitably (i.e. serialization/deserialization of the ZK data can be
>> > > > handled safely).
>> > >
>> > > Hi Kowshik,
>> > >
>> > > If you're talking about a number that lets you know if data is more or
>> > > less recent, we would typically call that an epoch, and not a version.
>> > > For the ZK data structures, the word "version" is typically reserved for
>> > > describing changes to the overall schema of the data that is written to
>> > > ZooKeeper.  We don't even really change the "version" of those schemas
>> > > that much, since most changes are backwards-compatible.  But we do
>> > > include that version field just in case.
>> > >
>> > > I don't think we really need an epoch here, though, since we can just
>> > > look at the broker epoch.  Whenever the broker registers, its epoch will
>> > > be greater than the previous broker epoch.  And the newly registered
>> > > data will take priority.  This will be a lot simpler than adding a
>> > > separate epoch system, I think.
>> > >
>> > > > 2. Regarding admin client needing min and max information - you are
>> > > > right! I've changed the KIP such that the Admin API also allows the
>> > > > user to read 'supported features' from a specific broker. Please look
>> > > > at the section "Admin API changes".
>> > >
>> > > Thanks.
>> > >
>> > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate. I've
>> > > > improved the KIP to just use `long` at all places.
>> > >
>> > > Sounds good.
>> > >
>> > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right! I've
>> > > > updated the KIP sketching the functionality provided by this tool,
>> > > > with some examples. Please look at the section "Tooling support
>> > > > examples".
>> > > >
>> > > > Thank you!
>> > >
>> > > Thanks, Kowshik.
>> > >
>> > > cheers,
>> > > Colin
>> > >
>> > > > Cheers,
>> > > > Kowshik
>> > > >
>> > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cmccabe@apache.org> wrote:
>> > > >
>> > > > > Thanks, Kowshik, this looks good.
>> > > > >
>> > > > > In the "Schema" section, do we really need both __schema_version__
>> > > > > and __data_version__?  Can we just have a single version field here?
>> > > > >
>> > > > > Shouldn't the Admin(Client) function have some way to get the min
>> > > > > and max information that we're exposing as well?  I guess we could
>> > > > > have min, max, and current.  Unrelated: is the use of Long rather
>> > > > > than long deliberate here?
>> > > > >
>> > > > > It would be good to describe how the command line tool
>> > > > > kafka.admin.FeatureCommand will work.  For example the flags that it
>> > > > > will take and the output that it will generate to STDOUT.
>> > > > >
>> > > > > cheers,
>> > > > > Colin
>> > > > >
>> > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
>> > > > > > Hi all,
>> > > > > >
>> > > > > > I've opened KIP-584
>> > > > > > <https://issues.apache.org/jira/browse/KIP-584> which is intended
>> > > > > > to provide a versioning scheme for features. I'd like to use this
>> > > > > > thread to discuss the same. I'd appreciate any feedback on this.
>> > > > > > Here is a link to KIP-584:
>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features .
>> > > > > >
>> > > > > > Thank you!
>> > > > > >
>> > > > > > Cheers,
>> > > > > > Kowshik

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Jun,

Thanks for the feedback! I have addressed the comments in the KIP.

> 200. In the validation section, there is still the text  "*from*
> {"max_version_level":
> X} *to* {"max_version_level": X’}". It seems that it should say "from X to
> Y"?

(Kowshik): Done. I have reworded it a bit to make it clearer now in this
section:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
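As a quick illustration of the kind of change that validation text describes:
the exact layout of the '/features' ZK node is defined in the KIP, and the
snippet below is only a sketch (the feature name and the version numbers are
made up for the example):

  Before the update:  {"group_coordinator": {"min_version_level": 1, "max_version_level": 2}}
  After the update:   {"group_coordinator": {"min_version_level": 1, "max_version_level": 3}}

i.e. a valid update moves max_version_level from X=2 to Y=3 while
min_version_level is left untouched.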

> 110. Could we add that we need to document the bumped version of each
> feature in the upgrade section of a release?

(Kowshik): Great point! Done, I have mentioned it in #3 of this section:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Whentouseversionedfeatureflags?


Cheers,
Kowshik

On Wed, Apr 15, 2020 at 4:00 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Kowshik,
>
> Looks good to me now. Just a couple of minor things below.
>
> 200. In the validation section, there is still the text  "*from*
> {"max_version_level":
> X} *to* {"max_version_level": X’}". It seems that it should say "from X to
> Y"?
>
> 110. Could we add that we need to document the bumped version of each
> feature in the upgrade section of a release?
>
> Thanks,
>
> Jun
>
> On Wed, Apr 15, 2020 at 1:08 PM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hi Jun,
> >
> > Thank you for the suggestion! I have updated the KIP, please find my
> > response below.
> >
> > > 200. I guess you are saying only when the allowDowngrade field is set, the
> > > finalized feature version can go backward. Otherwise, it can only go up.
> > > That makes sense. It would be useful to make that clear when explaining
> > > the usage of the allowDowngrade field. In the validation section, we have
> > > "/features' from {"max_version_level": X} to {"max_version_level": X’}",
> > > it seems that we need to mention Y there.
> >
> > (Kowshik): Great point! Yes, that is correct. Done, I have updated the
> > validations
> > section explaining the above. Here is a link to this section:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
> >
> >
> > Cheers,
> > Kowshik
> >
> >
> >
> >
> > On Wed, Apr 15, 2020 at 11:05 AM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Kowshik,
> > >
> > > 200. I guess you are saying only when the allowDowngrade field is set, the
> > > finalized feature version can go backward. Otherwise, it can only go up.
> > > That makes sense. It would be useful to make that clear when explaining
> > > the usage of the allowDowngrade field. In the validation section, we have
> > > "/features' from {"max_version_level": X} to {"max_version_level": X’}",
> > > it seems that we need to mention Y there.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Apr 15, 2020 at 10:44 AM Kowshik Prakasam <
> > kprakasam@confluent.io>
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Great question! Please find my response below.
> > > >
> > > > > 200. My understanding is that if the CLI tool passes the
> > > > > '--allow-downgrade' flag when updating a specific feature, then a future
> > > > > downgrade is possible. Otherwise, the feature is not downgradable. If so,
> > > > > I was wondering how the controller remembers this since it can be
> > > > > restarted over time?
> > > >
> > > > (Kowshik): The purpose of the flag was to just restrict the user
> intent
> > > for
> > > > a specific request.
> > > > It seems to me that to avoid confusion, I could call the flag as
> > > > `--try-downgrade` instead.
> > > > Then this makes it clear, that, the controller just has to consider
> the
> > > ask
> > > > from
> > > > the user as an explicit request to attempt a downgrade.
> > > >
> > > > The flag does not act as an override on controller's decision making
> > that
> > > > decides whether
> > > > a flag is downgradable (these decisions on whether to allow a flag to
> > be
> > > > downgraded
> > > > from a specific version level, can be embedded in the controller
> code).
> > > >
> > > > Please let me know what you think.
> > > > Sorry if I misunderstood the original question.
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > >
> > > > On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Kowshik,
> > > > >
> > > > > Thanks for the reply. Makes sense. Just one more question.
> > > > >
> > > > > 200. My understanding is that if the CLI tool passes the
> > > > > '--allow-downgrade' flag when updating a specific feature, then a future
> > > > > downgrade is possible. Otherwise, the feature is not downgradable. If so,
> > > > > I was wondering how the controller remembers this since it can be
> > > > > restarted over time?
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam <
> > > kprakasam@confluent.io
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks a lot for the feedback and the questions!
> > > > > > Please find my response below.
> > > > > >
> > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It
> > > > > > > seems that field needs to be persisted somewhere in ZK?
> > > > > >
> > > > > > (Kowshik): Great question! Below is my explanation. Please help me
> > > > > > understand, if you feel there are cases where we would need to still
> > > > > > persist it in ZK.
> > > > > >
> > > > > > Firstly I have updated my thoughts into the KIP now, under the
> > > > > 'guidelines'
> > > > > > section:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> > > > > >
> > > > > > The allowDowngrade boolean field is just to restrict the user intent,
> > > > > > and to remind them to double check their intent before proceeding. It
> > > > > > should be set to true by the user in a request, only when the user
> > > > > > intent is to forcefully "attempt" a downgrade of a specific feature's
> > > > > > max version level, to the provided value in the request.
> > > > > >
> > > > > > We can extend this safeguard. The controller (on its end) can maintain
> > > > > > rules in the code, that, for safety reasons would outright reject
> > > > > > certain downgrades from a specific max_version_level for a specific
> > > > > > feature. Such rejections may happen depending on the feature being
> > > > > > downgraded, and from what version level.
> > > > > >
> > > > > > The CLI tool only allows a downgrade attempt in conjunction with
> > > > > > specific flags and sub-commands. For example, in the CLI tool, if the
> > > > > > user uses the 'downgrade-all' command, or passes '--allow-downgrade'
> > > > > > flag when updating a specific feature, only then the tool will
> > > > > > translate this ask to setting 'allowDowngrade' field in the request to
> > > > > > the server.
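To make that mapping concrete, a hypothetical invocation could look like the
following. The '--allow-downgrade' flag, the '--feature X --version N' form and
the kafka-features.sh name are taken from this discussion; the overall command
syntax and the feature name are illustrative only, not the final CLI spec:

  ./bin/kafka-features.sh --bootstrap-server kafka-host:9092 \
      --update --feature group_coordinator --version 6 --allow-downgrade

Only because '--allow-downgrade' is present would the tool set
allowDowngrade=true in the resulting UpdateFeaturesRequest; without the flag,
the max version level can only move up and a downgrade attempt would be
rejected.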
> > > > > >
> > > > > > > 201. UpdateFeaturesResponse has the following top level fields.
> > > > Should
> > > > > > > those fields be per feature?
> > > > > > >
> > > > > > >   "fields": [
> > > > > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> > > > > > >       "about": "The error code, or 0 if there was no error." },
> > > > > > >     { "name": "ErrorMessage", "type": "string", "versions":
> "0+",
> > > > > > >       "about": "The error message, or null if there was no
> > error."
> > > }
> > > > > > >   ]
> > > > > >
> > > > > > (Kowshik): Great question!
> > > > > > As such, the API is transactional, as explained in the sections
> > > linked
> > > > > > below.
> > > > > > Either all provided FeatureUpdate was applied, or none.
> > > > > > It's the reason I felt we can have just one error code + message.
> > > > > > Happy to extend this if you feel otherwise. Please let me know.
> > > > > >
> > > > > > Link to sections:
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees
> > > > > >
> > > > > > > 202. The /features path in ZK has a field min_version_level.
> > Which
> > > > API
> > > > > > and
> > > > > > > tool can change that value?
> > > > > >
> > > > > > (Kowshik): Great question! Currently this cannot be modified by
> > using
> > > > the
> > > > > > API or the tool.
> > > > > > Feature version deprecation (by raising min_version_level) can be
> > > done
> > > > > only
> > > > > > by the Controller directly. The rationale is explained in this
> > > section:
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <ju...@confluent.io>
> wrote:
> > > > > >
> > > > > > > Hi, Kowshik,
> > > > > > >
> > > > > > > Thanks for addressing those comments. Just a few more minor
> > > comments.
> > > > > > >
> > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade
> field.
> > It
> > > > > seems
> > > > > > > that field needs to be persisted somewhere in ZK?
> > > > > > >
> > > > > > > 201. UpdateFeaturesResponse has the following top level fields.
> > > > Should
> > > > > > > those fields be per feature?
> > > > > > >
> > > > > > >   "fields": [
> > > > > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> > > > > > >       "about": "The error code, or 0 if there was no error." },
> > > > > > >     { "name": "ErrorMessage", "type": "string", "versions":
> "0+",
> > > > > > >       "about": "The error message, or null if there was no
> > error."
> > > }
> > > > > > >   ]
> > > > > > >
> > > > > > > 202. The /features path in ZK has a field min_version_level.
> > Which
> > > > API
> > > > > > and
> > > > > > > tool can change that value?
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <
> > > > > kprakasam@confluent.io
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > Thanks for the feedback! I have updated the KIP-584
> addressing
> > > your
> > > > > > > > comments.
> > > > > > > > Please find my response below.
> > > > > > > >
> > > > > > > > > 100.6 You can look for the sentence "This operation
> requires
> > > > ALTER
> > > > > on
> > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > > > > > KafkaApis.authorize().
> > > > > > > >
> > > > > > > > (Kowshik): Done. Great point! For the newly introduced
> > > > > UPDATE_FEATURES
> > > > > > > api,
> > > > > > > > I have added a
> > > > > > > > requirement that AclOperation.ALTER is required on
> > > > > > ResourceType.CLUSTER.
> > > > > > > >
> > > > > > > > > 110. Keeping the feature version as int is probably fine. I
> > > just
> > > > > felt
> > > > > > > > that
> > > > > > > > > for some of the common user interactions, it's more
> > convenient
> > > to
> > > > > > > > > relate that to a release version. For example, if a user
> > wants
> > > to
> > > > > > > > downgrade
> > > > > > > > > to a release 2.5, it's easier for the user to use the tool
> > like
> > > > > "tool
> > > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X
> > > > --version
> > > > > > 6".
> > > > > > > >
> > > > > > > > (Kowshik): Great point. Generally, maximum feature version
> > levels
> > > > are
> > > > > > not
> > > > > > > > downgradable after
> > > > > > > > they are finalized in the cluster. This is because, as a
> > > guideline
> > > > > > > bumping
> > > > > > > > feature version level usually is used mainly to convey
> > important
> > > > > > breaking
> > > > > > > > changes.
> > > > > > > > Despite the above, there may be some extreme/rare cases
> where a
> > > > user
> > > > > > > wants
> > > > > > > > to downgrade
> > > > > > > > all features to a specific previous release. The user may
> want
> > to
> > > > do
> > > > > > this
> > > > > > > > just
> > > > > > > > prior to rolling back a Kafka cluster to a previous release.
> > > > > > > >
> > > > > > > > To support the above, I have made a change to the KIP
> > explaining
> > > > that
> > > > > > the
> > > > > > > > CLI tool is versioned.
> > > > > > > > The CLI tool internally has knowledge about a map of features
> > to
> > > > > their
> > > > > > > > respective max
> > > > > > > > versions supported by the Broker. The tool's knowledge of
> > > features
> > > > > and
> > > > > > > > their version values,
> > > > > > > > is limited to the version of the CLI tool itself i.e. the
> > > > information
> > > > > > is
> > > > > > > > packaged into the CLI tool
> > > > > > > > when it is released. Whenever a Kafka release introduces a new
> > > > > > > > feature version, or modifies an existing feature version, the CLI
> > > > > > > > tool shall also be updated with this information. Newer versions of
> > > > > > > > the CLI tool will be released as part of the Kafka releases.
> > > > > > > >
> > > > > > > > Therefore, to achieve the downgrade need, the user just needs
> > to
> > > > run
> > > > > > the
> > > > > > > > version of
> > > > > > > > the CLI tool that's part of the particular previous release
> > that
> > > > > he/she
> > > > > > > is
> > > > > > > > downgrading to.
> > > > > > > > To help the user with this, there is a new command added to
> the
> > > CLI
> > > > > > tool
> > > > > > > > called `downgrade-all`.
> > > > > > > > This essentially downgrades max version levels of all
> features
> > in
> > > > the
> > > > > > > > cluster to the versions
> > > > > > > > known to the CLI tool internally.
> > > > > > > >
> > > > > > > > I have explained the above in the KIP under these sections:
> > > > > > > >
> > > > > > > > Tooling support (have explained that the CLI tool is
> > versioned):
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> > > > > > > >
> > > > > > > > Regular CLI tool usage (please refer to point #3, and see the
> > > > tooling
> > > > > > > > example)
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
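As a concrete, purely hypothetical sketch of the 'downgrade-all' workflow
described above: to roll a cluster back to the feature versions of an older
release, the operator runs the kafka-features.sh tool that ships with that
older release (command syntax illustrative only):

  # run the tool packaged with the target (older) Kafka release
  ./bin/kafka-features.sh --bootstrap-server kafka-host:9092 downgrade-all

This downgrades the max version levels of all features to the values baked
into that version of the tool.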
> > > > > > > >
> > > > > > > > > 110. Similarly, if the client library finds a feature
> > mismatch
> > > > with
> > > > > > the
> > > > > > > > broker,
> > > > > > > > > the client likely needs to log some error message for the
> > user
> > > to
> > > > > > take
> > > > > > > > some
> > > > > > > > > actions. It's much more actionable if the error message is
> > > > "upgrade
> > > > > > the
> > > > > > > > > broker to release version 2.6" than just "upgrade the
> broker
> > to
> > > > > > feature
> > > > > > > > > version 7".
> > > > > > > >
> > > > > > > > (Kowshik): That's a really good point! If we use ints for
> > feature
> > > > > > > versions,
> > > > > > > > the best
> > > > > > > > message that client can print for debugging is "broker
> doesn't
> > > > > support
> > > > > > > > feature version 7", and alongside that print the supported
> > > version
> > > > > > range
> > > > > > > > returned
> > > > > > > > by the broker. Then, does it sound reasonable that the user could
> > > > > > > > then reference Kafka release logs to figure out which version of the
> > > > > > > > broker release is required to be deployed, to support feature
> > > > > > > > version 7? I couldn't think of a better strategy here.
> > > > > > > >
> > > > > > > > > 120. When should a developer bump up the version of a
> > feature?
> > > > > > > >
> > > > > > > > (Kowshik): Great question! In the KIP, I have added a
> section:
> > > > > > > 'Guidelines
> > > > > > > > on feature versions and workflows'
> > > > > > > > providing some guidelines on when to use the versioned
> feature
> > > > flags,
> > > > > > and
> > > > > > > > what
> > > > > > > > are the regular workflows with the CLI tool.
> > > > > > > >
> > > > > > > > Link to the relevant sections:
> > > > > > > > Guidelines:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> > > > > > > >
> > > > > > > > Regular CLI tool usage:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> > > > > > > >
> > > > > > > > Advanced CLI tool usage:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Kowshik
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <ju...@confluent.io>
> > > wrote:
> > > > > > > >
> > > > > > > > > Hi, Kowshik,
> > > > > > > > >
> > > > > > > > > Thanks for the reply. A few more comments.
> > > > > > > > >
> > > > > > > > > 110. Keeping the feature version as int is probably fine. I
> > > just
> > > > > felt
> > > > > > > > that
> > > > > > > > > for some of the common user interactions, it's more
> > convenient
> > > to
> > > > > > > > > relate that to a release version. For example, if a user
> > wants
> > > to
> > > > > > > > downgrade
> > > > > > > > > to a release 2.5, it's easier for the user to use the tool
> > like
> > > > > "tool
> > > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X
> > > > --version
> > > > > > 6".
> > > > > > > > > Similarly, if the client library finds a feature mismatch
> > with
> > > > the
> > > > > > > > broker,
> > > > > > > > > the client likely needs to log some error message for the
> > user
> > > to
> > > > > > take
> > > > > > > > some
> > > > > > > > > actions. It's much more actionable if the error message is
> > > > "upgrade
> > > > > > the
> > > > > > > > > broker to release version 2.6" than just "upgrade the
> broker
> > to
> > > > > > feature
> > > > > > > > > version 7".
> > > > > > > > >
> > > > > > > > > 111. Sounds good.
> > > > > > > > >
> > > > > > > > > 120. When should a developer bump up the version of a
> > feature?
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <
> > > > > > > kprakasam@confluent.io
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Jun,
> > > > > > > > > >
> > > > > > > > > > I have updated the KIP for the item 111.
> > > > > > > > > > I'm in the process of addressing 100.6, and will provide
> an
> > > > > update
> > > > > > > > soon.
> > > > > > > > > > I think item 110 is still under discussion given we are
> now
> > > > > > > providing a
> > > > > > > > > way
> > > > > > > > > > to finalize
> > > > > > > > > > all features to their latest version levels. In any case,
> > > > please
> > > > > > let
> > > > > > > us
> > > > > > > > > > know
> > > > > > > > > > how you feel in response to Colin's comments on this
> topic.
> > > > > > > > > >
> > > > > > > > > > > 111. To put this in context, when we had IBP, the
> default
> > > > value
> > > > > > is
> > > > > > > > the
> > > > > > > > > > > current released version. So, if you are a brand new
> > user,
> > > > you
> > > > > > > don't
> > > > > > > > > need
> > > > > > > > > > > to configure IBP and all new features will be
> immediately
> > > > > > available
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > new cluster. If you are upgrading from an old version,
> > you
> > > do
> > > > > > need
> > > > > > > to
> > > > > > > > > > > understand and configure IBP. I see a similar pattern
> > here
> > > > for
> > > > > > > > > > > features. From the ease of use perspective, ideally, we
> > > > > shouldn't
> > > > > > > > > require
> > > > > > > > > > a
> > > > > > > > > > > new user to have an extra step such as running a
> > bootstrap
> > > > > script
> > > > > > > > > unless
> > > > > > > > > > > it's truly necessary. If someone has a special need
> (all
> > > the
> > > > > > cases
> > > > > > > > you
> > > > > > > > > > > mentioned seem special cases?), they can configure a
> mode
> > > > such
> > > > > > that
> > > > > > > > > > > features are enabled/disabled manually.
> > > > > > > > > >
> > > > > > > > > > (Kowshik): That makes sense, thanks for the idea! Sorry
> if
> > I
> > > > > didn't
> > > > > > > > > > understand
> > > > > > > > > > this need earlier. I have updated the KIP with the
> approach
> > > > that
> > > > > > > > whenever
> > > > > > > > > > the '/features' node is absent, the controller by default
> > > will
> > > > > > > > bootstrap
> > > > > > > > > > the node
> > > > > > > > > > to contain the latest feature levels. Here is the new
> > section
> > > > in
> > > > > > the
> > > > > > > > KIP
> > > > > > > > > > describing
> > > > > > > > > > the same:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> > > > > > > > > >
> > > > > > > > > > Next, as I explained in my response to Colin's
> suggestions,
> > > we
> > > > > are
> > > > > > > now
> > > > > > > > > > providing a `--finalize-latest-features` flag with the
> > > tooling.
> > > > > > This
> > > > > > > > lets
> > > > > > > > > > the sysadmin finalize all features known to the
> controller
> > to
> > > > > their
> > > > > > > > > latest
> > > > > > > > > > version
> > > > > > > > > > levels. Please look at this section (point #3 and the
> > tooling
> > > > > > example
> > > > > > > > > > later):
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
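(For illustration only, and assuming the same hypothetical command syntax as in
the earlier sketches, the bootstrap step for an already-running cluster would
then be a single invocation such as:

  ./bin/kafka-features.sh --bootstrap-server kafka-host:9092 --finalize-latest-features

which finalizes every feature known to the controller at its latest version
level.)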
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Do you feel this addresses your comment/concern?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Kowshik
> > > > > > > > > >
> > > > > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <
> jun@confluent.io>
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi, Kowshik,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the reply. A few more replies below.
> > > > > > > > > > >
> > > > > > > > > > > 100.6 You can look for the sentence "This operation
> > > requires
> > > > > > ALTER
> > > > > > > on
> > > > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > > > > > > > KafkaApis.authorize().
> > > > > > > > > > >
> > > > > > > > > > > 110. From the external client/tooling perspective, it's
> > > more
> > > > > > > natural
> > > > > > > > to
> > > > > > > > > > use
> > > > > > > > > > > the release version for features. If we can use the
> same
> > > > > release
> > > > > > > > > version
> > > > > > > > > > > for internal representation, it seems simpler (easier
> to
> > > > > > > understand,
> > > > > > > > no
> > > > > > > > > > > mapping overhead, etc). Is there a benefit with
> separate
> > > > > external
> > > > > > > and
> > > > > > > > > > > internal versioning schemes?
> > > > > > > > > > >
> > > > > > > > > > > 111. To put this in context, when we had IBP, the
> default
> > > > value
> > > > > > is
> > > > > > > > the
> > > > > > > > > > > current released version. So, if you are a brand new
> > user,
> > > > you
> > > > > > > don't
> > > > > > > > > need
> > > > > > > > > > > to configure IBP and all new features will be
> immediately
> > > > > > available
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > new cluster. If you are upgrading from an old version,
> > you
> > > do
> > > > > > need
> > > > > > > to
> > > > > > > > > > > understand and configure IBP. I see a similar pattern
> > here
> > > > for
> > > > > > > > > > > features. From the ease of use perspective, ideally, we
> > > > > shouldn't
> > > > > > > > > > require a
> > > > > > > > > > > new user to have an extra step such as running a
> > bootstrap
> > > > > script
> > > > > > > > > unless
> > > > > > > > > > > it's truly necessary. If someone has a special need
> (all
> > > the
> > > > > > cases
> > > > > > > > you
> > > > > > > > > > > mentioned seem special cases?), they can configure a
> mode
> > > > such
> > > > > > that
> > > > > > > > > > > features are enabled/disabled manually.
> > > > > > > > > > >
> > > > > > > > > > > Jun
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
> > > > > > > > > kprakasam@confluent.io>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the feedback and suggestions. Please find
> my
> > > > > > response
> > > > > > > > > below.
> > > > > > > > > > > >
> > > > > > > > > > > > > 100.6 For every new request, the admin needs to
> > control
> > > > who
> > > > > > is
> > > > > > > > > > allowed
> > > > > > > > > > > to
> > > > > > > > > > > > > issue that request if security is enabled. So, we
> > need
> > > to
> > > > > > > assign
> > > > > > > > > the
> > > > > > > > > > > new
> > > > > > > > > > > > > request a ResourceType and possible AclOperations.
> > See
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > > > > > > as an example.
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): I don't see any reference to the words
> > > > > ResourceType
> > > > > > or
> > > > > > > > > > > > AclOperations
> > > > > > > > > > > > in the KIP. Please let me know how I can use the KIP
> > that
> > > > you
> > > > > > > > linked
> > > > > > > > > to
> > > > > > > > > > > > know how to
> > > > > > > > > > > > setup the appropriate ResourceType and/or
> > > ClusterOperation?
> > > > > > > > > > > >
> > > > > > > > > > > > > 105. If we change delete to disable, it's better to
> > do
> > > > this
> > > > > > > > > > > consistently
> > > > > > > > > > > > in
> > > > > > > > > > > > > request protocol and admin api as well.
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): The API shouldn't be called 'disable' when
> > it
> > > is
> > > > > > > > deleting
> > > > > > > > > a
> > > > > > > > > > > > feature.
> > > > > > > > > > > > I've just changed the KIP to use 'delete'. I don't
> > have a
> > > > > > strong
> > > > > > > > > > > > preference.
> > > > > > > > > > > >
> > > > > > > > > > > > > 110. The minVersion/maxVersion for features use
> > int64.
> > > > > > > Currently,
> > > > > > > > > our
> > > > > > > > > > > > > release version schema is major.minor.bugfix (e.g.
> > > > 2.5.0).
> > > > > > It's
> > > > > > > > > > > possible
> > > > > > > > > > > > > for new features to be included in minor releases
> > too.
> > > > > Should
> > > > > > > we
> > > > > > > > > make
> > > > > > > > > > > the
> > > > > > > > > > > > > feature versioning match the release versioning?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): The release version can be mapped to a set
> > of
> > > > > > feature
> > > > > > > > > > > versions,
> > > > > > > > > > > > and this can be done, for example in the tool (or
> even
> > > > > external
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > > > tool).
> > > > > > > > > > > > Can you please clarify what I'm missing?
> > > > > > > > > > > >
> > > > > > > > > > > > > 111. "During regular operations, the data in the ZK
> > > node
> > > > > can
> > > > > > be
> > > > > > > > > > mutated
> > > > > > > > > > > > > only via a specific admin API served only by the
> > > > > > controller." I
> > > > > > > > am
> > > > > > > > > > > > > wondering why can't the controller auto finalize a
> > > > feature
> > > > > > > > version
> > > > > > > > > > > after
> > > > > > > > > > > > > all brokers are upgraded? For new users who
> download
> > > the
> > > > > > latest
> > > > > > > > > > version
> > > > > > > > > > > > to
> > > > > > > > > > > > > build a new cluster, it's inconvenient for them to
> > have
> > > > to
> > > > > > > > manually
> > > > > > > > > > > > enable
> > > > > > > > > > > > > each feature.
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): I agree that there is a trade-off here,
> but
> > it
> > > > > will
> > > > > > > help
> > > > > > > > > > > > to decide whether the automation can be thought
> through
> > > in
> > > > > the
> > > > > > > > future
> > > > > > > > > > > > in a follow up KIP, or right now in this KIP. We may
> > > invest
> > > > > > > > > > > > in automation, but we have to decide whether we
> should
> > do
> > > > it
> > > > > > > > > > > > now or later.
> > > > > > > > > > > >
> > > > > > > > > > > > For the inconvenience that you mentioned, do you
> think
> > > the
> > > > > > > problem
> > > > > > > > > that
> > > > > > > > > > > you
> > > > > > > > > > > > mentioned can be  overcome by asking for the cluster
> > > > operator
> > > > > > to
> > > > > > > > run
> > > > > > > > > a
> > > > > > > > > > > > bootstrap script  when he/she knows that a specific
> AK
> > > > > release
> > > > > > > has
> > > > > > > > > been
> > > > > > > > > > > > almost completely deployed in a cluster for the first
> > > time?
> > > > > > Idea
> > > > > > > is
> > > > > > > > > > that
> > > > > > > > > > > > the
> > > > > > > > > > > > bootstrap script will know how to map a specific AK
> > > release
> > > > > to
> > > > > > > > > > finalized
> > > > > > > > > > > > feature versions, and run the `kafka-features.sh`
> tool
> > > > > > > > appropriately
> > > > > > > > > > > > against
> > > > > > > > > > > > the cluster.
> > > > > > > > > > > >
> > > > > > > > > > > > Now, coming back to your automation
> proposal/question.
> > > > > > > > > > > > I do see the value of automated feature version
> > > > finalization,
> > > > > > > but I
> > > > > > > > > > also
> > > > > > > > > > > > see
> > > > > > > > > > > > that this will open up several questions and some
> > risks,
> > > as
> > > > > > > > explained
> > > > > > > > > > > > below.
> > > > > > > > > > > > The answers to these depend on the definition of the
> > > > > automation
> > > > > > > we
> > > > > > > > > > choose
> > > > > > > > > > > > to build, and how well does it fit into a kafka
> > > deployment.
> > > > > > > > > > > > Basically, it can be unsafe for the controller to
> > > finalize
> > > > > > > feature
> > > > > > > > > > > version
> > > > > > > > > > > > upgrades automatically, without learning about the
> > intent
> > > > of
> > > > > > the
> > > > > > > > > > cluster
> > > > > > > > > > > > operator.
> > > > > > > > > > > > 1. We would sometimes want to lock feature versions
> > only
> > > > when
> > > > > > we
> > > > > > > > have
> > > > > > > > > > > > externally verified
> > > > > > > > > > > > the stability of the broker binary.
> > > > > > > > > > > > 2. Sometimes only the cluster operator knows that a
> > > cluster
> > > > > > > upgrade
> > > > > > > > > is
> > > > > > > > > > > > complete,
> > > > > > > > > > > > and new brokers are highly unlikely to join the
> > cluster.
> > > > > > > > > > > > 3. Only the cluster operator knows that the intent is
> > to
> > > > > deploy
> > > > > > > the
> > > > > > > > > > same
> > > > > > > > > > > > version
> > > > > > > > > > > > of the new broker release across the entire cluster
> > (i.e.
> > > > the
> > > > > > > > latest
> > > > > > > > > > > > downloaded version).
> > > > > > > > > > > > 4. For downgrades, it appears the controller still
> > needs
> > > > some
> > > > > > > > > external
> > > > > > > > > > > > input
> > > > > > > > > > > > (such as the proposed tool) to finalize a feature
> > version
> > > > > > > > downgrade.
> > > > > > > > > > > >
> > > > > > > > > > > > If we have automation, that automation can end up
> > failing
> > > > in
> > > > > > some
> > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > > > cases
> > > > > > > > > > > > above. Then, we need a way to declare that the
> cluster
> > is
> > > > > "not
> > > > > > > > ready"
> > > > > > > > > > if
> > > > > > > > > > > > the
> > > > > > > > > > > > controller cannot automatically finalize some basic
> > > > required
> > > > > > > > feature
> > > > > > > > > > > > version
> > > > > > > > > > > > upgrades across the cluster. We need to make the
> > cluster
> > > > > > operator
> > > > > > > > > aware
> > > > > > > > > > > in
> > > > > > > > > > > > such a scenario (raise an alert or alike).
> > > > > > > > > > > >
> > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
> > should
> > > > be
> > > > > 49
> > > > > > > > > instead
> > > > > > > > > > > of
> > > > > > > > > > > > 48.
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Done.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Kowshik
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <
> > > jun@confluent.io>
> > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi, Kowshik,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the reply. A few more comments below.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 100.6 For every new request, the admin needs to
> > control
> > > > who
> > > > > > is
> > > > > > > > > > allowed
> > > > > > > > > > > to
> > > > > > > > > > > > > issue that request if security is enabled. So, we
> > need
> > > to
> > > > > > > assign
> > > > > > > > > the
> > > > > > > > > > > new
> > > > > > > > > > > > > request a ResourceType and possible AclOperations.
> > See
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > > > > > > as
> > > > > > > > > > > > > an example.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 105. If we change delete to disable, it's better to
> > do
> > > > this
> > > > > > > > > > > consistently
> > > > > > > > > > > > in
> > > > > > > > > > > > > request protocol and admin api as well.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 110. The minVersion/maxVersion for features use
> > int64.
> > > > > > > Currently,
> > > > > > > > > our
> > > > > > > > > > > > > release version schema is major.minor.bugfix (e.g.
> > > > 2.5.0).
> > > > > > It's
> > > > > > > > > > > possible
> > > > > > > > > > > > > for new features to be included in minor releases
> > too.
> > > > > Should
> > > > > > > we
> > > > > > > > > make
> > > > > > > > > > > the
> > > > > > > > > > > > > feature versioning match the release versioning?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 111. "During regular operations, the data in the ZK
> > > node
> > > > > can
> > > > > > be
> > > > > > > > > > mutated
> > > > > > > > > > > > > only via a specific admin API served only by the
> > > > > > controller." I
> > > > > > > > am
> > > > > > > > > > > > > wondering why can't the controller auto finalize a
> > > > feature
> > > > > > > > version
> > > > > > > > > > > after
> > > > > > > > > > > > > all brokers are upgraded? For new users who
> download
> > > the
> > > > > > latest
> > > > > > > > > > version
> > > > > > > > > > > > to
> > > > > > > > > > > > > build a new cluster, it's inconvenient for them to
> > have
> > > > to
> > > > > > > > manually
> > > > > > > > > > > > enable
> > > > > > > > > > > > > each feature.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
> > should
> > > > be
> > > > > 49
> > > > > > > > > instead
> > > > > > > > > > > of
> > > > > > > > > > > > > 48.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jun
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> > > > > > > > > > > kprakasam@confluent.io>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hey Jun,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks a lot for the great feedback! Please note
> > that
> > > > the
> > > > > > > > design
> > > > > > > > > > > > > > has changed a little bit on the KIP, and we now
> > > > propagate
> > > > > > the
> > > > > > > > > > > finalized
> > > > > > > > > > > > > > features metadata only via ZK watches (instead of
> > > > > > > > > > > UpdateMetadataRequest
> > > > > > > > > > > > > > from the controller).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please find below my response to your
> > > > questions/feedback,
> > > > > > > with
> > > > > > > > > the
> > > > > > > > > > > > prefix
> > > > > > > > > > > > > > "(Kowshik):".
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 100.
> UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > > > > > > > 100.1 Since this request waits for responses
> from
> > > > > > brokers,
> > > > > > > > > should
> > > > > > > > > > > we
> > > > > > > > > > > > > add
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > timeout in the request (like
> createTopicRequest)?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): Great point! Done. I have added a
> > timeout
> > > > > field.
> > > > > > > > Note:
> > > > > > > > > > we
> > > > > > > > > > > no
> > > > > > > > > > > > > > longer
> > > > > > > > > > > > > > wait for responses from brokers, since the design
> > has
> > > > > been
> > > > > > > > > changed
> > > > > > > > > > so
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > features information is propagated via ZK.
> > > > Nevertheless,
> > > > > it
> > > > > > > is
> > > > > > > > > > right
> > > > > > > > > > > to
> > > > > > > > > > > > > > have a timeout
> > > > > > > > > > > > > > for the request.
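(Purely as a sketch of what that could look like, with the field names and
layout illustrative rather than the KIP's exact schema, the request might carry
something like:

  { "timeoutMs": 30000, "FeatureUpdates": [ ... ] }

mirroring how CreateTopicsRequest already takes a timeout.)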
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 100.2 The response schema is a bit weird.
> > > Typically,
> > > > > the
> > > > > > > > > response
> > > > > > > > > > > > just
> > > > > > > > > > > > > > > shows an error code and an error message,
> instead
> > > of
> > > > > > > echoing
> > > > > > > > > the
> > > > > > > > > > > > > request.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified it
> to
> > > > just
> > > > > > > return
> > > > > > > > > an
> > > > > > > > > > > > error
> > > > > > > > > > > > > > code and a message.
> > > > > > > > > > > > > > Previously it was not echoing the "request",
> rather
> > > it
> > > > > was
> > > > > > > > > > returning
> > > > > > > > > > > > the
> > > > > > > > > > > > > > latest set of
> > > > > > > > > > > > > > cluster-wide finalized features (after applying
> the
> > > > > > updates).
> > > > > > > > But
> > > > > > > > > > you
> > > > > > > > > > > > are
> > > > > > > > > > > > > > right,
> > > > > > > > > > > > > > the additional info is not required, so I have
> > > removed
> > > > it
> > > > > > > from
> > > > > > > > > the
> > > > > > > > > > > > > response
> > > > > > > > > > > > > > schema.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 100.3 Should we add a separate request to
> > > > list/describe
> > > > > > the
> > > > > > > > > > > existing
> > > > > > > > > > > > > > > features?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): This is already present in the KIP via
> > the
> > > > > > > > > > > > 'DescribeFeatures'
> > > > > > > > > > > > > > Admin API,
> > > > > > > > > > > > > > which, underneath covers uses the
> > ApiVersionsRequest
> > > to
> > > > > > > > > > list/describe
> > > > > > > > > > > > the
> > > > > > > > > > > > > > existing features. Please read the 'Tooling
> > support'
> > > > > > section.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE
> in a
> > > > > single
> > > > > > > > > request.
> > > > > > > > > > > For
> > > > > > > > > > > > > > > DELETE, the version field doesn't make sense.
> > So, I
> > > > > guess
> > > > > > > the
> > > > > > > > > > > broker
> > > > > > > > > > > > > just
> > > > > > > > > > > > > > > ignores this? An alternative way is to have a
> > > > separate
> > > > > > > > > > > > > > DeleteFeaturesRequest
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP
> now
> > > to
> > > > > > have 2
> > > > > > > > > > > separate
> > > > > > > > > > > > > > controller APIs
> > > > > > > > > > > > > > serving these different purposes:
> > > > > > > > > > > > > > 1. updateFeatures
> > > > > > > > > > > > > > 2. deleteFeatures
> > > > > > > > > > > > > >
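For illustration only, here is a minimal Java sketch of how the read path and
the two controller-side operations above could be exercised. The type and
method names (FeaturesAdmin, describeFeatures, updateFeature, deleteFeature)
are assumptions made for this sketch, not the final Admin API from the KIP.

    import java.util.Map;

    public final class FeatureOpsSketch {

        // Hypothetical view of one cluster-wide finalized feature.
        record FinalizedFeature(String name, long maxVersionLevel) { }

        // Hypothetical admin facade; the real API shape is defined by the KIP.
        interface FeaturesAdmin {
            Map<String, FinalizedFeature> describeFeatures();          // read path, served via ApiVersionsRequest
            void updateFeature(String name, long newMaxVersionLevel);  // maps to the updateFeatures controller API
            void deleteFeature(String name);                           // maps to the deleteFeatures controller API
        }

        static void exampleUsage(FeaturesAdmin admin) {
            // Read the current cluster-wide finalized features.
            Map<String, FinalizedFeature> finalized = admin.describeFeatures();
            System.out.println("Finalized features: " + finalized.keySet());

            // Raise one feature's finalized max version level.
            admin.updateFeature("group_coordinator", 2L);

            // Drop the finalized record for a feature entirely.
            admin.deleteFeature("old_feature");
        }
    }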
> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > > version of the metadata for finalized features." I am wondering why the
> > > ordering is important?
> >
> > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > version), and it is just the ZK node version. Basically, this is the
> > epoch for the cluster-wide finalized feature version metadata. This
> > metadata is served to clients via the ApiVersionsResponse (for reads). We
> > propagate updates from the '/features' ZK node to all brokers, via ZK
> > watches set up by each broker on the '/features' node.
> >
> > Now here is why the ordering is important:
> > ZK watches don't propagate at the same time. As a result, the
> > ApiVersionsResponse is eventually consistent across brokers. This can
> > introduce cases where clients see an older lower epoch of the features
> > metadata, after a more recent higher epoch was returned at a previous
> > point in time. We expect clients to always employ the rule that the
> > latest received higher epoch of metadata always trumps an older smaller
> > epoch. Those clients that are external to Kafka should strongly consider
> > discovering the latest metadata once during startup from the brokers, and
> > if required refresh the metadata periodically (to get the latest
> > metadata).
> >
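To make the "higher epoch always wins" rule above concrete, here is a minimal
Java sketch of a client-side cache that never moves backwards to a lower
epoch. The class and field names are illustrative only and are not taken from
the actual Kafka client code.

    import java.util.Map;
    import java.util.concurrent.atomic.AtomicReference;

    public final class FinalizedFeaturesCache {

        // One observed copy of the finalized features metadata plus its epoch.
        public record Snapshot(long epoch, Map<String, Long> maxVersionLevels) { }

        private final AtomicReference<Snapshot> current =
                new AtomicReference<>(new Snapshot(-1L, Map.of()));

        // Apply a snapshot received from any broker; keep it only if it is newer.
        public void maybeUpdate(Snapshot received) {
            current.updateAndGet(existing ->
                    received.epoch() > existing.epoch() ? received : existing);
        }

        public Snapshot latest() {
            return current.get();
        }
    }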
> > > 100.6 Could you specify the required ACL for this new request?
> >
> > (Kowshik): What is ACL, and how could I find out which one to specify?
> > Please could you provide me some pointers? I'll be glad to update the KIP
> > once I know the next steps.
> >
> > > 101. For the broker registration ZK node, should we bump up the version
> > > in the json?
> >
> > (Kowshik): Great point! Done. I've increased the version in the broker
> > json by 1.
> >
> > > 102. For the /features ZK node, not sure if we need the epoch field.
> > > Each ZK node has an internal version field that is incremented on every
> > > update.
> >
> > (Kowshik): Great point! Done. I'm using the ZK node version now, instead
> > of an explicitly incremented epoch.
> >
> > > 103. "Enabling the actual semantics of a feature version cluster-wide
> > > is left to the discretion of the logic implementing the feature (ex: can
> > > be done via dynamic broker config)." Does that mean the broker
> > > registration ZK node will be updated dynamically when this happens?
> >
> > (Kowshik): Not really. The text was just conveying that a broker could
> > "know" of a new feature version, but it does not mean the broker should
> > have also activated the effects of the feature version. Knowing vs
> > activation are 2 separate things, and the latter can be achieved by
> > dynamic config. I have reworded the text to make this clear to the
> > reader.
> >
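As an illustration of the "knowing vs activation" split described above, here
is a hedged Java sketch. The feature, the version threshold and the dynamic
config switch are all made up for the example; they are not names from the KIP
or from the broker code.

    public final class FeatureActivationSketch {

        static boolean shouldServeNewSemantics(long finalizedMaxVersionLevel,
                                               boolean dynamicallyEnabled) {
            // "Knowing": the cluster-wide finalized max version level is high enough.
            boolean known = finalizedMaxVersionLevel >= 2L;
            // "Activation": an operator-controlled dynamic broker config turns it on.
            return known && dynamicallyEnabled;
        }

        public static void main(String[] args) {
            System.out.println(shouldServeNewSemantics(2L, false)); // known, not activated -> false
            System.out.println(shouldServeNewSemantics(2L, true));  // known and activated  -> true
        }
    }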
> > > 104. UpdateMetadataRequest
> > > 104.1 It would be useful to describe when the feature metadata is
> > > included in the request. My understanding is that it's only included if
> > > (1) there is a change to the finalized feature; (2) broker restart; (3)
> > > controller failover.
> > > 104.2 The new fields have the following versions. Why are the versions
> > > 3+ when the top version is bumped to 6?
> > >       "fields":  [
> > >         {"name": "Name", "type":  "string", "versions": "3+",
> > >           "about": "The name of the feature."},
> > >         {"name":  "Version", "type":  "int64", "versions": "3+",
> > >           "about": "The finalized version for the feature."}
> > >       ]
> >
> > (Kowshik): With the new improved design, we have completely eliminated
> > the need to use UpdateMetadataRequest. This is because we now rely on ZK
> > to deliver the notifications for changes to the '/features' ZK node.
> >
> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > > better to use enable/disable?
> >
> > (Kowshik): For delete, yes, I have changed it so that we instead call it
> > 'disable'. However, 'update' can now also refer to either an upgrade or a
> > forced downgrade. Therefore, I have left it the way it is, calling it
> > just 'update'.
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <jun@confluent.io> wrote:
> >
> > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <kprakasam@confluent.io>
> > > wrote:
> > >
> > > > Hey Boyang,
> > > >
> > > > Thanks for the great feedback! I have updated the KIP based on your
> > > > feedback. Please find my response below for your comments, look for
> > > > sentences starting with "(Kowshik)" below.
> > > >
> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > > > > could be converted as "When is it safe for the brokers to start
> > > > > serving new Exactly-Once(EOS) semantics" since EOS is not explained
> > > > > earlier in the context.
> > > >
> > > > (Kowshik): Great point! Done.
> > > >
> > > > > 2. In the *Explanation *section, the metadata version number part
> > > > > seems a bit blurred. Could you point a reference to later section
> > > > > that we going to store it in Zookeeper and update it every time when
> > > > > there is a feature change?
> > > >
> > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > > >
> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> > > > > for features such as group coordinator semantics, there is no legal
> > > > > scenario to perform a downgrade at all. So having downgrade door open
> > > > > is pretty error-prone as human faults happen all the time. I'm
> > > > > assuming as new features are implemented, it's not very hard to add a
> > > > > flag during feature creation to indicate whether this feature is
> > > > > "downgradable". Could you explain a bit more on the extra engineering
> > > > > effort for shipping this KIP with downgrade protection in place?
> > > >
> > > > (Kowshik): Great point! I'd agree and disagree here. While I agree
> > > > that accidental downgrades can cause problems, I also think sometimes
> > > > downgrades should be allowed for emergency reasons (not all downgrades
> > > > cause issues). It is just subjective to the feature being downgraded.
> > > >
> > > > To be more strict about feature version downgrades, I have modified
> > > > the KIP proposing that we mandate a `--force-downgrade` flag be used in
> > > > the UPDATE_FEATURES api and the tooling, whenever the human is
> > > > downgrading a finalized feature version. Hopefully this should cover
> > > > the requirement, until we find the need for advanced downgrade support.
> > > >
> > > > > 4. "Each broker’s supported dictionary of feature versions will be
> > > > > defined in the broker code." So this means in order to restrict a
> > > > > certain feature, we need to start the broker first and then send a
> > > > > feature gating request immediately, which introduces a time gap and
> > > > > the intended-to-close feature could actually serve request during
> > > > > this phase. Do you think we should also support configurations as
> > > > > well so that admin user could freely roll up a cluster with all nodes
> > > > > complying the same feature gating, without worrying about the
> > > > > turnaround time to propagate the message only after the cluster
> > > > > starts up?
> > > >
> > > > (Kowshik): This is a great point/question. One of the expectations out
> > > > of this KIP, which is already followed in the broker, is the following.
> > > >  - Imagine at time T1 the broker starts up and registers its presence
> > > >    in ZK, along with advertising its supported features.
> > > >  - Imagine at a future time T2 the broker receives the
> > > >    UpdateMetadataRequest from the controller, which contains the latest
> > > >    finalized features as seen by the controller. The broker validates
> > > >    this data against its supported features to make sure there is no
> > > >    mismatch (it will shut down if there is an incompatibility).
> > > >
> > > > It is expected that during the time between the 2 events T1 and T2,
> > > > the broker is almost a silent entity in the cluster. It does not add
> > > > any value to the cluster, or carry out any important broker activities.
> > > > By "important", I mean it is not doing mutations on its persistence,
> > > > not mutating critical in-memory state, and won't be serving
> > > > produce/fetch requests. Note it doesn't even know its assigned
> > > > partitions until it receives UpdateMetadataRequest from the controller.
> > > > Anything the broker is doing up until this point is not
> > > > damaging/useful.
> > > >
> > > > I've clarified the above in the KIP, see this new section:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > .
> > > >
> > > > > 5. "adding a new Feature, updating or deleting an existing Feature",
> > > > > may be I misunderstood something, I thought the features are defined
> > > > > in broker code, so admin could not really create a new feature?
> > > >
> > > > (Kowshik): Great point! You understood this right. Here adding a
> > > > feature means we are adding a cluster-wide finalized *max* version for
> > > > a feature that was previously never finalized. I have clarified this in
> > > > the KIP now.
> > > >
> > > > > 6. I think we need a separate error code like
> > > > > FEATURE_UPDATE_IN_PROGRESS to reject a concurrent feature update
> > > > > request.
> > > >
> > > > (Kowshik): Great point! I have modified the KIP adding the above (see
> > > > 'Tooling support -> Admin API changes').
> > > >
> > > > > 7. I think we haven't discussed the alternative solution to pass the
> > > > > feature information through Zookeeper. Is that mentioned in the KIP
> > > > > to justify why using UpdateMetadata is more favorable?
> > > >
> > > > (Kowshik): Nice question! The broker reads finalized feature info
> > > > stored in ZK, only during startup when it does a validation. When
> > > > serving `ApiVersionsRequest`, the broker does not read this info from
> > > > ZK directly. I'd imagine the risk is that it can increase the ZK read
> > > > QPS which can be a bottleneck for the system. Today, in Kafka we use
> > > > the controller to fan out ZK updates to brokers and we want to stick to
> > > > that pattern to avoid the ZK read bottleneck when serving
> > > > `ApiVersionsRequest`.
> > > >
> > > > > 8. I was under the impression that user could configure a range of
> > > > > supported versions, what's the trade-off for allowing single
> > > > > finalized version only?
> > > >
> > > > (Kowshik): Great question! The finalized version of a feature
> > > > basically refers to the cluster-wide finalized feature "maximum"
> > > > version. For example, if the 'group_coordinator' feature has the
> > > > finalized version set to 10, then it means that cluster-wide all
> > > > versions up to v10 are supported for this feature. However, note that
> > > > if some version (ex: v0) gets deprecated for this feature, then we
> > > > don’t convey that using this scheme (also supporting deprecation is a
> > > > non-goal).
> > > >
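A tiny sketch of the compatibility rule implied by a finalized "maximum"
version level, written in Java for illustration; the lower bound and the
deprecation case mentioned above are deliberately not modeled here.

    public final class MaxVersionLevelCheck {

        // A version of the feature is usable as long as it does not exceed the
        // cluster-wide finalized max version level.
        static boolean isCompatible(long usedVersion, long finalizedMaxVersionLevel) {
            return usedVersion <= finalizedMaxVersionLevel;
        }

        public static void main(String[] args) {
            long finalized = 10L;                             // e.g. group_coordinator finalized at 10
            System.out.println(isCompatible(7L, finalized));  // true: v7 is within the finalized range
            System.out.println(isCompatible(11L, finalized)); // false: v11 is not finalized yet
        }
    }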
> > > > (Kowshik): I’ve now modified the KIP at all points, referring to
> > > > finalized feature "maximum" versions.
> > > >
> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > > > producer
> > > >
> > > > (Kowshik): Great point! Done.
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <reluctanthero104@gmail.com>
> > > > wrote:
> > > >
> > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmccabe@apache.org>
> > > > > wrote:
> > > > >
> > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > Hi Colin,
> > > > > > >
> > > > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > > > suggestions. Please find below my explanation. Here is a link to
> > > > > > > KIP 584:
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > .
> > > > > > >
> > > > > > > 1. '__data_version__' is the version of the finalized feature
> > > > > > > metadata (i.e. actual ZK node contents), while the
> > > > > > > '__schema_version__' is the version of the schema of the data
> > > > > > > persisted in ZK. These serve different purposes.
> > > > > > > '__data_version__' is useful mainly to clients during reads, to
> > > > > > > differentiate between the 2 versions of eventually consistent
> > > > > > > 'finalized features' metadata (i.e. larger metadata version is
> > > > > > > more recent). '__schema_version__' provides an additional degree
> > > > > > > of flexibility, where if we decide to change the schema for the
> > > > > > > '/features' node in ZK (in the future), then we can manage broker
> > > > > > > roll outs suitably (i.e. serialization/deserialization of the ZK
> > > > > > > data can be handled safely).
> > > > > >
> > > > > > Hi Kowshik,
> > > > > >
> > > > > > If you're talking about a number that lets you know if data is more
> > > > > > or less recent, we would typically call that an epoch, and not a
> > > > > > version. For the ZK data structures, the word "version" is typically
> > > > > > reserved for describing changes to the overall schema of the data
> > > > > > that is written to ZooKeeper. We don't even really change the
> > > > > > "version" of those schemas that much, since most changes are
> > > > > > backwards-compatible. But we do include that version field just in
> > > > > > case.
> > > > > >
> > > > > > I don't think we really need an epoch here, though, since we can
> > > > > > just look at the broker epoch. Whenever the broker registers, its
> > > > > > epoch will be greater than the previous broker epoch. And the newly
> > > > > > registered data will take priority. This will be a lot simpler than
> > > > > > adding a separate epoch system, I think.
> > > > > >
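A small Java sketch of the point above, using an illustrative Registration
type that is not from the broker code: whichever copy of the registration data
carries the larger broker epoch is treated as the current one, so no separate
epoch needs to be introduced.

    public final class BrokerRegistrationSketch {

        // Illustrative registration data keyed by the broker (session) epoch.
        record Registration(long brokerEpoch, String payload) { }

        // The registration created later carries the larger broker epoch and wins.
        static Registration newest(Registration a, Registration b) {
            return a.brokerEpoch() >= b.brokerEpoch() ? a : b;
        }

        public static void main(String[] args) {
            Registration earlier = new Registration(41L, "supported features, v1");
            Registration later = new Registration(42L, "supported features, v2");
            System.out.println(newest(earlier, later).payload()); // supported features, v2
        }
    }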
> > > > > > >
> > > > > > > 2. Regarding admin client needing min and max information - you
> > > > > > > are right! I've changed the KIP such that the Admin API also
> > > > > > > allows the user to read 'supported features' from a specific
> > > > > > > broker. Please look at the section "Admin API changes".
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate.
> > > > > > > I've improved the KIP to just use `long` at all places.
> > > > > >
> > > > > > Sounds good.
> > > > > >
> > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> > > > > > > I've updated the KIP sketching the functionality provided by this
> > > > > > > tool, with some examples. Please look at the section "Tooling
> > > > > > > support examples".
> > > > > > >
> > > > > > > Thank you!
> > > > > >
> > > > > > Thanks, Kowshik.
> > > > > >
> > > > > > cheers,
> > > > > > Colin
> > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cmccabe@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > >
> > > > > > > > In the "Schema" section, do we really need both
> > > > > > > > __schema_version__ and __data_version__?  Can we just have a
> > > > > > > > single version field here?
> > > > > > > >
> > > > > > > > Shouldn't the Admin(Client) function have some way to get the
> > > > > > > > min and max information that we're exposing as well?  I guess we
> > > > > > > > could have min, max, and current.  Unrelated: is the use of Long
> > > > > > > > rather than long deliberate here?
> > > > > > > >
> > > > > > > > It would be good to describe how the command line tool
> > > > > > > > kafka.admin.FeatureCommand will work.  For example the flags
> > > > > > > > that it will take and the output that it will generate to
> > > > > > > > STDOUT.
> > > > > > > >
> > > > > > > > cheers,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I've opened KIP-584
> > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584> which is
> > > > > > > > > intended to provide a versioning scheme for features. I'd like
> > > > > > > > > to use this thread to discuss the same. I'd appreciate any
> > > > > > > > > feedback on this. Here is a link to KIP-584
> > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > .
> > > > > > > > >
> > > > > > > > > Thank you!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Kowshik

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Jun Rao <ju...@confluent.io>.
Hi, Kowshik,

Looks good to me now. Just a couple of minor things below.

200. In the validation section, there is still the text "*from*
{"max_version_level": X} *to* {"max_version_level": X’}". It seems that it
should say "from X to Y"?

110. Could we add that we need to document the bumped version of each
feature in the upgrade section of a release?

Thanks,

Jun

On Wed, Apr 15, 2020 at 1:08 PM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hi Jun,
>
> Thank you for the suggestion! I have updated the KIP, please find my
> response below.
>
> > 200. I guess you are saying only when the allowDowngrade field is set, the
> > finalized feature version can go backward. Otherwise, it can only go up.
> > That makes sense. It would be useful to make that clear when explaining
> > the usage of the allowDowngrade field. In the validation section, we have
> > "/features' from {"max_version_level": X} to {"max_version_level": X’}",
> > it seems that we need to mention Y there.
>
> (Kowshik): Great point! Yes, that is correct. Done, I have updated the
> validations
> section explaining the above. Here is a link to this section:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
>
>
> Cheers,
> Kowshik
>
>
>
>
> On Wed, Apr 15, 2020 at 11:05 AM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Kowshik,
> >
> > 200. I guess you are saying only when the allowDowngrade field is set, the
> > finalized feature version can go backward. Otherwise, it can only go up.
> > That makes sense. It would be useful to make that clear when explaining
> > the usage of the allowDowngrade field. In the validation section, we have
> > "/features' from {"max_version_level": X} to {"max_version_level": X’}",
> > it seems that we need to mention Y there.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Apr 15, 2020 at 10:44 AM Kowshik Prakasam <kprakasam@confluent.io>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Great question! Please find my response below.
> > >
> > > > 200. My understanding is that if the CLI tool passes the
> > > > '--allow-downgrade' flag when updating a specific feature, then a future
> > > > downgrade is possible. Otherwise, the feature is not downgradable. If
> > > > so, I was wondering how the controller remembers this since it can be
> > > > restarted over time?
> > >
> > > (Kowshik): The purpose of the flag was to just restrict the user intent
> > > for a specific request. It seems to me that to avoid confusion, I could
> > > call the flag `--try-downgrade` instead. Then this makes it clear that
> > > the controller just has to consider the ask from the user as an explicit
> > > request to attempt a downgrade.
> > >
> > > The flag does not act as an override on the controller's decision making
> > > that decides whether a feature is downgradable (these decisions on
> > > whether to allow a feature to be downgraded from a specific version
> > > level can be embedded in the controller code).
> > >
> > > Please let me know what you think.
> > > Sorry if I misunderstood the original question.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > >
> > > On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Kowshik,
> > > >
> > > > Thanks for the reply. Makes sense. Just one more question.
> > > >
> > > > 200. My understanding is that if the CLI tool passes the
> > > > '--allow-downgrade' flag when updating a specific feature, then a future
> > > > downgrade is possible. Otherwise, the feature is not downgradable. If
> > > > so, I was wondering how the controller remembers this since it can be
> > > > restarted over time?
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam <
> > kprakasam@confluent.io
> > > >
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks a lot for the feedback and the questions!
> > > > > Please find my response below.
> > > > >
> > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field.
> It
> > > > seems
> > > > > > that field needs to be persisted somewhere in ZK?
> > > > >
> > > > > (Kowshik): Great question! Below is my explanation. Please help me
> > > > > understand,
> > > > > if you feel there are cases where we would need to still persist it
> > in
> > > > ZK.
> > > > >
> > > > > Firstly, I have now written up my thoughts in the KIP, under the
> > > > 'guidelines'
> > > > > section:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> > > > >
> > > > > The allowDowngrade boolean field is just to restrict the user
> intent,
> > > and
> > > > > to remind
> > > > > them to double check their intent before proceeding. It should be
> set
> > > to
> > > > > true
> > > > > by the user in a request, only when the user intent is to
> forcefully
> > > > > "attempt" a
> > > > > downgrade of a specific feature's max version level, to the
> provided
> > > > value
> > > > > in
> > > > > the request.
> > > > >
> > > > > We can extend this safeguard. The controller (on its end) can maintain
> > > > > rules in the code that, for safety reasons, would outright reject certain
> > > > > downgrades from a specific max_version_level for a specific feature. Such
> > > > > rejections may happen depending on the feature being downgraded, and from
> > > > > what version level.
> > > > >
> > > > > The CLI tool only allows a downgrade attempt in conjunction with
> > > specific
> > > > > flags and sub-commands. For example, in the CLI tool, if the user
> > uses
> > > > the
> > > > > 'downgrade-all' command, or passes '--allow-downgrade' flag when
> > > > updating a
> > > > > specific feature, only then the tool will translate this ask to
> > setting
> > > > > 'allowDowngrade' field in the request to the server.
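> > > > >
> > > > > As a purely illustrative example (the script name and option spellings may
> > > > > differ in the final KIP), such an invocation could look like:
> > > > >
> > > > >     ./bin/kafka-features.sh --allow-downgrade --feature X --version 6
> > > > >
> > > > > and only this path causes the tool to set 'allowDowngrade' to true in the
> > > > > request it sends to the controller.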
> > > > >
> > > > > > 201. UpdateFeaturesResponse has the following top level fields.
> > > Should
> > > > > > those fields be per feature?
> > > > > >
> > > > > >   "fields": [
> > > > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> > > > > >       "about": "The error code, or 0 if there was no error." },
> > > > > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
> > > > > >       "about": "The error message, or null if there was no
> error."
> > }
> > > > > >   ]
> > > > >
> > > > > (Kowshik): Great question!
> > > > > As such, the API is transactional, as explained in the sections
> > linked
> > > > > below.
> > > > > Either all of the provided FeatureUpdates are applied, or none are.
> > > > > It's the reason I felt we can have just one error code + message.
> > > > > Happy to extend this if you feel otherwise. Please let me know.
> > > > >
> > > > > Link to sections:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees
> > > > >
> > > > > > 202. The /features path in ZK has a field min_version_level.
> Which
> > > API
> > > > > and
> > > > > > tool can change that value?
> > > > >
> > > > > (Kowshik): Great question! Currently this cannot be modified by
> using
> > > the
> > > > > API or the tool.
> > > > > Feature version deprecation (by raising min_version_level) can be
> > done
> > > > only
> > > > > by the Controller directly. The rationale is explained in this
> > section:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation
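> > > > >
> > > > > For intuition only (illustrative layout with a made-up feature name; the
> > > > > exact JSON schema is specified in the KIP), the '/features' node carries
> > > > > per-feature version ranges along these lines:
> > > > >
> > > > >     {
> > > > >       "features": {
> > > > >         "example_feature": {
> > > > >           "min_version_level": 1,
> > > > >           "max_version_level": 3
> > > > >         }
> > > > >       }
> > > > >     }
> > > > >
> > > > > The epoch for this metadata is the ZK node version itself rather than a
> > > > > field inside the JSON, as noted elsewhere in this thread.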
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Kowshik,
> > > > > >
> > > > > > Thanks for addressing those comments. Just a few more minor
> > comments.
> > > > > >
> > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field.
> It
> > > > seems
> > > > > > that field needs to be persisted somewhere in ZK?
> > > > > >
> > > > > > 201. UpdateFeaturesResponse has the following top level fields.
> > > Should
> > > > > > those fields be per feature?
> > > > > >
> > > > > >   "fields": [
> > > > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> > > > > >       "about": "The error code, or 0 if there was no error." },
> > > > > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
> > > > > >       "about": "The error message, or null if there was no
> error."
> > }
> > > > > >   ]
> > > > > >
> > > > > > 202. The /features path in ZK has a field min_version_level.
> Which
> > > API
> > > > > and
> > > > > > tool can change that value?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <
> > > > kprakasam@confluent.io
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > Thanks for the feedback! I have updated the KIP-584 addressing
> > your
> > > > > > > comments.
> > > > > > > Please find my response below.
> > > > > > >
> > > > > > > > 100.6 You can look for the sentence "This operation requires
> > > ALTER
> > > > on
> > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > > > > KafkaApis.authorize().
> > > > > > >
> > > > > > > (Kowshik): Done. Great point! For the newly introduced
> > > > UPDATE_FEATURES
> > > > > > api,
> > > > > > > I have added a
> > > > > > > requirement that AclOperation.ALTER is required on
> > > > > ResourceType.CLUSTER.
> > > > > > >
> > > > > > > > 110. Keeping the feature version as int is probably fine. I
> > just
> > > > felt
> > > > > > > that
> > > > > > > > for some of the common user interactions, it's more
> convenient
> > to
> > > > > > > > relate that to a release version. For example, if a user
> wants
> > to
> > > > > > > downgrade
> > > > > > > > to a release 2.5, it's easier for the user to use the tool
> like
> > > > "tool
> > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X
> > > --version
> > > > > 6".
> > > > > > >
> > > > > > > (Kowshik): Great point. Generally, maximum feature version
> levels
> > > are
> > > > > not
> > > > > > > downgradable after
> > > > > > > they are finalized in the cluster. This is because, as a guideline,
> > > > > > > a feature version level bump is mainly used to convey important
> > > > > > > breaking changes.
> > > > > > > Despite the above, there may be some extreme/rare cases where a
> > > user
> > > > > > wants
> > > > > > > to downgrade
> > > > > > > all features to a specific previous release. The user may want
> to
> > > do
> > > > > this
> > > > > > > just
> > > > > > > prior to rolling back a Kafka cluster to a previous release.
> > > > > > >
> > > > > > > To support the above, I have made a change to the KIP
> explaining
> > > that
> > > > > the
> > > > > > > CLI tool is versioned.
> > > > > > > The CLI tool internally has knowledge about a map of features
> to
> > > > their
> > > > > > > respective max
> > > > > > > versions supported by the Broker. The tool's knowledge of
> > features
> > > > and
> > > > > > > their version values,
> > > > > > > is limited to the version of the CLI tool itself i.e. the
> > > information
> > > > > is
> > > > > > > packaged into the CLI tool
> > > > > > > when it is released. Whenever a Kafka release introduces a new
> > > > feature
> > > > > > > version, or modifies
> > > > > > > an existing feature version, the CLI tool shall also be updated
> > > with
> > > > > this
> > > > > > > information.
> > > > > > > Newer versions of the CLI tool will be released as part of the
> > > Kafka
> > > > > > > releases.
> > > > > > >
> > > > > > > Therefore, to perform such a downgrade, the user just needs
> to
> > > run
> > > > > the
> > > > > > > version of
> > > > > > > the CLI tool that's part of the particular previous release
> that
> > > > he/she
> > > > > > is
> > > > > > > downgrading to.
> > > > > > > To help the user with this, there is a new command added to the
> > CLI
> > > > > tool
> > > > > > > called `downgrade-all`.
> > > > > > > This essentially downgrades max version levels of all features
> in
> > > the
> > > > > > > cluster to the versions
> > > > > > > known to the CLI tool internally.
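> > > > > > >
> > > > > > > For example (hypothetical invocation; the distribution path is made up
> > > > > > > and the exact syntax may differ), an operator rolling back to 2.5 would
> > > > > > > run the CLI tool shipped with the 2.5 release:
> > > > > > >
> > > > > > >     ./kafka_2.12-2.5.0/bin/kafka-features.sh downgrade-all
> > > > > > >
> > > > > > > so that only the feature version levels known to the 2.5 tool remain
> > > > > > > finalized in the cluster.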
> > > > > > >
> > > > > > > I have explained the above in the KIP under these sections:
> > > > > > >
> > > > > > > Tooling support (have explained that the CLI tool is
> versioned):
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> > > > > > >
> > > > > > > Regular CLI tool usage (please refer to point #3, and see the
> > > tooling
> > > > > > > example)
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> > > > > > >
> > > > > > > > 110. Similarly, if the client library finds a feature
> mismatch
> > > with
> > > > > the
> > > > > > > broker,
> > > > > > > > the client likely needs to log some error message for the
> user
> > to
> > > > > take
> > > > > > > some
> > > > > > > > actions. It's much more actionable if the error message is
> > > "upgrade
> > > > > the
> > > > > > > > broker to release version 2.6" than just "upgrade the broker
> to
> > > > > feature
> > > > > > > > version 7".
> > > > > > >
> > > > > > > (Kowshik): That's a really good point! If we use ints for
> feature
> > > > > > versions,
> > > > > > > the best
> > > > > > > message that the client can print for debugging is "broker doesn't
> > > > support
> > > > > > > feature version 7", and alongside that print the supported
> > version
> > > > > range
> > > > > > > returned
> > > > > > > by the broker. Then, does it sound reasonable that the user
> could
> > > > then
> > > > > > > reference
> > > > > > > Kafka release logs to figure out which version of the broker
> > > release
> > > > is
> > > > > > > required
> > > > > > > to be deployed, to support feature version 7? I couldn't think of
> a
> > > > better
> > > > > > > strategy here.
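> > > > > > >
> > > > > > > For instance, a client could surface something like the following (a
> > > > > > > sketch only; the names are made up, and the supported range is what the
> > > > > > > broker advertises via its ApiVersions data):
> > > > > > >
> > > > > > >     static String mismatchMessage(String feature, long required,
> > > > > > >                                   long minSupported, long maxSupported) {
> > > > > > >         return String.format(
> > > > > > >             "Feature '%s': broker does not support version %d " +
> > > > > > >             "(broker supports versions [%d, %d])",
> > > > > > >             feature, required, minSupported, maxSupported);
> > > > > > >     }
> > > > > > >
> > > > > > > which at least points the user at the gap, even without a direct mapping
> > > > > > > to a broker release version.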
> > > > > > >
> > > > > > > > 120. When should a developer bump up the version of a
> feature?
> > > > > > >
> > > > > > > (Kowshik): Great question! In the KIP, I have added a section:
> > > > > > 'Guidelines
> > > > > > > on feature versions and workflows'
> > > > > > > providing some guidelines on when to use the versioned feature
> > > flags,
> > > > > and
> > > > > > > what
> > > > > > > are the regular workflows with the CLI tool.
> > > > > > >
> > > > > > > Link to the relevant sections:
> > > > > > > Guidelines:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> > > > > > >
> > > > > > > Regular CLI tool usage:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> > > > > > >
> > > > > > > Advanced CLI tool usage:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <ju...@confluent.io>
> > wrote:
> > > > > > >
> > > > > > > > Hi, Kowshik,
> > > > > > > >
> > > > > > > > Thanks for the reply. A few more comments.
> > > > > > > >
> > > > > > > > 110. Keeping the feature version as int is probably fine. I
> > just
> > > > felt
> > > > > > > that
> > > > > > > > for some of the common user interactions, it's more
> convenient
> > to
> > > > > > > > relate that to a release version. For example, if a user
> wants
> > to
> > > > > > > downgrade
> > > > > > > > to a release 2.5, it's easier for the user to use the tool
> like
> > > > "tool
> > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X
> > > --version
> > > > > 6".
> > > > > > > > Similarly, if the client library finds a feature mismatch
> with
> > > the
> > > > > > > broker,
> > > > > > > > the client likely needs to log some error message for the
> user
> > to
> > > > > take
> > > > > > > some
> > > > > > > > actions. It's much more actionable if the error message is
> > > "upgrade
> > > > > the
> > > > > > > > broker to release version 2.6" than just "upgrade the broker
> to
> > > > > feature
> > > > > > > > version 7".
> > > > > > > >
> > > > > > > > 111. Sounds good.
> > > > > > > >
> > > > > > > > 120. When should a developer bump up the version of a
> feature?
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <
> > > > > > kprakasam@confluent.io
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jun,
> > > > > > > > >
> > > > > > > > > I have updated the KIP for the item 111.
> > > > > > > > > I'm in the process of addressing 100.6, and will provide an
> > > > update
> > > > > > > soon.
> > > > > > > > > I think item 110 is still under discussion given we are now
> > > > > > providing a
> > > > > > > > way
> > > > > > > > > to finalize
> > > > > > > > > all features to their latest version levels. In any case,
> > > please
> > > > > let
> > > > > > us
> > > > > > > > > know
> > > > > > > > > how you feel in response to Colin's comments on this topic.
> > > > > > > > >
> > > > > > > > > > 111. To put this in context, when we had IBP, the default
> > > value
> > > > > is
> > > > > > > the
> > > > > > > > > > current released version. So, if you are a brand new
> user,
> > > you
> > > > > > don't
> > > > > > > > need
> > > > > > > > > > to configure IBP and all new features will be immediately
> > > > > available
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > new cluster. If you are upgrading from an old version,
> you
> > do
> > > > > need
> > > > > > to
> > > > > > > > > > understand and configure IBP. I see a similar pattern
> here
> > > for
> > > > > > > > > > features. From the ease of use perspective, ideally, we
> > > > shouldn't
> > > > > > > > require
> > > > > > > > > a
> > > > > > > > > > new user to have an extra step such as running a
> bootstrap
> > > > script
> > > > > > > > unless
> > > > > > > > > > it's truly necessary. If someone has a special need (all
> > the
> > > > > cases
> > > > > > > you
> > > > > > > > > > mentioned seem special cases?), they can configure a mode
> > > such
> > > > > that
> > > > > > > > > > features are enabled/disabled manually.
> > > > > > > > >
> > > > > > > > > (Kowshik): That makes sense, thanks for the idea! Sorry if
> I
> > > > didn't
> > > > > > > > > understand
> > > > > > > > > this need earlier. I have updated the KIP with the approach
> > > that
> > > > > > > whenever
> > > > > > > > > the '/features' node is absent, the controller by default
> > will
> > > > > > > bootstrap
> > > > > > > > > the node
> > > > > > > > > to contain the latest feature levels. Here is the new
> section
> > > in
> > > > > the
> > > > > > > KIP
> > > > > > > > > describing
> > > > > > > > > the same:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> > > > > > > > >
> > > > > > > > > Next, as I explained in my response to Colin's suggestions,
> > we
> > > > are
> > > > > > now
> > > > > > > > > providing a `--finalize-latest-features` flag with the
> > tooling.
> > > > > This
> > > > > > > lets
> > > > > > > > > the sysadmin finalize all features known to the controller
> to
> > > > their
> > > > > > > > latest
> > > > > > > > > version
> > > > > > > > > levels. Please look at this section (point #3 and the
> tooling
> > > > > example
> > > > > > > > > later):
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
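> > > > > > > > >
> > > > > > > > > As a concrete illustration (flag name as proposed above; the script
> > > > > > > > > path and final syntax may differ), once the cluster upgrade is
> > > > > > > > > complete the sysadmin would run:
> > > > > > > > >
> > > > > > > > >     ./bin/kafka-features.sh --finalize-latest-features
> > > > > > > > >
> > > > > > > > > to finalize every feature known to the controller at its latest
> > > > > > > > > version level.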
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Do you feel this addresses your comment/concern?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Kowshik
> > > > > > > > >
> > > > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io>
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Kowshik,
> > > > > > > > > >
> > > > > > > > > > Thanks for the reply. A few more replies below.
> > > > > > > > > >
> > > > > > > > > > 100.6 You can look for the sentence "This operation
> > requires
> > > > > ALTER
> > > > > > on
> > > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > > > > > > KafkaApis.authorize().
> > > > > > > > > >
> > > > > > > > > > 110. From the external client/tooling perspective, it's
> > more
> > > > > > natural
> > > > > > > to
> > > > > > > > > use
> > > > > > > > > > the release version for features. If we can use the same
> > > > release
> > > > > > > > version
> > > > > > > > > > for internal representation, it seems simpler (easier to
> > > > > > understand,
> > > > > > > no
> > > > > > > > > > mapping overhead, etc). Is there a benefit with separate
> > > > external
> > > > > > and
> > > > > > > > > > internal versioning schemes?
> > > > > > > > > >
> > > > > > > > > > 111. To put this in context, when we had IBP, the default
> > > value
> > > > > is
> > > > > > > the
> > > > > > > > > > current released version. So, if you are a brand new
> user,
> > > you
> > > > > > don't
> > > > > > > > need
> > > > > > > > > > to configure IBP and all new features will be immediately
> > > > > available
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > new cluster. If you are upgrading from an old version,
> you
> > do
> > > > > need
> > > > > > to
> > > > > > > > > > understand and configure IBP. I see a similar pattern
> here
> > > for
> > > > > > > > > > features. From the ease of use perspective, ideally, we
> > > > shouldn't
> > > > > > > > > require a
> > > > > > > > > > new user to have an extra step such as running a
> bootstrap
> > > > script
> > > > > > > > unless
> > > > > > > > > > it's truly necessary. If someone has a special need (all
> > the
> > > > > cases
> > > > > > > you
> > > > > > > > > > mentioned seem special cases?), they can configure a mode
> > > such
> > > > > that
> > > > > > > > > > features are enabled/disabled manually.
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
> > > > > > > > kprakasam@confluent.io>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Jun,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the feedback and suggestions. Please find my
> > > > > response
> > > > > > > > below.
> > > > > > > > > > >
> > > > > > > > > > > > 100.6 For every new request, the admin needs to
> control
> > > who
> > > > > is
> > > > > > > > > allowed
> > > > > > > > > > to
> > > > > > > > > > > > issue that request if security is enabled. So, we
> need
> > to
> > > > > > assign
> > > > > > > > the
> > > > > > > > > > new
> > > > > > > > > > > > request a ResourceType and possible AclOperations.
> See
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > > > > > as an example.
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): I don't see any reference to the words
> > > > ResourceType
> > > > > or
> > > > > > > > > > > AclOperations
> > > > > > > > > > > in the KIP. Please let me know how I can use the KIP
> that
> > > you
> > > > > > > linked
> > > > > > > > to
> > > > > > > > > > > know how to
> > > > > > > > > > > setup the appropriate ResourceType and/or
> > ClusterOperation?
> > > > > > > > > > >
> > > > > > > > > > > > 105. If we change delete to disable, it's better to
> do
> > > this
> > > > > > > > > > consistently
> > > > > > > > > > > in
> > > > > > > > > > > > request protocol and admin api as well.
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): The API shouldn't be called 'disable' when
> it
> > is
> > > > > > > deleting
> > > > > > > > a
> > > > > > > > > > > feature.
> > > > > > > > > > > I've just changed the KIP to use 'delete'. I don't
> have a
> > > > > strong
> > > > > > > > > > > preference.
> > > > > > > > > > >
> > > > > > > > > > > > 110. The minVersion/maxVersion for features use
> int64.
> > > > > > Currently,
> > > > > > > > our
> > > > > > > > > > > > release version schema is major.minor.bugfix (e.g.
> > > 2.5.0).
> > > > > It's
> > > > > > > > > > possible
> > > > > > > > > > > > for new features to be included in minor releases
> too.
> > > > Should
> > > > > > we
> > > > > > > > make
> > > > > > > > > > the
> > > > > > > > > > > > feature versioning match the release versioning?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): The release version can be mapped to a set
> of
> > > > > feature
> > > > > > > > > > versions,
> > > > > > > > > > > and this can be done, for example in the tool (or even
> > > > external
> > > > > > to
> > > > > > > > the
> > > > > > > > > > > tool).
> > > > > > > > > > > Can you please clarify what I'm missing?
> > > > > > > > > > >
> > > > > > > > > > > > 111. "During regular operations, the data in the ZK
> > node
> > > > can
> > > > > be
> > > > > > > > > mutated
> > > > > > > > > > > > only via a specific admin API served only by the
> > > > > controller." I
> > > > > > > am
> > > > > > > > > > > > wondering why can't the controller auto finalize a
> > > feature
> > > > > > > version
> > > > > > > > > > after
> > > > > > > > > > > > all brokers are upgraded? For new users who download
> > the
> > > > > latest
> > > > > > > > > version
> > > > > > > > > > > to
> > > > > > > > > > > > build a new cluster, it's inconvenient for them to
> have
> > > to
> > > > > > > manually
> > > > > > > > > > > enable
> > > > > > > > > > > > each feature.
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): I agree that there is a trade-off here, but
> it
> > > > will
> > > > > > help
> > > > > > > > > > > to decide whether the automation can be thought through
> > in
> > > > the
> > > > > > > future
> > > > > > > > > > > in a follow up KIP, or right now in this KIP. We may
> > invest
> > > > > > > > > > > in automation, but we have to decide whether we should
> do
> > > it
> > > > > > > > > > > now or later.
> > > > > > > > > > >
> > > > > > > > > > > For the inconvenience that you mentioned, do you think
> > the
> > > > > > problem
> > > > > > > > that
> > > > > > > > > > you
> > > > > > > > > > > mentioned can be  overcome by asking for the cluster
> > > operator
> > > > > to
> > > > > > > run
> > > > > > > > a
> > > > > > > > > > > bootstrap script  when he/she knows that a specific AK
> > > > release
> > > > > > has
> > > > > > > > been
> > > > > > > > > > > almost completely deployed in a cluster for the first
> > time?
> > > > > Idea
> > > > > > is
> > > > > > > > > that
> > > > > > > > > > > the
> > > > > > > > > > > bootstrap script will know how to map a specific AK
> > release
> > > > to
> > > > > > > > > finalized
> > > > > > > > > > > feature versions, and run the `kafka-features.sh` tool
> > > > > > > appropriately
> > > > > > > > > > > against
> > > > > > > > > > > the cluster.
> > > > > > > > > > >
> > > > > > > > > > > Now, coming back to your automation proposal/question.
> > > > > > > > > > > I do see the value of automated feature version
> > > finalization,
> > > > > > but I
> > > > > > > > > also
> > > > > > > > > > > see
> > > > > > > > > > > that this will open up several questions and some
> risks,
> > as
> > > > > > > explained
> > > > > > > > > > > below.
> > > > > > > > > > > The answers to these depend on the definition of the
> > > > automation
> > > > > > we
> > > > > > > > > choose
> > > > > > > > > > > to build, and how well does it fit into a kafka
> > deployment.
> > > > > > > > > > > Basically, it can be unsafe for the controller to
> > finalize
> > > > > > feature
> > > > > > > > > > version
> > > > > > > > > > > upgrades automatically, without learning about the
> intent
> > > of
> > > > > the
> > > > > > > > > cluster
> > > > > > > > > > > operator.
> > > > > > > > > > > 1. We would sometimes want to lock feature versions
> only
> > > when
> > > > > we
> > > > > > > have
> > > > > > > > > > > externally verified
> > > > > > > > > > > the stability of the broker binary.
> > > > > > > > > > > 2. Sometimes only the cluster operator knows that a
> > cluster
> > > > > > upgrade
> > > > > > > > is
> > > > > > > > > > > complete,
> > > > > > > > > > > and new brokers are highly unlikely to join the
> cluster.
> > > > > > > > > > > 3. Only the cluster operator knows that the intent is
> to
> > > > deploy
> > > > > > the
> > > > > > > > > same
> > > > > > > > > > > version
> > > > > > > > > > > of the new broker release across the entire cluster
> (i.e.
> > > the
> > > > > > > latest
> > > > > > > > > > > downloaded version).
> > > > > > > > > > > 4. For downgrades, it appears the controller still
> needs
> > > some
> > > > > > > > external
> > > > > > > > > > > input
> > > > > > > > > > > (such as the proposed tool) to finalize a feature
> version
> > > > > > > downgrade.
> > > > > > > > > > >
> > > > > > > > > > > If we have automation, that automation can end up
> failing
> > > in
> > > > > some
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > > cases
> > > > > > > > > > > above. Then, we need a way to declare that the cluster
> is
> > > > "not
> > > > > > > ready"
> > > > > > > > > if
> > > > > > > > > > > the
> > > > > > > > > > > controller cannot automatically finalize some basic
> > > required
> > > > > > > feature
> > > > > > > > > > > version
> > > > > > > > > > > upgrades across the cluster. We need to make the
> cluster
> > > > > operator
> > > > > > > > aware
> > > > > > > > > > in
> > > > > > > > > > > such a scenario (raise an alert or alike).
> > > > > > > > > > >
> > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
> should
> > > be
> > > > 49
> > > > > > > > instead
> > > > > > > > > > of
> > > > > > > > > > > 48.
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Done.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Kowshik
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <
> > jun@confluent.io>
> > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi, Kowshik,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the reply. A few more comments below.
> > > > > > > > > > > >
> > > > > > > > > > > > 100.6 For every new request, the admin needs to
> control
> > > who
> > > > > is
> > > > > > > > > allowed
> > > > > > > > > > to
> > > > > > > > > > > > issue that request if security is enabled. So, we
> need
> > to
> > > > > > assign
> > > > > > > > the
> > > > > > > > > > new
> > > > > > > > > > > > request a ResourceType and possible AclOperations.
> See
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > > > > > as
> > > > > > > > > > > > an example.
> > > > > > > > > > > >
> > > > > > > > > > > > 105. If we change delete to disable, it's better to
> do
> > > this
> > > > > > > > > > consistently
> > > > > > > > > > > in
> > > > > > > > > > > > request protocol and admin api as well.
> > > > > > > > > > > >
> > > > > > > > > > > > 110. The minVersion/maxVersion for features use
> int64.
> > > > > > Currently,
> > > > > > > > our
> > > > > > > > > > > > release version schema is major.minor.bugfix (e.g.
> > > 2.5.0).
> > > > > It's
> > > > > > > > > > possible
> > > > > > > > > > > > for new features to be included in minor releases
> too.
> > > > Should
> > > > > > we
> > > > > > > > make
> > > > > > > > > > the
> > > > > > > > > > > > feature versioning match the release versioning?
> > > > > > > > > > > >
> > > > > > > > > > > > 111. "During regular operations, the data in the ZK
> > node
> > > > can
> > > > > be
> > > > > > > > > mutated
> > > > > > > > > > > > only via a specific admin API served only by the
> > > > > controller." I
> > > > > > > am
> > > > > > > > > > > > wondering why can't the controller auto finalize a
> > > feature
> > > > > > > version
> > > > > > > > > > after
> > > > > > > > > > > > all brokers are upgraded? For new users who download
> > the
> > > > > latest
> > > > > > > > > version
> > > > > > > > > > > to
> > > > > > > > > > > > build a new cluster, it's inconvenient for them to
> have
> > > to
> > > > > > > manually
> > > > > > > > > > > enable
> > > > > > > > > > > > each feature.
> > > > > > > > > > > >
> > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
> should
> > > be
> > > > 49
> > > > > > > > instead
> > > > > > > > > > of
> > > > > > > > > > > > 48.
> > > > > > > > > > > >
> > > > > > > > > > > > Jun
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> > > > > > > > > > kprakasam@confluent.io>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hey Jun,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks a lot for the great feedback! Please note
> that
> > > the
> > > > > > > design
> > > > > > > > > > > > > has changed a little bit on the KIP, and we now
> > > propagate
> > > > > the
> > > > > > > > > > finalized
> > > > > > > > > > > > > features metadata only via ZK watches (instead of
> > > > > > > > > > UpdateMetadataRequest
> > > > > > > > > > > > > from the controller).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please find below my response to your
> > > questions/feedback,
> > > > > > with
> > > > > > > > the
> > > > > > > > > > > prefix
> > > > > > > > > > > > > "(Kowshik):".
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > > > > > > 100.1 Since this request waits for responses from
> > > > > brokers,
> > > > > > > > should
> > > > > > > > > > we
> > > > > > > > > > > > add
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great point! Done. I have added a
> timeout
> > > > field.
> > > > > > > Note:
> > > > > > > > > we
> > > > > > > > > > no
> > > > > > > > > > > > > longer
> > > > > > > > > > > > > wait for responses from brokers, since the design
> has
> > > > been
> > > > > > > > changed
> > > > > > > > > so
> > > > > > > > > > > > that
> > > > > > > > > > > > > the
> > > > > > > > > > > > > features information is propagated via ZK.
> > > Nevertheless,
> > > > it
> > > > > > is
> > > > > > > > > right
> > > > > > > > > > to
> > > > > > > > > > > > > have a timeout
> > > > > > > > > > > > > for the request.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 100.2 The response schema is a bit weird.
> > Typically,
> > > > the
> > > > > > > > response
> > > > > > > > > > > just
> > > > > > > > > > > > > > shows an error code and an error message, instead
> > of
> > > > > > echoing
> > > > > > > > the
> > > > > > > > > > > > request.
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified it to
> > > just
> > > > > > return
> > > > > > > > an
> > > > > > > > > > > error
> > > > > > > > > > > > > code and a message.
> > > > > > > > > > > > > Previously it was not echoing the "request", rather
> > it
> > > > was
> > > > > > > > > returning
> > > > > > > > > > > the
> > > > > > > > > > > > > latest set of
> > > > > > > > > > > > > cluster-wide finalized features (after applying the
> > > > > updates).
> > > > > > > But
> > > > > > > > > you
> > > > > > > > > > > are
> > > > > > > > > > > > > right,
> > > > > > > > > > > > > the additional info is not required, so I have
> > removed
> > > it
> > > > > > from
> > > > > > > > the
> > > > > > > > > > > > response
> > > > > > > > > > > > > schema.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 100.3 Should we add a separate request to
> > > list/describe
> > > > > the
> > > > > > > > > > existing
> > > > > > > > > > > > > > features?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): This is already present in the KIP via
> the
> > > > > > > > > > > 'DescribeFeatures'
> > > > > > > > > > > > > Admin API,
> > > > > > > > > > > > > which, under the covers, uses the
> ApiVersionsRequest
> > to
> > > > > > > > > list/describe
> > > > > > > > > > > the
> > > > > > > > > > > > > existing features. Please read the 'Tooling
> support'
> > > > > section.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a
> > > > single
> > > > > > > > request.
> > > > > > > > > > For
> > > > > > > > > > > > > > DELETE, the version field doesn't make sense.
> So, I
> > > > guess
> > > > > > the
> > > > > > > > > > broker
> > > > > > > > > > > > just
> > > > > > > > > > > > > > ignores this? An alternative way is to have a
> > > separate
> > > > > > > > > > > > > DeleteFeaturesRequest
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP now
> > to
> > > > > have 2
> > > > > > > > > > separate
> > > > > > > > > > > > > controller APIs
> > > > > > > > > > > > > serving these different purposes:
> > > > > > > > > > > > > 1. updateFeatures
> > > > > > > > > > > > > 2. deleteFeatures
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
> > > > > monotonically
> > > > > > > > > > > increasing
> > > > > > > > > > > > > > version of the metadata for finalized features."
> I
> > am
> > > > > > > wondering
> > > > > > > > > why
> > > > > > > > > > > the
> > > > > > > > > > > > > > ordering is important?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): In the latest KIP write-up, it is called
> > > epoch
> > > > > > > > (instead
> > > > > > > > > of
> > > > > > > > > > > > > version), and
> > > > > > > > > > > > > it is just the ZK node version. Basically, this is
> > the
> > > > > epoch
> > > > > > > for
> > > > > > > > > the
> > > > > > > > > > > > > cluster-wide
> > > > > > > > > > > > > finalized feature version metadata. This metadata
> is
> > > > served
> > > > > > to
> > > > > > > > > > clients
> > > > > > > > > > > > via
> > > > > > > > > > > > > the
> > > > > > > > > > > > > ApiVersionsResponse (for reads). We propagate
> updates
> > > > from
> > > > > > the
> > > > > > > > > > > > '/features'
> > > > > > > > > > > > > ZK node
> > > > > > > > > > > > > to all brokers, via ZK watches setup by each broker
> > on
> > > > the
> > > > > > > > > > '/features'
> > > > > > > > > > > > > node.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Now here is why the ordering is important:
> > > > > > > > > > > > > ZK watches don't propagate at the same time. As a
> > > result,
> > > > > the
> > > > > > > > > > > > > ApiVersionsResponse
> > > > > > > > > > > > > is eventually consistent across brokers. This can
> > > > introduce
> > > > > > > cases
> > > > > > > > > > > > > where clients see an older lower epoch of the
> > features
> > > > > > > metadata,
> > > > > > > > > > after
> > > > > > > > > > > a
> > > > > > > > > > > > > more recent
> > > > > > > > > > > > > higher epoch was returned at a previous point in
> > time.
> > > We
> > > > > > > expect
> > > > > > > > > > > clients
> > > > > > > > > > > > > to always employ the rule that the latest received
> > > higher
> > > > > > epoch
> > > > > > > > of
> > > > > > > > > > > > metadata
> > > > > > > > > > > > > always trumps an older smaller epoch. Those clients
> > > that
> > > > > are
> > > > > > > > > external
> > > > > > > > > > > to
> > > > > > > > > > > > > Kafka should strongly consider discovering the
> latest
> > > > > > metadata
> > > > > > > > once
> > > > > > > > > > > > during
> > > > > > > > > > > > > startup from the brokers, and if required refresh
> the
> > > > > > metadata
> > > > > > > > > > > > periodically
> > > > > > > > > > > > > (to get the latest metadata).
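> > > > > > > > > > > > >
> > > > > > > > > > > > > A minimal Java sketch of that client-side rule (names are made up;
> > > > > > > > > > > > > this is not an actual client API):
> > > > > > > > > > > > >
> > > > > > > > > > > > >     // currentEpoch = highest epoch seen so far,
> > > > > > > > > > > > >     // receivedEpoch = epoch on newly received metadata.
> > > > > > > > > > > > >     static boolean shouldReplace(long currentEpoch, long receivedEpoch) {
> > > > > > > > > > > > >         return receivedEpoch > currentEpoch;  // older/equal epochs are ignored
> > > > > > > > > > > > >     }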
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 100.6 Could you specify the required ACL for this
> > new
> > > > > > > request?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): What is ACL, and how could I find out
> > which
> > > > one
> > > > > to
> > > > > > > > > > specify?
> > > > > > > > > > > > > Please could you provide me some pointers? I'll be
> > glad
> > > > to
> > > > > > > update
> > > > > > > > > the
> > > > > > > > > > > > > KIP once I know the next steps.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 101. For the broker registration ZK node, should
> we
> > > > bump
> > > > > up
> > > > > > > the
> > > > > > > > > > > version
> > > > > > > > > > > > > in
> > > > > > > > > > > > > the json?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great point! Done. I've increased the
> > > version
> > > > in
> > > > > > the
> > > > > > > > > > broker
> > > > > > > > > > > > json
> > > > > > > > > > > > > by 1.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 102. For the /features ZK node, not sure if we
> need
> > > the
> > > > > > epoch
> > > > > > > > > > field.
> > > > > > > > > > > > Each
> > > > > > > > > > > > > > ZK node has an internal version field that is
> > > > incremented
> > > > > > on
> > > > > > > > > every
> > > > > > > > > > > > > update.
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK node
> > > > version
> > > > > > > now,
> > > > > > > > > > > instead
> > > > > > > > > > > > of
> > > > > > > > > > > > > explicitly
> > > > > > > > > > > > > incremented epoch.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 103. "Enabling the actual semantics of a feature
> > > > version
> > > > > > > > > > cluster-wide
> > > > > > > > > > > > is
> > > > > > > > > > > > > > left to the discretion of the logic implementing
> > the
> > > > > > feature
> > > > > > > > (ex:
> > > > > > > > > > can
> > > > > > > > > > > > be
> > > > > > > > > > > > > > done via dynamic broker config)." Does that mean
> > the
> > > > > broker
> > > > > > > > > > > > registration
> > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > node will be updated dynamically when this
> happens?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Not really. The text was just conveying
> > > that a
> > > > > > > broker
> > > > > > > > > > could
> > > > > > > > > > > > > "know" of
> > > > > > > > > > > > > a new feature version, but it does not mean the
> > broker
> > > > > should
> > > > > > > > have
> > > > > > > > > > also
> > > > > > > > > > > > > activated the effects of the feature version.
> Knowing
> > > vs
> > > > > > > > activation
> > > > > > > > > > > are 2
> > > > > > > > > > > > > separate things,
> > > > > > > > > > > > > and the latter can be achieved by dynamic config. I
> > > have
> > > > > > > reworded
> > > > > > > > > the
> > > > > > > > > > > > text
> > > > > > > > > > > > > to
> > > > > > > > > > > > > make this clear to the reader.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > > > > > > 104.1 It would be useful to describe when the
> > feature
> > > > > > > metadata
> > > > > > > > is
> > > > > > > > > > > > > included
> > > > > > > > > > > > > > in the request. My understanding is that it's
> only
> > > > > included
> > > > > > > if
> > > > > > > > > (1)
> > > > > > > > > > > > there
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > a change to the finalized feature; (2) broker
> > > restart;
> > > > > (3)
> > > > > > > > > > controller
> > > > > > > > > > > > > > failover.
> > > > > > > > > > > > > > 104.2 The new fields have the following versions.
> > Why
> > > > are
> > > > > > the
> > > > > > > > > > > versions
> > > > > > > > > > > > 3+
> > > > > > > > > > > > > > when the top version is bumped to 6?
> > > > > > > > > > > > > >       "fields":  [
> > > > > > > > > > > > > >         {"name": "Name", "type":  "string",
> > > "versions":
> > > > > > > "3+",
> > > > > > > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > > > > > > >         {"name":  "Version", "type":  "int64",
> > > > > "versions":
> > > > > > > > "3+",
> > > > > > > > > > > > > >           "about": "The finalized version for the
> > > > > > feature."}
> > > > > > > > > > > > > >       ]
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): With the new improved design, we have
> > > > completely
> > > > > > > > > > eliminated
> > > > > > > > > > > > the
> > > > > > > > > > > > > need to
> > > > > > > > > > > > > use UpdateMetadataRequest. This is because we now
> > rely
> > > on
> > > > > ZK
> > > > > > to
> > > > > > > > > > deliver
> > > > > > > > > > > > the
> > > > > > > > > > > > > notifications for changes to the '/features' ZK
> node.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using
> > > update/delete,
> > > > > > > perhaps
> > > > > > > > > > it's
> > > > > > > > > > > > > better
> > > > > > > > > > > > > > to use enable/disable?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): For delete, yes, I have changed it so
> that
> > > we
> > > > > > > instead
> > > > > > > > > call
> > > > > > > > > > > it
> > > > > > > > > > > > > 'disable'.
> > > > > > > > > > > > > However for 'update', it can now also refer to
> either
> > > an
> > > > > > > upgrade
> > > > > > > > > or a
> > > > > > > > > > > > > forced downgrade.
> > > > > > > > > > > > > Therefore, I have left it the way it is, just
> calling
> > > it
> > > > as
> > > > > > > just
> > > > > > > > > > > > 'update'.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <
> > > > jun@confluent.io>
> > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, Kowshik,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the KIP. Looks good overall. A few
> > > comments
> > > > > > below.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > > > > > > 100.1 Since this request waits for responses from
> > > > > brokers,
> > > > > > > > should
> > > > > > > > > > we
> > > > > > > > > > > > add
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > > > > > > > 100.2 The response schema is a bit weird.
> > Typically,
> > > > the
> > > > > > > > response
> > > > > > > > > > > just
> > > > > > > > > > > > > > shows an error code and an error message, instead
> > of
> > > > > > echoing
> > > > > > > > the
> > > > > > > > > > > > request.
> > > > > > > > > > > > > > 100.3 Should we add a separate request to
> > > list/describe
> > > > > the
> > > > > > > > > > existing
> > > > > > > > > > > > > > features?
> > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a
> > > > single
> > > > > > > > request.
> > > > > > > > > > For
> > > > > > > > > > > > > > DELETE, the version field doesn't make sense.
> So, I
> > > > guess
> > > > > > the
> > > > > > > > > > broker
> > > > > > > > > > > > just
> > > > > > > > > > > > > > ignores this? An alternative way is to have a
> > > separate
> > > > > > > > > > > > > > DeleteFeaturesRequest
> > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
> > > > > monotonically
> > > > > > > > > > > increasing
> > > > > > > > > > > > > > version of the metadata for finalized features."
> I
> > am
> > > > > > > wondering
> > > > > > > > > why
> > > > > > > > > > > the
> > > > > > > > > > > > > > ordering is important?
> > > > > > > > > > > > > > 100.6 Could you specify the required ACL for this
> > new
> > > > > > > request?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 101. For the broker registration ZK node, should
> we
> > > > bump
> > > > > up
> > > > > > > the
> > > > > > > > > > > version
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > the json?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 102. For the /features ZK node, not sure if we
> need
> > > the
> > > > > > epoch
> > > > > > > > > > field.
> > > > > > > > > > > > Each
> > > > > > > > > > > > > > ZK node has an internal version field that is
> > > > incremented
> > > > > > on
> > > > > > > > > every
> > > > > > > > > > > > > update.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 103. "Enabling the actual semantics of a feature
> > > > version
> > > > > > > > > > cluster-wide
> > > > > > > > > > > > is
> > > > > > > > > > > > > > left to the discretion of the logic implementing
> > the
> > > > > > feature
> > > > > > > > (ex:
> > > > > > > > > > can
> > > > > > > > > > > > be
> > > > > > > > > > > > > > done via dynamic broker config)." Does that mean
> > the
> > > > > broker
> > > > > > > > > > > > registration
> > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > node will be updated dynamically when this
> happens?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > > > > > > 104.1 It would be useful to describe when the
> > feature
> > > > > > > metadata
> > > > > > > > is
> > > > > > > > > > > > > included
> > > > > > > > > > > > > > in the request. My understanding is that it's
> only
> > > > > included
> > > > > > > if
> > > > > > > > > (1)
> > > > > > > > > > > > there
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > a change to the finalized feature; (2) broker
> > > restart;
> > > > > (3)
> > > > > > > > > > controller
> > > > > > > > > > > > > > failover.
> > > > > > > > > > > > > > 104.2 The new fields have the following versions.
> > Why
> > > > are
> > > > > > the
> > > > > > > > > > > versions
> > > > > > > > > > > > 3+
> > > > > > > > > > > > > > when the top version is bumped to 6?
> > > > > > > > > > > > > >       "fields":  [
> > > > > > > > > > > > > >         {"name": "Name", "type":  "string",
> > > "versions":
> > > > > > > "3+",
> > > > > > > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > > > > > > >         {"name":  "Version", "type":  "int64",
> > > > > "versions":
> > > > > > > > "3+",
> > > > > > > > > > > > > >           "about": "The finalized version for the
> > > > > > feature."}
> > > > > > > > > > > > > >       ]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using
> > > update/delete,
> > > > > > > perhaps
> > > > > > > > > > it's
> > > > > > > > > > > > > better
> > > > > > > > > > > > > > to use enable/disable?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam
> <
> > > > > > > > > > > > kprakasam@confluent.io
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hey Boyang,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the great feedback! I have updated
> the
> > > KIP
> > > > > > based
> > > > > > > > on
> > > > > > > > > > your
> > > > > > > > > > > > > > > feedback.
> > > > > > > > > > > > > > > Please find my response below for your
> comments,
> > > look
> > > > > for
> > > > > > > > > > sentences
> > > > > > > > > > > > > > > starting
> > > > > > > > > > > > > > > with "(Kowshik)" below.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to begin
> > > > handling
> > > > > > EOS
> > > > > > > > > > > traffic"
> > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > converted as "When is it safe for the brokers
> > to
> > > > > start
> > > > > > > > > serving
> > > > > > > > > > > new
> > > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not
> > > > > explained
> > > > > > > > > earlier
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > context.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. In the *Explanation *section, the metadata
> > > > version
> > > > > > > > number
> > > > > > > > > > part
> > > > > > > > > > > > > > seems a
> > > > > > > > > > > > > > > > bit blurred. Could you point a reference to
> > later
> > > > > > section
> > > > > > > > > that
> > > > > > > > > > we
> > > > > > > > > > > > > going
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > store it in Zookeeper and update it every
> time
> > > when
> > > > > > there
> > > > > > > > is
> > > > > > > > > a
> > > > > > > > > > > > > feature
> > > > > > > > > > > > > > > > change?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Kowshik): Great point! Done. I've added a
> > > reference
> > > > in
> > > > > > the
> > > > > > > > > KIP.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 3. For the feature downgrade, although it's a
> > > > > Non-goal
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > KIP,
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > features such as group coordinator semantics,
> > > there
> > > > > is
> > > > > > no
> > > > > > > > > legal
> > > > > > > > > > > > > > scenario
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > perform a downgrade at all. So having
> downgrade
> > > > door
> > > > > > open
> > > > > > > > is
> > > > > > > > > > > pretty
> > > > > > > > > > > > > > > > error-prone as human faults happen all the
> > time.
> > > > I'm
> > > > > > > > assuming
> > > > > > > > > > as
> > > > > > > > > > > > new
> > > > > > > > > > > > > > > > features are implemented, it's not very hard
> to
> > > > add a
> > > > > > > flag
> > > > > > > > > > during
> > > > > > > > > > > > > > feature
> > > > > > > > > > > > > > > > creation to indicate whether this feature is
> > > > > > > > "downgradable".
> > > > > > > > > > > Could
> > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > explain a bit more on the extra engineering
> > > effort
> > > > > for
> > > > > > > > > shipping
> > > > > > > > > > > > this
> > > > > > > > > > > > > > KIP
> > > > > > > > > > > > > > > > with downgrade protection in place?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Kowshik): Great point! I'd agree and disagree
> > > here.
> > > > > > While
> > > > > > > I
> > > > > > > > > > agree
> > > > > > > > > > > > that
> > > > > > > > > > > > > > > accidental
> > > > > > > > > > > > > > > downgrades can cause problems, I also think
> > > sometimes
> > > > > > > > > downgrades
> > > > > > > > > > > > should
> > > > > > > > > > > > > > > be allowed for emergency reasons (not all
> > > downgrades
> > > > > > cause
> > > > > > > > > > issues).
> > > > > > > > > > > > > > > It is just subjective to the feature being
> > > > downgraded.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > To be more strict about feature version
> > > downgrades, I
> > > > > > have
> > > > > > > > > > modified
> > > > > > > > > > > > the
> > > > > > > > > > > > > > KIP
> > > > > > > > > > > > > > > proposing that we mandate a `--force-downgrade`
> > > flag
> > > > be
> > > > > > > used
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > > > > > UPDATE_FEATURES api and the tooling, whenever the human is downgrading a finalized feature version.
> > > > > > > > > > > > > > > Hopefully this should cover the requirement, until we find the need for advanced downgrade support.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of feature versions will be defined in the broker code."
> > > > > > > > > > > > > > > > So this means in order to restrict a certain feature, we need to start the broker first and then
> > > > > > > > > > > > > > > > send a feature gating request immediately, which introduces a time gap and the intended-to-close
> > > > > > > > > > > > > > > > feature could actually serve requests during this phase. Do you think we should also support
> > > > > > > > > > > > > > > > configurations as well so that the admin user could freely roll up a cluster with all nodes
> > > > > > > > > > > > > > > > complying with the same feature gating, without worrying about the turnaround time to propagate
> > > > > > > > > > > > > > > > the message only after the cluster starts up?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Kowshik): This is a great point/question. One of the expectations out of this KIP, which is
> > > > > > > > > > > > > > > already followed in the broker, is the following.
> > > > > > > > > > > > > > >  - Imagine at time T1 the broker starts up and registers its presence in ZK, along with
> > > > > > > > > > > > > > >    advertising its supported features.
> > > > > > > > > > > > > > >  - Imagine at a future time T2 the broker receives the UpdateMetadataRequest from the controller,
> > > > > > > > > > > > > > >    which contains the latest finalized features as seen by the controller. The broker validates
> > > > > > > > > > > > > > >    this data against its supported features to make sure there is no mismatch (it will shut down
> > > > > > > > > > > > > > >    if there is an incompatibility).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It is expected that during the time between the 2 events T1 and T2, the broker is almost a silent
> > > > > > > > > > > > > > > entity in the cluster. It does not add any value to the cluster, or carry out any important broker
> > > > > > > > > > > > > > > activities. By “important”, I mean it is not doing mutations on its persistence, not mutating
> > > > > > > > > > > > > > > critical in-memory state, and won’t be serving produce/fetch requests. Note it doesn’t even know
> > > > > > > > > > > > > > > its assigned partitions until it receives the UpdateMetadataRequest from the controller. Anything
> > > > > > > > > > > > > > > the broker is doing up until this point is not damaging/useful.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I’ve clarified the above in the KIP, see this new section:
> > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 5. "adding a new Feature, updating or deleting an existing Feature", may be I misunderstood
> > > > > > > > > > > > > > > > something, I thought the features are defined in broker code, so admin could not really create
> > > > > > > > > > > > > > > > a new feature?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Kowshik): Great point! You understood this right. Here adding a feature means we are adding a
> > > > > > > > > > > > > > > cluster-wide finalized *max* version for a feature that was previously never finalized.
> > > > > > > > > > > > > > > I have clarified this in the KIP now.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 6. I think we need a separate error code like FEATURE_UPDATE_IN_PROGRESS to reject a concurrent
> > > > > > > > > > > > > > > > feature update request.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP adding the above (see
> > > > > > > > > > > > > > > 'Tooling support -> Admin API changes').
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 7. I think we haven't discussed the alternative solution to pass the feature information through
> > > > > > > > > > > > > > > > Zookeeper. Is that mentioned in the KIP to justify why using UpdateMetadata is more favorable?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Kowshik): Nice question! The broker reads finalized feature info stored in ZK only during
> > > > > > > > > > > > > > > startup, when it does a validation. When serving `ApiVersionsRequest`, the broker does not read
> > > > > > > > > > > > > > > this info from ZK directly. I'd imagine the risk is that it can increase the ZK read QPS, which
> > > > > > > > > > > > > > > can be a bottleneck for the system. Today, in Kafka we use the controller to fan out ZK updates
> > > > > > > > > > > > > > > to brokers and we want to stick to that pattern to avoid the ZK read bottleneck when serving
> > > > > > > > > > > > > > > `ApiVersionsRequest`.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 8. I was under the impression that user could configure a range of supported versions, what's
> > > > > > > > > > > > > > > > the trade-off for allowing single finalized version only?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Kowshik): Great question! The finalized version of a feature basically refers to the
> > > > > > > > > > > > > > > cluster-wide finalized feature "maximum" version. For example, if the 'group_coordinator' feature
> > > > > > > > > > > > > > > has the finalized version set to 10, then it means that cluster-wide all versions up to v10 are
> > > > > > > > > > > > > > > supported for this feature. However, note that if some version (ex: v0) gets deprecated for this
> > > > > > > > > > > > > > > feature, then we don’t convey that using this scheme (also, supporting deprecation is a non-goal).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Kowshik): I’ve now modified the KIP at all points, referring to finalized feature "maximum"
> > > > > > > > > > > > > > > versions.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here the "client" here may be a producer
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Kowshik
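
To make the startup validation described above concrete, here is a minimal Java sketch (all class, record, and method names are hypothetical illustrations, not the actual broker code): at T2 the broker compares every cluster-wide finalized max version level against its own supported range, and shuts down on any mismatch.

    import java.util.Map;

    public class FeatureCompatibilityCheck {
        // Hypothetical supported range advertised by this broker at registration time (T1).
        record SupportedRange(int minVersion, int maxVersion) {}

        // Returns true if every cluster-wide finalized max version level falls within
        // this broker's supported range; mirrors the validation done at T2.
        static boolean isCompatible(Map<String, Integer> finalizedMaxLevels,
                                    Map<String, SupportedRange> supported) {
            for (Map.Entry<String, Integer> e : finalizedMaxLevels.entrySet()) {
                SupportedRange range = supported.get(e.getKey());
                if (range == null) return false;  // feature unknown to this broker
                int finalized = e.getValue();
                if (finalized < range.minVersion() || finalized > range.maxVersion()) return false;
            }
            return true;
        }

        public static void main(String[] args) {
            var supported = Map.of("group_coordinator", new SupportedRange(1, 10));
            var finalized = Map.of("group_coordinator", 11);
            if (!isCompatible(finalized, supported)) {
                System.err.println("Incompatible finalized features; shutting down broker.");
            }
        }
    }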

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Jun,

Thank you for the suggestion! I have updated the KIP, please find my
response below.

> 200. I guess you are saying only when the allowDowngrade field is set, the
> finalized feature version can go backward. Otherwise, it can only go up.
> That makes sense. It would be useful to make that clear when explaining
> the usage of the allowDowngrade field. In the validation section, we have "
> /features' from {"max_version_level": X} to {"max_version_level": X’}", it
> seems that we need to mention Y there.

(Kowshik): Great point! Yes, that is correct. Done, I have updated the validations
section explaining the above. Here is a link to this section:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
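
As an illustration of the validation being discussed here, a rough Java sketch (names are hypothetical; the authoritative rules are in the KIP's Validations section): an update of a feature's max version level from X to X' is accepted when X' is higher, and a lower X' is accepted only when the caller explicitly asked for a downgrade.

    public class MaxVersionLevelValidation {
        // Hypothetical check for a single feature update: current finalized level x,
        // requested level xPrime, and whether the caller explicitly asked for a downgrade.
        static void validate(int x, int xPrime, boolean allowDowngrade) {
            if (xPrime < 1) {
                throw new IllegalArgumentException("max_version_level must be >= 1");
            }
            if (xPrime == x) {
                throw new IllegalArgumentException("Requested level equals the current level; nothing to do");
            }
            if (xPrime < x && !allowDowngrade) {
                throw new IllegalArgumentException(
                    "Downgrade from " + x + " to " + xPrime + " requires the downgrade flag");
            }
        }
    }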


Cheers,
Kowshik




On Wed, Apr 15, 2020 at 11:05 AM Jun Rao <ju...@confluent.io> wrote:

> Hi, Kowshik,
>
> 200. I guess you are saying only when the allowDowngrade field is set, the
> finalized feature version can go backward. Otherwise, it can only go up.
> That makes sense. It would be useful to make that clear when explaining
> the usage of the allowDowngrade field. In the validation section, we have
> "
> /features' from {"max_version_level": X} to {"max_version_level": X’}", it
> seems that we need to mention Y there.
>
> Thanks,
>
> Jun
>
> On Wed, Apr 15, 2020 at 10:44 AM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hi Jun,
> >
> > Great question! Please find my response below.
> >
> > > 200. My understanding is that if the CLI tool passes the
> > > '--allow-downgrade' flag when updating a specific feature, then a future
> > > downgrade is possible. Otherwise, the feature is not downgradable. If so, I
> > > was wondering how the controller remembers this since it can be restarted
> > > over time?
> >
> > (Kowshik): The purpose of the flag was to just restrict the user intent for
> > a specific request.
> > It seems to me that to avoid confusion, I could call the flag
> > `--try-downgrade` instead.
> > Then this makes it clear that the controller just has to consider the ask
> > from the user as an explicit request to attempt a downgrade.
> >
> > The flag does not act as an override on the controller's decision making that
> > decides whether a feature is downgradable (these decisions on whether to allow
> > a feature to be downgraded from a specific version level can be embedded in
> > the controller code).
> >
> > Please let me know what you think.
> > Sorry if I misunderstood the original question.
> >
> >
> > Cheers,
> > Kowshik
> >
> >
> > On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Kowshik,
> > >
> > > Thanks for the reply. Makes sense. Just one more question.
> > >
> > > 200. My understanding is that if the CLI tool passes the
> > > '--allow-downgrade' flag when updating a specific feature, then a future
> > > downgrade is possible. Otherwise, the feature is not downgradable. If so, I
> > > was wondering how the controller remembers this since it can be restarted
> > > over time?
> > >
> > > Jun
> > >
> > >
> > > On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam <
> kprakasam@confluent.io
> > >
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks a lot for the feedback and the questions!
> > > > Please find my response below.
> > > >
> > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It seems
> > > > > that field needs to be persisted somewhere in ZK?
> > > >
> > > > (Kowshik): Great question! Below is my explanation. Please help me understand,
> > > > if you feel there are cases where we would need to still persist it in ZK.
> > > >
> > > > Firstly I have updated my thoughts into the KIP now, under the 'guidelines' section:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> > > >
> > > > The allowDowngrade boolean field is just to restrict the user intent, and to remind
> > > > them to double check their intent before proceeding. It should be set to true
> > > > by the user in a request, only when the user intent is to forcefully "attempt" a
> > > > downgrade of a specific feature's max version level, to the provided value in
> > > > the request.
> > > >
> > > > We can extend this safeguard. The controller (on its end) can maintain
> > > > rules in the code that, for safety reasons, would outright reject certain downgrades
> > > > from a specific max_version_level for a specific feature. Such rejections may
> > > > happen depending on the feature being downgraded, and from what version level.
> > > >
> > > > The CLI tool only allows a downgrade attempt in conjunction with specific
> > > > flags and sub-commands. For example, in the CLI tool, if the user uses the
> > > > 'downgrade-all' command, or passes the '--allow-downgrade' flag when updating a
> > > > specific feature, only then will the tool translate this ask to setting the
> > > > 'allowDowngrade' field in the request to the server.
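
A rough sketch of the split of responsibilities described above, using hypothetical names (not the actual controller code): the request merely carries the caller's intent via allowDowngrade, while the controller consults its own hard-coded rules before accepting the downgrade.

    import java.util.Map;

    public class DowngradeDecision {
        // Hypothetical controller-side rules: features whose max version level must never go
        // below a certain floor, regardless of what the request asks for.
        private static final Map<String, Integer> MIN_SAFE_LEVEL = Map.of("group_coordinator", 2);

        static boolean mayDowngrade(String feature, int currentLevel, int requestedLevel,
                                    boolean allowDowngradeRequested) {
            if (requestedLevel >= currentLevel) return true;   // not a downgrade at all
            if (!allowDowngradeRequested) return false;        // caller did not ask for one
            int floor = MIN_SAFE_LEVEL.getOrDefault(feature, 1);
            return requestedLevel >= floor;                    // controller has the final say
        }
    }
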
> > > >
> > > > > 201. UpdateFeaturesResponse has the following top level fields. Should
> > > > > those fields be per feature?
> > > > >
> > > > >   "fields": [
> > > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> > > > >       "about": "The error code, or 0 if there was no error." },
> > > > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
> > > > >       "about": "The error message, or null if there was no error." }
> > > > >   ]
> > > >
> > > > (Kowshik): Great question!
> > > > As such, the API is transactional, as explained in the sections linked below.
> > > > Either all provided FeatureUpdates are applied, or none.
> > > > It's the reason I felt we can have just one error code + message.
> > > > Happy to extend this if you feel otherwise. Please let me know.
> > > >
> > > > Link to sections:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees
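
To illustrate the "all or none" behaviour that motivates a single top-level error code, a minimal Java sketch with hypothetical types (not the actual KIP-584 code): every FeatureUpdate is validated first, and the new state is swapped in only if all of them pass.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class AtomicFeatureUpdate {
        record FeatureUpdate(String feature, int maxVersionLevel, boolean allowDowngrade) {}

        // Applies either all updates or none, which is why one error code/message suffices.
        static Map<String, Integer> applyAll(Map<String, Integer> current, List<FeatureUpdate> updates) {
            Map<String, Integer> next = new HashMap<>(current);
            for (FeatureUpdate u : updates) {
                int existing = next.getOrDefault(u.feature(), 0);
                if (u.maxVersionLevel() < existing && !u.allowDowngrade()) {
                    throw new IllegalStateException("Rejecting entire request: invalid update for " + u.feature());
                }
                next.put(u.feature(), u.maxVersionLevel());
            }
            return next;   // only returned after every update has been validated
        }
    }
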
> > > >
> > > > > 202. The /features path in ZK has a field min_version_level. Which API and
> > > > > tool can change that value?
> > > >
> > > > (Kowshik): Great question! Currently this cannot be modified by using the
> > > > API or the tool.
> > > > Feature version deprecation (by raising min_version_level) can be done only
> > > > by the Controller directly. The rationale is explained in this section:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation
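
For orientation, the finalized-features data being discussed has roughly this per-feature min/max shape; the sketch below is a hypothetical in-memory mirror only (the authoritative schema of the '/features' ZK node is in the KIP).

    import java.util.Map;

    public class FinalizedFeaturesSnapshot {
        // Hypothetical mirror of the '/features' node: per-feature finalized version ranges,
        // plus an epoch-like number to tell newer snapshots from older ones.
        record FinalizedRange(int minVersionLevel, int maxVersionLevel) {}

        record Snapshot(long metadataEpoch, Map<String, FinalizedRange> features) {}

        public static void main(String[] args) {
            Snapshot snapshot = new Snapshot(7L, Map.of(
                "group_coordinator", new FinalizedRange(1, 10),       // only the controller may raise min_version_level
                "transaction_coordinator", new FinalizedRange(1, 3)));
            System.out.println(snapshot);
        }
    }
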
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Kowshik,
> > > > >
> > > > > Thanks for addressing those comments. Just a few more minor comments.
> > > > >
> > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It seems
> > > > > that field needs to be persisted somewhere in ZK?
> > > > >
> > > > > 201. UpdateFeaturesResponse has the following top level fields. Should
> > > > > those fields be per feature?
> > > > >
> > > > >   "fields": [
> > > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> > > > >       "about": "The error code, or 0 if there was no error." },
> > > > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
> > > > >       "about": "The error message, or null if there was no error." }
> > > > >   ]
> > > > >
> > > > > 202. The /features path in ZK has a field min_version_level. Which API and
> > > > > tool can change that value?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <
> > > kprakasam@confluent.io
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for the feedback! I have updated the KIP-584 addressing
> your
> > > > > > comments.
> > > > > > Please find my response below.
> > > > > >
> > > > > > > 100.6 You can look for the sentence "This operation requires ALTER on
> > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > > > KafkaApis.authorize().
> > > > > >
> > > > > > (Kowshik): Done. Great point! For the newly introduced UPDATE_FEATURES api,
> > > > > > I have added a requirement that AclOperation.ALTER is required on
> > > > > > ResourceType.CLUSTER.
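
A minimal sketch of the kind of gate this implies, with purely hypothetical types (not the actual KafkaApis/authorizer code): the new API is served only if the caller holds ALTER on the CLUSTER resource.

    public class UpdateFeaturesAuthorization {
        enum AclOperation { ALTER, DESCRIBE }
        enum ResourceType { CLUSTER, TOPIC }

        interface Authorizer {    // hypothetical stand-in for the broker's authorizer
            boolean authorize(String principal, AclOperation op, ResourceType resource);
        }

        // Gate a hypothetical UPDATE_FEATURES handler on ALTER/CLUSTER.
        static void handleUpdateFeatures(Authorizer authorizer, String principal, Runnable apply) {
            if (!authorizer.authorize(principal, AclOperation.ALTER, ResourceType.CLUSTER)) {
                throw new SecurityException("CLUSTER_AUTHORIZATION_FAILED");
            }
            apply.run();
        }
    }
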
> > > > > >
> > > > > > > 110. Keeping the feature version as int is probably fine. I just felt that
> > > > > > > for some of the common user interactions, it's more convenient to
> > > > > > > relate that to a release version. For example, if a user wants to downgrade
> > > > > > > to a release 2.5, it's easier for the user to use the tool like "tool
> > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X --version 6".
> > > > > >
> > > > > > (Kowshik): Great point. Generally, maximum feature version levels are not
> > > > > > downgradable after they are finalized in the cluster. This is because, as a
> > > > > > guideline, bumping a feature version level is usually used mainly to convey
> > > > > > important breaking changes.
> > > > > > Despite the above, there may be some extreme/rare cases where a user wants
> > > > > > to downgrade all features to a specific previous release. The user may want
> > > > > > to do this just prior to rolling back a Kafka cluster to a previous release.
> > > > > >
> > > > > > To support the above, I have made a change to the KIP explaining that the
> > > > > > CLI tool is versioned.
> > > > > > The CLI tool internally has knowledge about a map of features to their
> > > > > > respective max versions supported by the Broker. The tool's knowledge of
> > > > > > features and their version values is limited to the version of the CLI tool
> > > > > > itself, i.e. the information is packaged into the CLI tool when it is released.
> > > > > > Whenever a Kafka release introduces a new feature version, or modifies an
> > > > > > existing feature version, the CLI tool shall also be updated with this
> > > > > > information. Newer versions of the CLI tool will be released as part of the
> > > > > > Kafka releases.
> > > > > >
> > > > > > Therefore, to achieve the downgrade need, the user just needs to run the
> > > > > > version of the CLI tool that's part of the particular previous release that
> > > > > > he/she is downgrading to.
> > > > > > To help the user with this, there is a new command added to the CLI tool
> > > > > > called `downgrade-all`.
> > > > > > This essentially downgrades max version levels of all features in the
> > > > > > cluster to the versions known to the CLI tool internally.
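
A sketch of the "versioned tool" idea above, with hypothetical names (not the actual kafka-features.sh implementation): each release of the tool ships with that release's feature levels baked in, and 'downgrade-all' simply targets those levels.

    import java.util.Map;

    public class VersionedFeatureTool {
        // Hypothetical: the max version levels known to *this* release of the CLI tool.
        private static final Map<String, Integer> LEVELS_IN_THIS_RELEASE =
            Map.of("group_coordinator", 4, "transaction_coordinator", 2);

        interface FeatureAdmin {    // hypothetical client used by the tool
            void updateFeature(String feature, int maxVersionLevel, boolean allowDowngrade);
        }

        // 'downgrade-all' asks the controller to move every feature to the levels above.
        static void downgradeAll(FeatureAdmin admin) {
            LEVELS_IN_THIS_RELEASE.forEach((feature, level) ->
                admin.updateFeature(feature, level, /* allowDowngrade = */ true));
        }
    }
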
> > > > > >
> > > > > > I have explained the above in the KIP under these sections:
> > > > > >
> > > > > > Tooling support (have explained that the CLI tool is versioned):
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> > > > > >
> > > > > > Regular CLI tool usage (please refer to point #3, and see the tooling example):
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> > > > > >
> > > > > > > 110. Similarly, if the client library finds a feature mismatch with the broker,
> > > > > > > the client likely needs to log some error message for the user to take some
> > > > > > > actions. It's much more actionable if the error message is "upgrade the
> > > > > > > broker to release version 2.6" than just "upgrade the broker to feature
> > > > > > > version 7".
> > > > > >
> > > > > > (Kowshik): That's a really good point! If we use ints for feature versions,
> > > > > > the best message that the client can print for debugging is "broker doesn't
> > > > > > support feature version 7", and alongside that print the supported version
> > > > > > range returned by the broker. Then, does it sound reasonable that the user
> > > > > > could then reference the Kafka release logs to figure out which version of
> > > > > > the broker release is required to be deployed, to support feature version 7?
> > > > > > I couldn't think of a better strategy here.
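
A small sketch of the client-side logging being weighed here, with a purely hypothetical mapping (in practice the user would consult release notes): translating "feature version 7 unsupported" into a minimum broker release.

    import java.util.Map;
    import java.util.TreeMap;

    public class FeatureMismatchMessage {
        // Hypothetical: lowest broker release known to support each version of a feature.
        private static final TreeMap<Integer, String> GROUP_COORDINATOR_RELEASES =
            new TreeMap<>(Map.of(6, "2.5", 7, "2.6"));

        static String errorFor(int requiredVersion, int brokerMaxVersion) {
            if (brokerMaxVersion >= requiredVersion) return "compatible";
            String release = GROUP_COORDINATOR_RELEASES.getOrDefault(requiredVersion, "a newer release");
            return "Broker supports feature version " + brokerMaxVersion
                + " but version " + requiredVersion + " is required; upgrade the broker to release "
                + release + ".";
        }
    }
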
> > > > > >
> > > > > > > 120. When should a developer bump up the version of a feature?
> > > > > >
> > > > > > (Kowshik): Great question! In the KIP, I have added a section:
> > > > > 'Guidelines
> > > > > > on feature versions and workflows'
> > > > > > providing some guidelines on when to use the versioned feature
> > flags,
> > > > and
> > > > > > what
> > > > > > are the regular workflows with the CLI tool.
> > > > > >
> > > > > > Link to the relevant sections:
> > > > > > Guidelines:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> > > > > >
> > > > > > Regular CLI tool usage:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> > > > > >
> > > > > > Advanced CLI tool usage:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > > >
> > > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <ju...@confluent.io>
> wrote:
> > > > > >
> > > > > > > Hi, Kowshik,
> > > > > > >
> > > > > > > Thanks for the reply. A few more comments.
> > > > > > >
> > > > > > > 110. Keeping the feature version as int is probably fine. I
> just
> > > felt
> > > > > > that
> > > > > > > for some of the common user interactions, it's more convenient
> to
> > > > > > > relate that to a release version. For example, if a user wants
> to
> > > > > > downgrade
> > > > > > > to a release 2.5, it's easier for the user to use the tool like
> > > "tool
> > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X
> > --version
> > > > 6".
> > > > > > > Similarly, if the client library finds a feature mismatch with
> > the
> > > > > > broker,
> > > > > > > the client likely needs to log some error message for the user
> to
> > > > take
> > > > > > some
> > > > > > > actions. It's much more actionable if the error message is
> > "upgrade
> > > > the
> > > > > > > broker to release version 2.6" than just "upgrade the broker to
> > > > feature
> > > > > > > version 7".
> > > > > > >
> > > > > > > 111. Sounds good.
> > > > > > >
> > > > > > > 120. When should a developer bump up the version of a feature?
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <
> > > > > kprakasam@confluent.io
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > I have updated the KIP for the item 111.
> > > > > > > > I'm in the process of addressing 100.6, and will provide an
> > > update
> > > > > > soon.
> > > > > > > > I think item 110 is still under discussion given we are now
> > > > > providing a
> > > > > > > way
> > > > > > > > to finalize
> > > > > > > > all features to their latest version levels. In any case,
> > please
> > > > let
> > > > > us
> > > > > > > > know
> > > > > > > > how you feel in response to Colin's comments on this topic.
> > > > > > > >
> > > > > > > > > 111. To put this in context, when we had IBP, the default
> > value
> > > > is
> > > > > > the
> > > > > > > > > current released version. So, if you are a brand new user,
> > you
> > > > > don't
> > > > > > > need
> > > > > > > > > to configure IBP and all new features will be immediately
> > > > available
> > > > > > in
> > > > > > > > the
> > > > > > > > > new cluster. If you are upgrading from an old version, you
> do
> > > > need
> > > > > to
> > > > > > > > > understand and configure IBP. I see a similar pattern here
> > for
> > > > > > > > > features. From the ease of use perspective, ideally, we
> > > shouldn't
> > > > > > > require
> > > > > > > > a
> > > > > > > > > new user to have an extra step such as running a bootstrap
> > > script
> > > > > > > unless
> > > > > > > > > it's truly necessary. If someone has a special need (all
> the
> > > > cases
> > > > > > you
> > > > > > > > > mentioned seem special cases?), they can configure a mode
> > such
> > > > that
> > > > > > > > > features are enabled/disabled manually.
> > > > > > > >
> > > > > > > > (Kowshik): That makes sense, thanks for the idea! Sorry if I didn't understand
> > > > > > > > this need earlier. I have updated the KIP with the approach that whenever
> > > > > > > > the '/features' node is absent, the controller by default will bootstrap
> > > > > > > > the node to contain the latest feature levels. Here is the new section in
> > > > > > > > the KIP describing the same:
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
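
A rough sketch of the bootstrap behaviour described above, with hypothetical helper names (not the actual controller code): if the '/features' node is absent, the controller creates it with the latest feature levels it supports, so new clusters get everything enabled by default.

    import java.util.Map;
    import java.util.Optional;

    public class FeaturesNodeBootstrap {
        interface FeaturesStore {    // hypothetical wrapper around the ZK client
            Optional<Map<String, Integer>> readFinalized();
            void createFinalized(Map<String, Integer> defaults);
        }

        // Latest max version levels supported by this controller's binary.
        private static final Map<String, Integer> LATEST_SUPPORTED = Map.of("group_coordinator", 4);

        static void maybeBootstrap(FeaturesStore store) {
            if (store.readFinalized().isEmpty()) {
                store.createFinalized(LATEST_SUPPORTED);   // bootstrap only when the node is absent
            }
        }
    }
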
> > > > > > > >
> > > > > > > > Next, as I explained in my response to Colin's suggestions, we are now
> > > > > > > > providing a `--finalize-latest-features` flag with the tooling. This lets
> > > > > > > > the sysadmin finalize all features known to the controller to their latest
> > > > > > > > version levels. Please look at this section (point #3 and the tooling example
> > > > > > > > later):
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> > > > > > > >
> > > > > > > >
> > > > > > > > Do you feel this addresses your comment/concern?
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Kowshik
> > > > > > > >
> > > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io>
> > > wrote:
> > > > > > > >
> > > > > > > > > Hi, Kowshik,
> > > > > > > > >
> > > > > > > > > Thanks for the reply. A few more replies below.
> > > > > > > > >
> > > > > > > > > 100.6 You can look for the sentence "This operation
> requires
> > > > ALTER
> > > > > on
> > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > > > > > KafkaApis.authorize().
> > > > > > > > >
> > > > > > > > > 110. From the external client/tooling perspective, it's
> more
> > > > > natural
> > > > > > to
> > > > > > > > use
> > > > > > > > > the release version for features. If we can use the same
> > > release
> > > > > > > version
> > > > > > > > > for internal representation, it seems simpler (easier to
> > > > > understand,
> > > > > > no
> > > > > > > > > mapping overhead, etc). Is there a benefit with separate
> > > external
> > > > > and
> > > > > > > > > internal versioning schemes?
> > > > > > > > >
> > > > > > > > > 111. To put this in context, when we had IBP, the default
> > value
> > > > is
> > > > > > the
> > > > > > > > > current released version. So, if you are a brand new user,
> > you
> > > > > don't
> > > > > > > need
> > > > > > > > > to configure IBP and all new features will be immediately
> > > > available
> > > > > > in
> > > > > > > > the
> > > > > > > > > new cluster. If you are upgrading from an old version, you
> do
> > > > need
> > > > > to
> > > > > > > > > understand and configure IBP. I see a similar pattern here
> > for
> > > > > > > > > features. From the ease of use perspective, ideally, we
> > > shouldn't
> > > > > > > > require a
> > > > > > > > > new user to have an extra step such as running a bootstrap
> > > script
> > > > > > > unless
> > > > > > > > > it's truly necessary. If someone has a special need (all
> the
> > > > cases
> > > > > > you
> > > > > > > > > mentioned seem special cases?), they can configure a mode
> > such
> > > > that
> > > > > > > > > features are enabled/disabled manually.
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
> > > > > > > kprakasam@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Jun,
> > > > > > > > > >
> > > > > > > > > > Thanks for the feedback and suggestions. Please find my
> > > > response
> > > > > > > below.
> > > > > > > > > >
> > > > > > > > > > > 100.6 For every new request, the admin needs to control
> > who
> > > > is
> > > > > > > > allowed
> > > > > > > > > to
> > > > > > > > > > > issue that request if security is enabled. So, we need
> to
> > > > > assign
> > > > > > > the
> > > > > > > > > new
> > > > > > > > > > > request a ResourceType and possible AclOperations. See
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > > > > as an example.
> > > > > > > > > >
> > > > > > > > > > (Kowshik): I don't see any reference to the words
> > > ResourceType
> > > > or
> > > > > > > > > > AclOperations
> > > > > > > > > > in the KIP. Please let me know how I can use the KIP that
> > you
> > > > > > linked
> > > > > > > to
> > > > > > > > > > know how to
> > > > > > > > > > setup the appropriate ResourceType and/or
> ClusterOperation?
> > > > > > > > > >
> > > > > > > > > > > 105. If we change delete to disable, it's better to do
> > this
> > > > > > > > > consistently
> > > > > > > > > > in
> > > > > > > > > > > request protocol and admin api as well.
> > > > > > > > > >
> > > > > > > > > > (Kowshik): The API shouldn't be called 'disable' when it
> is
> > > > > > deleting
> > > > > > > a
> > > > > > > > > > feature.
> > > > > > > > > > I've just changed the KIP to use 'delete'. I don't have a
> > > > strong
> > > > > > > > > > preference.
> > > > > > > > > >
> > > > > > > > > > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > > > > > > > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> > > > > > > > > > > for new features to be included in minor releases too. Should we make the
> > > > > > > > > > > feature versioning match the release versioning?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): The release version can be mapped to a set of feature versions,
> > > > > > > > > > and this can be done, for example, in the tool (or even external to the tool).
> > > > > > > > > > Can you please clarify what I'm missing?
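
To illustrate the kind of external mapping being suggested, a hedged Java sketch (the releases and feature levels shown are made-up placeholders; the real mapping would live in the tool or in release notes):

    import java.util.Map;

    public class ReleaseToFeatureMapping {
        // Hypothetical mapping maintained outside the broker: release -> finalized max version levels.
        private static final Map<String, Map<String, Integer>> RELEASE_FEATURES = Map.of(
            "2.5", Map.of("group_coordinator", 3),
            "2.6", Map.of("group_coordinator", 4, "transaction_coordinator", 1));

        static Map<String, Integer> featureLevelsFor(String releaseVersion) {
            return RELEASE_FEATURES.getOrDefault(releaseVersion, Map.of());
        }
    }
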
> > > > > > > > > >
> > > > > > > > > > > 111. "During regular operations, the data in the ZK
> node
> > > can
> > > > be
> > > > > > > > mutated
> > > > > > > > > > > only via a specific admin API served only by the
> > > > controller." I
> > > > > > am
> > > > > > > > > > > wondering why can't the controller auto finalize a
> > feature
> > > > > > version
> > > > > > > > > after
> > > > > > > > > > > all brokers are upgraded? For new users who download
> the
> > > > latest
> > > > > > > > version
> > > > > > > > > > to
> > > > > > > > > > > build a new cluster, it's inconvenient for them to have
> > to
> > > > > > manually
> > > > > > > > > > enable
> > > > > > > > > > > each feature.
> > > > > > > > > >
> > > > > > > > > > (Kowshik): I agree that there is a trade-off here, but it
> > > will
> > > > > help
> > > > > > > > > > to decide whether the automation can be thought through
> in
> > > the
> > > > > > future
> > > > > > > > > > in a follow up KIP, or right now in this KIP. We may
> invest
> > > > > > > > > > in automation, but we have to decide whether we should do
> > it
> > > > > > > > > > now or later.
> > > > > > > > > >
> > > > > > > > > > For the inconvenience that you mentioned, do you think
> the
> > > > > problem
> > > > > > > that
> > > > > > > > > you
> > > > > > > > > > mentioned can be  overcome by asking for the cluster
> > operator
> > > > to
> > > > > > run
> > > > > > > a
> > > > > > > > > > bootstrap script  when he/she knows that a specific AK
> > > release
> > > > > has
> > > > > > > been
> > > > > > > > > > almost completely deployed in a cluster for the first
> time?
> > > > Idea
> > > > > is
> > > > > > > > that
> > > > > > > > > > the
> > > > > > > > > > bootstrap script will know how to map a specific AK
> release
> > > to
> > > > > > > > finalized
> > > > > > > > > > feature versions, and run the `kafka-features.sh` tool
> > > > > > appropriately
> > > > > > > > > > against
> > > > > > > > > > the cluster.
> > > > > > > > > >
> > > > > > > > > > Now, coming back to your automation proposal/question.
> > > > > > > > > > I do see the value of automated feature version
> > finalization,
> > > > > but I
> > > > > > > > also
> > > > > > > > > > see
> > > > > > > > > > that this will open up several questions and some risks,
> as
> > > > > > explained
> > > > > > > > > > below.
> > > > > > > > > > The answers to these depend on the definition of the
> > > automation
> > > > > we
> > > > > > > > choose
> > > > > > > > > > to build, and how well does it fit into a kafka
> deployment.
> > > > > > > > > > Basically, it can be unsafe for the controller to
> finalize
> > > > > feature
> > > > > > > > > version
> > > > > > > > > > upgrades automatically, without learning about the intent
> > of
> > > > the
> > > > > > > > cluster
> > > > > > > > > > operator.
> > > > > > > > > > 1. We would sometimes want to lock feature versions only
> > when
> > > > we
> > > > > > have
> > > > > > > > > > externally verified
> > > > > > > > > > the stability of the broker binary.
> > > > > > > > > > 2. Sometimes only the cluster operator knows that a
> cluster
> > > > > upgrade
> > > > > > > is
> > > > > > > > > > complete,
> > > > > > > > > > and new brokers are highly unlikely to join the cluster.
> > > > > > > > > > 3. Only the cluster operator knows that the intent is to
> > > deploy
> > > > > the
> > > > > > > > same
> > > > > > > > > > version
> > > > > > > > > > of the new broker release across the entire cluster (i.e.
> > the
> > > > > > latest
> > > > > > > > > > downloaded version).
> > > > > > > > > > 4. For downgrades, it appears the controller still needs
> > some
> > > > > > > external
> > > > > > > > > > input
> > > > > > > > > > (such as the proposed tool) to finalize a feature version
> > > > > > downgrade.
> > > > > > > > > >
> > > > > > > > > > If we have automation, that automation can end up failing
> > in
> > > > some
> > > > > > of
> > > > > > > > the
> > > > > > > > > > cases
> > > > > > > > > > above. Then, we need a way to declare that the cluster is
> > > "not
> > > > > > ready"
> > > > > > > > if
> > > > > > > > > > the
> > > > > > > > > > controller cannot automatically finalize some basic
> > required
> > > > > > feature
> > > > > > > > > > version
> > > > > > > > > > upgrades across the cluster. We need to make the cluster
> > > > operator
> > > > > > > aware
> > > > > > > > > in
> > > > > > > > > > such a scenario (raise an alert or alike).
> > > > > > > > > >
> > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should
> > be
> > > 49
> > > > > > > instead
> > > > > > > > > of
> > > > > > > > > > 48.
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Done.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Kowshik
> > > > > > > > > >
> > > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <
> jun@confluent.io>
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi, Kowshik,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the reply. A few more comments below.
> > > > > > > > > > >
> > > > > > > > > > > 100.6 For every new request, the admin needs to control
> > who
> > > > is
> > > > > > > > allowed
> > > > > > > > > to
> > > > > > > > > > > issue that request if security is enabled. So, we need
> to
> > > > > assign
> > > > > > > the
> > > > > > > > > new
> > > > > > > > > > > request a ResourceType and possible AclOperations. See
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > > > > as
> > > > > > > > > > > an example.
> > > > > > > > > > >
> > > > > > > > > > > 105. If we change delete to disable, it's better to do
> > this
> > > > > > > > > consistently
> > > > > > > > > > in
> > > > > > > > > > > request protocol and admin api as well.
> > > > > > > > > > >
> > > > > > > > > > > 110. The minVersion/maxVersion for features use int64.
> > > > > Currently,
> > > > > > > our
> > > > > > > > > > > release version schema is major.minor.bugfix (e.g.
> > 2.5.0).
> > > > It's
> > > > > > > > > possible
> > > > > > > > > > > for new features to be included in minor releases too.
> > > Should
> > > > > we
> > > > > > > make
> > > > > > > > > the
> > > > > > > > > > > feature versioning match the release versioning?
> > > > > > > > > > >
> > > > > > > > > > > 111. "During regular operations, the data in the ZK
> node
> > > can
> > > > be
> > > > > > > > mutated
> > > > > > > > > > > only via a specific admin API served only by the
> > > > controller." I
> > > > > > am
> > > > > > > > > > > wondering why can't the controller auto finalize a
> > feature
> > > > > > version
> > > > > > > > > after
> > > > > > > > > > > all brokers are upgraded? For new users who download
> the
> > > > latest
> > > > > > > > version
> > > > > > > > > > to
> > > > > > > > > > > build a new cluster, it's inconvenient for them to have
> > to
> > > > > > manually
> > > > > > > > > > enable
> > > > > > > > > > > each feature.
> > > > > > > > > > >
> > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should
> > be
> > > 49
> > > > > > > instead
> > > > > > > > > of
> > > > > > > > > > > 48.
> > > > > > > > > > >
> > > > > > > > > > > Jun
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> > > > > > > > > kprakasam@confluent.io>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey Jun,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks a lot for the great feedback! Please note that
> > the
> > > > > > design
> > > > > > > > > > > > has changed a little bit on the KIP, and we now
> > propagate
> > > > the
> > > > > > > > > finalized
> > > > > > > > > > > > features metadata only via ZK watches (instead of
> > > > > > > > > UpdateMetadataRequest
> > > > > > > > > > > > from the controller).
> > > > > > > > > > > >
> > > > > > > > > > > > Please find below my response to your
> > questions/feedback,
> > > > > with
> > > > > > > the
> > > > > > > > > > prefix
> > > > > > > > > > > > "(Kowshik):".
> > > > > > > > > > > >
> > > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > > > > > 100.1 Since this request waits for responses from
> > > > brokers,
> > > > > > > should
> > > > > > > > > we
> > > > > > > > > > > add
> > > > > > > > > > > > a
> > > > > > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great point! Done. I have added a timeout
> > > field.
> > > > > > Note:
> > > > > > > > we
> > > > > > > > > no
> > > > > > > > > > > > longer
> > > > > > > > > > > > wait for responses from brokers, since the design has
> > > been
> > > > > > > changed
> > > > > > > > so
> > > > > > > > > > > that
> > > > > > > > > > > > the
> > > > > > > > > > > > features information is propagated via ZK.
> > Nevertheless,
> > > it
> > > > > is
> > > > > > > > right
> > > > > > > > > to
> > > > > > > > > > > > have a timeout
> > > > > > > > > > > > for the request.
> > > > > > > > > > > >
> > > > > > > > > > > > > 100.2 The response schema is a bit weird.
> Typically,
> > > the
> > > > > > > response
> > > > > > > > > > just
> > > > > > > > > > > > > shows an error code and an error message, instead
> of
> > > > > echoing
> > > > > > > the
> > > > > > > > > > > request.
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified it to
> > just
> > > > > return
> > > > > > > an
> > > > > > > > > > error
> > > > > > > > > > > > code and a message.
> > > > > > > > > > > > Previously it was not echoing the "request", rather
> it
> > > was
> > > > > > > > returning
> > > > > > > > > > the
> > > > > > > > > > > > latest set of
> > > > > > > > > > > > cluster-wide finalized features (after applying the
> > > > updates).
> > > > > > But
> > > > > > > > you
> > > > > > > > > > are
> > > > > > > > > > > > right,
> > > > > > > > > > > > the additional info is not required, so I have
> removed
> > it
> > > > > from
> > > > > > > the
> > > > > > > > > > > response
> > > > > > > > > > > > schema.
> > > > > > > > > > > >
> > > > > > > > > > > > > 100.3 Should we add a separate request to
> > list/describe
> > > > the
> > > > > > > > > existing
> > > > > > > > > > > > > features?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): This is already present in the KIP via the
> > > > > > > > > > 'DescribeFeatures'
> > > > > > > > > > > > Admin API,
> > > > > > > > > > > > which, under the covers, uses the ApiVersionsRequest
> to
> > > > > > > > list/describe
> > > > > > > > > > the
> > > > > > > > > > > > existing features. Please read the 'Tooling support'
> > > > section.
> > > > > > > > > > > >
> > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a
> > > single
> > > > > > > request.
> > > > > > > > > For
> > > > > > > > > > > > > DELETE, the version field doesn't make sense. So, I
> > > guess
> > > > > the
> > > > > > > > > broker
> > > > > > > > > > > just
> > > > > > > > > > > > > ignores this? An alternative way is to have a
> > separate
> > > > > > > > > > > > DeleteFeaturesRequest
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP now
> to
> > > > have 2
> > > > > > > > > separate
> > > > > > > > > > > > controller APIs
> > > > > > > > > > > > serving these different purposes:
> > > > > > > > > > > > 1. updateFeatures
> > > > > > > > > > > > 2. deleteFeatures
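> > > > > > > > > > > >
> > > > > > > > > > > > Just to illustrate the intended usage (a rough sketch only; the method
> > > > > > > > > > > > names below are my own placeholders, not the final Admin API signatures):
> > > > > > > > > > > >
> > > > > > > > > > > >     // Hypothetical Java usage of the two controller-facing Admin calls.
> > > > > > > > > > > >     admin.updateFeatures(Map.of("group_coordinator", 2L));
> > > > > > > > > > > >     admin.deleteFeatures(Set.of("some_retired_feature"));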
> > > > > > > > > > > >
> > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
> > > > monotonically
> > > > > > > > > > increasing
> > > > > > > > > > > > > version of the metadata for finalized features." I
> am
> > > > > > wondering
> > > > > > > > why
> > > > > > > > > > the
> > > > > > > > > > > > > ordering is important?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): In the latest KIP write-up, it is called
> > epoch
> > > > > > > (instead
> > > > > > > > of
> > > > > > > > > > > > version), and
> > > > > > > > > > > > it is just the ZK node version. Basically, this is
> the
> > > > epoch
> > > > > > for
> > > > > > > > the
> > > > > > > > > > > > cluster-wide
> > > > > > > > > > > > finalized feature version metadata. This metadata is
> > > served
> > > > > to
> > > > > > > > > clients
> > > > > > > > > > > via
> > > > > > > > > > > > the
> > > > > > > > > > > > ApiVersionsResponse (for reads). We propagate updates
> > > from
> > > > > the
> > > > > > > > > > > '/features'
> > > > > > > > > > > > ZK node
> > > > > > > > > > > > to all brokers, via ZK watches setup by each broker
> on
> > > the
> > > > > > > > > '/features'
> > > > > > > > > > > > node.
> > > > > > > > > > > >
> > > > > > > > > > > > Now here is why the ordering is important:
> > > > > > > > > > > > ZK watches don't propagate at the same time. As a
> > result,
> > > > the
> > > > > > > > > > > > ApiVersionsResponse
> > > > > > > > > > > > is eventually consistent across brokers. This can
> > > introduce
> > > > > > cases
> > > > > > > > > > > > where clients see an older lower epoch of the
> features
> > > > > > metadata,
> > > > > > > > > after
> > > > > > > > > > a
> > > > > > > > > > > > more recent
> > > > > > > > > > > > higher epoch was returned at a previous point in
> time.
> > We
> > > > > > expect
> > > > > > > > > > clients
> > > > > > > > > > > > to always employ the rule that the latest received
> > higher
> > > > > epoch
> > > > > > > of
> > > > > > > > > > > metadata
> > > > > > > > > > > > always trumps an older smaller epoch. Those clients
> > that
> > > > are
> > > > > > > > external
> > > > > > > > > > to
> > > > > > > > > > > > Kafka should strongly consider discovering the latest
> > > > > metadata
> > > > > > > once
> > > > > > > > > > > during
> > > > > > > > > > > > startup from the brokers, and if required refresh the
> > > > > metadata
> > > > > > > > > > > periodically
> > > > > > > > > > > > (to get the latest metadata).
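> > > > > > > > > > > >
> > > > > > > > > > > > A minimal sketch of that client-side rule (illustrative only; the field and
> > > > > > > > > > > > method names below are assumptions of mine, not part of the KIP):
> > > > > > > > > > > >
> > > > > > > > > > > >     // Keep whichever finalized-features snapshot carries the higher epoch;
> > > > > > > > > > > >     // snapshots from brokers lagging on the ZK watch are simply ignored.
> > > > > > > > > > > >     void maybeUpdate(long incomingEpoch, Map<String, Long> incomingFinalized) {
> > > > > > > > > > > >         if (incomingEpoch > currentEpoch) {
> > > > > > > > > > > >             currentEpoch = incomingEpoch;
> > > > > > > > > > > >             currentFinalized = incomingFinalized;
> > > > > > > > > > > >         }
> > > > > > > > > > > >     }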
> > > > > > > > > > > >
> > > > > > > > > > > > > 100.6 Could you specify the required ACL for this
> new
> > > > > > request?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): What is ACL, and how could I find out
> which
> > > one
> > > > to
> > > > > > > > > specify?
> > > > > > > > > > > > Please could you provide me some pointers? I'll be
> glad
> > > to
> > > > > > update
> > > > > > > > the
> > > > > > > > > > > > KIP once I know the next steps.
> > > > > > > > > > > >
> > > > > > > > > > > > > 101. For the broker registration ZK node, should we
> > > bump
> > > > up
> > > > > > the
> > > > > > > > > > version
> > > > > > > > > > > > in
> > > > > > > > > > > > the json?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great point! Done. I've increased the
> > version
> > > in
> > > > > the
> > > > > > > > > broker
> > > > > > > > > > > json
> > > > > > > > > > > > by 1.
> > > > > > > > > > > >
> > > > > > > > > > > > > 102. For the /features ZK node, not sure if we need
> > the
> > > > > epoch
> > > > > > > > > field.
> > > > > > > > > > > Each
> > > > > > > > > > > > > ZK node has an internal version field that is
> > > incremented
> > > > > on
> > > > > > > > every
> > > > > > > > > > > > update.
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK node
> > > version
> > > > > > now,
> > > > > > > > > > instead
> > > > > > > > > > > of
> > > > > > > > > > > > explicitly
> > > > > > > > > > > > incremented epoch.
> > > > > > > > > > > >
> > > > > > > > > > > > > 103. "Enabling the actual semantics of a feature
> > > version
> > > > > > > > > cluster-wide
> > > > > > > > > > > is
> > > > > > > > > > > > > left to the discretion of the logic implementing
> the
> > > > > feature
> > > > > > > (ex:
> > > > > > > > > can
> > > > > > > > > > > be
> > > > > > > > > > > > > done via dynamic broker config)." Does that mean
> the
> > > > broker
> > > > > > > > > > > registration
> > > > > > > > > > > > ZK
> > > > > > > > > > > > > node will be updated dynamically when this happens?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Not really. The text was just conveying
> > that a
> > > > > > broker
> > > > > > > > > could
> > > > > > > > > > > > "know" of
> > > > > > > > > > > > a new feature version, but it does not mean the
> broker
> > > > should
> > > > > > > have
> > > > > > > > > also
> > > > > > > > > > > > activated the effects of the feature version. Knowing
> > vs
> > > > > > > activation
> > > > > > > > > > are 2
> > > > > > > > > > > > separate things,
> > > > > > > > > > > > and the latter can be achieved by dynamic config. I
> > have
> > > > > > reworded
> > > > > > > > the
> > > > > > > > > > > text
> > > > > > > > > > > > to
> > > > > > > > > > > > make this clear to the reader.
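> > > > > > > > > > > >
> > > > > > > > > > > > For example (purely illustrative; the config name below is hypothetical and
> > > > > > > > > > > > not defined by this KIP), activation could be a dynamic broker config that
> > > > > > > > > > > > the operator flips after the feature version has been finalized:
> > > > > > > > > > > >
> > > > > > > > > > > >     ./bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
> > > > > > > > > > > >       --entity-type brokers --entity-default \
> > > > > > > > > > > >       --add-config group.coordinator.new.enable=true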
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > > > > > 104.1 It would be useful to describe when the
> feature
> > > > > > metadata
> > > > > > > is
> > > > > > > > > > > > included
> > > > > > > > > > > > > in the request. My understanding is that it's only
> > > > included
> > > > > > if
> > > > > > > > (1)
> > > > > > > > > > > there
> > > > > > > > > > > > is
> > > > > > > > > > > > > a change to the finalized feature; (2) broker
> > restart;
> > > > (3)
> > > > > > > > > controller
> > > > > > > > > > > > > failover.
> > > > > > > > > > > > > 104.2 The new fields have the following versions.
> Why
> > > are
> > > > > the
> > > > > > > > > > versions
> > > > > > > > > > > 3+
> > > > > > > > > > > > > when the top version is bumped to 6?
> > > > > > > > > > > > >       "fields":  [
> > > > > > > > > > > > >         {"name": "Name", "type":  "string",
> > "versions":
> > > > > > "3+",
> > > > > > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > > > > > >         {"name":  "Version", "type":  "int64",
> > > > "versions":
> > > > > > > "3+",
> > > > > > > > > > > > >           "about": "The finalized version for the
> > > > > feature."}
> > > > > > > > > > > > >       ]
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): With the new improved design, we have
> > > completely
> > > > > > > > > eliminated
> > > > > > > > > > > the
> > > > > > > > > > > > need to
> > > > > > > > > > > > use UpdateMetadataRequest. This is because we now
> rely
> > on
> > > > ZK
> > > > > to
> > > > > > > > > deliver
> > > > > > > > > > > the
> > > > > > > > > > > > notifications for changes to the '/features' ZK node.
> > > > > > > > > > > >
> > > > > > > > > > > > > 105. kafka-features.sh: Instead of using
> > update/delete,
> > > > > > perhaps
> > > > > > > > > it's
> > > > > > > > > > > > better
> > > > > > > > > > > > > to use enable/disable?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): For delete, yes, I have changed it so that
> > we
> > > > > > instead
> > > > > > > > call
> > > > > > > > > > it
> > > > > > > > > > > > 'disable'.
> > > > > > > > > > > > However for 'update', it can now also refer to either
> > an
> > > > > > upgrade
> > > > > > > > or a
> > > > > > > > > > > > forced downgrade.
> > > > > > > > > > > > Therefore, I have left it the way it is, just calling
> > it
> > > as
> > > > > > just
> > > > > > > > > > > 'update'.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Kowshik
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <
> > > jun@confluent.io>
> > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi, Kowshik,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the KIP. Looks good overall. A few
> > comments
> > > > > below.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > > > > > 100.1 Since this request waits for responses from
> > > > brokers,
> > > > > > > should
> > > > > > > > > we
> > > > > > > > > > > add
> > > > > > > > > > > > a
> > > > > > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > > > > > > 100.2 The response schema is a bit weird.
> Typically,
> > > the
> > > > > > > response
> > > > > > > > > > just
> > > > > > > > > > > > > shows an error code and an error message, instead
> of
> > > > > echoing
> > > > > > > the
> > > > > > > > > > > request.
> > > > > > > > > > > > > 100.3 Should we add a separate request to
> > list/describe
> > > > the
> > > > > > > > > existing
> > > > > > > > > > > > > features?
> > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a
> > > single
> > > > > > > request.
> > > > > > > > > For
> > > > > > > > > > > > > DELETE, the version field doesn't make sense. So, I
> > > guess
> > > > > the
> > > > > > > > > broker
> > > > > > > > > > > just
> > > > > > > > > > > > > ignores this? An alternative way is to have a
> > separate
> > > > > > > > > > > > > DeleteFeaturesRequest
> > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
> > > > monotonically
> > > > > > > > > > increasing
> > > > > > > > > > > > > version of the metadata for finalized features." I
> am
> > > > > > wondering
> > > > > > > > why
> > > > > > > > > > the
> > > > > > > > > > > > > ordering is important?
> > > > > > > > > > > > > 100.6 Could you specify the required ACL for this
> new
> > > > > > request?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 101. For the broker registration ZK node, should we
> > > bump
> > > > up
> > > > > > the
> > > > > > > > > > version
> > > > > > > > > > > > in
> > > > > > > > > > > > > the json?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 102. For the /features ZK node, not sure if we need
> > the
> > > > > epoch
> > > > > > > > > field.
> > > > > > > > > > > Each
> > > > > > > > > > > > > ZK node has an internal version field that is
> > > incremented
> > > > > on
> > > > > > > > every
> > > > > > > > > > > > update.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 103. "Enabling the actual semantics of a feature
> > > version
> > > > > > > > > cluster-wide
> > > > > > > > > > > is
> > > > > > > > > > > > > left to the discretion of the logic implementing
> the
> > > > > feature
> > > > > > > (ex:
> > > > > > > > > can
> > > > > > > > > > > be
> > > > > > > > > > > > > done via dynamic broker config)." Does that mean
> the
> > > > broker
> > > > > > > > > > > registration
> > > > > > > > > > > > ZK
> > > > > > > > > > > > > node will be updated dynamically when this happens?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > > > > > 104.1 It would be useful to describe when the
> feature
> > > > > > metadata
> > > > > > > is
> > > > > > > > > > > > included
> > > > > > > > > > > > > in the request. My understanding is that it's only
> > > > included
> > > > > > if
> > > > > > > > (1)
> > > > > > > > > > > there
> > > > > > > > > > > > is
> > > > > > > > > > > > > a change to the finalized feature; (2) broker
> > restart;
> > > > (3)
> > > > > > > > > controller
> > > > > > > > > > > > > failover.
> > > > > > > > > > > > > 104.2 The new fields have the following versions.
> Why
> > > are
> > > > > the
> > > > > > > > > > versions
> > > > > > > > > > > 3+
> > > > > > > > > > > > > when the top version is bumped to 6?
> > > > > > > > > > > > >       "fields":  [
> > > > > > > > > > > > >         {"name": "Name", "type":  "string",
> > "versions":
> > > > > > "3+",
> > > > > > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > > > > > >         {"name":  "Version", "type":  "int64",
> > > > "versions":
> > > > > > > "3+",
> > > > > > > > > > > > >           "about": "The finalized version for the
> > > > > feature."}
> > > > > > > > > > > > >       ]
> > > > > > > > > > > > >
> > > > > > > > > > > > > 105. kafka-features.sh: Instead of using
> > update/delete,
> > > > > > perhaps
> > > > > > > > > it's
> > > > > > > > > > > > better
> > > > > > > > > > > > > to use enable/disable?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jun
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > > > > > > > > > kprakasam@confluent.io
> > > > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hey Boyang,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the great feedback! I have updated the
> > KIP
> > > > > based
> > > > > > > on
> > > > > > > > > your
> > > > > > > > > > > > > > feedback.
> > > > > > > > > > > > > > Please find my response below for your comments,
> > look
> > > > for
> > > > > > > > > sentences
> > > > > > > > > > > > > > starting
> > > > > > > > > > > > > > with "(Kowshik)" below.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. "When is it safe for the brokers to begin
> > > handling
> > > > > EOS
> > > > > > > > > > traffic"
> > > > > > > > > > > > > could
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > converted as "When is it safe for the brokers
> to
> > > > start
> > > > > > > > serving
> > > > > > > > > > new
> > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not
> > > > explained
> > > > > > > > earlier
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > context.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. In the *Explanation *section, the metadata
> > > version
> > > > > > > number
> > > > > > > > > part
> > > > > > > > > > > > > seems a
> > > > > > > > > > > > > > > bit blurred. Could you point a reference to
> later
> > > > > section
> > > > > > > > that
> > > > > > > > > we
> > > > > > > > > > > > going
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > store it in Zookeeper and update it every time
> > when
> > > > > there
> > > > > > > is
> > > > > > > > a
> > > > > > > > > > > > feature
> > > > > > > > > > > > > > > change?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): Great point! Done. I've added a
> > reference
> > > in
> > > > > the
> > > > > > > > KIP.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 3. For the feature downgrade, although it's a
> > > > Non-goal
> > > > > of
> > > > > > > the
> > > > > > > > > > KIP,
> > > > > > > > > > > > for
> > > > > > > > > > > > > > > features such as group coordinator semantics,
> > there
> > > > is
> > > > > no
> > > > > > > > legal
> > > > > > > > > > > > > scenario
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > perform a downgrade at all. So having downgrade
> > > door
> > > > > open
> > > > > > > is
> > > > > > > > > > pretty
> > > > > > > > > > > > > > > error-prone as human faults happen all the
> time.
> > > I'm
> > > > > > > assuming
> > > > > > > > > as
> > > > > > > > > > > new
> > > > > > > > > > > > > > > features are implemented, it's not very hard to
> > > add a
> > > > > > flag
> > > > > > > > > during
> > > > > > > > > > > > > feature
> > > > > > > > > > > > > > > creation to indicate whether this feature is
> > > > > > > "downgradable".
> > > > > > > > > > Could
> > > > > > > > > > > > you
> > > > > > > > > > > > > > > explain a bit more on the extra engineering
> > effort
> > > > for
> > > > > > > > shipping
> > > > > > > > > > > this
> > > > > > > > > > > > > KIP
> > > > > > > > > > > > > > > with downgrade protection in place?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): Great point! I'd agree and disagree
> > here.
> > > > > While
> > > > > > I
> > > > > > > > > agree
> > > > > > > > > > > that
> > > > > > > > > > > > > > accidental
> > > > > > > > > > > > > > downgrades can cause problems, I also think
> > sometimes
> > > > > > > > downgrades
> > > > > > > > > > > should
> > > > > > > > > > > > > > be allowed for emergency reasons (not all
> > downgrades
> > > > > cause
> > > > > > > > > issues).
> > > > > > > > > > > > > > It is just subjective to the feature being
> > > downgraded.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > To be more strict about feature version
> > downgrades, I
> > > > > have
> > > > > > > > > modified
> > > > > > > > > > > the
> > > > > > > > > > > > > KIP
> > > > > > > > > > > > > > proposing that we mandate a `--force-downgrade`
> > flag
> > > be
> > > > > > used
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > > > UPDATE_FEATURES api
> > > > > > > > > > > > > > and the tooling, whenever the human is
> downgrading
> > a
> > > > > > > finalized
> > > > > > > > > > > feature
> > > > > > > > > > > > > > version.
> > > > > > > > > > > > > > Hopefully this should cover the requirement,
> until
> > we
> > > > > find
> > > > > > > the
> > > > > > > > > need
> > > > > > > > > > > for
> > > > > > > > > > > > > > advanced downgrade support.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of
> feature
> > > > > > versions
> > > > > > > > will
> > > > > > > > > > be
> > > > > > > > > > > > > > defined
> > > > > > > > > > > > > > > in the broker code." So this means in order to
> > > > > restrict a
> > > > > > > > > certain
> > > > > > > > > > > > > > feature,
> > > > > > > > > > > > > > > we need to start the broker first and then
> send a
> > > > > feature
> > > > > > > > > gating
> > > > > > > > > > > > > request
> > > > > > > > > > > > > > > immediately, which introduces a time gap and
> the
> > > > > > > > > > intended-to-close
> > > > > > > > > > > > > > feature
> > > > > > > > > > > > > > > could actually serve request during this phase.
> > Do
> > > > you
> > > > > > > think
> > > > > > > > we
> > > > > > > > > > > > should
> > > > > > > > > > > > > > also
> > > > > > > > > > > > > > > support configurations as well so that admin
> user
> > > > could
> > > > > > > > freely
> > > > > > > > > > roll
> > > > > > > > > > > > up
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > cluster with all nodes complying the same
> feature
> > > > > gating,
> > > > > > > > > without
> > > > > > > > > > > > > > worrying
> > > > > > > > > > > > > > > about the turnaround time to propagate the
> > message
> > > > only
> > > > > > > after
> > > > > > > > > the
> > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > > starts up?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): This is a great point/question. One of
> > the
> > > > > > > > > expectations
> > > > > > > > > > > out
> > > > > > > > > > > > of
> > > > > > > > > > > > > > this KIP, which is
> > > > > > > > > > > > > > already followed in the broker, is the following.
> > > > > > > > > > > > > >  - Imagine at time T1 the broker starts up and
> > > > registers
> > > > > > it’s
> > > > > > > > > > > presence
> > > > > > > > > > > > in
> > > > > > > > > > > > > > ZK,
> > > > > > > > > > > > > >    along with advertising it’s supported
> features.
> > > > > > > > > > > > > >  - Imagine at a future time T2 the broker
> receives
> > > the
> > > > > > > > > > > > > > UpdateMetadataRequest
> > > > > > > > > > > > > >    from the controller, which contains the latest
> > > > > finalized
> > > > > > > > > > features
> > > > > > > > > > > as
> > > > > > > > > > > > > > seen by
> > > > > > > > > > > > > >    the controller. The broker validates this data
> > > > against
> > > > > > > it’s
> > > > > > > > > > > > supported
> > > > > > > > > > > > > > features to
> > > > > > > > > > > > > >    make sure there is no mismatch (it will
> shutdown
> > > if
> > > > > > there
> > > > > > > is
> > > > > > > > > an
> > > > > > > > > > > > > > incompatibility).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It is expected that during the time between the 2
> > > > events
> > > > > T1
> > > > > > > and
> > > > > > > > > T2,
> > > > > > > > > > > the
> > > > > > > > > > > > > > broker is
> > > > > > > > > > > > > > almost a silent entity in the cluster. It does not add any value to the
> > > > > > > > > > > > > > cluster, or carry out any important broker activities. By “important”, I
> > > > > > > > > > > > > > mean it is not doing mutations on its persistence, not mutating critical
> > > > > > > > > > > > > > in-memory state, and won’t be serving produce/fetch requests. Note it
> > > > > > > > > > > > > > doesn’t even know its assigned partitions until it receives the
> > > > > > > > > > > > > > UpdateMetadataRequest from the controller. Anything the broker is doing up
> > > > > > > > > > > > > > until this point is neither damaging nor useful.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I’ve clarified the above in the KIP, see this new section:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > > > > > > > > > .
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 5. "adding a new Feature, updating or deleting
> an
> > > > > > existing
> > > > > > > > > > > Feature",
> > > > > > > > > > > > > may
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > I misunderstood something, I thought the
> features
> > > are
> > > > > > > defined
> > > > > > > > > in
> > > > > > > > > > > > broker
> > > > > > > > > > > > > > > code, so admin could not really create a new
> > > feature?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): Great point! You understood this
> right.
> > > Here
> > > > > > > adding
> > > > > > > > a
> > > > > > > > > > > > feature
> > > > > > > > > > > > > > means we are
> > > > > > > > > > > > > > adding a cluster-wide finalized *max* version
> for a
> > > > > feature
> > > > > > > > that
> > > > > > > > > > was
> > > > > > > > > > > > > > previously never finalized.
> > > > > > > > > > > > > > I have clarified this in the KIP now.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP
> > > adding
> > > > > the
> > > > > > > > above
> > > > > > > > > > (see
> > > > > > > > > > > > > > 'Tooling support -> Admin API changes').
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 7. I think we haven't discussed the alternative
> > > > > solution
> > > > > > to
> > > > > > > > > pass
> > > > > > > > > > > the
> > > > > > > > > > > > > > > feature information through Zookeeper. Is that
> > > > > mentioned
> > > > > > in
> > > > > > > > the
> > > > > > > > > > KIP
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > justify why using UpdateMetadata is more
> > favorable?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): Nice question! The broker reads
> > finalized
> > > > > > feature
> > > > > > > > info
> > > > > > > > > > > > stored
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > ZK,
> > > > > > > > > > > > > > only during startup when it does a validation.
> When
> > > > > serving
> > > > > > > > > > > > > > `ApiVersionsRequest`, the
> > > > > > > > > > > > > > broker does not read this info from ZK directly.
> > I'd
> > > > > > imagine
> > > > > > > > the
> > > > > > > > > > risk
> > > > > > > > > > > > is
> > > > > > > > > > > > > > that it can increase
> > > > > > > > > > > > > > the ZK read QPS which can be a bottleneck for the
> > > > system.
> > > > > > > > Today,
> > > > > > > > > in
> > > > > > > > > > > > Kafka
> > > > > > > > > > > > > > we use the
> > > > > > > > > > > > > > controller to fan out ZK updates to brokers and
> we
> > > want
> > > > > to
> > > > > > > > stick
> > > > > > > > > to
> > > > > > > > > > > > that
> > > > > > > > > > > > > > pattern to avoid
> > > > > > > > > > > > > > the ZK read bottleneck when serving
> > > > `ApiVersionsRequest`.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 8. I was under the impression that user could
> > > > > configure a
> > > > > > > > range
> > > > > > > > > > of
> > > > > > > > > > > > > > > supported versions, what's the trade-off for
> > > allowing
> > > > > > > single
> > > > > > > > > > > > finalized
> > > > > > > > > > > > > > > version only?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): Great question! The finalized version
> > of a
> > > > > > feature
> > > > > > > > > > > basically
> > > > > > > > > > > > > > refers to
> > > > > > > > > > > > > > the cluster-wide finalized feature "maximum"
> > version.
> > > > For
> > > > > > > > > example,
> > > > > > > > > > if
> > > > > > > > > > > > the
> > > > > > > > > > > > > > 'group_coordinator' feature
> > > > > > > > > > > > > > has the finalized version set to 10, then, it
> means
> > > > that
> > > > > > > > > > cluster-wide
> > > > > > > > > > > > all
> > > > > > > > > > > > > > versions upto v10 are
> > > > > > > > > > > > > > supported for this feature. However, note that if
> > > some
> > > > > > > version
> > > > > > > > > (ex:
> > > > > > > > > > > v0)
> > > > > > > > > > > > > > gets deprecated
> > > > > > > > > > > > > > for this feature, then we don’t convey that using
> > > this
> > > > > > scheme
> > > > > > > > > (also
> > > > > > > > > > > > > > supporting deprecation is a non-goal).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): I’ve now modified the KIP at all
> points,
> > > > > > refering
> > > > > > > to
> > > > > > > > > > > > finalized
> > > > > > > > > > > > > > feature "maximum" versions.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here the
> > > "client"
> > > > > here
> > > > > > > may
> > > > > > > > > be
> > > > > > > > > > a
> > > > > > > > > > > > > > producer
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > > > > > > > > > reluctanthero104@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hey Kowshik,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > thanks for the revised KIP. Got a couple of
> > > > questions:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. "When is it safe for the brokers to begin
> > > handling
> > > > > EOS
> > > > > > > > > > traffic"
> > > > > > > > > > > > > could
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > converted as "When is it safe for the brokers
> to
> > > > start
> > > > > > > > serving
> > > > > > > > > > new
> > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not
> > > > explained
> > > > > > > > earlier
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > context.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. In the *Explanation *section, the metadata
> > > version
> > > > > > > number
> > > > > > > > > part
> > > > > > > > > > > > > seems a
> > > > > > > > > > > > > > > bit blurred. Could you point a reference to
> later
> > > > > section
> > > > > > > > that
> > > > > > > > > we
> > > > > > > > > > > > going
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > store it in Zookeeper and update it every time
> > when
> > > > > there
> > > > > > > is
> > > > > > > > a
> > > > > > > > > > > > feature
> > > > > > > > > > > > > > > change?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 3. For the feature downgrade, although it's a
> > > > Non-goal
> > > > > of
> > > > > > > the
> > > > > > > > > > KIP,
> > > > > > > > > > > > for
> > > > > > > > > > > > > > > features such as group coordinator semantics,
> > there
> > > > is
> > > > > no
> > > > > > > > legal
> > > > > > > > > > > > > scenario
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > perform a downgrade at all. So having downgrade
> > > door
> > > > > open
> > > > > > > is
> > > > > > > > > > pretty
> > > > > > > > > > > > > > > error-prone as human faults happen all the
> time.
> > > I'm
> > > > > > > assuming
> > > > > > > > > as
> > > > > > > > > > > new
> > > > > > > > > > > > > > > features are implemented, it's not very hard to
> > > add a
> > > > > > flag
> > > > > > > > > during
> > > > > > > > > > > > > feature
> > > > > > > > > > > > > > > creation to indicate whether this feature is
> > > > > > > "downgradable".
> > > > > > > > > > Could
> > > > > > > > > > > > you
> > > > > > > > > > > > > > > explain a bit more on the extra engineering
> > effort
> > > > for
> > > > > > > > shipping
> > > > > > > > > > > this
> > > > > > > > > > > > > KIP
> > > > > > > > > > > > > > > with downgrade protection in place?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of
> feature
> > > > > > versions
> > > > > > > > will
> > > > > > > > > > be
> > > > > > > > > > > > > > defined
> > > > > > > > > > > > > > > in the broker code." So this means in order to
> > > > > restrict a
> > > > > > > > > certain
> > > > > > > > > > > > > > feature,
> > > > > > > > > > > > > > > we need to start the broker first and then
> send a
> > > > > feature
> > > > > > > > > gating
> > > > > > > > > > > > > request
> > > > > > > > > > > > > > > immediately, which introduces a time gap and
> the
> > > > > > > > > > intended-to-close
> > > > > > > > > > > > > > feature
> > > > > > > > > > > > > > > could actually serve request during this phase.
> > Do
> > > > you
> > > > > > > think
> > > > > > > > we
> > > > > > > > > > > > should
> > > > > > > > > > > > > > also
> > > > > > > > > > > > > > > support configurations as well so that admin
> user
> > > > could
> > > > > > > > freely
> > > > > > > > > > roll
> > > > > > > > > > > > up
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > cluster with all nodes complying the same
> feature
> > > > > gating,
> > > > > > > > > without
> > > > > > > > > > > > > > worrying
> > > > > > > > > > > > > > > about the turnaround time to propagate the
> > message
> > > > only
> > > > > > > after
> > > > > > > > > the
> > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > > starts up?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 5. "adding a new Feature, updating or deleting
> an
> > > > > > existing
> > > > > > > > > > > Feature",
> > > > > > > > > > > > > may
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > I misunderstood something, I thought the
> features
> > > are
> > > > > > > defined
> > > > > > > > > in
> > > > > > > > > > > > broker
> > > > > > > > > > > > > > > code, so admin could not really create a new
> > > feature?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 7. I think we haven't discussed the alternative
> > > > > solution
> > > > > > to
> > > > > > > > > pass
> > > > > > > > > > > the
> > > > > > > > > > > > > > > feature information through Zookeeper. Is that
> > > > > mentioned
> > > > > > in
> > > > > > > > the
> > > > > > > > > > KIP
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > justify why using UpdateMetadata is more
> > favorable?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 8. I was under the impression that user could
> > > > > configure a
> > > > > > > > range
> > > > > > > > > > of
> > > > > > > > > > > > > > > supported versions, what's the trade-off for
> > > allowing
> > > > > > > single
> > > > > > > > > > > > finalized
> > > > > > > > > > > > > > > version only?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here the
> > > "client"
> > > > > here
> > > > > > > may
> > > > > > > > > be
> > > > > > > > > > a
> > > > > > > > > > > > > > producer
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Boyang
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> > > > > > > > > cmccabe@apache.org
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik
> > Prakasam
> > > > > wrote:
> > > > > > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the feedback! I've changed the KIP to address your suggestions.
> > > > > > > > > > > > > > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 1. '__data_version__' is the version of the
> > > > > finalized
> > > > > > > > > feature
> > > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > > (i.e. actual ZK node contents), while the
> > > > > > > > > > '__schema_version__'
> > > > > > > > > > > is
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > version of the schema of the data persisted
> > in
> > > > ZK.
> > > > > > > These
> > > > > > > > > > serve
> > > > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > > purposes. '__data_version__' is is useful
> > > mainly
> > > > to
> > > > > > > > clients
> > > > > > > > > > > > during
> > > > > > > > > > > > > > > reads,
> > > > > > > > > > > > > > > > > to differentiate between the 2 versions of
> > > > > eventually
> > > > > > > > > > > consistent
> > > > > > > > > > > > > > > > 'finalized
> > > > > > > > > > > > > > > > > features' metadata (i.e. larger metadata
> > > version
> > > > is
> > > > > > > more
> > > > > > > > > > > recent).
> > > > > > > > > > > > > > > > > '__schema_version__' provides an additional
> > > > degree
> > > > > of
> > > > > > > > > > > > flexibility,
> > > > > > > > > > > > > > > where
> > > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > we decide to change the schema for
> > '/features'
> > > > node
> > > > > > in
> > > > > > > ZK
> > > > > > > > > (in
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > future),
> > > > > > > > > > > > > > > > > then we can manage broker roll outs
> suitably
> > > > (i.e.
> > > > > > > > > > > > > > > > > serialization/deserialization of the ZK
> data
> > > can
> > > > be
> > > > > > > > handled
> > > > > > > > > > > > > safely).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Kowshik,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If you're talking about a number that lets
> you
> > > know
> > > > > if
> > > > > > > data
> > > > > > > > > is
> > > > > > > > > > > more
> > > > > > > > > > > > > or
> > > > > > > > > > > > > > > > less recent, we would typically call that an
> > > epoch,
> > > > > and
> > > > > > > > not a
> > > > > > > > > > > > > version.
> > > > > > > > > > > > > > > For
> > > > > > > > > > > > > > > > the ZK data structures, the word "version" is
> > > > > typically
> > > > > > > > > > reserved
> > > > > > > > > > > > for
> > > > > > > > > > > > > > > > describing changes to the overall schema of
> the
> > > > data
> > > > > > that
> > > > > > > > is
> > > > > > > > > > > > written
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > ZooKeeper.  We don't even really change the
> > > > "version"
> > > > > > of
> > > > > > > > > those
> > > > > > > > > > > > > schemas
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > much, since most changes are
> > > backwards-compatible.
> > > > > But
> > > > > > > we
> > > > > > > > do
> > > > > > > > > > > > include
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > version field just in case.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I don't think we really need an epoch here,
> > > though,
> > > > > > since
> > > > > > > > we
> > > > > > > > > > can
> > > > > > > > > > > > just
> > > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > at the broker epoch.  Whenever the broker
> > > > registers,
> > > > > > its
> > > > > > > > > epoch
> > > > > > > > > > > will
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > greater than the previous broker epoch.  And
> > the
> > > > > newly
> > > > > > > > > > registered
> > > > > > > > > > > > > data
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > take priority.  This will be a lot simpler
> than
> > > > > adding
> > > > > > a
> > > > > > > > > > separate
> > > > > > > > > > > > > epoch
> > > > > > > > > > > > > > > > system, I think.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 2. Regarding admin client needing min and
> max
> > > > > > > > information -
> > > > > > > > > > you
> > > > > > > > > > > > are
> > > > > > > > > > > > > > > > right!
> > > > > > > > > > > > > > > > > I've changed the KIP such that the Admin
> API
> > > also
> > > > > > > allows
> > > > > > > > > the
> > > > > > > > > > > user
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > read
> > > > > > > > > > > > > > > > > 'supported features' from a specific
> broker.
> > > > Please
> > > > > > > look
> > > > > > > > at
> > > > > > > > > > the
> > > > > > > > > > > > > > section
> > > > > > > > > > > > > > > > > "Admin API changes".
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 3. Regarding the use of `long` vs `Long` -
> it
> > > was
> > > > > not
> > > > > > > > > > > deliberate.
> > > > > > > > > > > > > > I've
> > > > > > > > > > > > > > > > > improved the KIP to just use `long` at all
> > > > places.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Sounds good.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand
> tool
> > -
> > > > you
> > > > > > are
> > > > > > > > > right!
> > > > > > > > > > > > I've
> > > > > > > > > > > > > > > > updated
> > > > > > > > > > > > > > > > > the KIP sketching the functionality
> provided
> > by
> > > > > this
> > > > > > > > tool,
> > > > > > > > > > with
> > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > examples. Please look at the section
> "Tooling
> > > > > support
> > > > > > > > > > > examples".
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks, Kowshik.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin
> > McCabe <
> > > > > > > > > > > > cmccabe@apache.org>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > In the "Schema" section, do we really
> need
> > > both
> > > > > > > > > > > > > __schema_version__
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > __data_version__?  Can we just have a
> > single
> > > > > > version
> > > > > > > > > field
> > > > > > > > > > > > here?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Shouldn't the Admin(Client) function have
> > > some
> > > > > way
> > > > > > to
> > > > > > > > get
> > > > > > > > > > the
> > > > > > > > > > > > min
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > max
> > > > > > > > > > > > > > > > > > information that we're exposing as
> well?  I
> > > > guess
> > > > > > we
> > > > > > > > > could
> > > > > > > > > > > have
> > > > > > > > > > > > > > min,
> > > > > > > > > > > > > > > > max,
> > > > > > > > > > > > > > > > > > and current.  Unrelated: is the use of
> Long
> > > > > rather
> > > > > > > than
> > > > > > > > > > long
> > > > > > > > > > > > > > > deliberate
> > > > > > > > > > > > > > > > > > here?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It would be good to describe how the
> > command
> > > > line
> > > > > > > tool
> > > > > > > > > > > > > > > > > > kafka.admin.FeatureCommand will work.
> For
> > > > > example
> > > > > > > the
> > > > > > > > > > flags
> > > > > > > > > > > > that
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > take and the output that it will generate
> > to
> > > > > > STDOUT.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik
> > > > Prakasam
> > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I've opened KIP-584
> > > > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > is intended to provide a versioning
> > scheme
> > > > for
> > > > > > > > > features.
> > > > > > > > > > > I'd
> > > > > > > > > > > > > like
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > > this thread to discuss the same. I'd
> > > > appreciate
> > > > > > any
> > > > > > > > > > > feedback
> > > > > > > > > > > > on
> > > > > > > > > > > > > > > this.
> > > > > > > > > > > > > > > > > > > Here
> > > > > > > > > > > > > > > > > > > is a link to KIP-584
> > > > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > > > > > > > >  .
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Jun Rao <ju...@confluent.io>.
Hi, Kowshik,

200. I guess you are saying only when the allowDowngrade field is set, the
finalized feature version can go backward. Otherwise, it can only go up.
That makes sense. It would be useful to make that clear when explaining
the usage of the allowDowngrade field. In the validation section, we have  "
/features' from {"max_version_level": X} to {"max_version_level": X’}", it
seems that we need to mention Y there.

Thanks,

Jun

On Wed, Apr 15, 2020 at 10:44 AM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hi Jun,
>
> Great question! Please find my response below.
>
> > 200. My understanding is that if the CLI tool passes the
> > '--allow-downgrade' flag when updating a specific feature, then a future
> > downgrade is possible. Otherwise, the feature is not downgradable. If so, I
> > was wondering how the controller remembers this since it can be restarted
> > over time?
>
> (Kowshik): The purpose of the flag was just to capture the user's intent for
> a specific request.
> To avoid confusion, I could call the flag `--try-downgrade` instead.
> That makes it clear that the controller just has to treat the ask from the
> user as an explicit request to attempt a downgrade.
>
> The flag does not act as an override on the controller's decision making about
> whether a feature is downgradable (the rules on whether to allow a feature to
> be downgraded from a specific version level can be embedded in the controller
> code).
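>
> To illustrate the idea (purely a sketch; the class, method, and rule below are
> my own placeholders and not part of the KIP), such a controller-side rule
> could look roughly like this:
>
>     // Hypothetical guard consulted by the controller when a request asks to
>     // lower a feature's finalized max version level.
>     final class DowngradeRules {
>         static boolean isDowngradeAllowed(String feature, long fromLevel, long toLevel) {
>             if (toLevel >= fromLevel) return true;        // not actually a downgrade
>             if ("group_coordinator".equals(feature))      // example rule only
>                 return false;                             // never allow going backward
>             return true;
>         }
>     }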
>
> Please let me know what you think.
> Sorry if I misunderstood the original question.
>
>
> Cheers,
> Kowshik
>
>
> On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Kowshik,
> >
> > Thanks for the reply. Makes sense. Just one more question.
> >
> > 200. My understanding is that if the CLI tool passes the
> > '--allow-downgrade' flag when updating a specific feature, then a future
> > downgrade is possible. Otherwise, the feature is not downgradable. If so, I
> > was wondering how the controller remembers this since it can be restarted
> > over time?
> >
> > Jun
> >
> >
> > On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam <kprakasam@confluent.io
> >
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks a lot for the feedback and the questions!
> > > Please find my response below.
> > >
> > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It
> > seems
> > > > that field needs to be persisted somewhere in ZK?
> > >
> > > (Kowshik): Great question! Below is my explanation. Please help me
> > > understand,
> > > if you feel there are cases where we would need to still persist it in
> > ZK.
> > >
> > > Firstly I have updated my thoughts into the KIP now, under the 'guidelines'
> > > section:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> > >
> > > The allowDowngrade boolean field just captures the user intent, and reminds
> > > them to double check their intent before proceeding. It should be set to true
> > > by the user in a request, only when the user intent is to forcefully "attempt"
> > > a downgrade of a specific feature's max version level to the value provided
> > > in the request.
> > >
> > > We can extend this safeguard. The controller (on its end) can maintain rules
> > > in the code that, for safety reasons, would outright reject certain downgrades
> > > from a specific max_version_level for a specific feature. Such rejections may
> > > happen depending on the feature being downgraded, and from what version level.
> > >
> > > The CLI tool only allows a downgrade attempt in conjunction with specific
> > > flags and sub-commands. For example, only if the user uses the 'downgrade-all'
> > > command, or passes the '--allow-downgrade' flag when updating a specific
> > > feature, will the tool translate this ask into setting the 'allowDowngrade'
> > > field in the request to the server.
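> > >
> > > For illustration, the invocations could look roughly like the following (the
> > > exact option names here are a sketch, not the final tooling syntax):
> > >
> > >     # attempt a downgrade of a single feature's max version level
> > >     ./bin/kafka-features.sh --bootstrap-server localhost:9092 \
> > >       --update --feature group_coordinator --version 5 --allow-downgrade
> > >
> > >     # attempt to downgrade all finalized features at once
> > >     ./bin/kafka-features.sh --bootstrap-server localhost:9092 --downgrade-all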
> > >
> > > > 201. UpdateFeaturesResponse has the following top level fields.
> Should
> > > > those fields be per feature?
> > > >
> > > >   "fields": [
> > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> > > >       "about": "The error code, or 0 if there was no error." },
> > > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
> > > >       "about": "The error message, or null if there was no error." }
> > > >   ]
> > >
> > > (Kowshik): Great question!
> > > As such, the API is transactional, as explained in the sections linked below:
> > > either all provided FeatureUpdates are applied, or none of them are.
> > > That is the reason I felt we can have just one error code + message.
> > > Happy to extend this if you feel otherwise. Please let me know.
> > >
> > > Link to sections:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees
> > >
> > > > 202. The /features path in ZK has a field min_version_level. Which
> API
> > > and
> > > > tool can change that value?
> > >
> > > (Kowshik): Great question! Currently this cannot be modified by using
> the
> > > API or the tool.
> > > Feature version deprecation (by raising min_version_level) can be done
> > only
> > > by the Controller directly. The rationale is explained in this section:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Kowshik,
> > > >
> > > > Thanks for addressing those comments. Just a few more minor comments.
> > > >
> > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It
> > seems
> > > > that field needs to be persisted somewhere in ZK?
> > > >
> > > > 201. UpdateFeaturesResponse has the following top level fields.
> Should
> > > > those fields be per feature?
> > > >
> > > >   "fields": [
> > > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> > > >       "about": "The error code, or 0 if there was no error." },
> > > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
> > > >       "about": "The error message, or null if there was no error." }
> > > >   ]
> > > >
> > > > 202. The /features path in ZK has a field min_version_level. Which
> API
> > > and
> > > > tool can change that value?
> > > >
> > > > Jun
> > > >
> > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <
> > kprakasam@confluent.io
> > > >
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks for the feedback! I have updated the KIP-584 addressing your
> > > > > comments.
> > > > > Please find my response below.
> > > > >
> > > > > > 100.6 You can look for the sentence "This operation requires
> ALTER
> > on
> > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > > KafkaApis.authorize().
> > > > >
> > > > > (Kowshik): Done. Great point! For the newly introduced
> > UPDATE_FEATURES
> > > > api,
> > > > > I have added a
> > > > > requirement that AclOperation.ALTER is required on
> > > ResourceType.CLUSTER.
> > > > >
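> > > > > As a rough illustration of that requirement (the types below are simplified
> > > > > stand-ins for Kafka's AclOperation/ResourceType, not the real classes), the
> > > > > request handler would gate UPDATE_FEATURES on ALTER permission for the
> > > > > CLUSTER resource:
> > > > >
> > > > >     final class UpdateFeaturesAuthCheck {
> > > > >         enum Operation { ALTER, DESCRIBE }
> > > > >         enum Resource { CLUSTER, TOPIC }
> > > > >
> > > > >         interface Authorizer {
> > > > >             boolean authorize(String principal, Operation op, Resource resource);
> > > > >         }
> > > > >
> > > > >         // Reject the request unless the caller holds ALTER on the CLUSTER resource.
> > > > >         static boolean canUpdateFeatures(Authorizer authorizer, String principal) {
> > > > >             return authorizer.authorize(principal, Operation.ALTER, Resource.CLUSTER);
> > > > >         }
> > > > >     }
> > > > >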
> > > > > > 110. Keeping the feature version as int is probably fine. I just
> > felt
> > > > > that
> > > > > > for some of the common user interactions, it's more convenient to
> > > > > > relate that to a release version. For example, if a user wants to
> > > > > downgrade
> > > > > > to a release 2.5, it's easier for the user to use the tool like
> > "tool
> > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X
> --version
> > > 6".
> > > > >
> > > > > (Kowshik): Great point. Generally, maximum feature version levels
> are
> > > not
> > > > > downgradable after
> > > > > they are finalized in the cluster. This is because, as a guideline, bumping a
> > > > > feature's version level is mainly used to convey important breaking changes.
> > > > > Despite the above, there may be some extreme/rare cases where a
> user
> > > > wants
> > > > > to downgrade
> > > > > all features to a specific previous release. The user may want to
> do
> > > this
> > > > > just
> > > > > prior to rolling back a Kafka cluster to a previous release.
> > > > >
> > > > > To support the above, I have made a change to the KIP explaining
> that
> > > the
> > > > > CLI tool is versioned.
> > > > > The CLI tool internally has knowledge about a map of features to their
> > > > > respective max versions supported by the Broker. The tool's knowledge of
> > > > > features and their version values is limited to the version of the CLI tool
> > > > > itself, i.e. the information is packaged into the CLI tool when it is released.
> > > > > Whenever a Kafka release introduces a new feature version, or modifies an
> > > > > existing feature version, the CLI tool shall also be updated with this
> > > > > information. Newer versions of the CLI tool will be released as part of the
> > > > > Kafka releases.
> > > > >
> > > > > Therefore, to achieve the downgrade need, the user just needs to
> run
> > > the
> > > > > version of
> > > > > the CLI tool that's part of the particular previous release that
> > he/she
> > > > is
> > > > > downgrading to.
> > > > > To help the user with this, there is a new command added to the CLI
> > > tool
> > > > > called `downgrade-all`.
> > > > > This essentially downgrades max version levels of all features in
> the
> > > > > cluster to the versions
> > > > > known to the CLI tool internally.
> > > > >
> > > > > I have explained the above in the KIP under these sections:
> > > > >
> > > > > Tooling support (have explained that the CLI tool is versioned):
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> > > > >
> > > > > Regular CLI tool usage (please refer to point #3, and see the
> tooling
> > > > > example)
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> > > > >
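> > > > > To make the "versioned CLI tool" idea concrete, here is a small sketch in
> > > > > Java (feature names and version numbers are purely illustrative): the tool of
> > > > > a given release carries its own feature -> max version map, and
> > > > > 'downgrade-all' simply targets those levels.
> > > > >
> > > > >     import java.util.Map;
> > > > >
> > > > >     final class VersionedFeaturesCli {
> > > > >         // Packaged into the tool at release time, e.g. what a hypothetical
> > > > >         // 2.5 tool might know about.
> > > > >         private static final Map<String, Integer> KNOWN_MAX_LEVELS = Map.of(
> > > > >             "group_coordinator", 1,
> > > > >             "transaction_protocol", 2);
> > > > >
> > > > >         public static void main(String[] args) {
> > > > >             // 'downgrade-all': request that every feature be brought down to
> > > > >             // the max version level known to this tool's release.
> > > > >             KNOWN_MAX_LEVELS.forEach((feature, level) ->
> > > > >                 System.out.printf("downgrade %s to max version level %d%n",
> > > > >                     feature, level));
> > > > >         }
> > > > >     }
> > > > >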
> > > > > > 110. Similarly, if the client library finds a feature mismatch
> with
> > > the
> > > > > broker,
> > > > > > the client likely needs to log some error message for the user to
> > > take
> > > > > some
> > > > > > actions. It's much more actionable if the error message is
> "upgrade
> > > the
> > > > > > broker to release version 2.6" than just "upgrade the broker to
> > > feature
> > > > > > version 7".
> > > > >
> > > > > (Kowshik): That's a really good point! If we use ints for feature versions, the
> > > > > best message that the client can print for debugging is "broker doesn't support
> > > > > feature version 7", and alongside that print the supported version range returned
> > > > > by the broker. Then, does it sound reasonable that the user could reference the
> > > > > Kafka release logs to figure out which version of the broker release needs to be
> > > > > deployed to support feature version 7? I couldn't think of a better strategy here.
> > > > >
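> > > > > A minimal sketch of that client-side check (names are illustrative): the most
> > > > > the client can report is the feature version it needs plus the range advertised
> > > > > by the broker, and mapping that to a broker release is left to the user.
> > > > >
> > > > >     final class FeatureCompatCheck {
> > > > >         // Returns null when compatible, otherwise the error message to log.
> > > > >         static String check(String feature, int requiredVersion,
> > > > >                             int brokerMinVersion, int brokerMaxVersion) {
> > > > >             if (requiredVersion < brokerMinVersion || requiredVersion > brokerMaxVersion) {
> > > > >                 return String.format(
> > > > >                     "Broker does not support feature '%s' version %d (supported range: [%d, %d])",
> > > > >                     feature, requiredVersion, brokerMinVersion, brokerMaxVersion);
> > > > >             }
> > > > >             return null;
> > > > >         }
> > > > >     }
> > > > >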
> > > > > > 120. When should a developer bump up the version of a feature?
> > > > >
> > > > > (Kowshik): Great question! In the KIP, I have added a section:
> > > > 'Guidelines
> > > > > on feature versions and workflows'
> > > > > providing some guidelines on when to use the versioned feature
> flags,
> > > and
> > > > > what
> > > > > are the regular workflows with the CLI tool.
> > > > >
> > > > > Link to the relevant sections:
> > > > > Guidelines:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> > > > >
> > > > > Regular CLI tool usage:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> > > > >
> > > > > Advanced CLI tool usage:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > >
> > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Kowshik,
> > > > > >
> > > > > > Thanks for the reply. A few more comments.
> > > > > >
> > > > > > 110. Keeping the feature version as int is probably fine. I just
> > felt
> > > > > that
> > > > > > for some of the common user interactions, it's more convenient to
> > > > > > relate that to a release version. For example, if a user wants to
> > > > > downgrade
> > > > > > to a release 2.5, it's easier for the user to use the tool like
> > "tool
> > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X
> --version
> > > 6".
> > > > > > Similarly, if the client library finds a feature mismatch with
> the
> > > > > broker,
> > > > > > the client likely needs to log some error message for the user to
> > > take
> > > > > some
> > > > > > actions. It's much more actionable if the error message is
> "upgrade
> > > the
> > > > > > broker to release version 2.6" than just "upgrade the broker to
> > > feature
> > > > > > version 7".
> > > > > >
> > > > > > 111. Sounds good.
> > > > > >
> > > > > > 120. When should a developer bump up the version of a feature?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <
> > > > kprakasam@confluent.io
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > I have updated the KIP for the item 111.
> > > > > > > I'm in the process of addressing 100.6, and will provide an
> > update
> > > > > soon.
> > > > > > > I think item 110 is still under discussion given we are now
> > > > providing a
> > > > > > way
> > > > > > > to finalize
> > > > > > > all features to their latest version levels. In any case,
> please
> > > let
> > > > us
> > > > > > > know
> > > > > > > how you feel in response to Colin's comments on this topic.
> > > > > > >
> > > > > > > > 111. To put this in context, when we had IBP, the default
> value
> > > is
> > > > > the
> > > > > > > > current released version. So, if you are a brand new user,
> you
> > > > don't
> > > > > > need
> > > > > > > > to configure IBP and all new features will be immediately
> > > available
> > > > > in
> > > > > > > the
> > > > > > > > new cluster. If you are upgrading from an old version, you do
> > > need
> > > > to
> > > > > > > > understand and configure IBP. I see a similar pattern here
> for
> > > > > > > > features. From the ease of use perspective, ideally, we
> > shouldn't
> > > > > > require
> > > > > > > a
> > > > > > > > new user to have an extra step such as running a bootstrap
> > script
> > > > > > unless
> > > > > > > > it's truly necessary. If someone has a special need (all the
> > > cases
> > > > > you
> > > > > > > > mentioned seem special cases?), they can configure a mode
> such
> > > that
> > > > > > > > features are enabled/disabled manually.
> > > > > > >
> > > > > > > (Kowshik): That makes sense, thanks for the idea! Sorry if I
> > didn't
> > > > > > > understand
> > > > > > > this need earlier. I have updated the KIP with the approach
> that
> > > > > whenever
> > > > > > > the '/features' node is absent, the controller by default will
> > > > > bootstrap
> > > > > > > the node
> > > > > > > to contain the latest feature levels. Here is the new section
> in
> > > the
> > > > > KIP
> > > > > > > describing
> > > > > > > the same:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> > > > > > >
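> > > > > > > A small sketch of that default bootstrap behaviour (illustrative names, not
> > > > > > > the actual controller code): when the '/features' node is absent, the
> > > > > > > controller finalizes everything it supports at the latest levels; an existing
> > > > > > > node is left untouched.
> > > > > > >
> > > > > > >     import java.util.HashMap;
> > > > > > >     import java.util.Map;
> > > > > > >     import java.util.Optional;
> > > > > > >
> > > > > > >     final class FeaturesNodeBootstrap {
> > > > > > >         static Map<String, Integer> bootstrapIfAbsent(
> > > > > > >                 Optional<Map<String, Integer>> existingNode,
> > > > > > >                 Map<String, Integer> latestSupportedMaxLevels) {
> > > > > > >             // Existing clusters keep whatever was already finalized.
> > > > > > >             if (existingNode.isPresent()) {
> > > > > > >                 return existingNode.get();
> > > > > > >             }
> > > > > > >             // Brand new cluster: finalize all known features at their latest levels.
> > > > > > >             return new HashMap<>(latestSupportedMaxLevels);
> > > > > > >         }
> > > > > > >     }
> > > > > > >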
> > > > > > > Next, as I explained in my response to Colin's suggestions, we
> > are
> > > > now
> > > > > > > providing a `--finalize-latest-features` flag with the tooling.
> > > This
> > > > > lets
> > > > > > > the sysadmin finalize all features known to the controller to
> > their
> > > > > > latest
> > > > > > > version
> > > > > > > levels. Please look at this section (point #3 and the tooling
> > > example
> > > > > > > later):
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> > > > > > >
> > > > > > >
> > > > > > > Do you feel this addresses your comment/concern?
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io>
> > wrote:
> > > > > > >
> > > > > > > > Hi, Kowshik,
> > > > > > > >
> > > > > > > > Thanks for the reply. A few more replies below.
> > > > > > > >
> > > > > > > > 100.6 You can look for the sentence "This operation requires
> > > ALTER
> > > > on
> > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > > > > KafkaApis.authorize().
> > > > > > > >
> > > > > > > > 110. From the external client/tooling perspective, it's more
> > > > natural
> > > > > to
> > > > > > > use
> > > > > > > > the release version for features. If we can use the same
> > release
> > > > > > version
> > > > > > > > for internal representation, it seems simpler (easier to
> > > > understand,
> > > > > no
> > > > > > > > mapping overhead, etc). Is there a benefit with separate
> > external
> > > > and
> > > > > > > > internal versioning schemes?
> > > > > > > >
> > > > > > > > 111. To put this in context, when we had IBP, the default
> value
> > > is
> > > > > the
> > > > > > > > current released version. So, if you are a brand new user,
> you
> > > > don't
> > > > > > need
> > > > > > > > to configure IBP and all new features will be immediately
> > > available
> > > > > in
> > > > > > > the
> > > > > > > > new cluster. If you are upgrading from an old version, you do
> > > need
> > > > to
> > > > > > > > understand and configure IBP. I see a similar pattern here
> for
> > > > > > > > features. From the ease of use perspective, ideally, we
> > shouldn't
> > > > > > > require a
> > > > > > > > new user to have an extra step such as running a bootstrap
> > script
> > > > > > unless
> > > > > > > > it's truly necessary. If someone has a special need (all the
> > > cases
> > > > > you
> > > > > > > > mentioned seem special cases?), they can configure a mode
> such
> > > that
> > > > > > > > features are enabled/disabled manually.
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
> > > > > > kprakasam@confluent.io>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jun,
> > > > > > > > >
> > > > > > > > > Thanks for the feedback and suggestions. Please find my
> > > response
> > > > > > below.
> > > > > > > > >
> > > > > > > > > > 100.6 For every new request, the admin needs to control
> who
> > > is
> > > > > > > allowed
> > > > > > > > to
> > > > > > > > > > issue that request if security is enabled. So, we need to
> > > > assign
> > > > > > the
> > > > > > > > new
> > > > > > > > > > request a ResourceType and possible AclOperations. See
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > > > as an example.
> > > > > > > > >
> > > > > > > > > (Kowshik): I don't see any reference to the words
> > ResourceType
> > > or
> > > > > > > > > AclOperations
> > > > > > > > > in the KIP. Please let me know how I can use the KIP that
> you
> > > > > linked
> > > > > > to
> > > > > > > > > know how to
> > > > > > > > > setup the appropriate ResourceType and/or ClusterOperation?
> > > > > > > > >
> > > > > > > > > > 105. If we change delete to disable, it's better to do
> this
> > > > > > > > consistently
> > > > > > > > > in
> > > > > > > > > > request protocol and admin api as well.
> > > > > > > > >
> > > > > > > > > (Kowshik): The API shouldn't be called 'disable' when it is
> > > > > deleting
> > > > > > a
> > > > > > > > > feature.
> > > > > > > > > I've just changed the KIP to use 'delete'. I don't have a
> > > strong
> > > > > > > > > preference.
> > > > > > > > >
> > > > > > > > > > 110. The minVersion/maxVersion for features use int64.
> > > > Currently,
> > > > > > our
> > > > > > > > > > release version schema is major.minor.bugfix (e.g.
> 2.5.0).
> > > It's
> > > > > > > > possible
> > > > > > > > > > for new features to be included in minor releases too.
> > Should
> > > > we
> > > > > > make
> > > > > > > > the
> > > > > > > > > > feature versioning match the release versioning?
> > > > > > > > >
> > > > > > > > > (Kowshik): The release version can be mapped to a set of
> > > feature
> > > > > > > > versions,
> > > > > > > > > and this can be done, for example in the tool (or even
> > external
> > > > to
> > > > > > the
> > > > > > > > > tool).
> > > > > > > > > Can you please clarify what I'm missing?
> > > > > > > > >
> > > > > > > > > > 111. "During regular operations, the data in the ZK node
> > can
> > > be
> > > > > > > mutated
> > > > > > > > > > only via a specific admin API served only by the
> > > controller." I
> > > > > am
> > > > > > > > > > wondering why can't the controller auto finalize a
> feature
> > > > > version
> > > > > > > > after
> > > > > > > > > > all brokers are upgraded? For new users who download the
> > > latest
> > > > > > > version
> > > > > > > > > to
> > > > > > > > > > build a new cluster, it's inconvenient for them to have
> to
> > > > > manually
> > > > > > > > > enable
> > > > > > > > > > each feature.
> > > > > > > > >
> > > > > > > > > (Kowshik): I agree that there is a trade-off here, but it
> > will
> > > > help
> > > > > > > > > to decide whether the automation can be thought through in
> > the
> > > > > future
> > > > > > > > > in a follow up KIP, or right now in this KIP. We may invest
> > > > > > > > > in automation, but we have to decide whether we should do
> it
> > > > > > > > > now or later.
> > > > > > > > >
> > > > > > > > > For the inconvenience that you mentioned, do you think the problem can be
> > > > > > > > > overcome by asking the cluster operator to run a bootstrap script when
> > > > > > > > > he/she knows that a specific AK release has been almost completely deployed
> > > > > > > > > in a cluster for the first time? The idea is that the bootstrap script will
> > > > > > > > > know how to map a specific AK release to finalized feature versions, and run
> > > > > > > > > the `kafka-features.sh` tool appropriately against the cluster.
> > > > > > > > >
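> > > > > > > > > As a rough sketch of that bootstrap-script idea (release numbers and feature
> > > > > > > > > names below are purely illustrative), the script would carry a mapping from
> > > > > > > > > an AK release to the feature versions to finalize, and drive the tool from it:
> > > > > > > > >
> > > > > > > > >     import java.util.Map;
> > > > > > > > >
> > > > > > > > >     final class ReleaseToFeatureLevels {
> > > > > > > > >         private static final Map<String, Map<String, Integer>> RELEASE_TO_FINALIZED =
> > > > > > > > >             Map.of(
> > > > > > > > >                 "2.5", Map.of("group_coordinator", 1),
> > > > > > > > >                 "2.6", Map.of("group_coordinator", 2, "transaction_protocol", 1));
> > > > > > > > >
> > > > > > > > >         // Levels the script would pass to the kafka-features.sh tool for a release.
> > > > > > > > >         static Map<String, Integer> finalizedLevelsFor(String release) {
> > > > > > > > >             return RELEASE_TO_FINALIZED.getOrDefault(release, Map.of());
> > > > > > > > >         }
> > > > > > > > >     }
> > > > > > > > >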
> > > > > > > > > Now, coming back to your automation proposal/question.
> > > > > > > > > I do see the value of automated feature version
> finalization,
> > > > but I
> > > > > > > also
> > > > > > > > > see
> > > > > > > > > that this will open up several questions and some risks, as
> > > > > explained
> > > > > > > > > below.
> > > > > > > > > The answers to these depend on the definition of the
> > automation
> > > > we
> > > > > > > choose
> > > > > > > > > to build, and how well does it fit into a kafka deployment.
> > > > > > > > > Basically, it can be unsafe for the controller to finalize
> > > > feature
> > > > > > > > version
> > > > > > > > > upgrades automatically, without learning about the intent
> of
> > > the
> > > > > > > cluster
> > > > > > > > > operator.
> > > > > > > > > 1. We would sometimes want to lock feature versions only
> when
> > > we
> > > > > have
> > > > > > > > > externally verified
> > > > > > > > > the stability of the broker binary.
> > > > > > > > > 2. Sometimes only the cluster operator knows that a cluster
> > > > upgrade
> > > > > > is
> > > > > > > > > complete,
> > > > > > > > > and new brokers are highly unlikely to join the cluster.
> > > > > > > > > 3. Only the cluster operator knows that the intent is to
> > deploy
> > > > the
> > > > > > > same
> > > > > > > > > version
> > > > > > > > > of the new broker release across the entire cluster (i.e.
> the
> > > > > latest
> > > > > > > > > downloaded version).
> > > > > > > > > 4. For downgrades, it appears the controller still needs
> some
> > > > > > external
> > > > > > > > > input
> > > > > > > > > (such as the proposed tool) to finalize a feature version
> > > > > downgrade.
> > > > > > > > >
> > > > > > > > > If we have automation, that automation can end up failing
> in
> > > some
> > > > > of
> > > > > > > the
> > > > > > > > > cases
> > > > > > > > > above. Then, we need a way to declare that the cluster is
> > "not
> > > > > ready"
> > > > > > > if
> > > > > > > > > the
> > > > > > > > > controller cannot automatically finalize some basic
> required
> > > > > feature
> > > > > > > > > version
> > > > > > > > > upgrades across the cluster. We need to make the cluster
> > > operator
> > > > > > aware
> > > > > > > > in
> > > > > > > > > such a scenario (raise an alert or similar).
> > > > > > > > >
> > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should
> be
> > 49
> > > > > > instead
> > > > > > > > of
> > > > > > > > > 48.
> > > > > > > > >
> > > > > > > > > (Kowshik): Done.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Kowshik
> > > > > > > > >
> > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io>
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Kowshik,
> > > > > > > > > >
> > > > > > > > > > Thanks for the reply. A few more comments below.
> > > > > > > > > >
> > > > > > > > > > 100.6 For every new request, the admin needs to control
> who
> > > is
> > > > > > > allowed
> > > > > > > > to
> > > > > > > > > > issue that request if security is enabled. So, we need to
> > > > assign
> > > > > > the
> > > > > > > > new
> > > > > > > > > > request a ResourceType and possible AclOperations. See
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > > > as
> > > > > > > > > > an example.
> > > > > > > > > >
> > > > > > > > > > 105. If we change delete to disable, it's better to do
> this
> > > > > > > > consistently
> > > > > > > > > in
> > > > > > > > > > request protocol and admin api as well.
> > > > > > > > > >
> > > > > > > > > > 110. The minVersion/maxVersion for features use int64.
> > > > Currently,
> > > > > > our
> > > > > > > > > > release version schema is major.minor.bugfix (e.g.
> 2.5.0).
> > > It's
> > > > > > > > possible
> > > > > > > > > > for new features to be included in minor releases too.
> > Should
> > > > we
> > > > > > make
> > > > > > > > the
> > > > > > > > > > feature versioning match the release versioning?
> > > > > > > > > >
> > > > > > > > > > 111. "During regular operations, the data in the ZK node
> > can
> > > be
> > > > > > > mutated
> > > > > > > > > > only via a specific admin API served only by the
> > > controller." I
> > > > > am
> > > > > > > > > > wondering why can't the controller auto finalize a
> feature
> > > > > version
> > > > > > > > after
> > > > > > > > > > all brokers are upgraded? For new users who download the
> > > latest
> > > > > > > version
> > > > > > > > > to
> > > > > > > > > > build a new cluster, it's inconvenient for them to have
> to
> > > > > manually
> > > > > > > > > enable
> > > > > > > > > > each feature.
> > > > > > > > > >
> > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should
> be
> > 49
> > > > > > instead
> > > > > > > > of
> > > > > > > > > > 48.
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> > > > > > > > kprakasam@confluent.io>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hey Jun,
> > > > > > > > > > >
> > > > > > > > > > > Thanks a lot for the great feedback! Please note that
> the
> > > > > design
> > > > > > > > > > > has changed a little bit on the KIP, and we now
> propagate
> > > the
> > > > > > > > finalized
> > > > > > > > > > > features metadata only via ZK watches (instead of
> > > > > > > > UpdateMetadataRequest
> > > > > > > > > > > from the controller).
> > > > > > > > > > >
> > > > > > > > > > > Please find below my response to your
> questions/feedback,
> > > > with
> > > > > > the
> > > > > > > > > prefix
> > > > > > > > > > > "(Kowshik):".
> > > > > > > > > > >
> > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > > > > 100.1 Since this request waits for responses from
> > > brokers,
> > > > > > should
> > > > > > > > we
> > > > > > > > > > add
> > > > > > > > > > > a
> > > > > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great point! Done. I have added a timeout
> > field.
> > > > > Note:
> > > > > > > we
> > > > > > > > no
> > > > > > > > > > > longer
> > > > > > > > > > > wait for responses from brokers, since the design has
> > been
> > > > > > changed
> > > > > > > so
> > > > > > > > > > that
> > > > > > > > > > > the
> > > > > > > > > > > features information is propagated via ZK.
> Nevertheless,
> > it
> > > > is
> > > > > > > right
> > > > > > > > to
> > > > > > > > > > > have a timeout
> > > > > > > > > > > for the request.
> > > > > > > > > > >
> > > > > > > > > > > > 100.2 The response schema is a bit weird. Typically,
> > the
> > > > > > response
> > > > > > > > > just
> > > > > > > > > > > > shows an error code and an error message, instead of
> > > > echoing
> > > > > > the
> > > > > > > > > > request.
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified it to
> just
> > > > return
> > > > > > an
> > > > > > > > > error
> > > > > > > > > > > code and a message.
> > > > > > > > > > > Previously it was not echoing the "request", rather it
> > was
> > > > > > > returning
> > > > > > > > > the
> > > > > > > > > > > latest set of
> > > > > > > > > > > cluster-wide finalized features (after applying the
> > > updates).
> > > > > But
> > > > > > > you
> > > > > > > > > are
> > > > > > > > > > > right,
> > > > > > > > > > > the additional info is not required, so I have removed
> it
> > > > from
> > > > > > the
> > > > > > > > > > response
> > > > > > > > > > > schema.
> > > > > > > > > > >
> > > > > > > > > > > > 100.3 Should we add a separate request to
> list/describe
> > > the
> > > > > > > > existing
> > > > > > > > > > > > features?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): This is already present in the KIP via the
> > > > > > > > > 'DescribeFeatures'
> > > > > > > > > > > Admin API,
> > > > > > > > > > > which, under the covers, uses the ApiVersionsRequest to
> > > > > > > list/describe
> > > > > > > > > the
> > > > > > > > > > > existing features. Please read the 'Tooling support'
> > > section.
> > > > > > > > > > >
> > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a
> > single
> > > > > > request.
> > > > > > > > For
> > > > > > > > > > > > DELETE, the version field doesn't make sense. So, I
> > guess
> > > > the
> > > > > > > > broker
> > > > > > > > > > just
> > > > > > > > > > > > ignores this? An alternative way is to have a
> separate
> > > > > > > > > > > DeleteFeaturesRequest
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great point! I have modified the KIP now to
> > > have 2
> > > > > > > > separate
> > > > > > > > > > > controller APIs
> > > > > > > > > > > serving these different purposes:
> > > > > > > > > > > 1. updateFeatures
> > > > > > > > > > > 2. deleteFeatures
> > > > > > > > > > >
> > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
> > > monotonically
> > > > > > > > > increasing
> > > > > > > > > > > > version of the metadata for finalized features." I am
> > > > > wondering
> > > > > > > why
> > > > > > > > > the
> > > > > > > > > > > > ordering is important?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): In the latest KIP write-up, it is called
> epoch
> > > > > > (instead
> > > > > > > of
> > > > > > > > > > > version), and
> > > > > > > > > > > it is just the ZK node version. Basically, this is the
> > > epoch
> > > > > for
> > > > > > > the
> > > > > > > > > > > cluster-wide
> > > > > > > > > > > finalized feature version metadata. This metadata is
> > served
> > > > to
> > > > > > > > clients
> > > > > > > > > > via
> > > > > > > > > > > the
> > > > > > > > > > > ApiVersionsResponse (for reads). We propagate updates
> > from
> > > > the
> > > > > > > > > > '/features'
> > > > > > > > > > > ZK node
> > > > > > > > > > > to all brokers, via ZK watches setup by each broker on
> > the
> > > > > > > > '/features'
> > > > > > > > > > > node.
> > > > > > > > > > >
> > > > > > > > > > > Now here is why the ordering is important:
> > > > > > > > > > > ZK watches don't propagate at the same time. As a
> result,
> > > the
> > > > > > > > > > > ApiVersionsResponse
> > > > > > > > > > > is eventually consistent across brokers. This can
> > introduce
> > > > > cases
> > > > > > > > > > > where clients see an older lower epoch of the features
> > > > > metadata,
> > > > > > > > after
> > > > > > > > > a
> > > > > > > > > > > more recent
> > > > > > > > > > > higher epoch was returned at a previous point in time.
> We
> > > > > expect
> > > > > > > > > clients
> > > > > > > > > > > to always employ the rule that the latest received
> higher
> > > > epoch
> > > > > > of
> > > > > > > > > > metadata
> > > > > > > > > > > always trumps an older smaller epoch. Those clients
> that
> > > are
> > > > > > > external
> > > > > > > > > to
> > > > > > > > > > > Kafka should strongly consider discovering the latest
> > > > metadata
> > > > > > once
> > > > > > > > > > during
> > > > > > > > > > > startup from the brokers, and if required refresh the
> > > > metadata
> > > > > > > > > > periodically
> > > > > > > > > > > (to get the latest metadata).
> > > > > > > > > > >
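> > > > > > > > > > > A minimal sketch of that client-side rule (illustrative names): the cached
> > > > > > > > > > > finalized-features metadata is only ever replaced by metadata carrying a
> > > > > > > > > > > strictly higher epoch, so late-arriving older epochs are ignored.
> > > > > > > > > > >
> > > > > > > > > > >     final class FinalizedFeaturesCache {
> > > > > > > > > > >         private long epoch = -1L;
> > > > > > > > > > >         private String metadata = null;  // opaque payload for this sketch
> > > > > > > > > > >
> > > > > > > > > > >         synchronized boolean maybeUpdate(long receivedEpoch,
> > > > > > > > > > >                                          String receivedMetadata) {
> > > > > > > > > > >             if (receivedEpoch <= epoch) {
> > > > > > > > > > >                 return false;  // stale or duplicate, keep the newer copy
> > > > > > > > > > >             }
> > > > > > > > > > >             epoch = receivedEpoch;
> > > > > > > > > > >             metadata = receivedMetadata;
> > > > > > > > > > >             return true;
> > > > > > > > > > >         }
> > > > > > > > > > >     }
> > > > > > > > > > >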
> > > > > > > > > > > > 100.6 Could you specify the required ACL for this new
> > > > > request?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): What is ACL, and how could I find out which
> > one
> > > to
> > > > > > > > specify?
> > > > > > > > > > > Please could you provide me some pointers? I'll be glad
> > to
> > > > > update
> > > > > > > the
> > > > > > > > > > > KIP once I know the next steps.
> > > > > > > > > > >
> > > > > > > > > > > > 101. For the broker registration ZK node, should we
> > bump
> > > up
> > > > > the
> > > > > > > > > version
> > > > > > > > > > > in
> > > > > > > > > > > the json?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great point! Done. I've increased the
> version
> > in
> > > > the
> > > > > > > > broker
> > > > > > > > > > json
> > > > > > > > > > > by 1.
> > > > > > > > > > >
> > > > > > > > > > > > 102. For the /features ZK node, not sure if we need
> the
> > > > epoch
> > > > > > > > field.
> > > > > > > > > > Each
> > > > > > > > > > > > ZK node has an internal version field that is
> > incremented
> > > > on
> > > > > > > every
> > > > > > > > > > > update.
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK node
> > version
> > > > > now,
> > > > > > > > > instead
> > > > > > > > > > of
> > > > > > > > > > > explicitly
> > > > > > > > > > > incremented epoch.
> > > > > > > > > > >
> > > > > > > > > > > > 103. "Enabling the actual semantics of a feature
> > version
> > > > > > > > cluster-wide
> > > > > > > > > > is
> > > > > > > > > > > > left to the discretion of the logic implementing the
> > > > feature
> > > > > > (ex:
> > > > > > > > can
> > > > > > > > > > be
> > > > > > > > > > > > done via dynamic broker config)." Does that mean the
> > > broker
> > > > > > > > > > registration
> > > > > > > > > > > ZK
> > > > > > > > > > > > node will be updated dynamically when this happens?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Not really. The text was just conveying
> that a
> > > > > broker
> > > > > > > > could
> > > > > > > > > > > "know" of
> > > > > > > > > > > a new feature version, but it does not mean the broker
> > > should
> > > > > > have
> > > > > > > > also
> > > > > > > > > > > activated the effects of the feature version. Knowing
> vs
> > > > > > activation
> > > > > > > > > are 2
> > > > > > > > > > > separate things,
> > > > > > > > > > > and the latter can be achieved by dynamic config. I
> have
> > > > > reworded
> > > > > > > the
> > > > > > > > > > text
> > > > > > > > > > > to
> > > > > > > > > > > make this clear to the reader.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > > > > 104.1 It would be useful to describe when the feature
> > > > > metadata
> > > > > > is
> > > > > > > > > > > included
> > > > > > > > > > > > in the request. My understanding is that it's only
> > > included
> > > > > if
> > > > > > > (1)
> > > > > > > > > > there
> > > > > > > > > > > is
> > > > > > > > > > > > a change to the finalized feature; (2) broker
> restart;
> > > (3)
> > > > > > > > controller
> > > > > > > > > > > > failover.
> > > > > > > > > > > > 104.2 The new fields have the following versions. Why
> > are
> > > > the
> > > > > > > > > versions
> > > > > > > > > > 3+
> > > > > > > > > > > > when the top version is bumped to 6?
> > > > > > > > > > > >       "fields":  [
> > > > > > > > > > > >         {"name": "Name", "type":  "string",
> "versions":
> > > > > "3+",
> > > > > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > > > > >         {"name":  "Version", "type":  "int64",
> > > "versions":
> > > > > > "3+",
> > > > > > > > > > > >           "about": "The finalized version for the
> > > > feature."}
> > > > > > > > > > > >       ]
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): With the new improved design, we have
> > completely
> > > > > > > > eliminated
> > > > > > > > > > the
> > > > > > > > > > > need to
> > > > > > > > > > > use UpdateMetadataRequest. This is because we now rely
> on
> > > ZK
> > > > to
> > > > > > > > deliver
> > > > > > > > > > the
> > > > > > > > > > > notifications for changes to the '/features' ZK node.
> > > > > > > > > > >
> > > > > > > > > > > > 105. kafka-features.sh: Instead of using
> update/delete,
> > > > > perhaps
> > > > > > > > it's
> > > > > > > > > > > better
> > > > > > > > > > > > to use enable/disable?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): For delete, yes, I have changed it so that
> we
> > > > > instead
> > > > > > > call
> > > > > > > > > it
> > > > > > > > > > > 'disable'.
> > > > > > > > > > > However for 'update', it can now also refer to either
> an
> > > > > upgrade
> > > > > > > or a
> > > > > > > > > > > forced downgrade.
> > > > > > > > > > > Therefore, I have left it the way it is, just calling it 'update'.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Kowshik
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <
> > jun@confluent.io>
> > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi, Kowshik,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the KIP. Looks good overall. A few
> comments
> > > > below.
> > > > > > > > > > > >
> > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > > > > 100.1 Since this request waits for responses from
> > > brokers,
> > > > > > should
> > > > > > > > we
> > > > > > > > > > add
> > > > > > > > > > > a
> > > > > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > > > > > 100.2 The response schema is a bit weird. Typically,
> > the
> > > > > > response
> > > > > > > > > just
> > > > > > > > > > > > shows an error code and an error message, instead of
> > > > echoing
> > > > > > the
> > > > > > > > > > request.
> > > > > > > > > > > > 100.3 Should we add a separate request to
> list/describe
> > > the
> > > > > > > > existing
> > > > > > > > > > > > features?
> > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a
> > single
> > > > > > request.
> > > > > > > > For
> > > > > > > > > > > > DELETE, the version field doesn't make sense. So, I
> > guess
> > > > the
> > > > > > > > broker
> > > > > > > > > > just
> > > > > > > > > > > > ignores this? An alternative way is to have a
> separate
> > > > > > > > > > > > DeleteFeaturesRequest
> > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
> > > monotonically
> > > > > > > > > increasing
> > > > > > > > > > > > version of the metadata for finalized features." I am
> > > > > wondering
> > > > > > > why
> > > > > > > > > the
> > > > > > > > > > > > ordering is important?
> > > > > > > > > > > > 100.6 Could you specify the required ACL for this new
> > > > > request?
> > > > > > > > > > > >
> > > > > > > > > > > > 101. For the broker registration ZK node, should we
> > bump
> > > up
> > > > > the
> > > > > > > > > version
> > > > > > > > > > > in
> > > > > > > > > > > > the json?
> > > > > > > > > > > >
> > > > > > > > > > > > 102. For the /features ZK node, not sure if we need
> the
> > > > epoch
> > > > > > > > field.
> > > > > > > > > > Each
> > > > > > > > > > > > ZK node has an internal version field that is
> > incremented
> > > > on
> > > > > > > every
> > > > > > > > > > > update.
> > > > > > > > > > > >
> > > > > > > > > > > > 103. "Enabling the actual semantics of a feature
> > version
> > > > > > > > cluster-wide
> > > > > > > > > > is
> > > > > > > > > > > > left to the discretion of the logic implementing the
> > > > feature
> > > > > > (ex:
> > > > > > > > can
> > > > > > > > > > be
> > > > > > > > > > > > done via dynamic broker config)." Does that mean the
> > > broker
> > > > > > > > > > registration
> > > > > > > > > > > ZK
> > > > > > > > > > > > node will be updated dynamically when this happens?
> > > > > > > > > > > >
> > > > > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > > > > 104.1 It would be useful to describe when the feature
> > > > > metadata
> > > > > > is
> > > > > > > > > > > included
> > > > > > > > > > > > in the request. My understanding is that it's only
> > > included
> > > > > if
> > > > > > > (1)
> > > > > > > > > > there
> > > > > > > > > > > is
> > > > > > > > > > > > a change to the finalized feature; (2) broker
> restart;
> > > (3)
> > > > > > > > controller
> > > > > > > > > > > > failover.
> > > > > > > > > > > > 104.2 The new fields have the following versions. Why
> > are
> > > > the
> > > > > > > > > versions
> > > > > > > > > > 3+
> > > > > > > > > > > > when the top version is bumped to 6?
> > > > > > > > > > > >       "fields":  [
> > > > > > > > > > > >         {"name": "Name", "type":  "string",
> "versions":
> > > > > "3+",
> > > > > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > > > > >         {"name":  "Version", "type":  "int64",
> > > "versions":
> > > > > > "3+",
> > > > > > > > > > > >           "about": "The finalized version for the
> > > > feature."}
> > > > > > > > > > > >       ]
> > > > > > > > > > > >
> > > > > > > > > > > > 105. kafka-features.sh: Instead of using
> update/delete,
> > > > > perhaps
> > > > > > > > it's
> > > > > > > > > > > better
> > > > > > > > > > > > to use enable/disable?
> > > > > > > > > > > >
> > > > > > > > > > > > Jun
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > > > > > > > > kprakasam@confluent.io
> > > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hey Boyang,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the great feedback! I have updated the
> KIP
> > > > based
> > > > > > on
> > > > > > > > your
> > > > > > > > > > > > > feedback.
> > > > > > > > > > > > > Please find my response below for your comments,
> look
> > > for
> > > > > > > > sentences
> > > > > > > > > > > > > starting
> > > > > > > > > > > > > with "(Kowshik)" below.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. "When is it safe for the brokers to begin
> > handling
> > > > EOS
> > > > > > > > > traffic"
> > > > > > > > > > > > could
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > converted as "When is it safe for the brokers to
> > > start
> > > > > > > serving
> > > > > > > > > new
> > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not
> > > explained
> > > > > > > earlier
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > > > context.
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. In the *Explanation *section, the metadata
> > version
> > > > > > number
> > > > > > > > part
> > > > > > > > > > > > seems a
> > > > > > > > > > > > > > bit blurred. Could you point a reference to later
> > > > section
> > > > > > > that
> > > > > > > > we
> > > > > > > > > > > going
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > store it in Zookeeper and update it every time
> when
> > > > there
> > > > > > is
> > > > > > > a
> > > > > > > > > > > feature
> > > > > > > > > > > > > > change?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great point! Done. I've added a
> reference
> > in
> > > > the
> > > > > > > KIP.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 3. For the feature downgrade, although it's a
> > > Non-goal
> > > > of
> > > > > > the
> > > > > > > > > KIP,
> > > > > > > > > > > for
> > > > > > > > > > > > > > features such as group coordinator semantics,
> there
> > > is
> > > > no
> > > > > > > legal
> > > > > > > > > > > > scenario
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > perform a downgrade at all. So having downgrade
> > door
> > > > open
> > > > > > is
> > > > > > > > > pretty
> > > > > > > > > > > > > > error-prone as human faults happen all the time.
> > I'm
> > > > > > assuming
> > > > > > > > as
> > > > > > > > > > new
> > > > > > > > > > > > > > features are implemented, it's not very hard to
> > add a
> > > > > flag
> > > > > > > > during
> > > > > > > > > > > > feature
> > > > > > > > > > > > > > creation to indicate whether this feature is
> > > > > > "downgradable".
> > > > > > > > > Could
> > > > > > > > > > > you
> > > > > > > > > > > > > > explain a bit more on the extra engineering
> effort
> > > for
> > > > > > > shipping
> > > > > > > > > > this
> > > > > > > > > > > > KIP
> > > > > > > > > > > > > > with downgrade protection in place?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great point! I'd agree and disagree
> here.
> > > > While
> > > > > I
> > > > > > > > agree
> > > > > > > > > > that
> > > > > > > > > > > > > accidental
> > > > > > > > > > > > > downgrades can cause problems, I also think
> sometimes
> > > > > > > downgrades
> > > > > > > > > > should
> > > > > > > > > > > > > be allowed for emergency reasons (not all
> downgrades
> > > > cause
> > > > > > > > issues).
> > > > > > > > > > > > > It is just subjective to the feature being
> > downgraded.
> > > > > > > > > > > > >
> > > > > > > > > > > > > To be more strict about feature version
> downgrades, I
> > > > have
> > > > > > > > modified
> > > > > > > > > > the
> > > > > > > > > > > > KIP
> > > > > > > > > > > > > proposing that we mandate a `--force-downgrade`
> flag
> > be
> > > > > used
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > > > UPDATE_FEATURES api
> > > > > > > > > > > > > and the tooling, whenever the human is downgrading
> a
> > > > > > finalized
> > > > > > > > > > feature
> > > > > > > > > > > > > version.
> > > > > > > > > > > > > Hopefully this should cover the requirement, until
> we
> > > > find
> > > > > > the
> > > > > > > > need
> > > > > > > > > > for
> > > > > > > > > > > > > advanced downgrade support.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of feature
> > > > > versions
> > > > > > > will
> > > > > > > > > be
> > > > > > > > > > > > > defined
> > > > > > > > > > > > > > in the broker code." So this means in order to
> > > > restrict a
> > > > > > > > certain
> > > > > > > > > > > > > feature,
> > > > > > > > > > > > > > we need to start the broker first and then send a
> > > > feature
> > > > > > > > gating
> > > > > > > > > > > > request
> > > > > > > > > > > > > > immediately, which introduces a time gap and the
> > > > > > > > > intended-to-close
> > > > > > > > > > > > > feature
> > > > > > > > > > > > > > could actually serve request during this phase.
> Do
> > > you
> > > > > > think
> > > > > > > we
> > > > > > > > > > > should
> > > > > > > > > > > > > also
> > > > > > > > > > > > > > support configurations as well so that admin user
> > > could
> > > > > > > freely
> > > > > > > > > roll
> > > > > > > > > > > up
> > > > > > > > > > > > a
> > > > > > > > > > > > > > cluster with all nodes complying the same feature
> > > > gating,
> > > > > > > > without
> > > > > > > > > > > > > worrying
> > > > > > > > > > > > > > about the turnaround time to propagate the
> message
> > > only
> > > > > > after
> > > > > > > > the
> > > > > > > > > > > > cluster
> > > > > > > > > > > > > > starts up?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): This is a great point/question. One of
> the
> > > > > > > > expectations
> > > > > > > > > > out
> > > > > > > > > > > of
> > > > > > > > > > > > > this KIP, which is
> > > > > > > > > > > > > already followed in the broker, is the following.
> > > > > > > > > > > > >  - Imagine at time T1 the broker starts up and
> > > registers
> > > > > it’s
> > > > > > > > > > presence
> > > > > > > > > > > in
> > > > > > > > > > > > > ZK,
> > > > > > > > > > > > >    along with advertising it’s supported features.
> > > > > > > > > > > > >  - Imagine at a future time T2 the broker receives
> > the
> > > > > > > > > > > > > UpdateMetadataRequest
> > > > > > > > > > > > >    from the controller, which contains the latest
> > > > finalized
> > > > > > > > > features
> > > > > > > > > > as
> > > > > > > > > > > > > seen by
> > > > > > > > > > > > >    the controller. The broker validates this data
> > > against
> > > > > > it’s
> > > > > > > > > > > supported
> > > > > > > > > > > > > features to
> > > > > > > > > > > > >    make sure there is no mismatch (it will shutdown
> > if
> > > > > there
> > > > > > is
> > > > > > > > an
> > > > > > > > > > > > > incompatibility).
> > > > > > > > > > > > >
> > > > > > > > > > > > > It is expected that during the time between the 2
> > > events
> > > > T1
> > > > > > and
> > > > > > > > T2,
> > > > > > > > > > the
> > > > > > > > > > > > > broker is
> > > > > > > > > > > > > almost a silent entity in the cluster. It does not
> > add
> > > > any
> > > > > > > value
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > > > cluster, or carry
> > > > > > > > > > > > > out any important broker activities. By
> “important”,
> > I
> > > > mean
> > > > > > it
> > > > > > > is
> > > > > > > > > not
> > > > > > > > > > > > doing
> > > > > > > > > > > > > mutations
> > > > > > > > > > > > > on it’s persistence, not mutating critical
> in-memory
> > > > state,
> > > > > > > won’t
> > > > > > > > > be
> > > > > > > > > > > > > serving
> > > > > > > > > > > > > produce/fetch requests. Note it doesn’t even know
> > it’s
> > > > > > assigned
> > > > > > > > > > > > partitions
> > > > > > > > > > > > > until
> > > > > > > > > > > > > it receives UpdateMetadataRequest from controller.
> > > > Anything
> > > > > > the
> > > > > > > > > > broker
> > > > > > > > > > > is
> > > > > > > > > > > > > doing up
> > > > > > > > > > > > > until this point is not damaging/useful.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I’ve clarified the above in the KIP, see this new
> > > > section:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > > > > > > > > .
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 5. "adding a new Feature, updating or deleting an
> > > > > existing
> > > > > > > > > > Feature",
> > > > > > > > > > > > may
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > I misunderstood something, I thought the features
> > are
> > > > > > defined
> > > > > > > > in
> > > > > > > > > > > broker
> > > > > > > > > > > > > > code, so admin could not really create a new
> > feature?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great point! You understood this right.
> > Here
> > > > > > adding
> > > > > > > a
> > > > > > > > > > > feature
> > > > > > > > > > > > > means we are
> > > > > > > > > > > > > adding a cluster-wide finalized *max* version for a
> > > > feature
> > > > > > > that
> > > > > > > > > was
> > > > > > > > > > > > > previously never finalized.
> > > > > > > > > > > > > I have clarified this in the KIP now.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP
> > adding
> > > > the
> > > > > > > above
> > > > > > > > > (see
> > > > > > > > > > > > > 'Tooling support -> Admin API changes').
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 7. I think we haven't discussed the alternative
> > > > solution
> > > > > to
> > > > > > > > pass
> > > > > > > > > > the
> > > > > > > > > > > > > > feature information through Zookeeper. Is that
> > > > mentioned
> > > > > in
> > > > > > > the
> > > > > > > > > KIP
> > > > > > > > > > > to
> > > > > > > > > > > > > > justify why using UpdateMetadata is more
> favorable?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Nice question! The broker reads
> finalized
> > > > > feature
> > > > > > > info
> > > > > > > > > > > stored
> > > > > > > > > > > > in
> > > > > > > > > > > > > ZK,
> > > > > > > > > > > > > only during startup when it does a validation. When
> > > > serving
> > > > > > > > > > > > > `ApiVersionsRequest`, the
> > > > > > > > > > > > > broker does not read this info from ZK directly.
> I'd
> > > > > imagine
> > > > > > > the
> > > > > > > > > risk
> > > > > > > > > > > is
> > > > > > > > > > > > > that it can increase
> > > > > > > > > > > > > the ZK read QPS which can be a bottleneck for the
> > > system.
> > > > > > > Today,
> > > > > > > > in
> > > > > > > > > > > Kafka
> > > > > > > > > > > > > we use the
> > > > > > > > > > > > > controller to fan out ZK updates to brokers and we
> > want
> > > > to
> > > > > > > stick
> > > > > > > > to
> > > > > > > > > > > that
> > > > > > > > > > > > > pattern to avoid
> > > > > > > > > > > > > the ZK read bottleneck when serving
> > > `ApiVersionsRequest`.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 8. I was under the impression that user could
> > > > configure a
> > > > > > > range
> > > > > > > > > of
> > > > > > > > > > > > > > supported versions, what's the trade-off for
> > allowing
> > > > > > single
> > > > > > > > > > > finalized
> > > > > > > > > > > > > > version only?
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great question! The finalized version
> of a
> > > > > feature
> > > > > > > > > > basically
> > > > > > > > > > > > > refers to
> > > > > > > > > > > > > the cluster-wide finalized feature "maximum"
> version.
> > > For
> > > > > > > > example,
> > > > > > > > > if
> > > > > > > > > > > the
> > > > > > > > > > > > > 'group_coordinator' feature
> > > > > > > > > > > > > has the finalized version set to 10, then, it means
> > > that
> > > > > > > > > cluster-wide
> > > > > > > > > > > all
> > > > > > > > > > > > > versions upto v10 are
> > > > > > > > > > > > > supported for this feature. However, note that if
> > some
> > > > > > version
> > > > > > > > (ex:
> > > > > > > > > > v0)
> > > > > > > > > > > > > gets deprecated
> > > > > > > > > > > > > for this feature, then we don’t convey that using
> > this
> > > > > scheme
> > > > > > > > (also
> > > > > > > > > > > > > supporting deprecation is a non-goal).
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): I’ve now modified the KIP at all points,
> > > > > referring
> > > > > > to
> > > > > > > > > > > finalized
> > > > > > > > > > > > > feature "maximum" versions.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 9. One minor syntax fix: Note that here the
> > "client"
> > > > here
> > > > > > may
> > > > > > > > be
> > > > > > > > > a
> > > > > > > > > > > > > producer
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > > > > > > > > reluctanthero104@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hey Kowshik,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > thanks for the revised KIP. Got a couple of
> > > questions:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. "When is it safe for the brokers to begin
> > handling
> > > > EOS
> > > > > > > > > traffic"
> > > > > > > > > > > > could
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > converted as "When is it safe for the brokers to
> > > start
> > > > > > > serving
> > > > > > > > > new
> > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not
> > > explained
> > > > > > > earlier
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > > > context.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. In the *Explanation *section, the metadata
> > version
> > > > > > number
> > > > > > > > part
> > > > > > > > > > > > seems a
> > > > > > > > > > > > > > bit blurred. Could you point a reference to later
> > > > section
> > > > > > > that
> > > > > > > > we
> > > > > > > > > > > going
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > store it in Zookeeper and update it every time
> when
> > > > there
> > > > > > is
> > > > > > > a
> > > > > > > > > > > feature
> > > > > > > > > > > > > > change?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 3. For the feature downgrade, although it's a
> > > Non-goal
> > > > of
> > > > > > the
> > > > > > > > > KIP,
> > > > > > > > > > > for
> > > > > > > > > > > > > > features such as group coordinator semantics,
> there
> > > is
> > > > no
> > > > > > > legal
> > > > > > > > > > > > scenario
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > perform a downgrade at all. So having downgrade
> > door
> > > > open
> > > > > > is
> > > > > > > > > pretty
> > > > > > > > > > > > > > error-prone as human faults happen all the time.
> > I'm
> > > > > > assuming
> > > > > > > > as
> > > > > > > > > > new
> > > > > > > > > > > > > > features are implemented, it's not very hard to
> > add a
> > > > > flag
> > > > > > > > during
> > > > > > > > > > > > feature
> > > > > > > > > > > > > > creation to indicate whether this feature is
> > > > > > "downgradable".
> > > > > > > > > Could
> > > > > > > > > > > you
> > > > > > > > > > > > > > explain a bit more on the extra engineering
> effort
> > > for
> > > > > > > shipping
> > > > > > > > > > this
> > > > > > > > > > > > KIP
> > > > > > > > > > > > > > with downgrade protection in place?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of feature
> > > > > versions
> > > > > > > will
> > > > > > > > > be
> > > > > > > > > > > > > defined
> > > > > > > > > > > > > > in the broker code." So this means in order to
> > > > restrict a
> > > > > > > > certain
> > > > > > > > > > > > > feature,
> > > > > > > > > > > > > > we need to start the broker first and then send a
> > > > feature
> > > > > > > > gating
> > > > > > > > > > > > request
> > > > > > > > > > > > > > immediately, which introduces a time gap and the
> > > > > > > > > intended-to-close
> > > > > > > > > > > > > feature
> > > > > > > > > > > > > > could actually serve request during this phase.
> Do
> > > you
> > > > > > think
> > > > > > > we
> > > > > > > > > > > should
> > > > > > > > > > > > > also
> > > > > > > > > > > > > > support configurations as well so that admin user
> > > could
> > > > > > > freely
> > > > > > > > > roll
> > > > > > > > > > > up
> > > > > > > > > > > > a
> > > > > > > > > > > > > > cluster with all nodes complying the same feature
> > > > gating,
> > > > > > > > without
> > > > > > > > > > > > > worrying
> > > > > > > > > > > > > > about the turnaround time to propagate the
> message
> > > only
> > > > > > after
> > > > > > > > the
> > > > > > > > > > > > cluster
> > > > > > > > > > > > > > starts up?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 5. "adding a new Feature, updating or deleting an
> > > > > existing
> > > > > > > > > > Feature",
> > > > > > > > > > > > may
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > I misunderstood something, I thought the features
> > are
> > > > > > defined
> > > > > > > > in
> > > > > > > > > > > broker
> > > > > > > > > > > > > > code, so admin could not really create a new
> > feature?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 7. I think we haven't discussed the alternative
> > > > solution
> > > > > to
> > > > > > > > pass
> > > > > > > > > > the
> > > > > > > > > > > > > > feature information through Zookeeper. Is that
> > > > mentioned
> > > > > in
> > > > > > > the
> > > > > > > > > KIP
> > > > > > > > > > > to
> > > > > > > > > > > > > > justify why using UpdateMetadata is more
> favorable?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 8. I was under the impression that user could
> > > > configure a
> > > > > > > range
> > > > > > > > > of
> > > > > > > > > > > > > > supported versions, what's the trade-off for
> > allowing
> > > > > > single
> > > > > > > > > > > finalized
> > > > > > > > > > > > > > version only?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 9. One minor syntax fix: Note that here the
> > "client"
> > > > here
> > > > > > may
> > > > > > > > be
> > > > > > > > > a
> > > > > > > > > > > > > producer
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Boyang
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> > > > > > > > cmccabe@apache.org
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik
> Prakasam
> > > > wrote:
> > > > > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the feedback! I've changed the KIP
> > to
> > > > > > address
> > > > > > > > your
> > > > > > > > > > > > > > > > suggestions.
> > > > > > > > > > > > > > > > Please find below my explanation. Here is a
> > link
> > > to
> > > > > KIP
> > > > > > > > 584:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. '__data_version__' is the version of the
> > > > finalized
> > > > > > > > feature
> > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > (i.e. actual ZK node contents), while the
> > > > > > > > > '__schema_version__'
> > > > > > > > > > is
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > version of the schema of the data persisted
> in
> > > ZK.
> > > > > > These
> > > > > > > > > serve
> > > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > purposes. '__data_version__' is is useful
> > mainly
> > > to
> > > > > > > clients
> > > > > > > > > > > during
> > > > > > > > > > > > > > reads,
> > > > > > > > > > > > > > > > to differentiate between the 2 versions of
> > > > eventually
> > > > > > > > > > consistent
> > > > > > > > > > > > > > > 'finalized
> > > > > > > > > > > > > > > > features' metadata (i.e. larger metadata
> > version
> > > is
> > > > > > more
> > > > > > > > > > recent).
> > > > > > > > > > > > > > > > '__schema_version__' provides an additional
> > > degree
> > > > of
> > > > > > > > > > > flexibility,
> > > > > > > > > > > > > > where
> > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > we decide to change the schema for
> '/features'
> > > node
> > > > > in
> > > > > > ZK
> > > > > > > > (in
> > > > > > > > > > the
> > > > > > > > > > > > > > > future),
> > > > > > > > > > > > > > > > then we can manage broker roll outs suitably
> > > (i.e.
> > > > > > > > > > > > > > > > serialization/deserialization of the ZK data
> > can
> > > be
> > > > > > > handled
> > > > > > > > > > > > safely).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Kowshik,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If you're talking about a number that lets you
> > know
> > > > if
> > > > > > data
> > > > > > > > is
> > > > > > > > > > more
> > > > > > > > > > > > or
> > > > > > > > > > > > > > > less recent, we would typically call that an
> > epoch,
> > > > and
> > > > > > > not a
> > > > > > > > > > > > version.
> > > > > > > > > > > > > > For
> > > > > > > > > > > > > > > the ZK data structures, the word "version" is
> > > > typically
> > > > > > > > > reserved
> > > > > > > > > > > for
> > > > > > > > > > > > > > > describing changes to the overall schema of the
> > > data
> > > > > that
> > > > > > > is
> > > > > > > > > > > written
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > ZooKeeper.  We don't even really change the
> > > "version"
> > > > > of
> > > > > > > > those
> > > > > > > > > > > > schemas
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > much, since most changes are
> > backwards-compatible.
> > > > But
> > > > > > we
> > > > > > > do
> > > > > > > > > > > include
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > version field just in case.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I don't think we really need an epoch here,
> > though,
> > > > > since
> > > > > > > we
> > > > > > > > > can
> > > > > > > > > > > just
> > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > at the broker epoch.  Whenever the broker
> > > registers,
> > > > > its
> > > > > > > > epoch
> > > > > > > > > > will
> > > > > > > > > > > > be
> > > > > > > > > > > > > > > greater than the previous broker epoch.  And
> the
> > > > newly
> > > > > > > > > registered
> > > > > > > > > > > > data
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > take priority.  This will be a lot simpler than
> > > > adding
> > > > > a
> > > > > > > > > separate
> > > > > > > > > > > > epoch
> > > > > > > > > > > > > > > system, I think.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. Regarding admin client needing min and max
> > > > > > > information -
> > > > > > > > > you
> > > > > > > > > > > are
> > > > > > > > > > > > > > > right!
> > > > > > > > > > > > > > > > I've changed the KIP such that the Admin API
> > also
> > > > > > allows
> > > > > > > > the
> > > > > > > > > > user
> > > > > > > > > > > > to
> > > > > > > > > > > > > > read
> > > > > > > > > > > > > > > > 'supported features' from a specific broker.
> > > Please
> > > > > > look
> > > > > > > at
> > > > > > > > > the
> > > > > > > > > > > > > section
> > > > > > > > > > > > > > > > "Admin API changes".
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 3. Regarding the use of `long` vs `Long` - it
> > was
> > > > not
> > > > > > > > > > deliberate.
> > > > > > > > > > > > > I've
> > > > > > > > > > > > > > > > improved the KIP to just use `long` at all
> > > places.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Sounds good.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool
> -
> > > you
> > > > > are
> > > > > > > > right!
> > > > > > > > > > > I've
> > > > > > > > > > > > > > > updated
> > > > > > > > > > > > > > > > the KIP sketching the functionality provided
> by
> > > > this
> > > > > > > tool,
> > > > > > > > > with
> > > > > > > > > > > > some
> > > > > > > > > > > > > > > > examples. Please look at the section "Tooling
> > > > support
> > > > > > > > > > examples".
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks, Kowshik.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin
> McCabe <
> > > > > > > > > > > cmccabe@apache.org>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > In the "Schema" section, do we really need
> > both
> > > > > > > > > > > > __schema_version__
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > __data_version__?  Can we just have a
> single
> > > > > version
> > > > > > > > field
> > > > > > > > > > > here?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Shouldn't the Admin(Client) function have
> > some
> > > > way
> > > > > to
> > > > > > > get
> > > > > > > > > the
> > > > > > > > > > > min
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > max
> > > > > > > > > > > > > > > > > information that we're exposing as well?  I
> > > guess
> > > > > we
> > > > > > > > could
> > > > > > > > > > have
> > > > > > > > > > > > > min,
> > > > > > > > > > > > > > > max,
> > > > > > > > > > > > > > > > > and current.  Unrelated: is the use of Long
> > > > rather
> > > > > > than
> > > > > > > > > long
> > > > > > > > > > > > > > deliberate
> > > > > > > > > > > > > > > > > here?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > It would be good to describe how the
> command
> > > line
> > > > > > tool
> > > > > > > > > > > > > > > > > kafka.admin.FeatureCommand will work.  For
> > > > example
> > > > > > the
> > > > > > > > > flags
> > > > > > > > > > > that
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > take and the output that it will generate
> to
> > > > > STDOUT.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik
> > > Prakasam
> > > > > > wrote:
> > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I've opened KIP-584
> > > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > is intended to provide a versioning
> scheme
> > > for
> > > > > > > > features.
> > > > > > > > > > I'd
> > > > > > > > > > > > like
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > this thread to discuss the same. I'd
> > > appreciate
> > > > > any
> > > > > > > > > > feedback
> > > > > > > > > > > on
> > > > > > > > > > > > > > this.
> > > > > > > > > > > > > > > > > > Here
> > > > > > > > > > > > > > > > > > is a link to KIP-584
> > > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > > > > > > >  .
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Jun,

Great question! Please find my response below.

> 200. My understanding is that If the CLI tool passes the
> '--allow-downgrade' flag when updating a specific feature, then a future
> downgrade is possible. Otherwise, the feature is now downgradable. If so,
I
> was wondering how the controller remembers this since it can be restarted
> over time?

(Kowshik): The purpose of the flag was to just restrict the user intent for
a specific request.
It seems to me that to avoid confusion, I could call the flag as
`--try-downgrade` instead.
Then this makes it clear, that, the controller just has to consider the ask
from
the user as an explicit request to attempt a downgrade.

The flag does not act as an override on controller's decision making that
decides whether
a flag is downgradable (these decisions on whether to allow a flag to be
downgraded
from a specific version level, can be embedded in the controller code).

Please let me know what you think.
Sorry if I misunderstood the original question.


Cheers,
Kowshik


On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <ju...@confluent.io> wrote:

> Hi, Kowshik,
>
> Thanks for the reply. Makes sense. Just one more question.
>
> 200. My understanding is that If the CLI tool passes the
> '--allow-downgrade' flag when updating a specific feature, then a future
> downgrade is possible. Otherwise, the feature is now downgradable. If so, I
> was wondering how the controller remembers this since it can be restarted
> over time?
>
> Jun
>
>
> On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hi Jun,
> >
> > Thanks a lot for the feedback and the questions!
> > Please find my response below.
> >
> > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It
> seems
> > > that field needs to be persisted somewhere in ZK?
> >
> > (Kowshik): Great question! Below is my explanation. Please help me
> > understand,
> > if you feel there are cases where we would need to still persist it in
> ZK.
> >
> > Firstly I have updated my thoughts into the KIP now, under the
> 'guidelines'
> > section:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> >
> > The allowDowngrade boolean field is just to restrict the user intent, and
> > to remind
> > them to double check their intent before proceeding. It should be set to
> > true
> > by the user in a request, only when the user intent is to forcefully
> > "attempt" a
> > downgrade of a specific feature's max version level, to the provided
> value
> > in
> > the request.
> >
> > We can extend this safeguard. The controller (on it's end) can maintain
> > rules in the code, that, for safety reasons would outright reject certain
> > downgrades
> > from a specific max_version_level for a specific feature. Such rejections
> > may
> > happen depending on the feature being downgraded, and from what version
> > level.
> >
> > The CLI tool only allows a downgrade attempt in conjunction with specific
> > flags and sub-commands. For example, in the CLI tool, if the user uses
> the
> > 'downgrade-all' command, or passes '--allow-downgrade' flag when
> updating a
> > specific feature, only then the tool will translate this ask to setting
> > 'allowDowngrade' field in the request to the server.
> >
> > > 201. UpdateFeaturesResponse has the following top level fields. Should
> > > those fields be per feature?
> > >
> > >   "fields": [
> > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> > >       "about": "The error code, or 0 if there was no error." },
> > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
> > >       "about": "The error message, or null if there was no error." }
> > >   ]
> >
> > (Kowshik): Great question!
> > As such, the API is transactional, as explained in the sections linked
> > below.
> > Either all provided FeatureUpdate was applied, or none.
> > It's the reason I felt we can have just one error code + message.
> > Happy to extend this if you feel otherwise. Please let me know.
> >
> > Link to sections:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees
> >
> > > 202. The /features path in ZK has a field min_version_level. Which API
> > and
> > > tool can change that value?
> >
> > (Kowshik): Great question! Currently this cannot be modified by using the
> > API or the tool.
> > Feature version deprecation (by raising min_version_level) can be done
> only
> > by the Controller directly. The rationale is explained in this section:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Kowshik,
> > >
> > > Thanks for addressing those comments. Just a few more minor comments.
> > >
> > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It
> seems
> > > that field needs to be persisted somewhere in ZK?
> > >
> > > 201. UpdateFeaturesResponse has the following top level fields. Should
> > > those fields be per feature?
> > >
> > >   "fields": [
> > >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> > >       "about": "The error code, or 0 if there was no error." },
> > >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
> > >       "about": "The error message, or null if there was no error." }
> > >   ]
> > >
> > > 202. The /features path in ZK has a field min_version_level. Which API
> > and
> > > tool can change that value?
> > >
> > > Jun
> > >
> > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <
> kprakasam@confluent.io
> > >
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the feedback! I have updated the KIP-584 addressing your
> > > > comments.
> > > > Please find my response below.
> > > >
> > > > > 100.6 You can look for the sentence "This operation requires ALTER
> on
> > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > KafkaApis.authorize().
> > > >
> > > > (Kowshik): Done. Great point! For the newly introduced
> UPDATE_FEATURES
> > > api,
> > > > I have added a
> > > > requirement that AclOperation.ALTER is required on
> > ResourceType.CLUSTER.
> > > >
> > > > > 110. Keeping the feature version as int is probably fine. I just
> felt
> > > > that
> > > > > for some of the common user interactions, it's more convenient to
> > > > > relate that to a release version. For example, if a user wants to
> > > > downgrade
> > > > > to a release 2.5, it's easier for the user to use the tool like
> "tool
> > > > > --downgrade 2.5" instead of "tool --downgrade --feature X --version
> > 6".
> > > >
> > > > (Kowshik): Great point. Generally, maximum feature version levels are
> > not
> > > > downgradable after
> > > > they are finalized in the cluster. This is because, as a guideline
> > > bumping
> > > > feature version level usually is used mainly to convey important
> > breaking
> > > > changes.
> > > > Despite the above, there may be some extreme/rare cases where a user
> > > wants
> > > > to downgrade
> > > > all features to a specific previous release. The user may want to do
> > this
> > > > just
> > > > prior to rolling back a Kafka cluster to a previous release.
> > > >
> > > > To support the above, I have made a change to the KIP explaining that
> > the
> > > > CLI tool is versioned.
> > > > The CLI tool internally has knowledge about a map of features to
> their
> > > > respective max
> > > > versions supported by the Broker. The tool's knowledge of features
> and
> > > > their version values,
> > > > is limited to the version of the CLI tool itself i.e. the information
> > is
> > > > packaged into the CLI tool
> > > > when it is released. Whenever a Kafka release introduces a new
> feature
> > > > version, or modifies
> > > > an existing feature version, the CLI tool shall also be updated with
> > this
> > > > information,
> > > > Newer versions of the CLI tool will be released as part of the Kafka
> > > > releases.
> > > >
> > > > Therefore, to achieve the downgrade need, the user just needs to run
> > the
> > > > version of
> > > > the CLI tool that's part of the particular previous release that
> he/she
> > > is
> > > > downgrading to.
> > > > To help the user with this, there is a new command added to the CLI
> > tool
> > > > called `downgrade-all`.
> > > > This essentially downgrades max version levels of all features in the
> > > > cluster to the versions
> > > > known to the CLI tool internally.
> > > >
> > > > I have explained the above in the KIP under these sections:
> > > >
> > > > Tooling support (have explained that the CLI tool is versioned):
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> > > >
> > > > Regular CLI tool usage (please refer to point #3, and see the tooling
> > > > example)
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> > > >
> > > > > 110. Similarly, if the client library finds a feature mismatch with
> > the
> > > > broker,
> > > > > the client likely needs to log some error message for the user to
> > take
> > > > some
> > > > > actions. It's much more actionable if the error message is "upgrade
> > the
> > > > > broker to release version 2.6" than just "upgrade the broker to
> > feature
> > > > > version 7".
> > > >
> > > > (Kowshik): That's a really good point! If we use ints for feature
> > > versions,
> > > > the best
> > > > message that client can print for debugging is "broker doesn't
> support
> > > > feature version 7", and alongside that print the supported version
> > range
> > > > returned
> > > > by the broker. Then, does it sound reasonable that the user could
> then
> > > > reference
> > > > Kafka release logs to figure out which version of the broker release
> is
> > > > required
> > > > be deployed, to support feature version 7? I couldn't think of a
> better
> > > > strategy here.
> > > >
> > > > > 120. When should a developer bump up the version of a feature?
> > > >
> > > > (Kowshik): Great question! In the KIP, I have added a section:
> > > 'Guidelines
> > > > on feature versions and workflows'
> > > > providing some guidelines on when to use the versioned feature flags,
> > and
> > > > what
> > > > are the regular workflows with the CLI tool.
> > > >
> > > > Link to the relevant sections:
> > > > Guidelines:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> > > >
> > > > Regular CLI tool usage:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> > > >
> > > > Advanced CLI tool usage:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > >
> > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Kowshik,
> > > > >
> > > > > Thanks for the reply. A few more comments.
> > > > >
> > > > > 110. Keeping the feature version as int is probably fine. I just
> felt
> > > > that
> > > > > for some of the common user interactions, it's more convenient to
> > > > > relate that to a release version. For example, if a user wants to
> > > > downgrade
> > > > > to a release 2.5, it's easier for the user to use the tool like
> "tool
> > > > > --downgrade 2.5" instead of "tool --downgrade --feature X --version
> > 6".
> > > > > Similarly, if the client library finds a feature mismatch with the
> > > > broker,
> > > > > the client likely needs to log some error message for the user to
> > take
> > > > some
> > > > > actions. It's much more actionable if the error message is "upgrade
> > the
> > > > > broker to release version 2.6" than just "upgrade the broker to
> > feature
> > > > > version 7".
> > > > >
> > > > > 111. Sounds good.
> > > > >
> > > > > 120. When should a developer bump up the version of a feature?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <
> > > kprakasam@confluent.io
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > I have updated the KIP for the item 111.
> > > > > > I'm in the process of addressing 100.6, and will provide an
> update
> > > > soon.
> > > > > > I think item 110 is still under discussion given we are now
> > > providing a
> > > > > way
> > > > > > to finalize
> > > > > > all features to their latest version levels. In any case, please
> > let
> > > us
> > > > > > know
> > > > > > how you feel in response to Colin's comments on this topic.
> > > > > >
> > > > > > > 111. To put this in context, when we had IBP, the default value
> > is
> > > > the
> > > > > > > current released version. So, if you are a brand new user, you
> > > don't
> > > > > need
> > > > > > > to configure IBP and all new features will be immediately
> > available
> > > > in
> > > > > > the
> > > > > > > new cluster. If you are upgrading from an old version, you do
> > need
> > > to
> > > > > > > understand and configure IBP. I see a similar pattern here for
> > > > > > > features. From the ease of use perspective, ideally, we
> shouldn't
> > > > > require
> > > > > > a
> > > > > > > new user to have an extra step such as running a bootstrap
> script
> > > > > unless
> > > > > > > it's truly necessary. If someone has a special need (all the
> > cases
> > > > you
> > > > > > > mentioned seem special cases?), they can configure a mode such
> > that
> > > > > > > features are enabled/disabled manually.
> > > > > >
> > > > > > (Kowshik): That makes sense, thanks for the idea! Sorry if I
> didn't
> > > > > > understand
> > > > > > this need earlier. I have updated the KIP with the approach that
> > > > whenever
> > > > > > the '/features' node is absent, the controller by default will
> > > > bootstrap
> > > > > > the node
> > > > > > to contain the latest feature levels. Here is the new section in
> > the
> > > > KIP
> > > > > > describing
> > > > > > the same:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> > > > > >
> > > > > > Next, as I explained in my response to Colin's suggestions, we
> are
> > > now
> > > > > > providing a `--finalize-latest-features` flag with the tooling.
> > This
> > > > lets
> > > > > > the sysadmin finalize all features known to the controller to
> their
> > > > > latest
> > > > > > version
> > > > > > levels. Please look at this section (point #3 and the tooling
> > example
> > > > > > later):
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> > > > > >
> > > > > >
> > > > > > Do you feel this addresses your comment/concern?
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io>
> wrote:
> > > > > >
> > > > > > > Hi, Kowshik,
> > > > > > >
> > > > > > > Thanks for the reply. A few more replies below.
> > > > > > >
> > > > > > > 100.6 You can look for the sentence "This operation requires
> > ALTER
> > > on
> > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > > > KafkaApis.authorize().
> > > > > > >
> > > > > > > 110. From the external client/tooling perspective, it's more
> > > natural
> > > > to
> > > > > > use
> > > > > > > the release version for features. If we can use the same
> release
> > > > > version
> > > > > > > for internal representation, it seems simpler (easier to
> > > understand,
> > > > no
> > > > > > > mapping overhead, etc). Is there a benefit with separate
> external
> > > and
> > > > > > > internal versioning schemes?
> > > > > > >
> > > > > > > 111. To put this in context, when we had IBP, the default value
> > is
> > > > the
> > > > > > > current released version. So, if you are a brand new user, you
> > > don't
> > > > > need
> > > > > > > to configure IBP and all new features will be immediately
> > available
> > > > in
> > > > > > the
> > > > > > > new cluster. If you are upgrading from an old version, you do
> > need
> > > to
> > > > > > > understand and configure IBP. I see a similar pattern here for
> > > > > > > features. From the ease of use perspective, ideally, we
> shouldn't
> > > > > > require a
> > > > > > > new user to have an extra step such as running a bootstrap
> script
> > > > > unless
> > > > > > > it's truly necessary. If someone has a special need (all the
> > cases
> > > > you
> > > > > > > mentioned seem special cases?), they can configure a mode such
> > that
> > > > > > > features are enabled/disabled manually.
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
> > > > > kprakasam@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > Thanks for the feedback and suggestions. Please find my
> > response
> > > > > below.
> > > > > > > >
> > > > > > > > > 100.6 For every new request, the admin needs to control who
> > is
> > > > > > allowed
> > > > > > > to
> > > > > > > > > issue that request if security is enabled. So, we need to
> > > assign
> > > > > the
> > > > > > > new
> > > > > > > > > request a ResourceType and possible AclOperations. See
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > > as an example.
> > > > > > > >
> > > > > > > > (Kowshik): I don't see any reference to the words
> ResourceType
> > or
> > > > > > > > AclOperations
> > > > > > > > in the KIP. Please let me know how I can use the KIP that you
> > > > linked
> > > > > to
> > > > > > > > know how to
> > > > > > > > setup the appropriate ResourceType and/or ClusterOperation?
> > > > > > > >
> > > > > > > > > 105. If we change delete to disable, it's better to do this
> > > > > > > consistently
> > > > > > > > in
> > > > > > > > > request protocol and admin api as well.
> > > > > > > >
> > > > > > > > (Kowshik): The API shouldn't be called 'disable' when it is
> > > > deleting
> > > > > a
> > > > > > > > feature.
> > > > > > > > I've just changed the KIP to use 'delete'. I don't have a
> > strong
> > > > > > > > preference.
> > > > > > > >
> > > > > > > > > 110. The minVersion/maxVersion for features use int64.
> > > Currently,
> > > > > our
> > > > > > > > > release version schema is major.minor.bugfix (e.g. 2.5.0).
> > It's
> > > > > > > possible
> > > > > > > > > for new features to be included in minor releases too.
> Should
> > > we
> > > > > make
> > > > > > > the
> > > > > > > > > feature versioning match the release versioning?
> > > > > > > >
> > > > > > > > (Kowshik): The release version can be mapped to a set of
> > feature
> > > > > > > versions,
> > > > > > > > and this can be done, for example in the tool (or even
> external
> > > to
> > > > > the
> > > > > > > > tool).
> > > > > > > > Can you please clarify what I'm missing?
> > > > > > > >
> > > > > > > > > 111. "During regular operations, the data in the ZK node
> can
> > be
> > > > > > mutated
> > > > > > > > > only via a specific admin API served only by the
> > controller." I
> > > > am
> > > > > > > > > wondering why can't the controller auto finalize a feature
> > > > version
> > > > > > > after
> > > > > > > > > all brokers are upgraded? For new users who download the
> > latest
> > > > > > version
> > > > > > > > to
> > > > > > > > > build a new cluster, it's inconvenient for them to have to
> > > > manually
> > > > > > > > enable
> > > > > > > > > each feature.
> > > > > > > >
> > > > > > > > (Kowshik): I agree that there is a trade-off here, but it
> will
> > > help
> > > > > > > > to decide whether the automation can be thought through in
> the
> > > > future
> > > > > > > > in a follow up KIP, or right now in this KIP. We may invest
> > > > > > > > in automation, but we have to decide whether we should do it
> > > > > > > > now or later.
> > > > > > > >
> > > > > > > > For the inconvenience that you mentioned, do you think the
> > > problem
> > > > > that
> > > > > > > you
> > > > > > > > mentioned can be  overcome by asking for the cluster operator
> > to
> > > > run
> > > > > a
> > > > > > > > bootstrap script  when he/she knows that a specific AK
> release
> > > has
> > > > > been
> > > > > > > > almost completely deployed in a cluster for the first time?
> > Idea
> > > is
> > > > > > that
> > > > > > > > the
> > > > > > > > bootstrap script will know how to map a specific AK release
> to
> > > > > > finalized
> > > > > > > > feature versions, and run the `kafka-features.sh` tool
> > > > appropriately
> > > > > > > > against
> > > > > > > > the cluster.
> > > > > > > >
> > > > > > > > Now, coming back to your automation proposal/question.
> > > > > > > > I do see the value of automated feature version finalization,
> > > but I
> > > > > > also
> > > > > > > > see
> > > > > > > > that this will open up several questions and some risks, as
> > > > explained
> > > > > > > > below.
> > > > > > > > The answers to these depend on the definition of the
> automation
> > > we
> > > > > > choose
> > > > > > > > to build, and how well does it fit into a kafka deployment.
> > > > > > > > Basically, it can be unsafe for the controller to finalize
> > > feature
> > > > > > > version
> > > > > > > > upgrades automatically, without learning about the intent of
> > the
> > > > > > cluster
> > > > > > > > operator.
> > > > > > > > 1. We would sometimes want to lock feature versions only when
> > we
> > > > have
> > > > > > > > externally verified
> > > > > > > > the stability of the broker binary.
> > > > > > > > 2. Sometimes only the cluster operator knows that a cluster
> > > upgrade
> > > > > is
> > > > > > > > complete,
> > > > > > > > and new brokers are highly unlikely to join the cluster.
> > > > > > > > 3. Only the cluster operator knows that the intent is to
> deploy
> > > the
> > > > > > same
> > > > > > > > version
> > > > > > > > of the new broker release across the entire cluster (i.e. the
> > > > latest
> > > > > > > > downloaded version).
> > > > > > > > 4. For downgrades, it appears the controller still needs some
> > > > > external
> > > > > > > > input
> > > > > > > > (such as the proposed tool) to finalize a feature version
> > > > downgrade.
> > > > > > > >
> > > > > > > > If we have automation, that automation can end up failing in
> > some
> > > > of
> > > > > > the
> > > > > > > > cases
> > > > > > > > above. Then, we need a way to declare that the cluster is
> "not
> > > > ready"
> > > > > > if
> > > > > > > > the
> > > > > > > > controller cannot automatically finalize some basic required
> > > > feature
> > > > > > > > version
> > > > > > > > upgrades across the cluster. We need to make the cluster
> > operator
> > > > > aware
> > > > > > > in
> > > > > > > > such a scenario (raise an alert or alike).
> > > > > > > >
> > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be
> 49
> > > > > instead
> > > > > > > of
> > > > > > > > 48.
> > > > > > > >
> > > > > > > > (Kowshik): Done.
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Kowshik
> > > > > > > >
> > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io>
> > > wrote:
> > > > > > > >
> > > > > > > > > Hi, Kowshik,
> > > > > > > > >
> > > > > > > > > Thanks for the reply. A few more comments below.
> > > > > > > > >
> > > > > > > > > 100.6 For every new request, the admin needs to control who
> > is
> > > > > > allowed
> > > > > > > to
> > > > > > > > > issue that request if security is enabled. So, we need to
> > > assign
> > > > > the
> > > > > > > new
> > > > > > > > > request a ResourceType and possible AclOperations. See
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > > as
> > > > > > > > > an example.
> > > > > > > > >
> > > > > > > > > 105. If we change delete to disable, it's better to do this
> > > > > > > consistently
> > > > > > > > in
> > > > > > > > > request protocol and admin api as well.
> > > > > > > > >
> > > > > > > > > 110. The minVersion/maxVersion for features use int64.
> > > Currently,
> > > > > our
> > > > > > > > > release version schema is major.minor.bugfix (e.g. 2.5.0).
> > It's
> > > > > > > possible
> > > > > > > > > for new features to be included in minor releases too.
> Should
> > > we
> > > > > make
> > > > > > > the
> > > > > > > > > feature versioning match the release versioning?
> > > > > > > > >
> > > > > > > > > 111. "During regular operations, the data in the ZK node
> can
> > be
> > > > > > mutated
> > > > > > > > > only via a specific admin API served only by the
> > controller." I
> > > > am
> > > > > > > > > wondering why can't the controller auto finalize a feature
> > > > version
> > > > > > > after
> > > > > > > > > all brokers are upgraded? For new users who download the
> > latest
> > > > > > version
> > > > > > > > to
> > > > > > > > > build a new cluster, it's inconvenient for them to have to
> > > > manually
> > > > > > > > enable
> > > > > > > > > each feature.
> > > > > > > > >
> > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be
> 49
> > > > > instead
> > > > > > > of
> > > > > > > > > 48.
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> > > > > > > kprakasam@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey Jun,
> > > > > > > > > >
> > > > > > > > > > Thanks a lot for the great feedback! Please note that the
> > > > design
> > > > > > > > > > has changed a little bit on the KIP, and we now propagate
> > the
> > > > > > > finalized
> > > > > > > > > > features metadata only via ZK watches (instead of
> > > > > > > UpdateMetadataRequest
> > > > > > > > > > from the controller).
> > > > > > > > > >
> > > > > > > > > > Please find below my response to your questions/feedback,
> > > with
> > > > > the
> > > > > > > > prefix
> > > > > > > > > > "(Kowshik):".
> > > > > > > > > >
> > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > > > 100.1 Since this request waits for responses from
> > brokers,
> > > > > should
> > > > > > > we
> > > > > > > > > add
> > > > > > > > > > a
> > > > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great point! Done. I have added a timeout
> field.
> > > > Note:
> > > > > > we
> > > > > > > no
> > > > > > > > > > longer
> > > > > > > > > > wait for responses from brokers, since the design has
> been
> > > > > changed
> > > > > > so
> > > > > > > > > that
> > > > > > > > > > the
> > > > > > > > > > features information is propagated via ZK. Nevertheless,
> it
> > > is
> > > > > > right
> > > > > > > to
> > > > > > > > > > have a timeout
> > > > > > > > > > for the request.
> > > > > > > > > >
> > > > > > > > > > > 100.2 The response schema is a bit weird. Typically,
> the
> > > > > response
> > > > > > > > just
> > > > > > > > > > > shows an error code and an error message, instead of
> > > echoing
> > > > > the
> > > > > > > > > request.
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great point! Yeah, I have modified it to just
> > > return
> > > > > an
> > > > > > > > error
> > > > > > > > > > code and a message.
> > > > > > > > > > Previously it was not echoing the "request", rather it
> was
> > > > > > returning
> > > > > > > > the
> > > > > > > > > > latest set of
> > > > > > > > > > cluster-wide finalized features (after applying the
> > updates).
> > > > But
> > > > > > you
> > > > > > > > are
> > > > > > > > > > right,
> > > > > > > > > > the additional info is not required, so I have removed it
> > > from
> > > > > the
> > > > > > > > > response
> > > > > > > > > > schema.
> > > > > > > > > >
> > > > > > > > > > > 100.3 Should we add a separate request to list/describe
> > the
> > > > > > > existing
> > > > > > > > > > > features?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): This is already present in the KIP via the
> > > > > > > > 'DescribeFeatures'
> > > > > > > > > > Admin API,
> > > > > > > > > > which, underneath covers uses the ApiVersionsRequest to
> > > > > > list/describe
> > > > > > > > the
> > > > > > > > > > existing features. Please read the 'Tooling support'
> > section.
> > > > > > > > > >
> > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a
> single
> > > > > request.
> > > > > > > For
> > > > > > > > > > > DELETE, the version field doesn't make sense. So, I
> guess
> > > the
> > > > > > > broker
> > > > > > > > > just
> > > > > > > > > > > ignores this? An alternative way is to have a separate
> > > > > > > > > > DeleteFeaturesRequest
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great point! I have modified the KIP now to
> > have 2
> > > > > > > separate
> > > > > > > > > > controller APIs
> > > > > > > > > > serving these different purposes:
> > > > > > > > > > 1. updateFeatures
> > > > > > > > > > 2. deleteFeatures
> > > > > > > > > >
> > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
> > monotonically
> > > > > > > > increasing
> > > > > > > > > > > version of the metadata for finalized features." I am
> > > > wondering
> > > > > > why
> > > > > > > > the
> > > > > > > > > > > ordering is important?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): In the latest KIP write-up, it is called epoch
> > > > > (instead
> > > > > > of
> > > > > > > > > > version), and
> > > > > > > > > > it is just the ZK node version. Basically, this is the
> > epoch
> > > > for
> > > > > > the
> > > > > > > > > > cluster-wide
> > > > > > > > > > finalized feature version metadata. This metadata is
> served
> > > to
> > > > > > > clients
> > > > > > > > > via
> > > > > > > > > > the
> > > > > > > > > > ApiVersionsResponse (for reads). We propagate updates
> from
> > > the
> > > > > > > > > '/features'
> > > > > > > > > > ZK node
> > > > > > > > > > to all brokers, via ZK watches setup by each broker on
> the
> > > > > > > '/features'
> > > > > > > > > > node.
> > > > > > > > > >
> > > > > > > > > > Now here is why the ordering is important:
> > > > > > > > > > ZK watches don't propagate at the same time. As a result,
> > the
> > > > > > > > > > ApiVersionsResponse
> > > > > > > > > > is eventually consistent across brokers. This can
> introduce
> > > > cases
> > > > > > > > > > where clients see an older lower epoch of the features
> > > > metadata,
> > > > > > > after
> > > > > > > > a
> > > > > > > > > > more recent
> > > > > > > > > > higher epoch was returned at a previous point in time. We
> > > > expect
> > > > > > > > clients
> > > > > > > > > > to always employ the rule that the latest received higher
> > > epoch
> > > > > of
> > > > > > > > > metadata
> > > > > > > > > > always trumps an older smaller epoch. Those clients that
> > are
> > > > > > external
> > > > > > > > to
> > > > > > > > > > Kafka should strongly consider discovering the latest
> > > metadata
> > > > > once
> > > > > > > > > during
> > > > > > > > > > startup from the brokers, and if required refresh the
> > > metadata
> > > > > > > > > periodically
> > > > > > > > > > (to get the latest metadata).
> > > > > > > > > >
> > > > > > > > > > > 100.6 Could you specify the required ACL for this new
> > > > request?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): What is ACL, and how could I find out which
> one
> > to
> > > > > > > specify?
> > > > > > > > > > Please could you provide me some pointers? I'll be glad
> to
> > > > update
> > > > > > the
> > > > > > > > > > KIP once I know the next steps.
> > > > > > > > > >
> > > > > > > > > > > 101. For the broker registration ZK node, should we
> bump
> > up
> > > > the
> > > > > > > > version
> > > > > > > > > > in
> > > > > > > > > > the json?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great point! Done. I've increased the version
> in
> > > the
> > > > > > > broker
> > > > > > > > > json
> > > > > > > > > > by 1.
> > > > > > > > > >
> > > > > > > > > > > 102. For the /features ZK node, not sure if we need the
> > > epoch
> > > > > > > field.
> > > > > > > > > Each
> > > > > > > > > > > ZK node has an internal version field that is
> incremented
> > > on
> > > > > > every
> > > > > > > > > > update.
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK node
> version
> > > > now,
> > > > > > > > instead
> > > > > > > > > of
> > > > > > > > > > explicitly
> > > > > > > > > > incremented epoch.
> > > > > > > > > >
> > > > > > > > > > > 103. "Enabling the actual semantics of a feature
> version
> > > > > > > cluster-wide
> > > > > > > > > is
> > > > > > > > > > > left to the discretion of the logic implementing the
> > > feature
> > > > > (ex:
> > > > > > > can
> > > > > > > > > be
> > > > > > > > > > > done via dynamic broker config)." Does that mean the
> > broker
> > > > > > > > > registration
> > > > > > > > > > ZK
> > > > > > > > > > > node will be updated dynamically when this happens?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Not really. The text was just conveying that a
> > > > broker
> > > > > > > could
> > > > > > > > > > "know" of
> > > > > > > > > > a new feature version, but it does not mean the broker
> > should
> > > > > have
> > > > > > > also
> > > > > > > > > > activated the effects of the feature version. Knowing vs
> > > > > activation
> > > > > > > > are 2
> > > > > > > > > > separate things,
> > > > > > > > > > and the latter can be achieved by dynamic config. I have
> > > > reworded
> > > > > > the
> > > > > > > > > text
> > > > > > > > > > to
> > > > > > > > > > make this clear to the reader.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > > > 104.1 It would be useful to describe when the feature
> > > > metadata
> > > > > is
> > > > > > > > > > included
> > > > > > > > > > > in the request. My understanding is that it's only
> > included
> > > > if
> > > > > > (1)
> > > > > > > > > there
> > > > > > > > > > is
> > > > > > > > > > > a change to the finalized feature; (2) broker restart;
> > (3)
> > > > > > > controller
> > > > > > > > > > > failover.
> > > > > > > > > > > 104.2 The new fields have the following versions. Why
> are
> > > the
> > > > > > > > versions
> > > > > > > > > 3+
> > > > > > > > > > > when the top version is bumped to 6?
> > > > > > > > > > >       "fields":  [
> > > > > > > > > > >         {"name": "Name", "type":  "string", "versions":
> > > > "3+",
> > > > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > > > >         {"name":  "Version", "type":  "int64",
> > "versions":
> > > > > "3+",
> > > > > > > > > > >           "about": "The finalized version for the
> > > feature."}
> > > > > > > > > > >       ]
> > > > > > > > > >
> > > > > > > > > > (Kowshik): With the new improved design, we have
> completely
> > > > > > > eliminated
> > > > > > > > > the
> > > > > > > > > > need to
> > > > > > > > > > use UpdateMetadataRequest. This is because we now rely on
> > ZK
> > > to
> > > > > > > deliver
> > > > > > > > > the
> > > > > > > > > > notifications for changes to the '/features' ZK node.
> > > > > > > > > >
> > > > > > > > > > > 105. kafka-features.sh: Instead of using update/delete,
> > > > perhaps
> > > > > > > it's
> > > > > > > > > > better
> > > > > > > > > > > to use enable/disable?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): For delete, yes, I have changed it so that we
> > > > instead
> > > > > > call
> > > > > > > > it
> > > > > > > > > > 'disable'.
> > > > > > > > > > However for 'update', it can now also refer to either an
> > > > upgrade
> > > > > > or a
> > > > > > > > > > forced downgrade.
> > > > > > > > > > Therefore, I have left it the way it is, just calling it
> as
> > > > just
> > > > > > > > > 'update'.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Kowshik
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <
> jun@confluent.io>
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi, Kowshik,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the KIP. Looks good overall. A few comments
> > > below.
> > > > > > > > > > >
> > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > > > 100.1 Since this request waits for responses from
> > brokers,
> > > > > should
> > > > > > > we
> > > > > > > > > add
> > > > > > > > > > a
> > > > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > > > > 100.2 The response schema is a bit weird. Typically,
> the
> > > > > response
> > > > > > > > just
> > > > > > > > > > > shows an error code and an error message, instead of
> > > echoing
> > > > > the
> > > > > > > > > request.
> > > > > > > > > > > 100.3 Should we add a separate request to list/describe
> > the
> > > > > > > existing
> > > > > > > > > > > features?
> > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a
> single
> > > > > request.
> > > > > > > For
> > > > > > > > > > > DELETE, the version field doesn't make sense. So, I
> guess
> > > the
> > > > > > > broker
> > > > > > > > > just
> > > > > > > > > > > ignores this? An alternative way is to have a separate
> > > > > > > > > > > DeleteFeaturesRequest
> > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
> > monotonically
> > > > > > > > increasing
> > > > > > > > > > > version of the metadata for finalized features." I am
> > > > wondering
> > > > > > why
> > > > > > > > the
> > > > > > > > > > > ordering is important?
> > > > > > > > > > > 100.6 Could you specify the required ACL for this new
> > > > request?
> > > > > > > > > > >
> > > > > > > > > > > 101. For the broker registration ZK node, should we
> bump
> > up
> > > > the
> > > > > > > > version
> > > > > > > > > > in
> > > > > > > > > > > the json?
> > > > > > > > > > >
> > > > > > > > > > > 102. For the /features ZK node, not sure if we need the
> > > epoch
> > > > > > > field.
> > > > > > > > > Each
> > > > > > > > > > > ZK node has an internal version field that is
> incremented
> > > on
> > > > > > every
> > > > > > > > > > update.
> > > > > > > > > > >
> > > > > > > > > > > 103. "Enabling the actual semantics of a feature
> version
> > > > > > > cluster-wide
> > > > > > > > > is
> > > > > > > > > > > left to the discretion of the logic implementing the
> > > feature
> > > > > (ex:
> > > > > > > can
> > > > > > > > > be
> > > > > > > > > > > done via dynamic broker config)." Does that mean the
> > broker
> > > > > > > > > registration
> > > > > > > > > > ZK
> > > > > > > > > > > node will be updated dynamically when this happens?
> > > > > > > > > > >
> > > > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > > > 104.1 It would be useful to describe when the feature
> > > > metadata
> > > > > is
> > > > > > > > > > included
> > > > > > > > > > > in the request. My understanding is that it's only
> > included
> > > > if
> > > > > > (1)
> > > > > > > > > there
> > > > > > > > > > is
> > > > > > > > > > > a change to the finalized feature; (2) broker restart;
> > (3)
> > > > > > > controller
> > > > > > > > > > > failover.
> > > > > > > > > > > 104.2 The new fields have the following versions. Why
> are
> > > the
> > > > > > > > versions
> > > > > > > > > 3+
> > > > > > > > > > > when the top version is bumped to 6?
> > > > > > > > > > >       "fields":  [
> > > > > > > > > > >         {"name": "Name", "type":  "string", "versions":
> > > > "3+",
> > > > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > > > >         {"name":  "Version", "type":  "int64",
> > "versions":
> > > > > "3+",
> > > > > > > > > > >           "about": "The finalized version for the
> > > feature."}
> > > > > > > > > > >       ]
> > > > > > > > > > >
> > > > > > > > > > > 105. kafka-features.sh: Instead of using update/delete,
> > > > perhaps
> > > > > > > it's
> > > > > > > > > > better
> > > > > > > > > > > to use enable/disable?
> > > > > > > > > > >
> > > > > > > > > > > Jun
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > > > > > > > kprakasam@confluent.io
> > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey Boyang,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the great feedback! I have updated the KIP
> > > based
> > > > > on
> > > > > > > your
> > > > > > > > > > > > feedback.
> > > > > > > > > > > > Please find my response below for your comments, look
> > for
> > > > > > > sentences
> > > > > > > > > > > > starting
> > > > > > > > > > > > with "(Kowshik)" below.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > 1. "When is it safe for the brokers to begin
> handling
> > > EOS
> > > > > > > > traffic"
> > > > > > > > > > > could
> > > > > > > > > > > > be
> > > > > > > > > > > > > converted as "When is it safe for the brokers to
> > start
> > > > > > serving
> > > > > > > > new
> > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not
> > explained
> > > > > > earlier
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > > > context.
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > > > >
> > > > > > > > > > > > > 2. In the *Explanation *section, the metadata
> version
> > > > > number
> > > > > > > part
> > > > > > > > > > > seems a
> > > > > > > > > > > > > bit blurred. Could you point a reference to later
> > > section
> > > > > > that
> > > > > > > we
> > > > > > > > > > going
> > > > > > > > > > > > to
> > > > > > > > > > > > > store it in Zookeeper and update it every time when
> > > there
> > > > > is
> > > > > > a
> > > > > > > > > > feature
> > > > > > > > > > > > > change?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great point! Done. I've added a reference
> in
> > > the
> > > > > > KIP.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > 3. For the feature downgrade, although it's a
> > Non-goal
> > > of
> > > > > the
> > > > > > > > KIP,
> > > > > > > > > > for
> > > > > > > > > > > > > features such as group coordinator semantics, there
> > is
> > > no
> > > > > > legal
> > > > > > > > > > > scenario
> > > > > > > > > > > > to
> > > > > > > > > > > > > perform a downgrade at all. So having downgrade
> door
> > > open
> > > > > is
> > > > > > > > pretty
> > > > > > > > > > > > > error-prone as human faults happen all the time.
> I'm
> > > > > assuming
> > > > > > > as
> > > > > > > > > new
> > > > > > > > > > > > > features are implemented, it's not very hard to
> add a
> > > > flag
> > > > > > > during
> > > > > > > > > > > feature
> > > > > > > > > > > > > creation to indicate whether this feature is
> > > > > "downgradable".
> > > > > > > > Could
> > > > > > > > > > you
> > > > > > > > > > > > > explain a bit more on the extra engineering effort
> > for
> > > > > > shipping
> > > > > > > > > this
> > > > > > > > > > > KIP
> > > > > > > > > > > > > with downgrade protection in place?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great point! I'd agree and disagree here.
> > > While
> > > > I
> > > > > > > agree
> > > > > > > > > that
> > > > > > > > > > > > accidental
> > > > > > > > > > > > downgrades can cause problems, I also think sometimes
> > > > > > downgrades
> > > > > > > > > should
> > > > > > > > > > > > be allowed for emergency reasons (not all downgrades
> > > cause
> > > > > > > issues).
> > > > > > > > > > > > It is just subjective to the feature being
> downgraded.
> > > > > > > > > > > >
> > > > > > > > > > > > To be more strict about feature version downgrades, I
> > > have
> > > > > > > modified
> > > > > > > > > the
> > > > > > > > > > > KIP
> > > > > > > > > > > > proposing that we mandate a `--force-downgrade` flag
> be
> > > > used
> > > > > in
> > > > > > > the
> > > > > > > > > > > > UPDATE_FEATURES api
> > > > > > > > > > > > and the tooling, whenever the human is downgrading a
> > > > > finalized
> > > > > > > > > feature
> > > > > > > > > > > > version.
> > > > > > > > > > > > Hopefully this should cover the requirement, until we
> > > find
> > > > > the
> > > > > > > need
> > > > > > > > > for
> > > > > > > > > > > > advanced downgrade support.
> > > > > > > > > > > >
> > > > > > > > > > > > > 4. "Each broker’s supported dictionary of feature
> > > > versions
> > > > > > will
> > > > > > > > be
> > > > > > > > > > > > defined
> > > > > > > > > > > > > in the broker code." So this means in order to
> > > restrict a
> > > > > > > certain
> > > > > > > > > > > > feature,
> > > > > > > > > > > > > we need to start the broker first and then send a
> > > feature
> > > > > > > gating
> > > > > > > > > > > request
> > > > > > > > > > > > > immediately, which introduces a time gap and the
> > > > > > > > intended-to-close
> > > > > > > > > > > > feature
> > > > > > > > > > > > > could actually serve request during this phase. Do
> > you
> > > > > think
> > > > > > we
> > > > > > > > > > should
> > > > > > > > > > > > also
> > > > > > > > > > > > > support configurations as well so that admin user
> > could
> > > > > > freely
> > > > > > > > roll
> > > > > > > > > > up
> > > > > > > > > > > a
> > > > > > > > > > > > > cluster with all nodes complying the same feature
> > > gating,
> > > > > > > without
> > > > > > > > > > > > worrying
> > > > > > > > > > > > > about the turnaround time to propagate the message
> > only
> > > > > after
> > > > > > > the
> > > > > > > > > > > cluster
> > > > > > > > > > > > > starts up?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): This is a great point/question. One of the
> > > > > > > expectations
> > > > > > > > > out
> > > > > > > > > > of
> > > > > > > > > > > > this KIP, which is
> > > > > > > > > > > > already followed in the broker, is the following.
> > > > > > > > > > > >  - Imagine at time T1 the broker starts up and
> > registers
> > > > it’s
> > > > > > > > > presence
> > > > > > > > > > in
> > > > > > > > > > > > ZK,
> > > > > > > > > > > >    along with advertising it’s supported features.
> > > > > > > > > > > >  - Imagine at a future time T2 the broker receives
> the
> > > > > > > > > > > > UpdateMetadataRequest
> > > > > > > > > > > >    from the controller, which contains the latest
> > > finalized
> > > > > > > > features
> > > > > > > > > as
> > > > > > > > > > > > seen by
> > > > > > > > > > > >    the controller. The broker validates this data
> > against
> > > > > it’s
> > > > > > > > > > supported
> > > > > > > > > > > > features to
> > > > > > > > > > > >    make sure there is no mismatch (it will shutdown
> if
> > > > there
> > > > > is
> > > > > > > an
> > > > > > > > > > > > incompatibility).
> > > > > > > > > > > >
> > > > > > > > > > > > It is expected that during the time between the 2
> > events
> > > T1
> > > > > and
> > > > > > > T2,
> > > > > > > > > the
> > > > > > > > > > > > broker is
> > > > > > > > > > > > almost a silent entity in the cluster. It does not
> add
> > > any
> > > > > > value
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > > > cluster, or carry
> > > > > > > > > > > > out any important broker activities. By “important”,
> I
> > > mean
> > > > > it
> > > > > > is
> > > > > > > > not
> > > > > > > > > > > doing
> > > > > > > > > > > > mutations
> > > > > > > > > > > > on it’s persistence, not mutating critical in-memory
> > > state,
> > > > > > won’t
> > > > > > > > be
> > > > > > > > > > > > serving
> > > > > > > > > > > > produce/fetch requests. Note it doesn’t even know
> it’s
> > > > > assigned
> > > > > > > > > > > partitions
> > > > > > > > > > > > until
> > > > > > > > > > > > it receives UpdateMetadataRequest from controller.
> > > Anything
> > > > > the
> > > > > > > > > broker
> > > > > > > > > > is
> > > > > > > > > > > > doing up
> > > > > > > > > > > > until this point is not damaging/useful.
> > > > > > > > > > > >
> > > > > > > > > > > > I’ve clarified the above in the KIP, see this new
> > > section:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > > > > > > > .
> > > > > > > > > > > >
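To make the compatibility check above concrete, here is a rough, purely illustrative sketch in Java (class and method names are made up; this is not the actual broker code): the broker compares every finalized feature version against the version range it supports, and refuses to come up on any mismatch.

    import java.util.Map;

    public class FinalizedFeatureCheck {
        // Supported version range for one feature on this broker (illustrative).
        record VersionRange(long min, long max) {
            boolean contains(long v) { return v >= min && v <= max; }
        }

        // True if every cluster-wide finalized feature version falls inside the
        // range this broker supports; a real broker would shut down otherwise.
        static boolean isCompatible(Map<String, VersionRange> supported,
                                    Map<String, Long> finalized) {
            for (Map.Entry<String, Long> f : finalized.entrySet()) {
                VersionRange range = supported.get(f.getKey());
                if (range == null || !range.contains(f.getValue())) {
                    System.err.printf("Incompatible finalized feature %s at version %d%n",
                                      f.getKey(), f.getValue());
                    return false;
                }
            }
            return true;
        }

        public static void main(String[] args) {
            boolean ok = isCompatible(Map.of("group_coordinator", new VersionRange(1, 10)),
                                      Map.of("group_coordinator", 10L));
            System.out.println(ok ? "compatible" : "incompatible -> shut down");
        }
    }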
> > > > > > > > > > > > > 5. "adding a new Feature, updating or deleting an
> > > > existing
> > > > > > > > > Feature",
> > > > > > > > > > > may
> > > > > > > > > > > > be
> > > > > > > > > > > > > I misunderstood something, I thought the features
> are
> > > > > defined
> > > > > > > in
> > > > > > > > > > broker
> > > > > > > > > > > > > code, so admin could not really create a new
> feature?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great point! You understood this right.
> Here
> > > > > adding
> > > > > > a
> > > > > > > > > > feature
> > > > > > > > > > > > means we are
> > > > > > > > > > > > adding a cluster-wide finalized *max* version for a
> > > feature
> > > > > > that
> > > > > > > > was
> > > > > > > > > > > > previously never finalized.
> > > > > > > > > > > > I have clarified this in the KIP now.
> > > > > > > > > > > >
> > > > > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > > > > to
> > > > > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP
> adding
> > > the
> > > > > > above
> > > > > > > > (see
> > > > > > > > > > > > 'Tooling support -> Admin API changes').
> > > > > > > > > > > >
> > > > > > > > > > > > > 7. I think we haven't discussed the alternative
> > > solution
> > > > to
> > > > > > > pass
> > > > > > > > > the
> > > > > > > > > > > > > feature information through Zookeeper. Is that
> > > mentioned
> > > > in
> > > > > > the
> > > > > > > > KIP
> > > > > > > > > > to
> > > > > > > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Nice question! The broker reads finalized
> > > > feature
> > > > > > info
> > > > > > > > > > stored
> > > > > > > > > > > in
> > > > > > > > > > > > ZK,
> > > > > > > > > > > > only during startup when it does a validation. When
> > > serving
> > > > > > > > > > > > `ApiVersionsRequest`, the
> > > > > > > > > > > > broker does not read this info from ZK directly. I'd
> > > > imagine
> > > > > > the
> > > > > > > > risk
> > > > > > > > > > is
> > > > > > > > > > > > that it can increase
> > > > > > > > > > > > the ZK read QPS which can be a bottleneck for the
> > system.
> > > > > > Today,
> > > > > > > in
> > > > > > > > > > Kafka
> > > > > > > > > > > > we use the
> > > > > > > > > > > > controller to fan out ZK updates to brokers and we
> want
> > > to
> > > > > > stick
> > > > > > > to
> > > > > > > > > > that
> > > > > > > > > > > > pattern to avoid
> > > > > > > > > > > > the ZK read bottleneck when serving
> > `ApiVersionsRequest`.
> > > > > > > > > > > >
> > > > > > > > > > > > > 8. I was under the impression that user could
> > > configure a
> > > > > > range
> > > > > > > > of
> > > > > > > > > > > > > supported versions, what's the trade-off for
> allowing
> > > > > single
> > > > > > > > > > finalized
> > > > > > > > > > > > > version only?
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great question! The finalized version of a
> > > > feature
> > > > > > > > > basically
> > > > > > > > > > > > refers to
> > > > > > > > > > > > the cluster-wide finalized feature "maximum" version.
> > For
> > > > > > > example,
> > > > > > > > if
> > > > > > > > > > the
> > > > > > > > > > > > 'group_coordinator' feature
> > > > > > > > > > > > has the finalized version set to 10, then, it means
> > that
> > > > > > > > cluster-wide
> > > > > > > > > > all
> > > > > > > > > > > > versions upto v10 are
> > > > > > > > > > > > supported for this feature. However, note that if
> some
> > > > > version
> > > > > > > (ex:
> > > > > > > > > v0)
> > > > > > > > > > > > gets deprecated
> > > > > > > > > > > > for this feature, then we don’t convey that using
> this
> > > > scheme
> > > > > > > (also
> > > > > > > > > > > > supporting deprecation is a non-goal).
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): I’ve now modified the KIP at all points,
> > > > refering
> > > > > to
> > > > > > > > > > finalized
> > > > > > > > > > > > feature "maximum" versions.
> > > > > > > > > > > >
> > > > > > > > > > > > > 9. One minor syntax fix: Note that here the
> "client"
> > > here
> > > > > may
> > > > > > > be
> > > > > > > > a
> > > > > > > > > > > > producer
> > > > > > > > > > > >
> > > > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Kowshik
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > > > > > > > reluctanthero104@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hey Kowshik,
> > > > > > > > > > > > >
> > > > > > > > > > > > > thanks for the revised KIP. Got a couple of
> > questions:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. "When is it safe for the brokers to begin
> handling
> > > EOS
> > > > > > > > traffic"
> > > > > > > > > > > could
> > > > > > > > > > > > be
> > > > > > > > > > > > > converted as "When is it safe for the brokers to
> > start
> > > > > > serving
> > > > > > > > new
> > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not
> > explained
> > > > > > earlier
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > > > context.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. In the *Explanation *section, the metadata
> version
> > > > > number
> > > > > > > part
> > > > > > > > > > > seems a
> > > > > > > > > > > > > bit blurred. Could you point a reference to later
> > > section
> > > > > > that
> > > > > > > we
> > > > > > > > > > going
> > > > > > > > > > > > to
> > > > > > > > > > > > > store it in Zookeeper and update it every time when
> > > there
> > > > > is
> > > > > > a
> > > > > > > > > > feature
> > > > > > > > > > > > > change?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 3. For the feature downgrade, although it's a
> > Non-goal
> > > of
> > > > > the
> > > > > > > > KIP,
> > > > > > > > > > for
> > > > > > > > > > > > > features such as group coordinator semantics, there
> > is
> > > no
> > > > > > legal
> > > > > > > > > > > scenario
> > > > > > > > > > > > to
> > > > > > > > > > > > > perform a downgrade at all. So having downgrade
> door
> > > open
> > > > > is
> > > > > > > > pretty
> > > > > > > > > > > > > error-prone as human faults happen all the time.
> I'm
> > > > > assuming
> > > > > > > as
> > > > > > > > > new
> > > > > > > > > > > > > features are implemented, it's not very hard to
> add a
> > > > flag
> > > > > > > during
> > > > > > > > > > > feature
> > > > > > > > > > > > > creation to indicate whether this feature is
> > > > > "downgradable".
> > > > > > > > Could
> > > > > > > > > > you
> > > > > > > > > > > > > explain a bit more on the extra engineering effort
> > for
> > > > > > shipping
> > > > > > > > > this
> > > > > > > > > > > KIP
> > > > > > > > > > > > > with downgrade protection in place?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 4. "Each broker’s supported dictionary of feature
> > > > versions
> > > > > > will
> > > > > > > > be
> > > > > > > > > > > > defined
> > > > > > > > > > > > > in the broker code." So this means in order to
> > > restrict a
> > > > > > > certain
> > > > > > > > > > > > feature,
> > > > > > > > > > > > > we need to start the broker first and then send a
> > > feature
> > > > > > > gating
> > > > > > > > > > > request
> > > > > > > > > > > > > immediately, which introduces a time gap and the
> > > > > > > > intended-to-close
> > > > > > > > > > > > feature
> > > > > > > > > > > > > could actually serve request during this phase. Do
> > you
> > > > > think
> > > > > > we
> > > > > > > > > > should
> > > > > > > > > > > > also
> > > > > > > > > > > > > support configurations as well so that admin user
> > could
> > > > > > freely
> > > > > > > > roll
> > > > > > > > > > up
> > > > > > > > > > > a
> > > > > > > > > > > > > cluster with all nodes complying the same feature
> > > gating,
> > > > > > > without
> > > > > > > > > > > > worrying
> > > > > > > > > > > > > about the turnaround time to propagate the message
> > only
> > > > > after
> > > > > > > the
> > > > > > > > > > > cluster
> > > > > > > > > > > > > starts up?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 5. "adding a new Feature, updating or deleting an
> > > > existing
> > > > > > > > > Feature",
> > > > > > > > > > > may
> > > > > > > > > > > > be
> > > > > > > > > > > > > I misunderstood something, I thought the features
> are
> > > > > defined
> > > > > > > in
> > > > > > > > > > broker
> > > > > > > > > > > > > code, so admin could not really create a new
> feature?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > > > > to
> > > > > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 7. I think we haven't discussed the alternative
> > > solution
> > > > to
> > > > > > > pass
> > > > > > > > > the
> > > > > > > > > > > > > feature information through Zookeeper. Is that
> > > mentioned
> > > > in
> > > > > > the
> > > > > > > > KIP
> > > > > > > > > > to
> > > > > > > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 8. I was under the impression that user could
> > > configure a
> > > > > > range
> > > > > > > > of
> > > > > > > > > > > > > supported versions, what's the trade-off for
> allowing
> > > > > single
> > > > > > > > > > finalized
> > > > > > > > > > > > > version only?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 9. One minor syntax fix: Note that here the
> "client"
> > > here
> > > > > may
> > > > > > > be
> > > > > > > > a
> > > > > > > > > > > > producer
> > > > > > > > > > > > >
> > > > > > > > > > > > > Boyang
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> > > > > > > cmccabe@apache.org
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam
> > > wrote:
> > > > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the feedback! I've changed the KIP
> to
> > > > > address
> > > > > > > your
> > > > > > > > > > > > > > > suggestions.
> > > > > > > > > > > > > > > Please find below my explanation. Here is a
> link
> > to
> > > > KIP
> > > > > > > 584:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. '__data_version__' is the version of the
> > > finalized
> > > > > > > feature
> > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > (i.e. actual ZK node contents), while the
> > > > > > > > '__schema_version__'
> > > > > > > > > is
> > > > > > > > > > > the
> > > > > > > > > > > > > > > version of the schema of the data persisted in
> > ZK.
> > > > > These
> > > > > > > > serve
> > > > > > > > > > > > > different
> > > > > > > > > > > > > > > purposes. '__data_version__' is is useful
> mainly
> > to
> > > > > > clients
> > > > > > > > > > during
> > > > > > > > > > > > > reads,
> > > > > > > > > > > > > > > to differentiate between the 2 versions of
> > > eventually
> > > > > > > > > consistent
> > > > > > > > > > > > > > 'finalized
> > > > > > > > > > > > > > > features' metadata (i.e. larger metadata
> version
> > is
> > > > > more
> > > > > > > > > recent).
> > > > > > > > > > > > > > > '__schema_version__' provides an additional
> > degree
> > > of
> > > > > > > > > > flexibility,
> > > > > > > > > > > > > where
> > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > we decide to change the schema for '/features'
> > node
> > > > in
> > > > > ZK
> > > > > > > (in
> > > > > > > > > the
> > > > > > > > > > > > > > future),
> > > > > > > > > > > > > > > then we can manage broker roll outs suitably
> > (i.e.
> > > > > > > > > > > > > > > serialization/deserialization of the ZK data
> can
> > be
> > > > > > handled
> > > > > > > > > > > safely).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Kowshik,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If you're talking about a number that lets you
> know
> > > if
> > > > > data
> > > > > > > is
> > > > > > > > > more
> > > > > > > > > > > or
> > > > > > > > > > > > > > less recent, we would typically call that an
> epoch,
> > > and
> > > > > > not a
> > > > > > > > > > > version.
> > > > > > > > > > > > > For
> > > > > > > > > > > > > > the ZK data structures, the word "version" is
> > > typically
> > > > > > > > reserved
> > > > > > > > > > for
> > > > > > > > > > > > > > describing changes to the overall schema of the
> > data
> > > > that
> > > > > > is
> > > > > > > > > > written
> > > > > > > > > > > to
> > > > > > > > > > > > > > ZooKeeper.  We don't even really change the
> > "version"
> > > > of
> > > > > > > those
> > > > > > > > > > > schemas
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > much, since most changes are
> backwards-compatible.
> > > But
> > > > > we
> > > > > > do
> > > > > > > > > > include
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > version field just in case.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't think we really need an epoch here,
> though,
> > > > since
> > > > > > we
> > > > > > > > can
> > > > > > > > > > just
> > > > > > > > > > > > > look
> > > > > > > > > > > > > > at the broker epoch.  Whenever the broker
> > registers,
> > > > its
> > > > > > > epoch
> > > > > > > > > will
> > > > > > > > > > > be
> > > > > > > > > > > > > > greater than the previous broker epoch.  And the
> > > newly
> > > > > > > > registered
> > > > > > > > > > > data
> > > > > > > > > > > > > will
> > > > > > > > > > > > > > take priority.  This will be a lot simpler than
> > > adding
> > > > a
> > > > > > > > separate
> > > > > > > > > > > epoch
> > > > > > > > > > > > > > system, I think.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. Regarding admin client needing min and max
> > > > > > information -
> > > > > > > > you
> > > > > > > > > > are
> > > > > > > > > > > > > > right!
> > > > > > > > > > > > > > > I've changed the KIP such that the Admin API
> also
> > > > > allows
> > > > > > > the
> > > > > > > > > user
> > > > > > > > > > > to
> > > > > > > > > > > > > read
> > > > > > > > > > > > > > > 'supported features' from a specific broker.
> > Please
> > > > > look
> > > > > > at
> > > > > > > > the
> > > > > > > > > > > > section
> > > > > > > > > > > > > > > "Admin API changes".
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 3. Regarding the use of `long` vs `Long` - it
> was
> > > not
> > > > > > > > > deliberate.
> > > > > > > > > > > > I've
> > > > > > > > > > > > > > > improved the KIP to just use `long` at all
> > places.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sounds good.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool -
> > you
> > > > are
> > > > > > > right!
> > > > > > > > > > I've
> > > > > > > > > > > > > > updated
> > > > > > > > > > > > > > > the KIP sketching the functionality provided by
> > > this
> > > > > > tool,
> > > > > > > > with
> > > > > > > > > > > some
> > > > > > > > > > > > > > > examples. Please look at the section "Tooling
> > > support
> > > > > > > > > examples".
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks, Kowshik.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > > > > > > > > cmccabe@apache.org>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In the "Schema" section, do we really need
> both
> > > > > > > > > > > __schema_version__
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > __data_version__?  Can we just have a single
> > > > version
> > > > > > > field
> > > > > > > > > > here?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Shouldn't the Admin(Client) function have
> some
> > > way
> > > > to
> > > > > > get
> > > > > > > > the
> > > > > > > > > > min
> > > > > > > > > > > > and
> > > > > > > > > > > > > > max
> > > > > > > > > > > > > > > > information that we're exposing as well?  I
> > guess
> > > > we
> > > > > > > could
> > > > > > > > > have
> > > > > > > > > > > > min,
> > > > > > > > > > > > > > max,
> > > > > > > > > > > > > > > > and current.  Unrelated: is the use of Long
> > > rather
> > > > > than
> > > > > > > > long
> > > > > > > > > > > > > deliberate
> > > > > > > > > > > > > > > > here?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It would be good to describe how the command
> > line
> > > > > tool
> > > > > > > > > > > > > > > > kafka.admin.FeatureCommand will work.  For
> > > example
> > > > > the
> > > > > > > > flags
> > > > > > > > > > that
> > > > > > > > > > > > it
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > take and the output that it will generate to
> > > > STDOUT.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik
> > Prakasam
> > > > > wrote:
> > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I've opened KIP-584
> > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > > > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > is intended to provide a versioning scheme
> > for
> > > > > > > features.
> > > > > > > > > I'd
> > > > > > > > > > > like
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > this thread to discuss the same. I'd
> > appreciate
> > > > any
> > > > > > > > > feedback
> > > > > > > > > > on
> > > > > > > > > > > > > this.
> > > > > > > > > > > > > > > > > Here
> > > > > > > > > > > > > > > > > is a link to KIP-584
> > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > > > > > >  .
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Jun Rao <ju...@confluent.io>.
Hi, Kowshik,

Thanks for the reply. Makes sense. Just one more question.

200. My understanding is that if the CLI tool passes the
'--allow-downgrade' flag when updating a specific feature, then a future
downgrade is possible. Otherwise, the feature is not downgradable. If so, I
was wondering how the controller remembers this since it can be restarted
over time?

Jun


On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hi Jun,
>
> Thanks a lot for the feedback and the questions!
> Please find my response below.
>
> > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It seems
> > that field needs to be persisted somewhere in ZK?
>
> (Kowshik): Great question! Below is my explanation. Please help me
> understand,
> if you feel there are cases where we would need to still persist it in ZK.
>
> Firstly I have updated my thoughts into the KIP now, under the 'guidelines'
> section:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
>
> The allowDowngrade boolean field is just to restrict the user intent, and
> to remind
> them to double check their intent before proceeding. It should be set to
> true
> by the user in a request, only when the user intent is to forcefully
> "attempt" a
> downgrade of a specific feature's max version level, to the provided value
> in
> the request.
>
> We can extend this safeguard. The controller (on its end) can maintain
> rules in the code, that, for safety reasons would outright reject certain
> downgrades
> from a specific max_version_level for a specific feature. Such rejections
> may
> happen depending on the feature being downgraded, and from what version
> level.
>
> The CLI tool only allows a downgrade attempt in conjunction with specific
> flags and sub-commands. For example, in the CLI tool, if the user uses the
> 'downgrade-all' command, or passes '--allow-downgrade' flag when updating a
> specific feature, only then the tool will translate this ask to setting
> 'allowDowngrade' field in the request to the server.
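
To make the above concrete, here is a purely illustrative sketch of the mapping from the tool to the request; the exact command-line syntax and request field names are assumptions, not the final design:

    # Operator explicitly acknowledges a downgrade attempt (illustrative syntax):
    ./bin/kafka-features.sh --update --feature group_coordinator --version 9 \
        --allow-downgrade

    # ...which the tool would translate into an UpdateFeaturesRequest whose entry
    # for 'group_coordinator' carries the target max version level and the
    # allowDowngrade field set to true.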
>
> > 201. UpdateFeaturesResponse has the following top level fields. Should
> > those fields be per feature?
> >
> >   "fields": [
> >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> >       "about": "The error code, or 0 if there was no error." },
> >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
> >       "about": "The error message, or null if there was no error." }
> >   ]
>
> (Kowshik): Great question!
> As such, the API is transactional, as explained in the sections linked
> below.
> Either all of the provided FeatureUpdates are applied, or none.
> It's the reason I felt we can have just one error code + message.
> Happy to extend this if you feel otherwise. Please let me know.
>
> Link to sections:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees
>
> > 202. The /features path in ZK has a field min_version_level. Which API
> and
> > tool can change that value?
>
> (Kowshik): Great question! Currently this cannot be modified by using the
> API or the tool.
> Feature version deprecation (by raising min_version_level) can be done only
> by the Controller directly. The rationale is explained in this section:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation
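
For illustration, the '/features' ZK node payload discussed above would look roughly like this (annotated sketch, not the exact schema from the KIP; note that the epoch is simply the ZK node's own version, so it is not stored in the payload):

    {
      "features": {
        "group_coordinator": {
          "min_version_level": 1,    // raised only by the controller, for deprecation
          "max_version_level": 10    // changed via the UPDATE_FEATURES API / CLI tool
        }
      }
    }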
>
>
> Cheers,
> Kowshik
>
> On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Kowshik,
> >
> > Thanks for addressing those comments. Just a few more minor comments.
> >
> > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It seems
> > that field needs to be persisted somewhere in ZK?
> >
> > 201. UpdateFeaturesResponse has the following top level fields. Should
> > those fields be per feature?
> >
> >   "fields": [
> >     { "name": "ErrorCode", "type": "int16", "versions": "0+",
> >       "about": "The error code, or 0 if there was no error." },
> >     { "name": "ErrorMessage", "type": "string", "versions": "0+",
> >       "about": "The error message, or null if there was no error." }
> >   ]
> >
> > 202. The /features path in ZK has a field min_version_level. Which API
> and
> > tool can change that value?
> >
> > Jun
> >
> > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <kprakasam@confluent.io
> >
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the feedback! I have updated the KIP-584 addressing your
> > > comments.
> > > Please find my response below.
> > >
> > > > 100.6 You can look for the sentence "This operation requires ALTER on
> > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > KafkaApis.authorize().
> > >
> > > (Kowshik): Done. Great point! For the newly introduced UPDATE_FEATURES
> > api,
> > > I have added a
> > > requirement that AclOperation.ALTER is required on
> ResourceType.CLUSTER.
> > >
> > > > 110. Keeping the feature version as int is probably fine. I just felt
> > > that
> > > > for some of the common user interactions, it's more convenient to
> > > > relate that to a release version. For example, if a user wants to
> > > downgrade
> > > > to a release 2.5, it's easier for the user to use the tool like "tool
> > > > --downgrade 2.5" instead of "tool --downgrade --feature X --version
> 6".
> > >
> > > (Kowshik): Great point. Generally, maximum feature version levels are
> not
> > > downgradable after
> > > they are finalized in the cluster. This is because, as a guideline
> > bumping
> > > feature version level usually is used mainly to convey important
> breaking
> > > changes.
> > > Despite the above, there may be some extreme/rare cases where a user
> > wants
> > > to downgrade
> > > all features to a specific previous release. The user may want to do
> this
> > > just
> > > prior to rolling back a Kafka cluster to a previous release.
> > >
> > > To support the above, I have made a change to the KIP explaining that
> the
> > > CLI tool is versioned.
> > > The CLI tool internally has knowledge about a map of features to their
> > > respective max
> > > versions supported by the Broker. The tool's knowledge of features and
> > > their version values,
> > > is limited to the version of the CLI tool itself i.e. the information
> is
> > > packaged into the CLI tool
> > > when it is released. Whenever a Kafka release introduces a new feature
> > > version, or modifies
> > > an existing feature version, the CLI tool shall also be updated with
> this
> > > information,
> > > Newer versions of the CLI tool will be released as part of the Kafka
> > > releases.
> > >
> > > Therefore, to achieve the downgrade need, the user just needs to run
> the
> > > version of
> > > the CLI tool that's part of the particular previous release that he/she
> > is
> > > downgrading to.
> > > To help the user with this, there is a new command added to the CLI
> tool
> > > called `downgrade-all`.
> > > This essentially downgrades max version levels of all features in the
> > > cluster to the versions
> > > known to the CLI tool internally.
> > >
> > > I have explained the above in the KIP under these sections:
> > >
> > > Tooling support (have explained that the CLI tool is versioned):
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> > >
> > > Regular CLI tool usage (please refer to point #3, and see the tooling
> > > example)
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
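
As a purely illustrative example of the downgrade workflow described above (the 'downgrade-all' command comes from the discussion; the other flags are assumptions, not the final syntax):

    # Run the CLI tool that ships with the release being downgraded to; it lowers
    # all finalized max version levels to the values it knows about internally.
    ./bin/kafka-features.sh --bootstrap-server kafka:9092 downgrade-all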
> > >
> > > > 110. Similarly, if the client library finds a feature mismatch with
> the
> > > broker,
> > > > the client likely needs to log some error message for the user to
> take
> > > some
> > > > actions. It's much more actionable if the error message is "upgrade
> the
> > > > broker to release version 2.6" than just "upgrade the broker to
> feature
> > > > version 7".
> > >
> > > (Kowshik): That's a really good point! If we use ints for feature
> > versions,
> > > the best
> > > message that client can print for debugging is "broker doesn't support
> > > feature version 7", and alongside that print the supported version
> range
> > > returned
> > > by the broker. Then, does it sound reasonable that the user could then
> > > reference
> > > Kafka release logs to figure out which version of the broker release is
> > > required
> > > be deployed, to support feature version 7? I couldn't think of a better
> > > strategy here.
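
For what it's worth, a small illustrative sketch (Java, made-up names) of the kind of actionable message a client could log in that situation, using the supported version range returned by the broker:

    public class FeatureMismatchMessage {
        // Illustrative only: build the kind of actionable error discussed above.
        static String describe(String feature, long required, long brokerMin, long brokerMax) {
            return String.format(
                "Broker does not support feature '%s' at version %d (broker supports " +
                "versions %d through %d); see the Kafka release notes for a broker " +
                "release that supports version %d.",
                feature, required, brokerMin, brokerMax, required);
        }

        public static void main(String[] args) {
            System.out.println(describe("group_coordinator", 7, 1, 6));
        }
    }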
> > >
> > > > 120. When should a developer bump up the version of a feature?
> > >
> > > (Kowshik): Great question! In the KIP, I have added a section:
> > 'Guidelines
> > > on feature versions and workflows'
> > > providing some guidelines on when to use the versioned feature flags,
> and
> > > what
> > > are the regular workflows with the CLI tool.
> > >
> > > Link to the relevant sections:
> > > Guidelines:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> > >
> > > Regular CLI tool usage:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> > >
> > > Advanced CLI tool usage:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > >
> > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Kowshik,
> > > >
> > > > Thanks for the reply. A few more comments.
> > > >
> > > > 110. Keeping the feature version as int is probably fine. I just felt
> > > that
> > > > for some of the common user interactions, it's more convenient to
> > > > relate that to a release version. For example, if a user wants to
> > > downgrade
> > > > to a release 2.5, it's easier for the user to use the tool like "tool
> > > > --downgrade 2.5" instead of "tool --downgrade --feature X --version
> 6".
> > > > Similarly, if the client library finds a feature mismatch with the
> > > broker,
> > > > the client likely needs to log some error message for the user to
> take
> > > some
> > > > actions. It's much more actionable if the error message is "upgrade
> the
> > > > broker to release version 2.6" than just "upgrade the broker to
> feature
> > > > version 7".
> > > >
> > > > 111. Sounds good.
> > > >
> > > > 120. When should a developer bump up the version of a feature?
> > > >
> > > > Jun
> > > >
> > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <
> > kprakasam@confluent.io
> > > >
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > I have updated the KIP for the item 111.
> > > > > I'm in the process of addressing 100.6, and will provide an update
> > > soon.
> > > > > I think item 110 is still under discussion given we are now
> > providing a
> > > > way
> > > > > to finalize
> > > > > all features to their latest version levels. In any case, please
> let
> > us
> > > > > know
> > > > > how you feel in response to Colin's comments on this topic.
> > > > >
> > > > > > 111. To put this in context, when we had IBP, the default value
> is
> > > the
> > > > > > current released version. So, if you are a brand new user, you
> > don't
> > > > need
> > > > > > to configure IBP and all new features will be immediately
> available
> > > in
> > > > > the
> > > > > > new cluster. If you are upgrading from an old version, you do
> need
> > to
> > > > > > understand and configure IBP. I see a similar pattern here for
> > > > > > features. From the ease of use perspective, ideally, we shouldn't
> > > > require
> > > > > a
> > > > > > new user to have an extra step such as running a bootstrap script
> > > > unless
> > > > > > it's truly necessary. If someone has a special need (all the
> cases
> > > you
> > > > > > mentioned seem special cases?), they can configure a mode such
> that
> > > > > > features are enabled/disabled manually.
> > > > >
> > > > > (Kowshik): That makes sense, thanks for the idea! Sorry if I didn't
> > > > > understand
> > > > > this need earlier. I have updated the KIP with the approach that
> > > whenever
> > > > > the '/features' node is absent, the controller by default will
> > > bootstrap
> > > > > the node
> > > > > to contain the latest feature levels. Here is the new section in
> the
> > > KIP
> > > > > describing
> > > > > the same:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> > > > >
> > > > > Next, as I explained in my response to Colin's suggestions, we are
> > now
> > > > > providing a `--finalize-latest-features` flag with the tooling.
> This
> > > lets
> > > > > the sysadmin finalize all features known to the controller to their
> > > > latest
> > > > > version
> > > > > levels. Please look at this section (point #3 and the tooling
> example
> > > > > later):
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
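
Purely as an illustration of that step (the flag name comes from the discussion above; the rest of the command syntax is an assumption):

    # After the whole cluster is running the new binaries, the operator finalizes
    # all features known to the controller at their latest version levels:
    ./bin/kafka-features.sh --bootstrap-server kafka:9092 --finalize-latest-features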
> > > > >
> > > > >
> > > > > Do you feel this addresses your comment/concern?
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Kowshik,
> > > > > >
> > > > > > Thanks for the reply. A few more replies below.
> > > > > >
> > > > > > 100.6 You can look for the sentence "This operation requires
> ALTER
> > on
> > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > > KafkaApis.authorize().
> > > > > >
> > > > > > 110. From the external client/tooling perspective, it's more
> > natural
> > > to
> > > > > use
> > > > > > the release version for features. If we can use the same release
> > > > version
> > > > > > for internal representation, it seems simpler (easier to
> > understand,
> > > no
> > > > > > mapping overhead, etc). Is there a benefit with separate external
> > and
> > > > > > internal versioning schemes?
> > > > > >
> > > > > > 111. To put this in context, when we had IBP, the default value
> is
> > > the
> > > > > > current released version. So, if you are a brand new user, you
> > don't
> > > > need
> > > > > > to configure IBP and all new features will be immediately
> available
> > > in
> > > > > the
> > > > > > new cluster. If you are upgrading from an old version, you do
> need
> > to
> > > > > > understand and configure IBP. I see a similar pattern here for
> > > > > > features. From the ease of use perspective, ideally, we shouldn't
> > > > > require a
> > > > > > new user to have an extra step such as running a bootstrap script
> > > > unless
> > > > > > it's truly necessary. If someone has a special need (all the
> cases
> > > you
> > > > > > mentioned seem special cases?), they can configure a mode such
> that
> > > > > > features are enabled/disabled manually.
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
> > > > kprakasam@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > Thanks for the feedback and suggestions. Please find my
> response
> > > > below.
> > > > > > >
> > > > > > > > 100.6 For every new request, the admin needs to control who
> is
> > > > > allowed
> > > > > > to
> > > > > > > > issue that request if security is enabled. So, we need to
> > assign
> > > > the
> > > > > > new
> > > > > > > > request a ResourceType and possible AclOperations. See
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > as an example.
> > > > > > >
> > > > > > > (Kowshik): I don't see any reference to the words ResourceType
> or
> > > > > > > AclOperations
> > > > > > > in the KIP. Please let me know how I can use the KIP that you
> > > linked
> > > > to
> > > > > > > know how to
> > > > > > > setup the appropriate ResourceType and/or ClusterOperation?
> > > > > > >
> > > > > > > > 105. If we change delete to disable, it's better to do this
> > > > > > consistently
> > > > > > > in
> > > > > > > > request protocol and admin api as well.
> > > > > > >
> > > > > > > (Kowshik): The API shouldn't be called 'disable' when it is
> > > deleting
> > > > a
> > > > > > > feature.
> > > > > > > I've just changed the KIP to use 'delete'. I don't have a
> strong
> > > > > > > preference.
> > > > > > >
> > > > > > > > 110. The minVersion/maxVersion for features use int64.
> > Currently,
> > > > our
> > > > > > > > release version schema is major.minor.bugfix (e.g. 2.5.0).
> It's
> > > > > > possible
> > > > > > > > for new features to be included in minor releases too. Should
> > we
> > > > make
> > > > > > the
> > > > > > > > feature versioning match the release versioning?
> > > > > > >
> > > > > > > (Kowshik): The release version can be mapped to a set of
> feature
> > > > > > versions,
> > > > > > > and this can be done, for example in the tool (or even external
> > to
> > > > the
> > > > > > > tool).
> > > > > > > Can you please clarify what I'm missing?
> > > > > > >
> > > > > > > > 111. "During regular operations, the data in the ZK node can
> be
> > > > > mutated
> > > > > > > > only via a specific admin API served only by the
> controller." I
> > > am
> > > > > > > > wondering why can't the controller auto finalize a feature
> > > version
> > > > > > after
> > > > > > > > all brokers are upgraded? For new users who download the
> latest
> > > > > version
> > > > > > > to
> > > > > > > > build a new cluster, it's inconvenient for them to have to
> > > manually
> > > > > > > enable
> > > > > > > > each feature.
> > > > > > >
> > > > > > > (Kowshik): I agree that there is a trade-off here, but it will
> > help
> > > > > > > to decide whether the automation can be thought through in the
> > > future
> > > > > > > in a follow up KIP, or right now in this KIP. We may invest
> > > > > > > in automation, but we have to decide whether we should do it
> > > > > > > now or later.
> > > > > > >
> > > > > > > For the inconvenience that you mentioned, do you think the
> > problem
> > > > that
> > > > > > you
> > > > > > > mentioned can be  overcome by asking for the cluster operator
> to
> > > run
> > > > a
> > > > > > > bootstrap script  when he/she knows that a specific AK release
> > has
> > > > been
> > > > > > > almost completely deployed in a cluster for the first time?
> Idea
> > is
> > > > > that
> > > > > > > the
> > > > > > > bootstrap script will know how to map a specific AK release to
> > > > > finalized
> > > > > > > feature versions, and run the `kafka-features.sh` tool
> > > appropriately
> > > > > > > against
> > > > > > > the cluster.
> > > > > > >
> > > > > > > Now, coming back to your automation proposal/question.
> > > > > > > I do see the value of automated feature version finalization,
> > but I
> > > > > also
> > > > > > > see
> > > > > > > that this will open up several questions and some risks, as
> > > explained
> > > > > > > below.
> > > > > > > The answers to these depend on the definition of the automation
> > we
> > > > > choose
> > > > > > > to build, and how well does it fit into a kafka deployment.
> > > > > > > Basically, it can be unsafe for the controller to finalize
> > feature
> > > > > > version
> > > > > > > upgrades automatically, without learning about the intent of
> the
> > > > > cluster
> > > > > > > operator.
> > > > > > > 1. We would sometimes want to lock feature versions only when
> we
> > > have
> > > > > > > externally verified
> > > > > > > the stability of the broker binary.
> > > > > > > 2. Sometimes only the cluster operator knows that a cluster
> > upgrade
> > > > is
> > > > > > > complete,
> > > > > > > and new brokers are highly unlikely to join the cluster.
> > > > > > > 3. Only the cluster operator knows that the intent is to deploy
> > the
> > > > > same
> > > > > > > version
> > > > > > > of the new broker release across the entire cluster (i.e. the
> > > latest
> > > > > > > downloaded version).
> > > > > > > 4. For downgrades, it appears the controller still needs some
> > > > external
> > > > > > > input
> > > > > > > (such as the proposed tool) to finalize a feature version
> > > downgrade.
> > > > > > >
> > > > > > > If we have automation, that automation can end up failing in
> some
> > > of
> > > > > the
> > > > > > > cases
> > > > > > > above. Then, we need a way to declare that the cluster is "not
> > > ready"
> > > > > if
> > > > > > > the
> > > > > > > controller cannot automatically finalize some basic required
> > > feature
> > > > > > > version
> > > > > > > upgrades across the cluster. We need to make the cluster
> operator
> > > > aware
> > > > > > in
> > > > > > > such a scenario (raise an alert or alike).
> > > > > > >
> > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> > > > instead
> > > > > > of
> > > > > > > 48.
> > > > > > >
> > > > > > > (Kowshik): Done.
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io>
> > wrote:
> > > > > > >
> > > > > > > > Hi, Kowshik,
> > > > > > > >
> > > > > > > > Thanks for the reply. A few more comments below.
> > > > > > > >
> > > > > > > > 100.6 For every new request, the admin needs to control who
> is
> > > > > allowed
> > > > > > to
> > > > > > > > issue that request if security is enabled. So, we need to
> > assign
> > > > the
> > > > > > new
> > > > > > > > request a ResourceType and possible AclOperations. See
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > > as
> > > > > > > > an example.
> > > > > > > >
> > > > > > > > 105. If we change delete to disable, it's better to do this
> > > > > > consistently
> > > > > > > in
> > > > > > > > request protocol and admin api as well.
> > > > > > > >
> > > > > > > > 110. The minVersion/maxVersion for features use int64.
> > Currently,
> > > > our
> > > > > > > > release version schema is major.minor.bugfix (e.g. 2.5.0).
> It's
> > > > > > possible
> > > > > > > > for new features to be included in minor releases too. Should
> > we
> > > > make
> > > > > > the
> > > > > > > > feature versioning match the release versioning?
> > > > > > > >
> > > > > > > > 111. "During regular operations, the data in the ZK node can
> be
> > > > > mutated
> > > > > > > > only via a specific admin API served only by the
> controller." I
> > > am
> > > > > > > > wondering why can't the controller auto finalize a feature
> > > version
> > > > > > after
> > > > > > > > all brokers are upgraded? For new users who download the
> latest
> > > > > version
> > > > > > > to
> > > > > > > > build a new cluster, it's inconvenient for them to have to
> > > manually
> > > > > > > enable
> > > > > > > > each feature.
> > > > > > > >
> > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> > > > instead
> > > > > > of
> > > > > > > > 48.
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> > > > > > kprakasam@confluent.io>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey Jun,
> > > > > > > > >
> > > > > > > > > Thanks a lot for the great feedback! Please note that the
> > > design
> > > > > > > > > has changed a little bit on the KIP, and we now propagate
> the
> > > > > > finalized
> > > > > > > > > features metadata only via ZK watches (instead of
> > > > > > UpdateMetadataRequest
> > > > > > > > > from the controller).
> > > > > > > > >
> > > > > > > > > Please find below my response to your questions/feedback,
> > with
> > > > the
> > > > > > > prefix
> > > > > > > > > "(Kowshik):".
> > > > > > > > >
> > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > > 100.1 Since this request waits for responses from
> brokers,
> > > > should
> > > > > > we
> > > > > > > > add
> > > > > > > > > a
> > > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > >
> > > > > > > > > (Kowshik): Great point! Done. I have added a timeout field.
> > > Note:
> > > > > we
> > > > > > no
> > > > > > > > > longer
> > > > > > > > > wait for responses from brokers, since the design has been
> > > > changed
> > > > > so
> > > > > > > > that
> > > > > > > > > the
> > > > > > > > > features information is propagated via ZK. Nevertheless, it
> > is
> > > > > right
> > > > > > to
> > > > > > > > > have a timeout
> > > > > > > > > for the request.
> > > > > > > > >
> > > > > > > > > > 100.2 The response schema is a bit weird. Typically, the
> > > > response
> > > > > > > just
> > > > > > > > > > shows an error code and an error message, instead of
> > echoing
> > > > the
> > > > > > > > request.
> > > > > > > > >
> > > > > > > > > (Kowshik): Great point! Yeah, I have modified it to just
> > return
> > > > an
> > > > > > > error
> > > > > > > > > code and a message.
> > > > > > > > > Previously it was not echoing the "request", rather it was
> > > > > returning
> > > > > > > the
> > > > > > > > > latest set of
> > > > > > > > > cluster-wide finalized features (after applying the
> updates).
> > > But
> > > > > you
> > > > > > > are
> > > > > > > > > right,
> > > > > > > > > the additional info is not required, so I have removed it
> > from
> > > > the
> > > > > > > > response
> > > > > > > > > schema.
> > > > > > > > >
> > > > > > > > > > 100.3 Should we add a separate request to list/describe
> the
> > > > > > existing
> > > > > > > > > > features?
> > > > > > > > >
> > > > > > > > > (Kowshik): This is already present in the KIP via the
> > > > > > > 'DescribeFeatures'
> > > > > > > > > Admin API,
> > > > > > > > > which, underneath covers uses the ApiVersionsRequest to
> > > > > list/describe
> > > > > > > the
> > > > > > > > > existing features. Please read the 'Tooling support'
> section.
> > > > > > > > >
> > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> > > > request.
> > > > > > For
> > > > > > > > > > DELETE, the version field doesn't make sense. So, I guess
> > the
> > > > > > broker
> > > > > > > > just
> > > > > > > > > > ignores this? An alternative way is to have a separate
> > > > > > > > > DeleteFeaturesRequest
> > > > > > > > >
> > > > > > > > > (Kowshik): Great point! I have modified the KIP now to
> have 2
> > > > > > separate
> > > > > > > > > controller APIs
> > > > > > > > > serving these different purposes:
> > > > > > > > > 1. updateFeatures
> > > > > > > > > 2. deleteFeatures
> > > > > > > > >
> > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
> monotonically
> > > > > > > increasing
> > > > > > > > > > version of the metadata for finalized features." I am
> > > wondering
> > > > > why
> > > > > > > the
> > > > > > > > > > ordering is important?
> > > > > > > > >
> > > > > > > > > (Kowshik): In the latest KIP write-up, it is called epoch
> > > > (instead
> > > > > of
> > > > > > > > > version), and
> > > > > > > > > it is just the ZK node version. Basically, this is the
> epoch
> > > for
> > > > > the
> > > > > > > > > cluster-wide
> > > > > > > > > finalized feature version metadata. This metadata is served
> > to
> > > > > > clients
> > > > > > > > via
> > > > > > > > > the
> > > > > > > > > ApiVersionsResponse (for reads). We propagate updates from
> > the
> > > > > > > > '/features'
> > > > > > > > > ZK node
> > > > > > > > > to all brokers, via ZK watches setup by each broker on the
> > > > > > '/features'
> > > > > > > > > node.
> > > > > > > > >
> > > > > > > > > Now here is why the ordering is important:
> > > > > > > > > ZK watches don't propagate at the same time. As a result,
> the
> > > > > > > > > ApiVersionsResponse
> > > > > > > > > is eventually consistent across brokers. This can introduce
> > > cases
> > > > > > > > > where clients see an older lower epoch of the features
> > > metadata,
> > > > > > after
> > > > > > > a
> > > > > > > > > more recent
> > > > > > > > > higher epoch was returned at a previous point in time. We
> > > expect
> > > > > > > clients
> > > > > > > > > to always employ the rule that the latest received higher
> > epoch
> > > > of
> > > > > > > > metadata
> > > > > > > > > always trumps an older smaller epoch. Those clients that
> are
> > > > > external
> > > > > > > to
> > > > > > > > > Kafka should strongly consider discovering the latest
> > metadata
> > > > once
> > > > > > > > during
> > > > > > > > > startup from the brokers, and if required refresh the
> > metadata
> > > > > > > > periodically
> > > > > > > > > (to get the latest metadata).
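
A small illustrative sketch (Java, made-up names) of the "higher epoch wins" rule described above, as an external client might apply it to its cached copy of the finalized features metadata:

    import java.util.Map;
    import java.util.concurrent.atomic.AtomicReference;

    public class FinalizedFeaturesClientCache {
        // One eventually-consistent snapshot: the epoch plus the finalized max versions.
        public record Snapshot(long epoch, Map<String, Long> finalizedMaxVersions) {}

        private final AtomicReference<Snapshot> current = new AtomicReference<>();

        // Accept an incoming snapshot only if its epoch is strictly newer.
        public void maybeUpdate(Snapshot incoming) {
            current.accumulateAndGet(incoming, (existing, candidate) ->
                existing == null || candidate.epoch() > existing.epoch() ? candidate : existing);
        }

        public Snapshot latest() { return current.get(); }

        public static void main(String[] args) {
            FinalizedFeaturesClientCache cache = new FinalizedFeaturesClientCache();
            cache.maybeUpdate(new Snapshot(5, Map.of("group_coordinator", 10L)));
            cache.maybeUpdate(new Snapshot(4, Map.of("group_coordinator", 9L))); // stale, ignored
            System.out.println(cache.latest()); // keeps the epoch-5 snapshot
        }
    }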
> > > > > > > > >
> > > > > > > > > > 100.6 Could you specify the required ACL for this new
> > > request?
> > > > > > > > >
> > > > > > > > > (Kowshik): What is ACL, and how could I find out which one
> to
> > > > > > specify?
> > > > > > > > > Please could you provide me some pointers? I'll be glad to
> > > update
> > > > > the
> > > > > > > > > KIP once I know the next steps.
> > > > > > > > >
> > > > > > > > > > 101. For the broker registration ZK node, should we bump
> up
> > > the
> > > > > > > version
> > > > > > > > > in
> > > > > > > > > the json?
> > > > > > > > >
> > > > > > > > > (Kowshik): Great point! Done. I've increased the version in
> > the
> > > > > > broker
> > > > > > > > json
> > > > > > > > > by 1.
> > > > > > > > >
> > > > > > > > > > 102. For the /features ZK node, not sure if we need the
> > epoch
> > > > > > field.
> > > > > > > > Each
> > > > > > > > > > ZK node has an internal version field that is incremented
> > on
> > > > > every
> > > > > > > > > update.
> > > > > > > > >
> > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK node version
> > > now,
> > > > > > > instead
> > > > > > > > of
> > > > > > > > > explicitly
> > > > > > > > > incremented epoch.
> > > > > > > > >
> > > > > > > > > > 103. "Enabling the actual semantics of a feature version
> > > > > > cluster-wide
> > > > > > > > is
> > > > > > > > > > left to the discretion of the logic implementing the
> > feature
> > > > (ex:
> > > > > > can
> > > > > > > > be
> > > > > > > > > > done via dynamic broker config)." Does that mean the
> broker
> > > > > > > > registration
> > > > > > > > > ZK
> > > > > > > > > > node will be updated dynamically when this happens?
> > > > > > > > >
> > > > > > > > > (Kowshik): Not really. The text was just conveying that a
> > > broker
> > > > > > could
> > > > > > > > > "know" of
> > > > > > > > > a new feature version, but it does not mean the broker
> should
> > > > have
> > > > > > also
> > > > > > > > > activated the effects of the feature version. Knowing vs
> > > > activation
> > > > > > > are 2
> > > > > > > > > separate things,
> > > > > > > > > and the latter can be achieved by dynamic config. I have
> > > reworded
> > > > > the
> > > > > > > > text
> > > > > > > > > to
> > > > > > > > > make this clear to the reader.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > > 104.1 It would be useful to describe when the feature
> > > metadata
> > > > is
> > > > > > > > > included
> > > > > > > > > > in the request. My understanding is that it's only
> included
> > > if
> > > > > (1)
> > > > > > > > there
> > > > > > > > > is
> > > > > > > > > > a change to the finalized feature; (2) broker restart;
> (3)
> > > > > > controller
> > > > > > > > > > failover.
> > > > > > > > > > 104.2 The new fields have the following versions. Why are
> > the
> > > > > > > versions
> > > > > > > > 3+
> > > > > > > > > > when the top version is bumped to 6?
> > > > > > > > > >       "fields":  [
> > > > > > > > > >         {"name": "Name", "type":  "string", "versions":
> > > "3+",
> > > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > > >         {"name":  "Version", "type":  "int64",
> "versions":
> > > > "3+",
> > > > > > > > > >           "about": "The finalized version for the
> > feature."}
> > > > > > > > > >       ]
> > > > > > > > >
> > > > > > > > > (Kowshik): With the new improved design, we have completely
> > > > > > eliminated
> > > > > > > > the
> > > > > > > > > need to
> > > > > > > > > use UpdateMetadataRequest. This is because we now rely on
> ZK
> > to
> > > > > > deliver
> > > > > > > > the
> > > > > > > > > notifications for changes to the '/features' ZK node.
> > > > > > > > >
> > > > > > > > > > 105. kafka-features.sh: Instead of using update/delete,
> > > perhaps
> > > > > > it's
> > > > > > > > > better
> > > > > > > > > > to use enable/disable?
> > > > > > > > >
> > > > > > > > > (Kowshik): For delete, yes, I have changed it so that we
> > > instead
> > > > > call
> > > > > > > it
> > > > > > > > > 'disable'.
> > > > > > > > > However for 'update', it can now also refer to either an
> > > upgrade
> > > > > or a
> > > > > > > > > forced downgrade.
> > > > > > > > > Therefore, I have left it the way it is, just calling it as
> > > just
> > > > > > > > 'update'.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Kowshik
> > > > > > > > >
> > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io>
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Kowshik,
> > > > > > > > > >
> > > > > > > > > > Thanks for the KIP. Looks good overall. A few comments
> > below.
> > > > > > > > > >
> > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > > 100.1 Since this request waits for responses from
> brokers,
> > > > should
> > > > > > we
> > > > > > > > add
> > > > > > > > > a
> > > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > > > 100.2 The response schema is a bit weird. Typically, the
> > > > response
> > > > > > > just
> > > > > > > > > > shows an error code and an error message, instead of
> > echoing
> > > > the
> > > > > > > > request.
> > > > > > > > > > 100.3 Should we add a separate request to list/describe
> the
> > > > > > existing
> > > > > > > > > > features?
> > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> > > > request.
> > > > > > For
> > > > > > > > > > DELETE, the version field doesn't make sense. So, I guess
> > the
> > > > > > broker
> > > > > > > > just
> > > > > > > > > > ignores this? An alternative way is to have a separate
> > > > > > > > > > DeleteFeaturesRequest
> > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The
> monotonically
> > > > > > > increasing
> > > > > > > > > > version of the metadata for finalized features." I am
> > > wondering
> > > > > why
> > > > > > > the
> > > > > > > > > > ordering is important?
> > > > > > > > > > 100.6 Could you specify the required ACL for this new
> > > request?
> > > > > > > > > >
> > > > > > > > > > 101. For the broker registration ZK node, should we bump
> up
> > > the
> > > > > > > version
> > > > > > > > > in
> > > > > > > > > > the json?
> > > > > > > > > >
> > > > > > > > > > 102. For the /features ZK node, not sure if we need the
> > epoch
> > > > > > field.
> > > > > > > > Each
> > > > > > > > > > ZK node has an internal version field that is incremented
> > on
> > > > > every
> > > > > > > > > update.
> > > > > > > > > >
> > > > > > > > > > 103. "Enabling the actual semantics of a feature version
> > > > > > cluster-wide
> > > > > > > > is
> > > > > > > > > > left to the discretion of the logic implementing the
> > feature
> > > > (ex:
> > > > > > can
> > > > > > > > be
> > > > > > > > > > done via dynamic broker config)." Does that mean the
> broker
> > > > > > > > registration
> > > > > > > > > ZK
> > > > > > > > > > node will be updated dynamically when this happens?
> > > > > > > > > >
> > > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > > 104.1 It would be useful to describe when the feature
> > > metadata
> > > > is
> > > > > > > > > included
> > > > > > > > > > in the request. My understanding is that it's only
> included
> > > if
> > > > > (1)
> > > > > > > > there
> > > > > > > > > is
> > > > > > > > > > a change to the finalized feature; (2) broker restart;
> (3)
> > > > > > controller
> > > > > > > > > > failover.
> > > > > > > > > > 104.2 The new fields have the following versions. Why are
> > the
> > > > > > > versions
> > > > > > > > 3+
> > > > > > > > > > when the top version is bumped to 6?
> > > > > > > > > >       "fields":  [
> > > > > > > > > >         {"name": "Name", "type":  "string", "versions":
> > > "3+",
> > > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > > >         {"name":  "Version", "type":  "int64",
> "versions":
> > > > "3+",
> > > > > > > > > >           "about": "The finalized version for the
> > feature."}
> > > > > > > > > >       ]
> > > > > > > > > >
> > > > > > > > > > 105. kafka-features.sh: Instead of using update/delete,
> > > perhaps
> > > > > > it's
> > > > > > > > > better
> > > > > > > > > > to use enable/disable?
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > > > > > > kprakasam@confluent.io
> > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hey Boyang,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the great feedback! I have updated the KIP
> > based
> > > > on
> > > > > > your
> > > > > > > > > > > feedback.
> > > > > > > > > > > Please find my response below for your comments, look
> for
> > > > > > sentences
> > > > > > > > > > > starting
> > > > > > > > > > > with "(Kowshik)" below.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 1. "When is it safe for the brokers to begin handling
> > EOS
> > > > > > > traffic"
> > > > > > > > > > could
> > > > > > > > > > > be
> > > > > > > > > > > > converted as "When is it safe for the brokers to
> start
> > > > > serving
> > > > > > > new
> > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not
> explained
> > > > > earlier
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > context.
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > > >
> > > > > > > > > > > > 2. In the *Explanation *section, the metadata version
> > > > number
> > > > > > part
> > > > > > > > > > seems a
> > > > > > > > > > > > bit blurred. Could you point a reference to later
> > section
> > > > > that
> > > > > > we
> > > > > > > > > going
> > > > > > > > > > > to
> > > > > > > > > > > > store it in Zookeeper and update it every time when
> > there
> > > > is
> > > > > a
> > > > > > > > > feature
> > > > > > > > > > > > change?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great point! Done. I've added a reference in
> > the
> > > > > KIP.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 3. For the feature downgrade, although it's a
> Non-goal
> > of
> > > > the
> > > > > > > KIP,
> > > > > > > > > for
> > > > > > > > > > > > features such as group coordinator semantics, there
> is
> > no
> > > > > legal
> > > > > > > > > > scenario
> > > > > > > > > > > to
> > > > > > > > > > > > perform a downgrade at all. So having downgrade door
> > open
> > > > is
> > > > > > > pretty
> > > > > > > > > > > > error-prone as human faults happen all the time. I'm
> > > > assuming
> > > > > > as
> > > > > > > > new
> > > > > > > > > > > > features are implemented, it's not very hard to add a
> > > flag
> > > > > > during
> > > > > > > > > > feature
> > > > > > > > > > > > creation to indicate whether this feature is
> > > > "downgradable".
> > > > > > > Could
> > > > > > > > > you
> > > > > > > > > > > > explain a bit more on the extra engineering effort
> for
> > > > > shipping
> > > > > > > > this
> > > > > > > > > > KIP
> > > > > > > > > > > > with downgrade protection in place?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great point! I'd agree and disagree here.
> > While
> > > I
> > > > > > agree
> > > > > > > > that
> > > > > > > > > > > accidental
> > > > > > > > > > > downgrades can cause problems, I also think sometimes
> > > > > downgrades
> > > > > > > > should
> > > > > > > > > > > be allowed for emergency reasons (not all downgrades
> > cause
> > > > > > issues).
> > > > > > > > > > > It is just subjective to the feature being downgraded.
> > > > > > > > > > >
> > > > > > > > > > > To be more strict about feature version downgrades, I
> > have
> > > > > > modified
> > > > > > > > the
> > > > > > > > > > KIP
> > > > > > > > > > > proposing that we mandate a `--force-downgrade` flag be
> > > used
> > > > in
> > > > > > the
> > > > > > > > > > > UPDATE_FEATURES api
> > > > > > > > > > > and the tooling, whenever the human is downgrading a
> > > > finalized
> > > > > > > > feature
> > > > > > > > > > > version.
> > > > > > > > > > > Hopefully this should cover the requirement, until we
> > find
> > > > the
> > > > > > need
> > > > > > > > for
> > > > > > > > > > > advanced downgrade support.
> > > > > > > > > > >
> > > > > > > > > > > > 4. "Each broker’s supported dictionary of feature
> > > versions
> > > > > will
> > > > > > > be
> > > > > > > > > > > defined
> > > > > > > > > > > > in the broker code." So this means in order to
> > restrict a
> > > > > > certain
> > > > > > > > > > > feature,
> > > > > > > > > > > > we need to start the broker first and then send a
> > feature
> > > > > > gating
> > > > > > > > > > request
> > > > > > > > > > > > immediately, which introduces a time gap and the
> > > > > > > intended-to-close
> > > > > > > > > > > feature
> > > > > > > > > > > > could actually serve request during this phase. Do
> you
> > > > think
> > > > > we
> > > > > > > > > should
> > > > > > > > > > > also
> > > > > > > > > > > > support configurations as well so that admin user
> could
> > > > > freely
> > > > > > > roll
> > > > > > > > > up
> > > > > > > > > > a
> > > > > > > > > > > > cluster with all nodes complying the same feature
> > gating,
> > > > > > without
> > > > > > > > > > > worrying
> > > > > > > > > > > > about the turnaround time to propagate the message
> only
> > > > after
> > > > > > the
> > > > > > > > > > cluster
> > > > > > > > > > > > starts up?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): This is a great point/question. One of the
> > > > > > expectations
> > > > > > > > out
> > > > > > > > > of
> > > > > > > > > > > this KIP, which is
> > > > > > > > > > > already followed in the broker, is the following.
> > > > > > > > > > >  - Imagine at time T1 the broker starts up and
> registers
> > > it’s
> > > > > > > > presence
> > > > > > > > > in
> > > > > > > > > > > ZK,
> > > > > > > > > > >    along with advertising it’s supported features.
> > > > > > > > > > >  - Imagine at a future time T2 the broker receives the
> > > > > > > > > > > UpdateMetadataRequest
> > > > > > > > > > >    from the controller, which contains the latest
> > finalized
> > > > > > > features
> > > > > > > > as
> > > > > > > > > > > seen by
> > > > > > > > > > >    the controller. The broker validates this data
> against
> > > > it’s
> > > > > > > > > supported
> > > > > > > > > > > features to
> > > > > > > > > > >    make sure there is no mismatch (it will shutdown if
> > > there
> > > > is
> > > > > > an
> > > > > > > > > > > incompatibility).
> > > > > > > > > > >
> > > > > > > > > > > It is expected that during the time between the 2
> events
> > T1
> > > > and
> > > > > > T2,
> > > > > > > > the
> > > > > > > > > > > broker is
> > > > > > > > > > > almost a silent entity in the cluster. It does not add
> > any
> > > > > value
> > > > > > to
> > > > > > > > the
> > > > > > > > > > > cluster, or carry
> > > > > > > > > > > out any important broker activities. By “important”, I
> > mean
> > > > it
> > > > > is
> > > > > > > not
> > > > > > > > > > doing
> > > > > > > > > > > mutations
> > > > > > > > > > > on it’s persistence, not mutating critical in-memory
> > state,
> > > > > won’t
> > > > > > > be
> > > > > > > > > > > serving
> > > > > > > > > > > produce/fetch requests. Note it doesn’t even know it’s
> > > > assigned
> > > > > > > > > > partitions
> > > > > > > > > > > until
> > > > > > > > > > > it receives UpdateMetadataRequest from controller.
> > Anything
> > > > the
> > > > > > > > broker
> > > > > > > > > is
> > > > > > > > > > > doing up
> > > > > > > > > > > until this point is not damaging/useful.
> > > > > > > > > > >
> > > > > > > > > > > I’ve clarified the above in the KIP, see this new
> > section:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > > > > > > .
> > > > > > > > > > >
> > > > > > > > > > > > 5. "adding a new Feature, updating or deleting an
> > > existing
> > > > > > > > Feature",
> > > > > > > > > > may
> > > > > > > > > > > be
> > > > > > > > > > > > I misunderstood something, I thought the features are
> > > > defined
> > > > > > in
> > > > > > > > > broker
> > > > > > > > > > > > code, so admin could not really create a new feature?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great point! You understood this right. Here
> > > > adding
> > > > > a
> > > > > > > > > feature
> > > > > > > > > > > means we are
> > > > > > > > > > > adding a cluster-wide finalized *max* version for a
> > feature
> > > > > that
> > > > > > > was
> > > > > > > > > > > previously never finalized.
> > > > > > > > > > > I have clarified this in the KIP now.
> > > > > > > > > > >
> > > > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > > > to
> > > > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great point! I have modified the KIP adding
> > the
> > > > > above
> > > > > > > (see
> > > > > > > > > > > 'Tooling support -> Admin API changes').
> > > > > > > > > > >
> > > > > > > > > > > > 7. I think we haven't discussed the alternative
> > solution
> > > to
> > > > > > pass
> > > > > > > > the
> > > > > > > > > > > > feature information through Zookeeper. Is that
> > mentioned
> > > in
> > > > > the
> > > > > > > KIP
> > > > > > > > > to
> > > > > > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Nice question! The broker reads finalized
> > > feature
> > > > > info
> > > > > > > > > stored
> > > > > > > > > > in
> > > > > > > > > > > ZK,
> > > > > > > > > > > only during startup when it does a validation. When
> > serving
> > > > > > > > > > > `ApiVersionsRequest`, the
> > > > > > > > > > > broker does not read this info from ZK directly. I'd
> > > imagine
> > > > > the
> > > > > > > risk
> > > > > > > > > is
> > > > > > > > > > > that it can increase
> > > > > > > > > > > the ZK read QPS which can be a bottleneck for the
> system.
> > > > > Today,
> > > > > > in
> > > > > > > > > Kafka
> > > > > > > > > > > we use the
> > > > > > > > > > > controller to fan out ZK updates to brokers and we want
> > to
> > > > > stick
> > > > > > to
> > > > > > > > > that
> > > > > > > > > > > pattern to avoid
> > > > > > > > > > > the ZK read bottleneck when serving
> `ApiVersionsRequest`.
> > > > > > > > > > >
> > > > > > > > > > > > 8. I was under the impression that user could
> > configure a
> > > > > range
> > > > > > > of
> > > > > > > > > > > > supported versions, what's the trade-off for allowing
> > > > single
> > > > > > > > > finalized
> > > > > > > > > > > > version only?
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great question! The finalized version of a
> > > feature
> > > > > > > > basically
> > > > > > > > > > > refers to
> > > > > > > > > > > the cluster-wide finalized feature "maximum" version.
> For
> > > > > > example,
> > > > > > > if
> > > > > > > > > the
> > > > > > > > > > > 'group_coordinator' feature
> > > > > > > > > > > has the finalized version set to 10, then, it means
> that
> > > > > > > cluster-wide
> > > > > > > > > all
> > > > > > > > > > > versions upto v10 are
> > > > > > > > > > > supported for this feature. However, note that if some
> > > > version
> > > > > > (ex:
> > > > > > > > v0)
> > > > > > > > > > > gets deprecated
> > > > > > > > > > > for this feature, then we don’t convey that using this
> > > scheme
> > > > > > (also
> > > > > > > > > > > supporting deprecation is a non-goal).
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): I’ve now modified the KIP at all points,
> > > refering
> > > > to
> > > > > > > > > finalized
> > > > > > > > > > > feature "maximum" versions.
> > > > > > > > > > >
> > > > > > > > > > > > 9. One minor syntax fix: Note that here the "client"
> > here
> > > > may
> > > > > > be
> > > > > > > a
> > > > > > > > > > > producer
> > > > > > > > > > >
> > > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Kowshik
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > > > > > > reluctanthero104@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey Kowshik,
> > > > > > > > > > > >
> > > > > > > > > > > > thanks for the revised KIP. Got a couple of
> questions:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. "When is it safe for the brokers to begin handling
> > EOS
> > > > > > > traffic"
> > > > > > > > > > could
> > > > > > > > > > > be
> > > > > > > > > > > > converted as "When is it safe for the brokers to
> start
> > > > > serving
> > > > > > > new
> > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not
> explained
> > > > > earlier
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > context.
> > > > > > > > > > > >
> > > > > > > > > > > > 2. In the *Explanation *section, the metadata version
> > > > number
> > > > > > part
> > > > > > > > > > seems a
> > > > > > > > > > > > bit blurred. Could you point a reference to later
> > section
> > > > > that
> > > > > > we
> > > > > > > > > going
> > > > > > > > > > > to
> > > > > > > > > > > > store it in Zookeeper and update it every time when
> > there
> > > > is
> > > > > a
> > > > > > > > > feature
> > > > > > > > > > > > change?
> > > > > > > > > > > >
> > > > > > > > > > > > 3. For the feature downgrade, although it's a
> Non-goal
> > of
> > > > the
> > > > > > > KIP,
> > > > > > > > > for
> > > > > > > > > > > > features such as group coordinator semantics, there
> is
> > no
> > > > > legal
> > > > > > > > > > scenario
> > > > > > > > > > > to
> > > > > > > > > > > > perform a downgrade at all. So having downgrade door
> > open
> > > > is
> > > > > > > pretty
> > > > > > > > > > > > error-prone as human faults happen all the time. I'm
> > > > assuming
> > > > > > as
> > > > > > > > new
> > > > > > > > > > > > features are implemented, it's not very hard to add a
> > > flag
> > > > > > during
> > > > > > > > > > feature
> > > > > > > > > > > > creation to indicate whether this feature is
> > > > "downgradable".
> > > > > > > Could
> > > > > > > > > you
> > > > > > > > > > > > explain a bit more on the extra engineering effort
> for
> > > > > shipping
> > > > > > > > this
> > > > > > > > > > KIP
> > > > > > > > > > > > with downgrade protection in place?
> > > > > > > > > > > >
> > > > > > > > > > > > 4. "Each broker’s supported dictionary of feature
> > > versions
> > > > > will
> > > > > > > be
> > > > > > > > > > > defined
> > > > > > > > > > > > in the broker code." So this means in order to
> > restrict a
> > > > > > certain
> > > > > > > > > > > feature,
> > > > > > > > > > > > we need to start the broker first and then send a
> > feature
> > > > > > gating
> > > > > > > > > > request
> > > > > > > > > > > > immediately, which introduces a time gap and the
> > > > > > > intended-to-close
> > > > > > > > > > > feature
> > > > > > > > > > > > could actually serve request during this phase. Do
> you
> > > > think
> > > > > we
> > > > > > > > > should
> > > > > > > > > > > also
> > > > > > > > > > > > support configurations as well so that admin user
> could
> > > > > freely
> > > > > > > roll
> > > > > > > > > up
> > > > > > > > > > a
> > > > > > > > > > > > cluster with all nodes complying the same feature
> > gating,
> > > > > > without
> > > > > > > > > > > worrying
> > > > > > > > > > > > about the turnaround time to propagate the message
> only
> > > > after
> > > > > > the
> > > > > > > > > > cluster
> > > > > > > > > > > > starts up?
> > > > > > > > > > > >
> > > > > > > > > > > > 5. "adding a new Feature, updating or deleting an
> > > existing
> > > > > > > > Feature",
> > > > > > > > > > may
> > > > > > > > > > > be
> > > > > > > > > > > > I misunderstood something, I thought the features are
> > > > defined
> > > > > > in
> > > > > > > > > broker
> > > > > > > > > > > > code, so admin could not really create a new feature?
> > > > > > > > > > > >
> > > > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > > > to
> > > > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > > > > >
> > > > > > > > > > > > 7. I think we haven't discussed the alternative
> > solution
> > > to
> > > > > > pass
> > > > > > > > the
> > > > > > > > > > > > feature information through Zookeeper. Is that
> > mentioned
> > > in
> > > > > the
> > > > > > > KIP
> > > > > > > > > to
> > > > > > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > > > > > >
> > > > > > > > > > > > 8. I was under the impression that user could
> > configure a
> > > > > range
> > > > > > > of
> > > > > > > > > > > > supported versions, what's the trade-off for allowing
> > > > single
> > > > > > > > > finalized
> > > > > > > > > > > > version only?
> > > > > > > > > > > >
> > > > > > > > > > > > 9. One minor syntax fix: Note that here the "client"
> > here
> > > > may
> > > > > > be
> > > > > > > a
> > > > > > > > > > > producer
> > > > > > > > > > > >
> > > > > > > > > > > > Boyang
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> > > > > > cmccabe@apache.org
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam
> > wrote:
> > > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the feedback! I've changed the KIP to
> > > > address
> > > > > > your
> > > > > > > > > > > > > > suggestions.
> > > > > > > > > > > > > > Please find below my explanation. Here is a link
> to
> > > KIP
> > > > > > 584:
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > > > .
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. '__data_version__' is the version of the
> > finalized
> > > > > > feature
> > > > > > > > > > > metadata
> > > > > > > > > > > > > > (i.e. actual ZK node contents), while the
> > > > > > > '__schema_version__'
> > > > > > > > is
> > > > > > > > > > the
> > > > > > > > > > > > > > version of the schema of the data persisted in
> ZK.
> > > > These
> > > > > > > serve
> > > > > > > > > > > > different
> > > > > > > > > > > > > > purposes. '__data_version__' is is useful mainly
> to
> > > > > clients
> > > > > > > > > during
> > > > > > > > > > > > reads,
> > > > > > > > > > > > > > to differentiate between the 2 versions of
> > eventually
> > > > > > > > consistent
> > > > > > > > > > > > > 'finalized
> > > > > > > > > > > > > > features' metadata (i.e. larger metadata version
> is
> > > > more
> > > > > > > > recent).
> > > > > > > > > > > > > > '__schema_version__' provides an additional
> degree
> > of
> > > > > > > > > flexibility,
> > > > > > > > > > > > where
> > > > > > > > > > > > > if
> > > > > > > > > > > > > > we decide to change the schema for '/features'
> node
> > > in
> > > > ZK
> > > > > > (in
> > > > > > > > the
> > > > > > > > > > > > > future),
> > > > > > > > > > > > > > then we can manage broker roll outs suitably
> (i.e.
> > > > > > > > > > > > > > serialization/deserialization of the ZK data can
> be
> > > > > handled
> > > > > > > > > > safely).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Kowshik,
> > > > > > > > > > > > >
> > > > > > > > > > > > > If you're talking about a number that lets you know
> > if
> > > > data
> > > > > > is
> > > > > > > > more
> > > > > > > > > > or
> > > > > > > > > > > > > less recent, we would typically call that an epoch,
> > and
> > > > > not a
> > > > > > > > > > version.
> > > > > > > > > > > > For
> > > > > > > > > > > > > the ZK data structures, the word "version" is
> > typically
> > > > > > > reserved
> > > > > > > > > for
> > > > > > > > > > > > > describing changes to the overall schema of the
> data
> > > that
> > > > > is
> > > > > > > > > written
> > > > > > > > > > to
> > > > > > > > > > > > > ZooKeeper.  We don't even really change the
> "version"
> > > of
> > > > > > those
> > > > > > > > > > schemas
> > > > > > > > > > > > that
> > > > > > > > > > > > > much, since most changes are backwards-compatible.
> > But
> > > > we
> > > > > do
> > > > > > > > > include
> > > > > > > > > > > > that
> > > > > > > > > > > > > version field just in case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't think we really need an epoch here, though,
> > > since
> > > > > we
> > > > > > > can
> > > > > > > > > just
> > > > > > > > > > > > look
> > > > > > > > > > > > > at the broker epoch.  Whenever the broker
> registers,
> > > its
> > > > > > epoch
> > > > > > > > will
> > > > > > > > > > be
> > > > > > > > > > > > > greater than the previous broker epoch.  And the
> > newly
> > > > > > > registered
> > > > > > > > > > data
> > > > > > > > > > > > will
> > > > > > > > > > > > > take priority.  This will be a lot simpler than
> > adding
> > > a
> > > > > > > separate
> > > > > > > > > > epoch
> > > > > > > > > > > > > system, I think.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. Regarding admin client needing min and max
> > > > > information -
> > > > > > > you
> > > > > > > > > are
> > > > > > > > > > > > > right!
> > > > > > > > > > > > > > I've changed the KIP such that the Admin API also
> > > > allows
> > > > > > the
> > > > > > > > user
> > > > > > > > > > to
> > > > > > > > > > > > read
> > > > > > > > > > > > > > 'supported features' from a specific broker.
> Please
> > > > look
> > > > > at
> > > > > > > the
> > > > > > > > > > > section
> > > > > > > > > > > > > > "Admin API changes".
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 3. Regarding the use of `long` vs `Long` - it was
> > not
> > > > > > > > deliberate.
> > > > > > > > > > > I've
> > > > > > > > > > > > > > improved the KIP to just use `long` at all
> places.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Sounds good.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool -
> you
> > > are
> > > > > > right!
> > > > > > > > > I've
> > > > > > > > > > > > > updated
> > > > > > > > > > > > > > the KIP sketching the functionality provided by
> > this
> > > > > tool,
> > > > > > > with
> > > > > > > > > > some
> > > > > > > > > > > > > > examples. Please look at the section "Tooling
> > support
> > > > > > > > examples".
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks, Kowshik.
> > > > > > > > > > > > >
> > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > Colin
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > > > > > > > cmccabe@apache.org>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In the "Schema" section, do we really need both
> > > > > > > > > > __schema_version__
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > __data_version__?  Can we just have a single
> > > version
> > > > > > field
> > > > > > > > > here?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Shouldn't the Admin(Client) function have some
> > way
> > > to
> > > > > get
> > > > > > > the
> > > > > > > > > min
> > > > > > > > > > > and
> > > > > > > > > > > > > max
> > > > > > > > > > > > > > > information that we're exposing as well?  I
> guess
> > > we
> > > > > > could
> > > > > > > > have
> > > > > > > > > > > min,
> > > > > > > > > > > > > max,
> > > > > > > > > > > > > > > and current.  Unrelated: is the use of Long
> > rather
> > > > than
> > > > > > > long
> > > > > > > > > > > > deliberate
> > > > > > > > > > > > > > > here?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It would be good to describe how the command
> line
> > > > tool
> > > > > > > > > > > > > > > kafka.admin.FeatureCommand will work.  For
> > example
> > > > the
> > > > > > > flags
> > > > > > > > > that
> > > > > > > > > > > it
> > > > > > > > > > > > > will
> > > > > > > > > > > > > > > take and the output that it will generate to
> > > STDOUT.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik
> Prakasam
> > > > wrote:
> > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I've opened KIP-584
> > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > is intended to provide a versioning scheme
> for
> > > > > > features.
> > > > > > > > I'd
> > > > > > > > > > like
> > > > > > > > > > > > to
> > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > this thread to discuss the same. I'd
> appreciate
> > > any
> > > > > > > > feedback
> > > > > > > > > on
> > > > > > > > > > > > this.
> > > > > > > > > > > > > > > > Here
> > > > > > > > > > > > > > > > is a link to KIP-584
> > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > > > > >  .
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Jun,

Thanks a lot for the feedback and the questions!
Please find my response below.

> 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It seems
> that field needs to be persisted somewhere in ZK?

(Kowshik): Great question! Below is my explanation. Please let me know
if you feel there are cases where we would still need to persist it in ZK.

Firstly, I have now updated the KIP with my thoughts, under the 'Guidelines'
section:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows

The allowDowngrade boolean field only captures the user's intent, and
reminds them to double-check that intent before proceeding. The user should
set it to true in a request only when they intend to forcefully "attempt" a
downgrade of a specific feature's max version level, to the value provided
in the request.

We can extend this safeguard: the controller (on its end) can maintain
rules in code that, for safety reasons, outright reject certain downgrades
from a specific max_version_level for a specific feature. Such rejections
may depend on which feature is being downgraded, and from what version
level.

The CLI tool only allows a downgrade attempt in conjunction with specific
flags and sub-commands. For example, only if the user runs the
'downgrade-all' command, or passes the '--allow-downgrade' flag when
updating a specific feature, will the tool translate this ask into setting
the 'allowDowngrade' field in the request sent to the server.
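
To make the above concrete, here is a rough Java sketch of the kind of
controller-side guard I have in mind. All names below (DowngradeGuard,
LOWEST_SAFE_MAX_VERSION_LEVEL, etc.) are made up purely for illustration
and are not part of the KIP's API:

import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: a controller-side guard for downgrade attempts.
final class DowngradeGuard {

    // Example of rules the controller could keep in code: features whose
    // finalized max version level should never be lowered below some value.
    private static final Map<String, Integer> LOWEST_SAFE_MAX_VERSION_LEVEL =
        Map.of("group_coordinator", 1);

    // Returns an error message if the attempted update must be rejected.
    static Optional<String> validate(String feature,
                                     int currentMaxVersionLevel,
                                     int requestedMaxVersionLevel,
                                     boolean allowDowngrade) {
        boolean isDowngrade = requestedMaxVersionLevel < currentMaxVersionLevel;
        if (isDowngrade && !allowDowngrade) {
            return Optional.of("Downgrade attempted without allowDowngrade=true");
        }
        int lowestSafe =
            LOWEST_SAFE_MAX_VERSION_LEVEL.getOrDefault(feature, Integer.MIN_VALUE);
        if (isDowngrade && requestedMaxVersionLevel < lowestSafe) {
            return Optional.of("Downgrade below the safe level is never allowed for " + feature);
        }
        return Optional.empty();  // no error, the update can proceed
    }
}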

> 201. UpdateFeaturesResponse has the following top level fields. Should
> those fields be per feature?
>
>   "fields": [
>     { "name": "ErrorCode", "type": "int16", "versions": "0+",
>       "about": "The error code, or 0 if there was no error." },
>     { "name": "ErrorMessage", "type": "string", "versions": "0+",
>       "about": "The error message, or null if there was no error." }
>   ]

(Kowshik): Great question!
The API is transactional, as explained in the sections linked below:
either all of the provided FeatureUpdates are applied, or none of them is.
That's the reason I felt we can have just one error code + message.
Happy to extend this if you feel otherwise. Please let me know.

Link to sections:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController

https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees
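
To illustrate the transactional behavior, here is another rough Java sketch
with made-up names (FeatureUpdateApplier, writeFeaturesZNode, etc.); it is
not the actual implementation, just the shape of the "all or none" logic:

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: the controller applies either all updates or none.
final class FeatureUpdateApplier {

    static Optional<String> applyAll(Map<String, Integer> requestedMaxLevels,
                                     Map<String, Integer> currentFinalized) {
        Map<String, Integer> newFinalized = new HashMap<>(currentFinalized);
        for (Map.Entry<String, Integer> update : requestedMaxLevels.entrySet()) {
            Optional<String> error =
                validateOne(update.getKey(), update.getValue(), currentFinalized);
            if (error.isPresent()) {
                return error;  // one invalid update fails the whole request; nothing is written
            }
            newFinalized.put(update.getKey(), update.getValue());
        }
        writeFeaturesZNode(newFinalized);  // single write of the full map to the '/features' ZK node
        return Optional.empty();           // single top-level "no error" result for the response
    }

    private static Optional<String> validateOne(String feature, int requestedMaxLevel,
                                                Map<String, Integer> currentFinalized) {
        return Optional.empty();  // per-feature validation elided in this sketch
    }

    private static void writeFeaturesZNode(Map<String, Integer> finalizedFeatures) {
        // elided: persist the finalized features to ZK
    }
}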

> 202. The /features path in ZK has a field min_version_level. Which API and
> tool can change that value?

(Kowshik): Great question! Currently this cannot be modified by using the
API or the tool.
Feature version deprecation (by raising min_version_level) can be done only
by the Controller directly. The rationale is explained in this section:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation
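
For illustration, here is a rough Java sketch of what raising
min_version_level on the controller could look like. The names
(FeatureDeprecation, FinalizedVersionRange, raiseMinVersionLevel) are made
up for this sketch and are not the actual KIP code:

// Illustrative sketch only: feature version deprecation is performed by the controller.
final class FeatureDeprecation {

    // Stand-in for one feature's entry in the '/features' ZK node.
    static final class FinalizedVersionRange {
        int minVersionLevel;
        int maxVersionLevel;
        FinalizedVersionRange(int minVersionLevel, int maxVersionLevel) {
            this.minVersionLevel = minVersionLevel;
            this.maxVersionLevel = maxVersionLevel;
        }
    }

    // Only the controller would call this (e.g. as part of a new Kafka release
    // that deprecates older versions); there is no public API or CLI path to it.
    static void raiseMinVersionLevel(FinalizedVersionRange range, int newMinVersionLevel) {
        if (newMinVersionLevel > range.maxVersionLevel) {
            throw new IllegalArgumentException(
                "min_version_level cannot exceed max_version_level");
        }
        if (newMinVersionLevel > range.minVersionLevel) {
            range.minVersionLevel = newMinVersionLevel;
            // The controller would then persist the updated map back to '/features' in ZK.
        }
    }
}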


Cheers,
Kowshik

On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Kowshik,
>
> Thanks for addressing those comments. Just a few more minor comments.
>
> 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It seems
> that field needs to be persisted somewhere in ZK?
>
> 201. UpdateFeaturesResponse has the following top level fields. Should
> those fields be per feature?
>
>   "fields": [
>     { "name": "ErrorCode", "type": "int16", "versions": "0+",
>       "about": "The error code, or 0 if there was no error." },
>     { "name": "ErrorMessage", "type": "string", "versions": "0+",
>       "about": "The error message, or null if there was no error." }
>   ]
>
> 202. The /features path in ZK has a field min_version_level. Which API and
> tool can change that value?
>
> Jun
>
> On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hi Jun,
> >
> > Thanks for the feedback! I have updated the KIP-584 addressing your
> > comments.
> > Please find my response below.
> >
> > > 100.6 You can look for the sentence "This operation requires ALTER on
> > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > KafkaApis.authorize().
> >
> > (Kowshik): Done. Great point! For the newly introduced UPDATE_FEATURES
> api,
> > I have added a
> > requirement that AclOperation.ALTER is required on ResourceType.CLUSTER.
> >
> > > 110. Keeping the feature version as int is probably fine. I just felt
> > that
> > > for some of the common user interactions, it's more convenient to
> > > relate that to a release version. For example, if a user wants to
> > downgrade
> > > to a release 2.5, it's easier for the user to use the tool like "tool
> > > --downgrade 2.5" instead of "tool --downgrade --feature X --version 6".
> >
> > (Kowshik): Great point. Generally, maximum feature version levels are not
> > downgradable after
> > they are finalized in the cluster. This is because, as a guideline
> bumping
> > feature version level usually is used mainly to convey important breaking
> > changes.
> > Despite the above, there may be some extreme/rare cases where a user
> wants
> > to downgrade
> > all features to a specific previous release. The user may want to do this
> > just
> > prior to rolling back a Kafka cluster to a previous release.
> >
> > To support the above, I have made a change to the KIP explaining that the
> > CLI tool is versioned.
> > The CLI tool internally has knowledge about a map of features to their
> > respective max
> > versions supported by the Broker. The tool's knowledge of features and
> > their version values,
> > is limited to the version of the CLI tool itself i.e. the information is
> > packaged into the CLI tool
> > when it is released. Whenever a Kafka release introduces a new feature
> > version, or modifies
> > an existing feature version, the CLI tool shall also be updated with this
> > information,
> > Newer versions of the CLI tool will be released as part of the Kafka
> > releases.
> >
> > Therefore, to achieve the downgrade need, the user just needs to run the
> > version of
> > the CLI tool that's part of the particular previous release that he/she
> is
> > downgrading to.
> > To help the user with this, there is a new command added to the CLI tool
> > called `downgrade-all`.
> > This essentially downgrades max version levels of all features in the
> > cluster to the versions
> > known to the CLI tool internally.
> >
> > I have explained the above in the KIP under these sections:
> >
> > Tooling support (have explained that the CLI tool is versioned):
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> >
> > Regular CLI tool usage (please refer to point #3, and see the tooling
> > example)
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> >
> > > 110. Similarly, if the client library finds a feature mismatch with the
> > broker,
> > > the client likely needs to log some error message for the user to take
> > some
> > > actions. It's much more actionable if the error message is "upgrade the
> > > broker to release version 2.6" than just "upgrade the broker to feature
> > > version 7".
> >
> > (Kowshik): That's a really good point! If we use ints for feature
> versions,
> > the best
> > message that client can print for debugging is "broker doesn't support
> > feature version 7", and alongside that print the supported version range
> > returned
> > by the broker. Then, does it sound reasonable that the user could then
> > reference
> > Kafka release logs to figure out which version of the broker release is
> > required
> > be deployed, to support feature version 7? I couldn't think of a better
> > strategy here.
> >
> > > 120. When should a developer bump up the version of a feature?
> >
> > (Kowshik): Great question! In the KIP, I have added a section:
> 'Guidelines
> > on feature versions and workflows'
> > providing some guidelines on when to use the versioned feature flags, and
> > what
> > are the regular workflows with the CLI tool.
> >
> > Link to the relevant sections:
> > Guidelines:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> >
> > Regular CLI tool usage:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> >
> > Advanced CLI tool usage:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
> >
> >
> > Cheers,
> > Kowshik
> >
> >
> > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Kowshik,
> > >
> > > Thanks for the reply. A few more comments.
> > >
> > > 110. Keeping the feature version as int is probably fine. I just felt
> > that
> > > for some of the common user interactions, it's more convenient to
> > > relate that to a release version. For example, if a user wants to
> > downgrade
> > > to a release 2.5, it's easier for the user to use the tool like "tool
> > > --downgrade 2.5" instead of "tool --downgrade --feature X --version 6".
> > > Similarly, if the client library finds a feature mismatch with the
> > broker,
> > > the client likely needs to log some error message for the user to take
> > some
> > > actions. It's much more actionable if the error message is "upgrade the
> > > broker to release version 2.6" than just "upgrade the broker to feature
> > > version 7".
> > >
> > > 111. Sounds good.
> > >
> > > 120. When should a developer bump up the version of a feature?
> > >
> > > Jun
> > >
> > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <
> kprakasam@confluent.io
> > >
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > I have updated the KIP for the item 111.
> > > > I'm in the process of addressing 100.6, and will provide an update
> > soon.
> > > > I think item 110 is still under discussion given we are now
> providing a
> > > way
> > > > to finalize
> > > > all features to their latest version levels. In any case, please let
> us
> > > > know
> > > > how you feel in response to Colin's comments on this topic.
> > > >
> > > > > 111. To put this in context, when we had IBP, the default value is
> > the
> > > > > current released version. So, if you are a brand new user, you
> don't
> > > need
> > > > > to configure IBP and all new features will be immediately available
> > in
> > > > the
> > > > > new cluster. If you are upgrading from an old version, you do need
> to
> > > > > understand and configure IBP. I see a similar pattern here for
> > > > > features. From the ease of use perspective, ideally, we shouldn't
> > > require
> > > > a
> > > > > new user to have an extra step such as running a bootstrap script
> > > unless
> > > > > it's truly necessary. If someone has a special need (all the cases
> > you
> > > > > mentioned seem special cases?), they can configure a mode such that
> > > > > features are enabled/disabled manually.
> > > >
> > > > (Kowshik): That makes sense, thanks for the idea! Sorry if I didn't
> > > > understand
> > > > this need earlier. I have updated the KIP with the approach that
> > whenever
> > > > the '/features' node is absent, the controller by default will
> > bootstrap
> > > > the node
> > > > to contain the latest feature levels. Here is the new section in the
> > KIP
> > > > describing
> > > > the same:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> > > >
> > > > Next, as I explained in my response to Colin's suggestions, we are
> now
> > > > providing a `--finalize-latest-features` flag with the tooling. This
> > lets
> > > > the sysadmin finalize all features known to the controller to their
> > > latest
> > > > version
> > > > levels. Please look at this section (point #3 and the tooling example
> > > > later):
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> > > >
> > > >
> > > > Do you feel this addresses your comment/concern?
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Kowshik,
> > > > >
> > > > > Thanks for the reply. A few more replies below.
> > > > >
> > > > > 100.6 You can look for the sentence "This operation requires ALTER
> on
> > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > > KafkaApis.authorize().
> > > > >
> > > > > 110. From the external client/tooling perspective, it's more
> natural
> > to
> > > > use
> > > > > the release version for features. If we can use the same release
> > > version
> > > > > for internal representation, it seems simpler (easier to
> understand,
> > no
> > > > > mapping overhead, etc). Is there a benefit with separate external
> and
> > > > > internal versioning schemes?
> > > > >
> > > > > 111. To put this in context, when we had IBP, the default value is
> > the
> > > > > current released version. So, if you are a brand new user, you
> don't
> > > need
> > > > > to configure IBP and all new features will be immediately available
> > in
> > > > the
> > > > > new cluster. If you are upgrading from an old version, you do need
> to
> > > > > understand and configure IBP. I see a similar pattern here for
> > > > > features. From the ease of use perspective, ideally, we shouldn't
> > > > require a
> > > > > new user to have an extra step such as running a bootstrap script
> > > unless
> > > > > it's truly necessary. If someone has a special need (all the cases
> > you
> > > > > mentioned seem special cases?), they can configure a mode such that
> > > > > features are enabled/disabled manually.
> > > > >
> > > > > Jun
> > > > >
> > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
> > > kprakasam@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for the feedback and suggestions. Please find my response
> > > below.
> > > > > >
> > > > > > > 100.6 For every new request, the admin needs to control who is
> > > > allowed
> > > > > to
> > > > > > > issue that request if security is enabled. So, we need to
> assign
> > > the
> > > > > new
> > > > > > > request a ResourceType and possible AclOperations. See
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > as an example.
> > > > > >
> > > > > > (Kowshik): I don't see any reference to the words ResourceType or
> > > > > > AclOperations
> > > > > > in the KIP. Please let me know how I can use the KIP that you
> > linked
> > > to
> > > > > > know how to
> > > > > > setup the appropriate ResourceType and/or ClusterOperation?
> > > > > >
> > > > > > > 105. If we change delete to disable, it's better to do this
> > > > > consistently
> > > > > > in
> > > > > > > request protocol and admin api as well.
> > > > > >
> > > > > > (Kowshik): The API shouldn't be called 'disable' when it is
> > deleting
> > > a
> > > > > > feature.
> > > > > > I've just changed the KIP to use 'delete'. I don't have a strong
> > > > > > preference.
> > > > > >
> > > > > > > 110. The minVersion/maxVersion for features use int64.
> Currently,
> > > our
> > > > > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> > > > > possible
> > > > > > > for new features to be included in minor releases too. Should
> we
> > > make
> > > > > the
> > > > > > > feature versioning match the release versioning?
> > > > > >
> > > > > > (Kowshik): The release version can be mapped to a set of feature
> > > > > versions,
> > > > > > and this can be done, for example in the tool (or even external
> to
> > > the
> > > > > > tool).
> > > > > > Can you please clarify what I'm missing?
> > > > > >
> > > > > > > 111. "During regular operations, the data in the ZK node can be
> > > > mutated
> > > > > > > only via a specific admin API served only by the controller." I
> > am
> > > > > > > wondering why can't the controller auto finalize a feature
> > version
> > > > > after
> > > > > > > all brokers are upgraded? For new users who download the latest
> > > > version
> > > > > > to
> > > > > > > build a new cluster, it's inconvenient for them to have to
> > manually
> > > > > > enable
> > > > > > > each feature.
> > > > > >
> > > > > > (Kowshik): I agree that there is a trade-off here, but it will
> help
> > > > > > to decide whether the automation can be thought through in the
> > future
> > > > > > in a follow up KIP, or right now in this KIP. We may invest
> > > > > > in automation, but we have to decide whether we should do it
> > > > > > now or later.
> > > > > >
> > > > > > For the inconvenience that you mentioned, do you think the
> problem
> > > that
> > > > > you
> > > > > > mentioned can be  overcome by asking for the cluster operator to
> > run
> > > a
> > > > > > bootstrap script  when he/she knows that a specific AK release
> has
> > > been
> > > > > > almost completely deployed in a cluster for the first time? Idea
> is
> > > > that
> > > > > > the
> > > > > > bootstrap script will know how to map a specific AK release to
> > > > finalized
> > > > > > feature versions, and run the `kafka-features.sh` tool
> > appropriately
> > > > > > against
> > > > > > the cluster.
> > > > > >
> > > > > > Now, coming back to your automation proposal/question.
> > > > > > I do see the value of automated feature version finalization,
> but I
> > > > also
> > > > > > see
> > > > > > that this will open up several questions and some risks, as
> > explained
> > > > > > below.
> > > > > > The answers to these depend on the definition of the automation
> we
> > > > choose
> > > > > > to build, and how well does it fit into a kafka deployment.
> > > > > > Basically, it can be unsafe for the controller to finalize
> feature
> > > > > version
> > > > > > upgrades automatically, without learning about the intent of the
> > > > cluster
> > > > > > operator.
> > > > > > 1. We would sometimes want to lock feature versions only when we
> > have
> > > > > > externally verified
> > > > > > the stability of the broker binary.
> > > > > > 2. Sometimes only the cluster operator knows that a cluster
> upgrade
> > > is
> > > > > > complete,
> > > > > > and new brokers are highly unlikely to join the cluster.
> > > > > > 3. Only the cluster operator knows that the intent is to deploy
> the
> > > > same
> > > > > > version
> > > > > > of the new broker release across the entire cluster (i.e. the
> > latest
> > > > > > downloaded version).
> > > > > > 4. For downgrades, it appears the controller still needs some
> > > external
> > > > > > input
> > > > > > (such as the proposed tool) to finalize a feature version
> > downgrade.
> > > > > >
> > > > > > If we have automation, that automation can end up failing in some
> > of
> > > > the
> > > > > > cases
> > > > > > above. Then, we need a way to declare that the cluster is "not
> > ready"
> > > > if
> > > > > > the
> > > > > > controller cannot automatically finalize some basic required
> > feature
> > > > > > version
> > > > > > upgrades across the cluster. We need to make the cluster operator
> > > aware
> > > > > in
> > > > > > such a scenario (raise an alert or alike).
> > > > > >
> > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> > > instead
> > > > > of
> > > > > > 48.
> > > > > >
> > > > > > (Kowshik): Done.
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io>
> wrote:
> > > > > >
> > > > > > > Hi, Kowshik,
> > > > > > >
> > > > > > > Thanks for the reply. A few more comments below.
> > > > > > >
> > > > > > > 100.6 For every new request, the admin needs to control who is
> > > > allowed
> > > > > to
> > > > > > > issue that request if security is enabled. So, we need to
> assign
> > > the
> > > > > new
> > > > > > > request a ResourceType and possible AclOperations. See
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > > as
> > > > > > > an example.
> > > > > > >
> > > > > > > 105. If we change delete to disable, it's better to do this
> > > > > consistently
> > > > > > in
> > > > > > > request protocol and admin api as well.
> > > > > > >
> > > > > > > 110. The minVersion/maxVersion for features use int64.
> Currently,
> > > our
> > > > > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> > > > > possible
> > > > > > > for new features to be included in minor releases too. Should
> we
> > > make
> > > > > the
> > > > > > > feature versioning match the release versioning?
> > > > > > >
> > > > > > > 111. "During regular operations, the data in the ZK node can be
> > > > mutated
> > > > > > > only via a specific admin API served only by the controller." I
> > am
> > > > > > > wondering why can't the controller auto finalize a feature
> > version
> > > > > after
> > > > > > > all brokers are upgraded? For new users who download the latest
> > > > version
> > > > > > to
> > > > > > > build a new cluster, it's inconvenient for them to have to
> > manually
> > > > > > enable
> > > > > > > each feature.
> > > > > > >
> > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> > > instead
> > > > > of
> > > > > > > 48.
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> > > > > kprakasam@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Jun,
> > > > > > > >
> > > > > > > > Thanks a lot for the great feedback! Please note that the
> > design
> > > > > > > > has changed a little bit on the KIP, and we now propagate the
> > > > > finalized
> > > > > > > > features metadata only via ZK watches (instead of
> > > > > UpdateMetadataRequest
> > > > > > > > from the controller).
> > > > > > > >
> > > > > > > > Please find below my response to your questions/feedback,
> with
> > > the
> > > > > > prefix
> > > > > > > > "(Kowshik):".
> > > > > > > >
> > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > 100.1 Since this request waits for responses from brokers,
> > > should
> > > > > we
> > > > > > > add
> > > > > > > > a
> > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > >
> > > > > > > > (Kowshik): Great point! Done. I have added a timeout field.
> > Note:
> > > > we
> > > > > no
> > > > > > > > longer
> > > > > > > > wait for responses from brokers, since the design has been
> > > changed
> > > > so
> > > > > > > that
> > > > > > > > the
> > > > > > > > features information is propagated via ZK. Nevertheless, it
> is
> > > > right
> > > > > to
> > > > > > > > have a timeout
> > > > > > > > for the request.
> > > > > > > >
> > > > > > > > > 100.2 The response schema is a bit weird. Typically, the
> > > response
> > > > > > just
> > > > > > > > > shows an error code and an error message, instead of
> echoing
> > > the
> > > > > > > request.
> > > > > > > >
> > > > > > > > (Kowshik): Great point! Yeah, I have modified it to just
> return
> > > an
> > > > > > error
> > > > > > > > code and a message.
> > > > > > > > Previously it was not echoing the "request", rather it was
> > > > returning
> > > > > > the
> > > > > > > > latest set of
> > > > > > > > cluster-wide finalized features (after applying the updates).
> > But
> > > > you
> > > > > > are
> > > > > > > > right,
> > > > > > > > the additional info is not required, so I have removed it
> from
> > > the
> > > > > > > response
> > > > > > > > schema.
> > > > > > > >
> > > > > > > > > 100.3 Should we add a separate request to list/describe the
> > > > > existing
> > > > > > > > > features?
> > > > > > > >
> > > > > > > > (Kowshik): This is already present in the KIP via the
> > > > > > 'DescribeFeatures'
> > > > > > > > Admin API,
> > > > > > > > which, under the covers, uses the ApiVersionsRequest to
> > > > list/describe
> > > > > > the
> > > > > > > > existing features. Please read the 'Tooling support' section.
> > > > > > > >
> > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> > > request.
> > > > > For
> > > > > > > > > DELETE, the version field doesn't make sense. So, I guess
> the
> > > > > broker
> > > > > > > just
> > > > > > > > > ignores this? An alternative way is to have a separate
> > > > > > > > DeleteFeaturesRequest
> > > > > > > >
> > > > > > > > (Kowshik): Great point! I have modified the KIP now to have 2
> > > > > separate
> > > > > > > > controller APIs
> > > > > > > > serving these different purposes:
> > > > > > > > 1. updateFeatures
> > > > > > > > 2. deleteFeatures
> > > > > > > >
> > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > > > > increasing
> > > > > > > > > version of the metadata for finalized features." I am
> > wondering
> > > > why
> > > > > > the
> > > > > > > > > ordering is important?
> > > > > > > >
> > > > > > > > (Kowshik): In the latest KIP write-up, it is called epoch
> > > (instead
> > > > of
> > > > > > > > version), and
> > > > > > > > it is just the ZK node version. Basically, this is the epoch
> > for
> > > > the
> > > > > > > > cluster-wide
> > > > > > > > finalized feature version metadata. This metadata is served
> to
> > > > > clients
> > > > > > > via
> > > > > > > > the
> > > > > > > > ApiVersionsResponse (for reads). We propagate updates from
> the
> > > > > > > '/features'
> > > > > > > > ZK node
> > > > > > > > to all brokers, via ZK watches setup by each broker on the
> > > > > '/features'
> > > > > > > > node.
> > > > > > > >
> > > > > > > > Now here is why the ordering is important:
> > > > > > > > ZK watches don't propagate at the same time. As a result, the
> > > > > > > > ApiVersionsResponse
> > > > > > > > is eventually consistent across brokers. This can introduce
> > cases
> > > > > > > > where clients see an older lower epoch of the features
> > metadata,
> > > > > after
> > > > > > a
> > > > > > > > more recent
> > > > > > > > higher epoch was returned at a previous point in time. We
> > expect
> > > > > > clients
> > > > > > > > to always employ the rule that the latest received higher
> epoch
> > > of
> > > > > > > metadata
> > > > > > > > always trumps an older smaller epoch. Those clients that are
> > > > external
> > > > > > to
> > > > > > > > Kafka should strongly consider discovering the latest
> metadata
> > > once
> > > > > > > during
> > > > > > > > startup from the brokers, and if required refresh the
> metadata
> > > > > > > periodically
> > > > > > > > (to get the latest metadata).
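> > > > > > > >
> > > > > > > > As an illustration of that rule (a sketch only, not from the KIP; the
> > > > > > > > cache fields and method name below are made up):
> > > > > > > >
> > > > > > > >   // Client-side cache update: keep finalized features only from a newer epoch.
> > > > > > > >   synchronized void maybeUpdate(long receivedEpoch, Map<String, Long> finalizedFeatures) {
> > > > > > > >       if (receivedEpoch > cachedEpoch) {
> > > > > > > >           cachedEpoch = receivedEpoch;
> > > > > > > >           cachedFinalizedFeatures = finalizedFeatures;
> > > > > > > >       }   // else: stale metadata with a smaller epoch is simply ignored
> > > > > > > >   }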
> > > > > > > >
> > > > > > > > > 100.6 Could you specify the required ACL for this new
> > request?
> > > > > > > >
> > > > > > > > (Kowshik): What is ACL, and how could I find out which one to
> > > > > specify?
> > > > > > > > Please could you provide me some pointers? I'll be glad to
> > update
> > > > the
> > > > > > > > KIP once I know the next steps.
> > > > > > > >
> > > > > > > > > 101. For the broker registration ZK node, should we bump up
> > the
> > > > > > version
> > > > > > > > in
> > > > > > > > the json?
> > > > > > > >
> > > > > > > > (Kowshik): Great point! Done. I've increased the version in
> the
> > > > > broker
> > > > > > > json
> > > > > > > > by 1.
> > > > > > > >
> > > > > > > > > 102. For the /features ZK node, not sure if we need the
> epoch
> > > > > field.
> > > > > > > Each
> > > > > > > > > ZK node has an internal version field that is incremented
> on
> > > > every
> > > > > > > > update.
> > > > > > > >
> > > > > > > > (Kowshik): Great point! Done. I'm using the ZK node version
> > now,
> > > > > > instead
> > > > > > > of
> > > > > > > > explicitly
> > > > > > > > incremented epoch.
> > > > > > > >
> > > > > > > > > 103. "Enabling the actual semantics of a feature version
> > > > > cluster-wide
> > > > > > > is
> > > > > > > > > left to the discretion of the logic implementing the
> feature
> > > (ex:
> > > > > can
> > > > > > > be
> > > > > > > > > done via dynamic broker config)." Does that mean the broker
> > > > > > > registration
> > > > > > > > ZK
> > > > > > > > > node will be updated dynamically when this happens?
> > > > > > > >
> > > > > > > > (Kowshik): Not really. The text was just conveying that a
> > broker
> > > > > could
> > > > > > > > "know" of
> > > > > > > > a new feature version, but it does not mean the broker should
> > > have
> > > > > also
> > > > > > > > activated the effects of the feature version. Knowing vs
> > > activation
> > > > > > are 2
> > > > > > > > separate things,
> > > > > > > > and the latter can be achieved by dynamic config. I have
> > reworded
> > > > the
> > > > > > > text
> > > > > > > > to
> > > > > > > > make this clear to the reader.
> > > > > > > >
> > > > > > > >
> > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > 104.1 It would be useful to describe when the feature
> > metadata
> > > is
> > > > > > > > included
> > > > > > > > > in the request. My understanding is that it's only included
> > if
> > > > (1)
> > > > > > > there
> > > > > > > > is
> > > > > > > > > a change to the finalized feature; (2) broker restart; (3)
> > > > > controller
> > > > > > > > > failover.
> > > > > > > > > 104.2 The new fields have the following versions. Why are
> the
> > > > > > versions
> > > > > > > 3+
> > > > > > > > > when the top version is bumped to 6?
> > > > > > > > >       "fields":  [
> > > > > > > > >         {"name": "Name", "type":  "string", "versions":
> > "3+",
> > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > >         {"name":  "Version", "type":  "int64", "versions":
> > > "3+",
> > > > > > > > >           "about": "The finalized version for the
> feature."}
> > > > > > > > >       ]
> > > > > > > >
> > > > > > > > (Kowshik): With the new improved design, we have completely
> > > > > eliminated
> > > > > > > the
> > > > > > > > need to
> > > > > > > > use UpdateMetadataRequest. This is because we now rely on ZK
> to
> > > > > deliver
> > > > > > > the
> > > > > > > > notifications for changes to the '/features' ZK node.
> > > > > > > >
> > > > > > > > > 105. kafka-features.sh: Instead of using update/delete,
> > perhaps
> > > > > it's
> > > > > > > > better
> > > > > > > > > to use enable/disable?
> > > > > > > >
> > > > > > > > (Kowshik): For delete, yes, I have changed it so that we
> > instead
> > > > call
> > > > > > it
> > > > > > > > 'disable'.
> > > > > > > > However for 'update', it can now also refer to either an
> > upgrade
> > > > or a
> > > > > > > > forced downgrade.
> > > > > > > > Therefore, I have left it the way it is, calling it just
> > > > > > > 'update'.
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Kowshik
> > > > > > > >
> > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io>
> > > wrote:
> > > > > > > >
> > > > > > > > > Hi, Kowshik,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP. Looks good overall. A few comments
> below.
> > > > > > > > >
> > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > > 100.1 Since this request waits for responses from brokers,
> > > should
> > > > > we
> > > > > > > add
> > > > > > > > a
> > > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > > 100.2 The response schema is a bit weird. Typically, the
> > > response
> > > > > > just
> > > > > > > > > shows an error code and an error message, instead of
> echoing
> > > the
> > > > > > > request.
> > > > > > > > > 100.3 Should we add a separate request to list/describe the
> > > > > existing
> > > > > > > > > features?
> > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> > > request.
> > > > > For
> > > > > > > > > DELETE, the version field doesn't make sense. So, I guess
> the
> > > > > broker
> > > > > > > just
> > > > > > > > > ignores this? An alternative way is to have a separate
> > > > > > > > > DeleteFeaturesRequest
> > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > > > > increasing
> > > > > > > > > version of the metadata for finalized features." I am
> > wondering
> > > > why
> > > > > > the
> > > > > > > > > ordering is important?
> > > > > > > > > 100.6 Could you specify the required ACL for this new
> > request?
> > > > > > > > >
> > > > > > > > > 101. For the broker registration ZK node, should we bump up
> > the
> > > > > > version
> > > > > > > > in
> > > > > > > > > the json?
> > > > > > > > >
> > > > > > > > > 102. For the /features ZK node, not sure if we need the
> epoch
> > > > > field.
> > > > > > > Each
> > > > > > > > > ZK node has an internal version field that is incremented
> on
> > > > every
> > > > > > > > update.
> > > > > > > > >
> > > > > > > > > 103. "Enabling the actual semantics of a feature version
> > > > > cluster-wide
> > > > > > > is
> > > > > > > > > left to the discretion of the logic implementing the
> feature
> > > (ex:
> > > > > can
> > > > > > > be
> > > > > > > > > done via dynamic broker config)." Does that mean the broker
> > > > > > > registration
> > > > > > > > ZK
> > > > > > > > > node will be updated dynamically when this happens?
> > > > > > > > >
> > > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > > 104.1 It would be useful to describe when the feature
> > metadata
> > > is
> > > > > > > > included
> > > > > > > > > in the request. My understanding is that it's only included
> > if
> > > > (1)
> > > > > > > there
> > > > > > > > is
> > > > > > > > > a change to the finalized feature; (2) broker restart; (3)
> > > > > controller
> > > > > > > > > failover.
> > > > > > > > > 104.2 The new fields have the following versions. Why are
> the
> > > > > > versions
> > > > > > > 3+
> > > > > > > > > when the top version is bumped to 6?
> > > > > > > > >       "fields":  [
> > > > > > > > >         {"name": "Name", "type":  "string", "versions":
> > "3+",
> > > > > > > > >           "about": "The name of the feature."},
> > > > > > > > >         {"name":  "Version", "type":  "int64", "versions":
> > > "3+",
> > > > > > > > >           "about": "The finalized version for the
> feature."}
> > > > > > > > >       ]
> > > > > > > > >
> > > > > > > > > 105. kafka-features.sh: Instead of using update/delete,
> > perhaps
> > > > > it's
> > > > > > > > better
> > > > > > > > > to use enable/disable?
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > > > > > kprakasam@confluent.io
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey Boyang,
> > > > > > > > > >
> > > > > > > > > > Thanks for the great feedback! I have updated the KIP
> based
> > > on
> > > > > your
> > > > > > > > > > feedback.
> > > > > > > > > > Please find my response below for your comments, look for
> > > > > sentences
> > > > > > > > > > starting
> > > > > > > > > > with "(Kowshik)" below.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 1. "When is it safe for the brokers to begin handling
> EOS
> > > > > > traffic"
> > > > > > > > > could
> > > > > > > > > > be
> > > > > > > > > > > converted as "When is it safe for the brokers to start
> > > > serving
> > > > > > new
> > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> > > > earlier
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > context.
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > >
> > > > > > > > > > > 2. In the *Explanation *section, the metadata version
> > > number
> > > > > part
> > > > > > > > > seems a
> > > > > > > > > > > bit blurred. Could you point a reference to later
> section
> > > > that
> > > > > we
> > > > > > > > going
> > > > > > > > > > to
> > > > > > > > > > > store it in Zookeeper and update it every time when
> there
> > > is
> > > > a
> > > > > > > > feature
> > > > > > > > > > > change?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great point! Done. I've added a reference in
> the
> > > > KIP.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 3. For the feature downgrade, although it's a Non-goal
> of
> > > the
> > > > > > KIP,
> > > > > > > > for
> > > > > > > > > > > features such as group coordinator semantics, there is
> no
> > > > legal
> > > > > > > > > scenario
> > > > > > > > > > to
> > > > > > > > > > > perform a downgrade at all. So having downgrade door
> open
> > > is
> > > > > > pretty
> > > > > > > > > > > error-prone as human faults happen all the time. I'm
> > > assuming
> > > > > as
> > > > > > > new
> > > > > > > > > > > features are implemented, it's not very hard to add a
> > flag
> > > > > during
> > > > > > > > > feature
> > > > > > > > > > > creation to indicate whether this feature is
> > > "downgradable".
> > > > > > Could
> > > > > > > > you
> > > > > > > > > > > explain a bit more on the extra engineering effort for
> > > > shipping
> > > > > > > this
> > > > > > > > > KIP
> > > > > > > > > > > with downgrade protection in place?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great point! I'd agree and disagree here.
> While
> > I
> > > > > agree
> > > > > > > that
> > > > > > > > > > accidental
> > > > > > > > > > downgrades can cause problems, I also think sometimes
> > > > downgrades
> > > > > > > should
> > > > > > > > > > be allowed for emergency reasons (not all downgrades
> cause
> > > > > issues).
> > > > > > > > > > It is just subjective to the feature being downgraded.
> > > > > > > > > >
> > > > > > > > > > To be more strict about feature version downgrades, I
> have
> > > > > modified
> > > > > > > the
> > > > > > > > > KIP
> > > > > > > > > > proposing that we mandate a `--force-downgrade` flag be
> > used
> > > in
> > > > > the
> > > > > > > > > > UPDATE_FEATURES api
> > > > > > > > > > and the tooling, whenever the human is downgrading a
> > > finalized
> > > > > > > feature
> > > > > > > > > > version.
> > > > > > > > > > Hopefully this should cover the requirement, until we
> find
> > > the
> > > > > need
> > > > > > > for
> > > > > > > > > > advanced downgrade support.
> > > > > > > > > >
> > > > > > > > > > > 4. "Each broker’s supported dictionary of feature
> > versions
> > > > will
> > > > > > be
> > > > > > > > > > defined
> > > > > > > > > > > in the broker code." So this means in order to
> restrict a
> > > > > certain
> > > > > > > > > > feature,
> > > > > > > > > > > we need to start the broker first and then send a
> feature
> > > > > gating
> > > > > > > > > request
> > > > > > > > > > > immediately, which introduces a time gap and the
> > > > > > intended-to-close
> > > > > > > > > > feature
> > > > > > > > > > > could actually serve request during this phase. Do you
> > > think
> > > > we
> > > > > > > > should
> > > > > > > > > > also
> > > > > > > > > > > support configurations as well so that admin user could
> > > > freely
> > > > > > roll
> > > > > > > > up
> > > > > > > > > a
> > > > > > > > > > > cluster with all nodes complying the same feature
> gating,
> > > > > without
> > > > > > > > > > worrying
> > > > > > > > > > > about the turnaround time to propagate the message only
> > > after
> > > > > the
> > > > > > > > > cluster
> > > > > > > > > > > starts up?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): This is a great point/question. One of the
> > > > > expectations
> > > > > > > out
> > > > > > > > of
> > > > > > > > > > this KIP, which is
> > > > > > > > > > already followed in the broker, is the following.
> > > > > > > > > >  - Imagine at time T1 the broker starts up and registers
> > it’s
> > > > > > > presence
> > > > > > > > in
> > > > > > > > > > ZK,
> > > > > > > > > >    along with advertising it’s supported features.
> > > > > > > > > >  - Imagine at a future time T2 the broker receives the
> > > > > > > > > > UpdateMetadataRequest
> > > > > > > > > >    from the controller, which contains the latest
> finalized
> > > > > > features
> > > > > > > as
> > > > > > > > > > seen by
> > > > > > > > > >    the controller. The broker validates this data against
> > > it’s
> > > > > > > > supported
> > > > > > > > > > features to
> > > > > > > > > >    make sure there is no mismatch (it will shutdown if
> > there
> > > is
> > > > > an
> > > > > > > > > > incompatibility).
> > > > > > > > > >
> > > > > > > > > > It is expected that during the time between the 2 events
> T1
> > > and
> > > > > T2,
> > > > > > > the
> > > > > > > > > > broker is
> > > > > > > > > > almost a silent entity in the cluster. It does not add
> any
> > > > value
> > > > > to
> > > > > > > the
> > > > > > > > > > cluster, or carry
> > > > > > > > > > out any important broker activities. By “important”, I
> mean
> > > it
> > > > is
> > > > > > not
> > > > > > > > > doing
> > > > > > > > > > mutations
> > > > > > > > > > on its persistence, not mutating critical in-memory
> state,
> > > > won’t
> > > > > > be
> > > > > > > > > > serving
> > > > > > > > > > produce/fetch requests. Note it doesn’t even know its
> > > assigned
> > > > > > > > > partitions
> > > > > > > > > > until
> > > > > > > > > > it receives UpdateMetadataRequest from controller.
> Anything
> > > the
> > > > > > > broker
> > > > > > > > is
> > > > > > > > > > doing up
> > > > > > > > > > until this point is neither damaging nor particularly useful.
> > > > > > > > > >
> > > > > > > > > > I’ve clarified the above in the KIP, see this new
> section:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > > > > > .
> > > > > > > > > >
> > > > > > > > > > > 5. "adding a new Feature, updating or deleting an
> > existing
> > > > > > > Feature",
> > > > > > > > > may
> > > > > > > > > > be
> > > > > > > > > > > I misunderstood something, I thought the features are
> > > defined
> > > > > in
> > > > > > > > broker
> > > > > > > > > > > code, so admin could not really create a new feature?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great point! You understood this right. Here
> > > adding
> > > > a
> > > > > > > > feature
> > > > > > > > > > means we are
> > > > > > > > > > adding a cluster-wide finalized *max* version for a
> feature
> > > > that
> > > > > > was
> > > > > > > > > > previously never finalized.
> > > > > > > > > > I have clarified this in the KIP now.
> > > > > > > > > >
> > > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > > to
> > > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great point! I have modified the KIP adding
> the
> > > > above
> > > > > > (see
> > > > > > > > > > 'Tooling support -> Admin API changes').
> > > > > > > > > >
> > > > > > > > > > > 7. I think we haven't discussed the alternative
> solution
> > to
> > > > > pass
> > > > > > > the
> > > > > > > > > > > feature information through Zookeeper. Is that
> mentioned
> > in
> > > > the
> > > > > > KIP
> > > > > > > > to
> > > > > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Nice question! The broker reads finalized
> > feature
> > > > info
> > > > > > > > stored
> > > > > > > > > in
> > > > > > > > > > ZK,
> > > > > > > > > > only during startup when it does a validation. When
> serving
> > > > > > > > > > `ApiVersionsRequest`, the
> > > > > > > > > > broker does not read this info from ZK directly. I'd
> > imagine
> > > > the
> > > > > > risk
> > > > > > > > is
> > > > > > > > > > that it can increase
> > > > > > > > > > the ZK read QPS which can be a bottleneck for the system.
> > > > Today,
> > > > > in
> > > > > > > > Kafka
> > > > > > > > > > we use the
> > > > > > > > > > controller to fan out ZK updates to brokers and we want
> to
> > > > stick
> > > > > to
> > > > > > > > that
> > > > > > > > > > pattern to avoid
> > > > > > > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > > > > > > > >
> > > > > > > > > > > 8. I was under the impression that user could
> configure a
> > > > range
> > > > > > of
> > > > > > > > > > > supported versions, what's the trade-off for allowing
> > > single
> > > > > > > > finalized
> > > > > > > > > > > version only?
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great question! The finalized version of a
> > feature
> > > > > > > basically
> > > > > > > > > > refers to
> > > > > > > > > > the cluster-wide finalized feature "maximum" version. For
> > > > > example,
> > > > > > if
> > > > > > > > the
> > > > > > > > > > 'group_coordinator' feature
> > > > > > > > > > has the finalized version set to 10, then, it means that
> > > > > > cluster-wide
> > > > > > > > all
> > > > > > > > > > versions up to v10 are
> > > > > > > > > > supported for this feature. However, note that if some
> > > version
> > > > > (ex:
> > > > > > > v0)
> > > > > > > > > > gets deprecated
> > > > > > > > > > for this feature, then we don’t convey that using this
> > scheme
> > > > > (also
> > > > > > > > > > supporting deprecation is a non-goal).
> > > > > > > > > >
> > > > > > > > > > (Kowshik): I’ve now modified the KIP at all points,
> > referring
> > > to
> > > > > > > > finalized
> > > > > > > > > > feature "maximum" versions.
> > > > > > > > > >
> > > > > > > > > > > 9. One minor syntax fix: Note that here the "client"
> here
> > > may
> > > > > be
> > > > > > a
> > > > > > > > > > producer
> > > > > > > > > >
> > > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Kowshik
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > > > > > reluctanthero104@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hey Kowshik,
> > > > > > > > > > >
> > > > > > > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > > > > > > >
> > > > > > > > > > > 1. "When is it safe for the brokers to begin handling
> EOS
> > > > > > traffic"
> > > > > > > > > could
> > > > > > > > > > be
> > > > > > > > > > > converted as "When is it safe for the brokers to start
> > > > serving
> > > > > > new
> > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> > > > earlier
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > context.
> > > > > > > > > > >
> > > > > > > > > > > 2. In the *Explanation *section, the metadata version
> > > number
> > > > > part
> > > > > > > > > seems a
> > > > > > > > > > > bit blurred. Could you point a reference to later
> section
> > > > that
> > > > > we
> > > > > > > > going
> > > > > > > > > > to
> > > > > > > > > > > store it in Zookeeper and update it every time when
> there
> > > is
> > > > a
> > > > > > > > feature
> > > > > > > > > > > change?
> > > > > > > > > > >
> > > > > > > > > > > 3. For the feature downgrade, although it's a Non-goal
> of
> > > the
> > > > > > KIP,
> > > > > > > > for
> > > > > > > > > > > features such as group coordinator semantics, there is
> no
> > > > legal
> > > > > > > > > scenario
> > > > > > > > > > to
> > > > > > > > > > > perform a downgrade at all. So having downgrade door
> open
> > > is
> > > > > > pretty
> > > > > > > > > > > error-prone as human faults happen all the time. I'm
> > > assuming
> > > > > as
> > > > > > > new
> > > > > > > > > > > features are implemented, it's not very hard to add a
> > flag
> > > > > during
> > > > > > > > > feature
> > > > > > > > > > > creation to indicate whether this feature is
> > > "downgradable".
> > > > > > Could
> > > > > > > > you
> > > > > > > > > > > explain a bit more on the extra engineering effort for
> > > > shipping
> > > > > > > this
> > > > > > > > > KIP
> > > > > > > > > > > with downgrade protection in place?
> > > > > > > > > > >
> > > > > > > > > > > 4. "Each broker’s supported dictionary of feature
> > versions
> > > > will
> > > > > > be
> > > > > > > > > > defined
> > > > > > > > > > > in the broker code." So this means in order to
> restrict a
> > > > > certain
> > > > > > > > > > feature,
> > > > > > > > > > > we need to start the broker first and then send a
> feature
> > > > > gating
> > > > > > > > > request
> > > > > > > > > > > immediately, which introduces a time gap and the
> > > > > > intended-to-close
> > > > > > > > > > feature
> > > > > > > > > > > could actually serve request during this phase. Do you
> > > think
> > > > we
> > > > > > > > should
> > > > > > > > > > also
> > > > > > > > > > > support configurations as well so that admin user could
> > > > freely
> > > > > > roll
> > > > > > > > up
> > > > > > > > > a
> > > > > > > > > > > cluster with all nodes complying the same feature
> gating,
> > > > > without
> > > > > > > > > > worrying
> > > > > > > > > > > about the turnaround time to propagate the message only
> > > after
> > > > > the
> > > > > > > > > cluster
> > > > > > > > > > > starts up?
> > > > > > > > > > >
> > > > > > > > > > > 5. "adding a new Feature, updating or deleting an
> > existing
> > > > > > > Feature",
> > > > > > > > > may
> > > > > > > > > > be
> > > > > > > > > > > I misunderstood something, I thought the features are
> > > defined
> > > > > in
> > > > > > > > broker
> > > > > > > > > > > code, so admin could not really create a new feature?
> > > > > > > > > > >
> > > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > > to
> > > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > > > >
> > > > > > > > > > > 7. I think we haven't discussed the alternative
> solution
> > to
> > > > > pass
> > > > > > > the
> > > > > > > > > > > feature information through Zookeeper. Is that
> mentioned
> > in
> > > > the
> > > > > > KIP
> > > > > > > > to
> > > > > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > > > > >
> > > > > > > > > > > 8. I was under the impression that user could
> configure a
> > > > range
> > > > > > of
> > > > > > > > > > > supported versions, what's the trade-off for allowing
> > > single
> > > > > > > > finalized
> > > > > > > > > > > version only?
> > > > > > > > > > >
> > > > > > > > > > > 9. One minor syntax fix: Note that here the "client"
> here
> > > may
> > > > > be
> > > > > > a
> > > > > > > > > > producer
> > > > > > > > > > >
> > > > > > > > > > > Boyang
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> > > > > cmccabe@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam
> wrote:
> > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the feedback! I've changed the KIP to
> > > address
> > > > > your
> > > > > > > > > > > > > suggestions.
> > > > > > > > > > > > > Please find below my explanation. Here is a link to
> > KIP
> > > > > 584:
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > > .
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. '__data_version__' is the version of the
> finalized
> > > > > feature
> > > > > > > > > > metadata
> > > > > > > > > > > > > (i.e. actual ZK node contents), while the
> > > > > > '__schema_version__'
> > > > > > > is
> > > > > > > > > the
> > > > > > > > > > > > > version of the schema of the data persisted in ZK.
> > > These
> > > > > > serve
> > > > > > > > > > > different
> > > > > > > > > > > > purposes. '__data_version__' is useful mainly to
> > > > clients
> > > > > > > > during
> > > > > > > > > > > reads,
> > > > > > > > > > > > > to differentiate between the 2 versions of
> eventually
> > > > > > > consistent
> > > > > > > > > > > > 'finalized
> > > > > > > > > > > > > features' metadata (i.e. larger metadata version is
> > > more
> > > > > > > recent).
> > > > > > > > > > > > > '__schema_version__' provides an additional degree
> of
> > > > > > > > flexibility,
> > > > > > > > > > > where
> > > > > > > > > > > > if
> > > > > > > > > > > > > we decide to change the schema for '/features' node
> > in
> > > ZK
> > > > > (in
> > > > > > > the
> > > > > > > > > > > > future),
> > > > > > > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > > > > > > serialization/deserialization of the ZK data can be
> > > > handled
> > > > > > > > > safely).
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Kowshik,
> > > > > > > > > > > >
> > > > > > > > > > > > If you're talking about a number that lets you know
> if
> > > data
> > > > > is
> > > > > > > more
> > > > > > > > > or
> > > > > > > > > > > > less recent, we would typically call that an epoch,
> and
> > > > not a
> > > > > > > > > version.
> > > > > > > > > > > For
> > > > > > > > > > > > the ZK data structures, the word "version" is
> typically
> > > > > > reserved
> > > > > > > > for
> > > > > > > > > > > > describing changes to the overall schema of the data
> > that
> > > > is
> > > > > > > > written
> > > > > > > > > to
> > > > > > > > > > > > ZooKeeper.  We don't even really change the "version"
> > of
> > > > > those
> > > > > > > > > schemas
> > > > > > > > > > > that
> > > > > > > > > > > > much, since most changes are backwards-compatible.
> But
> > > we
> > > > do
> > > > > > > > include
> > > > > > > > > > > that
> > > > > > > > > > > > version field just in case.
> > > > > > > > > > > >
> > > > > > > > > > > > I don't think we really need an epoch here, though,
> > since
> > > > we
> > > > > > can
> > > > > > > > just
> > > > > > > > > > > look
> > > > > > > > > > > > at the broker epoch.  Whenever the broker registers,
> > its
> > > > > epoch
> > > > > > > will
> > > > > > > > > be
> > > > > > > > > > > > greater than the previous broker epoch.  And the
> newly
> > > > > > registered
> > > > > > > > > data
> > > > > > > > > > > will
> > > > > > > > > > > > take priority.  This will be a lot simpler than
> adding
> > a
> > > > > > separate
> > > > > > > > > epoch
> > > > > > > > > > > > system, I think.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. Regarding admin client needing min and max
> > > > information -
> > > > > > you
> > > > > > > > are
> > > > > > > > > > > > right!
> > > > > > > > > > > > > I've changed the KIP such that the Admin API also
> > > allows
> > > > > the
> > > > > > > user
> > > > > > > > > to
> > > > > > > > > > > read
> > > > > > > > > > > > > 'supported features' from a specific broker. Please
> > > look
> > > > at
> > > > > > the
> > > > > > > > > > section
> > > > > > > > > > > > > "Admin API changes".
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 3. Regarding the use of `long` vs `Long` - it was
> not
> > > > > > > deliberate.
> > > > > > > > > > I've
> > > > > > > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > > > > > > >
> > > > > > > > > > > > Sounds good.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you
> > are
> > > > > right!
> > > > > > > > I've
> > > > > > > > > > > > updated
> > > > > > > > > > > > > the KIP sketching the functionality provided by
> this
> > > > tool,
> > > > > > with
> > > > > > > > > some
> > > > > > > > > > > > > examples. Please look at the section "Tooling
> support
> > > > > > > examples".
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you!
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks, Kowshik.
> > > > > > > > > > > >
> > > > > > > > > > > > cheers,
> > > > > > > > > > > > Colin
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > > > > > > cmccabe@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In the "Schema" section, do we really need both
> > > > > > > > > __schema_version__
> > > > > > > > > > > and
> > > > > > > > > > > > > > __data_version__?  Can we just have a single
> > version
> > > > > field
> > > > > > > > here?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Shouldn't the Admin(Client) function have some
> way
> > to
> > > > get
> > > > > > the
> > > > > > > > min
> > > > > > > > > > and
> > > > > > > > > > > > max
> > > > > > > > > > > > > > information that we're exposing as well?  I guess
> > we
> > > > > could
> > > > > > > have
> > > > > > > > > > min,
> > > > > > > > > > > > max,
> > > > > > > > > > > > > > and current.  Unrelated: is the use of Long
> rather
> > > than
> > > > > > long
> > > > > > > > > > > deliberate
> > > > > > > > > > > > > > here?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It would be good to describe how the command line
> > > tool
> > > > > > > > > > > > > > kafka.admin.FeatureCommand will work.  For
> example
> > > the
> > > > > > flags
> > > > > > > > that
> > > > > > > > > > it
> > > > > > > > > > > > will
> > > > > > > > > > > > > > take and the output that it will generate to
> > STDOUT.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam
> > > wrote:
> > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I've opened KIP-584
> > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > > > > > >
> > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > is intended to provide a versioning scheme for
> > > > > features.
> > > > > > > I'd
> > > > > > > > > like
> > > > > > > > > > > to
> > > > > > > > > > > > use
> > > > > > > > > > > > > > > this thread to discuss the same. I'd appreciate
> > any
> > > > > > > feedback
> > > > > > > > on
> > > > > > > > > > > this.
> > > > > > > > > > > > > > > Here
> > > > > > > > > > > > > > > is a link to KIP-584
> > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > > > >  .
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Jun Rao <ju...@confluent.io>.
Hi, Kowshik,

Thanks for addressing those comments. Just a few more minor comments.

200. The UpdateFeaturesRequest includes an AllowDowngrade field. It seems
that field needs to be persisted somewhere in ZK?

201. UpdateFeaturesResponse has the following top level fields. Should
those fields be per feature?

  "fields": [
    { "name": "ErrorCode", "type": "int16", "versions": "0+",
      "about": "The error code, or 0 if there was no error." },
    { "name": "ErrorMessage", "type": "string", "versions": "0+",
      "about": "The error message, or null if there was no error." }
  ]
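
i.e. something along these lines (a rough sketch to illustrate the question,
field names are just for illustration, not a concrete proposal):

  "fields": [
    { "name": "Results", "type": "[]UpdatableFeatureResult", "versions": "0+",
      "about": "Results for each feature update.", "fields": [
        { "name": "Feature", "type": "string", "versions": "0+",
          "about": "The name of the feature." },
        { "name": "ErrorCode", "type": "int16", "versions": "0+",
          "about": "The error code, or 0 if there was no error." },
        { "name": "ErrorMessage", "type": "string", "versions": "0+",
          "about": "The error message, or null if there was no error." }
    ]}
  ]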

202. The /features path in ZK has a field min_version_level. Which API and
tool can change that value?

Jun

On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hi Jun,
>
> Thanks for the feedback! I have updated the KIP-584 addressing your
> comments.
> Please find my response below.
>
> > 100.6 You can look for the sentence "This operation requires ALTER on
> > CLUSTER." in KIP-455. Also, you can check its usage in
> > KafkaApis.authorize().
>
> (Kowshik): Done. Great point! For the newly introduced UPDATE_FEATURES API,
> I have added a
> requirement that AclOperation.ALTER must be granted on ResourceType.CLUSTER.
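>
> As a rough illustration (the broker-side helper names and the exact signature
> below are assumptions for this sketch, not part of the KIP), the
> UPDATE_FEATURES handler would perform something like:
>
>   // Reject the request unless the principal holds ALTER on the CLUSTER resource.
>   if (!authorize(request.context, AclOperation.ALTER, ResourceType.CLUSTER, Resource.CLUSTER_NAME)) {
>       sendErrorResponse(request, Errors.CLUSTER_AUTHORIZATION_FAILED);
>   }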
>
> > 110. Keeping the feature version as int is probably fine. I just felt
> that
> > for some of the common user interactions, it's more convenient to
> > relate that to a release version. For example, if a user wants to
> downgrade
> > to a release 2.5, it's easier for the user to use the tool like "tool
> > --downgrade 2.5" instead of "tool --downgrade --feature X --version 6".
>
> (Kowshik): Great point. Generally, maximum feature version levels are not
> downgradable after
> they are finalized in the cluster. This is because, as a guideline, bumping a
> feature version level is mainly used to convey important breaking
> changes.
> Despite the above, there may be some extreme/rare cases where a user wants
> to downgrade
> all features to a specific previous release. The user may want to do this
> just
> prior to rolling back a Kafka cluster to a previous release.
>
> To support the above, I have made a change to the KIP explaining that the
> CLI tool is versioned.
> The CLI tool internally has knowledge about a map of features to their
> respective max
> versions supported by the Broker. The tool's knowledge of features and
> their version values
> is limited to the version of the CLI tool itself, i.e. the information is
> packaged into the CLI tool
> when it is released. Whenever a Kafka release introduces a new feature
> version, or modifies
> an existing feature version, the CLI tool shall also be updated with this
> information.
> Newer versions of the CLI tool will be released as part of the Kafka
> releases.
>
> Therefore, to perform such a downgrade, the user just needs to run the
> version of
> the CLI tool that's part of the particular previous release that he/she is
> downgrading to.
> To help the user with this, there is a new command added to the CLI tool
> called `downgrade-all`.
> This essentially downgrades max version levels of all features in the
> cluster to the versions
> known to the CLI tool internally.
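>
> For illustration, an invocation might look roughly like the following (the
> bootstrap flag and host are placeholders here; see the KIP's tooling examples
> for the final syntax):
>
>   $ ./bin/kafka-features.sh --bootstrap-server kafka-host:9092 downgrade-all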
>
> I have explained the above in the KIP under these sections:
>
> Tooling support (have explained that the CLI tool is versioned):
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
>
> Regular CLI tool usage (please refer to point #3, and see the tooling
> example)
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
>
> > 110. Similarly, if the client library finds a feature mismatch with the
> broker,
> > the client likely needs to log some error message for the user to take
> some
> > actions. It's much more actionable if the error message is "upgrade the
> > broker to release version 2.6" than just "upgrade the broker to feature
> > version 7".
>
> (Kowshik): That's a really good point! If we use ints for feature versions,
> the best
> message that the client can print for debugging is "broker doesn't support
> feature version 7", and alongside that print the supported version range
> returned
> by the broker. Then, does it sound reasonable that the user could then
> reference
> Kafka release logs to figure out which version of the broker release is
> required to
> be deployed to support feature version 7? I couldn't think of a better
> strategy here.
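>
> For example (feature name and version numbers below are made up purely for
> illustration), the client could log something like:
>
>   "Feature 'group_coordinator' version 7 is required by this client, but the
>    broker only supports versions [1, 6] for it; a newer broker release is needed."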
>
> > 120. When should a developer bump up the version of a feature?
>
> (Kowshik): Great question! In the KIP, I have added a section: 'Guidelines
> on feature versions and workflows'
> providing some guidelines on when to use the versioned feature flags, and
> what
> the regular workflows with the CLI tool are.
>
> Link to the relevant sections:
> Guidelines:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
>
> Regular CLI tool usage:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
>
> Advanced CLI tool usage:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
>
>
> Cheers,
> Kowshik
>
>
> On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Kowshik,
> >
> > Thanks for the reply. A few more comments.
> >
> > 110. Keeping the feature version as int is probably fine. I just felt
> that
> > for some of the common user interactions, it's more convenient to
> > relate that to a release version. For example, if a user wants to
> downgrade
> > to a release 2.5, it's easier for the user to use the tool like "tool
> > --downgrade 2.5" instead of "tool --downgrade --feature X --version 6".
> > Similarly, if the client library finds a feature mismatch with the
> broker,
> > the client likely needs to log some error message for the user to take
> some
> > actions. It's much more actionable if the error message is "upgrade the
> > broker to release version 2.6" than just "upgrade the broker to feature
> > version 7".
> >
> > 111. Sounds good.
> >
> > 120. When should a developer bump up the version of a feature?
> >
> > Jun
> >
> > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <kprakasam@confluent.io
> >
> > wrote:
> >
> > > Hi Jun,
> > >
> > > I have updated the KIP for the item 111.
> > > I'm in the process of addressing 100.6, and will provide an update
> soon.
> > > I think item 110 is still under discussion given we are now providing a
> > way
> > > to finalize
> > > all features to their latest version levels. In any case, please let us
> > > know
> > > how you feel in response to Colin's comments on this topic.
> > >
> > > > 111. To put this in context, when we had IBP, the default value is
> the
> > > > current released version. So, if you are a brand new user, you don't
> > need
> > > > to configure IBP and all new features will be immediately available
> in
> > > the
> > > > new cluster. If you are upgrading from an old version, you do need to
> > > > understand and configure IBP. I see a similar pattern here for
> > > > features. From the ease of use perspective, ideally, we shouldn't
> > require
> > > a
> > > > new user to have an extra step such as running a bootstrap script
> > unless
> > > > it's truly necessary. If someone has a special need (all the cases
> you
> > > > mentioned seem special cases?), they can configure a mode such that
> > > > features are enabled/disabled manually.
> > >
> > > (Kowshik): That makes sense, thanks for the idea! Sorry if I didn't
> > > understand
> > > this need earlier. I have updated the KIP with the approach that
> whenever
> > > the '/features' node is absent, the controller by default will
> bootstrap
> > > the node
> > > to contain the latest feature levels. Here is the new section in the
> KIP
> > > describing
> > > the same:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> > >
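> > > A rough sketch of that default bootstrap behaviour (illustrative only; the
> > > helper names below are assumptions, not the actual controller code):
> > >
> > >   // On controller election: if '/features' is absent, create it with the
> > >   // latest known max version level of every feature supported by the broker.
> > >   if (!zkClient.pathExists("/features")) {
> > >       zkClient.createFeatureZNode(defaultFinalizedFeatures());
> > >   }
> > >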
> > > Next, as I explained in my response to Colin's suggestions, we are now
> > > providing a `--finalize-latest-features` flag with the tooling. This
> lets
> > > the sysadmin finalize all features known to the controller to their
> > latest
> > > version
> > > levels. Please look at this section (point #3 and the tooling example
> > > later):
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> > >
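> > > As an example (the `--finalize-latest-features` flag is from the KIP; the
> > > other details here are placeholders):
> > >
> > >   $ ./bin/kafka-features.sh --bootstrap-server kafka-host:9092 \
> > >       --finalize-latest-features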
> > >
> > > Do you feel this addresses your comment/concern?
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Kowshik,
> > > >
> > > > Thanks for the reply. A few more replies below.
> > > >
> > > > 100.6 You can look for the sentence "This operation requires ALTER on
> > > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > > KafkaApis.authorize().
> > > >
> > > > 110. From the external client/tooling perspective, it's more natural
> to
> > > use
> > > > the release version for features. If we can use the same release
> > version
> > > > for internal representation, it seems simpler (easier to understand,
> no
> > > > mapping overhead, etc). Is there a benefit with separate external and
> > > > internal versioning schemes?
> > > >
> > > > 111. To put this in context, when we had IBP, the default value is
> the
> > > > current released version. So, if you are a brand new user, you don't
> > need
> > > > to configure IBP and all new features will be immediately available
> in
> > > the
> > > > new cluster. If you are upgrading from an old version, you do need to
> > > > understand and configure IBP. I see a similar pattern here for
> > > > features. From the ease of use perspective, ideally, we shouldn't
> > > require a
> > > > new user to have an extra step such as running a bootstrap script
> > unless
> > > > it's truly necessary. If someone has a special need (all the cases
> you
> > > > mentioned seem special cases?), they can configure a mode such that
> > > > features are enabled/disabled manually.
> > > >
> > > > Jun
> > > >
> > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
> > kprakasam@confluent.io>
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks for the feedback and suggestions. Please find my response
> > below.
> > > > >
> > > > > > 100.6 For every new request, the admin needs to control who is
> > > allowed
> > > > to
> > > > > > issue that request if security is enabled. So, we need to assign
> > the
> > > > new
> > > > > > request a ResourceType and possible AclOperations. See
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > as an example.
> > > > >
> > > > > (Kowshik): I don't see any reference to the words ResourceType or
> > > > > AclOperations
> > > > > in the KIP. Please let me know how I can use the KIP that you
> linked
> > to
> > > > > find out how to
> > > > > set up the appropriate ResourceType and/or ClusterOperation?
> > > > >
> > > > > > 105. If we change delete to disable, it's better to do this
> > > > consistently
> > > > > in
> > > > > > request protocol and admin api as well.
> > > > >
> > > > > (Kowshik): The API shouldn't be called 'disable' when it is
> deleting
> > a
> > > > > feature.
> > > > > I've just changed the KIP to use 'delete'. I don't have a strong
> > > > > preference.
> > > > >
> > > > > > 110. The minVersion/maxVersion for features use int64. Currently,
> > our
> > > > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> > > > possible
> > > > > > for new features to be included in minor releases too. Should we
> > make
> > > > the
> > > > > > feature versioning match the release versioning?
> > > > >
> > > > > (Kowshik): The release version can be mapped to a set of feature
> > > > versions,
> > > > > and this can be done, for example in the tool (or even external to
> > the
> > > > > tool).
> > > > > Can you please clarify what I'm missing?
> > > > >
> > > > > > 111. "During regular operations, the data in the ZK node can be
> > > mutated
> > > > > > only via a specific admin API served only by the controller." I
> am
> > > > > > wondering why can't the controller auto finalize a feature
> version
> > > > after
> > > > > > all brokers are upgraded? For new users who download the latest
> > > version
> > > > > to
> > > > > > build a new cluster, it's inconvenient for them to have to
> manually
> > > > > enable
> > > > > > each feature.
> > > > >
> > > > > (Kowshik): I agree that there is a trade-off here, but it will help
> > > > > to decide whether the automation can be thought through in the
> future
> > > > > in a follow up KIP, or right now in this KIP. We may invest
> > > > > in automation, but we have to decide whether we should do it
> > > > > now or later.
> > > > >
> > > > > For the inconvenience that you mentioned, do you think the problem
> > that
> > > > you
> > > > > mentioned can be overcome by asking the cluster operator to
> run
> > a
> > > > > bootstrap script  when he/she knows that a specific AK release has
> > been
> > > > > almost completely deployed in a cluster for the first time? The idea is
> > > that
> > > > > the
> > > > > bootstrap script will know how to map a specific AK release to
> > > finalized
> > > > > feature versions, and run the `kafka-features.sh` tool
> appropriately
> > > > > against
> > > > > the cluster.
> > > > >
> > > > > Now, coming back to your automation proposal/question.
> > > > > I do see the value of automated feature version finalization, but I
> > > also
> > > > > see
> > > > > that this will open up several questions and some risks, as
> explained
> > > > > below.
> > > > > The answers to these depend on the definition of the automation we
> > > choose
> > > > > to build, and how well it fits into a Kafka deployment.
> > > > > Basically, it can be unsafe for the controller to finalize feature
> > > > version
> > > > > upgrades automatically, without learning about the intent of the
> > > cluster
> > > > > operator.
> > > > > 1. We would sometimes want to lock feature versions only when we
> have
> > > > > externally verified
> > > > > the stability of the broker binary.
> > > > > 2. Sometimes only the cluster operator knows that a cluster upgrade
> > is
> > > > > complete,
> > > > > and new brokers are highly unlikely to join the cluster.
> > > > > 3. Only the cluster operator knows that the intent is to deploy the
> > > same
> > > > > version
> > > > > of the new broker release across the entire cluster (i.e. the
> latest
> > > > > downloaded version).
> > > > > 4. For downgrades, it appears the controller still needs some
> > external
> > > > > input
> > > > > (such as the proposed tool) to finalize a feature version
> downgrade.
> > > > >
> > > > > If we have automation, that automation can end up failing in some
> of
> > > the
> > > > > cases
> > > > > above. Then, we need a way to declare that the cluster is "not
> ready"
> > > if
> > > > > the
> > > > > controller cannot automatically finalize some basic required
> feature
> > > > > version
> > > > > upgrades across the cluster. We need to make the cluster operator
> > aware
> > > > in
> > > > > such a scenario (raise an alert or alike).
> > > > >
> > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> > instead
> > > > of
> > > > > 48.
> > > > >
> > > > > (Kowshik): Done.
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Kowshik,
> > > > > >
> > > > > > Thanks for the reply. A few more comments below.
> > > > > >
> > > > > > 100.6 For every new request, the admin needs to control who is
> > > allowed
> > > > to
> > > > > > issue that request if security is enabled. So, we need to assign
> > the
> > > > new
> > > > > > request a ResourceType and possible AclOperations. See
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > > as
> > > > > > an example.
> > > > > >
> > > > > > 105. If we change delete to disable, it's better to do this
> > > > consistently
> > > > > in
> > > > > > request protocol and admin api as well.
> > > > > >
> > > > > > 110. The minVersion/maxVersion for features use int64. Currently,
> > our
> > > > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> > > > possible
> > > > > > for new features to be included in minor releases too. Should we
> > make
> > > > the
> > > > > > feature versioning match the release versioning?
> > > > > >
> > > > > > 111. "During regular operations, the data in the ZK node can be
> > > mutated
> > > > > > only via a specific admin API served only by the controller." I
> am
> > > > > > wondering why can't the controller auto finalize a feature
> version
> > > > after
> > > > > > all brokers are upgraded? For new users who download the latest
> > > version
> > > > > to
> > > > > > build a new cluster, it's inconvenient for them to have to
> manually
> > > > > enable
> > > > > > each feature.
> > > > > >
> > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> > instead
> > > > of
> > > > > > 48.
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> > > > kprakasam@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Jun,
> > > > > > >
> > > > > > > Thanks a lot for the great feedback! Please note that the
> design
> > > > > > > has changed a little bit on the KIP, and we now propagate the
> > > > finalized
> > > > > > > features metadata only via ZK watches (instead of
> > > > UpdateMetadataRequest
> > > > > > > from the controller).
> > > > > > >
> > > > > > > Please find below my response to your questions/feedback, with
> > the
> > > > > prefix
> > > > > > > "(Kowshik):".
> > > > > > >
> > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > 100.1 Since this request waits for responses from brokers,
> > should
> > > > we
> > > > > > add
> > > > > > > a
> > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > >
> > > > > > > (Kowshik): Great point! Done. I have added a timeout field.
> Note:
> > > we
> > > > no
> > > > > > > longer
> > > > > > > wait for responses from brokers, since the design has been
> > changed
> > > so
> > > > > > that
> > > > > > > the
> > > > > > > features information is propagated via ZK. Nevertheless, it is
> > > right
> > > > to
> > > > > > > have a timeout
> > > > > > > for the request.
> > > > > > >
> > > > > > > > 100.2 The response schema is a bit weird. Typically, the
> > response
> > > > > just
> > > > > > > > shows an error code and an error message, instead of echoing
> > the
> > > > > > request.
> > > > > > >
> > > > > > > (Kowshik): Great point! Yeah, I have modified it to just return
> > an
> > > > > error
> > > > > > > code and a message.
> > > > > > > Previously it was not echoing the "request", rather it was
> > > returning
> > > > > the
> > > > > > > latest set of
> > > > > > > cluster-wide finalized features (after applying the updates).
> But
> > > you
> > > > > are
> > > > > > > right,
> > > > > > > the additional info is not required, so I have removed it from
> > the
> > > > > > response
> > > > > > > schema.
> > > > > > >
> > > > > > > > 100.3 Should we add a separate request to list/describe the
> > > > existing
> > > > > > > > features?
> > > > > > >
> > > > > > > (Kowshik): This is already present in the KIP via the
> > > > > 'DescribeFeatures'
> > > > > > > Admin API,
> > > > > > > which, underneath covers uses the ApiVersionsRequest to
> > > list/describe
> > > > > the
> > > > > > > existing features. Please read the 'Tooling support' section.
> > > > > > >
> > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> > request.
> > > > For
> > > > > > > > DELETE, the version field doesn't make sense. So, I guess the
> > > > broker
> > > > > > just
> > > > > > > > ignores this? An alternative way is to have a separate
> > > > > > > DeleteFeaturesRequest
> > > > > > >
> > > > > > > (Kowshik): Great point! I have modified the KIP now to have 2
> > > > separate
> > > > > > > controller APIs
> > > > > > > serving these different purposes:
> > > > > > > 1. updateFeatures
> > > > > > > 2. deleteFeatures
> > > > > > >
> > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > > > increasing
> > > > > > > > version of the metadata for finalized features." I am
> wondering
> > > why
> > > > > the
> > > > > > > > ordering is important?
> > > > > > >
> > > > > > > (Kowshik): In the latest KIP write-up, it is called epoch
> > (instead
> > > of
> > > > > > > version), and
> > > > > > > it is just the ZK node version. Basically, this is the epoch
> for
> > > the
> > > > > > > cluster-wide
> > > > > > > finalized feature version metadata. This metadata is served to
> > > > clients
> > > > > > via
> > > > > > > the
> > > > > > > ApiVersionsResponse (for reads). We propagate updates from the
> > > > > > '/features'
> > > > > > > ZK node
> > > > > > > to all brokers, via ZK watches setup by each broker on the
> > > > '/features'
> > > > > > > node.
> > > > > > >
> > > > > > > Now here is why the ordering is important:
> > > > > > > ZK watches don't propagate at the same time. As a result, the
> > > > > > > ApiVersionsResponse
> > > > > > > is eventually consistent across brokers. This can introduce
> cases
> > > > > > > where clients see an older lower epoch of the features
> metadata,
> > > > after
> > > > > a
> > > > > > > more recent
> > > > > > > higher epoch was returned at a previous point in time. We
> expect
> > > > > clients
> > > > > > > to always employ the rule that the latest received higher epoch
> > of
> > > > > > metadata
> > > > > > > always trumps an older smaller epoch. Those clients that are
> > > external
> > > > > to
> > > > > > > Kafka should strongly consider discovering the latest metadata
> > once
> > > > > > during
> > > > > > > startup from the brokers, and if required refresh the metadata
> > > > > > periodically
> > > > > > > (to get the latest metadata).
> > > > > > >
> > > > > > > > 100.6 Could you specify the required ACL for this new
> request?
> > > > > > >
> > > > > > > (Kowshik): What is ACL, and how could I find out which one to
> > > > specify?
> > > > > > > Please could you provide me some pointers? I'll be glad to
> update
> > > the
> > > > > > > KIP once I know the next steps.
> > > > > > >
> > > > > > > > 101. For the broker registration ZK node, should we bump up
> the
> > > > > version
> > > > > > > in
> > > > > > > the json?
> > > > > > >
> > > > > > > (Kowshik): Great point! Done. I've increased the version in the
> > > > broker
> > > > > > json
> > > > > > > by 1.
> > > > > > >
> > > > > > > > 102. For the /features ZK node, not sure if we need the epoch
> > > > field.
> > > > > > Each
> > > > > > > > ZK node has an internal version field that is incremented on
> > > every
> > > > > > > update.
> > > > > > >
> > > > > > > (Kowshik): Great point! Done. I'm using the ZK node version
> now,
> > > > > instead
> > > > > > of
> > > > > > > explicitly
> > > > > > > incremented epoch.
> > > > > > >
> > > > > > > > 103. "Enabling the actual semantics of a feature version
> > > > cluster-wide
> > > > > > is
> > > > > > > > left to the discretion of the logic implementing the feature
> > (ex:
> > > > can
> > > > > > be
> > > > > > > > done via dynamic broker config)." Does that mean the broker
> > > > > > registration
> > > > > > > ZK
> > > > > > > > node will be updated dynamically when this happens?
> > > > > > >
> > > > > > > (Kowshik): Not really. The text was just conveying that a
> broker
> > > > could
> > > > > > > "know" of
> > > > > > > a new feature version, but it does not mean the broker should
> > have
> > > > also
> > > > > > > activated the effects of the feature version. Knowing vs
> > activation
> > > > > are 2
> > > > > > > separate things,
> > > > > > > and the latter can be achieved by dynamic config. I have
> reworded
> > > the
> > > > > > text
> > > > > > > to
> > > > > > > make this clear to the reader.
> > > > > > >
> > > > > > >
> > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > 104.1 It would be useful to describe when the feature
> metadata
> > is
> > > > > > > included
> > > > > > > > in the request. My understanding is that it's only included
> if
> > > (1)
> > > > > > there
> > > > > > > is
> > > > > > > > a change to the finalized feature; (2) broker restart; (3)
> > > > controller
> > > > > > > > failover.
> > > > > > > > 104.2 The new fields have the following versions. Why are the
> > > > > versions
> > > > > > 3+
> > > > > > > > when the top version is bumped to 6?
> > > > > > > >       "fields":  [
> > > > > > > >         {"name": "Name", "type":  "string", "versions":
> "3+",
> > > > > > > >           "about": "The name of the feature."},
> > > > > > > >         {"name":  "Version", "type":  "int64", "versions":
> > "3+",
> > > > > > > >           "about": "The finalized version for the feature."}
> > > > > > > >       ]
> > > > > > >
> > > > > > > (Kowshik): With the new improved design, we have completely
> > > > eliminated
> > > > > > the
> > > > > > > need to
> > > > > > > use UpdateMetadataRequest. This is because we now rely on ZK to
> > > > deliver
> > > > > > the
> > > > > > > notifications for changes to the '/features' ZK node.
> > > > > > >
> > > > > > > > 105. kafka-features.sh: Instead of using update/delete,
> perhaps
> > > > it's
> > > > > > > better
> > > > > > > > to use enable/disable?
> > > > > > >
> > > > > > > (Kowshik): For delete, yes, I have changed it so that we
> instead
> > > call
> > > > > it
> > > > > > > 'disable'.
> > > > > > > However for 'update', it can now also refer to either an
> upgrade
> > > or a
> > > > > > > forced downgrade.
> > > > > > > Therefore, I have left it the way it is, just calling it as
> just
> > > > > > 'update'.
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io>
> > wrote:
> > > > > > >
> > > > > > > > Hi, Kowshik,
> > > > > > > >
> > > > > > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > > > > > >
> > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > > 100.1 Since this request waits for responses from brokers,
> > should
> > > > we
> > > > > > add
> > > > > > > a
> > > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > > 100.2 The response schema is a bit weird. Typically, the
> > response
> > > > > just
> > > > > > > > shows an error code and an error message, instead of echoing
> > the
> > > > > > request.
> > > > > > > > 100.3 Should we add a separate request to list/describe the
> > > > existing
> > > > > > > > features?
> > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> > request.
> > > > For
> > > > > > > > DELETE, the version field doesn't make sense. So, I guess the
> > > > broker
> > > > > > just
> > > > > > > > ignores this? An alternative way is to have a separate
> > > > > > > > DeleteFeaturesRequest
> > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > > > increasing
> > > > > > > > version of the metadata for finalized features." I am
> wondering
> > > why
> > > > > the
> > > > > > > > ordering is important?
> > > > > > > > 100.6 Could you specify the required ACL for this new
> request?
> > > > > > > >
> > > > > > > > 101. For the broker registration ZK node, should we bump up
> the
> > > > > version
> > > > > > > in
> > > > > > > > the json?
> > > > > > > >
> > > > > > > > 102. For the /features ZK node, not sure if we need the epoch
> > > > field.
> > > > > > Each
> > > > > > > > ZK node has an internal version field that is incremented on
> > > every
> > > > > > > update.
> > > > > > > >
> > > > > > > > 103. "Enabling the actual semantics of a feature version
> > > > cluster-wide
> > > > > > is
> > > > > > > > left to the discretion of the logic implementing the feature
> > (ex:
> > > > can
> > > > > > be
> > > > > > > > done via dynamic broker config)." Does that mean the broker
> > > > > > registration
> > > > > > > ZK
> > > > > > > > node will be updated dynamically when this happens?
> > > > > > > >
> > > > > > > > 104. UpdateMetadataRequest
> > > > > > > > 104.1 It would be useful to describe when the feature
> metadata
> > is
> > > > > > > included
> > > > > > > > in the request. My understanding is that it's only included
> if
> > > (1)
> > > > > > there
> > > > > > > is
> > > > > > > > a change to the finalized feature; (2) broker restart; (3)
> > > > controller
> > > > > > > > failover.
> > > > > > > > 104.2 The new fields have the following versions. Why are the
> > > > > versions
> > > > > > 3+
> > > > > > > > when the top version is bumped to 6?
> > > > > > > >       "fields":  [
> > > > > > > >         {"name": "Name", "type":  "string", "versions":
> "3+",
> > > > > > > >           "about": "The name of the feature."},
> > > > > > > >         {"name":  "Version", "type":  "int64", "versions":
> > "3+",
> > > > > > > >           "about": "The finalized version for the feature."}
> > > > > > > >       ]
> > > > > > > >
> > > > > > > > 105. kafka-features.sh: Instead of using update/delete,
> perhaps
> > > > it's
> > > > > > > better
> > > > > > > > to use enable/disable?
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > > > > kprakasam@confluent.io
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey Boyang,
> > > > > > > > >
> > > > > > > > > Thanks for the great feedback! I have updated the KIP based
> > on
> > > > your
> > > > > > > > > feedback.
> > > > > > > > > Please find my response below for your comments, look for
> > > > sentences
> > > > > > > > > starting
> > > > > > > > > with "(Kowshik)" below.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > > > > traffic"
> > > > > > > > could
> > > > > > > > > be
> > > > > > > > > > converted as "When is it safe for the brokers to start
> > > serving
> > > > > new
> > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> > > earlier
> > > > > in
> > > > > > > the
> > > > > > > > > > context.
> > > > > > > > >
> > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > >
> > > > > > > > > > 2. In the *Explanation *section, the metadata version
> > number
> > > > part
> > > > > > > > seems a
> > > > > > > > > > bit blurred. Could you point a reference to later section
> > > that
> > > > we
> > > > > > > going
> > > > > > > > > to
> > > > > > > > > > store it in Zookeeper and update it every time when there
> > is
> > > a
> > > > > > > feature
> > > > > > > > > > change?
> > > > > > > > >
> > > > > > > > > (Kowshik): Great point! Done. I've added a reference in the
> > > KIP.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > 3. For the feature downgrade, although it's a Non-goal of
> > the
> > > > > KIP,
> > > > > > > for
> > > > > > > > > > features such as group coordinator semantics, there is no
> > > legal
> > > > > > > > scenario
> > > > > > > > > to
> > > > > > > > > > perform a downgrade at all. So having downgrade door open
> > is
> > > > > pretty
> > > > > > > > > > error-prone as human faults happen all the time. I'm
> > assuming
> > > > as
> > > > > > new
> > > > > > > > > > features are implemented, it's not very hard to add a
> flag
> > > > during
> > > > > > > > feature
> > > > > > > > > > creation to indicate whether this feature is
> > "downgradable".
> > > > > Could
> > > > > > > you
> > > > > > > > > > explain a bit more on the extra engineering effort for
> > > shipping
> > > > > > this
> > > > > > > > KIP
> > > > > > > > > > with downgrade protection in place?
> > > > > > > > >
> > > > > > > > > (Kowshik): Great point! I'd agree and disagree here. While
> I
> > > > agree
> > > > > > that
> > > > > > > > > accidental
> > > > > > > > > downgrades can cause problems, I also think sometimes
> > > downgrades
> > > > > > should
> > > > > > > > > be allowed for emergency reasons (not all downgrades cause
> > > > issues).
> > > > > > > > > It is just subjective to the feature being downgraded.
> > > > > > > > >
> > > > > > > > > To be more strict about feature version downgrades, I have
> > > > modified
> > > > > > the
> > > > > > > > KIP
> > > > > > > > > proposing that we mandate a `--force-downgrade` flag be
> used
> > in
> > > > the
> > > > > > > > > UPDATE_FEATURES api
> > > > > > > > > and the tooling, whenever the human is downgrading a
> > finalized
> > > > > > feature
> > > > > > > > > version.
> > > > > > > > > Hopefully this should cover the requirement, until we find
> > the
> > > > need
> > > > > > for
> > > > > > > > > advanced downgrade support.
> > > > > > > > >
> > > > > > > > > > 4. "Each broker’s supported dictionary of feature
> versions
> > > will
> > > > > be
> > > > > > > > > defined
> > > > > > > > > > in the broker code." So this means in order to restrict a
> > > > certain
> > > > > > > > > feature,
> > > > > > > > > > we need to start the broker first and then send a feature
> > > > gating
> > > > > > > > request
> > > > > > > > > > immediately, which introduces a time gap and the
> > > > > intended-to-close
> > > > > > > > > feature
> > > > > > > > > > could actually serve request during this phase. Do you
> > think
> > > we
> > > > > > > should
> > > > > > > > > also
> > > > > > > > > > support configurations as well so that admin user could
> > > freely
> > > > > roll
> > > > > > > up
> > > > > > > > a
> > > > > > > > > > cluster with all nodes complying the same feature gating,
> > > > without
> > > > > > > > > worrying
> > > > > > > > > > about the turnaround time to propagate the message only
> > after
> > > > the
> > > > > > > > cluster
> > > > > > > > > > starts up?
> > > > > > > > >
> > > > > > > > > (Kowshik): This is a great point/question. One of the
> > > > expectations
> > > > > > out
> > > > > > > of
> > > > > > > > > this KIP, which is
> > > > > > > > > already followed in the broker, is the following.
> > > > > > > > >  - Imagine at time T1 the broker starts up and registers
> it’s
> > > > > > presence
> > > > > > > in
> > > > > > > > > ZK,
> > > > > > > > >    along with advertising it’s supported features.
> > > > > > > > >  - Imagine at a future time T2 the broker receives the
> > > > > > > > > UpdateMetadataRequest
> > > > > > > > >    from the controller, which contains the latest finalized
> > > > > features
> > > > > > as
> > > > > > > > > seen by
> > > > > > > > >    the controller. The broker validates this data against
> > it’s
> > > > > > > supported
> > > > > > > > > features to
> > > > > > > > >    make sure there is no mismatch (it will shutdown if
> there
> > is
> > > > an
> > > > > > > > > incompatibility).
> > > > > > > > >
> > > > > > > > > It is expected that during the time between the 2 events T1
> > and
> > > > T2,
> > > > > > the
> > > > > > > > > broker is
> > > > > > > > > almost a silent entity in the cluster. It does not add any
> > > value
> > > > to
> > > > > > the
> > > > > > > > > cluster, or carry
> > > > > > > > > out any important broker activities. By “important”, I mean
> > it
> > > is
> > > > > not
> > > > > > > > doing
> > > > > > > > > mutations
> > > > > > > > > on it’s persistence, not mutating critical in-memory state,
> > > won’t
> > > > > be
> > > > > > > > > serving
> > > > > > > > > produce/fetch requests. Note it doesn’t even know it’s
> > assigned
> > > > > > > > partitions
> > > > > > > > > until
> > > > > > > > > it receives UpdateMetadataRequest from controller. Anything
> > the
> > > > > > broker
> > > > > > > is
> > > > > > > > > doing up
> > > > > > > > > until this point is not damaging/useful.
> > > > > > > > >
> > > > > > > > > I’ve clarified the above in the KIP, see this new section:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > > > > .
> > > > > > > > >
> > > > > > > > > > 5. "adding a new Feature, updating or deleting an
> existing
> > > > > > Feature",
> > > > > > > > may
> > > > > > > > > be
> > > > > > > > > > I misunderstood something, I thought the features are
> > defined
> > > > in
> > > > > > > broker
> > > > > > > > > > code, so admin could not really create a new feature?
> > > > > > > > >
> > > > > > > > > (Kowshik): Great point! You understood this right. Here
> > adding
> > > a
> > > > > > > feature
> > > > > > > > > means we are
> > > > > > > > > adding a cluster-wide finalized *max* version for a feature
> > > that
> > > > > was
> > > > > > > > > previously never finalized.
> > > > > > > > > I have clarified this in the KIP now.
> > > > > > > > >
> > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > to
> > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > >
> > > > > > > > > (Kowshik): Great point! I have modified the KIP adding the
> > > above
> > > > > (see
> > > > > > > > > 'Tooling support -> Admin API changes').
> > > > > > > > >
> > > > > > > > > > 7. I think we haven't discussed the alternative solution
> to
> > > > pass
> > > > > > the
> > > > > > > > > > feature information through Zookeeper. Is that mentioned
> in
> > > the
> > > > > KIP
> > > > > > > to
> > > > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > > >
> > > > > > > > > (Kowshik): Nice question! The broker reads finalized
> feature
> > > info
> > > > > > > stored
> > > > > > > > in
> > > > > > > > > ZK,
> > > > > > > > > only during startup when it does a validation. When serving
> > > > > > > > > `ApiVersionsRequest`, the
> > > > > > > > > broker does not read this info from ZK directly. I'd
> imagine
> > > the
> > > > > risk
> > > > > > > is
> > > > > > > > > that it can increase
> > > > > > > > > the ZK read QPS which can be a bottleneck for the system.
> > > Today,
> > > > in
> > > > > > > Kafka
> > > > > > > > > we use the
> > > > > > > > > controller to fan out ZK updates to brokers and we want to
> > > stick
> > > > to
> > > > > > > that
> > > > > > > > > pattern to avoid
> > > > > > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > > > > > > >
> > > > > > > > > > 8. I was under the impression that user could configure a
> > > range
> > > > > of
> > > > > > > > > > supported versions, what's the trade-off for allowing
> > single
> > > > > > > finalized
> > > > > > > > > > version only?
> > > > > > > > >
> > > > > > > > > (Kowshik): Great question! The finalized version of a
> feature
> > > > > > basically
> > > > > > > > > refers to
> > > > > > > > > the cluster-wide finalized feature "maximum" version. For
> > > > example,
> > > > > if
> > > > > > > the
> > > > > > > > > 'group_coordinator' feature
> > > > > > > > > has the finalized version set to 10, then, it means that
> > > > > cluster-wide
> > > > > > > all
> > > > > > > > > versions upto v10 are
> > > > > > > > > supported for this feature. However, note that if some
> > version
> > > > (ex:
> > > > > > v0)
> > > > > > > > > gets deprecated
> > > > > > > > > for this feature, then we don’t convey that using this
> scheme
> > > > (also
> > > > > > > > > supporting deprecation is a non-goal).
> > > > > > > > >
> > > > > > > > > (Kowshik): I’ve now modified the KIP at all points,
> refering
> > to
> > > > > > > finalized
> > > > > > > > > feature "maximum" versions.
> > > > > > > > >
> > > > > > > > > > 9. One minor syntax fix: Note that here the "client" here
> > may
> > > > be
> > > > > a
> > > > > > > > > producer
> > > > > > > > >
> > > > > > > > > (Kowshik): Great point! Done.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Kowshik
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > > > > reluctanthero104@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey Kowshik,
> > > > > > > > > >
> > > > > > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > > > > > >
> > > > > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > > > > traffic"
> > > > > > > > could
> > > > > > > > > be
> > > > > > > > > > converted as "When is it safe for the brokers to start
> > > serving
> > > > > new
> > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> > > earlier
> > > > > in
> > > > > > > the
> > > > > > > > > > context.
> > > > > > > > > >
> > > > > > > > > > 2. In the *Explanation *section, the metadata version
> > number
> > > > part
> > > > > > > > seems a
> > > > > > > > > > bit blurred. Could you point a reference to later section
> > > that
> > > > we
> > > > > > > going
> > > > > > > > > to
> > > > > > > > > > store it in Zookeeper and update it every time when there
> > is
> > > a
> > > > > > > feature
> > > > > > > > > > change?
> > > > > > > > > >
> > > > > > > > > > 3. For the feature downgrade, although it's a Non-goal of
> > the
> > > > > KIP,
> > > > > > > for
> > > > > > > > > > features such as group coordinator semantics, there is no
> > > legal
> > > > > > > > scenario
> > > > > > > > > to
> > > > > > > > > > perform a downgrade at all. So having downgrade door open
> > is
> > > > > pretty
> > > > > > > > > > error-prone as human faults happen all the time. I'm
> > assuming
> > > > as
> > > > > > new
> > > > > > > > > > features are implemented, it's not very hard to add a
> flag
> > > > during
> > > > > > > > feature
> > > > > > > > > > creation to indicate whether this feature is
> > "downgradable".
> > > > > Could
> > > > > > > you
> > > > > > > > > > explain a bit more on the extra engineering effort for
> > > shipping
> > > > > > this
> > > > > > > > KIP
> > > > > > > > > > with downgrade protection in place?
> > > > > > > > > >
> > > > > > > > > > 4. "Each broker’s supported dictionary of feature
> versions
> > > will
> > > > > be
> > > > > > > > > defined
> > > > > > > > > > in the broker code." So this means in order to restrict a
> > > > certain
> > > > > > > > > feature,
> > > > > > > > > > we need to start the broker first and then send a feature
> > > > gating
> > > > > > > > request
> > > > > > > > > > immediately, which introduces a time gap and the
> > > > > intended-to-close
> > > > > > > > > feature
> > > > > > > > > > could actually serve request during this phase. Do you
> > think
> > > we
> > > > > > > should
> > > > > > > > > also
> > > > > > > > > > support configurations as well so that admin user could
> > > freely
> > > > > roll
> > > > > > > up
> > > > > > > > a
> > > > > > > > > > cluster with all nodes complying the same feature gating,
> > > > without
> > > > > > > > > worrying
> > > > > > > > > > about the turnaround time to propagate the message only
> > after
> > > > the
> > > > > > > > cluster
> > > > > > > > > > starts up?
> > > > > > > > > >
> > > > > > > > > > 5. "adding a new Feature, updating or deleting an
> existing
> > > > > > Feature",
> > > > > > > > may
> > > > > > > > > be
> > > > > > > > > > I misunderstood something, I thought the features are
> > defined
> > > > in
> > > > > > > broker
> > > > > > > > > > code, so admin could not really create a new feature?
> > > > > > > > > >
> > > > > > > > > > 6. I think we need a separate error code like
> > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > > to
> > > > > > > > > > reject a concurrent feature update request.
> > > > > > > > > >
> > > > > > > > > > 7. I think we haven't discussed the alternative solution
> to
> > > > pass
> > > > > > the
> > > > > > > > > > feature information through Zookeeper. Is that mentioned
> in
> > > the
> > > > > KIP
> > > > > > > to
> > > > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > > > >
> > > > > > > > > > 8. I was under the impression that user could configure a
> > > range
> > > > > of
> > > > > > > > > > supported versions, what's the trade-off for allowing
> > single
> > > > > > > finalized
> > > > > > > > > > version only?
> > > > > > > > > >
> > > > > > > > > > 9. One minor syntax fix: Note that here the "client" here
> > may
> > > > be
> > > > > a
> > > > > > > > > producer
> > > > > > > > > >
> > > > > > > > > > Boyang
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> > > > cmccabe@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the feedback! I've changed the KIP to
> > address
> > > > your
> > > > > > > > > > > > suggestions.
> > > > > > > > > > > > Please find below my explanation. Here is a link to
> KIP
> > > > 584:
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > .
> > > > > > > > > > > >
> > > > > > > > > > > > 1. '__data_version__' is the version of the finalized
> > > > feature
> > > > > > > > > metadata
> > > > > > > > > > > > (i.e. actual ZK node contents), while the
> > > > > '__schema_version__'
> > > > > > is
> > > > > > > > the
> > > > > > > > > > > > version of the schema of the data persisted in ZK.
> > These
> > > > > serve
> > > > > > > > > > different
> > > > > > > > > > > > purposes. '__data_version__' is is useful mainly to
> > > clients
> > > > > > > during
> > > > > > > > > > reads,
> > > > > > > > > > > > to differentiate between the 2 versions of eventually
> > > > > > consistent
> > > > > > > > > > > 'finalized
> > > > > > > > > > > > features' metadata (i.e. larger metadata version is
> > more
> > > > > > recent).
> > > > > > > > > > > > '__schema_version__' provides an additional degree of
> > > > > > > flexibility,
> > > > > > > > > > where
> > > > > > > > > > > if
> > > > > > > > > > > > we decide to change the schema for '/features' node
> in
> > ZK
> > > > (in
> > > > > > the
> > > > > > > > > > > future),
> > > > > > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > > > > > serialization/deserialization of the ZK data can be
> > > handled
> > > > > > > > safely).
> > > > > > > > > > >
> > > > > > > > > > > Hi Kowshik,
> > > > > > > > > > >
> > > > > > > > > > > If you're talking about a number that lets you know if
> > data
> > > > is
> > > > > > more
> > > > > > > > or
> > > > > > > > > > > less recent, we would typically call that an epoch, and
> > > not a
> > > > > > > > version.
> > > > > > > > > > For
> > > > > > > > > > > the ZK data structures, the word "version" is typically
> > > > > reserved
> > > > > > > for
> > > > > > > > > > > describing changes to the overall schema of the data
> that
> > > is
> > > > > > > written
> > > > > > > > to
> > > > > > > > > > > ZooKeeper.  We don't even really change the "version"
> of
> > > > those
> > > > > > > > schemas
> > > > > > > > > > that
> > > > > > > > > > > much, since most changes are backwards-compatible.  But
> > we
> > > do
> > > > > > > include
> > > > > > > > > > that
> > > > > > > > > > > version field just in case.
> > > > > > > > > > >
> > > > > > > > > > > I don't think we really need an epoch here, though,
> since
> > > we
> > > > > can
> > > > > > > just
> > > > > > > > > > look
> > > > > > > > > > > at the broker epoch.  Whenever the broker registers,
> its
> > > > epoch
> > > > > > will
> > > > > > > > be
> > > > > > > > > > > greater than the previous broker epoch.  And the newly
> > > > > registered
> > > > > > > > data
> > > > > > > > > > will
> > > > > > > > > > > take priority.  This will be a lot simpler than adding
> a
> > > > > separate
> > > > > > > > epoch
> > > > > > > > > > > system, I think.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 2. Regarding admin client needing min and max
> > > information -
> > > > > you
> > > > > > > are
> > > > > > > > > > > right!
> > > > > > > > > > > > I've changed the KIP such that the Admin API also
> > allows
> > > > the
> > > > > > user
> > > > > > > > to
> > > > > > > > > > read
> > > > > > > > > > > > 'supported features' from a specific broker. Please
> > look
> > > at
> > > > > the
> > > > > > > > > section
> > > > > > > > > > > > "Admin API changes".
> > > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > > > > > deliberate.
> > > > > > > > > I've
> > > > > > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > > > > > >
> > > > > > > > > > > Sounds good.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you
> are
> > > > right!
> > > > > > > I've
> > > > > > > > > > > updated
> > > > > > > > > > > > the KIP sketching the functionality provided by this
> > > tool,
> > > > > with
> > > > > > > > some
> > > > > > > > > > > > examples. Please look at the section "Tooling support
> > > > > > examples".
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you!
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks, Kowshik.
> > > > > > > > > > >
> > > > > > > > > > > cheers,
> > > > > > > > > > > Colin
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Kowshik
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > > > > > cmccabe@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > > > > >
> > > > > > > > > > > > > In the "Schema" section, do we really need both
> > > > > > > > __schema_version__
> > > > > > > > > > and
> > > > > > > > > > > > > __data_version__?  Can we just have a single
> version
> > > > field
> > > > > > > here?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Shouldn't the Admin(Client) function have some way
> to
> > > get
> > > > > the
> > > > > > > min
> > > > > > > > > and
> > > > > > > > > > > max
> > > > > > > > > > > > > information that we're exposing as well?  I guess
> we
> > > > could
> > > > > > have
> > > > > > > > > min,
> > > > > > > > > > > max,
> > > > > > > > > > > > > and current.  Unrelated: is the use of Long rather
> > than
> > > > > long
> > > > > > > > > > deliberate
> > > > > > > > > > > > > here?
> > > > > > > > > > > > >
> > > > > > > > > > > > > It would be good to describe how the command line
> > tool
> > > > > > > > > > > > > kafka.admin.FeatureCommand will work.  For example
> > the
> > > > > flags
> > > > > > > that
> > > > > > > > > it
> > > > > > > > > > > will
> > > > > > > > > > > > > take and the output that it will generate to
> STDOUT.
> > > > > > > > > > > > >
> > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > Colin
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam
> > wrote:
> > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I've opened KIP-584
> > > > > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > > > > >
> > > > > > > > > > > > > > which
> > > > > > > > > > > > > > is intended to provide a versioning scheme for
> > > > features.
> > > > > > I'd
> > > > > > > > like
> > > > > > > > > > to
> > > > > > > > > > > use
> > > > > > > > > > > > > > this thread to discuss the same. I'd appreciate
> any
> > > > > > feedback
> > > > > > > on
> > > > > > > > > > this.
> > > > > > > > > > > > > > Here
> > > > > > > > > > > > > > is a link to KIP-584
> > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > > >  .
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Jun,

Thanks for the feedback! I have updated the KIP-584 addressing your
comments.
Please find my response below.

> 100.6 You can look for the sentence "This operation requires ALTER on
> CLUSTER." in KIP-455. Also, you can check its usage in
> KafkaApis.authorize().

(Kowshik): Great point, done! For the newly introduced UPDATE_FEATURES api,
I have added the requirement that AclOperation.ALTER on ResourceType.CLUSTER
is needed to issue the request.
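
To make the rule concrete, here is a minimal sketch of what the check amounts
to. Only AclOperation.ALTER and ResourceType.CLUSTER come from the KIP; the
ClusterAuthorizer interface and the class around it are hypothetical names
used purely for illustration, not the actual broker code:

import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.resource.ResourceType;

public class UpdateFeaturesAuthSketch {
    // Hypothetical stand-in for the broker's authorizer lookup.
    interface ClusterAuthorizer {
        boolean authorize(ResourceType resourceType, AclOperation operation);
    }

    // An UPDATE_FEATURES request is permitted only with ALTER on CLUSTER.
    static boolean canUpdateFeatures(ClusterAuthorizer authorizer) {
        return authorizer.authorize(ResourceType.CLUSTER, AclOperation.ALTER);
    }

    public static void main(String[] args) {
        ClusterAuthorizer allowAll = (resourceType, operation) -> true;
        System.out.println("authorized = " + canUpdateFeatures(allowAll));
    }
}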

> 110. Keeping the feature version as int is probably fine. I just felt that
> for some of the common user interactions, it's more convenient to
> relate that to a release version. For example, if a user wants to
downgrade
> to a release 2.5, it's easier for the user to use the tool like "tool
> --downgrade 2.5" instead of "tool --downgrade --feature X --version 6".

(Kowshik): Great point. Generally, maximum feature version levels are not
downgradable after they are finalized in the cluster. This is because, as a
guideline, bumping a feature version level is mainly used to convey important
breaking changes.
Despite the above, there may be some extreme/rare cases where a user wants
to downgrade
all features to a specific previous release. The user may want to do this
just
prior to rolling back a Kafka cluster to a previous release.

To support the above, I have made a change to the KIP explaining that the
CLI tool is versioned. The CLI tool internally knows a map of features to
their respective max versions supported by the broker. The tool's knowledge
of features and their version values is limited to the version of the CLI
tool itself, i.e. the information is packaged into the CLI tool when it is
released. Whenever a Kafka release introduces a new feature version, or
modifies an existing feature version, the CLI tool shall also be updated
with this information. Newer versions of the CLI tool will be released as
part of the Kafka releases.

Therefore, to achieve the downgrade, the user just needs to run the version
of the CLI tool that's part of the particular previous release that he/she
is downgrading to. To help the user with this, a new command called
`downgrade-all` has been added to the CLI tool. This essentially downgrades
max version levels of all features in the cluster to the versions known to
the CLI tool internally.
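
As a rough illustration of that packaging (the feature names and version
values below are made up, and the class is not the actual tool code):

import java.util.Map;

public class VersionedFeatureToolSketch {
    // Hypothetical feature -> max version map baked into this release of the tool.
    static final Map<String, Long> KNOWN_MAX_VERSIONS = Map.of(
        "group_coordinator", 1L,
        "transaction_coordinator", 2L);

    public static void main(String[] args) {
        // `downgrade-all` would finalize every feature at the max version level
        // known to this particular release of the CLI tool.
        KNOWN_MAX_VERSIONS.forEach((feature, maxVersion) ->
            System.out.printf("downgrade %s to max version level %d%n",
                feature, maxVersion));
    }
}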

I have explained the above in the KIP under these sections:

Tooling support (have explained that the CLI tool is versioned):
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport

Regular CLI tool usage (please refer to point #3, and see the tooling
example)
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage

> 110. Similarly, if the client library finds a feature mismatch with the
broker,
> the client likely needs to log some error message for the user to take
some
> actions. It's much more actionable if the error message is "upgrade the
> broker to release version 2.6" than just "upgrade the broker to feature
> version 7".

(Kowshik): That's a really good point! If we use ints for feature versions,
the best message the client can print for debugging is "broker doesn't
support feature version 7", and alongside that print the supported version
range returned by the broker. Then, does it sound reasonable that the user
could reference the Kafka release logs to figure out which version of the
broker release is required to be deployed, to support feature version 7?
I couldn't think of a better strategy here.
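
As an illustration of the kind of client-side check and message being
discussed (the method and parameter names are assumptions, and the numbers
are made up):

public class FeatureMismatchSketch {
    // Logs an actionable error when the version the client needs falls outside
    // the broker-advertised supported range for a feature.
    static void checkSupported(String feature, long requiredVersion,
                               long brokerMinVersion, long brokerMaxVersion) {
        if (requiredVersion < brokerMinVersion || requiredVersion > brokerMaxVersion) {
            System.err.printf(
                "Broker does not support feature '%s' version %d " +
                "(broker-supported range: [%d, %d]).%n",
                feature, requiredVersion, brokerMinVersion, brokerMaxVersion);
        }
    }

    public static void main(String[] args) {
        checkSupported("group_coordinator", 7, 1, 6);
    }
}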

> 120. When should a developer bump up the version of a feature?

(Kowshik): Great question! In the KIP, I have added a section, 'Guidelines on
feature versions and workflows', providing some guidelines on when to use the
versioned feature flags and what the regular workflows with the CLI tool are.

Link to the relevant sections:
Guidelines:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows

Regular CLI tool usage:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage

Advanced CLI tool usage:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage


Cheers,
Kowshik


On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Kowshik,
>
> Thanks for the reply. A few more comments.
>
> 110. Keeping the feature version as int is probably fine. I just felt that
> for some of the common user interactions, it's more convenient to
> relate that to a release version. For example, if a user wants to downgrade
> to a release 2.5, it's easier for the user to use the tool like "tool
> --downgrade 2.5" instead of "tool --downgrade --feature X --version 6".
> Similarly, if the client library finds a feature mismatch with the broker,
> the client likely needs to log some error message for the user to take some
> actions. It's much more actionable if the error message is "upgrade the
> broker to release version 2.6" than just "upgrade the broker to feature
> version 7".
>
> 111. Sounds good.
>
> 120. When should a developer bump up the version of a feature?
>
> Jun
>
> On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hi Jun,
> >
> > I have updated the KIP for the item 111.
> > I'm in the process of addressing 100.6, and will provide an update soon.
> > I think item 110 is still under discussion given we are now providing a
> way
> > to finalize
> > all features to their latest version levels. In any case, please let us
> > know
> > how you feel in response to Colin's comments on this topic.
> >
> > > 111. To put this in context, when we had IBP, the default value is the
> > > current released version. So, if you are a brand new user, you don't
> need
> > > to configure IBP and all new features will be immediately available in
> > the
> > > new cluster. If you are upgrading from an old version, you do need to
> > > understand and configure IBP. I see a similar pattern here for
> > > features. From the ease of use perspective, ideally, we shouldn't
> require
> > a
> > > new user to have an extra step such as running a bootstrap script
> unless
> > > it's truly necessary. If someone has a special need (all the cases you
> > > mentioned seem special cases?), they can configure a mode such that
> > > features are enabled/disabled manually.
> >
> > (Kowshik): That makes sense, thanks for the idea! Sorry if I didn't
> > understand
> > this need earlier. I have updated the KIP with the approach that whenever
> > the '/features' node is absent, the controller by default will bootstrap
> > the node
> > to contain the latest feature levels. Here is the new section in the KIP
> > describing
> > the same:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> >
> > Next, as I explained in my response to Colin's suggestions, we are now
> > providing a `--finalize-latest-features` flag with the tooling. This lets
> > the sysadmin finalize all features known to the controller to their
> latest
> > version
> > levels. Please look at this section (point #3 and the tooling example
> > later):
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> >
> >
> > Do you feel this addresses your comment/concern?
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Kowshik,
> > >
> > > Thanks for the reply. A few more replies below.
> > >
> > > 100.6 You can look for the sentence "This operation requires ALTER on
> > > CLUSTER." in KIP-455. Also, you can check its usage in
> > > KafkaApis.authorize().
> > >
> > > 110. From the external client/tooling perspective, it's more natural to
> > use
> > > the release version for features. If we can use the same release
> version
> > > for internal representation, it seems simpler (easier to understand, no
> > > mapping overhead, etc). Is there a benefit with separate external and
> > > internal versioning schemes?
> > >
> > > 111. To put this in context, when we had IBP, the default value is the
> > > current released version. So, if you are a brand new user, you don't
> need
> > > to configure IBP and all new features will be immediately available in
> > the
> > > new cluster. If you are upgrading from an old version, you do need to
> > > understand and configure IBP. I see a similar pattern here for
> > > features. From the ease of use perspective, ideally, we shouldn't
> > require a
> > > new user to have an extra step such as running a bootstrap script
> unless
> > > it's truly necessary. If someone has a special need (all the cases you
> > > mentioned seem special cases?), they can configure a mode such that
> > > features are enabled/disabled manually.
> > >
> > > Jun
> > >
> > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
> kprakasam@confluent.io>
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the feedback and suggestions. Please find my response
> below.
> > > >
> > > > > 100.6 For every new request, the admin needs to control who is
> > allowed
> > > to
> > > > > issue that request if security is enabled. So, we need to assign
> the
> > > new
> > > > > request a ResourceType and possible AclOperations. See
> > > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > as an example.
> > > >
> > > > (Kowshik): I don't see any reference to the words ResourceType or
> > > > AclOperations
> > > > in the KIP. Please let me know how I can use the KIP that you linked
> to
> > > > know how to
> > > > setup the appropriate ResourceType and/or ClusterOperation?
> > > >
> > > > > 105. If we change delete to disable, it's better to do this
> > > consistently
> > > > in
> > > > > request protocol and admin api as well.
> > > >
> > > > (Kowshik): The API shouldn't be called 'disable' when it is deleting
> a
> > > > feature.
> > > > I've just changed the KIP to use 'delete'. I don't have a strong
> > > > preference.
> > > >
> > > > > 110. The minVersion/maxVersion for features use int64. Currently,
> our
> > > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> > > possible
> > > > > for new features to be included in minor releases too. Should we
> make
> > > the
> > > > > feature versioning match the release versioning?
> > > >
> > > > (Kowshik): The release version can be mapped to a set of feature
> > > versions,
> > > > and this can be done, for example in the tool (or even external to
> the
> > > > tool).
> > > > Can you please clarify what I'm missing?
> > > >
> > > > > 111. "During regular operations, the data in the ZK node can be
> > mutated
> > > > > only via a specific admin API served only by the controller." I am
> > > > > wondering why can't the controller auto finalize a feature version
> > > after
> > > > > all brokers are upgraded? For new users who download the latest
> > version
> > > > to
> > > > > build a new cluster, it's inconvenient for them to have to manually
> > > > enable
> > > > > each feature.
> > > >
> > > > (Kowshik): I agree that there is a trade-off here, but it will help
> > > > to decide whether the automation can be thought through in the future
> > > > in a follow up KIP, or right now in this KIP. We may invest
> > > > in automation, but we have to decide whether we should do it
> > > > now or later.
> > > >
> > > > For the inconvenience that you mentioned, do you think the problem
> that
> > > you
> > > > mentioned can be  overcome by asking for the cluster operator to run
> a
> > > > bootstrap script  when he/she knows that a specific AK release has
> been
> > > > almost completely deployed in a cluster for the first time? Idea is
> > that
> > > > the
> > > > bootstrap script will know how to map a specific AK release to
> > finalized
> > > > feature versions, and run the `kafka-features.sh` tool appropriately
> > > > against
> > > > the cluster.
> > > >
> > > > Now, coming back to your automation proposal/question.
> > > > I do see the value of automated feature version finalization, but I
> > also
> > > > see
> > > > that this will open up several questions and some risks, as explained
> > > > below.
> > > > The answers to these depend on the definition of the automation we
> > choose
> > > > to build, and how well does it fit into a kafka deployment.
> > > > Basically, it can be unsafe for the controller to finalize feature
> > > version
> > > > upgrades automatically, without learning about the intent of the
> > cluster
> > > > operator.
> > > > 1. We would sometimes want to lock feature versions only when we have
> > > > externally verified
> > > > the stability of the broker binary.
> > > > 2. Sometimes only the cluster operator knows that a cluster upgrade
> is
> > > > complete,
> > > > and new brokers are highly unlikely to join the cluster.
> > > > 3. Only the cluster operator knows that the intent is to deploy the
> > same
> > > > version
> > > > of the new broker release across the entire cluster (i.e. the latest
> > > > downloaded version).
> > > > 4. For downgrades, it appears the controller still needs some
> external
> > > > input
> > > > (such as the proposed tool) to finalize a feature version downgrade.
> > > >
> > > > If we have automation, that automation can end up failing in some of
> > the
> > > > cases
> > > > above. Then, we need a way to declare that the cluster is "not ready"
> > if
> > > > the
> > > > controller cannot automatically finalize some basic required feature
> > > > version
> > > > upgrades across the cluster. We need to make the cluster operator
> aware
> > > in
> > > > such a scenario (raise an alert or alike).
> > > >
> > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> instead
> > > of
> > > > 48.
> > > >
> > > > (Kowshik): Done.
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Kowshik,
> > > > >
> > > > > Thanks for the reply. A few more comments below.
> > > > >
> > > > > 100.6 For every new request, the admin needs to control who is
> > allowed
> > > to
> > > > > issue that request if security is enabled. So, we need to assign
> the
> > > new
> > > > > request a ResourceType and possible AclOperations. See
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > > as
> > > > > an example.
> > > > >
> > > > > 105. If we change delete to disable, it's better to do this
> > > consistently
> > > > in
> > > > > request protocol and admin api as well.
> > > > >
> > > > > 110. The minVersion/maxVersion for features use int64. Currently,
> our
> > > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> > > possible
> > > > > for new features to be included in minor releases too. Should we
> make
> > > the
> > > > > feature versioning match the release versioning?
> > > > >
> > > > > 111. "During regular operations, the data in the ZK node can be
> > mutated
> > > > > only via a specific admin API served only by the controller." I am
> > > > > wondering why can't the controller auto finalize a feature version
> > > after
> > > > > all brokers are upgraded? For new users who download the latest
> > version
> > > > to
> > > > > build a new cluster, it's inconvenient for them to have to manually
> > > > enable
> > > > > each feature.
> > > > >
> > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> instead
> > > of
> > > > > 48.
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> > > kprakasam@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hey Jun,
> > > > > >
> > > > > > Thanks a lot for the great feedback! Please note that the design
> > > > > > has changed a little bit on the KIP, and we now propagate the
> > > finalized
> > > > > > features metadata only via ZK watches (instead of
> > > UpdateMetadataRequest
> > > > > > from the controller).
> > > > > >
> > > > > > Please find below my response to your questions/feedback, with
> the
> > > > prefix
> > > > > > "(Kowshik):".
> > > > > >
> > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > 100.1 Since this request waits for responses from brokers,
> should
> > > we
> > > > > add
> > > > > > a
> > > > > > > timeout in the request (like createTopicRequest)?
> > > > > >
> > > > > > (Kowshik): Great point! Done. I have added a timeout field. Note:
> > we
> > > no
> > > > > > longer
> > > > > > wait for responses from brokers, since the design has been
> changed
> > so
> > > > > that
> > > > > > the
> > > > > > features information is propagated via ZK. Nevertheless, it is
> > right
> > > to
> > > > > > have a timeout
> > > > > > for the request.
> > > > > >
> > > > > > > 100.2 The response schema is a bit weird. Typically, the
> response
> > > > just
> > > > > > > shows an error code and an error message, instead of echoing
> the
> > > > > request.
> > > > > >
> > > > > > (Kowshik): Great point! Yeah, I have modified it to just return
> an
> > > > error
> > > > > > code and a message.
> > > > > > Previously it was not echoing the "request", rather it was
> > returning
> > > > the
> > > > > > latest set of
> > > > > > cluster-wide finalized features (after applying the updates). But
> > you
> > > > are
> > > > > > right,
> > > > > > the additional info is not required, so I have removed it from
> the
> > > > > response
> > > > > > schema.
> > > > > >
> > > > > > > 100.3 Should we add a separate request to list/describe the
> > > existing
> > > > > > > features?
> > > > > >
> > > > > > (Kowshik): This is already present in the KIP via the
> > > > 'DescribeFeatures'
> > > > > > Admin API,
> > > > > > which, underneath covers uses the ApiVersionsRequest to
> > list/describe
> > > > the
> > > > > > existing features. Please read the 'Tooling support' section.
> > > > > >
> > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> request.
> > > For
> > > > > > > DELETE, the version field doesn't make sense. So, I guess the
> > > broker
> > > > > just
> > > > > > > ignores this? An alternative way is to have a separate
> > > > > > DeleteFeaturesRequest
> > > > > >
> > > > > > (Kowshik): Great point! I have modified the KIP now to have 2
> > > separate
> > > > > > controller APIs
> > > > > > serving these different purposes:
> > > > > > 1. updateFeatures
> > > > > > 2. deleteFeatures
> > > > > >
> > > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > > increasing
> > > > > > > version of the metadata for finalized features." I am wondering
> > why
> > > > the
> > > > > > > ordering is important?
> > > > > >
> > > > > > (Kowshik): In the latest KIP write-up, it is called epoch
> (instead
> > of
> > > > > > version), and
> > > > > > it is just the ZK node version. Basically, this is the epoch for
> > the
> > > > > > cluster-wide
> > > > > > finalized feature version metadata. This metadata is served to
> > > clients
> > > > > via
> > > > > > the
> > > > > > ApiVersionsResponse (for reads). We propagate updates from the
> > > > > '/features'
> > > > > > ZK node
> > > > > > to all brokers, via ZK watches setup by each broker on the
> > > '/features'
> > > > > > node.
> > > > > >
> > > > > > Now here is why the ordering is important:
> > > > > > ZK watches don't propagate at the same time. As a result, the
> > > > > > ApiVersionsResponse
> > > > > > is eventually consistent across brokers. This can introduce cases
> > > > > > where clients see an older lower epoch of the features metadata,
> > > after
> > > > a
> > > > > > more recent
> > > > > > higher epoch was returned at a previous point in time. We expect
> > > > clients
> > > > > > to always employ the rule that the latest received higher epoch
> of
> > > > > metadata
> > > > > > always trumps an older smaller epoch. Those clients that are
> > external
> > > > to
> > > > > > Kafka should strongly consider discovering the latest metadata
> once
> > > > > during
> > > > > > startup from the brokers, and if required refresh the metadata
> > > > > periodically
> > > > > > (to get the latest metadata).
> > > > > >
> > > > > > > 100.6 Could you specify the required ACL for this new request?
> > > > > >
> > > > > > (Kowshik): What is ACL, and how could I find out which one to
> > > specify?
> > > > > > Please could you provide me some pointers? I'll be glad to update
> > the
> > > > > > KIP once I know the next steps.
> > > > > >
> > > > > > > 101. For the broker registration ZK node, should we bump up the
> > > > version
> > > > > > in
> > > > > > the json?
> > > > > >
> > > > > > (Kowshik): Great point! Done. I've increased the version in the
> > > broker
> > > > > json
> > > > > > by 1.
> > > > > >
> > > > > > > 102. For the /features ZK node, not sure if we need the epoch
> > > field.
> > > > > Each
> > > > > > > ZK node has an internal version field that is incremented on
> > every
> > > > > > update.
> > > > > >
> > > > > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> > > > instead
> > > > > of
> > > > > > explicitly
> > > > > > incremented epoch.
> > > > > >
> > > > > > > 103. "Enabling the actual semantics of a feature version
> > > cluster-wide
> > > > > is
> > > > > > > left to the discretion of the logic implementing the feature
> (ex:
> > > can
> > > > > be
> > > > > > > done via dynamic broker config)." Does that mean the broker
> > > > > registration
> > > > > > ZK
> > > > > > > node will be updated dynamically when this happens?
> > > > > >
> > > > > > (Kowshik): Not really. The text was just conveying that a broker
> > > could
> > > > > > "know" of
> > > > > > a new feature version, but it does not mean the broker should
> have
> > > also
> > > > > > activated the effects of the feature version. Knowing vs
> activation
> > > > are 2
> > > > > > separate things,
> > > > > > and the latter can be achieved by dynamic config. I have reworded
> > the
> > > > > text
> > > > > > to
> > > > > > make this clear to the reader.
> > > > > >
> > > > > >
> > > > > > > 104. UpdateMetadataRequest
> > > > > > > 104.1 It would be useful to describe when the feature metadata
> is
> > > > > > included
> > > > > > > in the request. My understanding is that it's only included if
> > (1)
> > > > > there
> > > > > > is
> > > > > > > a change to the finalized feature; (2) broker restart; (3)
> > > controller
> > > > > > > failover.
> > > > > > > 104.2 The new fields have the following versions. Why are the
> > > > versions
> > > > > 3+
> > > > > > > when the top version is bumped to 6?
> > > > > > >       "fields":  [
> > > > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > > > >           "about": "The name of the feature."},
> > > > > > >         {"name":  "Version", "type":  "int64", "versions":
> "3+",
> > > > > > >           "about": "The finalized version for the feature."}
> > > > > > >       ]
> > > > > >
> > > > > > (Kowshik): With the new improved design, we have completely
> > > eliminated
> > > > > the
> > > > > > need to
> > > > > > use UpdateMetadataRequest. This is because we now rely on ZK to
> > > deliver
> > > > > the
> > > > > > notifications for changes to the '/features' ZK node.
> > > > > >
> > > > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> > > it's
> > > > > > better
> > > > > > > to use enable/disable?
> > > > > >
> > > > > > (Kowshik): For delete, yes, I have changed it so that we instead
> > call
> > > > it
> > > > > > 'disable'.
> > > > > > However for 'update', it can now also refer to either an upgrade
> > or a
> > > > > > forced downgrade.
> > > > > > Therefore, I have left it the way it is, just calling it as just
> > > > > 'update'.
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io>
> wrote:
> > > > > >
> > > > > > > Hi, Kowshik,
> > > > > > >
> > > > > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > > > > >
> > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > > 100.1 Since this request waits for responses from brokers,
> should
> > > we
> > > > > add
> > > > > > a
> > > > > > > timeout in the request (like createTopicRequest)?
> > > > > > > 100.2 The response schema is a bit weird. Typically, the
> response
> > > > just
> > > > > > > shows an error code and an error message, instead of echoing
> the
> > > > > request.
> > > > > > > 100.3 Should we add a separate request to list/describe the
> > > existing
> > > > > > > features?
> > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> request.
> > > For
> > > > > > > DELETE, the version field doesn't make sense. So, I guess the
> > > broker
> > > > > just
> > > > > > > ignores this? An alternative way is to have a separate
> > > > > > > DeleteFeaturesRequest
> > > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > > increasing
> > > > > > > version of the metadata for finalized features." I am wondering
> > why
> > > > the
> > > > > > > ordering is important?
> > > > > > > 100.6 Could you specify the required ACL for this new request?
> > > > > > >
> > > > > > > 101. For the broker registration ZK node, should we bump up the
> > > > version
> > > > > > in
> > > > > > > the json?
> > > > > > >
> > > > > > > 102. For the /features ZK node, not sure if we need the epoch
> > > field.
> > > > > Each
> > > > > > > ZK node has an internal version field that is incremented on
> > every
> > > > > > update.
> > > > > > >
> > > > > > > 103. "Enabling the actual semantics of a feature version
> > > cluster-wide
> > > > > is
> > > > > > > left to the discretion of the logic implementing the feature
> (ex:
> > > can
> > > > > be
> > > > > > > done via dynamic broker config)." Does that mean the broker
> > > > > registration
> > > > > > ZK
> > > > > > > node will be updated dynamically when this happens?
> > > > > > >
> > > > > > > 104. UpdateMetadataRequest
> > > > > > > 104.1 It would be useful to describe when the feature metadata
> is
> > > > > > included
> > > > > > > in the request. My understanding is that it's only included if
> > (1)
> > > > > there
> > > > > > is
> > > > > > > a change to the finalized feature; (2) broker restart; (3)
> > > controller
> > > > > > > failover.
> > > > > > > 104.2 The new fields have the following versions. Why are the
> > > > versions
> > > > > 3+
> > > > > > > when the top version is bumped to 6?
> > > > > > >       "fields":  [
> > > > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > > > >           "about": "The name of the feature."},
> > > > > > >         {"name":  "Version", "type":  "int64", "versions":
> "3+",
> > > > > > >           "about": "The finalized version for the feature."}
> > > > > > >       ]
> > > > > > >
> > > > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> > > it's
> > > > > > better
> > > > > > > to use enable/disable?
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > > > kprakasam@confluent.io
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Boyang,
> > > > > > > >
> > > > > > > > Thanks for the great feedback! I have updated the KIP based
> on
> > > your
> > > > > > > > feedback.
> > > > > > > > Please find my response below for your comments, look for
> > > sentences
> > > > > > > > starting
> > > > > > > > with "(Kowshik)" below.
> > > > > > > >
> > > > > > > >
> > > > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > > > traffic"
> > > > > > > could
> > > > > > > > be
> > > > > > > > > converted as "When is it safe for the brokers to start
> > serving
> > > > new
> > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> > earlier
> > > > in
> > > > > > the
> > > > > > > > > context.
> > > > > > > >
> > > > > > > > (Kowshik): Great point! Done.
> > > > > > > >
> > > > > > > > > 2. In the *Explanation *section, the metadata version
> number
> > > part
> > > > > > > seems a
> > > > > > > > > bit blurred. Could you point a reference to later section
> > that
> > > we
> > > > > > going
> > > > > > > > to
> > > > > > > > > store it in Zookeeper and update it every time when there
> is
> > a
> > > > > > feature
> > > > > > > > > change?
> > > > > > > >
> > > > > > > > (Kowshik): Great point! Done. I've added a reference in the
> > KIP.
> > > > > > > >
> > > > > > > >
> > > > > > > > > 3. For the feature downgrade, although it's a Non-goal of
> the
> > > > KIP,
> > > > > > for
> > > > > > > > > features such as group coordinator semantics, there is no
> > legal
> > > > > > > scenario
> > > > > > > > to
> > > > > > > > > perform a downgrade at all. So having downgrade door open
> is
> > > > pretty
> > > > > > > > > error-prone as human faults happen all the time. I'm
> assuming
> > > as
> > > > > new
> > > > > > > > > features are implemented, it's not very hard to add a flag
> > > during
> > > > > > > feature
> > > > > > > > > creation to indicate whether this feature is
> "downgradable".
> > > > Could
> > > > > > you
> > > > > > > > > explain a bit more on the extra engineering effort for
> > shipping
> > > > > this
> > > > > > > KIP
> > > > > > > > > with downgrade protection in place?
> > > > > > > >
> > > > > > > > (Kowshik): Great point! I'd agree and disagree here. While I
> > > agree
> > > > > that
> > > > > > > > accidental
> > > > > > > > downgrades can cause problems, I also think sometimes
> > downgrades
> > > > > should
> > > > > > > > be allowed for emergency reasons (not all downgrades cause
> > > issues).
> > > > > > > > It is just subjective to the feature being downgraded.
> > > > > > > >
> > > > > > > > To be more strict about feature version downgrades, I have
> > > modified
> > > > > the
> > > > > > > KIP
> > > > > > > > proposing that we mandate a `--force-downgrade` flag be used
> in
> > > the
> > > > > > > > UPDATE_FEATURES api
> > > > > > > > and the tooling, whenever the human is downgrading a
> finalized
> > > > > feature
> > > > > > > > version.
> > > > > > > > Hopefully this should cover the requirement, until we find
> the
> > > need
> > > > > for
> > > > > > > > advanced downgrade support.
> > > > > > > >
> > > > > > > > > 4. "Each broker’s supported dictionary of feature versions
> > will
> > > > be
> > > > > > > > defined
> > > > > > > > > in the broker code." So this means in order to restrict a
> > > certain
> > > > > > > > feature,
> > > > > > > > > we need to start the broker first and then send a feature
> > > gating
> > > > > > > request
> > > > > > > > > immediately, which introduces a time gap and the
> > > > intended-to-close
> > > > > > > > feature
> > > > > > > > > could actually serve request during this phase. Do you
> think
> > we
> > > > > > should
> > > > > > > > also
> > > > > > > > > support configurations as well so that admin user could
> > freely
> > > > roll
> > > > > > up
> > > > > > > a
> > > > > > > > > cluster with all nodes complying the same feature gating,
> > > without
> > > > > > > > worrying
> > > > > > > > > about the turnaround time to propagate the message only
> after
> > > the
> > > > > > > cluster
> > > > > > > > > starts up?
> > > > > > > >
> > > > > > > > (Kowshik): This is a great point/question. One of the
> > > expectations
> > > > > out
> > > > > > of
> > > > > > > > this KIP, which is
> > > > > > > > already followed in the broker, is the following.
> > > > > > > >  - Imagine at time T1 the broker starts up and registers it’s
> > > > > presence
> > > > > > in
> > > > > > > > ZK,
> > > > > > > >    along with advertising it’s supported features.
> > > > > > > >  - Imagine at a future time T2 the broker receives the
> > > > > > > > UpdateMetadataRequest
> > > > > > > >    from the controller, which contains the latest finalized
> > > > features
> > > > > as
> > > > > > > > seen by
> > > > > > > >    the controller. The broker validates this data against
> it’s
> > > > > > supported
> > > > > > > > features to
> > > > > > > >    make sure there is no mismatch (it will shutdown if there
> is
> > > an
> > > > > > > > incompatibility).
> > > > > > > >
> > > > > > > > It is expected that during the time between the 2 events T1
> and
> > > T2,
> > > > > the
> > > > > > > > broker is
> > > > > > > > almost a silent entity in the cluster. It does not add any
> > value
> > > to
> > > > > the
> > > > > > > > cluster, or carry
> > > > > > > > out any important broker activities. By “important”, I mean
> it
> > is
> > > > not
> > > > > > > doing
> > > > > > > > mutations
> > > > > > > > on it’s persistence, not mutating critical in-memory state,
> > won’t
> > > > be
> > > > > > > > serving
> > > > > > > > produce/fetch requests. Note it doesn’t even know it’s
> assigned
> > > > > > > partitions
> > > > > > > > until
> > > > > > > > it receives UpdateMetadataRequest from controller. Anything
> the
> > > > > broker
> > > > > > is
> > > > > > > > doing up
> > > > > > > > until this point is not damaging/useful.
> > > > > > > >
> > > > > > > > I’ve clarified the above in the KIP, see this new section:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > > > .
> > > > > > > >
> > > > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > > > Feature",
> > > > > > > may
> > > > > > > > be
> > > > > > > > > I misunderstood something, I thought the features are
> defined
> > > in
> > > > > > broker
> > > > > > > > > code, so admin could not really create a new feature?
> > > > > > > >
> > > > > > > > (Kowshik): Great point! You understood this right. Here
> adding
> > a
> > > > > > feature
> > > > > > > > means we are
> > > > > > > > adding a cluster-wide finalized *max* version for a feature
> > that
> > > > was
> > > > > > > > previously never finalized.
> > > > > > > > I have clarified this in the KIP now.
> > > > > > > >
> > > > > > > > > 6. I think we need a separate error code like
> > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > to
> > > > > > > > > reject a concurrent feature update request.
> > > > > > > >
> > > > > > > > (Kowshik): Great point! I have modified the KIP adding the
> > above
> > > > (see
> > > > > > > > 'Tooling support -> Admin API changes').
> > > > > > > >
> > > > > > > > > 7. I think we haven't discussed the alternative solution to
> > > pass
> > > > > the
> > > > > > > > > feature information through Zookeeper. Is that mentioned in
> > the
> > > > KIP
> > > > > > to
> > > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > >
> > > > > > > > (Kowshik): Nice question! The broker reads finalized feature
> > info
> > > > > > stored
> > > > > > > in
> > > > > > > > ZK,
> > > > > > > > only during startup when it does a validation. When serving
> > > > > > > > `ApiVersionsRequest`, the
> > > > > > > > broker does not read this info from ZK directly. I'd imagine
> > the
> > > > risk
> > > > > > is
> > > > > > > > that it can increase
> > > > > > > > the ZK read QPS which can be a bottleneck for the system.
> > Today,
> > > in
> > > > > > Kafka
> > > > > > > > we use the
> > > > > > > > controller to fan out ZK updates to brokers and we want to
> > stick
> > > to
> > > > > > that
> > > > > > > > pattern to avoid
> > > > > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > > > > > >
> > > > > > > > > 8. I was under the impression that user could configure a
> > range
> > > > of
> > > > > > > > > supported versions, what's the trade-off for allowing
> single
> > > > > > finalized
> > > > > > > > > version only?
> > > > > > > >
> > > > > > > > (Kowshik): Great question! The finalized version of a feature
> > > > > basically
> > > > > > > > refers to
> > > > > > > > the cluster-wide finalized feature "maximum" version. For
> > > example,
> > > > if
> > > > > > the
> > > > > > > > 'group_coordinator' feature
> > > > > > > > has the finalized version set to 10, then, it means that
> > > > cluster-wide
> > > > > > all
> > > > > > > > versions up to v10 are
> > > > > > > > supported for this feature. However, note that if some
> version
> > > (ex:
> > > > > v0)
> > > > > > > > gets deprecated
> > > > > > > > for this feature, then we don’t convey that using this scheme
> > > (also
> > > > > > > > supporting deprecation is a non-goal).
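> > > > > > > >
> > > > > > > > As an illustration of the "maximum version" semantics (a sketch, not
> > > > > > > > KIP code; the names are made up):
> > > > > > > >
> > > > > > > > // Sketch: with the finalized max version for 'group_coordinator' set
> > > > > > > > // to 10, every feature version up to 10 counts as supported
> > > > > > > > // cluster-wide. (Deprecating old versions is explicitly a non-goal,
> > > > > > > > // so there is no minimum-version check here.)
> > > > > > > > final class FeatureVersionCheck {
> > > > > > > >     static boolean isFinalized(long version, long finalizedMaxVersion) {
> > > > > > > >         return version <= finalizedMaxVersion;
> > > > > > > >     }
> > > > > > > >
> > > > > > > >     public static void main(String[] args) {
> > > > > > > >         long finalizedMax = 10L;  // the example value from this thread
> > > > > > > >         System.out.println(isFinalized(7L, finalizedMax));   // true
> > > > > > > >         System.out.println(isFinalized(11L, finalizedMax));  // false
> > > > > > > >     }
> > > > > > > > }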
> > > > > > > >
> > > > > > > > (Kowshik): I’ve now modified the KIP at all points, referring
> to
> > > > > > finalized
> > > > > > > > feature "maximum" versions.
> > > > > > > >
> > > > > > > > > 9. One minor syntax fix: Note that here the "client" here
> may
> > > be
> > > > a
> > > > > > > > producer
> > > > > > > >
> > > > > > > > (Kowshik): Great point! Done.
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Kowshik
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > > > reluctanthero104@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey Kowshik,
> > > > > > > > >
> > > > > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > > > > >
> > > > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > > > traffic"
> > > > > > > could
> > > > > > > > be
> > > > > > > > > converted as "When is it safe for the brokers to start
> > serving
> > > > new
> > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> > earlier
> > > > in
> > > > > > the
> > > > > > > > > context.
> > > > > > > > >
> > > > > > > > > 2. In the *Explanation *section, the metadata version
> number
> > > part
> > > > > > > seems a
> > > > > > > > > bit blurred. Could you point a reference to later section
> > that
> > > we
> > > > > > going
> > > > > > > > to
> > > > > > > > > store it in Zookeeper and update it every time when there
> is
> > a
> > > > > > feature
> > > > > > > > > change?
> > > > > > > > >
> > > > > > > > > 3. For the feature downgrade, although it's a Non-goal of
> the
> > > > KIP,
> > > > > > for
> > > > > > > > > features such as group coordinator semantics, there is no
> > legal
> > > > > > > scenario
> > > > > > > > to
> > > > > > > > > perform a downgrade at all. So having downgrade door open
> is
> > > > pretty
> > > > > > > > > error-prone as human faults happen all the time. I'm
> assuming
> > > as
> > > > > new
> > > > > > > > > features are implemented, it's not very hard to add a flag
> > > during
> > > > > > > feature
> > > > > > > > > creation to indicate whether this feature is
> "downgradable".
> > > > Could
> > > > > > you
> > > > > > > > > explain a bit more on the extra engineering effort for
> > shipping
> > > > > this
> > > > > > > KIP
> > > > > > > > > with downgrade protection in place?
> > > > > > > > >
> > > > > > > > > 4. "Each broker’s supported dictionary of feature versions
> > will
> > > > be
> > > > > > > > defined
> > > > > > > > > in the broker code." So this means in order to restrict a
> > > certain
> > > > > > > > feature,
> > > > > > > > > we need to start the broker first and then send a feature
> > > gating
> > > > > > > request
> > > > > > > > > immediately, which introduces a time gap and the
> > > > intended-to-close
> > > > > > > > feature
> > > > > > > > > could actually serve request during this phase. Do you
> think
> > we
> > > > > > should
> > > > > > > > also
> > > > > > > > > support configurations as well so that admin user could
> > freely
> > > > roll
> > > > > > up
> > > > > > > a
> > > > > > > > > cluster with all nodes complying the same feature gating,
> > > without
> > > > > > > > worrying
> > > > > > > > > about the turnaround time to propagate the message only
> after
> > > the
> > > > > > > cluster
> > > > > > > > > starts up?
> > > > > > > > >
> > > > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > > > Feature",
> > > > > > > may
> > > > > > > > be
> > > > > > > > > I misunderstood something, I thought the features are
> defined
> > > in
> > > > > > broker
> > > > > > > > > code, so admin could not really create a new feature?
> > > > > > > > >
> > > > > > > > > 6. I think we need a separate error code like
> > > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > > to
> > > > > > > > > reject a concurrent feature update request.
> > > > > > > > >
> > > > > > > > > 7. I think we haven't discussed the alternative solution to
> > > pass
> > > > > the
> > > > > > > > > feature information through Zookeeper. Is that mentioned in
> > the
> > > > KIP
> > > > > > to
> > > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > > >
> > > > > > > > > 8. I was under the impression that user could configure a
> > range
> > > > of
> > > > > > > > > supported versions, what's the trade-off for allowing
> single
> > > > > > finalized
> > > > > > > > > version only?
> > > > > > > > >
> > > > > > > > > 9. One minor syntax fix: Note that here the "client" here
> may
> > > be
> > > > a
> > > > > > > > producer
> > > > > > > > >
> > > > > > > > > Boyang
> > > > > > > > >
> > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> > > cmccabe@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > > > > > Hi Colin,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the feedback! I've changed the KIP to
> address
> > > your
> > > > > > > > > > > suggestions.
> > > > > > > > > > > Please find below my explanation. Here is a link to KIP
> > > 584:
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > .
> > > > > > > > > > >
> > > > > > > > > > > 1. '__data_version__' is the version of the finalized
> > > feature
> > > > > > > > metadata
> > > > > > > > > > > (i.e. actual ZK node contents), while the
> > > > '__schema_version__'
> > > > > is
> > > > > > > the
> > > > > > > > > > > version of the schema of the data persisted in ZK.
> These
> > > > serve
> > > > > > > > > different
> > > > > > > > > > > purposes. '__data_version__' is is useful mainly to
> > clients
> > > > > > during
> > > > > > > > > reads,
> > > > > > > > > > > to differentiate between the 2 versions of eventually
> > > > > consistent
> > > > > > > > > > 'finalized
> > > > > > > > > > > features' metadata (i.e. larger metadata version is
> more
> > > > > recent).
> > > > > > > > > > > '__schema_version__' provides an additional degree of
> > > > > > flexibility,
> > > > > > > > > where
> > > > > > > > > > if
> > > > > > > > > > > we decide to change the schema for '/features' node in
> ZK
> > > (in
> > > > > the
> > > > > > > > > > future),
> > > > > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > > > > serialization/deserialization of the ZK data can be
> > handled
> > > > > > > safely).
> > > > > > > > > >
> > > > > > > > > > Hi Kowshik,
> > > > > > > > > >
> > > > > > > > > > If you're talking about a number that lets you know if
> data
> > > is
> > > > > more
> > > > > > > or
> > > > > > > > > > less recent, we would typically call that an epoch, and
> > not a
> > > > > > > version.
> > > > > > > > > For
> > > > > > > > > > the ZK data structures, the word "version" is typically
> > > > reserved
> > > > > > for
> > > > > > > > > > describing changes to the overall schema of the data that
> > is
> > > > > > written
> > > > > > > to
> > > > > > > > > > ZooKeeper.  We don't even really change the "version" of
> > > those
> > > > > > > schemas
> > > > > > > > > that
> > > > > > > > > > much, since most changes are backwards-compatible.  But
> we
> > do
> > > > > > include
> > > > > > > > > that
> > > > > > > > > > version field just in case.
> > > > > > > > > >
> > > > > > > > > > I don't think we really need an epoch here, though, since
> > we
> > > > can
> > > > > > just
> > > > > > > > > look
> > > > > > > > > > at the broker epoch.  Whenever the broker registers, its
> > > epoch
> > > > > will
> > > > > > > be
> > > > > > > > > > greater than the previous broker epoch.  And the newly
> > > > registered
> > > > > > > data
> > > > > > > > > will
> > > > > > > > > > take priority.  This will be a lot simpler than adding a
> > > > separate
> > > > > > > epoch
> > > > > > > > > > system, I think.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 2. Regarding admin client needing min and max
> > information -
> > > > you
> > > > > > are
> > > > > > > > > > right!
> > > > > > > > > > > I've changed the KIP such that the Admin API also
> allows
> > > the
> > > > > user
> > > > > > > to
> > > > > > > > > read
> > > > > > > > > > > 'supported features' from a specific broker. Please
> look
> > at
> > > > the
> > > > > > > > section
> > > > > > > > > > > "Admin API changes".
> > > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > > > > deliberate.
> > > > > > > > I've
> > > > > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > > > > >
> > > > > > > > > > Sounds good.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are
> > > right!
> > > > > > I've
> > > > > > > > > > updated
> > > > > > > > > > > the KIP sketching the functionality provided by this
> > tool,
> > > > with
> > > > > > > some
> > > > > > > > > > > examples. Please look at the section "Tooling support
> > > > > examples".
> > > > > > > > > > >
> > > > > > > > > > > Thank you!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks, Kowshik.
> > > > > > > > > >
> > > > > > > > > > cheers,
> > > > > > > > > > Colin
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Kowshik
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > > > > cmccabe@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > > > >
> > > > > > > > > > > > In the "Schema" section, do we really need both
> > > > > > > __schema_version__
> > > > > > > > > and
> > > > > > > > > > > > __data_version__?  Can we just have a single version
> > > field
> > > > > > here?
> > > > > > > > > > > >
> > > > > > > > > > > > Shouldn't the Admin(Client) function have some way to
> > get
> > > > the
> > > > > > min
> > > > > > > > and
> > > > > > > > > > max
> > > > > > > > > > > > information that we're exposing as well?  I guess we
> > > could
> > > > > have
> > > > > > > > min,
> > > > > > > > > > max,
> > > > > > > > > > > > and current.  Unrelated: is the use of Long rather
> than
> > > > long
> > > > > > > > > deliberate
> > > > > > > > > > > > here?
> > > > > > > > > > > >
> > > > > > > > > > > > It would be good to describe how the command line
> tool
> > > > > > > > > > > > kafka.admin.FeatureCommand will work.  For example
> the
> > > > flags
> > > > > > that
> > > > > > > > it
> > > > > > > > > > will
> > > > > > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > > > > > >
> > > > > > > > > > > > cheers,
> > > > > > > > > > > > Colin
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam
> wrote:
> > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've opened KIP-584
> > > > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > > > >
> > > > > > > > > > > > > which
> > > > > > > > > > > > > is intended to provide a versioning scheme for
> > > features.
> > > > > I'd
> > > > > > > like
> > > > > > > > > to
> > > > > > > > > > use
> > > > > > > > > > > > > this thread to discuss the same. I'd appreciate any
> > > > > feedback
> > > > > > on
> > > > > > > > > this.
> > > > > > > > > > > > > Here
> > > > > > > > > > > > > is a link to KIP-584
> > > > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > > >  .
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Kowshik
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Jun Rao <ju...@confluent.io>.
Hi, Kowshik,

Thanks for the reply. A few more comments.

110. Keeping the feature version as int is probably fine. I just felt that
for some of the common user interactions, it's more convenient to
relate that to a release version. For example, if a user wants to downgrade
to release 2.5, it's easier for the user to run something like "tool
--downgrade 2.5" instead of "tool --downgrade --feature X --version 6".
Similarly, if the client library finds a feature mismatch with the broker,
the client likely needs to log some error message for the user to take
action. It's much more actionable if the error message is "upgrade the
broker to release version 2.6" than just "upgrade the broker to feature
version 7".

111. Sounds good.

120. When should a developer bump up the version of a feature?

Jun

On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hi Jun,
>
> I have updated the KIP for the item 111.
> I'm in the process of addressing 100.6, and will provide an update soon.
> I think item 110 is still under discussion given we are now providing a way
> to finalize
> all features to their latest version levels. In any case, please let us
> know
> how you feel in response to Colin's comments on this topic.
>
> > 111. To put this in context, when we had IBP, the default value is the
> > current released version. So, if you are a brand new user, you don't need
> > to configure IBP and all new features will be immediately available in
> the
> > new cluster. If you are upgrading from an old version, you do need to
> > understand and configure IBP. I see a similar pattern here for
> > features. From the ease of use perspective, ideally, we shouldn't require
> a
> > new user to have an extra step such as running a bootstrap script unless
> > it's truly necessary. If someone has a special need (all the cases you
> > mentioned seem special cases?), they can configure a mode such that
> > features are enabled/disabled manually.
>
> (Kowshik): That makes sense, thanks for the idea! Sorry if I didn't
> understand
> this need earlier. I have updated the KIP with the approach that whenever
> the '/features' node is absent, the controller by default will bootstrap
> the node
> to contain the latest feature levels. Here is the new section in the KIP
> describing
> the same:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
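>
> A rough sketch of the behaviour described above (not the actual controller
> code; the ZK facade and method names below are made up for illustration):
>
> import java.util.Map;
>
> // Sketch: if the '/features' ZK node is absent, the controller bootstraps
> // it with the latest feature version levels known to the broker code.
> final class FeaturesBootstrap {
>     interface ZkStore {                       // hypothetical stand-in for the ZK client
>         boolean exists(String path);
>         void createPersistent(String path, byte[] data);
>     }
>
>     static void maybeBootstrap(ZkStore zk, Map<String, Long> latestSupportedMaxVersions) {
>         if (!zk.exists("/features")) {
>             zk.createPersistent("/features", encode(latestSupportedMaxVersions));
>         }
>     }
>
>     private static byte[] encode(Map<String, Long> features) {
>         return features.toString().getBytes(); // placeholder for the real JSON encoding
>     }
> }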
>
> Next, as I explained in my response to Colin's suggestions, we are now
> providing a `--finalize-latest-features` flag with the tooling. This lets
> the sysadmin finalize all features known to the controller to their latest
> version
> levels. Please look at this section (point #3 and the tooling example
> later):
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
>
>
> Do you feel this addresses your comment/concern?
>
>
> Cheers,
> Kowshik
>
> On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Kowshik,
> >
> > Thanks for the reply. A few more replies below.
> >
> > 100.6 You can look for the sentence "This operation requires ALTER on
> > CLUSTER." in KIP-455. Also, you can check its usage in
> > KafkaApis.authorize().
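> >
> > For illustration (a sketch only; the Authorizer interface below is a
> > made-up stand-in, not the actual KafkaApis code), the check would be
> > roughly:
> >
> > import org.apache.kafka.common.acl.AclOperation;
> > import org.apache.kafka.common.resource.ResourceType;
> >
> > // Sketch: require ALTER on the CLUSTER resource before processing an
> > // UpdateFeatures request, mirroring what KIP-455 does for reassignment.
> > final class UpdateFeaturesAuthCheck {
> >     interface Authorizer {                    // hypothetical stand-in
> >         boolean authorize(AclOperation op, ResourceType resourceType);
> >     }
> >
> >     static boolean isAuthorized(Authorizer authorizer) {
> >         return authorizer.authorize(AclOperation.ALTER, ResourceType.CLUSTER);
> >     }
> > }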
> >
> > 110. From the external client/tooling perspective, it's more natural to
> use
> > the release version for features. If we can use the same release version
> > for internal representation, it seems simpler (easier to understand, no
> > mapping overhead, etc). Is there a benefit with separate external and
> > internal versioning schemes?
> >
> > 111. To put this in context, when we had IBP, the default value is the
> > current released version. So, if you are a brand new user, you don't need
> > to configure IBP and all new features will be immediately available in
> the
> > new cluster. If you are upgrading from an old version, you do need to
> > understand and configure IBP. I see a similar pattern here for
> > features. From the ease of use perspective, ideally, we shouldn't
> require a
> > new user to have an extra step such as running a bootstrap script unless
> > it's truly necessary. If someone has a special need (all the cases you
> > mentioned seem special cases?), they can configure a mode such that
> > features are enabled/disabled manually.
> >
> > Jun
> >
> > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kp...@confluent.io>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the feedback and suggestions. Please find my response below.
> > >
> > > > 100.6 For every new request, the admin needs to control who is
> allowed
> > to
> > > > issue that request if security is enabled. So, we need to assign the
> > new
> > > > request a ResourceType and possible AclOperations. See
> > > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > as an example.
> > >
> > > (Kowshik): I don't see any reference to the words ResourceType or
> > > AclOperations in the KIP. Please let me know how I can use the KIP that
> > > you linked to figure out how to set up the appropriate ResourceType
> > > and/or AclOperations?
> > >
> > > > 105. If we change delete to disable, it's better to do this
> > consistently
> > > in
> > > > request protocol and admin api as well.
> > >
> > > (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> > > feature.
> > > I've just changed the KIP to use 'delete'. I don't have a strong
> > > preference.
> > >
> > > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> > possible
> > > > for new features to be included in minor releases too. Should we make
> > the
> > > > feature versioning match the release versioning?
> > >
> > > (Kowshik): The release version can be mapped to a set of feature
> > versions,
> > > and this can be done, for example in the tool (or even external to the
> > > tool).
> > > Can you please clarify what I'm missing?
> > >
> > > > 111. "During regular operations, the data in the ZK node can be
> mutated
> > > > only via a specific admin API served only by the controller." I am
> > > > wondering why can't the controller auto finalize a feature version
> > after
> > > > all brokers are upgraded? For new users who download the latest
> version
> > > to
> > > > build a new cluster, it's inconvenient for them to have to manually
> > > enable
> > > > each feature.
> > >
> > > (Kowshik): I agree that there is a trade-off here, but it will help
> > > to decide whether the automation should be thought through right now in
> > > this KIP, or in a future follow-up KIP. We may well invest in automation;
> > > the question is whether we should do it now or later.
> > >
> > > As for the inconvenience you mentioned, do you think the problem can be
> > > overcome by asking the cluster operator to run a bootstrap script when
> > > he/she knows that a specific AK release has been almost completely
> > > deployed in a cluster for the first time? The idea is that the bootstrap
> > > script will know how to map a specific AK release to finalized feature
> > > versions, and run the `kafka-features.sh` tool appropriately against the
> > > cluster.
> > >
> > > Now, coming back to your automation proposal/question.
> > > I do see the value of automated feature version finalization, but I also
> > > see that this will open up several questions and some risks, as explained
> > > below. The answers to these depend on the definition of the automation we
> > > choose to build, and how well it fits into a Kafka deployment. Basically,
> > > it can be unsafe for the controller to finalize feature version upgrades
> > > automatically, without learning about the intent of the cluster operator.
> > > 1. We would sometimes want to lock feature versions only when we have
> > > externally verified
> > > the stability of the broker binary.
> > > 2. Sometimes only the cluster operator knows that a cluster upgrade is
> > > complete,
> > > and new brokers are highly unlikely to join the cluster.
> > > 3. Only the cluster operator knows that the intent is to deploy the
> same
> > > version
> > > of the new broker release across the entire cluster (i.e. the latest
> > > downloaded version).
> > > 4. For downgrades, it appears the controller still needs some external
> > > input
> > > (such as the proposed tool) to finalize a feature version downgrade.
> > >
> > > If we have automation, that automation can end up failing in some of the
> > > cases above. Then, we need a way to declare that the cluster is "not
> > > ready" if the controller cannot automatically finalize some basic required
> > > feature version upgrades across the cluster. We need to make the cluster
> > > operator aware in such a scenario (raise an alert or the like).
> > >
> > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead
> > of
> > > 48.
> > >
> > > (Kowshik): Done.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Kowshik,
> > > >
> > > > Thanks for the reply. A few more comments below.
> > > >
> > > > 100.6 For every new request, the admin needs to control who is
> allowed
> > to
> > > > issue that request if security is enabled. So, we need to assign the
> > new
> > > > request a ResourceType and possible AclOperations. See
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > as
> > > > an example.
> > > >
> > > > 105. If we change delete to disable, it's better to do this
> > consistently
> > > in
> > > > request protocol and admin api as well.
> > > >
> > > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> > possible
> > > > for new features to be included in minor releases too. Should we make
> > the
> > > > feature versioning match the release versioning?
> > > >
> > > > 111. "During regular operations, the data in the ZK node can be
> mutated
> > > > only via a specific admin API served only by the controller." I am
> > > > wondering why can't the controller auto finalize a feature version
> > after
> > > > all brokers are upgraded? For new users who download the latest
> version
> > > to
> > > > build a new cluster, it's inconvenient for them to have to manually
> > > enable
> > > > each feature.
> > > >
> > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead
> > of
> > > > 48.
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> > kprakasam@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Jun,
> > > > >
> > > > > Thanks a lot for the great feedback! Please note that the design
> > > > > has changed a little bit on the KIP, and we now propagate the
> > finalized
> > > > > features metadata only via ZK watches (instead of
> > UpdateMetadataRequest
> > > > > from the controller).
> > > > >
> > > > > Please find below my response to your questions/feedback, with the
> > > prefix
> > > > > "(Kowshik):".
> > > > >
> > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > 100.1 Since this request waits for responses from brokers, should
> > we
> > > > add
> > > > > a
> > > > > > timeout in the request (like createTopicRequest)?
> > > > >
> > > > > (Kowshik): Great point! Done. I have added a timeout field. Note:
> we
> > no
> > > > > longer
> > > > > wait for responses from brokers, since the design has been changed
> so
> > > > that
> > > > > the
> > > > > features information is propagated via ZK. Nevertheless, it is
> right
> > to
> > > > > have a timeout
> > > > > for the request.
> > > > >
> > > > > > 100.2 The response schema is a bit weird. Typically, the response
> > > just
> > > > > > shows an error code and an error message, instead of echoing the
> > > > request.
> > > > >
> > > > > (Kowshik): Great point! Yeah, I have modified it to just return an
> > > error
> > > > > code and a message.
> > > > > Previously it was not echoing the "request", rather it was
> returning
> > > the
> > > > > latest set of
> > > > > cluster-wide finalized features (after applying the updates). But
> you
> > > are
> > > > > right,
> > > > > the additional info is not required, so I have removed it from the
> > > > response
> > > > > schema.
> > > > >
> > > > > > 100.3 Should we add a separate request to list/describe the
> > existing
> > > > > > features?
> > > > >
> > > > > (Kowshik): This is already present in the KIP via the
> > > 'DescribeFeatures'
> > > > > Admin API,
> > > > > which, underneath covers uses the ApiVersionsRequest to
> list/describe
> > > the
> > > > > existing features. Please read the 'Tooling support' section.
> > > > >
> > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request.
> > For
> > > > > > DELETE, the version field doesn't make sense. So, I guess the
> > broker
> > > > just
> > > > > > ignores this? An alternative way is to have a separate
> > > > > DeleteFeaturesRequest
> > > > >
> > > > > (Kowshik): Great point! I have modified the KIP now to have 2
> > separate
> > > > > controller APIs
> > > > > serving these different purposes:
> > > > > 1. updateFeatures
> > > > > 2. deleteFeatures
> > > > >
> > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > increasing
> > > > > > version of the metadata for finalized features." I am wondering
> why
> > > the
> > > > > > ordering is important?
> > > > >
> > > > > (Kowshik): In the latest KIP write-up, it is called epoch (instead
> of
> > > > > version), and
> > > > > it is just the ZK node version. Basically, this is the epoch for
> the
> > > > > cluster-wide
> > > > > finalized feature version metadata. This metadata is served to
> > clients
> > > > via
> > > > > the
> > > > > ApiVersionsResponse (for reads). We propagate updates from the
> > > > '/features'
> > > > > ZK node
> > > > > to all brokers, via ZK watches setup by each broker on the
> > '/features'
> > > > > node.
> > > > >
> > > > > Now here is why the ordering is important:
> > > > > ZK watches don't propagate at the same time. As a result, the
> > > > > ApiVersionsResponse
> > > > > is eventually consistent across brokers. This can introduce cases
> > > > > where clients see an older lower epoch of the features metadata,
> > after
> > > a
> > > > > more recent
> > > > > higher epoch was returned at a previous point in time. We expect
> > > clients
> > > > > to always employ the rule that the latest received higher epoch of
> > > > metadata
> > > > > always trumps an older smaller epoch. Those clients that are
> external
> > > to
> > > > > Kafka should strongly consider discovering the latest metadata once
> > > > during
> > > > > startup from the brokers, and if required refresh the metadata
> > > > periodically
> > > > > (to get the latest metadata).
> > > > >
> > > > > > 100.6 Could you specify the required ACL for this new request?
> > > > >
> > > > > (Kowshik): What is ACL, and how could I find out which one to
> > specify?
> > > > > Please could you provide me some pointers? I'll be glad to update
> the
> > > > > KIP once I know the next steps.
> > > > >
> > > > > > 101. For the broker registration ZK node, should we bump up the
> > > version
> > > > > in
> > > > > the json?
> > > > >
> > > > > (Kowshik): Great point! Done. I've increased the version in the
> > broker
> > > > json
> > > > > by 1.
> > > > >
> > > > > > 102. For the /features ZK node, not sure if we need the epoch
> > field.
> > > > Each
> > > > > > ZK node has an internal version field that is incremented on
> every
> > > > > update.
> > > > >
> > > > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> > > instead
> > > > of
> > > > > explicitly
> > > > > incremented epoch.
> > > > >
> > > > > > 103. "Enabling the actual semantics of a feature version
> > cluster-wide
> > > > is
> > > > > > left to the discretion of the logic implementing the feature (ex:
> > can
> > > > be
> > > > > > done via dynamic broker config)." Does that mean the broker
> > > > registration
> > > > > ZK
> > > > > > node will be updated dynamically when this happens?
> > > > >
> > > > > (Kowshik): Not really. The text was just conveying that a broker
> > could
> > > > > "know" of
> > > > > a new feature version, but it does not mean the broker should have
> > also
> > > > > activated the effects of the feature version. Knowing vs activation
> > > are 2
> > > > > separate things,
> > > > > and the latter can be achieved by dynamic config. I have reworded
> the
> > > > text
> > > > > to
> > > > > make this clear to the reader.
> > > > >
> > > > >
> > > > > > 104. UpdateMetadataRequest
> > > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > > included
> > > > > > in the request. My understanding is that it's only included if
> (1)
> > > > there
> > > > > is
> > > > > > a change to the finalized feature; (2) broker restart; (3)
> > controller
> > > > > > failover.
> > > > > > 104.2 The new fields have the following versions. Why are the
> > > versions
> > > > 3+
> > > > > > when the top version is bumped to 6?
> > > > > >       "fields":  [
> > > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > > >           "about": "The name of the feature."},
> > > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > > >           "about": "The finalized version for the feature."}
> > > > > >       ]
> > > > >
> > > > > (Kowshik): With the new improved design, we have completely
> > eliminated
> > > > the
> > > > > need to
> > > > > use UpdateMetadataRequest. This is because we now rely on ZK to
> > deliver
> > > > the
> > > > > notifications for changes to the '/features' ZK node.
> > > > >
> > > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> > it's
> > > > > better
> > > > > > to use enable/disable?
> > > > >
> > > > > (Kowshik): For delete, yes, I have changed it so that we instead
> call
> > > it
> > > > > 'disable'.
> > > > > However for 'update', it can now also refer to either an upgrade
> or a
> > > > > forced downgrade.
> > > > > Therefore, I have left it the way it is, just calling it as just
> > > > 'update'.
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Kowshik,
> > > > > >
> > > > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > > > >
> > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > 100.1 Since this request waits for responses from brokers, should
> > we
> > > > add
> > > > > a
> > > > > > timeout in the request (like createTopicRequest)?
> > > > > > 100.2 The response schema is a bit weird. Typically, the response
> > > just
> > > > > > shows an error code and an error message, instead of echoing the
> > > > request.
> > > > > > 100.3 Should we add a separate request to list/describe the
> > existing
> > > > > > features?
> > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request.
> > For
> > > > > > DELETE, the version field doesn't make sense. So, I guess the
> > broker
> > > > just
> > > > > > ignores this? An alternative way is to have a separate
> > > > > > DeleteFeaturesRequest
> > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > increasing
> > > > > > version of the metadata for finalized features." I am wondering
> why
> > > the
> > > > > > ordering is important?
> > > > > > 100.6 Could you specify the required ACL for this new request?
> > > > > >
> > > > > > 101. For the broker registration ZK node, should we bump up the
> > > version
> > > > > in
> > > > > > the json?
> > > > > >
> > > > > > 102. For the /features ZK node, not sure if we need the epoch
> > field.
> > > > Each
> > > > > > ZK node has an internal version field that is incremented on
> every
> > > > > update.
> > > > > >
> > > > > > 103. "Enabling the actual semantics of a feature version
> > cluster-wide
> > > > is
> > > > > > left to the discretion of the logic implementing the feature (ex:
> > can
> > > > be
> > > > > > done via dynamic broker config)." Does that mean the broker
> > > > registration
> > > > > ZK
> > > > > > node will be updated dynamically when this happens?
> > > > > >
> > > > > > 104. UpdateMetadataRequest
> > > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > > included
> > > > > > in the request. My understanding is that it's only included if
> (1)
> > > > there
> > > > > is
> > > > > > a change to the finalized feature; (2) broker restart; (3)
> > controller
> > > > > > failover.
> > > > > > 104.2 The new fields have the following versions. Why are the
> > > versions
> > > > 3+
> > > > > > when the top version is bumped to 6?
> > > > > >       "fields":  [
> > > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > > >           "about": "The name of the feature."},
> > > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > > >           "about": "The finalized version for the feature."}
> > > > > >       ]
> > > > > >
> > > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> > it's
> > > > > better
> > > > > > to use enable/disable?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > > kprakasam@confluent.io
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Boyang,
> > > > > > >
> > > > > > > Thanks for the great feedback! I have updated the KIP based on
> > your
> > > > > > > feedback.
> > > > > > > Please find my response below for your comments, look for
> > sentences
> > > > > > > starting
> > > > > > > with "(Kowshik)" below.
> > > > > > >
> > > > > > >
> > > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > > traffic"
> > > > > > could
> > > > > > > be
> > > > > > > > converted as "When is it safe for the brokers to start
> serving
> > > new
> > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> earlier
> > > in
> > > > > the
> > > > > > > > context.
> > > > > > >
> > > > > > > (Kowshik): Great point! Done.
> > > > > > >
> > > > > > > > 2. In the *Explanation *section, the metadata version number
> > part
> > > > > > seems a
> > > > > > > > bit blurred. Could you point a reference to later section
> that
> > we
> > > > > going
> > > > > > > to
> > > > > > > > store it in Zookeeper and update it every time when there is
> a
> > > > > feature
> > > > > > > > change?
> > > > > > >
> > > > > > > (Kowshik): Great point! Done. I've added a reference in the
> KIP.
> > > > > > >
> > > > > > >
> > > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > > KIP,
> > > > > for
> > > > > > > > features such as group coordinator semantics, there is no
> legal
> > > > > > scenario
> > > > > > > to
> > > > > > > > perform a downgrade at all. So having downgrade door open is
> > > pretty
> > > > > > > > error-prone as human faults happen all the time. I'm assuming
> > as
> > > > new
> > > > > > > > features are implemented, it's not very hard to add a flag
> > during
> > > > > > feature
> > > > > > > > creation to indicate whether this feature is "downgradable".
> > > Could
> > > > > you
> > > > > > > > explain a bit more on the extra engineering effort for
> shipping
> > > > this
> > > > > > KIP
> > > > > > > > with downgrade protection in place?
> > > > > > >
> > > > > > > (Kowshik): Great point! I'd agree and disagree here. While I
> > agree
> > > > that
> > > > > > > accidental
> > > > > > > downgrades can cause problems, I also think sometimes
> downgrades
> > > > should
> > > > > > > be allowed for emergency reasons (not all downgrades cause
> > issues).
> > > > > > > It is just subjective to the feature being downgraded.
> > > > > > >
> > > > > > > To be more strict about feature version downgrades, I have
> > modified
> > > > the
> > > > > > KIP
> > > > > > > proposing that we mandate a `--force-downgrade` flag be used in
> > the
> > > > > > > UPDATE_FEATURES api
> > > > > > > and the tooling, whenever the human is downgrading a finalized
> > > > feature
> > > > > > > version.
> > > > > > > Hopefully this should cover the requirement, until we find the
> > need
> > > > for
> > > > > > > advanced downgrade support.
> > > > > > >
> > > > > > > > 4. "Each broker’s supported dictionary of feature versions
> will
> > > be
> > > > > > > defined
> > > > > > > > in the broker code." So this means in order to restrict a
> > certain
> > > > > > > feature,
> > > > > > > > we need to start the broker first and then send a feature
> > gating
> > > > > > request
> > > > > > > > immediately, which introduces a time gap and the
> > > intended-to-close
> > > > > > > feature
> > > > > > > > could actually serve request during this phase. Do you think
> we
> > > > > should
> > > > > > > also
> > > > > > > > support configurations as well so that admin user could
> freely
> > > roll
> > > > > up
> > > > > > a
> > > > > > > > cluster with all nodes complying the same feature gating,
> > without
> > > > > > > worrying
> > > > > > > > about the turnaround time to propagate the message only after
> > the
> > > > > > cluster
> > > > > > > > starts up?

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Jun,

I have updated the KIP for item 111.
I'm in the process of addressing 100.6, and will provide an update soon.
I think item 110 is still under discussion, given that we now provide a way to
finalize all features to their latest version levels. In any case, please let
us know your thoughts in response to Colin's comments on this topic.

> 111. To put this in context, when we had IBP, the default value is the
> current released version. So, if you are a brand new user, you don't need
> to configure IBP and all new features will be immediately available in the
> new cluster. If you are upgrading from an old version, you do need to
> understand and configure IBP. I see a similar pattern here for
> features. From the ease of use perspective, ideally, we shouldn't require a
> new user to have an extra step such as running a bootstrap script unless
> it's truly necessary. If someone has a special need (all the cases you
> mentioned seem special cases?), they can configure a mode such that
> features are enabled/disabled manually.

(Kowshik): That makes sense, thanks for the idea! Sorry if I didn't understand
this need earlier. I have updated the KIP with the approach that whenever the
'/features' node is absent, the controller by default will bootstrap the node
to contain the latest feature levels. Here is the new section in the KIP
describing this behavior:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
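
For illustration only, here is a minimal sketch of that bootstrap rule (the
class, method and the second feature name are hypothetical, not the actual
controller code; 'group_coordinator' with max version 10 is just the example
used elsewhere in this thread):

    import java.util.HashMap;
    import java.util.Map;

    class FeatureBootstrapSketch {
        // Supported features are defined in the broker/controller code:
        // feature name -> latest (max) supported version.
        static final Map<String, Long> SUPPORTED_FEATURES =
            Map.of("group_coordinator", 10L, "transaction_coordinator", 5L);

        // Called when the controller finds no '/features' ZK node: finalize
        // every known feature at its latest supported version level.
        // Otherwise, leave the existing finalized features untouched.
        static Map<String, Long> bootstrapFinalizedFeatures(
                boolean featuresZNodeExists,
                Map<String, Long> existingFinalized) {
            if (featuresZNodeExists) {
                return existingFinalized;
            }
            return new HashMap<>(SUPPORTED_FEATURES);
        }
    }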

Next, as I explained in my response to Colin's suggestions, we are now
providing a `--finalize-latest-features` flag with the tooling. This lets the
sysadmin finalize all features known to the controller to their latest version
levels. Please look at this section (point #3 and the tooling example later):
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport


Do you feel this addresses your comment/concern?


Cheers,
Kowshik

On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Kowshik,
>
> Thanks for the reply. A few more replies below.
>
> 100.6 You can look for the sentence "This operation requires ALTER on
> CLUSTER." in KIP-455. Also, you can check its usage in
> KafkaApis.authorize().
>
> 110. From the external client/tooling perspective, it's more natural to use
> the release version for features. If we can use the same release version
> for internal representation, it seems simpler (easier to understand, no
> mapping overhead, etc). Is there a benefit with separate external and
> internal versioning schemes?
>
> 111. To put this in context, when we had IBP, the default value is the
> current released version. So, if you are a brand new user, you don't need
> to configure IBP and all new features will be immediately available in the
> new cluster. If you are upgrading from an old version, you do need to
> understand and configure IBP. I see a similar pattern here for
> features. From the ease of use perspective, ideally, we shouldn't require a
> new user to have an extra step such as running a bootstrap script unless
> it's truly necessary. If someone has a special need (all the cases you
> mentioned seem special cases?), they can configure a mode such that
> features are enabled/disabled manually.
>
> Jun
>
> On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hi Jun,
> >
> > Thanks for the feedback and suggestions. Please find my response below.
> >
> > > 100.6 For every new request, the admin needs to control who is allowed
> to
> > > issue that request if security is enabled. So, we need to assign the
> new
> > > request a ResourceType and possible AclOperations. See
> > >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > as an example.
> >
> > (Kowshik): I don't see any reference to the words ResourceType or
> > AclOperations in the KIP. Please let me know how I can use the KIP you
> > linked to figure out how to set up the appropriate ResourceType and/or
> > ClusterOperation?
> >
> > > 105. If we change delete to disable, it's better to do this
> consistently
> > in
> > > request protocol and admin api as well.
> >
> > (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> > feature.
> > I've just changed the KIP to use 'delete'. I don't have a strong
> > preference.
> >
> > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> possible
> > > for new features to be included in minor releases too. Should we make
> the
> > > feature versioning match the release versioning?
> >
> > (Kowshik): The release version can be mapped to a set of feature
> versions,
> > and this can be done, for example in the tool (or even external to the
> > tool).
> > Can you please clarify what I'm missing?
> >
> > > 111. "During regular operations, the data in the ZK node can be mutated
> > > only via a specific admin API served only by the controller." I am
> > > wondering why can't the controller auto finalize a feature version
> after
> > > all brokers are upgraded? For new users who download the latest version
> > to
> > > build a new cluster, it's inconvenient for them to have to manually
> > enable
> > > each feature.
> >
> > (Kowshik): I agree that there is a trade-off here, but it will help
> > to decide whether the automation can be thought through in the future
> > in a follow up KIP, or right now in this KIP. We may invest
> > in automation, but we have to decide whether we should do it
> > now or later.
> >
> > For the inconvenience that you mentioned, do you think the problem can be
> > overcome by asking the cluster operator to run a bootstrap script when
> > he/she knows that a specific AK release has been almost completely deployed
> > in a cluster for the first time? The idea is that the bootstrap script will
> > know how to map a specific AK release to finalized feature versions, and
> > run the `kafka-features.sh` tool appropriately against the cluster.
> >
> > Now, coming back to your automation proposal/question.
> > I do see the value of automated feature version finalization, but I also
> > see
> > that this will open up several questions and some risks, as explained
> > below.
> > The answers to these depend on the definition of the automation we choose
> > to build, and how well does it fit into a kafka deployment.
> > Basically, it can be unsafe for the controller to finalize feature
> version
> > upgrades automatically, without learning about the intent of the cluster
> > operator.
> > 1. We would sometimes want to lock feature versions only when we have
> > externally verified
> > the stability of the broker binary.
> > 2. Sometimes only the cluster operator knows that a cluster upgrade is
> > complete,
> > and new brokers are highly unlikely to join the cluster.
> > 3. Only the cluster operator knows that the intent is to deploy the same
> > version
> > of the new broker release across the entire cluster (i.e. the latest
> > downloaded version).
> > 4. For downgrades, it appears the controller still needs some external
> > input
> > (such as the proposed tool) to finalize a feature version downgrade.
> >
> > If we have automation, that automation can end up failing in some of the
> > cases above. Then, we need a way to declare that the cluster is "not ready"
> > if the controller cannot automatically finalize some basic required feature
> > version upgrades across the cluster. We need to make the cluster operator
> > aware in such a scenario (raise an alert or similar).
> >
> > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead
> of
> > 48.
> >
> > (Kowshik): Done.
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Kowshik,
> > >
> > > Thanks for the reply. A few more comments below.
> > >
> > > 100.6 For every new request, the admin needs to control who is allowed
> to
> > > issue that request if security is enabled. So, we need to assign the
> new
> > > request a ResourceType and possible AclOperations. See
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > as
> > > an example.
> > >
> > > 105. If we change delete to disable, it's better to do this
> consistently
> > in
> > > request protocol and admin api as well.
> > >
> > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> possible
> > > for new features to be included in minor releases too. Should we make
> the
> > > feature versioning match the release versioning?
> > >
> > > 111. "During regular operations, the data in the ZK node can be mutated
> > > only via a specific admin API served only by the controller." I am
> > > wondering why can't the controller auto finalize a feature version
> after
> > > all brokers are upgraded? For new users who download the latest version
> > to
> > > build a new cluster, it's inconvenient for them to have to manually
> > enable
> > > each feature.
> > >
> > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead
> of
> > > 48.
> > >
> > > Jun
> > >
> > >
> > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> kprakasam@confluent.io>
> > > wrote:
> > >
> > > > Hey Jun,
> > > >
> > > > Thanks a lot for the great feedback! Please note that the design
> > > > has changed a little bit on the KIP, and we now propagate the
> finalized
> > > > features metadata only via ZK watches (instead of
> UpdateMetadataRequest
> > > > from the controller).
> > > >
> > > > Please find below my response to your questions/feedback, with the
> > prefix
> > > > "(Kowshik):".
> > > >
> > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > 100.1 Since this request waits for responses from brokers, should
> we
> > > add
> > > > a
> > > > > timeout in the request (like createTopicRequest)?
> > > >
> > > > (Kowshik): Great point! Done. I have added a timeout field. Note: we
> no
> > > > longer
> > > > wait for responses from brokers, since the design has been changed so
> > > that
> > > > the
> > > > features information is propagated via ZK. Nevertheless, it is right
> to
> > > > have a timeout
> > > > for the request.
> > > >
> > > > > 100.2 The response schema is a bit weird. Typically, the response
> > just
> > > > > shows an error code and an error message, instead of echoing the
> > > request.
> > > >
> > > > (Kowshik): Great point! Yeah, I have modified it to just return an
> > error
> > > > code and a message.
> > > > Previously it was not echoing the "request", rather it was returning
> > the
> > > > latest set of
> > > > cluster-wide finalized features (after applying the updates). But you
> > are
> > > > right,
> > > > the additional info is not required, so I have removed it from the
> > > response
> > > > schema.
> > > >
> > > > > 100.3 Should we add a separate request to list/describe the
> existing
> > > > > features?
> > > >
> > > > (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
> > > > Admin API, which under the covers uses the ApiVersionsRequest to
> > > > list/describe the existing features. Please read the 'Tooling support'
> > > > section.
> > > >
> > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request.
> For
> > > > > DELETE, the version field doesn't make sense. So, I guess the
> broker
> > > just
> > > > > ignores this? An alternative way is to have a separate
> > > > DeleteFeaturesRequest
> > > >
> > > > (Kowshik): Great point! I have modified the KIP now to have 2
> separate
> > > > controller APIs
> > > > serving these different purposes:
> > > > 1. updateFeatures
> > > > 2. deleteFeatures
> > > >
> > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > increasing
> > > > > version of the metadata for finalized features." I am wondering why
> > the
> > > > > ordering is important?
> > > >
> > > > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > > > version), and
> > > > it is just the ZK node version. Basically, this is the epoch for the
> > > > cluster-wide
> > > > finalized feature version metadata. This metadata is served to
> clients
> > > via
> > > > the
> > > > ApiVersionsResponse (for reads). We propagate updates from the
> > > '/features'
> > > > ZK node
> > > > to all brokers, via ZK watches setup by each broker on the
> '/features'
> > > > node.
> > > >
> > > > Now here is why the ordering is important:
> > > > ZK watches don't propagate at the same time. As a result, the
> > > > ApiVersionsResponse
> > > > is eventually consistent across brokers. This can introduce cases
> > > > where clients see an older lower epoch of the features metadata,
> after
> > a
> > > > more recent
> > > > higher epoch was returned at a previous point in time. We expect
> > clients
> > > > to always employ the rule that the latest received higher epoch of
> > > metadata
> > > > always trumps an older smaller epoch. Those clients that are external
> > to
> > > > Kafka should strongly consider discovering the latest metadata once
> > > during
> > > > startup from the brokers, and if required refresh the metadata
> > > periodically
> > > > (to get the latest metadata).
> > > >
> > > > > 100.6 Could you specify the required ACL for this new request?
> > > >
> > > > (Kowshik): What is ACL, and how could I find out which one to
> specify?
> > > > Please could you provide me some pointers? I'll be glad to update the
> > > > KIP once I know the next steps.
> > > >
> > > > > 101. For the broker registration ZK node, should we bump up the
> > version
> > > > in
> > > > the json?
> > > >
> > > > (Kowshik): Great point! Done. I've increased the version in the
> broker
> > > json
> > > > by 1.
> > > >
> > > > > 102. For the /features ZK node, not sure if we need the epoch
> field.
> > > Each
> > > > > ZK node has an internal version field that is incremented on every
> > > > update.
> > > >
> > > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> > instead
> > > of
> > > > explicitly
> > > > incremented epoch.
> > > >
> > > > > 103. "Enabling the actual semantics of a feature version
> cluster-wide
> > > is
> > > > > left to the discretion of the logic implementing the feature (ex:
> can
> > > be
> > > > > done via dynamic broker config)." Does that mean the broker
> > > registration
> > > > ZK
> > > > > node will be updated dynamically when this happens?
> > > >
> > > > (Kowshik): Not really. The text was just conveying that a broker
> could
> > > > "know" of
> > > > a new feature version, but it does not mean the broker should have
> also
> > > > activated the effects of the feature version. Knowing vs activation
> > are 2
> > > > separate things,
> > > > and the latter can be achieved by dynamic config. I have reworded the
> > > text
> > > > to
> > > > make this clear to the reader.
> > > >
> > > >
> > > > > 104. UpdateMetadataRequest
> > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > included
> > > > > in the request. My understanding is that it's only included if (1)
> > > there
> > > > is
> > > > > a change to the finalized feature; (2) broker restart; (3)
> controller
> > > > > failover.
> > > > > 104.2 The new fields have the following versions. Why are the
> > versions
> > > 3+
> > > > > when the top version is bumped to 6?
> > > > >       "fields":  [
> > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > >           "about": "The name of the feature."},
> > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > >           "about": "The finalized version for the feature."}
> > > > >       ]
> > > >
> > > > (Kowshik): With the new improved design, we have completely
> eliminated
> > > the
> > > > need to
> > > > use UpdateMetadataRequest. This is because we now rely on ZK to
> deliver
> > > the
> > > > notifications for changes to the '/features' ZK node.
> > > >
> > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> it's
> > > > better
> > > > > to use enable/disable?
> > > >
> > > > (Kowshik): For delete, yes, I have changed it so that we instead call it
> > > > 'disable'. However, 'update' can now refer to either an upgrade or a
> > > > forced downgrade. Therefore, I have left it as is, just calling it
> > > > 'update'.
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Guozhang,

Thanks for the explanation! This is a very good point. I have updated the KIP
to incorporate the proposed idea. We now maintain and serve the MAX as well as
the MIN version levels of finalized features, so the client will learn both of
these values from the ApiVersionsResponse. This serves as a solution to the
problem that you explained earlier.
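
To make the client-side benefit concrete, here is a minimal sketch
(hypothetical names, not actual Kafka client code) of how a client could
combine its own supported range with the finalized MIN/MAX version levels it
learns from the ApiVersionsResponse:

    class FeatureVersionPicker {
        // Returns the highest feature version the client can safely use, or -1
        // if the client's supported range does not overlap the cluster's
        // finalized [minVersionLevel, maxVersionLevel] range.
        static long pickVersion(long clientMinSupported, long clientMaxSupported,
                                long finalizedMinLevel, long finalizedMaxLevel) {
            long lo = Math.max(clientMinSupported, finalizedMinLevel);
            long hi = Math.min(clientMaxSupported, finalizedMaxLevel);
            return (lo <= hi) ? hi : -1L;
        }
    }

For instance, with a finalized range of [1, 10] and a client that only
implements up to version 7, the client can now safely use version 7 because it
knows the cluster-wide minimum is 1 — which was exactly the ambiguity in the
scenario you described.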

One important point to note: we only allow the finalized feature MAX version
level to be increased or decreased dynamically via the controller API. By
contrast, the MIN version level cannot be mutated via the controller API. This
is because the MIN version level is usually increased only to indicate the
intent to stop supporting certain feature versions, and we would usually
deprecate feature versions during broker releases, after prior announcements.
Therefore, the facility to mutate the MIN version level need not be made
available through the controller API to the cluster operator.

Instead, it is sufficient for such changes to be made directly by the
controller, i.e. during a particular Kafka release we would change the
controller code to mutate the '/features' ZK node, raising the MIN version
level of one or more finalized features (this will be a planned change, as
determined by Kafka developers). Then, as that broker release gets rolled out
to a cluster, those feature versions become permanently deprecated.
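
As a rough sketch of this deprecation flow (names and values are hypothetical,
not the actual controller code), a later broker release could simply ship
higher MIN version levels that the controller applies whenever it rewrites the
'/features' node:

    import java.util.Map;

    class FeatureDeprecationSketch {
        // Shipped as part of a broker release; decided in advance by Kafka
        // developers.
        static final Map<String, Long> MIN_VERSION_LEVELS =
            Map.of("group_coordinator", 2L); // versions below 2 are deprecated

        // Applied by the controller when it writes the '/features' ZK node.
        static long effectiveMinLevel(String feature, long currentMinLevel) {
            return Math.max(currentMinLevel,
                            MIN_VERSION_LEVELS.getOrDefault(feature, currentMinLevel));
        }
    }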

Here are links to the specific sub-sections with the changes including
MIN/MAX version levels:

Goals:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Goals

Non-goals (see point #2):
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Non-goals

Feature version deprecation:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation

Admin API changes:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdminAPIchanges


Cheers,
Kowshik



On Mon, Apr 6, 2020 at 3:28 PM Guozhang Wang <wa...@gmail.com> wrote:

> Hello Kowshik,
>
> For 2) above, my motivation is more about flexibility on the client side
> rather than version deprecation: let's say a client talks to the cluster and
> learns that the cluster-wide version for a feature is X, while the client
> itself only knows how to execute the feature up to version Y (< X). At that
> moment the client has to give up leveraging the feature, since it is not sure
> whether all brokers actually support version Y. This is because version X is
> only guaranteed to be within the common overlapping range of all [low, high]
> ranges across brokers, where "low" is not always 0, so the client cannot
> safely assume that versions smaller than X are also supported on the cluster.
>
> If we assume that when the cluster-wide version is X, all versions smaller
> than X are guaranteed to be supported, then it means every broker's supported
> version range is like [0, high], which I think is not realistic?
>
>
> Guozhang
>
>
>
> On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Kowshik,
> >
> > Thanks for the reply. A few more replies below.
> >
> > 100.6 You can look for the sentence "This operation requires ALTER on
> > CLUSTER." in KIP-455 <https://issues.apache.org/jira/browse/KIP-455>.
> Also, you can check its usage in
> > KafkaApis.authorize().
> >
> > 110. From the external client/tooling perspective, it's more natural to
> use
> > the release version for features. If we can use the same release version
> > for internal representation, it seems simpler (easier to understand, no
> > mapping overhead, etc). Is there a benefit with separate external and
> > internal versioning schemes?
> >
> > 111. To put this in context, when we had IBP, the default value is the
> > current released version. So, if you are a brand new user, you don't need
> > to configure IBP and all new features will be immediately available in
> the
> > new cluster. If you are upgrading from an old version, you do need to
> > understand and configure IBP. I see a similar pattern here for
> > features. From the ease of use perspective, ideally, we shouldn't
> require a
> > new user to have an extra step such as running a bootstrap script unless
> > it's truly necessary. If someone has a special need (all the cases you
> > mentioned seem special cases?), they can configure a mode such that
> > features are enabled/disabled manually.
> >
> > Jun
> >
> > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kp...@confluent.io>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the feedback and suggestions. Please find my response below.
> > >
> > > > 100.6 For every new request, the admin needs to control who is
> allowed
> > to
> > > > issue that request if security is enabled. So, we need to assign the
> > new
> > > > request a ResourceType and possible AclOperations. See
> > > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > as an example.
> > >
> > > (Kowshik): I don't see any reference to the words ResourceType or
> > > AclOperations
> > > in the KIP. Please let me know how I can use the KIP that you linked to
> > > know how to
> > > setup the appropriate ResourceType and/or ClusterOperation?
> > >
> > > > 105. If we change delete to disable, it's better to do this
> > consistently
> > > in
> > > > request protocol and admin api as well.
> > >
> > > (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> > > feature.
> > > I've just changed the KIP to use 'delete'. I don't have a strong
> > > preference.
> > >
> > > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> > possible
> > > > for new features to be included in minor releases too. Should we make
> > the
> > > > feature versioning match the release versioning?
> > >
> > > (Kowshik): The release version can be mapped to a set of feature
> > versions,
> > > and this can be done, for example in the tool (or even external to the
> > > tool).
> > > Can you please clarify what I'm missing?
> > >
> > > > 111. "During regular operations, the data in the ZK node can be
> mutated
> > > > only via a specific admin API served only by the controller." I am
> > > > wondering why can't the controller auto finalize a feature version
> > after
> > > > all brokers are upgraded? For new users who download the latest
> version
> > > to
> > > > build a new cluster, it's inconvenient for them to have to manually
> > > enable
> > > > each feature.
> > >
> > > (Kowshik): I agree that there is a trade-off here, but it will help
> > > to decide whether the automation can be thought through in the future
> > > in a follow up KIP, or right now in this KIP. We may invest
> > > in automation, but we have to decide whether we should do it
> > > now or later.
> > >
> > > For the inconvenience that you mentioned, do you think it can be
> > > overcome by asking the cluster operator to run a bootstrap script when
> > > he/she knows that a specific AK release has been almost completely
> > > deployed in a cluster for the first time? The idea is that the bootstrap
> > > script will know how to map a specific AK release to finalized feature
> > > versions, and run the `kafka-features.sh` tool appropriately against
> > > the cluster.
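
A minimal sketch of the mapping such a bootstrap script could hold; the
release numbers, feature names, and versions below are purely illustrative
(the real mapping would be whatever the Kafka developers publish per release):

    import java.util.Map;

    public class ReleaseFeatureMapping {
        // Hypothetical map from an AK release to the feature max versions to finalize.
        static final Map<String, Map<String, Long>> RELEASE_TO_FINALIZED = Map.of(
            "2.6.0", Map.of("group_coordinator", 1L),
            "2.7.0", Map.of("group_coordinator", 2L, "transaction_coordinator", 1L));

        public static void main(String[] args) {
            String release = args.length > 0 ? args[0] : "2.7.0";
            // A real script would feed these into the kafka-features.sh tool.
            RELEASE_TO_FINALIZED.getOrDefault(release, Map.of())
                .forEach((feature, maxVersion) ->
                    System.out.println("finalize " + feature + " at max version " + maxVersion));
        }
    }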
> > >
> > > Now, coming back to your automation proposal/question.
> > > I do see the value of automated feature version finalization, but I
> also
> > > see
> > > that this will open up several questions and some risks, as explained
> > > below.
> > > The answers to these depend on the definition of the automation we
> choose
> > > to build, and how well it fits into a Kafka deployment.
> > > Basically, it can be unsafe for the controller to finalize feature
> > version
> > > upgrades automatically, without learning about the intent of the
> cluster
> > > operator.
> > > 1. We would sometimes want to lock feature versions only when we have
> > > externally verified
> > > the stability of the broker binary.
> > > 2. Sometimes only the cluster operator knows that a cluster upgrade is
> > > complete,
> > > and new brokers are highly unlikely to join the cluster.
> > > 3. Only the cluster operator knows that the intent is to deploy the
> same
> > > version
> > > of the new broker release across the entire cluster (i.e. the latest
> > > downloaded version).
> > > 4. For downgrades, it appears the controller still needs some external
> > > input
> > > (such as the proposed tool) to finalize a feature version downgrade.
> > >
> > > If we have automation, that automation can end up failing in some of
> the
> > > cases
> > > above. Then, we need a way to declare that the cluster is "not ready"
> if
> > > the
> > > controller cannot automatically finalize some basic required feature
> > > version
> > > upgrades across the cluster. We need to make the cluster operator aware
> > in
> > > such a scenario (raise an alert or similar).
> > >
> > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead
> > of
> > > 48.
> > >
> > > (Kowshik): Done.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Kowshik,
> > > >
> > > > Thanks for the reply. A few more comments below.
> > > >
> > > > 100.6 For every new request, the admin needs to control who is
> allowed
> > to
> > > > issue that request if security is enabled. So, we need to assign the
> > new
> > > > request a ResourceType and possible AclOperations. See
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > as
> > > > an example.
> > > >
> > > > 105. If we change delete to disable, it's better to do this
> > consistently
> > > in
> > > > request protocol and admin api as well.
> > > >
> > > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> > possible
> > > > for new features to be included in minor releases too. Should we make
> > the
> > > > feature versioning match the release versioning?
> > > >
> > > > 111. "During regular operations, the data in the ZK node can be
> mutated
> > > > only via a specific admin API served only by the controller." I am
> > > > wondering why can't the controller auto finalize a feature version
> > after
> > > > all brokers are upgraded? For new users who download the latest
> version
> > > to
> > > > build a new cluster, it's inconvenient for them to have to manually
> > > enable
> > > > each feature.
> > > >
> > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead
> > of
> > > > 48.
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> > kprakasam@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Jun,
> > > > >
> > > > > Thanks a lot for the great feedback! Please note that the design
> > > > > has changed a little bit on the KIP, and we now propagate the
> > finalized
> > > > > features metadata only via ZK watches (instead of
> > UpdateMetadataRequest
> > > > > from the controller).
> > > > >
> > > > > Please find below my response to your questions/feedback, with the
> > > prefix
> > > > > "(Kowshik):".
> > > > >
> > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > 100.1 Since this request waits for responses from brokers, should
> > we
> > > > add
> > > > > a
> > > > > > timeout in the request (like createTopicRequest)?
> > > > >
> > > > > (Kowshik): Great point! Done. I have added a timeout field. Note:
> we
> > no
> > > > > longer
> > > > > wait for responses from brokers, since the design has been changed
> so
> > > > that
> > > > > the
> > > > > features information is propagated via ZK. Nevertheless, it is
> right
> > to
> > > > > have a timeout
> > > > > for the request.
> > > > >
> > > > > > 100.2 The response schema is a bit weird. Typically, the response
> > > just
> > > > > > shows an error code and an error message, instead of echoing the
> > > > request.
> > > > >
> > > > > (Kowshik): Great point! Yeah, I have modified it to just return an
> > > error
> > > > > code and a message.
> > > > > Previously it was not echoing the "request", rather it was
> returning
> > > the
> > > > > latest set of
> > > > > cluster-wide finalized features (after applying the updates). But
> you
> > > are
> > > > > right,
> > > > > the additional info is not required, so I have removed it from the
> > > > response
> > > > > schema.
> > > > >
> > > > > > 100.3 Should we add a separate request to list/describe the
> > existing
> > > > > > features?
> > > > >
> > > > > (Kowshik): This is already present in the KIP via the
> > > 'DescribeFeatures'
> > > > > Admin API,
> > > > > which, under the covers, uses the ApiVersionsRequest to
> list/describe
> > > the
> > > > > existing features. Please read the 'Tooling support' section.
> > > > >
> > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request.
> > For
> > > > > > DELETE, the version field doesn't make sense. So, I guess the
> > broker
> > > > just
> > > > > > ignores this? An alternative way is to have a separate
> > > > > DeleteFeaturesRequest
> > > > >
> > > > > (Kowshik): Great point! I have modified the KIP now to have 2
> > separate
> > > > > controller APIs
> > > > > serving these different purposes:
> > > > > 1. updateFeatures
> > > > > 2. deleteFeatures
> > > > >
> > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > increasing
> > > > > > version of the metadata for finalized features." I am wondering
> why
> > > the
> > > > > > ordering is important?
> > > > >
> > > > > (Kowshik): In the latest KIP write-up, it is called epoch (instead
> of
> > > > > version), and
> > > > > it is just the ZK node version. Basically, this is the epoch for
> the
> > > > > cluster-wide
> > > > > finalized feature version metadata. This metadata is served to
> > clients
> > > > via
> > > > > the
> > > > > ApiVersionsResponse (for reads). We propagate updates from the
> > > > '/features'
> > > > > ZK node
> > > > > to all brokers, via ZK watches setup by each broker on the
> > '/features'
> > > > > node.
> > > > >
> > > > > Now here is why the ordering is important:
> > > > > ZK watches don't propagate at the same time. As a result, the
> > > > > ApiVersionsResponse
> > > > > is eventually consistent across brokers. This can introduce cases
> > > > > where clients see an older lower epoch of the features metadata,
> > after
> > > a
> > > > > more recent
> > > > > higher epoch was returned at a previous point in time. We expect
> > > clients
> > > > > to always employ the rule that the latest received higher epoch of
> > > > metadata
> > > > > always trumps an older smaller epoch. Those clients that are
> external
> > > to
> > > > > Kafka should strongly consider discovering the latest metadata once
> > > > during
> > > > > startup from the brokers, and if required refresh the metadata
> > > > periodically
> > > > > (to get the latest metadata).
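
A minimal sketch of that client-side rule, with hypothetical class and
method names (this is not part of the KIP's public API):

    import java.util.Map;

    public class FinalizedFeaturesCache {
        private long epoch = -1L;
        private Map<String, Long> finalizedMaxVersions = Map.of();

        // Accept incoming metadata only if its epoch is strictly higher than the
        // epoch already held; stale responses from lagging brokers are ignored.
        public synchronized boolean maybeUpdate(long incomingEpoch, Map<String, Long> incoming) {
            if (incomingEpoch <= epoch) {
                return false;
            }
            epoch = incomingEpoch;
            finalizedMaxVersions = incoming;
            return true;
        }

        public synchronized long epoch() {
            return epoch;
        }
    }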
> > > > >
> > > > > > 100.6 Could you specify the required ACL for this new request?
> > > > >
> > > > > (Kowshik): What is ACL, and how could I find out which one to
> > specify?
> > > > > Please could you provide me some pointers? I'll be glad to update
> the
> > > > > KIP once I know the next steps.
> > > > >
> > > > > > 101. For the broker registration ZK node, should we bump up the
> > > version
> > > > > in
> > > > > the json?
> > > > >
> > > > > (Kowshik): Great point! Done. I've increased the version in the
> > broker
> > > > json
> > > > > by 1.
> > > > >
> > > > > > 102. For the /features ZK node, not sure if we need the epoch
> > field.
> > > > Each
> > > > > > ZK node has an internal version field that is incremented on
> every
> > > > > update.
> > > > >
> > > > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> > > instead
> > > > of
> > > > > explicitly
> > > > > incremented epoch.
> > > > >
> > > > > > 103. "Enabling the actual semantics of a feature version
> > cluster-wide
> > > > is
> > > > > > left to the discretion of the logic implementing the feature (ex:
> > can
> > > > be
> > > > > > done via dynamic broker config)." Does that mean the broker
> > > > registration
> > > > > ZK
> > > > > > node will be updated dynamically when this happens?
> > > > >
> > > > > (Kowshik): Not really. The text was just conveying that a broker
> > could
> > > > > "know" of
> > > > > a new feature version, but it does not mean the broker should have
> > also
> > > > > activated the effects of the feature version. Knowing vs activation
> > > are 2
> > > > > separate things,
> > > > > and the latter can be achieved by dynamic config. I have reworded
> the
> > > > text
> > > > > to
> > > > > make this clear to the reader.
> > > > >
> > > > >
> > > > > > 104. UpdateMetadataRequest
> > > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > > included
> > > > > > in the request. My understanding is that it's only included if
> (1)
> > > > there
> > > > > is
> > > > > > a change to the finalized feature; (2) broker restart; (3)
> > controller
> > > > > > failover.
> > > > > > 104.2 The new fields have the following versions. Why are the
> > > versions
> > > > 3+
> > > > > > when the top version is bumped to 6?
> > > > > >       "fields":  [
> > > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > > >           "about": "The name of the feature."},
> > > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > > >           "about": "The finalized version for the feature."}
> > > > > >       ]
> > > > >
> > > > > (Kowshik): With the new improved design, we have completely
> > eliminated
> > > > the
> > > > > need to
> > > > > use UpdateMetadataRequest. This is because we now rely on ZK to
> > deliver
> > > > the
> > > > > notifications for changes to the '/features' ZK node.
> > > > >
> > > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> > it's
> > > > > better
> > > > > > to use enable/disable?
> > > > >
> > > > > (Kowshik): For delete, yes, I have changed it so that we instead
> call
> > > it
> > > > > 'disable'.
> > > > > However for 'update', it can now also refer to either an upgrade
> or a
> > > > > forced downgrade.
> > > > > Therefore, I have left it the way it is, just calling it as just
> > > > 'update'.
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Kowshik,
> > > > > >
> > > > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > > > >
> > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > 100.1 Since this request waits for responses from brokers, should
> > we
> > > > add
> > > > > a
> > > > > > timeout in the request (like createTopicRequest)?
> > > > > > 100.2 The response schema is a bit weird. Typically, the response
> > > just
> > > > > > shows an error code and an error message, instead of echoing the
> > > > request.
> > > > > > 100.3 Should we add a separate request to list/describe the
> > existing
> > > > > > features?
> > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request.
> > For
> > > > > > DELETE, the version field doesn't make sense. So, I guess the
> > broker
> > > > just
> > > > > > ignores this? An alternative way is to have a separate
> > > > > > DeleteFeaturesRequest
> > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > increasing
> > > > > > version of the metadata for finalized features." I am wondering
> why
> > > the
> > > > > > ordering is important?
> > > > > > 100.6 Could you specify the required ACL for this new request?
> > > > > >
> > > > > > 101. For the broker registration ZK node, should we bump up the
> > > version
> > > > > in
> > > > > > the json?
> > > > > >
> > > > > > 102. For the /features ZK node, not sure if we need the epoch
> > field.
> > > > Each
> > > > > > ZK node has an internal version field that is incremented on
> every
> > > > > update.
> > > > > >
> > > > > > 103. "Enabling the actual semantics of a feature version
> > cluster-wide
> > > > is
> > > > > > left to the discretion of the logic implementing the feature (ex:
> > can
> > > > be
> > > > > > done via dynamic broker config)." Does that mean the broker
> > > > registration
> > > > > ZK
> > > > > > node will be updated dynamically when this happens?
> > > > > >
> > > > > > 104. UpdateMetadataRequest
> > > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > > included
> > > > > > in the request. My understanding is that it's only included if
> (1)
> > > > there
> > > > > is
> > > > > > a change to the finalized feature; (2) broker restart; (3)
> > controller
> > > > > > failover.
> > > > > > 104.2 The new fields have the following versions. Why are the
> > > versions
> > > > 3+
> > > > > > when the top version is bumped to 6?
> > > > > >       "fields":  [
> > > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > > >           "about": "The name of the feature."},
> > > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > > >           "about": "The finalized version for the feature."}
> > > > > >       ]
> > > > > >
> > > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> > it's
> > > > > better
> > > > > > to use enable/disable?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > > kprakasam@confluent.io
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Boyang,
> > > > > > >
> > > > > > > Thanks for the great feedback! I have updated the KIP based on
> > your
> > > > > > > feedback.
> > > > > > > Please find my response below for your comments, look for
> > sentences
> > > > > > > starting
> > > > > > > with "(Kowshik)" below.
> > > > > > >
> > > > > > >
> > > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > > traffic"
> > > > > > could
> > > > > > > be
> > > > > > > > converted as "When is it safe for the brokers to start
> serving
> > > new
> > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> earlier
> > > in
> > > > > the
> > > > > > > > context.
> > > > > > >
> > > > > > > (Kowshik): Great point! Done.
> > > > > > >
> > > > > > > > 2. In the *Explanation *section, the metadata version number
> > part
> > > > > > seems a
> > > > > > > > bit blurred. Could you point a reference to later section
> that
> > we
> > > > > going
> > > > > > > to
> > > > > > > > store it in Zookeeper and update it every time when there is
> a
> > > > > feature
> > > > > > > > change?
> > > > > > >
> > > > > > > (Kowshik): Great point! Done. I've added a reference in the
> KIP.
> > > > > > >
> > > > > > >
> > > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > > KIP,
> > > > > for
> > > > > > > > features such as group coordinator semantics, there is no
> legal
> > > > > > scenario
> > > > > > > to
> > > > > > > > perform a downgrade at all. So having downgrade door open is
> > > pretty
> > > > > > > > error-prone as human faults happen all the time. I'm assuming
> > as
> > > > new
> > > > > > > > features are implemented, it's not very hard to add a flag
> > during
> > > > > > feature
> > > > > > > > creation to indicate whether this feature is "downgradable".
> > > Could
> > > > > you
> > > > > > > > explain a bit more on the extra engineering effort for
> shipping
> > > > this
> > > > > > KIP
> > > > > > > > with downgrade protection in place?
> > > > > > >
> > > > > > > (Kowshik): Great point! I'd agree and disagree here. While I
> > agree
> > > > that
> > > > > > > accidental
> > > > > > > downgrades can cause problems, I also think sometimes
> downgrades
> > > > should
> > > > > > > be allowed for emergency reasons (not all downgrades cause
> > issues).
> > > > > > > It is just subjective to the feature being downgraded.
> > > > > > >
> > > > > > > To be more strict about feature version downgrades, I have
> > modified
> > > > the
> > > > > > KIP
> > > > > > > proposing that we mandate a `--force-downgrade` flag be used in
> > the
> > > > > > > UPDATE_FEATURES api
> > > > > > > and the tooling, whenever the human is downgrading a finalized
> > > > feature
> > > > > > > version.
> > > > > > > Hopefully this should cover the requirement, until we find the
> > need
> > > > for
> > > > > > > advanced downgrade support.
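
As an illustration only, the guard on the validation path could look roughly
like this (names below are made up; the actual flag and request wiring are
whatever the KIP specifies):

    public class DowngradeGuard {
        // Upgrades of a finalized max version pass through; a downgrade is rejected
        // unless the caller explicitly set the force-downgrade option.
        public static void validate(long currentMaxVersion, long requestedMaxVersion,
                                    boolean forceDowngrade) {
            if (requestedMaxVersion < currentMaxVersion && !forceDowngrade) {
                throw new IllegalArgumentException(
                    "Downgrading finalized max version from " + currentMaxVersion
                        + " to " + requestedMaxVersion + " requires the force-downgrade option");
            }
        }
    }
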
> > > > > > >
> > > > > > > > 4. "Each broker’s supported dictionary of feature versions
> will
> > > be
> > > > > > > defined
> > > > > > > > in the broker code." So this means in order to restrict a
> > certain
> > > > > > > feature,
> > > > > > > > we need to start the broker first and then send a feature
> > gating
> > > > > > request
> > > > > > > > immediately, which introduces a time gap and the
> > > intended-to-close
> > > > > > > feature
> > > > > > > > could actually serve request during this phase. Do you think
> we
> > > > > should
> > > > > > > also
> > > > > > > > support configurations as well so that admin user could
> freely
> > > roll
> > > > > up
> > > > > > a
> > > > > > > > cluster with all nodes complying the same feature gating,
> > without
> > > > > > > worrying
> > > > > > > > about the turnaround time to propagate the message only after
> > the
> > > > > > cluster
> > > > > > > > starts up?
> > > > > > >
> > > > > > > (Kowshik): This is a great point/question. One of the
> > expectations
> > > > out
> > > > > of
> > > > > > > this KIP, which is
> > > > > > > already followed in the broker, is the following.
> > > > > > >  - Imagine at time T1 the broker starts up and registers it’s
> > > > presence
> > > > > in
> > > > > > > ZK,
> > > > > > >    along with advertising it’s supported features.
> > > > > > >  - Imagine at a future time T2 the broker receives the
> > > > > > > UpdateMetadataRequest
> > > > > > >    from the controller, which contains the latest finalized
> > > features
> > > > as
> > > > > > > seen by
> > > > > > >    the controller. The broker validates this data against it’s
> > > > > supported
> > > > > > > features to
> > > > > > >    make sure there is no mismatch (it will shutdown if there is
> > an
> > > > > > > incompatibility).
> > > > > > >
> > > > > > > It is expected that during the time between the 2 events T1 and
> > T2,
> > > > the
> > > > > > > broker is
> > > > > > > almost a silent entity in the cluster. It does not add any
> value
> > to
> > > > the
> > > > > > > cluster, or carry
> > > > > > > out any important broker activities. By “important”, I mean it
> is
> > > not
> > > > > > doing
> > > > > > > mutations
> > > > > > > on it’s persistence, not mutating critical in-memory state,
> won’t
> > > be
> > > > > > > serving
> > > > > > > produce/fetch requests. Note it doesn’t even know it’s assigned
> > > > > > partitions
> > > > > > > until
> > > > > > > it receives UpdateMetadataRequest from controller. Anything the
> > > > broker
> > > > > is
> > > > > > > doing up
> > > > > > > until this point is neither damaging nor useful.
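
A minimal sketch of the validation at T2 (types and names are illustrative,
not the broker's actual code):

    import java.util.Map;

    public class BrokerFeatureCompatibility {
        record SupportedRange(long minVersion, long maxVersion) {}

        // Every finalized max version must fall inside this broker's supported range
        // for the same feature; otherwise the broker treats itself as incompatible
        // and shuts down.
        public static boolean isCompatible(Map<String, SupportedRange> supported,
                                           Map<String, Long> finalizedMaxVersions) {
            return finalizedMaxVersions.entrySet().stream().allMatch(entry -> {
                SupportedRange range = supported.get(entry.getKey());
                return range != null
                    && entry.getValue() >= range.minVersion()
                    && entry.getValue() <= range.maxVersion();
            });
        }
    }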
> > > > > > >
> > > > > > > I’ve clarified the above in the KIP, see this new section:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > > .
> > > > > > >
> > > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > > Feature",
> > > > > > may
> > > > > > > be
> > > > > > > > I misunderstood something, I thought the features are defined
> > in
> > > > > broker
> > > > > > > > code, so admin could not really create a new feature?
> > > > > > >
> > > > > > > (Kowshik): Great point! You understood this right. Here adding
> a
> > > > > feature
> > > > > > > means we are
> > > > > > > adding a cluster-wide finalized *max* version for a feature
> that
> > > was
> > > > > > > previously never finalized.
> > > > > > > I have clarified this in the KIP now.
> > > > > > >
> > > > > > > > 6. I think we need a separate error code like
> > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > to
> > > > > > > > reject a concurrent feature update request.
> > > > > > >
> > > > > > > (Kowshik): Great point! I have modified the KIP adding the
> above
> > > (see
> > > > > > > 'Tooling support -> Admin API changes').
> > > > > > >
> > > > > > > > 7. I think we haven't discussed the alternative solution to
> > pass
> > > > the
> > > > > > > > feature information through Zookeeper. Is that mentioned in
> the
> > > KIP
> > > > > to
> > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > >
> > > > > > > (Kowshik): Nice question! The broker reads finalized feature
> info
> > > > > stored
> > > > > > in
> > > > > > > ZK,
> > > > > > > only during startup when it does a validation. When serving
> > > > > > > `ApiVersionsRequest`, the
> > > > > > > broker does not read this info from ZK directly. I'd imagine
> the
> > > risk
> > > > > is
> > > > > > > that it can increase
> > > > > > > the ZK read QPS which can be a bottleneck for the system.
> Today,
> > in
> > > > > Kafka
> > > > > > > we use the
> > > > > > > controller to fan out ZK updates to brokers and we want to
> stick
> > to
> > > > > that
> > > > > > > pattern to avoid
> > > > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > > > > >
> > > > > > > > 8. I was under the impression that user could configure a
> range
> > > of
> > > > > > > > supported versions, what's the trade-off for allowing single
> > > > > finalized
> > > > > > > > version only?
> > > > > > >
> > > > > > > (Kowshik): Great question! The finalized version of a feature
> > > > basically
> > > > > > > refers to
> > > > > > > the cluster-wide finalized feature "maximum" version. For
> > example,
> > > if
> > > > > the
> > > > > > > 'group_coordinator' feature
> > > > > > > has the finalized version set to 10, then, it means that
> > > cluster-wide
> > > > > all
> > > > > > > versions up to v10 are
> > > > > > > supported for this feature. However, note that if some version
> > (ex:
> > > > v0)
> > > > > > > gets deprecated
> > > > > > > for this feature, then we don’t convey that using this scheme
> > (also
> > > > > > > supporting deprecation is a non-goal).
> > > > > > >
> > > > > > > (Kowshik): I’ve now modified the KIP at all points, referring to
> > > > > finalized
> > > > > > > feature "maximum" versions.
> > > > > > >
> > > > > > > > 9. One minor syntax fix: Note that here the "client" here may
> > be
> > > a
> > > > > > > producer
> > > > > > >
> > > > > > > (Kowshik): Great point! Done.
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > > reluctanthero104@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Kowshik,
> > > > > > > >
> > > > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > > > >
> > > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > > traffic"
> > > > > > could
> > > > > > > be
> > > > > > > > converted as "When is it safe for the brokers to start
> serving
> > > new
> > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> earlier
> > > in
> > > > > the
> > > > > > > > context.
> > > > > > > >
> > > > > > > > 2. In the *Explanation *section, the metadata version number
> > part
> > > > > > seems a
> > > > > > > > bit blurred. Could you point a reference to later section
> that
> > we
> > > > > going
> > > > > > > to
> > > > > > > > store it in Zookeeper and update it every time when there is
> a
> > > > > feature
> > > > > > > > change?
> > > > > > > >
> > > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > > KIP,
> > > > > for
> > > > > > > > features such as group coordinator semantics, there is no
> legal
> > > > > > scenario
> > > > > > > to
> > > > > > > > perform a downgrade at all. So having downgrade door open is
> > > pretty
> > > > > > > > error-prone as human faults happen all the time. I'm assuming
> > as
> > > > new
> > > > > > > > features are implemented, it's not very hard to add a flag
> > during
> > > > > > feature
> > > > > > > > creation to indicate whether this feature is "downgradable".
> > > Could
> > > > > you
> > > > > > > > explain a bit more on the extra engineering effort for
> shipping
> > > > this
> > > > > > KIP
> > > > > > > > with downgrade protection in place?
> > > > > > > >
> > > > > > > > 4. "Each broker’s supported dictionary of feature versions
> will
> > > be
> > > > > > > defined
> > > > > > > > in the broker code." So this means in order to restrict a
> > certain
> > > > > > > feature,
> > > > > > > > we need to start the broker first and then send a feature
> > gating
> > > > > > request
> > > > > > > > immediately, which introduces a time gap and the
> > > intended-to-close
> > > > > > > feature
> > > > > > > > could actually serve request during this phase. Do you think
> we
> > > > > should
> > > > > > > also
> > > > > > > > support configurations as well so that admin user could
> freely
> > > roll
> > > > > up
> > > > > > a
> > > > > > > > cluster with all nodes complying the same feature gating,
> > without
> > > > > > > worrying
> > > > > > > > about the turnaround time to propagate the message only after
> > the
> > > > > > cluster
> > > > > > > > starts up?
> > > > > > > >
> > > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > > Feature",
> > > > > > may
> > > > > > > be
> > > > > > > > I misunderstood something, I thought the features are defined
> > in
> > > > > broker
> > > > > > > > code, so admin could not really create a new feature?
> > > > > > > >
> > > > > > > > 6. I think we need a separate error code like
> > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > to
> > > > > > > > reject a concurrent feature update request.
> > > > > > > >
> > > > > > > > 7. I think we haven't discussed the alternative solution to
> > pass
> > > > the
> > > > > > > > feature information through Zookeeper. Is that mentioned in
> the
> > > KIP
> > > > > to
> > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > >
> > > > > > > > 8. I was under the impression that user could configure a
> range
> > > of
> > > > > > > > supported versions, what's the trade-off for allowing single
> > > > > finalized
> > > > > > > > version only?
> > > > > > > >
> > > > > > > > 9. One minor syntax fix: Note that here the "client" here may
> > be
> > > a
> > > > > > > producer
> > > > > > > >
> > > > > > > > Boyang
> > > > > > > >
> > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> > cmccabe@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > > > > Hi Colin,
> > > > > > > > > >
> > > > > > > > > > Thanks for the feedback! I've changed the KIP to address
> > your
> > > > > > > > > > suggestions.
> > > > > > > > > > Please find below my explanation. Here is a link to KIP
> > 584:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > .
> > > > > > > > > >
> > > > > > > > > > 1. '__data_version__' is the version of the finalized
> > feature
> > > > > > > metadata
> > > > > > > > > > (i.e. actual ZK node contents), while the
> > > '__schema_version__'
> > > > is
> > > > > > the
> > > > > > > > > > version of the schema of the data persisted in ZK. These
> > > serve
> > > > > > > > different
> > > > > > > > > > purposes. '__data_version__' is useful mainly to
> clients
> > > > > during
> > > > > > > > reads,
> > > > > > > > > > to differentiate between the 2 versions of eventually
> > > > consistent
> > > > > > > > > 'finalized
> > > > > > > > > > features' metadata (i.e. larger metadata version is more
> > > > recent).
> > > > > > > > > > '__schema_version__' provides an additional degree of
> > > > > flexibility,
> > > > > > > > where
> > > > > > > > > if
> > > > > > > > > > we decide to change the schema for '/features' node in ZK
> > (in
> > > > the
> > > > > > > > > future),
> > > > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > > > serialization/deserialization of the ZK data can be
> handled
> > > > > > safely).
> > > > > > > > >
> > > > > > > > > Hi Kowshik,
> > > > > > > > >
> > > > > > > > > If you're talking about a number that lets you know if data
> > is
> > > > more
> > > > > > or
> > > > > > > > > less recent, we would typically call that an epoch, and
> not a
> > > > > > version.
> > > > > > > > For
> > > > > > > > > the ZK data structures, the word "version" is typically
> > > reserved
> > > > > for
> > > > > > > > > describing changes to the overall schema of the data that
> is
> > > > > written
> > > > > > to
> > > > > > > > > ZooKeeper.  We don't even really change the "version" of
> > those
> > > > > > schemas
> > > > > > > > that
> > > > > > > > > much, since most changes are backwards-compatible.  But we
> do
> > > > > include
> > > > > > > > that
> > > > > > > > > version field just in case.
> > > > > > > > >
> > > > > > > > > I don't think we really need an epoch here, though, since
> we
> > > can
> > > > > just
> > > > > > > > look
> > > > > > > > > at the broker epoch.  Whenever the broker registers, its
> > epoch
> > > > will
> > > > > > be
> > > > > > > > > greater than the previous broker epoch.  And the newly
> > > registered
> > > > > > data
> > > > > > > > will
> > > > > > > > > take priority.  This will be a lot simpler than adding a
> > > separate
> > > > > > epoch
> > > > > > > > > system, I think.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2. Regarding admin client needing min and max
> information -
> > > you
> > > > > are
> > > > > > > > > right!
> > > > > > > > > > I've changed the KIP such that the Admin API also allows
> > the
> > > > user
> > > > > > to
> > > > > > > > read
> > > > > > > > > > 'supported features' from a specific broker. Please look
> at
> > > the
> > > > > > > section
> > > > > > > > > > "Admin API changes".
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > > > deliberate.
> > > > > > > I've
> > > > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > > > >
> > > > > > > > > Sounds good.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are
> > right!
> > > > > I've
> > > > > > > > > updated
> > > > > > > > > > the KIP sketching the functionality provided by this
> tool,
> > > with
> > > > > > some
> > > > > > > > > > examples. Please look at the section "Tooling support
> > > > examples".
> > > > > > > > > >
> > > > > > > > > > Thank you!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks, Kowshik.
> > > > > > > > >
> > > > > > > > > cheers,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Kowshik
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > > > cmccabe@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > > >
> > > > > > > > > > > In the "Schema" section, do we really need both
> > > > > > __schema_version__
> > > > > > > > and
> > > > > > > > > > > __data_version__?  Can we just have a single version
> > field
> > > > > here?
> > > > > > > > > > >
> > > > > > > > > > > Shouldn't the Admin(Client) function have some way to
> get
> > > the
> > > > > min
> > > > > > > and
> > > > > > > > > max
> > > > > > > > > > > information that we're exposing as well?  I guess we
> > could
> > > > have
> > > > > > > min,
> > > > > > > > > max,
> > > > > > > > > > > and current.  Unrelated: is the use of Long rather than
> > > long
> > > > > > > > deliberate
> > > > > > > > > > > here?
> > > > > > > > > > >
> > > > > > > > > > > It would be good to describe how the command line tool
> > > > > > > > > > > kafka.admin.FeatureCommand will work.  For example the
> > > flags
> > > > > that
> > > > > > > it
> > > > > > > > > will
> > > > > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > > > > >
> > > > > > > > > > > cheers,
> > > > > > > > > > > Colin
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > > > > Hi all,
> > > > > > > > > > > >
> > > > > > > > > > > > I've opened KIP-584, which
> > > > > > > > > > > > is intended to provide a versioning scheme for
> > features.
> > > > I'd
> > > > > > like
> > > > > > > > to
> > > > > > > > > use
> > > > > > > > > > > > this thread to discuss the same. I'd appreciate any
> > > > feedback
> > > > > on
> > > > > > > > this.
> > > > > > > > > > > > Here
> > > > > > > > > > > > is a link to KIP-584:
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > >  .
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you!
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Kowshik
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Kowshik,

For 2) above, my motivation is more about flexibility on the client side
than about version deprecation: say a client talks to the cluster and
learns that the cluster-wide version for a feature is X, while the client
itself only knows how to execute the feature up to version Y (< X). At that
moment the client has to give up leveraging the feature, since it is not
sure whether all brokers actually support version Y. This is because
version X is only guaranteed to be within the common overlapping range of
all the [low, high] ranges across brokers, where "low" is not always 0, so
the client cannot safely assume that versions smaller than X are also
supported on the cluster.

If we assume that when the cluster-wide version is X all smaller versions
are guaranteed to be supported, then every broker's supported version range
would have to look like [0, high], which I think is not realistic?
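
To make the point concrete, here is a minimal sketch of the intersection
described above (all numbers and names below are made up for illustration):

    import java.util.List;

    public class CommonVersionRange {
        record Range(long low, long high) {}

        // The only range a client can rely on cluster-wide is the intersection of
        // every broker's [low, high]; its "low" end is the highest low across brokers.
        static Range intersect(List<Range> brokerRanges) {
            long low = brokerRanges.stream().mapToLong(Range::low).max().orElse(0L);
            long high = brokerRanges.stream().mapToLong(Range::high).min().orElse(-1L);
            return new Range(low, high);
        }

        public static void main(String[] args) {
            Range common = intersect(List.of(new Range(2, 5), new Range(1, 4), new Range(3, 6)));
            // Prints [3, 4]: a client that only knows how to run version 2 cannot use
            // the feature, even though the finalized max version (4) is above 2.
            System.out.println("common range: [" + common.low() + ", " + common.high() + "]");
        }
    }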


Guozhang



On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Kowshik,
>
> Thanks for the reply. A few more replies below.
>
> 100.6 You can look for the sentence "This operation requires ALTER on
> CLUSTER." in KIP-455. Also, you can check its usage in
> KafkaApis.authorize().
>
> 110. From the external client/tooling perspective, it's more natural to use
> the release version for features. If we can use the same release version
> for internal representation, it seems simpler (easier to understand, no
> mapping overhead, etc). Is there a benefit with separate external and
> internal versioning schemes?
>
> 111. To put this in context, when we had IBP, the default value is the
> current released version. So, if you are a brand new user, you don't need
> to configure IBP and all new features will be immediately available in the
> new cluster. If you are upgrading from an old version, you do need to
> understand and configure IBP. I see a similar pattern here for
> features. From the ease of use perspective, ideally, we shouldn't require a
> new user to have an extra step such as running a bootstrap script unless
> it's truly necessary. If someone has a special need (all the cases you
> mentioned seem special cases?), they can configure a mode such that
> features are enabled/disabled manually.
>
> Jun
>
> On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hi Jun,
> >
> > Thanks for the feedback and suggestions. Please find my response below.
> >
> > > 100.6 For every new request, the admin needs to control who is allowed
> to
> > > issue that request if security is enabled. So, we need to assign the
> new
> > > request a ResourceType and possible AclOperations. See
> > >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > as an example.
> >
> > (Kowshik): I don't see any reference to the words ResourceType or
> > AclOperations
> > in the KIP. Please let me know how I can use the KIP that you linked to
> > know how to
> > setup the appropriate ResourceType and/or ClusterOperation?
> >
> > > 105. If we change delete to disable, it's better to do this
> consistently
> > in
> > > request protocol and admin api as well.
> >
> > (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> > feature.
> > I've just changed the KIP to use 'delete'. I don't have a strong
> > preference.
> >
> > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> possible
> > > for new features to be included in minor releases too. Should we make
> the
> > > feature versioning match the release versioning?
> >
> > (Kowshik): The release version can be mapped to a set of feature
> versions,
> > and this can be done, for example in the tool (or even external to the
> > tool).
> > Can you please clarify what I'm missing?
> >
> > > 111. "During regular operations, the data in the ZK node can be mutated
> > > only via a specific admin API served only by the controller." I am
> > > wondering why can't the controller auto finalize a feature version
> after
> > > all brokers are upgraded? For new users who download the latest version
> > to
> > > build a new cluster, it's inconvenient for them to have to manually
> > enable
> > > each feature.
> >
> > (Kowshik): I agree that there is a trade-off here, but it will help
> > to decide whether the automation can be thought through in the future
> > in a follow up KIP, or right now in this KIP. We may invest
> > in automation, but we have to decide whether we should do it
> > now or later.
> >
> > For the inconvenience that you mentioned, do you think the problem that
> you
> > mentioned can be  overcome by asking for the cluster operator to run a
> > bootstrap script  when he/she knows that a specific AK release has been
> > almost completely deployed in a cluster for the first time? Idea is that
> > the
> > bootstrap script will know how to map a specific AK release to finalized
> > feature versions, and run the `kafka-features.sh` tool appropriately
> > against
> > the cluster.
> >
> > Now, coming back to your automation proposal/question.
> > I do see the value of automated feature version finalization, but I also
> > see
> > that this will open up several questions and some risks, as explained
> > below.
> > The answers to these depend on the definition of the automation we choose
> > to build, and how well does it fit into a kafka deployment.
> > Basically, it can be unsafe for the controller to finalize feature
> version
> > upgrades automatically, without learning about the intent of the cluster
> > operator.
> > 1. We would sometimes want to lock feature versions only when we have
> > externally verified
> > the stability of the broker binary.
> > 2. Sometimes only the cluster operator knows that a cluster upgrade is
> > complete,
> > and new brokers are highly unlikely to join the cluster.
> > 3. Only the cluster operator knows that the intent is to deploy the same
> > version
> > of the new broker release across the entire cluster (i.e. the latest
> > downloaded version).
> > 4. For downgrades, it appears the controller still needs some external
> > input
> > (such as the proposed tool) to finalize a feature version downgrade.
> >
> > If we have automation, that automation can end up failing in some of the
> > cases
> > above. Then, we need a way to declare that the cluster is "not ready" if
> > the
> > controller cannot automatically finalize some basic required feature
> > version
> > upgrades across the cluster. We need to make the cluster operator aware
> in
> > such a scenario (raise an alert or alike).
> >
> > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead
> of
> > 48.
> >
> > (Kowshik): Done.
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Kowshik,
> > >
> > > Thanks for the reply. A few more comments below.
> > >
> > > 100.6 For every new request, the admin needs to control who is allowed
> to
> > > issue that request if security is enabled. So, we need to assign the
> new
> > > request a ResourceType and possible AclOperations. See
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > as
> > > an example.
> > >
> > > 105. If we change delete to disable, it's better to do this
> consistently
> > in
> > > request protocol and admin api as well.
> > >
> > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> possible
> > > for new features to be included in minor releases too. Should we make
> the
> > > feature versioning match the release versioning?
> > >
> > > 111. "During regular operations, the data in the ZK node can be mutated
> > > only via a specific admin API served only by the controller." I am
> > > wondering why can't the controller auto finalize a feature version
> after
> > > all brokers are upgraded? For new users who download the latest version
> > to
> > > build a new cluster, it's inconvenient for them to have to manually
> > enable
> > > each feature.
> > >
> > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead
> of
> > > 48.
> > >
> > > Jun
> > >
> > >
> > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> kprakasam@confluent.io>
> > > wrote:
> > >
> > > > Hey Jun,
> > > >
> > > > Thanks a lot for the great feedback! Please note that the design
> > > > has changed a little bit on the KIP, and we now propagate the
> finalized
> > > > features metadata only via ZK watches (instead of
> UpdateMetadataRequest
> > > > from the controller).
> > > >
> > > > Please find below my response to your questions/feedback, with the
> > prefix
> > > > "(Kowshik):".
> > > >
> > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > 100.1 Since this request waits for responses from brokers, should
> we
> > > add
> > > > a
> > > > > timeout in the request (like createTopicRequest)?
> > > >
> > > > (Kowshik): Great point! Done. I have added a timeout field. Note: we
> no
> > > > longer
> > > > wait for responses from brokers, since the design has been changed so
> > > that
> > > > the
> > > > features information is propagated via ZK. Nevertheless, it is right
> to
> > > > have a timeout
> > > > for the request.
> > > >
> > > > > 100.2 The response schema is a bit weird. Typically, the response
> > just
> > > > > shows an error code and an error message, instead of echoing the
> > > request.
> > > >
> > > > (Kowshik): Great point! Yeah, I have modified it to just return an
> > error
> > > > code and a message.
> > > > Previously it was not echoing the "request", rather it was returning
> > the
> > > > latest set of
> > > > cluster-wide finalized features (after applying the updates). But you
> > are
> > > > right,
> > > > the additional info is not required, so I have removed it from the
> > > response
> > > > schema.
> > > >
> > > > > 100.3 Should we add a separate request to list/describe the
> existing
> > > > > features?
> > > >
> > > > (Kowshik): This is already present in the KIP via the
> > 'DescribeFeatures'
> > > > Admin API,
> > > > which, under the covers, uses the ApiVersionsRequest to list/describe
> > the
> > > > existing features. Please read the 'Tooling support' section.
> > > >
> > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request.
> For
> > > > > DELETE, the version field doesn't make sense. So, I guess the
> broker
> > > just
> > > > > ignores this? An alternative way is to have a separate
> > > > DeleteFeaturesRequest
> > > >
> > > > (Kowshik): Great point! I have modified the KIP now to have 2
> separate
> > > > controller APIs
> > > > serving these different purposes:
> > > > 1. updateFeatures
> > > > 2. deleteFeatures
> > > >
> > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > increasing
> > > > > version of the metadata for finalized features." I am wondering why
> > the
> > > > > ordering is important?
> > > >
> > > > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > > > version), and
> > > > it is just the ZK node version. Basically, this is the epoch for the
> > > > cluster-wide
> > > > finalized feature version metadata. This metadata is served to
> clients
> > > via
> > > > the
> > > > ApiVersionsResponse (for reads). We propagate updates from the
> > > '/features'
> > > > ZK node
> > > > to all brokers, via ZK watches setup by each broker on the
> '/features'
> > > > node.
> > > >
> > > > Now here is why the ordering is important:
> > > > ZK watches don't propagate at the same time. As a result, the
> > > > ApiVersionsResponse
> > > > is eventually consistent across brokers. This can introduce cases
> > > > where clients see an older lower epoch of the features metadata,
> after
> > a
> > > > more recent
> > > > higher epoch was returned at a previous point in time. We expect
> > clients
> > > > to always employ the rule that the latest received higher epoch of
> > > metadata
> > > > always trumps an older smaller epoch. Those clients that are external
> > to
> > > > Kafka should strongly consider discovering the latest metadata once
> > > during
> > > > startup from the brokers, and if required refresh the metadata
> > > periodically
> > > > (to get the latest metadata).
> > > >
> > > > > 100.6 Could you specify the required ACL for this new request?
> > > >
> > > > (Kowshik): What is ACL, and how could I find out which one to
> specify?
> > > > Please could you provide me some pointers? I'll be glad to update the
> > > > KIP once I know the next steps.
> > > >
> > > > > 101. For the broker registration ZK node, should we bump up the
> > version
> > > > in
> > > > the json?
> > > >
> > > > (Kowshik): Great point! Done. I've increased the version in the
> broker
> > > json
> > > > by 1.
> > > >
> > > > > 102. For the /features ZK node, not sure if we need the epoch
> field.
> > > Each
> > > > > ZK node has an internal version field that is incremented on every
> > > > update.
> > > >
> > > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> > instead
> > > of
> > > > explicitly
> > > > incremented epoch.
> > > >
> > > > > 103. "Enabling the actual semantics of a feature version
> cluster-wide
> > > is
> > > > > left to the discretion of the logic implementing the feature (ex:
> can
> > > be
> > > > > done via dynamic broker config)." Does that mean the broker
> > > registration
> > > > ZK
> > > > > node will be updated dynamically when this happens?
> > > >
> > > > (Kowshik): Not really. The text was just conveying that a broker
> could
> > > > "know" of
> > > > a new feature version, but it does not mean the broker should have
> also
> > > > activated the effects of the feature version. Knowing vs activation
> > are 2
> > > > separate things,
> > > > and the latter can be achieved by dynamic config. I have reworded the
> > > text
> > > > to
> > > > make this clear to the reader.
> > > >
> > > >
> > > > > 104. UpdateMetadataRequest
> > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > included
> > > > > in the request. My understanding is that it's only included if (1)
> > > there
> > > > is
> > > > > a change to the finalized feature; (2) broker restart; (3)
> controller
> > > > > failover.
> > > > > 104.2 The new fields have the following versions. Why are the
> > versions
> > > 3+
> > > > > when the top version is bumped to 6?
> > > > >       "fields":  [
> > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > >           "about": "The name of the feature."},
> > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > >           "about": "The finalized version for the feature."}
> > > > >       ]
> > > >
> > > > (Kowshik): With the new improved design, we have completely
> eliminated
> > > the
> > > > need to
> > > > use UpdateMetadataRequest. This is because we now rely on ZK to
> deliver
> > > the
> > > > notifications for changes to the '/features' ZK node.
> > > >
> > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> it's
> > > > better
> > > > > to use enable/disable?
> > > >
> > > > (Kowshik): For delete, yes, I have changed it so that we instead call
> > it
> > > > 'disable'.
> > > > However for 'update', it can now also refer to either an upgrade or a
> > > > forced downgrade.
> > > > Therefore, I have left it the way it is, just calling it as just
> > > 'update'.
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Kowshik,
> > > > >
> > > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > > >
> > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > 100.1 Since this request waits for responses from brokers, should
> we
> > > add
> > > > a
> > > > > timeout in the request (like createTopicRequest)?
> > > > > 100.2 The response schema is a bit weird. Typically, the response
> > just
> > > > > shows an error code and an error message, instead of echoing the
> > > request.
> > > > > 100.3 Should we add a separate request to list/describe the
> existing
> > > > > features?
> > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request.
> For
> > > > > DELETE, the version field doesn't make sense. So, I guess the
> broker
> > > just
> > > > > ignores this? An alternative way is to have a separate
> > > > > DeleteFeaturesRequest
> > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > increasing
> > > > > version of the metadata for finalized features." I am wondering why
> > the
> > > > > ordering is important?
> > > > > 100.6 Could you specify the required ACL for this new request?
> > > > >
> > > > > 101. For the broker registration ZK node, should we bump up the
> > version
> > > > in
> > > > > the json?
> > > > >
> > > > > 102. For the /features ZK node, not sure if we need the epoch
> field.
> > > Each
> > > > > ZK node has an internal version field that is incremented on every
> > > > update.
> > > > >
> > > > > 103. "Enabling the actual semantics of a feature version
> cluster-wide
> > > is
> > > > > left to the discretion of the logic implementing the feature (ex:
> can
> > > be
> > > > > done via dynamic broker config)." Does that mean the broker
> > > registration
> > > > ZK
> > > > > node will be updated dynamically when this happens?
> > > > >
> > > > > 104. UpdateMetadataRequest
> > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > included
> > > > > in the request. My understanding is that it's only included if (1)
> > > there
> > > > is
> > > > > a change to the finalized feature; (2) broker restart; (3)
> controller
> > > > > failover.
> > > > > 104.2 The new fields have the following versions. Why are the
> > versions
> > > 3+
> > > > > when the top version is bumped to 6?
> > > > >       "fields":  [
> > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > >           "about": "The name of the feature."},
> > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > >           "about": "The finalized version for the feature."}
> > > > >       ]
> > > > >
> > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> it's
> > > > better
> > > > > to use enable/disable?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > kprakasam@confluent.io
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hey Boyang,
> > > > > >
> > > > > > Thanks for the great feedback! I have updated the KIP based on
> your
> > > > > > feedback.
> > > > > > Please find my response below for your comments, look for
> sentences
> > > > > > starting
> > > > > > with "(Kowshik)" below.
> > > > > >
> > > > > >
> > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > traffic"
> > > > > could
> > > > > > be
> > > > > > > converted as "When is it safe for the brokers to start serving
> > new
> > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> > in
> > > > the
> > > > > > > context.
> > > > > >
> > > > > > (Kowshik): Great point! Done.
> > > > > >
> > > > > > > 2. In the *Explanation *section, the metadata version number
> part
> > > > > seems a
> > > > > > > bit blurred. Could you point a reference to later section that
> we
> > > > going
> > > > > > to
> > > > > > > store it in Zookeeper and update it every time when there is a
> > > > feature
> > > > > > > change?
> > > > > >
> > > > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > > > > >
> > > > > >
> > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > KIP,
> > > > for
> > > > > > > features such as group coordinator semantics, there is no legal
> > > > > scenario
> > > > > > to
> > > > > > > perform a downgrade at all. So having downgrade door open is
> > pretty
> > > > > > > error-prone as human faults happen all the time. I'm assuming
> as
> > > new
> > > > > > > features are implemented, it's not very hard to add a flag
> during
> > > > > feature
> > > > > > > creation to indicate whether this feature is "downgradable".
> > Could
> > > > you
> > > > > > > explain a bit more on the extra engineering effort for shipping
> > > this
> > > > > KIP
> > > > > > > with downgrade protection in place?
> > > > > >
> > > > > > (Kowshik): Great point! I'd agree and disagree here. While I
> agree
> > > that
> > > > > > accidental
> > > > > > downgrades can cause problems, I also think sometimes downgrades
> > > should
> > > > > > be allowed for emergency reasons (not all downgrades cause
> issues).
> > > > > > It is just subjective to the feature being downgraded.
> > > > > >
> > > > > > To be more strict about feature version downgrades, I have
> modified
> > > the
> > > > > KIP
> > > > > > proposing that we mandate a `--force-downgrade` flag be used in
> the
> > > > > > UPDATE_FEATURES api
> > > > > > and the tooling, whenever the human is downgrading a finalized
> > > feature
> > > > > > version.
> > > > > > Hopefully this should cover the requirement, until we find the
> need
> > > for
> > > > > > advanced downgrade support.
> > > > > >
> > > > > > > 4. "Each broker’s supported dictionary of feature versions will
> > be
> > > > > > defined
> > > > > > > in the broker code." So this means in order to restrict a
> certain
> > > > > > feature,
> > > > > > > we need to start the broker first and then send a feature
> gating
> > > > > request
> > > > > > > immediately, which introduces a time gap and the
> > intended-to-close
> > > > > > feature
> > > > > > > could actually serve request during this phase. Do you think we
> > > > should
> > > > > > also
> > > > > > > support configurations as well so that admin user could freely
> > roll
> > > > up
> > > > > a
> > > > > > > cluster with all nodes complying the same feature gating,
> without
> > > > > > worrying
> > > > > > > about the turnaround time to propagate the message only after
> the
> > > > > cluster
> > > > > > > starts up?
> > > > > >
> > > > > > (Kowshik): This is a great point/question. One of the
> expectations
> > > out
> > > > of
> > > > > > this KIP, which is
> > > > > > already followed in the broker, is the following.
> > > > > >  - Imagine at time T1 the broker starts up and registers it’s
> > > presence
> > > > in
> > > > > > ZK,
> > > > > >    along with advertising it’s supported features.
> > > > > >  - Imagine at a future time T2 the broker receives the
> > > > > > UpdateMetadataRequest
> > > > > >    from the controller, which contains the latest finalized
> > features
> > > as
> > > > > > seen by
> > > > > >    the controller. The broker validates this data against it’s
> > > > supported
> > > > > > features to
> > > > > >    make sure there is no mismatch (it will shutdown if there is
> an
> > > > > > incompatibility).
> > > > > >
> > > > > > It is expected that during the time between the 2 events T1 and
> T2,
> > > the
> > > > > > broker is
> > > > > > almost a silent entity in the cluster. It does not add any value
> to
> > > the
> > > > > > cluster, or carry
> > > > > > out any important broker activities. By “important”, I mean it is
> > not
> > > > > doing
> > > > > > mutations
> > > > > > on it’s persistence, not mutating critical in-memory state, won’t
> > be
> > > > > > serving
> > > > > > produce/fetch requests. Note it doesn’t even know it’s assigned
> > > > > partitions
> > > > > > until
> > > > > > it receives UpdateMetadataRequest from controller. Anything the
> > > broker
> > > > is
> > > > > > doing up
> > > > > > until this point is not damaging/useful.
> > > > > >
> > > > > > I’ve clarified the above in the KIP, see this new section:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > .
> > > > > >
> > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > Feature",
> > > > > may
> > > > > > be
> > > > > > > I misunderstood something, I thought the features are defined
> in
> > > > broker
> > > > > > > code, so admin could not really create a new feature?
> > > > > >
> > > > > > (Kowshik): Great point! You understood this right. Here adding a
> > > > feature
> > > > > > means we are
> > > > > > adding a cluster-wide finalized *max* version for a feature that
> > was
> > > > > > previously never finalized.
> > > > > > I have clarified this in the KIP now.
> > > > > >
> > > > > > > 6. I think we need a separate error code like
> > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > to
> > > > > > > reject a concurrent feature update request.
> > > > > >
> > > > > > (Kowshik): Great point! I have modified the KIP adding the above
> > (see
> > > > > > 'Tooling support -> Admin API changes').
> > > > > >
> > > > > > > 7. I think we haven't discussed the alternative solution to
> pass
> > > the
> > > > > > > feature information through Zookeeper. Is that mentioned in the
> > KIP
> > > > to
> > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > >
> > > > > > (Kowshik): Nice question! The broker reads finalized feature info
> > > > stored
> > > > > in
> > > > > > ZK,
> > > > > > only during startup when it does a validation. When serving
> > > > > > `ApiVersionsRequest`, the
> > > > > > broker does not read this info from ZK directly. I'd imagine the
> > risk
> > > > is
> > > > > > that it can increase
> > > > > > the ZK read QPS which can be a bottleneck for the system. Today,
> in
> > > > Kafka
> > > > > > we use the
> > > > > > controller to fan out ZK updates to brokers and we want to stick
> to
> > > > that
> > > > > > pattern to avoid
> > > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > > > >
> > > > > > > 8. I was under the impression that user could configure a range
> > of
> > > > > > > supported versions, what's the trade-off for allowing single
> > > > finalized
> > > > > > > version only?
> > > > > >
> > > > > > (Kowshik): Great question! The finalized version of a feature
> > > basically
> > > > > > refers to
> > > > > > the cluster-wide finalized feature "maximum" version. For
> example,
> > if
> > > > the
> > > > > > 'group_coordinator' feature
> > > > > > has the finalized version set to 10, then, it means that
> > cluster-wide
> > > > all
> > > > > > versions upto v10 are
> > > > > > supported for this feature. However, note that if some version
> (ex:
> > > v0)
> > > > > > gets deprecated
> > > > > > for this feature, then we don’t convey that using this scheme
> (also
> > > > > > supporting deprecation is a non-goal).
> > > > > >
> > > > > > (Kowshik): I’ve now modified the KIP at all points, refering to
> > > > finalized
> > > > > > feature "maximum" versions.
> > > > > >
> > > > > > > 9. One minor syntax fix: Note that here the "client" here may
> be
> > a
> > > > > > producer
> > > > > >
> > > > > > (Kowshik): Great point! Done.
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > reluctanthero104@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Kowshik,
> > > > > > >
> > > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > > >
> > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > traffic"
> > > > > could
> > > > > > be
> > > > > > > converted as "When is it safe for the brokers to start serving
> > new
> > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> > in
> > > > the
> > > > > > > context.
> > > > > > >
> > > > > > > 2. In the *Explanation *section, the metadata version number
> part
> > > > > seems a
> > > > > > > bit blurred. Could you point a reference to later section that
> we
> > > > going
> > > > > > to
> > > > > > > store it in Zookeeper and update it every time when there is a
> > > > feature
> > > > > > > change?
> > > > > > >
> > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > KIP,
> > > > for
> > > > > > > features such as group coordinator semantics, there is no legal
> > > > > scenario
> > > > > > to
> > > > > > > perform a downgrade at all. So having downgrade door open is
> > pretty
> > > > > > > error-prone as human faults happen all the time. I'm assuming
> as
> > > new
> > > > > > > features are implemented, it's not very hard to add a flag
> during
> > > > > feature
> > > > > > > creation to indicate whether this feature is "downgradable".
> > Could
> > > > you
> > > > > > > explain a bit more on the extra engineering effort for shipping
> > > this
> > > > > KIP
> > > > > > > with downgrade protection in place?
> > > > > > >
> > > > > > > 4. "Each broker’s supported dictionary of feature versions will
> > be
> > > > > > defined
> > > > > > > in the broker code." So this means in order to restrict a
> certain
> > > > > > feature,
> > > > > > > we need to start the broker first and then send a feature
> gating
> > > > > request
> > > > > > > immediately, which introduces a time gap and the
> > intended-to-close
> > > > > > feature
> > > > > > > could actually serve request during this phase. Do you think we
> > > > should
> > > > > > also
> > > > > > > support configurations as well so that admin user could freely
> > roll
> > > > up
> > > > > a
> > > > > > > cluster with all nodes complying the same feature gating,
> without
> > > > > > worrying
> > > > > > > about the turnaround time to propagate the message only after
> the
> > > > > cluster
> > > > > > > starts up?
> > > > > > >
> > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > Feature",
> > > > > may
> > > > > > be
> > > > > > > I misunderstood something, I thought the features are defined
> in
> > > > broker
> > > > > > > code, so admin could not really create a new feature?
> > > > > > >
> > > > > > > 6. I think we need a separate error code like
> > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > to
> > > > > > > reject a concurrent feature update request.
> > > > > > >
> > > > > > > 7. I think we haven't discussed the alternative solution to
> pass
> > > the
> > > > > > > feature information through Zookeeper. Is that mentioned in the
> > KIP
> > > > to
> > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > >
> > > > > > > 8. I was under the impression that user could configure a range
> > of
> > > > > > > supported versions, what's the trade-off for allowing single
> > > > finalized
> > > > > > > version only?
> > > > > > >
> > > > > > > 9. One minor syntax fix: Note that here the "client" here may
> be
> > a
> > > > > > producer
> > > > > > >
> > > > > > > Boyang
> > > > > > >
> > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> cmccabe@apache.org
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > > > Hi Colin,
> > > > > > > > >
> > > > > > > > > Thanks for the feedback! I've changed the KIP to address
> your
> > > > > > > > > suggestions.
> > > > > > > > > Please find below my explanation. Here is a link to KIP
> 584:
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > .
> > > > > > > > >
> > > > > > > > > 1. '__data_version__' is the version of the finalized
> feature
> > > > > > metadata
> > > > > > > > > (i.e. actual ZK node contents), while the
> > '__schema_version__'
> > > is
> > > > > the
> > > > > > > > > version of the schema of the data persisted in ZK. These
> > serve
> > > > > > > different
> > > > > > > > > purposes. '__data_version__' is is useful mainly to clients
> > > > during
> > > > > > > reads,
> > > > > > > > > to differentiate between the 2 versions of eventually
> > > consistent
> > > > > > > > 'finalized
> > > > > > > > > features' metadata (i.e. larger metadata version is more
> > > recent).
> > > > > > > > > '__schema_version__' provides an additional degree of
> > > > flexibility,
> > > > > > > where
> > > > > > > > if
> > > > > > > > > we decide to change the schema for '/features' node in ZK
> (in
> > > the
> > > > > > > > future),
> > > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > > serialization/deserialization of the ZK data can be handled
> > > > > safely).
> > > > > > > >
> > > > > > > > Hi Kowshik,
> > > > > > > >
> > > > > > > > If you're talking about a number that lets you know if data
> is
> > > more
> > > > > or
> > > > > > > > less recent, we would typically call that an epoch, and not a
> > > > > version.
> > > > > > > For
> > > > > > > > the ZK data structures, the word "version" is typically
> > reserved
> > > > for
> > > > > > > > describing changes to the overall schema of the data that is
> > > > written
> > > > > to
> > > > > > > > ZooKeeper.  We don't even really change the "version" of
> those
> > > > > schemas
> > > > > > > that
> > > > > > > > much, since most changes are backwards-compatible.  But we do
> > > > include
> > > > > > > that
> > > > > > > > version field just in case.
> > > > > > > >
> > > > > > > > I don't think we really need an epoch here, though, since we
> > can
> > > > just
> > > > > > > look
> > > > > > > > at the broker epoch.  Whenever the broker registers, its
> epoch
> > > will
> > > > > be
> > > > > > > > greater than the previous broker epoch.  And the newly
> > registered
> > > > > data
> > > > > > > will
> > > > > > > > take priority.  This will be a lot simpler than adding a
> > separate
> > > > > epoch
> > > > > > > > system, I think.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 2. Regarding admin client needing min and max information -
> > you
> > > > are
> > > > > > > > right!
> > > > > > > > > I've changed the KIP such that the Admin API also allows
> the
> > > user
> > > > > to
> > > > > > > read
> > > > > > > > > 'supported features' from a specific broker. Please look at
> > the
> > > > > > section
> > > > > > > > > "Admin API changes".
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > > deliberate.
> > > > > > I've
> > > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > > >
> > > > > > > > Sounds good.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are
> right!
> > > > I've
> > > > > > > > updated
> > > > > > > > > the KIP sketching the functionality provided by this tool,
> > with
> > > > > some
> > > > > > > > > examples. Please look at the section "Tooling support
> > > examples".
> > > > > > > > >
> > > > > > > > > Thank you!
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks, Kowshik.
> > > > > > > >
> > > > > > > > cheers,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Kowshik
> > > > > > > > >
> > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > > cmccabe@apache.org>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > >
> > > > > > > > > > In the "Schema" section, do we really need both
> > > > > __schema_version__
> > > > > > > and
> > > > > > > > > > __data_version__?  Can we just have a single version
> field
> > > > here?
> > > > > > > > > >
> > > > > > > > > > Shouldn't the Admin(Client) function have some way to get
> > the
> > > > min
> > > > > > and
> > > > > > > > max
> > > > > > > > > > information that we're exposing as well?  I guess we
> could
> > > have
> > > > > > min,
> > > > > > > > max,
> > > > > > > > > > and current.  Unrelated: is the use of Long rather than
> > long
> > > > > > > deliberate
> > > > > > > > > > here?
> > > > > > > > > >
> > > > > > > > > > It would be good to describe how the command line tool
> > > > > > > > > > kafka.admin.FeatureCommand will work.  For example the
> > flags
> > > > that
> > > > > > it
> > > > > > > > will
> > > > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > > > >
> > > > > > > > > > cheers,
> > > > > > > > > > Colin
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > I've opened KIP-584
> > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > >
> > > > > > > > > > > which
> > > > > > > > > > > is intended to provide a versioning scheme for
> features.
> > > I'd
> > > > > like
> > > > > > > to
> > > > > > > > use
> > > > > > > > > > > this thread to discuss the same. I'd appreciate any
> > > feedback
> > > > on
> > > > > > > this.
> > > > > > > > > > > Here
> > > > > > > > > > > is a link to KIP-584
> > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > >  .
> > > > > > > > > > >
> > > > > > > > > > > Thank you!
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Kowshik
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Colin,

Thanks for the feedback and suggestions! It is a great idea to provide a
`--finalize-latest` flag. I agree it's a burden to ask the user to manually
upgrade each feature to the latest version after a release.

I have now updated the KIP adding this idea.

> What about a simple solution to this problem where we add a flag to the
command-line tool like --enable-latest?  The command-line tool could query
what the highest possible versions for
> each feature were (using the API) and then make another RPC to enable the
latest features.

(Kowshik): I've updated the KIP with the above idea; please look at this
section (point #3 and the tooling example later):
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
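
(Editorial illustration, not part of the KIP.) A minimal Java sketch of the two-RPC
flow described above: first query the highest version at which each feature can be
finalized, then ask the controller to finalize those versions. The method names
describeSupportedFeatures() and finalizeFeatures() are placeholders, not the KIP's
actual Admin API signatures:

    import java.util.HashMap;
    import java.util.Map;

    public class FinalizeLatestSketch {
        // Placeholder: per feature, the max version supported by every live broker.
        static Map<String, Long> describeSupportedFeatures() {
            return new HashMap<>();
        }

        // Placeholder: a second RPC asking the controller to finalize these max versions.
        static void finalizeFeatures(Map<String, Long> finalizedMaxVersions) {
        }

        public static void main(String[] args) {
            Map<String, Long> latest = describeSupportedFeatures(); // RPC #1: query
            finalizeFeatures(latest);                               // RPC #2: finalize
        }
    }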


> I think this is actually much easier than the version number solution.
The version string solution requires us to maintain a complicated mapping
table between version strings and features.  > In practice, we also have
"internal versions" in ApiVersion.scala like 2.4IV0, 2.4IV1, and so on.
This isn't simple for users to understand or use.

> It's also hard to know what the difference is between different version
strings.  For example, there's actually no difference between 2.5IV0 and
2.4IV1, but you wouldn't know that unless you > read the comments in
ApiVersion.scala.  A system administrator who didn't know this might end up
doing a cluster roll to upgrade the IBP that turned out to be unnecessary.

(Kowshik): Yes, I can see the disadvantages!
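
(Editorial illustration, not part of the KIP.) A rough sketch of the kind of mapping
table the version-string approach would force us to maintain: every release string,
including the internal IV versions, maps to a set of finalized feature max versions.
The feature name comes from earlier in this thread and the numbers are made up; note
that 2.4IV1 and 2.5IV0 carry identical entries, which is exactly the confusion
described above:

    import java.util.Map;

    public class ReleaseStringMapping {
        // Illustrative only: release string -> (feature name -> finalized max version).
        static final Map<String, Map<String, Long>> RELEASE_TO_FEATURES = Map.of(
            "2.4IV0", Map.of("group_coordinator", 9L),
            "2.4IV1", Map.of("group_coordinator", 10L),
            "2.5IV0", Map.of("group_coordinator", 10L)); // identical to 2.4IV1
    }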


Cheers,
Kowshik



On Mon, Apr 6, 2020 at 3:46 PM Colin McCabe <cm...@apache.org> wrote:

> Hi Jun,
>
> I agree that asking the user to manually upgrade all features to the
> latest version is a burden.  Then the user has to know what the latest
> version of every feature is when upgrading.
>
> What about a simple solution to this problem where we add a flag to the
> command-line tool like --enable-latest?  The command-line tool could query
> what the highest possible versions for each feature were (using the API)
> and then make another RPC to enable the latest features.
>
> I think this is actually much easier than the version number solution.
> The version string solution requires us to maintain a complicated mapping
> table between version strings and features.  In practice, we also have
> "internal versions" in ApiVersion.scala like 2.4IV0, 2.4IV1, and so on.
> This isn't simple for users to understand or use.
>
> It's also hard to know what the difference is between different version
> strings.  For example, there's actually no difference between 2.5IV0 and
> 2.4IV1, but you wouldn't know that unless you read the comments in
> ApiVersion.scala.  A system administrator who didn't know this might end up
> doing a cluster roll to upgrade the IBP that turned out to be unnecessary.
>
> best,
> Colin
>
>
> On Mon, Apr 6, 2020, at 12:06, Jun Rao wrote:
> > Hi, Kowshik,
> >
> > Thanks for the reply. A few more replies below.
> >
> > 100.6 You can look for the sentence "This operation requires ALTER on
> > CLUSTER." in KIP-455. Also, you can check its usage in
> > KafkaApis.authorize().
> >
> > 110. From the external client/tooling perspective, it's more natural to
> use
> > the release version for features. If we can use the same release version
> > for internal representation, it seems simpler (easier to understand, no
> > mapping overhead, etc). Is there a benefit with separate external and
> > internal versioning schemes?
> >
> > 111. To put this in context, when we had IBP, the default value is the
> > current released version. So, if you are a brand new user, you don't need
> > to configure IBP and all new features will be immediately available in
> the
> > new cluster. If you are upgrading from an old version, you do need to
> > understand and configure IBP. I see a similar pattern here for
> > features. From the ease of use perspective, ideally, we shouldn't
> require a
> > new user to have an extra step such as running a bootstrap script unless
> > it's truly necessary. If someone has a special need (all the cases you
> > mentioned seem special cases?), they can configure a mode such that
> > features are enabled/disabled manually.
> >
> > Jun
> >
> > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kp...@confluent.io>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the feedback and suggestions. Please find my response below.
> > >
> > > > 100.6 For every new request, the admin needs to control who is
> allowed to
> > > > issue that request if security is enabled. So, we need to assign the
> new
> > > > request a ResourceType and possible AclOperations. See
> > > >
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > as an example.
> > >
> > > (Kowshik): I don't see any reference to the words ResourceType or
> > > AclOperations
> > > in the KIP. Please let me know how I can use the KIP that you linked to
> > > figure out how to
> > > set up the appropriate ResourceType and/or ClusterOperation?
> > >
> > > > 105. If we change delete to disable, it's better to do this
> consistently
> > > in
> > > > request protocol and admin api as well.
> > >
> > > (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> > > feature.
> > > I've just changed the KIP to use 'delete'. I don't have a strong
> > > preference.
> > >
> > > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> possible
> > > > for new features to be included in minor releases too. Should we
> make the
> > > > feature versioning match the release versioning?
> > >
> > > (Kowshik): The release version can be mapped to a set of feature
> versions,
> > > and this can be done, for example in the tool (or even external to the
> > > tool).
> > > Can you please clarify what I'm missing?
> > >
> > > > 111. "During regular operations, the data in the ZK node can be
> mutated
> > > > only via a specific admin API served only by the controller." I am
> > > > wondering why can't the controller auto finalize a feature version
> after
> > > > all brokers are upgraded? For new users who download the latest
> version
> > > to
> > > > build a new cluster, it's inconvenient for them to have to manually
> > > enable
> > > > each feature.
> > >
> > > (Kowshik): I agree that there is a trade-off here, but it will help
> > > to decide whether the automation can be thought through in the future
> > > in a follow up KIP, or right now in this KIP. We may invest
> > > in automation, but we have to decide whether we should do it
> > > now or later.
> > >
> > > For the inconvenience that you mentioned, do you think the problem
> > > can be overcome by asking the cluster operator to run a
> > > bootstrap script when he/she knows that a specific AK release has been
> > > almost completely deployed in a cluster for the first time? The idea is
> > > that the
> > > bootstrap script will know how to map a specific AK release to
> finalized
> > > feature versions, and run the `kafka-features.sh` tool appropriately
> > > against
> > > the cluster.
> > >
> > > Now, coming back to your automation proposal/question.
> > > I do see the value of automated feature version finalization, but I
> also
> > > see
> > > that this will open up several questions and some risks, as explained
> > > below.
> > > The answers to these depend on the definition of the automation we
> choose
> > > to build, and how well does it fit into a kafka deployment.
> > > Basically, it can be unsafe for the controller to finalize feature
> version
> > > upgrades automatically, without learning about the intent of the
> cluster
> > > operator.
> > > 1. We would sometimes want to lock feature versions only when we have
> > > externally verified
> > > the stability of the broker binary.
> > > 2. Sometimes only the cluster operator knows that a cluster upgrade is
> > > complete,
> > > and new brokers are highly unlikely to join the cluster.
> > > 3. Only the cluster operator knows that the intent is to deploy the
> same
> > > version
> > > of the new broker release across the entire cluster (i.e. the latest
> > > downloaded version).
> > > 4. For downgrades, it appears the controller still needs some external
> > > input
> > > (such as the proposed tool) to finalize a feature version downgrade.
> > >
> > > If we have automation, that automation can end up failing in some of
> the
> > > cases
> > > above. Then, we need a way to declare that the cluster is "not ready"
> if
> > > the
> > > controller cannot automatically finalize some basic required feature
> > > version
> > > upgrades across the cluster. We need to make the cluster operator
> aware in
> > > such a scenario (raise an alert or alike).
> > >
> > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> instead of
> > > 48.
> > >
> > > (Kowshik): Done.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Kowshik,
> > > >
> > > > Thanks for the reply. A few more comments below.
> > > >
> > > > 100.6 For every new request, the admin needs to control who is
> allowed to
> > > > issue that request if security is enabled. So, we need to assign the
> new
> > > > request a ResourceType and possible AclOperations. See
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > as
> > > > an example.
> > > >
> > > > 105. If we change delete to disable, it's better to do this
> consistently
> > > in
> > > > request protocol and admin api as well.
> > > >
> > > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> possible
> > > > for new features to be included in minor releases too. Should we
> make the
> > > > feature versioning match the release versioning?
> > > >
> > > > 111. "During regular operations, the data in the ZK node can be
> mutated
> > > > only via a specific admin API served only by the controller." I am
> > > > wondering why can't the controller auto finalize a feature version
> after
> > > > all brokers are upgraded? For new users who download the latest
> version
> > > to
> > > > build a new cluster, it's inconvenient for them to have to manually
> > > enable
> > > > each feature.
> > > >
> > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> instead of
> > > > 48.
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> kprakasam@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Jun,
> > > > >
> > > > > Thanks a lot for the great feedback! Please note that the design
> > > > > has changed a little bit on the KIP, and we now propagate the
> finalized
> > > > > features metadata only via ZK watches (instead of
> UpdateMetadataRequest
> > > > > from the controller).
> > > > >
> > > > > Please find below my response to your questions/feedback, with the
> > > prefix
> > > > > "(Kowshik):".
> > > > >
> > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > 100.1 Since this request waits for responses from brokers,
> should we
> > > > add
> > > > > a
> > > > > > timeout in the request (like createTopicRequest)?
> > > > >
> > > > > (Kowshik): Great point! Done. I have added a timeout field. Note:
> we no
> > > > > longer
> > > > > wait for responses from brokers, since the design has been changed
> so
> > > > that
> > > > > the
> > > > > features information is propagated via ZK. Nevertheless, it is
> right to
> > > > > have a timeout
> > > > > for the request.
> > > > >
> > > > > > 100.2 The response schema is a bit weird. Typically, the response
> > > just
> > > > > > shows an error code and an error message, instead of echoing the
> > > > request.
> > > > >
> > > > > (Kowshik): Great point! Yeah, I have modified it to just return an
> > > error
> > > > > code and a message.
> > > > > Previously it was not echoing the "request", rather it was
> returning
> > > the
> > > > > latest set of
> > > > > cluster-wide finalized features (after applying the updates). But
> you
> > > are
> > > > > right,
> > > > > the additional info is not required, so I have removed it from the
> > > > response
> > > > > schema.
> > > > >
> > > > > > 100.3 Should we add a separate request to list/describe the
> existing
> > > > > > features?
> > > > >
> > > > > (Kowshik): This is already present in the KIP via the
> > > 'DescribeFeatures'
> > > > > Admin API,
> > > > > which, underneath covers uses the ApiVersionsRequest to
> list/describe
> > > the
> > > > > existing features. Please read the 'Tooling support' section.
> > > > >
> > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> request. For
> > > > > > DELETE, the version field doesn't make sense. So, I guess the
> broker
> > > > just
> > > > > > ignores this? An alternative way is to have a separate
> > > > > DeleteFeaturesRequest
> > > > >
> > > > > (Kowshik): Great point! I have modified the KIP now to have 2
> separate
> > > > > controller APIs
> > > > > serving these different purposes:
> > > > > 1. updateFeatures
> > > > > 2. deleteFeatures
> > > > >
> > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > increasing
> > > > > > version of the metadata for finalized features." I am wondering
> why
> > > the
> > > > > > ordering is important?
> > > > >
> > > > > (Kowshik): In the latest KIP write-up, it is called epoch (instead
> of
> > > > > version), and
> > > > > it is just the ZK node version. Basically, this is the epoch for
> the
> > > > > cluster-wide
> > > > > finalized feature version metadata. This metadata is served to
> clients
> > > > via
> > > > > the
> > > > > ApiVersionsResponse (for reads). We propagate updates from the
> > > > '/features'
> > > > > ZK node
> > > > > to all brokers, via ZK watches setup by each broker on the
> '/features'
> > > > > node.
> > > > >
> > > > > Now here is why the ordering is important:
> > > > > ZK watches don't propagate at the same time. As a result, the
> > > > > ApiVersionsResponse
> > > > > is eventually consistent across brokers. This can introduce cases
> > > > > where clients see an older lower epoch of the features metadata,
> after
> > > a
> > > > > more recent
> > > > > higher epoch was returned at a previous point in time. We expect
> > > clients
> > > > > to always employ the rule that the latest received higher epoch of
> > > > metadata
> > > > > always trumps an older smaller epoch. Those clients that are
> external
> > > to
> > > > > Kafka should strongly consider discovering the latest metadata once
> > > > during
> > > > > startup from the brokers, and if required refresh the metadata
> > > > periodically
> > > > > (to get the latest metadata).
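
(Editorial illustration, not part of the KIP.) The client-side rule described above,
that a higher epoch of finalized-features metadata always trumps a lower one, can be
sketched as follows; the type and field names are placeholders rather than the KIP's
exact classes:

    import java.util.Map;

    public class FeaturesEpochRule {
        // Placeholder shape for the finalized-features metadata a client caches.
        record FinalizedFeatures(Map<String, Long> finalizedMaxVersions, long epoch) { }

        // Keep whichever copy carries the higher epoch; a stale (lower-epoch) copy is ignored.
        static FinalizedFeatures merge(FinalizedFeatures current, FinalizedFeatures received) {
            return received.epoch() > current.epoch() ? received : current;
        }
    }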
> > > > >
> > > > > > 100.6 Could you specify the required ACL for this new request?
> > > > >
> > > > > (Kowshik): What is ACL, and how could I find out which one to
> specify?
> > > > > Please could you provide me some pointers? I'll be glad to update
> the
> > > > > KIP once I know the next steps.
> > > > >
> > > > > > 101. For the broker registration ZK node, should we bump up the
> > > version
> > > > > in
> > > > > the json?
> > > > >
> > > > > (Kowshik): Great point! Done. I've increased the version in the
> broker
> > > > json
> > > > > by 1.
> > > > >
> > > > > > 102. For the /features ZK node, not sure if we need the epoch
> field.
> > > > Each
> > > > > > ZK node has an internal version field that is incremented on
> every
> > > > > update.
> > > > >
> > > > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> > > instead
> > > > of
> > > > > explicitly
> > > > > incremented epoch.
> > > > >
> > > > > > 103. "Enabling the actual semantics of a feature version
> cluster-wide
> > > > is
> > > > > > left to the discretion of the logic implementing the feature
> (ex: can
> > > > be
> > > > > > done via dynamic broker config)." Does that mean the broker
> > > > registration
> > > > > ZK
> > > > > > node will be updated dynamically when this happens?
> > > > >
> > > > > (Kowshik): Not really. The text was just conveying that a broker
> could
> > > > > "know" of
> > > > > a new feature version, but it does not mean the broker should have
> also
> > > > > activated the effects of the feature version. Knowing vs activation
> > > are 2
> > > > > separate things,
> > > > > and the latter can be achieved by dynamic config. I have reworded
> the
> > > > text
> > > > > to
> > > > > make this clear to the reader.
> > > > >
> > > > >
> > > > > > 104. UpdateMetadataRequest
> > > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > > included
> > > > > > in the request. My understanding is that it's only included if
> (1)
> > > > there
> > > > > is
> > > > > > a change to the finalized feature; (2) broker restart; (3)
> controller
> > > > > > failover.
> > > > > > 104.2 The new fields have the following versions. Why are the
> > > versions
> > > > 3+
> > > > > > when the top version is bumped to 6?
> > > > > >       "fields":  [
> > > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > > >           "about": "The name of the feature."},
> > > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > > >           "about": "The finalized version for the feature."}
> > > > > >       ]
> > > > >
> > > > > (Kowshik): With the new improved design, we have completely
> eliminated
> > > > the
> > > > > need to
> > > > > use UpdateMetadataRequest. This is because we now rely on ZK to
> deliver
> > > > the
> > > > > notifications for changes to the '/features' ZK node.
> > > > >
> > > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> it's
> > > > > better
> > > > > > to use enable/disable?
> > > > >
> > > > > (Kowshik): For delete, yes, I have changed it so that we instead
> call
> > > it
> > > > > 'disable'.
> > > > > However for 'update', it can now also refer to either an upgrade
> or a
> > > > > forced downgrade.
> > > > > Therefore, I have left it the way it is, just calling it as just
> > > > 'update'.
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Kowshik,
> > > > > >
> > > > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > > > >
> > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > 100.1 Since this request waits for responses from brokers,
> should we
> > > > add
> > > > > a
> > > > > > timeout in the request (like createTopicRequest)?
> > > > > > 100.2 The response schema is a bit weird. Typically, the response
> > > just
> > > > > > shows an error code and an error message, instead of echoing the
> > > > request.
> > > > > > 100.3 Should we add a separate request to list/describe the
> existing
> > > > > > features?
> > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> request. For
> > > > > > DELETE, the version field doesn't make sense. So, I guess the
> broker
> > > > just
> > > > > > ignores this? An alternative way is to have a separate
> > > > > > DeleteFeaturesRequest
> > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > increasing
> > > > > > version of the metadata for finalized features." I am wondering
> why
> > > the
> > > > > > ordering is important?
> > > > > > 100.6 Could you specify the required ACL for this new request?
> > > > > >
> > > > > > 101. For the broker registration ZK node, should we bump up the
> > > version
> > > > > in
> > > > > > the json?
> > > > > >
> > > > > > 102. For the /features ZK node, not sure if we need the epoch
> field.
> > > > Each
> > > > > > ZK node has an internal version field that is incremented on
> every
> > > > > update.
> > > > > >
> > > > > > 103. "Enabling the actual semantics of a feature version
> cluster-wide
> > > > is
> > > > > > left to the discretion of the logic implementing the feature
> (ex: can
> > > > be
> > > > > > done via dynamic broker config)." Does that mean the broker
> > > > registration
> > > > > ZK
> > > > > > node will be updated dynamically when this happens?
> > > > > >
> > > > > > 104. UpdateMetadataRequest
> > > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > > included
> > > > > > in the request. My understanding is that it's only included if
> (1)
> > > > there
> > > > > is
> > > > > > a change to the finalized feature; (2) broker restart; (3)
> controller
> > > > > > failover.
> > > > > > 104.2 The new fields have the following versions. Why are the
> > > versions
> > > > 3+
> > > > > > when the top version is bumped to 6?
> > > > > >       "fields":  [
> > > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > > >           "about": "The name of the feature."},
> > > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > > >           "about": "The finalized version for the feature."}
> > > > > >       ]
> > > > > >
> > > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> it's
> > > > > better
> > > > > > to use enable/disable?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > > kprakasam@confluent.io
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Boyang,
> > > > > > >
> > > > > > > Thanks for the great feedback! I have updated the KIP based on
> your
> > > > > > > feedback.
> > > > > > > Please find my response below for your comments, look for
> sentences
> > > > > > > starting
> > > > > > > with "(Kowshik)" below.
> > > > > > >
> > > > > > >
> > > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > > traffic"
> > > > > > could
> > > > > > > be
> > > > > > > > converted as "When is it safe for the brokers to start
> serving
> > > new
> > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> earlier
> > > in
> > > > > the
> > > > > > > > context.
> > > > > > >
> > > > > > > (Kowshik): Great point! Done.
> > > > > > >
> > > > > > > > 2. In the *Explanation *section, the metadata version number
> part
> > > > > > seems a
> > > > > > > > bit blurred. Could you point a reference to later section
> that we
> > > > > going
> > > > > > > to
> > > > > > > > store it in Zookeeper and update it every time when there is
> a
> > > > > feature
> > > > > > > > change?
> > > > > > >
> > > > > > > (Kowshik): Great point! Done. I've added a reference in the
> KIP.
> > > > > > >
> > > > > > >
> > > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > > KIP,
> > > > > for
> > > > > > > > features such as group coordinator semantics, there is no
> legal
> > > > > > scenario
> > > > > > > to
> > > > > > > > perform a downgrade at all. So having downgrade door open is
> > > pretty
> > > > > > > > error-prone as human faults happen all the time. I'm
> assuming as
> > > > new
> > > > > > > > features are implemented, it's not very hard to add a flag
> during
> > > > > > feature
> > > > > > > > creation to indicate whether this feature is "downgradable".
> > > Could
> > > > > you
> > > > > > > > explain a bit more on the extra engineering effort for
> shipping
> > > > this
> > > > > > KIP
> > > > > > > > with downgrade protection in place?
> > > > > > >
> > > > > > > (Kowshik): Great point! I'd agree and disagree here. While I
> agree
> > > > that
> > > > > > > accidental
> > > > > > > downgrades can cause problems, I also think sometimes
> downgrades
> > > > should
> > > > > > > be allowed for emergency reasons (not all downgrades cause
> issues).
> > > > > > > It is just subjective to the feature being downgraded.
> > > > > > >
> > > > > > > To be more strict about feature version downgrades, I have
> modified
> > > > the
> > > > > > KIP
> > > > > > > proposing that we mandate a `--force-downgrade` flag be used
> in the
> > > > > > > UPDATE_FEATURES api
> > > > > > > and the tooling, whenever the human is downgrading a finalized
> > > > feature
> > > > > > > version.
> > > > > > > Hopefully this should cover the requirement, until we find the
> need
> > > > for
> > > > > > > advanced downgrade support.
> > > > > > >
> > > > > > > > 4. "Each broker’s supported dictionary of feature versions
> will
> > > be
> > > > > > > defined
> > > > > > > > in the broker code." So this means in order to restrict a
> certain
> > > > > > > feature,
> > > > > > > > we need to start the broker first and then send a feature
> gating
> > > > > > request
> > > > > > > > immediately, which introduces a time gap and the
> > > intended-to-close
> > > > > > > feature
> > > > > > > > could actually serve request during this phase. Do you think
> we
> > > > > should
> > > > > > > also
> > > > > > > > support configurations as well so that admin user could
> freely
> > > roll
> > > > > up
> > > > > > a
> > > > > > > > cluster with all nodes complying the same feature gating,
> without
> > > > > > > worrying
> > > > > > > > about the turnaround time to propagate the message only
> after the
> > > > > > cluster
> > > > > > > > starts up?
> > > > > > >
> > > > > > > (Kowshik): This is a great point/question. One of the
> expectations
> > > > out
> > > > > of
> > > > > > > this KIP, which is
> > > > > > > already followed in the broker, is the following.
> > > > > > >  - Imagine at time T1 the broker starts up and registers it’s
> > > > presence
> > > > > in
> > > > > > > ZK,
> > > > > > >    along with advertising it’s supported features.
> > > > > > >  - Imagine at a future time T2 the broker receives the
> > > > > > > UpdateMetadataRequest
> > > > > > >    from the controller, which contains the latest finalized
> > > features
> > > > as
> > > > > > > seen by
> > > > > > >    the controller. The broker validates this data against it’s
> > > > > supported
> > > > > > > features to
> > > > > > >    make sure there is no mismatch (it will shutdown if there
> is an
> > > > > > > incompatibility).
> > > > > > >
> > > > > > > It is expected that during the time between the 2 events T1
> and T2,
> > > > the
> > > > > > > broker is
> > > > > > > almost a silent entity in the cluster. It does not add any
> value to
> > > > the
> > > > > > > cluster, or carry
> > > > > > > out any important broker activities. By “important”, I mean it
> is
> > > not
> > > > > > doing
> > > > > > > mutations
> > > > > > > on it’s persistence, not mutating critical in-memory state,
> won’t
> > > be
> > > > > > > serving
> > > > > > > produce/fetch requests. Note it doesn’t even know it’s assigned
> > > > > > partitions
> > > > > > > until
> > > > > > > it receives UpdateMetadataRequest from controller. Anything the
> > > > broker
> > > > > is
> > > > > > > doing up
> > > > > > > until this point is not damaging/useful.
> > > > > > >
> > > > > > > I’ve clarified the above in the KIP, see this new section:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > > .
> > > > > > >
> > > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > > Feature",
> > > > > > may
> > > > > > > be
> > > > > > > > I misunderstood something, I thought the features are
> defined in
> > > > > broker
> > > > > > > > code, so admin could not really create a new feature?
> > > > > > >
> > > > > > > (Kowshik): Great point! You understood this right. Here adding
> a
> > > > > feature
> > > > > > > means we are
> > > > > > > adding a cluster-wide finalized *max* version for a feature
> that
> > > was
> > > > > > > previously never finalized.
> > > > > > > I have clarified this in the KIP now.
> > > > > > >
> > > > > > > > 6. I think we need a separate error code like
> > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > to
> > > > > > > > reject a concurrent feature update request.
> > > > > > >
> > > > > > > (Kowshik): Great point! I have modified the KIP adding the
> above
> > > (see
> > > > > > > 'Tooling support -> Admin API changes').
> > > > > > >
> > > > > > > > 7. I think we haven't discussed the alternative solution to
> pass
> > > > the
> > > > > > > > feature information through Zookeeper. Is that mentioned in
> the
> > > KIP
> > > > > to
> > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > >
> > > > > > > (Kowshik): Nice question! The broker reads finalized feature
> info
> > > > > stored
> > > > > > in
> > > > > > > ZK,
> > > > > > > only during startup when it does a validation. When serving
> > > > > > > `ApiVersionsRequest`, the
> > > > > > > broker does not read this info from ZK directly. I'd imagine
> the
> > > risk
> > > > > is
> > > > > > > that it can increase
> > > > > > > the ZK read QPS which can be a bottleneck for the system.
> Today, in
> > > > > Kafka
> > > > > > > we use the
> > > > > > > controller to fan out ZK updates to brokers and we want to
> stick to
> > > > > that
> > > > > > > pattern to avoid
> > > > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > > > > >
> > > > > > > > 8. I was under the impression that user could configure a
> range
> > > of
> > > > > > > > supported versions, what's the trade-off for allowing single
> > > > > finalized
> > > > > > > > version only?
> > > > > > >
> > > > > > > (Kowshik): Great question! The finalized version of a feature
> > > > basically
> > > > > > > refers to
> > > > > > > the cluster-wide finalized feature "maximum" version. For
> example,
> > > if
> > > > > the
> > > > > > > 'group_coordinator' feature
> > > > > > > has the finalized version set to 10, then, it means that
> > > cluster-wide
> > > > > all
> > > > > > > versions upto v10 are
> > > > > > > supported for this feature. However, note that if some version
> (ex:
> > > > v0)
> > > > > > > gets deprecated
> > > > > > > for this feature, then we don’t convey that using this scheme
> (also
> > > > > > > supporting deprecation is a non-goal).
> > > > > > >
> > > > > > > (Kowshik): I’ve now modified the KIP at all points, refering to
> > > > > finalized
> > > > > > > feature "maximum" versions.
> > > > > > >
> > > > > > > > 9. One minor syntax fix: Note that here the "client" here
> may be
> > > a
> > > > > > > producer
> > > > > > >
> > > > > > > (Kowshik): Great point! Done.
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > > reluctanthero104@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Kowshik,
> > > > > > > >
> > > > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > > > >
> > > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > > traffic"
> > > > > > could
> > > > > > > be
> > > > > > > > converted as "When is it safe for the brokers to start
> serving
> > > new
> > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> earlier
> > > in
> > > > > the
> > > > > > > > context.
> > > > > > > >
> > > > > > > > 2. In the *Explanation *section, the metadata version number
> part
> > > > > > seems a
> > > > > > > > bit blurred. Could you point a reference to later section
> that we
> > > > > going
> > > > > > > to
> > > > > > > > store it in Zookeeper and update it every time when there is
> a
> > > > > feature
> > > > > > > > change?
> > > > > > > >
> > > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > > KIP,
> > > > > for
> > > > > > > > features such as group coordinator semantics, there is no
> legal
> > > > > > scenario
> > > > > > > to
> > > > > > > > perform a downgrade at all. So having downgrade door open is
> > > pretty
> > > > > > > > error-prone as human faults happen all the time. I'm
> assuming as
> > > > new
> > > > > > > > features are implemented, it's not very hard to add a flag
> during
> > > > > > feature
> > > > > > > > creation to indicate whether this feature is "downgradable".
> > > Could
> > > > > you
> > > > > > > > explain a bit more on the extra engineering effort for
> shipping
> > > > this
> > > > > > KIP
> > > > > > > > with downgrade protection in place?
> > > > > > > >
> > > > > > > > 4. "Each broker’s supported dictionary of feature versions
> will
> > > be
> > > > > > > defined
> > > > > > > > in the broker code." So this means in order to restrict a
> certain
> > > > > > > feature,
> > > > > > > > we need to start the broker first and then send a feature
> gating
> > > > > > request
> > > > > > > > immediately, which introduces a time gap and the
> > > intended-to-close
> > > > > > > feature
> > > > > > > > could actually serve request during this phase. Do you think
> we
> > > > > should
> > > > > > > also
> > > > > > > > support configurations as well so that admin user could
> freely
> > > roll
> > > > > up
> > > > > > a
> > > > > > > > cluster with all nodes complying the same feature gating,
> without
> > > > > > > worrying
> > > > > > > > about the turnaround time to propagate the message only
> after the
> > > > > > cluster
> > > > > > > > starts up?
> > > > > > > >
> > > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > > Feature",
> > > > > > may
> > > > > > > be
> > > > > > > > I misunderstood something, I thought the features are
> defined in
> > > > > broker
> > > > > > > > code, so admin could not really create a new feature?
> > > > > > > >
> > > > > > > > 6. I think we need a separate error code like
> > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > to
> > > > > > > > reject a concurrent feature update request.
> > > > > > > >
> > > > > > > > 7. I think we haven't discussed the alternative solution to
> pass
> > > > the
> > > > > > > > feature information through Zookeeper. Is that mentioned in
> the
> > > KIP
> > > > > to
> > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > >
> > > > > > > > 8. I was under the impression that user could configure a
> range
> > > of
> > > > > > > > supported versions, what's the trade-off for allowing single
> > > > > finalized
> > > > > > > > version only?
> > > > > > > >
> > > > > > > > 9. One minor syntax fix: Note that here the "client" here
> may be
> > > a
> > > > > > > producer
> > > > > > > >
> > > > > > > > Boyang
> > > > > > > >
> > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> cmccabe@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > > > > Hi Colin,
> > > > > > > > > >
> > > > > > > > > > Thanks for the feedback! I've changed the KIP to address
> your
> > > > > > > > > > suggestions.
> > > > > > > > > > Please find below my explanation. Here is a link to KIP
> 584:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > .
> > > > > > > > > >
> > > > > > > > > > 1. '__data_version__' is the version of the finalized
> feature
> > > > > > > metadata
> > > > > > > > > > (i.e. actual ZK node contents), while the
> > > '__schema_version__'
> > > > is
> > > > > > the
> > > > > > > > > > version of the schema of the data persisted in ZK. These
> > > serve
> > > > > > > > different
> > > > > > > > > > purposes. '__data_version__' is is useful mainly to
> clients
> > > > > during
> > > > > > > > reads,
> > > > > > > > > > to differentiate between the 2 versions of eventually
> > > > consistent
> > > > > > > > > 'finalized
> > > > > > > > > > features' metadata (i.e. larger metadata version is more
> > > > recent).
> > > > > > > > > > '__schema_version__' provides an additional degree of
> > > > > flexibility,
> > > > > > > > where
> > > > > > > > > if
> > > > > > > > > > we decide to change the schema for '/features' node in
> ZK (in
> > > > the
> > > > > > > > > future),
> > > > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > > > serialization/deserialization of the ZK data can be
> handled
> > > > > > safely).
> > > > > > > > >
> > > > > > > > > Hi Kowshik,
> > > > > > > > >
> > > > > > > > > If you're talking about a number that lets you know if
> data is
> > > > more
> > > > > > or
> > > > > > > > > less recent, we would typically call that an epoch, and
> not a
> > > > > > version.
> > > > > > > > For
> > > > > > > > > the ZK data structures, the word "version" is typically
> > > reserved
> > > > > for
> > > > > > > > > describing changes to the overall schema of the data that
> is
> > > > > written
> > > > > > to
> > > > > > > > > ZooKeeper.  We don't even really change the "version" of
> those
> > > > > > schemas
> > > > > > > > that
> > > > > > > > > much, since most changes are backwards-compatible.  But we
> do
> > > > > include
> > > > > > > > that
> > > > > > > > > version field just in case.
> > > > > > > > >
> > > > > > > > > I don't think we really need an epoch here, though, since
> we
> > > can
> > > > > just
> > > > > > > > look
> > > > > > > > > at the broker epoch.  Whenever the broker registers, its
> epoch
> > > > will
> > > > > > be
> > > > > > > > > greater than the previous broker epoch.  And the newly
> > > registered
> > > > > > data
> > > > > > > > will
> > > > > > > > > take priority.  This will be a lot simpler than adding a
> > > separate
> > > > > > epoch
> > > > > > > > > system, I think.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2. Regarding admin client needing min and max
> information -
> > > you
> > > > > are
> > > > > > > > > right!
> > > > > > > > > > I've changed the KIP such that the Admin API also allows
> the
> > > > user
> > > > > > to
> > > > > > > > read
> > > > > > > > > > 'supported features' from a specific broker. Please look
> at
> > > the
> > > > > > > section
> > > > > > > > > > "Admin API changes".
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > > > deliberate.
> > > > > > > I've
> > > > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > > > >
> > > > > > > > > Sounds good.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are
> right!
> > > > > I've
> > > > > > > > > updated
> > > > > > > > > > the KIP sketching the functionality provided by this
> tool,
> > > with
> > > > > > some
> > > > > > > > > > examples. Please look at the section "Tooling support
> > > > examples".
> > > > > > > > > >
> > > > > > > > > > Thank you!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks, Kowshik.
> > > > > > > > >
> > > > > > > > > cheers,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Kowshik
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > > > cmccabe@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > > >
> > > > > > > > > > > In the "Schema" section, do we really need both
> > > > > > __schema_version__
> > > > > > > > and
> > > > > > > > > > > __data_version__?  Can we just have a single version
> field
> > > > > here?
> > > > > > > > > > >
> > > > > > > > > > > Shouldn't the Admin(Client) function have some way to
> get
> > > the
> > > > > min
> > > > > > > and
> > > > > > > > > max
> > > > > > > > > > > information that we're exposing as well?  I guess we
> could
> > > > have
> > > > > > > min,
> > > > > > > > > max,
> > > > > > > > > > > and current.  Unrelated: is the use of Long rather than
> > > long
> > > > > > > > deliberate
> > > > > > > > > > > here?
> > > > > > > > > > >
> > > > > > > > > > > It would be good to describe how the command line tool
> > > > > > > > > > > kafka.admin.FeatureCommand will work.  For example the
> > > flags
> > > > > that
> > > > > > > it
> > > > > > > > > will
> > > > > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > > > > >
> > > > > > > > > > > cheers,
> > > > > > > > > > > Colin
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > > > > Hi all,
> > > > > > > > > > > >
> > > > > > > > > > > > I've opened KIP-584
> > > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > > >
> > > > > > > > > > > > which
> > > > > > > > > > > > is intended to provide a versioning scheme for
> features.
> > > > I'd
> > > > > > like
> > > > > > > > to
> > > > > > > > > use
> > > > > > > > > > > > this thread to discuss the same. I'd appreciate any
> > > > feedback
> > > > > on
> > > > > > > > this.
> > > > > > > > > > > > Here
> > > > > > > > > > > > is a link to KIP-584
> > > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > >  .
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you!
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Kowshik
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Colin McCabe <cm...@apache.org>.
Hi Jun,

I agree that asking the user to manually upgrade all features to the latest version is a burden.  Then the user has to know what the latest version of every feature is when upgrading.

What about a simple solution to this problem where we add a flag to the command-line tool like --enable-latest?  The command-line tool could query the highest possible version of each feature (using the API) and then make another RPC to enable the latest features.
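
To make that concrete, here is a rough sketch of what the tool could do under the hood.  Every class and method name below is made up for illustration; it is not the Admin API proposed in the KIP.

    // Illustrative sketch of an --enable-latest flow; not the KIP's API.
    import java.util.HashMap;
    import java.util.Map;

    public class EnableLatestSketch {

        // Stand-in for an ApiVersions/DescribeFeatures style call:
        // feature name -> max version supported by every broker in the cluster.
        static Map<String, Long> describeMaxSupportedVersions() {
            Map<String, Long> supported = new HashMap<>();
            supported.put("group_coordinator", 10L);
            supported.put("transaction_protocol", 3L);
            return supported;
        }

        // Stand-in for the RPC that asks the controller to finalize versions.
        static void updateFinalizedFeatures(Map<String, Long> targets) {
            targets.forEach((feature, version) ->
                System.out.println("finalize " + feature + " at max version " + version));
        }

        public static void main(String[] args) {
            // --enable-latest: finalize every feature at the highest version the
            // whole cluster supports, so the operator never needs the numbers.
            updateFinalizedFeatures(describeMaxSupportedVersions());
        }
    }

The point is that the operator expresses intent ("give me the latest") and the tool resolves the per-feature numbers.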

I think this is actually much easier than the version number solution.  The version string solution requires us to maintain a complicated mapping table between version strings and features.  In practice, we also have "internal versions" in ApiVersion.scala like 2.4IV0, 2.4IV1, and so on.  This isn't simple for users to understand or use.

It's also hard to know what the difference is between different version strings.  For example, there's actually no difference between 2.5IV0 and 2.4IV1, but you wouldn't know that unless you read the comments in ApiVersion.scala.  A system administrator who didn't know this might end up doing a cluster roll to upgrade the IBP that turned out to be unnecessary.

best,
Colin


On Mon, Apr 6, 2020, at 12:06, Jun Rao wrote:
> Hi, Kowshik,
> 
> Thanks for the reply. A few more replies below.
> 
> 100.6 You can look for the sentence "This operation requires ALTER on
> CLUSTER." in KIP-455. Also, you can check its usage in
> KafkaApis.authorize().
> 
> 110. From the external client/tooling perspective, it's more natural to use
> the release version for features. If we can use the same release version
> for internal representation, it seems simpler (easier to understand, no
> mapping overhead, etc). Is there a benefit with separate external and
> internal versioning schemes?
> 
> 111. To put this in context, when we had IBP, the default value is the
> current released version. So, if you are a brand new user, you don't need
> to configure IBP and all new features will be immediately available in the
> new cluster. If you are upgrading from an old version, you do need to
> understand and configure IBP. I see a similar pattern here for
> features. From the ease of use perspective, ideally, we shouldn't require a
> new user to have an extra step such as running a bootstrap script unless
> it's truly necessary. If someone has a special need (all the cases you
> mentioned seem special cases?), they can configure a mode such that
> features are enabled/disabled manually.
> 
> Jun
> 
> On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kp...@confluent.io>
> wrote:
> 
> > Hi Jun,
> >
> > Thanks for the feedback and suggestions. Please find my response below.
> >
> > > 100.6 For every new request, the admin needs to control who is allowed to
> > > issue that request if security is enabled. So, we need to assign the new
> > > request a ResourceType and possible AclOperations. See
> > >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > as an example.
> >
> > (Kowshik): I don't see any reference to the words ResourceType or
> > AclOperations
> > in the KIP. Please let me know how I can use the KIP that you linked to
> > figure out how to
> > set up the appropriate ResourceType and/or ClusterOperation?
> >
> > > 105. If we change delete to disable, it's better to do this consistently
> > in
> > > request protocol and admin api as well.
> >
> > (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> > feature.
> > I've just changed the KIP to use 'delete'. I don't have a strong
> > preference.
> >
> > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> > > for new features to be included in minor releases too. Should we make the
> > > feature versioning match the release versioning?
> >
> > (Kowshik): The release version can be mapped to a set of feature versions,
> > and this can be done, for example in the tool (or even external to the
> > tool).
> > Can you please clarify what I'm missing?
> >
> > > 111. "During regular operations, the data in the ZK node can be mutated
> > > only via a specific admin API served only by the controller." I am
> > > wondering why can't the controller auto finalize a feature version after
> > > all brokers are upgraded? For new users who download the latest version
> > to
> > > build a new cluster, it's inconvenient for them to have to manually
> > enable
> > > each feature.
> >
> > (Kowshik): I agree that there is a trade-off here, but it will help
> > to decide whether the automation can be thought through in the future
> > in a follow up KIP, or right now in this KIP. We may invest
> > in automation, but we have to decide whether we should do it
> > now or later.
> >
> > For the inconvenience that you mentioned, do you think the problem can be
> > overcome by asking the cluster operator to run a
> > bootstrap script when he/she knows that a specific AK release has been
> > almost completely deployed in a cluster for the first time? The idea is that
> > the
> > bootstrap script will know how to map a specific AK release to finalized
> > feature versions, and run the `kafka-features.sh` tool appropriately
> > against
> > the cluster.
> >
> > Now, coming back to your automation proposal/question.
> > I do see the value of automated feature version finalization, but I also
> > see
> > that this will open up several questions and some risks, as explained
> > below.
> > The answers to these depend on the definition of the automation we choose
> > to build, and how well it fits into a Kafka deployment.
> > Basically, it can be unsafe for the controller to finalize feature version
> > upgrades automatically, without learning about the intent of the cluster
> > operator.
> > 1. We would sometimes want to lock feature versions only when we have
> > externally verified
> > the stability of the broker binary.
> > 2. Sometimes only the cluster operator knows that a cluster upgrade is
> > complete,
> > and new brokers are highly unlikely to join the cluster.
> > 3. Only the cluster operator knows that the intent is to deploy the same
> > version
> > of the new broker release across the entire cluster (i.e. the latest
> > downloaded version).
> > 4. For downgrades, it appears the controller still needs some external
> > input
> > (such as the proposed tool) to finalize a feature version downgrade.
> >
> > If we have automation, that automation can end up failing in some of the
> > cases
> > above. Then, we need a way to declare that the cluster is "not ready" if
> > the
> > controller cannot automatically finalize some basic required feature
> > version
> > upgrades across the cluster. We need to make the cluster operator aware in
> > such a scenario (raise an alert or alike).
> >
> > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> > 48.
> >
> > (Kowshik): Done.
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Kowshik,
> > >
> > > Thanks for the reply. A few more comments below.
> > >
> > > 100.6 For every new request, the admin needs to control who is allowed to
> > > issue that request if security is enabled. So, we need to assign the new
> > > request a ResourceType and possible AclOperations. See
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > as
> > > an example.
> > >
> > > 105. If we change delete to disable, it's better to do this consistently
> > in
> > > request protocol and admin api as well.
> > >
> > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> > > for new features to be included in minor releases too. Should we make the
> > > feature versioning match the release versioning?
> > >
> > > 111. "During regular operations, the data in the ZK node can be mutated
> > > only via a specific admin API served only by the controller." I am
> > > wondering why can't the controller auto finalize a feature version after
> > > all brokers are upgraded? For new users who download the latest version
> > to
> > > build a new cluster, it's inconvenient for them to have to manually
> > enable
> > > each feature.
> > >
> > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> > > 48.
> > >
> > > Jun
> > >
> > >
> > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
> > > wrote:
> > >
> > > > Hey Jun,
> > > >
> > > > Thanks a lot for the great feedback! Please note that the design
> > > > has changed a little bit on the KIP, and we now propagate the finalized
> > > > features metadata only via ZK watches (instead of UpdateMetadataRequest
> > > > from the controller).
> > > >
> > > > Please find below my response to your questions/feedback, with the
> > prefix
> > > > "(Kowshik):".
> > > >
> > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > 100.1 Since this request waits for responses from brokers, should we
> > > add
> > > > a
> > > > > timeout in the request (like createTopicRequest)?
> > > >
> > > > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> > > > longer
> > > > wait for responses from brokers, since the design has been changed so
> > > that
> > > > the
> > > > features information is propagated via ZK. Nevertheless, it is right to
> > > > have a timeout
> > > > for the request.
> > > >
> > > > > 100.2 The response schema is a bit weird. Typically, the response
> > just
> > > > > shows an error code and an error message, instead of echoing the
> > > request.
> > > >
> > > > (Kowshik): Great point! Yeah, I have modified it to just return an
> > error
> > > > code and a message.
> > > > Previously it was not echoing the "request", rather it was returning
> > the
> > > > latest set of
> > > > cluster-wide finalized features (after applying the updates). But you
> > are
> > > > right,
> > > > the additional info is not required, so I have removed it from the
> > > response
> > > > schema.
> > > >
> > > > > 100.3 Should we add a separate request to list/describe the existing
> > > > > features?
> > > >
> > > > (Kowshik): This is already present in the KIP via the
> > 'DescribeFeatures'
> > > > Admin API,
> > > > which, under the covers, uses the ApiVersionsRequest to list/describe
> > the
> > > > existing features. Please read the 'Tooling support' section.
> > > >
> > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > > > DELETE, the version field doesn't make sense. So, I guess the broker
> > > just
> > > > > ignores this? An alternative way is to have a separate
> > > > DeleteFeaturesRequest
> > > >
> > > > (Kowshik): Great point! I have modified the KIP now to have 2 separate
> > > > controller APIs
> > > > serving these different purposes:
> > > > 1. updateFeatures
> > > > 2. deleteFeatures
> > > >
> > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > increasing
> > > > > version of the metadata for finalized features." I am wondering why
> > the
> > > > > ordering is important?
> > > >
> > > > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > > > version), and
> > > > it is just the ZK node version. Basically, this is the epoch for the
> > > > cluster-wide
> > > > finalized feature version metadata. This metadata is served to clients
> > > via
> > > > the
> > > > ApiVersionsResponse (for reads). We propagate updates from the
> > > '/features'
> > > > ZK node
> > > > to all brokers, via ZK watches setup by each broker on the '/features'
> > > > node.
> > > >
> > > > Now here is why the ordering is important:
> > > > ZK watches don't propagate at the same time. As a result, the
> > > > ApiVersionsResponse
> > > > is eventually consistent across brokers. This can introduce cases
> > > > where clients see an older lower epoch of the features metadata, after
> > a
> > > > more recent
> > > > higher epoch was returned at a previous point in time. We expect
> > clients
> > > > to always employ the rule that the latest received higher epoch of
> > > metadata
> > > > always trumps an older smaller epoch. Those clients that are external
> > to
> > > > Kafka should strongly consider discovering the latest metadata once
> > > during
> > > > startup from the brokers, and if required refresh the metadata
> > > periodically
> > > > (to get the latest metadata).
> > > >
> > > > > 100.6 Could you specify the required ACL for this new request?
> > > >
> > > > (Kowshik): What is ACL, and how could I find out which one to specify?
> > > > Please could you provide me some pointers? I'll be glad to update the
> > > > KIP once I know the next steps.
> > > >
> > > > > 101. For the broker registration ZK node, should we bump up the
> > version
> > > > in
> > > > the json?
> > > >
> > > > (Kowshik): Great point! Done. I've increased the version in the broker
> > > json
> > > > by 1.
> > > >
> > > > > 102. For the /features ZK node, not sure if we need the epoch field.
> > > Each
> > > > > ZK node has an internal version field that is incremented on every
> > > > update.
> > > >
> > > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> > instead
> > > of
> > > > explicitly
> > > > incremented epoch.
> > > >
> > > > > 103. "Enabling the actual semantics of a feature version cluster-wide
> > > is
> > > > > left to the discretion of the logic implementing the feature (ex: can
> > > be
> > > > > done via dynamic broker config)." Does that mean the broker
> > > registration
> > > > ZK
> > > > > node will be updated dynamically when this happens?
> > > >
> > > > (Kowshik): Not really. The text was just conveying that a broker could
> > > > "know" of
> > > > a new feature version, but it does not mean the broker should have also
> > > > activated the effects of the feature version. Knowing vs activation
> > are 2
> > > > separate things,
> > > > and the latter can be achieved by dynamic config. I have reworded the
> > > text
> > > > to
> > > > make this clear to the reader.
> > > >
> > > >
> > > > > 104. UpdateMetadataRequest
> > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > included
> > > > > in the request. My understanding is that it's only included if (1)
> > > there
> > > > is
> > > > > a change to the finalized feature; (2) broker restart; (3) controller
> > > > > failover.
> > > > > 104.2 The new fields have the following versions. Why are the
> > versions
> > > 3+
> > > > > when the top version is bumped to 6?
> > > > >       "fields":  [
> > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > >           "about": "The name of the feature."},
> > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > >           "about": "The finalized version for the feature."}
> > > > >       ]
> > > >
> > > > (Kowshik): With the new improved design, we have completely eliminated
> > > the
> > > > need to
> > > > use UpdateMetadataRequest. This is because we now rely on ZK to deliver
> > > the
> > > > notifications for changes to the '/features' ZK node.
> > > >
> > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > > > better
> > > > > to use enable/disable?
> > > >
> > > > (Kowshik): For delete, yes, I have changed it so that we instead call
> > it
> > > > 'disable'.
> > > > However for 'update', it can now also refer to either an upgrade or a
> > > > forced downgrade.
> > > > Therefore, I have left it the way it is, just calling it
> > > 'update'.
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Kowshik,
> > > > >
> > > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > > >
> > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > 100.1 Since this request waits for responses from brokers, should we
> > > add
> > > > a
> > > > > timeout in the request (like createTopicRequest)?
> > > > > 100.2 The response schema is a bit weird. Typically, the response
> > just
> > > > > shows an error code and an error message, instead of echoing the
> > > request.
> > > > > 100.3 Should we add a separate request to list/describe the existing
> > > > > features?
> > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > > > DELETE, the version field doesn't make sense. So, I guess the broker
> > > just
> > > > > ignores this? An alternative way is to have a separate
> > > > > DeleteFeaturesRequest
> > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > increasing
> > > > > version of the metadata for finalized features." I am wondering why
> > the
> > > > > ordering is important?
> > > > > 100.6 Could you specify the required ACL for this new request?
> > > > >
> > > > > 101. For the broker registration ZK node, should we bump up the
> > version
> > > > in
> > > > > the json?
> > > > >
> > > > > 102. For the /features ZK node, not sure if we need the epoch field.
> > > Each
> > > > > ZK node has an internal version field that is incremented on every
> > > > update.
> > > > >
> > > > > 103. "Enabling the actual semantics of a feature version cluster-wide
> > > is
> > > > > left to the discretion of the logic implementing the feature (ex: can
> > > be
> > > > > done via dynamic broker config)." Does that mean the broker
> > > registration
> > > > ZK
> > > > > node will be updated dynamically when this happens?
> > > > >
> > > > > 104. UpdateMetadataRequest
> > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > included
> > > > > in the request. My understanding is that it's only included if (1)
> > > there
> > > > is
> > > > > a change to the finalized feature; (2) broker restart; (3) controller
> > > > > failover.
> > > > > 104.2 The new fields have the following versions. Why are the
> > versions
> > > 3+
> > > > > when the top version is bumped to 6?
> > > > >       "fields":  [
> > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > >           "about": "The name of the feature."},
> > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > >           "about": "The finalized version for the feature."}
> > > > >       ]
> > > > >
> > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > > > better
> > > > > to use enable/disable?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > kprakasam@confluent.io
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hey Boyang,
> > > > > >
> > > > > > Thanks for the great feedback! I have updated the KIP based on your
> > > > > > feedback.
> > > > > > Please find my response below for your comments, look for sentences
> > > > > > starting
> > > > > > with "(Kowshik)" below.
> > > > > >
> > > > > >
> > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > traffic"
> > > > > could
> > > > > > be
> > > > > > > converted as "When is it safe for the brokers to start serving
> > new
> > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> > in
> > > > the
> > > > > > > context.
> > > > > >
> > > > > > (Kowshik): Great point! Done.
> > > > > >
> > > > > > > 2. In the *Explanation *section, the metadata version number part
> > > > > seems a
> > > > > > > bit blurred. Could you point a reference to later section that we
> > > > going
> > > > > > to
> > > > > > > store it in Zookeeper and update it every time when there is a
> > > > feature
> > > > > > > change?
> > > > > >
> > > > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > > > > >
> > > > > >
> > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > KIP,
> > > > for
> > > > > > > features such as group coordinator semantics, there is no legal
> > > > > scenario
> > > > > > to
> > > > > > > perform a downgrade at all. So having downgrade door open is
> > pretty
> > > > > > > error-prone as human faults happen all the time. I'm assuming as
> > > new
> > > > > > > features are implemented, it's not very hard to add a flag during
> > > > > feature
> > > > > > > creation to indicate whether this feature is "downgradable".
> > Could
> > > > you
> > > > > > > explain a bit more on the extra engineering effort for shipping
> > > this
> > > > > KIP
> > > > > > > with downgrade protection in place?
> > > > > >
> > > > > > (Kowshik): Great point! I'd agree and disagree here. While I agree
> > > that
> > > > > > accidental
> > > > > > downgrades can cause problems, I also think sometimes downgrades
> > > should
> > > > > > be allowed for emergency reasons (not all downgrades cause issues).
> > > > > > It is just subjective to the feature being downgraded.
> > > > > >
> > > > > > To be more strict about feature version downgrades, I have modified
> > > the
> > > > > KIP
> > > > > > proposing that we mandate a `--force-downgrade` flag be used in the
> > > > > > UPDATE_FEATURES api
> > > > > > and the tooling, whenever the human is downgrading a finalized
> > > feature
> > > > > > version.
> > > > > > Hopefully this should cover the requirement, until we find the need
> > > for
> > > > > > advanced downgrade support.
> > > > > >
> > > > > > > 4. "Each broker’s supported dictionary of feature versions will
> > be
> > > > > > defined
> > > > > > > in the broker code." So this means in order to restrict a certain
> > > > > > feature,
> > > > > > > we need to start the broker first and then send a feature gating
> > > > > request
> > > > > > > immediately, which introduces a time gap and the
> > intended-to-close
> > > > > > feature
> > > > > > > could actually serve request during this phase. Do you think we
> > > > should
> > > > > > also
> > > > > > > support configurations as well so that admin user could freely
> > roll
> > > > up
> > > > > a
> > > > > > > cluster with all nodes complying the same feature gating, without
> > > > > > worrying
> > > > > > > about the turnaround time to propagate the message only after the
> > > > > cluster
> > > > > > > starts up?
> > > > > >
> > > > > > (Kowshik): This is a great point/question. One of the expectations
> > > out
> > > > of
> > > > > > this KIP, which is
> > > > > > already followed in the broker, is the following.
> > > > > >  - Imagine at time T1 the broker starts up and registers its
> > > presence
> > > > in
> > > > > > ZK,
> > > > > >    along with advertising its supported features.
> > > > > >  - Imagine at a future time T2 the broker receives the
> > > > > > UpdateMetadataRequest
> > > > > >    from the controller, which contains the latest finalized
> > features
> > > as
> > > > > > seen by
> > > > > >    the controller. The broker validates this data against its
> > > > supported
> > > > > > features to
> > > > > >    make sure there is no mismatch (it will shutdown if there is an
> > > > > > incompatibility).
> > > > > >
> > > > > > It is expected that during the time between the 2 events T1 and T2,
> > > the
> > > > > > broker is
> > > > > > almost a silent entity in the cluster. It does not add any value to
> > > the
> > > > > > cluster, or carry
> > > > > > out any important broker activities. By “important”, I mean it is
> > not
> > > > > doing
> > > > > > mutations
> > > > > > on its persistence, not mutating critical in-memory state, won’t
> > be
> > > > > > serving
> > > > > > produce/fetch requests. Note it doesn’t even know its assigned
> > > > > partitions
> > > > > > until
> > > > > > it receives UpdateMetadataRequest from controller. Anything the
> > > broker
> > > > is
> > > > > > doing up
> > > > > > until this point is neither damaging nor useful.
> > > > > >
> > > > > > I’ve clarified the above in the KIP, see this new section:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > .
> > > > > >
> > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > Feature",
> > > > > may
> > > > > > be
> > > > > > > I misunderstood something, I thought the features are defined in
> > > > broker
> > > > > > > code, so admin could not really create a new feature?
> > > > > >
> > > > > > (Kowshik): Great point! You understood this right. Here adding a
> > > > feature
> > > > > > means we are
> > > > > > adding a cluster-wide finalized *max* version for a feature that
> > was
> > > > > > previously never finalized.
> > > > > > I have clarified this in the KIP now.
> > > > > >
> > > > > > > 6. I think we need a separate error code like
> > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > to
> > > > > > > reject a concurrent feature update request.
> > > > > >
> > > > > > (Kowshik): Great point! I have modified the KIP adding the above
> > (see
> > > > > > 'Tooling support -> Admin API changes').
> > > > > >
> > > > > > > 7. I think we haven't discussed the alternative solution to pass
> > > the
> > > > > > > feature information through Zookeeper. Is that mentioned in the
> > KIP
> > > > to
> > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > >
> > > > > > (Kowshik): Nice question! The broker reads finalized feature info
> > > > stored
> > > > > in
> > > > > > ZK,
> > > > > > only during startup when it does a validation. When serving
> > > > > > `ApiVersionsRequest`, the
> > > > > > broker does not read this info from ZK directly. I'd imagine the
> > risk
> > > > is
> > > > > > that it can increase
> > > > > > the ZK read QPS which can be a bottleneck for the system. Today, in
> > > > Kafka
> > > > > > we use the
> > > > > > controller to fan out ZK updates to brokers and we want to stick to
> > > > that
> > > > > > pattern to avoid
> > > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > > > >
> > > > > > > 8. I was under the impression that user could configure a range
> > of
> > > > > > > supported versions, what's the trade-off for allowing single
> > > > finalized
> > > > > > > version only?
> > > > > >
> > > > > > (Kowshik): Great question! The finalized version of a feature
> > > basically
> > > > > > refers to
> > > > > > the cluster-wide finalized feature "maximum" version. For example,
> > if
> > > > the
> > > > > > 'group_coordinator' feature
> > > > > > has the finalized version set to 10, then, it means that
> > cluster-wide
> > > > all
> > > > > > versions up to v10 are
> > > > > > supported for this feature. However, note that if some version (ex:
> > > v0)
> > > > > > gets deprecated
> > > > > > for this feature, then we don’t convey that using this scheme (also
> > > > > > supporting deprecation is a non-goal).
> > > > > >
> > > > > > (Kowshik): I’ve now modified the KIP at all points, referring to
> > > > finalized
> > > > > > feature "maximum" versions.
> > > > > >
> > > > > > > 9. One minor syntax fix: Note that here the "client" here may be
> > a
> > > > > > producer
> > > > > >
> > > > > > (Kowshik): Great point! Done.
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > reluctanthero104@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Kowshik,
> > > > > > >
> > > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > > >
> > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > traffic"
> > > > > could
> > > > > > be
> > > > > > > converted as "When is it safe for the brokers to start serving
> > new
> > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> > in
> > > > the
> > > > > > > context.
> > > > > > >
> > > > > > > 2. In the *Explanation *section, the metadata version number part
> > > > > seems a
> > > > > > > bit blurred. Could you point a reference to later section that we
> > > > going
> > > > > > to
> > > > > > > store it in Zookeeper and update it every time when there is a
> > > > feature
> > > > > > > change?
> > > > > > >
> > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > KIP,
> > > > for
> > > > > > > features such as group coordinator semantics, there is no legal
> > > > > scenario
> > > > > > to
> > > > > > > perform a downgrade at all. So having downgrade door open is
> > pretty
> > > > > > > error-prone as human faults happen all the time. I'm assuming as
> > > new
> > > > > > > features are implemented, it's not very hard to add a flag during
> > > > > feature
> > > > > > > creation to indicate whether this feature is "downgradable".
> > Could
> > > > you
> > > > > > > explain a bit more on the extra engineering effort for shipping
> > > this
> > > > > KIP
> > > > > > > with downgrade protection in place?
> > > > > > >
> > > > > > > 4. "Each broker’s supported dictionary of feature versions will
> > be
> > > > > > defined
> > > > > > > in the broker code." So this means in order to restrict a certain
> > > > > > feature,
> > > > > > > we need to start the broker first and then send a feature gating
> > > > > request
> > > > > > > immediately, which introduces a time gap and the
> > intended-to-close
> > > > > > feature
> > > > > > > could actually serve request during this phase. Do you think we
> > > > should
> > > > > > also
> > > > > > > support configurations as well so that admin user could freely
> > roll
> > > > up
> > > > > a
> > > > > > > cluster with all nodes complying the same feature gating, without
> > > > > > worrying
> > > > > > > about the turnaround time to propagate the message only after the
> > > > > cluster
> > > > > > > starts up?
> > > > > > >
> > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > Feature",
> > > > > may
> > > > > > be
> > > > > > > I misunderstood something, I thought the features are defined in
> > > > broker
> > > > > > > code, so admin could not really create a new feature?
> > > > > > >
> > > > > > > 6. I think we need a separate error code like
> > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > to
> > > > > > > reject a concurrent feature update request.
> > > > > > >
> > > > > > > 7. I think we haven't discussed the alternative solution to pass
> > > the
> > > > > > > feature information through Zookeeper. Is that mentioned in the
> > KIP
> > > > to
> > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > >
> > > > > > > 8. I was under the impression that user could configure a range
> > of
> > > > > > > supported versions, what's the trade-off for allowing single
> > > > finalized
> > > > > > > version only?
> > > > > > >
> > > > > > > 9. One minor syntax fix: Note that here the "client" here may be
> > a
> > > > > > producer
> > > > > > >
> > > > > > > Boyang
> > > > > > >
> > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmccabe@apache.org
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > > > Hi Colin,
> > > > > > > > >
> > > > > > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > > > > > suggestions.
> > > > > > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > .
> > > > > > > > >
> > > > > > > > > 1. '__data_version__' is the version of the finalized feature
> > > > > > metadata
> > > > > > > > > (i.e. actual ZK node contents), while the
> > '__schema_version__'
> > > is
> > > > > the
> > > > > > > > > version of the schema of the data persisted in ZK. These
> > serve
> > > > > > > different
> > > > > > > > > purposes. '__data_version__' is useful mainly to clients
> > > > during
> > > > > > > reads,
> > > > > > > > > to differentiate between the 2 versions of eventually
> > > consistent
> > > > > > > > 'finalized
> > > > > > > > > features' metadata (i.e. larger metadata version is more
> > > recent).
> > > > > > > > > '__schema_version__' provides an additional degree of
> > > > flexibility,
> > > > > > > where
> > > > > > > > if
> > > > > > > > > we decide to change the schema for '/features' node in ZK (in
> > > the
> > > > > > > > future),
> > > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > > serialization/deserialization of the ZK data can be handled
> > > > > safely).
> > > > > > > >
> > > > > > > > Hi Kowshik,
> > > > > > > >
> > > > > > > > If you're talking about a number that lets you know if data is
> > > more
> > > > > or
> > > > > > > > less recent, we would typically call that an epoch, and not a
> > > > > version.
> > > > > > > For
> > > > > > > > the ZK data structures, the word "version" is typically
> > reserved
> > > > for
> > > > > > > > describing changes to the overall schema of the data that is
> > > > written
> > > > > to
> > > > > > > > ZooKeeper.  We don't even really change the "version" of those
> > > > > schemas
> > > > > > > that
> > > > > > > > much, since most changes are backwards-compatible.  But we do
> > > > include
> > > > > > > that
> > > > > > > > version field just in case.
> > > > > > > >
> > > > > > > > I don't think we really need an epoch here, though, since we
> > can
> > > > just
> > > > > > > look
> > > > > > > > at the broker epoch.  Whenever the broker registers, its epoch
> > > will
> > > > > be
> > > > > > > > greater than the previous broker epoch.  And the newly
> > registered
> > > > > data
> > > > > > > will
> > > > > > > > take priority.  This will be a lot simpler than adding a
> > separate
> > > > > epoch
> > > > > > > > system, I think.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 2. Regarding admin client needing min and max information -
> > you
> > > > are
> > > > > > > > right!
> > > > > > > > > I've changed the KIP such that the Admin API also allows the
> > > user
> > > > > to
> > > > > > > read
> > > > > > > > > 'supported features' from a specific broker. Please look at
> > the
> > > > > > section
> > > > > > > > > "Admin API changes".
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > > deliberate.
> > > > > > I've
> > > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > > >
> > > > > > > > Sounds good.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> > > > I've
> > > > > > > > updated
> > > > > > > > > the KIP sketching the functionality provided by this tool,
> > with
> > > > > some
> > > > > > > > > examples. Please look at the section "Tooling support
> > > examples".
> > > > > > > > >
> > > > > > > > > Thank you!
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks, Kowshik.
> > > > > > > >
> > > > > > > > cheers,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Kowshik
> > > > > > > > >
> > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > > cmccabe@apache.org>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > >
> > > > > > > > > > In the "Schema" section, do we really need both
> > > > > __schema_version__
> > > > > > > and
> > > > > > > > > > __data_version__?  Can we just have a single version field
> > > > here?
> > > > > > > > > >
> > > > > > > > > > Shouldn't the Admin(Client) function have some way to get
> > the
> > > > min
> > > > > > and
> > > > > > > > max
> > > > > > > > > > information that we're exposing as well?  I guess we could
> > > have
> > > > > > min,
> > > > > > > > max,
> > > > > > > > > > and current.  Unrelated: is the use of Long rather than
> > long
> > > > > > > deliberate
> > > > > > > > > > here?
> > > > > > > > > >
> > > > > > > > > > It would be good to describe how the command line tool
> > > > > > > > > > kafka.admin.FeatureCommand will work.  For example the
> > flags
> > > > that
> > > > > > it
> > > > > > > > will
> > > > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > > > >
> > > > > > > > > > cheers,
> > > > > > > > > > Colin
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > I've opened KIP-584
> > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > >
> > > > > > > > > > > which
> > > > > > > > > > > is intended to provide a versioning scheme for features.
> > > I'd
> > > > > like
> > > > > > > to
> > > > > > > > use
> > > > > > > > > > > this thread to discuss the same. I'd appreciate any
> > > feedback
> > > > on
> > > > > > > this.
> > > > > > > > > > > Here
> > > > > > > > > > > is a link to KIP-584
> > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > >  .
> > > > > > > > > > >
> > > > > > > > > > > Thank you!
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Kowshik
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Jun Rao <ju...@confluent.io>.
Hi, Kowshik,

Thanks for the reply. A few more replies below.

100.6 You can look for the sentence "This operation requires ALTER on
CLUSTER." in KIP-455. Also, you can check its usage in
KafkaApis.authorize().
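
To make the check concrete, here is an illustrative sketch in Java of the ALTER-on-CLUSTER gate such a request handler would need.  The real broker code looks different, so treat every name below as hypothetical.

    // Illustrative only: shows the shape of the ALTER-on-CLUSTER check that a
    // new UpdateFeatures/DeleteFeatures request handler would perform.
    public class UpdateFeaturesAuthSketch {

        enum AclOperation { ALTER, DESCRIBE }
        enum ResourceType { CLUSTER, TOPIC }

        interface Authorizer {
            boolean authorize(String principal, AclOperation op, ResourceType resource);
        }

        static void handleUpdateFeaturesRequest(Authorizer authorizer, String principal) {
            if (!authorizer.authorize(principal, AclOperation.ALTER, ResourceType.CLUSTER)) {
                throw new SecurityException("CLUSTER_AUTHORIZATION_FAILED for " + principal);
            }
            System.out.println("principal " + principal + " may finalize feature versions");
        }

        public static void main(String[] args) {
            // Toy authorizer that only lets the admin principal through.
            Authorizer allowAdminOnly = (principal, op, resource) ->
                "admin".equals(principal) && op == AclOperation.ALTER
                    && resource == ResourceType.CLUSTER;
            handleUpdateFeaturesRequest(allowAdminOnly, "admin");
        }
    }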

110. From the external client/tooling perspective, it's more natural to use
the release version for features. If we can use the same release version
for internal representation, it seems simpler (easier to understand, no
mapping overhead, etc). Is there a benefit with separate external and
internal versioning schemes?
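
To show what the mapping overhead looks like, here is a small illustrative sketch; whether it lives in the tool or inside the broker, it has roughly this shape.  None of the feature names or version numbers below come from the KIP.

    // Illustrative release-version -> finalized-feature-versions table.
    import java.util.Map;

    public class ReleaseToFeatureMapSketch {

        static final Map<String, Map<String, Long>> RELEASE_TO_FEATURES = Map.of(
            "2.5", Map.of("group_coordinator", 1L, "transaction_protocol", 2L),
            "2.6", Map.of("group_coordinator", 2L, "transaction_protocol", 2L));

        public static void main(String[] args) {
            // A tool could look up the operator-supplied release and finalize
            // exactly those feature versions.
            System.out.println("2.6 finalizes: " + RELEASE_TO_FEATURES.get("2.6"));
        }
    }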

111. To put this in context, when we had IBP, the default value is the
current released version. So, if you are a brand new user, you don't need
to configure IBP and all new features will be immediately available in the
new cluster. If you are upgrading from an old version, you do need to
understand and configure IBP. I see a similar pattern here for
features. From the ease of use perspective, ideally, we shouldn't require a
new user to have an extra step such as running a bootstrap script unless
it's truly necessary. If someone has a special need (all the cases you
mentioned seem special cases?), they can configure a mode such that
features are enabled/disabled manually.

Jun

On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hi Jun,
>
> Thanks for the feedback and suggestions. Please find my response below.
>
> > 100.6 For every new request, the admin needs to control who is allowed to
> > issue that request if security is enabled. So, we need to assign the new
> > request a ResourceType and possible AclOperations. See
> >
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > as an example.
>
> (Kowshik): I don't see any reference to the words ResourceType or
> AclOperations
> in the KIP. Please let me know how I can use the KIP that you linked to
> figure out how to
> set up the appropriate ResourceType and/or ClusterOperation?
>
> > 105. If we change delete to disable, it's better to do this consistently
> in
> > request protocol and admin api as well.
>
> (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> feature.
> I've just changed the KIP to use 'delete'. I don't have a strong
> preference.
>
> > 110. The minVersion/maxVersion for features use int64. Currently, our
> > release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> > for new features to be included in minor releases too. Should we make the
> > feature versioning match the release versioning?
>
> (Kowshik): The release version can be mapped to a set of feature versions,
> and this can be done, for example in the tool (or even external to the
> tool).
> Can you please clarify what I'm missing?
>
> > 111. "During regular operations, the data in the ZK node can be mutated
> > only via a specific admin API served only by the controller." I am
> > wondering why can't the controller auto finalize a feature version after
> > all brokers are upgraded? For new users who download the latest version
> to
> > build a new cluster, it's inconvenient for them to have to manually
> enable
> > each feature.
>
> (Kowshik): I agree that there is a trade-off here, but it will help
> to decide whether the automation can be thought through in the future
> in a follow up KIP, or right now in this KIP. We may invest
> in automation, but we have to decide whether we should do it
> now or later.
>
> For the inconvenience that you mentioned, do you think the problem can be
> overcome by asking the cluster operator to run a
> bootstrap script when he/she knows that a specific AK release has been
> almost completely deployed in a cluster for the first time? The idea is that
> the
> bootstrap script will know how to map a specific AK release to finalized
> feature versions, and run the `kafka-features.sh` tool appropriately
> against
> the cluster.
>
> Now, coming back to your automation proposal/question.
> I do see the value of automated feature version finalization, but I also
> see
> that this will open up several questions and some risks, as explained
> below.
> The answers to these depend on the definition of the automation we choose
> to build, and how well it fits into a Kafka deployment.
> Basically, it can be unsafe for the controller to finalize feature version
> upgrades automatically, without learning about the intent of the cluster
> operator.
> 1. We would sometimes want to lock feature versions only when we have
> externally verified
> the stability of the broker binary.
> 2. Sometimes only the cluster operator knows that a cluster upgrade is
> complete,
> and new brokers are highly unlikely to join the cluster.
> 3. Only the cluster operator knows that the intent is to deploy the same
> version
> of the new broker release across the entire cluster (i.e. the latest
> downloaded version).
> 4. For downgrades, it appears the controller still needs some external
> input
> (such as the proposed tool) to finalize a feature version downgrade.
>
> If we have automation, that automation can end up failing in some of the
> cases
> above. Then, we need a way to declare that the cluster is "not ready" if
> the
> controller cannot automatically finalize some basic required feature
> version
> upgrades across the cluster. We need to make the cluster operator aware in
> such a scenario (raise an alert or alike).
>
> > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> 48.
>
> (Kowshik): Done.
>
>
> Cheers,
> Kowshik
>
> On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Kowshik,
> >
> > Thanks for the reply. A few more comments below.
> >
> > 100.6 For every new request, the admin needs to control who is allowed to
> > issue that request if security is enabled. So, we need to assign the new
> > request a ResourceType and possible AclOperations. See
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > as
> > an example.
> >
> > 105. If we change delete to disable, it's better to do this consistently
> in
> > request protocol and admin api as well.
> >
> > 110. The minVersion/maxVersion for features use int64. Currently, our
> > release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> > for new features to be included in minor releases too. Should we make the
> > feature versioning match the release versioning?
> >
> > 111. "During regular operations, the data in the ZK node can be mutated
> > only via a specific admin API served only by the controller." I am
> > wondering why can't the controller auto finalize a feature version after
> > all brokers are upgraded? For new users who download the latest version
> to
> > build a new cluster, it's inconvenient for them to have to manually
> enable
> > each feature.
> >
> > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> > 48.
> >
> > Jun
> >
> >
> > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
> > wrote:
> >
> > > Hey Jun,
> > >
> > > Thanks a lot for the great feedback! Please note that the design
> > > has changed a little bit on the KIP, and we now propagate the finalized
> > > features metadata only via ZK watches (instead of UpdateMetadataRequest
> > > from the controller).
> > >
> > > Please find below my response to your questions/feedback, with the
> prefix
> > > "(Kowshik):".
> > >
> > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > 100.1 Since this request waits for responses from brokers, should we
> > add
> > > a
> > > > timeout in the request (like createTopicRequest)?
> > >
> > > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> > > longer
> > > wait for responses from brokers, since the design has been changed so
> > that
> > > the
> > > features information is propagated via ZK. Nevertheless, it is right to
> > > have a timeout
> > > for the request.
> > >
> > > > 100.2 The response schema is a bit weird. Typically, the response
> just
> > > > shows an error code and an error message, instead of echoing the
> > request.
> > >
> > > (Kowshik): Great point! Yeah, I have modified it to just return an
> error
> > > code and a message.
> > > Previously it was not echoing the "request", rather it was returning
> the
> > > latest set of
> > > cluster-wide finalized features (after applying the updates). But you
> are
> > > right,
> > > the additional info is not required, so I have removed it from the
> > response
> > > schema.
> > >
> > > > 100.3 Should we add a separate request to list/describe the existing
> > > > features?
> > >
> > > (Kowshik): This is already present in the KIP via the
> 'DescribeFeatures'
> > > Admin API,
> > > which, under the covers, uses the ApiVersionsRequest to list/describe
> the
> > > existing features. Please read the 'Tooling support' section.
> > >
> > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > > DELETE, the version field doesn't make sense. So, I guess the broker
> > just
> > > > ignores this? An alternative way is to have a separate
> > > DeleteFeaturesRequest
> > >
> > > (Kowshik): Great point! I have modified the KIP now to have 2 separate
> > > controller APIs
> > > serving these different purposes:
> > > 1. updateFeatures
> > > 2. deleteFeatures
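
To illustrate the split described above, a rough sketch of the two admin-facing calls; the signatures are illustrative only and not the ones defined in the KIP.

    import java.util.Map;
    import java.util.Set;

    // Rough shape of the two controller-facing operations.
    interface FeatureAdminSketch {
        // Finalize (add, or bump/force-downgrade) the max version of each given feature.
        void updateFeatures(Map<String, Long> featureToMaxVersion, boolean forceDowngrade);

        // Remove the finalized entry for each given feature entirely.
        void deleteFeatures(Set<String> featureNames);
    }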
> > >
> > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> increasing
> > > > version of the metadata for finalized features." I am wondering why
> the
> > > > ordering is important?
> > >
> > > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > > version), and
> > > it is just the ZK node version. Basically, this is the epoch for the
> > > cluster-wide
> > > finalized feature version metadata. This metadata is served to clients
> > via
> > > the
> > > ApiVersionsResponse (for reads). We propagate updates from the
> > '/features'
> > > ZK node
> > > to all brokers, via ZK watches setup by each broker on the '/features'
> > > node.
> > >
> > > Now here is why the ordering is important:
> > > ZK watches don't propagate at the same time. As a result, the
> > > ApiVersionsResponse
> > > is eventually consistent across brokers. This can introduce cases
> > > where clients see an older lower epoch of the features metadata, after
> a
> > > more recent
> > > higher epoch was returned at a previous point in time. We expect
> clients
> > > to always employ the rule that the latest received higher epoch of
> > metadata
> > > always trumps an older smaller epoch. Those clients that are external
> to
> > > Kafka should strongly consider discovering the latest metadata once
> > during
> > > startup from the brokers, and if required refresh the metadata
> > periodically
> > > (to get the latest metadata).
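
To make the "higher epoch always trumps" rule concrete, here is a minimal client-side sketch; the class and method names are made up for illustration.

    // Clients keep only the newest epoch of finalized-features metadata and
    // drop anything older or equal, so delayed responses cannot roll it back.
    public class FinalizedFeaturesCache {

        private long currentEpoch = -1L;
        private java.util.Map<String, Long> finalizedFeatures = java.util.Map.of();

        // Returns true if the update was applied, false if it was stale.
        synchronized boolean maybeUpdate(long epoch, java.util.Map<String, Long> features) {
            if (epoch <= currentEpoch) {
                return false;                 // older or duplicate epoch: drop it
            }
            currentEpoch = epoch;
            finalizedFeatures = features;     // newer epoch always wins
            return true;
        }

        public static void main(String[] args) {
            FinalizedFeaturesCache cache = new FinalizedFeaturesCache();
            cache.maybeUpdate(5, java.util.Map.of("group_coordinator", 2L));
            // A delayed response carrying an older epoch must not overwrite it.
            boolean applied = cache.maybeUpdate(3, java.util.Map.of("group_coordinator", 1L));
            System.out.println("stale epoch applied? " + applied
                + ", kept: " + cache.finalizedFeatures);
        }
    }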
> > >
> > > > 100.6 Could you specify the required ACL for this new request?
> > >
> > > (Kowshik): What is ACL, and how could I find out which one to specify?
> > > Please could you provide me some pointers? I'll be glad to update the
> > > KIP once I know the next steps.
> > >
> > > > 101. For the broker registration ZK node, should we bump up the
> version
> > > in
> > > the json?
> > >
> > > (Kowshik): Great point! Done. I've increased the version in the broker
> > json
> > > by 1.
> > >
> > > > 102. For the /features ZK node, not sure if we need the epoch field.
> > Each
> > > > ZK node has an internal version field that is incremented on every
> > > update.
> > >
> > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> instead
> > of
> > > explicitly
> > > incremented epoch.
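
As a small illustration of that point (this is not code from the KIP), the plain ZooKeeper client already exposes the znode's own version, which can serve as this epoch:

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class FeaturesZnodeEpochSketch {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, null);
            Stat stat = new Stat();
            byte[] payload = zk.getData("/features", false, stat);
            // stat.getVersion() is bumped by ZooKeeper on every setData(),
            // i.e. on every finalized-feature change, so no separate counter
            // needs to be stored in the node contents.
            System.out.println("features epoch = " + stat.getVersion()
                + ", bytes = " + payload.length);
            zk.close();
        }
    }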
> > >
> > > > 103. "Enabling the actual semantics of a feature version cluster-wide
> > is
> > > > left to the discretion of the logic implementing the feature (ex: can
> > be
> > > > done via dynamic broker config)." Does that mean the broker
> > registration
> > > ZK
> > > > node will be updated dynamically when this happens?
> > >
> > > (Kowshik): Not really. The text was just conveying that a broker could
> > > "know" of
> > > a new feature version, but it does not mean the broker should have also
> > > activated the effects of the feature version. Knowing vs activation
> are 2
> > > separate things,
> > > and the latter can be achieved by dynamic config. I have reworded the
> > text
> > > to
> > > make this clear to the reader.
> > >
> > >
> > > > 104. UpdateMetadataRequest
> > > > 104.1 It would be useful to describe when the feature metadata is
> > > included
> > > > in the request. My understanding is that it's only included if (1)
> > there
> > > is
> > > > a change to the finalized feature; (2) broker restart; (3) controller
> > > > failover.
> > > > 104.2 The new fields have the following versions. Why are the
> versions
> > 3+
> > > > when the top version is bumped to 6?
> > > >       "fields":  [
> > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > >           "about": "The name of the feature."},
> > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > >           "about": "The finalized version for the feature."}
> > > >       ]
> > >
> > > (Kowshik): With the new improved design, we have completely eliminated
> > the
> > > need to
> > > use UpdateMetadataRequest. This is because we now rely on ZK to deliver
> > the
> > > notifications for changes to the '/features' ZK node.
> > >
> > > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > > better
> > > > to use enable/disable?
> > >
> > > (Kowshik): For delete, yes, I have changed it so that we instead call
> it
> > > 'disable'.
> > > However for 'update', it can now also refer to either an upgrade or a
> > > forced downgrade.
> > > Therefore, I have left it the way it is, just calling it
> > 'update'.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Kowshik,
> > > >
> > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > >
> > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > 100.1 Since this request waits for responses from brokers, should we
> > add
> > > a
> > > > timeout in the request (like createTopicRequest)?
> > > > 100.2 The response schema is a bit weird. Typically, the response
> just
> > > > shows an error code and an error message, instead of echoing the
> > request.
> > > > 100.3 Should we add a separate request to list/describe the existing
> > > > features?
> > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > > DELETE, the version field doesn't make sense. So, I guess the broker
> > just
> > > > ignores this? An alternative way is to have a separate
> > > > DeleteFeaturesRequest
> > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> increasing
> > > > version of the metadata for finalized features." I am wondering why
> the
> > > > ordering is important?
> > > > 100.6 Could you specify the required ACL for this new request?
> > > >
> > > > 101. For the broker registration ZK node, should we bump up the
> version
> > > in
> > > > the json?
> > > >
> > > > 102. For the /features ZK node, not sure if we need the epoch field.
> > Each
> > > > ZK node has an internal version field that is incremented on every
> > > update.
> > > >
> > > > 103. "Enabling the actual semantics of a feature version cluster-wide
> > is
> > > > left to the discretion of the logic implementing the feature (ex: can
> > be
> > > > done via dynamic broker config)." Does that mean the broker
> > registration
> > > ZK
> > > > node will be updated dynamically when this happens?
> > > >
> > > > 104. UpdateMetadataRequest
> > > > 104.1 It would be useful to describe when the feature metadata is
> > > included
> > > > in the request. My understanding is that it's only included if (1)
> > there
> > > is
> > > > a change to the finalized feature; (2) broker restart; (3) controller
> > > > failover.
> > > > 104.2 The new fields have the following versions. Why are the
> versions
> > 3+
> > > > when the top version is bumped to 6?
> > > >       "fields":  [
> > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > >           "about": "The name of the feature."},
> > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > >           "about": "The finalized version for the feature."}
> > > >       ]
> > > >
> > > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > > better
> > > > to use enable/disable?
> > > >
> > > > Jun
> > > >
> > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > kprakasam@confluent.io
> > > >
> > > > wrote:
> > > >
> > > > > Hey Boyang,
> > > > >
> > > > > Thanks for the great feedback! I have updated the KIP based on your
> > > > > feedback.
> > > > > Please find my response below for your comments, look for sentences
> > > > > starting
> > > > > with "(Kowshik)" below.
> > > > >
> > > > >
> > > > > > 1. "When is it safe for the brokers to begin handling EOS
> traffic"
> > > > could
> > > > > be
> > > > > > converted as "When is it safe for the brokers to start serving
> new
> > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> in
> > > the
> > > > > > context.
> > > > >
> > > > > (Kowshik): Great point! Done.
> > > > >
> > > > > > 2. In the *Explanation *section, the metadata version number part
> > > > seems a
> > > > > > bit blurred. Could you point a reference to later section that we
> > > going
> > > > > to
> > > > > > store it in Zookeeper and update it every time when there is a
> > > feature
> > > > > > change?
> > > > >
> > > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > > > >
> > > > >
> > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> KIP,
> > > for
> > > > > > features such as group coordinator semantics, there is no legal
> > > > scenario
> > > > > to
> > > > > > perform a downgrade at all. So having downgrade door open is
> pretty
> > > > > > error-prone as human faults happen all the time. I'm assuming as
> > new
> > > > > > features are implemented, it's not very hard to add a flag during
> > > > feature
> > > > > > creation to indicate whether this feature is "downgradable".
> Could
> > > you
> > > > > > explain a bit more on the extra engineering effort for shipping
> > this
> > > > KIP
> > > > > > with downgrade protection in place?
> > > > >
> > > > > (Kowshik): Great point! I'd agree and disagree here. While I agree
> > that
> > > > > accidental
> > > > > downgrades can cause problems, I also think sometimes downgrades
> > should
> > > > > be allowed for emergency reasons (not all downgrades cause issues).
> > > > > It is just subjective to the feature being downgraded.
> > > > >
> > > > > To be more strict about feature version downgrades, I have modified
> > the
> > > > KIP
> > > > > proposing that we mandate a `--force-downgrade` flag be used in the
> > > > > UPDATE_FEATURES api
> > > > > and the tooling, whenever the human is downgrading a finalized
> > feature
> > > > > version.
> > > > > Hopefully this should cover the requirement, until we find the need
> > for
> > > > > advanced downgrade support.
> > > > >
> > > > > > 4. "Each broker’s supported dictionary of feature versions will
> be
> > > > > defined
> > > > > > in the broker code." So this means in order to restrict a certain
> > > > > feature,
> > > > > > we need to start the broker first and then send a feature gating
> > > > request
> > > > > > immediately, which introduces a time gap and the
> intended-to-close
> > > > > feature
> > > > > > could actually serve request during this phase. Do you think we
> > > should
> > > > > also
> > > > > > support configurations as well so that admin user could freely
> roll
> > > up
> > > > a
> > > > > > cluster with all nodes complying the same feature gating, without
> > > > > worrying
> > > > > > about the turnaround time to propagate the message only after the
> > > > cluster
> > > > > > starts up?
> > > > >
> > > > > (Kowshik): This is a great point/question. One of the expectations
> > out
> > > of
> > > > > this KIP, which is
> > > > > already followed in the broker, is the following.
> > > > >  - Imagine at time T1 the broker starts up and registers it’s
> > presence
> > > in
> > > > > ZK,
> > > > >    along with advertising it’s supported features.
> > > > >  - Imagine at a future time T2 the broker receives the
> > > > > UpdateMetadataRequest
> > > > >    from the controller, which contains the latest finalized
> features
> > as
> > > > > seen by
> > > > >    the controller. The broker validates this data against it’s
> > > supported
> > > > > features to
> > > > >    make sure there is no mismatch (it will shutdown if there is an
> > > > > incompatibility).
> > > > >
> > > > > It is expected that during the time between the 2 events T1 and T2,
> > the
> > > > > broker is
> > > > > almost a silent entity in the cluster. It does not add any value to
> > the
> > > > > cluster, or carry
> > > > > out any important broker activities. By “important”, I mean it is
> not
> > > > doing
> > > > > mutations
> > > > > on it’s persistence, not mutating critical in-memory state, won’t
> be
> > > > > serving
> > > > > produce/fetch requests. Note it doesn’t even know it’s assigned
> > > > partitions
> > > > > until
> > > > > it receives UpdateMetadataRequest from the controller. Anything the
> > > > > broker is doing up until this point is neither damaging nor useful.
> > > > >
> > > > > I’ve clarified the above in the KIP, see this new section:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > .
> > > > >
> > > > > > 5. "adding a new Feature, updating or deleting an existing
> > Feature",
> > > > may
> > > > > be
> > > > > > I misunderstood something, I thought the features are defined in
> > > broker
> > > > > > code, so admin could not really create a new feature?
> > > > >
> > > > > (Kowshik): Great point! You understood this right. Here adding a
> > > feature
> > > > > means we are
> > > > > adding a cluster-wide finalized *max* version for a feature that
> was
> > > > > previously never finalized.
> > > > > I have clarified this in the KIP now.
> > > > >
> > > > > > 6. I think we need a separate error code like
> > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > to
> > > > > > reject a concurrent feature update request.
> > > > >
> > > > > (Kowshik): Great point! I have modified the KIP adding the above
> (see
> > > > > 'Tooling support -> Admin API changes').
> > > > >
> > > > > > 7. I think we haven't discussed the alternative solution to pass
> > the
> > > > > > feature information through Zookeeper. Is that mentioned in the
> KIP
> > > to
> > > > > > justify why using UpdateMetadata is more favorable?
> > > > >
> > > > > (Kowshik): Nice question! The broker reads finalized feature info
> > > stored
> > > > in
> > > > > ZK,
> > > > > only during startup when it does a validation. When serving
> > > > > `ApiVersionsRequest`, the
> > > > > broker does not read this info from ZK directly. I'd imagine the
> risk
> > > is
> > > > > that it can increase
> > > > > the ZK read QPS which can be a bottleneck for the system. Today, in
> > > Kafka
> > > > > we use the
> > > > > controller to fan out ZK updates to brokers and we want to stick to
> > > that
> > > > > pattern to avoid
> > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > > >
> > > > > > 8. I was under the impression that user could configure a range
> of
> > > > > > supported versions, what's the trade-off for allowing single
> > > finalized
> > > > > > version only?
> > > > >
> > > > > (Kowshik): Great question! The finalized version of a feature
> > basically
> > > > > refers to
> > > > > the cluster-wide finalized feature "maximum" version. For example,
> if
> > > the
> > > > > 'group_coordinator' feature
> > > > > has the finalized version set to 10, then, it means that
> cluster-wide
> > > all
> > > > > versions upto v10 are
> > > > > supported for this feature. However, note that if some version (ex:
> > v0)
> > > > > gets deprecated
> > > > > for this feature, then we don’t convey that using this scheme (also
> > > > > supporting deprecation is a non-goal).
> > > > >
> > > > > (Kowshik): I’ve now modified the KIP at all points, referring to
> > > finalized
> > > > > feature "maximum" versions.
> > > > >
> > > > > > 9. One minor syntax fix: Note that here the "client" here may be
> a
> > > > > producer
> > > > >
> > > > > (Kowshik): Great point! Done.
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > >
> > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > reluctanthero104@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hey Kowshik,
> > > > > >
> > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > >
> > > > > > 1. "When is it safe for the brokers to begin handling EOS
> traffic"
> > > > could
> > > > > be
> > > > > > converted as "When is it safe for the brokers to start serving
> new
> > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> in
> > > the
> > > > > > context.
> > > > > >
> > > > > > 2. In the *Explanation *section, the metadata version number part
> > > > seems a
> > > > > > bit blurred. Could you point a reference to later section that we
> > > going
> > > > > to
> > > > > > store it in Zookeeper and update it every time when there is a
> > > feature
> > > > > > change?
> > > > > >
> > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> KIP,
> > > for
> > > > > > features such as group coordinator semantics, there is no legal
> > > > scenario
> > > > > to
> > > > > > perform a downgrade at all. So having downgrade door open is
> pretty
> > > > > > error-prone as human faults happen all the time. I'm assuming as
> > new
> > > > > > features are implemented, it's not very hard to add a flag during
> > > > feature
> > > > > > creation to indicate whether this feature is "downgradable".
> Could
> > > you
> > > > > > explain a bit more on the extra engineering effort for shipping
> > this
> > > > KIP
> > > > > > with downgrade protection in place?
> > > > > >
> > > > > > 4. "Each broker’s supported dictionary of feature versions will
> be
> > > > > defined
> > > > > > in the broker code." So this means in order to restrict a certain
> > > > > feature,
> > > > > > we need to start the broker first and then send a feature gating
> > > > request
> > > > > > immediately, which introduces a time gap and the
> intended-to-close
> > > > > feature
> > > > > > could actually serve request during this phase. Do you think we
> > > should
> > > > > also
> > > > > > support configurations as well so that admin user could freely
> roll
> > > up
> > > > a
> > > > > > cluster with all nodes complying the same feature gating, without
> > > > > worrying
> > > > > > about the turnaround time to propagate the message only after the
> > > > cluster
> > > > > > starts up?
> > > > > >
> > > > > > 5. "adding a new Feature, updating or deleting an existing
> > Feature",
> > > > may
> > > > > be
> > > > > > I misunderstood something, I thought the features are defined in
> > > broker
> > > > > > code, so admin could not really create a new feature?
> > > > > >
> > > > > > 6. I think we need a separate error code like
> > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > to
> > > > > > reject a concurrent feature update request.
> > > > > >
> > > > > > 7. I think we haven't discussed the alternative solution to pass
> > the
> > > > > > feature information through Zookeeper. Is that mentioned in the
> KIP
> > > to
> > > > > > justify why using UpdateMetadata is more favorable?
> > > > > >
> > > > > > 8. I was under the impression that user could configure a range
> of
> > > > > > supported versions, what's the trade-off for allowing single
> > > finalized
> > > > > > version only?
> > > > > >
> > > > > > 9. One minor syntax fix: Note that here the "client" here may be
> a
> > > > > producer
> > > > > >
> > > > > > Boyang
> > > > > >
> > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmccabe@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > > Hi Colin,
> > > > > > > >
> > > > > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > > > > suggestions.
> > > > > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > .
> > > > > > > >
> > > > > > > > 1. '__data_version__' is the version of the finalized feature
> > > > > metadata
> > > > > > > > (i.e. actual ZK node contents), while the
> '__schema_version__'
> > is
> > > > the
> > > > > > > > version of the schema of the data persisted in ZK. These
> serve
> > > > > > different
> > > > purposes. '__data_version__' is useful mainly to clients
> > > during
> > > > > > reads,
> > > > > > > > to differentiate between the 2 versions of eventually
> > consistent
> > > > > > > 'finalized
> > > > > > > > features' metadata (i.e. larger metadata version is more
> > recent).
> > > > > > > > '__schema_version__' provides an additional degree of
> > > flexibility,
> > > > > > where
> > > > > > > if
> > > > > > > > we decide to change the schema for '/features' node in ZK (in
> > the
> > > > > > > future),
> > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > serialization/deserialization of the ZK data can be handled
> > > > safely).
> > > > > > >
> > > > > > > Hi Kowshik,
> > > > > > >
> > > > > > > If you're talking about a number that lets you know if data is
> > more
> > > > or
> > > > > > > less recent, we would typically call that an epoch, and not a
> > > > version.
> > > > > > For
> > > > > > > the ZK data structures, the word "version" is typically
> reserved
> > > for
> > > > > > > describing changes to the overall schema of the data that is
> > > written
> > > > to
> > > > > > > ZooKeeper.  We don't even really change the "version" of those
> > > > schemas
> > > > > > that
> > > > > > > much, since most changes are backwards-compatible.  But we do
> > > include
> > > > > > that
> > > > > > > version field just in case.
> > > > > > >
> > > > > > > I don't think we really need an epoch here, though, since we
> can
> > > just
> > > > > > look
> > > > > > > at the broker epoch.  Whenever the broker registers, its epoch
> > will
> > > > be
> > > > > > > greater than the previous broker epoch.  And the newly
> registered
> > > > data
> > > > > > will
> > > > > > > take priority.  This will be a lot simpler than adding a
> separate
> > > > epoch
> > > > > > > system, I think.
> > > > > > >
> > > > > > > >
> > > > > > > > 2. Regarding admin client needing min and max information -
> you
> > > are
> > > > > > > right!
> > > > > > > > I've changed the KIP such that the Admin API also allows the
> > user
> > > > to
> > > > > > read
> > > > > > > > 'supported features' from a specific broker. Please look at
> the
> > > > > section
> > > > > > > > "Admin API changes".
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > >
> > > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > deliberate.
> > > > > I've
> > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > >
> > > > > > > Sounds good.
> > > > > > >
> > > > > > > >
> > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> > > I've
> > > > > > > updated
> > > > > > > > the KIP sketching the functionality provided by this tool,
> with
> > > > some
> > > > > > > > examples. Please look at the section "Tooling support
> > examples".
> > > > > > > >
> > > > > > > > Thank you!
> > > > > > >
> > > > > > >
> > > > > > > Thanks, Kowshik.
> > > > > > >
> > > > > > > cheers,
> > > > > > > Colin
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Kowshik
> > > > > > > >
> > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > cmccabe@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > >
> > > > > > > > > In the "Schema" section, do we really need both
> > > > __schema_version__
> > > > > > and
> > > > > > > > > __data_version__?  Can we just have a single version field
> > > here?
> > > > > > > > >
> > > > > > > > > Shouldn't the Admin(Client) function have some way to get
> the
> > > min
> > > > > and
> > > > > > > max
> > > > > > > > > information that we're exposing as well?  I guess we could
> > have
> > > > > min,
> > > > > > > max,
> > > > > > > > > and current.  Unrelated: is the use of Long rather than
> long
> > > > > > deliberate
> > > > > > > > > here?
> > > > > > > > >
> > > > > > > > > It would be good to describe how the command line tool
> > > > > > > > > kafka.admin.FeatureCommand will work.  For example the
> flags
> > > that
> > > > > it
> > > > > > > will
> > > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > > >
> > > > > > > > > cheers,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I've opened KIP-584
> > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>
> > > > > > > > > > which
> > > > > > > > > > is intended to provide a versioning scheme for features.
> > I'd
> > > > like
> > > > > > to
> > > > > > > use
> > > > > > > > > > this thread to discuss the same. I'd appreciate any
> > feedback
> > > on
> > > > > > this.
> > > > > > > > > > Here
> > > > > > > > > > is a link to KIP-584
> > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > >  .
> > > > > > > > > >
> > > > > > > > > > Thank you!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Kowshik
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Colin,

Thanks a lot for the explanation! I've updated the KIP based on your
suggestions. Please find my response to your comments below.

> If you can just treat "not present" as version level 0, you can have just
checks like the second one.  This should lead to simpler code.

(Kowshik): Good idea! Done. I've updated the KIP to eliminate
FeatureUpdateType.DELETE, and instead treat a version level < 1 as
indicating feature deletion, or more generally the absence of a feature.
Thanks for the idea!
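
To make the simplification concrete, here is a rough sketch of how a caller
could treat an absent finalized feature and a version level below 1
identically. The class and method names are hypothetical, not from the KIP:

    import java.util.Map;

    // Hypothetical helper, for illustration only: an absent finalized feature
    // is treated exactly like version level 0, so callers only compare a number.
    final class FinalizedFeatures {
        private final Map<String, Long> finalizedMaxVersions;

        FinalizedFeatures(Map<String, Long> finalizedMaxVersions) {
            this.finalizedMaxVersions = finalizedMaxVersions;
        }

        long finalizedMaxVersion(String feature) {
            // Absent => 0, which reads the same as "never finalized / deleted".
            return finalizedMaxVersions.getOrDefault(feature, 0L);
        }

        boolean isFinalized(String feature) {
            return finalizedMaxVersion(feature) >= 1;
        }
    }

With this, the two pseudocode branches you mentioned collapse into a single
numeric check.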

> I guess this ties in with the discussion above-- I would rather not have
a "deleted" state.  It doesn't seem to make anything more expressive, and
it complicates the code.

(Kowshik): As mentioned above, I've eliminated FeatureUpdateType.DELETE
now. Thanks for the idea!


Cheers,
Kowshik

On Sat, Apr 4, 2020 at 8:32 PM Colin McCabe <cm...@apache.org> wrote:

> On Fri, Apr 3, 2020, at 20:32, Kowshik Prakasam wrote:
> > > Colin wrote:
> > > It would be simpler to just say that a feature flag which doesn't
> appear
> > > in the znode is considered to be at version level 0.  This will also
> > > simplify the code a lot, I think, since you won't have to keep track of
> > > tricky distinctions between "disabled" and "enabled at version 0."
> > > Then you would be able to just use an int in most places.
> >
> > (Kowshik): I'm not sure I understood why we want to do it this way. If an
> > entry for some finalized feature is absent in '/features' node,
> > alternatively we can just treat this as a feature with a version that
> > was never finalized/enabled or it was deleted at some point. Then, we can
> > even allow for "enabled at version 0" as the {minVersion, maxVersion}
> range
> > can be any valid range, not necessarily minVersion > 0.
>
> Think about the following pseudocode.  Which is simpler:
>
> > if (feature is not present) || (feature level < 1) {
> > ... something ...
> > } else {
> > ... something ...
> > }
>
> or
>
> > if (feature level < 1) {
> > ... something ...
> > } else {
> > ... something ...
> > }
>
> If you can just treat "not present" as version level 0, you can have just
> checks like the second one.  This should lead to simpler code.
>
> > (Kowshik): Yes, the whole change is a transaction. Either all provided
> > FeatureUpdates are carried out in ZK, or none happens. That's why we just
> > allow for a single error code field, as it is easier that way. This
> > transactional guarantee is mentioned under 'Proposed changes > New
> > controller API'
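
Purely to illustrate the "all or nothing" behaviour: the sketch below is not
from the KIP, and the ZK client interface it uses is made up. The idea is
simply that every FeatureUpdate in the batch is validated first and the result
is written with one conditional (version-checked) update, so either all
changes land or none do:

    import java.util.Map;

    // Hypothetical interface, for illustration of the transactional guarantee.
    interface FeaturesZkClient {
        // Succeeds only if '/features' is still at expectedZkVersion.
        boolean conditionalUpdate(Map<String, Long> newFinalized, int expectedZkVersion);
    }

    final class FeatureUpdateApplier {
        private final FeaturesZkClient zk;

        FeatureUpdateApplier(FeaturesZkClient zk) { this.zk = zk; }

        boolean applyAll(Map<String, Long> current, int currentZkVersion,
                         Map<String, Long> requestedUpdates) {
            // Validate the whole batch before touching ZK; reject everything on
            // the first problem so no partial state is ever written.
            for (Map.Entry<String, Long> e : requestedUpdates.entrySet()) {
                if (e.getKey() == null || e.getKey().isEmpty()) {
                    return false;
                }
                // (real validation would also check supported ranges, downgrades, etc.)
            }
            Map<String, Long> merged = new java.util.HashMap<>(current);
            merged.putAll(requestedUpdates);
            // One conditional write: all updates take effect together, or none do.
            return zk.conditionalUpdate(merged, currentZkVersion);
        }
    }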
>
> That makes sense, thanks.
>
> > > Rather than FeatureUpdateType, I would just go with a boolean like
> > > "force."  I'm not sure what other values we'd want to add to this
> later on,
> > > if it were an enum.  I think the boolean is clearer.
> >
> > (Kowshik): Since we have decided to go with just one API (i.e.
> > UpdateFeaturesRequest), it is better that FeatureUpdateType is an enum
> with
> > multiple values. A FeatureUpdateType is tied to a feature, and the
> possible
> > values are: ADD_OR_UPDATE, ADD_OR_UPDATE_ALLOW_DOWNGRADE, DELETE.
>
> I guess this ties in with the discussion above-- I would rather not have a
> "deleted" state.  It doesn't seem to make anything more expressive, and it
> complicates the code.
>
> >
> > > This ties in with my comment earlier, but for the result classes, we
> need
> > > methods other than just "all".  Batch operations aren't usable if
> > > you can't get the result per operation.... unless the semantics are
> > > transactional and it really is just everything succeeded or everything
> > > failed.
> >
> > (Kowshik): The semantics are transactional, as I explained above.
>
> Thanks for the clarification.
>
> >
> > > There are a bunch of Java interfaces described like FinalizedFeature,
> > > FeatureUpdate, UpdateFeaturesResult, and so on that should just be
> > > regular concrete Java classes.  In general we'd only use an interface
> if
> > > we wanted the caller to implement some kind of callback function. We
> > > don't make classes that are just designed to hold data into interfaces,
> > > since that just imposes extra work on callers (they have to define
> > > their own concrete class for each interface just to use the API.)
> > >  There's also probably no reason to have these classes inherit from
> each
> > > other or have complex type relationships.  One more nitpick is that
> Kafka
> > > generally doesn't use "get" in the function names of accessors.
> >
> > (Kowshik): Done, I have changed the KIP. By 'interface', I just meant
> > interface from a pseudocode standpoint (i.e. it was just an abstraction
> > providing at least the specified behavior). Since that was a bit
> confusing,
> > I have now renamed it, calling it a class. Also I have eliminated the type
> > relationships.
>
> Thanks.
>
> best,
> Colin
>
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kp...@confluent.io>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the feedback and suggestions. Please find my response below.
> > >
> > > > 100.6 For every new request, the admin needs to control who is
> allowed to
> > > > issue that request if security is enabled. So, we need to assign the
> new
> > > > request a ResourceType and possible AclOperations. See
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > as an example.
> > >
> > > (Kowshik): I don't see any reference to the words ResourceType or
> > > AclOperations in the KIP. Please let me know how I can use the KIP that
> > > you linked to figure out how to set up the appropriate ResourceType
> > > and/or ClusterOperation?
> > >
> > > > 105. If we change delete to disable, it's better to do this
> consistently
> > > in
> > > > request protocol and admin api as well.
> > >
> > > (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> > > feature.
> > > I've just changed the KIP to use 'delete'. I don't have a strong
> > > preference.
> > >
> > > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> possible
> > > > for new features to be included in minor releases too. Should we
> make the
> > > > feature versioning match the release versioning?
> > >
> > > (Kowshik): The release version can be mapped to a set of feature
> versions,
> > > and this can be done, for example in the tool (or even external to the
> > > tool).
> > > Can you please clarify what I'm missing?
> > >
> > > > 111. "During regular operations, the data in the ZK node can be
> mutated
> > > > only via a specific admin API served only by the controller." I am
> > > > wondering why can't the controller auto finalize a feature version
> after
> > > > all brokers are upgraded? For new users who download the latest
> version
> > > to
> > > > build a new cluster, it's inconvenient for them to have to manually
> > > enable
> > > > each feature.
> > >
> > > (Kowshik): I agree that there is a trade-off here, but it will help to
> > > decide whether the automation should be thought through right now in this
> > > KIP or in a future follow-up KIP. We may well invest in automation, but we
> > > have to decide whether to do it now or later.
> > >
> > > As for the inconvenience you mentioned, do you think it could be
> > > overcome by asking the cluster operator to run a bootstrap script when
> > > he/she knows that a specific AK release has been almost completely
> > > deployed in a cluster for the first time? The idea is that the bootstrap
> > > script will know how to map a specific AK release to finalized feature
> > > versions, and run the `kafka-features.sh` tool appropriately against
> > > the cluster.
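
For illustration only, a rough sketch of the kind of release-to-features
mapping such a bootstrap step could keep. The feature names and version
numbers below are invented, not from the KIP, and the real mapping would live
wherever the tooling does:

    import java.util.Map;

    // Hypothetical mapping from an AK release to the feature versions the
    // bootstrap step would finalize via the kafka-features.sh tool.
    final class ReleaseFeatureBootstrap {
        private static final Map<String, Map<String, Long>> FINALIZED_BY_RELEASE = Map.of(
            "2.5.0", Map.of("group_coordinator", 1L),
            "2.6.0", Map.of("group_coordinator", 2L, "transaction_coordinator", 1L));

        static Map<String, Long> featuresFor(String release) {
            return FINALIZED_BY_RELEASE.getOrDefault(release, Map.of());
        }

        public static void main(String[] args) {
            // The operator runs this once the release is fully deployed, and
            // feeds the result into the feature update tooling.
            featuresFor("2.6.0")
                .forEach((name, version) -> System.out.println(name + " -> " + version));
        }
    }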
> > >
> > > Now, coming back to your automation proposal/question.
> > > I do see the value of automated feature version finalization, but I also
> > > see that this will open up several questions and some risks, as explained
> > > below. The answers to these depend on the definition of the automation we
> > > choose to build, and how well it fits into a Kafka deployment.
> > > Basically, it can be unsafe for the controller to finalize feature
> version
> > > upgrades automatically, without learning about the intent of the
> cluster
> > > operator.
> > > 1. We would sometimes want to lock feature versions only when we have
> > > externally verified
> > > the stability of the broker binary.
> > > 2. Sometimes only the cluster operator knows that a cluster upgrade is
> > > complete,
> > > and new brokers are highly unlikely to join the cluster.
> > > 3. Only the cluster operator knows that the intent is to deploy the
> same
> > > version
> > > of the new broker release across the entire cluster (i.e. the latest
> > > downloaded version).
> > > 4. For downgrades, it appears the controller still needs some external
> > > input
> > > (such as the proposed tool) to finalize a feature version downgrade.
> > >
> > > If we have automation, that automation can end up failing in some of
> the
> > > cases
> > > above. Then, we need a way to declare that the cluster is "not ready"
> if
> > > the
> > > controller cannot automatically finalize some basic required feature
> > > version
> > > upgrades across the cluster. We need to make the cluster operator
> aware in
> > > such a scenario (raise an alert or the like).
> > >
> > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> instead of
> > > 48.
> > >
> > > (Kowshik): Done.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
> > >
> > >> Hi, Kowshik,
> > >>
> > >> Thanks for the reply. A few more comments below.
> > >>
> > >> 100.6 For every new request, the admin needs to control who is
> allowed to
> > >> issue that request if security is enabled. So, we need to assign the
> new
> > >> request a ResourceType and possible AclOperations. See
> > >>
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > >> as
> > >> an example.
> > >>
> > >> 105. If we change delete to disable, it's better to do this
> consistently
> > >> in
> > >> request protocol and admin api as well.
> > >>
> > >> 110. The minVersion/maxVersion for features use int64. Currently, our
> > >> release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> possible
> > >> for new features to be included in minor releases too. Should we make
> the
> > >> feature versioning match the release versioning?
> > >>
> > >> 111. "During regular operations, the data in the ZK node can be
> mutated
> > >> only via a specific admin API served only by the controller." I am
> > >> wondering why can't the controller auto finalize a feature version
> after
> > >> all brokers are upgraded? For new users who download the latest
> version to
> > >> build a new cluster, it's inconvenient for them to have to manually
> enable
> > >> each feature.
> > >>
> > >> 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead
> of
> > >> 48.
> > >>
> > >> Jun
> > >>
> > >>
> > >> On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> kprakasam@confluent.io>
> > >> wrote:
> > >>
> > >> > Hey Jun,
> > >> >
> > >> > Thanks a lot for the great feedback! Please note that the design
> > >> > has changed a little bit on the KIP, and we now propagate the
> finalized
> > >> > features metadata only via ZK watches (instead of
> UpdateMetadataRequest
> > >> > from the controller).
> > >> >
> > >> > Please find below my response to your questions/feedback, with the
> > >> prefix
> > >> > "(Kowshik):".
> > >> >
> > >> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > >> > > 100.1 Since this request waits for responses from brokers, should
> we
> > >> add
> > >> > a
> > >> > > timeout in the request (like createTopicRequest)?
> > >> >
> > >> > (Kowshik): Great point! Done. I have added a timeout field. Note:
> we no
> > >> > longer
> > >> > wait for responses from brokers, since the design has been changed
> so
> > >> that
> > >> > the
> > >> > features information is propagated via ZK. Nevertheless, it is
> right to
> > >> > have a timeout
> > >> > for the request.
> > >> >
> > >> > > 100.2 The response schema is a bit weird. Typically, the response
> just
> > >> > > shows an error code and an error message, instead of echoing the
> > >> request.
> > >> >
> > >> > (Kowshik): Great point! Yeah, I have modified it to just return an
> error
> > >> > code and a message.
> > >> > Previously it was not echoing the "request", rather it was
> returning the
> > >> > latest set of
> > >> > cluster-wide finalized features (after applying the updates). But
> you
> > >> are
> > >> > right,
> > >> > the additional info is not required, so I have removed it from the
> > >> response
> > >> > schema.
> > >> >
> > >> > > 100.3 Should we add a separate request to list/describe the
> existing
> > >> > > features?
> > >> >
> > >> > (Kowshik): This is already present in the KIP via the
> 'DescribeFeatures'
> > >> > Admin API,
> > >> > which, under the covers, uses the ApiVersionsRequest to
> list/describe
> > >> the
> > >> > existing features. Please read the 'Tooling support' section.
> > >> >
> > >> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request.
> For
> > >> > > DELETE, the version field doesn't make sense. So, I guess the
> broker
> > >> just
> > >> > > ignores this? An alternative way is to have a separate
> > >> > DeleteFeaturesRequest
> > >> >
> > >> > (Kowshik): Great point! I have modified the KIP now to have 2
> separate
> > >> > controller APIs
> > >> > serving these different purposes:
> > >> > 1. updateFeatures
> > >> > 2. deleteFeatures
> > >> >
> > >> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> increasing
> > >> > > version of the metadata for finalized features." I am wondering
> why
> > >> the
> > >> > > ordering is important?
> > >> >
> > >> > (Kowshik): In the latest KIP write-up, it is called epoch (instead
> of
> > >> > version), and
> > >> > it is just the ZK node version. Basically, this is the epoch for the
> > >> > cluster-wide
> > >> > finalized feature version metadata. This metadata is served to
> clients
> > >> via
> > >> > the
> > >> > ApiVersionsResponse (for reads). We propagate updates from the
> > >> '/features'
> > >> > ZK node
> > >> > to all brokers, via ZK watches setup by each broker on the
> '/features'
> > >> > node.
> > >> >
> > >> > Now here is why the ordering is important:
> > >> > ZK watches don't propagate at the same time. As a result, the
> > >> > ApiVersionsResponse
> > >> > is eventually consistent across brokers. This can introduce cases
> > >> > where clients see an older lower epoch of the features metadata,
> after a
> > >> > more recent
> > >> > higher epoch was returned at a previous point in time. We expect
> clients
> > >> > to always employ the rule that the latest received higher epoch of
> > >> metadata
> > >> > always trumps an older smaller epoch. Those clients that are
> external to
> > >> > Kafka should strongly consider discovering the latest metadata once
> > >> during
> > >> > startup from the brokers, and if required refresh the metadata
> > >> periodically
> > >> > (to get the latest metadata).
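
To illustrate that rule, here is a minimal sketch, with hypothetical names
(not from the KIP), of a client-side cache that only ever moves forward to a
higher epoch and silently drops older metadata from lagging brokers:

    import java.util.Map;

    // Hypothetical client-side cache, for illustration of the ordering rule:
    // a higher epoch of the finalized features metadata always replaces a
    // lower one; anything older than what we already have is ignored.
    final class FinalizedFeaturesCache {
        private long epoch = -1L;
        private Map<String, Long> finalizedMaxVersions = Map.of();

        synchronized void maybeUpdate(long receivedEpoch, Map<String, Long> received) {
            if (receivedEpoch <= epoch) {
                return; // stale or duplicate metadata from an out-of-date broker
            }
            epoch = receivedEpoch;
            finalizedMaxVersions = received;
        }

        synchronized long epoch() {
            return epoch;
        }
    }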
> > >> >
> > >> > > 100.6 Could you specify the required ACL for this new request?
> > >> >
> > >> > (Kowshik): What is ACL, and how could I find out which one to
> specify?
> > >> > Please could you provide me some pointers? I'll be glad to update
> the
> > >> > KIP once I know the next steps.
> > >> >
> > >> > > 101. For the broker registration ZK node, should we bump up the
> > >> version
> > >> > in
> > >> > the json?
> > >> >
> > >> > (Kowshik): Great point! Done. I've increased the version in the
> broker
> > >> json
> > >> > by 1.
> > >> >
> > >> > > 102. For the /features ZK node, not sure if we need the epoch
> field.
> > >> Each
> > >> > > ZK node has an internal version field that is incremented on every
> > >> > update.
> > >> >
> > >> > (Kowshik): Great point! Done. I'm using the ZK node version now,
> > >> instead of
> > >> > explicitly
> > >> > incremented epoch.
> > >> >
> > >> > > 103. "Enabling the actual semantics of a feature version
> cluster-wide
> > >> is
> > >> > > left to the discretion of the logic implementing the feature (ex:
> can
> > >> be
> > >> > > done via dynamic broker config)." Does that mean the broker
> > >> registration
> > >> > ZK
> > >> > > node will be updated dynamically when this happens?
> > >> >
> > >> > (Kowshik): Not really. The text was just conveying that a broker
> could
> > >> > "know" of
> > >> > a new feature version, but it does not mean the broker should have
> also
> > >> > activated the effects of the feature version. Knowing vs activation
> are
> > >> 2
> > >> > separate things,
> > >> > and the latter can be achieved by dynamic config. I have reworded
> the
> > >> text
> > >> > to
> > >> > make this clear to the reader.
> > >> >
> > >> >
> > >> > > 104. UpdateMetadataRequest
> > >> > > 104.1 It would be useful to describe when the feature metadata is
> > >> > included
> > >> > > in the request. My understanding is that it's only included if (1)
> > >> there
> > >> > is
> > >> > > a change to the finalized feature; (2) broker restart; (3)
> controller
> > >> > > failover.
> > >> > > 104.2 The new fields have the following versions. Why are the
> > >> versions 3+
> > >> > > when the top version is bumped to 6?
> > >> > >       "fields":  [
> > >> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > >> > >           "about": "The name of the feature."},
> > >> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > >> > >           "about": "The finalized version for the feature."}
> > >> > >       ]
> > >> >
> > >> > (Kowshik): With the new improved design, we have completely
> eliminated
> > >> the
> > >> > need to
> > >> > use UpdateMetadataRequest. This is because we now rely on ZK to
> deliver
> > >> the
> > >> > notifications for changes to the '/features' ZK node.
> > >> >
> > >> > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> it's
> > >> > better
> > >> > > to use enable/disable?
> > >> >
> > >> > (Kowshik): For delete, yes, I have changed it so that we instead
> call it
> > >> > 'disable'.
> > >> > However for 'update', it can now also refer to either an upgrade or
> a
> > >> > forced downgrade.
> > >> > Therefore, I have left it the way it is, just calling it as just
> > >> 'update'.
> > >> >
> > >> >
> > >> > Cheers,
> > >> > Kowshik
> > >> >
> > >> > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > >> >
> > >> > > Hi, Kowshik,
> > >> > >
> > >> > > Thanks for the KIP. Looks good overall. A few comments below.
> > >> > >
> > >> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > >> > > 100.1 Since this request waits for responses from brokers, should
> we
> > >> add
> > >> > a
> > >> > > timeout in the request (like createTopicRequest)?
> > >> > > 100.2 The response schema is a bit weird. Typically, the response
> just
> > >> > > shows an error code and an error message, instead of echoing the
> > >> request.
> > >> > > 100.3 Should we add a separate request to list/describe the
> existing
> > >> > > features?
> > >> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request.
> For
> > >> > > DELETE, the version field doesn't make sense. So, I guess the
> broker
> > >> just
> > >> > > ignores this? An alternative way is to have a separate
> > >> > > DeleteFeaturesRequest
> > >> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> increasing
> > >> > > version of the metadata for finalized features." I am wondering
> why
> > >> the
> > >> > > ordering is important?
> > >> > > 100.6 Could you specify the required ACL for this new request?
> > >> > >
> > >> > > 101. For the broker registration ZK node, should we bump up the
> > >> version
> > >> > in
> > >> > > the json?
> > >> > >
> > >> > > 102. For the /features ZK node, not sure if we need the epoch
> field.
> > >> Each
> > >> > > ZK node has an internal version field that is incremented on every
> > >> > update.
> > >> > >
> > >> > > 103. "Enabling the actual semantics of a feature version
> cluster-wide
> > >> is
> > >> > > left to the discretion of the logic implementing the feature (ex:
> can
> > >> be
> > >> > > done via dynamic broker config)." Does that mean the broker
> > >> registration
> > >> > ZK
> > >> > > node will be updated dynamically when this happens?
> > >> > >
> > >> > > 104. UpdateMetadataRequest
> > >> > > 104.1 It would be useful to describe when the feature metadata is
> > >> > included
> > >> > > in the request. My understanding is that it's only included if (1)
> > >> there
> > >> > is
> > >> > > a change to the finalized feature; (2) broker restart; (3)
> controller
> > >> > > failover.
> > >> > > 104.2 The new fields have the following versions. Why are the
> > >> versions 3+
> > >> > > when the top version is bumped to 6?
> > >> > >       "fields":  [
> > >> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > >> > >           "about": "The name of the feature."},
> > >> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > >> > >           "about": "The finalized version for the feature."}
> > >> > >       ]
> > >> > >
> > >> > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> it's
> > >> > better
> > >> > > to use enable/disable?
> > >> > >
> > >> > > Jun
> > >> > >
> > >> > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > >> kprakasam@confluent.io
> > >> > >
> > >> > > wrote:
> > >> > >
> > >> > > > Hey Boyang,
> > >> > > >
> > >> > > > Thanks for the great feedback! I have updated the KIP based on
> your
> > >> > > > feedback.
> > >> > > > Please find my response below for your comments, look for
> sentences
> > >> > > > starting
> > >> > > > with "(Kowshik)" below.
> > >> > > >
> > >> > > >
> > >> > > > > 1. "When is it safe for the brokers to begin handling EOS
> traffic"
> > >> > > could
> > >> > > > be
> > >> > > > > converted as "When is it safe for the brokers to start
> serving new
> > >> > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> earlier in
> > >> > the
> > >> > > > > context.
> > >> > > >
> > >> > > > (Kowshik): Great point! Done.
> > >> > > >
> > >> > > > > 2. In the *Explanation *section, the metadata version number
> part
> > >> > > seems a
> > >> > > > > bit blurred. Could you point a reference to later section
> that we
> > >> > going
> > >> > > > to
> > >> > > > > store it in Zookeeper and update it every time when there is a
> > >> > feature
> > >> > > > > change?
> > >> > > >
> > >> > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > >> > > >
> > >> > > >
> > >> > > > > 3. For the feature downgrade, although it's a Non-goal of the
> KIP,
> > >> > for
> > >> > > > > features such as group coordinator semantics, there is no
> legal
> > >> > > scenario
> > >> > > > to
> > >> > > > > perform a downgrade at all. So having downgrade door open is
> > >> pretty
> > >> > > > > error-prone as human faults happen all the time. I'm assuming
> as
> > >> new
> > >> > > > > features are implemented, it's not very hard to add a flag
> during
> > >> > > feature
> > >> > > > > creation to indicate whether this feature is "downgradable".
> Could
> > >> > you
> > >> > > > > explain a bit more on the extra engineering effort for
> shipping
> > >> this
> > >> > > KIP
> > >> > > > > with downgrade protection in place?
> > >> > > >
> > >> > > > (Kowshik): Great point! I'd agree and disagree here. While I
> agree
> > >> that
> > >> > > > accidental
> > >> > > > downgrades can cause problems, I also think sometimes downgrades
> > >> should
> > >> > > > be allowed for emergency reasons (not all downgrades cause
> issues).
> > >> > > > It is just subjective to the feature being downgraded.
> > >> > > >
> > >> > > > To be more strict about feature version downgrades, I have
> modified
> > >> the
> > >> > > KIP
> > >> > > > proposing that we mandate a `--force-downgrade` flag be used in
> the
> > >> > > > UPDATE_FEATURES api
> > >> > > > and the tooling, whenever the human is downgrading a finalized
> > >> feature
> > >> > > > version.
> > >> > > > Hopefully this should cover the requirement, until we find the
> need
> > >> for
> > >> > > > advanced downgrade support.
> > >> > > >
> > >> > > > > 4. "Each broker’s supported dictionary of feature versions
> will be
> > >> > > > defined
> > >> > > > > in the broker code." So this means in order to restrict a
> certain
> > >> > > > feature,
> > >> > > > > we need to start the broker first and then send a feature
> gating
> > >> > > request
> > >> > > > > immediately, which introduces a time gap and the
> intended-to-close
> > >> > > > feature
> > >> > > > > could actually serve request during this phase. Do you think
> we
> > >> > should
> > >> > > > also
> > >> > > > > support configurations as well so that admin user could freely
> > >> roll
> > >> > up
> > >> > > a
> > >> > > > > cluster with all nodes complying the same feature gating,
> without
> > >> > > > worrying
> > >> > > > > about the turnaround time to propagate the message only after
> the
> > >> > > cluster
> > >> > > > > starts up?
> > >> > > >
> > >> > > > (Kowshik): This is a great point/question. One of the
> expectations
> > >> out
> > >> > of
> > >> > > > this KIP, which is
> > >> > > > already followed in the broker, is the following.
> > >> > > >  - Imagine at time T1 the broker starts up and registers it’s
> > >> presence
> > >> > in
> > >> > > > ZK,
> > >> > > >    along with advertising it’s supported features.
> > >> > > >  - Imagine at a future time T2 the broker receives the
> > >> > > > UpdateMetadataRequest
> > >> > > >    from the controller, which contains the latest finalized
> > >> features as
> > >> > > > seen by
> > >> > > >    the controller. The broker validates this data against it’s
> > >> > supported
> > >> > > > features to
> > >> > > >    make sure there is no mismatch (it will shutdown if there is
> an
> > >> > > > incompatibility).
> > >> > > >
> > >> > > > It is expected that during the time between the 2 events T1 and
> T2,
> > >> the
> > >> > > > broker is
> > >> > > > almost a silent entity in the cluster. It does not add any
> value to
> > >> the
> > >> > > > cluster, or carry
> > >> > > > out any important broker activities. By “important”, I mean it
> is
> > >> not
> > >> > > doing
> > >> > > > mutations
> > >> > > > on it’s persistence, not mutating critical in-memory state,
> won’t be
> > >> > > > serving
> > >> > > > produce/fetch requests. Note it doesn’t even know it’s assigned
> > >> > > partitions
> > >> > > > until
> > >> > > > it receives UpdateMetadataRequest from controller. Anything the
> > >> broker
> > >> > is
> > >> > > > doing up
> > >> > > > until this point is not damaging/useful.
> > >> > > >
> > >> > > > I’ve clarified the above in the KIP, see this new section:
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > >> > > > .
> > >> > > >
> > >> > > > > 5. "adding a new Feature, updating or deleting an existing
> > >> Feature",
> > >> > > may
> > >> > > > be
> > >> > > > > I misunderstood something, I thought the features are defined
> in
> > >> > broker
> > >> > > > > code, so admin could not really create a new feature?
> > >> > > >
> > >> > > > (Kowshik): Great point! You understood this right. Here adding a
> > >> > feature
> > >> > > > means we are
> > >> > > > adding a cluster-wide finalized *max* version for a feature
> that was
> > >> > > > previously never finalized.
> > >> > > > I have clarified this in the KIP now.
> > >> > > >
> > >> > > > > 6. I think we need a separate error code like
> > >> > > FEATURE_UPDATE_IN_PROGRESS
> > >> > > > to
> > >> > > > > reject a concurrent feature update request.
> > >> > > >
> > >> > > > (Kowshik): Great point! I have modified the KIP adding the above
> > >> (see
> > >> > > > 'Tooling support -> Admin API changes').
> > >> > > >
> > >> > > > > 7. I think we haven't discussed the alternative solution to
> pass
> > >> the
> > >> > > > > feature information through Zookeeper. Is that mentioned in
> the
> > >> KIP
> > >> > to
> > >> > > > > justify why using UpdateMetadata is more favorable?
> > >> > > >
> > >> > > > (Kowshik): Nice question! The broker reads finalized feature
> info
> > >> > stored
> > >> > > in
> > >> > > > ZK,
> > >> > > > only during startup when it does a validation. When serving
> > >> > > > `ApiVersionsRequest`, the
> > >> > > > broker does not read this info from ZK directly. I'd imagine the
> > >> risk
> > >> > is
> > >> > > > that it can increase
> > >> > > > the ZK read QPS which can be a bottleneck for the system.
> Today, in
> > >> > Kafka
> > >> > > > we use the
> > >> > > > controller to fan out ZK updates to brokers and we want to
> stick to
> > >> > that
> > >> > > > pattern to avoid
> > >> > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > >> > > >
> > >> > > > > 8. I was under the impression that user could configure a
> range of
> > >> > > > > supported versions, what's the trade-off for allowing single
> > >> > finalized
> > >> > > > > version only?
> > >> > > >
> > >> > > > (Kowshik): Great question! The finalized version of a feature
> > >> basically
> > >> > > > refers to
> > >> > > > the cluster-wide finalized feature "maximum" version. For
> example,
> > >> if
> > >> > the
> > >> > > > 'group_coordinator' feature
> > >> > > > has the finalized version set to 10, then, it means that
> > >> cluster-wide
> > >> > all
> > >> > > > versions upto v10 are
> > >> > > > supported for this feature. However, note that if some version
> (ex:
> > >> v0)
> > >> > > > gets deprecated
> > >> > > > for this feature, then we don’t convey that using this scheme
> (also
> > >> > > > supporting deprecation is a non-goal).
> > >> > > >
> > >> > > > (Kowshik): I’ve now modified the KIP at all points, referring to
> > >> > finalized
> > >> > > > feature "maximum" versions.
> > >> > > >
> > >> > > > > 9. One minor syntax fix: Note that here the "client" here may
> be a
> > >> > > > producer
> > >> > > >
> > >> > > > (Kowshik): Great point! Done.
> > >> > > >
> > >> > > >
> > >> > > > Cheers,
> > >> > > > Kowshik
> > >> > > >
> > >> > > >
> > >> > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > >> > reluctanthero104@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hey Kowshik,
> > >> > > > >
> > >> > > > > thanks for the revised KIP. Got a couple of questions:
> > >> > > > >
> > >> > > > > 1. "When is it safe for the brokers to begin handling EOS
> traffic"
> > >> > > could
> > >> > > > be
> > >> > > > > converted as "When is it safe for the brokers to start
> serving new
> > >> > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> earlier in
> > >> > the
> > >> > > > > context.
> > >> > > > >
> > >> > > > > 2. In the *Explanation *section, the metadata version number
> part
> > >> > > seems a
> > >> > > > > bit blurred. Could you point a reference to later section
> that we
> > >> > going
> > >> > > > to
> > >> > > > > store it in Zookeeper and update it every time when there is a
> > >> > feature
> > >> > > > > change?
> > >> > > > >
> > >> > > > > 3. For the feature downgrade, although it's a Non-goal of the
> KIP,
> > >> > for
> > >> > > > > features such as group coordinator semantics, there is no
> legal
> > >> > > scenario
> > >> > > > to
> > >> > > > > perform a downgrade at all. So having downgrade door open is
> > >> pretty
> > >> > > > > error-prone as human faults happen all the time. I'm assuming
> as
> > >> new
> > >> > > > > features are implemented, it's not very hard to add a flag
> during
> > >> > > feature
> > >> > > > > creation to indicate whether this feature is "downgradable".
> Could
> > >> > you
> > >> > > > > explain a bit more on the extra engineering effort for
> shipping
> > >> this
> > >> > > KIP
> > >> > > > > with downgrade protection in place?
> > >> > > > >
> > >> > > > > 4. "Each broker’s supported dictionary of feature versions
> will be
> > >> > > > defined
> > >> > > > > in the broker code." So this means in order to restrict a
> certain
> > >> > > > feature,
> > >> > > > > we need to start the broker first and then send a feature
> gating
> > >> > > request
> > >> > > > > immediately, which introduces a time gap and the
> intended-to-close
> > >> > > > feature
> > >> > > > > could actually serve request during this phase. Do you think
> we
> > >> > should
> > >> > > > also
> > >> > > > > support configurations as well so that admin user could freely
> > >> roll
> > >> > up
> > >> > > a
> > >> > > > > cluster with all nodes complying the same feature gating,
> without
> > >> > > > worrying
> > >> > > > > about the turnaround time to propagate the message only after
> the
> > >> > > cluster
> > >> > > > > starts up?
> > >> > > > >
> > >> > > > > 5. "adding a new Feature, updating or deleting an existing
> > >> Feature",
> > >> > > may
> > >> > > > be
> > >> > > > > I misunderstood something, I thought the features are defined
> in
> > >> > broker
> > >> > > > > code, so admin could not really create a new feature?
> > >> > > > >
> > >> > > > > 6. I think we need a separate error code like
> > >> > > FEATURE_UPDATE_IN_PROGRESS
> > >> > > > to
> > >> > > > > reject a concurrent feature update request.
> > >> > > > >
> > >> > > > > 7. I think we haven't discussed the alternative solution to
> pass
> > >> the
> > >> > > > > feature information through Zookeeper. Is that mentioned in
> the
> > >> KIP
> > >> > to
> > >> > > > > justify why using UpdateMetadata is more favorable?
> > >> > > > >
> > >> > > > > 8. I was under the impression that user could configure a
> range of
> > >> > > > > supported versions, what's the trade-off for allowing single
> > >> > finalized
> > >> > > > > version only?
> > >> > > > >
> > >> > > > > 9. One minor syntax fix: Note that here the "client" here may
> be a
> > >> > > > producer
> > >> > > > >
> > >> > > > > Boyang
> > >> > > > >
> > >> > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> cmccabe@apache.org>
> > >> > > wrote:
> > >> > > > >
> > >> > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > >> > > > > > > Hi Colin,
> > >> > > > > > >
> > >> > > > > > > Thanks for the feedback! I've changed the KIP to address
> your
> > >> > > > > > > suggestions.
> > >> > > > > > > Please find below my explanation. Here is a link to KIP
> 584:
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > >> > > > > > > .
> > >> > > > > > >
> > >> > > > > > > 1. '__data_version__' is the version of the finalized
> feature
> > >> > > > metadata
> > >> > > > > > > (i.e. actual ZK node contents), while the
> > >> '__schema_version__' is
> > >> > > the
> > >> > > > > > > version of the schema of the data persisted in ZK. These
> serve
> > >> > > > > different
> > >> > > > > purposes. '__data_version__' is useful mainly to
> clients
> > >> > during
> > >> > > > > reads,
> > >> > > > > > > to differentiate between the 2 versions of eventually
> > >> consistent
> > >> > > > > > 'finalized
> > >> > > > > > > features' metadata (i.e. larger metadata version is more
> > >> recent).
> > >> > > > > > > '__schema_version__' provides an additional degree of
> > >> > flexibility,
> > >> > > > > where
> > >> > > > > > if
> > >> > > > > > > we decide to change the schema for '/features' node in ZK
> (in
> > >> the
> > >> > > > > > future),
> > >> > > > > > > then we can manage broker roll outs suitably (i.e.
> > >> > > > > > > serialization/deserialization of the ZK data can be
> handled
> > >> > > safely).
> > >> > > > > >
> > >> > > > > > Hi Kowshik,
> > >> > > > > >
> > >> > > > > > If you're talking about a number that lets you know if data
> is
> > >> more
> > >> > > or
> > >> > > > > > less recent, we would typically call that an epoch, and not
> a
> > >> > > version.
> > >> > > > > For
> > >> > > > > > the ZK data structures, the word "version" is typically
> reserved
> > >> > for
> > >> > > > > > describing changes to the overall schema of the data that is
> > >> > written
> > >> > > to
> > >> > > > > > ZooKeeper.  We don't even really change the "version" of
> those
> > >> > > schemas
> > >> > > > > that
> > >> > > > > > much, since most changes are backwards-compatible.  But we
> do
> > >> > include
> > >> > > > > that
> > >> > > > > > version field just in case.
> > >> > > > > >
> > >> > > > > > I don't think we really need an epoch here, though, since
> we can
> > >> > just
> > >> > > > > look
> > >> > > > > > at the broker epoch.  Whenever the broker registers, its
> epoch
> > >> will
> > >> > > be
> > >> > > > > > greater than the previous broker epoch.  And the newly
> > >> registered
> > >> > > data
> > >> > > > > will
> > >> > > > > > take priority.  This will be a lot simpler than adding a
> > >> separate
> > >> > > epoch
> > >> > > > > > system, I think.
> > >> > > > > >
> > >> > > > > > >
> > >> > > > > > > 2. Regarding admin client needing min and max information
> -
> > >> you
> > >> > are
> > >> > > > > > right!
> > >> > > > > > > I've changed the KIP such that the Admin API also allows
> the
> > >> user
> > >> > > to
> > >> > > > > read
> > >> > > > > > > 'supported features' from a specific broker. Please look
> at
> > >> the
> > >> > > > section
> > >> > > > > > > "Admin API changes".
> > >> > > > > >
> > >> > > > > > Thanks.
> > >> > > > > >
> > >> > > > > > >
> > >> > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > >> deliberate.
> > >> > > > I've
> > >> > > > > > > improved the KIP to just use `long` at all places.
> > >> > > > > >
> > >> > > > > > Sounds good.
> > >> > > > > >
> > >> > > > > > >
> > >> > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are
> right!
> > >> > I've
> > >> > > > > > updated
> > >> > > > > > > the KIP sketching the functionality provided by this tool,
> > >> with
> > >> > > some
> > >> > > > > > > examples. Please look at the section "Tooling support
> > >> examples".
> > >> > > > > > >
> > >> > > > > > > Thank you!
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > Thanks, Kowshik.
> > >> > > > > >
> > >> > > > > > cheers,
> > >> > > > > > Colin
> > >> > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Cheers,
> > >> > > > > > > Kowshik
> > >> > > > > > >
> > >> > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > >> > cmccabe@apache.org>
> > >> > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Thanks, Kowshik, this looks good.
> > >> > > > > > > >
> > >> > > > > > > > In the "Schema" section, do we really need both
> > >> > > __schema_version__
> > >> > > > > and
> > >> > > > > > > > __data_version__?  Can we just have a single version
> field
> > >> > here?
> > >> > > > > > > >
> > >> > > > > > > > Shouldn't the Admin(Client) function have some way to
> get
> > >> the
> > >> > min
> > >> > > > and
> > >> > > > > > max
> > >> > > > > > > > information that we're exposing as well?  I guess we
> could
> > >> have
> > >> > > > min,
> > >> > > > > > max,
> > >> > > > > > > > and current.  Unrelated: is the use of Long rather than
> long
> > >> > > > > deliberate
> > >> > > > > > > > here?
> > >> > > > > > > >
> > >> > > > > > > > It would be good to describe how the command line tool
> > >> > > > > > > > kafka.admin.FeatureCommand will work.  For example the
> flags
> > >> > that
> > >> > > > it
> > >> > > > > > will
> > >> > > > > > > > take and the output that it will generate to STDOUT.
> > >> > > > > > > >
> > >> > > > > > > > cheers,
> > >> > > > > > > > Colin
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > >> > > > > > > > > Hi all,
> > >> > > > > > > > >
> > >> > > > > > > > > I've opened KIP-584
> > >> > > <https://issues.apache.org/jira/browse/KIP-584> <
> > >> > > > https://issues.apache.org/jira/browse/KIP-584
> > >> > > > > >
> > >> > > > > > > > > which
> > >> > > > > > > > > is intended to provide a versioning scheme for
> features.
> > >> I'd
> > >> > > like
> > >> > > > > to
> > >> > > > > > use
> > >> > > > > > > > > this thread to discuss the same. I'd appreciate any
> > >> feedback
> > >> > on
> > >> > > > > this.
> > >> > > > > > > > > Here
> > >> > > > > > > > > is a link to KIP-584
> > >> > > <https://issues.apache.org/jira/browse/KIP-584>:
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > >> > > > > > > > >  .
> > >> > > > > > > > >
> > >> > > > > > > > > Thank you!
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > Cheers,
> > >> > > > > > > > > Kowshik
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Colin McCabe <cm...@apache.org>.
On Fri, Apr 3, 2020, at 20:32, Kowshik Prakasam wrote:
> > Colin wrote:
> > It would be simpler to just say that a feature flag which doesn't appear
> > in the znode is considered to be at version level 0.  This will also
> > simplify the code a lot, I think, since you won't have to keep track of
> > tricky distinctions between "disabled" and "enabled at version 0."
> > Then you would be able to just use an int in most places.
> 
> (Kowshik): I'm not sure I understood why we want to do it this way. If an
> entry for some finalized feature is absent in '/features' node,
> alternatively we can just treat this as a feature with a version that
> was never finalized/enabled or it was deleted at some point. Then, we can
> even allow for "enabled at version 0" as the {minVersion, maxVersion} range
> can be any valid range, not necessarily minVersion > 0.

Think about the following pseudocode.  Which is simpler:

> if (feature is not present) || (feature level < 1) {
> ... something ...
> } else {
> ... something ...
> }

or

> if (feature level < 1) {
> ... something ...
> } else {
> ... something ...
> }

If you can just treat "not present" as version level 0, you can have just checks like the second one.  This should lead to simpler code.
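
For illustration, a minimal Java sketch of that convention (the class and method
names here are hypothetical, not taken from the KIP): when an absent feature flag
simply reads as version level 0, callers only ever compare a number.

    import java.util.Map;

    // Hypothetical sketch: an absent feature flag reads as version level 0, so callers
    // need only a numeric comparison and no separate "is the feature present?" branch.
    final class FinalizedFeatureLevels {
        private final Map<String, Long> levels;  // feature name -> finalized version level

        FinalizedFeatureLevels(Map<String, Long> levels) {
            this.levels = levels;
        }

        long level(String feature) {
            return levels.getOrDefault(feature, 0L);  // "not present" == level 0
        }
    }

    // usage (still hypothetical):
    //   if (featureLevels.level("group_coordinator") < 1) { ... old code path ... }
    //   else { ... new code path ... }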

> (Kowshik): Yes, the whole change is a transaction. Either all of the provided
> FeatureUpdates are carried out in ZK, or none of them are. That's why we just
> allow for a single error code field, as it is easier that way. This
> transactional guarantee is mentioned under 'Proposed changes > New
> controller API'

That makes sense, thanks.

> > Rather than FeatureUpdateType, I would just go with a boolean like
> > "force."  I'm not sure what other values we'd want to add to this later on,
> > if it were an enum.  I think the boolean is clearer.
> 
> (Kowshik): Since we have decided to go just one API (i.e.
> UpdateFeaturesRequest), it is better that FeatureUpdateType is an enum with
> multiple values. A FeatureUpdateType is tied to a feature, and the possible
> values are: ADD_OR_UPDATE, ADD_OR_UPDATE_ALLOW_DOWNGRADE, DELETE.

I guess this ties in with the discussion above-- I would rather not have a "deleted" state.  It doesn't seem to make anything more expressive, and it complicates the code.

> 
> > This ties in with my comment earlier, but for the result classes, we need
> > methods other than just "all".  Batch operations aren't usable if
> > you can't get the result per operation.... unless the semantics are
> > transactional and it really is just everything succeeded or everything
> > failed.
> 
> (Kowshik): The semantics are transactional, as I explained above.

Thanks for the clarification.

> 
> > There are a bunch of Java interfaces described like FinalizedFeature,
> > FeatureUpdate, UpdateFeaturesResult, and so on that should just be
> > regular concrete Java classes.  In general we'd only use an interface if
> > we wanted the caller to implement some kind of callback function. We
> > don't make classes that are just designed to hold data into interfaces,
> > since that just imposes extra work on callers (they have to define
> > their own concrete class for each interface just to use the API.)
> >  There's also probably no reason to have these classes inherit from each
> > other or have complex type relationships.  One more nitpick is that Kafka
> > generally doesn't use "get" in the function names of accessors.
> 
> (Kowshik): Done, I have changed the KIP. By 'interface', I just meant
> interface from a pseudocode standpoint (i.e. it was just an abstraction
> providing at least the specified behavior). Since that was a bit confusing,
> I have now renamed it calling it a class. Also I have eliminated the type
> relationships.

Thanks.

best,
Colin

> 
> 
> Cheers,
> Kowshik
> 
> On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kp...@confluent.io>
> wrote:
> 
> > Hi Jun,
> >
> > Thanks for the feedback and suggestions. Please find my response below.
> >
> > > 100.6 For every new request, the admin needs to control who is allowed to
> > > issue that request if security is enabled. So, we need to assign the new
> > > request a ResourceType and possible AclOperations. See
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > as an example.
> >
> > (Kowshik): I don't see any reference to the words ResourceType or
> > AclOperations
> > in the KIP. Please let me know how I can use the KIP that you linked to
> > know how to
> > setup the appropriate ResourceType and/or ClusterOperation?
> >
> > > 105. If we change delete to disable, it's better to do this consistently
> > in
> > > request protocol and admin api as well.
> >
> > (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> > feature.
> > I've just changed the KIP to use 'delete'. I don't have a strong
> > preference.
> >
> > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> > > for new features to be included in minor releases too. Should we make the
> > > feature versioning match the release versioning?
> >
> > (Kowshik): The release version can be mapped to a set of feature versions,
> > and this can be done, for example in the tool (or even external to the
> > tool).
> > Can you please clarify what I'm missing?
> >
> > > 111. "During regular operations, the data in the ZK node can be mutated
> > > only via a specific admin API served only by the controller." I am
> > > wondering why can't the controller auto finalize a feature version after
> > > all brokers are upgraded? For new users who download the latest version
> > to
> > > build a new cluster, it's inconvenient for them to have to manually
> > enable
> > > each feature.
> >
> > (Kowshik): I agree that there is a trade-off here, but it will help
> > to decide whether the automation can be thought through in the future
> > in a follow up KIP, or right now in this KIP. We may invest
> > in automation, but we have to decide whether we should do it
> > now or later.
> >
> > For the inconvenience that you mentioned, do you think the problem that you
> > mentioned can be  overcome by asking for the cluster operator to run a
> > bootstrap script  when he/she knows that a specific AK release has been
> > almost completely deployed in a cluster for the first time? Idea is that
> > the
> > bootstrap script will know how to map a specific AK release to finalized
> > feature versions, and run the `kafka-features.sh` tool appropriately
> > against
> > the cluster.
> >
> > Now, coming back to your automation proposal/question.
> > I do see the value of automated feature version finalization, but I also
> > see
> > that this will open up several questions and some risks, as explained
> > below.
> > The answers to these depend on the definition of the automation we choose
> > to build, and how well it fits into a Kafka deployment.
> > Basically, it can be unsafe for the controller to finalize feature version
> > upgrades automatically, without learning about the intent of the cluster
> > operator.
> > 1. We would sometimes want to lock feature versions only when we have
> > externally verified
> > the stability of the broker binary.
> > 2. Sometimes only the cluster operator knows that a cluster upgrade is
> > complete,
> > and new brokers are highly unlikely to join the cluster.
> > 3. Only the cluster operator knows that the intent is to deploy the same
> > version
> > of the new broker release across the entire cluster (i.e. the latest
> > downloaded version).
> > 4. For downgrades, it appears the controller still needs some external
> > input
> > (such as the proposed tool) to finalize a feature version downgrade.
> >
> > If we have automation, that automation can end up failing in some of the
> > cases
> > above. Then, we need a way to declare that the cluster is "not ready" if
> > the
> > controller cannot automatically finalize some basic required feature
> > version
> > upgrades across the cluster. We need to make the cluster operator aware in
> > such a scenario (raise an alert or alike).
> >
> > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> > 48.
> >
> > (Kowshik): Done.
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
> >
> >> Hi, Kowshik,
> >>
> >> Thanks for the reply. A few more comments below.
> >>
> >> 100.6 For every new request, the admin needs to control who is allowed to
> >> issue that request if security is enabled. So, we need to assign the new
> >> request a ResourceType and possible AclOperations. See
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> >> as
> >> an example.
> >>
> >> 105. If we change delete to disable, it's better to do this consistently
> >> in
> >> request protocol and admin api as well.
> >>
> >> 110. The minVersion/maxVersion for features use int64. Currently, our
> >> release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> >> for new features to be included in minor releases too. Should we make the
> >> feature versioning match the release versioning?
> >>
> >> 111. "During regular operations, the data in the ZK node can be mutated
> >> only via a specific admin API served only by the controller." I am
> >> wondering why can't the controller auto finalize a feature version after
> >> all brokers are upgraded? For new users who download the latest version to
> >> build a new cluster, it's inconvenient for them to have to manually enable
> >> each feature.
> >>
> >> 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> >> 48.
> >>
> >> Jun
> >>
> >>
> >> On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
> >> wrote:
> >>
> >> > Hey Jun,
> >> >
> >> > Thanks a lot for the great feedback! Please note that the design
> >> > has changed a little bit on the KIP, and we now propagate the finalized
> >> > features metadata only via ZK watches (instead of UpdateMetadataRequest
> >> > from the controller).
> >> >
> >> > Please find below my response to your questions/feedback, with the
> >> prefix
> >> > "(Kowshik):".
> >> >
> >> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> >> > > 100.1 Since this request waits for responses from brokers, should we
> >> add
> >> > a
> >> > > timeout in the request (like createTopicRequest)?
> >> >
> >> > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> >> > longer
> >> > wait for responses from brokers, since the design has been changed so
> >> that
> >> > the
> >> > features information is propagated via ZK. Nevertheless, it is right to
> >> > have a timeout
> >> > for the request.
> >> >
> >> > > 100.2 The response schema is a bit weird. Typically, the response just
> >> > > shows an error code and an error message, instead of echoing the
> >> request.
> >> >
> >> > (Kowshik): Great point! Yeah, I have modified it to just return an error
> >> > code and a message.
> >> > Previously it was not echoing the "request", rather it was returning the
> >> > latest set of
> >> > cluster-wide finalized features (after applying the updates). But you
> >> are
> >> > right,
> >> > the additional info is not required, so I have removed it from the
> >> response
> >> > schema.
> >> >
> >> > > 100.3 Should we add a separate request to list/describe the existing
> >> > > features?
> >> >
> >> > (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
> >> > Admin API,
> >> > which, under the covers, uses the ApiVersionsRequest to list/describe
> >> the
> >> > existing features. Please read the 'Tooling support' section.
> >> >
> >> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> >> > > DELETE, the version field doesn't make sense. So, I guess the broker
> >> just
> >> > > ignores this? An alternative way is to have a separate
> >> > DeleteFeaturesRequest
> >> >
> >> > (Kowshik): Great point! I have modified the KIP now to have 2 separate
> >> > controller APIs
> >> > serving these different purposes:
> >> > 1. updateFeatures
> >> > 2. deleteFeatures
> >> >
> >> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> >> > > version of the metadata for finalized features." I am wondering why
> >> the
> >> > > ordering is important?
> >> >
> >> > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> >> > version), and
> >> > it is just the ZK node version. Basically, this is the epoch for the
> >> > cluster-wide
> >> > finalized feature version metadata. This metadata is served to clients
> >> via
> >> > the
> >> > ApiVersionsResponse (for reads). We propagate updates from the
> >> '/features'
> >> > ZK node
> >> > to all brokers, via ZK watches setup by each broker on the '/features'
> >> > node.
> >> >
> >> > Now here is why the ordering is important:
> >> > ZK watches don't propagate at the same time. As a result, the
> >> > ApiVersionsResponse
> >> > is eventually consistent across brokers. This can introduce cases
> >> > where clients see an older lower epoch of the features metadata, after a
> >> > more recent
> >> > higher epoch was returned at a previous point in time. We expect clients
> >> > to always employ the rule that the latest received higher epoch of
> >> metadata
> >> > always trumps an older smaller epoch. Those clients that are external to
> >> > Kafka should strongly consider discovering the latest metadata once
> >> during
> >> > startup from the brokers, and if required refresh the metadata
> >> periodically
> >> > (to get the latest metadata).
> >> >
> >> > > 100.6 Could you specify the required ACL for this new request?
> >> >
> >> > (Kowshik): What is ACL, and how could I find out which one to specify?
> >> > Please could you provide me some pointers? I'll be glad to update the
> >> > KIP once I know the next steps.
> >> >
> >> > > 101. For the broker registration ZK node, should we bump up the
> >> version
> >> > in
> >> > the json?
> >> >
> >> > (Kowshik): Great point! Done. I've increased the version in the broker
> >> json
> >> > by 1.
> >> >
> >> > > 102. For the /features ZK node, not sure if we need the epoch field.
> >> Each
> >> > > ZK node has an internal version field that is incremented on every
> >> > update.
> >> >
> >> > (Kowshik): Great point! Done. I'm using the ZK node version now,
> >> instead of
> >> > explicitly
> >> > incremented epoch.
> >> >
> >> > > 103. "Enabling the actual semantics of a feature version cluster-wide
> >> is
> >> > > left to the discretion of the logic implementing the feature (ex: can
> >> be
> >> > > done via dynamic broker config)." Does that mean the broker
> >> registration
> >> > ZK
> >> > > node will be updated dynamically when this happens?
> >> >
> >> > (Kowshik): Not really. The text was just conveying that a broker could
> >> > "know" of
> >> > a new feature version, but it does not mean the broker should have also
> >> > activated the effects of the feature version. Knowing vs activation are
> >> 2
> >> > separate things,
> >> > and the latter can be achieved by dynamic config. I have reworded the
> >> text
> >> > to
> >> > make this clear to the reader.
> >> >
> >> >
> >> > > 104. UpdateMetadataRequest
> >> > > 104.1 It would be useful to describe when the feature metadata is
> >> > included
> >> > > in the request. My understanding is that it's only included if (1)
> >> there
> >> > is
> >> > > a change to the finalized feature; (2) broker restart; (3) controller
> >> > > failover.
> >> > > 104.2 The new fields have the following versions. Why are the
> >> versions 3+
> >> > > when the top version is bumped to 6?
> >> > >       "fields":  [
> >> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> >> > >           "about": "The name of the feature."},
> >> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> >> > >           "about": "The finalized version for the feature."}
> >> > >       ]
> >> >
> >> > (Kowshik): With the new improved design, we have completely eliminated
> >> the
> >> > need to
> >> > use UpdateMetadataRequest. This is because we now rely on ZK to deliver
> >> the
> >> > notifications for changes to the '/features' ZK node.
> >> >
> >> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> >> > better
> >> > > to use enable/disable?
> >> >
> >> > (Kowshik): For delete, yes, I have changed it so that we instead call it
> >> > 'disable'.
> >> > However for 'update', it can now also refer to either an upgrade or a
> >> > forced downgrade.
> >> > Therefore, I have left it the way it is, just calling it as just
> >> 'update'.
> >> >
> >> >
> >> > Cheers,
> >> > Kowshik
> >> >
> >> > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> >> >
> >> > > Hi, Kowshik,
> >> > >
> >> > > Thanks for the KIP. Looks good overall. A few comments below.
> >> > >
> >> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> >> > > 100.1 Since this request waits for responses from brokers, should we
> >> add
> >> > a
> >> > > timeout in the request (like createTopicRequest)?
> >> > > 100.2 The response schema is a bit weird. Typically, the response just
> >> > > shows an error code and an error message, instead of echoing the
> >> request.
> >> > > 100.3 Should we add a separate request to list/describe the existing
> >> > > features?
> >> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> >> > > DELETE, the version field doesn't make sense. So, I guess the broker
> >> just
> >> > > ignores this? An alternative way is to have a separate
> >> > > DeleteFeaturesRequest
> >> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> >> > > version of the metadata for finalized features." I am wondering why
> >> the
> >> > > ordering is important?
> >> > > 100.6 Could you specify the required ACL for this new request?
> >> > >
> >> > > 101. For the broker registration ZK node, should we bump up the
> >> version
> >> > in
> >> > > the json?
> >> > >
> >> > > 102. For the /features ZK node, not sure if we need the epoch field.
> >> Each
> >> > > ZK node has an internal version field that is incremented on every
> >> > update.
> >> > >
> >> > > 103. "Enabling the actual semantics of a feature version cluster-wide
> >> is
> >> > > left to the discretion of the logic implementing the feature (ex: can
> >> be
> >> > > done via dynamic broker config)." Does that mean the broker
> >> registration
> >> > ZK
> >> > > node will be updated dynamically when this happens?
> >> > >
> >> > > 104. UpdateMetadataRequest
> >> > > 104.1 It would be useful to describe when the feature metadata is
> >> > included
> >> > > in the request. My understanding is that it's only included if (1)
> >> there
> >> > is
> >> > > a change to the finalized feature; (2) broker restart; (3) controller
> >> > > failover.
> >> > > 104.2 The new fields have the following versions. Why are the
> >> versions 3+
> >> > > when the top version is bumped to 6?
> >> > >       "fields":  [
> >> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> >> > >           "about": "The name of the feature."},
> >> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> >> > >           "about": "The finalized version for the feature."}
> >> > >       ]
> >> > >
> >> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> >> > better
> >> > > to use enable/disable?
> >> > >
> >> > > Jun
> >> > >
> >> > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> >> kprakasam@confluent.io
> >> > >
> >> > > wrote:
> >> > >
> >> > > > Hey Boyang,
> >> > > >
> >> > > > Thanks for the great feedback! I have updated the KIP based on your
> >> > > > feedback.
> >> > > > Please find my response below for your comments, look for sentences
> >> > > > starting
> >> > > > with "(Kowshik)" below.
> >> > > >
> >> > > >
> >> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> >> > > could
> >> > > > be
> >> > > > > converted as "When is it safe for the brokers to start serving new
> >> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> >> > the
> >> > > > > context.
> >> > > >
> >> > > > (Kowshik): Great point! Done.
> >> > > >
> >> > > > > 2. In the *Explanation *section, the metadata version number part
> >> > > seems a
> >> > > > > bit blurred. Could you point a reference to later section that we
> >> > going
> >> > > > to
> >> > > > > store it in Zookeeper and update it every time when there is a
> >> > feature
> >> > > > > change?
> >> > > >
> >> > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> >> > > >
> >> > > >
> >> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> >> > for
> >> > > > > features such as group coordinator semantics, there is no legal
> >> > > scenario
> >> > > > to
> >> > > > > perform a downgrade at all. So having downgrade door open is
> >> pretty
> >> > > > > error-prone as human faults happen all the time. I'm assuming as
> >> new
> >> > > > > features are implemented, it's not very hard to add a flag during
> >> > > feature
> >> > > > > creation to indicate whether this feature is "downgradable". Could
> >> > you
> >> > > > > explain a bit more on the extra engineering effort for shipping
> >> this
> >> > > KIP
> >> > > > > with downgrade protection in place?
> >> > > >
> >> > > > (Kowshik): Great point! I'd agree and disagree here. While I agree
> >> that
> >> > > > accidental
> >> > > > downgrades can cause problems, I also think sometimes downgrades
> >> should
> >> > > > be allowed for emergency reasons (not all downgrades cause issues).
> >> > > > It is just subjective to the feature being downgraded.
> >> > > >
> >> > > > To be more strict about feature version downgrades, I have modified
> >> the
> >> > > KIP
> >> > > > proposing that we mandate a `--force-downgrade` flag be used in the
> >> > > > UPDATE_FEATURES api
> >> > > > and the tooling, whenever the human is downgrading a finalized
> >> feature
> >> > > > version.
> >> > > > Hopefully this should cover the requirement, until we find the need
> >> for
> >> > > > advanced downgrade support.
> >> > > >
> >> > > > > 4. "Each broker’s supported dictionary of feature versions will be
> >> > > > defined
> >> > > > > in the broker code." So this means in order to restrict a certain
> >> > > > feature,
> >> > > > > we need to start the broker first and then send a feature gating
> >> > > request
> >> > > > > immediately, which introduces a time gap and the intended-to-close
> >> > > > feature
> >> > > > > could actually serve request during this phase. Do you think we
> >> > should
> >> > > > also
> >> > > > > support configurations as well so that admin user could freely
> >> roll
> >> > up
> >> > > a
> >> > > > > cluster with all nodes complying the same feature gating, without
> >> > > > worrying
> >> > > > > about the turnaround time to propagate the message only after the
> >> > > cluster
> >> > > > > starts up?
> >> > > >
> >> > > > (Kowshik): This is a great point/question. One of the expectations
> >> out
> >> > of
> >> > > > this KIP, which is
> >> > > > already followed in the broker, is the following.
> >> > > >  - Imagine at time T1 the broker starts up and registers it’s
> >> presence
> >> > in
> >> > > > ZK,
> >> > > >    along with advertising it’s supported features.
> >> > > >  - Imagine at a future time T2 the broker receives the
> >> > > > UpdateMetadataRequest
> >> > > >    from the controller, which contains the latest finalized
> >> features as
> >> > > > seen by
> >> > > >    the controller. The broker validates this data against it’s
> >> > supported
> >> > > > features to
> >> > > >    make sure there is no mismatch (it will shutdown if there is an
> >> > > > incompatibility).
> >> > > >
> >> > > > It is expected that during the time between the 2 events T1 and T2,
> >> the
> >> > > > broker is
> >> > > > almost a silent entity in the cluster. It does not add any value to
> >> the
> >> > > > cluster, or carry
> >> > > > out any important broker activities. By “important”, I mean it is
> >> not
> >> > > doing
> >> > > > mutations
> >> > > > on it’s persistence, not mutating critical in-memory state, won’t be
> >> > > > serving
> >> > > > produce/fetch requests. Note it doesn’t even know it’s assigned
> >> > > partitions
> >> > > > until
> >> > > > it receives UpdateMetadataRequest from controller. Anything the
> >> broker
> >> > is
> >> > > > doing up
> >> > > > until this point is not damaging/useful.
> >> > > >
> >> > > > I’ve clarified the above in the KIP, see this new section:
> >> > > >
> >> > > >
> >> > >
> >> >
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> >> > > > .
> >> > > >
> >> > > > > 5. "adding a new Feature, updating or deleting an existing
> >> Feature",
> >> > > may
> >> > > > be
> >> > > > > I misunderstood something, I thought the features are defined in
> >> > broker
> >> > > > > code, so admin could not really create a new feature?
> >> > > >
> >> > > > (Kowshik): Great point! You understood this right. Here adding a
> >> > feature
> >> > > > means we are
> >> > > > adding a cluster-wide finalized *max* version for a feature that was
> >> > > > previously never finalized.
> >> > > > I have clarified this in the KIP now.
> >> > > >
> >> > > > > 6. I think we need a separate error code like
> >> > > FEATURE_UPDATE_IN_PROGRESS
> >> > > > to
> >> > > > > reject a concurrent feature update request.
> >> > > >
> >> > > > (Kowshik): Great point! I have modified the KIP adding the above
> >> (see
> >> > > > 'Tooling support -> Admin API changes').
> >> > > >
> >> > > > > 7. I think we haven't discussed the alternative solution to pass
> >> the
> >> > > > > feature information through Zookeeper. Is that mentioned in the
> >> KIP
> >> > to
> >> > > > > justify why using UpdateMetadata is more favorable?
> >> > > >
> >> > > > (Kowshik): Nice question! The broker reads finalized feature info
> >> > stored
> >> > > in
> >> > > > ZK,
> >> > > > only during startup when it does a validation. When serving
> >> > > > `ApiVersionsRequest`, the
> >> > > > broker does not read this info from ZK directly. I'd imagine the
> >> risk
> >> > is
> >> > > > that it can increase
> >> > > > the ZK read QPS which can be a bottleneck for the system. Today, in
> >> > Kafka
> >> > > > we use the
> >> > > > controller to fan out ZK updates to brokers and we want to stick to
> >> > that
> >> > > > pattern to avoid
> >> > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> >> > > >
> >> > > > > 8. I was under the impression that user could configure a range of
> >> > > > > supported versions, what's the trade-off for allowing single
> >> > finalized
> >> > > > > version only?
> >> > > >
> >> > > > (Kowshik): Great question! The finalized version of a feature
> >> basically
> >> > > > refers to
> >> > > > the cluster-wide finalized feature "maximum" version. For example,
> >> if
> >> > the
> >> > > > 'group_coordinator' feature
> >> > > > has the finalized version set to 10, then, it means that
> >> cluster-wide
> >> > all
> >> > > > versions upto v10 are
> >> > > > supported for this feature. However, note that if some version (ex:
> >> v0)
> >> > > > gets deprecated
> >> > > > for this feature, then we don’t convey that using this scheme (also
> >> > > > supporting deprecation is a non-goal).
> >> > > >
> >> > > > (Kowshik): I’ve now modified the KIP at all points, referring to
> >> > finalized
> >> > > > feature "maximum" versions.
> >> > > >
> >> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
> >> > > > producer
> >> > > >
> >> > > > (Kowshik): Great point! Done.
> >> > > >
> >> > > >
> >> > > > Cheers,
> >> > > > Kowshik
> >> > > >
> >> > > >
> >> > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> >> > reluctanthero104@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Hey Kowshik,
> >> > > > >
> >> > > > > thanks for the revised KIP. Got a couple of questions:
> >> > > > >
> >> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> >> > > could
> >> > > > be
> >> > > > > converted as "When is it safe for the brokers to start serving new
> >> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> >> > the
> >> > > > > context.
> >> > > > >
> >> > > > > 2. In the *Explanation *section, the metadata version number part
> >> > > seems a
> >> > > > > bit blurred. Could you point a reference to later section that we
> >> > going
> >> > > > to
> >> > > > > store it in Zookeeper and update it every time when there is a
> >> > feature
> >> > > > > change?
> >> > > > >
> >> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> >> > for
> >> > > > > features such as group coordinator semantics, there is no legal
> >> > > scenario
> >> > > > to
> >> > > > > perform a downgrade at all. So having downgrade door open is
> >> pretty
> >> > > > > error-prone as human faults happen all the time. I'm assuming as
> >> new
> >> > > > > features are implemented, it's not very hard to add a flag during
> >> > > feature
> >> > > > > creation to indicate whether this feature is "downgradable". Could
> >> > you
> >> > > > > explain a bit more on the extra engineering effort for shipping
> >> this
> >> > > KIP
> >> > > > > with downgrade protection in place?
> >> > > > >
> >> > > > > 4. "Each broker’s supported dictionary of feature versions will be
> >> > > > defined
> >> > > > > in the broker code." So this means in order to restrict a certain
> >> > > > feature,
> >> > > > > we need to start the broker first and then send a feature gating
> >> > > request
> >> > > > > immediately, which introduces a time gap and the intended-to-close
> >> > > > feature
> >> > > > > could actually serve request during this phase. Do you think we
> >> > should
> >> > > > also
> >> > > > > support configurations as well so that admin user could freely
> >> roll
> >> > up
> >> > > a
> >> > > > > cluster with all nodes complying the same feature gating, without
> >> > > > worrying
> >> > > > > about the turnaround time to propagate the message only after the
> >> > > cluster
> >> > > > > starts up?
> >> > > > >
> >> > > > > 5. "adding a new Feature, updating or deleting an existing
> >> Feature",
> >> > > may
> >> > > > be
> >> > > > > I misunderstood something, I thought the features are defined in
> >> > broker
> >> > > > > code, so admin could not really create a new feature?
> >> > > > >
> >> > > > > 6. I think we need a separate error code like
> >> > > FEATURE_UPDATE_IN_PROGRESS
> >> > > > to
> >> > > > > reject a concurrent feature update request.
> >> > > > >
> >> > > > > 7. I think we haven't discussed the alternative solution to pass
> >> the
> >> > > > > feature information through Zookeeper. Is that mentioned in the
> >> KIP
> >> > to
> >> > > > > justify why using UpdateMetadata is more favorable?
> >> > > > >
> >> > > > > 8. I was under the impression that user could configure a range of
> >> > > > > supported versions, what's the trade-off for allowing single
> >> > finalized
> >> > > > > version only?
> >> > > > >
> >> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
> >> > > > producer
> >> > > > >
> >> > > > > Boyang
> >> > > > >
> >> > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
> >> > > wrote:
> >> > > > >
> >> > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> >> > > > > > > Hi Colin,
> >> > > > > > >
> >> > > > > > > Thanks for the feedback! I've changed the KIP to address your
> >> > > > > > > suggestions.
> >> > > > > > > Please find below my explanation. Here is a link to KIP 584:
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> >> > > > > > > .
> >> > > > > > >
> >> > > > > > > 1. '__data_version__' is the version of the finalized feature
> >> > > > metadata
> >> > > > > > > (i.e. actual ZK node contents), while the
> >> '__schema_version__' is
> >> > > the
> >> > > > > > > version of the schema of the data persisted in ZK. These serve
> >> > > > > different
> >> > > > > > > purposes. '__data_version__' is useful mainly to clients
> >> > during
> >> > > > > reads,
> >> > > > > > > to differentiate between the 2 versions of eventually
> >> consistent
> >> > > > > > 'finalized
> >> > > > > > > features' metadata (i.e. larger metadata version is more
> >> recent).
> >> > > > > > > '__schema_version__' provides an additional degree of
> >> > flexibility,
> >> > > > > where
> >> > > > > > if
> >> > > > > > > we decide to change the schema for '/features' node in ZK (in
> >> the
> >> > > > > > future),
> >> > > > > > > then we can manage broker roll outs suitably (i.e.
> >> > > > > > > serialization/deserialization of the ZK data can be handled
> >> > > safely).
> >> > > > > >
> >> > > > > > Hi Kowshik,
> >> > > > > >
> >> > > > > > If you're talking about a number that lets you know if data is
> >> more
> >> > > or
> >> > > > > > less recent, we would typically call that an epoch, and not a
> >> > > version.
> >> > > > > For
> >> > > > > > the ZK data structures, the word "version" is typically reserved
> >> > for
> >> > > > > > describing changes to the overall schema of the data that is
> >> > written
> >> > > to
> >> > > > > > ZooKeeper.  We don't even really change the "version" of those
> >> > > schemas
> >> > > > > that
> >> > > > > > much, since most changes are backwards-compatible.  But we do
> >> > include
> >> > > > > that
> >> > > > > > version field just in case.
> >> > > > > >
> >> > > > > > I don't think we really need an epoch here, though, since we can
> >> > just
> >> > > > > look
> >> > > > > > at the broker epoch.  Whenever the broker registers, its epoch
> >> will
> >> > > be
> >> > > > > > greater than the previous broker epoch.  And the newly
> >> registered
> >> > > data
> >> > > > > will
> >> > > > > > take priority.  This will be a lot simpler than adding a
> >> separate
> >> > > epoch
> >> > > > > > system, I think.
> >> > > > > >
> >> > > > > > >
> >> > > > > > > 2. Regarding admin client needing min and max information -
> >> you
> >> > are
> >> > > > > > right!
> >> > > > > > > I've changed the KIP such that the Admin API also allows the
> >> user
> >> > > to
> >> > > > > read
> >> > > > > > > 'supported features' from a specific broker. Please look at
> >> the
> >> > > > section
> >> > > > > > > "Admin API changes".
> >> > > > > >
> >> > > > > > Thanks.
> >> > > > > >
> >> > > > > > >
> >> > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> >> deliberate.
> >> > > > I've
> >> > > > > > > improved the KIP to just use `long` at all places.
> >> > > > > >
> >> > > > > > Sounds good.
> >> > > > > >
> >> > > > > > >
> >> > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> >> > I've
> >> > > > > > updated
> >> > > > > > > the KIP sketching the functionality provided by this tool,
> >> with
> >> > > some
> >> > > > > > > examples. Please look at the section "Tooling support
> >> examples".
> >> > > > > > >
> >> > > > > > > Thank you!
> >> > > > > >
> >> > > > > >
> >> > > > > > Thanks, Kowshik.
> >> > > > > >
> >> > > > > > cheers,
> >> > > > > > Colin
> >> > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Cheers,
> >> > > > > > > Kowshik
> >> > > > > > >
> >> > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> >> > cmccabe@apache.org>
> >> > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Thanks, Kowshik, this looks good.
> >> > > > > > > >
> >> > > > > > > > In the "Schema" section, do we really need both
> >> > > __schema_version__
> >> > > > > and
> >> > > > > > > > __data_version__?  Can we just have a single version field
> >> > here?
> >> > > > > > > >
> >> > > > > > > > Shouldn't the Admin(Client) function have some way to get
> >> the
> >> > min
> >> > > > and
> >> > > > > > max
> >> > > > > > > > information that we're exposing as well?  I guess we could
> >> have
> >> > > > min,
> >> > > > > > max,
> >> > > > > > > > and current.  Unrelated: is the use of Long rather than long
> >> > > > > deliberate
> >> > > > > > > > here?
> >> > > > > > > >
> >> > > > > > > > It would be good to describe how the command line tool
> >> > > > > > > > kafka.admin.FeatureCommand will work.  For example the flags
> >> > that
> >> > > > it
> >> > > > > > will
> >> > > > > > > > take and the output that it will generate to STDOUT.
> >> > > > > > > >
> >> > > > > > > > cheers,
> >> > > > > > > > Colin
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> >> > > > > > > > > Hi all,
> >> > > > > > > > >
> >> > > > > > > > > I've opened KIP-584
> >> > > <https://issues.apache.org/jira/browse/KIP-584> <
> >> > > > https://issues.apache.org/jira/browse/KIP-584
> >> > > > > >
> >> > > > > > > > > which
> >> > > > > > > > > is intended to provide a versioning scheme for features.
> >> I'd
> >> > > like
> >> > > > > to
> >> > > > > > use
> >> > > > > > > > > this thread to discuss the same. I'd appreciate any
> >> feedback
> >> > on
> >> > > > > this.
> >> > > > > > > > > Here
> >> > > > > > > > > is a link to KIP-584
> >> > > <https://issues.apache.org/jira/browse/KIP-584>:
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> >> > > > > > > > >  .
> >> > > > > > > > >
> >> > > > > > > > > Thank you!
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Cheers,
> >> > > > > > > > > Kowshik
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Colin,

Thanks for the feedback! I have updated the KIP based on your feedback.
Please find my response below.

> The discussion on ZooKeeper reads versus writes makes sense to me.  The
important thing to keep in mind here is that in the bridge release,
> all brokers can read from ZooKeeper, but only the controller writes.

(Kowshik): Great, thanks!

> Why do we need both UpdateFeaturesRequest and DeleteFeaturesRequest?  It
seems awkward to have "deleting" be a special case here when the
> general idea is that we have an RPC to change the supported feature
flags.  Changing the feature level from 2 to 1 doesn't seem that different
> from changing it from 1 to not present.

(Kowshik): Done, makes sense. I have updated the KIP to just use 1 API, and
that's the UpdateFeaturesRequest. For the deletion case, we can just ignore
the version number passed in the API (which is indicative of 'not present').

> It would be simpler to just say that a feature flag which doesn't appear
in the znode is considered to be at version level 0.  This will also
> simplify the code a lot, I think, since you won't have to keep track of
tricky distinctions between "disabled" and "enabled at version 0."
> Then you would be able to just use an int in most places.

(Kowshik): I'm not sure I understood why we want to do it this way. If an
entry for some finalized feature is absent in the '/features' node, we can
alternatively treat this as a feature whose version was never
finalized/enabled, or that was deleted at some point. Then, we can
even allow for "enabled at version 0" as the {minVersion, maxVersion} range
can be any valid range, not necessarily minVersion > 0.
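
As a rough sketch of this alternative (names are illustrative only, not the KIP's
classes): representing absence explicitly keeps "never finalized or deleted"
distinct from a feature that is legitimately finalized with a range starting at
version 0.

    import java.util.Map;
    import java.util.Optional;

    // Illustrative sketch only: a finalized feature carries a {minVersionLevel, maxVersionLevel}
    // range, and absence from the '/features' node is modelled explicitly instead of as level 0.
    final class VersionRange {
        final long minVersionLevel;
        final long maxVersionLevel;

        VersionRange(long minVersionLevel, long maxVersionLevel) {
            this.minVersionLevel = minVersionLevel;
            this.maxVersionLevel = maxVersionLevel;
        }
    }

    final class FinalizedFeatureCache {
        private final Map<String, VersionRange> features;  // contents of the '/features' node

        FinalizedFeatureCache(Map<String, VersionRange> features) {
            this.features = features;
        }

        // empty() => never finalized (or deleted); a present range may start at 0,
        // so "enabled at version 0" stays expressible.
        Optional<VersionRange> finalizedRange(String feature) {
            return Optional.ofNullable(features.get(feature));
        }
    }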

> (By the way, I would propose the term "version level" for this number,
since it avoids confusion with all the other meanings of the word
> "version" that we have in the code.)

(Kowshik): Good idea! I have updated the KIP to refer to "version level"
instead of version.

> Another thing to keep in mind is that if a request RPC is batch, the
corresponding response RPC also needs to be batch.  In other words, you
> need multiple error codes, one for each feature flag whose level you are
trying to change.  Unless the idea is that the whole change is a
> transaction that all either happens or doesn't?

(Kowshik): Yes, the whole change is a transaction. Either all of the provided
FeatureUpdates are carried out in ZK, or none of them are. That's why we just
allow for a single error code field, as it is easier that way. This
transactional guarantee is mentioned under 'Proposed changes > New
controller API'
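
A rough controller-side sketch of that all-or-nothing behavior: validate every
update up front, then apply the whole batch with a single write to the '/features'
node (all identifiers below are hypothetical, not the KIP's actual types).

    // Hypothetical sketch of the transactional guarantee described above. FeatureUpdate,
    // FeaturesZNode, validate() and writeZNode() are illustrative names, and NONE stands
    // for the "no error" code -- none of these are the KIP's actual identifiers.
    short applyFeatureUpdates(java.util.List<FeatureUpdate> updates, FeaturesZNode current) {
        for (FeatureUpdate update : updates) {
            short error = validate(update, current);  // e.g. invalid downgrade, unknown feature
            if (error != NONE) {
                return error;                         // nothing has been written to ZK yet
            }
        }
        FeaturesZNode next = current.withUpdatesApplied(updates);
        writeZNode("/features", next);                // one (conditional) ZK update: all or nothing
        return NONE;                                  // hence the single error code in the response
    }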

> Rather than FeatureUpdateType, I would just go with a boolean like
"force."  I'm not sure what other values we'd want to add to this later on,
> if it were an enum.  I think the boolean is clearer.

(Kowshik): Since we have decided to go with just one API (i.e.
UpdateFeaturesRequest), it is better that FeatureUpdateType is an enum with
multiple values. A FeatureUpdateType is tied to a feature, and the possible
values are: ADD_OR_UPDATE, ADD_OR_UPDATE_ALLOW_DOWNGRADE, DELETE.
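
For concreteness, a minimal sketch of such an enum (illustrative only; the KIP
text is authoritative for the final shape):

    // Illustrative sketch of the per-feature update type discussed above.
    public enum FeatureUpdateType {
        ADD_OR_UPDATE,                  // finalize a feature, or raise its finalized max version level
        ADD_OR_UPDATE_ALLOW_DOWNGRADE,  // same, but explicitly allow lowering the version level
        DELETE                          // remove the finalized feature entry from '/features'
    }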

> This ties in with my comment earlier, but for the result classes, we need
methods other than just "all".  Batch operations aren't usable if
> you can't get the result per operation.... unless the semantics are
transactional and it really is just everything succeeded or everything
> failed.

(Kowshik): The semantics are transactional, as I explained above.
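
In other words, with all-or-nothing semantics the result object only needs one
future for the whole batch. A rough sketch (not the final Admin API shape):

    import org.apache.kafka.common.KafkaFuture;

    // Rough sketch only: with transactional semantics there is no per-feature outcome to
    // expose, so a single future for the entire batch of updates is sufficient.
    public class UpdateFeaturesResult {
        private final KafkaFuture<Void> all;

        UpdateFeaturesResult(KafkaFuture<Void> all) {
            this.all = all;
        }

        public KafkaFuture<Void> all() {
            return all;
        }
    }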

> There are a bunch of Java interfaces described like FinalizedFeature,
FeatureUpdate, UpdateFeaturesResult, and so on that should just be
> regular concrete Java classes.  In general we'd only use an interface if
we wanted the caller to implement some kind of callback function. We
> don't make classes that are just designed to hold data into interfaces,
since that just imposes extra work on callers (they have to define
> their own concrete class for each interface just to use the API.)
 There's also probably no reason to have these classes inherit from each
> other or have complex type relationships.  One more nitpick is that Kafka
generally doesn't use "get" in the function names of accessors.

(Kowshik): Done, I have changed the KIP. By 'interface', I just meant
interface from a pseudocode standpoint (i.e. it was just an abstraction
providing at least the specified behavior). Since that was a bit confusing,
I now just call it a class. I have also eliminated the type
relationships.
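
As a small sketch of that convention (field names are illustrative): a plain
concrete class that just holds data, with accessors that drop the "get" prefix
and no inheritance.

    // Illustrative sketch: a concrete data holder rather than an interface, with
    // accessor names that do not use a "get" prefix.
    public class FeatureUpdate {
        private final String featureName;
        private final long maxVersionLevel;
        private final FeatureUpdateType updateType;

        public FeatureUpdate(String featureName, long maxVersionLevel, FeatureUpdateType updateType) {
            this.featureName = featureName;
            this.maxVersionLevel = maxVersionLevel;
            this.updateType = updateType;
        }

        public String featureName() { return featureName; }
        public long maxVersionLevel() { return maxVersionLevel; }
        public FeatureUpdateType updateType() { return updateType; }
    }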


Cheers,
Kowshik

On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hi Jun,
>
> Thanks for the feedback and suggestions. Please find my response below.
>
> > 100.6 For every new request, the admin needs to control who is allowed to
> > issue that request if security is enabled. So, we need to assign the new
> > request a ResourceType and possible AclOperations. See
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > as an example.
>
> (Kowshik): I don't see any reference to the words ResourceType or
> AclOperations
> in the KIP. Please let me know how I can use the KIP that you linked to
> know how to
> setup the appropriate ResourceType and/or ClusterOperation?
>
> > 105. If we change delete to disable, it's better to do this consistently
> in
> > request protocol and admin api as well.
>
> (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> feature.
> I've just changed the KIP to use 'delete'. I don't have a strong
> preference.
>
> > 110. The minVersion/maxVersion for features use int64. Currently, our
> > release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> > for new features to be included in minor releases too. Should we make the
> > feature versioning match the release versioning?
>
> (Kowshik): The release version can be mapped to a set of feature versions,
> and this can be done, for example in the tool (or even external to the
> tool).
> Can you please clarify what I'm missing?
>
> > 111. "During regular operations, the data in the ZK node can be mutated
> > only via a specific admin API served only by the controller." I am
> > wondering why can't the controller auto finalize a feature version after
> > all brokers are upgraded? For new users who download the latest version
> to
> > build a new cluster, it's inconvenient for them to have to manually
> enable
> > each feature.
>
> (Kowshik): I agree that there is a trade-off here, but it will help
> to decide whether the automation can be thought through in the future
> in a follow up KIP, or right now in this KIP. We may invest
> in automation, but we have to decide whether we should do it
> now or later.
>
> For the inconvenience that you mentioned, do you think the problem that you
> mentioned can be  overcome by asking for the cluster operator to run a
> bootstrap script  when he/she knows that a specific AK release has been
> almost completely deployed in a cluster for the first time? Idea is that
> the
> bootstrap script will know how to map a specific AK release to finalized
> feature versions, and run the `kafka-features.sh` tool appropriately
> against
> the cluster.
>
> Now, coming back to your automation proposal/question.
> I do see the value of automated feature version finalization, but I also
> see
> that this will open up several questions and some risks, as explained
> below.
> The answers to these depend on the definition of the automation we choose
> to build, and how well it fits into a Kafka deployment.
> Basically, it can be unsafe for the controller to finalize feature version
> upgrades automatically, without learning about the intent of the cluster
> operator.
> 1. We would sometimes want to lock feature versions only when we have
> externally verified
> the stability of the broker binary.
> 2. Sometimes only the cluster operator knows that a cluster upgrade is
> complete,
> and new brokers are highly unlikely to join the cluster.
> 3. Only the cluster operator knows that the intent is to deploy the same
> version
> of the new broker release across the entire cluster (i.e. the latest
> downloaded version).
> 4. For downgrades, it appears the controller still needs some external
> input
> (such as the proposed tool) to finalize a feature version downgrade.
>
> If we have automation, that automation can end up failing in some of the
> cases
> above. Then, we need a way to declare that the cluster is "not ready" if
> the
> controller cannot automatically finalize some basic required feature
> version
> upgrades across the cluster. We need to make the cluster operator aware in
> such a scenario (raise an alert or alike).
>
> > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> 48.
>
> (Kowshik): Done.
>
>
> Cheers,
> Kowshik
>
> On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:
>
>> Hi, Kowshik,
>>
>> Thanks for the reply. A few more comments below.
>>
>> 100.6 For every new request, the admin needs to control who is allowed to
>> issue that request if security is enabled. So, we need to assign the new
>> request a ResourceType and possible AclOperations. See
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
>> as
>> an example.
>>
>> 105. If we change delete to disable, it's better to do this consistently
>> in
>> request protocol and admin api as well.
>>
>> 110. The minVersion/maxVersion for features use int64. Currently, our
>> release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
>> for new features to be included in minor releases too. Should we make the
>> feature versioning match the release versioning?
>>
>> 111. "During regular operations, the data in the ZK node can be mutated
>> only via a specific admin API served only by the controller." I am
>> wondering why can't the controller auto finalize a feature version after
>> all brokers are upgraded? For new users who download the latest version to
>> build a new cluster, it's inconvenient for them to have to manually enable
>> each feature.
>>
>> 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
>> 48.
>>
>> Jun
>>
>>
>> On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
>> wrote:
>>
>> > Hey Jun,
>> >
>> > Thanks a lot for the great feedback! Please note that the design
>> > has changed a little bit on the KIP, and we now propagate the finalized
>> > features metadata only via ZK watches (instead of UpdateMetadataRequest
>> > from the controller).
>> >
>> > Please find below my response to your questions/feedback, with the
>> prefix
>> > "(Kowshik):".
>> >
>> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
>> > > 100.1 Since this request waits for responses from brokers, should we
>> add
>> > a
>> > > timeout in the request (like createTopicRequest)?
>> >
>> > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
>> > longer
>> > wait for responses from brokers, since the design has been changed so
>> that
>> > the
>> > features information is propagated via ZK. Nevertheless, it is right to
>> > have a timeout
>> > for the request.
>> >
>> > > 100.2 The response schema is a bit weird. Typically, the response just
>> > > shows an error code and an error message, instead of echoing the
>> request.
>> >
>> > (Kowshik): Great point! Yeah, I have modified it to just return an error
>> > code and a message.
>> > Previously it was not echoing the "request", rather it was returning the
>> > latest set of
>> > cluster-wide finalized features (after applying the updates). But you
>> are
>> > right,
>> > the additional info is not required, so I have removed it from the
>> response
>> > schema.
>> >
>> > > 100.3 Should we add a separate request to list/describe the existing
>> > > features?
>> >
>> > (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
>> > Admin API,
>> > which, under the covers, uses the ApiVersionsRequest to list/describe
>> the
>> > existing features. Please read the 'Tooling support' section.
>> >
>> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
>> > > DELETE, the version field doesn't make sense. So, I guess the broker
>> just
>> > > ignores this? An alternative way is to have a separate
>> > DeleteFeaturesRequest
>> >
>> > (Kowshik): Great point! I have modified the KIP now to have 2 separate
>> > controller APIs
>> > serving these different purposes:
>> > 1. updateFeatures
>> > 2. deleteFeatures
>> >
>> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
>> > > version of the metadata for finalized features." I am wondering why
>> the
>> > > ordering is important?
>> >
>> > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
>> > version), and
>> > it is just the ZK node version. Basically, this is the epoch for the
>> > cluster-wide
>> > finalized feature version metadata. This metadata is served to clients
>> via
>> > the
>> > ApiVersionsResponse (for reads). We propagate updates from the
>> '/features'
>> > ZK node
>> > to all brokers, via ZK watches setup by each broker on the '/features'
>> > node.
>> >
>> > Now here is why the ordering is important:
>> > ZK watches don't propagate at the same time. As a result, the
>> > ApiVersionsResponse
>> > is eventually consistent across brokers. This can introduce cases
>> > where clients see an older lower epoch of the features metadata, after a
>> > more recent
>> > higher epoch was returned at a previous point in time. We expect clients
>> > to always employ the rule that the latest received higher epoch of
>> metadata
>> > always trumps an older smaller epoch. Those clients that are external to
>> > Kafka should strongly consider discovering the latest metadata once
>> during
>> > startup from the brokers, and if required refresh the metadata
>> periodically
>> > (to get the latest metadata).
>> >
>> > > 100.6 Could you specify the required ACL for this new request?
>> >
>> > (Kowshik): What is ACL, and how could I find out which one to specify?
>> > Please could you provide me some pointers? I'll be glad to update the
>> > KIP once I know the next steps.
>> >
>> > > 101. For the broker registration ZK node, should we bump up the
>> version
>> > in
>> > the json?
>> >
>> > (Kowshik): Great point! Done. I've increased the version in the broker
>> json
>> > by 1.
>> >
>> > > 102. For the /features ZK node, not sure if we need the epoch field.
>> Each
>> > > ZK node has an internal version field that is incremented on every
>> > update.
>> >
>> > (Kowshik): Great point! Done. I'm using the ZK node version now,
>> instead of
>> > explicitly
>> > incremented epoch.
>> >
>> > > 103. "Enabling the actual semantics of a feature version cluster-wide
>> is
>> > > left to the discretion of the logic implementing the feature (ex: can
>> be
>> > > done via dynamic broker config)." Does that mean the broker
>> registration
>> > ZK
>> > > node will be updated dynamically when this happens?
>> >
>> > (Kowshik): Not really. The text was just conveying that a broker could
>> > "know" of
>> > a new feature version, but it does not mean the broker should have also
>> > activated the effects of the feature version. Knowing vs activation are
>> 2
>> > separate things,
>> > and the latter can be achieved by dynamic config. I have reworded the
>> text
>> > to
>> > make this clear to the reader.
>> >
>> >
>> > > 104. UpdateMetadataRequest
>> > > 104.1 It would be useful to describe when the feature metadata is
>> > included
>> > > in the request. My understanding is that it's only included if (1)
>> there
>> > is
>> > > a change to the finalized feature; (2) broker restart; (3) controller
>> > > failover.
>> > > 104.2 The new fields have the following versions. Why are the
>> versions 3+
>> > > when the top version is bumped to 6?
>> > >       "fields":  [
>> > >         {"name": "Name", "type":  "string", "versions":  "3+",
>> > >           "about": "The name of the feature."},
>> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
>> > >           "about": "The finalized version for the feature."}
>> > >       ]
>> >
>> > (Kowshik): With the new improved design, we have completely eliminated
>> the
>> > need to
>> > use UpdateMetadataRequest. This is because we now rely on ZK to deliver
>> the
>> > notifications for changes to the '/features' ZK node.
>> >
>> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
>> > better
>> > > to use enable/disable?
>> >
>> > (Kowshik): For delete, yes, I have changed it so that we instead call it
>> > 'disable'.
>> > However for 'update', it can now also refer to either an upgrade or a
>> > forced downgrade.
>> > Therefore, I have left it the way it is, just calling it as just
>> 'update'.
>> >
>> >
>> > Cheers,
>> > Kowshik
>> >
>> > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
>> >
>> > > Hi, Kowshik,
>> > >
>> > > Thanks for the KIP. Looks good overall. A few comments below.
>> > >
>> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
>> > > 100.1 Since this request waits for responses from brokers, should we
>> add
>> > a
>> > > timeout in the request (like createTopicRequest)?
>> > > 100.2 The response schema is a bit weird. Typically, the response just
>> > > shows an error code and an error message, instead of echoing the
>> request.
>> > > 100.3 Should we add a separate request to list/describe the existing
>> > > features?
>> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
>> > > DELETE, the version field doesn't make sense. So, I guess the broker
>> just
>> > > ignores this? An alternative way is to have a separate
>> > > DeleteFeaturesRequest
>> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
>> > > version of the metadata for finalized features." I am wondering why
>> the
>> > > ordering is important?
>> > > 100.6 Could you specify the required ACL for this new request?
>> > >
>> > > 101. For the broker registration ZK node, should we bump up the
>> version
>> > in
>> > > the json?
>> > >
>> > > 102. For the /features ZK node, not sure if we need the epoch field.
>> Each
>> > > ZK node has an internal version field that is incremented on every
>> > update.
>> > >
>> > > 103. "Enabling the actual semantics of a feature version cluster-wide
>> is
>> > > left to the discretion of the logic implementing the feature (ex: can
>> be
>> > > done via dynamic broker config)." Does that mean the broker
>> registration
>> > ZK
>> > > node will be updated dynamically when this happens?
>> > >
>> > > 104. UpdateMetadataRequest
>> > > 104.1 It would be useful to describe when the feature metadata is
>> > included
>> > > in the request. My understanding is that it's only included if (1)
>> there
>> > is
>> > > a change to the finalized feature; (2) broker restart; (3) controller
>> > > failover.
>> > > 104.2 The new fields have the following versions. Why are the
>> versions 3+
>> > > when the top version is bumped to 6?
>> > >       "fields":  [
>> > >         {"name": "Name", "type":  "string", "versions":  "3+",
>> > >           "about": "The name of the feature."},
>> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
>> > >           "about": "The finalized version for the feature."}
>> > >       ]
>> > >
>> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
>> > better
>> > > to use enable/disable?
>> > >
>> > > Jun
>> > >
>> > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
>> kprakasam@confluent.io
>> > >
>> > > wrote:
>> > >
>> > > > Hey Boyang,
>> > > >
>> > > > Thanks for the great feedback! I have updated the KIP based on your
>> > > > feedback.
>> > > > Please find my response below for your comments, look for sentences
>> > > > starting
>> > > > with "(Kowshik)" below.
>> > > >
>> > > >
>> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
>> > > could
>> > > > be
>> > > > > converted as "When is it safe for the brokers to start serving new
>> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
>> > the
>> > > > > context.
>> > > >
>> > > > (Kowshik): Great point! Done.
>> > > >
>> > > > > 2. In the *Explanation *section, the metadata version number part
>> > > seems a
>> > > > > bit blurred. Could you point a reference to later section that we
>> > going
>> > > > to
>> > > > > store it in Zookeeper and update it every time when there is a
>> > feature
>> > > > > change?
>> > > >
>> > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
>> > > >
>> > > >
>> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
>> > for
>> > > > > features such as group coordinator semantics, there is no legal
>> > > scenario
>> > > > to
>> > > > > perform a downgrade at all. So having downgrade door open is
>> pretty
>> > > > > error-prone as human faults happen all the time. I'm assuming as
>> new
>> > > > > features are implemented, it's not very hard to add a flag during
>> > > feature
>> > > > > creation to indicate whether this feature is "downgradable". Could
>> > you
>> > > > > explain a bit more on the extra engineering effort for shipping
>> this
>> > > KIP
>> > > > > with downgrade protection in place?
>> > > >
>> > > > (Kowshik): Great point! I'd agree and disagree here. While I agree
>> that
>> > > > accidental
>> > > > downgrades can cause problems, I also think sometimes downgrades
>> should
>> > > > be allowed for emergency reasons (not all downgrades cause issues).
>> > > > It is just subjective to the feature being downgraded.
>> > > >
>> > > > To be more strict about feature version downgrades, I have modified
>> the
>> > > KIP
>> > > > proposing that we mandate a `--force-downgrade` flag be used in the
>> > > > UPDATE_FEATURES api
>> > > > and the tooling, whenever the human is downgrading a finalized
>> feature
>> > > > version.
>> > > > Hopefully this should cover the requirement, until we find the need
>> for
>> > > > advanced downgrade support.
>> > > >
>> > > > > 4. "Each broker’s supported dictionary of feature versions will be
>> > > > defined
>> > > > > in the broker code." So this means in order to restrict a certain
>> > > > feature,
>> > > > > we need to start the broker first and then send a feature gating
>> > > request
>> > > > > immediately, which introduces a time gap and the intended-to-close
>> > > > feature
>> > > > > could actually serve request during this phase. Do you think we
>> > should
>> > > > also
>> > > > > support configurations as well so that admin user could freely
>> roll
>> > up
>> > > a
>> > > > > cluster with all nodes complying the same feature gating, without
>> > > > worrying
>> > > > > about the turnaround time to propagate the message only after the
>> > > cluster
>> > > > > starts up?
>> > > >
>> > > > (Kowshik): This is a great point/question. One of the expectations
>> out
>> > of
>> > > > this KIP, which is
>> > > > already followed in the broker, is the following.
>> > > >  - Imagine at time T1 the broker starts up and registers it’s
>> presence
>> > in
>> > > > ZK,
>> > > >    along with advertising it’s supported features.
>> > > >  - Imagine at a future time T2 the broker receives the
>> > > > UpdateMetadataRequest
>> > > >    from the controller, which contains the latest finalized
>> features as
>> > > > seen by
>> > > >    the controller. The broker validates this data against it’s
>> > supported
>> > > > features to
>> > > >    make sure there is no mismatch (it will shutdown if there is an
>> > > > incompatibility).
>> > > >
>> > > > It is expected that during the time between the 2 events T1 and T2,
>> the
>> > > > broker is
>> > > > almost a silent entity in the cluster. It does not add any value to
>> the
>> > > > cluster, or carry
>> > > > out any important broker activities. By “important”, I mean it is
>> not
>> > > doing
>> > > > mutations
>> > > > on it’s persistence, not mutating critical in-memory state, won’t be
>> > > > serving
>> > > > produce/fetch requests. Note it doesn’t even know it’s assigned
>> > > partitions
>> > > > until
>> > > > it receives UpdateMetadataRequest from controller. Anything the
>> broker
>> > is
>> > > > doing up
>> > > > until this point is not damaging/useful.
>> > > >
>> > > > I’ve clarified the above in the KIP, see this new section:
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
>> > > > .
>> > > >
>> > > > > 5. "adding a new Feature, updating or deleting an existing
>> Feature",
>> > > may
>> > > > be
>> > > > > I misunderstood something, I thought the features are defined in
>> > broker
>> > > > > code, so admin could not really create a new feature?
>> > > >
>> > > > (Kowshik): Great point! You understood this right. Here adding a
>> > feature
>> > > > means we are
>> > > > adding a cluster-wide finalized *max* version for a feature that was
>> > > > previously never finalized.
>> > > > I have clarified this in the KIP now.
>> > > >
>> > > > > 6. I think we need a separate error code like
>> > > FEATURE_UPDATE_IN_PROGRESS
>> > > > to
>> > > > > reject a concurrent feature update request.
>> > > >
>> > > > (Kowshik): Great point! I have modified the KIP adding the above
>> (see
>> > > > 'Tooling support -> Admin API changes').
>> > > >
>> > > > > 7. I think we haven't discussed the alternative solution to pass
>> the
>> > > > > feature information through Zookeeper. Is that mentioned in the
>> KIP
>> > to
>> > > > > justify why using UpdateMetadata is more favorable?
>> > > >
>> > > > (Kowshik): Nice question! The broker reads finalized feature info
>> > stored
>> > > in
>> > > > ZK,
>> > > > only during startup when it does a validation. When serving
>> > > > `ApiVersionsRequest`, the
>> > > > broker does not read this info from ZK directly. I'd imagine the
>> risk
>> > is
>> > > > that it can increase
>> > > > the ZK read QPS which can be a bottleneck for the system. Today, in
>> > Kafka
>> > > > we use the
>> > > > controller to fan out ZK updates to brokers and we want to stick to
>> > that
>> > > > pattern to avoid
>> > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
>> > > >
>> > > > > 8. I was under the impression that user could configure a range of
>> > > > > supported versions, what's the trade-off for allowing single
>> > finalized
>> > > > > version only?
>> > > >
>> > > > (Kowshik): Great question! The finalized version of a feature
>> basically
>> > > > refers to
>> > > > the cluster-wide finalized feature "maximum" version. For example,
>> if
>> > the
>> > > > 'group_coordinator' feature
>> > > > has the finalized version set to 10, then, it means that
>> cluster-wide
>> > all
>> > > > versions upto v10 are
>> > > > supported for this feature. However, note that if some version (ex:
>> v0)
>> > > > gets deprecated
>> > > > for this feature, then we don’t convey that using this scheme (also
>> > > > supporting deprecation is a non-goal).
>> > > >
>> > > > (Kowshik): I’ve now modified the KIP at all points, refering to
>> > finalized
>> > > > feature "maximum" versions.
>> > > >
>> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
>> > > > producer
>> > > >
>> > > > (Kowshik): Great point! Done.
>> > > >
>> > > >
>> > > > Cheers,
>> > > > Kowshik
>> > > >
>> > > >
>> > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
>> > reluctanthero104@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hey Kowshik,
>> > > > >
>> > > > > thanks for the revised KIP. Got a couple of questions:
>> > > > >
>> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
>> > > could
>> > > > be
>> > > > > converted as "When is it safe for the brokers to start serving new
>> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
>> > the
>> > > > > context.
>> > > > >
>> > > > > 2. In the *Explanation *section, the metadata version number part
>> > > seems a
>> > > > > bit blurred. Could you point a reference to later section that we
>> > going
>> > > > to
>> > > > > store it in Zookeeper and update it every time when there is a
>> > feature
>> > > > > change?
>> > > > >
>> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
>> > for
>> > > > > features such as group coordinator semantics, there is no legal
>> > > scenario
>> > > > to
>> > > > > perform a downgrade at all. So having downgrade door open is
>> pretty
>> > > > > error-prone as human faults happen all the time. I'm assuming as
>> new
>> > > > > features are implemented, it's not very hard to add a flag during
>> > > feature
>> > > > > creation to indicate whether this feature is "downgradable". Could
>> > you
>> > > > > explain a bit more on the extra engineering effort for shipping
>> this
>> > > KIP
>> > > > > with downgrade protection in place?
>> > > > >
>> > > > > 4. "Each broker’s supported dictionary of feature versions will be
>> > > > defined
>> > > > > in the broker code." So this means in order to restrict a certain
>> > > > feature,
>> > > > > we need to start the broker first and then send a feature gating
>> > > request
>> > > > > immediately, which introduces a time gap and the intended-to-close
>> > > > feature
>> > > > > could actually serve request during this phase. Do you think we
>> > should
>> > > > also
>> > > > > support configurations as well so that admin user could freely
>> roll
>> > up
>> > > a
>> > > > > cluster with all nodes complying the same feature gating, without
>> > > > worrying
>> > > > > about the turnaround time to propagate the message only after the
>> > > cluster
>> > > > > starts up?
>> > > > >
>> > > > > 5. "adding a new Feature, updating or deleting an existing
>> Feature",
>> > > may
>> > > > be
>> > > > > I misunderstood something, I thought the features are defined in
>> > broker
>> > > > > code, so admin could not really create a new feature?
>> > > > >
>> > > > > 6. I think we need a separate error code like
>> > > FEATURE_UPDATE_IN_PROGRESS
>> > > > to
>> > > > > reject a concurrent feature update request.
>> > > > >
>> > > > > 7. I think we haven't discussed the alternative solution to pass
>> the
>> > > > > feature information through Zookeeper. Is that mentioned in the
>> KIP
>> > to
>> > > > > justify why using UpdateMetadata is more favorable?
>> > > > >
>> > > > > 8. I was under the impression that user could configure a range of
>> > > > > supported versions, what's the trade-off for allowing single
>> > finalized
>> > > > > version only?
>> > > > >
>> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
>> > > > producer
>> > > > >
>> > > > > Boyang
>> > > > >
>> > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
>> > > wrote:
>> > > > >
>> > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
>> > > > > > > Hi Colin,
>> > > > > > >
>> > > > > > > Thanks for the feedback! I've changed the KIP to address your
>> > > > > > > suggestions.
>> > > > > > > Please find below my explanation. Here is a link to KIP 584:
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
>> > > > > > > .
>> > > > > > >
>> > > > > > > 1. '__data_version__' is the version of the finalized feature
>> > > > metadata
>> > > > > > > (i.e. actual ZK node contents), while the
>> '__schema_version__' is
>> > > the
>> > > > > > > version of the schema of the data persisted in ZK. These serve
>> > > > > different
>> > > > > > > purposes. '__data_version__' is is useful mainly to clients
>> > during
>> > > > > reads,
>> > > > > > > to differentiate between the 2 versions of eventually
>> consistent
>> > > > > > 'finalized
>> > > > > > > features' metadata (i.e. larger metadata version is more
>> recent).
>> > > > > > > '__schema_version__' provides an additional degree of
>> > flexibility,
>> > > > > where
>> > > > > > if
>> > > > > > > we decide to change the schema for '/features' node in ZK (in
>> the
>> > > > > > future),
>> > > > > > > then we can manage broker roll outs suitably (i.e.
>> > > > > > > serialization/deserialization of the ZK data can be handled
>> > > safely).
>> > > > > >
>> > > > > > Hi Kowshik,
>> > > > > >
>> > > > > > If you're talking about a number that lets you know if data is
>> more
>> > > or
>> > > > > > less recent, we would typically call that an epoch, and not a
>> > > version.
>> > > > > For
>> > > > > > the ZK data structures, the word "version" is typically reserved
>> > for
>> > > > > > describing changes to the overall schema of the data that is
>> > written
>> > > to
>> > > > > > ZooKeeper.  We don't even really change the "version" of those
>> > > schemas
>> > > > > that
>> > > > > > much, since most changes are backwards-compatible.  But we do
>> > include
>> > > > > that
>> > > > > > version field just in case.
>> > > > > >
>> > > > > > I don't think we really need an epoch here, though, since we can
>> > just
>> > > > > look
>> > > > > > at the broker epoch.  Whenever the broker registers, its epoch
>> will
>> > > be
>> > > > > > greater than the previous broker epoch.  And the newly
>> registered
>> > > data
>> > > > > will
>> > > > > > take priority.  This will be a lot simpler than adding a
>> separate
>> > > epoch
>> > > > > > system, I think.
>> > > > > >
>> > > > > > >
>> > > > > > > 2. Regarding admin client needing min and max information -
>> you
>> > are
>> > > > > > right!
>> > > > > > > I've changed the KIP such that the Admin API also allows the
>> user
>> > > to
>> > > > > read
>> > > > > > > 'supported features' from a specific broker. Please look at
>> the
>> > > > section
>> > > > > > > "Admin API changes".
>> > > > > >
>> > > > > > Thanks.
>> > > > > >
>> > > > > > >
>> > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
>> deliberate.
>> > > > I've
>> > > > > > > improved the KIP to just use `long` at all places.
>> > > > > >
>> > > > > > Sounds good.
>> > > > > >
>> > > > > > >
>> > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
>> > I've
>> > > > > > updated
>> > > > > > > the KIP sketching the functionality provided by this tool,
>> with
>> > > some
>> > > > > > > examples. Please look at the section "Tooling support
>> examples".
>> > > > > > >
>> > > > > > > Thank you!
>> > > > > >
>> > > > > >
>> > > > > > Thanks, Kowshik.
>> > > > > >
>> > > > > > cheers,
>> > > > > > Colin
>> > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Kowshik
>> > > > > > >
>> > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
>> > cmccabe@apache.org>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > Thanks, Kowshik, this looks good.
>> > > > > > > >
>> > > > > > > > In the "Schema" section, do we really need both
>> > > __schema_version__
>> > > > > and
>> > > > > > > > __data_version__?  Can we just have a single version field
>> > here?
>> > > > > > > >
>> > > > > > > > Shouldn't the Admin(Client) function have some way to get
>> the
>> > min
>> > > > and
>> > > > > > max
>> > > > > > > > information that we're exposing as well?  I guess we could
>> have
>> > > > min,
>> > > > > > max,
>> > > > > > > > and current.  Unrelated: is the use of Long rather than long
>> > > > > deliberate
>> > > > > > > > here?
>> > > > > > > >
>> > > > > > > > It would be good to describe how the command line tool
>> > > > > > > > kafka.admin.FeatureCommand will work.  For example the flags
>> > that
>> > > > it
>> > > > > > will
>> > > > > > > > take and the output that it will generate to STDOUT.
>> > > > > > > >
>> > > > > > > > cheers,
>> > > > > > > > Colin
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
>> > > > > > > > > Hi all,
>> > > > > > > > >
>> > > > > > > > > I've opened KIP-584
>> > > <https://issues.apache.org/jira/browse/KIP-584> <
>> > > > https://issues.apache.org/jira/browse/KIP-584
>> > > > > >
>> > > > > > > > > which
>> > > > > > > > > is intended to provide a versioning scheme for features.
>> I'd
>> > > like
>> > > > > to
>> > > > > > use
>> > > > > > > > > this thread to discuss the same. I'd appreciate any
>> feedback
>> > on
>> > > > > this.
>> > > > > > > > > Here
>> > > > > > > > > is a link to KIP-584
>> > > <https://issues.apache.org/jira/browse/KIP-584>:
>> > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
>> > > > > > > > >  .
>> > > > > > > > >
>> > > > > > > > > Thank you!
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Cheers,
>> > > > > > > > > Kowshik
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Jun,

Thanks for the feedback and suggestions. Please find my response below.

> 100.6 For every new request, the admin needs to control who is allowed to
> issue that request if security is enabled. So, we need to assign the new
> request a ResourceType and possible AclOperations. See
>
https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> as an example.

(Kowshik): I don't see any reference to the terms ResourceType or
AclOperations in the KIP. Could you let me know how I can use the KIP that
you linked to figure out the appropriate ResourceType and AclOperations to
set up for the new request?

> 105. If we change delete to disable, it's better to do this consistently
in
> request protocol and admin api as well.

(Kowshik): The API shouldn't be called 'disable' when it actually deletes a
feature, so I've changed the KIP to use 'delete' consistently. That said, I
don't have a strong preference either way.

> 110. The minVersion/maxVersion for features use int64. Currently, our
> release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> for new features to be included in minor releases too. Should we make the
> feature versioning match the release versioning?

(Kowshik): The release version can be mapped to a set of feature versions,
and that mapping can live, for example, in the tool (or even outside the
tool). Can you please clarify what I'm missing?
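
To make this concrete, here is a rough sketch (in Java, purely illustrative;
the feature names and version numbers are hypothetical and not from the KIP)
of what such a release-to-feature-versions mapping could look like:

import java.util.Map;

// Purely illustrative: maps an AK release to the feature versions that could
// be finalized once that release is fully deployed across the cluster.
// The feature names and version numbers below are made up.
public class ReleaseToFeatureVersions {
    private static final Map<String, Map<String, Long>> FINALIZED_MAX_VERSIONS =
        Map.of(
            "2.5", Map.of("group_coordinator", 1L),
            "2.6", Map.of("group_coordinator", 2L, "transaction_coordinator", 1L));

    public static Map<String, Long> forRelease(String releaseVersion) {
        return FINALIZED_MAX_VERSIONS.getOrDefault(releaseVersion, Map.of());
    }

    public static void main(String[] args) {
        System.out.println(forRelease("2.6"));
    }
}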

> 111. "During regular operations, the data in the ZK node can be mutated
> only via a specific admin API served only by the controller." I am
> wondering why can't the controller auto finalize a feature version after
> all brokers are upgraded? For new users who download the latest version to
> build a new cluster, it's inconvenient for them to have to manually enable
> each feature.

(Kowshik): I agree that there is a trade-off here. It would help to decide
whether the automation should be thought through right now in this KIP, or
later in a follow-up KIP. We may well invest in automation, but we have to
decide whether we should do it now or later.

Regarding the inconvenience you mentioned: do you think it could be overcome
by asking the cluster operator to run a bootstrap script once he/she knows
that a specific AK release has been (almost) completely deployed to a
cluster for the first time? The idea is that the bootstrap script knows how
to map a specific AK release to finalized feature versions, and runs the
`kafka-features.sh` tool appropriately against the cluster.
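
As a rough sketch of what that bootstrap step could look like (the tool flag
names below are hypothetical and not the final kafka-features.sh syntax),
the script could simply translate the release's finalized feature versions
into tool invocations:

import java.util.Map;

// Illustrative bootstrap step run by the operator once an AK release has
// been fully deployed. It only builds kafka-features.sh command lines; the
// flag names are hypothetical and the feature versions are made up.
public class FeatureBootstrap {
    static String updateCommand(String bootstrapServer, String feature, long maxVersion) {
        return String.format(
            "kafka-features.sh --bootstrap-server %s --update --feature %s --version %d",
            bootstrapServer, feature, maxVersion);
    }

    public static void main(String[] args) {
        // In practice this map would come from the release-to-feature-versions
        // mapping sketched earlier in this thread.
        Map<String, Long> finalizedMaxVersions =
            Map.of("group_coordinator", 2L, "transaction_coordinator", 1L);
        finalizedMaxVersions.forEach((feature, maxVersion) ->
            System.out.println(updateCommand("localhost:9092", feature, maxVersion)));
    }
}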

Now, coming back to your automation proposal/question.
I do see the value of automated feature version finalization, but I also see
that it opens up several questions and some risks, as explained below.
The answers to these depend on the definition of the automation we choose
to build, and how well it fits into a Kafka deployment.
Basically, it can be unsafe for the controller to finalize feature version
upgrades automatically, without learning the intent of the cluster operator:
1. We would sometimes want to lock feature versions only when we have
externally verified
the stability of the broker binary.
2. Sometimes only the cluster operator knows that a cluster upgrade is
complete, and that new brokers are highly unlikely to join the cluster.
3. Only the cluster operator knows that the intent is to deploy the same
version
of the new broker release across the entire cluster (i.e. the latest
downloaded version).
4. For downgrades, it appears the controller still needs some external input
(such as the proposed tool) to finalize a feature version downgrade.

If we have automation, that automation can end up failing in some of the
cases above. Then, we would need a way to declare that the cluster is "not
ready" if the controller cannot automatically finalize some basic required
feature version upgrades across the cluster. We would also need to make the
cluster operator aware of such a scenario (raise an alert or the like).

> 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
48.

(Kowshik): Done.


Cheers,
Kowshik

On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <ju...@confluent.io> wrote:

> Hi, Kowshik,
>
> Thanks for the reply. A few more comments below.
>
> 100.6 For every new request, the admin needs to control who is allowed to
> issue that request if security is enabled. So, we need to assign the new
> request a ResourceType and possible AclOperations. See
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> as
> an example.
>
> 105. If we change delete to disable, it's better to do this consistently in
> request protocol and admin api as well.
>
> 110. The minVersion/maxVersion for features use int64. Currently, our
> release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> for new features to be included in minor releases too. Should we make the
> feature versioning match the release versioning?
>
> 111. "During regular operations, the data in the ZK node can be mutated
> only via a specific admin API served only by the controller." I am
> wondering why can't the controller auto finalize a feature version after
> all brokers are upgraded? For new users who download the latest version to
> build a new cluster, it's inconvenient for them to have to manually enable
> each feature.
>
> 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> 48.
>
> Jun
>
>
> On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hey Jun,
> >
> > Thanks a lot for the great feedback! Please note that the design
> > has changed a little bit on the KIP, and we now propagate the finalized
> > features metadata only via ZK watches (instead of UpdateMetadataRequest
> > from the controller).
> >
> > Please find below my response to your questions/feedback, with the prefix
> > "(Kowshik):".
> >
> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > 100.1 Since this request waits for responses from brokers, should we
> add
> > a
> > > timeout in the request (like createTopicRequest)?
> >
> > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> > longer
> > wait for responses from brokers, since the design has been changed so
> that
> > the
> > features information is propagated via ZK. Nevertheless, it is right to
> > have a timeout
> > for the request.
> >
> > > 100.2 The response schema is a bit weird. Typically, the response just
> > > shows an error code and an error message, instead of echoing the
> request.
> >
> > (Kowshik): Great point! Yeah, I have modified it to just return an error
> > code and a message.
> > Previously it was not echoing the "request", rather it was returning the
> > latest set of
> > cluster-wide finalized features (after applying the updates). But you are
> > right,
> > the additional info is not required, so I have removed it from the
> response
> > schema.
> >
> > > 100.3 Should we add a separate request to list/describe the existing
> > > features?
> >
> > (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
> > Admin API,
> > which, underneath covers uses the ApiVersionsRequest to list/describe the
> > existing features. Please read the 'Tooling support' section.
> >
> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > DELETE, the version field doesn't make sense. So, I guess the broker
> just
> > > ignores this? An alternative way is to have a separate
> > DeleteFeaturesRequest
> >
> > (Kowshik): Great point! I have modified the KIP now to have 2 separate
> > controller APIs
> > serving these different purposes:
> > 1. updateFeatures
> > 2. deleteFeatures
> >
> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > > version of the metadata for finalized features." I am wondering why the
> > > ordering is important?
> >
> > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > version), and
> > it is just the ZK node version. Basically, this is the epoch for the
> > cluster-wide
> > finalized feature version metadata. This metadata is served to clients
> via
> > the
> > ApiVersionsResponse (for reads). We propagate updates from the
> '/features'
> > ZK node
> > to all brokers, via ZK watches setup by each broker on the '/features'
> > node.
> >
> > Now here is why the ordering is important:
> > ZK watches don't propagate at the same time. As a result, the
> > ApiVersionsResponse
> > is eventually consistent across brokers. This can introduce cases
> > where clients see an older lower epoch of the features metadata, after a
> > more recent
> > higher epoch was returned at a previous point in time. We expect clients
> > to always employ the rule that the latest received higher epoch of
> metadata
> > always trumps an older smaller epoch. Those clients that are external to
> > Kafka should strongly consider discovering the latest metadata once
> during
> > startup from the brokers, and if required refresh the metadata
> periodically
> > (to get the latest metadata).
> >
> > > 100.6 Could you specify the required ACL for this new request?
> >
> > (Kowshik): What is ACL, and how could I find out which one to specify?
> > Please could you provide me some pointers? I'll be glad to update the
> > KIP once I know the next steps.
> >
> > > 101. For the broker registration ZK node, should we bump up the version
> > in
> > the json?
> >
> > (Kowshik): Great point! Done. I've increased the version in the broker
> json
> > by 1.
> >
> > > 102. For the /features ZK node, not sure if we need the epoch field.
> Each
> > > ZK node has an internal version field that is incremented on every
> > update.
> >
> > (Kowshik): Great point! Done. I'm using the ZK node version now, instead
> of
> > explicitly
> > incremented epoch.
> >
> > > 103. "Enabling the actual semantics of a feature version cluster-wide
> is
> > > left to the discretion of the logic implementing the feature (ex: can
> be
> > > done via dynamic broker config)." Does that mean the broker
> registration
> > ZK
> > > node will be updated dynamically when this happens?
> >
> > (Kowshik): Not really. The text was just conveying that a broker could
> > "know" of
> > a new feature version, but it does not mean the broker should have also
> > activated the effects of the feature version. Knowing vs activation are 2
> > separate things,
> > and the latter can be achieved by dynamic config. I have reworded the
> text
> > to
> > make this clear to the reader.
> >
> >
> > > 104. UpdateMetadataRequest
> > > 104.1 It would be useful to describe when the feature metadata is
> > included
> > > in the request. My understanding is that it's only included if (1)
> there
> > is
> > > a change to the finalized feature; (2) broker restart; (3) controller
> > > failover.
> > > 104.2 The new fields have the following versions. Why are the versions
> 3+
> > > when the top version is bumped to 6?
> > >       "fields":  [
> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > >           "about": "The name of the feature."},
> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > >           "about": "The finalized version for the feature."}
> > >       ]
> >
> > (Kowshik): With the new improved design, we have completely eliminated
> the
> > need to
> > use UpdateMetadataRequest. This is because we now rely on ZK to deliver
> the
> > notifications for changes to the '/features' ZK node.
> >
> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > better
> > > to use enable/disable?
> >
> > (Kowshik): For delete, yes, I have changed it so that we instead call it
> > 'disable'.
> > However for 'update', it can now also refer to either an upgrade or a
> > forced downgrade.
> > Therefore, I have left it the way it is, just calling it as just
> 'update'.
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Kowshik,
> > >
> > > Thanks for the KIP. Looks good overall. A few comments below.
> > >
> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > 100.1 Since this request waits for responses from brokers, should we
> add
> > a
> > > timeout in the request (like createTopicRequest)?
> > > 100.2 The response schema is a bit weird. Typically, the response just
> > > shows an error code and an error message, instead of echoing the
> request.
> > > 100.3 Should we add a separate request to list/describe the existing
> > > features?
> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > DELETE, the version field doesn't make sense. So, I guess the broker
> just
> > > ignores this? An alternative way is to have a separate
> > > DeleteFeaturesRequest
> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > > version of the metadata for finalized features." I am wondering why the
> > > ordering is important?
> > > 100.6 Could you specify the required ACL for this new request?
> > >
> > > 101. For the broker registration ZK node, should we bump up the version
> > in
> > > the json?
> > >
> > > 102. For the /features ZK node, not sure if we need the epoch field.
> Each
> > > ZK node has an internal version field that is incremented on every
> > update.
> > >
> > > 103. "Enabling the actual semantics of a feature version cluster-wide
> is
> > > left to the discretion of the logic implementing the feature (ex: can
> be
> > > done via dynamic broker config)." Does that mean the broker
> registration
> > ZK
> > > node will be updated dynamically when this happens?
> > >
> > > 104. UpdateMetadataRequest
> > > 104.1 It would be useful to describe when the feature metadata is
> > included
> > > in the request. My understanding is that it's only included if (1)
> there
> > is
> > > a change to the finalized feature; (2) broker restart; (3) controller
> > > failover.
> > > 104.2 The new fields have the following versions. Why are the versions
> 3+
> > > when the top version is bumped to 6?
> > >       "fields":  [
> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > >           "about": "The name of the feature."},
> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > >           "about": "The finalized version for the feature."}
> > >       ]
> > >
> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > better
> > > to use enable/disable?
> > >
> > > Jun
> > >
> > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> kprakasam@confluent.io
> > >
> > > wrote:
> > >
> > > > Hey Boyang,
> > > >
> > > > Thanks for the great feedback! I have updated the KIP based on your
> > > > feedback.
> > > > Please find my response below for your comments, look for sentences
> > > > starting
> > > > with "(Kowshik)" below.
> > > >
> > > >
> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > > could
> > > > be
> > > > > converted as "When is it safe for the brokers to start serving new
> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> > the
> > > > > context.
> > > >
> > > > (Kowshik): Great point! Done.
> > > >
> > > > > 2. In the *Explanation *section, the metadata version number part
> > > seems a
> > > > > bit blurred. Could you point a reference to later section that we
> > going
> > > > to
> > > > > store it in Zookeeper and update it every time when there is a
> > feature
> > > > > change?
> > > >
> > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > > >
> > > >
> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> > for
> > > > > features such as group coordinator semantics, there is no legal
> > > scenario
> > > > to
> > > > > perform a downgrade at all. So having downgrade door open is pretty
> > > > > error-prone as human faults happen all the time. I'm assuming as
> new
> > > > > features are implemented, it's not very hard to add a flag during
> > > feature
> > > > > creation to indicate whether this feature is "downgradable". Could
> > you
> > > > > explain a bit more on the extra engineering effort for shipping
> this
> > > KIP
> > > > > with downgrade protection in place?
> > > >
> > > > (Kowshik): Great point! I'd agree and disagree here. While I agree
> that
> > > > accidental
> > > > downgrades can cause problems, I also think sometimes downgrades
> should
> > > > be allowed for emergency reasons (not all downgrades cause issues).
> > > > It is just subjective to the feature being downgraded.
> > > >
> > > > To be more strict about feature version downgrades, I have modified
> the
> > > KIP
> > > > proposing that we mandate a `--force-downgrade` flag be used in the
> > > > UPDATE_FEATURES api
> > > > and the tooling, whenever the human is downgrading a finalized
> feature
> > > > version.
> > > > Hopefully this should cover the requirement, until we find the need
> for
> > > > advanced downgrade support.
> > > >
> > > > > 4. "Each broker’s supported dictionary of feature versions will be
> > > > defined
> > > > > in the broker code." So this means in order to restrict a certain
> > > > feature,
> > > > > we need to start the broker first and then send a feature gating
> > > request
> > > > > immediately, which introduces a time gap and the intended-to-close
> > > > feature
> > > > > could actually serve request during this phase. Do you think we
> > should
> > > > also
> > > > > support configurations as well so that admin user could freely roll
> > up
> > > a
> > > > > cluster with all nodes complying the same feature gating, without
> > > > worrying
> > > > > about the turnaround time to propagate the message only after the
> > > cluster
> > > > > starts up?
> > > >
> > > > (Kowshik): This is a great point/question. One of the expectations
> out
> > of
> > > > this KIP, which is
> > > > already followed in the broker, is the following.
> > > >  - Imagine at time T1 the broker starts up and registers it’s
> presence
> > in
> > > > ZK,
> > > >    along with advertising it’s supported features.
> > > >  - Imagine at a future time T2 the broker receives the
> > > > UpdateMetadataRequest
> > > >    from the controller, which contains the latest finalized features
> as
> > > > seen by
> > > >    the controller. The broker validates this data against it’s
> > supported
> > > > features to
> > > >    make sure there is no mismatch (it will shutdown if there is an
> > > > incompatibility).
> > > >
> > > > It is expected that during the time between the 2 events T1 and T2,
> the
> > > > broker is
> > > > almost a silent entity in the cluster. It does not add any value to
> the
> > > > cluster, or carry
> > > > out any important broker activities. By “important”, I mean it is not
> > > doing
> > > > mutations
> > > > on it’s persistence, not mutating critical in-memory state, won’t be
> > > > serving
> > > > produce/fetch requests. Note it doesn’t even know it’s assigned
> > > partitions
> > > > until
> > > > it receives UpdateMetadataRequest from controller. Anything the
> broker
> > is
> > > > doing up
> > > > until this point is not damaging/useful.
> > > >
> > > > I’ve clarified the above in the KIP, see this new section:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > .
> > > >
> > > > > 5. "adding a new Feature, updating or deleting an existing
> Feature",
> > > may
> > > > be
> > > > > I misunderstood something, I thought the features are defined in
> > broker
> > > > > code, so admin could not really create a new feature?
> > > >
> > > > (Kowshik): Great point! You understood this right. Here adding a
> > feature
> > > > means we are
> > > > adding a cluster-wide finalized *max* version for a feature that was
> > > > previously never finalized.
> > > > I have clarified this in the KIP now.
> > > >
> > > > > 6. I think we need a separate error code like
> > > FEATURE_UPDATE_IN_PROGRESS
> > > > to
> > > > > reject a concurrent feature update request.
> > > >
> > > > (Kowshik): Great point! I have modified the KIP adding the above (see
> > > > 'Tooling support -> Admin API changes').
> > > >
> > > > > 7. I think we haven't discussed the alternative solution to pass
> the
> > > > > feature information through Zookeeper. Is that mentioned in the KIP
> > to
> > > > > justify why using UpdateMetadata is more favorable?
> > > >
> > > > (Kowshik): Nice question! The broker reads finalized feature info
> > stored
> > > in
> > > > ZK,
> > > > only during startup when it does a validation. When serving
> > > > `ApiVersionsRequest`, the
> > > > broker does not read this info from ZK directly. I'd imagine the risk
> > is
> > > > that it can increase
> > > > the ZK read QPS which can be a bottleneck for the system. Today, in
> > Kafka
> > > > we use the
> > > > controller to fan out ZK updates to brokers and we want to stick to
> > that
> > > > pattern to avoid
> > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > >
> > > > > 8. I was under the impression that user could configure a range of
> > > > > supported versions, what's the trade-off for allowing single
> > finalized
> > > > > version only?
> > > >
> > > > (Kowshik): Great question! The finalized version of a feature
> basically
> > > > refers to
> > > > the cluster-wide finalized feature "maximum" version. For example, if
> > the
> > > > 'group_coordinator' feature
> > > > has the finalized version set to 10, then, it means that cluster-wide
> > all
> > > > versions upto v10 are
> > > > supported for this feature. However, note that if some version (ex:
> v0)
> > > > gets deprecated
> > > > for this feature, then we don’t convey that using this scheme (also
> > > > supporting deprecation is a non-goal).
> > > >
> > > > (Kowshik): I’ve now modified the KIP at all points, refering to
> > finalized
> > > > feature "maximum" versions.
> > > >
> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > > producer
> > > >
> > > > (Kowshik): Great point! Done.
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > >
> > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > reluctanthero104@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey Kowshik,
> > > > >
> > > > > thanks for the revised KIP. Got a couple of questions:
> > > > >
> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > > could
> > > > be
> > > > > converted as "When is it safe for the brokers to start serving new
> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> > the
> > > > > context.
> > > > >
> > > > > 2. In the *Explanation *section, the metadata version number part
> > > seems a
> > > > > bit blurred. Could you point a reference to later section that we
> > going
> > > > to
> > > > > store it in Zookeeper and update it every time when there is a
> > feature
> > > > > change?
> > > > >
> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> > for
> > > > > features such as group coordinator semantics, there is no legal
> > > scenario
> > > > to
> > > > > perform a downgrade at all. So having downgrade door open is pretty
> > > > > error-prone as human faults happen all the time. I'm assuming as
> new
> > > > > features are implemented, it's not very hard to add a flag during
> > > feature
> > > > > creation to indicate whether this feature is "downgradable". Could
> > you
> > > > > explain a bit more on the extra engineering effort for shipping
> this
> > > KIP
> > > > > with downgrade protection in place?
> > > > >
> > > > > 4. "Each broker’s supported dictionary of feature versions will be
> > > > defined
> > > > > in the broker code." So this means in order to restrict a certain
> > > > feature,
> > > > > we need to start the broker first and then send a feature gating
> > > request
> > > > > immediately, which introduces a time gap and the intended-to-close
> > > > feature
> > > > > could actually serve request during this phase. Do you think we
> > should
> > > > also
> > > > > support configurations as well so that admin user could freely roll
> > up
> > > a
> > > > > cluster with all nodes complying the same feature gating, without
> > > > worrying
> > > > > about the turnaround time to propagate the message only after the
> > > cluster
> > > > > starts up?
> > > > >
> > > > > 5. "adding a new Feature, updating or deleting an existing
> Feature",
> > > may
> > > > be
> > > > > I misunderstood something, I thought the features are defined in
> > broker
> > > > > code, so admin could not really create a new feature?
> > > > >
> > > > > 6. I think we need a separate error code like
> > > FEATURE_UPDATE_IN_PROGRESS
> > > > to
> > > > > reject a concurrent feature update request.
> > > > >
> > > > > 7. I think we haven't discussed the alternative solution to pass
> the
> > > > > feature information through Zookeeper. Is that mentioned in the KIP
> > to
> > > > > justify why using UpdateMetadata is more favorable?
> > > > >
> > > > > 8. I was under the impression that user could configure a range of
> > > > > supported versions, what's the trade-off for allowing single
> > finalized
> > > > > version only?
> > > > >
> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > > producer
> > > > >
> > > > > Boyang
> > > > >
> > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
> > > wrote:
> > > > >
> > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > Hi Colin,
> > > > > > >
> > > > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > > > suggestions.
> > > > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > .
> > > > > > >
> > > > > > > 1. '__data_version__' is the version of the finalized feature
> > > > metadata
> > > > > > > (i.e. actual ZK node contents), while the '__schema_version__'
> is
> > > the
> > > > > > > version of the schema of the data persisted in ZK. These serve
> > > > > different
> > > > > > > purposes. '__data_version__' is is useful mainly to clients
> > during
> > > > > reads,
> > > > > > > to differentiate between the 2 versions of eventually
> consistent
> > > > > > 'finalized
> > > > > > > features' metadata (i.e. larger metadata version is more
> recent).
> > > > > > > '__schema_version__' provides an additional degree of
> > flexibility,
> > > > > where
> > > > > > if
> > > > > > > we decide to change the schema for '/features' node in ZK (in
> the
> > > > > > future),
> > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > serialization/deserialization of the ZK data can be handled
> > > safely).
> > > > > >
> > > > > > Hi Kowshik,
> > > > > >
> > > > > > If you're talking about a number that lets you know if data is
> more
> > > or
> > > > > > less recent, we would typically call that an epoch, and not a
> > > version.
> > > > > For
> > > > > > the ZK data structures, the word "version" is typically reserved
> > for
> > > > > > describing changes to the overall schema of the data that is
> > written
> > > to
> > > > > > ZooKeeper.  We don't even really change the "version" of those
> > > schemas
> > > > > that
> > > > > > much, since most changes are backwards-compatible.  But we do
> > include
> > > > > that
> > > > > > version field just in case.
> > > > > >
> > > > > > I don't think we really need an epoch here, though, since we can
> > just
> > > > > look
> > > > > > at the broker epoch.  Whenever the broker registers, its epoch
> will
> > > be
> > > > > > greater than the previous broker epoch.  And the newly registered
> > > data
> > > > > will
> > > > > > take priority.  This will be a lot simpler than adding a separate
> > > epoch
> > > > > > system, I think.
> > > > > >
> > > > > > >
> > > > > > > 2. Regarding admin client needing min and max information - you
> > are
> > > > > > right!
> > > > > > > I've changed the KIP such that the Admin API also allows the
> user
> > > to
> > > > > read
> > > > > > > 'supported features' from a specific broker. Please look at the
> > > > section
> > > > > > > "Admin API changes".
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > >
> > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> deliberate.
> > > > I've
> > > > > > > improved the KIP to just use `long` at all places.
> > > > > >
> > > > > > Sounds good.
> > > > > >
> > > > > > >
> > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> > I've
> > > > > > updated
> > > > > > > the KIP sketching the functionality provided by this tool, with
> > > some
> > > > > > > examples. Please look at the section "Tooling support
> examples".
> > > > > > >
> > > > > > > Thank you!
> > > > > >
> > > > > >
> > > > > > Thanks, Kowshik.
> > > > > >
> > > > > > cheers,
> > > > > > Colin
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > cmccabe@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > >
> > > > > > > > In the "Schema" section, do we really need both
> > > __schema_version__
> > > > > and
> > > > > > > > __data_version__?  Can we just have a single version field
> > here?
> > > > > > > >
> > > > > > > > Shouldn't the Admin(Client) function have some way to get the
> > min
> > > > and
> > > > > > max
> > > > > > > > information that we're exposing as well?  I guess we could
> have
> > > > min,
> > > > > > max,
> > > > > > > > and current.  Unrelated: is the use of Long rather than long
> > > > > deliberate
> > > > > > > > here?
> > > > > > > >
> > > > > > > > It would be good to describe how the command line tool
> > > > > > > > kafka.admin.FeatureCommand will work.  For example the flags
> > that
> > > > it
> > > > > > will
> > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > >
> > > > > > > > cheers,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I've opened KIP-584
> > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > >
> > > > > > > > > which
> > > > > > > > > is intended to provide a versioning scheme for features.
> I'd
> > > like
> > > > > to
> > > > > > use
> > > > > > > > > this thread to discuss the same. I'd appreciate any
> feedback
> > on
> > > > > this.
> > > > > > > > > Here
> > > > > > > > > is a link to KIP-584
> > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > >  .
> > > > > > > > >
> > > > > > > > > Thank you!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Kowshik
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi Guozhang,

Thanks for the insightful feedback and questions!
I have updated the KIP in response to some of the suggestions.
Please find my response below.

> 1. Could you explain a bit what would the "the set of features supported
by
> a broker" information, beyond the cluster-level finalized features, be
used
> by the client? I think that if we consider all of the features should be
> "cluster-wide", i.e. the client may need to talk any brokers of the
cluster
> to execute specific features, then knowing the supported versions of that
> feature from a single broker would not help much, and hence it is
> unnecessary to include this information --- maybe I overlooked some use
> cases here :) in that case, could you augment the KIP to add some
> motivational examples?

(Kowshik): This is a good question. You are right, we don't have a use case
where a client would make meaningful use of broker-level supported features.
This may become useful in the future, but today it is part of the API purely
for debuggability: sometimes a human may want to look at the supported
features and see how they compare against the finalized features.

I added the supported features to the ApiVersionsResponse after earlier
feedback from Colin hinting at its usefulness. Feel free to let us know
if you think this is overkill, confusing, or not required (for now).
We could discuss removing the extra field from this KIP.
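
To illustrate the debuggability use case, a human or a small debug tool
could compare a broker's supported range against the cluster-wide finalized
max version along these lines (a sketch with made-up data structures; this
is not the Admin API proposed in the KIP):

import java.util.Map;

// Sketch of the debuggability use case: compare one broker's supported
// version range for a feature against the cluster-wide finalized max
// version. The data structures are made up for illustration only.
public class FeatureCompatibilityCheck {
    static final class SupportedRange {
        final long minVersion;
        final long maxVersion;
        SupportedRange(long minVersion, long maxVersion) {
            this.minVersion = minVersion;
            this.maxVersion = maxVersion;
        }
    }

    static boolean isCompatible(SupportedRange supported, long finalizedMaxVersion) {
        return finalizedMaxVersion >= supported.minVersion
            && finalizedMaxVersion <= supported.maxVersion;
    }

    public static void main(String[] args) {
        // Hypothetical data: what one broker advertises vs. what is finalized.
        Map<String, SupportedRange> brokerSupported =
            Map.of("group_coordinator", new SupportedRange(1, 3));
        Map<String, Long> clusterFinalizedMax = Map.of("group_coordinator", 2L);

        clusterFinalizedMax.forEach((feature, finalizedMax) -> {
            SupportedRange range = brokerSupported.get(feature);
            boolean ok = range != null && isCompatible(range, finalizedMax);
            System.out.println(feature + " compatible with this broker: " + ok);
        });
    }
}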

> 2. For the cluster-wide FinalizedFeature (for both ZK storage schema, and
> the ApiVersionResponse protocol schema), I understand that our assumption
> for using a single version value is that the cluster supports all versions
> up to that max-version. However, I wonder if that assumption is
appropriate
> under normal operations besides Boyang's questions regarding deprecation:
> each brokers supported versions for a feature may not always start with
> version 0 actually, while a client getting the version X assumes that any
> version under X would be supported; so if client only knows about how to
> execute a feature on version Y (< X) and then sends a request to broker,
> what happens if it only knows at that time that broker actually only
> supported min-version larger than Y? Today on the request-level, we either
> auto-downgrade the request version (which is not recommended) or we throw
> an exception, which could be too late since the client already executed
> some steps of that feature which have side-effects persisted.

(Kowshik): This is a good question. Within the scope of this KIP, we only
track the highest finalized feature version level (let's call it H). So far,
we haven't come across a concrete use case where the client needs to learn
the lowest cluster-wide version level (L) in order to make safe and correct
decisions at its end. Do you have any use case in mind?
It's open for discussion whether we need to support this in this KIP
or in a follow-up KIP.

Now, below are purely my thoughts on why we may need this support only
rarely (or never).
Fundamentally, feature version deprecation is quite a different process
from upgrades.
During feature version upgrades, the client "may" at some point upgrade to
using a newer feature version (at its discretion). On the other hand, feature
version deprecation would involve clear, cautious steps on the side of the
feature provider (i.e. the Kafka cluster):
1. Announce the intent to deprecate support for the lowest feature version
level L.
2. Check collected data to confirm that almost all clients have stopped using
the feature at version L.
3. Permanently stop accepting requests on the broker for feature version L.
4. By this point, we know for sure that no client could be using the feature,
because the cluster has completely stopped accepting such requests. It is rare
for clients to go back to a mode where they need to decide programmatically
whether to start using the feature at the deprecated lowest version level L.
5. Now completely drop support for feature version L (ex: eliminate the code
on the broker side). This means that brokers will subsequently advertise
their min_version for the feature at some value V with L < V <= H.

As you can see, by the time we hit step 5, it looks unlikely that a client
can go back to using the deprecated feature version, unless there is a
very drastic change.
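
To make the H/L terminology above concrete, here is a tiny illustrative Java
sketch of the guard a client would need if a finalized minimum level L were
ever exposed alongside the finalized maximum H (the names are hypothetical;
only H is tracked by this KIP):

    // Illustrative only: a client wanting to use a feature at version y
    // would have to check y against the finalized [L, H] range.
    public final class FeatureVersionGuard {

        public static boolean canUse(long y, long finalizedMin, long finalizedMax) {
            return y >= finalizedMin && y <= finalizedMax;
        }

        public static void main(String[] args) {
            long l = 2;   // hypothetical finalized minimum level (L)
            long h = 10;  // finalized maximum level (H), as tracked by this KIP
            System.out.println(canUse(1, l, h));  // false: version 1 already deprecated
            System.out.println(canUse(7, l, h));  // true
            System.out.println(canUse(12, l, h)); // false: beyond the finalized max
        }
    }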

> 3. Regarding the "Incompatible broker lifetime race condition" section, my
> understanding is actually a little bit different, please correct me if I'm
> wrong: during broker starts up, after it has registered in ZK along with
> its supported versions, the validation procedure is actually executed at
> both the broker side as well as the controller:

> 3.a) the broker reads the cluster-level feature vectors from ZK directly
> and compare with its own supported versions; if the validation fails, it
> will shutdown itself, otherwise, proceed normally.
> 3.b) upon being notified through ZK watchers of the newly registered
> broker, the controller will ALSO execute the validation comparing its
> registry's supported feature versions with the cluster-level feature
> vectors; if the validation fails, the controller will stop the remaining
of
> the new-broker-startup procedure like potentially adding it to some
> partition's replica list or moving leaders to it.

> The key here is that 3.b) on the controller side is serially executed with
> all other controller operations, including the add/update/delete-feature
> request handling. So if the broker-startup registry is executed first,
then
> the later update-feature request which would make the broker incompatible
> would be rejected; if the update-feature request is handled first, then
the
> broker-startup logic would abort since the validation fails. In that
sense,
> there would be no race condition windows -- of course that's based on my
> understanding that validation is also executed on the controller side.
> Please let me know if that makes sense?

(Kowshik): Yes, you are absolutely right. Great point!
We are on the same page here. I have updated the KIP.
The race condition can be avoided when, in the controller, the thread that
handles add/update/delete-feature requests is also the thread that updates
the controller's cache of broker info whenever new brokers join the cluster.

In this setup, if an update feature request (E1) is processed ahead of a
notification from ZK about an incompatible broker joining the cluster (E2),
then the controller can certainly detect that the broker is incompatible
when E2 is processed. The controller could stop the remainder of the new
broker startup sequence by, for example, refusing to send an
UpdateMetadataRequest to bootstrap the new broker.

I believe the above is the case today (i.e. controller event processing
appears single-threaded to me), and I just looked at the code to verify this.
I have made a note of it in the KIP. Pointers to the code I looked at are
below:

https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerEventManager.scala#L115

https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerContext.scala#L78

https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L1835
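
For illustration only, here is a minimal Java sketch of the idea (not the
actual controller code, which is Scala and linked above): a single event
queue processes the update-feature event (E1) and the broker-registration
event (E2) serially, so the incompatibility is detected no matter which of
the two is handled first.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class ControllerEventLoopSketch {
        interface Event { void process(); }

        private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        private final List<Long> liveBrokerSupportedMax = new ArrayList<>();
        private long finalizedMax = 1;  // cluster-wide finalized max version

        // E1: an update-feature request; rejected if any live broker cannot support it.
        class UpdateFeature implements Event {
            final long requestedMax;
            UpdateFeature(long requestedMax) { this.requestedMax = requestedMax; }
            public void process() {
                boolean incompatible =
                    liveBrokerSupportedMax.stream().anyMatch(max -> max < requestedMax);
                if (incompatible) System.out.println("E1 rejected: a live broker is incompatible");
                else finalizedMax = requestedMax;
            }
        }

        // E2: a new broker registered in ZK; if it cannot support the finalized max,
        // the controller skips bootstrapping it (no UpdateMetadataRequest is sent).
        class BrokerRegistered implements Event {
            final long supportedMax;
            BrokerRegistered(long supportedMax) { this.supportedMax = supportedMax; }
            public void process() {
                if (supportedMax < finalizedMax) {
                    System.out.println("E2: broker incompatible, startup sequence aborted");
                } else {
                    liveBrokerSupportedMax.add(supportedMax);
                }
            }
        }

        void enqueue(Event e) { queue.add(e); }

        // All events, including E1 and E2, are processed on one thread,
        // so there is no interleaving window between the two validations.
        void runLoop() throws InterruptedException {
            while (true) queue.take().process();
        }
    }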


Cheers,
Kowshik

On Sun, Apr 5, 2020 at 10:06 AM Guozhang Wang <wa...@gmail.com> wrote:

> Hello Kowshik,
>
> Thanks for the great write-up, overall it reads great to me already. Adding
> a few meta comments here:
>
> 1. Could you explain a bit what would the "the set of features supported by
> a broker" information, beyond the cluster-level finalized features, be used
> by the client? I think that if we consider all of the features should be
> "cluster-wide", i.e. the client may need to talk any brokers of the cluster
> to execute specific features, then knowing the supported versions of that
> feature from a single broker would not help much, and hence it is
> unnecessary to include this information --- maybe I overlooked some use
> cases here :) in that case, could you augment the KIP to add some
> motivational examples?
>
> 2. For the cluster-wide FinalizedFeature (for both ZK storage schema, and
> the ApiVersionResponse protocol schema), I understand that our assumption
> for using a single version value is that the cluster supports all versions
> up to that max-version. However, I wonder if that assumption is appropriate
> under normal operations besides Boyang's questions regarding deprecation:
> each brokers supported versions for a feature may not always start with
> version 0 actually, while a client getting the version X assumes that any
> version under X would be supported; so if client only knows about how to
> execute a feature on version Y (< X) and then sends a request to broker,
> what happens if it only knows at that time that broker actually only
> supported min-version larger than Y? Today on the request-level, we either
> auto-downgrade the request version (which is not recommended) or we throw
> an exception, which could be too late since the client already executed
> some steps of that feature which have side-effects persisted.
>
> Or do we require clients seeing cluster-wide supporter version X must be
> able to execute the feature at exactly version X? If so, that seems too
> restrictive to me too.
>
> 3. Regarding the "Incompatible broker lifetime race condition" section, my
> understanding is actually a little bit different, please correct me if I'm
> wrong: during broker starts up, after it has registered in ZK along with
> its supported versions, the validation procedure is actually executed at
> both the broker side as well as the controller:
>
> 3.a) the broker reads the cluster-level feature vectors from ZK directly
> and compare with its own supported versions; if the validation fails, it
> will shutdown itself, otherwise, proceed normally.
> 3.b) upon being notified through ZK watchers of the newly registered
> broker, the controller will ALSO execute the validation comparing its
> registry's supported feature versions with the cluster-level feature
> vectors; if the validation fails, the controller will stop the remaining of
> the new-broker-startup procedure like potentially adding it to some
> partition's replica list or moving leaders to it.
>
> The key here is that 3.b) on the controller side is serially executed with
> all other controller operations, including the add/update/delete-feature
> request handling. So if the broker-startup registry is executed first, then
> the later update-feature request which would make the broker incompatible
> would be rejected; if the update-feature request is handled first, then the
> broker-startup logic would abort since the validation fails. In that sense,
> there would be no race condition windows -- of course that's based on my
> understanding that validation is also executed on the controller side.
> Please let me know if that makes sense?
>
>
> Guozhang
>
>
> On Sat, Apr 4, 2020 at 8:36 PM Colin McCabe <cm...@apache.org> wrote:
>
> > On Fri, Apr 3, 2020, at 11:24, Jun Rao wrote:
> > > Hi, Kowshik,
> > >
> > > Thanks for the reply. A few more comments below.
> > >
> > > 100.6 For every new request, the admin needs to control who is allowed
> > > to issue that request if security is enabled. So, we need to assign the
> > new
> > > request a ResourceType and possible AclOperations. See
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > as an example.
> > >
> >
> > Yeah, agreed.  To be more specific, the permissions required for this
> > should be Alter on Cluster, right?  It's certainly something only system
> > administrators should be doing (KIP-455 also specifies ALTER on CLUSTER)
> >
> > best,
> > Colin
> >
> >
> > > 105. If we change delete to disable, it's better to do this
> consistently
> > in
> > > request protocol and admin api as well.
> > >
> > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> possible
> > > for new features to be included in minor releases too. Should we make
> the
> > > feature versioning match the release versioning?
> > >
> > > 111. "During regular operations, the data in the ZK node can be mutated
> > > only via a specific admin API served only by the controller." I am
> > > wondering why can't the controller auto finalize a feature version
> after
> > > all brokers are upgraded? For new users who download the latest version
> > to
> > > build a new cluster, it's inconvenient for them to have to manually
> > enable
> > > each feature.
> > >
> > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead
> of
> > 48.
> > >
> > > Jun
> > >
> > >
> > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> kprakasam@confluent.io>
> > > wrote:
> > >
> > > > Hey Jun,
> > > >
> > > > Thanks a lot for the great feedback! Please note that the design
> > > > has changed a little bit on the KIP, and we now propagate the
> finalized
> > > > features metadata only via ZK watches (instead of
> UpdateMetadataRequest
> > > > from the controller).
> > > >
> > > > Please find below my response to your questions/feedback, with the
> > prefix
> > > > "(Kowshik):".
> > > >
> > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > 100.1 Since this request waits for responses from brokers, should
> we
> > add
> > > > a
> > > > > timeout in the request (like createTopicRequest)?
> > > >
> > > > (Kowshik): Great point! Done. I have added a timeout field. Note: we
> no
> > > > longer
> > > > wait for responses from brokers, since the design has been changed so
> > that
> > > > the
> > > > features information is propagated via ZK. Nevertheless, it is right
> to
> > > > have a timeout
> > > > for the request.
> > > >
> > > > > 100.2 The response schema is a bit weird. Typically, the response
> > just
> > > > > shows an error code and an error message, instead of echoing the
> > request.
> > > >
> > > > (Kowshik): Great point! Yeah, I have modified it to just return an
> > error
> > > > code and a message.
> > > > Previously it was not echoing the "request", rather it was returning
> > the
> > > > latest set of
> > > > cluster-wide finalized features (after applying the updates). But you
> > are
> > > > right,
> > > > the additional info is not required, so I have removed it from the
> > response
> > > > schema.
> > > >
> > > > > 100.3 Should we add a separate request to list/describe the
> existing
> > > > > features?
> > > >
> > > > (Kowshik): This is already present in the KIP via the
> > 'DescribeFeatures'
> > > > Admin API,
> > > > which, underneath covers uses the ApiVersionsRequest to list/describe
> > the
> > > > existing features. Please read the 'Tooling support' section.
> > > >
> > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request.
> For
> > > > > DELETE, the version field doesn't make sense. So, I guess the
> broker
> > just
> > > > > ignores this? An alternative way is to have a separate
> > > > DeleteFeaturesRequest
> > > >
> > > > (Kowshik): Great point! I have modified the KIP now to have 2
> separate
> > > > controller APIs
> > > > serving these different purposes:
> > > > 1. updateFeatures
> > > > 2. deleteFeatures
> > > >
> > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > increasing
> > > > > version of the metadata for finalized features." I am wondering why
> > the
> > > > > ordering is important?
> > > >
> > > > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > > > version), and
> > > > it is just the ZK node version. Basically, this is the epoch for the
> > > > cluster-wide
> > > > finalized feature version metadata. This metadata is served to
> clients
> > via
> > > > the
> > > > ApiVersionsResponse (for reads). We propagate updates from the
> > '/features'
> > > > ZK node
> > > > to all brokers, via ZK watches setup by each broker on the
> '/features'
> > > > node.
> > > >
> > > > Now here is why the ordering is important:
> > > > ZK watches don't propagate at the same time. As a result, the
> > > > ApiVersionsResponse
> > > > is eventually consistent across brokers. This can introduce cases
> > > > where clients see an older lower epoch of the features metadata,
> after
> > a
> > > > more recent
> > > > higher epoch was returned at a previous point in time. We expect
> > clients
> > > > to always employ the rule that the latest received higher epoch of
> > metadata
> > > > always trumps an older smaller epoch. Those clients that are external
> > to
> > > > Kafka should strongly consider discovering the latest metadata once
> > during
> > > > startup from the brokers, and if required refresh the metadata
> > periodically
> > > > (to get the latest metadata).
> > > >
> > > > > 100.6 Could you specify the required ACL for this new request?
> > > >
> > > > (Kowshik): What is ACL, and how could I find out which one to
> specify?
> > > > Please could you provide me some pointers? I'll be glad to update the
> > > > KIP once I know the next steps.
> > > >
> > > > > 101. For the broker registration ZK node, should we bump up the
> > version
> > > > in
> > > > the json?
> > > >
> > > > (Kowshik): Great point! Done. I've increased the version in the
> broker
> > json
> > > > by 1.
> > > >
> > > > > 102. For the /features ZK node, not sure if we need the epoch
> field.
> > Each
> > > > > ZK node has an internal version field that is incremented on every
> > > > update.
> > > >
> > > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> > instead of
> > > > explicitly
> > > > incremented epoch.
> > > >
> > > > > 103. "Enabling the actual semantics of a feature version
> > cluster-wide is
> > > > > left to the discretion of the logic implementing the feature (ex:
> > can be
> > > > > done via dynamic broker config)." Does that mean the broker
> > registration
> > > > ZK
> > > > > node will be updated dynamically when this happens?
> > > >
> > > > (Kowshik): Not really. The text was just conveying that a broker
> could
> > > > "know" of
> > > > a new feature version, but it does not mean the broker should have
> also
> > > > activated the effects of the feature version. Knowing vs activation
> > are 2
> > > > separate things,
> > > > and the latter can be achieved by dynamic config. I have reworded the
> > text
> > > > to
> > > > make this clear to the reader.
> > > >
> > > >
> > > > > 104. UpdateMetadataRequest
> > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > included
> > > > > in the request. My understanding is that it's only included if (1)
> > there
> > > > is
> > > > > a change to the finalized feature; (2) broker restart; (3)
> controller
> > > > > failover.
> > > > > 104.2 The new fields have the following versions. Why are the
> > versions 3+
> > > > > when the top version is bumped to 6?
> > > > >       "fields":  [
> > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > >           "about": "The name of the feature."},
> > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > >           "about": "The finalized version for the feature."}
> > > > >       ]
> > > >
> > > > (Kowshik): With the new improved design, we have completely
> eliminated
> > the
> > > > need to
> > > > use UpdateMetadataRequest. This is because we now rely on ZK to
> > deliver the
> > > > notifications for changes to the '/features' ZK node.
> > > >
> > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> it's
> > > > better
> > > > > to use enable/disable?
> > > >
> > > > (Kowshik): For delete, yes, I have changed it so that we instead call
> > it
> > > > 'disable'.
> > > > However for 'update', it can now also refer to either an upgrade or a
> > > > forced downgrade.
> > > > Therefore, I have left it the way it is, just calling it as just
> > 'update'.
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Kowshik,
> > > > >
> > > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > > >
> > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > 100.1 Since this request waits for responses from brokers, should
> we
> > add
> > > > a
> > > > > timeout in the request (like createTopicRequest)?
> > > > > 100.2 The response schema is a bit weird. Typically, the response
> > just
> > > > > shows an error code and an error message, instead of echoing the
> > request.
> > > > > 100.3 Should we add a separate request to list/describe the
> existing
> > > > > features?
> > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request.
> For
> > > > > DELETE, the version field doesn't make sense. So, I guess the
> broker
> > just
> > > > > ignores this? An alternative way is to have a separate
> > > > > DeleteFeaturesRequest
> > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > increasing
> > > > > version of the metadata for finalized features." I am wondering why
> > the
> > > > > ordering is important?
> > > > > 100.6 Could you specify the required ACL for this new request?
> > > > >
> > > > > 101. For the broker registration ZK node, should we bump up the
> > version
> > > > in
> > > > > the json?
> > > > >
> > > > > 102. For the /features ZK node, not sure if we need the epoch
> field.
> > Each
> > > > > ZK node has an internal version field that is incremented on every
> > > > update.
> > > > >
> > > > > 103. "Enabling the actual semantics of a feature version
> > cluster-wide is
> > > > > left to the discretion of the logic implementing the feature (ex:
> > can be
> > > > > done via dynamic broker config)." Does that mean the broker
> > registration
> > > > ZK
> > > > > node will be updated dynamically when this happens?
> > > > >
> > > > > 104. UpdateMetadataRequest
> > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > included
> > > > > in the request. My understanding is that it's only included if (1)
> > there
> > > > is
> > > > > a change to the finalized feature; (2) broker restart; (3)
> controller
> > > > > failover.
> > > > > 104.2 The new fields have the following versions. Why are the
> > versions 3+
> > > > > when the top version is bumped to 6?
> > > > >       "fields":  [
> > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > >           "about": "The name of the feature."},
> > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > >           "about": "The finalized version for the feature."}
> > > > >       ]
> > > > >
> > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> it's
> > > > better
> > > > > to use enable/disable?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > kprakasam@confluent.io
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hey Boyang,
> > > > > >
> > > > > > Thanks for the great feedback! I have updated the KIP based on
> your
> > > > > > feedback.
> > > > > > Please find my response below for your comments, look for
> sentences
> > > > > > starting
> > > > > > with "(Kowshik)" below.
> > > > > >
> > > > > >
> > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > traffic"
> > > > > could
> > > > > > be
> > > > > > > converted as "When is it safe for the brokers to start serving
> > new
> > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> > in
> > > > the
> > > > > > > context.
> > > > > >
> > > > > > (Kowshik): Great point! Done.
> > > > > >
> > > > > > > 2. In the *Explanation *section, the metadata version number
> part
> > > > > seems a
> > > > > > > bit blurred. Could you point a reference to later section that
> we
> > > > going
> > > > > > to
> > > > > > > store it in Zookeeper and update it every time when there is a
> > > > feature
> > > > > > > change?
> > > > > >
> > > > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > > > > >
> > > > > >
> > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > KIP,
> > > > for
> > > > > > > features such as group coordinator semantics, there is no legal
> > > > > scenario
> > > > > > to
> > > > > > > perform a downgrade at all. So having downgrade door open is
> > pretty
> > > > > > > error-prone as human faults happen all the time. I'm assuming
> as
> > new
> > > > > > > features are implemented, it's not very hard to add a flag
> during
> > > > > feature
> > > > > > > creation to indicate whether this feature is "downgradable".
> > Could
> > > > you
> > > > > > > explain a bit more on the extra engineering effort for shipping
> > this
> > > > > KIP
> > > > > > > with downgrade protection in place?
> > > > > >
> > > > > > (Kowshik): Great point! I'd agree and disagree here. While I
> agree
> > that
> > > > > > accidental
> > > > > > downgrades can cause problems, I also think sometimes downgrades
> > should
> > > > > > be allowed for emergency reasons (not all downgrades cause
> issues).
> > > > > > It is just subjective to the feature being downgraded.
> > > > > >
> > > > > > To be more strict about feature version downgrades, I have
> > modified the
> > > > > KIP
> > > > > > proposing that we mandate a `--force-downgrade` flag be used in
> the
> > > > > > UPDATE_FEATURES api
> > > > > > and the tooling, whenever the human is downgrading a finalized
> > feature
> > > > > > version.
> > > > > > Hopefully this should cover the requirement, until we find the
> > need for
> > > > > > advanced downgrade support.
> > > > > >
> > > > > > > 4. "Each broker’s supported dictionary of feature versions will
> > be
> > > > > > defined
> > > > > > > in the broker code." So this means in order to restrict a
> certain
> > > > > > feature,
> > > > > > > we need to start the broker first and then send a feature
> gating
> > > > > request
> > > > > > > immediately, which introduces a time gap and the
> > intended-to-close
> > > > > > feature
> > > > > > > could actually serve request during this phase. Do you think we
> > > > should
> > > > > > also
> > > > > > > support configurations as well so that admin user could freely
> > roll
> > > > up
> > > > > a
> > > > > > > cluster with all nodes complying the same feature gating,
> without
> > > > > > worrying
> > > > > > > about the turnaround time to propagate the message only after
> the
> > > > > cluster
> > > > > > > starts up?
> > > > > >
> > > > > > (Kowshik): This is a great point/question. One of the
> expectations
> > out
> > > > of
> > > > > > this KIP, which is
> > > > > > already followed in the broker, is the following.
> > > > > >  - Imagine at time T1 the broker starts up and registers it’s
> > presence
> > > > in
> > > > > > ZK,
> > > > > >    along with advertising it’s supported features.
> > > > > >  - Imagine at a future time T2 the broker receives the
> > > > > > UpdateMetadataRequest
> > > > > >    from the controller, which contains the latest finalized
> > features as
> > > > > > seen by
> > > > > >    the controller. The broker validates this data against it’s
> > > > supported
> > > > > > features to
> > > > > >    make sure there is no mismatch (it will shutdown if there is
> an
> > > > > > incompatibility).
> > > > > >
> > > > > > It is expected that during the time between the 2 events T1 and
> > T2, the
> > > > > > broker is
> > > > > > almost a silent entity in the cluster. It does not add any value
> > to the
> > > > > > cluster, or carry
> > > > > > out any important broker activities. By “important”, I mean it is
> > not
> > > > > doing
> > > > > > mutations
> > > > > > on it’s persistence, not mutating critical in-memory state, won’t
> > be
> > > > > > serving
> > > > > > produce/fetch requests. Note it doesn’t even know it’s assigned
> > > > > partitions
> > > > > > until
> > > > > > it receives UpdateMetadataRequest from controller. Anything the
> > broker
> > > > is
> > > > > > doing up
> > > > > > until this point is not damaging/useful.
> > > > > >
> > > > > > I’ve clarified the above in the KIP, see this new section:
> > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > .
> > > > > >
> > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > Feature",
> > > > > may
> > > > > > be
> > > > > > > I misunderstood something, I thought the features are defined
> in
> > > > broker
> > > > > > > code, so admin could not really create a new feature?
> > > > > >
> > > > > > (Kowshik): Great point! You understood this right. Here adding a
> > > > feature
> > > > > > means we are
> > > > > > adding a cluster-wide finalized *max* version for a feature that
> > was
> > > > > > previously never finalized.
> > > > > > I have clarified this in the KIP now.
> > > > > >
> > > > > > > 6. I think we need a separate error code like
> > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > to
> > > > > > > reject a concurrent feature update request.
> > > > > >
> > > > > > (Kowshik): Great point! I have modified the KIP adding the above
> > (see
> > > > > > 'Tooling support -> Admin API changes').
> > > > > >
> > > > > > > 7. I think we haven't discussed the alternative solution to
> pass
> > the
> > > > > > > feature information through Zookeeper. Is that mentioned in the
> > KIP
> > > > to
> > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > >
> > > > > > (Kowshik): Nice question! The broker reads finalized feature info
> > > > stored
> > > > > in
> > > > > > ZK,
> > > > > > only during startup when it does a validation. When serving
> > > > > > `ApiVersionsRequest`, the
> > > > > > broker does not read this info from ZK directly. I'd imagine the
> > risk
> > > > is
> > > > > > that it can increase
> > > > > > the ZK read QPS which can be a bottleneck for the system. Today,
> in
> > > > Kafka
> > > > > > we use the
> > > > > > controller to fan out ZK updates to brokers and we want to stick
> to
> > > > that
> > > > > > pattern to avoid
> > > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > > > >
> > > > > > > 8. I was under the impression that user could configure a range
> > of
> > > > > > > supported versions, what's the trade-off for allowing single
> > > > finalized
> > > > > > > version only?
> > > > > >
> > > > > > (Kowshik): Great question! The finalized version of a feature
> > basically
> > > > > > refers to
> > > > > > the cluster-wide finalized feature "maximum" version. For
> example,
> > if
> > > > the
> > > > > > 'group_coordinator' feature
> > > > > > has the finalized version set to 10, then, it means that
> > cluster-wide
> > > > all
> > > > > > versions upto v10 are
> > > > > > supported for this feature. However, note that if some version
> > (ex: v0)
> > > > > > gets deprecated
> > > > > > for this feature, then we don’t convey that using this scheme
> (also
> > > > > > supporting deprecation is a non-goal).
> > > > > >
> > > > > > (Kowshik): I’ve now modified the KIP at all points, refering to
> > > > finalized
> > > > > > feature "maximum" versions.
> > > > > >
> > > > > > > 9. One minor syntax fix: Note that here the "client" here may
> be
> > a
> > > > > > producer
> > > > > >
> > > > > > (Kowshik): Great point! Done.
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > reluctanthero104@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Kowshik,
> > > > > > >
> > > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > > >
> > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > traffic"
> > > > > could
> > > > > > be
> > > > > > > converted as "When is it safe for the brokers to start serving
> > new
> > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> > in
> > > > the
> > > > > > > context.
> > > > > > >
> > > > > > > 2. In the *Explanation *section, the metadata version number
> part
> > > > > seems a
> > > > > > > bit blurred. Could you point a reference to later section that
> we
> > > > going
> > > > > > to
> > > > > > > store it in Zookeeper and update it every time when there is a
> > > > feature
> > > > > > > change?
> > > > > > >
> > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > KIP,
> > > > for
> > > > > > > features such as group coordinator semantics, there is no legal
> > > > > scenario
> > > > > > to
> > > > > > > perform a downgrade at all. So having downgrade door open is
> > pretty
> > > > > > > error-prone as human faults happen all the time. I'm assuming
> as
> > new
> > > > > > > features are implemented, it's not very hard to add a flag
> during
> > > > > feature
> > > > > > > creation to indicate whether this feature is "downgradable".
> > Could
> > > > you
> > > > > > > explain a bit more on the extra engineering effort for shipping
> > this
> > > > > KIP
> > > > > > > with downgrade protection in place?
> > > > > > >
> > > > > > > 4. "Each broker’s supported dictionary of feature versions will
> > be
> > > > > > defined
> > > > > > > in the broker code." So this means in order to restrict a
> certain
> > > > > > feature,
> > > > > > > we need to start the broker first and then send a feature
> gating
> > > > > request
> > > > > > > immediately, which introduces a time gap and the
> > intended-to-close
> > > > > > feature
> > > > > > > could actually serve request during this phase. Do you think we
> > > > should
> > > > > > also
> > > > > > > support configurations as well so that admin user could freely
> > roll
> > > > up
> > > > > a
> > > > > > > cluster with all nodes complying the same feature gating,
> without
> > > > > > worrying
> > > > > > > about the turnaround time to propagate the message only after
> the
> > > > > cluster
> > > > > > > starts up?
> > > > > > >
> > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > Feature",
> > > > > may
> > > > > > be
> > > > > > > I misunderstood something, I thought the features are defined
> in
> > > > broker
> > > > > > > code, so admin could not really create a new feature?
> > > > > > >
> > > > > > > 6. I think we need a separate error code like
> > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > to
> > > > > > > reject a concurrent feature update request.
> > > > > > >
> > > > > > > 7. I think we haven't discussed the alternative solution to
> pass
> > the
> > > > > > > feature information through Zookeeper. Is that mentioned in the
> > KIP
> > > > to
> > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > >
> > > > > > > 8. I was under the impression that user could configure a range
> > of
> > > > > > > supported versions, what's the trade-off for allowing single
> > > > finalized
> > > > > > > version only?
> > > > > > >
> > > > > > > 9. One minor syntax fix: Note that here the "client" here may
> be
> > a
> > > > > > producer
> > > > > > >
> > > > > > > Boyang
> > > > > > >
> > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> cmccabe@apache.org
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > > > Hi Colin,
> > > > > > > > >
> > > > > > > > > Thanks for the feedback! I've changed the KIP to address
> your
> > > > > > > > > suggestions.
> > > > > > > > > Please find below my explanation. Here is a link to KIP
> 584:
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > .
> > > > > > > > >
> > > > > > > > > 1. '__data_version__' is the version of the finalized
> feature
> > > > > > metadata
> > > > > > > > > (i.e. actual ZK node contents), while the
> > '__schema_version__' is
> > > > > the
> > > > > > > > > version of the schema of the data persisted in ZK. These
> > serve
> > > > > > > different
> > > > > > > > > purposes. '__data_version__' is is useful mainly to clients
> > > > during
> > > > > > > reads,
> > > > > > > > > to differentiate between the 2 versions of eventually
> > consistent
> > > > > > > > 'finalized
> > > > > > > > > features' metadata (i.e. larger metadata version is more
> > recent).
> > > > > > > > > '__schema_version__' provides an additional degree of
> > > > flexibility,
> > > > > > > where
> > > > > > > > if
> > > > > > > > > we decide to change the schema for '/features' node in ZK
> > (in the
> > > > > > > > future),
> > > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > > serialization/deserialization of the ZK data can be handled
> > > > > safely).
> > > > > > > >
> > > > > > > > Hi Kowshik,
> > > > > > > >
> > > > > > > > If you're talking about a number that lets you know if data
> is
> > more
> > > > > or
> > > > > > > > less recent, we would typically call that an epoch, and not a
> > > > > version.
> > > > > > > For
> > > > > > > > the ZK data structures, the word "version" is typically
> > reserved
> > > > for
> > > > > > > > describing changes to the overall schema of the data that is
> > > > written
> > > > > to
> > > > > > > > ZooKeeper.  We don't even really change the "version" of
> those
> > > > > schemas
> > > > > > > that
> > > > > > > > much, since most changes are backwards-compatible.  But we do
> > > > include
> > > > > > > that
> > > > > > > > version field just in case.
> > > > > > > >
> > > > > > > > I don't think we really need an epoch here, though, since we
> > can
> > > > just
> > > > > > > look
> > > > > > > > at the broker epoch.  Whenever the broker registers, its
> epoch
> > will
> > > > > be
> > > > > > > > greater than the previous broker epoch.  And the newly
> > registered
> > > > > data
> > > > > > > will
> > > > > > > > take priority.  This will be a lot simpler than adding a
> > separate
> > > > > epoch
> > > > > > > > system, I think.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 2. Regarding admin client needing min and max information -
> > you
> > > > are
> > > > > > > > right!
> > > > > > > > > I've changed the KIP such that the Admin API also allows
> the
> > user
> > > > > to
> > > > > > > read
> > > > > > > > > 'supported features' from a specific broker. Please look at
> > the
> > > > > > section
> > > > > > > > > "Admin API changes".
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > deliberate.
> > > > > > I've
> > > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > > >
> > > > > > > > Sounds good.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are
> right!
> > > > I've
> > > > > > > > updated
> > > > > > > > > the KIP sketching the functionality provided by this tool,
> > with
> > > > > some
> > > > > > > > > examples. Please look at the section "Tooling support
> > examples".
> > > > > > > > >
> > > > > > > > > Thank you!
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks, Kowshik.
> > > > > > > >
> > > > > > > > cheers,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Kowshik
> > > > > > > > >
> > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > > cmccabe@apache.org>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > >
> > > > > > > > > > In the "Schema" section, do we really need both
> > > > > __schema_version__
> > > > > > > and
> > > > > > > > > > __data_version__?  Can we just have a single version
> field
> > > > here?
> > > > > > > > > >
> > > > > > > > > > Shouldn't the Admin(Client) function have some way to get
> > the
> > > > min
> > > > > > and
> > > > > > > > max
> > > > > > > > > > information that we're exposing as well?  I guess we
> could
> > have
> > > > > > min,
> > > > > > > > max,
> > > > > > > > > > and current.  Unrelated: is the use of Long rather than
> > long
> > > > > > > deliberate
> > > > > > > > > > here?
> > > > > > > > > >
> > > > > > > > > > It would be good to describe how the command line tool
> > > > > > > > > > kafka.admin.FeatureCommand will work.  For example the
> > flags
> > > > that
> > > > > > it
> > > > > > > > will
> > > > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > > > >
> > > > > > > > > > cheers,
> > > > > > > > > > Colin
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > I've opened KIP-584
> > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > >
> > > > > > > > > > > which
> > > > > > > > > > > is intended to provide a versioning scheme for
> features.
> > I'd
> > > > > like
> > > > > > > to
> > > > > > > > use
> > > > > > > > > > > this thread to discuss the same. I'd appreciate any
> > feedback
> > > > on
> > > > > > > this.
> > > > > > > > > > > Here
> > > > > > > > > > > is a link to KIP-584
> > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > >  .
> > > > > > > > > > >
> > > > > > > > > > > Thank you!
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Kowshik
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Kowshik,

Thanks for the great write-up, overall it reads great to me already. Adding
a few meta comments here:

1. Could you explain a bit how the "set of features supported by a broker"
information, beyond the cluster-level finalized features, would be used
by the client? I think that if we consider all of the features to be
"cluster-wide", i.e. the client may need to talk to any broker of the cluster
to execute specific features, then knowing the supported versions of that
feature from a single broker would not help much, and hence it is
unnecessary to include this information --- maybe I overlooked some use
cases here :) in that case, could you augment the KIP to add some
motivational examples?

2. For the cluster-wide FinalizedFeature (for both the ZK storage schema and
the ApiVersionResponse protocol schema), I understand that our assumption
for using a single version value is that the cluster supports all versions
up to that max-version. However, I wonder if that assumption is appropriate
under normal operations, besides Boyang's questions regarding deprecation:
each broker's supported versions for a feature may not always start with
version 0, while a client getting version X assumes that any version
under X would be supported; so if a client only knows how to execute a
feature on version Y (< X) and then sends a request to a broker, what
happens if it only learns at that time that the broker actually supports a
min-version larger than Y? Today on the request level, we either
auto-downgrade the request version (which is not recommended) or we throw
an exception, which could be too late since the client already executed
some steps of that feature whose side effects have been persisted.

Or do we require that clients seeing cluster-wide supported version X must be
able to execute the feature at exactly version X? If so, that seems too
restrictive to me too.

3. Regarding the "Incompatible broker lifetime race condition" section, my
understanding is actually a little bit different, please correct me if I'm
wrong: during broker startup, after the broker has registered in ZK along
with its supported versions, the validation procedure is actually executed
on both the broker side as well as the controller:

3.a) the broker reads the cluster-level feature vectors from ZK directly
and compares them with its own supported versions; if the validation fails,
it will shut itself down, otherwise it proceeds normally.
3.b) upon being notified through ZK watchers of the newly registered
broker, the controller will ALSO execute the validation, comparing the
broker registry's supported feature versions with the cluster-level feature
vectors; if the validation fails, the controller will stop the remainder of
the new-broker-startup procedure, like potentially adding it to some
partition's replica list or moving leaders to it.

The key here is that 3.b) on the controller side is serially executed with
all other controller operations, including the add/update/delete-feature
request handling. So if the broker-startup registration is executed first,
then a later update-feature request that would make the broker incompatible
would be rejected; if the update-feature request is handled first, then the
broker-startup logic would abort since the validation fails. In that sense,
there would be no race condition window -- of course, that's based on my
understanding that validation is also executed on the controller side.
Please let me know if that makes sense?


Guozhang


On Sat, Apr 4, 2020 at 8:36 PM Colin McCabe <cm...@apache.org> wrote:

> On Fri, Apr 3, 2020, at 11:24, Jun Rao wrote:
> > Hi, Kowshik,
> >
> > Thanks for the reply. A few more comments below.
> >
> > 100.6 For every new request, the admin needs to control who is allowed
> > to issue that request if security is enabled. So, we need to assign the
> new
> > request a ResourceType and possible AclOperations. See
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > as an example.
> >
>
> Yeah, agreed.  To be more specific, the permissions required for this
> should be Alter on Cluster, right?  It's certainly something only system
> administrators should be doing (KIP-455 also specifies ALTER on CLUSTER)
>
> best,
> Colin
>
>
> > 105. If we change delete to disable, it's better to do this consistently
> in
> > request protocol and admin api as well.
> >
> > 110. The minVersion/maxVersion for features use int64. Currently, our
> > release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> > for new features to be included in minor releases too. Should we make the
> > feature versioning match the release versioning?
> >
> > 111. "During regular operations, the data in the ZK node can be mutated
> > only via a specific admin API served only by the controller." I am
> > wondering why can't the controller auto finalize a feature version after
> > all brokers are upgraded? For new users who download the latest version
> to
> > build a new cluster, it's inconvenient for them to have to manually
> enable
> > each feature.
> >
> > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> 48.
> >
> > Jun
> >
> >
> > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
> > wrote:
> >
> > > Hey Jun,
> > >
> > > Thanks a lot for the great feedback! Please note that the design
> > > has changed a little bit on the KIP, and we now propagate the finalized
> > > features metadata only via ZK watches (instead of UpdateMetadataRequest
> > > from the controller).
> > >
> > > Please find below my response to your questions/feedback, with the
> prefix
> > > "(Kowshik):".
> > >
> > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > 100.1 Since this request waits for responses from brokers, should we
> add
> > > a
> > > > timeout in the request (like createTopicRequest)?
> > >
> > > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> > > longer
> > > wait for responses from brokers, since the design has been changed so
> that
> > > the
> > > features information is propagated via ZK. Nevertheless, it is right to
> > > have a timeout
> > > for the request.
> > >
> > > > 100.2 The response schema is a bit weird. Typically, the response
> just
> > > > shows an error code and an error message, instead of echoing the
> request.
> > >
> > > (Kowshik): Great point! Yeah, I have modified it to just return an
> error
> > > code and a message.
> > > Previously it was not echoing the "request", rather it was returning
> the
> > > latest set of
> > > cluster-wide finalized features (after applying the updates). But you
> are
> > > right,
> > > the additional info is not required, so I have removed it from the
> response
> > > schema.
> > >
> > > > 100.3 Should we add a separate request to list/describe the existing
> > > > features?
> > >
> > > (Kowshik): This is already present in the KIP via the
> 'DescribeFeatures'
> > > Admin API,
> > > which, underneath covers uses the ApiVersionsRequest to list/describe
> the
> > > existing features. Please read the 'Tooling support' section.
> > >
> > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > > DELETE, the version field doesn't make sense. So, I guess the broker
> just
> > > > ignores this? An alternative way is to have a separate
> > > DeleteFeaturesRequest
> > >
> > > (Kowshik): Great point! I have modified the KIP now to have 2 separate
> > > controller APIs
> > > serving these different purposes:
> > > 1. updateFeatures
> > > 2. deleteFeatures
> > >
> > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> increasing
> > > > version of the metadata for finalized features." I am wondering why
> the
> > > > ordering is important?
> > >
> > > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > > version), and
> > > it is just the ZK node version. Basically, this is the epoch for the
> > > cluster-wide
> > > finalized feature version metadata. This metadata is served to clients
> via
> > > the
> > > ApiVersionsResponse (for reads). We propagate updates from the
> '/features'
> > > ZK node
> > > to all brokers, via ZK watches setup by each broker on the '/features'
> > > node.
> > >
> > > Now here is why the ordering is important:
> > > ZK watches don't propagate at the same time. As a result, the
> > > ApiVersionsResponse
> > > is eventually consistent across brokers. This can introduce cases
> > > where clients see an older lower epoch of the features metadata, after
> a
> > > more recent
> > > higher epoch was returned at a previous point in time. We expect
> clients
> > > to always employ the rule that the latest received higher epoch of
> metadata
> > > always trumps an older smaller epoch. Those clients that are external
> to
> > > Kafka should strongly consider discovering the latest metadata once
> during
> > > startup from the brokers, and if required refresh the metadata
> periodically
> > > (to get the latest metadata).
> > >
> > > > 100.6 Could you specify the required ACL for this new request?
> > >
> > > (Kowshik): What is ACL, and how could I find out which one to specify?
> > > Please could you provide me some pointers? I'll be glad to update the
> > > KIP once I know the next steps.
> > >
> > > > 101. For the broker registration ZK node, should we bump up the
> version
> > > in
> > > the json?
> > >
> > > (Kowshik): Great point! Done. I've increased the version in the broker
> json
> > > by 1.
> > >
> > > > 102. For the /features ZK node, not sure if we need the epoch field.
> Each
> > > > ZK node has an internal version field that is incremented on every
> > > update.
> > >
> > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> instead of
> > > explicitly
> > > incremented epoch.
> > >
> > > > 103. "Enabling the actual semantics of a feature version
> cluster-wide is
> > > > left to the discretion of the logic implementing the feature (ex:
> can be
> > > > done via dynamic broker config)." Does that mean the broker
> registration
> > > ZK
> > > > node will be updated dynamically when this happens?
> > >
> > > (Kowshik): Not really. The text was just conveying that a broker could
> > > "know" of
> > > a new feature version, but it does not mean the broker should have also
> > > activated the effects of the feature version. Knowing vs activation
> are 2
> > > separate things,
> > > and the latter can be achieved by dynamic config. I have reworded the
> text
> > > to
> > > make this clear to the reader.
> > >
> > >
> > > > 104. UpdateMetadataRequest
> > > > 104.1 It would be useful to describe when the feature metadata is
> > > included
> > > > in the request. My understanding is that it's only included if (1)
> there
> > > is
> > > > a change to the finalized feature; (2) broker restart; (3) controller
> > > > failover.
> > > > 104.2 The new fields have the following versions. Why are the
> versions 3+
> > > > when the top version is bumped to 6?
> > > >       "fields":  [
> > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > >           "about": "The name of the feature."},
> > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > >           "about": "The finalized version for the feature."}
> > > >       ]
> > >
> > > (Kowshik): With the new improved design, we have completely eliminated
> the
> > > need to
> > > use UpdateMetadataRequest. This is because we now rely on ZK to
> deliver the
> > > notifications for changes to the '/features' ZK node.
> > >
> > > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > > better
> > > > to use enable/disable?
> > >
> > > (Kowshik): For delete, yes, I have changed it so that we instead call
> it
> > > 'disable'.
> > > However for 'update', it can now also refer to either an upgrade or a
> > > forced downgrade.
> > > Therefore, I have left it the way it is, just calling it as just
> 'update'.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Kowshik,
> > > >
> > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > >
> > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > 100.1 Since this request waits for responses from brokers, should we
> add
> > > a
> > > > timeout in the request (like createTopicRequest)?
> > > > 100.2 The response schema is a bit weird. Typically, the response
> just
> > > > shows an error code and an error message, instead of echoing the
> request.
> > > > 100.3 Should we add a separate request to list/describe the existing
> > > > features?
> > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > > DELETE, the version field doesn't make sense. So, I guess the broker
> just
> > > > ignores this? An alternative way is to have a separate
> > > > DeleteFeaturesRequest
> > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> increasing
> > > > version of the metadata for finalized features." I am wondering why
> the
> > > > ordering is important?
> > > > 100.6 Could you specify the required ACL for this new request?
> > > >
> > > > 101. For the broker registration ZK node, should we bump up the
> version
> > > in
> > > > the json?
> > > >
> > > > 102. For the /features ZK node, not sure if we need the epoch field.
> Each
> > > > ZK node has an internal version field that is incremented on every
> > > update.
> > > >
> > > > 103. "Enabling the actual semantics of a feature version
> cluster-wide is
> > > > left to the discretion of the logic implementing the feature (ex:
> can be
> > > > done via dynamic broker config)." Does that mean the broker
> registration
> > > ZK
> > > > node will be updated dynamically when this happens?
> > > >
> > > > 104. UpdateMetadataRequest
> > > > 104.1 It would be useful to describe when the feature metadata is
> > > included
> > > > in the request. My understanding is that it's only included if (1)
> there
> > > is
> > > > a change to the finalized feature; (2) broker restart; (3) controller
> > > > failover.
> > > > 104.2 The new fields have the following versions. Why are the
> versions 3+
> > > > when the top version is bumped to 6?
> > > >       "fields":  [
> > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > >           "about": "The name of the feature."},
> > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > >           "about": "The finalized version for the feature."}
> > > >       ]
> > > >
> > > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > > better
> > > > to use enable/disable?
> > > >
> > > > Jun
> > > >
> > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> kprakasam@confluent.io
> > > >
> > > > wrote:
> > > >
> > > > > Hey Boyang,
> > > > >
> > > > > Thanks for the great feedback! I have updated the KIP based on your
> > > > > feedback.
> > > > > Please find my response below for your comments, look for sentences
> > > > > starting
> > > > > with "(Kowshik)" below.
> > > > >
> > > > >
> > > > > > 1. "When is it safe for the brokers to begin handling EOS
> traffic"
> > > > could
> > > > > be
> > > > > > converted as "When is it safe for the brokers to start serving
> new
> > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> in
> > > the
> > > > > > context.
> > > > >
> > > > > (Kowshik): Great point! Done.
> > > > >
> > > > > > 2. In the *Explanation *section, the metadata version number part
> > > > seems a
> > > > > > bit blurred. Could you point a reference to later section that we
> > > going
> > > > > to
> > > > > > store it in Zookeeper and update it every time when there is a
> > > feature
> > > > > > change?
> > > > >
> > > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > > > >
> > > > >
> > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> KIP,
> > > for
> > > > > > features such as group coordinator semantics, there is no legal
> > > > scenario
> > > > > to
> > > > > > perform a downgrade at all. So having downgrade door open is
> pretty
> > > > > > error-prone as human faults happen all the time. I'm assuming as
> new
> > > > > > features are implemented, it's not very hard to add a flag during
> > > > feature
> > > > > > creation to indicate whether this feature is "downgradable".
> Could
> > > you
> > > > > > explain a bit more on the extra engineering effort for shipping
> this
> > > > KIP
> > > > > > with downgrade protection in place?
> > > > >
> > > > > (Kowshik): Great point! I'd agree and disagree here. While I agree
> that
> > > > > accidental
> > > > > downgrades can cause problems, I also think sometimes downgrades
> should
> > > > > be allowed for emergency reasons (not all downgrades cause issues).
> > > > > It is just subjective to the feature being downgraded.
> > > > >
> > > > > To be more strict about feature version downgrades, I have
> modified the
> > > > KIP
> > > > > proposing that we mandate a `--force-downgrade` flag be used in the
> > > > > UPDATE_FEATURES api
> > > > > and the tooling, whenever the human is downgrading a finalized
> feature
> > > > > version.
> > > > > Hopefully this should cover the requirement, until we find the
> need for
> > > > > advanced downgrade support.
> > > > >
> > > > > > 4. "Each broker’s supported dictionary of feature versions will
> be
> > > > > defined
> > > > > > in the broker code." So this means in order to restrict a certain
> > > > > feature,
> > > > > > we need to start the broker first and then send a feature gating
> > > > request
> > > > > > immediately, which introduces a time gap and the
> intended-to-close
> > > > > feature
> > > > > > could actually serve request during this phase. Do you think we
> > > should
> > > > > also
> > > > > > support configurations as well so that admin user could freely
> roll
> > > up
> > > > a
> > > > > > cluster with all nodes complying the same feature gating, without
> > > > > worrying
> > > > > > about the turnaround time to propagate the message only after the
> > > > cluster
> > > > > > starts up?
> > > > >
> > > > > (Kowshik): This is a great point/question. One of the expectations
> out
> > > of
> > > > > this KIP, which is
> > > > > already followed in the broker, is the following.
> > > > >  - Imagine at time T1 the broker starts up and registers its
> presence
> > > in
> > > > > ZK,
> > > > >    along with advertising its supported features.
> > > > >  - Imagine at a future time T2 the broker receives the
> > > > > UpdateMetadataRequest
> > > > >    from the controller, which contains the latest finalized
> features as
> > > > > seen by
> > > > >    the controller. The broker validates this data against its
> > > supported
> > > > > features to
> > > > >    make sure there is no mismatch (it will shutdown if there is an
> > > > > incompatibility).
> > > > >
> > > > > It is expected that during the time between the 2 events T1 and
> T2, the
> > > > > broker is
> > > > > almost a silent entity in the cluster. It does not add any value
> to the
> > > > > cluster, or carry
> > > > > out any important broker activities. By “important”, I mean it is
> not
> > > > doing
> > > > > mutations
> > > > > on its persistence, not mutating critical in-memory state, won’t
> be
> > > > > serving
> > > > > produce/fetch requests. Note it doesn’t even know its assigned
> > > > partitions
> > > > > until
> > > > > it receives UpdateMetadataRequest from controller. Anything the
> broker
> > > is
> > > > > doing up
> > > > > until this point is neither damaging nor useful.
> > > > >
> > > > > I’ve clarified the above in the KIP, see this new section:
> > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > .
> > > > >
> > > > > > 5. "adding a new Feature, updating or deleting an existing
> Feature",
> > > > may
> > > > > be
> > > > > > I misunderstood something, I thought the features are defined in
> > > broker
> > > > > > code, so admin could not really create a new feature?
> > > > >
> > > > > (Kowshik): Great point! You understood this right. Here adding a
> > > feature
> > > > > means we are
> > > > > adding a cluster-wide finalized *max* version for a feature that
> was
> > > > > previously never finalized.
> > > > > I have clarified this in the KIP now.
> > > > >
> > > > > > 6. I think we need a separate error code like
> > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > to
> > > > > > reject a concurrent feature update request.
> > > > >
> > > > > (Kowshik): Great point! I have modified the KIP adding the above
> (see
> > > > > 'Tooling support -> Admin API changes').
> > > > >
> > > > > > 7. I think we haven't discussed the alternative solution to pass
> the
> > > > > > feature information through Zookeeper. Is that mentioned in the
> KIP
> > > to
> > > > > > justify why using UpdateMetadata is more favorable?
> > > > >
> > > > > (Kowshik): Nice question! The broker reads finalized feature info
> > > stored
> > > > in
> > > > > ZK,
> > > > > only during startup when it does a validation. When serving
> > > > > `ApiVersionsRequest`, the
> > > > > broker does not read this info from ZK directly. I'd imagine the
> risk
> > > is
> > > > > that it can increase
> > > > > the ZK read QPS which can be a bottleneck for the system. Today, in
> > > Kafka
> > > > > we use the
> > > > > controller to fan out ZK updates to brokers and we want to stick to
> > > that
> > > > > pattern to avoid
> > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > > >
> > > > > > 8. I was under the impression that user could configure a range
> of
> > > > > > supported versions, what's the trade-off for allowing single
> > > finalized
> > > > > > version only?
> > > > >
> > > > > (Kowshik): Great question! The finalized version of a feature
> basically
> > > > > refers to
> > > > > the cluster-wide finalized feature "maximum" version. For example,
> if
> > > the
> > > > > 'group_coordinator' feature
> > > > > has the finalized version set to 10, then, it means that
> cluster-wide
> > > all
> > > > > versions up to v10 are
> > > > > supported for this feature. However, note that if some version
> (ex: v0)
> > > > > gets deprecated
> > > > > for this feature, then we don’t convey that using this scheme (also
> > > > > supporting deprecation is a non-goal).
> > > > >
> > > > > (Kowshik): I’ve now modified the KIP at all points, referring to
> > > finalized
> > > > > feature "maximum" versions.
> > > > >
> > > > > > 9. One minor syntax fix: Note that here the "client" here may be
> a
> > > > > producer
> > > > >
> > > > > (Kowshik): Great point! Done.
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > >
> > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > reluctanthero104@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hey Kowshik,
> > > > > >
> > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > >
> > > > > > 1. "When is it safe for the brokers to begin handling EOS
> traffic"
> > > > could
> > > > > be
> > > > > > converted as "When is it safe for the brokers to start serving
> new
> > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> in
> > > the
> > > > > > context.
> > > > > >
> > > > > > 2. In the *Explanation *section, the metadata version number part
> > > > seems a
> > > > > > bit blurred. Could you point a reference to later section that we
> > > going
> > > > > to
> > > > > > store it in Zookeeper and update it every time when there is a
> > > feature
> > > > > > change?
> > > > > >
> > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> KIP,
> > > for
> > > > > > features such as group coordinator semantics, there is no legal
> > > > scenario
> > > > > to
> > > > > > perform a downgrade at all. So having downgrade door open is
> pretty
> > > > > > error-prone as human faults happen all the time. I'm assuming as
> new
> > > > > > features are implemented, it's not very hard to add a flag during
> > > > feature
> > > > > > creation to indicate whether this feature is "downgradable".
> Could
> > > you
> > > > > > explain a bit more on the extra engineering effort for shipping
> this
> > > > KIP
> > > > > > with downgrade protection in place?
> > > > > >
> > > > > > 4. "Each broker’s supported dictionary of feature versions will
> be
> > > > > defined
> > > > > > in the broker code." So this means in order to restrict a certain
> > > > > feature,
> > > > > > we need to start the broker first and then send a feature gating
> > > > request
> > > > > > immediately, which introduces a time gap and the
> intended-to-close
> > > > > feature
> > > > > > could actually serve request during this phase. Do you think we
> > > should
> > > > > also
> > > > > > support configurations as well so that admin user could freely
> roll
> > > up
> > > > a
> > > > > > cluster with all nodes complying the same feature gating, without
> > > > > worrying
> > > > > > about the turnaround time to propagate the message only after the
> > > > cluster
> > > > > > starts up?
> > > > > >
> > > > > > 5. "adding a new Feature, updating or deleting an existing
> Feature",
> > > > may
> > > > > be
> > > > > > I misunderstood something, I thought the features are defined in
> > > broker
> > > > > > code, so admin could not really create a new feature?
> > > > > >
> > > > > > 6. I think we need a separate error code like
> > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > to
> > > > > > reject a concurrent feature update request.
> > > > > >
> > > > > > 7. I think we haven't discussed the alternative solution to pass
> the
> > > > > > feature information through Zookeeper. Is that mentioned in the
> KIP
> > > to
> > > > > > justify why using UpdateMetadata is more favorable?
> > > > > >
> > > > > > 8. I was under the impression that user could configure a range
> of
> > > > > > supported versions, what's the trade-off for allowing single
> > > finalized
> > > > > > version only?
> > > > > >
> > > > > > 9. One minor syntax fix: Note that here the "client" here may be
> a
> > > > > producer
> > > > > >
> > > > > > Boyang
> > > > > >
> > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmccabe@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > > Hi Colin,
> > > > > > > >
> > > > > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > > > > suggestions.
> > > > > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > .
> > > > > > > >
> > > > > > > > 1. '__data_version__' is the version of the finalized feature
> > > > > metadata
> > > > > > > > (i.e. actual ZK node contents), while the
> '__schema_version__' is
> > > > the
> > > > > > > > version of the schema of the data persisted in ZK. These
> serve
> > > > > > different
> > > > > > > > purposes. '__data_version__' is useful mainly to clients
> > > during
> > > > > > reads,
> > > > > > > > to differentiate between the 2 versions of eventually
> consistent
> > > > > > > 'finalized
> > > > > > > > features' metadata (i.e. larger metadata version is more
> recent).
> > > > > > > > '__schema_version__' provides an additional degree of
> > > flexibility,
> > > > > > where
> > > > > > > if
> > > > > > > > we decide to change the schema for '/features' node in ZK
> (in the
> > > > > > > future),
> > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > serialization/deserialization of the ZK data can be handled
> > > > safely).
> > > > > > >
> > > > > > > Hi Kowshik,
> > > > > > >
> > > > > > > If you're talking about a number that lets you know if data is
> more
> > > > or
> > > > > > > less recent, we would typically call that an epoch, and not a
> > > > version.
> > > > > > For
> > > > > > > the ZK data structures, the word "version" is typically
> reserved
> > > for
> > > > > > > describing changes to the overall schema of the data that is
> > > written
> > > > to
> > > > > > > ZooKeeper.  We don't even really change the "version" of those
> > > > schemas
> > > > > > that
> > > > > > > much, since most changes are backwards-compatible.  But we do
> > > include
> > > > > > that
> > > > > > > version field just in case.
> > > > > > >
> > > > > > > I don't think we really need an epoch here, though, since we
> can
> > > just
> > > > > > look
> > > > > > > at the broker epoch.  Whenever the broker registers, its epoch
> will
> > > > be
> > > > > > > greater than the previous broker epoch.  And the newly
> registered
> > > > data
> > > > > > will
> > > > > > > take priority.  This will be a lot simpler than adding a
> separate
> > > > epoch
> > > > > > > system, I think.
> > > > > > >
> > > > > > > >
> > > > > > > > 2. Regarding admin client needing min and max information -
> you
> > > are
> > > > > > > right!
> > > > > > > > I've changed the KIP such that the Admin API also allows the
> user
> > > > to
> > > > > > read
> > > > > > > > 'supported features' from a specific broker. Please look at
> the
> > > > > section
> > > > > > > > "Admin API changes".
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > >
> > > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> deliberate.
> > > > > I've
> > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > >
> > > > > > > Sounds good.
> > > > > > >
> > > > > > > >
> > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> > > I've
> > > > > > > updated
> > > > > > > > the KIP sketching the functionality provided by this tool,
> with
> > > > some
> > > > > > > > examples. Please look at the section "Tooling support
> examples".
> > > > > > > >
> > > > > > > > Thank you!
> > > > > > >
> > > > > > >
> > > > > > > Thanks, Kowshik.
> > > > > > >
> > > > > > > cheers,
> > > > > > > Colin
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Kowshik
> > > > > > > >
> > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > cmccabe@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > >
> > > > > > > > > In the "Schema" section, do we really need both
> > > > __schema_version__
> > > > > > and
> > > > > > > > > __data_version__?  Can we just have a single version field
> > > here?
> > > > > > > > >
> > > > > > > > > Shouldn't the Admin(Client) function have some way to get
> the
> > > min
> > > > > and
> > > > > > > max
> > > > > > > > > information that we're exposing as well?  I guess we could
> have
> > > > > min,
> > > > > > > max,
> > > > > > > > > and current.  Unrelated: is the use of Long rather than
> long
> > > > > > deliberate
> > > > > > > > > here?
> > > > > > > > >
> > > > > > > > > It would be good to describe how the command line tool
> > > > > > > > > kafka.admin.FeatureCommand will work.  For example the
> flags
> > > that
> > > > > it
> > > > > > > will
> > > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > > >
> > > > > > > > > cheers,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I've opened KIP-584
> > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > >
> > > > > > > > > > which
> > > > > > > > > > is intended to provide a versioning scheme for features.
> I'd
> > > > like
> > > > > > to
> > > > > > > use
> > > > > > > > > > this thread to discuss the same. I'd appreciate any
> feedback
> > > on
> > > > > > this.
> > > > > > > > > > Here
> > > > > > > > > > is a link to KIP-584
> > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > >  .
> > > > > > > > > >
> > > > > > > > > > Thank you!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Kowshik
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Colin McCabe <cm...@apache.org>.
On Fri, Apr 3, 2020, at 11:24, Jun Rao wrote:
> Hi, Kowshik,
> 
> Thanks for the reply. A few more comments below.
> 
> 100.6 For every new request, the admin needs to control who is allowed 
> to issue that request if security is enabled. So, we need to assign the new
> request a ResourceType and possible AclOperations. See
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> as an example.
> 

Yeah, agreed.  To be more specific, the permissions required for this should be Alter on Cluster, right?  It's certainly something only system administrators should be doing (KIP-455 also specifies ALTER on CLUSTER).
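
For example (just a rough sketch, with a placeholder principal and bootstrap address, and not something the KIP itself prescribes), an operator could be granted that permission through the AdminClient:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.common.acl.AccessControlEntry;
    import org.apache.kafka.common.acl.AclBinding;
    import org.apache.kafka.common.acl.AclOperation;
    import org.apache.kafka.common.acl.AclPermissionType;
    import org.apache.kafka.common.resource.PatternType;
    import org.apache.kafka.common.resource.ResourcePattern;
    import org.apache.kafka.common.resource.ResourceType;

    public class GrantFeatureUpdateAcl {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // ALTER on the CLUSTER resource is the permission that gates
                // cluster-wide administrative operations like this one.
                ResourcePattern cluster = new ResourcePattern(
                    ResourceType.CLUSTER, "kafka-cluster", PatternType.LITERAL);
                AccessControlEntry allowAlter = new AccessControlEntry(
                    "User:admin", "*", AclOperation.ALTER, AclPermissionType.ALLOW);
                admin.createAcls(Collections.singleton(new AclBinding(cluster, allowAlter)))
                     .all().get();
            }
        }
    }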

best,
Colin


> 105. If we change delete to disable, it's better to do this consistently in
> request protocol and admin api as well.
> 
> 110. The minVersion/maxVersion for features use int64. Currently, our
> release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> for new features to be included in minor releases too. Should we make the
> feature versioning match the release versioning?
> 
> 111. "During regular operations, the data in the ZK node can be mutated
> only via a specific admin API served only by the controller." I am
> wondering why the controller can't auto-finalize a feature version after
> all brokers are upgraded? For new users who download the latest version to
> build a new cluster, it's inconvenient for them to have to manually enable
> each feature.
> 
> 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of 48.
> 
> Jun
> 
> 
> On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
> wrote:
> 
> > Hey Jun,
> >
> > Thanks a lot for the great feedback! Please note that the design
> > has changed a little bit on the KIP, and we now propagate the finalized
> > features metadata only via ZK watches (instead of UpdateMetadataRequest
> > from the controller).
> >
> > Please find below my response to your questions/feedback, with the prefix
> > "(Kowshik):".
> >
> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > 100.1 Since this request waits for responses from brokers, should we add
> > a
> > > timeout in the request (like createTopicRequest)?
> >
> > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> > longer
> > wait for responses from brokers, since the design has been changed so that
> > the
> > features information is propagated via ZK. Nevertheless, it is right to
> > have a timeout
> > for the request.
> >
> > > 100.2 The response schema is a bit weird. Typically, the response just
> > > shows an error code and an error message, instead of echoing the request.
> >
> > (Kowshik): Great point! Yeah, I have modified it to just return an error
> > code and a message.
> > Previously it was not echoing the "request"; rather, it was returning the
> > latest set of
> > cluster-wide finalized features (after applying the updates). But you are
> > right,
> > the additional info is not required, so I have removed it from the response
> > schema.
> >
> > > 100.3 Should we add a separate request to list/describe the existing
> > > features?
> >
> > (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
> > Admin API,
> > which, under the covers, uses the ApiVersionsRequest to list/describe the
> > existing features. Please read the 'Tooling support' section.
> >
> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > DELETE, the version field doesn't make sense. So, I guess the broker just
> > > ignores this? An alternative way is to have a separate
> > DeleteFeaturesRequest
> >
> > (Kowshik): Great point! I have modified the KIP now to have 2 separate
> > controller APIs
> > serving these different purposes:
> > 1. updateFeatures
> > 2. deleteFeatures
> >
> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > > version of the metadata for finalized features." I am wondering why the
> > > ordering is important?
> >
> > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > version), and
> > it is just the ZK node version. Basically, this is the epoch for the
> > cluster-wide
> > finalized feature version metadata. This metadata is served to clients via
> > the
> > ApiVersionsResponse (for reads). We propagate updates from the '/features'
> > ZK node
> > to all brokers, via ZK watches setup by each broker on the '/features'
> > node.
> >
> > Now here is why the ordering is important:
> > ZK watches don't propagate at the same time. As a result, the
> > ApiVersionsResponse
> > is eventually consistent across brokers. This can introduce cases
> > where clients see an older lower epoch of the features metadata, after a
> > more recent
> > higher epoch was returned at a previous point in time. We expect clients
> > to always employ the rule that the latest received higher epoch of metadata
> > always trumps an older smaller epoch. Those clients that are external to
> > Kafka should strongly consider discovering the latest metadata once during
> > startup from the brokers, and if required refresh the metadata periodically
> > (to get the latest metadata).
> >
> > > 100.6 Could you specify the required ACL for this new request?
> >
> > (Kowshik): What is ACL, and how could I find out which one to specify?
> > Please could you provide me some pointers? I'll be glad to update the
> > KIP once I know the next steps.
> >
> > > 101. For the broker registration ZK node, should we bump up the version
> > in
> > the json?
> >
> > (Kowshik): Great point! Done. I've increased the version in the broker json
> > by 1.
> >
> > > 102. For the /features ZK node, not sure if we need the epoch field. Each
> > > ZK node has an internal version field that is incremented on every
> > update.
> >
> > (Kowshik): Great point! Done. I'm using the ZK node version now, instead of
> > an explicitly
> > incremented epoch.
> >
> > > 103. "Enabling the actual semantics of a feature version cluster-wide is
> > > left to the discretion of the logic implementing the feature (ex: can be
> > > done via dynamic broker config)." Does that mean the broker registration
> > ZK
> > > node will be updated dynamically when this happens?
> >
> > (Kowshik): Not really. The text was just conveying that a broker could
> > "know" of
> > a new feature version, but it does not mean the broker should have also
> > activated the effects of the feature version. Knowing vs activation are 2
> > separate things,
> > and the latter can be achieved by dynamic config. I have reworded the text
> > to
> > make this clear to the reader.
> >
> >
> > > 104. UpdateMetadataRequest
> > > 104.1 It would be useful to describe when the feature metadata is
> > included
> > > in the request. My understanding is that it's only included if (1) there
> > is
> > > a change to the finalized feature; (2) broker restart; (3) controller
> > > failover.
> > > 104.2 The new fields have the following versions. Why are the versions 3+
> > > when the top version is bumped to 6?
> > >       "fields":  [
> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > >           "about": "The name of the feature."},
> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > >           "about": "The finalized version for the feature."}
> > >       ]
> >
> > (Kowshik): With the new improved design, we have completely eliminated the
> > need to
> > use UpdateMetadataRequest. This is because we now rely on ZK to deliver the
> > notifications for changes to the '/features' ZK node.
> >
> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > better
> > > to use enable/disable?
> >
> > (Kowshik): For delete, yes, I have changed it so that we instead call it
> > 'disable'.
> > However for 'update', it can now also refer to either an upgrade or a
> > forced downgrade.
> > Therefore, I have left it the way it is, simply calling it 'update'.
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Kowshik,
> > >
> > > Thanks for the KIP. Looks good overall. A few comments below.
> > >
> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > 100.1 Since this request waits for responses from brokers, should we add
> > a
> > > timeout in the request (like createTopicRequest)?
> > > 100.2 The response schema is a bit weird. Typically, the response just
> > > shows an error code and an error message, instead of echoing the request.
> > > 100.3 Should we add a separate request to list/describe the existing
> > > features?
> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > DELETE, the version field doesn't make sense. So, I guess the broker just
> > > ignores this? An alternative way is to have a separate
> > > DeleteFeaturesRequest
> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > > version of the metadata for finalized features." I am wondering why the
> > > ordering is important?
> > > 100.6 Could you specify the required ACL for this new request?
> > >
> > > 101. For the broker registration ZK node, should we bump up the version
> > in
> > > the json?
> > >
> > > 102. For the /features ZK node, not sure if we need the epoch field. Each
> > > ZK node has an internal version field that is incremented on every
> > update.
> > >
> > > 103. "Enabling the actual semantics of a feature version cluster-wide is
> > > left to the discretion of the logic implementing the feature (ex: can be
> > > done via dynamic broker config)." Does that mean the broker registration
> > ZK
> > > node will be updated dynamically when this happens?
> > >
> > > 104. UpdateMetadataRequest
> > > 104.1 It would be useful to describe when the feature metadata is
> > included
> > > in the request. My understanding is that it's only included if (1) there
> > is
> > > a change to the finalized feature; (2) broker restart; (3) controller
> > > failover.
> > > 104.2 The new fields have the following versions. Why are the versions 3+
> > > when the top version is bumped to 6?
> > >       "fields":  [
> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > >           "about": "The name of the feature."},
> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > >           "about": "The finalized version for the feature."}
> > >       ]
> > >
> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > better
> > > to use enable/disable?
> > >
> > > Jun
> > >
> > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <kprakasam@confluent.io
> > >
> > > wrote:
> > >
> > > > Hey Boyang,
> > > >
> > > > Thanks for the great feedback! I have updated the KIP based on your
> > > > feedback.
> > > > Please find my response below for your comments, look for sentences
> > > > starting
> > > > with "(Kowshik)" below.
> > > >
> > > >
> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > > could
> > > > be
> > > > > converted as "When is it safe for the brokers to start serving new
> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> > the
> > > > > context.
> > > >
> > > > (Kowshik): Great point! Done.
> > > >
> > > > > 2. In the *Explanation *section, the metadata version number part
> > > seems a
> > > > > bit blurred. Could you point a reference to later section that we
> > going
> > > > to
> > > > > store it in Zookeeper and update it every time when there is a
> > feature
> > > > > change?
> > > >
> > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > > >
> > > >
> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> > for
> > > > > features such as group coordinator semantics, there is no legal
> > > scenario
> > > > to
> > > > > perform a downgrade at all. So having downgrade door open is pretty
> > > > > error-prone as human faults happen all the time. I'm assuming as new
> > > > > features are implemented, it's not very hard to add a flag during
> > > feature
> > > > > creation to indicate whether this feature is "downgradable". Could
> > you
> > > > > explain a bit more on the extra engineering effort for shipping this
> > > KIP
> > > > > with downgrade protection in place?
> > > >
> > > > (Kowshik): Great point! I'd agree and disagree here. While I agree that
> > > > accidental
> > > > downgrades can cause problems, I also think sometimes downgrades should
> > > > be allowed for emergency reasons (not all downgrades cause issues).
> > > > It is just subjective to the feature being downgraded.
> > > >
> > > > To be more strict about feature version downgrades, I have modified the
> > > KIP
> > > > proposing that we mandate a `--force-downgrade` flag be used in the
> > > > UPDATE_FEATURES api
> > > > and the tooling, whenever the human is downgrading a finalized feature
> > > > version.
> > > > Hopefully this should cover the requirement, until we find the need for
> > > > advanced downgrade support.
> > > >
> > > > > 4. "Each broker’s supported dictionary of feature versions will be
> > > > defined
> > > > > in the broker code." So this means in order to restrict a certain
> > > > feature,
> > > > > we need to start the broker first and then send a feature gating
> > > request
> > > > > immediately, which introduces a time gap and the intended-to-close
> > > > feature
> > > > > could actually serve request during this phase. Do you think we
> > should
> > > > also
> > > > > support configurations as well so that admin user could freely roll
> > up
> > > a
> > > > > cluster with all nodes complying the same feature gating, without
> > > > worrying
> > > > > about the turnaround time to propagate the message only after the
> > > cluster
> > > > > starts up?
> > > >
> > > > (Kowshik): This is a great point/question. One of the expectations out
> > of
> > > > this KIP, which is
> > > > already followed in the broker, is the following.
> > > >  - Imagine at time T1 the broker starts up and registers its presence
> > in
> > > > ZK,
> > > >    along with advertising its supported features.
> > > >  - Imagine at a future time T2 the broker receives the
> > > > UpdateMetadataRequest
> > > >    from the controller, which contains the latest finalized features as
> > > > seen by
> > > >    the controller. The broker validates this data against its
> > supported
> > > > features to
> > > >    make sure there is no mismatch (it will shutdown if there is an
> > > > incompatibility).
> > > >
> > > > It is expected that during the time between the 2 events T1 and T2, the
> > > > broker is
> > > > almost a silent entity in the cluster. It does not add any value to the
> > > > cluster, or carry
> > > > out any important broker activities. By “important”, I mean it is not
> > > doing
> > > > mutations
> > > > on its persistence, not mutating critical in-memory state, won’t be
> > > > serving
> > > > produce/fetch requests. Note it doesn’t even know its assigned
> > > partitions
> > > > until
> > > > it receives UpdateMetadataRequest from controller. Anything the broker
> > is
> > > > doing up
> > > > until this point is neither damaging nor useful.
> > > >
> > > > I’ve clarified the above in the KIP, see this new section:
> > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > .
> > > >
> > > > > 5. "adding a new Feature, updating or deleting an existing Feature",
> > > may
> > > > be
> > > > > I misunderstood something, I thought the features are defined in
> > broker
> > > > > code, so admin could not really create a new feature?
> > > >
> > > > (Kowshik): Great point! You understood this right. Here adding a
> > feature
> > > > means we are
> > > > adding a cluster-wide finalized *max* version for a feature that was
> > > > previously never finalized.
> > > > I have clarified this in the KIP now.
> > > >
> > > > > 6. I think we need a separate error code like
> > > FEATURE_UPDATE_IN_PROGRESS
> > > > to
> > > > > reject a concurrent feature update request.
> > > >
> > > > (Kowshik): Great point! I have modified the KIP adding the above (see
> > > > 'Tooling support -> Admin API changes').
> > > >
> > > > > 7. I think we haven't discussed the alternative solution to pass the
> > > > > feature information through Zookeeper. Is that mentioned in the KIP
> > to
> > > > > justify why using UpdateMetadata is more favorable?
> > > >
> > > > (Kowshik): Nice question! The broker reads finalized feature info
> > stored
> > > in
> > > > ZK,
> > > > only during startup when it does a validation. When serving
> > > > `ApiVersionsRequest`, the
> > > > broker does not read this info from ZK directly. I'd imagine the risk
> > is
> > > > that it can increase
> > > > the ZK read QPS which can be a bottleneck for the system. Today, in
> > Kafka
> > > > we use the
> > > > controller to fan out ZK updates to brokers and we want to stick to
> > that
> > > > pattern to avoid
> > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > >
> > > > > 8. I was under the impression that user could configure a range of
> > > > > supported versions, what's the trade-off for allowing single
> > finalized
> > > > > version only?
> > > >
> > > > (Kowshik): Great question! The finalized version of a feature basically
> > > > refers to
> > > > the cluster-wide finalized feature "maximum" version. For example, if
> > the
> > > > 'group_coordinator' feature
> > > > has the finalized version set to 10, then, it means that cluster-wide
> > all
> > > > versions up to v10 are
> > > > supported for this feature. However, note that if some version (ex: v0)
> > > > gets deprecated
> > > > for this feature, then we don’t convey that using this scheme (also
> > > > supporting deprecation is a non-goal).
> > > >
> > > > (Kowshik): I’ve now modified the KIP at all points, referring to
> > finalized
> > > > feature "maximum" versions.
> > > >
> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > > producer
> > > >
> > > > (Kowshik): Great point! Done.
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > >
> > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > reluctanthero104@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey Kowshik,
> > > > >
> > > > > thanks for the revised KIP. Got a couple of questions:
> > > > >
> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > > could
> > > > be
> > > > > converted as "When is it safe for the brokers to start serving new
> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> > the
> > > > > context.
> > > > >
> > > > > 2. In the *Explanation *section, the metadata version number part
> > > seems a
> > > > > bit blurred. Could you point a reference to later section that we
> > going
> > > > to
> > > > > store it in Zookeeper and update it every time when there is a
> > feature
> > > > > change?
> > > > >
> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> > for
> > > > > features such as group coordinator semantics, there is no legal
> > > scenario
> > > > to
> > > > > perform a downgrade at all. So having downgrade door open is pretty
> > > > > error-prone as human faults happen all the time. I'm assuming as new
> > > > > features are implemented, it's not very hard to add a flag during
> > > feature
> > > > > creation to indicate whether this feature is "downgradable". Could
> > you
> > > > > explain a bit more on the extra engineering effort for shipping this
> > > KIP
> > > > > with downgrade protection in place?
> > > > >
> > > > > 4. "Each broker’s supported dictionary of feature versions will be
> > > > defined
> > > > > in the broker code." So this means in order to restrict a certain
> > > > feature,
> > > > > we need to start the broker first and then send a feature gating
> > > request
> > > > > immediately, which introduces a time gap and the intended-to-close
> > > > feature
> > > > > could actually serve request during this phase. Do you think we
> > should
> > > > also
> > > > > support configurations as well so that admin user could freely roll
> > up
> > > a
> > > > > cluster with all nodes complying the same feature gating, without
> > > > worrying
> > > > > about the turnaround time to propagate the message only after the
> > > cluster
> > > > > starts up?
> > > > >
> > > > > 5. "adding a new Feature, updating or deleting an existing Feature",
> > > may
> > > > be
> > > > > I misunderstood something, I thought the features are defined in
> > broker
> > > > > code, so admin could not really create a new feature?
> > > > >
> > > > > 6. I think we need a separate error code like
> > > FEATURE_UPDATE_IN_PROGRESS
> > > > to
> > > > > reject a concurrent feature update request.
> > > > >
> > > > > 7. I think we haven't discussed the alternative solution to pass the
> > > > > feature information through Zookeeper. Is that mentioned in the KIP
> > to
> > > > > justify why using UpdateMetadata is more favorable?
> > > > >
> > > > > 8. I was under the impression that user could configure a range of
> > > > > supported versions, what's the trade-off for allowing single
> > finalized
> > > > > version only?
> > > > >
> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > > producer
> > > > >
> > > > > Boyang
> > > > >
> > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
> > > wrote:
> > > > >
> > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > Hi Colin,
> > > > > > >
> > > > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > > > suggestions.
> > > > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > .
> > > > > > >
> > > > > > > 1. '__data_version__' is the version of the finalized feature
> > > > metadata
> > > > > > > (i.e. actual ZK node contents), while the '__schema_version__' is
> > > the
> > > > > > > version of the schema of the data persisted in ZK. These serve
> > > > > different
> > > > > > > purposes. '__data_version__' is useful mainly to clients
> > during
> > > > > reads,
> > > > > > > to differentiate between the 2 versions of eventually consistent
> > > > > > 'finalized
> > > > > > > features' metadata (i.e. larger metadata version is more recent).
> > > > > > > '__schema_version__' provides an additional degree of
> > flexibility,
> > > > > where
> > > > > > if
> > > > > > > we decide to change the schema for '/features' node in ZK (in the
> > > > > > future),
> > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > serialization/deserialization of the ZK data can be handled
> > > safely).
> > > > > >
> > > > > > Hi Kowshik,
> > > > > >
> > > > > > If you're talking about a number that lets you know if data is more
> > > or
> > > > > > less recent, we would typically call that an epoch, and not a
> > > version.
> > > > > For
> > > > > > the ZK data structures, the word "version" is typically reserved
> > for
> > > > > > describing changes to the overall schema of the data that is
> > written
> > > to
> > > > > > ZooKeeper.  We don't even really change the "version" of those
> > > schemas
> > > > > that
> > > > > > much, since most changes are backwards-compatible.  But we do
> > include
> > > > > that
> > > > > > version field just in case.
> > > > > >
> > > > > > I don't think we really need an epoch here, though, since we can
> > just
> > > > > look
> > > > > > at the broker epoch.  Whenever the broker registers, its epoch will
> > > be
> > > > > > greater than the previous broker epoch.  And the newly registered
> > > data
> > > > > will
> > > > > > take priority.  This will be a lot simpler than adding a separate
> > > epoch
> > > > > > system, I think.
> > > > > >
> > > > > > >
> > > > > > > 2. Regarding admin client needing min and max information - you
> > are
> > > > > > right!
> > > > > > > I've changed the KIP such that the Admin API also allows the user
> > > to
> > > > > read
> > > > > > > 'supported features' from a specific broker. Please look at the
> > > > section
> > > > > > > "Admin API changes".
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > >
> > > > > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate.
> > > > I've
> > > > > > > improved the KIP to just use `long` at all places.
> > > > > >
> > > > > > Sounds good.
> > > > > >
> > > > > > >
> > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> > I've
> > > > > > updated
> > > > > > > the KIP sketching the functionality provided by this tool, with
> > > some
> > > > > > > examples. Please look at the section "Tooling support examples".
> > > > > > >
> > > > > > > Thank you!
> > > > > >
> > > > > >
> > > > > > Thanks, Kowshik.
> > > > > >
> > > > > > cheers,
> > > > > > Colin
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > cmccabe@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > >
> > > > > > > > In the "Schema" section, do we really need both
> > > __schema_version__
> > > > > and
> > > > > > > > __data_version__?  Can we just have a single version field
> > here?
> > > > > > > >
> > > > > > > > Shouldn't the Admin(Client) function have some way to get the
> > min
> > > > and
> > > > > > max
> > > > > > > > information that we're exposing as well?  I guess we could have
> > > > min,
> > > > > > max,
> > > > > > > > and current.  Unrelated: is the use of Long rather than long
> > > > > deliberate
> > > > > > > > here?
> > > > > > > >
> > > > > > > > It would be good to describe how the command line tool
> > > > > > > > kafka.admin.FeatureCommand will work.  For example the flags
> > that
> > > > it
> > > > > > will
> > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > >
> > > > > > > > cheers,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I've opened KIP-584
> > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > >
> > > > > > > > > which
> > > > > > > > > is intended to provide a versioning scheme for features. I'd
> > > like
> > > > > to
> > > > > > use
> > > > > > > > > this thread to discuss the same. I'd appreciate any feedback
> > on
> > > > > this.
> > > > > > > > > Here
> > > > > > > > > is a link to KIP-584
> > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > >  .
> > > > > > > > >
> > > > > > > > > Thank you!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Kowshik
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Jun Rao <ju...@confluent.io>.
Hi, Kowshik,

Thanks for the reply. A few more comments below.

100.6 For every new request, the admin needs to control who is allowed to
issue that request if security is enabled. So, we need to assign the new
request a ResourceType and possible AclOperations. See
https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
as an example.

105. If we change delete to disable, it's better to do this consistently in
request protocol and admin api as well.

110. The minVersion/maxVersion for features use int64. Currently, our
release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
for new features to be included in minor releases too. Should we make the
feature versioning match the release versioning?

111. "During regular operations, the data in the ZK node can be mutated
only via a specific admin API served only by the controller." I am
wondering why the controller can't auto-finalize a feature version after
all brokers are upgraded? For new users who download the latest version to
build a new cluster, it's inconvenient for them to have to manually enable
each feature.

112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of 48.

Jun


On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hey Jun,
>
> Thanks a lot for the great feedback! Please note that the design
> has changed a little bit on the KIP, and we now propagate the finalized
> features metadata only via ZK watches (instead of UpdateMetadataRequest
> from the controller).
>
> Please find below my response to your questions/feedback, with the prefix
> "(Kowshik):".
>
> > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > 100.1 Since this request waits for responses from brokers, should we add
> a
> > timeout in the request (like createTopicRequest)?
>
> (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> longer
> wait for responses from brokers, since the design has been changed so that
> the
> features information is propagated via ZK. Nevertheless, it is right to
> have a timeout
> for the request.
>
> > 100.2 The response schema is a bit weird. Typically, the response just
> > shows an error code and an error message, instead of echoing the request.
>
> (Kowshik): Great point! Yeah, I have modified it to just return an error
> code and a message.
> Previously it was not echoing the "request"; rather, it was returning the
> latest set of
> cluster-wide finalized features (after applying the updates). But you are
> right,
> the additional info is not required, so I have removed it from the response
> schema.
>
> > 100.3 Should we add a separate request to list/describe the existing
> > features?
>
> (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
> Admin API,
> which, under the covers, uses the ApiVersionsRequest to list/describe the
> existing features. Please read the 'Tooling support' section.
>
> > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > DELETE, the version field doesn't make sense. So, I guess the broker just
> > ignores this? An alternative way is to have a separate
> DeleteFeaturesRequest
>
> (Kowshik): Great point! I have modified the KIP now to have 2 separate
> controller APIs
> serving these different purposes:
> 1. updateFeatures
> 2. deleteFeatures
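>
> To make that split concrete, here is a very rough sketch of the two client-side
> calls (the class, method and option names below are illustrative placeholders,
> not the final API proposed in the KIP):
>
>     import java.util.Collections;
>     import java.util.Map;
>
>     // Illustrative sketch only: one call finalizes/upgrades max versions, the
>     // other removes a finalized feature; both are served only by the controller.
>     public class FeatureToolingSketch {
>         public static void main(String[] args) {
>             Map<String, Long> upgrades =
>                 Collections.singletonMap("group_coordinator", 1L);
>             // featureAdmin.updateFeatures(upgrades, timeoutMs, forceDowngrade=false);
>             // featureAdmin.deleteFeatures(Collections.singleton("old_feature"), timeoutMs);
>             System.out.println("would finalize: " + upgrades);
>         }
>     }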
>
> > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > version of the metadata for finalized features." I am wondering why the
> > ordering is important?
>
> (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> version), and
> it is just the ZK node version. Basically, this is the epoch for the
> cluster-wide
> finalized feature version metadata. This metadata is served to clients via
> the
> ApiVersionsResponse (for reads). We propagate updates from the '/features'
> ZK node
> to all brokers, via ZK watches setup by each broker on the '/features'
> node.
>
> Now here is why the ordering is important:
> ZK watches don't propagate at the same time. As a result, the
> ApiVersionsResponse
> is eventually consistent across brokers. This can introduce cases
> where clients see an older lower epoch of the features metadata, after a
> more recent
> higher epoch was returned at a previous point in time. We expect clients
> to always employ the rule that the latest received higher epoch of metadata
> always trumps an older smaller epoch. Those clients that are external to
> Kafka should strongly consider discovering the latest metadata once during
> startup from the brokers, and if required refresh the metadata periodically
> (to get the latest metadata).
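>
> To make that rule concrete, a client-side cache could look roughly like the
> sketch below (illustrative names only; this is not an actual Kafka client class):
>
>     import java.util.Collections;
>     import java.util.Map;
>
>     // Rough sketch: remember only the finalized feature metadata that arrived
>     // with the highest epoch, and drop stale lower-epoch responses that may
>     // still be in flight while ZK watches propagate.
>     public class FinalizedFeaturesCache {
>         private long highestEpoch = -1L;
>         private Map<String, Long> finalizedMaxVersions = Collections.emptyMap();
>
>         public synchronized void maybeUpdate(long epoch, Map<String, Long> features) {
>             // An equal or older epoch is either a duplicate or a stale response.
>             if (epoch > highestEpoch) {
>                 highestEpoch = epoch;
>                 finalizedMaxVersions = features;
>             }
>         }
>
>         public synchronized Map<String, Long> latest() {
>             return finalizedMaxVersions;
>         }
>     }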
>
> > 100.6 Could you specify the required ACL for this new request?
>
> (Kowshik): What is ACL, and how could I find out which one to specify?
> Please could you provide me some pointers? I'll be glad to update the
> KIP once I know the next steps.
>
> > 101. For the broker registration ZK node, should we bump up the version
> in
> the json?
>
> (Kowshik): Great point! Done. I've increased the version in the broker json
> by 1.
>
> > 102. For the /features ZK node, not sure if we need the epoch field. Each
> > ZK node has an internal version field that is incremented on every
> update.
>
> (Kowshik): Great point! Done. I'm using the ZK node version now, instead of
> an explicitly
> incremented epoch.
>
> > 103. "Enabling the actual semantics of a feature version cluster-wide is
> > left to the discretion of the logic implementing the feature (ex: can be
> > done via dynamic broker config)." Does that mean the broker registration
> ZK
> > node will be updated dynamically when this happens?
>
> (Kowshik): Not really. The text was just conveying that a broker could
> "know" of
> a new feature version, but it does not mean the broker should have also
> activated the effects of the feature version. Knowing vs activation are 2
> separate things,
> and the latter can be achieved by dynamic config. I have reworded the text
> to
> make this clear to the reader.
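>
> For example (purely illustrative: the config name below is made up for this
> sketch and is not proposed anywhere in the KIP), activation could be a dynamic
> broker config that an operator flips once the feature has been finalized:
>
>     import java.util.Collection;
>     import java.util.Collections;
>     import java.util.Map;
>     import java.util.Properties;
>     import org.apache.kafka.clients.admin.Admin;
>     import org.apache.kafka.clients.admin.AlterConfigOp;
>     import org.apache.kafka.clients.admin.ConfigEntry;
>     import org.apache.kafka.common.config.ConfigResource;
>
>     public class ActivateFeatureSemantics {
>         public static void main(String[] args) throws Exception {
>             Properties props = new Properties();
>             props.put("bootstrap.servers", "localhost:9092");
>             try (Admin admin = Admin.create(props)) {
>                 // Empty resource name targets the cluster-wide default broker config.
>                 ConfigResource brokers = new ConfigResource(ConfigResource.Type.BROKER, "");
>                 // Hypothetical flag that turns the finalized feature's semantics on.
>                 AlterConfigOp enable = new AlterConfigOp(
>                     new ConfigEntry("group.coordinator.new.protocol.enable", "true"),
>                     AlterConfigOp.OpType.SET);
>                 Map<ConfigResource, Collection<AlterConfigOp>> updates =
>                     Collections.singletonMap(brokers, Collections.singletonList(enable));
>                 admin.incrementalAlterConfigs(updates).all().get();
>             }
>         }
>     }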
>
>
> > 104. UpdateMetadataRequest
> > 104.1 It would be useful to describe when the feature metadata is
> included
> > in the request. My understanding is that it's only included if (1) there
> is
> > a change to the finalized feature; (2) broker restart; (3) controller
> > failover.
> > 104.2 The new fields have the following versions. Why are the versions 3+
> > when the top version is bumped to 6?
> >       "fields":  [
> >         {"name": "Name", "type":  "string", "versions":  "3+",
> >           "about": "The name of the feature."},
> >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> >           "about": "The finalized version for the feature."}
> >       ]
>
> (Kowshik): With the new improved design, we have completely eliminated the
> need to
> use UpdateMetadataRequest. This is because we now rely on ZK to deliver the
> notifications for changes to the '/features' ZK node.
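>
> As a rough sketch of what that looks like on the broker side (error handling,
> session management and the actual cache update are omitted; this is not the
> KIP's real implementation), each broker keeps a watch on '/features' and treats
> the ZK Stat version as the epoch:
>
>     import org.apache.zookeeper.WatchedEvent;
>     import org.apache.zookeeper.Watcher;
>     import org.apache.zookeeper.ZooKeeper;
>     import org.apache.zookeeper.data.Stat;
>
>     public class FeaturesZkWatcher implements Watcher {
>         private final ZooKeeper zk;
>
>         public FeaturesZkWatcher(ZooKeeper zk) {
>             this.zk = zk;
>         }
>
>         public void readAndWatch() throws Exception {
>             Stat stat = new Stat();
>             // Registers this object as the one-shot watcher for the next change.
>             byte[] data = zk.getData("/features", this, stat);
>             long epoch = stat.getVersion();  // ZK node version doubles as the epoch
>             // ... update the local finalized features cache with (epoch, data) ...
>         }
>
>         @Override
>         public void process(WatchedEvent event) {
>             if (event.getType() == Event.EventType.NodeDataChanged) {
>                 try {
>                     readAndWatch();  // re-read the node and re-register the watch
>                 } catch (Exception e) {
>                     // retry/backoff elided in this sketch
>                 }
>             }
>         }
>     }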
>
> > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> better
> > to use enable/disable?
>
> (Kowshik): For delete, yes, I have changed it so that we instead call it
> 'disable'.
> However for 'update', it can now also refer to either an upgrade or a
> forced downgrade.
> Therefore, I have left it the way it is, simply calling it 'update'.
>
>
> Cheers,
> Kowshik
>
> On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Kowshik,
> >
> > Thanks for the KIP. Looks good overall. A few comments below.
> >
> > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > 100.1 Since this request waits for responses from brokers, should we add
> a
> > timeout in the request (like createTopicRequest)?
> > 100.2 The response schema is a bit weird. Typically, the response just
> > shows an error code and an error message, instead of echoing the request.
> > 100.3 Should we add a separate request to list/describe the existing
> > features?
> > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > DELETE, the version field doesn't make sense. So, I guess the broker just
> > ignores this? An alternative way is to have a separate
> > DeleteFeaturesRequest
> > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > version of the metadata for finalized features." I am wondering why the
> > ordering is important?
> > 100.6 Could you specify the required ACL for this new request?
> >
> > 101. For the broker registration ZK node, should we bump up the version
> in
> > the json?
> >
> > 102. For the /features ZK node, not sure if we need the epoch field. Each
> > ZK node has an internal version field that is incremented on every
> update.
> >
> > 103. "Enabling the actual semantics of a feature version cluster-wide is
> > left to the discretion of the logic implementing the feature (ex: can be
> > done via dynamic broker config)." Does that mean the broker registration
> ZK
> > node will be updated dynamically when this happens?
> >
> > 104. UpdateMetadataRequest
> > 104.1 It would be useful to describe when the feature metadata is
> included
> > in the request. My understanding is that it's only included if (1) there
> is
> > a change to the finalized feature; (2) broker restart; (3) controller
> > failover.
> > 104.2 The new fields have the following versions. Why are the versions 3+
> > when the top version is bumped to 6?
> >       "fields":  [
> >         {"name": "Name", "type":  "string", "versions":  "3+",
> >           "about": "The name of the feature."},
> >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> >           "about": "The finalized version for the feature."}
> >       ]
> >
> > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> better
> > to use enable/disable?
> >
> > Jun
> >
> > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <kprakasam@confluent.io
> >
> > wrote:
> >
> > > Hey Boyang,
> > >
> > > Thanks for the great feedback! I have updated the KIP based on your
> > > feedback.
> > > Please find my response below for your comments, look for sentences
> > > starting
> > > with "(Kowshik)" below.
> > >
> > >
> > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > could
> > > be
> > > > converted as "When is it safe for the brokers to start serving new
> > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> the
> > > > context.
> > >
> > > (Kowshik): Great point! Done.
> > >
> > > > 2. In the *Explanation *section, the metadata version number part
> > seems a
> > > > bit blurred. Could you point a reference to later section that we
> going
> > > to
> > > > store it in Zookeeper and update it every time when there is a
> feature
> > > > change?
> > >
> > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > >
> > >
> > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> for
> > > > features such as group coordinator semantics, there is no legal
> > scenario
> > > to
> > > > perform a downgrade at all. So having downgrade door open is pretty
> > > > error-prone as human faults happen all the time. I'm assuming as new
> > > > features are implemented, it's not very hard to add a flag during
> > feature
> > > > creation to indicate whether this feature is "downgradable". Could
> you
> > > > explain a bit more on the extra engineering effort for shipping this
> > KIP
> > > > with downgrade protection in place?
> > >
> > > (Kowshik): Great point! I'd agree and disagree here. While I agree that
> > > accidental
> > > downgrades can cause problems, I also think sometimes downgrades should
> > > be allowed for emergency reasons (not all downgrades cause issues).
> > > It is just subjective to the feature being downgraded.
> > >
> > > To be more strict about feature version downgrades, I have modified the
> > KIP
> > > proposing that we mandate a `--force-downgrade` flag be used in the
> > > UPDATE_FEATURES api
> > > and the tooling, whenever the human is downgrading a finalized feature
> > > version.
> > > Hopefully this should cover the requirement, until we find the need for
> > > advanced downgrade support.
> > >
> > > > 4. "Each broker’s supported dictionary of feature versions will be
> > > defined
> > > > in the broker code." So this means in order to restrict a certain
> > > feature,
> > > > we need to start the broker first and then send a feature gating
> > request
> > > > immediately, which introduces a time gap and the intended-to-close
> > > feature
> > > > could actually serve request during this phase. Do you think we
> should
> > > also
> > > > support configurations as well so that admin user could freely roll
> up
> > a
> > > > cluster with all nodes complying the same feature gating, without
> > > worrying
> > > > about the turnaround time to propagate the message only after the
> > cluster
> > > > starts up?
> > >
> > > (Kowshik): This is a great point/question. One of the expectations out
> of
> > > this KIP, which is
> > > already followed in the broker, is the following.
> > >  - Imagine at time T1 the broker starts up and registers its presence
> in
> > > ZK,
> > >    along with advertising its supported features.
> > >  - Imagine at a future time T2 the broker receives the
> > > UpdateMetadataRequest
> > >    from the controller, which contains the latest finalized features as
> > > seen by
> > >    the controller. The broker validates this data against it’s
> supported
> > > features to
> > >    make sure there is no mismatch (it will shutdown if there is an
> > > incompatibility).
> > >
> > > It is expected that during the time between the 2 events T1 and T2, the
> > > broker is
> > > almost a silent entity in the cluster. It does not add any value to the
> > > cluster, or carry
> > > out any important broker activities. By “important”, I mean it is not
> > doing
> > > mutations
> > > on it’s persistence, not mutating critical in-memory state, won’t be
> > > serving
> > > produce/fetch requests. Note it doesn’t even know it’s assigned
> > partitions
> > > until
> > > it receives UpdateMetadataRequest from controller. Anything the broker
> is
> > > doing up
> > > until this point is not damaging/useful.
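
A minimal sketch of the T2 validation described above, assuming each broker advertises a supported [min, max] version range per feature and that the cluster-wide finalized max version must fall inside that range (the names below are made up for illustration, not the broker's actual code):

final class FeatureCompatibility {
    // True if the finalized max version lies outside the broker's supported
    // range for that feature, in which case the broker shuts itself down.
    static boolean isIncompatible(long supportedMin, long supportedMax, long finalizedMax) {
        return finalizedMax < supportedMin || finalizedMax > supportedMax;
    }
}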
> > >
> > > I’ve clarified the above in the KIP, see this new section:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > .
> > >
> > > > 5. "adding a new Feature, updating or deleting an existing Feature",
> > may
> > > be
> > > > I misunderstood something, I thought the features are defined in
> broker
> > > > code, so admin could not really create a new feature?
> > >
> > > (Kowshik): Great point! You understood this right. Here adding a
> feature
> > > means we are
> > > adding a cluster-wide finalized *max* version for a feature that was
> > > previously never finalized.
> > > I have clarified this in the KIP now.
> > >
> > > > 6. I think we need a separate error code like
> > FEATURE_UPDATE_IN_PROGRESS
> > > to
> > > > reject a concurrent feature update request.
> > >
> > > (Kowshik): Great point! I have modified the KIP adding the above (see
> > > 'Tooling support -> Admin API changes').
> > >
> > > > 7. I think we haven't discussed the alternative solution to pass the
> > > > feature information through Zookeeper. Is that mentioned in the KIP
> to
> > > > justify why using UpdateMetadata is more favorable?
> > >
> > > (Kowshik): Nice question! The broker reads finalized feature info
> stored
> > in
> > > ZK,
> > > only during startup when it does a validation. When serving
> > > `ApiVersionsRequest`, the
> > > broker does not read this info from ZK directly. I'd imagine the risk
> is
> > > that it can increase
> > > the ZK read QPS which can be a bottleneck for the system. Today, in
> Kafka
> > > we use the
> > > controller to fan out ZK updates to brokers and we want to stick to
> that
> > > pattern to avoid
> > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > >
> > > > 8. I was under the impression that user could configure a range of
> > > > supported versions, what's the trade-off for allowing single
> finalized
> > > > version only?
> > >
> > > (Kowshik): Great question! The finalized version of a feature basically
> > > refers to
> > > the cluster-wide finalized feature "maximum" version. For example, if
> the
> > > 'group_coordinator' feature
> > > has the finalized version set to 10, then, it means that cluster-wide
> all
> > > versions upto v10 are
> > > supported for this feature. However, note that if some version (ex: v0)
> > > gets deprecated
> > > for this feature, then we don’t convey that using this scheme (also
> > > supporting deprecation is a non-goal).
> > >
> > > (Kowshik): I’ve now modified the KIP at all points, referring to
> finalized
> > > feature "maximum" versions.
> > >
> > > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > producer
> > >
> > > (Kowshik): Great point! Done.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > >
> > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> reluctanthero104@gmail.com>
> > > wrote:
> > >
> > > > Hey Kowshik,
> > > >
> > > > thanks for the revised KIP. Got a couple of questions:
> > > >
> > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > could
> > > be
> > > > converted as "When is it safe for the brokers to start serving new
> > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> the
> > > > context.
> > > >
> > > > 2. In the *Explanation *section, the metadata version number part
> > seems a
> > > > bit blurred. Could you point a reference to later section that we
> going
> > > to
> > > > store it in Zookeeper and update it every time when there is a
> feature
> > > > change?
> > > >
> > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> for
> > > > features such as group coordinator semantics, there is no legal
> > scenario
> > > to
> > > > perform a downgrade at all. So having downgrade door open is pretty
> > > > error-prone as human faults happen all the time. I'm assuming as new
> > > > features are implemented, it's not very hard to add a flag during
> > feature
> > > > creation to indicate whether this feature is "downgradable". Could
> you
> > > > explain a bit more on the extra engineering effort for shipping this
> > KIP
> > > > with downgrade protection in place?
> > > >
> > > > 4. "Each broker’s supported dictionary of feature versions will be
> > > defined
> > > > in the broker code." So this means in order to restrict a certain
> > > feature,
> > > > we need to start the broker first and then send a feature gating
> > request
> > > > immediately, which introduces a time gap and the intended-to-close
> > > feature
> > > > could actually serve request during this phase. Do you think we
> should
> > > also
> > > > support configurations as well so that admin user could freely roll
> up
> > a
> > > > cluster with all nodes complying the same feature gating, without
> > > worrying
> > > > about the turnaround time to propagate the message only after the
> > cluster
> > > > starts up?
> > > >
> > > > 5. "adding a new Feature, updating or deleting an existing Feature",
> > may
> > > be
> > > > I misunderstood something, I thought the features are defined in
> broker
> > > > code, so admin could not really create a new feature?
> > > >
> > > > 6. I think we need a separate error code like
> > FEATURE_UPDATE_IN_PROGRESS
> > > to
> > > > reject a concurrent feature update request.
> > > >
> > > > 7. I think we haven't discussed the alternative solution to pass the
> > > > feature information through Zookeeper. Is that mentioned in the KIP
> to
> > > > justify why using UpdateMetadata is more favorable?
> > > >
> > > > 8. I was under the impression that user could configure a range of
> > > > supported versions, what's the trade-off for allowing single
> finalized
> > > > version only?
> > > >
> > > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > producer
> > > >
> > > > Boyang
> > > >
> > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
> > wrote:
> > > >
> > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > Hi Colin,
> > > > > >
> > > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > > suggestions.
> > > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > .
> > > > > >
> > > > > > 1. '__data_version__' is the version of the finalized feature
> > > metadata
> > > > > > (i.e. actual ZK node contents), while the '__schema_version__' is
> > the
> > > > > > version of the schema of the data persisted in ZK. These serve
> > > > different
> > > > > > purposes. '__data_version__' is useful mainly to clients
> during
> > > > reads,
> > > > > > to differentiate between the 2 versions of eventually consistent
> > > > > 'finalized
> > > > > > features' metadata (i.e. larger metadata version is more recent).
> > > > > > '__schema_version__' provides an additional degree of
> flexibility,
> > > > where
> > > > > if
> > > > > > we decide to change the schema for '/features' node in ZK (in the
> > > > > future),
> > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > serialization/deserialization of the ZK data can be handled
> > safely).
> > > > >
> > > > > Hi Kowshik,
> > > > >
> > > > > If you're talking about a number that lets you know if data is more
> > or
> > > > > less recent, we would typically call that an epoch, and not a
> > version.
> > > > For
> > > > > the ZK data structures, the word "version" is typically reserved
> for
> > > > > describing changes to the overall schema of the data that is
> written
> > to
> > > > > ZooKeeper.  We don't even really change the "version" of those
> > schemas
> > > > that
> > > > > much, since most changes are backwards-compatible.  But we do
> include
> > > > that
> > > > > version field just in case.
> > > > >
> > > > > I don't think we really need an epoch here, though, since we can
> just
> > > > look
> > > > > at the broker epoch.  Whenever the broker registers, its epoch will
> > be
> > > > > greater than the previous broker epoch.  And the newly registered
> > data
> > > > will
> > > > > take priority.  This will be a lot simpler than adding a separate
> > epoch
> > > > > system, I think.
> > > > >
> > > > > >
> > > > > > 2. Regarding admin client needing min and max information - you
> are
> > > > > right!
> > > > > > I've changed the KIP such that the Admin API also allows the user
> > to
> > > > read
> > > > > > 'supported features' from a specific broker. Please look at the
> > > section
> > > > > > "Admin API changes".
> > > > >
> > > > > Thanks.
> > > > >
> > > > > >
> > > > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate.
> > > I've
> > > > > > improved the KIP to just use `long` at all places.
> > > > >
> > > > > Sounds good.
> > > > >
> > > > > >
> > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> I've
> > > > > updated
> > > > > > the KIP sketching the functionality provided by this tool, with
> > some
> > > > > > examples. Please look at the section "Tooling support examples".
> > > > > >
> > > > > > Thank you!
> > > > >
> > > > >
> > > > > Thanks, Kowshik.
> > > > >
> > > > > cheers,
> > > > > Colin
> > > > >
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> cmccabe@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > Thanks, Kowshik, this looks good.
> > > > > > >
> > > > > > > In the "Schema" section, do we really need both
> > __schema_version__
> > > > and
> > > > > > > __data_version__?  Can we just have a single version field
> here?
> > > > > > >
> > > > > > > Shouldn't the Admin(Client) function have some way to get the
> min
> > > and
> > > > > max
> > > > > > > information that we're exposing as well?  I guess we could have
> > > min,
> > > > > max,
> > > > > > > and current.  Unrelated: is the use of Long rather than long
> > > > deliberate
> > > > > > > here?
> > > > > > >
> > > > > > > It would be good to describe how the command line tool
> > > > > > > kafka.admin.FeatureCommand will work.  For example the flags
> that
> > > it
> > > > > will
> > > > > > > take and the output that it will generate to STDOUT.
> > > > > > >
> > > > > > > cheers,
> > > > > > > Colin
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I've opened KIP-584
> > > > > > > > which
> > > > > > > > is intended to provide a versioning scheme for features. I'd
> > like
> > > > to
> > > > > use
> > > > > > > > this thread to discuss the same. I'd appreciate any feedback
> on
> > > > this.
> > > > > > > > Here
> > > > > > > > is a link to KIP-584:
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > >  .
> > > > > > > >
> > > > > > > > Thank you!
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Kowshik
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Colin McCabe <cm...@apache.org>.
Hi Kowshik,

The discussion on ZooKeeper reads versus writes makes sense to me.  The important thing to keep in mind here is that in the bridge release, all brokers can read from ZooKeeper, but only the controller writes.

Why do we need both UpdateFeaturesRequest and DeleteFeaturesRequest?  It seems awkward to have "deleting" be a special case here when the general idea is that we have an RPC to change the supported feature flags.  Changing the feature level from 2 to 1 doesn't seem that different from changing it from 1 to not present.

It would be simpler to just say that a feature flag which doesn't appear in the znode is considered to be at version level 0.  This will also simplify the code a lot, I think, since you won't have to keep track of tricky distinctions between "disabled" and "enabled at version 0."  Then you would be able to just use an int in most places.
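
For illustration, a minimal sketch of what that simplification buys, assuming the finalized levels are kept in a plain map (the class and method names here are made up, not part of the KIP):

import java.util.HashMap;
import java.util.Map;

final class FinalizedFeatureLevels {
    // Parsed from the '/features' znode; a feature that is absent is simply at level 0.
    private final Map<String, Integer> levels = new HashMap<>();

    int versionLevel(String feature) {
        // No tri-state "disabled vs. enabled at 0" to track -- just an int.
        return levels.getOrDefault(feature, 0);
    }
}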

(By the way, I would propose the term "version level" for this number, since it avoids confusion with all the other meanings of the word "version" that we have in the code.)

Another thing to keep in mind is that if a request RPC is batched, the corresponding response RPC also needs to be batched.  In other words, you need multiple error codes, one for each feature flag whose level you are trying to change.  Unless the idea is that the whole change is a transaction that either happens in full or doesn't?

Rather than FeatureUpdateType, I would just go with a boolean like "force."  I'm not sure what other values we'd want to add to this later on, if it were an enum.  I think the boolean is clearer.

This ties in with my comment earlier, but for the result classes, we need methods other than just "all".  Batch operations aren't usable if you can't get the result per operation.... unless the semantics are transactional and it really is just everything succeeded or everything failed.

There are a bunch of Java interfaces described like FinalizedFeature, FeatureUpdate, UpdateFeaturesResult, and so on that should just be regular concrete Java classes.  In general we'd only use an interface if we wanted the caller to implement some kind of callback function.  We don't make classes that are just designed to hold data into interfaces, since that just imposes extra work on callers (they have to define their own concrete class for each interface just to use the API.)  There's also probably no reason to have these classes inherit from each other or have complex type relationships.  One more nitpick is that Kafka generally doesn't use "get" in the function names of accessors.
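
To make that shape concrete, here is a rough sketch of what is being suggested (class and method names are illustrative only, not the final API): a plain data class whose accessors drop the "get" prefix, plus a batch result that exposes a per-feature outcome in addition to all().

import java.util.Map;
import org.apache.kafka.common.KafkaFuture;

// A concrete class that just holds data; accessors without a "get" prefix.
class FinalizedFeature {
    private final String name;
    private final long versionLevel;

    FinalizedFeature(String name, long versionLevel) {
        this.name = name;
        this.versionLevel = versionLevel;
    }

    String name() { return name; }
    long versionLevel() { return versionLevel; }
}

// A batch result with per-operation access, so callers are not limited to all().
class UpdateFeaturesResult {
    private final Map<String, KafkaFuture<Void>> futures;

    UpdateFeaturesResult(Map<String, KafkaFuture<Void>> futures) {
        this.futures = futures;
    }

    KafkaFuture<Void> value(String feature) { return futures.get(feature); }
    Map<String, KafkaFuture<Void>> values() { return futures; }
    KafkaFuture<Void> all() { return KafkaFuture.allOf(futures.values().toArray(new KafkaFuture[0])); }
}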

best,
Colin


On Fri, Apr 3, 2020, at 13:04, Kowshik Prakasam wrote:
> Hey Boyang,
> 
> Thanks for the feedback! I've updated the KIP. Please find below my
> response.
> 
> > 1. Do you mind updating the non-goal section as we are introducing a
> > --feature-force-downgrade to address downgrade concern?
> 
> (Kowshik): This was already mentioned. Look for non-goal: 1-b.
> 
> > 2. For the flags `--feature` seems to be a redundant prefix, as the script
> > is already called `kafka-features.sh`. They could just be called
> > `--upgrade` and `--force-downgrade`.
> 
> (Kowshik): Great point! Done.
> 
> > 3. I don't feel strong to require a confirmation for a normal feature
> > upgrade, unless there are other existing scripts doing so.
> 
> (Kowshik): Done. Removed now. We now ask for a confirmation only for
> downgrades.
> 
> > 4. How could we know the existing feature versions when user are only
> > executing upgrades? Does the `kafka-features.sh` always send a
> > DescribeFeatureRequest to broker first?
> 
> (Kowshik): For deletes, yes it will make an ApiVersionsRequest call to show
> the
> versions of the features. Perhaps the ApiVersionsRequest can be sent
> to just the controller to avoid questions on consistency, but that's
> an implementation detail.
> 
> > 5. I'm not 100% sure, but a script usually use the same flag once, so
> maybe
> > we should also do that for `--upgrade-feature`? Instead of flagging twice
> > for different features, a comma separated list of (feature:max_version)
> > will be expected, or something like that.
> 
> (Kowshik): Done. I'm using a comma-separated list now.
> 
> > 6. "The node data shall be readable via existing ZK tooling" Just trying
> to
> > clarify, we are not introducing ZK direct read tool in this KIP correct?
> As
> for KIP-500 we are
> eventually going to deprecate all direct ZK access tools.
> 
> (Kowshik): Done. Yes, we are not intending to add such a tool. I was just
> saying that
> if we ever want to read it from ZK, then it's readable via ZK cli (in the
> interim).
> I have modified the text conveying the intent to support reads via
> ApiVersionsRequest only (optionally this request can be directed at the
> controller to
> avoid questions on consistency, but that's an implementation detail).
> 
> > 7. Could we have a centralized section called `Public Interfaces` to
> > summarize all the public API changes? This is a required section in a KIP.
> > And we should also write down the new error codes we will be introducing
> in
> > this KIP, and include both new and old error codes in the Response schema
> > comment if possible. For example, UpdateFeatureResponse could expect a
> > `NOT_CONTROLLER` error code.
> 
> (Kowshik): Done. The error codes have been documented in the response
> schemas now.
> Added a new section titled "New or Changed Public Interfaces" summarizing
> only the
> changes made to the public interfaces.
> 
> 
> Cheers,
> Kowshik
> 
> 
> On Fri, Apr 3, 2020 at 9:39 AM Boyang Chen <re...@gmail.com>
> wrote:
> 
> > Hey Kowshik,
> >
> > thanks for getting the KIP updated. The Zookeeper routing approach makes
> > sense and simplifies the changes.
> > Some follow-ups:
> >
> > 1. Do you mind updating the non-goal section as we are introducing a
> > --feature-force-downgrade to address downgrade concern?
> >
> > 2. For the flags `--feature` seems to be a redundant prefix, as the script
> > is already called `kafka-features.sh`. They could just be called
> > `--upgrade` and `--force-downgrade`.
> >
> > 3. I don't feel strong to require a confirmation for a normal feature
> > upgrade, unless there are other existing scripts doing so.
> >
> > 4. How could we know the existing feature versions when user are only
> > executing upgrades? Does the `kafka-features.sh` always send a
> > DescribeFeatureRequest to broker first?
> >
> > 5. I'm not 100% sure, but a script usually use the same flag once, so maybe
> > we should also do that for `--upgrade-feature`? Instead of flagging twice
> > for different features, a comma separated list of (feature:max_version)
> > will be expected, or something like that.
> >
> > 6. "The node data shall be readable via existing ZK tooling" Just trying to
> > clarify, we are not introducing ZK direct read tool in this KIP correct? As
> > for KIP-500 we are
> > eventually going to deprecate all direct ZK access tools.
> >
> > 7. Could we have a centralized section called `Public Interfaces` to
> > summarize all the public API changes? This is a required section in a KIP.
> > And we should also write down the new error codes we will be introducing in
> > this KIP, and include both new and old error codes in the Response schema
> > comment if possible. For example, UpdateFeatureResponse could expect a
> > `NOT_CONTROLLER` error code.
> >
> >
> > Boyang
> >
> > On Fri, Apr 3, 2020 at 3:15 AM Kowshik Prakasam <kp...@confluent.io>
> > wrote:
> >
> > > Hi all,
> > >
> > > Any other feedback on this KIP before we start the vote?
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
> > > wrote:
> > >
> > > > Hey Jun,
> > > >
> > > > Thanks a lot for the great feedback! Please note that the design
> > > > has changed a little bit on the KIP, and we now propagate the finalized
> > > > features metadata only via ZK watches (instead of UpdateMetadataRequest
> > > > from the controller).
> > > >
> > > > Please find below my response to your questions/feedback, with the
> > prefix
> > > > "(Kowshik):".
> > > >
> > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > 100.1 Since this request waits for responses from brokers, should we
> > > add
> > > > a
> > > > > timeout in the request (like createTopicRequest)?
> > > >
> > > > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> > > > longer
> > > > wait for responses from brokers, since the design has been changed so
> > > that
> > > > the
> > > > features information is propagated via ZK. Nevertheless, it is right to
> > > > have a timeout
> > > > for the request.
> > > >
> > > > > 100.2 The response schema is a bit weird. Typically, the response
> > just
> > > > > shows an error code and an error message, instead of echoing the
> > > request.
> > > >
> > > > (Kowshik): Great point! Yeah, I have modified it to just return an
> > error
> > > > code and a message.
> > > > Previously it was not echoing the "request", rather it was returning
> > the
> > > > latest set of
> > > > cluster-wide finalized features (after applying the updates). But you
> > are
> > > > right,
> > > > the additional info is not required, so I have removed it from the
> > > > response schema.
> > > >
> > > > > 100.3 Should we add a separate request to list/describe the existing
> > > > > features?
> > > >
> > > > (Kowshik): This is already present in the KIP via the
> > 'DescribeFeatures'
> > > > Admin API,
> > > > which, under the covers, uses the ApiVersionsRequest to list/describe
> > the
> > > > existing features. Please read the 'Tooling support' section.
> > > >
> > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > > > DELETE, the version field doesn't make sense. So, I guess the broker
> > > just
> > > > > ignores this? An alternative way is to have a separate
> > > > DeleteFeaturesRequest
> > > >
> > > > (Kowshik): Great point! I have modified the KIP now to have 2 separate
> > > > controller APIs
> > > > serving these different purposes:
> > > > 1. updateFeatures
> > > > 2. deleteFeatures
> > > >
> > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > increasing
> > > > > version of the metadata for finalized features." I am wondering why
> > the
> > > > > ordering is important?
> > > >
> > > > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > > > version), and
> > > > it is just the ZK node version. Basically, this is the epoch for the
> > > > cluster-wide
> > > > finalized feature version metadata. This metadata is served to clients
> > > via
> > > > the
> > > > ApiVersionsResponse (for reads). We propagate updates from the
> > > '/features'
> > > > ZK node
> > > > to all brokers, via ZK watches setup by each broker on the '/features'
> > > > node.
> > > >
> > > > Now here is why the ordering is important:
> > > > ZK watches don't propagate at the same time. As a result, the
> > > > ApiVersionsResponse
> > > > is eventually consistent across brokers. This can introduce cases
> > > > where clients see an older lower epoch of the features metadata, after
> > a
> > > > more recent
> > > > higher epoch was returned at a previous point in time. We expect
> > clients
> > > > to always employ the rule that the latest received higher epoch of
> > > metadata
> > > > always trumps an older smaller epoch. Those clients that are external
> > to
> > > > Kafka should strongly consider discovering the latest metadata once
> > > during
> > > > startup from the brokers, and if required refresh the metadata
> > > periodically
> > > > (to get the latest metadata).
> > > >
> > > > > 100.6 Could you specify the required ACL for this new request?
> > > >
> > > > (Kowshik): What is ACL, and how could I find out which one to specify?
> > > > Please could you provide me some pointers? I'll be glad to update the
> > > > KIP once I know the next steps.
> > > >
> > > > > 101. For the broker registration ZK node, should we bump up the
> > version
> > > > in
> > > > the json?
> > > >
> > > > (Kowshik): Great point! Done. I've increased the version in the broker
> > > > json by 1.
> > > >
> > > > > 102. For the /features ZK node, not sure if we need the epoch field.
> > > Each
> > > > > ZK node has an internal version field that is incremented on every
> > > > update.
> > > >
> > > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> > instead
> > > > of explicitly
> > > > incremented epoch.
> > > >
> > > > > 103. "Enabling the actual semantics of a feature version cluster-wide
> > > is
> > > > > left to the discretion of the logic implementing the feature (ex: can
> > > be
> > > > > done via dynamic broker config)." Does that mean the broker
> > > registration
> > > > ZK
> > > > > node will be updated dynamically when this happens?
> > > >
> > > > (Kowshik): Not really. The text was just conveying that a broker could
> > > > "know" of
> > > > a new feature version, but it does not mean the broker should have also
> > > > activated the effects of the feature version. Knowing vs activation
> > are 2
> > > > separate things,
> > > > and the latter can be achieved by dynamic config. I have reworded the
> > > text
> > > > to
> > > > make this clear to the reader.
> > > >
> > > >
> > > > > 104. UpdateMetadataRequest
> > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > included
> > > > > in the request. My understanding is that it's only included if (1)
> > > there
> > > > is
> > > > > a change to the finalized feature; (2) broker restart; (3) controller
> > > > > failover.
> > > > > 104.2 The new fields have the following versions. Why are the
> > versions
> > > 3+
> > > > > when the top version is bumped to 6?
> > > > >       "fields":  [
> > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > >           "about": "The name of the feature."},
> > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > >           "about": "The finalized version for the feature."}
> > > > >       ]
> > > >
> > > > (Kowshik): With the new improved design, we have completely eliminated
> > > the
> > > > need to
> > > > use UpdateMetadataRequest. This is because we now rely on ZK to deliver
> > > the
> > > > notifications for changes to the '/features' ZK node.
> > > >
> > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > > > better
> > > > > to use enable/disable?
> > > >
> > > > (Kowshik): For delete, yes, I have changed it so that we instead call
> > it
> > > > 'disable'.
> > > > However for 'update', it can now also refer to either an upgrade or a
> > > > forced downgrade.
> > > > Therefore, I have left it the way it is, just calling it as just
> > > 'update'.
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > >> Hi, Kowshik,
> > > >>
> > > >> Thanks for the KIP. Looks good overall. A few comments below.
> > > >>
> > > >> 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > >> 100.1 Since this request waits for responses from brokers, should we
> > > add a
> > > >> timeout in the request (like createTopicRequest)?
> > > >> 100.2 The response schema is a bit weird. Typically, the response just
> > > >> shows an error code and an error message, instead of echoing the
> > > request.
> > > >> 100.3 Should we add a separate request to list/describe the existing
> > > >> features?
> > > >> 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > >> DELETE, the version field doesn't make sense. So, I guess the broker
> > > just
> > > >> ignores this? An alternative way is to have a separate
> > > >> DeleteFeaturesRequest
> > > >> 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > > >> version of the metadata for finalized features." I am wondering why
> > the
> > > >> ordering is important?
> > > >> 100.6 Could you specify the required ACL for this new request?
> > > >>
> > > >> 101. For the broker registration ZK node, should we bump up the
> > version
> > > in
> > > >> the json?
> > > >>
> > > >> 102. For the /features ZK node, not sure if we need the epoch field.
> > > Each
> > > >> ZK node has an internal version field that is incremented on every
> > > update.
> > > >>
> > > >> 103. "Enabling the actual semantics of a feature version cluster-wide
> > is
> > > >> left to the discretion of the logic implementing the feature (ex: can
> > be
> > > >> done via dynamic broker config)." Does that mean the broker
> > registration
> > > >> ZK
> > > >> node will be updated dynamically when this happens?
> > > >>
> > > >> 104. UpdateMetadataRequest
> > > >> 104.1 It would be useful to describe when the feature metadata is
> > > included
> > > >> in the request. My understanding is that it's only included if (1)
> > there
> > > >> is
> > > >> a change to the finalized feature; (2) broker restart; (3) controller
> > > >> failover.
> > > >> 104.2 The new fields have the following versions. Why are the versions
> > > 3+
> > > >> when the top version is bumped to 6?
> > > >>       "fields":  [
> > > >>         {"name": "Name", "type":  "string", "versions":  "3+",
> > > >>           "about": "The name of the feature."},
> > > >>         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > >>           "about": "The finalized version for the feature."}
> > > >>       ]
> > > >>
> > > >> 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > > >> better
> > > >> to use enable/disable?
> > > >>
> > > >> Jun
> > > >>
> > > >> On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > kprakasam@confluent.io>
> > > >> wrote:
> > > >>
> > > >> > Hey Boyang,
> > > >> >
> > > >> > Thanks for the great feedback! I have updated the KIP based on your
> > > >> > feedback.
> > > >> > Please find my response below for your comments, look for sentences
> > > >> > starting
> > > >> > with "(Kowshik)" below.
> > > >> >
> > > >> >
> > > >> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > > >> could
> > > >> > be
> > > >> > > converted as "When is it safe for the brokers to start serving new
> > > >> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> > > the
> > > >> > > context.
> > > >> >
> > > >> > (Kowshik): Great point! Done.
> > > >> >
> > > >> > > 2. In the *Explanation *section, the metadata version number part
> > > >> seems a
> > > >> > > bit blurred. Could you point a reference to later section that we
> > > >> going
> > > >> > to
> > > >> > > store it in Zookeeper and update it every time when there is a
> > > feature
> > > >> > > change?
> > > >> >
> > > >> > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > > >> >
> > > >> >
> > > >> > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> > > for
> > > >> > > features such as group coordinator semantics, there is no legal
> > > >> scenario
> > > >> > to
> > > >> > > perform a downgrade at all. So having downgrade door open is
> > pretty
> > > >> > > error-prone as human faults happen all the time. I'm assuming as
> > new
> > > >> > > features are implemented, it's not very hard to add a flag during
> > > >> feature
> > > >> > > creation to indicate whether this feature is "downgradable". Could
> > > you
> > > >> > > explain a bit more on the extra engineering effort for shipping
> > this
> > > >> KIP
> > > >> > > with downgrade protection in place?
> > > >> >
> > > >> > (Kowshik): Great point! I'd agree and disagree here. While I agree
> > > that
> > > >> > accidental
> > > >> > downgrades can cause problems, I also think sometimes downgrades
> > > should
> > > >> > be allowed for emergency reasons (not all downgrades cause issues).
> > > >> > It is just subjective to the feature being downgraded.
> > > >> >
> > > >> > To be more strict about feature version downgrades, I have modified
> > > the
> > > >> KIP
> > > >> > proposing that we mandate a `--force-downgrade` flag be used in the
> > > >> > UPDATE_FEATURES api
> > > >> > and the tooling, whenever the human is downgrading a finalized
> > feature
> > > >> > version.
> > > >> > Hopefully this should cover the requirement, until we find the need
> > > for
> > > >> > advanced downgrade support.
> > > >> >
> > > >> > > 4. "Each broker’s supported dictionary of feature versions will be
> > > >> > defined
> > > >> > > in the broker code." So this means in order to restrict a certain
> > > >> > feature,
> > > >> > > we need to start the broker first and then send a feature gating
> > > >> request
> > > >> > > immediately, which introduces a time gap and the intended-to-close
> > > >> > feature
> > > >> > > could actually serve request during this phase. Do you think we
> > > should
> > > >> > also
> > > >> > > support configurations as well so that admin user could freely
> > roll
> > > >> up a
> > > >> > > cluster with all nodes complying the same feature gating, without
> > > >> > worrying
> > > >> > > about the turnaround time to propagate the message only after the
> > > >> cluster
> > > >> > > starts up?
> > > >> >
> > > >> > (Kowshik): This is a great point/question. One of the expectations
> > out
> > > >> of
> > > >> > this KIP, which is
> > > >> > already followed in the broker, is the following.
> > > >> >  - Imagine at time T1 the broker starts up and registers it’s
> > presence
> > > >> in
> > > >> > ZK,
> > > >> >    along with advertising it’s supported features.
> > > >> >  - Imagine at a future time T2 the broker receives the
> > > >> > UpdateMetadataRequest
> > > >> >    from the controller, which contains the latest finalized features
> > > as
> > > >> > seen by
> > > >> >    the controller. The broker validates this data against it’s
> > > supported
> > > >> > features to
> > > >> >    make sure there is no mismatch (it will shutdown if there is an
> > > >> > incompatibility).
> > > >> >
> > > >> > It is expected that during the time between the 2 events T1 and T2,
> > > the
> > > >> > broker is
> > > >> > almost a silent entity in the cluster. It does not add any value to
> > > the
> > > >> > cluster, or carry
> > > >> > out any important broker activities. By “important”, I mean it is
> > not
> > > >> doing
> > > >> > mutations
> > > >> > on it’s persistence, not mutating critical in-memory state, won’t be
> > > >> > serving
> > > >> > produce/fetch requests. Note it doesn’t even know it’s assigned
> > > >> partitions
> > > >> > until
> > > >> > it receives UpdateMetadataRequest from controller. Anything the
> > broker
> > > >> is
> > > >> > doing up
> > > >> > until this point is not damaging/useful.
> > > >> >
> > > >> > I’ve clarified the above in the KIP, see this new section:
> > > >> >
> > > >> >
> > > >>
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > >> > .
> > > >> >
> > > >> > > 5. "adding a new Feature, updating or deleting an existing
> > Feature",
> > > >> may
> > > >> > be
> > > >> > > I misunderstood something, I thought the features are defined in
> > > >> broker
> > > >> > > code, so admin could not really create a new feature?
> > > >> >
> > > >> > (Kowshik): Great point! You understood this right. Here adding a
> > > feature
> > > >> > means we are
> > > >> > adding a cluster-wide finalized *max* version for a feature that was
> > > >> > previously never finalized.
> > > >> > I have clarified this in the KIP now.
> > > >> >
> > > >> > > 6. I think we need a separate error code like
> > > >> FEATURE_UPDATE_IN_PROGRESS
> > > >> > to
> > > >> > > reject a concurrent feature update request.
> > > >> >
> > > >> > (Kowshik): Great point! I have modified the KIP adding the above
> > (see
> > > >> > 'Tooling support -> Admin API changes').
> > > >> >
> > > >> > > 7. I think we haven't discussed the alternative solution to pass
> > the
> > > >> > > feature information through Zookeeper. Is that mentioned in the
> > KIP
> > > to
> > > >> > > justify why using UpdateMetadata is more favorable?
> > > >> >
> > > >> > (Kowshik): Nice question! The broker reads finalized feature info
> > > >> stored in
> > > >> > ZK,
> > > >> > only during startup when it does a validation. When serving
> > > >> > `ApiVersionsRequest`, the
> > > >> > broker does not read this info from ZK directly. I'd imagine the
> > risk
> > > is
> > > >> > that it can increase
> > > >> > the ZK read QPS which can be a bottleneck for the system. Today, in
> > > >> Kafka
> > > >> > we use the
> > > >> > controller to fan out ZK updates to brokers and we want to stick to
> > > that
> > > >> > pattern to avoid
> > > >> > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > >> >
> > > >> > > 8. I was under the impression that user could configure a range of
> > > >> > > supported versions, what's the trade-off for allowing single
> > > finalized
> > > >> > > version only?
> > > >> >
> > > >> > (Kowshik): Great question! The finalized version of a feature
> > > basically
> > > >> > refers to
> > > >> > the cluster-wide finalized feature "maximum" version. For example,
> > if
> > > >> the
> > > >> > 'group_coordinator' feature
> > > >> > has the finalized version set to 10, then, it means that
> > cluster-wide
> > > >> all
> > > >> > versions upto v10 are
> > > >> > supported for this feature. However, note that if some version (ex:
> > > v0)
> > > >> > gets deprecated
> > > >> > for this feature, then we don’t convey that using this scheme (also
> > > >> > supporting deprecation is a non-goal).
> > > >> >
> > > >> > (Kowshik): I’ve now modified the KIP at all points, referring to
> > > >> finalized
> > > >> > feature "maximum" versions.
> > > >> >
> > > >> > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > >> > producer
> > > >> >
> > > >> > (Kowshik): Great point! Done.
> > > >> >
> > > >> >
> > > >> > Cheers,
> > > >> > Kowshik
> > > >> >
> > > >> >
> > > >> > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > reluctanthero104@gmail.com
> > > >> >
> > > >> > wrote:
> > > >> >
> > > >> > > Hey Kowshik,
> > > >> > >
> > > >> > > thanks for the revised KIP. Got a couple of questions:
> > > >> > >
> > > >> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > > >> could
> > > >> > be
> > > >> > > converted as "When is it safe for the brokers to start serving new
> > > >> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> > > the
> > > >> > > context.
> > > >> > >
> > > >> > > 2. In the *Explanation *section, the metadata version number part
> > > >> seems a
> > > >> > > bit blurred. Could you point a reference to later section that we
> > > >> going
> > > >> > to
> > > >> > > store it in Zookeeper and update it every time when there is a
> > > feature
> > > >> > > change?
> > > >> > >
> > > >> > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> > > for
> > > >> > > features such as group coordinator semantics, there is no legal
> > > >> scenario
> > > >> > to
> > > >> > > perform a downgrade at all. So having downgrade door open is
> > pretty
> > > >> > > error-prone as human faults happen all the time. I'm assuming as
> > new
> > > >> > > features are implemented, it's not very hard to add a flag during
> > > >> feature
> > > >> > > creation to indicate whether this feature is "downgradable". Could
> > > you
> > > >> > > explain a bit more on the extra engineering effort for shipping
> > this
> > > >> KIP
> > > >> > > with downgrade protection in place?
> > > >> > >
> > > >> > > 4. "Each broker’s supported dictionary of feature versions will be
> > > >> > defined
> > > >> > > in the broker code." So this means in order to restrict a certain
> > > >> > feature,
> > > >> > > we need to start the broker first and then send a feature gating
> > > >> request
> > > >> > > immediately, which introduces a time gap and the intended-to-close
> > > >> > feature
> > > >> > > could actually serve request during this phase. Do you think we
> > > should
> > > >> > also
> > > >> > > support configurations as well so that admin user could freely
> > roll
> > > >> up a
> > > >> > > cluster with all nodes complying the same feature gating, without
> > > >> > worrying
> > > >> > > about the turnaround time to propagate the message only after the
> > > >> cluster
> > > >> > > starts up?
> > > >> > >
> > > >> > > 5. "adding a new Feature, updating or deleting an existing
> > Feature",
> > > >> may
> > > >> > be
> > > >> > > I misunderstood something, I thought the features are defined in
> > > >> broker
> > > >> > > code, so admin could not really create a new feature?
> > > >> > >
> > > >> > > 6. I think we need a separate error code like
> > > >> FEATURE_UPDATE_IN_PROGRESS
> > > >> > to
> > > >> > > reject a concurrent feature update request.
> > > >> > >
> > > >> > > 7. I think we haven't discussed the alternative solution to pass
> > the
> > > >> > > feature information through Zookeeper. Is that mentioned in the
> > KIP
> > > to
> > > >> > > justify why using UpdateMetadata is more favorable?
> > > >> > >
> > > >> > > 8. I was under the impression that user could configure a range of
> > > >> > > supported versions, what's the trade-off for allowing single
> > > finalized
> > > >> > > version only?
> > > >> > >
> > > >> > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > >> > producer
> > > >> > >
> > > >> > > Boyang
> > > >> > >
> > > >> > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
> > > >> wrote:
> > > >> > >
> > > >> > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > >> > > > > Hi Colin,
> > > >> > > > >
> > > >> > > > > Thanks for the feedback! I've changed the KIP to address your
> > > >> > > > > suggestions.
> > > >> > > > > Please find below my explanation. Here is a link to KIP 584:
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > >> > > > > .
> > > >> > > > >
> > > >> > > > > 1. '__data_version__' is the version of the finalized feature
> > > >> > metadata
> > > >> > > > > (i.e. actual ZK node contents), while the '__schema_version__'
> > > is
> > > >> the
> > > >> > > > > version of the schema of the data persisted in ZK. These serve
> > > >> > > different
> > > >> > > > > purposes. '__data_version__' is useful mainly to clients
> > > during
> > > >> > > reads,
> > > >> > > > > to differentiate between the 2 versions of eventually
> > consistent
> > > >> > > > 'finalized
> > > >> > > > > features' metadata (i.e. larger metadata version is more
> > > recent).
> > > >> > > > > '__schema_version__' provides an additional degree of
> > > flexibility,
> > > >> > > where
> > > >> > > > if
> > > >> > > > > we decide to change the schema for '/features' node in ZK (in
> > > the
> > > >> > > > future),
> > > >> > > > > then we can manage broker roll outs suitably (i.e.
> > > >> > > > > serialization/deserialization of the ZK data can be handled
> > > >> safely).
> > > >> > > >
> > > >> > > > Hi Kowshik,
> > > >> > > >
> > > >> > > > If you're talking about a number that lets you know if data is
> > > more
> > > >> or
> > > >> > > > less recent, we would typically call that an epoch, and not a
> > > >> version.
> > > >> > > For
> > > >> > > > the ZK data structures, the word "version" is typically reserved
> > > for
> > > >> > > > describing changes to the overall schema of the data that is
> > > >> written to
> > > >> > > > ZooKeeper.  We don't even really change the "version" of those
> > > >> schemas
> > > >> > > that
> > > >> > > > much, since most changes are backwards-compatible.  But we do
> > > >> include
> > > >> > > that
> > > >> > > > version field just in case.
> > > >> > > >
> > > >> > > > I don't think we really need an epoch here, though, since we can
> > > >> just
> > > >> > > look
> > > >> > > > at the broker epoch.  Whenever the broker registers, its epoch
> > > will
> > > >> be
> > > >> > > > greater than the previous broker epoch.  And the newly
> > registered
> > > >> data
> > > >> > > will
> > > >> > > > take priority.  This will be a lot simpler than adding a
> > separate
> > > >> epoch
> > > >> > > > system, I think.
> > > >> > > >
> > > >> > > > >
> > > >> > > > > 2. Regarding admin client needing min and max information -
> > you
> > > >> are
> > > >> > > > right!
> > > >> > > > > I've changed the KIP such that the Admin API also allows the
> > > user
> > > >> to
> > > >> > > read
> > > >> > > > > 'supported features' from a specific broker. Please look at
> > the
> > > >> > section
> > > >> > > > > "Admin API changes".
> > > >> > > >
> > > >> > > > Thanks.
> > > >> > > >
> > > >> > > > >
> > > >> > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > > deliberate.
> > > >> > I've
> > > >> > > > > improved the KIP to just use `long` at all places.
> > > >> > > >
> > > >> > > > Sounds good.
> > > >> > > >
> > > >> > > > >
> > > >> > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> > > I've
> > > >> > > > updated
> > > >> > > > > the KIP sketching the functionality provided by this tool,
> > with
> > > >> some
> > > >> > > > > examples. Please look at the section "Tooling support
> > examples".
> > > >> > > > >
> > > >> > > > > Thank you!
> > > >> > > >
> > > >> > > >
> > > >> > > > Thanks, Kowshik.
> > > >> > > >
> > > >> > > > cheers,
> > > >> > > > Colin
> > > >> > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > Cheers,
> > > >> > > > > Kowshik
> > > >> > > > >
> > > >> > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > cmccabe@apache.org
> > > >> >
> > > >> > > > wrote:
> > > >> > > > >
> > > >> > > > > > Thanks, Kowshik, this looks good.
> > > >> > > > > >
> > > >> > > > > > In the "Schema" section, do we really need both
> > > >> __schema_version__
> > > >> > > and
> > > >> > > > > > __data_version__?  Can we just have a single version field
> > > here?
> > > >> > > > > >
> > > >> > > > > > Shouldn't the Admin(Client) function have some way to get
> > the
> > > >> min
> > > >> > and
> > > >> > > > max
> > > >> > > > > > information that we're exposing as well?  I guess we could
> > > have
> > > >> > min,
> > > >> > > > max,
> > > >> > > > > > and current.  Unrelated: is the use of Long rather than long
> > > >> > > deliberate
> > > >> > > > > > here?
> > > >> > > > > >
> > > >> > > > > > It would be good to describe how the command line tool
> > > >> > > > > > kafka.admin.FeatureCommand will work.  For example the flags
> > > >> that
> > > >> > it
> > > >> > > > will
> > > >> > > > > > take and the output that it will generate to STDOUT.
> > > >> > > > > >
> > > >> > > > > > cheers,
> > > >> > > > > > Colin
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > >> > > > > > > Hi all,
> > > >> > > > > > >
> > > >> > > > > > > I've opened KIP-584
> > > >> > > > > > > which
> > > >> > > > > > > is intended to provide a versioning scheme for features.
> > I'd
> > > >> like
> > > >> > > to
> > > >> > > > use
> > > >> > > > > > > this thread to discuss the same. I'd appreciate any
> > feedback
> > > >> on
> > > >> > > this.
> > > >> > > > > > > Here
> > > >> > > > > > > is a link to KIP-584:
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > >> > > > > > >  .
> > > >> > > > > > >
> > > >> > > > > > > Thank you!
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > Cheers,
> > > >> > > > > > > Kowshik
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hey Boyang,

Thanks for the feedback! I've updated the KIP. Please find below my
response.

> 1. Do you mind updating the non-goal section as we are introducing a
> --feature-force-downgrade to address downgrade concern?

(Kowshik): This was already mentioned. Look for non-goal: 1-b.

> 2. For the flags `--feature` seems to be a redundant prefix, as the script
> is already called `kafka-features.sh`. They could just be called
> `--upgrade` and `--force-downgrade`.

(Kowshik): Great point! Done.

> 3. I don't feel strong to require a confirmation for a normal feature
> upgrade, unless there are other existing scripts doing so.

(Kowshik): Done. Removed now. We now ask for a confirmation only for
downgrades.

> 4. How could we know the existing feature versions when user are only
> executing upgrades? Does the `kafka-features.sh` always send a
> DescribeFeatureRequest to broker first?

(Kowshik): For deletes, yes it will make an ApiVersionsRequest call to show
the
versions of the features. Perhaps the ApiVersionsRequest can be sent
to just the controller to avoid questions on consistency, but that's
an implementation detail.

> 5. I'm not 100% sure, but a script usually use the same flag once, so
maybe
> we should also do that for `--upgrade-feature`? Instead of flagging twice
> for different features, a comma separated list of (feature:max_version)
> will be expected, or something like that.

(Kowshik): Done. I'm using a comma-separated list now.
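
For example, an invocation could then look roughly like "./bin/kafka-features.sh --upgrade group_coordinator:2,transaction_protocol:3" (the feature names, versions, and exact option syntax here are only illustrative; the KIP text is the source of truth), and a forced downgrade would pass a similar comma-separated list to --force-downgrade.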

> 6. "The node data shall be readable via existing ZK tooling" Just trying
to
> clarify, we are not introducing ZK direct read tool in this KIP correct?
As
> for KIP-500 we are
eventually going to deprecate all direct ZK access tools.

(Kowshik): Done. Yes, we are not intending to add such a tool. I was just
saying that
if we ever want to read it from ZK, then it's readable via ZK cli (in the
interim).
I have modified the text conveying the intent to support reads via
ApiVersionsRequest only (optionally this request can be directed at the
controller to
avoid questions on consistency, but that's an implementation detail).

> 7. Could we have a centralized section called `Public Interfaces` to
> summarize all the public API changes? This is a required section in a KIP.
> And we should also write down the new error codes we will be introducing
in
> this KIP, and include both new and old error codes in the Response schema
> comment if possible. For example, UpdateFeatureResponse could expect a
> `NOT_CONTROLLER` error code.

(Kowshik): Done. The error codes have been documented in the response
schemas now.
Added a new section titled "New or Changed Public Interfaces" summarizing
only the
changes made to the public interfaces.
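
As a rough illustration, the error fields in such a response schema would look along the lines of the snippets quoted elsewhere in this thread (the exact field names and versions are whatever the KIP specifies; this is only a sketch):

      "fields": [
        {"name": "ErrorCode", "type": "int16", "versions": "0+",
          "about": "The error code, or 0 if there was no error."},
        {"name": "ErrorMessage", "type": "string", "versions": "0+",
          "about": "The error message, or null if there was no error."}
      ]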


Cheers,
Kowshik


On Fri, Apr 3, 2020 at 9:39 AM Boyang Chen <re...@gmail.com>
wrote:

> Hey Kowshik,
>
> thanks for getting the KIP updated. The Zookeeper routing approach makes
> sense and simplifies the changes.
> Some follow-ups:
>
> 1. Do you mind updating the non-goal section as we are introducing a
> --feature-force-downgrade to address downgrade concern?
>
> 2. For the flags `--feature` seems to be a redundant prefix, as the script
> is already called `kafka-features.sh`. They could just be called
> `--upgrade` and `--force-downgrade`.
>
> 3. I don't feel strong to require a confirmation for a normal feature
> upgrade, unless there are other existing scripts doing so.
>
> 4. How could we know the existing feature versions when user are only
> executing upgrades? Does the `kafka-features.sh` always send a
> DescribeFeatureRequest to broker first?
>
> 5. I'm not 100% sure, but a script usually use the same flag once, so maybe
> we should also do that for `--upgrade-feature`? Instead of flagging twice
> for different features, a comma separated list of (feature:max_version)
> will be expected, or something like that.
>
> 6. "The node data shall be readable via existing ZK tooling" Just trying to
> clarify, we are not introducing ZK direct read tool in this KIP correct? As
> for KIP-500 we are
> eventually going to deprecate all direct ZK access tools.
>
> 7. Could we have a centralized section called `Public Interfaces` to
> summarize all the public API changes? This is a required section in a KIP.
> And we should also write down the new error codes we will be introducing in
> this KIP, and include both new and old error codes in the Response schema
> comment if possible. For example, UpdateFeatureResponse could expect a
> `NOT_CONTROLLER` error code.
>
>
> Boyang
>
> On Fri, Apr 3, 2020 at 3:15 AM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hi all,
> >
> > Any other feedback on this KIP before we start the vote?
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
> > wrote:
> >
> > > Hey Jun,
> > >
> > > Thanks a lot for the great feedback! Please note that the design
> > > has changed a little bit on the KIP, and we now propagate the finalized
> > > features metadata only via ZK watches (instead of UpdateMetadataRequest
> > > from the controller).
> > >
> > > Please find below my response to your questions/feedback, with the
> prefix
> > > "(Kowshik):".
> > >
> > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > 100.1 Since this request waits for responses from brokers, should we
> > add
> > > a
> > > > timeout in the request (like createTopicRequest)?
> > >
> > > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> > > longer
> > > wait for responses from brokers, since the design has been changed so
> > that
> > > the
> > > features information is propagated via ZK. Nevertheless, it is right to
> > > have a timeout
> > > for the request.
> > >
> > > > 100.2 The response schema is a bit weird. Typically, the response
> just
> > > > shows an error code and an error message, instead of echoing the
> > request.
> > >
> > > (Kowshik): Great point! Yeah, I have modified it to just return an
> error
> > > code and a message.
> > > Previously it was not echoing the "request", rather it was returning
> the
> > > latest set of
> > > cluster-wide finalized features (after applying the updates). But you
> are
> > > right,
> > > the additional info is not required, so I have removed it from the
> > > response schema.
> > >
> > > > 100.3 Should we add a separate request to list/describe the existing
> > > > features?
> > >
> > > (Kowshik): This is already present in the KIP via the
> 'DescribeFeatures'
> > > Admin API,
> > > which, under the covers, uses the ApiVersionsRequest to list/describe
> the
> > > existing features. Please read the 'Tooling support' section.
> > >
> > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > > DELETE, the version field doesn't make sense. So, I guess the broker
> > just
> > > > ignores this? An alternative way is to have a separate
> > > DeleteFeaturesRequest
> > >
> > > (Kowshik): Great point! I have modified the KIP now to have 2 separate
> > > controller APIs
> > > serving these different purposes:
> > > 1. updateFeatures
> > > 2. deleteFeatures
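
Purely as an illustration of that split (the signatures below are made up, not the KIP's actual API):

import java.util.Map;
import java.util.Set;
import org.apache.kafka.common.KafkaFuture;

interface FeatureControllerApi {
    // Finalize or bump the cluster-wide max version of one or more features.
    KafkaFuture<Void> updateFeatures(Map<String, Long> featureToFinalizedMaxVersion);

    // Remove the finalized entry for one or more features.
    KafkaFuture<Void> deleteFeatures(Set<String> featureNames);
}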
> > >
> > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> increasing
> > > > version of the metadata for finalized features." I am wondering why
> the
> > > > ordering is important?
> > >
> > > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > > version), and
> > > it is just the ZK node version. Basically, this is the epoch for the
> > > cluster-wide
> > > finalized feature version metadata. This metadata is served to clients
> > via
> > > the
> > > ApiVersionsResponse (for reads). We propagate updates from the
> > '/features'
> > > ZK node
> > > to all brokers, via ZK watches setup by each broker on the '/features'
> > > node.
> > >
> > > Now here is why the ordering is important:
> > > ZK watches don't propagate at the same time. As a result, the
> > > ApiVersionsResponse
> > > is eventually consistent across brokers. This can introduce cases
> > > where clients see an older lower epoch of the features metadata, after
> a
> > > more recent
> > > higher epoch was returned at a previous point in time. We expect
> clients
> > > to always employ the rule that the latest received higher epoch of
> > metadata
> > > always trumps an older smaller epoch. Those clients that are external
> to
> > > Kafka should strongly consider discovering the latest metadata once
> > during
> > > startup from the brokers, and if required refresh the metadata
> > periodically
> > > (to get the latest metadata).
> > >
> > > > 100.6 Could you specify the required ACL for this new request?
> > >
> > > (Kowshik): What is an ACL, and how could I find out which one to specify?
> > > Please could you provide me some pointers? I'll be glad to update the
> > > KIP once I know the next steps.
> > >
> > > > 101. For the broker registration ZK node, should we bump up the
> version
> > > in
> > > the json?
> > >
> > > (Kowshik): Great point! Done. I've increased the version in the broker
> > > json by 1.
> > >
> > > > 102. For the /features ZK node, not sure if we need the epoch field.
> > Each
> > > > ZK node has an internal version field that is incremented on every
> > > update.
> > >
> > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> instead
> > > of explicitly
> > > incremented epoch.
> > >
> > > > 103. "Enabling the actual semantics of a feature version cluster-wide
> > is
> > > > left to the discretion of the logic implementing the feature (ex: can
> > be
> > > > done via dynamic broker config)." Does that mean the broker
> > registration
> > > ZK
> > > > node will be updated dynamically when this happens?
> > >
> > > (Kowshik): Not really. The text was just conveying that a broker could
> > > "know" of
> > > a new feature version, but it does not mean the broker should have also
> > > activated the effects of the feature version. Knowing vs activation
> are 2
> > > separate things,
> > > and the latter can be achieved by dynamic config. I have reworded the
> > text
> > > to
> > > make this clear to the reader.
> > >
> > >
> > > > 104. UpdateMetadataRequest
> > > > 104.1 It would be useful to describe when the feature metadata is
> > > included
> > > > in the request. My understanding is that it's only included if (1)
> > there
> > > is
> > > > a change to the finalized feature; (2) broker restart; (3) controller
> > > > failover.
> > > > 104.2 The new fields have the following versions. Why are the
> versions
> > 3+
> > > > when the top version is bumped to 6?
> > > >       "fields":  [
> > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > >           "about": "The name of the feature."},
> > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > >           "about": "The finalized version for the feature."}
> > > >       ]
> > >
> > > (Kowshik): With the new improved design, we have completely eliminated
> > the
> > > need to
> > > use UpdateMetadataRequest. This is because we now rely on ZK to deliver
> > the
> > > notifications for changes to the '/features' ZK node.
> > >
> > > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > > better
> > > > to use enable/disable?
> > >
> > > (Kowshik): For delete, yes, I have changed it so that we instead call
> it
> > > 'disable'.
> > > However for 'update', it can now also refer to either an upgrade or a
> > > forced downgrade.
> > > Therefore, I have left it the way it is, just calling it
> > 'update'.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > >> Hi, Kowshik,
> > >>
> > >> Thanks for the KIP. Looks good overall. A few comments below.
> > >>
> > >> 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > >> 100.1 Since this request waits for responses from brokers, should we
> > add a
> > >> timeout in the request (like createTopicRequest)?
> > >> 100.2 The response schema is a bit weird. Typically, the response just
> > >> shows an error code and an error message, instead of echoing the
> > request.
> > >> 100.3 Should we add a separate request to list/describe the existing
> > >> features?
> > >> 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > >> DELETE, the version field doesn't make sense. So, I guess the broker
> > just
> > >> ignores this? An alternative way is to have a separate
> > >> DeleteFeaturesRequest
> > >> 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > >> version of the metadata for finalized features." I am wondering why
> the
> > >> ordering is important?
> > >> 100.6 Could you specify the required ACL for this new request?
> > >>
> > >> 101. For the broker registration ZK node, should we bump up the
> version
> > in
> > >> the json?
> > >>
> > >> 102. For the /features ZK node, not sure if we need the epoch field.
> > Each
> > >> ZK node has an internal version field that is incremented on every
> > update.
> > >>
> > >> 103. "Enabling the actual semantics of a feature version cluster-wide
> is
> > >> left to the discretion of the logic implementing the feature (ex: can
> be
> > >> done via dynamic broker config)." Does that mean the broker
> registration
> > >> ZK
> > >> node will be updated dynamically when this happens?
> > >>
> > >> 104. UpdateMetadataRequest
> > >> 104.1 It would be useful to describe when the feature metadata is
> > included
> > >> in the request. My understanding is that it's only included if (1)
> there
> > >> is
> > >> a change to the finalized feature; (2) broker restart; (3) controller
> > >> failover.
> > >> 104.2 The new fields have the following versions. Why are the versions
> > 3+
> > >> when the top version is bumped to 6?
> > >>       "fields":  [
> > >>         {"name": "Name", "type":  "string", "versions":  "3+",
> > >>           "about": "The name of the feature."},
> > >>         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > >>           "about": "The finalized version for the feature."}
> > >>       ]
> > >>
> > >> 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > >> better
> > >> to use enable/disable?
> > >>
> > >> Jun
> > >>
> > >> On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > kprakasam@confluent.io>
> > >> wrote:
> > >>
> > >> > Hey Boyang,
> > >> >
> > >> > Thanks for the great feedback! I have updated the KIP based on your
> > >> > feedback.
> > >> > Please find my response below for your comments, look for sentences
> > >> > starting
> > >> > with "(Kowshik)" below.
> > >> >
> > >> >
> > >> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > >> could
> > >> > be
> > >> > > converted as "When is it safe for the brokers to start serving new
> > >> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> > the
> > >> > > context.
> > >> >
> > >> > (Kowshik): Great point! Done.
> > >> >
> > >> > > 2. In the *Explanation *section, the metadata version number part
> > >> seems a
> > >> > > bit blurred. Could you point a reference to later section that we
> > >> going
> > >> > to
> > >> > > store it in Zookeeper and update it every time when there is a
> > feature
> > >> > > change?
> > >> >
> > >> > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > >> >
> > >> >
> > >> > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> > for
> > >> > > features such as group coordinator semantics, there is no legal
> > >> scenario
> > >> > to
> > >> > > perform a downgrade at all. So having downgrade door open is
> pretty
> > >> > > error-prone as human faults happen all the time. I'm assuming as
> new
> > >> > > features are implemented, it's not very hard to add a flag during
> > >> feature
> > >> > > creation to indicate whether this feature is "downgradable". Could
> > you
> > >> > > explain a bit more on the extra engineering effort for shipping
> this
> > >> KIP
> > >> > > with downgrade protection in place?
> > >> >
> > >> > (Kowshik): Great point! I'd agree and disagree here. While I agree
> > that
> > >> > accidental
> > >> > downgrades can cause problems, I also think sometimes downgrades
> > should
> > >> > be allowed for emergency reasons (not all downgrades cause issues).
> > >> > It is just subjective to the feature being downgraded.
> > >> >
> > >> > To be more strict about feature version downgrades, I have modified
> > the
> > >> KIP
> > >> > proposing that we mandate a `--force-downgrade` flag be used in the
> > >> > UPDATE_FEATURES api
> > >> > and the tooling, whenever the human is downgrading a finalized
> feature
> > >> > version.
> > >> > Hopefully this should cover the requirement, until we find the need
> > for
> > >> > advanced downgrade support.
> > >> >
> > >> > > 4. "Each broker’s supported dictionary of feature versions will be
> > >> > defined
> > >> > > in the broker code." So this means in order to restrict a certain
> > >> > feature,
> > >> > > we need to start the broker first and then send a feature gating
> > >> request
> > >> > > immediately, which introduces a time gap and the intended-to-close
> > >> > feature
> > >> > > could actually serve request during this phase. Do you think we
> > should
> > >> > also
> > >> > > support configurations as well so that admin user could freely
> roll
> > >> up a
> > >> > > cluster with all nodes complying the same feature gating, without
> > >> > worrying
> > >> > > about the turnaround time to propagate the message only after the
> > >> cluster
> > >> > > starts up?
> > >> >
> > >> > (Kowshik): This is a great point/question. One of the expectations
> out
> > >> of
> > >> > this KIP, which is
> > >> > already followed in the broker, is the following.
> > >> >  - Imagine at time T1 the broker starts up and registers its
> presence
> > >> in
> > >> > ZK,
> > >> >    along with advertising its supported features.
> > >> >  - Imagine at a future time T2 the broker receives the
> > >> > UpdateMetadataRequest
> > >> >    from the controller, which contains the latest finalized features
> > as
> > >> > seen by
> > >> >    the controller. The broker validates this data against its
> > supported
> > >> > features to
> > >> >    make sure there is no mismatch (it will shutdown if there is an
> > >> > incompatibility).
> > >> >
> > >> > It is expected that during the time between the 2 events T1 and T2,
> > the
> > >> > broker is
> > >> > almost a silent entity in the cluster. It does not add any value to
> > the
> > >> > cluster, or carry
> > >> > out any important broker activities. By “important”, I mean it is
> not
> > >> doing
> > >> > mutations
> > >> > on it’s persistence, not mutating critical in-memory state, won’t be
> > >> > serving
> > >> > produce/fetch requests. Note it doesn’t even know it’s assigned
> > >> partitions
> > >> > until
> > >> > it receives UpdateMetadataRequest from controller. Anything the
> broker
> > >> is
> > >> > doing up
> > >> > until this point is not damaging/useful.
> > >> >
> > >> > I’ve clarified the above in the KIP, see this new section:
> > >> >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > >> > .
> > >> >
> > >> > > 5. "adding a new Feature, updating or deleting an existing
> Feature",
> > >> may
> > >> > be
> > >> > > I misunderstood something, I thought the features are defined in
> > >> broker
> > >> > > code, so admin could not really create a new feature?
> > >> >
> > >> > (Kowshik): Great point! You understood this right. Here adding a
> > feature
> > >> > means we are
> > >> > adding a cluster-wide finalized *max* version for a feature that was
> > >> > previously never finalized.
> > >> > I have clarified this in the KIP now.
> > >> >
> > >> > > 6. I think we need a separate error code like
> > >> FEATURE_UPDATE_IN_PROGRESS
> > >> > to
> > >> > > reject a concurrent feature update request.
> > >> >
> > >> > (Kowshik): Great point! I have modified the KIP adding the above
> (see
> > >> > 'Tooling support -> Admin API changes').
> > >> >
> > >> > > 7. I think we haven't discussed the alternative solution to pass
> the
> > >> > > feature information through Zookeeper. Is that mentioned in the
> KIP
> > to
> > >> > > justify why using UpdateMetadata is more favorable?
> > >> >
> > >> > (Kowshik): Nice question! The broker reads finalized feature info
> > >> stored in
> > >> > ZK,
> > >> > only during startup when it does a validation. When serving
> > >> > `ApiVersionsRequest`, the
> > >> > broker does not read this info from ZK directly. I'd imagine the
> risk
> > is
> > >> > that it can increase
> > >> > the ZK read QPS which can be a bottleneck for the system. Today, in
> > >> Kafka
> > >> > we use the
> > >> > controller to fan out ZK updates to brokers and we want to stick to
> > that
> > >> > pattern to avoid
> > >> > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > >> >
> > >> > > 8. I was under the impression that user could configure a range of
> > >> > > supported versions, what's the trade-off for allowing single
> > finalized
> > >> > > version only?
> > >> >
> > >> > (Kowshik): Great question! The finalized version of a feature
> > basically
> > >> > refers to
> > >> > the cluster-wide finalized feature "maximum" version. For example,
> if
> > >> the
> > >> > 'group_coordinator' feature
> > >> > has the finalized version set to 10, then, it means that
> cluster-wide
> > >> all
> > >> > versions up to v10 are
> > >> > supported for this feature. However, note that if some version (ex:
> > v0)
> > >> > gets deprecated
> > >> > for this feature, then we don’t convey that using this scheme (also
> > >> > supporting deprecation is a non-goal).
> > >> >
> > >> > (Kowshik): I’ve now modified the KIP at all points, referring to
> > >> finalized
> > >> > feature "maximum" versions.
> > >> >
> > >> > > 9. One minor syntax fix: Note that here the "client" here may be a
> > >> > producer
> > >> >
> > >> > (Kowshik): Great point! Done.
> > >> >
> > >> >
> > >> > Cheers,
> > >> > Kowshik
> > >> >
> > >> >
> > >> > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > reluctanthero104@gmail.com
> > >> >
> > >> > wrote:
> > >> >
> > >> > > Hey Kowshik,
> > >> > >
> > >> > > thanks for the revised KIP. Got a couple of questions:
> > >> > >
> > >> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > >> could
> > >> > be
> > >> > > converted as "When is it safe for the brokers to start serving new
> > >> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> > the
> > >> > > context.
> > >> > >
> > >> > > 2. In the *Explanation *section, the metadata version number part
> > >> seems a
> > >> > > bit blurred. Could you point a reference to later section that we
> > >> going
> > >> > to
> > >> > > store it in Zookeeper and update it every time when there is a
> > feature
> > >> > > change?
> > >> > >
> > >> > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> > for
> > >> > > features such as group coordinator semantics, there is no legal
> > >> scenario
> > >> > to
> > >> > > perform a downgrade at all. So having downgrade door open is
> pretty
> > >> > > error-prone as human faults happen all the time. I'm assuming as
> new
> > >> > > features are implemented, it's not very hard to add a flag during
> > >> feature
> > >> > > creation to indicate whether this feature is "downgradable". Could
> > you
> > >> > > explain a bit more on the extra engineering effort for shipping
> this
> > >> KIP
> > >> > > with downgrade protection in place?
> > >> > >
> > >> > > 4. "Each broker’s supported dictionary of feature versions will be
> > >> > defined
> > >> > > in the broker code." So this means in order to restrict a certain
> > >> > feature,
> > >> > > we need to start the broker first and then send a feature gating
> > >> request
> > >> > > immediately, which introduces a time gap and the intended-to-close
> > >> > feature
> > >> > > could actually serve request during this phase. Do you think we
> > should
> > >> > also
> > >> > > support configurations as well so that admin user could freely
> roll
> > >> up a
> > >> > > cluster with all nodes complying the same feature gating, without
> > >> > worrying
> > >> > > about the turnaround time to propagate the message only after the
> > >> cluster
> > >> > > starts up?
> > >> > >
> > >> > > 5. "adding a new Feature, updating or deleting an existing
> Feature",
> > >> may
> > >> > be
> > >> > > I misunderstood something, I thought the features are defined in
> > >> broker
> > >> > > code, so admin could not really create a new feature?
> > >> > >
> > >> > > 6. I think we need a separate error code like
> > >> FEATURE_UPDATE_IN_PROGRESS
> > >> > to
> > >> > > reject a concurrent feature update request.
> > >> > >
> > >> > > 7. I think we haven't discussed the alternative solution to pass
> the
> > >> > > feature information through Zookeeper. Is that mentioned in the
> KIP
> > to
> > >> > > justify why using UpdateMetadata is more favorable?
> > >> > >
> > >> > > 8. I was under the impression that user could configure a range of
> > >> > > supported versions, what's the trade-off for allowing single
> > finalized
> > >> > > version only?
> > >> > >
> > >> > > 9. One minor syntax fix: Note that here the "client" here may be a
> > >> > producer
> > >> > >
> > >> > > Boyang
> > >> > >
> > >> > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
> > >> wrote:
> > >> > >
> > >> > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > >> > > > > Hi Colin,
> > >> > > > >
> > >> > > > > Thanks for the feedback! I've changed the KIP to address your
> > >> > > > > suggestions.
> > >> > > > > Please find below my explanation. Here is a link to KIP 584:
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > >> > > > > .
> > >> > > > >
> > >> > > > > 1. '__data_version__' is the version of the finalized feature
> > >> > metadata
> > >> > > > > (i.e. actual ZK node contents), while the '__schema_version__'
> > is
> > >> the
> > >> > > > > version of the schema of the data persisted in ZK. These serve
> > >> > > different
> > >> > > > > purposes. '__data_version__' is useful mainly to clients
> > during
> > >> > > reads,
> > >> > > > > to differentiate between the 2 versions of eventually
> consistent
> > >> > > > 'finalized
> > >> > > > > features' metadata (i.e. larger metadata version is more
> > recent).
> > >> > > > > '__schema_version__' provides an additional degree of
> > flexibility,
> > >> > > where
> > >> > > > if
> > >> > > > > we decide to change the schema for '/features' node in ZK (in
> > the
> > >> > > > future),
> > >> > > > > then we can manage broker roll outs suitably (i.e.
> > >> > > > > serialization/deserialization of the ZK data can be handled
> > >> safely).
> > >> > > >
> > >> > > > Hi Kowshik,
> > >> > > >
> > >> > > > If you're talking about a number that lets you know if data is
> > more
> > >> or
> > >> > > > less recent, we would typically call that an epoch, and not a
> > >> version.
> > >> > > For
> > >> > > > the ZK data structures, the word "version" is typically reserved
> > for
> > >> > > > describing changes to the overall schema of the data that is
> > >> written to
> > >> > > > ZooKeeper.  We don't even really change the "version" of those
> > >> schemas
> > >> > > that
> > >> > > > much, since most changes are backwards-compatible.  But we do
> > >> include
> > >> > > that
> > >> > > > version field just in case.
> > >> > > >
> > >> > > > I don't think we really need an epoch here, though, since we can
> > >> just
> > >> > > look
> > >> > > > at the broker epoch.  Whenever the broker registers, its epoch
> > will
> > >> be
> > >> > > > greater than the previous broker epoch.  And the newly
> registered
> > >> data
> > >> > > will
> > >> > > > take priority.  This will be a lot simpler than adding a
> separate
> > >> epoch
> > >> > > > system, I think.
> > >> > > >
> > >> > > > >
> > >> > > > > 2. Regarding admin client needing min and max information -
> you
> > >> are
> > >> > > > right!
> > >> > > > > I've changed the KIP such that the Admin API also allows the
> > user
> > >> to
> > >> > > read
> > >> > > > > 'supported features' from a specific broker. Please look at
> the
> > >> > section
> > >> > > > > "Admin API changes".
> > >> > > >
> > >> > > > Thanks.
> > >> > > >
> > >> > > > >
> > >> > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > deliberate.
> > >> > I've
> > >> > > > > improved the KIP to just use `long` at all places.
> > >> > > >
> > >> > > > Sounds good.
> > >> > > >
> > >> > > > >
> > >> > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> > I've
> > >> > > > updated
> > >> > > > > the KIP sketching the functionality provided by this tool,
> with
> > >> some
> > >> > > > > examples. Please look at the section "Tooling support
> examples".
> > >> > > > >
> > >> > > > > Thank you!
> > >> > > >
> > >> > > >
> > >> > > > Thanks, Kowshik.
> > >> > > >
> > >> > > > cheers,
> > >> > > > Colin
> > >> > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > Cheers,
> > >> > > > > Kowshik
> > >> > > > >
> > >> > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > cmccabe@apache.org
> > >> >
> > >> > > > wrote:
> > >> > > > >
> > >> > > > > > Thanks, Kowshik, this looks good.
> > >> > > > > >
> > >> > > > > > In the "Schema" section, do we really need both
> > >> __schema_version__
> > >> > > and
> > >> > > > > > __data_version__?  Can we just have a single version field
> > here?
> > >> > > > > >
> > >> > > > > > Shouldn't the Admin(Client) function have some way to get
> the
> > >> min
> > >> > and
> > >> > > > max
> > >> > > > > > information that we're exposing as well?  I guess we could
> > have
> > >> > min,
> > >> > > > max,
> > >> > > > > > and current.  Unrelated: is the use of Long rather than long
> > >> > > deliberate
> > >> > > > > > here?
> > >> > > > > >
> > >> > > > > > It would be good to describe how the command line tool
> > >> > > > > > kafka.admin.FeatureCommand will work.  For example the flags
> > >> that
> > >> > it
> > >> > > > will
> > >> > > > > > take and the output that it will generate to STDOUT.
> > >> > > > > >
> > >> > > > > > cheers,
> > >> > > > > > Colin
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > >> > > > > > > Hi all,
> > >> > > > > > >
> > >> > > > > > > I've opened KIP-584
> > >> <https://issues.apache.org/jira/browse/KIP-584> <
> > >> > https://issues.apache.org/jira/browse/KIP-584
> > >> > > >
> > >> > > > > > > which
> > >> > > > > > > is intended to provide a versioning scheme for features.
> I'd
> > >> like
> > >> > > to
> > >> > > > use
> > >> > > > > > > this thread to discuss the same. I'd appreciate any
> feedback
> > >> on
> > >> > > this.
> > >> > > > > > > Here
> > >> > > > > > > is a link to KIP-584
> > >> <https://issues.apache.org/jira/browse/KIP-584>:
> > >> > > > > > >
> > >> > > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > >> > > > > > >  .
> > >> > > > > > >
> > >> > > > > > > Thank you!
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Cheers,
> > >> > > > > > > Kowshik
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Boyang Chen <re...@gmail.com>.
Hey Kowshik,

thanks for getting the KIP updated. The Zookeeper routing approach makes
sense and simplifies the changes.
Some follow-ups:

1. Do you mind updating the non-goal section, as we are introducing a
--feature-force-downgrade flag to address the downgrade concern?

2. For the flags, `--feature` seems to be a redundant prefix, as the script
is already called `kafka-features.sh`. They could just be called
`--upgrade` and `--force-downgrade`.

3. I don't feel strongly about requiring a confirmation for a normal feature
upgrade, unless there are other existing scripts that do so.

4. How could we know the existing feature versions when users are only
executing upgrades? Does `kafka-features.sh` always send a
DescribeFeatureRequest to the broker first?

5. I'm not 100% sure, but a script usually uses the same flag only once, so maybe
we should also do that for `--upgrade-feature`? Instead of passing the flag twice
for different features, a comma-separated list of (feature:max_version) pairs
could be expected, or something like that (see the example invocation after these points).

6. "The node data shall be readable via existing ZK tooling" Just trying to
clarify, we are not introducing ZK direct read tool in this KIP correct? As
for KIP-500 we are eventually going to deprecate all direct ZK access tools.

7. Could we have a centralized section called `Public Interfaces` to
summarize all the public API changes? This is a required section in a KIP.
And we should also write down the new error codes we will be introducing in
this KIP, and include both new and old error codes in the Response schema
comment if possible. For example, UpdateFeaturesResponse could expect a
`NOT_CONTROLLER` error code.
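
For concreteness on point 5, a purely hypothetical invocation of the tool under
that suggestion (the flag names, the --bootstrap-server option, and the feature
names here are illustrative assumptions, not what the KIP currently specifies)
might look like:

    ./bin/kafka-features.sh --bootstrap-server broker1:9092 \
        --upgrade group_coordinator:2,transaction_coordinator:3

and, for an emergency downgrade of a single feature:

    ./bin/kafka-features.sh --bootstrap-server broker1:9092 \
        --force-downgrade group_coordinator:1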
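
As an example of the error-code documentation suggested in point 7, the response
schema could call out the possible errors directly in the field comment, roughly
along these lines (the exact field names and layout below are only a sketch, not
the KIP's final schema):

      {"name": "ErrorCode", "type": "int16", "versions": "0+",
        "about": "The error code, or 0 if there was no error. Possible errors: NOT_CONTROLLER, FEATURE_UPDATE_IN_PROGRESS."},
      {"name": "ErrorMessage", "type": "string", "versions": "0+",
        "about": "The error message, or null if there was no error."}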


Boyang

On Fri, Apr 3, 2020 at 3:15 AM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hi all,
>
> Any other feedback on this KIP before we start the vote?
>
>
> Cheers,
> Kowshik
>
> On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hey Jun,
> >
> > Thanks a lot for the great feedback! Please note that the design
> > has changed a little bit on the KIP, and we now propagate the finalized
> > features metadata only via ZK watches (instead of UpdateMetadataRequest
> > from the controller).
> >
> > Please find below my response to your questions/feedback, with the prefix
> > "(Kowshik):".
> >
> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > 100.1 Since this request waits for responses from brokers, should we
> add
> > a
> > > timeout in the request (like createTopicRequest)?
> >
> > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> > longer
> > wait for responses from brokers, since the design has been changed so
> that
> > the
> > features information is propagated via ZK. Nevertheless, it is right to
> > have a timeout
> > for the request.
> >
> > > 100.2 The response schema is a bit weird. Typically, the response just
> > > shows an error code and an error message, instead of echoing the
> request.
> >
> > (Kowshik): Great point! Yeah, I have modified it to just return an error
> > code and a message.
> > Previously it was not echoing the "request", rather it was returning the
> > latest set of
> > cluster-wide finalized features (after applying the updates). But you are
> > right,
> > the additional info is not required, so I have removed it from the
> > response schema.
> >
> > > 100.3 Should we add a separate request to list/describe the existing
> > > features?
> >
> > (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
> > Admin API,
> > which, under the covers, uses the ApiVersionsRequest to list/describe the
> > existing features. Please read the 'Tooling support' section.
> >
> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > DELETE, the version field doesn't make sense. So, I guess the broker
> just
> > > ignores this? An alternative way is to have a separate
> > DeleteFeaturesRequest
> >
> > (Kowshik): Great point! I have modified the KIP now to have 2 separate
> > controller APIs
> > serving these different purposes:
> > 1. updateFeatures
> > 2. deleteFeatures
> >
> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > > version of the metadata for finalized features." I am wondering why the
> > > ordering is important?
> >
> > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> > version), and
> > it is just the ZK node version. Basically, this is the epoch for the
> > cluster-wide
> > finalized feature version metadata. This metadata is served to clients
> via
> > the
> > ApiVersionsResponse (for reads). We propagate updates from the
> '/features'
> > ZK node
> > to all brokers, via ZK watches setup by each broker on the '/features'
> > node.
> >
> > Now here is why the ordering is important:
> > ZK watches don't propagate at the same time. As a result, the
> > ApiVersionsResponse
> > is eventually consistent across brokers. This can introduce cases
> > where clients see an older lower epoch of the features metadata, after a
> > more recent
> > higher epoch was returned at a previous point in time. We expect clients
> > to always employ the rule that the latest received higher epoch of
> metadata
> > always trumps an older smaller epoch. Those clients that are external to
> > Kafka should strongly consider discovering the latest metadata once
> during
> > startup from the brokers, and if required refresh the metadata
> periodically
> > (to get the latest metadata).
> >
> > > 100.6 Could you specify the required ACL for this new request?
> >
> > (Kowshik): What is an ACL, and how could I find out which one to specify?
> > Please could you provide me some pointers? I'll be glad to update the
> > KIP once I know the next steps.
> >
> > > 101. For the broker registration ZK node, should we bump up the version
> > in
> > the json?
> >
> > (Kowshik): Great point! Done. I've increased the version in the broker
> > json by 1.
> >
> > > 102. For the /features ZK node, not sure if we need the epoch field.
> Each
> > > ZK node has an internal version field that is incremented on every
> > update.
> >
> > (Kowshik): Great point! Done. I'm using the ZK node version now, instead
> > of explicitly
> > incremented epoch.
> >
> > > 103. "Enabling the actual semantics of a feature version cluster-wide
> is
> > > left to the discretion of the logic implementing the feature (ex: can
> be
> > > done via dynamic broker config)." Does that mean the broker
> registration
> > ZK
> > > node will be updated dynamically when this happens?
> >
> > (Kowshik): Not really. The text was just conveying that a broker could
> > "know" of
> > a new feature version, but it does not mean the broker should have also
> > activated the effects of the feature version. Knowing vs activation are 2
> > separate things,
> > and the latter can be achieved by dynamic config. I have reworded the
> text
> > to
> > make this clear to the reader.
> >
> >
> > > 104. UpdateMetadataRequest
> > > 104.1 It would be useful to describe when the feature metadata is
> > included
> > > in the request. My understanding is that it's only included if (1)
> there
> > is
> > > a change to the finalized feature; (2) broker restart; (3) controller
> > > failover.
> > > 104.2 The new fields have the following versions. Why are the versions
> 3+
> > > when the top version is bumped to 6?
> > >       "fields":  [
> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > >           "about": "The name of the feature."},
> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > >           "about": "The finalized version for the feature."}
> > >       ]
> >
> > (Kowshik): With the new improved design, we have completely eliminated
> the
> > need to
> > use UpdateMetadataRequest. This is because we now rely on ZK to deliver
> the
> > notifications for changes to the '/features' ZK node.
> >
> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > better
> > > to use enable/disable?
> >
> > (Kowshik): For delete, yes, I have changed it so that we instead call it
> > 'disable'.
> > However for 'update', it can now also refer to either an upgrade or a
> > forced downgrade.
> > Therefore, I have left it the way it is, just calling it
> 'update'.
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> >
> >> Hi, Kowshik,
> >>
> >> Thanks for the KIP. Looks good overall. A few comments below.
> >>
> >> 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> >> 100.1 Since this request waits for responses from brokers, should we
> add a
> >> timeout in the request (like createTopicRequest)?
> >> 100.2 The response schema is a bit weird. Typically, the response just
> >> shows an error code and an error message, instead of echoing the
> request.
> >> 100.3 Should we add a separate request to list/describe the existing
> >> features?
> >> 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> >> DELETE, the version field doesn't make sense. So, I guess the broker
> just
> >> ignores this? An alternative way is to have a separate
> >> DeleteFeaturesRequest
> >> 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> >> version of the metadata for finalized features." I am wondering why the
> >> ordering is important?
> >> 100.6 Could you specify the required ACL for this new request?
> >>
> >> 101. For the broker registration ZK node, should we bump up the version
> in
> >> the json?
> >>
> >> 102. For the /features ZK node, not sure if we need the epoch field.
> Each
> >> ZK node has an internal version field that is incremented on every
> update.
> >>
> >> 103. "Enabling the actual semantics of a feature version cluster-wide is
> >> left to the discretion of the logic implementing the feature (ex: can be
> >> done via dynamic broker config)." Does that mean the broker registration
> >> ZK
> >> node will be updated dynamically when this happens?
> >>
> >> 104. UpdateMetadataRequest
> >> 104.1 It would be useful to describe when the feature metadata is
> included
> >> in the request. My understanding is that it's only included if (1) there
> >> is
> >> a change to the finalized feature; (2) broker restart; (3) controller
> >> failover.
> >> 104.2 The new fields have the following versions. Why are the versions
> 3+
> >> when the top version is bumped to 6?
> >>       "fields":  [
> >>         {"name": "Name", "type":  "string", "versions":  "3+",
> >>           "about": "The name of the feature."},
> >>         {"name":  "Version", "type":  "int64", "versions":  "3+",
> >>           "about": "The finalized version for the feature."}
> >>       ]
> >>
> >> 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> >> better
> >> to use enable/disable?
> >>
> >> Jun
> >>
> >> On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> kprakasam@confluent.io>
> >> wrote:
> >>
> >> > Hey Boyang,
> >> >
> >> > Thanks for the great feedback! I have updated the KIP based on your
> >> > feedback.
> >> > Please find my response below for your comments, look for sentences
> >> > starting
> >> > with "(Kowshik)" below.
> >> >
> >> >
> >> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> >> could
> >> > be
> >> > > converted as "When is it safe for the brokers to start serving new
> >> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> the
> >> > > context.
> >> >
> >> > (Kowshik): Great point! Done.
> >> >
> >> > > 2. In the *Explanation *section, the metadata version number part
> >> seems a
> >> > > bit blurred. Could you point a reference to later section that we
> >> going
> >> > to
> >> > > store it in Zookeeper and update it every time when there is a
> feature
> >> > > change?
> >> >
> >> > (Kowshik): Great point! Done. I've added a reference in the KIP.
> >> >
> >> >
> >> > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> for
> >> > > features such as group coordinator semantics, there is no legal
> >> scenario
> >> > to
> >> > > perform a downgrade at all. So having downgrade door open is pretty
> >> > > error-prone as human faults happen all the time. I'm assuming as new
> >> > > features are implemented, it's not very hard to add a flag during
> >> feature
> >> > > creation to indicate whether this feature is "downgradable". Could
> you
> >> > > explain a bit more on the extra engineering effort for shipping this
> >> KIP
> >> > > with downgrade protection in place?
> >> >
> >> > (Kowshik): Great point! I'd agree and disagree here. While I agree
> that
> >> > accidental
> >> > downgrades can cause problems, I also think sometimes downgrades
> should
> >> > be allowed for emergency reasons (not all downgrades cause issues).
> >> > It is just subjective to the feature being downgraded.
> >> >
> >> > To be more strict about feature version downgrades, I have modified
> the
> >> KIP
> >> > proposing that we mandate a `--force-downgrade` flag be used in the
> >> > UPDATE_FEATURES api
> >> > and the tooling, whenever the human is downgrading a finalized feature
> >> > version.
> >> > Hopefully this should cover the requirement, until we find the need
> for
> >> > advanced downgrade support.
> >> >
> >> > > 4. "Each broker’s supported dictionary of feature versions will be
> >> > defined
> >> > > in the broker code." So this means in order to restrict a certain
> >> > feature,
> >> > > we need to start the broker first and then send a feature gating
> >> request
> >> > > immediately, which introduces a time gap and the intended-to-close
> >> > feature
> >> > > could actually serve request during this phase. Do you think we
> should
> >> > also
> >> > > support configurations as well so that admin user could freely roll
> >> up a
> >> > > cluster with all nodes complying the same feature gating, without
> >> > worrying
> >> > > about the turnaround time to propagate the message only after the
> >> cluster
> >> > > starts up?
> >> >
> >> > (Kowshik): This is a great point/question. One of the expectations out
> >> of
> >> > this KIP, which is
> >> > already followed in the broker, is the following.
> >> >  - Imagine at time T1 the broker starts up and registers its presence
> >> in
> >> > ZK,
> >> >    along with advertising its supported features.
> >> >  - Imagine at a future time T2 the broker receives the
> >> > UpdateMetadataRequest
> >> >    from the controller, which contains the latest finalized features
> as
> >> > seen by
> >> >    the controller. The broker validates this data against its
> supported
> >> > features to
> >> >    make sure there is no mismatch (it will shutdown if there is an
> >> > incompatibility).
> >> >
> >> > It is expected that during the time between the 2 events T1 and T2,
> the
> >> > broker is
> >> > almost a silent entity in the cluster. It does not add any value to
> the
> >> > cluster, or carry
> >> > out any important broker activities. By “important”, I mean it is not
> >> doing
> >> > mutations
> >> > on its persistence, not mutating critical in-memory state, won’t be
> >> > serving
> >> > produce/fetch requests. Note it doesn’t even know its assigned
> >> partitions
> >> > until
> >> > it receives UpdateMetadataRequest from controller. Anything the broker
> >> is
> >> > doing up
> >> > until this point is not damaging/useful.
> >> >
> >> > I’ve clarified the above in the KIP, see this new section:
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> >> > .
> >> >
> >> > > 5. "adding a new Feature, updating or deleting an existing Feature",
> >> may
> >> > be
> >> > > I misunderstood something, I thought the features are defined in
> >> broker
> >> > > code, so admin could not really create a new feature?
> >> >
> >> > (Kowshik): Great point! You understood this right. Here adding a
> feature
> >> > means we are
> >> > adding a cluster-wide finalized *max* version for a feature that was
> >> > previously never finalized.
> >> > I have clarified this in the KIP now.
> >> >
> >> > > 6. I think we need a separate error code like
> >> FEATURE_UPDATE_IN_PROGRESS
> >> > to
> >> > > reject a concurrent feature update request.
> >> >
> >> > (Kowshik): Great point! I have modified the KIP adding the above (see
> >> > 'Tooling support -> Admin API changes').
> >> >
> >> > > 7. I think we haven't discussed the alternative solution to pass the
> >> > > feature information through Zookeeper. Is that mentioned in the KIP
> to
> >> > > justify why using UpdateMetadata is more favorable?
> >> >
> >> > (Kowshik): Nice question! The broker reads finalized feature info
> >> stored in
> >> > ZK,
> >> > only during startup when it does a validation. When serving
> >> > `ApiVersionsRequest`, the
> >> > broker does not read this info from ZK directly. I'd imagine the risk
> is
> >> > that it can increase
> >> > the ZK read QPS which can be a bottleneck for the system. Today, in
> >> Kafka
> >> > we use the
> >> > controller to fan out ZK updates to brokers and we want to stick to
> that
> >> > pattern to avoid
> >> > the ZK read bottleneck when serving `ApiVersionsRequest`.
> >> >
> >> > > 8. I was under the impression that user could configure a range of
> >> > > supported versions, what's the trade-off for allowing single
> finalized
> >> > > version only?
> >> >
> >> > (Kowshik): Great question! The finalized version of a feature
> basically
> >> > refers to
> >> > the cluster-wide finalized feature "maximum" version. For example, if
> >> the
> >> > 'group_coordinator' feature
> >> > has the finalized version set to 10, then, it means that cluster-wide
> >> all
> >> > versions up to v10 are
> >> > supported for this feature. However, note that if some version (ex:
> v0)
> >> > gets deprecated
> >> > for this feature, then we don’t convey that using this scheme (also
> >> > supporting deprecation is a non-goal).
> >> >
> >> > (Kowshik): I’ve now modified the KIP at all points, referring to
> >> finalized
> >> > feature "maximum" versions.
> >> >
> >> > > 9. One minor syntax fix: Note that here the "client" here may be a
> >> > producer
> >> >
> >> > (Kowshik): Great point! Done.
> >> >
> >> >
> >> > Cheers,
> >> > Kowshik
> >> >
> >> >
> >> > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> reluctanthero104@gmail.com
> >> >
> >> > wrote:
> >> >
> >> > > Hey Kowshik,
> >> > >
> >> > > thanks for the revised KIP. Got a couple of questions:
> >> > >
> >> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> >> could
> >> > be
> >> > > converted as "When is it safe for the brokers to start serving new
> >> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> the
> >> > > context.
> >> > >
> >> > > 2. In the *Explanation *section, the metadata version number part
> >> seems a
> >> > > bit blurred. Could you point a reference to later section that we
> >> going
> >> > to
> >> > > store it in Zookeeper and update it every time when there is a
> feature
> >> > > change?
> >> > >
> >> > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> for
> >> > > features such as group coordinator semantics, there is no legal
> >> scenario
> >> > to
> >> > > perform a downgrade at all. So having downgrade door open is pretty
> >> > > error-prone as human faults happen all the time. I'm assuming as new
> >> > > features are implemented, it's not very hard to add a flag during
> >> feature
> >> > > creation to indicate whether this feature is "downgradable". Could
> you
> >> > > explain a bit more on the extra engineering effort for shipping this
> >> KIP
> >> > > with downgrade protection in place?
> >> > >
> >> > > 4. "Each broker’s supported dictionary of feature versions will be
> >> > defined
> >> > > in the broker code." So this means in order to restrict a certain
> >> > feature,
> >> > > we need to start the broker first and then send a feature gating
> >> request
> >> > > immediately, which introduces a time gap and the intended-to-close
> >> > feature
> >> > > could actually serve request during this phase. Do you think we
> should
> >> > also
> >> > > support configurations as well so that admin user could freely roll
> >> up a
> >> > > cluster with all nodes complying the same feature gating, without
> >> > worrying
> >> > > about the turnaround time to propagate the message only after the
> >> cluster
> >> > > starts up?
> >> > >
> >> > > 5. "adding a new Feature, updating or deleting an existing Feature",
> >> may
> >> > be
> >> > > I misunderstood something, I thought the features are defined in
> >> broker
> >> > > code, so admin could not really create a new feature?
> >> > >
> >> > > 6. I think we need a separate error code like
> >> FEATURE_UPDATE_IN_PROGRESS
> >> > to
> >> > > reject a concurrent feature update request.
> >> > >
> >> > > 7. I think we haven't discussed the alternative solution to pass the
> >> > > feature information through Zookeeper. Is that mentioned in the KIP
> to
> >> > > justify why using UpdateMetadata is more favorable?
> >> > >
> >> > > 8. I was under the impression that user could configure a range of
> >> > > supported versions, what's the trade-off for allowing single
> finalized
> >> > > version only?
> >> > >
> >> > > 9. One minor syntax fix: Note that here the "client" here may be a
> >> > producer
> >> > >
> >> > > Boyang
> >> > >
> >> > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
> >> wrote:
> >> > >
> >> > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> >> > > > > Hi Colin,
> >> > > > >
> >> > > > > Thanks for the feedback! I've changed the KIP to address your
> >> > > > > suggestions.
> >> > > > > Please find below my explanation. Here is a link to KIP 584:
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> >> > > > > .
> >> > > > >
> >> > > > > 1. '__data_version__' is the version of the finalized feature
> >> > metadata
> >> > > > > (i.e. actual ZK node contents), while the '__schema_version__'
> is
> >> the
> >> > > > > version of the schema of the data persisted in ZK. These serve
> >> > > different
> >> > > > > purposes. '__data_version__' is useful mainly to clients
> during
> >> > > reads,
> >> > > > > to differentiate between the 2 versions of eventually consistent
> >> > > > 'finalized
> >> > > > > features' metadata (i.e. larger metadata version is more
> recent).
> >> > > > > '__schema_version__' provides an additional degree of
> flexibility,
> >> > > where
> >> > > > if
> >> > > > > we decide to change the schema for '/features' node in ZK (in
> the
> >> > > > future),
> >> > > > > then we can manage broker roll outs suitably (i.e.
> >> > > > > serialization/deserialization of the ZK data can be handled
> >> safely).
> >> > > >
> >> > > > Hi Kowshik,
> >> > > >
> >> > > > If you're talking about a number that lets you know if data is
> more
> >> or
> >> > > > less recent, we would typically call that an epoch, and not a
> >> version.
> >> > > For
> >> > > > the ZK data structures, the word "version" is typically reserved
> for
> >> > > > describing changes to the overall schema of the data that is
> >> written to
> >> > > > ZooKeeper.  We don't even really change the "version" of those
> >> schemas
> >> > > that
> >> > > > much, since most changes are backwards-compatible.  But we do
> >> include
> >> > > that
> >> > > > version field just in case.
> >> > > >
> >> > > > I don't think we really need an epoch here, though, since we can
> >> just
> >> > > look
> >> > > > at the broker epoch.  Whenever the broker registers, its epoch
> will
> >> be
> >> > > > greater than the previous broker epoch.  And the newly registered
> >> data
> >> > > will
> >> > > > take priority.  This will be a lot simpler than adding a separate
> >> epoch
> >> > > > system, I think.
> >> > > >
> >> > > > >
> >> > > > > 2. Regarding admin client needing min and max information - you
> >> are
> >> > > > right!
> >> > > > > I've changed the KIP such that the Admin API also allows the
> user
> >> to
> >> > > read
> >> > > > > 'supported features' from a specific broker. Please look at the
> >> > section
> >> > > > > "Admin API changes".
> >> > > >
> >> > > > Thanks.
> >> > > >
> >> > > > >
> >> > > > > 3. Regarding the use of `long` vs `Long` - it was not
> deliberate.
> >> > I've
> >> > > > > improved the KIP to just use `long` at all places.
> >> > > >
> >> > > > Sounds good.
> >> > > >
> >> > > > >
> >> > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> I've
> >> > > > updated
> >> > > > > the KIP sketching the functionality provided by this tool, with
> >> some
> >> > > > > examples. Please look at the section "Tooling support examples".
> >> > > > >
> >> > > > > Thank you!
> >> > > >
> >> > > >
> >> > > > Thanks, Kowshik.
> >> > > >
> >> > > > cheers,
> >> > > > Colin
> >> > > >
> >> > > > >
> >> > > > >
> >> > > > > Cheers,
> >> > > > > Kowshik
> >> > > > >
> >> > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> cmccabe@apache.org
> >> >
> >> > > > wrote:
> >> > > > >
> >> > > > > > Thanks, Kowshik, this looks good.
> >> > > > > >
> >> > > > > > In the "Schema" section, do we really need both
> >> __schema_version__
> >> > > and
> >> > > > > > __data_version__?  Can we just have a single version field
> here?
> >> > > > > >
> >> > > > > > Shouldn't the Admin(Client) function have some way to get the
> >> min
> >> > and
> >> > > > max
> >> > > > > > information that we're exposing as well?  I guess we could
> have
> >> > min,
> >> > > > max,
> >> > > > > > and current.  Unrelated: is the use of Long rather than long
> >> > > deliberate
> >> > > > > > here?
> >> > > > > >
> >> > > > > > It would be good to describe how the command line tool
> >> > > > > > kafka.admin.FeatureCommand will work.  For example the flags
> >> that
> >> > it
> >> > > > will
> >> > > > > > take and the output that it will generate to STDOUT.
> >> > > > > >
> >> > > > > > cheers,
> >> > > > > > Colin
> >> > > > > >
> >> > > > > >
> >> > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> >> > > > > > > Hi all,
> >> > > > > > >
> >> > > > > > > I've opened KIP-584
> >> <https://issues.apache.org/jira/browse/KIP-584> <
> >> > https://issues.apache.org/jira/browse/KIP-584
> >> > > >
> >> > > > > > > which
> >> > > > > > > is intended to provide a versioning scheme for features. I'd
> >> like
> >> > > to
> >> > > > use
> >> > > > > > > this thread to discuss the same. I'd appreciate any feedback
> >> on
> >> > > this.
> >> > > > > > > Here
> >> > > > > > > is a link to KIP-584
> >> <https://issues.apache.org/jira/browse/KIP-584>:
> >> > > > > > >
> >> > > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> >> > > > > > >  .
> >> > > > > > >
> >> > > > > > > Thank you!
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Cheers,
> >> > > > > > > Kowshik
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hi all,

Any other feedback on this KIP before we start the vote?


Cheers,
Kowshik

On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hey Jun,
>
> Thanks a lot for the great feedback! Please note that the design
> has changed a little bit on the KIP, and we now propagate the finalized
> features metadata only via ZK watches (instead of UpdateMetadataRequest
> from the controller).
>
> Please find below my response to your questions/feedback, with the prefix
> "(Kowshik):".
>
> > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > 100.1 Since this request waits for responses from brokers, should we add
> a
> > timeout in the request (like createTopicRequest)?
>
> (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> longer
> wait for responses from brokers, since the design has been changed so that
> the
> features information is propagated via ZK. Nevertheless, it is right to
> have a timeout
> for the request.
>
> > 100.2 The response schema is a bit weird. Typically, the response just
> > shows an error code and an error message, instead of echoing the request.
>
> (Kowshik): Great point! Yeah, I have modified it to just return an error
> code and a message.
> Previously it was not echoing the "request", rather it was returning the
> latest set of
> cluster-wide finalized features (after applying the updates). But you are
> right,
> the additional info is not required, so I have removed it from the
> response schema.
>
> > 100.3 Should we add a separate request to list/describe the existing
> > features?
>
> (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
> Admin API,
> which, under the covers, uses the ApiVersionsRequest to list/describe the
> existing features. Please read the 'Tooling support' section.
>
> > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > DELETE, the version field doesn't make sense. So, I guess the broker just
> > ignores this? An alternative way is to have a separate
> DeleteFeaturesRequest
>
> (Kowshik): Great point! I have modified the KIP now to have 2 separate
> controller APIs
> serving these different purposes:
> 1. updateFeatures
> 2. deleteFeatures
>
> > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > version of the metadata for finalized features." I am wondering why the
> > ordering is important?
>
> (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> version), and
> it is just the ZK node version. Basically, this is the epoch for the
> cluster-wide
> finalized feature version metadata. This metadata is served to clients via
> the
> ApiVersionsResponse (for reads). We propagate updates from the '/features'
> ZK node
> to all brokers, via ZK watches setup by each broker on the '/features'
> node.
>
> Now here is why the ordering is important:
> ZK watches don't propagate at the same time. As a result, the
> ApiVersionsResponse
> is eventually consistent across brokers. This can introduce cases
> where clients see an older lower epoch of the features metadata, after a
> more recent
> higher epoch was returned at a previous point in time. We expect clients
> to always employ the rule that the latest received higher epoch of metadata
> always trumps an older smaller epoch. Those clients that are external to
> Kafka should strongly consider discovering the latest metadata once during
> startup from the brokers, and if required refresh the metadata periodically
> (to get the latest metadata).
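
(To make the "higher epoch wins" rule above concrete, here is a minimal
illustrative sketch of client-side bookkeeping; the class and method names are
made up for illustration and are not part of the KIP or of Kafka's client API.)

    import java.util.Collections;
    import java.util.Map;

    // Keeps only the finalized-features metadata carrying the highest epoch seen so far.
    final class FinalizedFeaturesCache {
        private long latestEpoch = -1L;
        private Map<String, Long> finalizedMaxVersions = Collections.emptyMap();

        // Called with the epoch and feature -> max_version map read from an ApiVersionsResponse.
        synchronized void maybeUpdate(long epoch, Map<String, Long> maxVersions) {
            if (epoch > latestEpoch) {           // newer metadata: accept it
                latestEpoch = epoch;
                finalizedMaxVersions = maxVersions;
            }                                    // older (smaller) epoch: ignore the stale response
        }

        synchronized long latestEpoch() { return latestEpoch; }
        synchronized Map<String, Long> finalizedMaxVersions() { return finalizedMaxVersions; }
    }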
>
> > 100.6 Could you specify the required ACL for this new request?
>
> (Kowshik): What is an ACL, and how could I find out which one to specify?
> Please could you provide me some pointers? I'll be glad to update the
> KIP once I know the next steps.
>
> > 101. For the broker registration ZK node, should we bump up the version
> in
> the json?
>
> (Kowshik): Great point! Done. I've increased the version in the broker
> json by 1.
>
> > 102. For the /features ZK node, not sure if we need the epoch field. Each
> > ZK node has an internal version field that is incremented on every
> update.
>
> (Kowshik): Great point! Done. I'm using the ZK node version now, instead
> of explicitly
> incremented epoch.
>
> > 103. "Enabling the actual semantics of a feature version cluster-wide is
> > left to the discretion of the logic implementing the feature (ex: can be
> > done via dynamic broker config)." Does that mean the broker registration
> ZK
> > node will be updated dynamically when this happens?
>
> (Kowshik): Not really. The text was just conveying that a broker could
> "know" of
> a new feature version, but it does not mean the broker should have also
> activated the effects of the feature version. Knowing vs activation are 2
> separate things,
> and the latter can be achieved by dynamic config. I have reworded the text
> to
> make this clear to the reader.
>
>
> > 104. UpdateMetadataRequest
> > 104.1 It would be useful to describe when the feature metadata is
> included
> > in the request. My understanding is that it's only included if (1) there
> is
> > a change to the finalized feature; (2) broker restart; (3) controller
> > failover.
> > 104.2 The new fields have the following versions. Why are the versions 3+
> > when the top version is bumped to 6?
> >       "fields":  [
> >         {"name": "Name", "type":  "string", "versions":  "3+",
> >           "about": "The name of the feature."},
> >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> >           "about": "The finalized version for the feature."}
> >       ]
>
> (Kowshik): With the new improved design, we have completely eliminated the
> need to
> use UpdateMetadataRequest. This is because we now rely on ZK to deliver the
> notifications for changes to the '/features' ZK node.
>
> > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> better
> > to use enable/disable?
>
> (Kowshik): For delete, yes, I have changed it so that we instead call it
> 'disable'.
> However for 'update', it can now also refer to either an upgrade or a
> forced downgrade.
> Therefore, I have left it the way it is, just calling it as just 'update'.
>
>
> Cheers,
> Kowshik
>
> On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
>
>> Hi, Kowshik,
>>
>> Thanks for the KIP. Looks good overall. A few comments below.
>>
>> 100. UpdateFeaturesRequest/UpdateFeaturesResponse
>> 100.1 Since this request waits for responses from brokers, should we add a
>> timeout in the request (like createTopicRequest)?
>> 100.2 The response schema is a bit weird. Typically, the response just
>> shows an error code and an error message, instead of echoing the request.
>> 100.3 Should we add a separate request to list/describe the existing
>> features?
>> 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
>> DELETE, the version field doesn't make sense. So, I guess the broker just
>> ignores this? An alternative way is to have a separate
>> DeleteFeaturesRequest
>> 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
>> version of the metadata for finalized features." I am wondering why the
>> ordering is important?
>> 100.6 Could you specify the required ACL for this new request?
>>
>> 101. For the broker registration ZK node, should we bump up the version in
>> the json?
>>
>> 102. For the /features ZK node, not sure if we need the epoch field. Each
>> ZK node has an internal version field that is incremented on every update.
>>
>> 103. "Enabling the actual semantics of a feature version cluster-wide is
>> left to the discretion of the logic implementing the feature (ex: can be
>> done via dynamic broker config)." Does that mean the broker registration
>> ZK
>> node will be updated dynamically when this happens?
>>
>> 104. UpdateMetadataRequest
>> 104.1 It would be useful to describe when the feature metadata is included
>> in the request. My understanding is that it's only included if (1) there
>> is
>> a change to the finalized feature; (2) broker restart; (3) controller
>> failover.
>> 104.2 The new fields have the following versions. Why are the versions 3+
>> when the top version is bumped to 6?
>>       "fields":  [
>>         {"name": "Name", "type":  "string", "versions":  "3+",
>>           "about": "The name of the feature."},
>>         {"name":  "Version", "type":  "int64", "versions":  "3+",
>>           "about": "The finalized version for the feature."}
>>       ]
>>
>> 105. kafka-features.sh: Instead of using update/delete, perhaps it's
>> better
>> to use enable/disable?
>>
>> Jun
>>
>> On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <kp...@confluent.io>
>> wrote:
>>
>> > Hey Boyang,
>> >
>> > Thanks for the great feedback! I have updated the KIP based on your
>> > feedback.
>> > Please find my response below for your comments, look for sentences
>> > starting
>> > with "(Kowshik)" below.
>> >
>> >
>> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
>> could
>> > be
>> > > converted as "When is it safe for the brokers to start serving new
>> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
>> > > context.
>> >
>> > (Kowshik): Great point! Done.
>> >
>> > > 2. In the *Explanation *section, the metadata version number part
>> seems a
>> > > bit blurred. Could you point a reference to later section that we
>> going
>> > to
>> > > store it in Zookeeper and update it every time when there is a feature
>> > > change?
>> >
>> > (Kowshik): Great point! Done. I've added a reference in the KIP.
>> >
>> >
>> > > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
>> > > features such as group coordinator semantics, there is no legal
>> scenario
>> > to
>> > > perform a downgrade at all. So having downgrade door open is pretty
>> > > error-prone as human faults happen all the time. I'm assuming as new
>> > > features are implemented, it's not very hard to add a flag during
>> feature
>> > > creation to indicate whether this feature is "downgradable". Could you
>> > > explain a bit more on the extra engineering effort for shipping this
>> KIP
>> > > with downgrade protection in place?
>> >
>> > (Kowshik): Great point! I'd agree and disagree here. While I agree that
>> > accidental
>> > downgrades can cause problems, I also think sometimes downgrades should
>> > be allowed for emergency reasons (not all downgrades cause issues).
>> > It is just subjective to the feature being downgraded.
>> >
>> > To be more strict about feature version downgrades, I have modified the
>> KIP
>> > proposing that we mandate a `--force-downgrade` flag be used in the
>> > UPDATE_FEATURES api
>> > and the tooling, whenever the human is downgrading a finalized feature
>> > version.
>> > Hopefully this should cover the requirement, until we find the need for
>> > advanced downgrade support.
>> >
>> > > 4. "Each broker’s supported dictionary of feature versions will be
>> > defined
>> > > in the broker code." So this means in order to restrict a certain
>> > feature,
>> > > we need to start the broker first and then send a feature gating
>> request
>> > > immediately, which introduces a time gap and the intended-to-close
>> > feature
>> > > could actually serve request during this phase. Do you think we should
>> > also
>> > > support configurations as well so that admin user could freely roll
>> up a
>> > > cluster with all nodes complying the same feature gating, without
>> > worrying
>> > > about the turnaround time to propagate the message only after the
>> cluster
>> > > starts up?
>> >
>> > (Kowshik): This is a great point/question. One of the expectations out
>> of
>> > this KIP, which is
>> > already followed in the broker, is the following.
>> >  - Imagine at time T1 the broker starts up and registers it’s presence
>> in
>> > ZK,
>> >    along with advertising it’s supported features.
>> >  - Imagine at a future time T2 the broker receives the
>> > UpdateMetadataRequest
>> >    from the controller, which contains the latest finalized features as
>> > seen by
>> >    the controller. The broker validates this data against it’s supported
>> > features to
>> >    make sure there is no mismatch (it will shutdown if there is an
>> > incompatibility).
>> >
>> > It is expected that during the time between the 2 events T1 and T2, the
>> > broker is
>> > almost a silent entity in the cluster. It does not add any value to the
>> > cluster, or carry
>> > out any important broker activities. By “important”, I mean it is not
>> doing
>> > mutations
>> > on it’s persistence, not mutating critical in-memory state, won’t be
>> > serving
>> > produce/fetch requests. Note it doesn’t even know it’s assigned
>> partitions
>> > until
>> > it receives UpdateMetadataRequest from controller. Anything the broker
>> is
>> > doing up
>> > until this point is not damaging/useful.
>> >
>> > I’ve clarified the above in the KIP, see this new section:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
>> > .
>> >
>> > > 5. "adding a new Feature, updating or deleting an existing Feature",
>> may
>> > be
>> > > I misunderstood something, I thought the features are defined in
>> broker
>> > > code, so admin could not really create a new feature?
>> >
>> > (Kowshik): Great point! You understood this right. Here adding a feature
>> > means we are
>> > adding a cluster-wide finalized *max* version for a feature that was
>> > previously never finalized.
>> > I have clarified this in the KIP now.
>> >
>> > > 6. I think we need a separate error code like
>> FEATURE_UPDATE_IN_PROGRESS
>> > to
>> > > reject a concurrent feature update request.
>> >
>> > (Kowshik): Great point! I have modified the KIP adding the above (see
>> > 'Tooling support -> Admin API changes').
>> >
>> > > 7. I think we haven't discussed the alternative solution to pass the
>> > > feature information through Zookeeper. Is that mentioned in the KIP to
>> > > justify why using UpdateMetadata is more favorable?
>> >
>> > (Kowshik): Nice question! The broker reads finalized feature info
>> stored in
>> > ZK,
>> > only during startup when it does a validation. When serving
>> > `ApiVersionsRequest`, the
>> > broker does not read this info from ZK directly. I'd imagine the risk is
>> > that it can increase
>> > the ZK read QPS which can be a bottleneck for the system. Today, in
>> Kafka
>> > we use the
>> > controller to fan out ZK updates to brokers and we want to stick to that
>> > pattern to avoid
>> > the ZK read bottleneck when serving `ApiVersionsRequest`.
>> >
>> > > 8. I was under the impression that user could configure a range of
>> > > supported versions, what's the trade-off for allowing single finalized
>> > > version only?
>> >
>> > (Kowshik): Great question! The finalized version of a feature basically
>> > refers to
>> > the cluster-wide finalized feature "maximum" version. For example, if
>> the
>> > 'group_coordinator' feature
>> > has the finalized version set to 10, then, it means that cluster-wide
>> all
>> > versions upto v10 are
>> > supported for this feature. However, note that if some version (ex: v0)
>> > gets deprecated
>> > for this feature, then we don’t convey that using this scheme (also
>> > supporting deprecation is a non-goal).
>> >
>> > (Kowshik): I’ve now modified the KIP at all points, refering to
>> finalized
>> > feature "maximum" versions.
>> >
>> > > 9. One minor syntax fix: Note that here the "client" here may be a
>> > producer
>> >
>> > (Kowshik): Great point! Done.
>> >
>> >
>> > Cheers,
>> > Kowshik
>> >
>> >
>> > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <reluctanthero104@gmail.com
>> >
>> > wrote:
>> >
>> > > Hey Kowshik,
>> > >
>> > > thanks for the revised KIP. Got a couple of questions:
>> > >
>> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
>> could
>> > be
>> > > converted as "When is it safe for the brokers to start serving new
>> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
>> > > context.
>> > >
>> > > 2. In the *Explanation *section, the metadata version number part
>> seems a
>> > > bit blurred. Could you point a reference to later section that we
>> going
>> > to
>> > > store it in Zookeeper and update it every time when there is a feature
>> > > change?
>> > >
>> > > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
>> > > features such as group coordinator semantics, there is no legal
>> scenario
>> > to
>> > > perform a downgrade at all. So having downgrade door open is pretty
>> > > error-prone as human faults happen all the time. I'm assuming as new
>> > > features are implemented, it's not very hard to add a flag during
>> feature
>> > > creation to indicate whether this feature is "downgradable". Could you
>> > > explain a bit more on the extra engineering effort for shipping this
>> KIP
>> > > with downgrade protection in place?
>> > >
>> > > 4. "Each broker’s supported dictionary of feature versions will be
>> > defined
>> > > in the broker code." So this means in order to restrict a certain
>> > feature,
>> > > we need to start the broker first and then send a feature gating
>> request
>> > > immediately, which introduces a time gap and the intended-to-close
>> > feature
>> > > could actually serve request during this phase. Do you think we should
>> > also
>> > > support configurations as well so that admin user could freely roll
>> up a
>> > > cluster with all nodes complying the same feature gating, without
>> > worrying
>> > > about the turnaround time to propagate the message only after the
>> cluster
>> > > starts up?
>> > >
>> > > 5. "adding a new Feature, updating or deleting an existing Feature",
>> may
>> > be
>> > > I misunderstood something, I thought the features are defined in
>> broker
>> > > code, so admin could not really create a new feature?
>> > >
>> > > 6. I think we need a separate error code like
>> FEATURE_UPDATE_IN_PROGRESS
>> > to
>> > > reject a concurrent feature update request.
>> > >
>> > > 7. I think we haven't discussed the alternative solution to pass the
>> > > feature information through Zookeeper. Is that mentioned in the KIP to
>> > > justify why using UpdateMetadata is more favorable?
>> > >
>> > > 8. I was under the impression that user could configure a range of
>> > > supported versions, what's the trade-off for allowing single finalized
>> > > version only?
>> > >
>> > > 9. One minor syntax fix: Note that here the "client" here may be a
>> > producer
>> > >
>> > > Boyang
>> > >
>> > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
>> wrote:
>> > >
>> > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
>> > > > > Hi Colin,
>> > > > >
>> > > > > Thanks for the feedback! I've changed the KIP to address your
>> > > > > suggestions.
>> > > > > Please find below my explanation. Here is a link to KIP 584:
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
>> > > > > .
>> > > > >
>> > > > > 1. '__data_version__' is the version of the finalized feature
>> > metadata
>> > > > > (i.e. actual ZK node contents), while the '__schema_version__' is
>> the
>> > > > > version of the schema of the data persisted in ZK. These serve
>> > > different
>> > > > > purposes. '__data_version__' is is useful mainly to clients during
>> > > reads,
>> > > > > to differentiate between the 2 versions of eventually consistent
>> > > > 'finalized
>> > > > > features' metadata (i.e. larger metadata version is more recent).
>> > > > > '__schema_version__' provides an additional degree of flexibility,
>> > > where
>> > > > if
>> > > > > we decide to change the schema for '/features' node in ZK (in the
>> > > > future),
>> > > > > then we can manage broker roll outs suitably (i.e.
>> > > > > serialization/deserialization of the ZK data can be handled
>> safely).
>> > > >
>> > > > Hi Kowshik,
>> > > >
>> > > > If you're talking about a number that lets you know if data is more
>> or
>> > > > less recent, we would typically call that an epoch, and not a
>> version.
>> > > For
>> > > > the ZK data structures, the word "version" is typically reserved for
>> > > > describing changes to the overall schema of the data that is
>> written to
>> > > > ZooKeeper.  We don't even really change the "version" of those
>> schemas
>> > > that
>> > > > much, since most changes are backwards-compatible.  But we do
>> include
>> > > that
>> > > > version field just in case.
>> > > >
>> > > > I don't think we really need an epoch here, though, since we can
>> just
>> > > look
>> > > > at the broker epoch.  Whenever the broker registers, its epoch will
>> be
>> > > > greater than the previous broker epoch.  And the newly registered
>> data
>> > > will
>> > > > take priority.  This will be a lot simpler than adding a separate
>> epoch
>> > > > system, I think.
>> > > >
>> > > > >
>> > > > > 2. Regarding admin client needing min and max information - you
>> are
>> > > > right!
>> > > > > I've changed the KIP such that the Admin API also allows the user
>> to
>> > > read
>> > > > > 'supported features' from a specific broker. Please look at the
>> > section
>> > > > > "Admin API changes".
>> > > >
>> > > > Thanks.
>> > > >
>> > > > >
>> > > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate.
>> > I've
>> > > > > improved the KIP to just use `long` at all places.
>> > > >
>> > > > Sounds good.
>> > > >
>> > > > >
>> > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right! I've
>> > > > updated
>> > > > > the KIP sketching the functionality provided by this tool, with
>> some
>> > > > > examples. Please look at the section "Tooling support examples".
>> > > > >
>> > > > > Thank you!
>> > > >
>> > > >
>> > > > Thanks, Kowshik.
>> > > >
>> > > > cheers,
>> > > > Colin
>> > > >
>> > > > >
>> > > > >
>> > > > > Cheers,
>> > > > > Kowshik
>> > > > >
>> > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cmccabe@apache.org
>> >
>> > > > wrote:
>> > > > >
>> > > > > > Thanks, Kowshik, this looks good.
>> > > > > >
>> > > > > > In the "Schema" section, do we really need both
>> __schema_version__
>> > > and
>> > > > > > __data_version__?  Can we just have a single version field here?
>> > > > > >
>> > > > > > Shouldn't the Admin(Client) function have some way to get the
>> min
>> > and
>> > > > max
>> > > > > > information that we're exposing as well?  I guess we could have
>> > min,
>> > > > max,
>> > > > > > and current.  Unrelated: is the use of Long rather than long
>> > > deliberate
>> > > > > > here?
>> > > > > >
>> > > > > > It would be good to describe how the command line tool
>> > > > > > kafka.admin.FeatureCommand will work.  For example the flags
>> that
>> > it
>> > > > will
>> > > > > > take and the output that it will generate to STDOUT.
>> > > > > >
>> > > > > > cheers,
>> > > > > > Colin
>> > > > > >
>> > > > > >
>> > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
>> > > > > > > Hi all,
>> > > > > > >
>> > > > > > > I've opened KIP-584
>> <https://issues.apache.org/jira/browse/KIP-584> <
>> > https://issues.apache.org/jira/browse/KIP-584
>> > > >
>> > > > > > > which
>> > > > > > > is intended to provide a versioning scheme for features. I'd
>> like
>> > > to
>> > > > use
>> > > > > > > this thread to discuss the same. I'd appreciate any feedback
>> on
>> > > this.
>> > > > > > > Here
>> > > > > > > is a link to KIP-584
>> <https://issues.apache.org/jira/browse/KIP-584>:
>> > > > > > >
>> > > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
>> > > > > > >  .
>> > > > > > >
>> > > > > > > Thank you!
>> > > > > > >
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Kowshik
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hey Jun,

Thanks a lot for the great feedback! Please note that the design
has changed a little bit in the KIP, and we now propagate the finalized
features metadata only via ZK watches (instead of UpdateMetadataRequest
from the controller).

Please find below my response to your questions/feedback, with the prefix
"(Kowshik):".

> 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> 100.1 Since this request waits for responses from brokers, should we add a
> timeout in the request (like createTopicRequest)?

(Kowshik): Great point! Done. I have added a timeout field. Note: we no longer
wait for responses from brokers, since the design has been changed so that the
features information is propagated via ZK. Nevertheless, it is right to have a
timeout for the request.

> 100.2 The response schema is a bit weird. Typically, the response just
> shows an error code and an error message, instead of echoing the request.

(Kowshik): Great point! Yeah, I have modified it to just return an error code
and a message. Previously it was not echoing the "request"; rather, it was
returning the latest set of cluster-wide finalized features (after applying the
updates). But you are right, the additional info is not required, so I have
removed it from the response schema.

> 100.3 Should we add a separate request to list/describe the existing
> features?

(Kowshik): This is already present in the KIP via the 'DescribeFeatures' Admin
API, which, under the covers, uses the ApiVersionsRequest to list/describe the
existing features. Please read the 'Tooling support' section.
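
For illustration, here is a rough sketch of the kind of data such a
DescribeFeatures-style call would surface after reading it off the
ApiVersionsResponse. The class and field names below are placeholders I made
up, not the final Admin API shape:

    import java.util.Map;

    // Hypothetical sketch only: the result of describing features on a broker,
    // combining the finalized (cluster-wide) max versions plus their epoch with
    // the [min, max] version range supported by that particular broker.
    final class DescribeFeaturesResult {
        final long epoch;                             // epoch of the finalized features metadata
        final Map<String, Long> finalizedMaxVersions; // feature name -> finalized max version
        final Map<String, long[]> supportedRanges;    // feature name -> {min, max} supported by the broker

        DescribeFeaturesResult(long epoch,
                               Map<String, Long> finalizedMaxVersions,
                               Map<String, long[]> supportedRanges) {
            this.epoch = epoch;
            this.finalizedMaxVersions = finalizedMaxVersions;
            this.supportedRanges = supportedRanges;
        }
    }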

> 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> DELETE, the version field doesn't make sense. So, I guess the broker just
> ignores this? An alternative way is to have a separate
DeleteFeaturesRequest

(Kowshik): Great point! I have modified the KIP now to have 2 separate
controller APIs
serving these different purposes:
1. updateFeatures
2. deleteFeatures
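
Just to illustrate the split (method and type names below are placeholders I
made up, not the KIP's final API), the controller side could look roughly like:

    import java.util.Map;
    import java.util.Set;

    // Sketch only: the mixed ADD_OR_UPDATE/DELETE request is replaced by two
    // separate controller-side operations.
    interface ControllerFeatureOperations {
        // Finalizes (adds or updates) the cluster-wide maximum versions for the
        // given features; returns the new epoch of the '/features' metadata.
        long updateFeatures(Map<String, Long> finalizedMaxVersions, boolean forceDowngrade);

        // Removes the finalized entries for the given feature names.
        long deleteFeatures(Set<String> featureNames);
    }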

> 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> version of the metadata for finalized features." I am wondering why the
> ordering is important?

(Kowshik): In the latest KIP write-up, it is called epoch (instead of version),
and it is just the ZK node version. Basically, this is the epoch for the
cluster-wide finalized feature version metadata. This metadata is served to
clients via the ApiVersionsResponse (for reads). We propagate updates from the
'/features' ZK node to all brokers, via ZK watches set up by each broker on the
'/features' node.

Now here is why the ordering is important:
ZK watches don't propagate at the same time. As a result, the
ApiVersionsResponse is eventually consistent across brokers. This can introduce
cases where clients see an older lower epoch of the features metadata, after a
more recent higher epoch was returned at a previous point in time. We expect
clients to always employ the rule that the latest received higher epoch of
metadata always trumps an older smaller epoch. Those clients that are external
to Kafka should strongly consider discovering the latest metadata once during
startup from the brokers, and if required refresh the metadata periodically (to
get the latest metadata).
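
To make the rule concrete, here is a minimal client-side sketch (class and
field names are just illustrative, not part of the KIP):

    import java.util.Collections;
    import java.util.Map;

    // A cached copy of the finalized features metadata is replaced only when a
    // newly received copy carries a strictly higher epoch; anything with a lower
    // or equal epoch is treated as stale and ignored.
    final class FinalizedFeaturesCache {
        private long epoch = -1L;
        private Map<String, Long> finalizedMaxVersions = Collections.emptyMap();

        // Called whenever a broker returns finalized features (e.g. via ApiVersionsResponse).
        synchronized void maybeUpdate(long receivedEpoch, Map<String, Long> received) {
            if (receivedEpoch > epoch) {      // newer metadata trumps older metadata
                epoch = receivedEpoch;
                finalizedMaxVersions = Map.copyOf(received);
            }
        }

        synchronized long currentEpoch() { return epoch; }
        synchronized Map<String, Long> current() { return finalizedMaxVersions; }
    }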

> 100.6 Could you specify the required ACL for this new request?

(Kowshik): What is ACL, and how could I find out which one to specify?
Please could you provide me some pointers? I'll be glad to update the
KIP once I know the next steps.

> 101. For the broker registration ZK node, should we bump up the version in
the json?

(Kowshik): Great point! Done. I've increased the version in the broker json
by 1.

> 102. For the /features ZK node, not sure if we need the epoch field. Each
> ZK node has an internal version field that is incremented on every update.

(Kowshik): Great point! Done. I'm using the ZK node version now, instead of an
explicitly incremented epoch.
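
In other words (sketch only, using the raw ZooKeeper client; payload parsing is
omitted), the epoch is simply whatever ZK reports as the node's own version:

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Reads the '/features' znode and uses the znode's built-in version
    // (Stat.getVersion()) as the finalized features epoch, instead of keeping a
    // separately incremented epoch field inside the payload.
    final class FeaturesZNodeReader {
        static final String FEATURES_PATH = "/features";

        static long readEpoch(ZooKeeper zk) throws KeeperException, InterruptedException {
            Stat stat = new Stat();
            byte[] payload = zk.getData(FEATURES_PATH, false, stat); // payload deserialization omitted
            return stat.getVersion(); // bumped by ZK on every update to the node
        }
    }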

> 103. "Enabling the actual semantics of a feature version cluster-wide is
> left to the discretion of the logic implementing the feature (ex: can be
> done via dynamic broker config)." Does that mean the broker registration
ZK
> node will be updated dynamically when this happens?

(Kowshik): Not really. The text was just conveying that a broker could "know"
of a new feature version, but it does not mean the broker should have also
activated the effects of the feature version. Knowing vs activation are 2
separate things, and the latter can be achieved by dynamic config. I have
reworded the text to make this clear to the reader.
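
As a toy example of that separation (names here are just placeholders, not
anything defined by the KIP):

    // A broker can "know" a finalized feature version (it is advertised
    // cluster-wide), yet keep the feature's behaviour switched off until some
    // feature-specific mechanism, e.g. a dynamic broker config, activates it.
    final class FeatureGate {
        private final long requiredMaxVersion;

        FeatureGate(long requiredMaxVersion) {
            this.requiredMaxVersion = requiredMaxVersion;
        }

        boolean isActive(long finalizedMaxVersion, boolean dynamicConfigEnabled) {
            boolean known = finalizedMaxVersion >= requiredMaxVersion; // knowing
            return known && dynamicConfigEnabled;                      // activation
        }
    }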


> 104. UpdateMetadataRequest
> 104.1 It would be useful to describe when the feature metadata is included
> in the request. My understanding is that it's only included if (1) there
is
> a change to the finalized feature; (2) broker restart; (3) controller
> failover.
> 104.2 The new fields have the following versions. Why are the versions 3+
> when the top version is bumped to 6?
>       "fields":  [
>         {"name": "Name", "type":  "string", "versions":  "3+",
>           "about": "The name of the feature."},
>         {"name":  "Version", "type":  "int64", "versions":  "3+",
>           "about": "The finalized version for the feature."}
>       ]

(Kowshik): With the new improved design, we have completely eliminated the need
to use UpdateMetadataRequest. This is because we now rely on ZK to deliver the
notifications for changes to the '/features' ZK node.
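
The broker-side propagation path is roughly the following (sketch only, error
handling omitted; ZK watches are one-shot, so the watch is re-registered after
every notification):

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Each broker keeps a watch on the '/features' node; on every change it
    // re-reads the node contents and re-arms the watch.
    final class FeaturesWatcher implements Watcher {
        private final ZooKeeper zk;

        FeaturesWatcher(ZooKeeper zk) { this.zk = zk; }

        void register() throws Exception {
            Stat stat = new Stat();
            byte[] data = zk.getData("/features", this, stat); // sets the watch
            // ... update the local finalized features cache from 'data' and stat.getVersion() ...
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDataChanged) {
                try {
                    register(); // re-read the node and re-arm the watch
                } catch (Exception e) {
                    // a real broker would log and retry here
                }
            }
        }
    }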

> 105. kafka-features.sh: Instead of using update/delete, perhaps it's
better
> to use enable/disable?

(Kowshik): For delete, yes, I have changed it so that we instead call it
'disable'. However, for 'update', it can now also refer to either an upgrade or
a forced downgrade. Therefore, I have left it the way it is, just calling it
'update'.


Cheers,
Kowshik

On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Kowshik,
>
> Thanks for the KIP. Looks good overall. A few comments below.
>
> 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> 100.1 Since this request waits for responses from brokers, should we add a
> timeout in the request (like createTopicRequest)?
> 100.2 The response schema is a bit weird. Typically, the response just
> shows an error code and an error message, instead of echoing the request.
> 100.3 Should we add a separate request to list/describe the existing
> features?
> 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> DELETE, the version field doesn't make sense. So, I guess the broker just
> ignores this? An alternative way is to have a separate
> DeleteFeaturesRequest
> 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> version of the metadata for finalized features." I am wondering why the
> ordering is important?
> 100.6 Could you specify the required ACL for this new request?
>
> 101. For the broker registration ZK node, should we bump up the version in
> the json?
>
> 102. For the /features ZK node, not sure if we need the epoch field. Each
> ZK node has an internal version field that is incremented on every update.
>
> 103. "Enabling the actual semantics of a feature version cluster-wide is
> left to the discretion of the logic implementing the feature (ex: can be
> done via dynamic broker config)." Does that mean the broker registration ZK
> node will be updated dynamically when this happens?
>
> 104. UpdateMetadataRequest
> 104.1 It would be useful to describe when the feature metadata is included
> in the request. My understanding is that it's only included if (1) there is
> a change to the finalized feature; (2) broker restart; (3) controller
> failover.
> 104.2 The new fields have the following versions. Why are the versions 3+
> when the top version is bumped to 6?
>       "fields":  [
>         {"name": "Name", "type":  "string", "versions":  "3+",
>           "about": "The name of the feature."},
>         {"name":  "Version", "type":  "int64", "versions":  "3+",
>           "about": "The finalized version for the feature."}
>       ]
>
> 105. kafka-features.sh: Instead of using update/delete, perhaps it's better
> to use enable/disable?
>
> Jun
>
> On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hey Boyang,
> >
> > Thanks for the great feedback! I have updated the KIP based on your
> > feedback.
> > Please find my response below for your comments, look for sentences
> > starting
> > with "(Kowshik)" below.
> >
> >
> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> could
> > be
> > > converted as "When is it safe for the brokers to start serving new
> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> > > context.
> >
> > (Kowshik): Great point! Done.
> >
> > > 2. In the *Explanation *section, the metadata version number part
> seems a
> > > bit blurred. Could you point a reference to later section that we going
> > to
> > > store it in Zookeeper and update it every time when there is a feature
> > > change?
> >
> > (Kowshik): Great point! Done. I've added a reference in the KIP.
> >
> >
> > > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
> > > features such as group coordinator semantics, there is no legal
> scenario
> > to
> > > perform a downgrade at all. So having downgrade door open is pretty
> > > error-prone as human faults happen all the time. I'm assuming as new
> > > features are implemented, it's not very hard to add a flag during
> feature
> > > creation to indicate whether this feature is "downgradable". Could you
> > > explain a bit more on the extra engineering effort for shipping this
> KIP
> > > with downgrade protection in place?
> >
> > (Kowshik): Great point! I'd agree and disagree here. While I agree that
> > accidental
> > downgrades can cause problems, I also think sometimes downgrades should
> > be allowed for emergency reasons (not all downgrades cause issues).
> > It is just subjective to the feature being downgraded.
> >
> > To be more strict about feature version downgrades, I have modified the
> KIP
> > proposing that we mandate a `--force-downgrade` flag be used in the
> > UPDATE_FEATURES api
> > and the tooling, whenever the human is downgrading a finalized feature
> > version.
> > Hopefully this should cover the requirement, until we find the need for
> > advanced downgrade support.
> >
> > > 4. "Each broker’s supported dictionary of feature versions will be
> > defined
> > > in the broker code." So this means in order to restrict a certain
> > feature,
> > > we need to start the broker first and then send a feature gating
> request
> > > immediately, which introduces a time gap and the intended-to-close
> > feature
> > > could actually serve request during this phase. Do you think we should
> > also
> > > support configurations as well so that admin user could freely roll up
> a
> > > cluster with all nodes complying the same feature gating, without
> > worrying
> > > about the turnaround time to propagate the message only after the
> cluster
> > > starts up?
> >
> > (Kowshik): This is a great point/question. One of the expectations out of
> > this KIP, which is
> > already followed in the broker, is the following.
> >  - Imagine at time T1 the broker starts up and registers it’s presence in
> > ZK,
> >    along with advertising it’s supported features.
> >  - Imagine at a future time T2 the broker receives the
> > UpdateMetadataRequest
> >    from the controller, which contains the latest finalized features as
> > seen by
> >    the controller. The broker validates this data against it’s supported
> > features to
> >    make sure there is no mismatch (it will shutdown if there is an
> > incompatibility).
> >
> > It is expected that during the time between the 2 events T1 and T2, the
> > broker is
> > almost a silent entity in the cluster. It does not add any value to the
> > cluster, or carry
> > out any important broker activities. By “important”, I mean it is not
> doing
> > mutations
> > on it’s persistence, not mutating critical in-memory state, won’t be
> > serving
> > produce/fetch requests. Note it doesn’t even know it’s assigned
> partitions
> > until
> > it receives UpdateMetadataRequest from controller. Anything the broker is
> > doing up
> > until this point is not damaging/useful.
> >
> > I’ve clarified the above in the KIP, see this new section:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > .
> >
> > > 5. "adding a new Feature, updating or deleting an existing Feature",
> may
> > be
> > > I misunderstood something, I thought the features are defined in broker
> > > code, so admin could not really create a new feature?
> >
> > (Kowshik): Great point! You understood this right. Here adding a feature
> > means we are
> > adding a cluster-wide finalized *max* version for a feature that was
> > previously never finalized.
> > I have clarified this in the KIP now.
> >
> > > 6. I think we need a separate error code like
> FEATURE_UPDATE_IN_PROGRESS
> > to
> > > reject a concurrent feature update request.
> >
> > (Kowshik): Great point! I have modified the KIP adding the above (see
> > 'Tooling support -> Admin API changes').
> >
> > > 7. I think we haven't discussed the alternative solution to pass the
> > > feature information through Zookeeper. Is that mentioned in the KIP to
> > > justify why using UpdateMetadata is more favorable?
> >
> > (Kowshik): Nice question! The broker reads finalized feature info stored
> in
> > ZK,
> > only during startup when it does a validation. When serving
> > `ApiVersionsRequest`, the
> > broker does not read this info from ZK directly. I'd imagine the risk is
> > that it can increase
> > the ZK read QPS which can be a bottleneck for the system. Today, in Kafka
> > we use the
> > controller to fan out ZK updates to brokers and we want to stick to that
> > pattern to avoid
> > the ZK read bottleneck when serving `ApiVersionsRequest`.
> >
> > > 8. I was under the impression that user could configure a range of
> > > supported versions, what's the trade-off for allowing single finalized
> > > version only?
> >
> > (Kowshik): Great question! The finalized version of a feature basically
> > refers to
> > the cluster-wide finalized feature "maximum" version. For example, if the
> > 'group_coordinator' feature
> > has the finalized version set to 10, then, it means that cluster-wide all
> > versions upto v10 are
> > supported for this feature. However, note that if some version (ex: v0)
> > gets deprecated
> > for this feature, then we don’t convey that using this scheme (also
> > supporting deprecation is a non-goal).
> >
> > (Kowshik): I’ve now modified the KIP at all points, refering to finalized
> > feature "maximum" versions.
> >
> > > 9. One minor syntax fix: Note that here the "client" here may be a
> > producer
> >
> > (Kowshik): Great point! Done.
> >
> >
> > Cheers,
> > Kowshik
> >
> >
> > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <re...@gmail.com>
> > wrote:
> >
> > > Hey Kowshik,
> > >
> > > thanks for the revised KIP. Got a couple of questions:
> > >
> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> could
> > be
> > > converted as "When is it safe for the brokers to start serving new
> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> > > context.
> > >
> > > 2. In the *Explanation *section, the metadata version number part
> seems a
> > > bit blurred. Could you point a reference to later section that we going
> > to
> > > store it in Zookeeper and update it every time when there is a feature
> > > change?
> > >
> > > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
> > > features such as group coordinator semantics, there is no legal
> scenario
> > to
> > > perform a downgrade at all. So having downgrade door open is pretty
> > > error-prone as human faults happen all the time. I'm assuming as new
> > > features are implemented, it's not very hard to add a flag during
> feature
> > > creation to indicate whether this feature is "downgradable". Could you
> > > explain a bit more on the extra engineering effort for shipping this
> KIP
> > > with downgrade protection in place?
> > >
> > > 4. "Each broker’s supported dictionary of feature versions will be
> > defined
> > > in the broker code." So this means in order to restrict a certain
> > feature,
> > > we need to start the broker first and then send a feature gating
> request
> > > immediately, which introduces a time gap and the intended-to-close
> > feature
> > > could actually serve request during this phase. Do you think we should
> > also
> > > support configurations as well so that admin user could freely roll up
> a
> > > cluster with all nodes complying the same feature gating, without
> > worrying
> > > about the turnaround time to propagate the message only after the
> cluster
> > > starts up?
> > >
> > > 5. "adding a new Feature, updating or deleting an existing Feature",
> may
> > be
> > > I misunderstood something, I thought the features are defined in broker
> > > code, so admin could not really create a new feature?
> > >
> > > 6. I think we need a separate error code like
> FEATURE_UPDATE_IN_PROGRESS
> > to
> > > reject a concurrent feature update request.
> > >
> > > 7. I think we haven't discussed the alternative solution to pass the
> > > feature information through Zookeeper. Is that mentioned in the KIP to
> > > justify why using UpdateMetadata is more favorable?
> > >
> > > 8. I was under the impression that user could configure a range of
> > > supported versions, what's the trade-off for allowing single finalized
> > > version only?
> > >
> > > 9. One minor syntax fix: Note that here the "client" here may be a
> > producer
> > >
> > > Boyang
> > >
> > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
> wrote:
> > >
> > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > Hi Colin,
> > > > >
> > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > suggestions.
> > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > .
> > > > >
> > > > > 1. '__data_version__' is the version of the finalized feature
> > metadata
> > > > > (i.e. actual ZK node contents), while the '__schema_version__' is
> the
> > > > > version of the schema of the data persisted in ZK. These serve
> > > different
> > > > > purposes. '__data_version__' is is useful mainly to clients during
> > > reads,
> > > > > to differentiate between the 2 versions of eventually consistent
> > > > 'finalized
> > > > > features' metadata (i.e. larger metadata version is more recent).
> > > > > '__schema_version__' provides an additional degree of flexibility,
> > > where
> > > > if
> > > > > we decide to change the schema for '/features' node in ZK (in the
> > > > future),
> > > > > then we can manage broker roll outs suitably (i.e.
> > > > > serialization/deserialization of the ZK data can be handled
> safely).
> > > >
> > > > Hi Kowshik,
> > > >
> > > > If you're talking about a number that lets you know if data is more
> or
> > > > less recent, we would typically call that an epoch, and not a
> version.
> > > For
> > > > the ZK data structures, the word "version" is typically reserved for
> > > > describing changes to the overall schema of the data that is written
> to
> > > > ZooKeeper.  We don't even really change the "version" of those
> schemas
> > > that
> > > > much, since most changes are backwards-compatible.  But we do include
> > > that
> > > > version field just in case.
> > > >
> > > > I don't think we really need an epoch here, though, since we can just
> > > look
> > > > at the broker epoch.  Whenever the broker registers, its epoch will
> be
> > > > greater than the previous broker epoch.  And the newly registered
> data
> > > will
> > > > take priority.  This will be a lot simpler than adding a separate
> epoch
> > > > system, I think.
> > > >
> > > > >
> > > > > 2. Regarding admin client needing min and max information - you are
> > > > right!
> > > > > I've changed the KIP such that the Admin API also allows the user
> to
> > > read
> > > > > 'supported features' from a specific broker. Please look at the
> > section
> > > > > "Admin API changes".
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate.
> > I've
> > > > > improved the KIP to just use `long` at all places.
> > > >
> > > > Sounds good.
> > > >
> > > > >
> > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right! I've
> > > > updated
> > > > > the KIP sketching the functionality provided by this tool, with
> some
> > > > > examples. Please look at the section "Tooling support examples".
> > > > >
> > > > > Thank you!
> > > >
> > > >
> > > > Thanks, Kowshik.
> > > >
> > > > cheers,
> > > > Colin
> > > >
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cm...@apache.org>
> > > > wrote:
> > > > >
> > > > > > Thanks, Kowshik, this looks good.
> > > > > >
> > > > > > In the "Schema" section, do we really need both
> __schema_version__
> > > and
> > > > > > __data_version__?  Can we just have a single version field here?
> > > > > >
> > > > > > Shouldn't the Admin(Client) function have some way to get the min
> > and
> > > > max
> > > > > > information that we're exposing as well?  I guess we could have
> > min,
> > > > max,
> > > > > > and current.  Unrelated: is the use of Long rather than long
> > > deliberate
> > > > > > here?
> > > > > >
> > > > > > It would be good to describe how the command line tool
> > > > > > kafka.admin.FeatureCommand will work.  For example the flags that
> > it
> > > > will
> > > > > > take and the output that it will generate to STDOUT.
> > > > > >
> > > > > > cheers,
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I've opened KIP-584
> <https://issues.apache.org/jira/browse/KIP-584> <
> > https://issues.apache.org/jira/browse/KIP-584
> > > >
> > > > > > > which
> > > > > > > is intended to provide a versioning scheme for features. I'd
> like
> > > to
> > > > use
> > > > > > > this thread to discuss the same. I'd appreciate any feedback on
> > > this.
> > > > > > > Here
> > > > > > > is a link to KIP-584
> <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > >  .
> > > > > > >
> > > > > > > Thank you!
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hey Boyang,

Great point! You are right, thanks for the suggestion!
Yes, we can just use ZK watches to propagate the finalized features
information. I have updated the KIP write-up with this change.
As a result, I feel the design is simpler, as we have also eliminated
the changes to UpdateMetadataRequest.

You are right, after exploring/discussing KIP-500 further, we have now
realized that taking a ZK dependency here in this KIP just for reads is OK.
The future migration path off ZK (in the post-ZK world) will simply involve
reading the finalized features from the controller quorum via the new
MetadataFetch API that's proposed in KIP-500.

Also note that in the latest KIP write-up, the features metadata epoch
is just the ZK node version (as suggested by Jun).

Hey Colin,

Please feel free to let us know if you have any questions or concerns
on the above.


Cheers,
Kowshik

On Thu, Apr 2, 2020 at 10:39 AM Boyang Chen <re...@gmail.com>
wrote:

> Thanks for the reply. The only remaining question is the propagation path.
> KIP-500 <https://issues.apache.org/jira/browse/KIP-500> only restricts
> `write access` to the controller, in a sense that
> brokers in the pre-KIP-500 <https://issues.apache.org/jira/browse/KIP-500>
> world could still listen to Zookeeper
> notifications. Thus, we are open to discuss the engineering effort to go
> through Zookeeper vs UpdateMetadata routing. What's your opinion on this
> matter? Will either path significantly simpler than another?
>
> Boyang
>
> On Wed, Apr 1, 2020 at 12:10 AM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hey Boyang,
> >
> > Thanks for the feedback! Please find below my response to your latest
> > comments.
> > I have modified the KIP wherever possible to address the comments.
> >
> > > My point is that during a bootstrapping stage of a cluster, we could
> not
> > > pick the desired feature version as no controller is actively handling
> > our
> > > request.
> >
> > (Kowshik): Note that just deploying the latest broker binary does not
> > always mean that the
> > new version of a certain feature will be automatically activated.
> Enabling
> > the effects of the
> > actual feature version is still left to the discretion of the
> > implementation logic for  the feature.
> > For example, for safety reasons, the feature can still be gated behind a
> > dynamic config
> > and later activated when the time comes.
> >
> > > Feature changes should be roughly the same frequency as config changes.
> > > Today, the dynamic configuration changes are propagated via Zookeeper.
> > > So I guess propagating through UpdateMetadata doesn't get us more
> > benefits,
> > > while going through ZK notification should be a simpler solution.
> >
> > (Kowshik): Maybe I'm missing something, but were you suggesting we should
> > have these
> > notifications delivered to the brokers directly via ZK? Note that with
> > KIP-500 <https://issues.apache.org/jira/browse/KIP-500> (where we are
> replacing ZK),
> > for the bridge release we prefer that we will perform all access to ZK in
> > the controller,
> > rather than in other brokers, clients, or tools. Therefore, although ZK
> > will still be
> > required for the bridge release, it will be a well-isolated dependency.
> > Please read
> > this section of KIP-500 <https://issues.apache.org/jira/browse/KIP-500>:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum#KIP-500:ReplaceZooKeeperwithaSelf-ManagedMetadataQuorum-BridgeRelease
> > .
> >
> > Therefore, the existing approach in the KIP is future proof with regards
> to
> > the above requirement.
> > We deliver the ZK notification only via the controller's
> > `UpdateMetadataRequest` to the brokers.
> > We also access ZK only always via the controller.
> >
> > > Understood, I don't feel strong about deprecation, but does the current
> > KIP
> > > keep the door open for future improvements if
> > > someone has a need for feature deprecation? Could we briefly discuss
> > about
> > > it in the future work section?
> >
> > (Kowshik): Done. Please refer to the 'Future work' section:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Futurework
> >
> >
> > Cheers,
> > Kowshik
> >
> >
> > On Tue, Mar 31, 2020 at 9:12 PM Boyang Chen <re...@gmail.com>
> > wrote:
> >
> > > Thanks Kowshik, the answers are making sense. Some follow-ups:
> > >
> > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Kowshik,
> > > >
> > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > >
> > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > 100.1 Since this request waits for responses from brokers, should we
> > add
> > > a
> > > > timeout in the request (like createTopicRequest)?
> > > > 100.2 The response schema is a bit weird. Typically, the response
> just
> > > > shows an error code and an error message, instead of echoing the
> > request.
> > > > 100.3 Should we add a separate request to list/describe the existing
> > > > features?
> > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > > DELETE, the version field doesn't make sense. So, I guess the broker
> > just
> > > > ignores this? An alternative way is to have a separate
> > > > DeleteFeaturesRequest
> > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> increasing
> > > > version of the metadata for finalized features." I am wondering why
> the
> > > > ordering is important?
> > > > 100.6 Could you specify the required ACL for this new request?
> > > >
> > > > 101. For the broker registration ZK node, should we bump up the
> version
> > > in
> > > > the json?
> > > >
> > > > 102. For the /features ZK node, not sure if we need the epoch field.
> > Each
> > > > ZK node has an internal version field that is incremented on every
> > > update.
> > > >
> > > > 103. "Enabling the actual semantics of a feature version cluster-wide
> > is
> > > > left to the discretion of the logic implementing the feature (ex: can
> > be
> > > > done via dynamic broker config)." Does that mean the broker
> > registration
> > > ZK
> > > > node will be updated dynamically when this happens?
> > > >
> > > > 104. UpdateMetadataRequest
> > > > 104.1 It would be useful to describe when the feature metadata is
> > > included
> > > > in the request. My understanding is that it's only included if (1)
> > there
> > > is
> > > > a change to the finalized feature; (2) broker restart; (3) controller
> > > > failover.
> > > > 104.2 The new fields have the following versions. Why are the
> versions
> > 3+
> > > > when the top version is bumped to 6?
> > > >       "fields":  [
> > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > >           "about": "The name of the feature."},
> > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > >           "about": "The finalized version for the feature."}
> > > >       ]
> > > >
> > > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > > better
> > > > to use enable/disable?
> > > >
> > > > Jun
> > > >
> > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > kprakasam@confluent.io
> > > >
> > > > wrote:
> > > >
> > > > > Hey Boyang,
> > > > >
> > > > > Thanks for the great feedback! I have updated the KIP based on your
> > > > > feedback.
> > > > > Please find my response below for your comments, look for sentences
> > > > > starting
> > > > > with "(Kowshik)" below.
> > > > >
> > > > >
> > > > > > 1. "When is it safe for the brokers to begin handling EOS
> traffic"
> > > > could
> > > > > be
> > > > > > converted as "When is it safe for the brokers to start serving
> new
> > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> in
> > > the
> > > > > > context.
> > > > >
> > > > > (Kowshik): Great point! Done.
> > > > >
> > >
> > > > > 2. In the *Explanation *section, the metadata version number part
> > seems
> > > > a
> > > > > > bit blurred. Could you point a reference to later section that we
> > > going
> > > > > to
> > > > > > store it in Zookeeper and update it every time when there is a
> > > feature
> > > > > > change?
> > > > >
> > > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > > > >
> > > > >
> > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> KIP,
> > > for
> > > > > > features such as group coordinator semantics, there is no legal
> > > > scenario
> > > > > to
> > > > > > perform a downgrade at all. So having downgrade door open is
> pretty
> > > > > > error-prone as human faults happen all the time. I'm assuming as
> > new
> > > > > > features are implemented, it's not very hard to add a flag during
> > > > feature
> > > > > > creation to indicate whether this feature is "downgradable".
> Could
> > > you
> > > > > > explain a bit more on the extra engineering effort for shipping
> > this
> > > > KIP
> > > > > > with downgrade protection in place?
> > > > >
> > > > > (Kowshik): Great point! I'd agree and disagree here. While I agree
> > that
> > > > > accidental
> > > > > downgrades can cause problems, I also think sometimes downgrades
> > should
> > > > > be allowed for emergency reasons (not all downgrades cause issues).
> > > > > It is just subjective to the feature being downgraded.
> > > > >
> > > > > To be more strict about feature version downgrades, I have modified
> > the
> > > > KIP
> > > > > proposing that we mandate a `--force-downgrade` flag be used in the
> > > > > UPDATE_FEATURES api
> > > > > and the tooling, whenever the human is downgrading a finalized
> > feature
> > > > > version.
> > > > > Hopefully this should cover the requirement, until we find the need
> > for
> > > > > advanced downgrade support.
> > > > >
> > > >
> > > +1 for adding this flag.
> > >
> > > > > > 4. "Each broker’s supported dictionary of feature versions will
> be
> > > > > defined
> > > > > > in the broker code." So this means in order to restrict a certain
> > > > > feature,
> > > > > > we need to start the broker first and then send a feature gating
> > > > request
> > > > > > immediately, which introduces a time gap and the
> intended-to-close
> > > > > feature
> > > > > > could actually serve request during this phase. Do you think we
> > > should
> > > > > also
> > > > > > support configurations as well so that admin user could freely
> roll
> > > up
> > > > a
> > > > > > cluster with all nodes complying the same feature gating, without
> > > > > worrying
> > > > > > about the turnaround time to propagate the message only after the
> > > > cluster
> > > > > > starts up?
> > > > >
> > > > > (Kowshik): This is a great point/question. One of the expectations
> > out
> > > of
> > > > > this KIP, which is
> > > > > already followed in the broker, is the following.
> > > > >  - Imagine at time T1 the broker starts up and registers it’s
> > presence
> > > in
> > > > > ZK,
> > > > >    along with advertising it’s supported features.
> > > > >  - Imagine at a future time T2 the broker receives the
> > > > > UpdateMetadataRequest
> > > > >    from the controller, which contains the latest finalized
> features
> > as
> > > > > seen by
> > > > >    the controller. The broker validates this data against it’s
> > > supported
> > > > > features to
> > > > >    make sure there is no mismatch (it will shutdown if there is an
> > > > > incompatibility).
> > > > >
> > > > > It is expected that during the time between the 2 events T1 and T2,
> > the
> > > > > broker is
> > > > > almost a silent entity in the cluster. It does not add any value to
> > the
> > > > > cluster, or carry
> > > > > out any important broker activities. By “important”, I mean it is
> not
> > > > doing
> > > > > mutations
> > > > > on it’s persistence, not mutating critical in-memory state, won’t
> be
> > > > > serving
> > > > > produce/fetch requests. Note it doesn’t even know it’s assigned
> > > > partitions
> > > > > until
> > > > > it receives UpdateMetadataRequest from controller. Anything the
> > broker
> > > is
> > > > > doing up
> > > > > until this point is not damaging/useful.
> > > > >
> > > > > I’ve clarified the above in the KIP, see this new section:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > .
> > > >
> > > > My point is that during a bootstrapping stage of a cluster, we could
> > not
> > > pick the desired feature version as no controller is actively handling
> > our
> > > request. But anyway, I think this is a rare case to discuss, and the
> > added
> > > paragraph looks good :)
> > >
> > >
> > > > > > 5. "adding a new Feature, updating or deleting an existing
> > Feature",
> > > > may
> > > > > be
> > > > > > I misunderstood something, I thought the features are defined in
> > > broker
> > > > > > code, so admin could not really create a new feature?
> > > > >
> > > > > (Kowshik): Great point! You understood this right. Here adding a
> > > feature
> > > > > means we are
> > > > > adding a cluster-wide finalized *max* version for a feature that
> was
> > > > > previously never finalized.
> > > > > I have clarified this in the KIP now.
> > > > >
> > > > > > 6. I think we need a separate error code like
> > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > to
> > > > > > reject a concurrent feature update request.
> > > > >
> > > > > (Kowshik): Great point! I have modified the KIP adding the above
> (see
> > > > > 'Tooling support -> Admin API changes').
> > > > >
> > > > > > 7. I think we haven't discussed the alternative solution to pass
> > the
> > > > > > feature information through Zookeeper. Is that mentioned in the
> KIP
> > > to
> > > > > > justify why using UpdateMetadata is more favorable?
> > > > >
> > > > > (Kowshik): Nice question! The broker reads finalized feature info
> > > stored
> > > > in
> > > > > ZK,
> > > > > only during startup when it does a validation. When serving
> > > > > `ApiVersionsRequest`, the
> > > > > broker does not read this info from ZK directly. I'd imagine the
> risk
> > > is
> > > > > that it can increase
> > > > > the ZK read QPS which can be a bottleneck for the system. Today, in
> > > Kafka
> > > > > we use the
> > > > > controller to fan out ZK updates to brokers and we want to stick to
> > > that
> > > > > pattern to avoid
> > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > >
> > > > Feature changes should be roughly the same frequency as config
> changes.
> > > Today, the dynamic configuration
> > > changes are propagated via Zookeeper. So I guess propagating through
> > > UpdateMetadata doesn't get us more benefits,
> > > while going through ZK notification should be a simpler solution.
> > >
> > > > > 8. I was under the impression that user could configure a range of
> > > > > > supported versions, what's the trade-off for allowing single
> > > finalized
> > > > > > version only?
> > > > >
> > > > > (Kowshik): Great question! The finalized version of a feature
> > basically
> > > > > refers to
> > > > > the cluster-wide finalized feature "maximum" version. For example,
> if
> > > the
> > > > > 'group_coordinator' feature
> > > > > has the finalized version set to 10, then, it means that
> cluster-wide
> > > all
> > > > > versions up to v10 are
> > > > > supported for this feature. However, note that if some version (ex:
> > v0)
> > > > > gets deprecated
> > > > > for this feature, then we don’t convey that using this scheme (also
> > > > > supporting deprecation is a non-goal).
> > > > >
> > > > > (Kowshik): I’ve now modified the KIP at all points, referring to
> > > finalized
> > > > > feature "maximum" versions.
> > > > >
> > > >
> > > Understood, I don't feel strong about deprecation, but does the current
> > KIP
> > > keep the door open for future improvements if
> > > someone has a need for feature deprecation? Could we briefly discuss
> > about
> > > it in the future work section?
> > >
> > >
> > > > > > 9. One minor syntax fix: Note that here the "client" here may be
> a
> > > > > producer
> > > > >
> > > > > (Kowshik): Great point! Done.
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > >
> > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > reluctanthero104@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hey Kowshik,
> > > > > >
> > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > >
> > > > > > 1. "When is it safe for the brokers to begin handling EOS
> traffic"
> > > > could
> > > > > be
> > > > > > converted as "When is it safe for the brokers to start serving
> new
> > > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier
> in
> > > the
> > > > > > context.
> > > > > >
> > > > > > 2. In the *Explanation *section, the metadata version number part
> > > > seems a
> > > > > > bit blurred. Could you point a reference to later section that we
> > > going
> > > > > to
> > > > > > store it in Zookeeper and update it every time when there is a
> > > feature
> > > > > > change?
> > > > > >
> > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> KIP,
> > > for
> > > > > > features such as group coordinator semantics, there is no legal
> > > > scenario
> > > > > to
> > > > > > perform a downgrade at all. So having downgrade door open is
> pretty
> > > > > > error-prone as human faults happen all the time. I'm assuming as
> > new
> > > > > > features are implemented, it's not very hard to add a flag during
> > > > feature
> > > > > > creation to indicate whether this feature is "downgradable".
> Could
> > > you
> > > > > > explain a bit more on the extra engineering effort for shipping
> > this
> > > > KIP
> > > > > > with downgrade protection in place?
> > > > > >
> > > > > > 4. "Each broker’s supported dictionary of feature versions will
> be
> > > > > defined
> > > > > > in the broker code." So this means in order to restrict a certain
> > > > > feature,
> > > > > > we need to start the broker first and then send a feature gating
> > > > request
> > > > > > immediately, which introduces a time gap and the
> intended-to-close
> > > > > feature
> > > > > > could actually serve request during this phase. Do you think we
> > > should
> > > > > also
> > > > > > support configurations as well so that admin user could freely
> roll
> > > up
> > > > a
> > > > > > cluster with all nodes complying the same feature gating, without
> > > > > worrying
> > > > > > about the turnaround time to propagate the message only after the
> > > > cluster
> > > > > > starts up?
> > > > > >
> > > > > > 5. "adding a new Feature, updating or deleting an existing
> > Feature",
> > > > may
> > > > > be
> > > > > > I misunderstood something, I thought the features are defined in
> > > broker
> > > > > > code, so admin could not really create a new feature?
> > > > > >
> > > > > > 6. I think we need a separate error code like
> > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > to
> > > > > > reject a concurrent feature update request.
> > > > > >
> > > > > > 7. I think we haven't discussed the alternative solution to pass
> > the
> > > > > > feature information through Zookeeper. Is that mentioned in the
> KIP
> > > to
> > > > > > justify why using UpdateMetadata is more favorable?
> > > > > >
> > > > > > 8. I was under the impression that user could configure a range
> of
> > > > > > supported versions, what's the trade-off for allowing single
> > > finalized
> > > > > > version only?
> > > > > >
> > > > > > 9. One minor syntax fix: Note that here the "client" here may be
> a
> > > > > producer
> > > > > >
> > > > > > Boyang
> > > > > >
> > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmccabe@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > > Hi Colin,
> > > > > > > >
> > > > > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > > > > suggestions.
> > > > > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > .
> > > > > > > >
> > > > > > > > 1. '__data_version__' is the version of the finalized feature
> > > > > metadata
> > > > > > > > (i.e. actual ZK node contents), while the
> '__schema_version__'
> > is
> > > > the
> > > > > > > > version of the schema of the data persisted in ZK. These
> serve
> > > > > > different
> > > > > > > > purposes. '__data_version__' is useful mainly to clients
> > > during
> > > > > > reads,
> > > > > > > > to differentiate between the 2 versions of eventually
> > consistent
> > > > > > > 'finalized
> > > > > > > > features' metadata (i.e. larger metadata version is more
> > recent).
> > > > > > > > '__schema_version__' provides an additional degree of
> > > flexibility,
> > > > > > where
> > > > > > > if
> > > > > > > > we decide to change the schema for '/features' node in ZK (in
> > the
> > > > > > > future),
> > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > serialization/deserialization of the ZK data can be handled
> > > > safely).
> > > > > > >
> > > > > > > Hi Kowshik,
> > > > > > >
> > > > > > > If you're talking about a number that lets you know if data is
> > more
> > > > or
> > > > > > > less recent, we would typically call that an epoch, and not a
> > > > version.
> > > > > > For
> > > > > > > the ZK data structures, the word "version" is typically
> reserved
> > > for
> > > > > > > describing changes to the overall schema of the data that is
> > > written
> > > > to
> > > > > > > ZooKeeper.  We don't even really change the "version" of those
> > > > schemas
> > > > > > that
> > > > > > > much, since most changes are backwards-compatible.  But we do
> > > include
> > > > > > that
> > > > > > > version field just in case.
> > > > > > >
> > > > > > > I don't think we really need an epoch here, though, since we
> can
> > > just
> > > > > > look
> > > > > > > at the broker epoch.  Whenever the broker registers, its epoch
> > will
> > > > be
> > > > > > > greater than the previous broker epoch.  And the newly
> registered
> > > > data
> > > > > > will
> > > > > > > take priority.  This will be a lot simpler than adding a
> separate
> > > > epoch
> > > > > > > system, I think.
> > > > > > >
> > > > > > > >
> > > > > > > > 2. Regarding admin client needing min and max information -
> you
> > > are
> > > > > > > right!
> > > > > > > > I've changed the KIP such that the Admin API also allows the
> > user
> > > > to
> > > > > > read
> > > > > > > > 'supported features' from a specific broker. Please look at
> the
> > > > > section
> > > > > > > > "Admin API changes".
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > >
> > > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > deliberate.
> > > > > I've
> > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > >
> > > > > > > Sounds good.
> > > > > > >
> > > > > > > >
> > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> > > I've
> > > > > > > updated
> > > > > > > > the KIP sketching the functionality provided by this tool,
> with
> > > > some
> > > > > > > > examples. Please look at the section "Tooling support
> > examples".
> > > > > > > >
> > > > > > > > Thank you!
> > > > > > >
> > > > > > >
> > > > > > > Thanks, Kowshik.
> > > > > > >
> > > > > > > cheers,
> > > > > > > Colin
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Kowshik
> > > > > > > >
> > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > cmccabe@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > >
> > > > > > > > > In the "Schema" section, do we really need both
> > > > __schema_version__
> > > > > > and
> > > > > > > > > __data_version__?  Can we just have a single version field
> > > here?
> > > > > > > > >
> > > > > > > > > Shouldn't the Admin(Client) function have some way to get
> the
> > > min
> > > > > and
> > > > > > > max
> > > > > > > > > information that we're exposing as well?  I guess we could
> > have
> > > > > min,
> > > > > > > max,
> > > > > > > > > and current.  Unrelated: is the use of Long rather than
> long
> > > > > > deliberate
> > > > > > > > > here?
> > > > > > > > >
> > > > > > > > > It would be good to describe how the command line tool
> > > > > > > > > kafka.admin.FeatureCommand will work.  For example the
> flags
> > > that
> > > > > it
> > > > > > > will
> > > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > > >
> > > > > > > > > cheers,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I've opened KIP-584
> <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > >
> > > > > > > > > > which
> > > > > > > > > > is intended to provide a versioning scheme for features.
> > I'd
> > > > like
> > > > > > to
> > > > > > > use
> > > > > > > > > > this thread to discuss the same. I'd appreciate any
> > feedback
> > > on
> > > > > > this.
> > > > > > > > > > Here
> > > > > > > > > > is a link to KIP-584
> <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > >  .
> > > > > > > > > >
> > > > > > > > > > Thank you!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Kowshik
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Boyang Chen <re...@gmail.com>.
Thanks for the reply. The only remaining question is the propagation path.
KIP-500 only restricts `write access` to the controller, in the sense that
brokers in the pre-KIP-500 world could still listen to Zookeeper
notifications. Thus, we are open to discussing the engineering effort to go
through Zookeeper vs UpdateMetadata routing. What's your opinion on this
matter? Would either path be significantly simpler than the other?
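
Just to make the comparison concrete, here is a minimal sketch of what the
"brokers listen to ZK directly" path could look like, using the plain
ZooKeeper client API. The '/features' path and the payload handling are
assumptions on my side; this is only meant to frame the trade-off, not to
propose an implementation.

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Illustrative only: each broker watches the (assumed) '/features' znode and
    // refreshes its cached finalized features whenever the node changes.
    public class FinalizedFeaturesZkListener implements Watcher {
        private static final String FEATURES_PATH = "/features";
        private final ZooKeeper zk;

        public FinalizedFeaturesZkListener(ZooKeeper zk) {
            this.zk = zk;
        }

        // Reads the latest payload and re-arms the one-shot watch.
        public byte[] readAndWatch() throws Exception {
            return zk.getData(FEATURES_PATH, this, new Stat());
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDataChanged) {
                try {
                    byte[] payload = readAndWatch();
                    // ... deserialize payload and update this broker's cached finalized features ...
                } catch (Exception e) {
                    // log and retry; the watch is only re-armed by the next successful read
                }
            }
        }
    }

The obvious cost is that every broker keeps its own read/watch on ZK, while
the controller fan-out keeps ZK reads in a single place, which I believe is
the QPS concern you mentioned.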

Boyang

On Wed, Apr 1, 2020 at 12:10 AM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hey Boyang,
>
> Thanks for the feedback! Please find below my response to your latest
> comments.
> I have modified the KIP wherever possible to address the comments.
>
> > My point is that during a bootstrapping stage of a cluster, we could not
> > pick the desired feature version as no controller is actively handling
> our
> > request.
>
> (Kowshik): Note that just deploying the latest broker binary does not
> always mean that the
> new version of a certain feature will be automatically activated. Enabling
> the effects of the
> actual feature version is still left to the discretion of the
> implementation logic for the feature.
> For example, for safety reasons, the feature can still be gated behind a
> dynamic config
> and later activated when the time comes.
>
> > Feature changes should be roughly the same frequency as config changes.
> > Today, the dynamic configuration changes are propagated via Zookeeper.
> > So I guess propagating through UpdateMetadata doesn't get us more
> benefits,
> > while going through ZK notification should be a simpler solution.
>
> (Kowshik): Maybe I'm missing something, but were you suggesting we should
> have these
> notifications delivered to the brokers directly via ZK? Note that with
> KIP-500 (where we are replacing ZK),
> for the bridge release we prefer to perform all access to ZK in
> the controller,
> rather than in other brokers, clients, or tools. Therefore, although ZK
> will still be
> required for the bridge release, it will be a well-isolated dependency.
> Please read
> this section of KIP-500:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum#KIP-500:ReplaceZooKeeperwithaSelf-ManagedMetadataQuorum-BridgeRelease
> .
>
> Therefore, the existing approach in the KIP is future-proof with regard to
> the above requirement.
> We deliver the ZK notification only via the controller's
> `UpdateMetadataRequest` to the brokers,
> and we always access ZK only via the controller.
>
> > Understood, I don't feel strong about deprecation, but does the current
> KIP
> > keep the door open for future improvements if
> > someone has a need for feature deprecation? Could we briefly discuss
> about
> > it in the future work section?
>
> (Kowshik): Done. Please refer to the 'Future work' section:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Futurework
>
>
> Cheers,
> Kowshik
>
>
> On Tue, Mar 31, 2020 at 9:12 PM Boyang Chen <re...@gmail.com>
> wrote:
>
> > Thanks Kowshik, the answers are making sense. Some follow-ups:
> >
> > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Kowshik,
> > >
> > > Thanks for the KIP. Looks good overall. A few comments below.
> > >
> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > 100.1 Since this request waits for responses from brokers, should we
> add
> > a
> > > timeout in the request (like createTopicRequest)?
> > > 100.2 The response schema is a bit weird. Typically, the response just
> > > shows an error code and an error message, instead of echoing the
> request.
> > > 100.3 Should we add a separate request to list/describe the existing
> > > features?
> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > > DELETE, the version field doesn't make sense. So, I guess the broker
> just
> > > ignores this? An alternative way is to have a separate
> > > DeleteFeaturesRequest
> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > > version of the metadata for finalized features." I am wondering why the
> > > ordering is important?
> > > 100.6 Could you specify the required ACL for this new request?
> > >
> > > 101. For the broker registration ZK node, should we bump up the version
> > in
> > > the json?
> > >
> > > 102. For the /features ZK node, not sure if we need the epoch field.
> Each
> > > ZK node has an internal version field that is incremented on every
> > update.
> > >
> > > 103. "Enabling the actual semantics of a feature version cluster-wide
> is
> > > left to the discretion of the logic implementing the feature (ex: can
> be
> > > done via dynamic broker config)." Does that mean the broker
> registration
> > ZK
> > > node will be updated dynamically when this happens?
> > >
> > > 104. UpdateMetadataRequest
> > > 104.1 It would be useful to describe when the feature metadata is
> > included
> > > in the request. My understanding is that it's only included if (1)
> there
> > is
> > > a change to the finalized feature; (2) broker restart; (3) controller
> > > failover.
> > > 104.2 The new fields have the following versions. Why are the versions
> 3+
> > > when the top version is bumped to 6?
> > >       "fields":  [
> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > >           "about": "The name of the feature."},
> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > >           "about": "The finalized version for the feature."}
> > >       ]
> > >
> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > better
> > > to use enable/disable?
> > >
> > > Jun
> > >
> > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> kprakasam@confluent.io
> > >
> > > wrote:
> > >
> > > > Hey Boyang,
> > > >
> > > > Thanks for the great feedback! I have updated the KIP based on your
> > > > feedback.
> > > > Please find my response below for your comments, look for sentences
> > > > starting
> > > > with "(Kowshik)" below.
> > > >
> > > >
> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > > could
> > > > be
> > > > > converted as "When is it safe for the brokers to start serving new
> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> > the
> > > > > context.
> > > >
> > > > (Kowshik): Great point! Done.
> > > >
> >
> > > > 2. In the *Explanation *section, the metadata version number part
> seems
> > > a
> > > > > bit blurred. Could you point a reference to later section that we
> > going
> > > > to
> > > > > store it in Zookeeper and update it every time when there is a
> > feature
> > > > > change?
> > > >
> > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > > >
> > > >
> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> > for
> > > > > features such as group coordinator semantics, there is no legal
> > > scenario
> > > > to
> > > > > perform a downgrade at all. So having downgrade door open is pretty
> > > > > error-prone as human faults happen all the time. I'm assuming as
> new
> > > > > features are implemented, it's not very hard to add a flag during
> > > feature
> > > > > creation to indicate whether this feature is "downgradable". Could
> > you
> > > > > explain a bit more on the extra engineering effort for shipping
> this
> > > KIP
> > > > > with downgrade protection in place?
> > > >
> > > > (Kowshik): Great point! I'd agree and disagree here. While I agree
> that
> > > > accidental
> > > > downgrades can cause problems, I also think sometimes downgrades
> should
> > > > be allowed for emergency reasons (not all downgrades cause issues).
> > > > It is just subjective to the feature being downgraded.
> > > >
> > > > To be more strict about feature version downgrades, I have modified
> the
> > > KIP
> > > > proposing that we mandate a `--force-downgrade` flag be used in the
> > > > UPDATE_FEATURES api
> > > > and the tooling, whenever the human is downgrading a finalized
> feature
> > > > version.
> > > > Hopefully this should cover the requirement, until we find the need
> for
> > > > advanced downgrade support.
> > > >
> > >
> > +1 for adding this flag.
> >
> > > > > 4. "Each broker’s supported dictionary of feature versions will be
> > > > defined
> > > > > in the broker code." So this means in order to restrict a certain
> > > > feature,
> > > > > we need to start the broker first and then send a feature gating
> > > request
> > > > > immediately, which introduces a time gap and the intended-to-close
> > > > feature
> > > > > could actually serve request during this phase. Do you think we
> > should
> > > > also
> > > > > support configurations as well so that admin user could freely roll
> > up
> > > a
> > > > > cluster with all nodes complying the same feature gating, without
> > > > worrying
> > > > > about the turnaround time to propagate the message only after the
> > > cluster
> > > > > starts up?
> > > >
> > > > (Kowshik): This is a great point/question. One of the expectations
> out
> > of
> > > > this KIP, which is
> > > > already followed in the broker, is the following.
> > > >  - Imagine at time T1 the broker starts up and registers its
> presence
> > in
> > > > ZK,
> > > >    along with advertising its supported features.
> > > >  - Imagine at a future time T2 the broker receives the
> > > > UpdateMetadataRequest
> > > >    from the controller, which contains the latest finalized features
> as
> > > > seen by
> > > >    the controller. The broker validates this data against its
> > supported
> > > > features to
> > > >    make sure there is no mismatch (it will shut down if there is an
> > > > incompatibility).
> > > >
> > > > It is expected that during the time between the 2 events T1 and T2,
> the
> > > > broker is
> > > > almost a silent entity in the cluster. It does not add any value to
> the
> > > > cluster, or carry
> > > > out any important broker activities. By “important”, I mean it is not
> > > doing
> > > > mutations
> > > > on its persistence, not mutating critical in-memory state, won’t be
> > > > serving
> > > > produce/fetch requests. Note it doesn’t even know its assigned
> > > partitions
> > > > until
> > > > it receives UpdateMetadataRequest from controller. Anything the
> broker
> > is
> > > > doing up
> > > > until this point is neither damaging nor useful.
> > > >
> > > > I’ve clarified the above in the KIP, see this new section:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > .
> > >
> > > My point is that during a bootstrapping stage of a cluster, we could
> not
> > pick the desired feature version as no controller is actively handling
> our
> > request. But anyway, I think this is a rare case to discuss, and the
> added
> > paragraph looks good :)
> >
> >
> > > > > 5. "adding a new Feature, updating or deleting an existing
> Feature",
> > > may
> > > > be
> > > > > I misunderstood something, I thought the features are defined in
> > broker
> > > > > code, so admin could not really create a new feature?
> > > >
> > > > (Kowshik): Great point! You understood this right. Here adding a
> > feature
> > > > means we are
> > > > adding a cluster-wide finalized *max* version for a feature that was
> > > > previously never finalized.
> > > > I have clarified this in the KIP now.
> > > >
> > > > > 6. I think we need a separate error code like
> > > FEATURE_UPDATE_IN_PROGRESS
> > > > to
> > > > > reject a concurrent feature update request.
> > > >
> > > > (Kowshik): Great point! I have modified the KIP adding the above (see
> > > > 'Tooling support -> Admin API changes').
> > > >
> > > > > 7. I think we haven't discussed the alternative solution to pass
> the
> > > > > feature information through Zookeeper. Is that mentioned in the KIP
> > to
> > > > > justify why using UpdateMetadata is more favorable?
> > > >
> > > > (Kowshik): Nice question! The broker reads finalized feature info
> > stored
> > > in
> > > > ZK,
> > > > only during startup when it does a validation. When serving
> > > > `ApiVersionsRequest`, the
> > > > broker does not read this info from ZK directly. I'd imagine the risk
> > is
> > > > that it can increase
> > > > the ZK read QPS which can be a bottleneck for the system. Today, in
> > Kafka
> > > > we use the
> > > > controller to fan out ZK updates to brokers and we want to stick to
> > that
> > > > pattern to avoid
> > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > >
> > > Feature changes should be roughly the same frequency as config changes.
> > Today, the dynamic configuration
> > changes are propagated via Zookeeper. So I guess propagating through
> > UpdateMetadata doesn't get us more benefits,
> > while going through ZK notification should be a simpler solution.
> >
> > > > 8. I was under the impression that user could configure a range of
> > > > > supported versions, what's the trade-off for allowing single
> > finalized
> > > > > version only?
> > > >
> > > > (Kowshik): Great question! The finalized version of a feature
> basically
> > > > refers to
> > > > the cluster-wide finalized feature "maximum" version. For example, if
> > the
> > > > 'group_coordinator' feature
> > > > has the finalized version set to 10, then, it means that cluster-wide
> > all
> > > > versions up to v10 are
> > > > supported for this feature. However, note that if some version (ex:
> v0)
> > > > gets deprecated
> > > > for this feature, then we don’t convey that using this scheme (also
> > > > supporting deprecation is a non-goal).
> > > >
> > > > (Kowshik): I’ve now modified the KIP at all points, referring to
> > finalized
> > > > feature "maximum" versions.
> > > >
> > >
> > Understood, I don't feel strong about deprecation, but does the current
> KIP
> > keep the door open for future improvements if
> > someone has a need for feature deprecation? Could we briefly discuss
> about
> > it in the future work section?
> >
> >
> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > > producer
> > > >
> > > > (Kowshik): Great point! Done.
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > >
> > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > reluctanthero104@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey Kowshik,
> > > > >
> > > > > thanks for the revised KIP. Got a couple of questions:
> > > > >
> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > > could
> > > > be
> > > > > converted as "When is it safe for the brokers to start serving new
> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> > the
> > > > > context.
> > > > >
> > > > > 2. In the *Explanation *section, the metadata version number part
> > > seems a
> > > > > bit blurred. Could you point a reference to later section that we
> > going
> > > > to
> > > > > store it in Zookeeper and update it every time when there is a
> > feature
> > > > > change?
> > > > >
> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> > for
> > > > > features such as group coordinator semantics, there is no legal
> > > scenario
> > > > to
> > > > > perform a downgrade at all. So having downgrade door open is pretty
> > > > > error-prone as human faults happen all the time. I'm assuming as
> new
> > > > > features are implemented, it's not very hard to add a flag during
> > > feature
> > > > > creation to indicate whether this feature is "downgradable". Could
> > you
> > > > > explain a bit more on the extra engineering effort for shipping
> this
> > > KIP
> > > > > with downgrade protection in place?
> > > > >
> > > > > 4. "Each broker’s supported dictionary of feature versions will be
> > > > defined
> > > > > in the broker code." So this means in order to restrict a certain
> > > > feature,
> > > > > we need to start the broker first and then send a feature gating
> > > request
> > > > > immediately, which introduces a time gap and the intended-to-close
> > > > feature
> > > > > could actually serve request during this phase. Do you think we
> > should
> > > > also
> > > > > support configurations as well so that admin user could freely roll
> > up
> > > a
> > > > > cluster with all nodes complying the same feature gating, without
> > > > worrying
> > > > > about the turnaround time to propagate the message only after the
> > > cluster
> > > > > starts up?
> > > > >
> > > > > 5. "adding a new Feature, updating or deleting an existing
> Feature",
> > > may
> > > > be
> > > > > I misunderstood something, I thought the features are defined in
> > broker
> > > > > code, so admin could not really create a new feature?
> > > > >
> > > > > 6. I think we need a separate error code like
> > > FEATURE_UPDATE_IN_PROGRESS
> > > > to
> > > > > reject a concurrent feature update request.
> > > > >
> > > > > 7. I think we haven't discussed the alternative solution to pass
> the
> > > > > feature information through Zookeeper. Is that mentioned in the KIP
> > to
> > > > > justify why using UpdateMetadata is more favorable?
> > > > >
> > > > > 8. I was under the impression that user could configure a range of
> > > > > supported versions, what's the trade-off for allowing single
> > finalized
> > > > > version only?
> > > > >
> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > > producer
> > > > >
> > > > > Boyang
> > > > >
> > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
> > > wrote:
> > > > >
> > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > Hi Colin,
> > > > > > >
> > > > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > > > suggestions.
> > > > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > .
> > > > > > >
> > > > > > > 1. '__data_version__' is the version of the finalized feature
> > > > metadata
> > > > > > > (i.e. actual ZK node contents), while the '__schema_version__'
> is
> > > the
> > > > > > > version of the schema of the data persisted in ZK. These serve
> > > > > different
> > > > > > > purposes. '__data_version__' is useful mainly to clients
> > during
> > > > > reads,
> > > > > > > to differentiate between the 2 versions of eventually
> consistent
> > > > > > 'finalized
> > > > > > > features' metadata (i.e. larger metadata version is more
> recent).
> > > > > > > '__schema_version__' provides an additional degree of
> > flexibility,
> > > > > where
> > > > > > if
> > > > > > > we decide to change the schema for '/features' node in ZK (in
> the
> > > > > > future),
> > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > serialization/deserialization of the ZK data can be handled
> > > safely).
> > > > > >
> > > > > > Hi Kowshik,
> > > > > >
> > > > > > If you're talking about a number that lets you know if data is
> more
> > > or
> > > > > > less recent, we would typically call that an epoch, and not a
> > > version.
> > > > > For
> > > > > > the ZK data structures, the word "version" is typically reserved
> > for
> > > > > > describing changes to the overall schema of the data that is
> > written
> > > to
> > > > > > ZooKeeper.  We don't even really change the "version" of those
> > > schemas
> > > > > that
> > > > > > much, since most changes are backwards-compatible.  But we do
> > include
> > > > > that
> > > > > > version field just in case.
> > > > > >
> > > > > > I don't think we really need an epoch here, though, since we can
> > just
> > > > > look
> > > > > > at the broker epoch.  Whenever the broker registers, its epoch
> will
> > > be
> > > > > > greater than the previous broker epoch.  And the newly registered
> > > data
> > > > > will
> > > > > > take priority.  This will be a lot simpler than adding a separate
> > > epoch
> > > > > > system, I think.
> > > > > >
> > > > > > >
> > > > > > > 2. Regarding admin client needing min and max information - you
> > are
> > > > > > right!
> > > > > > > I've changed the KIP such that the Admin API also allows the
> user
> > > to
> > > > > read
> > > > > > > 'supported features' from a specific broker. Please look at the
> > > > section
> > > > > > > "Admin API changes".
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > >
> > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> deliberate.
> > > > I've
> > > > > > > improved the KIP to just use `long` at all places.
> > > > > >
> > > > > > Sounds good.
> > > > > >
> > > > > > >
> > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> > I've
> > > > > > updated
> > > > > > > the KIP sketching the functionality provided by this tool, with
> > > some
> > > > > > > examples. Please look at the section "Tooling support
> examples".
> > > > > > >
> > > > > > > Thank you!
> > > > > >
> > > > > >
> > > > > > Thanks, Kowshik.
> > > > > >
> > > > > > cheers,
> > > > > > Colin
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > cmccabe@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > >
> > > > > > > > In the "Schema" section, do we really need both
> > > __schema_version__
> > > > > and
> > > > > > > > __data_version__?  Can we just have a single version field
> > here?
> > > > > > > >
> > > > > > > > Shouldn't the Admin(Client) function have some way to get the
> > min
> > > > and
> > > > > > max
> > > > > > > > information that we're exposing as well?  I guess we could
> have
> > > > min,
> > > > > > max,
> > > > > > > > and current.  Unrelated: is the use of Long rather than long
> > > > > deliberate
> > > > > > > > here?
> > > > > > > >
> > > > > > > > It would be good to describe how the command line tool
> > > > > > > > kafka.admin.FeatureCommand will work.  For example the flags
> > that
> > > > it
> > > > > > will
> > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > >
> > > > > > > > cheers,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I've opened KIP-584 <
> > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > >
> > > > > > > > > which
> > > > > > > > > is intended to provide a versioning scheme for features.
> I'd
> > > like
> > > > > to
> > > > > > use
> > > > > > > > > this thread to discuss the same. I'd appreciate any
> feedback
> > on
> > > > > this.
> > > > > > > > > Here
> > > > > > > > > is a link to KIP-584:
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > >  .
> > > > > > > > >
> > > > > > > > > Thank you!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Kowshik
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Kowshik Prakasam <kp...@confluent.io>.
Hey Boyang,

Thanks for the feedback! Please find below my response to your latest
comments.
I have modified the KIP wherever possible to address the comments.

> My point is that during a bootstrapping stage of a cluster, we could not
> pick the desired feature version as no controller is actively handling our
> request.

(Kowshik): Note that just deploying the latest broker binary does not
always mean that the
new version of a certain feature will be automatically activated. Enabling
the effects of the
actual feature version is still left to the discretion of the
implementation logic for the feature.
For example, for safety reasons, the feature can still be gated behind a
dynamic config
and later activated when the time comes.
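
As a rough illustration (the class and parameter names below are
hypothetical, not from the KIP), the logic implementing a feature could
combine the cluster-wide finalized max version with such a dynamic config
before switching on the new behavior:

    // Hypothetical sketch; names are illustrative only.
    public final class FeatureActivationGate {
        private final long finalizedMaxVersion;     // from the finalized features metadata
        private final boolean dynamicConfigEnabled; // from a dynamic broker config

        public FeatureActivationGate(long finalizedMaxVersion, boolean dynamicConfigEnabled) {
            this.finalizedMaxVersion = finalizedMaxVersion;
            this.dynamicConfigEnabled = dynamicConfigEnabled;
        }

        // The new semantics take effect only when the finalized max version covers
        // the required version AND the operator has enabled it via dynamic config.
        public boolean isActive(long requiredVersion) {
            return finalizedMaxVersion >= requiredVersion && dynamicConfigEnabled;
        }
    }

So even on a fully upgraded cluster, something like 'group_coordinator'
version 10 would stay dormant until both conditions hold.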

> Feature changes should be roughly the same frequency as config changes.
> Today, the dynamic configuration changes are propagated via Zookeeper.
> So I guess propagating through UpdateMetadata doesn't get us more
benefits,
> while going through ZK notification should be a simpler solution.

(Kowshik): Maybe I'm missing something, but were you suggesting we should
have these
notifications delivered to the brokers directly via ZK? Note that with
KIP-500 (where we are replacing ZK),
for the bridge release we prefer to perform all access to ZK in
the controller,
rather than in other brokers, clients, or tools. Therefore, although ZK
will still be
required for the bridge release, it will be a well-isolated dependency.
Please read
this section of KIP-500:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum#KIP-500:ReplaceZooKeeperwithaSelf-ManagedMetadataQuorum-BridgeRelease
.

Therefore, the existing approach in the KIP is future-proof with regard to
the above requirement.
We deliver the ZK notification only via the controller's
`UpdateMetadataRequest` to the brokers,
and we always access ZK only via the controller.
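
To make that concrete, below is a simplified sketch (illustrative names
only, assuming a single supported max version per feature) of the
broker-side check when the finalized features arrive via the controller's
UpdateMetadataRequest. The broker validates them against its own supported
versions and shuts down on a mismatch, without ever reading '/features'
from ZK itself.

    import java.util.Map;

    // Simplified sketch only; names are illustrative and not part of the KIP.
    final class FinalizedFeatureCompatibilityCheck {
        // Max version per feature supported by this broker binary (defined in broker code).
        private final Map<String, Long> supportedMaxVersions;

        FinalizedFeatureCompatibilityCheck(Map<String, Long> supportedMaxVersions) {
            this.supportedMaxVersions = supportedMaxVersions;
        }

        // Invoked with the finalized max versions carried by UpdateMetadataRequest.
        // Returns false if any finalized feature is unknown to this broker or ahead
        // of what it supports; the broker would then shut itself down as incompatible.
        boolean isCompatible(Map<String, Long> finalizedMaxVersions) {
            for (Map.Entry<String, Long> finalized : finalizedMaxVersions.entrySet()) {
                Long supportedMax = supportedMaxVersions.get(finalized.getKey());
                if (supportedMax == null || supportedMax < finalized.getValue()) {
                    return false;
                }
            }
            return true;
        }
    }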

> Understood, I don't feel strong about deprecation, but does the current
KIP
> keep the door open for future improvements if
> someone has a need for feature deprecation? Could we briefly discuss about
> it in the future work section?

(Kowshik): Done. Please refer to the 'Future work' section:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Futurework


Cheers,
Kowshik


On Tue, Mar 31, 2020 at 9:12 PM Boyang Chen <re...@gmail.com>
wrote:

> Thanks Kowshik, the answers are making sense. Some follow-ups:
>
> On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Kowshik,
> >
> > Thanks for the KIP. Looks good overall. A few comments below.
> >
> > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > 100.1 Since this request waits for responses from brokers, should we add
> a
> > timeout in the request (like createTopicRequest)?
> > 100.2 The response schema is a bit weird. Typically, the response just
> > shows an error code and an error message, instead of echoing the request.
> > 100.3 Should we add a separate request to list/describe the existing
> > features?
> > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > DELETE, the version field doesn't make sense. So, I guess the broker just
> > ignores this? An alternative way is to have a separate
> > DeleteFeaturesRequest
> > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > version of the metadata for finalized features." I am wondering why the
> > ordering is important?
> > 100.6 Could you specify the required ACL for this new request?
> >
> > 101. For the broker registration ZK node, should we bump up the version
> in
> > the json?
> >
> > 102. For the /features ZK node, not sure if we need the epoch field. Each
> > ZK node has an internal version field that is incremented on every
> update.
> >
> > 103. "Enabling the actual semantics of a feature version cluster-wide is
> > left to the discretion of the logic implementing the feature (ex: can be
> > done via dynamic broker config)." Does that mean the broker registration
> ZK
> > node will be updated dynamically when this happens?
> >
> > 104. UpdateMetadataRequest
> > 104.1 It would be useful to describe when the feature metadata is
> included
> > in the request. My understanding is that it's only included if (1) there
> is
> > a change to the finalized feature; (2) broker restart; (3) controller
> > failover.
> > 104.2 The new fields have the following versions. Why are the versions 3+
> > when the top version is bumped to 6?
> >       "fields":  [
> >         {"name": "Name", "type":  "string", "versions":  "3+",
> >           "about": "The name of the feature."},
> >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> >           "about": "The finalized version for the feature."}
> >       ]
> >
> > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> better
> > to use enable/disable?
> >
> > Jun
> >
> > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <kprakasam@confluent.io
> >
> > wrote:
> >
> > > Hey Boyang,
> > >
> > > Thanks for the great feedback! I have updated the KIP based on your
> > > feedback.
> > > Please find my response below for your comments, look for sentences
> > > starting
> > > with "(Kowshik)" below.
> > >
> > >
> > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > could
> > > be
> > > > converted as "When is it safe for the brokers to start serving new
> > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> the
> > > > context.
> > >
> > > (Kowshik): Great point! Done.
> > >
>
> > > 2. In the *Explanation *section, the metadata version number part seems
> > a
> > > > bit blurred. Could you point a reference to later section that we
> going
> > > to
> > > > store it in Zookeeper and update it every time when there is a
> feature
> > > > change?
> > >
> > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> > >
> > >
> > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> for
> > > > features such as group coordinator semantics, there is no legal
> > scenario
> > > to
> > > > perform a downgrade at all. So having downgrade door open is pretty
> > > > error-prone as human faults happen all the time. I'm assuming as new
> > > > features are implemented, it's not very hard to add a flag during
> > feature
> > > > creation to indicate whether this feature is "downgradable". Could
> you
> > > > explain a bit more on the extra engineering effort for shipping this
> > KIP
> > > > with downgrade protection in place?
> > >
> > > (Kowshik): Great point! I'd agree and disagree here. While I agree that
> > > accidental
> > > downgrades can cause problems, I also think sometimes downgrades should
> > > be allowed for emergency reasons (not all downgrades cause issues).
> > > It is just subjective to the feature being downgraded.
> > >
> > > To be more strict about feature version downgrades, I have modified the
> > KIP
> > > proposing that we mandate a `--force-downgrade` flag be used in the
> > > UPDATE_FEATURES api
> > > and the tooling, whenever the human is downgrading a finalized feature
> > > version.
> > > Hopefully this should cover the requirement, until we find the need for
> > > advanced downgrade support.
> > >
> >
> +1 for adding this flag.
>
> > > > 4. "Each broker’s supported dictionary of feature versions will be
> > > defined
> > > > in the broker code." So this means in order to restrict a certain
> > > feature,
> > > > we need to start the broker first and then send a feature gating
> > request
> > > > immediately, which introduces a time gap and the intended-to-close
> > > feature
> > > > could actually serve request during this phase. Do you think we
> should
> > > also
> > > > support configurations as well so that admin user could freely roll
> up
> > a
> > > > cluster with all nodes complying the same feature gating, without
> > > worrying
> > > > about the turnaround time to propagate the message only after the
> > cluster
> > > > starts up?
> > >
> > > (Kowshik): This is a great point/question. One of the expectations out
> of
> > > this KIP, which is
> > > already followed in the broker, is the following.
> > >  - Imagine at time T1 the broker starts up and registers its presence
> in
> > > ZK,
> > >    along with advertising its supported features.
> > >  - Imagine at a future time T2 the broker receives the
> > > UpdateMetadataRequest
> > >    from the controller, which contains the latest finalized features as
> > > seen by
> > >    the controller. The broker validates this data against its
> supported
> > > features to
> > >    make sure there is no mismatch (it will shut down if there is an
> > > incompatibility).
> > >
> > > It is expected that during the time between the 2 events T1 and T2, the
> > > broker is
> > > almost a silent entity in the cluster. It does not add any value to the
> > > cluster, or carry
> > > out any important broker activities. By “important”, I mean it is not
> > doing
> > > mutations
> > > on its persistence, not mutating critical in-memory state, won’t be
> > > serving
> > > produce/fetch requests. Note it doesn’t even know its assigned
> > partitions
> > > until
> > > it receives UpdateMetadataRequest from controller. Anything the broker
> is
> > > doing up
> > > until this point is neither damaging nor useful.
> > >
> > > I’ve clarified the above in the KIP, see this new section:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > .
> >
> > My point is that during a bootstrapping stage of a cluster, we could not
> pick the desired feature version as no controller is actively handling our
> request. But anyway, I think this is a rare case to discuss, and the added
> paragraph looks good :)
>
>
> > > > 5. "adding a new Feature, updating or deleting an existing Feature",
> > may
> > > be
> > > > I misunderstood something, I thought the features are defined in
> broker
> > > > code, so admin could not really create a new feature?
> > >
> > > (Kowshik): Great point! You understood this right. Here adding a
> feature
> > > means we are
> > > adding a cluster-wide finalized *max* version for a feature that was
> > > previously never finalized.
> > > I have clarified this in the KIP now.
> > >
> > > > 6. I think we need a separate error code like
> > FEATURE_UPDATE_IN_PROGRESS
> > > to
> > > > reject a concurrent feature update request.
> > >
> > > (Kowshik): Great point! I have modified the KIP adding the above (see
> > > 'Tooling support -> Admin API changes').
> > >
> > > > 7. I think we haven't discussed the alternative solution to pass the
> > > > feature information through Zookeeper. Is that mentioned in the KIP
> to
> > > > justify why using UpdateMetadata is more favorable?
> > >
> > > (Kowshik): Nice question! The broker reads finalized feature info
> stored
> > in
> > > ZK,
> > > only during startup when it does a validation. When serving
> > > `ApiVersionsRequest`, the
> > > broker does not read this info from ZK directly. I'd imagine the risk
> is
> > > that it can increase
> > > the ZK read QPS which can be a bottleneck for the system. Today, in
> Kafka
> > > we use the
> > > controller to fan out ZK updates to brokers and we want to stick to
> that
> > > pattern to avoid
> > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> >
> > Feature changes should be roughly the same frequency as config changes.
> Today, the dynamic configuration
> changes are propagated via Zookeeper. So I guess propagating through
> UpdateMetadata doesn't get us more benefits,
> while going through ZK notification should be a simpler solution.
>
> > > 8. I was under the impression that user could configure a range of
> > > > supported versions, what's the trade-off for allowing single
> finalized
> > > > version only?
> > >
> > > (Kowshik): Great question! The finalized version of a feature basically
> > > refers to
> > > the cluster-wide finalized feature "maximum" version. For example, if
> the
> > > 'group_coordinator' feature
> > > has the finalized version set to 10, then, it means that cluster-wide
> all
> > > versions up to v10 are
> > > supported for this feature. However, note that if some version (ex: v0)
> > > gets deprecated
> > > for this feature, then we don’t convey that using this scheme (also
> > > supporting deprecation is a non-goal).
> > >
> > > (Kowshik): I’ve now modified the KIP at all points, referring to
> finalized
> > > feature "maximum" versions.
> > >
> >
> Understood, I don't feel strong about deprecation, but does the current KIP
> keep the door open for future improvements if
> someone has a need for feature deprecation? Could we briefly discuss about
> it in the future work section?
>
>
> > > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > producer
> > >
> > > (Kowshik): Great point! Done.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > >
> > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> reluctanthero104@gmail.com>
> > > wrote:
> > >
> > > > Hey Kowshik,
> > > >
> > > > thanks for the revised KIP. Got a couple of questions:
> > > >
> > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> > could
> > > be
> > > > converted as "When is it safe for the brokers to start serving new
> > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> the
> > > > context.
> > > >
> > > > 2. In the *Explanation *section, the metadata version number part
> > seems a
> > > > bit blurred. Could you point a reference to later section that we
> going
> > > to
> > > > store it in Zookeeper and update it every time when there is a
> feature
> > > > change?
> > > >
> > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> for
> > > > features such as group coordinator semantics, there is no legal
> > scenario
> > > to
> > > > perform a downgrade at all. So having downgrade door open is pretty
> > > > error-prone as human faults happen all the time. I'm assuming as new
> > > > features are implemented, it's not very hard to add a flag during
> > feature
> > > > creation to indicate whether this feature is "downgradable". Could
> you
> > > > explain a bit more on the extra engineering effort for shipping this
> > KIP
> > > > with downgrade protection in place?
> > > >
> > > > 4. "Each broker’s supported dictionary of feature versions will be
> > > defined
> > > > in the broker code." So this means in order to restrict a certain
> > > feature,
> > > > we need to start the broker first and then send a feature gating
> > request
> > > > immediately, which introduces a time gap and the intended-to-close
> > > feature
> > > > could actually serve request during this phase. Do you think we
> should
> > > also
> > > > support configurations as well so that admin user could freely roll
> up
> > a
> > > > cluster with all nodes complying the same feature gating, without
> > > worrying
> > > > about the turnaround time to propagate the message only after the
> > cluster
> > > > starts up?
> > > >
> > > > 5. "adding a new Feature, updating or deleting an existing Feature",
> > may
> > > be
> > > > I misunderstood something, I thought the features are defined in
> broker
> > > > code, so admin could not really create a new feature?
> > > >
> > > > 6. I think we need a separate error code like
> > FEATURE_UPDATE_IN_PROGRESS
> > > to
> > > > reject a concurrent feature update request.
> > > >
> > > > 7. I think we haven't discussed the alternative solution to pass the
> > > > feature information through Zookeeper. Is that mentioned in the KIP
> to
> > > > justify why using UpdateMetadata is more favorable?
> > > >
> > > > 8. I was under the impression that user could configure a range of
> > > > supported versions, what's the trade-off for allowing single
> finalized
> > > > version only?
> > > >
> > > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > producer
> > > >
> > > > Boyang
> > > >
> > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
> > wrote:
> > > >
> > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > Hi Colin,
> > > > > >
> > > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > > suggestions.
> > > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > .
> > > > > >
> > > > > > 1. '__data_version__' is the version of the finalized feature
> > > metadata
> > > > > > (i.e. actual ZK node contents), while the '__schema_version__' is
> > the
> > > > > > version of the schema of the data persisted in ZK. These serve
> > > > different
> > > > > > purposes. '__data_version__' is useful mainly to clients
> during
> > > > reads,
> > > > > > to differentiate between the 2 versions of eventually consistent
> > > > > 'finalized
> > > > > > features' metadata (i.e. larger metadata version is more recent).
> > > > > > '__schema_version__' provides an additional degree of
> flexibility,
> > > > where
> > > > > if
> > > > > > we decide to change the schema for '/features' node in ZK (in the
> > > > > future),
> > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > serialization/deserialization of the ZK data can be handled
> > safely).
> > > > >
> > > > > Hi Kowshik,
> > > > >
> > > > > If you're talking about a number that lets you know if data is more
> > or
> > > > > less recent, we would typically call that an epoch, and not a
> > version.
> > > > For
> > > > > the ZK data structures, the word "version" is typically reserved
> for
> > > > > describing changes to the overall schema of the data that is
> written
> > to
> > > > > ZooKeeper.  We don't even really change the "version" of those
> > schemas
> > > > that
> > > > > much, since most changes are backwards-compatible.  But we do
> include
> > > > that
> > > > > version field just in case.
> > > > >
> > > > > I don't think we really need an epoch here, though, since we can
> just
> > > > look
> > > > > at the broker epoch.  Whenever the broker registers, its epoch will
> > be
> > > > > greater than the previous broker epoch.  And the newly registered
> > data
> > > > will
> > > > > take priority.  This will be a lot simpler than adding a separate
> > epoch
> > > > > system, I think.
> > > > >
> > > > > >
> > > > > > 2. Regarding admin client needing min and max information - you
> are
> > > > > right!
> > > > > > I've changed the KIP such that the Admin API also allows the user
> > to
> > > > read
> > > > > > 'supported features' from a specific broker. Please look at the
> > > section
> > > > > > "Admin API changes".
> > > > >
> > > > > Thanks.
> > > > >
> > > > > >
> > > > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate.
> > > I've
> > > > > > improved the KIP to just use `long` at all places.
> > > > >
> > > > > Sounds good.
> > > > >
> > > > > >
> > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> I've
> > > > > updated
> > > > > > the KIP sketching the functionality provided by this tool, with
> > some
> > > > > > examples. Please look at the section "Tooling support examples".
> > > > > >
> > > > > > Thank you!
> > > > >
> > > > >
> > > > > Thanks, Kowshik.
> > > > >
> > > > > cheers,
> > > > > Colin
> > > > >
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> cmccabe@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > Thanks, Kowshik, this looks good.
> > > > > > >
> > > > > > > In the "Schema" section, do we really need both
> > __schema_version__
> > > > and
> > > > > > > __data_version__?  Can we just have a single version field
> here?
> > > > > > >
> > > > > > > Shouldn't the Admin(Client) function have some way to get the
> min
> > > and
> > > > > max
> > > > > > > information that we're exposing as well?  I guess we could have
> > > min,
> > > > > max,
> > > > > > > and current.  Unrelated: is the use of Long rather than long
> > > > deliberate
> > > > > > > here?
> > > > > > >
> > > > > > > It would be good to describe how the command line tool
> > > > > > > kafka.admin.FeatureCommand will work.  For example the flags
> that
> > > it
> > > > > will
> > > > > > > take and the output that it will generate to STDOUT.
> > > > > > >
> > > > > > > cheers,
> > > > > > > Colin
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I've opened KIP-584 <
> > > https://issues.apache.org/jira/browse/KIP-584
> > > > >
> > > > > > > > which
> > > > > > > > is intended to provide a versioning scheme for features. I'd
> > like
> > > > to
> > > > > use
> > > > > > > > this thread to discuss the same. I'd appreciate any feedback
> on
> > > > this.
> > > > > > > > Here
> > > > > > > > is a link to KIP-584:
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > >  .
> > > > > > > >
> > > > > > > > Thank you!
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Kowshik
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Boyang Chen <re...@gmail.com>.
Thanks Kowshik, the answers are making sense. Some follow-ups:

On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Kowshik,
>
> Thanks for the KIP. Looks good overall. A few comments below.
>
> 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> 100.1 Since this request waits for responses from brokers, should we add a
> timeout in the request (like createTopicRequest)?
> 100.2 The response schema is a bit weird. Typically, the response just
> shows an error code and an error message, instead of echoing the request.
> 100.3 Should we add a separate request to list/describe the existing
> features?
> 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> DELETE, the version field doesn't make sense. So, I guess the broker just
> ignores this? An alternative way is to have a separate
> DeleteFeaturesRequest
> 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> version of the metadata for finalized features." I am wondering why the
> ordering is important?
> 100.6 Could you specify the required ACL for this new request?
>
> 101. For the broker registration ZK node, should we bump up the version in
> the json?
>
> 102. For the /features ZK node, not sure if we need the epoch field. Each
> ZK node has an internal version field that is incremented on every update.
>
> 103. "Enabling the actual semantics of a feature version cluster-wide is
> left to the discretion of the logic implementing the feature (ex: can be
> done via dynamic broker config)." Does that mean the broker registration ZK
> node will be updated dynamically when this happens?
>
> 104. UpdateMetadataRequest
> 104.1 It would be useful to describe when the feature metadata is included
> in the request. My understanding is that it's only included if (1) there is
> a change to the finalized feature; (2) broker restart; (3) controller
> failover.
> 104.2 The new fields have the following versions. Why are the versions 3+
> when the top version is bumped to 6?
>       "fields":  [
>         {"name": "Name", "type":  "string", "versions":  "3+",
>           "about": "The name of the feature."},
>         {"name":  "Version", "type":  "int64", "versions":  "3+",
>           "about": "The finalized version for the feature."}
>       ]
>
> 105. kafka-features.sh: Instead of using update/delete, perhaps it's better
> to use enable/disable?
>
> Jun
>
> On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <kp...@confluent.io>
> wrote:
>
> > Hey Boyang,
> >
> > Thanks for the great feedback! I have updated the KIP based on your
> > feedback.
> > Please find my response below for your comments, look for sentences
> > starting
> > with "(Kowshik)" below.
> >
> >
> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> could
> > be
> > > converted as "When is it safe for the brokers to start serving new
> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> > > context.
> >
> > (Kowshik): Great point! Done.
> >

> > 2. In the *Explanation *section, the metadata version number part seems
> a
> > > bit blurred. Could you point a reference to later section that we going
> > to
> > > store it in Zookeeper and update it every time when there is a feature
> > > change?
> >
> > (Kowshik): Great point! Done. I've added a reference in the KIP.
> >
> >
> > > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
> > > features such as group coordinator semantics, there is no legal
> scenario
> > to
> > > perform a downgrade at all. So having downgrade door open is pretty
> > > error-prone as human faults happen all the time. I'm assuming as new
> > > features are implemented, it's not very hard to add a flag during
> feature
> > > creation to indicate whether this feature is "downgradable". Could you
> > > explain a bit more on the extra engineering effort for shipping this
> KIP
> > > with downgrade protection in place?
> >
> > (Kowshik): Great point! I'd agree and disagree here. While I agree that
> > accidental
> > downgrades can cause problems, I also think sometimes downgrades should
> > be allowed for emergency reasons (not all downgrades cause issues).
> > It is just subjective to the feature being downgraded.
> >
> > To be more strict about feature version downgrades, I have modified the
> KIP
> > proposing that we mandate a `--force-downgrade` flag be used in the
> > UPDATE_FEATURES api
> > and the tooling, whenever the human is downgrading a finalized feature
> > version.
> > Hopefully this should cover the requirement, until we find the need for
> > advanced downgrade support.
> >
>
+1 for adding this flag.
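
To make the semantics of that flag concrete, here is a minimal sketch of the
kind of server-side guard it implies (the class, method, and parameter names
below are illustrative assumptions, not the KIP's final API):

public final class DowngradeGuard {
    /**
     * Rejects a request that lowers a feature's finalized max version unless
     * the caller explicitly opted in, mirroring the proposed --force-downgrade flag.
     */
    public static void validate(String feature,
                                long currentFinalizedMaxVersion,
                                long requestedMaxVersion,
                                boolean forceDowngrade) {
        boolean isDowngrade = requestedMaxVersion < currentFinalizedMaxVersion;
        if (isDowngrade && !forceDowngrade) {
            throw new IllegalArgumentException(
                "Refusing to downgrade feature '" + feature + "' from max version "
                + currentFinalizedMaxVersion + " to " + requestedMaxVersion
                + " without the force-downgrade option.");
        }
    }
}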

> > > 4. "Each broker’s supported dictionary of feature versions will be
> > defined
> > > in the broker code." So this means in order to restrict a certain
> > feature,
> > > we need to start the broker first and then send a feature gating
> request
> > > immediately, which introduces a time gap and the intended-to-close
> > feature
> > > could actually serve request during this phase. Do you think we should
> > also
> > > support configurations as well so that admin user could freely roll up
> a
> > > cluster with all nodes complying the same feature gating, without
> > worrying
> > > about the turnaround time to propagate the message only after the
> cluster
> > > starts up?
> >
> > (Kowshik): This is a great point/question. One of the expectations out of
> > this KIP, which is
> > already followed in the broker, is the following.
> >  - Imagine at time T1 the broker starts up and registers it’s presence in
> > ZK,
> >    along with advertising it’s supported features.
> >  - Imagine at a future time T2 the broker receives the
> > UpdateMetadataRequest
> >    from the controller, which contains the latest finalized features as
> > seen by
> >    the controller. The broker validates this data against it’s supported
> > features to
> >    make sure there is no mismatch (it will shutdown if there is an
> > incompatibility).
> >
> > It is expected that during the time between the 2 events T1 and T2, the
> > broker is
> > almost a silent entity in the cluster. It does not add any value to the
> > cluster, or carry
> > out any important broker activities. By “important”, I mean it is not
> doing
> > mutations
> > on it’s persistence, not mutating critical in-memory state, won’t be
> > serving
> > produce/fetch requests. Note it doesn’t even know it’s assigned
> partitions
> > until
> > it receives UpdateMetadataRequest from controller. Anything the broker is
> > doing up
> > until this point is not damaging/useful.
> >
> > I’ve clarified the above in the KIP, see this new section:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > .
>
My point is that during the bootstrapping stage of a cluster, we cannot pick
the desired feature version, since no controller is actively handling our
request. But anyway, I think this is a rare case to discuss, and the added
paragraph looks good :)
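
As a side note, the validation at T2 described above boils down to a
per-feature comparison of max versions; a rough sketch of the assumed check
(not the KIP's actual broker code) could look like:

import java.util.Map;

public class FeatureCompatibilityCheck {
    /**
     * Returns true only if every cluster-wide finalized feature max version is
     * also supported by this broker; a false result would trigger shutdown.
     */
    public static boolean isCompatible(Map<String, Long> supportedMaxVersions,
                                       Map<String, Long> finalizedMaxVersions) {
        for (Map.Entry<String, Long> finalized : finalizedMaxVersions.entrySet()) {
            Long supportedMax = supportedMaxVersions.get(finalized.getKey());
            if (supportedMax == null || supportedMax < finalized.getValue()) {
                return false;  // unknown feature, or finalized max exceeds what we support
            }
        }
        return true;
    }
}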


> > > 5. "adding a new Feature, updating or deleting an existing Feature",
> may
> > be
> > > I misunderstood something, I thought the features are defined in broker
> > > code, so admin could not really create a new feature?
> >
> > (Kowshik): Great point! You understood this right. Here adding a feature
> > means we are
> > adding a cluster-wide finalized *max* version for a feature that was
> > previously never finalized.
> > I have clarified this in the KIP now.
> >
> > > 6. I think we need a separate error code like
> FEATURE_UPDATE_IN_PROGRESS
> > to
> > > reject a concurrent feature update request.
> >
> > (Kowshik): Great point! I have modified the KIP adding the above (see
> > 'Tooling support -> Admin API changes').
> >
> > > 7. I think we haven't discussed the alternative solution to pass the
> > > feature information through Zookeeper. Is that mentioned in the KIP to
> > > justify why using UpdateMetadata is more favorable?
> >
> > (Kowshik): Nice question! The broker reads finalized feature info stored
> in
> > ZK,
> > only during startup when it does a validation. When serving
> > `ApiVersionsRequest`, the
> > broker does not read this info from ZK directly. I'd imagine the risk is
> > that it can increase
> > the ZK read QPS which can be a bottleneck for the system. Today, in Kafka
> > we use the
> > controller to fan out ZK updates to brokers and we want to stick to that
> > pattern to avoid
> > the ZK read bottleneck when serving `ApiVersionsRequest`.
>
Feature changes should happen at roughly the same frequency as config changes.
Today, dynamic configuration changes are propagated via Zookeeper. So I guess
propagating through UpdateMetadata doesn't get us much additional benefit,
while going through a ZK notification would be a simpler solution.
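
To make that alternative concrete, here is a purely illustrative sketch
(using the plain ZooKeeper client rather than Kafka's internal ZK layer) of
how a broker could watch the '/features' znode for finalized feature changes:

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class FeatureZNodeWatcher implements Watcher {
    private static final String FEATURES_PATH = "/features";
    private final ZooKeeper zk;

    public FeatureZNodeWatcher(ZooKeeper zk) {
        this.zk = zk;
    }

    /** Reads the znode and re-registers the watch (ZK watches fire only once). */
    public byte[] readAndRewatch() throws KeeperException, InterruptedException {
        return zk.getData(FEATURES_PATH, this, new Stat());
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged
                && FEATURES_PATH.equals(event.getPath())) {
            try {
                byte[] updated = readAndRewatch();
                // Deserialize 'updated' and apply the new finalized feature versions here.
            } catch (KeeperException | InterruptedException e) {
                // A real broker would retry or escalate; ignored in this sketch.
            }
        }
    }
}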

> > 8. I was under the impression that user could configure a range of
> > > supported versions, what's the trade-off for allowing single finalized
> > > version only?
> >
> > (Kowshik): Great question! The finalized version of a feature basically
> > refers to
> > the cluster-wide finalized feature "maximum" version. For example, if the
> > 'group_coordinator' feature
> > has the finalized version set to 10, then, it means that cluster-wide all
> > versions upto v10 are
> > supported for this feature. However, note that if some version (ex: v0)
> > gets deprecated
> > for this feature, then we don’t convey that using this scheme (also
> > supporting deprecation is a non-goal).
> >
> > (Kowshik): I’ve now modified the KIP at all points, referring to finalized
> > feature "maximum" versions.
> >
>
Understood, I don't feel strongly about deprecation, but does the current KIP
keep the door open for future improvements if someone has a need for feature
deprecation? Could we briefly discuss it in the future work section?


> > > 9. One minor syntax fix: Note that here the "client" here may be a
> > producer
> >
> > (Kowshik): Great point! Done.
> >
> >
> > Cheers,
> > Kowshik
> >
> >
> > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <re...@gmail.com>
> > wrote:
> >
> > > Hey Kowshik,
> > >
> > > thanks for the revised KIP. Got a couple of questions:
> > >
> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> could
> > be
> > > converted as "When is it safe for the brokers to start serving new
> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> > > context.
> > >
> > > 2. In the *Explanation *section, the metadata version number part
> seems a
> > > bit blurred. Could you point a reference to later section that we going
> > to
> > > store it in Zookeeper and update it every time when there is a feature
> > > change?
> > >
> > > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
> > > features such as group coordinator semantics, there is no legal
> scenario
> > to
> > > perform a downgrade at all. So having downgrade door open is pretty
> > > error-prone as human faults happen all the time. I'm assuming as new
> > > features are implemented, it's not very hard to add a flag during
> feature
> > > creation to indicate whether this feature is "downgradable". Could you
> > > explain a bit more on the extra engineering effort for shipping this
> KIP
> > > with downgrade protection in place?
> > >
> > > 4. "Each broker’s supported dictionary of feature versions will be
> > defined
> > > in the broker code." So this means in order to restrict a certain
> > feature,
> > > we need to start the broker first and then send a feature gating
> request
> > > immediately, which introduces a time gap and the intended-to-close
> > feature
> > > could actually serve request during this phase. Do you think we should
> > also
> > > support configurations as well so that admin user could freely roll up
> a
> > > cluster with all nodes complying the same feature gating, without
> > worrying
> > > about the turnaround time to propagate the message only after the
> cluster
> > > starts up?
> > >
> > > 5. "adding a new Feature, updating or deleting an existing Feature",
> may
> > be
> > > I misunderstood something, I thought the features are defined in broker
> > > code, so admin could not really create a new feature?
> > >
> > > 6. I think we need a separate error code like
> FEATURE_UPDATE_IN_PROGRESS
> > to
> > > reject a concurrent feature update request.
> > >
> > > 7. I think we haven't discussed the alternative solution to pass the
> > > feature information through Zookeeper. Is that mentioned in the KIP to
> > > justify why using UpdateMetadata is more favorable?
> > >
> > > 8. I was under the impression that user could configure a range of
> > > supported versions, what's the trade-off for allowing single finalized
> > > version only?
> > >
> > > 9. One minor syntax fix: Note that here the "client" here may be a
> > producer
> > >
> > > Boyang
> > >
> > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org>
> wrote:
> > >
> > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > Hi Colin,
> > > > >
> > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > suggestions.
> > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > .
> > > > >
> > > > > 1. '__data_version__' is the version of the finalized feature
> > metadata
> > > > > (i.e. actual ZK node contents), while the '__schema_version__' is
> the
> > > > > version of the schema of the data persisted in ZK. These serve
> > > different
> > > > > purposes. '__data_version__' is useful mainly to clients during
> > > reads,
> > > > > to differentiate between the 2 versions of eventually consistent
> > > > 'finalized
> > > > > features' metadata (i.e. larger metadata version is more recent).
> > > > > '__schema_version__' provides an additional degree of flexibility,
> > > where
> > > > if
> > > > > we decide to change the schema for '/features' node in ZK (in the
> > > > future),
> > > > > then we can manage broker roll outs suitably (i.e.
> > > > > serialization/deserialization of the ZK data can be handled
> safely).
> > > >
> > > > Hi Kowshik,
> > > >
> > > > If you're talking about a number that lets you know if data is more
> or
> > > > less recent, we would typically call that an epoch, and not a
> version.
> > > For
> > > > the ZK data structures, the word "version" is typically reserved for
> > > > describing changes to the overall schema of the data that is written
> to
> > > > ZooKeeper.  We don't even really change the "version" of those
> schemas
> > > that
> > > > much, since most changes are backwards-compatible.  But we do include
> > > that
> > > > version field just in case.
> > > >
> > > > I don't think we really need an epoch here, though, since we can just
> > > look
> > > > at the broker epoch.  Whenever the broker registers, its epoch will
> be
> > > > greater than the previous broker epoch.  And the newly registered
> data
> > > will
> > > > take priority.  This will be a lot simpler than adding a separate
> epoch
> > > > system, I think.
> > > >
> > > > >
> > > > > 2. Regarding admin client needing min and max information - you are
> > > > right!
> > > > > I've changed the KIP such that the Admin API also allows the user
> to
> > > read
> > > > > 'supported features' from a specific broker. Please look at the
> > section
> > > > > "Admin API changes".
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate.
> > I've
> > > > > improved the KIP to just use `long` at all places.
> > > >
> > > > Sounds good.
> > > >
> > > > >
> > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right! I've
> > > > updated
> > > > > the KIP sketching the functionality provided by this tool, with
> some
> > > > > examples. Please look at the section "Tooling support examples".
> > > > >
> > > > > Thank you!
> > > >
> > > >
> > > > Thanks, Kowshik.
> > > >
> > > > cheers,
> > > > Colin
> > > >
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cm...@apache.org>
> > > > wrote:
> > > > >
> > > > > > Thanks, Kowshik, this looks good.
> > > > > >
> > > > > > In the "Schema" section, do we really need both
> __schema_version__
> > > and
> > > > > > __data_version__?  Can we just have a single version field here?
> > > > > >
> > > > > > Shouldn't the Admin(Client) function have some way to get the min
> > and
> > > > max
> > > > > > information that we're exposing as well?  I guess we could have
> > min,
> > > > max,
> > > > > > and current.  Unrelated: is the use of Long rather than long
> > > deliberate
> > > > > > here?
> > > > > >
> > > > > > It would be good to describe how the command line tool
> > > > > > kafka.admin.FeatureCommand will work.  For example the flags that
> > it
> > > > will
> > > > > > take and the output that it will generate to STDOUT.
> > > > > >
> > > > > > cheers,
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I've opened KIP-584 <
> > https://issues.apache.org/jira/browse/KIP-584
> > > >
> > > > > > > which
> > > > > > > is intended to provide a versioning scheme for features. I'd
> like
> > > to
> > > > use
> > > > > > > this thread to discuss the same. I'd appreciate any feedback on
> > > this.
> > > > > > > Here
> > > > > > > is a link to KIP-584:
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > >  .
> > > > > > >
> > > > > > > Thank you!
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-584: Versioning scheme for features

Posted by Jun Rao <ju...@confluent.io>.
Hi, Kowshik,

Thanks for the KIP. Looks good overall. A few comments below.

100. UpdateFeaturesRequest/UpdateFeaturesResponse
100.1 Since this request waits for responses from brokers, should we add a
timeout in the request (like createTopicRequest)?
100.2 The response schema is a bit weird. Typically, the response just
shows an error code and an error message, instead of echoing the request.
100.3 Should we add a separate request to list/describe the existing
features?
100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
DELETE, the version field doesn't make sense. So, I guess the broker just
ignores this? An alternative way is to have a separate DeleteFeaturesRequest
100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
version of the metadata for finalized features." I am wondering why the
ordering is important?
100.6 Could you specify the required ACL for this new request?

101. For the broker registration ZK node, should we bump up the version in
the json?

102. For the /features ZK node, not sure if we need the epoch field. Each
ZK node has an internal version field that is incremented on every update.

103. "Enabling the actual semantics of a feature version cluster-wide is
left to the discretion of the logic implementing the feature (ex: can be
done via dynamic broker config)." Does that mean the broker registration ZK
node will be updated dynamically when this happens?

104. UpdateMetadataRequest
104.1 It would be useful to describe when the feature metadata is included
in the request. My understanding is that it's only included if (1) there is
a change to the finalized feature; (2) broker restart; (3) controller
failover.
104.2 The new fields have the following versions. Why are the versions 3+
when the top version is bumped to 6?
      "fields":  [
        {"name": "Name", "type":  "string", "versions":  "3+",
          "about": "The name of the feature."},
        {"name":  "Version", "type":  "int64", "versions":  "3+",
          "about": "The finalized version for the feature."}
      ]

105. kafka-features.sh: Instead of using update/delete, perhaps it's better
to use enable/disable?

Jun

On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <kp...@confluent.io>
wrote:

> Hey Boyang,
>
> Thanks for the great feedback! I have updated the KIP based on your
> feedback.
> Please find my response below for your comments, look for sentences
> starting
> with "(Kowshik)" below.
>
>
> > 1. "When is it safe for the brokers to begin handling EOS traffic" could
> be
> > converted as "When is it safe for the brokers to start serving new
> > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> > context.
>
> (Kowshik): Great point! Done.
>
> > 2. In the *Explanation *section, the metadata version number part seems a
> > bit blurred. Could you point a reference to later section that we going
> to
> > store it in Zookeeper and update it every time when there is a feature
> > change?
>
> (Kowshik): Great point! Done. I've added a reference in the KIP.
>
>
> > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
> > features such as group coordinator semantics, there is no legal scenario
> to
> > perform a downgrade at all. So having downgrade door open is pretty
> > error-prone as human faults happen all the time. I'm assuming as new
> > features are implemented, it's not very hard to add a flag during feature
> > creation to indicate whether this feature is "downgradable". Could you
> > explain a bit more on the extra engineering effort for shipping this KIP
> > with downgrade protection in place?
>
> (Kowshik): Great point! I'd agree and disagree here. While I agree that
> accidental
> downgrades can cause problems, I also think sometimes downgrades should
> be allowed for emergency reasons (not all downgrades cause issues).
> It is just subjective to the feature being downgraded.
>
> To be more strict about feature version downgrades, I have modified the KIP
> proposing that we mandate a `--force-downgrade` flag be used in the
> UPDATE_FEATURES api
> and the tooling, whenever the human is downgrading a finalized feature
> version.
> Hopefully this should cover the requirement, until we find the need for
> advanced downgrade support.
>
> > 4. "Each broker’s supported dictionary of feature versions will be
> defined
> > in the broker code." So this means in order to restrict a certain
> feature,
> > we need to start the broker first and then send a feature gating request
> > immediately, which introduces a time gap and the intended-to-close
> feature
> > could actually serve request during this phase. Do you think we should
> also
> > support configurations as well so that admin user could freely roll up a
> > cluster with all nodes complying the same feature gating, without
> worrying
> > about the turnaround time to propagate the message only after the cluster
> > starts up?
>
> (Kowshik): This is a great point/question. One of the expectations out of
> this KIP, which is
> already followed in the broker, is the following.
>  - Imagine at time T1 the broker starts up and registers it’s presence in
> ZK,
>    along with advertising it’s supported features.
>  - Imagine at a future time T2 the broker receives the
> UpdateMetadataRequest
>    from the controller, which contains the latest finalized features as
> seen by
>    the controller. The broker validates this data against it’s supported
> features to
>    make sure there is no mismatch (it will shutdown if there is an
> incompatibility).
>
> It is expected that during the time between the 2 events T1 and T2, the
> broker is
> almost a silent entity in the cluster. It does not add any value to the
> cluster, or carry
> out any important broker activities. By “important”, I mean it is not doing
> mutations
> on it’s persistence, not mutating critical in-memory state, won’t be
> serving
> produce/fetch requests. Note it doesn’t even know it’s assigned partitions
> until
> it receives UpdateMetadataRequest from controller. Anything the broker is
> doing up
> until this point is not damaging/useful.
>
> I’ve clarified the above in the KIP, see this new section:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> .
>
> > 5. "adding a new Feature, updating or deleting an existing Feature", may
> be
> > I misunderstood something, I thought the features are defined in broker
> > code, so admin could not really create a new feature?
>
> (Kowshik): Great point! You understood this right. Here adding a feature
> means we are
> adding a cluster-wide finalized *max* version for a feature that was
> previously never finalized.
> I have clarified this in the KIP now.
>
> > 6. I think we need a separate error code like FEATURE_UPDATE_IN_PROGRESS
> to
> > reject a concurrent feature update request.
>
> (Kowshik): Great point! I have modified the KIP adding the above (see
> 'Tooling support -> Admin API changes').
>
> > 7. I think we haven't discussed the alternative solution to pass the
> > feature information through Zookeeper. Is that mentioned in the KIP to
> > justify why using UpdateMetadata is more favorable?
>
> (Kowshik): Nice question! The broker reads finalized feature info stored in
> ZK,
> only during startup when it does a validation. When serving
> `ApiVersionsRequest`, the
> broker does not read this info from ZK directly. I'd imagine the risk is
> that it can increase
> the ZK read QPS which can be a bottleneck for the system. Today, in Kafka
> we use the
> controller to fan out ZK updates to brokers and we want to stick to that
> pattern to avoid
> the ZK read bottleneck when serving `ApiVersionsRequest`.
>
> > 8. I was under the impression that user could configure a range of
> > supported versions, what's the trade-off for allowing single finalized
> > version only?
>
> (Kowshik): Great question! The finalized version of a feature basically
> refers to
> the cluster-wide finalized feature "maximum" version. For example, if the
> 'group_coordinator' feature
> has the finalized version set to 10, then, it means that cluster-wide all
> versions upto v10 are
> supported for this feature. However, note that if some version (ex: v0)
> gets deprecated
> for this feature, then we don’t convey that using this scheme (also
> supporting deprecation is a non-goal).
>
> (Kowshik): I’ve now modified the KIP at all points, referring to finalized
> feature "maximum" versions.
>
> > 9. One minor syntax fix: Note that here the "client" here may be a
> producer
>
> (Kowshik): Great point! Done.
>
>
> Cheers,
> Kowshik
>
>
> On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <re...@gmail.com>
> wrote:
>
> > Hey Kowshik,
> >
> > thanks for the revised KIP. Got a couple of questions:
> >
> > 1. "When is it safe for the brokers to begin handling EOS traffic" could
> be
> > converted as "When is it safe for the brokers to start serving new
> > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> > context.
> >
> > 2. In the *Explanation *section, the metadata version number part seems a
> > bit blurred. Could you point a reference to later section that we going
> to
> > store it in Zookeeper and update it every time when there is a feature
> > change?
> >
> > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
> > features such as group coordinator semantics, there is no legal scenario
> to
> > perform a downgrade at all. So having downgrade door open is pretty
> > error-prone as human faults happen all the time. I'm assuming as new
> > features are implemented, it's not very hard to add a flag during feature
> > creation to indicate whether this feature is "downgradable". Could you
> > explain a bit more on the extra engineering effort for shipping this KIP
> > with downgrade protection in place?
> >
> > 4. "Each broker’s supported dictionary of feature versions will be
> defined
> > in the broker code." So this means in order to restrict a certain
> feature,
> > we need to start the broker first and then send a feature gating request
> > immediately, which introduces a time gap and the intended-to-close
> feature
> > could actually serve request during this phase. Do you think we should
> also
> > support configurations as well so that admin user could freely roll up a
> > cluster with all nodes complying the same feature gating, without
> worrying
> > about the turnaround time to propagate the message only after the cluster
> > starts up?
> >
> > 5. "adding a new Feature, updating or deleting an existing Feature", may
> be
> > I misunderstood something, I thought the features are defined in broker
> > code, so admin could not really create a new feature?
> >
> > 6. I think we need a separate error code like FEATURE_UPDATE_IN_PROGRESS
> to
> > reject a concurrent feature update request.
> >
> > 7. I think we haven't discussed the alternative solution to pass the
> > feature information through Zookeeper. Is that mentioned in the KIP to
> > justify why using UpdateMetadata is more favorable?
> >
> > 8. I was under the impression that user could configure a range of
> > supported versions, what's the trade-off for allowing single finalized
> > version only?
> >
> > 9. One minor syntax fix: Note that here the "client" here may be a
> producer
> >
> > Boyang
> >
> > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cm...@apache.org> wrote:
> >
> > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > Hi Colin,
> > > >
> > > > Thanks for the feedback! I've changed the KIP to address your
> > > > suggestions.
> > > > Please find below my explanation. Here is a link to KIP 584:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > .
> > > >
> > > > 1. '__data_version__' is the version of the finalized feature
> metadata
> > > > (i.e. actual ZK node contents), while the '__schema_version__' is the
> > > > version of the schema of the data persisted in ZK. These serve
> > different
> > > > purposes. '__data_version__' is useful mainly to clients during
> > reads,
> > > > to differentiate between the 2 versions of eventually consistent
> > > 'finalized
> > > > features' metadata (i.e. larger metadata version is more recent).
> > > > '__schema_version__' provides an additional degree of flexibility,
> > where
> > > if
> > > > we decide to change the schema for '/features' node in ZK (in the
> > > future),
> > > > then we can manage broker roll outs suitably (i.e.
> > > > serialization/deserialization of the ZK data can be handled safely).
> > >
> > > Hi Kowshik,
> > >
> > > If you're talking about a number that lets you know if data is more or
> > > less recent, we would typically call that an epoch, and not a version.
> > For
> > > the ZK data structures, the word "version" is typically reserved for
> > > describing changes to the overall schema of the data that is written to
> > > ZooKeeper.  We don't even really change the "version" of those schemas
> > that
> > > much, since most changes are backwards-compatible.  But we do include
> > that
> > > version field just in case.
> > >
> > > I don't think we really need an epoch here, though, since we can just
> > look
> > > at the broker epoch.  Whenever the broker registers, its epoch will be
> > > greater than the previous broker epoch.  And the newly registered data
> > will
> > > take priority.  This will be a lot simpler than adding a separate epoch
> > > system, I think.
> > >
> > > >
> > > > 2. Regarding admin client needing min and max information - you are
> > > right!
> > > > I've changed the KIP such that the Admin API also allows the user to
> > read
> > > > 'supported features' from a specific broker. Please look at the
> section
> > > > "Admin API changes".
> > >
> > > Thanks.
> > >
> > > >
> > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate.
> I've
> > > > improved the KIP to just use `long` at all places.
> > >
> > > Sounds good.
> > >
> > > >
> > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right! I've
> > > updated
> > > > the KIP sketching the functionality provided by this tool, with some
> > > > examples. Please look at the section "Tooling support examples".
> > > >
> > > > Thank you!
> > >
> > >
> > > Thanks, Kowshik.
> > >
> > > cheers,
> > > Colin
> > >
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cm...@apache.org>
> > > wrote:
> > > >
> > > > > Thanks, Kowshik, this looks good.
> > > > >
> > > > > In the "Schema" section, do we really need both __schema_version__
> > and
> > > > > __data_version__?  Can we just have a single version field here?
> > > > >
> > > > > Shouldn't the Admin(Client) function have some way to get the min
> and
> > > max
> > > > > information that we're exposing as well?  I guess we could have
> min,
> > > max,
> > > > > and current.  Unrelated: is the use of Long rather than long
> > deliberate
> > > > > here?
> > > > >
> > > > > It would be good to describe how the command line tool
> > > > > kafka.admin.FeatureCommand will work.  For example the flags that
> it
> > > will
> > > > > take and the output that it will generate to STDOUT.
> > > > >
> > > > > cheers,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I've opened KIP-584 <
> https://issues.apache.org/jira/browse/KIP-584
> > >
> > > > > > which
> > > > > > is intended to provide a versioning scheme for features. I'd like
> > to
> > > use
> > > > > > this thread to discuss the same. I'd appreciate any feedback on
> > this.
> > > > > > Here
> > > > > > is a link to KIP-584:
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > >  .
> > > > > >
> > > > > > Thank you!
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > >
> > > >
> > >
> >
>