You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Boyang Chen <re...@gmail.com> on 2020/12/04 01:28:55 UTC

[DISCUSS] KIP-687: Automatic Reloading of Security Store

Hey there,

I would like to start the discussion thread for KIP-687:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store

This KIP is trying to deprecate the AlterConfigs API support of updating
the security store by reloading path in-place, and replace with a
file-watch mechanism inside the broker. Let me know what you think.

Best,
Boyang

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Boyang Chen <re...@gmail.com>.
Thanks Rajini for the comments.

On Thu, Jan 7, 2021 at 2:27 AM Rajini Sivaram <ra...@gmail.com>
wrote:

> Hi Boyang,
>
> Thanks for the KIP, I have a few questions:
>
> 1) Will it be possible to enable/disable automatic file reloading? If not,
> we should mention in the compatibility section.
>
I don't think we need to support disabling reloading as it will become the
major mechanism to refresh
the security store. Will make that clear in the KIP.

>
> 2) We are introducing new common SSL configs and updating common code to
> perform automated updates. What does this mean for clients? Are we going to
> automatically reload client key stores and trust stores?
>
> This config would not be applied to clients. Will make that clear in the
config comments.

3) We should mention in the compatibility section that we are changing the
> audit log/authorization model for dynamic updates of SSL stores. At the
> moment, only a user with powerful Cluster:Alter permissions can dynamically
> update SSL stores on brokers. The KIP removes this restriction and relies
> purely on file system permissions for file-based stores, unlike for example
> PEM store updates which would still rely on Kafka permissions.
>
> Sounds good, will make that clear.

4) Is it really necessary to add a file watcher that is not guaranteed 100%
> of the time, if we are adding refresh interval configs? In particular, if
> we extend this to clients now or at some point in the future, wouldn't it
> be better just to use deterministic refresh intervals without the overhead
> of watching?
>
> It is a used hybrid approach in certain server side scenarios. I guess the
main advantage
is that we have a best case guarantee for immediate refresh, as well as a
worst case
guarantee when the trigger fails, but not too frequent to introduce
performance problems.


> 5) The KIP doesn't talk about validation of key and trust stores. I am
> guessing we will continue to perform the same validation that we perform
> today. But what happens if validation fails? We should mention in the KIP
> that we no longer provide feedback for this case (unlike admin client
> requests that returned an error).
>
We suggested to expose error metrics and write error messages in the server
side log. As we
decouple the update mechanism from AlterConfig request, this is
unavoidable. Note that we are
only applying an in-place security file update without name change, which I
believe is one edge case.


> 5a) If a key store doesn't conform (e.g. the DN was changed), we would fail
> the update if we apply the current validation. Would we do the validation
> every 5 minutes after that forever even though the file wasn't updated
> since? Or will we remember the last reloaded time to avoid reloading if
> file hasn't changed?
>
Could you elaborate the definition for `DN`? Curious what does `conform`
suggest?


> 5b) We perform additional validation for inter-broker key and trust stores
> to ensure we never break the broker with dynamic updates. Since this
> validation matches key store with trust store, it relies on the order in
> which stores are updated. With reloading in the admin client, user had
> control over the update. We should document any restrictions on the order
> in which files need to be updated on the file system to perform updates of
> inter-broker SSL stores. And as with 5a), it will be good to document the
> behaviour if validation fails.
>
> So you are suggesting that the key store update should happen before the
trust store, or vice versa?
I'm not familiar with the restriction TBH.

Regards,
>
> Rajini
>
>
>
> On Wed, Jan 6, 2021 at 5:55 PM Jason Gustafson <ja...@confluent.io> wrote:
>
> > Thanks Boyang. Someone mentioned my email never showed up, but basically
> I
> > suggested tying the refresh configuration more directly to the
> > configurations it would affect. I'm happy with the updates.
> >
> > -Jason
> >
> > On Tue, Jan 5, 2021 at 8:34 PM Boyang Chen <re...@gmail.com>
> > wrote:
> >
> > > Thanks Jason for the feedback. I separated the time configs for key
> store
> > > and trust store, and rename the configs as you proposed.
> > >
> > > Best,
> > > Boyang
> > >
> > > On Mon, Dec 14, 2020 at 3:47 PM Boyang Chen <
> reluctanthero104@gmail.com>
> > > wrote:
> > >
> > > > Hey there,
> > > >
> > > > bumping up this thread to see if there are further questions
> regarding
> > > the
> > > > updated proposal.
> > > >
> > > > Best,
> > > > Boyang
> > > >
> > > > On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <
> > reluctanthero104@gmail.com
> > > >
> > > > wrote:
> > > >
> > > >> After some offline discussions, we believe that it's the right
> > direction
> > > >> to go by doing a hybrid approach which includes both file-watch
> > trigger
> > > and
> > > >> interval based reloading. The former guarantees a swift change in
> 99%
> > > time,
> > > >> while the latter provides a time-based guarantee in the worst case
> > when
> > > the
> > > >> file-watch does not take effect. The current default reloading
> > interval
> > > is
> > > >> set to 5 min. I have updated the KIP and ticket, feel free to check
> > out
> > > and
> > > >> see if it makes sense.
> > > >>
> > > >> Best,
> > > >> Boyang
> > > >>
> > > >> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <
> > reluctanthero104@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hey Gwen, thanks for the feedback.
> > > >>>
> > > >>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <gw...@confluent.io>
> > > wrote:
> > > >>>
> > > >>>> Agree with Igor. IIRC, we also encountered cases where filewatch
> was
> > > >>>> not triggered as expected. An interval will give us a better
> > > >>>> worse-case scenario that is easily controlled by the Kafka admin.
> > > >>>>
> > > >>>> Are the cases you were referring to happening in the cloud
> > > environment?
> > > >>> Should we investigate instead of simply assuming the standard API
> > won't
> > > >>> work? I checked around and found a similar complaint here
> > > >>> <https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
> > > >>>
> > > >>> I would be partially agreeing that we want to have a reliable
> > approach
> > > >>> for all different operating systems in general, but would be great
> if
> > > we
> > > >>> could reach a quantitative measure of file-watch success rate if
> > > possible
> > > >>> for us to make the call. Eventually, the benefit of file-watch is
> > more
> > > >>> prompt reaction time and less configuration to the broker.
> > > >>>
> > > >>>> Gwen
> > > >>>>
> > > >>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
> > > >>>> >
> > > >>>> >
> > > >>>> > > > The proposed change relies on a file watch, why not also
> have
> > a
> > > >>>> polling
> > > >>>> > > > interval to check the file for changes?
> > > >>>> > > >
> > > >>>> > > > The periodical check could work, the slight downside is that
> > we
> > > >>>> need
> > > >>>> > > additional configurations to schedule the interval. Do you
> think
> > > the
> > > >>>> > > file-watch approach has any extra overhead than the interval
> > based
> > > >>>> solution?
> > > >>>> >
> > > >>>> > I don't think so. The reason I'm asking this is the KIP
> currently
> > > >>>> includes:
> > > >>>> >
> > > >>>> >   "When the file watch does not work for unknown reason, user
> > could
> > > >>>> still try to change the store path in an explicit AlterConfig call
> > in
> > > the
> > > >>>> worst case."
> > > >>>> >
> > > >>>> > Having the interval in addition to the file watch could result
> in
> > a
> > > >>>> better worst case scenario.
> > > >>>> > I understand it would require introducing at least one new
> > > >>>> configuration for the interval, so maybe this doesn't have to
> solved
> > > in
> > > >>>> this KIP.
> > > >>>> >
> > > >>>> > --
> > > >>>> > Igor
> > > >>>> >
> > > >>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
> > > >>>> > > Hey Igor, thanks for the feedback.
> > > >>>> > >
> > > >>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me>
> wrote:
> > > >>>> > >
> > > >>>> > > > Hi Boyang,
> > > >>>> > > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > > What happens if the file is changed into an invalid store?
> > Does
> > > >>>> the
> > > >>>> > > > previous store stay in use?
> > > >>>> > > >
> > > >>>> > > > If the reload fails, the previous store should be
> effective. I
> > > >>>> will state
> > > >>>> > > that in the KIP.
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > > Thanks,
> > > >>>> > > >
> > > >>>> > > > --
> > > >>>> > > > Igor
> > > >>>> > > >
> > > >>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> > > >>>> > > > > Hey there,
> > > >>>> > > > >
> > > >>>> > > > > I would like to start the discussion thread for KIP-687:
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> > > >>>> > > > >
> > > >>>> > > > > This KIP is trying to deprecate the AlterConfigs API
> support
> > > of
> > > >>>> updating
> > > >>>> > > > > the security store by reloading path in-place, and replace
> > > with
> > > >>>> a
> > > >>>> > > > > file-watch mechanism inside the broker. Let me know what
> you
> > > >>>> think.
> > > >>>> > > > >
> > > >>>> > > > > Best,
> > > >>>> > > > > Boyang
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Gwen Shapira
> > > >>>> Engineering Manager | Confluent
> > > >>>> 650.450.2760 | @gwenshap
> > > >>>> Follow us: Twitter | blog
> > > >>>>
> > > >>>
> > >
> >
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Rajini Sivaram <ra...@gmail.com>.
Hi Boyang,

Thanks for the KIP, I have a few questions:

1) Will it be possible to enable/disable automatic file reloading? If not,
we should mention in the compatibility section.

2) We are introducing new common SSL configs and updating common code to
perform automated updates. What does this mean for clients? Are we going to
automatically reload client key stores and trust stores?

3) We should mention in the compatibility section that we are changing the
audit log/authorization model for dynamic updates of SSL stores. At the
moment, only a user with powerful Cluster:Alter permissions can dynamically
update SSL stores on brokers. The KIP removes this restriction and relies
purely on file system permissions for file-based stores, unlike for example
PEM store updates which would still rely on Kafka permissions.

4) Is it really necessary to add a file watcher that is not guaranteed 100%
of the time, if we are adding refresh interval configs? In particular, if
we extend this to clients now or at some point in the future, wouldn't it
be better just to use deterministic refresh intervals without the overhead
of watching?

5) The KIP doesn't talk about validation of key and trust stores. I am
guessing we will continue to perform the same validation that we perform
today. But what happens if validation fails? We should mention in the KIP
that we no longer provide feedback for this case (unlike admin client
requests that returned an error).
5a) If a key store doesn't conform (e.g. the DN was changed), we would fail
the update if we apply the current validation. Would we do the validation
every 5 minutes after that forever even though the file wasn't updated
since? Or will we remember the last reloaded time to avoid reloading if
file hasn't changed?
5b) We perform additional validation for inter-broker key and trust stores
to ensure we never break the broker with dynamic updates. Since this
validation matches key store with trust store, it relies on the order in
which stores are updated. With reloading in the admin client, user had
control over the update. We should document any restrictions on the order
in which files need to be updated on the file system to perform updates of
inter-broker SSL stores. And as with 5a), it will be good to document the
behaviour if validation fails.

Regards,

Rajini



On Wed, Jan 6, 2021 at 5:55 PM Jason Gustafson <ja...@confluent.io> wrote:

> Thanks Boyang. Someone mentioned my email never showed up, but basically I
> suggested tying the refresh configuration more directly to the
> configurations it would affect. I'm happy with the updates.
>
> -Jason
>
> On Tue, Jan 5, 2021 at 8:34 PM Boyang Chen <re...@gmail.com>
> wrote:
>
> > Thanks Jason for the feedback. I separated the time configs for key store
> > and trust store, and rename the configs as you proposed.
> >
> > Best,
> > Boyang
> >
> > On Mon, Dec 14, 2020 at 3:47 PM Boyang Chen <re...@gmail.com>
> > wrote:
> >
> > > Hey there,
> > >
> > > bumping up this thread to see if there are further questions regarding
> > the
> > > updated proposal.
> > >
> > > Best,
> > > Boyang
> > >
> > > On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <
> reluctanthero104@gmail.com
> > >
> > > wrote:
> > >
> > >> After some offline discussions, we believe that it's the right
> direction
> > >> to go by doing a hybrid approach which includes both file-watch
> trigger
> > and
> > >> interval based reloading. The former guarantees a swift change in 99%
> > time,
> > >> while the latter provides a time-based guarantee in the worst case
> when
> > the
> > >> file-watch does not take effect. The current default reloading
> interval
> > is
> > >> set to 5 min. I have updated the KIP and ticket, feel free to check
> out
> > and
> > >> see if it makes sense.
> > >>
> > >> Best,
> > >> Boyang
> > >>
> > >> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <
> reluctanthero104@gmail.com>
> > >> wrote:
> > >>
> > >>> Hey Gwen, thanks for the feedback.
> > >>>
> > >>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <gw...@confluent.io>
> > wrote:
> > >>>
> > >>>> Agree with Igor. IIRC, we also encountered cases where filewatch was
> > >>>> not triggered as expected. An interval will give us a better
> > >>>> worse-case scenario that is easily controlled by the Kafka admin.
> > >>>>
> > >>>> Are the cases you were referring to happening in the cloud
> > environment?
> > >>> Should we investigate instead of simply assuming the standard API
> won't
> > >>> work? I checked around and found a similar complaint here
> > >>> <https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
> > >>>
> > >>> I would be partially agreeing that we want to have a reliable
> approach
> > >>> for all different operating systems in general, but would be great if
> > we
> > >>> could reach a quantitative measure of file-watch success rate if
> > possible
> > >>> for us to make the call. Eventually, the benefit of file-watch is
> more
> > >>> prompt reaction time and less configuration to the broker.
> > >>>
> > >>>> Gwen
> > >>>>
> > >>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
> > >>>> >
> > >>>> >
> > >>>> > > > The proposed change relies on a file watch, why not also have
> a
> > >>>> polling
> > >>>> > > > interval to check the file for changes?
> > >>>> > > >
> > >>>> > > > The periodical check could work, the slight downside is that
> we
> > >>>> need
> > >>>> > > additional configurations to schedule the interval. Do you think
> > the
> > >>>> > > file-watch approach has any extra overhead than the interval
> based
> > >>>> solution?
> > >>>> >
> > >>>> > I don't think so. The reason I'm asking this is the KIP currently
> > >>>> includes:
> > >>>> >
> > >>>> >   "When the file watch does not work for unknown reason, user
> could
> > >>>> still try to change the store path in an explicit AlterConfig call
> in
> > the
> > >>>> worst case."
> > >>>> >
> > >>>> > Having the interval in addition to the file watch could result in
> a
> > >>>> better worst case scenario.
> > >>>> > I understand it would require introducing at least one new
> > >>>> configuration for the interval, so maybe this doesn't have to solved
> > in
> > >>>> this KIP.
> > >>>> >
> > >>>> > --
> > >>>> > Igor
> > >>>> >
> > >>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
> > >>>> > > Hey Igor, thanks for the feedback.
> > >>>> > >
> > >>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me> wrote:
> > >>>> > >
> > >>>> > > > Hi Boyang,
> > >>>> > > >
> > >>>> > >
> > >>>> > >
> > >>>> > > > What happens if the file is changed into an invalid store?
> Does
> > >>>> the
> > >>>> > > > previous store stay in use?
> > >>>> > > >
> > >>>> > > > If the reload fails, the previous store should be effective. I
> > >>>> will state
> > >>>> > > that in the KIP.
> > >>>> > >
> > >>>> > >
> > >>>> > > > Thanks,
> > >>>> > > >
> > >>>> > > > --
> > >>>> > > > Igor
> > >>>> > > >
> > >>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> > >>>> > > > > Hey there,
> > >>>> > > > >
> > >>>> > > > > I would like to start the discussion thread for KIP-687:
> > >>>> > > > >
> > >>>> > > >
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> > >>>> > > > >
> > >>>> > > > > This KIP is trying to deprecate the AlterConfigs API support
> > of
> > >>>> updating
> > >>>> > > > > the security store by reloading path in-place, and replace
> > with
> > >>>> a
> > >>>> > > > > file-watch mechanism inside the broker. Let me know what you
> > >>>> think.
> > >>>> > > > >
> > >>>> > > > > Best,
> > >>>> > > > > Boyang
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Gwen Shapira
> > >>>> Engineering Manager | Confluent
> > >>>> 650.450.2760 | @gwenshap
> > >>>> Follow us: Twitter | blog
> > >>>>
> > >>>
> >
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Boyang Chen <re...@gmail.com>.
In my offline discussion with Rajini and other folks, we basically had a
better understanding for the following problems:

1. Whether there should be a clear disabling mechanism for periodical
refreshing. We thought about using -1 for a special refreshing interval
value to disable it, but as a matter of fact users could just set the value
to max long in order to disable periodical refresh when they don't want to
have it in a rare case, so we don't need to have this extra complexity.

2. Whether we need file-watch triggering at all. It is agreed that having a
hybrid approach will increase the implementation complexity, and we could
potentially ask existing customers to avoid doing in-place file updates in
the upgrade notes if we don't implement the file-watcher at all. Compared
with the plain periodical refreshing approach, file-watcher still has the
merits of maintaining the same semantic behavior for existing in-place
security store swap, and it is not necessarily too complicated to block the
project progress. Above all, this features an internal implementation
detail which we don't need to resolve at the KIP discussion stage, so we
should keep it and revisit if things indeed go down.

Let me know your thoughts.

Best,
Boyang

On Fri, Jan 8, 2021 at 10:42 AM Boyang Chen <re...@gmail.com>
wrote:

>
>
> On Fri, Jan 8, 2021 at 4:46 AM Rajini Sivaram <ra...@gmail.com>
> wrote:
>
>> Hi Boyang,
>>
>> Thanks for the responses. Follow up comments on a couple of those:
>>
>> 4) Can you provide some more details on the scenarios where file watcher
>> is
>> useful. You mentioned hybrid, but it is not clear to me why a watcher that
>> reloads 99% of the time would be useful. There are a few cases to
>> consider:
>> 4a) Refresh SSL stores prior to expiry. In this case, time-interval-based
>> reloading makes sense and this is similar to refresh of other security
>> credentials. It seems reasonable to expect that stores are updated well
>> before expiry and that the refresh interval would be configured
>> accordingly.
>> 4b) Remove a compromised client certificate from the trust store. This is
>> a
>> critical case. With the current code, you send an admin client request and
>> get feedback on whether the update happened. A watcher that may or may not
>> work is pretty much useless. The only reliable way to handle this case
>> after this KIP would be to change the file name and send an admin client
>> request.
>> 4c) Change broker key store, which also perhaps requires change to the
>> truststore. This is the ordering issue introduced by this KIP. This is
>> also
>> the issue that David Jacot brought up. With the current code, you update
>> keystore and truststore and send admin client requests to perform the
>> update and hence you have total control on when each update is performed
>> and whether they are performed together. With the proposed KIP, we lose
>> this control. Like 4b), the only reliable way after this KIP to handle
>> this
>> case would be to change filenames and send an admin client request.
>> 4d) Add a new client certificate to the broker truststore. This is the
>> only
>> case where a file watcher seems useful to reload sooner.
>>
>> To summarize, two of the four cases are no longer reliable without file
>> name change and it would be better to document these as limitations. These
>> are rare cases where file name change seems reasonable. Periodic refresh,
>> which is the most common case and one that we would want to support in
>> clients some day can be supported without a file watcher. That leaves 4d)
>> which I am not sure justifies the use of a file watcher that cannot be
>> disabled.
>>
>>  My understanding is that the hybrid approach reduces the amount of time
> to
> propagate security changes, instead of waiting for unnecessarily long
> time. Secondly IIUC,
> even with automatic tooling, you need to do the key store and trust store
> updates in two separate AlterConfigs,
> where one must happen after the other is successful. If we have to use an
> RPC, it has to decouple from
> AlterConfig and sent twice to a specific broker, which we discussed as a
> rejected alternative due to the complexity and overhead.
>
> To put it straight, how many use cases today do we have to do the same
> name store file content update? My impression was
> that KIP-687 was just trying to solve a very special edge case with a
> minimal effort, not trying to officially claim that
> in-place security file update should be a main-stream use case.
>
> 5) Dynamic updates are restricted to ensure brokers don't become unusable
>> after an update. For example, we don't allow the distinguished name (DN)
>>  in the certificate that is used to determine user identity to be changed
>> dynamically. Similarly, you cannot remove host names included in the
>> certificates  since that may fail client validation. For certificates used
>> for inter-broker communication, we validate the key store against the
>> trust
>> store to ensure that brokers are able to communicate to each other. With
>> admin client based updates, we attempt an update and provide feedback so
>> that the user can take action. The KIP needs more detail on how errors
>> will
>> be handled with the watcher or periodic updates. It also needs more detail
>> on the new requirements being introduced for ordering of updates of files
>> on the file system or clearly state the cases which will no longer be
>> supported without file  name change.
>>
>> I guess we will have to rely on the application error log to detect an
> update failure.
> If it helps, we could get a separate log file specifically for security
> store reloading progress, but
> I don't think we could propagate the validation error in the admin client
> as before. Could you
> show me the documentation on how the tooling is being used today for
> security store reloading which
> has an ordering that we might need to change?
>
>>
>> On Thu, Jan 7, 2021 at 6:46 PM Boyang Chen <re...@gmail.com>
>> wrote:
>>
>> > Hey David, thanks for the feedback.
>> >
>> > On Thu, Jan 7, 2021 at 2:37 AM David Jacot <dj...@confluent.io> wrote:
>> >
>> > > Hi Boyang,
>> > >
>> > > Thanks for the KIP. I am fine with it in general. I just have a few
>> > > comments.
>> > >
>> > > With the proposal, we don't have the guarantee that both the new
>> keystore
>> > > and the new truststore will be picked up together so we may end up
>> with
>> > > the new keystore and the old truststore for a short period of time, or
>> > > permanently
>> > > if the second one can't be reloaded for any reason.
>> > >
>> > > This could disallow clients to authenticate for a while if the new
>> > keystore
>> > > and the
>> > > new trustore are not crafted to work with their old versions.
>> > >
>> > > I wonder how this would work in practice. Do we already have guards in
>> > > place to avoid this or could we add something to ensure that listeners
>> > are
>> > > updated only if both the truststore and the keystore works with each
>> > other?
>> > >
>> > > We don't have this issue today as both the truststore and the keystore
>> > are
>> > > reloaded when the AlterConfig RPC is received so the admin can control
>> > > this process. It is all or nothing.
>> > >
>> > > I'm not sure how we achieve that today, if we are updating both key
>> store
>> > and trust store
>> > in the same request, it is still possible that one update succeeded but
>> > another failed. Does users
>> > have awareness to make the guarantee by themselves? My
>> > point is that today there is no reliable way to achieve atomic update
>> > either, and hoping users
>> > to make the correct sequence of decisions would be hard.
>> >
>> > I think that this is acceptable but it is worth clearly mentioning that
>> > > there is no
>> > > guarantee from that regard in the KIP, and later in the doc. Perhaps,
>> we
>> > > could
>> > > also mention that updating them in place is not a best practice and
>> that
>> > > using
>> > > new paths gives better control to the admin.
>> > >
>> > > Best,
>> > > David
>> > >
>> > > On Wed, Jan 6, 2021 at 6:55 PM Jason Gustafson <ja...@confluent.io>
>> > wrote:
>> > >
>> > > > Thanks Boyang. Someone mentioned my email never showed up, but
>> > basically
>> > > I
>> > > > suggested tying the refresh configuration more directly to the
>> > > > configurations it would affect. I'm happy with the updates.
>> > > >
>> > > > -Jason
>> > > >
>> > > > On Tue, Jan 5, 2021 at 8:34 PM Boyang Chen <
>> reluctanthero104@gmail.com
>> > >
>> > > > wrote:
>> > > >
>> > > > > Thanks Jason for the feedback. I separated the time configs for
>> key
>> > > store
>> > > > > and trust store, and rename the configs as you proposed.
>> > > > >
>> > > > > Best,
>> > > > > Boyang
>> > > > >
>> > > > > On Mon, Dec 14, 2020 at 3:47 PM Boyang Chen <
>> > > reluctanthero104@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hey there,
>> > > > > >
>> > > > > > bumping up this thread to see if there are further questions
>> > > regarding
>> > > > > the
>> > > > > > updated proposal.
>> > > > > >
>> > > > > > Best,
>> > > > > > Boyang
>> > > > > >
>> > > > > > On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <
>> > > > reluctanthero104@gmail.com
>> > > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > >> After some offline discussions, we believe that it's the right
>> > > > direction
>> > > > > >> to go by doing a hybrid approach which includes both file-watch
>> > > > trigger
>> > > > > and
>> > > > > >> interval based reloading. The former guarantees a swift change
>> in
>> > > 99%
>> > > > > time,
>> > > > > >> while the latter provides a time-based guarantee in the worst
>> case
>> > > > when
>> > > > > the
>> > > > > >> file-watch does not take effect. The current default reloading
>> > > > interval
>> > > > > is
>> > > > > >> set to 5 min. I have updated the KIP and ticket, feel free to
>> > check
>> > > > out
>> > > > > and
>> > > > > >> see if it makes sense.
>> > > > > >>
>> > > > > >> Best,
>> > > > > >> Boyang
>> > > > > >>
>> > > > > >> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <
>> > > > reluctanthero104@gmail.com>
>> > > > > >> wrote:
>> > > > > >>
>> > > > > >>> Hey Gwen, thanks for the feedback.
>> > > > > >>>
>> > > > > >>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <
>> gwen@confluent.io>
>> > > > > wrote:
>> > > > > >>>
>> > > > > >>>> Agree with Igor. IIRC, we also encountered cases where
>> filewatch
>> > > was
>> > > > > >>>> not triggered as expected. An interval will give us a better
>> > > > > >>>> worse-case scenario that is easily controlled by the Kafka
>> > admin.
>> > > > > >>>>
>> > > > > >>>> Are the cases you were referring to happening in the cloud
>> > > > > environment?
>> > > > > >>> Should we investigate instead of simply assuming the standard
>> API
>> > > > won't
>> > > > > >>> work? I checked around and found a similar complaint here
>> > > > > >>> <
>> https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
>> > > > > >>>
>> > > > > >>> I would be partially agreeing that we want to have a reliable
>> > > > approach
>> > > > > >>> for all different operating systems in general, but would be
>> > great
>> > > if
>> > > > > we
>> > > > > >>> could reach a quantitative measure of file-watch success rate
>> if
>> > > > > possible
>> > > > > >>> for us to make the call. Eventually, the benefit of
>> file-watch is
>> > > > more
>> > > > > >>> prompt reaction time and less configuration to the broker.
>> > > > > >>>
>> > > > > >>>> Gwen
>> > > > > >>>>
>> > > > > >>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me>
>> wrote:
>> > > > > >>>> >
>> > > > > >>>> >
>> > > > > >>>> > > > The proposed change relies on a file watch, why not
>> also
>> > > have
>> > > > a
>> > > > > >>>> polling
>> > > > > >>>> > > > interval to check the file for changes?
>> > > > > >>>> > > >
>> > > > > >>>> > > > The periodical check could work, the slight downside is
>> > that
>> > > > we
>> > > > > >>>> need
>> > > > > >>>> > > additional configurations to schedule the interval. Do
>> you
>> > > think
>> > > > > the
>> > > > > >>>> > > file-watch approach has any extra overhead than the
>> interval
>> > > > based
>> > > > > >>>> solution?
>> > > > > >>>> >
>> > > > > >>>> > I don't think so. The reason I'm asking this is the KIP
>> > > currently
>> > > > > >>>> includes:
>> > > > > >>>> >
>> > > > > >>>> >   "When the file watch does not work for unknown reason,
>> user
>> > > > could
>> > > > > >>>> still try to change the store path in an explicit AlterConfig
>> > call
>> > > > in
>> > > > > the
>> > > > > >>>> worst case."
>> > > > > >>>> >
>> > > > > >>>> > Having the interval in addition to the file watch could
>> result
>> > > in
>> > > > a
>> > > > > >>>> better worst case scenario.
>> > > > > >>>> > I understand it would require introducing at least one new
>> > > > > >>>> configuration for the interval, so maybe this doesn't have to
>> > > solved
>> > > > > in
>> > > > > >>>> this KIP.
>> > > > > >>>> >
>> > > > > >>>> > --
>> > > > > >>>> > Igor
>> > > > > >>>> >
>> > > > > >>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
>> > > > > >>>> > > Hey Igor, thanks for the feedback.
>> > > > > >>>> > >
>> > > > > >>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me>
>> > > wrote:
>> > > > > >>>> > >
>> > > > > >>>> > > > Hi Boyang,
>> > > > > >>>> > > >
>> > > > > >>>> > >
>> > > > > >>>> > >
>> > > > > >>>> > > > What happens if the file is changed into an invalid
>> store?
>> > > > Does
>> > > > > >>>> the
>> > > > > >>>> > > > previous store stay in use?
>> > > > > >>>> > > >
>> > > > > >>>> > > > If the reload fails, the previous store should be
>> > > effective. I
>> > > > > >>>> will state
>> > > > > >>>> > > that in the KIP.
>> > > > > >>>> > >
>> > > > > >>>> > >
>> > > > > >>>> > > > Thanks,
>> > > > > >>>> > > >
>> > > > > >>>> > > > --
>> > > > > >>>> > > > Igor
>> > > > > >>>> > > >
>> > > > > >>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
>> > > > > >>>> > > > > Hey there,
>> > > > > >>>> > > > >
>> > > > > >>>> > > > > I would like to start the discussion thread for
>> KIP-687:
>> > > > > >>>> > > > >
>> > > > > >>>> > > >
>> > > > > >>>>
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
>> > > > > >>>> > > > >
>> > > > > >>>> > > > > This KIP is trying to deprecate the AlterConfigs API
>> > > support
>> > > > > of
>> > > > > >>>> updating
>> > > > > >>>> > > > > the security store by reloading path in-place, and
>> > replace
>> > > > > with
>> > > > > >>>> a
>> > > > > >>>> > > > > file-watch mechanism inside the broker. Let me know
>> what
>> > > you
>> > > > > >>>> think.
>> > > > > >>>> > > > >
>> > > > > >>>> > > > > Best,
>> > > > > >>>> > > > > Boyang
>> > > > > >>>> > > > >
>> > > > > >>>> > > >
>> > > > > >>>> > >
>> > > > > >>>>
>> > > > > >>>>
>> > > > > >>>>
>> > > > > >>>> --
>> > > > > >>>> Gwen Shapira
>> > > > > >>>> Engineering Manager | Confluent
>> > > > > >>>> 650.450.2760 | @gwenshap
>> > > > > >>>> Follow us: Twitter | blog
>> > > > > >>>>
>> > > > > >>>
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Boyang Chen <re...@gmail.com>.
On Fri, Jan 8, 2021 at 4:46 AM Rajini Sivaram <ra...@gmail.com>
wrote:

> Hi Boyang,
>
> Thanks for the responses. Follow up comments on a couple of those:
>
> 4) Can you provide some more details on the scenarios where file watcher is
> useful. You mentioned hybrid, but it is not clear to me why a watcher that
> reloads 99% of the time would be useful. There are a few cases to consider:
> 4a) Refresh SSL stores prior to expiry. In this case, time-interval-based
> reloading makes sense and this is similar to refresh of other security
> credentials. It seems reasonable to expect that stores are updated well
> before expiry and that the refresh interval would be configured
> accordingly.
> 4b) Remove a compromised client certificate from the trust store. This is a
> critical case. With the current code, you send an admin client request and
> get feedback on whether the update happened. A watcher that may or may not
> work is pretty much useless. The only reliable way to handle this case
> after this KIP would be to change the file name and send an admin client
> request.
> 4c) Change broker key store, which also perhaps requires change to the
> truststore. This is the ordering issue introduced by this KIP. This is also
> the issue that David Jacot brought up. With the current code, you update
> keystore and truststore and send admin client requests to perform the
> update and hence you have total control on when each update is performed
> and whether they are performed together. With the proposed KIP, we lose
> this control. Like 4b), the only reliable way after this KIP to handle this
> case would be to change filenames and send an admin client request.
> 4d) Add a new client certificate to the broker truststore. This is the only
> case where a file watcher seems useful to reload sooner.
>
> To summarize, two of the four cases are no longer reliable without file
> name change and it would be better to document these as limitations. These
> are rare cases where file name change seems reasonable. Periodic refresh,
> which is the most common case and one that we would want to support in
> clients some day can be supported without a file watcher. That leaves 4d)
> which I am not sure justifies the use of a file watcher that cannot be
> disabled.
>
>  My understanding is that the hybrid approach reduces the amount of time
to
propagate security changes, instead of waiting for unnecessarily long time.
Secondly IIUC,
even with automatic tooling, you need to do the key store and trust store
updates in two separate AlterConfigs,
where one must happen after the other is successful. If we have to use an
RPC, it has to decouple from
AlterConfig and sent twice to a specific broker, which we discussed as a
rejected alternative due to the complexity and overhead.

To put it straight, how many use cases today do we have to do the same name
store file content update? My impression was
that KIP-687 was just trying to solve a very special edge case with a
minimal effort, not trying to officially claim that
in-place security file update should be a main-stream use case.

5) Dynamic updates are restricted to ensure brokers don't become unusable
> after an update. For example, we don't allow the distinguished name (DN)
>  in the certificate that is used to determine user identity to be changed
> dynamically. Similarly, you cannot remove host names included in the
> certificates  since that may fail client validation. For certificates used
> for inter-broker communication, we validate the key store against the trust
> store to ensure that brokers are able to communicate to each other. With
> admin client based updates, we attempt an update and provide feedback so
> that the user can take action. The KIP needs more detail on how errors will
> be handled with the watcher or periodic updates. It also needs more detail
> on the new requirements being introduced for ordering of updates of files
> on the file system or clearly state the cases which will no longer be
> supported without file  name change.
>
> I guess we will have to rely on the application error log to detect an
update failure.
If it helps, we could get a separate log file specifically for security
store reloading progress, but
I don't think we could propagate the validation error in the admin client
as before. Could you
show me the documentation on how the tooling is being used today for
security store reloading which
has an ordering that we might need to change?

>
> On Thu, Jan 7, 2021 at 6:46 PM Boyang Chen <re...@gmail.com>
> wrote:
>
> > Hey David, thanks for the feedback.
> >
> > On Thu, Jan 7, 2021 at 2:37 AM David Jacot <dj...@confluent.io> wrote:
> >
> > > Hi Boyang,
> > >
> > > Thanks for the KIP. I am fine with it in general. I just have a few
> > > comments.
> > >
> > > With the proposal, we don't have the guarantee that both the new
> keystore
> > > and the new truststore will be picked up together so we may end up with
> > > the new keystore and the old truststore for a short period of time, or
> > > permanently
> > > if the second one can't be reloaded for any reason.
> > >
> > > This could disallow clients to authenticate for a while if the new
> > keystore
> > > and the
> > > new trustore are not crafted to work with their old versions.
> > >
> > > I wonder how this would work in practice. Do we already have guards in
> > > place to avoid this or could we add something to ensure that listeners
> > are
> > > updated only if both the truststore and the keystore works with each
> > other?
> > >
> > > We don't have this issue today as both the truststore and the keystore
> > are
> > > reloaded when the AlterConfig RPC is received so the admin can control
> > > this process. It is all or nothing.
> > >
> > > I'm not sure how we achieve that today, if we are updating both key
> store
> > and trust store
> > in the same request, it is still possible that one update succeeded but
> > another failed. Does users
> > have awareness to make the guarantee by themselves? My
> > point is that today there is no reliable way to achieve atomic update
> > either, and hoping users
> > to make the correct sequence of decisions would be hard.
> >
> > I think that this is acceptable but it is worth clearly mentioning that
> > > there is no
> > > guarantee from that regard in the KIP, and later in the doc. Perhaps,
> we
> > > could
> > > also mention that updating them in place is not a best practice and
> that
> > > using
> > > new paths gives better control to the admin.
> > >
> > > Best,
> > > David
> > >
> > > On Wed, Jan 6, 2021 at 6:55 PM Jason Gustafson <ja...@confluent.io>
> > wrote:
> > >
> > > > Thanks Boyang. Someone mentioned my email never showed up, but
> > basically
> > > I
> > > > suggested tying the refresh configuration more directly to the
> > > > configurations it would affect. I'm happy with the updates.
> > > >
> > > > -Jason
> > > >
> > > > On Tue, Jan 5, 2021 at 8:34 PM Boyang Chen <
> reluctanthero104@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Thanks Jason for the feedback. I separated the time configs for key
> > > store
> > > > > and trust store, and rename the configs as you proposed.
> > > > >
> > > > > Best,
> > > > > Boyang
> > > > >
> > > > > On Mon, Dec 14, 2020 at 3:47 PM Boyang Chen <
> > > reluctanthero104@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hey there,
> > > > > >
> > > > > > bumping up this thread to see if there are further questions
> > > regarding
> > > > > the
> > > > > > updated proposal.
> > > > > >
> > > > > > Best,
> > > > > > Boyang
> > > > > >
> > > > > > On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <
> > > > reluctanthero104@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > >> After some offline discussions, we believe that it's the right
> > > > direction
> > > > > >> to go by doing a hybrid approach which includes both file-watch
> > > > trigger
> > > > > and
> > > > > >> interval based reloading. The former guarantees a swift change
> in
> > > 99%
> > > > > time,
> > > > > >> while the latter provides a time-based guarantee in the worst
> case
> > > > when
> > > > > the
> > > > > >> file-watch does not take effect. The current default reloading
> > > > interval
> > > > > is
> > > > > >> set to 5 min. I have updated the KIP and ticket, feel free to
> > check
> > > > out
> > > > > and
> > > > > >> see if it makes sense.
> > > > > >>
> > > > > >> Best,
> > > > > >> Boyang
> > > > > >>
> > > > > >> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <
> > > > reluctanthero104@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Hey Gwen, thanks for the feedback.
> > > > > >>>
> > > > > >>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <
> gwen@confluent.io>
> > > > > wrote:
> > > > > >>>
> > > > > >>>> Agree with Igor. IIRC, we also encountered cases where
> filewatch
> > > was
> > > > > >>>> not triggered as expected. An interval will give us a better
> > > > > >>>> worse-case scenario that is easily controlled by the Kafka
> > admin.
> > > > > >>>>
> > > > > >>>> Are the cases you were referring to happening in the cloud
> > > > > environment?
> > > > > >>> Should we investigate instead of simply assuming the standard
> API
> > > > won't
> > > > > >>> work? I checked around and found a similar complaint here
> > > > > >>> <https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/
> >.
> > > > > >>>
> > > > > >>> I would be partially agreeing that we want to have a reliable
> > > > approach
> > > > > >>> for all different operating systems in general, but would be
> > great
> > > if
> > > > > we
> > > > > >>> could reach a quantitative measure of file-watch success rate
> if
> > > > > possible
> > > > > >>> for us to make the call. Eventually, the benefit of file-watch
> is
> > > > more
> > > > > >>> prompt reaction time and less configuration to the broker.
> > > > > >>>
> > > > > >>>> Gwen
> > > > > >>>>
> > > > > >>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me>
> wrote:
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> > > > The proposed change relies on a file watch, why not also
> > > have
> > > > a
> > > > > >>>> polling
> > > > > >>>> > > > interval to check the file for changes?
> > > > > >>>> > > >
> > > > > >>>> > > > The periodical check could work, the slight downside is
> > that
> > > > we
> > > > > >>>> need
> > > > > >>>> > > additional configurations to schedule the interval. Do you
> > > think
> > > > > the
> > > > > >>>> > > file-watch approach has any extra overhead than the
> interval
> > > > based
> > > > > >>>> solution?
> > > > > >>>> >
> > > > > >>>> > I don't think so. The reason I'm asking this is the KIP
> > > currently
> > > > > >>>> includes:
> > > > > >>>> >
> > > > > >>>> >   "When the file watch does not work for unknown reason,
> user
> > > > could
> > > > > >>>> still try to change the store path in an explicit AlterConfig
> > call
> > > > in
> > > > > the
> > > > > >>>> worst case."
> > > > > >>>> >
> > > > > >>>> > Having the interval in addition to the file watch could
> result
> > > in
> > > > a
> > > > > >>>> better worst case scenario.
> > > > > >>>> > I understand it would require introducing at least one new
> > > > > >>>> configuration for the interval, so maybe this doesn't have to
> > > solved
> > > > > in
> > > > > >>>> this KIP.
> > > > > >>>> >
> > > > > >>>> > --
> > > > > >>>> > Igor
> > > > > >>>> >
> > > > > >>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
> > > > > >>>> > > Hey Igor, thanks for the feedback.
> > > > > >>>> > >
> > > > > >>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me>
> > > wrote:
> > > > > >>>> > >
> > > > > >>>> > > > Hi Boyang,
> > > > > >>>> > > >
> > > > > >>>> > >
> > > > > >>>> > >
> > > > > >>>> > > > What happens if the file is changed into an invalid
> store?
> > > > Does
> > > > > >>>> the
> > > > > >>>> > > > previous store stay in use?
> > > > > >>>> > > >
> > > > > >>>> > > > If the reload fails, the previous store should be
> > > effective. I
> > > > > >>>> will state
> > > > > >>>> > > that in the KIP.
> > > > > >>>> > >
> > > > > >>>> > >
> > > > > >>>> > > > Thanks,
> > > > > >>>> > > >
> > > > > >>>> > > > --
> > > > > >>>> > > > Igor
> > > > > >>>> > > >
> > > > > >>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> > > > > >>>> > > > > Hey there,
> > > > > >>>> > > > >
> > > > > >>>> > > > > I would like to start the discussion thread for
> KIP-687:
> > > > > >>>> > > > >
> > > > > >>>> > > >
> > > > > >>>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> > > > > >>>> > > > >
> > > > > >>>> > > > > This KIP is trying to deprecate the AlterConfigs API
> > > support
> > > > > of
> > > > > >>>> updating
> > > > > >>>> > > > > the security store by reloading path in-place, and
> > replace
> > > > > with
> > > > > >>>> a
> > > > > >>>> > > > > file-watch mechanism inside the broker. Let me know
> what
> > > you
> > > > > >>>> think.
> > > > > >>>> > > > >
> > > > > >>>> > > > > Best,
> > > > > >>>> > > > > Boyang
> > > > > >>>> > > > >
> > > > > >>>> > > >
> > > > > >>>> > >
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> --
> > > > > >>>> Gwen Shapira
> > > > > >>>> Engineering Manager | Confluent
> > > > > >>>> 650.450.2760 | @gwenshap
> > > > > >>>> Follow us: Twitter | blog
> > > > > >>>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Rajini Sivaram <ra...@gmail.com>.
Hi Boyang,

Thanks for the responses. Follow up comments on a couple of those:

4) Can you provide some more details on the scenarios where file watcher is
useful. You mentioned hybrid, but it is not clear to me why a watcher that
reloads 99% of the time would be useful. There are a few cases to consider:
4a) Refresh SSL stores prior to expiry. In this case, time-interval-based
reloading makes sense and this is similar to refresh of other security
credentials. It seems reasonable to expect that stores are updated well
before expiry and that the refresh interval would be configured
accordingly.
4b) Remove a compromised client certificate from the trust store. This is a
critical case. With the current code, you send an admin client request and
get feedback on whether the update happened. A watcher that may or may not
work is pretty much useless. The only reliable way to handle this case
after this KIP would be to change the file name and send an admin client
request.
4c) Change broker key store, which also perhaps requires change to the
truststore. This is the ordering issue introduced by this KIP. This is also
the issue that David Jacot brought up. With the current code, you update
keystore and truststore and send admin client requests to perform the
update and hence you have total control on when each update is performed
and whether they are performed together. With the proposed KIP, we lose
this control. Like 4b), the only reliable way after this KIP to handle this
case would be to change filenames and send an admin client request.
4d) Add a new client certificate to the broker truststore. This is the only
case where a file watcher seems useful to reload sooner.

To summarize, two of the four cases are no longer reliable without file
name change and it would be better to document these as limitations. These
are rare cases where file name change seems reasonable. Periodic refresh,
which is the most common case and one that we would want to support in
clients some day can be supported without a file watcher. That leaves 4d)
which I am not sure justifies the use of a file watcher that cannot be
disabled.

5) Dynamic updates are restricted to ensure brokers don't become unusable
after an update. For example, we don't allow the distinguished name (DN)
 in the certificate that is used to determine user identity to be changed
dynamically. Similarly, you cannot remove host names included in the
certificates  since that may fail client validation. For certificates used
for inter-broker communication, we validate the key store against the trust
store to ensure that brokers are able to communicate to each other. With
admin client based updates, we attempt an update and provide feedback so
that the user can take action. The KIP needs more detail on how errors will
be handled with the watcher or periodic updates. It also needs more detail
on the new requirements being introduced for ordering of updates of files
on the file system or clearly state the cases which will no longer be
supported without file  name change.


On Thu, Jan 7, 2021 at 6:46 PM Boyang Chen <re...@gmail.com>
wrote:

> Hey David, thanks for the feedback.
>
> On Thu, Jan 7, 2021 at 2:37 AM David Jacot <dj...@confluent.io> wrote:
>
> > Hi Boyang,
> >
> > Thanks for the KIP. I am fine with it in general. I just have a few
> > comments.
> >
> > With the proposal, we don't have the guarantee that both the new keystore
> > and the new truststore will be picked up together so we may end up with
> > the new keystore and the old truststore for a short period of time, or
> > permanently
> > if the second one can't be reloaded for any reason.
> >
> > This could disallow clients to authenticate for a while if the new
> keystore
> > and the
> > new trustore are not crafted to work with their old versions.
> >
> > I wonder how this would work in practice. Do we already have guards in
> > place to avoid this or could we add something to ensure that listeners
> are
> > updated only if both the truststore and the keystore works with each
> other?
> >
> > We don't have this issue today as both the truststore and the keystore
> are
> > reloaded when the AlterConfig RPC is received so the admin can control
> > this process. It is all or nothing.
> >
> > I'm not sure how we achieve that today, if we are updating both key store
> and trust store
> in the same request, it is still possible that one update succeeded but
> another failed. Does users
> have awareness to make the guarantee by themselves? My
> point is that today there is no reliable way to achieve atomic update
> either, and hoping users
> to make the correct sequence of decisions would be hard.
>
> I think that this is acceptable but it is worth clearly mentioning that
> > there is no
> > guarantee from that regard in the KIP, and later in the doc. Perhaps, we
> > could
> > also mention that updating them in place is not a best practice and that
> > using
> > new paths gives better control to the admin.
> >
> > Best,
> > David
> >
> > On Wed, Jan 6, 2021 at 6:55 PM Jason Gustafson <ja...@confluent.io>
> wrote:
> >
> > > Thanks Boyang. Someone mentioned my email never showed up, but
> basically
> > I
> > > suggested tying the refresh configuration more directly to the
> > > configurations it would affect. I'm happy with the updates.
> > >
> > > -Jason
> > >
> > > On Tue, Jan 5, 2021 at 8:34 PM Boyang Chen <reluctanthero104@gmail.com
> >
> > > wrote:
> > >
> > > > Thanks Jason for the feedback. I separated the time configs for key
> > store
> > > > and trust store, and rename the configs as you proposed.
> > > >
> > > > Best,
> > > > Boyang
> > > >
> > > > On Mon, Dec 14, 2020 at 3:47 PM Boyang Chen <
> > reluctanthero104@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey there,
> > > > >
> > > > > bumping up this thread to see if there are further questions
> > regarding
> > > > the
> > > > > updated proposal.
> > > > >
> > > > > Best,
> > > > > Boyang
> > > > >
> > > > > On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <
> > > reluctanthero104@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > >> After some offline discussions, we believe that it's the right
> > > direction
> > > > >> to go by doing a hybrid approach which includes both file-watch
> > > trigger
> > > > and
> > > > >> interval based reloading. The former guarantees a swift change in
> > 99%
> > > > time,
> > > > >> while the latter provides a time-based guarantee in the worst case
> > > when
> > > > the
> > > > >> file-watch does not take effect. The current default reloading
> > > interval
> > > > is
> > > > >> set to 5 min. I have updated the KIP and ticket, feel free to
> check
> > > out
> > > > and
> > > > >> see if it makes sense.
> > > > >>
> > > > >> Best,
> > > > >> Boyang
> > > > >>
> > > > >> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <
> > > reluctanthero104@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >>> Hey Gwen, thanks for the feedback.
> > > > >>>
> > > > >>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <gw...@confluent.io>
> > > > wrote:
> > > > >>>
> > > > >>>> Agree with Igor. IIRC, we also encountered cases where filewatch
> > was
> > > > >>>> not triggered as expected. An interval will give us a better
> > > > >>>> worse-case scenario that is easily controlled by the Kafka
> admin.
> > > > >>>>
> > > > >>>> Are the cases you were referring to happening in the cloud
> > > > environment?
> > > > >>> Should we investigate instead of simply assuming the standard API
> > > won't
> > > > >>> work? I checked around and found a similar complaint here
> > > > >>> <https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
> > > > >>>
> > > > >>> I would be partially agreeing that we want to have a reliable
> > > approach
> > > > >>> for all different operating systems in general, but would be
> great
> > if
> > > > we
> > > > >>> could reach a quantitative measure of file-watch success rate if
> > > > possible
> > > > >>> for us to make the call. Eventually, the benefit of file-watch is
> > > more
> > > > >>> prompt reaction time and less configuration to the broker.
> > > > >>>
> > > > >>>> Gwen
> > > > >>>>
> > > > >>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
> > > > >>>> >
> > > > >>>> >
> > > > >>>> > > > The proposed change relies on a file watch, why not also
> > have
> > > a
> > > > >>>> polling
> > > > >>>> > > > interval to check the file for changes?
> > > > >>>> > > >
> > > > >>>> > > > The periodical check could work, the slight downside is
> that
> > > we
> > > > >>>> need
> > > > >>>> > > additional configurations to schedule the interval. Do you
> > think
> > > > the
> > > > >>>> > > file-watch approach has any extra overhead than the interval
> > > based
> > > > >>>> solution?
> > > > >>>> >
> > > > >>>> > I don't think so. The reason I'm asking this is the KIP
> > currently
> > > > >>>> includes:
> > > > >>>> >
> > > > >>>> >   "When the file watch does not work for unknown reason, user
> > > could
> > > > >>>> still try to change the store path in an explicit AlterConfig
> call
> > > in
> > > > the
> > > > >>>> worst case."
> > > > >>>> >
> > > > >>>> > Having the interval in addition to the file watch could result
> > in
> > > a
> > > > >>>> better worst case scenario.
> > > > >>>> > I understand it would require introducing at least one new
> > > > >>>> configuration for the interval, so maybe this doesn't have to
> > solved
> > > > in
> > > > >>>> this KIP.
> > > > >>>> >
> > > > >>>> > --
> > > > >>>> > Igor
> > > > >>>> >
> > > > >>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
> > > > >>>> > > Hey Igor, thanks for the feedback.
> > > > >>>> > >
> > > > >>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me>
> > wrote:
> > > > >>>> > >
> > > > >>>> > > > Hi Boyang,
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > > > What happens if the file is changed into an invalid store?
> > > Does
> > > > >>>> the
> > > > >>>> > > > previous store stay in use?
> > > > >>>> > > >
> > > > >>>> > > > If the reload fails, the previous store should be
> > effective. I
> > > > >>>> will state
> > > > >>>> > > that in the KIP.
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > > > Thanks,
> > > > >>>> > > >
> > > > >>>> > > > --
> > > > >>>> > > > Igor
> > > > >>>> > > >
> > > > >>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> > > > >>>> > > > > Hey there,
> > > > >>>> > > > >
> > > > >>>> > > > > I would like to start the discussion thread for KIP-687:
> > > > >>>> > > > >
> > > > >>>> > > >
> > > > >>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> > > > >>>> > > > >
> > > > >>>> > > > > This KIP is trying to deprecate the AlterConfigs API
> > support
> > > > of
> > > > >>>> updating
> > > > >>>> > > > > the security store by reloading path in-place, and
> replace
> > > > with
> > > > >>>> a
> > > > >>>> > > > > file-watch mechanism inside the broker. Let me know what
> > you
> > > > >>>> think.
> > > > >>>> > > > >
> > > > >>>> > > > > Best,
> > > > >>>> > > > > Boyang
> > > > >>>> > > > >
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> --
> > > > >>>> Gwen Shapira
> > > > >>>> Engineering Manager | Confluent
> > > > >>>> 650.450.2760 | @gwenshap
> > > > >>>> Follow us: Twitter | blog
> > > > >>>>
> > > > >>>
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Boyang Chen <re...@gmail.com>.
Hey David, thanks for the feedback.

On Thu, Jan 7, 2021 at 2:37 AM David Jacot <dj...@confluent.io> wrote:

> Hi Boyang,
>
> Thanks for the KIP. I am fine with it in general. I just have a few
> comments.
>
> With the proposal, we don't have the guarantee that both the new keystore
> and the new truststore will be picked up together so we may end up with
> the new keystore and the old truststore for a short period of time, or
> permanently
> if the second one can't be reloaded for any reason.
>
> This could disallow clients to authenticate for a while if the new keystore
> and the
> new trustore are not crafted to work with their old versions.
>
> I wonder how this would work in practice. Do we already have guards in
> place to avoid this or could we add something to ensure that listeners are
> updated only if both the truststore and the keystore works with each other?
>
> We don't have this issue today as both the truststore and the keystore are
> reloaded when the AlterConfig RPC is received so the admin can control
> this process. It is all or nothing.
>
> I'm not sure how we achieve that today, if we are updating both key store
and trust store
in the same request, it is still possible that one update succeeded but
another failed. Does users
have awareness to make the guarantee by themselves? My
point is that today there is no reliable way to achieve atomic update
either, and hoping users
to make the correct sequence of decisions would be hard.

I think that this is acceptable but it is worth clearly mentioning that
> there is no
> guarantee from that regard in the KIP, and later in the doc. Perhaps, we
> could
> also mention that updating them in place is not a best practice and that
> using
> new paths gives better control to the admin.
>
> Best,
> David
>
> On Wed, Jan 6, 2021 at 6:55 PM Jason Gustafson <ja...@confluent.io> wrote:
>
> > Thanks Boyang. Someone mentioned my email never showed up, but basically
> I
> > suggested tying the refresh configuration more directly to the
> > configurations it would affect. I'm happy with the updates.
> >
> > -Jason
> >
> > On Tue, Jan 5, 2021 at 8:34 PM Boyang Chen <re...@gmail.com>
> > wrote:
> >
> > > Thanks Jason for the feedback. I separated the time configs for key
> store
> > > and trust store, and rename the configs as you proposed.
> > >
> > > Best,
> > > Boyang
> > >
> > > On Mon, Dec 14, 2020 at 3:47 PM Boyang Chen <
> reluctanthero104@gmail.com>
> > > wrote:
> > >
> > > > Hey there,
> > > >
> > > > bumping up this thread to see if there are further questions
> regarding
> > > the
> > > > updated proposal.
> > > >
> > > > Best,
> > > > Boyang
> > > >
> > > > On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <
> > reluctanthero104@gmail.com
> > > >
> > > > wrote:
> > > >
> > > >> After some offline discussions, we believe that it's the right
> > direction
> > > >> to go by doing a hybrid approach which includes both file-watch
> > trigger
> > > and
> > > >> interval based reloading. The former guarantees a swift change in
> 99%
> > > time,
> > > >> while the latter provides a time-based guarantee in the worst case
> > when
> > > the
> > > >> file-watch does not take effect. The current default reloading
> > interval
> > > is
> > > >> set to 5 min. I have updated the KIP and ticket, feel free to check
> > out
> > > and
> > > >> see if it makes sense.
> > > >>
> > > >> Best,
> > > >> Boyang
> > > >>
> > > >> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <
> > reluctanthero104@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hey Gwen, thanks for the feedback.
> > > >>>
> > > >>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <gw...@confluent.io>
> > > wrote:
> > > >>>
> > > >>>> Agree with Igor. IIRC, we also encountered cases where filewatch
> was
> > > >>>> not triggered as expected. An interval will give us a better
> > > >>>> worse-case scenario that is easily controlled by the Kafka admin.
> > > >>>>
> > > >>>> Are the cases you were referring to happening in the cloud
> > > environment?
> > > >>> Should we investigate instead of simply assuming the standard API
> > won't
> > > >>> work? I checked around and found a similar complaint here
> > > >>> <https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
> > > >>>
> > > >>> I would be partially agreeing that we want to have a reliable
> > approach
> > > >>> for all different operating systems in general, but would be great
> if
> > > we
> > > >>> could reach a quantitative measure of file-watch success rate if
> > > possible
> > > >>> for us to make the call. Eventually, the benefit of file-watch is
> > more
> > > >>> prompt reaction time and less configuration to the broker.
> > > >>>
> > > >>>> Gwen
> > > >>>>
> > > >>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
> > > >>>> >
> > > >>>> >
> > > >>>> > > > The proposed change relies on a file watch, why not also
> have
> > a
> > > >>>> polling
> > > >>>> > > > interval to check the file for changes?
> > > >>>> > > >
> > > >>>> > > > The periodical check could work, the slight downside is that
> > we
> > > >>>> need
> > > >>>> > > additional configurations to schedule the interval. Do you
> think
> > > the
> > > >>>> > > file-watch approach has any extra overhead than the interval
> > based
> > > >>>> solution?
> > > >>>> >
> > > >>>> > I don't think so. The reason I'm asking this is the KIP
> currently
> > > >>>> includes:
> > > >>>> >
> > > >>>> >   "When the file watch does not work for unknown reason, user
> > could
> > > >>>> still try to change the store path in an explicit AlterConfig call
> > in
> > > the
> > > >>>> worst case."
> > > >>>> >
> > > >>>> > Having the interval in addition to the file watch could result
> in
> > a
> > > >>>> better worst case scenario.
> > > >>>> > I understand it would require introducing at least one new
> > > >>>> configuration for the interval, so maybe this doesn't have to
> solved
> > > in
> > > >>>> this KIP.
> > > >>>> >
> > > >>>> > --
> > > >>>> > Igor
> > > >>>> >
> > > >>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
> > > >>>> > > Hey Igor, thanks for the feedback.
> > > >>>> > >
> > > >>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me>
> wrote:
> > > >>>> > >
> > > >>>> > > > Hi Boyang,
> > > >>>> > > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > > What happens if the file is changed into an invalid store?
> > Does
> > > >>>> the
> > > >>>> > > > previous store stay in use?
> > > >>>> > > >
> > > >>>> > > > If the reload fails, the previous store should be
> effective. I
> > > >>>> will state
> > > >>>> > > that in the KIP.
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > > Thanks,
> > > >>>> > > >
> > > >>>> > > > --
> > > >>>> > > > Igor
> > > >>>> > > >
> > > >>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> > > >>>> > > > > Hey there,
> > > >>>> > > > >
> > > >>>> > > > > I would like to start the discussion thread for KIP-687:
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> > > >>>> > > > >
> > > >>>> > > > > This KIP is trying to deprecate the AlterConfigs API
> support
> > > of
> > > >>>> updating
> > > >>>> > > > > the security store by reloading path in-place, and replace
> > > with
> > > >>>> a
> > > >>>> > > > > file-watch mechanism inside the broker. Let me know what
> you
> > > >>>> think.
> > > >>>> > > > >
> > > >>>> > > > > Best,
> > > >>>> > > > > Boyang
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Gwen Shapira
> > > >>>> Engineering Manager | Confluent
> > > >>>> 650.450.2760 | @gwenshap
> > > >>>> Follow us: Twitter | blog
> > > >>>>
> > > >>>
> > >
> >
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by David Jacot <dj...@confluent.io>.
Hi Boyang,

Thanks for the KIP. I am fine with it in general. I just have a few
comments.

With the proposal, we don't have the guarantee that both the new keystore
and the new truststore will be picked up together so we may end up with
the new keystore and the old truststore for a short period of time, or
permanently
if the second one can't be reloaded for any reason.

This could disallow clients to authenticate for a while if the new keystore
and the
new trustore are not crafted to work with their old versions.

I wonder how this would work in practice. Do we already have guards in
place to avoid this or could we add something to ensure that listeners are
updated only if both the truststore and the keystore works with each other?

We don't have this issue today as both the truststore and the keystore are
reloaded when the AlterConfig RPC is received so the admin can control
this process. It is all or nothing.

I think that this is acceptable but it is worth clearly mentioning that
there is no
guarantee from that regard in the KIP, and later in the doc. Perhaps, we
could
also mention that updating them in place is not a best practice and that
using
new paths gives better control to the admin.

Best,
David

On Wed, Jan 6, 2021 at 6:55 PM Jason Gustafson <ja...@confluent.io> wrote:

> Thanks Boyang. Someone mentioned my email never showed up, but basically I
> suggested tying the refresh configuration more directly to the
> configurations it would affect. I'm happy with the updates.
>
> -Jason
>
> On Tue, Jan 5, 2021 at 8:34 PM Boyang Chen <re...@gmail.com>
> wrote:
>
> > Thanks Jason for the feedback. I separated the time configs for key store
> > and trust store, and rename the configs as you proposed.
> >
> > Best,
> > Boyang
> >
> > On Mon, Dec 14, 2020 at 3:47 PM Boyang Chen <re...@gmail.com>
> > wrote:
> >
> > > Hey there,
> > >
> > > bumping up this thread to see if there are further questions regarding
> > the
> > > updated proposal.
> > >
> > > Best,
> > > Boyang
> > >
> > > On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <
> reluctanthero104@gmail.com
> > >
> > > wrote:
> > >
> > >> After some offline discussions, we believe that it's the right
> direction
> > >> to go by doing a hybrid approach which includes both file-watch
> trigger
> > and
> > >> interval based reloading. The former guarantees a swift change in 99%
> > time,
> > >> while the latter provides a time-based guarantee in the worst case
> when
> > the
> > >> file-watch does not take effect. The current default reloading
> interval
> > is
> > >> set to 5 min. I have updated the KIP and ticket, feel free to check
> out
> > and
> > >> see if it makes sense.
> > >>
> > >> Best,
> > >> Boyang
> > >>
> > >> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <
> reluctanthero104@gmail.com>
> > >> wrote:
> > >>
> > >>> Hey Gwen, thanks for the feedback.
> > >>>
> > >>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <gw...@confluent.io>
> > wrote:
> > >>>
> > >>>> Agree with Igor. IIRC, we also encountered cases where filewatch was
> > >>>> not triggered as expected. An interval will give us a better
> > >>>> worse-case scenario that is easily controlled by the Kafka admin.
> > >>>>
> > >>>> Are the cases you were referring to happening in the cloud
> > environment?
> > >>> Should we investigate instead of simply assuming the standard API
> won't
> > >>> work? I checked around and found a similar complaint here
> > >>> <https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
> > >>>
> > >>> I would be partially agreeing that we want to have a reliable
> approach
> > >>> for all different operating systems in general, but would be great if
> > we
> > >>> could reach a quantitative measure of file-watch success rate if
> > possible
> > >>> for us to make the call. Eventually, the benefit of file-watch is
> more
> > >>> prompt reaction time and less configuration to the broker.
> > >>>
> > >>>> Gwen
> > >>>>
> > >>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
> > >>>> >
> > >>>> >
> > >>>> > > > The proposed change relies on a file watch, why not also have
> a
> > >>>> polling
> > >>>> > > > interval to check the file for changes?
> > >>>> > > >
> > >>>> > > > The periodical check could work, the slight downside is that
> we
> > >>>> need
> > >>>> > > additional configurations to schedule the interval. Do you think
> > the
> > >>>> > > file-watch approach has any extra overhead than the interval
> based
> > >>>> solution?
> > >>>> >
> > >>>> > I don't think so. The reason I'm asking this is the KIP currently
> > >>>> includes:
> > >>>> >
> > >>>> >   "When the file watch does not work for unknown reason, user
> could
> > >>>> still try to change the store path in an explicit AlterConfig call
> in
> > the
> > >>>> worst case."
> > >>>> >
> > >>>> > Having the interval in addition to the file watch could result in
> a
> > >>>> better worst case scenario.
> > >>>> > I understand it would require introducing at least one new
> > >>>> configuration for the interval, so maybe this doesn't have to solved
> > in
> > >>>> this KIP.
> > >>>> >
> > >>>> > --
> > >>>> > Igor
> > >>>> >
> > >>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
> > >>>> > > Hey Igor, thanks for the feedback.
> > >>>> > >
> > >>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me> wrote:
> > >>>> > >
> > >>>> > > > Hi Boyang,
> > >>>> > > >
> > >>>> > >
> > >>>> > >
> > >>>> > > > What happens if the file is changed into an invalid store?
> Does
> > >>>> the
> > >>>> > > > previous store stay in use?
> > >>>> > > >
> > >>>> > > > If the reload fails, the previous store should be effective. I
> > >>>> will state
> > >>>> > > that in the KIP.
> > >>>> > >
> > >>>> > >
> > >>>> > > > Thanks,
> > >>>> > > >
> > >>>> > > > --
> > >>>> > > > Igor
> > >>>> > > >
> > >>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> > >>>> > > > > Hey there,
> > >>>> > > > >
> > >>>> > > > > I would like to start the discussion thread for KIP-687:
> > >>>> > > > >
> > >>>> > > >
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> > >>>> > > > >
> > >>>> > > > > This KIP is trying to deprecate the AlterConfigs API support
> > of
> > >>>> updating
> > >>>> > > > > the security store by reloading path in-place, and replace
> > with
> > >>>> a
> > >>>> > > > > file-watch mechanism inside the broker. Let me know what you
> > >>>> think.
> > >>>> > > > >
> > >>>> > > > > Best,
> > >>>> > > > > Boyang
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Gwen Shapira
> > >>>> Engineering Manager | Confluent
> > >>>> 650.450.2760 | @gwenshap
> > >>>> Follow us: Twitter | blog
> > >>>>
> > >>>
> >
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Jason Gustafson <ja...@confluent.io>.
Thanks Boyang. Someone mentioned my email never showed up, but basically I
suggested tying the refresh configuration more directly to the
configurations it would affect. I'm happy with the updates.

-Jason

On Tue, Jan 5, 2021 at 8:34 PM Boyang Chen <re...@gmail.com>
wrote:

> Thanks Jason for the feedback. I separated the time configs for key store
> and trust store, and rename the configs as you proposed.
>
> Best,
> Boyang
>
> On Mon, Dec 14, 2020 at 3:47 PM Boyang Chen <re...@gmail.com>
> wrote:
>
> > Hey there,
> >
> > bumping up this thread to see if there are further questions regarding
> the
> > updated proposal.
> >
> > Best,
> > Boyang
> >
> > On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <reluctanthero104@gmail.com
> >
> > wrote:
> >
> >> After some offline discussions, we believe that it's the right direction
> >> to go by doing a hybrid approach which includes both file-watch trigger
> and
> >> interval based reloading. The former guarantees a swift change in 99%
> time,
> >> while the latter provides a time-based guarantee in the worst case when
> the
> >> file-watch does not take effect. The current default reloading interval
> is
> >> set to 5 min. I have updated the KIP and ticket, feel free to check out
> and
> >> see if it makes sense.
> >>
> >> Best,
> >> Boyang
> >>
> >> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <re...@gmail.com>
> >> wrote:
> >>
> >>> Hey Gwen, thanks for the feedback.
> >>>
> >>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <gw...@confluent.io>
> wrote:
> >>>
> >>>> Agree with Igor. IIRC, we also encountered cases where filewatch was
> >>>> not triggered as expected. An interval will give us a better
> >>>> worse-case scenario that is easily controlled by the Kafka admin.
> >>>>
> >>>> Are the cases you were referring to happening in the cloud
> environment?
> >>> Should we investigate instead of simply assuming the standard API won't
> >>> work? I checked around and found a similar complaint here
> >>> <https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
> >>>
> >>> I would be partially agreeing that we want to have a reliable approach
> >>> for all different operating systems in general, but would be great if
> we
> >>> could reach a quantitative measure of file-watch success rate if
> possible
> >>> for us to make the call. Eventually, the benefit of file-watch is more
> >>> prompt reaction time and less configuration to the broker.
> >>>
> >>>> Gwen
> >>>>
> >>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
> >>>> >
> >>>> >
> >>>> > > > The proposed change relies on a file watch, why not also have a
> >>>> polling
> >>>> > > > interval to check the file for changes?
> >>>> > > >
> >>>> > > > The periodical check could work, the slight downside is that we
> >>>> need
> >>>> > > additional configurations to schedule the interval. Do you think
> the
> >>>> > > file-watch approach has any extra overhead than the interval based
> >>>> solution?
> >>>> >
> >>>> > I don't think so. The reason I'm asking this is the KIP currently
> >>>> includes:
> >>>> >
> >>>> >   "When the file watch does not work for unknown reason, user could
> >>>> still try to change the store path in an explicit AlterConfig call in
> the
> >>>> worst case."
> >>>> >
> >>>> > Having the interval in addition to the file watch could result in a
> >>>> better worst case scenario.
> >>>> > I understand it would require introducing at least one new
> >>>> configuration for the interval, so maybe this doesn't have to solved
> in
> >>>> this KIP.
> >>>> >
> >>>> > --
> >>>> > Igor
> >>>> >
> >>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
> >>>> > > Hey Igor, thanks for the feedback.
> >>>> > >
> >>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me> wrote:
> >>>> > >
> >>>> > > > Hi Boyang,
> >>>> > > >
> >>>> > >
> >>>> > >
> >>>> > > > What happens if the file is changed into an invalid store? Does
> >>>> the
> >>>> > > > previous store stay in use?
> >>>> > > >
> >>>> > > > If the reload fails, the previous store should be effective. I
> >>>> will state
> >>>> > > that in the KIP.
> >>>> > >
> >>>> > >
> >>>> > > > Thanks,
> >>>> > > >
> >>>> > > > --
> >>>> > > > Igor
> >>>> > > >
> >>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> >>>> > > > > Hey there,
> >>>> > > > >
> >>>> > > > > I would like to start the discussion thread for KIP-687:
> >>>> > > > >
> >>>> > > >
> >>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> >>>> > > > >
> >>>> > > > > This KIP is trying to deprecate the AlterConfigs API support
> of
> >>>> updating
> >>>> > > > > the security store by reloading path in-place, and replace
> with
> >>>> a
> >>>> > > > > file-watch mechanism inside the broker. Let me know what you
> >>>> think.
> >>>> > > > >
> >>>> > > > > Best,
> >>>> > > > > Boyang
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Gwen Shapira
> >>>> Engineering Manager | Confluent
> >>>> 650.450.2760 | @gwenshap
> >>>> Follow us: Twitter | blog
> >>>>
> >>>
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Boyang Chen <re...@gmail.com>.
Thanks Jason for the feedback. I separated the time configs for key store
and trust store, and rename the configs as you proposed.

Best,
Boyang

On Mon, Dec 14, 2020 at 3:47 PM Boyang Chen <re...@gmail.com>
wrote:

> Hey there,
>
> bumping up this thread to see if there are further questions regarding the
> updated proposal.
>
> Best,
> Boyang
>
> On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <re...@gmail.com>
> wrote:
>
>> After some offline discussions, we believe that it's the right direction
>> to go by doing a hybrid approach which includes both file-watch trigger and
>> interval based reloading. The former guarantees a swift change in 99% time,
>> while the latter provides a time-based guarantee in the worst case when the
>> file-watch does not take effect. The current default reloading interval is
>> set to 5 min. I have updated the KIP and ticket, feel free to check out and
>> see if it makes sense.
>>
>> Best,
>> Boyang
>>
>> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <re...@gmail.com>
>> wrote:
>>
>>> Hey Gwen, thanks for the feedback.
>>>
>>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <gw...@confluent.io> wrote:
>>>
>>>> Agree with Igor. IIRC, we also encountered cases where filewatch was
>>>> not triggered as expected. An interval will give us a better
>>>> worse-case scenario that is easily controlled by the Kafka admin.
>>>>
>>>> Are the cases you were referring to happening in the cloud environment?
>>> Should we investigate instead of simply assuming the standard API won't
>>> work? I checked around and found a similar complaint here
>>> <https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
>>>
>>> I would be partially agreeing that we want to have a reliable approach
>>> for all different operating systems in general, but would be great if we
>>> could reach a quantitative measure of file-watch success rate if possible
>>> for us to make the call. Eventually, the benefit of file-watch is more
>>> prompt reaction time and less configuration to the broker.
>>>
>>>> Gwen
>>>>
>>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
>>>> >
>>>> >
>>>> > > > The proposed change relies on a file watch, why not also have a
>>>> polling
>>>> > > > interval to check the file for changes?
>>>> > > >
>>>> > > > The periodical check could work, the slight downside is that we
>>>> need
>>>> > > additional configurations to schedule the interval. Do you think the
>>>> > > file-watch approach has any extra overhead than the interval based
>>>> solution?
>>>> >
>>>> > I don't think so. The reason I'm asking this is the KIP currently
>>>> includes:
>>>> >
>>>> >   "When the file watch does not work for unknown reason, user could
>>>> still try to change the store path in an explicit AlterConfig call in the
>>>> worst case."
>>>> >
>>>> > Having the interval in addition to the file watch could result in a
>>>> better worst case scenario.
>>>> > I understand it would require introducing at least one new
>>>> configuration for the interval, so maybe this doesn't have to solved in
>>>> this KIP.
>>>> >
>>>> > --
>>>> > Igor
>>>> >
>>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
>>>> > > Hey Igor, thanks for the feedback.
>>>> > >
>>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me> wrote:
>>>> > >
>>>> > > > Hi Boyang,
>>>> > > >
>>>> > >
>>>> > >
>>>> > > > What happens if the file is changed into an invalid store? Does
>>>> the
>>>> > > > previous store stay in use?
>>>> > > >
>>>> > > > If the reload fails, the previous store should be effective. I
>>>> will state
>>>> > > that in the KIP.
>>>> > >
>>>> > >
>>>> > > > Thanks,
>>>> > > >
>>>> > > > --
>>>> > > > Igor
>>>> > > >
>>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
>>>> > > > > Hey there,
>>>> > > > >
>>>> > > > > I would like to start the discussion thread for KIP-687:
>>>> > > > >
>>>> > > >
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
>>>> > > > >
>>>> > > > > This KIP is trying to deprecate the AlterConfigs API support of
>>>> updating
>>>> > > > > the security store by reloading path in-place, and replace with
>>>> a
>>>> > > > > file-watch mechanism inside the broker. Let me know what you
>>>> think.
>>>> > > > >
>>>> > > > > Best,
>>>> > > > > Boyang
>>>> > > > >
>>>> > > >
>>>> > >
>>>>
>>>>
>>>>
>>>> --
>>>> Gwen Shapira
>>>> Engineering Manager | Confluent
>>>> 650.450.2760 | @gwenshap
>>>> Follow us: Twitter | blog
>>>>
>>>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Boyang Chen <re...@gmail.com>.
Hey there,

bumping up this thread to see if there are further questions regarding the
updated proposal.

Best,
Boyang

On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <re...@gmail.com>
wrote:

> After some offline discussions, we believe that it's the right direction
> to go by doing a hybrid approach which includes both file-watch trigger and
> interval based reloading. The former guarantees a swift change in 99% time,
> while the latter provides a time-based guarantee in the worst case when the
> file-watch does not take effect. The current default reloading interval is
> set to 5 min. I have updated the KIP and ticket, feel free to check out and
> see if it makes sense.
>
> Best,
> Boyang
>
> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <re...@gmail.com>
> wrote:
>
>> Hey Gwen, thanks for the feedback.
>>
>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <gw...@confluent.io> wrote:
>>
>>> Agree with Igor. IIRC, we also encountered cases where filewatch was
>>> not triggered as expected. An interval will give us a better
>>> worse-case scenario that is easily controlled by the Kafka admin.
>>>
>>> Are the cases you were referring to happening in the cloud environment?
>> Should we investigate instead of simply assuming the standard API won't
>> work? I checked around and found a similar complaint here
>> <https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
>>
>> I would be partially agreeing that we want to have a reliable approach
>> for all different operating systems in general, but would be great if we
>> could reach a quantitative measure of file-watch success rate if possible
>> for us to make the call. Eventually, the benefit of file-watch is more
>> prompt reaction time and less configuration to the broker.
>>
>>> Gwen
>>>
>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
>>> >
>>> >
>>> > > > The proposed change relies on a file watch, why not also have a
>>> polling
>>> > > > interval to check the file for changes?
>>> > > >
>>> > > > The periodical check could work, the slight downside is that we
>>> need
>>> > > additional configurations to schedule the interval. Do you think the
>>> > > file-watch approach has any extra overhead than the interval based
>>> solution?
>>> >
>>> > I don't think so. The reason I'm asking this is the KIP currently
>>> includes:
>>> >
>>> >   "When the file watch does not work for unknown reason, user could
>>> still try to change the store path in an explicit AlterConfig call in the
>>> worst case."
>>> >
>>> > Having the interval in addition to the file watch could result in a
>>> better worst case scenario.
>>> > I understand it would require introducing at least one new
>>> configuration for the interval, so maybe this doesn't have to solved in
>>> this KIP.
>>> >
>>> > --
>>> > Igor
>>> >
>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
>>> > > Hey Igor, thanks for the feedback.
>>> > >
>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me> wrote:
>>> > >
>>> > > > Hi Boyang,
>>> > > >
>>> > >
>>> > >
>>> > > > What happens if the file is changed into an invalid store? Does the
>>> > > > previous store stay in use?
>>> > > >
>>> > > > If the reload fails, the previous store should be effective. I
>>> will state
>>> > > that in the KIP.
>>> > >
>>> > >
>>> > > > Thanks,
>>> > > >
>>> > > > --
>>> > > > Igor
>>> > > >
>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
>>> > > > > Hey there,
>>> > > > >
>>> > > > > I would like to start the discussion thread for KIP-687:
>>> > > > >
>>> > > >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
>>> > > > >
>>> > > > > This KIP is trying to deprecate the AlterConfigs API support of
>>> updating
>>> > > > > the security store by reloading path in-place, and replace with a
>>> > > > > file-watch mechanism inside the broker. Let me know what you
>>> think.
>>> > > > >
>>> > > > > Best,
>>> > > > > Boyang
>>> > > > >
>>> > > >
>>> > >
>>>
>>>
>>>
>>> --
>>> Gwen Shapira
>>> Engineering Manager | Confluent
>>> 650.450.2760 | @gwenshap
>>> Follow us: Twitter | blog
>>>
>>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Boyang Chen <re...@gmail.com>.
After some offline discussions, we believe that it's the right direction to
go by doing a hybrid approach which includes both file-watch trigger and
interval based reloading. The former guarantees a swift change in 99% time,
while the latter provides a time-based guarantee in the worst case when the
file-watch does not take effect. The current default reloading interval is
set to 5 min. I have updated the KIP and ticket, feel free to check out and
see if it makes sense.

Best,
Boyang

On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <re...@gmail.com>
wrote:

> Hey Gwen, thanks for the feedback.
>
> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <gw...@confluent.io> wrote:
>
>> Agree with Igor. IIRC, we also encountered cases where filewatch was
>> not triggered as expected. An interval will give us a better
>> worse-case scenario that is easily controlled by the Kafka admin.
>>
>> Are the cases you were referring to happening in the cloud environment?
> Should we investigate instead of simply assuming the standard API won't
> work? I checked around and found a similar complaint here
> <https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
>
> I would be partially agreeing that we want to have a reliable approach for
> all different operating systems in general, but would be great if we could
> reach a quantitative measure of file-watch success rate if possible for us
> to make the call. Eventually, the benefit of file-watch is more prompt
> reaction time and less configuration to the broker.
>
>> Gwen
>>
>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
>> >
>> >
>> > > > The proposed change relies on a file watch, why not also have a
>> polling
>> > > > interval to check the file for changes?
>> > > >
>> > > > The periodical check could work, the slight downside is that we need
>> > > additional configurations to schedule the interval. Do you think the
>> > > file-watch approach has any extra overhead than the interval based
>> solution?
>> >
>> > I don't think so. The reason I'm asking this is the KIP currently
>> includes:
>> >
>> >   "When the file watch does not work for unknown reason, user could
>> still try to change the store path in an explicit AlterConfig call in the
>> worst case."
>> >
>> > Having the interval in addition to the file watch could result in a
>> better worst case scenario.
>> > I understand it would require introducing at least one new
>> configuration for the interval, so maybe this doesn't have to solved in
>> this KIP.
>> >
>> > --
>> > Igor
>> >
>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
>> > > Hey Igor, thanks for the feedback.
>> > >
>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me> wrote:
>> > >
>> > > > Hi Boyang,
>> > > >
>> > >
>> > >
>> > > > What happens if the file is changed into an invalid store? Does the
>> > > > previous store stay in use?
>> > > >
>> > > > If the reload fails, the previous store should be effective. I will
>> state
>> > > that in the KIP.
>> > >
>> > >
>> > > > Thanks,
>> > > >
>> > > > --
>> > > > Igor
>> > > >
>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
>> > > > > Hey there,
>> > > > >
>> > > > > I would like to start the discussion thread for KIP-687:
>> > > > >
>> > > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
>> > > > >
>> > > > > This KIP is trying to deprecate the AlterConfigs API support of
>> updating
>> > > > > the security store by reloading path in-place, and replace with a
>> > > > > file-watch mechanism inside the broker. Let me know what you
>> think.
>> > > > >
>> > > > > Best,
>> > > > > Boyang
>> > > > >
>> > > >
>> > >
>>
>>
>>
>> --
>> Gwen Shapira
>> Engineering Manager | Confluent
>> 650.450.2760 | @gwenshap
>> Follow us: Twitter | blog
>>
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Boyang Chen <re...@gmail.com>.
Hey Gwen, thanks for the feedback.

On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <gw...@confluent.io> wrote:

> Agree with Igor. IIRC, we also encountered cases where filewatch was
> not triggered as expected. An interval will give us a better
> worse-case scenario that is easily controlled by the Kafka admin.
>
> Are the cases you were referring to happening in the cloud environment?
Should we investigate instead of simply assuming the standard API won't
work? I checked around and found a similar complaint here
<https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.

I would be partially agreeing that we want to have a reliable approach for
all different operating systems in general, but would be great if we could
reach a quantitative measure of file-watch success rate if possible for us
to make the call. Eventually, the benefit of file-watch is more prompt
reaction time and less configuration to the broker.

> Gwen
>
> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
> >
> >
> > > > The proposed change relies on a file watch, why not also have a
> polling
> > > > interval to check the file for changes?
> > > >
> > > > The periodical check could work, the slight downside is that we need
> > > additional configurations to schedule the interval. Do you think the
> > > file-watch approach has any extra overhead than the interval based
> solution?
> >
> > I don't think so. The reason I'm asking this is the KIP currently
> includes:
> >
> >   "When the file watch does not work for unknown reason, user could
> still try to change the store path in an explicit AlterConfig call in the
> worst case."
> >
> > Having the interval in addition to the file watch could result in a
> better worst case scenario.
> > I understand it would require introducing at least one new configuration
> for the interval, so maybe this doesn't have to solved in this KIP.
> >
> > --
> > Igor
> >
> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
> > > Hey Igor, thanks for the feedback.
> > >
> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me> wrote:
> > >
> > > > Hi Boyang,
> > > >
> > >
> > >
> > > > What happens if the file is changed into an invalid store? Does the
> > > > previous store stay in use?
> > > >
> > > > If the reload fails, the previous store should be effective. I will
> state
> > > that in the KIP.
> > >
> > >
> > > > Thanks,
> > > >
> > > > --
> > > > Igor
> > > >
> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> > > > > Hey there,
> > > > >
> > > > > I would like to start the discussion thread for KIP-687:
> > > > >
> > > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> > > > >
> > > > > This KIP is trying to deprecate the AlterConfigs API support of
> updating
> > > > > the security store by reloading path in-place, and replace with a
> > > > > file-watch mechanism inside the broker. Let me know what you think.
> > > > >
> > > > > Best,
> > > > > Boyang
> > > > >
> > > >
> > >
>
>
>
> --
> Gwen Shapira
> Engineering Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Gwen Shapira <gw...@confluent.io>.
Agree with Igor. IIRC, we also encountered cases where filewatch was
not triggered as expected. An interval will give us a better
worse-case scenario that is easily controlled by the Kafka admin.

Gwen

On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
>
>
> > > The proposed change relies on a file watch, why not also have a polling
> > > interval to check the file for changes?
> > >
> > > The periodical check could work, the slight downside is that we need
> > additional configurations to schedule the interval. Do you think the
> > file-watch approach has any extra overhead than the interval based solution?
>
> I don't think so. The reason I'm asking this is the KIP currently includes:
>
>   "When the file watch does not work for unknown reason, user could still try to change the store path in an explicit AlterConfig call in the worst case."
>
> Having the interval in addition to the file watch could result in a better worst case scenario.
> I understand it would require introducing at least one new configuration for the interval, so maybe this doesn't have to solved in this KIP.
>
> --
> Igor
>
> On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
> > Hey Igor, thanks for the feedback.
> >
> > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me> wrote:
> >
> > > Hi Boyang,
> > >
> >
> >
> > > What happens if the file is changed into an invalid store? Does the
> > > previous store stay in use?
> > >
> > > If the reload fails, the previous store should be effective. I will state
> > that in the KIP.
> >
> >
> > > Thanks,
> > >
> > > --
> > > Igor
> > >
> > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> > > > Hey there,
> > > >
> > > > I would like to start the discussion thread for KIP-687:
> > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> > > >
> > > > This KIP is trying to deprecate the AlterConfigs API support of updating
> > > > the security store by reloading path in-place, and replace with a
> > > > file-watch mechanism inside the broker. Let me know what you think.
> > > >
> > > > Best,
> > > > Boyang
> > > >
> > >
> >



-- 
Gwen Shapira
Engineering Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Igor Soarez <i...@soarez.me>.
> > The proposed change relies on a file watch, why not also have a polling
> > interval to check the file for changes?
> >
> > The periodical check could work, the slight downside is that we need
> additional configurations to schedule the interval. Do you think the
> file-watch approach has any extra overhead than the interval based solution?

I don't think so. The reason I'm asking this is the KIP currently includes:

  "When the file watch does not work for unknown reason, user could still try to change the store path in an explicit AlterConfig call in the worst case."

Having the interval in addition to the file watch could result in a better worst case scenario.
I understand it would require introducing at least one new configuration for the interval, so maybe this doesn't have to solved in this KIP.

--
Igor

On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
> Hey Igor, thanks for the feedback.
> 
> On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me> wrote:
> 
> > Hi Boyang,
> >
> 
> 
> > What happens if the file is changed into an invalid store? Does the
> > previous store stay in use?
> >
> > If the reload fails, the previous store should be effective. I will state
> that in the KIP.
> 
> 
> > Thanks,
> >
> > --
> > Igor
> >
> > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> > > Hey there,
> > >
> > > I would like to start the discussion thread for KIP-687:
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> > >
> > > This KIP is trying to deprecate the AlterConfigs API support of updating
> > > the security store by reloading path in-place, and replace with a
> > > file-watch mechanism inside the broker. Let me know what you think.
> > >
> > > Best,
> > > Boyang
> > >
> >
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Boyang Chen <re...@gmail.com>.
Hey Igor, thanks for the feedback.

On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me> wrote:

> Hi Boyang,
>
> The proposed change relies on a file watch, why not also have a polling
> interval to check the file for changes?
>
> The periodical check could work, the slight downside is that we need
additional configurations to schedule the interval. Do you think the
file-watch approach has any extra overhead than the interval based solution?


> What happens if the file is changed into an invalid store? Does the
> previous store stay in use?
>
> If the reload fails, the previous store should be effective. I will state
that in the KIP.


> Thanks,
>
> --
> Igor
>
> On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> > Hey there,
> >
> > I would like to start the discussion thread for KIP-687:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> >
> > This KIP is trying to deprecate the AlterConfigs API support of updating
> > the security store by reloading path in-place, and replace with a
> > file-watch mechanism inside the broker. Let me know what you think.
> >
> > Best,
> > Boyang
> >
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Igor Soarez <i...@soarez.me>.
Hi Boyang,

The proposed change relies on a file watch, why not also have a polling interval to check the file for changes? 

What happens if the file is changed into an invalid store? Does the previous store stay in use?

Thanks,

--
Igor

On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> Hey there,
> 
> I would like to start the discussion thread for KIP-687:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> 
> This KIP is trying to deprecate the AlterConfigs API support of updating
> the security store by reloading path in-place, and replace with a
> file-watch mechanism inside the broker. Let me know what you think.
> 
> Best,
> Boyang
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Noa Resare <no...@resare.com>.
The benefit of the solution I mention is simply that it can be implemented without changing Kafka, and I provided it more as a side note for people reading this list that might not have time to wait for this KIP to land into a released version. I do think that the KIP proposal would be very useful.

Cheers,
noa

> On 4 Dec 2020, at 19:02, Boyang Chen <re...@gmail.com> wrote:
> 
> Thanks Noa for the suggested path. Like you mentioned, I feel this
> mechanism is a little bit overkill for a simple security file reloading
> case. Could you provide more context on the benefit of doing a customized
> KeyManager setup? TBH, I don't see Kafka going deep into these low level
> security details yet.
> 
> Best,
> Boyang
> 
> On Fri, Dec 4, 2020 at 4:02 AM Noa Resare <no...@resare.com> wrote:
> 
>> Hi Boyang,
>> 
>> I think that it would improve the ergonomics of dealing with short lived
>> certificates to have this be the default behaviour.
>> 
>> It should be noted that transparently reloading certificates and keys when
>> they changed on disk can be implemented right now registering a custom
>> KeyManagerFactory, but to say that the JDK is designed to make this easy
>> would be an overstatement. The things that we do to get this working:
>> 
>> 1. Create a class implementing SecurityProviderCreator that will return a
>> Provider that registers a custom KeyManagerFactory implementation.
>> 2. This custom KeyManagerFactory would return KeyManager instances that
>> implements X509ExendedKeyManager
>> 3. The custom KeyManager would return cached but up to date values for the
>> getCertificateChain() and getPrivateKey() methods.
>> 5. Configure Kafka with security.providers referencing the class defined
>> in 1)
>> 
>> This is not something I would wish upon anyone, but it works. Solving this
>> for everyone inside Apache Kafka seems like a much preferred solution.
>> 
>> Cheers
>> noa
>> 
>> ps. It seems my apple.com <http://apple.com/> email address ends up on
>> the list as apple.com <http://apple.com/>.INVALID. Is this a known
>> problem? For now I’m working around it by using my personal email.
>> 
>>> On 4 Dec 2020, at 01:28, Boyang Chen <re...@gmail.com> wrote:
>>> 
>>> Hey there,
>>> 
>>> I would like to start the discussion thread for KIP-687:
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
>>> 
>>> This KIP is trying to deprecate the AlterConfigs API support of updating
>>> the security store by reloading path in-place, and replace with a
>>> file-watch mechanism inside the broker. Let me know what you think.
>>> 
>>> Best,
>>> Boyang
>> 
>> 


Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Boyang Chen <re...@gmail.com>.
Thanks Noa for the suggested path. Like you mentioned, I feel this
mechanism is a little bit overkill for a simple security file reloading
case. Could you provide more context on the benefit of doing a customized
KeyManager setup? TBH, I don't see Kafka going deep into these low level
security details yet.

Best,
Boyang

On Fri, Dec 4, 2020 at 4:02 AM Noa Resare <no...@resare.com> wrote:

> Hi Boyang,
>
> I think that it would improve the ergonomics of dealing with short lived
> certificates to have this be the default behaviour.
>
> It should be noted that transparently reloading certificates and keys when
> they changed on disk can be implemented right now registering a custom
> KeyManagerFactory, but to say that the JDK is designed to make this easy
> would be an overstatement. The things that we do to get this working:
>
> 1. Create a class implementing SecurityProviderCreator that will return a
> Provider that registers a custom KeyManagerFactory implementation.
> 2. This custom KeyManagerFactory would return KeyManager instances that
> implements X509ExendedKeyManager
> 3. The custom KeyManager would return cached but up to date values for the
> getCertificateChain() and getPrivateKey() methods.
> 5. Configure Kafka with security.providers referencing the class defined
> in 1)
>
> This is not something I would wish upon anyone, but it works. Solving this
> for everyone inside Apache Kafka seems like a much preferred solution.
>
> Cheers
> noa
>
> ps. It seems my apple.com <http://apple.com/> email address ends up on
> the list as apple.com <http://apple.com/>.INVALID. Is this a known
> problem? For now I’m working around it by using my personal email.
>
> > On 4 Dec 2020, at 01:28, Boyang Chen <re...@gmail.com> wrote:
> >
> > Hey there,
> >
> > I would like to start the discussion thread for KIP-687:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> >
> > This KIP is trying to deprecate the AlterConfigs API support of updating
> > the security store by reloading path in-place, and replace with a
> > file-watch mechanism inside the broker. Let me know what you think.
> >
> > Best,
> > Boyang
>
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Boyang Chen <re...@gmail.com>.
Thanks for the feedback Nikolay. I think our proposals are solving
orthogonal issues. The KIP is proposing to deprecate the reloading path on
AlterConfig, which has nothing to do with general certification expire
issue.

Best,
Boyang

On Fri, Dec 4, 2020 at 4:26 AM Nikolay Izhikov <ni...@apache.org> wrote:

> Hello, Boyang Chen.
>
> I think this KIP overlaps with my idea [1] of exposing information about
> certificates Kafka uses.
> Kafka administrator should initiate renewal certificates procedure not
> long before the certificate expires.
> But, for now, there is no way for administrators to know the expiration
> date of the certificate.
> My proposal is to expose this info in the describe command.
>
> What do you think?
> Do we need it?
>
> [1]
> https://mail-archives.apache.org/mod_mbox/kafka-dev/202012.mbox/browser
>
>
> > 4 дек. 2020 г., в 15:00, Noa Resare <no...@resare.com> написал(а):
> >
> > Hi Boyang,
> >
> > I think that it would improve the ergonomics of dealing with short lived
> certificates to have this be the default behaviour.
> >
> > It should be noted that transparently reloading certificates and keys
> when they changed on disk can be implemented right now registering a custom
> KeyManagerFactory, but to say that the JDK is designed to make this easy
> would be an overstatement. The things that we do to get this working:
> >
> > 1. Create a class implementing SecurityProviderCreator that will return
> a Provider that registers a custom KeyManagerFactory implementation.
> > 2. This custom KeyManagerFactory would return KeyManager instances that
> implements X509ExendedKeyManager
> > 3. The custom KeyManager would return cached but up to date values for
> the getCertificateChain() and getPrivateKey() methods.
> > 5. Configure Kafka with security.providers referencing the class defined
> in 1)
> >
> > This is not something I would wish upon anyone, but it works. Solving
> this for everyone inside Apache Kafka seems like a much preferred solution.
> >
> > Cheers
> > noa
> >
> > ps. It seems my apple.com <http://apple.com/> email address ends up on
> the list as apple.com <http://apple.com/>.INVALID. Is this a known
> problem? For now I’m working around it by using my personal email.
> >
> >> On 4 Dec 2020, at 01:28, Boyang Chen <re...@gmail.com>
> wrote:
> >>
> >> Hey there,
> >>
> >> I would like to start the discussion thread for KIP-687:
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> >>
> >> This KIP is trying to deprecate the AlterConfigs API support of updating
> >> the security store by reloading path in-place, and replace with a
> >> file-watch mechanism inside the broker. Let me know what you think.
> >>
> >> Best,
> >> Boyang
> >
>
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Boyang Chen.

I think this KIP overlaps with my idea [1] of exposing information about certificates Kafka uses.
Kafka administrator should initiate renewal certificates procedure not long before the certificate expires.
But, for now, there is no way for administrators to know the expiration date of the certificate.
My proposal is to expose this info in the describe command.

What do you think?
Do we need it?

[1] https://mail-archives.apache.org/mod_mbox/kafka-dev/202012.mbox/browser


> 4 дек. 2020 г., в 15:00, Noa Resare <no...@resare.com> написал(а):
> 
> Hi Boyang,
> 
> I think that it would improve the ergonomics of dealing with short lived certificates to have this be the default behaviour.
> 
> It should be noted that transparently reloading certificates and keys when they changed on disk can be implemented right now registering a custom KeyManagerFactory, but to say that the JDK is designed to make this easy would be an overstatement. The things that we do to get this working:
> 
> 1. Create a class implementing SecurityProviderCreator that will return a Provider that registers a custom KeyManagerFactory implementation.
> 2. This custom KeyManagerFactory would return KeyManager instances that implements X509ExendedKeyManager
> 3. The custom KeyManager would return cached but up to date values for the getCertificateChain() and getPrivateKey() methods.
> 5. Configure Kafka with security.providers referencing the class defined in 1)
> 
> This is not something I would wish upon anyone, but it works. Solving this for everyone inside Apache Kafka seems like a much preferred solution.
> 
> Cheers
> noa
> 
> ps. It seems my apple.com <http://apple.com/> email address ends up on the list as apple.com <http://apple.com/>.INVALID. Is this a known problem? For now I’m working around it by using my personal email.
> 
>> On 4 Dec 2020, at 01:28, Boyang Chen <re...@gmail.com> wrote:
>> 
>> Hey there,
>> 
>> I would like to start the discussion thread for KIP-687:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
>> 
>> This KIP is trying to deprecate the AlterConfigs API support of updating
>> the security store by reloading path in-place, and replace with a
>> file-watch mechanism inside the broker. Let me know what you think.
>> 
>> Best,
>> Boyang
> 


Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Posted by Noa Resare <no...@resare.com>.
Hi Boyang,

I think that it would improve the ergonomics of dealing with short lived certificates to have this be the default behaviour.

It should be noted that transparently reloading certificates and keys when they changed on disk can be implemented right now registering a custom KeyManagerFactory, but to say that the JDK is designed to make this easy would be an overstatement. The things that we do to get this working:

1. Create a class implementing SecurityProviderCreator that will return a Provider that registers a custom KeyManagerFactory implementation.
2. This custom KeyManagerFactory would return KeyManager instances that implements X509ExendedKeyManager
3. The custom KeyManager would return cached but up to date values for the getCertificateChain() and getPrivateKey() methods.
5. Configure Kafka with security.providers referencing the class defined in 1)

This is not something I would wish upon anyone, but it works. Solving this for everyone inside Apache Kafka seems like a much preferred solution.

Cheers
noa

ps. It seems my apple.com <http://apple.com/> email address ends up on the list as apple.com <http://apple.com/>.INVALID. Is this a known problem? For now I’m working around it by using my personal email.

> On 4 Dec 2020, at 01:28, Boyang Chen <re...@gmail.com> wrote:
> 
> Hey there,
> 
> I would like to start the discussion thread for KIP-687:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> 
> This KIP is trying to deprecate the AlterConfigs API support of updating
> the security store by reloading path in-place, and replace with a
> file-watch mechanism inside the broker. Let me know what you think.
> 
> Best,
> Boyang