Posted to dev@kafka.apache.org by Igor Soarez <so...@apple.com.INVALID> on 2022/12/01 18:18:29 UTC

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Hi Jun,

Thank you for reviewing the KIP. Please find my replies to
your comments below.

10. Thanks for pointing out this typo; it has been corrected.


11. I agree that the additional delay in switching to the
future replica is undesirable; however, I see a couple of
issues if we forward the request to the controller
as you describe:

   a) If the controller persists the change to the log directory
   assignment before the future replica has caught up, and there
   is then a failure in the original log directory, and the broker
   is the leader for the partition, there will be no failover
   and the partition will become unavailable. It is not safe to
   call AssignReplicasToDirectories before the replica exists
   in the designated log directory.

   b) An existing behavior we'd like to maintain if possible is
   the ability to move partitions between log directories when a
   broker is offline, as it can be very useful to manage storage
   space. ZK brokers will load and accept logs in the new
   location after startup.
   Maintaining this behavior requires that the broker be able to
   override/correct assignments that are seen in the cluster metadata.
   i.e. the broker is the authority on log directory placement in case
   of mismatch.
   If we want to keep this feature and have the controller send log
   directory reassignments, we'll need a way to distinguish between
   mismatch due to offline movement and mismatch due to controller
   triggered reassignment.

To keep the delay low, instead of sending AssignReplicasToDirectories
within the lock, the RPC can be queued elsewhere when the future
replica first catches up. ReplicaAlterLogDirsThread can keep
going and not switch the replicas yet.
Once the broker receives confirmation of the metadata change
it can then briefly block appends to the old replica and make the switch.
In the unlikely event that the source log directory fails between the moment
AssignReplicasToDirectories is acknowledged by the controller and
the moment the broker is able to make the switch, the broker
can issue AssignReplicasToDirectories assigning that replica back to the
now-offline log directory, to let the controller know that the replica
is actually offline.
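
To sketch the ordering (the class and method names below are placeholders,
not actual Kafka code; this only illustrates the sequence under the
assumptions above, not an implementation):

    // Placeholder sketch; none of these names are real Kafka APIs.
    class FutureReplicaPromotion {
        interface ControllerClient {
            // Queue an AssignReplicasToDirectories change; the callback fires once the
            // broker sees the committed metadata change.
            void queueAssignment(String partition, String targetDirUuid, Runnable onMetadataConfirmed);
        }
        interface Log {
            void blockAppends();
            void unblockAppends();
        }

        void onFutureReplicaCaughtUp(String partition, String targetDirUuid,
                                     ControllerClient controller, Log currentLog,
                                     Runnable swapReplicas) {
            // 1. Don't swap yet, and don't send the RPC while holding the replica lock:
            //    queue the assignment change and let ReplicaAlterLogDirsThread keep copying.
            controller.queueAssignment(partition, targetDirUuid, () -> {
                // 2. Once the metadata change is confirmed, briefly block appends to the
                //    old replica and perform the swap (which would also ensure the future
                //    log has fully caught up).
                currentLog.blockAppends();
                try {
                    swapReplicas.run();
                } finally {
                    currentLog.unblockAppends();
                }
            });
        }
    }
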
What do you think?


12. Indeed, the metadata.log.dir, if explicitly set to a separate
directory, should not be included in the directory UUID list sent
to the controller in broker registration and heartbeat requests.
I have updated the KIP to make this explicit.
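
For example (illustrative paths only; a sketch of a broker configuration
under the assumption above), a broker configured like this would report
only the UUIDs for /data/d1 and /data/d2:

    # illustrative broker configuration
    log.dirs=/data/d1,/data/d2
    # metadata log on its own volume: its UUID is excluded from
    # broker registration and heartbeat requests
    metadata.log.dir=/metadata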


13. Thank you for making this suggestion.
Let's address the different scenarios you enumerated:

 a) When enabling JBOD for an existing KRaft cluster

 In this scenario, the broker finds a single log directory configured
 in `log.dirs`, with an already existing `meta.properties`, which is
 simply missing `directory.id` and `directory.ids`.

 It is safe to have the broker automatically generate the log dir
 UUID and update the `meta.properties` file. This removes any need
 to have extra steps in upgrading and enabling of this feature,
 so it is quite useful.
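
 For example, with illustrative values (the UUIDs below are made up, and
 I'm keeping version=1 here; whether the version field should change is a
 separate question), the existing file would go from something like:

     version=1
     cluster.id=e0Vb0Ik2TXuWdHRgrXkqSQ
     node.id=1

 to:

     version=1
     cluster.id=e0Vb0Ik2TXuWdHRgrXkqSQ
     node.id=1
     directory.id=b9XtEPHlQYeEe04Kw0RkSg
     directory.ids=b9XtEPHlQYeEe04Kw0RkSg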


 b) Adding a log dir to an existing JBOD KRaft cluster

 In this scenario, the broker finds a shorter list of `directory.ids`
 in `meta.properties` than what is configured in `log.dirs`.

 KIP-631 introduced the requirement to run the command kafka-storage.sh
 to format storage directories.

 Currently, if the broker in KRaft mode cannot find `meta.properties`
 in each log directory it will fail at startup. KIP-785 proposes
 removing the need to run the format storage command, but it is
 still open. If new log directories are being added the storage
 command must be run anyway. So I don't think there will be any
 benefit in this case.


 c) Removing a log dir from an existing JBOD KRaft cluster

 In this scenario the broker finds a larger list of `directory.ids`
 in `meta.properties` than what is configured in `log.dirs`.

 The removal of log directories requires an explicit update to
 the `log.dirs` configuration, so it is also safe to have the
 broker automatically update `directory.ids` in `meta.properties`
 to remove the extra UUIDs. It's also useful to drop the requirement
 to run the storage command after removal of log directories from
 configuration, as it reduces operational burden.


 d) Upgrading a JBOD ZK cluster to KRaft

 In ZooKeeper mode, brokers write `meta.properties` with `version=0`,
 but in KRaft mode brokers require `version=1`.
 According to the current discussion on KIP-866, the broker will
 automatically upgrade meta.properties from `version=0` to `version=1`.
 Assuming the meta.properties gets translated into `version=1` as part
 of KIP-866, then the broker should find `meta.properties` files that are
 simply missing the two new fields: `directory.id` and `directory.ids`.
 So it should be OK for the broker to generate UUIDs for each log
 directory and update the `meta.properties` file.


I have updated the KIP to remove the need to run the storage command
in these scenarios - apart from b) where it should already be a
requirement AFAICT - and to describe how the broker will automatically
update `directory.id` and `directory.ids` in `meta.properties` if
required.


14. Thank you for pointing this out. Yes, the controller should
simply delegate the log directory choice back to the broker,
by assigning `Uuid.ZERO` to all the replicas that were assigned
to removed log directories. I have updated the KIP under
"Controller" > "Broker registration".


15. Yes. Same as in "Metadata caching", replicas are considered
offline if the replica references a log directory which is not in the
list of online log directories for the broker ID hosting the replica.
This means the controller will not assign leadership to those replicas.
This is described under "Controller" > "Handling log directory failures".
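
As a tiny illustration of that rule (placeholder names in plain Java, not
controller code; treating Uuid.ZERO as "placement unknown" rather than
offline is my assumption here):

    import java.util.Map;
    import java.util.Set;
    import java.util.UUID;

    // Placeholder sketch: a replica is offline if the log directory it references
    // is not among the online directories registered by its host broker.
    class OfflineReplicaCheck {
        static final UUID ZERO = new UUID(0L, 0L); // stands in for Uuid.ZERO

        static boolean isReplicaOffline(UUID assignedLogDir,
                                        int hostBrokerId,
                                        Map<Integer, Set<UUID>> onlineDirsByBroker) {
            if (ZERO.equals(assignedLogDir)) {
                return false; // placement not yet known, so not considered offline
            }
            Set<UUID> online = onlineDirsByBroker.get(hostBrokerId);
            return online == null || !online.contains(assignedLogDir);
        }
    }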


16. We'll need to bump metadata.version to gate use of the records
and RPC changes. I've added mention of this under the
"Compatibility, Deprecation, and Migration Plan" section.



Please let me know of any further feedback.

Best,

--
Igor

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Igor Soarez <so...@apple.com.INVALID>.
Hi Tom,

Thank you for having another look.


20. That is a good point.

Thinking about your suggestion:
How would this look in a non-JBOD KRaft cluster upgrading to JBOD mode?

Upgrading to a version that includes the JBOD support patch would automatically
update meta.properties to include the new fields. A subsequent downgrade
might be desirable for the operator for whatever reason. The same problem
applies, but here there is no future event at which version=2 could
later be set.

Alternatively, we could either:
 - Simply not change the version at all since older broker versions will ignore
   the new fields, and newer versions will add them when missing.
 - Introduce a new, separate file that would live alongside meta.properties.

WDYT?


Best,

--
Igor


Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Igor Soarez <so...@apple.com.INVALID>.
Hi Jun,

Thank you for your comments and questions.

30. Thank you for pointing this out. The isNew flag is not available
in KRaft mode. The broker can rely on the metadata records instead:
if, and only if, the assigned logdir is Uuid.ZERO, then the replica can
be considered new.

Being able to determine whether a replica "isNew" is important to prevent
the remaining logdirs from filling up when some logdirs become offline,
by re-creating replicas that already exist in the offline logdirs.
So the broker will refuse to create logs that are not new if there are
any offline logdirs.
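
As a minimal sketch of that rule (plain Java with placeholder names, not
broker code; the zero UUID stands in for Uuid.ZERO):

    import java.util.UUID;

    class LogCreationCheck {
        static final UUID ZERO = new UUID(0L, 0L); // stands in for Uuid.ZERO

        // A replica is treated as new only if the metadata assigns it to the zero UUID.
        // A non-new replica is not created while any log dir is offline, because it may
        // already exist on one of the offline directories.
        static boolean shouldCreateLog(UUID assignedDir, boolean anyOfflineLogDir) {
            boolean isNew = ZERO.equals(assignedDir);
            return isNew || !anyOfflineLogDir;
        }
    }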

If a logdir is removed from configuration, the controller will detect
this change upon broker registration and reset all partitions assigned
to the removed logdirs to Uuid.ZERO. In this case, it is OK for the
broker to assume that the partitions are new because they do not exist in
any _configured_ online or offline logdir, and the intended behavior is
to re-create them in one of the online logdirs anyway.

I have updated the KIP to make it clear that broker decisions are based
on the metadata, and not on this flag.


31. I don't think I understand the question.
Why do we need to assign the same UUID?

A logdir may be replaced with a disk by replacing its configured path
with the new disk mount path under the `log.dirs` property.
While the broker was offline, the operator might have copied the contents
of the old logdir to the disk, or not.

If the contents were copied over, then so was the logdir's meta.properties,
along with the UUID, in which case no change is necessary. The broker will
load all configured logdir paths, all existing meta.properties, and verify
that the full set of UUIDs is still congruent across all meta.properties
files. Neither broker nor controller will know that something has changed,
and neither of them needs to. All partition assignments are still correct.
The mapping of UUID to logdir is determined by the meta.properties
under that same logdir.

If the contents were not copied, then this is assumed to be a new
and empty logdir. It should get a different UUID. When the broker loads
all meta.properties it will notice that one is missing for the new disk and
create it, generating a new UUID. It will also update the full set of UUIDs
listed in any other meta.properties files. In the broker registration
request the controller will notice a new UUID being registered, but also
notice a UUID missing.
Any topic partitions assigned to the now-missing logdir UUID will be
updated to Uuid.ZERO, so that the broker can place them in the
most suitable logdir - which is likely to be the new and empty one.
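
To illustrate the second case with simplified, made-up UUIDs: suppose the
broker had logdirs /data/d1 (UUID 1) and /data/d2 (UUID 2), and /data/d2
is replaced with a fresh, empty disk. After startup the files would look
roughly like this (non-relevant values omitted):

    # /data/d1/meta.properties
    directory.id=1
    directory.ids=1,4

    # /data/d2/meta.properties, newly created with a freshly generated UUID
    directory.id=4
    directory.ids=1,4

UUID 2 no longer appears in the registration, so the controller resets any
partitions still assigned to it to Uuid.ZERO and the broker re-places them.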


32. You are correct, the HeartBeat request should convey the failure
and the broker shouldn't need to send an AssignReplicasToDirs request.

The bit preceding that quote is important:
  "If the partition is assigned to an online log directory"
In this case the metadata indicates that a non-new replica is assigned
to an online logdir, but this replica cannot actually be found in any
online logdir.
So we want to tell the controller that the metadata is wrong, and that
the replica is actually offline.

This is a defensive design option.

In a scenario where, for some reason, the broker can see that the metadata
is incorrect about the logdir assignment of a replica that existed in the
failed logdir, it is better to correct and recover than to allow the
problem to persist.

Ignoring the error could mean that the partition stays offline. If the
controller is only told about the UUID of the failed logdir, it won't
be able to determine that a leadership and ISR update is required for
any replica with an incorrect logdir assignment.

An alternative – when facing this unlikely failure scenario – would be
for the broker to error and exit, which would be more disruptive.


33. Correct. I should've made that clear. Updated.


34. No. It shouldn't be a large request, and it should only happen rarely.
This relates to point 32.
When a logdir fails, that failure is communicated to the controller by
indicating the logdir UUID in the heartbeat request. The controller
can determine that _the partitions assigned to that logdir UUID_
are now offline. But if there are any partitions that were in that logdir
and do not have that same logdir UUID assigned to them in the cluster
metadata, then the broker needs to signal that these are also offline, as
the controller will not be able to determine that without the assignment.

We expect each broker to proactively instruct the controller to keep the
metadata correct about the logdir assignment for each replica, so
situations where the metadata is wrong should be rare, and when they
happen only a small number of replicas should be affected. Hence this
should be both a small and rare request.
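
A minimal sketch of that check (plain Java with placeholder, simplified
types; not broker code) could look like this:

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.UUID;

    // Replicas that lived in the failed logdir but whose cluster-metadata assignment
    // points elsewhere still need an explicit correction, so the controller also
    // treats them as offline.
    class FallbackCheck {
        static Set<String> replicasNeedingCorrection(UUID failedDir,
                                                     Map<String, UUID> localPlacement,
                                                     Map<String, UUID> metadataPlacement) {
            Set<String> result = new HashSet<>();
            for (Map.Entry<String, UUID> e : localPlacement.entrySet()) {
                boolean livedInFailedDir = failedDir.equals(e.getValue());
                boolean metadataAgrees = failedDir.equals(metadataPlacement.get(e.getKey()));
                if (livedInFailedDir && !metadataAgrees) {
                    result.add(e.getKey()); // metadata is stale for this replica
                }
            }
            return result;
        }
    }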


35. Hmm, I could not find the string "AlterReplicaDirRequest"
in the source:
  https://github.com/apache/kafka/search?q=AlterReplicaDirRequest

I'm referring to this API key:
  clients/src/main/java/org/apache/kafka/common/protocol/ApiKeys.java#L78


36. The risk is that if the broker is unfenced while the controller still
has an incorrect view of the logdir assignment, it may assign leadership
to the broker for some partition which is incorrectly assigned in metadata.
If that happens, when a logdir fails, the heartbeat request
indicating the failed logdir UUID will not cause the controller to
take action and reassign leadership, and we may end up with an unavailable
partition.

The controller will assume that partition leadership is being performed
correctly and will not take any action, as long as it thinks the broker
is alive, and that the partition is assigned to an online logdir.
It could be interesting to find a more general solution to this issue,
as that would eliminate a wider range of failures in Kafka. But I don't
currently have any suggestions there.

The requests sent while still fenced aim to correct the logdir assignment
for all of the partitions in the broker. One of the reasons that the
assignment may be incorrect is that an operator might have relocated some
partitions to a different logdir while the broker was offline.
This is a currently supported feature - albeit probably not widely known.

Why is it important that there should be no other requests while the
broker is still fenced?


37. I had originally proposed that if there is a single logdir configured
the controller could assume that all the existing replicas are assigned
to the only logdir indicated in the broker registration request, provided
there isn't a previous registration that indicates any logdir UUIDs.
This would avoid the broker sending AssignReplicasToDirs to populate
the initial assignments.

If the broker is registering with a single logdir, but the previous broker
registration indicates some logdir UUID then the controller cannot make this
simplification, as the logdir could be a new one, or a previous second
logdir might have been removed from configuration and the current assignment
is unclear.

We could maybe say that whenever there is a single logdir, the broker
will not bother with the assignment at all. The downside is that
there might be more work to do later (more partition assignments to correct)
when a second logdir is configured, which I think may be more disruptive.
It is preferable to spread out the effort to maintain a correct
assignment in the metadata.

Tom Bentley raised this in point 4, and since it's not a strictly
necessary optimisation I updated the KIP to remove it back then.
Do you think we should keep the optimisation?


38. Correct. I've updated the KIP.


39. I think I forgot to update this after I changed the proposal to say
the meta.properties are automatically updated. I have updated this
section to clarify that the broker will automatically update the
file if possible.

A new logdir can be added while there are other, offline logdirs, as
long as the set of UUIDs in `directory.ids` is expanded to include the
new one. So the number of UUIDs in `directory.ids` and the number of
paths in `log.dirs` should always match.

It is important that the broker be able to distinguish between UUIDs
for logdirs that are offline, vs UUIDs for logdirs that were removed
from configuration.

If the broker starts up configured with two logdirs, each containing a
meta.properties file that lists three different UUIDs under
`directory.ids`, but only one of the configured logdirs is accessible
(online), then it is not possible for the broker to automatically update
the file, as it won't be able to distinguish between the UUID for the
offline logdir and the UUID for the removed logdir. In this case the
broker should fail to start. The operator can either bring the offline
logdir back up, restore the log.dirs configuration, or manually update
meta.properties.
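
To illustrate with simplified, made-up UUIDs: with `log.dirs=/x,/y` and
`/y` currently inaccessible, the broker might only be able to read:

    # /x/meta.properties (non-relevant values omitted)
    directory.id=1
    directory.ids=1,2,3

It cannot tell which of UUIDs 2 and 3 belongs to `/y` and which belonged
to a logdir that was removed from `log.dirs`, so it fails at startup.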


40. Indeed. What I meant to say here is that the controller should not
accept broker registration requests that do not indicate any online
logdir UUIDs. We don't expect the broker would send these anyway.

During the upgrade from non-JBOD we could allow brokers to register
without specifying any logdir UUID (online or offline). But thinking
about this again now, I don't think it will be necessary — this idea
was from before the metadata.version feature flag change was introduced.
BrokerRegistrationRequest should only include logdir UUIDs after
all servers are upgraded, and by then all logdirs will have a UUID assigned.

I've updated the KIP to clarify that BrokerRegistrationRequest must always
include some online logdir UUID.


Best,

--
Igor



Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Igor Soarez <i...@soarez.me>.
Hi all,

I’ve had to step away from work for personal reasons for a couple of months – until mid April 2023. I don’t think I’ll be able to continue to address feedback or update this KIP before then.

--
Igor

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Igor Soarez <so...@apple.com.INVALID>.
Hi Jun,

Thank you for sharing your questions, please find my answers below.


41. There can only be user partitions on `metadata.log.dir` if that log
dir is also listed in `log.dirs`.
`LogManager` does not specifically load contents from `metadata.log.dir`.

The broker will communicate UUIDs to the controller for all log dirs
configured in `log.dirs`. If the metadata directory happens to be one
of those, it may also contain user partitions, so the controller will
know about it. If it is a completely separate log dir, it cannot hold
user partitions, so there's no need to include it.


42. I'm not sure what exactly you mean by "decommission the
disk", so please let me know if I'm missing your point here.

A disk can be removed from `log.dirs` and removed from the system in a
single broker restart:

  1. Shutdown the broker
  2. Unmount, remove the disk
  3. Update `log.dirs` config
  4. Start the broker

Upon restart, the broker will update `directory.ids` in the
`meta.properties` for the remaining configured log dirs.

Log dir identity cannot be inferred from the path, because the same
storage device can be remounted under a different path, so the way we
identify storage directories is by looking at their contents – the
`directory.id` field in their `meta.properties`.
But this also means that a log dir cannot be identified if it is not
available, which in turn means that the broker can only regenerate
`directory.ids` if all log directories listed under `log.dirs` happen
to be available.

Consider the following example, where `log.dirs=/a,/b,/c`, and
the following `meta.properties` (non-relevant values omitted):

    # /a/meta.properties
    directory.id=1
    directory.ids=1,2,3

    # /b/meta.properties
    directory.id=2
    directory.ids=1,2,3

    # /c/meta.properties
    directory.id=3
    directory.ids=1,2,3

If `log.dirs` is updated to remove `/c`, the broker will be able
to determine the new value for `directory.ids=1,2` by loading
`/a/meta.properties` and `/b/meta.properties`.
But if either `/a` or `/b` happens to be unavailable, e.g. due to some
temporary disk failure, we cannot determine `directory.ids`. For example,
if `/b` is unavailable, the broker can't tell whether `directory.ids`
should be `1,2`, `1,3`, or even `1,4`.

In a scenario where an operator wishes to remove a log dir from
configuration and some other log dir is also offline, the operator will have
a few options:

  a) Bring the offline log dir back online before restarting the broker.

  b) Edit `meta.properties` to remove the UUID for the deconfigured logdir
  from `directory.ids` in the remaining available log dirs. This will remove
  the need for the broker to regenerate `directory.ids` as the entry count
  for `directory.ids` and `log.dirs` will be equal.

  c) Also remove the offline log dir from `log.dirs`.


43. If the log dir had already failed before startup then, indeed, the broker
will not know that. But in that case, there's no risk of a race or failure.

What I meant here relates rather to log dir failures at runtime.
I've updated this bit in the KIP to clarify.

When executing the log directory failure handler, the broker knows
which directory failed, which partitions resided there, and it can
check if any of those newly failed partitions refer to a different
log dir in the cluster metadata.

The assignment should be correct for all of them, as the broker
will be proactive in notifying the controller of any changes in
log dir assignment. But in case of some race condition, the broker
should nudge the controller to deal with the incorrectly
assigned partitions.


44. Tom Bentley and I have discussed this previously in this thread,
in emails dated Jan 10, 13, 23 and Feb 3.

When upgrading a JBOD enabled ZK cluster, we could piggyback on the ZK to
KRaft upgrade (as per KIP-866) and delay bumping `meta.properties` until the
ZK->KRaft upgrade is finalized. After then, we do not support downgrading.

But I'm not convinced we should do this, since there's another upgrade
scenario – when the change proposed in this KIP is applied to a KRaft
cluster that does not yet support JBOD.
In this scenario there is no multi-step process with a step that is
considered final, and I'm not sure we'd want to introduce an
additional step – making the upgrade process more complex – just to
address this issue either.

I think the best approach is to keep using `version=1` in `meta.properties`.
The new properties introduced in this KIP will safely be ignored by previous
versions, in either ZK or KRaft mode, and we avoid creating conflicts with
unexpected declared versions. The presence of the extra fields is also
innocuous in the case of a second upgrade following a downgrade.

I've updated the KIP to reflect this. Let me know what you think.


Best,

--
Igor



Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Jun Rao <ju...@confluent.io.INVALID>.
Hi, Igor,

Thanks for the reply. A few more replies and comments.

31. Thanks for the explanation. This looks good to me then.

35. Yes, you are right.

36. Yes, this seems fine since the KRaft controller allows broker requests
before it's being unfenced.

37. Yes, it's probably simpler without the optimization for the case with a
single log dir.

41. "If log directory that holds the cluster metadata topic is configured
separately to a different path — using metadata.log.dir — then the
respective UUID for this log directory is excluded from both online and
offline sets, as the broker cannot run if this particular log directory is
unavailable." Hmm, there could be user partitions on metadata.log.dir. So
the controller needs to know the disk UUID for that dir, right? Otherwise,
the controller will keep setting those replicas' disk ids to zero.

42. "If an entry is removed from log.dirs  the broker can also
automatically update directory.ids as long as no log directories are
offline when the broker comes back up." This makes the removal of a disk
less convenient than before since one can't just remove the disk from
log.dirs and decommission the disk in a single step.

43. "If the broker knows that the partition already exists in a different
log directory that is now offline" The broker may not know that since
currently we remove the offline partitions from LogManager.

44. "When configured for the migration and while still in ZK mode, brokers
will: update meta.properties to generate and include directory.id  and
directory.ids;" I guess there is no downgrade after this point? In KIP-866,
the meta.properties is only changed after the upgrade is finalized.

Thanks,

Jun

On Mon, Feb 6, 2023 at 10:15 AM Igor Soarez <so...@apple.com.invalid>
wrote:

> Hi David,
>
> Thank you for your suggestions and for having a look at this KIP.
>
> 1. Yes, that should be OK. I have updated the section
> "Migrating a cluster in ZK mode running with JBOD" to reflect this.
>
> 2. I've updated the motivation section to state that.
>
> Best,
>
> --
> Igor
>
>
>

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Igor Soarez <so...@apple.com.INVALID>.
Hi David,

Thank you for your suggestions and for having a look at this KIP.

1. Yes, that should be OK. I have updated the section
"Migrating a cluster in ZK mode running with JBOD" to reflect this.

2. I've updated the motivation section to state that.

Best,

--
Igor



Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by David Arthur <da...@confluent.io.INVALID>.
Igor, thanks for the KIP!

1. For the ZK migration part, I wonder if we could avoid monitoring the
directory failure ZNode while in dual-write mode. So far in the migration
design, we have avoided reading anything from ZK after the initial metadata
migration. We have modified the ZK brokers to use the
BrokerLifecycleManager while in migration mode as a means to register with
the KRaft controller. Could we use the proposed new fields in
BrokerRegistrationRequest and BrokerHeartbeatRequest RPCs to inform the
KRaft controller of failed directories?

2. We should probably explicitly state that this KIP will subsume KIP-589

Thanks!
David

On Wed, Jan 25, 2023 at 6:14 PM Jun Rao <ju...@confluent.io.invalid> wrote:

> Hi, Igor,
>
> Thanks for the reply. A few more comments/questions.
>
> 30. In ZK mode, LeaderAndIsr has a field isNew that indicates whether a
> replica is newly created or not. How do we convey the same thing with
> metadata records in KRaft mode?
>
> 31. If one replaces a log dir with a disk, I guess we need to assign the
> same UUID to the log dir? I am not sure if we could do this since we don't
> store the mapping from log dir name to its UUID.
>
> 32. "If the partition is not new and does not exist, and there are any
> offline log directories, the broker uses AssignReplicasToDirs to change the
> metadata assignment to an offline log directory." Hmm, should the broker
> send the HeartBeat request to propagate the failed log dir instead?
> Actually, does it need to send it at all since the same information should
> already be sent when the offline dir was first detected?
>
> 33. "If the partition already exists in another log directory and is a
> future replica in the log directory indicated by the metadata, the broker
> will replace the current replica with the future replica." It seems that
> the broker needs to make sure that the future replica catches up with the
> current replica first?
>
> 34. "As a fallback safety mechanism in case of any synchronisation issues,
> when a log directory fails, the broker will check log directory assignments
> in the metadata and if there are any missing replicas, the broker will use
> AlterReplicaLogDirs assigning the replicas to the offline log directory, to
> convey to the controller that these replicas are offline and need
> leadership and ISR updates." Hmm, the motivation says that we want to avoid
> large requests introduced in KIP-589. By sending AlterReplicaLogDirs on log
> dir fail, it seems that we are issuing the same large request every time a
> log dir fails?
>
> 35. "When replicas are moved between directories, using the existing
> AlterReplicaLogDirs RPC, the receiving broker will start moving the
> replicas using AlterReplicaLogDirs threads as usual."  AlterReplicaLogDirs
> is new request. The existing one is AlterReplicaDirRequest.
>
> 36. "If the broker is configured with multiple log directories it remains
> FENCED until it can verify that all partitions are assigned to the correct
> log directories in the cluster metadata." Hmm, this could be a bit weird.
> A broker probably shouldn't send any request other than HeartBeat to the
> controller until it's unfenced. But it may need to send the
> AssignReplicasToDirs request to complete the log dir assignment. Is this
> required?
>
> 37. "Existing replicas without a log directory are either: Assumed to live
> in a broker that isn’t yet configured with multiple log directories, and so
> live in a single log directory, even if the UUID for that directory isn’t
> yet known by the controller. It is not possible to trigger a log directory
> failure from a broker that has a single log directory, as the broker would
> simply shut down if there are no remaining online log directories." So, if
> there is a single log dir, the broker doesn't send AssignReplicasToDirs
> requests to the controller?
>
> 38. In the section under "If the partition is assigned to an online log
> directory", it seems that we need to add another case. If the partition
> exists and a future replica also exists in another dir, should we start the
> process to replicate the current replica to the future replica. This is the
> existing behavior.
>
> 39. "If there are offline log directories, the broker will fail at startup
> if there is a mismatch between the number of entries in log.dirs  and
> directory.ids." This seems a bit restricting since it means that one can't
> add new log dir when an existing log dir is down?
>
> 40. "In a future release, broker registration without online log directory
> UUIDs will be disallowed." Hmm, currently, the broker already fails after
> log recovery if no log dir is online, right?
>
> Thanks,
>
> Jun
>
>
> On Mon, Jan 23, 2023 at 3:59 AM Tom Bentley <tb...@redhat.com> wrote:
>
> > Hi Igor,
> >
> > Thanks for your replies on points 21-25. Those all LGTM. See below for a
> > further thought on the meta.properties upgrade.
> >
> > Kind regards,
> >
> > Tom
> >
> >
> > On Fri, 13 Jan 2023 at 18:09, Igor Soarez <so...@apple.com.invalid>
> > wrote:
> >
> > > Hi Tom,
> > >
> > > Thank you for having another look.
> > >
> > > 20. Upon a downgrade to a Kafka version that runs the current
> > > "version == 1" assertion, then yes — a downgrade would not be possible
> > > without first updating (manually) the meta.properties files back
> > > to the previous version.
> > >
> > > We could prevent this issue if we consider the new fields optional and
> > > not bump the version, but this takes away from the value of having
> > > a version in the first place. I'm not sure this would be a good
> > trade-off.
> > >
> > > IIUC in KIP-866 this issue was addressed by delaying the writing of
> > > meta.properties under the new version until after the full transition
> to
> > > KRaft mode is finished and relaxing the expectations around versioning.
> > > We could take a similar approach by relaxing the current
> > > expectation that the version must strictly be 1.
> > >
> > > Currently, MetaProperties#parse() checks that the version is 1 but only
> > > requires that the cluster.id and node.id properties are present in the
> > > file.
> > > We can relax the requirement, ensuring only that the version is at
> least
> > 1,
> > > and land that change before this KIP is implemented – at least one
> > release
> > > beforehand. Then there will be some previous version for which a
> > downgrade
> > > is possible, even after the version has been bumped to 2 and the two
> new
> > > properties are introduced. WDYT?
> > >
> >
> > Doing that would delay JBOD support and mean existing users of JBOD must
> > upgrade via that specific version.
> >
> > I think I favour a different way of relaxing the expectation around
> > versioning:
> > 1. In the new broker, update the meta.properties with the new keys but
> > keeping version=1 (AFAICS this won't cause problems if we roll back to
> old
> > broker binaries at any point prior to the completion of KRaft upgrade,
> > because the code ignores unknown keys).
> > 2. When the KRaft upgrade is complete (i.e. once a broker has written the
> > final ZkMigrationRecord), update meta.properties to version=2, where the
> > new keys are required.
> >
> > That would mean that during the upgrade the additional validation of
> > meta.properties is missing. I think that's acceptable because the extra
> > protection provided isn't something supported by ZK-based JBOD today, so
> > there is no loss of functionality.
> >
> > Wdty?
> >
> >
> > > 21. I think in this case we can reuse LOG_DIR_NOT_FOUND (code 57) and
> > > perhaps
> > > reword its description slightly to be a bit more generic.
> > > I've updated the KIP to mention this.
> > >
> > > 22. This check should be enforced by the broker, as the controller
> cannot
> > > know whether the current replica to logdir assignment is correct.
> > > Currently, the controller does not unfence a broker that does not
> "want"
> > > to be unfenced. The broker should not want to be unfenced while there
> > > are still mismatching (wrong or missing) replica to logdir assignments.
> > > I updated the KIP to clarify this.
> > >
> > > So a reason wouldn't be included in the heartbeat response but your
> point
> > > about diagnosing is interesting. So I've updated the KIP to propose a
> > > new broker metric reporting the number of mismatching replica
> assignments
> > > - i.e. replica assignments that are either missing or incorrect in the
> > > cluster metadata, as observed by the broker — until this metric is
> zero,
> > > the broker would not want to be unfenced. Let me know what you think.
> > >
> > > 23. I don't see why not. We should be able to extend the system tests
> > > proposed in KIP-866 to cover this.
> > >
> > > 24. Correct. Updated to remove the typo.
> > >
> > > 25. Good point. Updated to align the names.
> > >
> > >
> > > Thank you,
> > >
> > > --
> > > Igor
> > >
> > >
> > >
> >
>


-- 
-David

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Jun Rao <ju...@confluent.io.INVALID>.
Hi, Igor,

Thanks for the reply. A few more comments/questions.

30. In ZK mode, LeaderAndIsr has a field isNew that indicates whether a
replica is newly created or not. How do we convey the same thing with
metadata records in KRaft mode?

31. If one replaces a log dir with a disk, I guess we need to assign the
same UUID to the log dir? I am not sure if we could do this since we don't
store the mapping from log dir name to its UUID.

32. "If the partition is not new and does not exist, and there are any
offline log directories, the broker uses AssignReplicasToDirs to change the
metadata assignment to an offline log directory." Hmm, should the broker
send the HeartBeat request to propagate the failed log dir instead?
Actually, does it need to send it at all since the same information should
already be sent when the offline dir was first detected?

33. "If the partition already exists in another log directory and is a
future replica in the log directory indicated by the metadata, the broker
will replace the current replica with the future replica." It seems that
the broker needs to make sure that the future replica catches up with the
current replica first?

34. "As a fallback safety mechanism in case of any synchronisation issues,
when a log directory fails, the broker will check log directory assignments
in the metadata and if there are any missing replicas, the broker will use
AlterReplicaLogDirs assigning the replicas to the offline log directory, to
convey to the controller that these replicas are offline and need
leadership and ISR updates." Hmm, the motivation says that we want to avoid
large requests introduced in KIP-589. By sending AlterReplicaLogDirs on log
dir fail, it seems that we are issuing the same large request every time a
log dir fails?

35. "When replicas are moved between directories, using the existing
AlterReplicaLogDirs RPC, the receiving broker will start moving the
replicas using AlterReplicaLogDirs threads as usual."  AlterReplicaLogDirs
is new request. The existing one is AlterReplicaDirRequest.

36. "If the broker is configured with multiple log directories it remains
FENCED until it can verify that all partitions are assigned to the correct
log directories in the cluster metadata." Hmm, this could be a bit weird.
A broker probably shouldn't send any request other than HeartBeat to the
controller until it's unfenced. But it may need to send the
AssignReplicasToDirs request to complete the log dir assignment. Is this
required?

37. "Existing replicas without a log directory are either: Assumed to live
in a broker that isn’t yet configured with multiple log directories, and so
live in a single log directory, even if the UUID for that directory isn’t
yet known by the controller. It is not possible to trigger a log directory
failure from a broker that has a single log directory, as the broker would
simply shut down if there are no remaining online log directories." So, if
there is a single log dir, the broker doesn't send AssignReplicasToDirs
requests to the controller?

38. In the section under "If the partition is assigned to an online log
directory", it seems that we need to add another case. If the partition
exists and a future replica also exists in another dir, should we start the
process to replicate the current replica to the future replica. This is the
existing behavior.

39. "If there are offline log directories, the broker will fail at startup
if there is a mismatch between the number of entries in log.dirs  and
directory.ids." This seems a bit restricting since it means that one can't
add new log dir when an existing log dir is down?

40. "In a future release, broker registration without online log directory
UUIDs will be disallowed." Hmm, currently, the broker already fails after
log recovery if no log dir is online, right?

Thanks,

Jun


On Mon, Jan 23, 2023 at 3:59 AM Tom Bentley <tb...@redhat.com> wrote:

> Hi Igor,
>
> Thanks for your replies on points 21-25. Those all LGTM. See below for a
> further thought on the meta.properties upgrade.
>
> Kind regards,
>
> Tom
>
>
> On Fri, 13 Jan 2023 at 18:09, Igor Soarez <so...@apple.com.invalid>
> wrote:
>
> > Hi Tom,
> >
> > Thank you for having another look.
> >
> > 20. Upon a downgrade to a Kafka version that runs the current
> > "version == 1" assertion, then yes — a downgrade would not be possible
> > without first updating (manually) the meta.properties files back
> > to the previous version.
> >
> > We could prevent this issue if we consider the new fields optional and
> > not bump the version, but this takes away from the value of having
> > a version in the first place. I'm not sure this would be a good
> trade-off.
> >
> > IIUC in KIP-866 this issue was addressed by delaying the writing of
> > meta.properties under the new version until after the full transition to
> > KRaft mode is finished and relaxing the expectations around versioning.
> > We could take a similar approach by relaxing the current
> > expectation that the version must strictly be 1.
> >
> > Currently, MetaProperties#parse() checks that the version is 1 but only
> > requires that the cluster.id and node.id properties are present in the
> > file.
> > We can relax the requirement, ensuring only that the version is at least
> 1,
> > and land that change before this KIP is implemented – at least one
> release
> > beforehand. Then there will be some previous version for which a
> downgrade
> > is possible, even after the version has been bumped to 2 and the two new
> > properties are introduced. WDYT?
> >
>
> Doing that would delay JBOD support and mean existing users of JBOD must
> upgrade via that specific version.
>
> I think I favour a different way of relaxing the expectation around
> versioning:
> 1. In the new broker, update the meta.properties with the new keys but
> keeping version=1 (AFAICS this won't cause problems if we roll back to old
> broker binaries at any point prior to the completion of KRaft upgrade,
> because the code ignores unknown keys).
> 2. When the KRaft upgrade is complete (i.e. once a broker has written the
> final ZkMigrationRecord), update meta.properties to version=2, where the
> new keys are required.
>
> That would mean that during the upgrade the additional validation of
> meta.properties is missing. I think that's acceptable because the extra
> protection provided isn't something supported by ZK-based JBOD today, so
> there is no loss of functionality.
>
> Wdty?
>
>
> > 21. I think in this case we can reuse LOG_DIR_NOT_FOUND (code 57) and
> > perhaps
> > reword its description slightly to be a bit more generic.
> > I've updated the KIP to mention this.
> >
> > 22. This check should be enforced by the broker, as the controller cannot
> > know whether the current replica to logdir assignment is correct.
> > Currently, the controller does not unfence a broker that does not "want"
> > to be unfenced. The broker should not want to be unfenced while there
> > are still mismatching (wrong or missing) replica to logdir assignments.
> > I updated the KIP to clarify this.
> >
> > So a reason wouldn't be included in the heartbeat response but your point
> > about diagnosing is interesting. So I've updated the KIP to propose a
> > new broker metric reporting the number of mismatching replica assignments
> > - i.e. replica assignments that are either missing or incorrect in the
> > cluster metadata, as observed by the broker — until this metric is zero,
> > the broker would not want to be unfenced. Let me know what you think.
> >
> > 23. I don't see why not. We should be able to extend the system tests
> > proposed in KIP-866 to cover this.
> >
> > 24. Correct. Updated to remove the typo.
> >
> > 25. Good point. Updated to align the names.
> >
> >
> > Thank you,
> >
> > --
> > Igor
> >
> >
> >
>

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Tom Bentley <tb...@redhat.com>.
Hi Igor,

Thanks for your replies on points 21-25. Those all LGTM. See below for a
further thought on the meta.properties upgrade.

Kind regards,

Tom


On Fri, 13 Jan 2023 at 18:09, Igor Soarez <so...@apple.com.invalid> wrote:

> Hi Tom,
>
> Thank you for having another look.
>
> 20. Upon a downgrade to a Kafka version that runs the current
> "version == 1" assertion, then yes — a downgrade would not be possible
> without first updating (manually) the meta.properties files back
> to the previous version.
>
> We could prevent this issue if we consider the new fields optional and
> not bump the version, but this takes away from the value of having
> a version in the first place. I'm not sure this would be a good trade-off.
>
> IIUC in KIP-866 this issue was addressed by delaying the writing of
> meta.properties under the new version until after the full transition to
> KRaft mode is finished and relaxing the expectations around versioning.
> We could take a similar approach by relaxing the current
> expectation that the version must strictly be 1.
>
> Currently, MetaProperties#parse() checks that the version is 1 but only
> requires that the cluster.id and node.id properties are present in the
> file.
> We can relax the requirement, ensuring only that the version is at least 1,
> and land that change before this KIP is implemented – at least one release
> beforehand. Then there will be some previous version for which a downgrade
> is possible, even after the version has been bumped to 2 and the two new
> properties are introduced. WDYT?
>

Doing that would delay JBOD support and mean existing users of JBOD must
upgrade via that specific version.

I think I favour a different way of relaxing the expectation around
versioning:
1. In the new broker, update the meta.properties with the new keys but
keeping version=1 (AFAICS this won't cause problems if we roll back to old
broker binaries at any point prior to the completion of KRaft upgrade,
because the code ignores unknown keys).
2. When the KRaft upgrade is complete (i.e. once a broker has written the
final ZkMigrationRecord), update meta.properties to version=2, where the
new keys are required.

That would mean that during the upgrade the additional validation of
meta.properties is missing. I think that's acceptable because the extra
protection provided isn't something supported by ZK-based JBOD today, so
there is no loss of functionality.

Wdty?


> 21. I think in this case we can reuse LOG_DIR_NOT_FOUND (code 57) and
> perhaps
> reword its description slightly to be a bit more generic.
> I've updated the KIP to mention this.
>
> 22. This check should be enforced by the broker, as the controller cannot
> know whether the current replica to logdir assignment is correct.
> Currently, the controller does not unfence a broker that does not "want"
> to be unfenced. The broker should not want to be unfenced while there
> are still mismatching (wrong or missing) replica to logdir assignments.
> I updated the KIP to clarify this.
>
> So a reason wouldn't be included in the heartbeat response but your point
> about diagnosing is interesting. So I've updated the KIP to propose a
> new broker metric reporting the number of mismatching replica assignments
> - i.e. replica assignments that are either missing or incorrect in the
> cluster metadata, as observed by the broker — until this metric is zero,
> the broker would not want to be unfenced. Let me know what you think.
>
> 23. I don't see why not. We should be able to extend the system tests
> proposed in KIP-866 to cover this.
>
> 24. Correct. Updated to remove the typo.
>
> 25. Good point. Updated to align the names.
>
>
> Thank you,
>
> --
> Igor
>
>
>

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Igor Soarez <so...@apple.com.INVALID>.
Hi Tom,

Thank you for having another look.

20. Upon a downgrade to a Kafka version that runs the current
"version == 1" assertion, then yes — a downgrade would not be possible
without first updating (manually) the meta.properties files back
to the previous version.

We could prevent this issue if we consider the new fields optional and
not bump the version, but this takes away from the value of having
a version in the first place. I'm not sure this would be a good trade-off.

IIUC in KIP-866 this issue was addressed by delaying the writing of
meta.properties under the new version until after the full transition to
KRaft mode is finished and relaxing the expectations around versioning.
We could take a similar approach by relaxing the current
expectation that the version must strictly be 1.

Currently, MetaProperties#parse() checks that the version is 1 but only
requires that the cluster.id and node.id properties are present in the file.
We can relax the requirement, ensuring only that the version is at least 1,
and land that change before this KIP is implemented – at least one release
beforehand. Then there will be some previous version for which a downgrade
is possible, even after the version has been bumped to 2 and the two new
properties are introduced. WDYT?

21. I think in this case we can reuse LOG_DIR_NOT_FOUND (code 57) and perhaps
reword its description slightly to be a bit more generic.
I've updated the KIP to mention this.

22. This check should be enforced by the broker, as the controller cannot
know whether the current replica to logdir assignment is correct.
Currently, the controller does not unfence a broker that does not "want"
to be unfenced. The broker should not want to be unfenced while there
are still mismatching (wrong or missing) replica to logdir assignments.
I updated the KIP to clarify this.

So a reason wouldn't be included in the heartbeat response but your point
about diagnosing is interesting. So I've updated the KIP to propose a
new broker metric reporting the number of mismatching replica assignments
- i.e. replica assignments that are either missing or incorrect in the
cluster metadata, as observed by the broker — until this metric is zero,
the broker would not want to be unfenced. Let me know what you think.

23. I don't see why not. We should be able to extend the system tests
proposed in KIP-866 to cover this.

24. Correct. Updated to remove the typo.

25. Good point. Updated to align the names.


Thank you,

--
Igor



Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Tom Bentley <tb...@redhat.com>.
Hi Igor,

20. The description of the changes to meta.properties says "If there any
meta.properties file is missing directory.id a new UUID is generated, and
assigned to that log directory by updating the file", and the
upgrade/migration section says "As the upgraded brokers come up, the
existing meta.properties  files in each broker are updated with a generated
directory.id  and directory.ids ." Currently MetaProperties#parse() checks
that the version is 1, so would the described behaviour prevent downgrade
of a broker to an older version of the software?

21. "If the indicated log directory UUID is not a registered log directory
then the call fails with an error" can you specify which error (is it a new
error code)?

22. "If multiple log directories are registered the broker will remain
fenced until the controller learns of all the partition to log directory
placements in that broker - i.e. no remaining replicas assigned to Uuid.ZERO ."
Is an error code used in the BrokerHeartbeatResponse to indicate this
state? (Or is the only way to diagnose the reason for a broker remaining
fenced for this reason to look at the controller logs?)

23. Will there be a system test to cover the upgrade of a ZK+JBOD cluster
to KRaft+JBOD cluster?

24. In the rejected alternatives: "However the broker is in a better
position to make a choice of log directory than the broker". I think that
should be "...than the controller", right?

25. I wonder about the inconsistency of the RPC names: We have the existing
AlterReplicaLogDirs (and log.dirs broker config), but the new RPC is
AssignReplicasToDirectories.

Many thanks!

Tom

On Tue, 3 Jan 2023 at 18:05, Igor Soarez <so...@apple.com.invalid> wrote:

> Hi Jun,
>
> Thank you for having another look.
>
> 11. That is correct. I have updated the KIP in an attempt to make this
> clearer.
> I think the goal should be to try to minimize the chance that a log
> directory
> may happen while the metadata is incorrect about the log directory
> assignment,
> but also have a fallback safety mechanism to indicate to the controller
> that
> some replica was missed in case of a bad race.
>
> 13. Ok, I think I have misunderstood this. Thank you for correcting me.
> In this case the broker can update the existing meta.properties and create
> new meta.properties in the new log directories.
> This also means that the update-directories subcommand in kafka-storage.sh
> is not necessary.
> I have updated the KIP to reflect this.
>
> Please have another look.
>
>
> Thank you,
>
> --
> Igor
>
>
> > On 22 Dec 2022, at 00:25, Jun Rao <ju...@confluent.io.INVALID> wrote:
> >
> > Hi, Igor,
> >
> > Thanks for the reply.
> >
> > 11. Yes, your proposal could work. Once the broker receives confirmation
> of
> > the metadata change, I guess it needs to briefly block appends to the old
> > replica, make sure the future log fully catches up and then make the
> switch?
> >
> > 13 (b). The kafka-storage.sh is only required in KIP-631 for a brand new
> > KRaft cluster. If the cluster already exists and one just wants to add a
> > log dir, it seems inconvenient to have to run the kafka-storage.sh tool
> > again.
> >
> > Thanks,
> >
> > Jun
>
>

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Igor Soarez <so...@apple.com.INVALID>.
Hi Jun,

Thank you for having another look.

11. That is correct. I have updated the KIP in an attempt to make this clearer.
I think the goal should be to try to minimize the chance that a log directory
failure happens while the metadata is incorrect about the log directory
assignment, but also to have a fallback safety mechanism to indicate to the
controller that some replica was missed in case of a bad race.

13. Ok, I think I have misunderstood this. Thank you for correcting me.
In this case the broker can update the existing meta.properties and create
new meta.properties in the new log directories.
This also means that the update-directories subcommand in kafka-storage.sh
is not necessary.
I have updated the KIP to reflect this.

Please have another look.


Thank you,

--
Igor


> On 22 Dec 2022, at 00:25, Jun Rao <ju...@confluent.io.INVALID> wrote:
> 
> Hi, Igor,
> 
> Thanks for the reply.
> 
> 11. Yes, your proposal could work. Once the broker receives confirmation of
> the metadata change, I guess it needs to briefly block appends to the old
> replica, make sure the future log fully catches up and then make the switch?
> 
> 13 (b). The kafka-storage.sh is only required in KIP-631 for a brand new
> KRaft cluster. If the cluster already exists and one just wants to add a
> log dir, it seems inconvenient to have to run the kafka-storage.sh tool
> again.
> 
> Thanks,
> 
> Jun

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Posted by Jun Rao <ju...@confluent.io.INVALID>.
Hi, Igor,

Thanks for the reply.

11. Yes, your proposal could work. Once the broker receives confirmation of
the metadata change, I guess it needs to briefly block appends to the old
replica, make sure the future log fully catches up and then make the switch?

13 (b). The kafka-storage.sh is only required in KIP-631 for a brand new
KRaft cluster. If the cluster already exists and one just wants to add a
log dir, it seems inconvenient to have to run the kafka-storage.sh tool
again.

Thanks,

Jun


On Thu, Dec 1, 2022 at 10:19 AM Igor Soarez <so...@apple.com.invalid>
wrote:

>
> Hi Jun,
>
> Thank you for reviewing the KIP. Please find my replies to
> your comments below.
>
> 10. Thanks for pointing out this typo; it has been corrected.
>
>
> 11. I agree that the additional delay in switching to the
> future replica is undesirable, however I see a couple of
> issues if we forward the request to the controller
> as you describe:
>
>    a) If the controller persists the change to the log directory
>    assignment before the future replica has caught up and there
>    is a failure in the original log directory then if the broker
>    is a leader for the partition there will be no failover
>    and the partition will become unavailable. It is not safe to
>    call AssignReplicasToDirectories before the replica exists
>    in the designated log directory.
>
>    b) An existing behavior we'd like to maintain if possible is
>    the ability to move partitions between log directories when a
>    broker is offline, as it can be very useful to manage storage
>    space. ZK brokers will load and accept logs in the new
>    location after startup.
>    Maintaining this behavior requires that the broker be able to
>    override/correct assignments that are seen in the cluster metadata.
>    i.e. the broker is the authority on log directory placement in case
>    of mismatch.
>    If we want to keep this feature and have the controller send log
>    directory reassignments, we'll need a way to distinguish between
>    mismatch due to offline movement and mismatch due to controller
>    triggered reassignment.
>
> To keep the delay low, instead of sending AlterReplicaLogDirs
> within the lock the RPC can be queued elsewhere when the future
> replica first catches up. ReplicaAlterLogDirsThread can keep
> going and not switch the replicas yet.
> Once the broker receives confirmation of the metadata change
> it can then briefly block appends to the old replica and make the switch.
> In the unlikely event that source log directory fails between the moment
> AssignReplicasToDirectories is acknowledged by the controller and
> before the broker is able to make the switch, then the broker
> can issue AssignReplicasToDirectories of that replica back to the offline
> log directory to let the controller know that the replica is actually
> offline.
> What do you think?
>
>
> 12. Indeed, the metadata.log.dir, if explicitly defined to a separate
> directory should not be included in the directory UUID list sent
> to the controller in broker registration and heartbeat requests.
> I have updated the KIP to make this explicit.
>
>
> 13. Thank you for making this suggestion.
> Let's address the different scenarios you enumerated:
>
>  a) When enabling JBOD for an existing KRaft cluster
>
>  In this scenario, the broker finds a single log directory configured
>  in `log.dirs`, with an already existing `meta.properties`, which is
>  simply missing `directory.id` and `directory.ids`.
>
>  It is safe to have the broker automatically generate the log dir
>  UUID and update the `meta.properties` file. This removes any need
>  to have extra steps in upgrading and enabling of this feature,
>  so it is quite useful.
>
>
>  b) Adding a log dir to an existing JBOD KRaft cluster
>
>  In this scenario, the broker finds a shorter list of `directory.ids`
>  in `meta.properties` than what is configured in `log.dirs`.
>
>  KIP-631 introduced the requirement to run the command kafka-storage.sh
>  to format storage directories.
>
>  Currently, if the broker in KRaft mode cannot find `meta.properties`
>  in each log directory it will fail at startup. KIP-785 proposes
>  removing the need to run the format storage command, but it is
>  still open. If new log directories are being added the storage
>  command must be run anyway. So I don't think there will be any
>  benefit in this case.
>
>
>  c) Removing a log dir from an existing JBOD KRaft cluster
>
>  In this scenario the broker finds a larger list of `directory.ids`
>  in `meta.properties` than what is configured in `log.dirs`.
>
>  The removal of log directories requires an explicit update to
>  the `log.dirs` configuration, so it is also safe to have the
>  broker automatically update `directory.ids` in `meta.properties`
>  to remove the extra UUIDs. It's also useful to drop the requirement
>  to run the storage command after removal of log directories from
>  configuration, as it reduces operational burden.
>
>
>  c) Upgrading a JBOD ZK cluster to KRaft
>
>  In ZooKeeper brokers write `meta.properties` with `version=0`,
>  but in KRaft mode brokers require `version=1`.
>  According to the current discussion on KIP-866, the broker will
>  automatically upgrade meta.properties from `version=0` to `version=1`.
>  Assuming the meta.properties gets translated into `version=1` as part
>  of KIP-866, then the broker should find `meta.properties` which are
>  simply missing the two new fields: `directory.id` and `directory.ids`.
>  So it should be OK for the broker to generate UUIDs for each log
>  directory and update the `meta.properties` file.
>
>
> I have updated the KIP to remove the need to run the storage command
> in these scenarios - apart from b) where it should already be a
> requirement AFAICT - and describe how the broker will automatically
> update `directory.id` and `directory.ids` in `meta.properties` if
> required.
>
>
> 14. Thank you for pointing this out. Yes, the controller should
> simply delegate the log directory choice back to the broker,
> by assigning `Uuid.ZERO` to all the replicas that were assigned
> to removed log directories. I have updated the KIP under
> "Controller" > "Broker registration".
>
>
> 15. Yes. Same as in "Metadata caching", replicas are considered
> offline if the replica references a log directory which is not in the
> list of online log directories for the broker ID hosting the replica.
> This means the controller will not assign leadership those replicas.
> This is described under "Controller" > "Handling log directory failures".
>
>
> 16. We'll need to bump metadata.version to gate use of the records
> and RPC changes. I've added mention of this under the
> "Compatibility, Deprecation, and Migration Plan" section.
>
>
>
> Please let me know of any further feedback.
>
> Best,
>
> --
> Igor