Posted to dev@kafka.apache.org by "Noa Resare (Jira)" <ji...@apache.org> on 2020/07/27 15:46:00 UTC
[jira] [Created] (KAFKA-10314) KafkaStorageException on reassignment when offline log directories exist
Noa Resare created KAFKA-10314:
----------------------------------
Summary: KafkaStorageException on reassignment when offline log directories exist
Key: KAFKA-10314
URL: https://issues.apache.org/jira/browse/KAFKA-10314
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.5.0
Reporter: Noa Resare
If a partition is reassigned to a broker that has an offline log directory, the new broker fails to become a follower and instead raises a KafkaStorageException, which causes the reassignment to stall indefinitely. The error message we see is the following:
{{[2020-07-23 13:11:08,727] ERROR [Broker id=1] Skipped the become-follower state change with correlation id 14 from controller 1 epoch 1 for partition t2-0 (last update controller epoch 1) with leader 2 since the replica for the partition is offline due to disk error org.apache.kafka.common.errors.KafkaStorageException: Can not create log for t2-0 because log directories /tmp/kafka/d1 are offline (state.change.logger)}}
It seems to me that unless the partition in question already existed in the offline log directory, a better behaviour would be simply to assign the partition to one of the available log directories.
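The proposed behaviour can be sketched as a small placement rule. This is an illustrative Python model, not Kafka's Scala code: `choose_log_dir`, `StorageError`, and the `may_exist_offline` flag are hypothetical names, and taking the first online directory is an arbitrary stand-in for whatever selection policy the broker really uses.

```python
from typing import Sequence


class StorageError(Exception):
    """Stand-in for KafkaStorageException (hypothetical, for this sketch only)."""


def choose_log_dir(
    partition: str,
    online_dirs: Sequence[str],
    offline_dirs: Sequence[str],
    may_exist_offline: bool,
) -> str:
    """Pick a directory for a replica of `partition` on this broker.

    `may_exist_offline` is an assumption of the sketch: the caller states
    whether the replica could already live in one of the offline directories.
    """
    if may_exist_offline and offline_dirs:
        # Re-creating an existing replica in an online directory is what
        # KAFKA-4763 guards against, so refuse.
        raise StorageError(
            f"Can not create log for {partition} because log directories "
            f"{','.join(offline_dirs)} are offline"
        )
    if not online_dirs:
        raise StorageError(f"No online log directory available for {partition}")
    # Arbitrary choice for the sketch: take the first online directory.
    return online_dirs[0]


# A replica that is new to this broker is placed even though /tmp/kafka/d1 is offline.
print(choose_log_dir("t2-0", ["/tmp/kafka/d2"], ["/tmp/kafka/d1"], may_exist_offline=False))
```

The only case that still fails is the one where the replica might already exist on the offline disk, which is the case the original guard was meant to cover.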
The conditional in [LogManager.scala:769|https://github.com/apache/kafka/blob/11f75691b87fcecc8b29bfd25c7067e054e408ea/core/src/main/scala/kafka/log/LogManager.scala#L769] was introduced to prevent the issue in [KAFKA-4763|https://issues.apache.org/jira/browse/KAFKA-4763], where partitions in offline log directories would be re-created in an online directory as soon as a LeaderAndIsr message is processed. However, the semantics of isNew seem to differ between LogManager (where it means the replica is new on this broker) and the place where isNew is set in [KafkaController.scala|https://github.com/apache/kafka/blob/11f75691b87fcecc8b29bfd25c7067e054e408ea/core/src/main/scala/kafka/controller/KafkaController.scala#L879] (where it seems to refer to whether the topic partition itself is new; all followers get {{isNew=false}}).
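To make the mismatch concrete, here is a toy Python model of the two readings of isNew. It assumes, based on the description above rather than on the linked sources, that the LogManager guard refuses to create a log when isNew is false and any log directory is offline. All function names are hypothetical, and the prior replica assignment is invented for the example.

```python
from typing import Collection


def controller_is_new(partition_newly_created: bool) -> bool:
    """Assumed controller-side meaning: the topic partition itself is new."""
    return partition_newly_created


def replica_is_new_on_broker(broker_id: int, previous_replicas: Collection[int]) -> bool:
    """Assumed LogManager-side meaning: this broker held no replica before."""
    return broker_id not in previous_replicas


def may_create_log(is_new: bool, offline_dirs: Collection[str]) -> bool:
    """Assumed shape of the guard: while any log directory on the broker is
    offline, only replicas flagged as new may be created."""
    return is_new or not offline_dirs


# Scenario from the report: existing partition t2-0 is reassigned to broker 1,
# which has /tmp/kafka/d1 offline and never hosted the partition.
offline = ["/tmp/kafka/d1"]
previous_replicas = [2]  # hypothetical prior assignment; broker 2 is the leader in the log line

flag_from_controller = controller_is_new(partition_newly_created=False)
flag_per_replica = replica_is_new_on_broker(1, previous_replicas)

print(may_create_log(flag_from_controller, offline))  # guard refuses, reassignment stalls
print(may_create_log(flag_per_replica, offline))      # guard allows creation
```

Under the replica-level reading the guard no longer blocks the reassignment, while a replica that already existed on the broker is still protected.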
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: [jira] [Created] (KAFKA-10314) KafkaStorageException on reassignment when offline log directories exist
Posted by Noa Resare <re...@apple.com.INVALID>.
I guess it might be time to nag a bit about this, as suggested by the contributing code changes <http://kafka.apache.org/contributing> instructions :) Six days ago I opened a pull request <https://github.com/apache/kafka/pull/9122> (with a test) that resolves this issue for me. I would be delighted to have a review or two of this tiny change.
cheers
noa