Posted to dev@kafka.apache.org by Christo Lolov <ch...@gmail.com> on 2023/05/22 09:23:54 UTC

[DISCUSS] KIP-928: Making Kafka resilient to log directories becoming full

Hello all!

I would like to start a discussion on KIP-928: Making Kafka resilient to
log directories becoming full which can be found at
https://cwiki.apache.org/confluence/display/KAFKA/KIP-928%3A+Making+Kafka+resilient+to+log+directories+becoming+full
.

In summary, I frequently run into problems where Kafka becomes unresponsive
when the disks backing its log directories become full. Such
unresponsiveness generally requires intervention outside of Kafka. I have
found it to be a significantly nicer experience when Kafka maintains
control plane operations and allows you to free up space.

I am interested in your thoughts and any suggestions for improving the
proposal!

Best,
Christo

Re: [DISCUSS] KIP-928: Making Kafka resilient to log directories becoming full

Posted by Christo Lolov <ch...@gmail.com>.
Heya Igor,

Thank you for reading through the KIP and providing feedback!

11. Good question. I will check whether a change is needed in the
processing of the metadata records and come back. My hunch says no as long
as the Kafka broker is still alive to process the metadata records. This
being said, deleting topics is one of the two things I want to achieve. The
other one is to allow retention to be changed and continue to take effect.
As an example, if a person does not want to lose all data, but has realised
that they are storing 7 days of data while they only need the last 1 day,
they should be able to make the retention more aggressive and recover space
without deleting the topic. In my opinion, the change to the controller for
ZK mode isn't big - where previously requests were sent only to online
replicas, they would now be sent to all replicas. My preference is for it to
make it in, but if reviewers don't find it necessary I am happy to target
just KRaft.

12. Great question! Since the KIP aims to be as non-invasive as possible,
the controller has no knowledge of the saturated state - the brokers do not
propagate any new information. As such, they will be reported as having
thrown a KafkaStorageException whenever DescribeReplicaLogDirs is called.
Again, this decision came from wanting the change to be as minimally
invasive as possible - although the new state could be propagated.
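
For illustration, here is roughly how that surfaces through the Admin API -
a minimal sketch, assuming a recent AdminClient; the broker id and bootstrap
address are placeholders:

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.LogDirDescription;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class LogDirErrors {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Describe the log directories of broker 1.
            Map<Integer, Map<String, LogDirDescription>> described =
                admin.describeLogDirs(List.of(1)).allDescriptions().get();
            described.forEach((broker, dirs) ->
                dirs.forEach((path, desc) ->
                    // error() is null for a healthy directory; a saturated
                    // directory would surface as a KafkaStorageException here,
                    // indistinguishable from an offline one.
                    System.out.printf("broker %d, dir %s, error: %s%n",
                        broker, path, desc.error())));
        }
    }
}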

13. Yes, I forgot to add this to the KIP and will amend it in the upcoming
days. I was planning on proposing a metric similar to
kafka.log:type=LogManager,name=OfflineLogDirectoryCount, except that it
will show the count of saturated log directories.
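
To make the parallel concrete, here is a sketch of reading both gauges over
JMX - SaturatedLogDirectoryCount is only the name I have in mind so far, and
the host and JMX port are placeholders:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LogDirGauges {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // The existing gauge:
            Object offline = conn.getAttribute(new ObjectName(
                "kafka.log:type=LogManager,name=OfflineLogDirectoryCount"),
                "Value");
            // The proposed gauge (name not final):
            Object saturated = conn.getAttribute(new ObjectName(
                "kafka.log:type=LogManager,name=SaturatedLogDirectoryCount"),
                "Value");
            System.out.println("offline=" + offline
                + ", saturated=" + saturated);
        }
    }
}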

14. Great question and I will clarify this in the KIP! No - similarly to
getting out of the offline state, getting out of the saturated state once
space has been reclaimed would require a bounce of the broker. Should the
KIP be accepted, I would like to build upon the proposal to allow
auto-recovery without the need for a restart.

Best,
Christo

On Fri, 2 Jun 2023 at 17:02, Igor Soarez <so...@apple.com.invalid> wrote:

> Hi Christo,
>
> Thank you for the KIP. Kafka is very sensitive to filesystem errors,
> and at the first IO error the whole log directory is permanently
> considered offline. It seems your proposal aims to increase the
> robustness of Kafka, and that's a positive improvement.
>
> I have some questions:
>
> 11. "Instead of sending a delete topic request only to replicas we
> know to be online, we will allow a delete topic request to be sent
> to all replicas regardless of their state. Previously a controller
> did not send delete topic requests to brokers because it knew they
> would fail. In the future, topic deletions for saturated topics will
> succeed, but topic deletions for the offline scenario will continue
> to fail." It seems you're describing ZK mode behavior? In KRaft
> mode the Controller does not send requests to Brokers. Instead
> the Controller persists new metadata records which all online Brokers
> then fetch. Since it's too late to be proposing design changes for
> ZK mode, is this change necessary? Is there a difference in how the
> metadata records should be processed by Brokers?
>
> 12. "We will add a new state to the broker state machines of a log
> directory (saturated) and a partition replica (saturated)."
> How are log directories and partition replicas in these states
> represented in the Admin API? e.g. `DescribeReplicaLogDirs`
>
> 13. Should there be any metrics indicating the new saturated state for
> log directories and replicas?
>
> 14. "If an IOException due to No space left on device is raised (we
> will check the remaining space at that point in time rather than the
> exception message) the broker will stop all operations on logs
> located in that directory, remove all fetchers and stop compaction.
> Retention will continue to be respected. The same node as the
> current state will be written to in Zookeeper. All other
> IOExceptions will continue to be treated the same way they are
> treated now and will result in a log directory going offline."
> Does a log directory in this "saturated" state transition back to
> online if more storage space becomes available, e.g. due to
> retention policy enforcement or due to topic deletion, or does the
> Broker still require a restart to bring the log directory back to
> full operation?
>
> Best,
>
> --
> Igor
>
>
>

Re: [DISCUSS] KIP-928: Making Kafka resilient to log directories becoming full

Posted by Igor Soarez <so...@apple.com.INVALID>.
Hi Christo,

Thank you for the KIP. Kafka is very sensitive to filesystem errors,
and at the first IO error the whole log directory is permanently
considered offline. It seems your proposal aims to increase the
robustness of Kafka, and that's a positive improvement.

I have some questions:

11. "Instead of sending a delete topic request only to replicas we
know to be online, we will allow a delete topic request to be sent
to all replicas regardless of their state. Previously a controller
did not send delete topic requests to brokers because it knew they
would fail. In the future, topic deletions for saturated topics will
succeed, but topic deletions for the offline scenario will continue
to fail." It seems you're describing ZK mode behavior? In KRaft
mode the Controller does not send requests to Brokers. Instead
the Controller persists new metadata records which all online Brokers
then fetch. Since it's too late to be proposing design changes for
ZK mode, is this change necessary? Is there a difference in how the
metadata records should be processed by Brokers?

12. "We will add a new state to the broker state machines of a log
directory (saturated) and a partition replica (saturated)."
How are log directories and partition replicas in these states
represented in the Admin API? e.g. `DescribeReplicaLogDirs`

13. Should there be any metrics indicating the new saturated state for
log directories and replicas?

14. "If an IOException due to No space left on device is raised (we
will check the remaining space at that point in time rather than the
exception message) the broker will stop all operations on logs
located in that directory, remove all fetchers and stop compaction.
Retention will continue to be respected. The same node as the
current state will be written to in Zookeeper. All other
IOExceptions will continue to be treated the same way they are
treated now and will result in a log directory going offline."
Does a log directory in this "saturated" state transition back to
online if more storage space becomes available, e.g. due to
retention policy enforcement or due to topic deletion, or does the
Broker still require a restart to bring the log directory back to
full operation?
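
(For what it's worth, the "check the remaining space at that point in time"
part presumably maps onto the standard java.nio API - a rough sketch, where
the threshold parameter is made up:

import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

public class DiskFullCheck {
    // When a write throws an IOException, measure the free space of the
    // log directory's filesystem instead of parsing the exception message.
    static boolean isDiskFull(Path logDir, long minFreeBytes) throws IOException {
        FileStore store = Files.getFileStore(logDir);
        return store.getUsableSpace() < minFreeBytes;
    }
}

so an IOException would be classified as "saturated" when isDiskFull returns
true, and handled as a regular storage error otherwise.)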

Best,

--
Igor



Re: [DISCUSS] KIP-928: Making Kafka resilient to log directories becoming full

Posted by Colin McCabe <cm...@apache.org>.
On Wed, Jun 7, 2023, at 07:07, Christo Lolov wrote:
> Hey Colin,
>
> I tried the following setup:
>
> * Create 3 EC2 machines.
> * EC2 machine named A acts as a KRaft Controller.
> * EC2 machine named B acts as a KRaft Broker. (The only configurations
> different to the default values: log.retention.ms=30000,
> log.segment.bytes=1048576, log.retention.check.interval.ms=30000,
> leader.imbalance.check.interval.seconds=30)
> * EC2 machine named C acts as a Producer.
> * I attached a 1 GB EBS volume to EC2 machine B (Broker) and configured
> the log.dirs to point to it.
> * I filled 995 MB of that EBS volume using fallocate.
> * I created a topic with 6 partitions and a replication factor of 1.
> * From the Producer machine I used 
> `~/kafka/bin/kafka-producer-perf-test.sh
> --producer.config ~/kafka/config/client.properties --topic batman
> --record-size 524288 --throughput 5 --num-records 150`. The disk on EC2
> machine B filled up and the broker shut down. I stopped the producer.
> * I stopped the controller on EC2 machine A. I started the controller to
> both be a controller and a broker (I need this because I cannot 
> communicate
> directly with a controller -
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-919%3A+Allow+AdminClient+to+Talk+Directly+with+the+KRaft+Controller+Quorum
> ).
> * I deleted the topic to which I had been writing by using
> kafka-topics.sh.
> * I started the broker on EC2 machine B and it failed due to no space 
> left
> on disk during its recovery process. The topic was not deleted from the
> disk.
>
> As such, I am not convinced that KRaft addresses the problem of deleting
> topics on startup if there is no space left on the disk - is there
> something about my setup that you disagree with? I think this will
> continue to be the case even when JBOD + KRaft is implemented.

Thank you for trying this. You're right that it doesn't work today, but it very easily could with no architecture changes.

We have the initial KRaft metadata load when log recovery starts. So if we wanted to, we could delete non-existent topics during log recovery. (Obviously we'd want to check both the topic ID and topic name, as always.)

This would be a good optimization in general. Spending time recovering a directory and then immediately deleting it during the initial metadata load is silly. I think nobody has bothered to optimize this yet since it's a bit of a rare case. But we very easily could.

I don't know if this would require a KIP or not. Arguably it's not user-visible behavior.
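
To make the idea concrete, here is a rough sketch of what such a startup
check could look like - none of this is existing broker code; liveTopicIds
stands in for whatever the initial metadata image provides, and the
partition.metadata parsing is simplified:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Set;
import java.util.stream.Stream;

public class StaleTopicCleaner {
    static void deleteStalePartitionDirs(Path logDir, Set<String> liveTopicIds)
            throws IOException {
        try (Stream<Path> entries = Files.list(logDir)) {
            for (Path dir : (Iterable<Path>) entries::iterator) {
                Path meta = dir.resolve("partition.metadata");
                if (!Files.isRegularFile(meta)) continue;
                String topicId = Files.readAllLines(meta).stream()
                    .filter(line -> line.startsWith("topic_id: "))
                    .map(line -> line.substring("topic_id: ".length()).trim())
                    .findFirst().orElse(null);
                if (topicId != null && !liveTopicIds.contains(topicId)) {
                    // The topic was deleted while the broker was down: reclaim
                    // the space rather than recovering segments only to delete
                    // them during the metadata load.
                    try (Stream<Path> files = Files.walk(dir)) {
                        files.sorted(Comparator.reverseOrder())
                             .forEach(p -> p.toFile().delete());
                    }
                }
            }
        }
    }
}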

best,
Colin

>
> Let me know your thoughts!
>
> Best,
> Christo
>
> On Mon, 5 Jun 2023 at 11:03, Christo Lolov <ch...@gmail.com> wrote:
>
>> Hey Colin,
>>
>> Thanks for the review!
>>
>> I am also skeptical that much space can be reclaimed via compaction as
>> detailed in the limitations section of the KIP.
>>
>> In my head there are two ways to get out of the saturated state -
>> configure more aggressive retention and delete topics. I wasn't aware that
>> KRaft deletes topics marked for deletion on startup if the disks occupied
>> by those partitions are full - I will check it out, thank you for the
>> information! On the retention side, I believe there is still a benefit in
>> keeping the broker up and responsive - in my experience, people first try
>> to reduce the data they have, and only when that also does not work are they
>> okay with sacrificing all of the data.
>>
>> Let me know your thoughts!
>>
>> Best,
>> Christo
>>
>> On Fri, 2 Jun 2023 at 20:09, Colin McCabe <cm...@apache.org> wrote:
>>
>>> Hi Christo,
>>>
>>> We're not adding new stuff to ZK at this point (it's deprecated), so it
>>> would be good to drop that from the design.
>>>
>>> With regard to the "saturated" state: I'm skeptical that compaction could
>>> really move the needle much in terms of freeing up space -- in most
>>> workloads I've seen, it wouldn't. Compaction also requires free space to
>>> function.
>>>
>>> So the main benefit of the "saturated" state seems to be enabling deletion
>>> on full disks. But KRaft mode already has most of that benefit. Full disks
>>> (or, indeed, downed brokers) don't block deletion on KRaft. If you delete a
>>> topic and then bounce the broker that had the disk full, it will delete the
>>> topic directory on startup as part of its snapshot load process.
>>>
>>> So I'm not sure if we really need this. Maybe we should re-evaluate once
>>> we have JBOD + KRaft.
>>>
>>> best,
>>> Colin
>>>
>>>
>>> On Mon, May 22, 2023, at 02:23, Christo Lolov wrote:
>>> > Hello all!
>>> >
>>> > I would like to start a discussion on KIP-928: Making Kafka resilient to
>>> > log directories becoming full which can be found at
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-928%3A+Making+Kafka+resilient+to+log+directories+becoming+full
>>> > .
>>> >
>>> > In summary, I frequently run into problems where Kafka becomes
>>> unresponsive
>>> > when the disks backing its log directories become full. Such
>>> > unresponsiveness generally requires intervention outside of Kafka. I
>>> have
>>> > found it to be a significantly nicer experience when Kafka maintains
>>> > control plane operations and allows you to free up space.
>>> >
>>> > I am interested in your thoughts and any suggestions for improving the
>>> > proposal!
>>> >
>>> > Best,
>>> > Christo
>>>
>>

Re: [DISCUSS] KIP-928: Making Kafka resilient to log directories becoming full

Posted by Christo Lolov <ch...@gmail.com>.
Hey Colin,

I tried the following setup:

* Create 3 EC2 machines.
* EC2 machine named A acts as a KRaft Controller.
* EC2 machine named B acts as a KRaft Broker. (The only configurations
different to the default values: log.retention.ms=30000,
log.segment.bytes=1048576, log.retention.check.interval.ms=30000,
leader.imbalance.check.interval.seconds=30)
* EC2 machine named C acts as a Producer.
* I attached a 1 GB EBS volume to EC2 machine B (Broker) and configured
the log.dirs to point to it.
* I filled 995 MB of that EBS volume using fallocate.
* I created a topic with 6 partitions and a replication factor of 1.
* From the Producer machine I used `~/kafka/bin/kafka-producer-perf-test.sh
--producer.config ~/kafka/config/client.properties --topic batman
--record-size 524288 --throughput 5 --num-records 150`. The disk on EC2
machine B filled up and the broker shut down. I stopped the producer.
* I stopped the controller on EC2 machine A. I started the controller to
both be a controller and a broker (I need this because I cannot communicate
directly with a controller -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-919%3A+Allow+AdminClient+to+Talk+Directly+with+the+KRaft+Controller+Quorum
).
* I deleted the topic to which I had been writing by using kafka-topics.sh
(an Admin API equivalent is sketched after this list).
* I started the broker on EC2 machine B and it failed due to no space left
on disk during its recovery process. The topic was not deleted from the
disk.
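
For reference, a minimal sketch of that deletion step done through the Admin
API instead of kafka-topics.sh, with the bootstrap address pointing at the
combined controller+broker from the step above (the hostname is a
placeholder):

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

import java.util.List;
import java.util.Properties;

public class DeleteTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "machine-a:9092");
        try (Admin admin = Admin.create(props)) {
            // Marks the topic for deletion; since the broker on machine B is
            // down, its local data can only be removed once it comes back.
            admin.deleteTopics(List.of("batman")).all().get();
        }
    }
}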

As such, I am not convinced that KRaft addresses the problem of deleting
topics on startup if there is no space left on the disk - is there
something about my setup that you disagree with? I think this will
continue to be the case even when JBOD + KRaft is implemented.

Let me know your thoughts!

Best,
Christo

On Mon, 5 Jun 2023 at 11:03, Christo Lolov <ch...@gmail.com> wrote:

> Hey Colin,
>
> Thanks for the review!
>
> I am also skeptical that much space can be reclaimed via compaction as
> detailed in the limitations section of the KIP.
>
> In my head there are two ways to get out of the saturated state -
> configure more aggressive retention and delete topics. I wasn't aware that
> KRaft deletes topics marked for deletion on startup if the disks occupied
> by those partitions are full - I will check it out, thank you for the
> information! On the retention side, I believe there is still a benefit in
> keeping the broker up and responsive - in my experience, people first try
> to reduce the data they have, and only when that also does not work are they
> okay with sacrificing all of the data.
>
> Let me know your thoughts!
>
> Best,
> Christo
>
> On Fri, 2 Jun 2023 at 20:09, Colin McCabe <cm...@apache.org> wrote:
>
>> Hi Christo,
>>
>> We're not adding new stuff to ZK at this point (it's deprecated), so it
>> would be good to drop that from the design.
>>
>> With regard to the "saturated" state: I'm skeptical that compaction could
>> really move the needle much in terms of freeing up space -- in most
>> workloads I've seen, it wouldn't. Compaction also requires free space to
>> function.
>>
>> So the main benefit of the "saturated" state seems to be enabling deletion
>> on full disks. But KRaft mode already has most of that benefit. Full disks
>> (or, indeed, downed brokers) don't block deletion on KRaft. If you delete a
>> topic and then bounce the broker that had the disk full, it will delete the
>> topic directory on startup as part of its snapshot load process.
>>
>> So I'm not sure if we really need this. Maybe we should re-evaluate once
>> we have JBOD + KRaft.
>>
>> best,
>> Colin
>>
>>
>> On Mon, May 22, 2023, at 02:23, Christo Lolov wrote:
>> > Hello all!
>> >
>> > I would like to start a discussion on KIP-928: Making Kafka resilient to
>> > log directories becoming full which can be found at
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-928%3A+Making+Kafka+resilient+to+log+directories+becoming+full
>> > .
>> >
>> > In summary, I frequently run into problems where Kafka becomes
>> unresponsive
>> > when the disks backing its log directories become full. Such
>> > unresponsiveness generally requires intervention outside of Kafka. I
>> have
>> > found it to be a significantly nicer experience when Kafka maintains
>> > control plane operations and allows you to free up space.
>> >
>> > I am interested in your thoughts and any suggestions for improving the
>> > proposal!
>> >
>> > Best,
>> > Christo
>>
>

Re: [DISCUSS] KIP-928: Making Kafka resilient to log directories becoming full

Posted by Christo Lolov <ch...@gmail.com>.
Hey Colin,

Thanks for the review!

I am also skeptical that much space can be reclaimed via compaction as
detailed in the limitations section of the KIP.

In my head there are two ways to get out of the saturated state - configure
more aggressive retention and delete topics. I wasn't aware that KRaft
deletes topics marked for deletion on startup if the disks occupied by
those partitions are full - I will check it out, thank you for the
information! On the retention side, I believe there is still a benefit in
keeping the broker up and responsive - in my experience, people first try
to reduce the data they have, and only when that also does not work are they
okay with sacrificing all of the data.

Let me know your thoughts!

Best,
Christo

On Fri, 2 Jun 2023 at 20:09, Colin McCabe <cm...@apache.org> wrote:

> Hi Christo,
>
> We're not adding new stuff to ZK at this point (it's deprecated), so it
> would be good to drop that from the design.
>
> With regard to the "saturated" state: I'm skeptical that compaction could
> really move the needle much in terms of freeing up space -- in most
> workloads I've seen, it wouldn't. Compaction also requires free space to
> function.
>
> So the main benefit of the "saturated" state seems to be enabling deletion
> on full disks. But KRaft mode already has most of that benefit. Full disks
> (or, indeed, downed brokers) don't block deletion on KRaft. If you delete a
> topic and then bounce the broker that had the disk full, it will delete the
> topic directory on startup as part of its snapshot load process.
>
> So I'm not sure if we really need this. Maybe we should re-evaluate once
> we have JBOD + KRaft.
>
> best,
> Colin
>
>
> On Mon, May 22, 2023, at 02:23, Christo Lolov wrote:
> > Hello all!
> >
> > I would like to start a discussion on KIP-928: Making Kafka resilient to
> > log directories becoming full which can be found at
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-928%3A+Making+Kafka+resilient+to+log+directories+becoming+full
> > .
> >
> > In summary, I frequently run into problems where Kafka becomes
> unresponsive
> > when the disks backing its log directories become full. Such
> > unresponsiveness generally requires intervention outside of Kafka. I have
> > found it to be a significantly nicer experience when Kafka maintains
> > control plane operations and allows you to free up space.
> >
> > I am interested in your thoughts and any suggestions for improving the
> > proposal!
> >
> > Best,
> > Christo
>

Re: [DISCUSS] KIP-928: Making Kafka resilient to log directories becoming full

Posted by Colin McCabe <cm...@apache.org>.
Hi Christo,

We're not adding new stuff to ZK at this point (it's deprecated), so it would be good to drop that from the design.

With regard to the "saturated" state: I'm skeptical that compaction could really move the needle much in terms of freeing up space -- in most workloads I've seen, it wouldn't. Compaction also requires free space to function.

So the main benefit of the "saturated" state seems to be enabling deletion on full disks. But KRaft mode already has most of that benefit. Full disks (or, indeed, downed brokers) don't block deletion on KRaft. If you delete a topic and then bounce the broker that had the disk full, it will delete the topic directory on startup as part of its snapshot load process.

So I'm not sure if we really need this. Maybe we should re-evaluate once we have JBOD + KRaft.

best,
Colin


On Mon, May 22, 2023, at 02:23, Christo Lolov wrote:
> Hello all!
>
> I would like to start a discussion on KIP-928: Making Kafka resilient to
> log directories becoming full which can be found at
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-928%3A+Making+Kafka+resilient+to+log+directories+becoming+full
> .
>
> In summary, I frequently run into problems where Kafka becomes unresponsive
> when the disks backing its log directories become full. Such
> unresponsiveness generally requires intervention outside of Kafka. I have
> found it to be a significantly nicer experience when Kafka maintains
> control plane operations and allows you to free up space.
>
> I am interested in your thoughts and any suggestions for improving the
> proposal!
>
> Best,
> Christo