Posted to dev@kafka.apache.org by "shenwenbing (Jira)" <ji...@apache.org> on 2020/11/02 02:53:00 UTC

[jira] [Created] (KAFKA-10672) Restarting Kafka always takes a lot of time

shenwenbing created KAFKA-10672:
-----------------------------------

             Summary: Restarting Kafka always takes a lot of time
                 Key: KAFKA-10672
                 URL: https://issues.apache.org/jira/browse/KAFKA-10672
             Project: Kafka
          Issue Type: Improvement
          Components: core
    Affects Versions: 2.0.0
         Environment: A cluster of 21 Kafka nodes;
Each node has 12 disks;
Each node has about 1500 partitions;
There are approximately 700 leader partitions per node;
Slow-loading partitions have about 1000 log segments;
            Reporter: shenwenbing
         Attachments: server.log

When the producer state snapshot file does not exist, or the latest snapshot predates the current active segment, restoring producer state traverses the log segments and reads every batch in them. When a single broker hosts many partitions, and those partitions have many log segments, this causes a very large number of I/O operations.

Each read loads only one batch, and a single log segment usually contains tens of thousands of batches. Looking at the code, I found that each batch requires at least two I/O operations. With the default batch size of 16 KB and a log segment of 1 GB, a segment holds 65,536 batches, which means at least 65,536 * 2 = 131,072 I/O operations per segment. As a result, the Kafka startup process takes a very long time.

We configured 15 log recovery threads in our production environment, and it still took more than 2 hours to load a single partition. Could the community suggest how to handle this situation, or improve the recovery path? For detailed logs, see the entries for the test-perf-18 partition in the attached server.log.
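
The arithmetic above can be reproduced with a short back-of-the-envelope script. This is only a sketch and is not taken from the Kafka code base: the function name estimate_reads is made up for illustration, batches are assumed to be exactly 16 KiB each, and the per-partition figure assumes that all of the roughly 1000 segments of a slow-loading partition have to be scanned, which may not hold in practice.

```python
# Rough estimate of the read operations needed to rebuild producer state,
# using the figures quoted in this report. Illustrative only.

SEGMENT_BYTES = 1024 ** 3   # 1 GiB log segment
BATCH_BYTES = 16 * 1024     # 16 KiB per batch (assumed uniform)
READS_PER_BATCH = 2         # lower bound reported in this issue


def estimate_reads(segment_bytes, batch_bytes, reads_per_batch=READS_PER_BATCH):
    """Return (batches, reads) for one segment filled with equal-sized batches."""
    batches = segment_bytes // batch_bytes
    return batches, batches * reads_per_batch


batches, reads = estimate_reads(SEGMENT_BYTES, BATCH_BYTES)
print(f"batches per segment: {batches}")   # 65536
print(f"reads per segment:   {reads}")     # 131072

# Assumption: every one of ~1000 segments in a slow partition is scanned.
SEGMENTS_PER_PARTITION = 1000
print(f"reads per partition: {reads * SEGMENTS_PER_PARTITION}")  # 131072000
```

Even as a lower bound, this suggests on the order of a hundred million small reads for one such partition, which is consistent with recovery time being dominated by per-batch I/O rather than by the number of recovery threads.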



--
This message was sent by Atlassian Jira
(v8.3.4#803005)