Posted to jira@kafka.apache.org by "Chia-Ping Tsai (Jira)" <ji...@apache.org> on 2020/12/02 14:54:00 UTC

[jira] [Commented] (KAFKA-10786) ReplicaAlterLogDirsThread gets stuck during the reassignment of Kafka partition

    [ https://issues.apache.org/jira/browse/KAFKA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242411#comment-17242411 ] 

Chia-Ping Tsai commented on KAFKA-10786:
----------------------------------------

Is this similar to https://issues.apache.org/jira/browse/KAFKA-9654? Could you upgrade Kafka to a version that includes that fix and then test again?

>  ReplicaAlterLogDirsThread gets stuck during the reassignment of Kafka partition
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-10786
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10786
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 2.0.0
>            Reporter: nick song
>            Priority: Blocker
>         Attachments: attachment 1.png, attachment 2.png, attachment 3.png
>
>
> Topic config: Configs for topic 'athena_8603' are leader.replication.throttled.replicas=9:7,9:6,10:8,10:7,8:6,8:5,11:9,11:8,follower.replication.throttled.replicas=9:13,10:0,8:15,11:14,retention.ms=86400000,delete.retention.ms=60000
>  
> Reassignment of replica athena_8603-1-15 is still in progress
>  
> While reassigning the topic's partitions, I found that some reassignment tasks had been in progress for more than ten hours. Investigation showed that ReplicaAlterLogDirsThread was running the whole time and using a lot of CPU (Attachment 1).
> The thread information (Attachment 2) shows that log data is being copied. The log directory (Attachment 3) shows that the index in the future directory is older than that of the original log. Could the setting delete.retention.ms=60000 have caused data to be deleted while it was being copied, leaving the replication thread stuck? Is there any solution?
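
The topic config line quoted above packs several settings into one comma-separated string, which makes it hard to see which replicas are throttled. The following is a minimal sketch, not part of Kafka, that splits that string into separate settings and reads each throttled entry as a partition:broker pair. The function names parse_topic_configs and throttled_pairs are hypothetical helpers introduced only for illustration, and the sketch assumes that every token containing "=" starts a new setting.

```python
# The config string exactly as it appears in the issue description.
RAW = (
    "leader.replication.throttled.replicas=9:7,9:6,10:8,10:7,8:6,8:5,11:9,11:8,"
    "follower.replication.throttled.replicas=9:13,10:0,8:15,11:14,"
    "retention.ms=86400000,delete.retention.ms=60000"
)


def parse_topic_configs(raw):
    """Split the flattened config string into {setting: [values]}.

    Assumption: a token containing '=' starts a new setting; any other
    token is a further value of the most recent setting.
    """
    configs = {}
    key = None
    for token in raw.split(","):
        if "=" in token:
            key, value = token.split("=", 1)
            configs[key] = [value] if value else []
        elif key is not None:
            configs[key].append(token)
    return configs


def throttled_pairs(values):
    """Turn ['9:13', '10:0'] into [(9, 13), (10, 0)] as (partition, broker)."""
    return [tuple(int(part) for part in value.split(":")) for value in values]


if __name__ == "__main__":
    configs = parse_topic_configs(RAW)
    for name in ("leader.replication.throttled.replicas",
                 "follower.replication.throttled.replicas"):
        for partition, broker in throttled_pairs(configs[name]):
            print(f"{name}: partition {partition} on broker {broker}")
    print("retention.ms =", configs["retention.ms"][0])
    print("delete.retention.ms =", configs["delete.retention.ms"][0])
```

Listing the settings this way makes it easier to compare the throttled partition and broker pairs against the replica reported as still in progress.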



--
This message was sent by Atlassian Jira
(v8.3.4#803005)