Posted to jira@kafka.apache.org by "Qinghui Xu (Jira)" <ji...@apache.org> on 2022/07/12 15:34:00 UTC

[jira] [Updated] (KAFKA-14071) Kafka request handler threads saturated when moving a partition

     [ https://issues.apache.org/jira/browse/KAFKA-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qinghui Xu updated KAFKA-14071:
-------------------------------
    Description: 
*Kafka version:* 2.7.1

 

*Our scenario:*

Each server has 72 cores and runs around 100 request handler threads and 50 network handler threads.

Many readers (a few hundred) consume from the same topic, and from the same partitions, because they belong to different consumer groups.

Many producers (hundreds) also write to the same topic, with a throughput of around 100 KB/s.

 

*The procedure to reproduce it:*

Move a partition's leader replica to a new broker that was not previously a follower (meaning it holds no data for that partition).
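
The report does not say which tool or plan was used for the move. As a minimal sketch, assuming the stock `kafka-reassign-partitions.sh` tool and its JSON plan format, a plan that puts a new broker first in the replica list could be generated as below. The topic name, partition number, and broker ids are hypothetical placeholders.

```python
import json


def build_reassignment(topic, partition, new_leader, other_replicas):
    """Return a reassignment plan that lists new_leader first in the replica list."""
    if new_leader in other_replicas:
        raise ValueError("new_leader must not already be in other_replicas")
    return {
        "version": 1,
        "partitions": [
            {
                "topic": topic,
                "partition": partition,
                # The first broker id in the list is the preferred leader.
                "replicas": [new_leader] + list(other_replicas),
            }
        ],
    }


# Hypothetical values: topic "events", partition 0, new broker 7,
# remaining replicas on brokers 2 and 3.
plan = build_reassignment("events", 0, new_leader=7, other_replicas=[2, 3])
plan_json = json.dumps(plan, indent=2)
print(plan_json)
```

The resulting JSON can be saved to a file and passed to the tool with `--reassignment-json-file` and `--execute`. Depending on broker settings, leadership may move to the first listed replica only after a preferred leader election.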

 

*Observation:*

All Kafka request handler threads become saturated. Analysis of a thread dump suggests that most of them are trying to read the same log segment file, which requires locking a monitor on a specific object in `sun.nio.ch.FileChannelImpl`.
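
One way to check this kind of observation is to count how many threads in a dump are waiting on the same monitor. The sketch below assumes a HotSpot-style `jstack` dump, where a blocked thread's stack contains a `- waiting to lock <address> (a Class)` line. The sample text is synthetic and is not taken from the dump analysed in this report; thread names and addresses are made up.

```python
import re
from collections import Counter

# Matches lines such as: - waiting to lock <0x00000000c0ffee00> (a java.lang.Object)
WAITING = re.compile(r"- waiting to lock <(0x[0-9a-fA-F]+)> \(a ([^)]+)\)")


def count_lock_waiters(dump_text):
    """Count waiting threads per (monitor address, monitor class) pair."""
    return Counter(WAITING.findall(dump_text))


# Synthetic dump excerpt (hypothetical thread names and addresses).
SAMPLE = """
"handler-0" #101 daemon prio=5 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
    - waiting to lock <0x00000000c0ffee00> (a java.lang.Object)

"handler-1" #102 daemon prio=5 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
    - waiting to lock <0x00000000c0ffee00> (a java.lang.Object)

"handler-2" #103 daemon prio=5 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
    - waiting to lock <0x00000000c0ffee00> (a java.lang.Object)

"handler-3" #104 daemon prio=5 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
    - waiting to lock <0x00000000deadbeef> (a java.lang.Object)
"""

waiters = count_lock_waiters(SAMPLE)
for (address, cls), count in waiters.most_common():
    print(f"{count} thread(s) waiting on {address} ({cls})")
```

If a single address accounts for most of the request handler threads, that is consistent with contention on one shared object, although the dump alone does not show which thread holds the monitor or for how long.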

 

*Other remarks:*

The problem does not reproduce with a simple leadership transition between existing replicas. For example, shutting down the leader broker, or moving leadership to another follower with the Kafka assignment script, works fine.


> Kafka request handler threads saturated when moving a partition
> ---------------------------------------------------------------
>
>                 Key: KAFKA-14071
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14071
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Qinghui Xu
>            Priority: Major
>
> *Kafka version:* 2.7.1
>  
> *Our scenario:*
> Each server has 72 cores and runs around 100 request handler threads and 50 network handler threads.
> Many readers (a few hundred) consume from the same topic, and from the same partitions, because they belong to different consumer groups.
> Many producers (hundreds) also write to the same topic, with a throughput of around 100 KB/s.
>  
> *The procedure to reproduce it:*
> Move a partition's leader replica to a new broker that was not previously a follower (meaning it holds no data for that partition).
>  
> *Observation:*
> All Kafka request handler threads become saturated. Analysis of a thread dump suggests that most of them are trying to read the same log segment file, which requires locking a monitor on a specific object in `sun.nio.ch.FileChannelImpl`.
>  
> *Other remarks:*
> The problem does not reproduce with a simple leadership transition between existing replicas. For example, shutting down the leader broker, or moving leadership to another follower with the Kafka assignment script, works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)