Posted to jira@kafka.apache.org by "Lucas Bradstreet (Jira)" <ji...@apache.org> on 2020/06/29 15:02:00 UTC

[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

    [ https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147861#comment-17147861 ] 

Lucas Bradstreet commented on KAFKA-10158:
------------------------------------------

I think the problem here is subtly different from the one diagnosed. The test is attempting to show that an in-progress reassignment does not show up as under-replicated partitions. The problem is that the reassignment may be completing as the `--under-replicated-partitions` check is taking place, and unfortunately there is some inconsistency in how the command line tool performs this check. It first performs a topic describe and then looks up whether a reassignment is in progress. As these two lookups are performed at different times, the tool can see the topic describe result from while the reassignment was still in progress, and then find no reassignment in progress immediately afterwards.
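To make the interleaving concrete, here is a minimal toy model of the two-step check. It does not use any Kafka classes: `PartitionView`, `Cluster`, and `reportedAsUnderReplicated` are hypothetical names, and the filter rule is an assumption made only for illustration, not the tool's actual logic.

```scala
// Toy model only: none of these names come from Kafka itself.
final case class PartitionView(replicas: Set[Int], isr: Set[Int])

object DescribeRace {
  // Hypothetical filter: report a partition as under-replicated unless a
  // reassignment is known to be in progress for it.
  def reportedAsUnderReplicated(view: PartitionView, reassignmentInProgress: Boolean): Boolean =
    view.isr.size < view.replicas.size && !reassignmentInProgress

  // Mutable stand-in for cluster state, read through two separate calls.
  final class Cluster {
    // Broker 1 is being added and has not joined the ISR yet.
    private var view = PartitionView(replicas = Set(0, 1), isr = Set(0))
    private var reassigning = true

    def describe(): PartitionView = view
    def reassignmentInProgress(): Boolean = reassigning

    def completeReassignment(): Unit = {
      view = PartitionView(replicas = Set(1), isr = Set(1))
      reassigning = false
    }
  }

  // Performs the two reads in order; `between` models whatever happens in the
  // cluster between the describe call and the reassignment lookup.
  def check(cluster: Cluster, between: () => Unit): Boolean = {
    val view = cluster.describe()
    between()
    val inProgress = cluster.reassignmentInProgress()
    reportedAsUnderReplicated(view, inProgress)
  }
}
```

If nothing happens between the two reads, the partition is filtered out as expected. If the reassignment completes in between, the stale describe result is combined with the fresh "no reassignment" answer and the partition is reported.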

This inconsistency would not be a problem for this test if there weren't a second problem. We set a replication throttle:
{noformat}
TestUtils.setReplicationThrottleForPartitions(adminClient, brokerIds, Set(tp), throttleBytes = 1)
{noformat}
However, the throttle does not become effective because of the way the check works in the replica fetcher:
{noformat}
!fetchState.isReplicaInSync && quota.isThrottled(topicPartition) && quota.isQuotaExceeded
{noformat}
The first fetch causes the follower to think it's in sync, so on the follow-up fetch fetchState.isReplicaInSync returns true, which means the follower does not throttle itself.
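The effect of that predicate can be sketched in isolation. This is a simplified stand-in, not the replica fetcher's code: `FetchState`, `Quota`, and `shouldFollowerThrottle` are hypothetical, and `isThrottled` is modelled as a plain flag rather than a per-partition lookup.

```scala
// Simplified stand-ins for illustration; not Kafka's actual classes.
final case class FetchState(isReplicaInSync: Boolean)
final case class Quota(isThrottled: Boolean, isQuotaExceeded: Boolean)

object ThrottleCheck {
  // Same shape as the condition quoted above: all three terms must hold
  // for the follower to throttle itself.
  def shouldFollowerThrottle(fetchState: FetchState, quota: Quota): Boolean =
    !fetchState.isReplicaInSync && quota.isThrottled && quota.isQuotaExceeded
}
```

With the quota both configured and exceeded, a follower that is out of sync throttles, but one that already considers itself in sync does not, regardless of how small `throttleBytes` is.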

> Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-10158
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10158
>             Project: Kafka
>          Issue Type: Bug
>          Components: unit tests
>            Reporter: Chia-Ping Tsai
>            Assignee: Chia-Ping Tsai
>            Priority: Minor
>             Fix For: 2.7.0
>
>
> Altering the assignments is an asynchronous request, so it is possible that the reassignment is still in progress when we start to verify the "under-replicated-partitions". To make the test stable, it needs to wait for the reassignment to complete before verifying the topic command with "under-replicated-partitions".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)