Posted to dev@kafka.apache.org by "Matthew Wong (Jira)" <ji...@apache.org> on 2020/02/28 18:33:00 UTC

[jira] [Created] (KAFKA-9624) test_throttled_reassignment as EndToEndTest

Matthew Wong created KAFKA-9624:
-----------------------------------

             Summary: test_throttled_reassignment as EndToEndTest
                 Key: KAFKA-9624
                 URL: https://issues.apache.org/jira/browse/KAFKA-9624
             Project: Kafka
          Issue Type: Bug
          Components: system tests
    Affects Versions: 2.4.1
            Reporter: Matthew Wong
             Fix For: 2.4.1


The test_throttled_reassignment test fails because the consumer used to validate the reassignment does not start in time to consume all messages. This does not appear to be a problem with the reassignment throttling itself, since increasing the timeout allowed the test to pass in multiple consecutive local runs. The test seems to have relied on the console consumer's default JmxTool, which was removed in this commit: [{{179d0d7}}|https://github.com/apache/kafka/commit/179d0d73d65ab2c3eb8bc79c70b9893f07038447]

The console consumer would check whether it had partitions assigned before beginning to consume. Although the test occasionally failed with the JmxTool in place, it began to fail much more often after the removal. The failure messages follow the format below, with varying numbers of missed messages. The missing messages are the first ones sent by the producer.

```
535 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 515 more. Total Acked: 192792, Total Consumed: 192259. We validated that the first 535 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer.
```
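To illustrate how a report of this shape can arise, here is a minimal sketch of an acked-versus-consumed comparison. It is not the actual kafkatest validation code; the function names `missing_acked` and `summarize` and the output wording are hypothetical and chosen only for this example.

```python
def missing_acked(acked, consumed):
    """Return acked messages that never reached the consumer, sorted."""
    consumed_set = set(consumed)
    return sorted(m for m in set(acked) if m not in consumed_set)


def summarize(missing, preview=20):
    """Format a short report listing the first `preview` missing messages."""
    head = ", ".join(str(m) for m in missing[:preview])
    extra = len(missing) - preview
    suffix = "...plus %d more" % extra if extra > 0 else ""
    return "%d acked message(s) missing: %s%s" % (len(missing), head, suffix)


if __name__ == "__main__":
    # A consumer that started late misses the earliest messages.
    acked = list(range(1000))
    consumed = list(range(25, 1000))
    print(summarize(missing_acked(acked, consumed)))
```

When the consumer starts late, the missing set is a contiguous prefix of the acked sequence, which matches the pattern in the error above.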

Within the test, this error suggests that it is hitting the race condition described in produce_consume_validate.py, which uses a timeout to keep the consumer from missing the initial messages. Rewriting this test as an EndToEndTest allows it to use a verifiable consumer that can await partition assignment, which addresses the race condition.
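The race and the proposed fix can be sketched in isolation. The simulation below is only an illustration under stated assumptions: `FakeConsumer` and `wait_until` are hypothetical stand-ins written for this example, not the kafkatest or EndToEndTest classes, and the consumer is assumed to see only messages delivered after it has an assignment.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.01, err_msg="timed out"):
    """Poll condition() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    if not condition():
        raise TimeoutError(err_msg)


class FakeConsumer:
    """Hypothetical consumer that gets its assignment after a fixed delay."""

    def __init__(self, join_delay_sec):
        self._assigned_at = time.monotonic() + join_delay_sec
        self.consumed = []

    def has_assignment(self):
        return time.monotonic() >= self._assigned_at

    def deliver(self, msg):
        # Assumption: messages sent before assignment are never seen.
        if self.has_assignment():
            self.consumed.append(msg)


def run(await_assignment, num_messages=5):
    consumer = FakeConsumer(join_delay_sec=0.2)
    if await_assignment:
        # The fix: block the producer until the consumer is ready.
        wait_until(consumer.has_assignment, timeout_sec=5,
                   err_msg="consumer never received an assignment")
    acked = []
    for i in range(num_messages):
        consumer.deliver(i)
        acked.append(i)
    return acked, consumer.consumed


if __name__ == "__main__":
    print("without waiting:", run(await_assignment=False))
    print("with waiting:   ", run(await_assignment=True))
```

Waiting on an explicit readiness condition removes the dependence on a fixed startup timeout, which is the property the issue attributes to the verifiable consumer.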


