Posted to dev@kafka.apache.org by "Ewen Cheslack-Postava (JIRA)" <ji...@apache.org> on 2016/12/23 17:14:58 UTC

[jira] [Commented] (KAFKA-4558) throttling_test fails if the producer starts too fast.

    [ https://issues.apache.org/jira/browse/KAFKA-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15773299#comment-15773299 ] 

Ewen Cheslack-Postava commented on KAFKA-4558:
----------------------------------------------

There's another case that looks basically the same as this issue:

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-23--001.1482484603--apache--trunk--76169f9/report.html

{quote}
test_id:    kafkatest.tests.core.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_SSL.failure_mode=hard_bounce.broker_type=controller
status:     FAIL
run time:   3 minutes 27.556 seconds


    9 acked message did not make it to the Consumer. They are: [3425, 3428, 3404, 3407, 3410, 3413, 3416, 3419, 3422]. We validated that the first 9 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer.
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
    data = self.run_test()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
    return self.test_context.function(self.test)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/replication_test.py", line 155, in test_replication_with_broker_failure
    self.run_produce_consume_validate(core_test_action=lambda: failures[failure_mode](self, broker_type))
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 101, in run_produce_consume_validate
    self.validate()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 163, in validate
    assert success, msg
AssertionError: 9 acked message did not make it to the Consumer. They are: [3425, 3428, 3404, 3407, 3410, 3413, 3416, 3419, 3422]. We validated that the first 9 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer.
{quote}

The missing IDs fall in the middle of the set and, once sorted, are all exactly 3 apart (3404 through 3428). That is presumably because the topic has 3 partitions and we are seeing a piece of one partition go missing rather than pieces of all 3. I think this is probably fairly pervasive in the ProduceConsumeValidate tests, so it may not be "fixable" just by ignoring individual tests one-off.
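
A quick way to check the spacing is the sketch below. It only inspects the IDs from the assertion message; the idea that a message's ID modulo 3 corresponds to one partition is an assumption (it holds only if messages are spread round-robin over the 3 partitions), and NUM_PARTITIONS is a name introduced here for illustration.

```python
# Missing message IDs copied from the assertion error quoted above.
missing = [3425, 3428, 3404, 3407, 3410, 3413, 3416, 3419, 3422]

# Assumption: 3 partitions, with sequential IDs distributed round-robin,
# so all IDs sharing a residue mod 3 would land in the same partition.
NUM_PARTITIONS = 3

ordered = sorted(missing)

# Distinct differences between neighbouring IDs after sorting.
gaps = {b - a for a, b in zip(ordered, ordered[1:])}

# Distinct residues; a single value means a single residue class is affected.
residues = {m % NUM_PARTITIONS for m in missing}

print("sorted:", ordered)
print("gaps between neighbours:", gaps)
print("residues mod %d:" % NUM_PARTITIONS, residues)
```

If the gaps collapse to a single value equal to the partition count and only one residue shows up, the loss is one contiguous run from what would be a single partition under that assumption.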

> throttling_test fails if the producer starts too fast.
> ------------------------------------------------------
>
>                 Key: KAFKA-4558
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4558
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Apurva Mehta
>
> As described in https://issues.apache.org/jira/browse/KAFKA-4526, the throttling test will fail if the producer in the produce-consume-validate loop starts up before the consumer is fully initialized.
> We need to block the start of the producer until the consumer is ready to go. 
> The current plan is to poll the consumer for a particular metric (like, for instance, partition assignment) which will act as a good proxy for successful initialization. Currently, we just check for the existence of a process with the PID, which is not a strong enough check, causing the test to fail intermittently. 
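
The "block the producer until the consumer is ready" idea in the description above can be sketched roughly as follows. This is a minimal illustration, not the kafkatest implementation: wait_until_ready, FakeConsumer, and assigned_partitions are hypothetical names, and the fake consumer simply reports an assignment after a few polls. It assumes Python 3.

```python
import time


def wait_until_ready(is_ready, timeout_sec=60.0, backoff_sec=0.5):
    """Poll is_ready() until it returns True, or raise after timeout_sec."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if is_ready():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("consumer was not ready after %s seconds" % timeout_sec)
        time.sleep(backoff_sec)


class FakeConsumer:
    """Hypothetical stand-in for a consumer service.

    It reports no partitions for the first few polls, then a full assignment,
    mimicking a consumer whose process exists before it has joined the group.
    """

    def __init__(self, polls_until_assigned):
        self._remaining = polls_until_assigned

    def assigned_partitions(self):
        if self._remaining > 0:
            self._remaining -= 1
            return []
        return [0, 1, 2]


consumer = FakeConsumer(polls_until_assigned=3)

# Readiness is judged by partition assignment, not by the process existing.
wait_until_ready(lambda: len(consumer.assigned_partitions()) > 0,
                 timeout_sec=5, backoff_sec=0.01)

# Only now would the test start the producer.
producer_started = True
print("producer started:", producer_started)
```

The point of the sketch is the readiness predicate: a PID check would pass immediately, while the assignment check holds the producer back until the consumer can actually receive messages.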



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)