You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Ewen Cheslack-Postava (JIRA)" <ji...@apache.org> on 2016/04/08 19:33:25 UTC

[jira] [Commented] (KAFKA-3374) Failure in security rolling upgrade phase 2 system test

    [ https://issues.apache.org/jira/browse/KAFKA-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232533#comment-15232533 ] 

Ewen Cheslack-Postava commented on KAFKA-3374:
----------------------------------------------

Another case of this here: http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-03-08--001.1457454171--apache--trunk--f6e35de/report.html

In both of the test_log.debug files, right before the failure this is logged:

{quote}
Time delta between successively acked messages is large: delta_t_sec: 8.61418795586, prev_message: {u'name': u'producer_send_success', u'time_ms': 14601149
70682, u'partition': 0, u'value': u'15721', u'topic': u'test_topic', u'key': None, u'offset': 2958, u'class': u'class org.apache.kafka.tools.VerifiableProducer'}, current_message: {u'name': u'producer_send_success', u'time_ms': 1460114
979295, u'partition': 2, u'value': u'15716', u'topic': u'test_topic', u'key': None, u'offset': 2774, u'class': u'class org.apache.kafka.tools.VerifiableProducer'}
{quote}

And the consumer times out. It looks like there's some kind of stall on the producer side that then causes the consumer to hit the limit we put in place. It's 8s in last night's run, but was 21s in the older failure. This could just be a matter of timeouts (I don't know the details of this test well enough to say), but there might also be some underlying cause for the stall in the producer.

> Failure in security rolling upgrade phase 2 system test
> -------------------------------------------------------
>
>                 Key: KAFKA-3374
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3374
>             Project: Kafka
>          Issue Type: Test
>          Components: system tests
>            Reporter: Ismael Juma
>            Assignee: Ben Stopford
>            Priority: Critical
>             Fix For: 0.10.1.0
>
>
> [~geoffra] reported the following a few days ago.
> Seeing fairly consistent failures in
> "Module: kafkatest.tests.security_rolling_upgrade_test
> Class:  TestSecurityRollingUpgrade
> Method: test_rolling_upgrade_phase_two
> Arguments:
> {
>   "broker_protocol": "SASL_PLAINTEXT",
>   "client_protocol": "SASL_SSL"
> }
> Last successful run (git hash): 2a58ba9
> First failure: f7887bd
> (note failures are not 100% consistent, so there's non-zero chance the commit that introduced the failure is prior to 2a58ba9)
> See for example:
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-03-08--001.1457454171--apache--trunk--f6e35de/report.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)