Posted to dev@kafka.apache.org by "Ismael Juma (JIRA)" <ji...@apache.org> on 2016/05/17 14:30:13 UTC

[jira] [Updated] (KAFKA-3488) commitAsync() fails if metadata update creates new SASL/SSL connection

     [ https://issues.apache.org/jira/browse/KAFKA-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma updated KAFKA-3488:
-------------------------------
    Fix Version/s:     (was: 0.10.1.0)

> commitAsync() fails if metadata update creates new SASL/SSL connection
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-3488
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3488
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.9.0.1
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>             Fix For: 0.10.0.0
>
>
> Sasl/SslConsumerTest.testSimpleConsumption() fails intermittently with a failure in {{commitAsync()}}. The exception stack trace shows:
> {quote}
> kafka.api.SaslPlaintextConsumerTest.testSimpleConsumption FAILED
> java.lang.AssertionError: expected:<1> but was:<0>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:834)
> 	at org.junit.Assert.assertEquals(Assert.java:645)
> 	at org.junit.Assert.assertEquals(Assert.java:631)
> 	at kafka.api.BaseConsumerTest.awaitCommitCallback(BaseConsumerTest.scala:340)
> 	at kafka.api.BaseConsumerTest.testSimpleConsumption(BaseConsumerTest.scala:85)
> {quote}
> I have reproduced this with some additional tracing. The tests run with a very small metadata expiry interval, which triggers metadata updates frequently. If a metadata request immediately following a {{commitAsync()}} call creates a new SSL/SASL connection, {{ConsumerNetworkClient.poll}} returns to process the connection handshake packets. Since {{ConsumerNetworkClient.poll}} discards all unsent packets before returning, the commit can fail: the callback is invoked with {{SendFailedException}}.
> I understand that {{ConsumerNetworkClient.poll()}} discards unsent packets rather than buffering them to keep the code simple. And perhaps it is OK for {{commitAsync()}} to fail occasionally, since the callback does indicate that the caller should retry. But it feels like an unnecessary limitation: it requires error handling in client applications when there are no real failures, and it makes reliable testing much harder. Since the special handling added to fix issues like KAFKA-3412 and KAFKA-2672 adds more complexity to the code anyway, and since failures that affect only SSL/SASL are much harder to debug, it may be worth considering improving this behaviour.
> I will see if I can submit a PR for the specific issue I was seeing (the impact of handshakes on {{commitAsync()}}), but I would be interested in views on improving the logic in {{ConsumerNetworkClient}}.
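
The difference between discarding and buffering unsent requests, as described in the issue above, can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyNetworkClient, finishHandshake, failedCommits and the keepUnsent flag are hypothetical names invented for illustration, and IllegalStateException stands in for the send failure. None of this is the actual ConsumerNetworkClient code or the fix that was applied.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Toy model only: ToyNetworkClient and its methods are hypothetical, not Kafka APIs.
final class ToyNetworkClient {
    private static final class Pending {
        final String request;
        final Consumer<Exception> callback; // receives null on success

        Pending(String request, Consumer<Exception> callback) {
            this.request = request;
            this.callback = callback;
        }
    }

    private final Deque<Pending> unsent = new ArrayDeque<>();
    private final boolean keepUnsent;         // true = buffer, false = discard
    private boolean connectionReady = false;  // false while the handshake is in progress
    final List<String> sent = new ArrayList<>();

    ToyNetworkClient(boolean keepUnsent) {
        this.keepUnsent = keepUnsent;
    }

    void finishHandshake() {
        connectionReady = true;
    }

    void send(String request, Consumer<Exception> callback) {
        unsent.add(new Pending(request, callback));
    }

    // One poll pass: send if the connection is usable, otherwise buffer or discard.
    void poll() {
        if (!connectionReady && keepUnsent) {
            return; // buffering: leave requests queued for a later poll
        }
        // Snapshot first so callbacks may safely enqueue new requests.
        List<Pending> batch = new ArrayList<>(unsent);
        unsent.clear();
        for (Pending p : batch) {
            if (connectionReady) {
                sent.add(p.request);
                p.callback.accept(null);
            } else {
                // Discarding: the caller sees a failure although nothing is really wrong.
                p.callback.accept(new IllegalStateException("send failed: connection not ready"));
            }
        }
    }

    // Runs commit -> poll (handshake in progress) -> poll (ready); returns failed callbacks.
    static int failedCommits(boolean keepUnsent) {
        ToyNetworkClient client = new ToyNetworkClient(keepUnsent);
        int[] failures = {0};
        client.send("OffsetCommit", error -> {
            if (error != null) {
                failures[0]++;
            }
        });
        client.poll();            // returns while the handshake is still in progress
        client.finishHandshake();
        client.poll();            // connection is now usable
        return failures[0];
    }

    public static void main(String[] args) {
        System.out.println("discard unsent: failed commits = " + failedCommits(false));
        System.out.println("keep unsent:    failed commits = " + failedCommits(true));
    }
}
```

In this model the discarding variant reports a failed commit even though the connection becomes usable one poll later, while the buffering variant delivers the same request without involving the caller. The trade-off is the one the description mentions: buffering needs extra state in the client, whereas discarding pushes retry logic into every application callback.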
