You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Greg Zoller (JIRA)" <ji...@apache.org> on 2016/04/16 21:14:25 UTC

[jira] [Created] (KAFKA-3569) commitAsync() sometimes fails with errors

Greg Zoller created KAFKA-3569:
----------------------------------

             Summary: commitAsync() sometimes fails with errors
                 Key: KAFKA-3569
                 URL: https://issues.apache.org/jira/browse/KAFKA-3569
             Project: Kafka
          Issue Type: Bug
          Components: clients
    Affects Versions: 0.9.0.1
         Environment: MacOS Docker
            Reporter: Greg Zoller


I have a KafkaConsumer instance I've wrapped in a thread, which communicates with the outside (multi-threaded) world via a blocking queue.  Code is here:

https://gist.github.com/gzoller/93fe2392fd3606bcb3b879e4ab2f8f6e

I'm not worried about batch commits at this point--I want to understand single-message commit behavior first.  If I commitSync() a single message it is "slow" but is consistent--doesn't drop commits.

If I use commitAsync() its "fast" but I get flakey results--it drops commits, even for small numbers.

I pre-loaded a 4-partition topic with 12 messages--3 per partition.  Then I use this code across 2 consumers (each with their own instance of this class, hence their own thread).  One consumer winds up listening on 2 partitions and the other on the remaining 2.  

Read logs confirm the poll() behavior/content is working as expected for the 2 consumers, meaning each of the 2 consumers is successfully seeing (and only seeing) messages from their respectively assigned partitions.

Some of the 12 messages committed fine, while others report errors like this one in the callback:

ERROR [{lowercaseStrings-2=OffsetAndMetadata{offset=1, metadata=''}}]:  org.apache.kafka.clients.consumer.internals.SendFailedExceptionERROR

My final offsets after my test run of 12:

GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
group1, lowercaseStrings, 0, 2, 3, 1, consumer-1_/192.168.99.1
group1, lowercaseStrings, 1, unknown, 3, unknown, consumer-1_/192.168.99.1
group1, lowercaseStrings, 2, unknown, 3, unknown, consumer-2_/192.168.99.1
group1, lowercaseStrings, 3, 2, 3, 1, consumer-2_/192.168.99.1

The "missing" offsets correspond to the ones that produced errors, so all messages are accounted for, either by success or by error.

At high volumes the behavior is the same.  Over 1 million messages I'll drop 30K-60K of them due to these same kinds of errors, while the other commit successfully.  The speed difference is profound, though!  commitSync() takes several minutes for 1m, but drops none.  commitAsync() takes maybe 5 seconds with losses.

I noted there has been some work done in this area in 0.10.1.0 (for example SendFailedException doesn't seem to be in the code anymore) and was eager to see if the problem persists, but I'm having KafkaProducer problems in 0.10.1.0 and haven't been able to see if this behavior remains or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)