Posted to dev@kafka.apache.org by "Apurva Mehta (JIRA)" <ji...@apache.org> on 2017/06/07 06:30:19 UTC

[jira] [Created] (KAFKA-5396) Consumer reading from beginning of log can read the same message multiple times.

Apurva Mehta created KAFKA-5396:
-----------------------------------

             Summary: Consumer reading from beginning of log can read the same message multiple times.
                 Key: KAFKA-5396
                 URL: https://issues.apache.org/jira/browse/KAFKA-5396
             Project: Kafka
          Issue Type: Bug
            Reporter: Apurva Mehta


I noticed this when running the transactions system test with hard broker bounces. We have a consumer in READ_COMMITTED mode reading from the tail of the log as the writes are appended.
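
For anyone reproducing the setup, a READ_COMMITTED consumer that reads a topic from the beginning can be configured roughly as follows. This is a minimal sketch that assumes string keys and values, a placeholder broker address and a hypothetical group id. The system test's actual settings are not shown in this report. `isolation.level` is the consumer setting that selects READ_COMMITTED mode.

```java
import java.util.Properties;

class ReadCommittedConfig {
    // Builds consumer settings for a READ_COMMITTED consumer.
    // The broker address and group id passed in are placeholders (assumptions),
    // not the values used by the transactions system test.
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        // Return only non-transactional records and records from committed transactions.
        props.put("isolation.level", "read_committed");
        // Start from the beginning of the log when there is no committed offset.
        props.put("auto.offset.reset", "earliest");
        // Assumption: string keys and values.
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092", "hypothetical-verifier-group");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```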

This test failed once because the concurrent consumer returned duplicate data. The log itself contains no duplicates, so the problem is in the consumer.
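
Because offsets within a partition are unique in the log, a consumer that hands the application the same (partition, offset) pair twice has redelivered data. The check can be approximated as below. This is a minimal sketch: the `Consumed` type and `findDuplicates` method are hypothetical, and the system test's real duplicate check is not shown here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DuplicateCheck {
    // A consumed record identified by partition and offset (hypothetical type).
    static final class Consumed {
        final String partition;
        final long offset;

        Consumed(String partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    // Returns every "partition@offset" key delivered more than once,
    // in the order in which each repeat was first observed.
    static List<String> findDuplicates(List<Consumed> consumed) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (Consumed c : consumed) {
            String key = c.partition + "@" + c.offset;
            if (!seen.add(key)) {
                repeated.add(key);
            }
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        List<Consumed> consumed = List.of(
            new Consumed("output-topic-1", 250),
            new Consumed("output-topic-1", 251),
            new Consumed("output-topic-1", 250)); // redelivered
        System.out.println(findDuplicates(consumed)); // prints [output-topic-1@250]
    }
}
```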

One of the duplicated values is '0', at offset 250 in output-topic-1. The first time it is read, we see the following:

{noformat}
[2017-06-07 05:50:34,601] TRACE Returning fetched records at offset 0 for assigned partition output-topic-0 and update position to 250 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:50:34,602] TRACE Preparing to read 2967 bytes of data for partition output-topic-1 with offset 250 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:50:34,602] TRACE Updating high watermark for partition output-topic-1 to 502 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:50:34,613] TRACE Returning fetched records at offset 250 for assigned partition output-topic-1 and update position to 500 (org.apache.kafka.clients.consumer.internals.Fetcher)
{noformat}

The next time it is read, we see this:
{noformat}
[2017-06-07 05:51:36,386] TRACE Preparing to read 169858 bytes of data for partition output-topic-1 with offset 0 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:51:36,389] TRACE Updating high watermark for partition output-topic-1 to 13053 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:51:36,391] TRACE Returning fetched records at offset 0 for assigned partition output-topic-1 and update position to 500 (org.apache.kafka.clients.consumer.internals.Fetcher)
{noformat}
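
The regression is visible directly in the TRACE output: for output-topic-1, the position was set to 500, and later records were returned starting at offset 0. A scan over the "Returning fetched records" lines can flag this automatically. This is a minimal sketch; the class and method names are hypothetical, and it only sees what is logged, so a legitimate seek or offset reset between two lines would also be reported.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PositionRegressionScan {
    // Matches the Fetcher TRACE lines quoted in this report.
    private static final Pattern RETURNING = Pattern.compile(
        "Returning fetched records at offset (\\d+) for assigned partition (\\S+) and update position to (\\d+)");

    // Reports each line where records are returned from an offset below the
    // position set by the previous matching line for the same partition.
    static List<String> scan(List<String> lines) {
        Map<String, Long> position = new HashMap<>();
        List<String> regressions = new ArrayList<>();
        for (String line : lines) {
            Matcher m = RETURNING.matcher(line);
            if (!m.find()) {
                continue; // not a "Returning fetched records" line
            }
            long fetchOffset = Long.parseLong(m.group(1));
            String partition = m.group(2);
            long newPosition = Long.parseLong(m.group(3));
            Long previous = position.get(partition);
            if (previous != null && fetchOffset < previous) {
                regressions.add(partition + ": returned records from offset " + fetchOffset
                    + " while position was " + previous);
            }
            position.put(partition, newPosition);
        }
        return regressions;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "[2017-06-07 05:50:34,601] TRACE Returning fetched records at offset 0 for assigned partition output-topic-0 and update position to 250 (org.apache.kafka.clients.consumer.internals.Fetcher)",
            "[2017-06-07 05:50:34,613] TRACE Returning fetched records at offset 250 for assigned partition output-topic-1 and update position to 500 (org.apache.kafka.clients.consumer.internals.Fetcher)",
            "[2017-06-07 05:51:36,386] TRACE Preparing to read 169858 bytes of data for partition output-topic-1 with offset 0 (org.apache.kafka.clients.consumer.internals.Fetcher)",
            "[2017-06-07 05:51:36,391] TRACE Returning fetched records at offset 0 for assigned partition output-topic-1 and update position to 500 (org.apache.kafka.clients.consumer.internals.Fetcher)");
        // prints: output-topic-1: returned records from offset 0 while position was 500
        scan(lines).forEach(System.out::println);
    }
}
```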

For some reason, the fetcher returned the data from offset 0 again and reset the position to 500.

This is a plain consumer calling poll() in a loop until it is killed, so this position reset is puzzling.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)