Posted to users@kafka.apache.org by Oleg Zhurakousky <oz...@hortonworks.com> on 2016/08/29 13:54:08 UTC

Kafka slow consumer

Hi all

I have a question about the scenario where a consumer that is consuming Kafka records is not very fast (regardless of the reason). Yes, I know about certain configuration properties on both the broker and the consumer that help mitigate the effects, so I simply want to confirm that what I am seeing is the expected behavior. Here it is:

- Kafka topic with a single partition. 
- 3 consumers, which means that one will be consuming records while the other two are essentially on stand-by
- The first consumer doesn't manage to process all returned records within 'session.timeout.ms' and the broker chooses a different consumer (rebalancing). The first consumer is unaware (until the next poll() or commit() call) that it has been blacklisted and continues processing the remaining records; consumer.commitSync() begins to fail (rightfully so) and the first consumer is now out of the picture (roughly the loop sketched after this list). . .
- The second consumer starts processing records and the process resumes
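
To make the scenario concrete, here is roughly the kind of poll/process/commit loop I have in mind. This is just a sketch, not my actual code; the broker address, group id, and topic name are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SlowConsumerLoop {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "slow-group");              // placeholder group id
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500)); // poll(long) on older clients
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // if the whole batch takes too long, the group
                                     // decides this consumer is gone and rebalances
                }
                try {
                    consumer.commitSync(); // starts failing once the partition has been
                                           // handed to another member of the group
                } catch (CommitFailedException e) {
                    // this consumer is out of the picture; records it just processed
                    // may be re-delivered to whoever took over the partition
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // slow per-record work goes here
    }
}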

However, the second consumer picks up a few records that had already been processed successfully by the first consumer (i.e., commitSync() was executed successfully for them). So let's say the first consumer processed 0, 1, 2, 3, 4, 5 and the second consumer starts with 4, 5, 6, 7, 8. . . so 4 and 5 become duplicates.

I suspect that there is a synchronization gap: during a consumer rebalance, some offsets that were "just committed" by the blacklisted consumer are not known to the next consumer chosen by the broker, allowing it to read those records again and produce duplicates.

Is my assumption correct? I can reproduce it in many different ways and I can also fix it in many different ways (one such mitigation is sketched below), so it's not a huge problem, but I am just trying to understand exactly what's happening and whether this is the expected behavior. After all, Kafka guarantees "at-least-once" but not "exactly-once", which would make sense here. Duplicates are always better than data loss.
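
For completeness, the kind of fix I mean is committing explicit, per-record offsets from a rebalance listener so the next owner starts from whatever was actually finished. This is only a rough sketch (the bookkeeping is simplified), and it only narrows the window for a consumer that is slow but still in the group; if the consumer has already been evicted, this commit fails just like the one in the poll loop:

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Register with consumer.subscribe(Collections.singletonList("my-topic"), listener)
public class CommitOnRevoke implements ConsumerRebalanceListener {

    private final KafkaConsumer<?, ?> consumer;
    private final Map<TopicPartition, OffsetAndMetadata> processed = new HashMap<>();

    public CommitOnRevoke(KafkaConsumer<?, ?> consumer) {
        this.consumer = consumer;
    }

    // Call from the poll loop after each record has been fully processed.
    public void markProcessed(TopicPartition tp, long offset) {
        // The committed offset is the offset of the next record to read.
        processed.put(tp, new OffsetAndMetadata(offset + 1));
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Invoked from poll() before the partitions are reassigned, so the new
        // owner starts from what this consumer actually finished.
        consumer.commitSync(processed);
        processed.clear();
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // nothing to do here for this sketch
    }
}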

Cheers
Oleg