Posted to dev@kafka.apache.org by ka...@chrisdone.com on 2020/03/10 14:02:22 UTC

Pristine Zookeeper and Kafka (via Docker) producing OFFSET_OUT_OF_RANGE for Fetch

Hi all,


I'm implementing a custom client.

I was wondering whether anyone could explain the OFFSET_OUT_OF_RANGE
error in this scenario.

My test suite tears down and spins up a fresh ZooKeeper and Kafka every
time, inside a pristine Docker container.

The test suite runs as:

1. Producer runs first and finishes.
2. Consumer group members then run later in 3 separate threads.

I write key/value pairs for "fruit", "animal", and "vegetable",
assigning messages to partitions with a round-robin algorithm.
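
For concreteness, the round-robin assignment is essentially the
following (a minimal sketch, not my actual code; `roundRobin` and the
hard-coded partition count of 3 are illustrative):

```haskell
-- Round-robin: the Nth message goes to partition N mod partitionCount.
roundRobin :: Int -> [k] -> [(Int, k)]
roundRobin partitionCount = zip (cycle [0 .. partitionCount - 1])

main :: IO ()
main = print (roundRobin 3 ["fruit", "animal", "vegetable"])
-- prints [(0,"fruit"),(1,"animal"),(2,"vegetable")]
```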

The consumer group process starts by issuing an OffsetCommit with
offset=0 for each partition, to kick off. (I found that if I simply
started with an OffsetFetch I would get UNKNOWN_TOPIC_OR_PARTITION, and
I couldn't find documentation on whether that is "normal" or not. But
that's a tangent.)
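
In other words, my bootstrap logic amounts to this (a sketch; the type
and names are hypothetical, and I'm assuming that committedOffset = -1
and the per-partition error both mean "no offset stored yet"):

```haskell
-- A committed offset of -1 (or, apparently, UNKNOWN_TOPIC_OR_PARTITION
-- from OffsetFetch v0's ZooKeeper-backed storage) means nothing is stored.
data OffsetFetchResult
  = Committed Int       -- broker returned a real committed offset
  | NoCommittedOffset   -- committedOffset = -1 or per-partition error

startingOffset :: OffsetFetchResult -> Int
startingOffset (Committed o)     = o
startingOffset NoCommittedOffset = 0  -- fall back to the beginning

main :: IO ()
main = print (map startingOffset [Committed 42, NoCommittedOffset])
-- prints [42,0]
```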

This page shows the producer writing to three partitions within a
topic, with each Produce request succeeding:

https://chrisdone.com/consumer-groups-sink-out-of-range.html [fine]

Each column is a thread in my test suite.

However, when fetching those three partitions, for some reason I get
OFFSET_OUT_OF_RANGE on partition 2. The other two partitions consume
successfully. This can be seen on the consumer side, shown on this page:

https://chrisdone.com/consumer-groups-source-out-of-range.html [problem]

(Scroll to about halfway through: the first half is three threads trying
to join the group. Once they have joined, three new threads spin up to
the right, one for each consumer in the consumer group.)

Yet this is a nondeterministic error that seems to depend on timing. I
intentionally place a random 1-500 ms delay around every message so
that the program exhibits real-world timing behaviour like this.

If I remove the random delays, this process works every time
(demonstrated here:
https://chrisdone.com/consumer-groups-source-working.html). So there is
some kind of timing issue that I cannot identify.

Upon receiving an OFFSET_OUT_OF_RANGE error, you can see in the log
that I wait, refresh metadata, and retry the request (as I read
elsewhere [1] that this error "typically implies a leader change"),
only to get another OFFSET_OUT_OF_RANGE.
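
As far as I can tell, the official clients don't retry the same offset
here: on OFFSET_OUT_OF_RANGE they issue a ListOffsets request to learn
the log's current bounds and reset the fetch offset (the
auto.offset.reset behaviour). A pure sketch of that reset, assuming
logStart and highWatermark have been obtained via ListOffsets (the
function name is mine):

```haskell
-- Clamp a wanted fetch offset into the log's [logStart, highWatermark]
-- range, as reported by ListOffsets.
resetOffset :: Int -> Int -> Int -> Int
resetOffset logStart highWatermark wanted
  | wanted < logStart      = logStart       -- before earliest: reset to earliest
  | wanted > highWatermark = highWatermark  -- past latest: reset to latest
  | otherwise              = wanted         -- offset was actually valid

main :: IO ()
main = print (resetOffset 5 10 0)
-- prints 5
```

If the broker really does report a valid range containing offset 0,
then a retry after such a reset should succeed, which would point the
finger back at the broker side.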

I'm receiving an OffsetFetch response with partitionIndex = 2 and
committedOffset = 0:

( ThreadId 46
, SourceRequestMsg
 (ReceivedResponse
 0.006022722
 s
 (OffsetFetchResponseV0
 { topicsArray =
 ARRAY
 [ OffsetFetchResponseV0Topics
 { name = STRING "355b26d6-ccab-4b28-bd05-a44ac6326cb7"
 , partitionsArray =
 ARRAY
 [ OffsetFetchResponseV0TopicsPartitions
 { partitionIndex = 2
 , committedOffset = 0
 , metadata = NULLABLE_STRING (Just "")
 , errorCode = NONE
 }
 ]
 }
 ]
 })))

I then send a Fetch request with fetchOffset = 0:

( ThreadId 49
, ConsumerGroupConsumerFor
 "myclientid-e79931cc-d6d4-479b-90d6-1b61aab85198"
 [PartitionId 2]
 (KafkaSourceMsg
 (SourceRequestMsg
 (SendingRequest
 (FetchRequestV4
 { replicaId = -1
 , maxWaitTime = 200
 , minBytes = 5
 , maxBytes = 1048576
 , isolationLevel = 0
 , topicsArray =
 ARRAY
 [ FetchRequestV4Topics
 { topic =
 STRING "355b26d6-ccab-4b28-bd05-a44ac6326cb7"
 , partitionsArray =
 ARRAY
 [ FetchRequestV4TopicsPartitions
 { partition = 2
 , fetchOffset = 0
 , partitionMaxBytes = 1048576
 }
 ]
 }
 ]
 })))))

And yet the broker returns:

FetchResponseV4Responses
 { topic = STRING "355b26d6-ccab-4b28-bd05-a44ac6326cb7"
 , partitionResponsesArray =
 ARRAY
 [ FetchResponseV4ResponsesPartitionResponses
 { partitionHeader =
 FetchResponseV4ResponsesPartitionResponsesPartitionHeader
 { partition = 2
 , errorCode = OFFSET_OUT_OF_RANGE
 , highWatermark = -1
 , lastStableOffset = -1
 , abortedTransactionsArray = ARRAY []
 }
 , recordSet = RecordBatchV2Sequence {recordBatchV2Sequence = []}
 }
 ]
 }

So I am very confused.

Can someone who is more familiar with this process hazard a guess as to
what's going on?

Cheers,

Chris

[1]: https://issues.apache.org/jira/browse/KAFKA-7395?focusedCommentId=16640313&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16640313