Posted to users@kafka.apache.org by Stig Rohde Døssing <st...@gmail.com> on 2020/07/30 14:30:43 UTC

Follower node receiving records out of order

Hi,

We are expanding a 3-node cluster to a 5-node cluster and have encountered
an issue where a follower node receives records with out-of-order offsets.
We are on Kafka 2.4.0.

We've used the kafka-reassign-partitions tool, and several partitions are
affected. Taking partition 11 as an example, it was configured to go from
replicas [1,2,3] to [3,4,5], without throttling enabled. Below is the
current state of that partition:

Topic: some-topic Partition: 11 Leader: 3 Replicas: 3,4,5,1,2 Isr: 2,3,1
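
For reference, the reassignment for this partition was submitted roughly
like this (the file name and ZooKeeper address below are illustrative, not
the exact values we used):

    # Reassignment plan; only partition 11 shown here for brevity.
    cat > reassign.json <<'EOF'
    {
      "version": 1,
      "partitions": [
        { "topic": "some-topic", "partition": 11, "replicas": [3, 4, 5] }
      ]
    }
    EOF

    # Submit the reassignment. No throttle options were passed.
    bin/kafka-reassign-partitions.sh \
      --zookeeper localhost:2181 \
      --reassignment-json-file reassign.json \
      --execute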

What we are seeing is that follower 4 hits the exception below while
appending records fetched from leader 3.

kafka.common.OffsetsOutOfOrderException: Out of order offsets found in
append to some-topic: ArrayBuffer(<snip>, 1091513, 745397, 1110822,
1127988, <snip>)
    at kafka.log.Log.$anonfun$append$2(Log.scala:1096)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2316)
    at kafka.log.Log.append(Log.scala:1032)
    at kafka.log.Log.appendAsFollower(Log.scala:1012)
    at kafka.cluster.Partition.$anonfun$doAppendRecordsToFollowerOrFutureReplica$1(Partition.scala:910)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
    at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
    at kafka.cluster.Partition.doAppendRecordsToFollowerOrFutureReplica(Partition.scala:903)
    at kafka.cluster.Partition.appendRecordsToFollowerOrFutureReplica(Partition.scala:917)
    at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:161)
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:317)
    at scala.Option.foreach(Option.scala:437)
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:306)
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:305)
    at scala.collection.immutable.List.foreach(List.scala:305)
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$5(AbstractFetcherThread.scala:305)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:305)
    at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:133)
    at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:132)
    at scala.Option.foreach(Option.scala:437)
    at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:132)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:114)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)

[2020-07-30 11:25:37,676] WARN [ReplicaFetcher replicaId=4, leaderId=3, fetcherId=0] Partition some-topic-11 marked as failed (kafka.server.ReplicaFetcherThread)

As far as we can tell, only a single offset in the sequence is out of
order. We are not sure what to look for to debug this. If anyone has seen
something similar, advice would be welcome. The reassignment is stuck,
since this error keeps recurring.
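
In case it helps anyone suggest what to check: our current plan is to dump
the suspect offsets from the partition's log segment on leader broker 3,
roughly like this (the segment file name is illustrative; the offsets are
the ones from the exception above, and log.dirs is /data/kafka-data per
the config below):

    # Run on leader broker 3; adjust the segment file to the one covering
    # the suspect offsets.
    bin/kafka-dump-log.sh \
      --files /data/kafka-data/some-topic-11/00000000000000000000.log \
      --print-data-log \
      | grep -E 'offset: (745397|1091513|1110822|1127988)'

    # Also checking whether the reassignment is still reported as in
    # progress (same illustrative file and ZooKeeper address as above).
    bin/kafka-reassign-partitions.sh \
      --zookeeper localhost:2181 \
      --reassignment-json-file reassign.json \
      --verify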

Broker configuration that may or may not be relevant follows:

listeners=PLAINTEXT://localhost:9092
replica.fetch.max.bytes=104857600
log.segment.bytes=134217728
message.max.bytes=104857600
compression.type=producer
log.dirs=/data/kafka-data
log.retention.check.interval.ms=300000
unclean.leader.election.enable=false