Posted to jira@kafka.apache.org by "David Arthur (Jira)" <ji...@apache.org> on 2021/07/08 16:39:00 UTC

[jira] [Created] (KAFKA-13050) Race between controller creating snapshot and snapshot cleaning

David Arthur created KAFKA-13050:
------------------------------------

             Summary: Race between controller creating snapshot and snapshot cleaning
                 Key: KAFKA-13050
                 URL: https://issues.apache.org/jira/browse/KAFKA-13050
             Project: Kafka
          Issue Type: Bug
          Components: controller, kraft
    Affects Versions: 3.0.0
            Reporter: David Arthur


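If the controller attempts to take a snapshot using its cached OffsetAndEpoch while snapshot cleaning is in progress, the cleaner can truncate the log past that offset and invalidate the cached OffsetAndEpoch. In the log below, the controller requests a snapshot at offset=99, epoch=5, but the log's validation instead points to the snapshot at offset=180, epoch=8, so createNewSnapshot throws IllegalArgumentException.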

{code}
[2021-07-08 12:12:41,938] WARN [Controller 1] org.apache.kafka.controller.QuorumController@67e0d836: failed with unknown server exception IllegalArgumentException at epoch -1 in 3207460 us.  Reverting to last committed offset 98. (org.apache.kafka.controller.QuorumController)
java.lang.IllegalArgumentException: Snapshot id (OffsetAndEpoch(offset=99, epoch=5)) is not valid according to the log: ValidOffsetAndEpoch(kind=SNAPSHOT, offsetAndEpoch=OffsetAndEpoch(offset=180, epoch=8))
	at kafka.raft.KafkaMetadataLog.createNewSnapshot(KafkaMetadataLog.scala:252)
	at org.apache.kafka.raft.KafkaRaftClient.lambda$createSnapshot$30(KafkaRaftClient.java:2334)
	at org.apache.kafka.snapshot.SnapshotWriter.createWithHeader(SnapshotWriter.java:134)
	at org.apache.kafka.raft.KafkaRaftClient.createSnapshot(KafkaRaftClient.java:2333)
	at org.apache.kafka.controller.QuorumController$SnapshotGeneratorManager.createSnapshotGenerator(QuorumController.java:351)
	at org.apache.kafka.controller.QuorumController.checkSnapshotGeneration(QuorumController.java:904)
	at org.apache.kafka.controller.QuorumController.access$3000(QuorumController.java:121)
	at org.apache.kafka.controller.QuorumController$QuorumMetaLogListener.lambda$handleCommit$0(QuorumController.java:681)
	at org.apache.kafka.controller.QuorumController$ControlEvent.run(QuorumController.java:311)
	at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
	at java.lang.Thread.run(Thread.java:748)
[2021-07-08 12:12:41,941] INFO [BrokerMetadataListener id=1] Loading snapshot 180-8. (kafka.server.metadata.BrokerMetadataListener)
{code}

This was observed while running a broker in combined mode with artificially low values for the snapshot generation and cleaning settings:

{code}
metadata.log.max.record.bytes.between.snapshots=100
metadata.log.segment.bytes=1024
metadata.max.retention.bytes=4096
{code}
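
The interleaving described above can be sketched in isolation. This is only a toy model, not Kafka's code: SnapshotRaceSketch, ToyLog, cleanUpTo and the simplified validity check are all hypothetical names and assumptions made for illustration; only the offsets and epochs (99/5 and 180/8) are taken from the log output in this report.

```java
// Hypothetical, simplified model of the race. None of these classes are Kafka's.
final class SnapshotRaceSketch {

    /** Stand-in for an (offset, epoch) pair identifying a snapshot. */
    static final class OffsetAndEpoch {
        final long offset;
        final int epoch;

        OffsetAndEpoch(long offset, int epoch) {
            this.offset = offset;
            this.epoch = epoch;
        }

        @Override
        public String toString() {
            return "OffsetAndEpoch(offset=" + offset + ", epoch=" + epoch + ")";
        }
    }

    /** Toy log that only tracks the oldest snapshot it still retains. */
    static final class ToyLog {
        private OffsetAndEpoch earliestSnapshot;

        ToyLog(OffsetAndEpoch earliestSnapshot) {
            this.earliestSnapshot = earliestSnapshot;
        }

        /** Models cleaning: everything before newEarliest is truncated. */
        synchronized void cleanUpTo(OffsetAndEpoch newEarliest) {
            earliestSnapshot = newEarliest;
        }

        /** Assumed check: reject a snapshot id that falls before the retained range. */
        synchronized void createNewSnapshot(OffsetAndEpoch id) {
            if (id.offset < earliestSnapshot.offset) {
                throw new IllegalArgumentException(
                    "Snapshot id (" + id + ") is not valid according to the log: " + earliestSnapshot);
            }
        }
    }

    /** Cache the id, let cleaning run, then snapshot: returns true if the call is rejected. */
    static boolean cachedIdRejectedAfterCleaning() {
        ToyLog log = new ToyLog(new OffsetAndEpoch(0, 0));
        OffsetAndEpoch cached = new OffsetAndEpoch(99, 5);  // controller caches its snapshot id
        log.cleanUpTo(new OffsetAndEpoch(180, 8));          // cleaner truncates past the cached id
        try {
            log.createNewSnapshot(cached);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    /** Same call with no cleaning in between: returns true if the call is accepted. */
    static boolean cachedIdAcceptedWithoutCleaning() {
        ToyLog log = new ToyLog(new OffsetAndEpoch(0, 0));
        try {
            log.createNewSnapshot(new OffsetAndEpoch(99, 5));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after cleaning: " + cachedIdRejectedAfterCleaning());
        System.out.println("accepted without cleaning: " + cachedIdAcceptedWithoutCleaning());
    }
}
```

The sketch runs the two steps sequentially to make the ordering explicit; in the report they happen concurrently, which is why the failure only shows up when cleaning is triggered very frequently.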


