Posted to jira@kafka.apache.org by "mumrah (via GitHub)" <gi...@apache.org> on 2023/05/26 14:30:12 UTC

[GitHub] [kafka] mumrah commented on pull request #13742: KAFKA-14996: Handle overly large user operations on the kcontroller

mumrah commented on PR #13742:
URL: https://github.com/apache/kafka/pull/13742#issuecomment-1564483362

   @divijvaidya Colin can correct me if I'm mistaken, but I believe this patch is mainly about closing an existing edge case until we implement KIP-868 (metadata transactions). Once we have transactions in the controller, we can allow arbitrarily large batches of records.
   
   > I am concerned about the user facing aspect of this change. If I am a user and get this exception, what am I expected to do to resolve it?
   
   Right now, if you create a topic with more than ~10000 partitions, you'll get a server error anyways. The controller fails to commit the batch, throws an exception, and then renounces leadership.
   
   Here's what happens on the controller:
   ```
   [2023-05-26 10:24:28,308] DEBUG [QuorumController id=1] Got exception while running createTopics(1813420413). Invoking handleException. (org.apache.kafka.queue.KafkaEventQueue)
   java.lang.IllegalStateException: Attempted to atomically commit 20001 records, but maxRecordsPerBatch is 10000
   	at org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)
   	at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)
   	at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
   	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
   	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
   	at java.lang.Thread.run(Thread.java:750)
   [2023-05-26 10:24:28,314] INFO [RaftManager id=1] Received user request to resign from the current epoch 3 (org.apache.kafka.raft.KafkaRaftClient)
   [2023-05-26 10:24:28,323] INFO [RaftManager id=1] Failed to handle fetch from 2 at 142 due to NOT_LEADER_OR_FOLLOWER (org.apache.kafka.raft.KafkaRaftClient)
   ```
   
   And the client sees:
   ```
   [2023-05-26 10:24:28,351] ERROR org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request.
    (kafka.admin.TopicCommand$)
   ```
   
   So, really this patch isn't changing anything from the client's perspective. It just prevents the controller from renouncing leadership (which is the real problem).
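
   As a rough illustration of the idea (a hypothetical sketch, not the code in this PR or in `QuorumController`): the size check can run before anything is appended, and an oversized batch can be rejected with an ordinary request-level error rather than an unexpected exception that makes the controller resign. The names `BatchSizeGuard`, `BatchTooLargeException`, and `appendAtomically` are invented for this example, and records are modeled as plain strings.

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;

   // Hypothetical sketch only; none of these names exist in Kafka.
   final class BatchSizeGuard {

       /** Hypothetical error that is reported to the caller; the guard itself stays usable. */
       static final class BatchTooLargeException extends RuntimeException {
           BatchTooLargeException(String message) {
               super(message);
           }
       }

       private final int maxRecordsPerBatch;
       private final List<String> log = new ArrayList<>();

       BatchSizeGuard(int maxRecordsPerBatch) {
           this.maxRecordsPerBatch = maxRecordsPerBatch;
       }

       /** Appends all records or none: the size check runs before the log is touched. */
       void appendAtomically(List<String> records) {
           if (records.size() > maxRecordsPerBatch) {
               throw new BatchTooLargeException("Attempted to atomically commit "
                       + records.size() + " records, but maxRecordsPerBatch is "
                       + maxRecordsPerBatch);
           }
           log.addAll(records);
       }

       int committedCount() {
           return log.size();
       }

       public static void main(String[] args) {
           BatchSizeGuard guard = new BatchSizeGuard(10000);
           try {
               // Same sizes as in the log above: 20001 records against a limit of 10000.
               guard.appendAtomically(Collections.nCopies(20001, "record"));
           } catch (BatchTooLargeException e) {
               // The request fails, but the guard keeps serving later requests.
               System.out.println("Rejected: " + e.getMessage());
           }
           guard.appendAtomically(Collections.nCopies(10000, "record"));
           System.out.println("Committed: " + guard.committedCount());
       }
   }
   ```

   In this sketch the oversized request still fails from the caller's point of view, which matches the observation above that the client-side behavior does not change. The difference is that the failure is contained to that one request.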

