Posted to jira@kafka.apache.org by "Edoardo Comar (Jira)" <ji...@apache.org> on 2023/05/19 14:46:00 UTC

[jira] [Commented] (KAFKA-14996) CreateTopic fails with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

    [ https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724300#comment-17724300 ] 

Edoardo Comar commented on KAFKA-14996:
---------------------------------------

A similar error is encountered when creating partitions > QuorumController.MAX_RECORDS_PER_BATCH on an existing topic.

More worryingly, the cluster appears to become unstable after the error occurs.

Seen in a cluster with 6 nodes: 0,1,2 = broker,controller; 3,4,5 = broker.

e.g. server.log for node 1:

 

{{[2023-05-19 15:43:32,640] INFO [RaftManager id=1] Completed transition to CandidateState(localId=1, epoch=300, retries=86, voteStates={0=UNRECORDED, 1=GRANTED, 2=UNRECORDED}, highWatermark=Optional.empty, electionTimeoutMs=1145) from CandidateState(localId=1, epoch=299, retries=85, voteStates={0=UNRECORDED, 1=GRANTED, 2=UNRECORDED}, highWatermark=Optional.empty, electionTimeoutMs=1817) (org.apache.kafka.raft.QuorumState)}}
{{[2023-05-19 15:43:32,649] WARN [RaftManager id=1] Received error UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with correlation id 4646. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:32,650] WARN [RaftManager id=1] Received error UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with correlation id 4647. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,095] WARN [RaftManager id=1] Received error UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with correlation id 4652. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,147] WARN [RaftManager id=1] Received error UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with correlation id 4656. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,594] WARN [RaftManager id=1] Received error UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with correlation id 4678. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,696] WARN [RaftManager id=1] Received error UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with correlation id 4684. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,773] INFO [RaftManager id=1] Election has timed out, backing off for 1000ms before becoming a candidate again (org.apache.kafka.raft.KafkaRaftClient)}}
{{[2023-05-19 15:43:34,774] INFO [RaftManager id=1] Re-elect as candidate after election backoff has completed (org.apache.kafka.raft.KafkaRaftClient)}}
{{[2023-05-19 15:43:34,784] INFO [RaftManager id=1] Completed transition to CandidateState(localId=1, epoch=301, retries=87, voteStates={0=UNRECORDED, 1=GRANTED, 2=UNRECORDED}, highWatermark=Optional.empty, electionTimeoutMs=1022) from CandidateState(localId=1, epoch=300, retries=86, voteStates={0=UNRECORDED, 1=GRANTED, 2=UNRECORDED}, highWatermark=Optional.empty, electionTimeoutMs=1145) (org.apache.kafka.raft.QuorumState)}}
{{[2023-05-19 15:43:34,802] WARN [RaftManager id=1] Received error UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with correlation id 4691. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:34,825] WARN [RaftManager id=1] Received error UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with correlation id 4692. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}

 

> CreateTopic fails with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH 
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-14996
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14996
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>            Reporter: Edoardo Comar
>            Assignee: Edoardo Comar
>            Priority: Major
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH (10000),
> the client receives an UnknownServerException; it should instead receive a more descriptive error.
> The controller logs:
> {{[2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed with unknown server exception IllegalStateException at epoch 2 in 21956 us.  Renouncing leadership and reverting to the last committed offset 174. (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 records, but maxRecordsPerBatch is 10000}}
> {{    at org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
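
The count of 10001 in the exception is consistent with a topic creation producing one topic-level record plus one record per partition, which would explain why the failure starts at num partitions >= 10000 for CreateTopic but only at > 10000 when adding partitions to an existing topic. The following is a minimal sketch of that arithmetic, not the controller's actual code: the class name, the method names and the per-request record counts are assumptions made for illustration; only the limit of 10000 and the 10001 figure come from the report.

```java
public class BatchLimitSketch {
    // Value of QuorumController.MAX_RECORDS_PER_BATCH as quoted in the issue.
    public static final int MAX_RECORDS_PER_BATCH = 10000;

    // Assumption: creating a topic emits one topic-level record
    // plus one record per partition (no extra config records).
    public static int recordsForCreateTopic(int numPartitions) {
        return 1 + numPartitions;
    }

    // Assumption: adding partitions to an existing topic emits
    // one record per newly added partition.
    public static int recordsForCreatePartitions(int addedPartitions) {
        return addedPartitions;
    }

    // Mirrors the condition implied by the exception message:
    // a batch is rejected once it holds more records than the limit.
    public static boolean fitsInOneBatch(int numRecords) {
        return numRecords <= MAX_RECORDS_PER_BATCH;
    }

    public static void main(String[] args) {
        int[] partitionCounts = {9999, 10000, 10001};
        for (int n : partitionCounts) {
            System.out.printf(
                "partitions=%d createTopic fits=%b createPartitions fits=%b%n",
                n,
                fitsInOneBatch(recordsForCreateTopic(n)),
                fitsInOneBatch(recordsForCreatePartitions(n)));
        }
    }
}
```

Under these assumptions, a request-level check on the partition count could reject the request with a descriptive error before any records are generated, rather than letting it reach the atomic append.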


