Posted to jira@kafka.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/03/11 22:25:00 UTC

[jira] [Commented] (KAFKA-9605) EOS Producer could throw illegal state if trying to complete a failed batch after fatal error

    [ https://issues.apache.org/jira/browse/KAFKA-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057443#comment-17057443 ] 

ASF GitHub Bot commented on KAFKA-9605:
---------------------------------------

hachikuji commented on pull request #8177: KAFKA-9605: Do not attempt to abort batches when txn manager is in fatal error
URL: https://github.com/apache/kafka/pull/8177
 


> EOS Producer could throw illegal state if trying to complete a failed batch after fatal error
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9605
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9605
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.0, 2.5.0
>            Reporter: Boyang Chen
>            Assignee: Boyang Chen
>            Priority: Major
>             Fix For: 2.6.0
>
>
> In the producer, the network client can hit a fatal exception while trying to complete batches after the transaction manager has already hit a fatal fenced error:
> {code:java}
>  
> [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,673] ERROR [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, transactionalId=stream-soak-test-1_0] Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender)
> [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
> [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,674] INFO [stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-0_0-producer, transactionalId=stream-soak-test-0_0] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer)
> [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,684] INFO [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, transactionalId=stream-soak-test-1_0] Resetting sequence number of batch with current sequence 354277 for partition windowed-node-counts-0 to 354276 (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,684] INFO [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] Resetting sequence number of batch with current sequence 354277 for partition windowed-node-counts-0 to 354276 (org.apache.kafka.clients.producer.internals.ProducerBatch)
> [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,685] ERROR [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, transactionalId=stream-soak-test-1_0] Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
> [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) java.lang.IllegalStateException: Should not reopen a batch which is already aborted.
>         at org.apache.kafka.common.record.MemoryRecordsBuilder.reopenAndRewriteProducerState(MemoryRecordsBuilder.java:295)
>         at org.apache.kafka.clients.producer.internals.ProducerBatch.resetProducerState(ProducerBatch.java:395)
>         at org.apache.kafka.clients.producer.internals.TransactionManager.lambda$adjustSequencesDueToFailedBatch$4(TransactionManager.java:770)
>         at org.apache.kafka.clients.producer.internals.TransactionManager$TopicPartitionEntry.resetSequenceNumbers(TransactionManager.java:180)
>         at org.apache.kafka.clients.producer.internals.TransactionManager.adjustSequencesDueToFailedBatch(TransactionManager.java:760)
>         at org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:735)
>         at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:671)
>         at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:662)
>         at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:620)
>         at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:554)
>         at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:69)
>         at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:745)
>         at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
>         at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:571)
>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:563)
>         at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:304)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
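> The proper fix is to add a check to the failed-batch handling in the transaction manager (TransactionManager.handleFailedBatch in the trace above), so that it does not try to reset batches that were already aborted after a fatal error.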
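
The guard described above might look roughly like the following sketch. It is a simplified, hypothetical model and not the actual Kafka code or the change made in the pull request: the class names (FatalErrorGuardSketch, SketchBatch, SketchTxnManager), the boolean return value, and the decrement-by-one sequence adjustment are all assumptions made for illustration. Only the exception message and the method name handleFailedBatch come from the trace above.

```java
// Hypothetical, simplified model of the failure mode; these are NOT Kafka's real classes.
final class FatalErrorGuardSketch {

    /** Stand-in for a producer batch whose sequence can be rewritten until it is aborted. */
    static final class SketchBatch {
        private int baseSequence;
        private boolean aborted;

        SketchBatch(int baseSequence) {
            this.baseSequence = baseSequence;
        }

        void abort() {
            aborted = true;
        }

        int baseSequence() {
            return baseSequence;
        }

        // Mirrors the failure in the stack trace: rewriting an aborted batch is illegal.
        void resetSequence(int newSequence) {
            if (aborted) {
                throw new IllegalStateException("Should not reopen a batch which is already aborted.");
            }
            baseSequence = newSequence;
        }
    }

    /** Stand-in for the transaction manager; it tracks only a fatal-error flag. */
    static final class SketchTxnManager {
        private boolean fatalError;

        void transitionToFatalError() {
            fatalError = true;
        }

        // Returns true if sequences were adjusted, false if the adjustment was skipped.
        boolean handleFailedBatch(SketchBatch failed, SketchBatch[] inFlight) {
            if (fatalError) {
                // The guard: after a fatal error the in-flight batches may already be
                // aborted, so do not attempt to rewrite their sequence numbers.
                return false;
            }
            for (SketchBatch batch : inFlight) {
                if (batch.baseSequence() > failed.baseSequence()) {
                    // Assumed adjustment: shift later batches down by one.
                    batch.resetSequence(batch.baseSequence() - 1);
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        SketchTxnManager manager = new SketchTxnManager();
        SketchBatch failed = new SketchBatch(354276);
        SketchBatch later = new SketchBatch(354277);

        // Simulate the fenced producer: fatal error first, then batches are aborted.
        manager.transitionToFatalError();
        later.abort();

        // With the guard in place this returns false instead of throwing.
        boolean adjusted = manager.handleFailedBatch(failed, new SketchBatch[] { later });
        System.out.println("adjusted=" + adjusted);
    }
}
```

Without the early return in handleFailedBatch, the same sequence of calls would reach resetSequence on an aborted batch and raise the IllegalStateException shown in the log.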


