Posted to dev@kafka.apache.org by "Rajini Sivaram (Jira)" <ji...@apache.org> on 2019/11/11 20:18:00 UTC

[jira] [Resolved] (KAFKA-9171) DelayedFetch completion may throw exception, causing successful produce to be failed

     [ https://issues.apache.org/jira/browse/KAFKA-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajini Sivaram resolved KAFKA-9171.
-----------------------------------
      Reviewer: Ismael Juma
    Resolution: Fixed

> DelayedFetch completion may throw exception, causing successful produce to be failed
> ------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9171
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9171
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.4.0
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>            Priority: Major
>             Fix For: 2.4.0
>
>
> I was looking at the logs of the system test failure of ReassignPartitionsTest.
> The producer log shows a ReplicaNotAvailableException produce error for two records, but the data logs of all the brokers contain those records. The offsets of these records were then returned as successful produce responses for two subsequent records, which don't appear in the logs, and hence the test failed.
> Broker logs of the leader at the time of the reassignment and leader change show:
>  
> {{[2019-11-11 07:23:17,727] ERROR [ReplicaManager broker=3] Error processing append operation on partition test_topic-17 (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.ReplicaNotAvailableException: Partition test_topic-5 is not available}}
> This fails the append operation on `test_topic-17` because a different partition, `test_topic-5`, was unavailable for fetch. I think the exception comes from a fetch, since produce would have thrown NotLeaderForPartitionException rather than ReplicaNotAvailableException.
> We don't expect DelayedFetch to throw exceptions, and it looks like we are not handling `ReplicaNotAvailableException` there.
> I am not sure whether this fixes the issues with ReassignPartitionsTest, but this seems to be a scenario that we should fix.
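To make the failure mode in the description concrete, here is a minimal Python sketch under heavy simplifying assumptions. The classes `Broker` and `DelayedFetch`, the `handle_unavailable` flag and the error-code strings are all hypothetical and do not mirror Kafka's actual Scala `ReplicaManager`/`DelayedFetch` code. The model assumes that after a successful append the broker re-checks parked fetches that include the appended partition, and that such a fetch may also cover another partition (here `test_topic-5`) whose replica is not available. With `handle_unavailable=False`, the `ReplicaNotAvailableException` escapes into the append error handling, so the producer is told the append failed even though the record is already in the log. With `handle_unavailable=True`, the fetch records the error against its own partition instead; this is one possible way to handle it and not necessarily what the actual fix does.

```python
# Hypothetical, heavily simplified model of the KAFKA-9171 failure mode.
# Names and behaviour are illustrative only, not Kafka's real implementation.


class ReplicaNotAvailableException(Exception):
    pass


class DelayedFetch:
    """A parked fetch request covering several partitions (hypothetical)."""

    def __init__(self, partitions, broker):
        self.partitions = partitions  # every partition in the fetch request
        self.broker = broker
        self.completed = False
        self.errors = {}

    def try_complete(self):
        for p in self.partitions:
            try:
                self.broker.check_readable(p)
            except ReplicaNotAvailableException:
                if not self.broker.handle_unavailable:
                    raise  # buggy behaviour: escapes into the produce path
                # assumed fix: keep the error on the fetch, do not propagate it
                self.errors[p] = "REPLICA_NOT_AVAILABLE"
        self.completed = True


class Broker:
    def __init__(self, handle_unavailable):
        self.handle_unavailable = handle_unavailable
        self.logs = {}             # partition -> appended records
        self.unavailable = set()   # partitions whose replica is not available here
        self.delayed_fetches = []

    def check_readable(self, partition):
        if partition in self.unavailable:
            raise ReplicaNotAvailableException(
                f"Partition {partition} is not available")

    def append(self, partition, record):
        """Return (error_code, offset) for one produced record."""
        try:
            log = self.logs.setdefault(partition, [])
            log.append(record)
            offset = len(log) - 1
            # After the append, wake up parked fetches that include this partition.
            for fetch in self.delayed_fetches:
                if partition in fetch.partitions and not fetch.completed:
                    fetch.try_complete()
            return ("NONE", offset)
        except ReplicaNotAvailableException:
            # The record is already in the log, yet the producer sees an error.
            return ("REPLICA_NOT_AVAILABLE", -1)


def run(handle_unavailable):
    broker = Broker(handle_unavailable)
    broker.unavailable.add("test_topic-5")
    broker.delayed_fetches.append(
        DelayedFetch(["test_topic-17", "test_topic-5"], broker))
    result = broker.append("test_topic-17", "record-0")
    return result, broker.logs["test_topic-17"]


print(run(handle_unavailable=False))  # (('REPLICA_NOT_AVAILABLE', -1), ['record-0'])
print(run(handle_unavailable=True))   # (('NONE', 0), ['record-0'])
```

In the first run the produce response reports an error although `record-0` was written, which matches the symptom in the producer and broker logs; in the second run the unrelated partition's problem stays with the fetch that asked for it.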



--
This message was sent by Atlassian Jira
(v8.3.4#803005)