You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Jason Rosenberg (JIRA)" <ji...@apache.org> on 2013/06/29 15:15:19 UTC

[jira] [Commented] (KAFKA-955) After a leader change, messages sent with ack=0 are lost

    [ https://issues.apache.org/jira/browse/KAFKA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696123#comment-13696123 ] 

Jason Rosenberg commented on KAFKA-955:
---------------------------------------

Here's a related scenario, that I see after a rolling restart of my brokers (using ack=0):

looking back at my logs, I'm wondering if a producer will reuse the same socket to send data to the same broker, for multiple topics (I'm guessing yes).  In which case, it looks like I'm seeing this scenario:

1. producer1 is happily sending messages for topicX and topicY to serverA (serverA is the leader for both topics, only 1 partition for each topic for simplicity).
2. serverA is restarted, and in the process, serverB becomes the new leader for both topicX and topicY.
3. producer1 decides to send a new message to topicX to serverA.
3a. this results in an exception ("Connection reset by peer").  producer1's connection to serverA is invalidated.
3b. producer1 makes a new metadata request for topicX, and learns that serverB is now the leader for topicX.
3c. producer1 resends the message to topicX, on serverB.
4. producer1 decides to send a new message to topicY to serverA.
4a. producer1 notes that it's socket to serverA is invalid, so it creates a new connection to serverA.
4b. producer1 successfully sends it's message to serverA (without realizing that serverA is no longer the leader for topicY).
4c. serverA logs to it's console:  
2013-06-23 08:28:46,770  WARN [kafka-request-handler-2] server.KafkaApis - [KafkaApi-508818741] Produce request with correlation id 7136261 from client  on partition [mytopic,0] failed due to Leader not local for partition [mytopic,0] on broker 508818741
5. producer1 continues to send messages for topicY to serverA, and serverA continues to log the same messages.
6. 10 minutes later, producer1 decides to update it's metadata for topicY, and learns that serverB is now the leader for topidY.
7. the warning messages finally stop in the console for serverA.

I am pretty sure this scenario, or one very close to it, is what I'm seeing in my logs, after doing a rolling restart, with controlled shutdown.

                
> After a leader change, messages sent with ack=0 are lost
> --------------------------------------------------------
>
>                 Key: KAFKA-955
>                 URL: https://issues.apache.org/jira/browse/KAFKA-955
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Rosenberg
>
> If the leader changes for a partition, and a producer is sending messages with ack=0, then messages will be lost, since the producer has no active way of knowing that the leader has changed, until it's next metadata refresh update.
> The broker receiving the message, which is no longer the leader, logs a message like this:
> Produce request with correlation id 7136261 from client  on partition [mytopic,0] failed due to Leader not local for partition [mytopic,0] on broker 508818741
> This is exacerbated by the controlled shutdown mechanism, which forces an immediate leader change.
> A possible solution to this would be for a broker which receives a message, for a topic that it is no longer the leader for (and if the ack level is 0), then the broker could just silently forward the message over to the current leader.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira