Posted to dev@kafka.apache.org by "Joel Koshy (JIRA)" <ji...@apache.org> on 2015/02/01 07:46:35 UTC

[jira] [Created] (KAFKA-1911) Log deletion on stopping replicas should be async

Joel Koshy created KAFKA-1911:
---------------------------------

             Summary: Log deletion on stopping replicas should be async
                 Key: KAFKA-1911
                 URL: https://issues.apache.org/jira/browse/KAFKA-1911
             Project: Kafka
          Issue Type: Bug
          Components: log, replication
            Reporter: Joel Koshy
            Assignee: Jay Kreps
             Fix For: 0.8.3


If a StopReplicaRequest sets delete=true, then we do a file.delete on the file message sets. I was under the impression that this is fast, but that does not seem to be the case.

During a partition reassignment in our cluster, the local time for a stop-replica request was nearly 30 seconds:

{noformat}
Completed request:Name: StopReplicaRequest; Version: 0; CorrelationId: 467; ClientId: ;    DeletePartitions: true; ControllerId: 1212; ControllerEpoch: 53 from client/...:45964;totalTime:29191,requestQueueTime:1,localTime:29190,remoteTime:0,responseQueueTime:0,sendTime:0
{noformat}

This ties up one API thread for the duration of the request.

Specifically, in our case the queue times for other requests also went up. Producers to the partition that had just been deleted on the old leader took a while to refresh their metadata (see KAFKA-1303) and eventually ran out of retries on some messages, leading to data loss.

I think the log deletion in this case should be fully asynchronous. However, we need to handle the case where a broker responds immediately to the StopReplicaRequest but then goes down after deleting only some of the log segments.
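
One possible shape for this, as a rough sketch only and not Kafka code: rename the partition directory synchronously (cheap), hand the recursive delete to a background thread, and on startup finish any deletes that a crash interrupted. Every name here (DELETED_SUFFIX, stop_replica_and_delete, resume_pending_deletes) is hypothetical and chosen for illustration. The sketch assumes the rename stays on the same filesystem.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical marker suffix for illustration; not an actual Kafka convention.
DELETED_SUFFIX = ".deleted"

# Single background worker so deletes never run on a request-handling thread.
_deleter = ThreadPoolExecutor(max_workers=1)


def stop_replica_and_delete(partition_dir):
    """Mark the partition directory for deletion and return right away.

    Only the rename happens on the caller's thread; the recursive
    delete is submitted to the background executor.
    """
    marked = partition_dir + DELETED_SUFFIX
    os.rename(partition_dir, marked)
    return _deleter.submit(shutil.rmtree, marked)


def resume_pending_deletes(log_root):
    """On startup, finish deletes that a crash left half-done."""
    resumed = []
    for name in sorted(os.listdir(log_root)):
        path = os.path.join(log_root, name)
        if name.endswith(DELETED_SUFFIX) and os.path.isdir(path):
            shutil.rmtree(path)
            resumed.append(name)
    return resumed


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    part = os.path.join(root, "topic-0")
    os.mkdir(part)
    open(os.path.join(part, "00000000000000000000.log"), "w").close()
    future = stop_replica_and_delete(part)
    future.result()  # the demo waits; a request handler would not
    print(os.listdir(root))
```

Because the marker suffix survives a restart, a directory with only some segments removed is still recognizable as pending deletion, which is the partial-delete case described above.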


