You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by onurkaraman <gi...@git.apache.org> on 2017/05/06 03:46:16 UTC

[GitHub] kafka pull request #2986: KAFKA-5099: Replica Deletion Regression from KIP-1...

GitHub user onurkaraman opened a pull request:

    https://github.com/apache/kafka/pull/2986

    KAFKA-5099: Replica Deletion Regression from KIP-101

    Replica deletion regressed from KIP-101. Replica deletion happens when a broker receives a StopReplicaRequest with delete=true. Ever since KAFKA-1911, replica deletion has been async, meaning the broker responds with a StopReplicaResponse simply after marking the replica directory as staged for deletion. This marking happens by moving a data log directory and its contents such as /tmp/kafka-logs1/t1-0 to a marked directory like /tmp/kafka-logs1/t1-0.8c9c4c0c61c44cc59ebeb00075a2a07f-delete, acting as a soft-delete. A scheduled thread later actually deletes the data. It appears that the regression occurs while the scheduled thread is actually trying to delete the data, which means the controller considers operations such as partition reassignment and topic deletion complete. But if you look at the log4j logs and data logs, you'll find that the soft-deleted data logs actually won't get deleted.
    
    The bug is that upon log deletion, we attempt to flush the LeaderEpochFileCache to the original file location instead of the moved file location. Restarting the broker actually allows for the soft-deleted directories to get deleted.
    
    This patch avoids the issue by simply not flushing the LeaderEpochFileCache upon log deletion.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/onurkaraman/kafka KAFKA-5099

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2986.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2986
    
----
commit 081e09be262dea261f7faf9b5d81b50995600f68
Author: Onur Karaman <ok...@linkedin.com>
Date:   2017-05-06T00:59:36Z

    KAFKA-5099: Replica Deletion Regression from KIP-101
    
    Replica deletion regressed from KIP-101. Replica deletion happens when a broker receives a StopReplicaRequest with delete=true. Ever since KAFKA-1911, replica deletion has been async, meaning the broker responds with a StopReplicaResponse simply after marking the replica directory as staged for deletion. This marking happens by moving a data log directory and its contents such as /tmp/kafka-logs1/t1-0 to a marked directory like /tmp/kafka-logs1/t1-0.8c9c4c0c61c44cc59ebeb00075a2a07f-delete, acting as a soft-delete. A scheduled thread later actually deletes the data. It appears that the regression occurs while the scheduled thread is actually trying to delete the data, which means the controller considers operations such as partition reassignment and topic deletion complete. But if you look at the log4j logs and data logs, you'll find that the soft-deleted data logs actually won't get deleted.
    
    The bug is that upon log deletion, we attempt to flush the LeaderEpochFileCache to the original file location instead of the moved file location. Restarting the broker actually allows for the soft-deleted directories to get deleted.
    
    This patch avoids the issue by simply not flushing the LeaderEpochFileCache upon log deletion.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] kafka pull request #2986: KAFKA-5099: Replica Deletion Regression from KIP-1...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/kafka/pull/2986


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---