Posted to dev@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/12/06 11:26:58 UTC

[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure

    [ https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725206#comment-15725206 ] 

ASF GitHub Bot commented on KAFKA-3038:
---------------------------------------

GitHub user resetius opened a pull request:

    https://github.com/apache/kafka/pull/2213

    KAFKA-3038; Future'based pseudo-async controller

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/resetius/kafka KAFKA-3038-trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2213
    
----
commit 339f8d76f7c2eb1b4ff45c7e088c6c8486ba786a
Author: Alexey Ozeritsky <ao...@yandex-team.ru>
Date:   2016-12-01T17:29:12Z

    KAFKA-3038; Future'based pseudo-async controller

----


> Speeding up partition reassignment after broker failure
> -------------------------------------------------------
>
>                 Key: KAFKA-3038
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3038
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller, core
>    Affects Versions: 0.9.0.0
>            Reporter: Eno Thereska
>             Fix For: 0.11.0.0
>
>
> After a broker failure, the controller does several writes to ZooKeeper for each partition on the failed broker. The writes are done one at a time, in a closed loop, so each write waits for the previous one to complete; this is slow, especially over high-latency networks. ZooKeeper supports batching operations (the "multi" API). Replacing the serial writes with batched ones is expected to reduce failure handling time by an order of magnitude.
> This is identified as an issue in https://cwiki.apache.org/confluence/display/KAFKA/kafka+Detailed+Replication+Design+V3 (section "End-to-end latency during a broker failure").
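
The order-of-magnitude estimate in the description above can be illustrated with a rough latency model. This is only a back-of-the-envelope sketch, not the controller's code: the function names and all the numbers (partition count, writes per partition, round-trip time, batch size) are hypothetical. It assumes the cost is dominated by network round trips and ignores server-side processing and any request size limits.

```python
import math


def serial_write_time(num_writes: int, rtt_ms: float) -> float:
    """Closed loop: each write waits one full round trip before the next starts."""
    return num_writes * rtt_ms


def batched_write_time(num_writes: int, rtt_ms: float, batch_size: int) -> float:
    """Batched: every group of up to batch_size writes costs one round trip."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(num_writes / batch_size) * rtt_ms


# Hypothetical numbers, chosen only for illustration.
partitions_on_failed_broker = 1000
writes_per_partition = 2
rtt_ms = 10.0
batch_size = 10

total_writes = partitions_on_failed_broker * writes_per_partition
serial_ms = serial_write_time(total_writes, rtt_ms)
batched_ms = batched_write_time(total_writes, rtt_ms, batch_size)
speedup = serial_ms / batched_ms

print(f"serial:  {serial_ms:.0f} ms")
print(f"batched: {batched_ms:.0f} ms")
print(f"speedup: {speedup:.1f}x")
```

Under these assumptions the speedup is roughly equal to the batch size, so a batch of ten writes per request is what yields the tenfold reduction; the real gain depends on how many operations can be grouped into one request.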


