Posted to dev@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/12/18 00:14:58 UTC

[jira] [Commented] (KAFKA-4553) Connect's round robin assignment produces undesirable distribution of connectors/tasks

    [ https://issues.apache.org/jira/browse/KAFKA-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757854#comment-15757854 ] 

ASF GitHub Bot commented on KAFKA-4553:
---------------------------------------

GitHub user ewencp opened a pull request:

    https://github.com/apache/kafka/pull/2272

    KAFKA-4553: Improve round robin assignment in Connect to avoid uneven distributions of connectors and tasks

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ewencp/kafka kafka-4553-better-connect-round-robin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2272.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2272
    
----
commit a33bbec13aac54bf2e09869125d6efb89165f602
Author: Ewen Cheslack-Postava <me...@ewencp.org>
Date:   2016-12-17T23:53:29Z

    KAFKA-4553: Improve round robin assignment in Connect to avoid uneven distributions of connectors and tasks

----


> Connect's round robin assignment produces undesirable distribution of connectors/tasks
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4553
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4553
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 0.10.1.0
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>
> Currently the round robin assignment in Connect looks something like this:
> foreach connector {
>   assign connector to next worker
>   for each task in connector {
>     assign task to next worker
>   }
> }
> For the most part we assume that connectors and tasks are effectively equivalent units of work, but this is rarely the case. Connectors are usually much more lightweight, since they only monitor the source/sink system for changes, while tasks do the heavy lifting. As a result, the current round robin assignment distributes work unevenly in some cases that are not too uncommon.
> In particular, it gets bad when there is an even number of workers and each connector generates only a single task, since the even-numbered workers then always get assigned connectors and the odd-numbered workers always get assigned tasks. An extreme case is when users start a distributed mode cluster with just a couple of workers to get started and deploy multiple single-task connectors (CDC connectors such as Debezium are a common example). All the connectors end up on one worker, all the tasks end up on the other, and the second worker becomes overloaded.
> Although the ideal solution to this problem is to have a better idea of how much load each connector/task will generate, I don't think we want to get into the business of full-on cluster resource management. An alternative, which I think avoids this common pitfall without the risk of hitting another common bad case, is to change the algorithm to assign all the connectors first, then all the tasks, i.e.
> foreach connector {
>   assign connector to next worker
> }
> foreach connector {
>   for each task in connector {
>     assign task to next worker
>   }
> }
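
The two pseudocode variants quoted above can be compared with a small simulation. This is only a sketch for illustration, not the code in Connect or in the pull request: the function names, the worker and connector names, and the dict-based result format are all made up here, and it assumes a single round robin cursor that keeps advancing across connectors and tasks.

```python
from itertools import cycle


def _empty_assignment(workers):
    # One bucket per worker so idle workers still show up in the result.
    return {w: {"connectors": [], "tasks": []} for w in workers}


def assign_interleaved(workers, connectors):
    """Current scheme: a connector, then its tasks, each go to the next worker."""
    assignment = _empty_assignment(workers)
    next_worker = cycle(workers)
    for name, task_count in connectors.items():
        assignment[next(next_worker)]["connectors"].append(name)
        for task_id in range(task_count):
            assignment[next(next_worker)]["tasks"].append(f"{name}-{task_id}")
    return assignment


def assign_connectors_first(workers, connectors):
    """Proposed scheme: place every connector first, then every task."""
    assignment = _empty_assignment(workers)
    next_worker = cycle(workers)
    for name in connectors:
        assignment[next(next_worker)]["connectors"].append(name)
    for name, task_count in connectors.items():
        for task_id in range(task_count):
            assignment[next(next_worker)]["tasks"].append(f"{name}-{task_id}")
    return assignment


if __name__ == "__main__":
    # Hypothetical cluster: two workers, four single-task connectors.
    workers = ["worker-0", "worker-1"]
    connectors = {f"cdc-{i}": 1 for i in range(4)}  # name -> number of tasks
    for scheme in (assign_interleaved, assign_connectors_first):
        print(scheme.__name__)
        for worker, load in scheme(workers, connectors).items():
            print(f"  {worker}: {len(load['connectors'])} connectors, "
                  f"{len(load['tasks'])} tasks")
```

With two workers and four single-task connectors, the interleaved version puts all four connectors on the first worker and all four tasks on the second, while the connectors-first version gives each worker two connectors and two tasks.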



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)