Posted to dev@kafka.apache.org by Ryanne Dolan <ry...@gmail.com> on 2019/08/01 16:25:01 UTC

Re: Kafka connect task assignment Improvement ( New Feature )

Manjeet, this sounds like a problem that exists outside Connect's purview.
Connect has nothing to do with resource management.

Ryanne

On Mon, Jul 29, 2019, 2:05 PM Manjeet Duhan <md...@operative.com> wrote:

> Hi,
>
> This is Manjeet from Operative Media. I have been working with Confluent
> Kafka for almost 4 years and have made many custom changes to Kafka Connect
> sink and source connectors. I have also made changes to the Kafka code base
> for our requirements.
>
> There is one feature I added recently, after discussing it with our
> architect Praveen Manvi, that I would like to discuss with you for wider
> community use.
>
> Background :- We are running more than 30 connectors at Operative, but
> each connector requires a different machine specification. E.g. the Kafka
> Connect S3 connector requires more memory, and some of the in-house
> connectors require more network bandwidth (IO) and processing power (CPU).
> We were getting out-of-memory errors in a worker because of one connector.
> This affected the entire worker process and we had to pause this connector.
>
> Issue :- We want each connector to run on a specific type of machine (in
> this case, we want 3 types of machines: memory, cpu and IO).
>
> Existing Solution :- We can start 3 clusters and have one specific type of
> machine in each cluster, but this is difficult to manage.
>           Pain points :-
>
> 1.       We have to take care every time we start a machine that it joins
> the right cluster; otherwise it can start in a different cluster.
>
> 2.       We have to change the offset storage topic for each cluster;
> otherwise connectors are visible across clusters.
>
> Proposed Solution :- We specify the machine type in the distributed
> properties of each worker machine, so that when we specify a target machine
> type at connector start, the connector's tasks are started only on machines
> of exactly that type. In this case we don't have to take care of the above
> pain points. Different types of machines will be part of the same cluster.
>
> Example :- I have 4 workers with types memory (worker 1), cpu (worker 2)
> and IO (worker 3 and worker 4).
>
>
> a)       We start connector 1 with 2 tasks and specify the target machine
> type as IO. Its tasks are distributed equally across worker 3 and worker 4.
>
> b)      We start connector 2 with 2 tasks and specify the target machine
> type as memory. Both tasks are started on worker 1.
>
> I have made the changes for this feature and it is working fine. We are
> pushing it to our production cluster in a few days.
>
> Please let me know if this would be helpful for the larger community.
>
>
> Thanks,
> Manjeet Duhan
>
>
>
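
For illustration only, the type-aware assignment described in the quoted
proposal could be sketched roughly as below. This is not Kafka Connect code
and not Manjeet's patch: the function name, the worker and connector names,
and the "machine type" labels are all hypothetical, and the least-loaded
placement rule is an assumption. The sample data follows the quoted example,
with connector 1 targeting the IO workers (worker 3 and worker 4) and
connector 2 targeting the memory worker (worker 1).

```python
from collections import defaultdict


def assign_tasks(workers, connectors):
    """Place each connector's tasks only on workers of its target machine type.

    workers:    dict of worker name -> machine type (hypothetical worker setting)
    connectors: dict of connector name -> (task count, target machine type)
    Returns a dict of worker name -> list of task ids such as "connector1-0".
    """
    # Group worker names by their declared machine type.
    by_type = defaultdict(list)
    for name, machine_type in sorted(workers.items()):
        by_type[machine_type].append(name)

    assignment = {name: [] for name in workers}
    for connector, (task_count, target_type) in sorted(connectors.items()):
        candidates = by_type.get(target_type)
        if not candidates:
            raise ValueError(f"no worker of type {target_type!r} for {connector}")
        for task_index in range(task_count):
            # Assumed placement rule: pick the least-loaded matching worker.
            worker = min(candidates, key=lambda w: len(assignment[w]))
            assignment[worker].append(f"{connector}-{task_index}")
    return assignment


workers = {"worker1": "memory", "worker2": "cpu", "worker3": "io", "worker4": "io"}
connectors = {"connector1": (2, "io"), "connector2": (2, "memory")}
assignment = assign_tasks(workers, connectors)
```

With this data, connector 1's two tasks land on worker 3 and worker 4, both
of connector 2's tasks land on worker 1, and worker 2 stays idle because no
connector asks for the cpu type.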