Posted to jira@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/10 22:02:00 UTC

[jira] [Commented] (KAFKA-4969) State-store workload-aware StreamsPartitionAssignor

    [ https://issues.apache.org/jira/browse/KAFKA-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321183#comment-16321183 ] 

ASF GitHub Bot commented on KAFKA-4969:
---------------------------------------

bbejeck opened a new pull request #4410: KAFKA-4969: attempt to evenly distribute load of tasks
URL: https://github.com/apache/kafka/pull/4410
 
 
   This PR is an initial attempt to evenly distribute tasks with heavy processing across clients using a somewhat naive approach.
   
   The rationale is that if each client's assignment is not made up entirely of tasks with the same `topicGroupId`,
   then when one sub-topology does heavy processing and another sub-topology is relatively light, the processing load is spread more evenly across clients.
   
   This process only looks at active tasks; standby tasks are not given this consideration, as we can end up in a state where clients have the same task assignments, e.g., [aT1, sT2] [aT2, sT1].
   
   We plan a follow-on task at a later date that weighs tasks with state stores so that
   stateful tasks are also distributed evenly.
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
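
The idea in the description above (no client should hold only tasks from one sub-topology) could be sketched roughly as follows. This is not the code from the pull request: the `Task` class, the `assign` method, and the sort-then-round-robin strategy are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InterleavedAssignmentSketch {

    /** Hypothetical stand-in for a task id: sub-topology (topicGroupId) plus partition. */
    static final class Task {
        final int topicGroupId;
        final int partition;

        public Task(int topicGroupId, int partition) {
            this.topicGroupId = topicGroupId;
            this.partition = partition;
        }

        @Override
        public String toString() {
            return topicGroupId + "_" + partition;
        }
    }

    /**
     * Sorts active tasks by topicGroupId, then deals them out round-robin, so the
     * tasks of each sub-topology are spread across clients instead of piling up on one.
     */
    static Map<String, List<Task>> assign(List<String> clients, List<Task> activeTasks) {
        List<Task> sorted = new ArrayList<>(activeTasks);
        sorted.sort(Comparator.comparingInt((Task t) -> t.topicGroupId)
                .thenComparingInt(t -> t.partition));

        Map<String, List<Task>> assignment = new LinkedHashMap<>();
        for (String client : clients) {
            assignment.put(client, new ArrayList<>());
        }
        for (int i = 0; i < sorted.size(); i++) {
            String client = clients.get(i % clients.size());
            assignment.get(client).add(sorted.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task(1, 1), new Task(0, 0), new Task(1, 0), new Task(0, 1));
        // prints {c0=[0_0, 1_0], c1=[0_1, 1_1]}
        System.out.println(assign(Arrays.asList("c0", "c1"), tasks));
    }
}
```

In this sketch each client receives one task from sub-topology 0 and one from sub-topology 1. Standby tasks are left out, matching the scope stated in the description.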
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> State-store workload-aware StreamsPartitionAssignor
> ---------------------------------------------------
>
>                 Key: KAFKA-4969
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4969
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: Matthias J. Sax
>
> Currently, {{StreamPartitionAssignor}} does not distinguish different "types" of tasks. For example, a task can be stateless or have one or multiple stores.
> This can lead to a suboptimal task placement: assume there are 2 stateless and 2 stateful tasks and the app is running with 2 instances. To share the "store load" it would be good to place one stateless and one stateful task per instance. Right now, there is no guarantee of this, and it can happen that one instance processes both stateless tasks while the other processes both stateful tasks.
> We should improve {{StreamPartitionAssignor}} and introduce "task types" including a cost model for task placement. We should consider the following parameters:
>  - number of stores
>  - number of sources/sinks
>  - number of processors
>  - regular task vs standby task
> This improvement should be backed by a design document in the project wiki (no KIP required though) as it's a fairly complex change.
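
One possible shape for the cost model the issue asks for is sketched below, using the four listed parameters. Everything here is an assumption for illustration: the `TaskInfo` class, the weights in `cost()`, and the greedy "heaviest task to the least-loaded instance" placement are not taken from the issue or from Kafka.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TaskCostSketch {

    /** Hypothetical task descriptor built from the parameters listed in the issue. */
    static final class TaskInfo {
        final String id;
        final int stores;
        final int sourcesAndSinks;
        final int processors;
        final boolean standby;

        public TaskInfo(String id, int stores, int sourcesAndSinks, int processors, boolean standby) {
            this.id = id;
            this.stores = stores;
            this.sourcesAndSinks = sourcesAndSinks;
            this.processors = processors;
            this.standby = standby;
        }

        /** Illustrative weights only; the issue does not define any. */
        public double cost() {
            double base = 4.0 * stores + 1.0 * sourcesAndSinks + 0.5 * processors;
            return standby ? 0.5 * base : base;
        }

        @Override
        public String toString() {
            return id;
        }
    }

    /** Greedy placement: heaviest task first, always onto the instance with the lowest total cost. */
    static List<List<TaskInfo>> place(List<TaskInfo> tasks, int instances) {
        List<TaskInfo> byCost = new ArrayList<>(tasks);
        byCost.sort(Comparator.comparingDouble(TaskInfo::cost).reversed());

        List<List<TaskInfo>> placement = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            placement.add(new ArrayList<>());
        }
        double[] load = new double[instances];
        for (TaskInfo task : byCost) {
            int target = 0;
            for (int i = 1; i < instances; i++) {
                if (load[i] < load[target]) {
                    target = i;
                }
            }
            placement.get(target).add(task);
            load[target] += task.cost();
        }
        return placement;
    }

    public static void main(String[] args) {
        List<TaskInfo> tasks = Arrays.asList(
                new TaskInfo("stateless-1", 0, 2, 3, false),
                new TaskInfo("stateless-2", 0, 2, 3, false),
                new TaskInfo("stateful-1", 1, 2, 3, false),
                new TaskInfo("stateful-2", 1, 2, 3, false));
        // prints [[stateful-1, stateless-1], [stateful-2, stateless-2]]
        System.out.println(place(tasks, 2));
    }
}
```

With the issue's own example of 2 stateless and 2 stateful tasks on 2 instances, this sketch places one of each kind per instance. A real cost model would need measured weights rather than the guessed constants used here.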



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)