Posted to commits@samza.apache.org by "József Márton Jung (JIRA)" <ji...@apache.org> on 2015/07/09 15:30:04 UTC

[jira] [Assigned] (SAMZA-717) Expose the TaskNameGrouper API

     [ https://issues.apache.org/jira/browse/SAMZA-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

József Márton Jung reassigned SAMZA-717:
----------------------------------------

    Assignee: József Márton Jung

> Expose the TaskNameGrouper API
> ------------------------------
>
>                 Key: SAMZA-717
>                 URL: https://issues.apache.org/jira/browse/SAMZA-717
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Yan Fang
>            Assignee: József Márton Jung
>            Priority: Minor
>
> We currently use [GroupByContainerCount|https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/grouper/task/GroupByContainerCount.scala], which extends [TaskNameGrouper|https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/grouper/task/TaskNameGrouper.scala], to assign TaskModels to ContainerModels (equivalent to assigning tasks to different containers in the YARN world).
> I think it also makes sense to expose TaskNameGrouper as an API that users can implement to control how TaskModels are assigned to ContainerModels.
> This is useful when users know the throughput of their streams, because all the TaskInstances in one container share the same consumers. One use case: users want to group (partition-1, partition-3), (partition-2, partition-4) instead of (partition-1, partition-2), (partition-3, partition-4), which is the current strategy, because partition-1 and partition-2 both receive a lot of messages while partition-3 and partition-4 receive fewer. Of course, when users have enough containers (as many as there are tasks) or the messages are spread evenly across all the partitions, this feature is useless.
> What do you guys think?
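
To make the use case above concrete, here is a minimal sketch (not Samza code) that contrasts the contiguous grouping described in the issue with a throughput-aware grouping. The function names, the per-partition load numbers, and the greedy heuristic are all assumptions made for illustration; a real implementation would go behind the TaskNameGrouper interface and work on TaskModels and ContainerModels.

```python
from typing import Dict, List, Sequence


def group_contiguous(tasks: Sequence[str], container_count: int) -> List[List[str]]:
    """Split tasks into contiguous runs, one run per container.

    Mirrors the (partition-1, partition-2), (partition-3, partition-4)
    layout that the issue describes as the current strategy.
    """
    if container_count <= 0:
        raise ValueError("container_count must be positive")
    size, extra = divmod(len(tasks), container_count)
    groups: List[List[str]] = []
    start = 0
    for i in range(container_count):
        # The first `extra` containers take one additional task.
        end = start + size + (1 if i < extra else 0)
        groups.append(list(tasks[start:end]))
        start = end
    return groups


def group_by_load(load: Dict[str, float], container_count: int) -> List[List[str]]:
    """Hypothetical throughput-aware grouping (greedy heuristic).

    Visits tasks from heaviest to lightest and places each one in the
    container with the smallest total load so far.
    """
    if container_count <= 0:
        raise ValueError("container_count must be positive")
    groups: List[List[str]] = [[] for _ in range(container_count)]
    totals = [0.0] * container_count
    # Sort by descending load; break ties by name so the result is stable.
    for task in sorted(load, key=lambda t: (-load[t], t)):
        target = min(range(container_count), key=lambda i: totals[i])
        groups[target].append(task)
        totals[target] += load[task]
    return groups


# Assumed message rates: partitions 1 and 2 are busy, 3 and 4 are quiet.
example_load = {
    "partition-1": 100.0,
    "partition-2": 100.0,
    "partition-3": 10.0,
    "partition-4": 10.0,
}

contiguous = group_contiguous(list(example_load), 2)
balanced = group_by_load(example_load, 2)
```

With these assumed rates, `contiguous` puts both busy partitions in the same container, while `balanced` pairs each busy partition with a quiet one, which is the (partition-1, partition-3), (partition-2, partition-4) layout requested in the issue.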


