Posted to commits@samza.apache.org by "Yan Fang (JIRA)" <ji...@apache.org> on 2015/07/22 18:45:04 UTC
[jira] [Commented] (SAMZA-717) Expose the TaskNameGrouper API

    [ https://issues.apache.org/jira/browse/SAMZA-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637160#comment-14637160 ] 

Yan Fang commented on SAMZA-717:
--------------------------------

[~jjung], 

1. 
{quote}
Renamed the TaskNameGrouper to TaskGrouper
{quote}

I would suggest keeping the original name, TaskNameGrouper, because TaskName is a class name and what we are actually grouping is TaskName objects. "Task" is too broad.


2. Can you keep the original classes in their existing packages instead of moving them from samza-core to samza-api? We expose only the APIs that are needed in the api package; everything else should stay in the core package.

> Expose the TaskNameGrouper API
> ------------------------------
>
>                 Key: SAMZA-717
>                 URL: https://issues.apache.org/jira/browse/SAMZA-717
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Yan Fang
>            Assignee: József Márton Jung
>            Priority: Minor
>             Fix For: 0.10.0
>
>         Attachments: SAMZA-717.0.patch
>
>
> We currently use [GroupByContainerCount|https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/grouper/task/GroupByContainerCount.scala], which extends [TaskNameGrouper|https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/grouper/task/TaskNameGrouper.scala], to assign TaskModels to ContainerModels (equivalent to assigning tasks to different containers in the YARN world).
> I think it also makes sense to expose TaskNameGrouper as an API that users can implement to control how TaskModels are assigned to ContainerModels.
> This is useful when users know the throughput of their streams, because all the TaskInstances in one container share the same consumers. One use case: users want to group (partition-1, partition-3), (partition-2, partition-4) instead of (partition-1, partition-2), (partition-3, partition-4), which is the current strategy, because partition-1 and partition-2 both receive a lot of messages while partition-3 and partition-4 receive fewer. Of course, when users have enough containers (as many as there are tasks) or the load is evenly divided across all the partitions, this feature is not needed.
> What do you guys think?
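
To illustrate the two groupings mentioned in the description, here is a minimal, self-contained Java sketch. It does not use the Samza API: the class GroupingSketch and both method names are hypothetical, made up for this example, and plain strings stand in for TaskName / TaskModel objects. The contiguous grouping mirrors what the description calls the current strategy; the interleaved grouping is just one way a user-supplied grouper might produce the alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Samza.
public class GroupingSketch {

    // Creates containerCount empty groups, one per container.
    private static List<List<String>> emptyGroups(int containerCount) {
        if (containerCount <= 0) {
            throw new IllegalArgumentException("containerCount must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int c = 0; c < containerCount; c++) {
            groups.add(new ArrayList<>());
        }
        return groups;
    }

    // Contiguous grouping: neighbouring tasks end up in the same container.
    public static List<List<String>> groupContiguous(List<String> tasks, int containerCount) {
        List<List<String>> groups = emptyGroups(containerCount);
        int n = tasks.size();
        for (int i = 0; i < n; i++) {
            // Maps index i in [0, n) onto a container index in [0, containerCount).
            groups.get(i * containerCount / n).add(tasks.get(i));
        }
        return groups;
    }

    // Interleaved grouping: task i goes to container i % containerCount,
    // so neighbouring (here: equally busy) tasks are spread out.
    public static List<List<String>> groupInterleaved(List<String> tasks, int containerCount) {
        List<List<String>> groups = emptyGroups(containerCount);
        for (int i = 0; i < tasks.size(); i++) {
            groups.get(i % containerCount).add(tasks.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> tasks = Arrays.asList(
                "partition-1", "partition-2", "partition-3", "partition-4");
        // prints [[partition-1, partition-2], [partition-3, partition-4]]
        System.out.println(groupContiguous(tasks, 2));
        // prints [[partition-1, partition-3], [partition-2, partition-4]]
        System.out.println(groupInterleaved(tasks, 2));
    }
}
```

With two containers, the interleaved variant places the two busy partitions (partition-1 and partition-2) in different containers, which is the outcome the use case asks for. A real grouper would presumably need throughput information rather than relying on partition order.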


