
Posted to commits@samza.apache.org by "Xinyu Liu (Jira)" <ji...@apache.org> on 2020/02/27 18:54:00 UTC

[jira] [Assigned] (SAMZA-1974) Optimize partitionBy() for task count =1

     [ https://issues.apache.org/jira/browse/SAMZA-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyu Liu reassigned SAMZA-1974:
--------------------------------

    Assignee:     (was: Xinyu Liu)

> Optimize partitionBy() for task count =1
> ----------------------------------------
>
>                 Key: SAMZA-1974
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1974
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Xinyu Liu
>            Priority: Major
>
> If the task count is 1, we don't need to go through the repartition stage when the pipeline uses partitionBy, and we don't need to create intermediate streams either. With a single task, every key is mapped to that one task, so repartitioning has no effect.
> We need this because Beam inserts a partitionBy operator into the generated Samza pipeline wherever there is a GroupByKey. For local dev/testing, the input uses a single partition, so there is no need to repartition. Skipping it also avoids having to set up a local Kafka cluster.
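The reasoning above can be illustrated with a minimal Python sketch, under stated assumptions. The Op class and the target_partition and plan functions are hypothetical and do not represent Samza's planner API, which is written in Java. target_partition assumes a hash-modulo partitioner. Under that assumption, task_count == 1 sends every key to task 0, so a partitionBy step can be planned as a pass-through that needs no intermediate stream.

```python
from dataclasses import dataclass
from typing import Hashable, List


# Hypothetical, simplified operator model; not Samza's actual planner classes.
@dataclass
class Op:
    kind: str  # e.g. "map", "partitionBy", "window"
    name: str


def target_partition(key: Hashable, task_count: int) -> int:
    # Assumed hash-modulo routing: with task_count == 1, every key maps to 0.
    return hash(key) % task_count


def plan(ops: List[Op], task_count: int) -> List[str]:
    """Return a hypothetical physical plan as a list of step descriptions."""
    steps = []
    for op in ops:
        if op.kind == "partitionBy" and task_count == 1:
            # Single task: repartitioning cannot change which task sees a key,
            # so skip the intermediate stream and keep messages in-process.
            steps.append(f"{op.name}: pass-through (no intermediate stream)")
        elif op.kind == "partitionBy":
            # Multiple tasks: write to an intermediate stream keyed by the
            # partition key, then consume it so each key lands on one task.
            steps.append(f"{op.name}: send to intermediate stream, consume by key")
        else:
            steps.append(f"{op.name}: {op.kind}")
    return steps


if __name__ == "__main__":
    # A Beam-style pipeline in which a GroupByKey became partitionBy + window.
    pipeline = [Op("map", "parse"), Op("partitionBy", "gbk-shuffle"), Op("window", "gbk")]
    for step in plan(pipeline, task_count=1):
        print(step)
```

The check is keyed on the task count rather than on the input partition count. Task count is what determines whether two messages with the same key could ever reach different tasks.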



--
This message was sent by Atlassian Jira
(v8.3.4#803005)