Posted to commits@samza.apache.org by "Navina Ramesh (JIRA)" <ji...@apache.org> on 2016/03/25 03:37:25 UTC

[jira] [Created] (SAMZA-917) Dynamic rebalancing upon upstream changes

Navina Ramesh created SAMZA-917:
-----------------------------------

             Summary: Dynamic rebalancing upon upstream changes
                 Key: SAMZA-917
                 URL: https://issues.apache.org/jira/browse/SAMZA-917
             Project: Samza
          Issue Type: New Feature
            Reporter: Navina Ramesh


This is a known issue: any change in the upstream partition count affects the Samza job, which then needs to be restarted. In such scenarios, we experience data loss or incorrect processing because the application logic depends on the partitioning strategy.
The following are the issues in Samza when integrating with dynamically changing (perhaps elastic) upstreams:
* We don't have a good mechanism to detect a change in upstream partitioning. This means that there are no Samza tasks to handle new partitions in the stream.
* We don't rebalance Samza tasks to ensure that all new partitions from upstream are picked up. This is worse for stateful jobs, as we no longer have a meaningful way to shuffle the tasks' state stores.
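
To make the two issues above concrete, here is a minimal sketch. It is not Samza code. It assumes the upstream producer routes each key with a stable hash modulo the partition count, and that the job had one task per partition when it started. The helper names (partition_for, moved_keys, new_partitions) and the counts 4 and 6 are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration, not Samza code.
# Assumption: upstream routes keys with hash(key) % partition_count.
import zlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition using a stable hash modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def moved_keys(keys, old_count: int, new_count: int):
    """Return keys whose partition differs between the old and new counts."""
    return [
        k for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    ]


def new_partitions(old_count: int, new_count: int):
    """Partitions that exist upstream but had no task in the old layout."""
    return list(range(old_count, new_count))


keys = [f"member-{i}" for i in range(1000)]
moved = moved_keys(keys, old_count=4, new_count=6)

# Keys that moved now arrive at a task whose state store never saw them.
print(f"{len(moved)} of {len(keys)} keys changed partition")
# Partitions with no task until the job is restarted or rebalanced.
print("partitions without a task:", new_partitions(4, 6))
```

Under these assumptions, a key that moves to a different partition is processed by a task that holds none of its earlier state, which is one way the incorrect processing described above can arise.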

This is an umbrella ticket to discuss solutions for these issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)