Posted to commits@samza.apache.org by "Navina Ramesh (JIRA)" <ji...@apache.org> on 2016/03/25 03:37:25 UTC
[jira] [Created] (SAMZA-917) Dynamic rebalancing upon upstream changes
Navina Ramesh created SAMZA-917:
-----------------------------------
Summary: Dynamic rebalancing upon upstream changes
Key: SAMZA-917
URL: https://issues.apache.org/jira/browse/SAMZA-917
Project: Samza
Issue Type: New Feature
Reporter: Navina Ramesh
This is a known issue: any change in the upstream partition count affects the Samza job, which then needs to be restarted. In such scenarios, we experience data loss or incorrect processing because the application logic depends on the partitioning strategy.
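To illustrate why a partition-count change can break logic that depends on the partitioning strategy, here is a minimal sketch. It assumes a hash-modulo partitioner; `partition_for` and `moved_keys` are hypothetical names used only for illustration and are not Samza or Kafka APIs.

```python
import zlib


def partition_for(key: str, partition_count: int) -> int:
    """Hypothetical hash-modulo partitioner (an assumption, not Samza's code)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def moved_keys(keys, old_count, new_count):
    """Return the keys whose partition differs between the two partition counts."""
    return [
        k for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    ]


keys = [f"user-{i}" for i in range(1000)]
moved = moved_keys(keys, 4, 8)
print(f"{len(moved)} of {len(keys)} keys now land on a different partition")
```

Under this assumption, a task that built up per-key state from partition p would stop seeing some of those keys once the upstream grows from 4 to 8 partitions, which is one way the incorrect processing described above could arise.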
Samza has the following issues when integrating with dynamically changing (perhaps elastic) upstreams:
* We don't have a good mechanism to detect a change in upstream partitioning. This means that there are no Samza tasks to handle new partitions in the stream.
* We don't rebalance Samza tasks to ensure that all new partitions from upstream are picked up. The problem is worse for stateful jobs, as we no longer have a meaningful way to shuffle the tasks' state stores.
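As a starting point for discussion, the two gaps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the task-to-partition map, the function names `unassigned_partitions` and `rebalance`, and the least-loaded assignment policy are all hypothetical and do not reflect how Samza groups partitions today. It also ignores state-store migration entirely, which is the hard part for stateful jobs.

```python
from typing import Dict, List


def unassigned_partitions(assignment: Dict[str, List[int]],
                          partition_count: int) -> List[int]:
    """Detect upstream partitions that no task currently consumes."""
    owned = {p for parts in assignment.values() for p in parts}
    return [p for p in range(partition_count) if p not in owned]


def rebalance(assignment: Dict[str, List[int]],
              partition_count: int) -> Dict[str, List[int]]:
    """Give each unowned partition to the least-loaded task (hypothetical policy).

    Assumes at least one task exists. Returns a new map; the input is not modified.
    """
    new_assignment = {task: list(parts) for task, parts in assignment.items()}
    for p in unassigned_partitions(assignment, partition_count):
        # Fewest partitions wins; ties are broken by task name for determinism.
        target = min(new_assignment, key=lambda t: (len(new_assignment[t]), t))
        new_assignment[target].append(p)
    return new_assignment


assignment = {"task-0": [0], "task-1": [1]}
print(unassigned_partitions(assignment, 4))  # [2, 3]
print(rebalance(assignment, 4))  # {'task-0': [0, 2], 'task-1': [1, 3]}
```

In practice, the current partition count would have to come from polling upstream metadata, and existing partitions would need to keep their owners so that local state stays valid.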
This is an umbrella ticket to discuss solutions to these issues.