Posted to commits@samza.apache.org by "Yi Pan (Data Infrastructure) (JIRA)" <ji...@apache.org> on 2018/09/20 22:29:00 UTC
[jira] [Created] (SAMZA-1893) JobNodeConfigurationGenerator should
only generate configurations for operators, streams, stores, and tables
that are reachable by a JobNode
Yi Pan (Data Infrastructure) created SAMZA-1893:
---------------------------------------------------
Summary: JobNodeConfigurationGenerator should only generate configurations for operators, streams, stores, and tables that are reachable by a JobNode
Key: SAMZA-1893
URL: https://issues.apache.org/jira/browse/SAMZA-1893
Project: Samza
Issue Type: Improvement
Reporter: Yi Pan (Data Infrastructure)
Currently, the planner does not yet generate multi-job ExecutionPlans. Hence, the current implementation of JobNodeConfigurationGenerator does not strictly follow the rule of generating configuration only for the input/output/intermediate streams, operators, stores, and tables reachable by the current JobNode (i.e., in a single-JobNode plan, everything is reachable by the only JobNode).
When we extend it to multi-job plans, we need to generate configurations only for the streams, operators, stores, and tables that are reachable by a given JobNode. If two JobNodes collide on the configuration for those resources, it will result in the following problems:
1) input streams are consumed multiple times, unnecessarily
2) stores' changelogs will be written by multiple jobs, creating consistency issues during recovery
3) tables will be accessed (read or write) by multiple jobs, also creating consistency issues
We need to make sure that JobNodes in multi-job plans do not create collisions in the configuration for input/output/intermediate streams, state stores, and tables.
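To make the rule above concrete, here is a minimal sketch of per-JobNode reachability plus a collision check. It is not the Samza implementation: the class ReachableConfigSketch, its methods, and the ids (job1, op1, in, mid, s1, t1) are all hypothetical, and the plan is modelled as a plain adjacency map rather than the real JobNode/ExecutionPlan classes. Which resources count as single-owner is also an assumption of the sketch; here the input stream, the store, and the table must belong to one job, while the intermediate stream "mid" is written by one job and read by the other.

```java
import java.util.*;

// Hypothetical sketch; none of these names come from the Samza code base.
public class ReachableConfigSketch {

    // Example plan as an adjacency map: each id maps to the ids it directly uses.
    // job1 -> op1 -> {in, mid, s1}, job2 -> op2 -> {mid, t1}
    static Map<String, List<String>> exampleEdges() {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("job1", Arrays.asList("op1"));
        edges.put("op1", Arrays.asList("in", "mid", "s1"));
        edges.put("job2", Arrays.asList("op2"));
        edges.put("op2", Arrays.asList("mid", "t1"));
        return edges;
    }

    // Depth-first traversal: everything a job node can reach, excluding the job itself.
    static Set<String> reachableFrom(String jobNode, Map<String, List<String>> edges) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(jobNode);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            for (String next : edges.getOrDefault(current, Collections.emptyList())) {
                if (seen.add(next)) {
                    stack.push(next);
                }
            }
        }
        return seen;
    }

    // Single-owner resources that would be configured by more than one job.
    static Map<String, Set<String>> collisions(Map<String, Set<String>> configuredByJob,
                                               Set<String> singleOwner) {
        Map<String, Set<String>> owners = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : configuredByJob.entrySet()) {
            for (String resource : entry.getValue()) {
                if (singleOwner.contains(resource)) {
                    owners.computeIfAbsent(resource, k -> new TreeSet<>()).add(entry.getKey());
                }
            }
        }
        // Keep only the resources claimed by two or more jobs.
        owners.values().removeIf(jobs -> jobs.size() < 2);
        return owners;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = exampleEdges();
        Set<String> singleOwner = new HashSet<>(Arrays.asList("in", "s1", "t1"));

        // Reachability-based generation: each job gets only what it can reach.
        Map<String, Set<String>> scoped = new TreeMap<>();
        scoped.put("job1", reachableFrom("job1", edges));
        scoped.put("job2", reachableFrom("job2", edges));
        System.out.println("scoped collisions: " + collisions(scoped, singleOwner));

        // Single-job style generation: every job gets every resource in the plan.
        Set<String> everything = new TreeSet<>();
        everything.addAll(scoped.get("job1"));
        everything.addAll(scoped.get("job2"));
        Map<String, Set<String>> naive = new TreeMap<>();
        naive.put("job1", everything);
        naive.put("job2", everything);
        System.out.println("naive collisions: " + collisions(naive, singleOwner));
    }
}
```

In this toy plan the reachability-scoped configuration yields no collisions, while the everything-for-every-job approach makes both jobs claim the input stream, the store, and the table, which corresponds to problems 1) to 3) above.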
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)