Posted to issues@flink.apache.org by "Paris Carbone (JIRA)" <ji...@apache.org> on 2016/01/19 12:44:39 UTC

[jira] [Updated] (FLINK-3256) Invalid execution graph cleanup for jobs with colocation groups

     [ https://issues.apache.org/jira/browse/FLINK-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paris Carbone updated FLINK-3256:
---------------------------------
    Description: 
Currently, upon restarting an execution graph, we clean up the colocation constraints of each ExecutionJobVertex's colocation group as part of resetting that vertex.

This can lead to an invalid reconfiguration upon a restart, or during any other activity that relies on state cleanup of the execution graph. For example, restarting a DataStream job with iterations executes the following steps:

1) IterationSource colocation group constraints are reset
2) New IterationSource colocation group constraints are generated
3) IterationSource execution vertices are reset with current colocation constraints
4) IterationSink colocation group constraints are reset
5) New IterationSink colocation group constraints are generated
6) IterationSink execution vertices are reset with different colocation constraints, so they are not colocated with the sources and also demand more slots from the scheduler.

This can be trivially fixed by resetting colocation groups independently of the ExecutionJobVertices, so that they are updated once per reconfiguration.
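
The ordering problem above can be illustrated with a minimal, self-contained sketch. It is not Flink code: the Group, Constraint and JobVertex classes are hypothetical stand-ins, each vertex has a single subtask, and it assumes that the iteration source and sink share one colocation group object, as the steps above imply.

```java
import java.util.ArrayList;
import java.util.List;

final class ColocationResetSketch {

    /** Hypothetical stand-in for a colocation constraint (one per subtask index). */
    static final class Constraint {}

    /** Hypothetical stand-in for a colocation group shared by several job vertices. */
    static final class Group {
        private List<Constraint> constraints = new ArrayList<>();

        /** Drops all constraints so that the next lookup generates fresh ones. */
        void reset() {
            constraints = new ArrayList<>();
        }

        /** Returns the constraint for a subtask index, creating it on first use. */
        Constraint constraintFor(int subtask) {
            while (constraints.size() <= subtask) {
                constraints.add(new Constraint());
            }
            return constraints.get(subtask);
        }
    }

    /** Hypothetical stand-in for a job vertex with a single subtask. */
    static final class JobVertex {
        final Group group;
        Constraint assigned;

        JobVertex(Group group) {
            this.group = group;
        }

        /** Picks up whatever constraint the group currently holds for subtask 0. */
        void pickUpConstraint() {
            assigned = group.constraintFor(0);
        }
    }

    /** Reported behaviour: the shared group is reset once per vertex (steps 1 to 6). */
    static boolean colocatedAfterPerVertexReset() {
        Group group = new Group();
        JobVertex source = new JobVertex(group);
        JobVertex sink = new JobVertex(group);
        for (JobVertex vertex : new JobVertex[] {source, sink}) {
            group.reset();             // steps 1 and 4
            vertex.pickUpConstraint(); // steps 2-3 and 5-6
        }
        return source.assigned == sink.assigned;
    }

    /** Proposed behaviour: the group is reset once, independently of the vertices. */
    static boolean colocatedAfterSingleReset() {
        Group group = new Group();
        JobVertex source = new JobVertex(group);
        JobVertex sink = new JobVertex(group);
        group.reset();
        for (JobVertex vertex : new JobVertex[] {source, sink}) {
            vertex.pickUpConstraint();
        }
        return source.assigned == sink.assigned;
    }

    public static void main(String[] args) {
        System.out.println("per-vertex reset, colocated: " + colocatedAfterPerVertexReset());
        System.out.println("single reset, colocated: " + colocatedAfterSingleReset());
    }
}
```

In this sketch the per-vertex reset leaves source and sink holding different constraint objects, while the single reset leaves them sharing one, which mirrors the fix proposed above.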

  was:
Currently, upon restarting an execution graph, we clean up the colocation constraints of each ExecutionJobVertex's colocation group as part of resetting that vertex.

This can lead to an invalid reconfiguration upon a restart, or during any other activity that relies on state cleanup of the execution graph. For example, restarting a DataStream job with iterations executes the following steps:

1) IterationSource colocation group constraints are reset
2) New IterationSource colocation group constraints are generated
3) IterationSource subtasks are scheduled with current colocation constraints
4) IterationSink colocation group constraints are reset
5) New IterationSink colocation group constraints are generated
6) IterationSink subtasks are scheduled with different colocation constraints, so they are not colocated with the sources and also demand more slots from the scheduler.

This can be trivially fixed by resetting colocation groups independently of the ExecutionJobVertices, so that they are updated once per reconfiguration.


> Invalid execution graph cleanup for jobs with colocation groups
> ---------------------------------------------------------------
>
>                 Key: FLINK-3256
>                 URL: https://issues.apache.org/jira/browse/FLINK-3256
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>            Reporter: Paris Carbone
>            Assignee: Paris Carbone
>            Priority: Blocker


