Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/01/20 14:35:39 UTC

[jira] [Commented] (FLINK-3256) Invalid execution graph cleanup for jobs with colocation groups

    [ https://issues.apache.org/jira/browse/FLINK-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108548#comment-15108548 ] 

ASF GitHub Bot commented on FLINK-3256:
---------------------------------------

GitHub user senorcarbone opened a pull request:

    https://github.com/apache/flink/pull/1526

    [FLINK-3256] Fix colocation group re-instantiation

    This PR fixes inconsistent colocation groups after reconfiguration. The shared colocation constraints were being reset once per ExecutionJobVertex, that is, multiple times per group. As a result, colocated vertices in the same co-location group were scheduled with different constraints, which led to incorrect redeployment.
    
    To fix this, we keep all distinct colocation groups in the execution graph and reset each of them exactly once, outside the re-instantiation of the individual ExecutionJobVertices. A new test checks that certain properties remain consistent after reconfiguration. More properties can be added to the same test to ensure that they are also preserved across reconfiguration.
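The fix described above (collect the distinct colocation groups, reset each one once, then re-instantiate the vertices) can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not the code from the PR: `Group`, `JobVertex` and `resetGraph` are hypothetical, simplified stand-ins rather than Flink's actual classes or methods. A constraint is modelled as a plain `Object`, and two vertices are colocated when they hold the same object for a given subtask index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the fix; Group, JobVertex and resetGraph
// are illustrative names, not Flink's actual classes or methods.
public class ColocationResetSketch {

    /** Shared co-location group: one lazily created constraint per subtask index. */
    static final class Group {
        private final List<Object> constraints = new ArrayList<>();
        int resets = 0; // counts how often the constraints were cleared

        void resetConstraints() {
            constraints.clear();
            resets++;
        }

        Object getConstraint(int subtask) {
            while (constraints.size() <= subtask) {
                constraints.add(new Object());
            }
            return constraints.get(subtask);
        }
    }

    /** A job vertex that may belong to one co-location group. */
    static final class JobVertex {
        final Group group; // null if the vertex is not co-located
        final int parallelism;
        final List<Object> assigned = new ArrayList<>();

        JobVertex(Group group, int parallelism) {
            this.group = group;
            this.parallelism = parallelism;
        }

        /** Re-creates subtask assignments WITHOUT resetting the group itself. */
        void reset() {
            assigned.clear();
            for (int i = 0; i < parallelism; i++) {
                assigned.add(group == null ? new Object() : group.getConstraint(i));
            }
        }
    }

    /** Graph-level reset: every distinct group once, then every vertex. */
    static void resetGraph(List<JobVertex> vertices) {
        // Identity-based set: a group shared by several vertices is reset only once.
        Set<Group> distinctGroups = Collections.newSetFromMap(new IdentityHashMap<>());
        for (JobVertex v : vertices) {
            if (v.group != null) {
                distinctGroups.add(v.group);
            }
        }
        for (Group g : distinctGroups) {
            g.resetConstraints();
        }
        for (JobVertex v : vertices) {
            v.reset();
        }
    }

    public static void main(String[] args) {
        Group iterationGroup = new Group();
        JobVertex source = new JobVertex(iterationGroup, 2);
        JobVertex sink = new JobVertex(iterationGroup, 2);
        resetGraph(Arrays.asList(source, sink));
        System.out.println(source.assigned.equals(sink.assigned)); // true
        System.out.println(iterationGroup.resets);                 // 1
    }
}
```

The group reset is deliberately kept out of `JobVertex.reset()`. Vertices sharing a group therefore cannot invalidate each other's constraints, no matter in which order they are re-instantiated.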

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/senorcarbone/flink egfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1526
    
----
commit a8f24b5003885596f48eaa73b24b94dfbc5380e6
Author: Paris Carbone <pa...@kth.se>
Date:   2016-01-20T02:03:41Z

    [FLINK-3256] Fix colocation group re-instantiation

----


> Invalid execution graph cleanup for jobs with colocation groups
> ---------------------------------------------------------------
>
>                 Key: FLINK-3256
>                 URL: https://issues.apache.org/jira/browse/FLINK-3256
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>            Reporter: Paris Carbone
>            Assignee: Paris Carbone
>            Priority: Blocker
>
> Currently, upon restarting an execution graph, we clean up the colocation constraints of each group separately for every ExecutionJobVertex that belongs to it.
> This can lead to invalid reconfiguration upon a restart or any other activity that relies on state cleanup of the execution graph. For example, upon restarting a DataStream job with iterations, the following steps are executed:
> 1) IterationSource colocation group constraints are reset.
> 2) IterationSource execution vertices are reset and create new colocation constraints.
> 3) IterationSink colocation group constraints are reset.
> 4) IterationSink execution vertices are reset and create different colocation constraints.
> This can be fixed easily by resetting colocation groups independently of ExecutionJobVertices, so that each group is updated only once per reconfiguration.
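The four restart steps listed in the issue can be reproduced with a minimal, hypothetical Java model. `Group` and `resetVertex` are illustrative stand-ins, not Flink's actual CoLocationGroup or ExecutionJobVertex API, and a constraint is modelled as a plain `Object`. Each vertex resets the shared group before re-creating its subtasks, so the sink discards the constraints the source has just obtained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the buggy restart order (not Flink code).
public class PerVertexResetBug {

    /** Shared co-location group: one lazily created constraint per subtask index. */
    static final class Group {
        final List<Object> constraints = new ArrayList<>();

        void resetConstraints() {
            constraints.clear();
        }

        Object getConstraint(int subtask) {
            while (constraints.size() <= subtask) {
                constraints.add(new Object());
            }
            return constraints.get(subtask);
        }
    }

    /**
     * Buggy per-vertex restart: the vertex resets the shared group itself and
     * then re-creates its subtasks, returning the constraints it was assigned.
     */
    static List<Object> resetVertex(Group group, int parallelism) {
        group.resetConstraints();                 // steps 1) and 3)
        List<Object> assigned = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            assigned.add(group.getConstraint(i)); // steps 2) and 4)
        }
        return assigned;
    }

    public static void main(String[] args) {
        Group iterationGroup = new Group();
        List<Object> source = resetVertex(iterationGroup, 2); // IterationSource
        List<Object> sink = resetVertex(iterationGroup, 2);   // IterationSink
        // Prints false: the sink's reset replaced the source's constraints.
        System.out.println(source.equals(sink));
    }
}
```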


