Posted to issues@flink.apache.org by "Markus Holzemer (JIRA)" <ji...@apache.org> on 2014/06/23 12:31:25 UTC

[jira] [Commented] (FLINK-909) Pitfall due to additional superstep after the iteration has stopped

    [ https://issues.apache.org/jira/browse/FLINK-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040602#comment-14040602 ] 

Markus Holzemer commented on FLINK-909:
---------------------------------------

I have also stumbled over this issue a few times. Since I am currently refactoring the iterations runtime, I will have a look at it.
It should be possible to add a barrier at the start of each superstep, so that every task waits for an explicit OK message from the iteration head task (which manages a single iteration instance on one taskmanager) before the next superstep can start.
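
A minimal sketch of the barrier idea, not the actual runtime code: it assumes one gate per task and a head that sends either a "proceed" or a "terminate" signal before each superstep. The names SuperstepGate, Signal, release, awaitNextSuperstep and runDemo are hypothetical and do not exist in the code base.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a per-task superstep barrier; not part of the real runtime. */
public class SuperstepGate {

    /** Decision the (assumed) iteration head sends before each superstep. */
    public enum Signal { PROCEED, TERMINATE }

    // Capacity 1: the head cannot run ahead of the task by more than one decision.
    private final BlockingQueue<Signal> signals = new ArrayBlockingQueue<>(1);

    /** Called by the head once it knows whether another superstep is needed. */
    public void release(Signal signal) throws InterruptedException {
        signals.put(signal);
    }

    /** Called by a task at the start of each superstep; blocks until the head decides. */
    public boolean awaitNextSuperstep() throws InterruptedException {
        return signals.take() == Signal.PROCEED;
    }

    /** Runs one task against one head and returns how many supersteps the task executed. */
    public static int runDemo(int maxIterations) throws InterruptedException {
        SuperstepGate gate = new SuperstepGate();
        AtomicInteger executed = new AtomicInteger();

        Thread task = new Thread(() -> {
            try {
                // User code only runs after an explicit PROCEED from the head.
                while (gate.awaitNextSuperstep()) {
                    executed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        task.start();

        // The head allows exactly maxIterations supersteps, then terminates the task.
        for (int i = 0; i < maxIterations; i++) {
            gate.release(Signal.PROCEED);
        }
        gate.release(Signal.TERMINATE);
        task.join();
        return executed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(3));
    }
}
```

With such a gate the task body runs exactly maxIterations times, and a task that also reads static input (like the coGroup described in the issue) would not start computing in a superstep that the head never approved.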



> Pitfall due to additional superstep after the iteration has stopped
> -------------------------------------------------------------------
>
>                 Key: FLINK-909
>                 URL: https://issues.apache.org/jira/browse/FLINK-909
>             Project: Flink
>          Issue Type: Bug
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> Currently, after an iteration has exceeded the maximum number of iterations, all tasks are started again for an additional superstep during which they are stopped. This works if a task only waits for dynamic input. However, if a task, e.g. a coGroup operation, receives both dynamic and static input, its execution is not blocked. This can then lead to erroneous behaviour which the user is not aware of.
> I had this problem implementing ALS. Here one has a loop which gets matrix columns as dynamic input and matrix entries as static input. The columns and the entries are used to construct a matrix which represents a system of linear equations. If the set of columns is empty, then the matrix is singular and the system is not solvable. During the additional superstep the task won't receive any columns but will still try to solve the now singular system.
> It would be good to finish the iteration without initiating this additional superstep.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/909
> Created by: [tillrohrmann|https://github.com/tillrohrmann]
> Labels: 
> Created at: Thu Jun 05 17:50:17 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.2#6252)