Posted to issues@flink.apache.org by "Matthias J. Sax (JIRA)" <ji...@apache.org> on 2015/08/27 19:21:47 UTC

[jira] [Commented] (FLINK-2586) Unstable Storm Compatibility Tests

    [ https://issues.apache.org/jira/browse/FLINK-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717104#comment-14717104 ] 

Matthias J. Sax commented on FLINK-2586:
----------------------------------------

This is a known problem. The reason the call to `killTopology` is non-blocking is that the corresponding call in Storm does not block either. I think we should preserve Storm's behavior as closely as possible so that users see what they expect. However, we could extend `KillOptions` as `FlinkKillOptions` and add a new Flink-specific flag that makes the call blocking. We can use this flag in the tests to make them stable. The same holds for `SubmitOptions`: if all Spouts implement `FiniteSpoutInterface`, we could offer a blocking submit option to bridge between Storm and Flink behavior.
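
A rough sketch of how such a flag could behave, written without any Storm or Flink dependency. Everything here is an assumption for illustration: `KillOptions` is an empty stand-in for Storm's class, `SketchCluster` is a made-up cluster holding a single topology, and `setBlocking`/`isBlocking` are only one possible shape for the proposed flag.

```java
import java.util.concurrent.CountDownLatch;

public class BlockingKillSketch {

    /** Empty stand-in for Storm's KillOptions (assumption, not the real class). */
    static class KillOptions {
    }

    /** Hypothetical extension that carries the proposed Flink-specific flag. */
    static class FlinkKillOptions extends KillOptions {
        private boolean blocking;

        void setBlocking(boolean blocking) {
            this.blocking = blocking;
        }

        boolean isBlocking() {
            return blocking;
        }
    }

    /** Made-up cluster that runs one topology on a background thread. */
    static class SketchCluster {
        private final CountDownLatch stopped = new CountDownLatch(1);
        private volatile boolean killRequested;

        SketchCluster() {
            Thread worker = new Thread(() -> {
                try {
                    while (!killRequested) {
                        Thread.sleep(5);      // the topology is "processing"
                    }
                    Thread.sleep(100);        // simulated shutdown work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    stopped.countDown();      // signal that shutdown is complete
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        /** Requests the kill; waits for completion only if the blocking flag is set. */
        void killTopology(String name, KillOptions options) {
            killRequested = true;
            boolean block = options instanceof FlinkKillOptions
                    && ((FlinkKillOptions) options).isBlocking();
            if (block) {
                try {
                    stopped.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for kill", e);
                }
            }
        }

        boolean isStopped() {
            return stopped.getCount() == 0;
        }
    }

    public static void main(String[] args) {
        SketchCluster cluster = new SketchCluster();
        FlinkKillOptions options = new FlinkKillOptions();
        options.setBlocking(true);
        cluster.killTopology("test-topology", options);
        System.out.println("stopped after blocking kill: " + cluster.isStopped());
    }
}
```

With a plain `KillOptions` the call returns immediately, as in Storm; only callers that opt in via the extended options (for example the tests) would wait for the topology to be gone.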

> Unstable Storm Compatibility Tests
> ----------------------------------
>
>                 Key: FLINK-2586
>                 URL: https://issues.apache.org/jira/browse/FLINK-2586
>             Project: Flink
>          Issue Type: Bug
>          Components: Storm Compatibility
>    Affects Versions: 0.10
>            Reporter: Stephan Ewen
>            Priority: Critical
>             Fix For: 0.10
>
>
> The Storm Compatibility tests frequently fail.
> The reason is that they kill the topologies after a certain time interval. That may fail on CI infrastructure when certain steps are delayed beyond usual. Trying to guarantee progress by time is inherently problematic:
>   - Waiting too short makes tests unstable
>   - Waiting too long makes tests slow
> The right way to go is letting the program decide when to terminate, for example by throwing a special {{SuccessException}}.
> Have a look at the Kafka connector tests: they do this a lot and hence run exactly as short or as long as they need to.
> Here is an example of a failed run: https://s3.amazonaws.com/archive.travis-ci.org/jobs/77499577/log.txt
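
The idea in the description of letting the program end itself can be sketched roughly as follows. This is a self-contained illustration, not the code of the Kafka connector tests: `runJob` is a made-up stand-in for an unbounded job, and the outer `RuntimeException` only simulates a runtime that may wrap the user exception before it reaches the test.

```java
public class SuccessExceptionSketch {

    /** Marker exception thrown by the program once it has seen all expected results. */
    static class SuccessException extends RuntimeException {
        SuccessException() {
            super("expected result reached");
        }
    }

    /** Made-up unbounded job: it never returns normally and ends itself by throwing. */
    static void runJob(int expectedRecords) {
        int seen = 0;
        while (true) {
            seen++;                            // "process" one record
            if (seen == expectedRecords) {
                // Simulate a runtime that wraps the user exception (assumption).
                throw new RuntimeException("job failed", new SuccessException());
            }
        }
    }

    /** Returns true if the failure, or any of its causes, is a SuccessException. */
    static boolean endedSuccessfully(Throwable failure) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof SuccessException) {
                return true;
            }
        }
        return false;
    }

    /** Runs the job; a SuccessException counts as a pass, any other failure is rethrown. */
    static boolean runUntilSuccess(Runnable job) {
        try {
            job.run();
            return false;                      // job ended without signalling success
        } catch (RuntimeException e) {
            if (endedSuccessfully(e)) {
                return true;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        boolean ok = runUntilSuccess(() -> runJob(1000));
        System.out.println("test passed: " + ok);
    }
}
```

Because the job stops as soon as the expected result is reached, the test needs no timer: it runs exactly as long as the work takes, and any other exception still fails the test.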



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)