Posted to issues@flink.apache.org by "Matthias J. Sax (JIRA)" <ji...@apache.org> on 2015/10/07 11:17:26 UTC

[jira] [Updated] (FLINK-2586) Unstable Storm Compatibility Tests

     [ https://issues.apache.org/jira/browse/FLINK-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax updated FLINK-2586:
-----------------------------------
    Description: 
The Storm Compatibility tests frequently fail.

The reason is that they kill the topologies after a fixed time interval. That can fail on CI infrastructure when certain steps take longer than usual. Trying to guarantee progress by waiting a fixed time is inherently problematic:
  - Waiting too briefly makes tests unstable
  - Waiting too long makes tests slow

The right way to go is letting the program decide when to terminate, for example by throwing a special {{SuccessException}}.

Have a look at the Kafka connector tests: they do this a lot and hence run exactly as long as they need to.

Here is an example of a failed run: https://s3.amazonaws.com/archive.travis-ci.org/jobs/77499577/log.txt

From FLINK-2801:

bq. The tests for the Storm compatibility layer all work with timeouts (running the program for 10 seconds) and then check whether the expected result has been written.

bq. That is inherently unstable and slow (long delays). They should be rewritten in a manner similar to, for example, the KafkaITCase tests, where the streaming jobs terminate themselves with a "SuccessException", which can be recognized as successful completion when thrown by the job client.


  was:
The Storm Compatibility tests frequently fail.

The reason is that they kill the topologies after a fixed time interval. That can fail on CI infrastructure when certain steps take longer than usual. Trying to guarantee progress by waiting a fixed time is inherently problematic:
  - Waiting too briefly makes tests unstable
  - Waiting too long makes tests slow

The right way to go is letting the program decide when to terminate, for example by throwing a special {{SuccessException}}.

Have a look at the Kafka connector tests: they do this a lot and hence run exactly as long as they need to.

Here is an example of a failed run: https://s3.amazonaws.com/archive.travis-ci.org/jobs/77499577/log.txt


> Unstable Storm Compatibility Tests
> ----------------------------------
>
>                 Key: FLINK-2586
>                 URL: https://issues.apache.org/jira/browse/FLINK-2586
>             Project: Flink
>          Issue Type: Bug
>          Components: Storm Compatibility
>    Affects Versions: 0.10
>            Reporter: Stephan Ewen
>            Assignee: Matthias J. Sax
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 0.10
>
>
> The Storm Compatibility tests frequently fail.
> The reason is that they kill the topologies after a fixed time interval. That can fail on CI infrastructure when certain steps take longer than usual. Trying to guarantee progress by waiting a fixed time is inherently problematic:
>   - Waiting too briefly makes tests unstable
>   - Waiting too long makes tests slow
> The right way to go is letting the program decide when to terminate, for example by throwing a special {{SuccessException}}.
> Have a look at the Kafka connector tests: they do this a lot and hence run exactly as long as they need to.
> Here is an example of a failed run: https://s3.amazonaws.com/archive.travis-ci.org/jobs/77499577/log.txt
> From FLINK-2801:
> bq. The tests for the Storm compatibility layer all work with timeouts (running the program for 10 seconds) and then check whether the expected result has been written.
> bq. That is inherently unstable and slow (long delays). They should be rewritten in a manner similar to, for example, the KafkaITCase tests, where the streaming jobs terminate themselves with a "SuccessException", which can be recognized as successful completion when thrown by the job client.
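
For illustration, the self-terminating pattern described in the issue could look roughly like the sketch below. It is plain Java with no Flink or Storm dependency. All class and method names (including this {{SuccessException}}, {{ValidatingSink}}, and {{executeJob}}) are hypothetical stand-ins, not the actual Flink or KafkaITCase code. The sketch assumes that the job client reports failures as a wrapping exception, so the test walks the cause chain to find the marker.

```java
public class SelfTerminatingJobSketch {

    /** Hypothetical marker exception: thrown by the job itself once it has seen every expected record. */
    static class SuccessException extends RuntimeException {
        SuccessException(String message) {
            super(message);
        }
    }

    /** Stand-in for a validating sink that ends the job as soon as enough records have arrived. */
    static class ValidatingSink {
        private final int expected;
        private int received;

        ValidatingSink(int expected) {
            this.expected = expected;
        }

        void invoke(int record) {
            received++;
            if (received == expected) {
                throw new SuccessException("received all " + expected + " records");
            }
        }
    }

    /**
     * Stand-in for submitting a streaming job: an unbounded source feeds the sink.
     * Assumption: failures surface wrapped in a generic exception, as a job client might do.
     */
    static void executeJob(ValidatingSink sink) throws Exception {
        try {
            for (int i = 0; ; i++) {
                sink.invoke(i);
            }
        } catch (RuntimeException e) {
            throw new Exception("Job execution failed", e);
        }
    }

    /** Walks the cause chain and reports whether the failure was really the success marker. */
    static boolean causedBySuccess(Throwable t) {
        for (Throwable cause = t; cause != null; cause = cause.getCause()) {
            if (cause instanceof SuccessException) {
                return true;
            }
        }
        return false;
    }

    /** Test helper: no sleep and no timeout, the job runs exactly until the sink is satisfied. */
    static boolean runAndCheck(int expectedRecords) {
        try {
            executeJob(new ValidatingSink(expectedRecords));
            return false; // an unbounded job returning normally would be unexpected
        } catch (Exception e) {
            return causedBySuccess(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndCheck(1000) ? "job finished successfully" : "job failed");
    }
}
```

With this shape, the test duration depends only on how long the job needs to produce the expected records, and any other exception still fails the test because it does not carry the marker in its cause chain.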


