Posted to issues@flink.apache.org by "Chesnay Schepler (JIRA)" <ji...@apache.org> on 2018/06/27 12:11:00 UTC

[jira] [Created] (FLINK-9678) Remove hard-coded sleeps in HA E2E test

Chesnay Schepler created FLINK-9678:
---------------------------------------

             Summary: Remove hard-coded sleeps in HA E2E test
                 Key: FLINK-9678
                 URL: https://issues.apache.org/jira/browse/FLINK-9678
             Project: Flink
          Issue Type: Improvement
          Components: Distributed Coordination, Tests
    Affects Versions: 1.5.0, 1.6.0
            Reporter: Chesnay Schepler


{{test_ha.sh}} uses two hard-coded sleeps:
{code:bash}
# let the job run for a while to take some checkpoints
sleep 20

for (( c=0; c<${JM_KILLS}; c++ )); do
    # kill the JM and wait for watchdog to
    # create a new one which will take over
    kill_jm
    sleep 60
done{code}
These sleeps are always troublesome: if they are too short, the test becomes brittle; if they are too long, the test sits idle.

The first sleep should be replaced with {{wait_num_checkpoints}}.
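The issue does not show how {{wait_num_checkpoints}} works internally, so the following is not its implementation. It is a minimal sketch of the general polling pattern: check the number of completed checkpoints repeatedly and give up after a timeout, so that a stuck job fails the test instead of hanging it. The sketch assumes that the REST endpoint {{/jobs/<jobid>/checkpoints}} reports the completed count as {{"completed":<n>}}. {{REST_URL}}, {{POLL_INTERVAL}}, {{get_completed_checkpoints}} and {{wait_for_checkpoints}} are hypothetical names that do not come from the Flink test scripts.

{code:bash}
#!/usr/bin/env bash
# Minimal sketch: replace a fixed "sleep 20" by polling until a job has
# completed at least N checkpoints, bounded by a timeout.
# Assumption: /jobs/<jobid>/checkpoints reports "completed":<n> in its counts.
# All names below are hypothetical, not taken from the Flink test scripts.

REST_URL="${REST_URL:-http://localhost:8081}"
POLL_INTERVAL="${POLL_INTERVAL:-1}"

# Prints the number of completed checkpoints of job $1, or nothing if the
# REST endpoint cannot be reached.
get_completed_checkpoints() {
    local job_id=$1
    curl -s "${REST_URL}/jobs/${job_id}/checkpoints" \
        | grep -o '"completed":[0-9]*' | head -n 1 | cut -d: -f2
}

# Usage: wait_for_checkpoints <job_id> <num_checkpoints> [timeout_seconds]
# Returns 0 as soon as enough checkpoints have completed, 1 on timeout.
wait_for_checkpoints() {
    local job_id=$1 expected=$2 timeout=${3:-120}
    local deadline=$((SECONDS + timeout))
    local count
    while (( SECONDS < deadline )); do
        count=$(get_completed_checkpoints "$job_id")
        # An unreachable endpoint yields an empty string; treat it as 0.
        if (( ${count:-0} >= expected )); then
            echo "Job ${job_id} completed ${count} checkpoints."
            return 0
        fi
        sleep "$POLL_INTERVAL"
    done
    echo "Timed out waiting for ${expected} checkpoints of job ${job_id}." >&2
    return 1
}
{code}

The test returns as soon as the condition holds. The timeout only bounds the worst case and no longer sets how long every run takes.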

I'm not entirely sure about the semantics of the second sleep, but I assume we're waiting for the new JM to resume the job execution. If so, I suggest querying the job status via the REST API instead and waiting until the job is RUNNING.
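Here is a minimal sketch of the suggested REST-based wait. It assumes that {{GET /jobs/<jobid>}} returns JSON with a top-level {{"state"}} field such as {{"RUNNING"}}, and that no answer arrives while no JobManager is available during failover. {{REST_URL}}, {{POLL_INTERVAL}}, {{get_job_state}}, {{wait_for_job_state}} and {{JOB_ID}} are hypothetical names. {{kill_jm}} and {{JM_KILLS}} come from the loop quoted above.

{code:bash}
#!/usr/bin/env bash
# Minimal sketch: replace the fixed "sleep 60" after each JM kill by polling
# the job state until it reaches the expected value, bounded by a timeout.
# Assumption: GET /jobs/<jobid> returns JSON with a top-level "state" field.
# All names below except kill_jm/JM_KILLS are hypothetical.

REST_URL="${REST_URL:-http://localhost:8081}"
POLL_INTERVAL="${POLL_INTERVAL:-1}"

# Prints the state of job $1 (e.g. RUNNING), or nothing while no JobManager
# answers, for example during failover.
get_job_state() {
    local job_id=$1
    curl -s "${REST_URL}/jobs/${job_id}" \
        | grep -o '"state": *"[A-Z_]*"' | head -n 1 | cut -d'"' -f4
}

# Usage: wait_for_job_state <job_id> <state> [timeout_seconds]
# Returns 0 once the job reports <state>, 1 on timeout.
wait_for_job_state() {
    local job_id=$1 expected=$2 timeout=${3:-120}
    local deadline=$((SECONDS + timeout))
    local state
    while (( SECONDS < deadline )); do
        state=$(get_job_state "$job_id")
        if [[ "$state" == "$expected" ]]; then
            echo "Job ${job_id} is ${state}."
            return 0
        fi
        sleep "$POLL_INTERVAL"
    done
    echo "Timed out waiting for job ${job_id} to reach ${expected} (last: ${state:-unknown})." >&2
    return 1
}

# Hypothetical replacement for the loop in test_ha.sh:
# for (( c=0; c<${JM_KILLS}; c++ )); do
#     kill_jm
#     wait_for_job_state "$JOB_ID" RUNNING 120
# done
{code}

There is one caveat, which is an assumption because the source does not say how {{kill_jm}} terminates the process. If the old JobManager can still answer briefly after {{kill_jm}} returns, the first poll might observe the pre-kill RUNNING state. Waiting for a further completed checkpoint after recovery would guard against that case.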

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)