Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2019/09/10 13:37:00 UTC

[jira] [Updated] (FLINK-14043) SavepointMigrationTestBase is super slow

     [ https://issues.apache.org/jira/browse/FLINK-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-14043:
----------------------------------
    Description: 
The subclasses of {{SavepointMigrationTestBase}} take super long to execute. On my local machine

* {{TypeSerializerSnapshotMigrationITCase}} takes 2min 30s
* {{StatefulJobWBroadcastStateMigrationITCase}} takes 1min 45s
* {{StatefulJobSavepointMigrationITCase}} takes 2min 5s

to execute. The long runtimes seem to be caused by the {{AccumulatorCountingSink}}, which uses accumulators to signal when a job is done. Since accumulators are sent with the TM heartbeats, the heartbeat interval determines how quickly the client realizes that the job can be shut down. The default heartbeat interval is {{10 s}}, so it always takes at least 10 seconds until the client stops the job.
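
The effect described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Flink code: it assumes heartbeats fire at fixed multiples of the interval counted from job start, ignores network latency and client polling, and the class name, method names, and the 1250 ms finish time are hypothetical.

```java
/** Toy model: a "job done" signal only becomes visible at the next heartbeat tick. */
final class HeartbeatDelayModel {

    /** Time (ms) of the first heartbeat at or after the moment the job finished. */
    static long observedAtMs(long finishedAtMs, long heartbeatIntervalMs) {
        // Ceiling division: number of heartbeat ticks needed to reach the finish time.
        long ticks = (finishedAtMs + heartbeatIntervalMs - 1) / heartbeatIntervalMs;
        return ticks * heartbeatIntervalMs;
    }

    /** Extra wait (ms) between the job finishing and the client noticing. */
    static long extraWaitMs(long finishedAtMs, long heartbeatIntervalMs) {
        return observedAtMs(finishedAtMs, heartbeatIntervalMs) - finishedAtMs;
    }

    public static void main(String[] args) {
        long finishedAtMs = 1_250; // hypothetical: the job's real work is done after 1.25 s
        for (long intervalMs : new long[] {10_000, 300}) {
            System.out.printf(
                "interval=%d ms -> noticed at %d ms (extra wait %d ms)%n",
                intervalMs,
                observedAtMs(finishedAtMs, intervalMs),
                extraWaitMs(finishedAtMs, intervalMs));
        }
    }
}
```

In this model, any job that finishes within the first interval is noticed only when that interval elapses, which matches the observation that each job takes at least one full heartbeat interval to be stopped.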

I suggest decreasing the heartbeat interval in {{SavepointMigrationTestBase}} to 300ms in order to speed up the tests. On my machine, the test runtimes with this setting are:

* {{TypeSerializerSnapshotMigrationITCase}} takes 13s
* {{StatefulJobWBroadcastStateMigrationITCase}} takes 10s
* {{StatefulJobSavepointMigrationITCase}} takes 11s
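
To put the before and after numbers side by side, the following sketch computes the speedup per test from the runtimes reported in this description. The class and method names are hypothetical, and the runtimes are the single local measurements quoted here, so the ratios are only indicative.

```java
/** Computes speedup ratios from the runtimes reported in this issue. */
final class MigrationTestSpeedup {

    /** Ratio of the runtime before the change to the runtime after it. */
    static double speedup(int beforeSeconds, int afterSeconds) {
        return (double) beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        String[] tests = {
            "TypeSerializerSnapshotMigrationITCase",
            "StatefulJobWBroadcastStateMigrationITCase",
            "StatefulJobSavepointMigrationITCase"
        };
        int[] beforeSeconds = {150, 105, 125}; // 2min 30s, 1min 45s, 2min 5s
        int[] afterSeconds = {13, 10, 11};

        int totalBefore = 0;
        int totalAfter = 0;
        for (int i = 0; i < tests.length; i++) {
            totalBefore += beforeSeconds[i];
            totalAfter += afterSeconds[i];
            System.out.printf(
                "%s: %d s -> %d s (%.1fx)%n",
                tests[i], beforeSeconds[i], afterSeconds[i],
                speedup(beforeSeconds[i], afterSeconds[i]));
        }
        System.out.printf(
            "total: %d s -> %d s (%.1fx)%n",
            totalBefore, totalAfter, speedup(totalBefore, totalAfter));
    }
}
```

Each of the three tests comes out at more than ten times faster with the shorter heartbeat interval.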


  was:
The subclasses of {{SavepointMigrationTestBase}} take super long to execute. On my local machine

* {{TypeSerializerSnapshotMigrationITCase}} takes 2min 30s
* {{StatefulJobWBroadcastStateMigrationITCase}} takes 1min 45s
* {{StatefulJobSavepointMigrationITCase}} takes 2min 5s

to execute. The long runtimes seem to be caused by the {{AccumulatorCountingSink}}, which uses accumulators to signal when a job is done. Since accumulators are sent with the TM heartbeats, the heartbeat interval determines how quickly the client realizes that the job can be shut down. The default heartbeat interval is {{10 s}}, so it always takes at least 10 seconds until the client stops the job.

I suggest decreasing the heartbeat interval in {{SavepointMigrationTestBase}} to 500ms in order to speed up the tests. On my machine, the test runtimes with this setting are:

* {{TypeSerializerSnapshotMigrationITCase}} takes 13s
* {{StatefulJobWBroadcastStateMigrationITCase}} takes 10s
* {{StatefulJobSavepointMigrationITCase}} takes 11s



> SavepointMigrationTestBase is super slow
> ----------------------------------------
>
>                 Key: FLINK-14043
>                 URL: https://issues.apache.org/jira/browse/FLINK-14043
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends, Tests
>    Affects Versions: 1.8.1, 1.9.0, 1.10.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>             Fix For: 1.10.0, 1.9.1, 1.8.3
>
>
> The subclasses of {{SavepointMigrationTestBase}} take super long to execute. On my local machine
> * {{TypeSerializerSnapshotMigrationITCase}} takes 2min 30s
> * {{StatefulJobWBroadcastStateMigrationITCase}} takes 1min 45s
> * {{StatefulJobSavepointMigrationITCase}} takes 2min 5s
> to execute. The long runtimes seem to be caused by the {{AccumulatorCountingSink}}, which uses accumulators to signal when a job is done. Since accumulators are sent with the TM heartbeats, the heartbeat interval determines how quickly the client realizes that the job can be shut down. The default heartbeat interval is {{10 s}}, so it always takes at least 10 seconds until the client stops the job.
> I suggest decreasing the heartbeat interval in {{SavepointMigrationTestBase}} to 300ms in order to speed up the tests. On my machine, the test runtimes with this setting are:
> * {{TypeSerializerSnapshotMigrationITCase}} takes 13s
> * {{StatefulJobWBroadcastStateMigrationITCase}} takes 10s
> * {{StatefulJobSavepointMigrationITCase}} takes 11s



--
This message was sent by Atlassian Jira
(v8.3.2#803003)