Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2021/05/12 16:01:00 UTC

[jira] [Updated] (BEAM-4100) Dataflow ValidatesRunner PostCommits are nearing the 3 hour max runtime

     [ https://issues.apache.org/jira/browse/BEAM-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-4100:
----------------------------------
    Fix Version/s: Not applicable
       Resolution: Fixed
           Status: Resolved  (was: Open)

> Dataflow ValidatesRunner PostCommits are nearing the 3 hour max runtime
> -----------------------------------------------------------------------
>
>                 Key: BEAM-4100
>                 URL: https://issues.apache.org/jira/browse/BEAM-4100
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Scott Wegner
>            Priority: P3
>             Fix For: Not applicable
>
>
> The beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle test suite has been getting slower and slower over time. We run over 250 pipelines, and Dataflow has a fixed cost of about 3 minutes per pipeline just to spin up VMs. And with Gradle we are not able to parallelize execution as finely, which puts the test suite very close to the 3 hour limit.
> We should take steps to decrease the overall execution time for these post-commits. Some ideas, roughly in order of recommended action:
>  # Convert some ValidatesRunner tests to NeedsRunner. ValidatesRunner should be reserved for tests that need to validate functionality of each runner. I suspect that many of these are either duplicates or only need to run on a single runner.
>  # Break up large test classes. Gradle parallelizes test classes, so the overall execution is constrained by the slowest test classes. ParDoTest, for example, runs over 50 pipelines, so the overall execution will always take more than 50 * 3 min = 2h30m.
>  # Investigate how to achieve additional ValidatesRunner test parallelism. For example, we could:
>  ## Build a custom JUnit runner which packs a set of pipeline graphs into a single job to execute together.
>  ## Work with Dataflow to support reusing VMs between jobs, decreasing the overall cost.
>  ## Work on the Gradle Java plugin to support parallelization at the test case level, similar to Maven Surefire.
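The reasoning behind idea 2 can be illustrated with a rough back-of-envelope model. This is a sketch under stated assumptions, not a description of how Gradle actually schedules work. It assumes that only the ~3 minute per-pipeline start-up cost matters (actual pipeline runtime is ignored), that each test class runs its pipelines sequentially on one worker, and that whole classes are assigned greedily, largest first, to a fixed pool of workers. The suite shape (one 50-pipeline ParDoTest plus twenty 10-pipeline classes, 250 pipelines total) and the pool of 8 workers are hypothetical numbers chosen for illustration, not measurements of the real job. The helpers `makespan` and `split` are likewise hypothetical.

```python
import heapq

# Approximate fixed Dataflow VM start-up cost per pipeline, taken from the issue.
SPINUP_MINUTES = 3


def makespan(class_sizes, workers):
    """Wall-clock minutes for a greedy largest-class-first schedule (hypothetical model).

    Each entry in class_sizes is the number of pipelines in one test class.
    A class is never split across workers, so the largest class is a hard floor.
    """
    loads = [0] * workers          # minutes of work assigned to each worker
    heapq.heapify(loads)
    for size in sorted(class_sizes, reverse=True):
        least_loaded = heapq.heappop(loads)
        heapq.heappush(loads, least_loaded + size * SPINUP_MINUTES)
    return max(loads)


def split(class_sizes, index, parts):
    """Hypothetical: break the class at `index` into `parts` near-equal classes."""
    size = class_sizes[index]
    chunks = [size // parts + (1 if i < size % parts else 0) for i in range(parts)]
    return class_sizes[:index] + chunks + class_sizes[index + 1:]


# Hypothetical suite: a 50-pipeline ParDoTest plus twenty 10-pipeline classes.
suite = [50] + [10] * 20
print(sum(suite) * SPINUP_MINUTES)           # total start-up cost in minutes
print(makespan(suite, 8))                    # bounded by ParDoTest alone
print(makespan(split(suite, 0, 5), 8))       # after splitting ParDoTest five ways
```

With these made-up numbers the total start-up cost is 750 minutes. ParDoTest alone pins the modeled run at 150 minutes (the 2h30m from idea 2). Splitting it into five 10-pipeline classes brings the modeled wall clock down to 120 minutes without changing the total cost, which is why finer-grained parallelism (idea 3) helps as well.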



--
This message was sent by Atlassian Jira
(v8.3.4#803005)