Posted to user@beam.apache.org by Shivam Singhal <sh...@gmail.com> on 2022/10/13 17:01:05 UTC

Automating the e2e testing of flows involving batch beam pipelines

Hey folks,

I have a backend flow that involves running a batch Beam pipeline.

We have an automation test which:

   1. Writes some mock data to BQ
   2. Invokes the Dataflow API to run a batch job which reads from BQ and
   writes the results to Bigtable
   3. Asserts on the results written by the dataflow job

The problem is that step 2 takes 6-7 minutes because Beam and Dataflow
need that time to spin up the workers and analyze the job graph.
This wait increases the time it takes to run our tests.
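
To make the shape of the test concrete, here is a rough sketch of the three
steps with a timer around each one. Every name in it is a hypothetical
placeholder rather than our actual code or a real Dataflow client call: the
BQ write, the job launch, the job-state lookup and the Bigtable read are
passed in as plain callables, and the "DONE" / "FAILED" / "CANCELLED" state
strings are assumed stand-ins for whatever the real API returns.

```python
import time
from typing import Callable, Dict, List


def run_e2e_test(
    write_mock_rows: Callable[[], None],      # step 1: hypothetical BQ writer
    launch_batch_job: Callable[[], str],      # step 2: hypothetical launcher, returns a job id
    get_job_state: Callable[[str], str],      # step 2: hypothetical job-state lookup
    read_results: Callable[[], List[dict]],   # step 3: hypothetical Bigtable reader
    expected: List[dict],
    poll_interval_s: float = 30.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, float]:
    """Run the three test steps and return the seconds spent in each."""
    timings: Dict[str, float] = {}

    # Step 1: write the mock input data.
    start = time.monotonic()
    write_mock_rows()
    timings["write_mock_data"] = time.monotonic() - start

    # Step 2: launch the batch job and poll until it reaches a terminal state.
    start = time.monotonic()
    job_id = launch_batch_job()
    deadline = start + timeout_s
    while True:
        state = get_job_state(job_id)
        if state == "DONE":  # placeholder state name
            break
        if state in ("FAILED", "CANCELLED"):  # placeholder state names
            raise RuntimeError(f"job {job_id} ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not done after {timeout_s}s")
        sleep(poll_interval_s)
    timings["run_batch_job"] = time.monotonic() - start

    # Step 3: assert on what the job wrote.
    start = time.monotonic()
    actual = read_results()
    assert actual == expected, f"unexpected results: {actual!r}"
    timings["assert_results"] = time.monotonic() - start

    return timings
```

In our runs nearly all of the total would land under "run_batch_job", which
is the part I am hoping to shrink.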

Are there any best practices or other ways to reduce this wait time?

I know Apache Beam provides unit test classes, but those only cover unit
tests (not integration tests).

Thanks!