Posted to dev@beam.apache.org by Łukasz Gajowy <lg...@apache.org> on 2019/02/18 16:51:24 UTC

Re: Dealing with expensive jenkins + Dataflow jobs

Thanks for your suggestions. It's always good to reach out to the dev list!
You're right that we should focus more on what we are trying to test rather
than on providing huge loads.

To stay transparent for everyone:

"what is it we're trying to test?"

I talked with some testing experts from the Dataflow team and ran some
experiments. Then I improved the proposal doc so that it explains the goal
better (previously it was not clear at all).
At the end of the doc [1] you can find a table with proposed test suites
for GroupByKey. I scaled the test scenarios down drastically while ensuring
that we still test what we want to test. Thanks to that, they use far fewer
resources but still do the job. Feel free to comment there, especially if
you see any shortcomings we should rethink.

As for other operations (CoGBK, ParDo, SideInput, Combine): those are yet
to come.

(As an aside, 4 hours x 10 workers seems like a lot for 23GB of
data...or is it 230GB once you've fanned out?)

It was 230GB total - way too much given what we want to check.
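The back-of-envelope arithmetic behind these figures can be sketched as
follows. The ~23-byte record size is inferred from the "1 000 000 000
(~23 GB)" figure quoted below; it is an assumption for illustration, not a
value taken from the actual test code.

```python
# Rough size check for the numbers discussed in this thread.
records = 1_000_000_000
bytes_per_record = 23          # inferred: ~23 GB / 10^9 records (assumption)
fanout = 10

input_gb = records * bytes_per_record / 1e9
fanned_out_gb = input_gb * fanout   # fanout multiplies the processed data
print(input_gb, fanned_out_gb)      # 23.0 230.0
```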

[1]
https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit#heading=h.n2c0qqzfjcgz

Thanks,
Łukasz

Wed, Jan 23, 2019 at 7:18 PM Alan Myrvold <am...@google.com> wrote:

> Agreeing with Robert about "what is it we're trying to test?". Would a
> smaller performance test find the same issues, faster and more reliably?
>
> We have seen issues with the apache-beam-testing project exceeding quota
> during dataflow jobs, resulting in spurious failures during precommits and
> postcommits. 32 workers per dataflow job sounds fine, provided there are
> not too many concurrent dataflow jobs. Not all the tests have the number of
> workers limited, so I've seen some with ~80 workers. For non-performance
> tests, it would seem we should be able to drastically limit the number of
> workers, which should provide more room for performance tests.
>
> On Wed, Jan 23, 2019 at 7:10 AM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> I like the idea of creating separate project(s) for load tests so as
>> to not compete with other tests and the standard development cycle.
>>
>> As for how many workers is too many, I would take the tack of "what is
>> it we're trying to test?" Unless you're stress-testing the shuffle
>> itself, much of what Beam does is linearly parallelizable with the
>> number of machines. Of course one will still want to run over real,
>> large data sets, but not every load test needs this every time. More
>> interesting could be to try running at 2x and 4x the data, with 2x
>> and 4x the machines, and seeing where we fail to be linear.
>>
>> (As an aside, 4 hours x 10 workers seems like a lot for 23GB of
>> data...or is it 230GB once you've fanned out?)
>>
>> On Wed, Jan 23, 2019 at 3:33 PM Łukasz Gajowy <lg...@apache.org> wrote:
>> >
>> > Hi,
>> >
>> > pinging this thread (maybe some folks missed it). What do you think
>> about those concerns/ideas?
>> >
>> > Łukasz
>> >
>> > Mon, Jan 14, 2019 at 5:11 PM Łukasz Gajowy <lg...@apache.org>
>> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> one problem we need to solve while working on the load tests we
>> currently develop is that we don't really know how many GCP/Jenkins
>> resources we can occupy. We did some initial testing with
>> beam_Java_LoadTests_GroupByKey_Dataflow_Small[1] and it seems that for:
>> >>
>> >> - 1 000 000 000 (~ 23 GB) synthetic records
>> >> - 10 fanouts
>> >> - 10 dataflow workers (--maxNumWorkers)
>> >>
>> >> the total job time exceeds 4 hours. That seems like too much for such
>> a small load test. Additionally, we plan to add much bigger tests for
>> other core operations too. The proposal [2] describes only a few of them.
>> >>
>> >> The questions are:
>> >> 1. how many workers can we assign to this job without starving the
>> other jobs? Are 32 workers for a single Dataflow job fine? Would 64 workers
>> for such a job be fine as well?
>> >> 2. given that we plan to add more and more load tests
>> soon, do you think it is a good idea to create a separate GCP project +
>> separate Jenkins workers for load testing purposes only? This would avoid
>> starvation of critical tests (post-commits, pre-commits, etc.). Or maybe
>> there is another solution that would bring such isolation? Is such
>> isolation needed?
>> >>
>> >> Regarding question 2: please note that we will also need to host
>> Flink/Spark clusters later on GKE/Dataproc (not decided yet).
>> >>
>> >> [1]
>> https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Java_LoadTests_GroupByKey_Dataflow_Small_PR/
>> >> [2] https://s.apache.org/load-test-basic-operations
>> >>
>> >>
>> >> Thanks,
>> >> Łukasz
>> >>
>>
>
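Robert's 2x/4x suggestion above can be sketched as a simple efficiency
check: if data and workers scale together, wall-clock runtime should stay
roughly flat, and efficiency below 1.0 shows where scaling stops being
linear. All runtimes in this sketch are made-up illustrative numbers, not
real Beam measurements.

```python
# Hypothetical scaling-efficiency check for paired data/worker scaling.
def scaling_efficiency(baseline_minutes, scaled_minutes):
    # Perfectly linear scaling keeps runtime constant, i.e. efficiency 1.0.
    return baseline_minutes / scaled_minutes

runs = {
    "1x data / 1x workers": 60.0,  # illustrative runtimes in minutes
    "2x data / 2x workers": 66.0,
    "4x data / 4x workers": 80.0,
}

baseline = runs["1x data / 1x workers"]
for label, minutes in runs.items():
    print(f"{label}: efficiency {scaling_efficiency(baseline, minutes):.2f}")
```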