Posted to dev@beam.apache.org by Ismaël Mejía <ie...@gmail.com> on 2019/07/03 07:23:22 UTC

Re: Stop using Perfkit Benchmarker tool in all tests?

+1 to remove Perfkit if we can cover what we need without it.
One less tool to 'learn/understand/maintain' is always good.

On Fri, Jun 28, 2019 at 5:31 PM Lukasz Cwik <lc...@google.com> wrote:
>
> +1 for removing tests that are not maintained.
>
> Are there features in Perfkit that we would like to be using that we aren't?
> Can we make the integration with Perfkit less brittle?
>
> If we aren't getting much and don't plan to get much value in the short term, removal makes sense to me.
>
> On Thu, Jun 27, 2019 at 3:16 AM Łukasz Gajowy <lg...@apache.org> wrote:
>>
>> Hi all,
>>
>> Moving the discussion to the dev list: https://github.com/apache/beam/pull/8919. I think Perfkit Benchmarker should be removed from all our tests.
>>
>> Problems that we face currently:
>>
>> - Changes to Gradle tasks or build configuration in the Beam codebase have to be mirrored in Perfkit code. This requires PRs to Perfkit, which can take a long time to merge, and the tests sometimes break in the meantime (change already in Beam + no matching change in Perfkit = incompatibility). This is what happened in PR 8919 (above),
>> - It can't run on Python 3 (it depends on Python 2-only libraries such as functools32),
>> - It does black-box testing, which makes it hard to collect pipeline-related metrics,
>> - Its measurement of run time is inaccurate,
>> - It offers relatively little flexibility compared with, e.g., Jenkins tasks when setting up the testing infrastructure (runners, databases). For example, if we wanted to set up a Flink runner once and reuse it across subsequent tests in a single run, that would be impossible. We can easily do this in Jenkins.
>>
>> Tests that use Perfkit:
>>
>>  1. IO integration tests,
>>  2. Python performance tests,
>>  3. beam_PerformanceTests_Dataflow (disabled),
>>  4. beam_PerformanceTests_Spark (constantly failing; looks unmaintained).
>>
>> From the IOIT perspective (1), only the code that sets up and tears down Kubernetes resources is useful right now, and that part can easily be implemented in Jenkins/Gradle code. That would make Perfkit obsolete for IOIT, because we already collect metrics using the Metrics API and store them directly in BigQuery.
>>
>> As for point 2: I don't know how complex the task would be (help needed).
>>
>> Regarding 3 and 4: those tests seem to be unmaintained. Should we remove them?
>>
>> Opinions?
>>
>> Thank you,
>> Łukasz