Posted to dev@beam.apache.org by Pranav Bhandari <bh...@gmail.com> on 2022/09/26 16:07:22 UTC

Performance and Cost benchmarking

Hello,

Hope this email finds you well. I have attached a link to a doc which
discusses the design for a performance and cost benchmarking framework to
be used by Beam IOs and Google-provided dataflow templates.

Please feel free to comment on the doc with any questions, concerns or
ideas you might have.

Thank you,
Pranav Bhandari


https://docs.google.com/document/d/14GatBilwuR4jJGb-ZNpYeuB-KkVmDvEm/edit?usp=sharing&ouid=102139643796739130048&rtpof=true&sd=true

Re: Performance and Cost benchmarking

Posted by Yi Hu via dev <de...@beam.apache.org>.
Hi Andrew and Pranav,

Thanks for pointing out the current infrastructure we have in Beam. I have
investigated the current Performance tests of Beam IOs in the Beam repo and
summarized the current tools and infrastructure we have in this document:
https://docs.google.com/document/d/11CgNVtyZSipoRiJ2O57hhqShUw_FQDTj6rEOzudwsK4/edit#
. I also included a brief design for how we can incorporate them into the
test framework proposed by Pranav.

Best,
Yi

On Tue, Sep 27, 2022 at 12:06 PM Alexey Romanenko <ar...@gmail.com>
wrote:

> Thanks for raising this topic.
>
> > On 26 Sep 2022, at 23:32, Andrew Pilloud via dev <de...@beam.apache.org>
> wrote:
> >
> > I left some comments on your design. Your doc discusses a bunch of
> > details about infrastructure such as testing frameworks, automation,
> > and performance databases, but doesn't describe how it will fit in
> > with our existing infrastructure (Load Tests, Nexmark, Jenkins,
> > InfluxDB, Grafana). I would suspect we actually have most of the
> > infrastructure already built?
>
> Right, I second this question. We already have infrastructure ready to
> run a bunch of different benchmarks/load tests and to
> collect/present/analyse the results. Of course, there is room for
> improvement, but it would be great to take this into account and add
> details on how this benchmark can be integrated into it (to avoid
> duplicating the support work later).
>
>
> —
> Alexey
>
> > On Mon, Sep 26, 2022 at 9:07 AM Pranav Bhandari
> > <bh...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> Hope this email finds you well. I have attached a link to a doc which
> discusses the design for a performance and cost benchmarking framework to
> be used by Beam IOs and Google-provided dataflow templates.
> >>
> >> Please feel free to comment on the doc with any questions, concerns or
> ideas you might have.
> >>
> >> Thank you,
> >> Pranav Bhandari
> >>
> >>
> >>
> https://docs.google.com/document/d/14GatBilwuR4jJGb-ZNpYeuB-KkVmDvEm/edit?usp=sharing&ouid=102139643796739130048&rtpof=true&sd=true
>
>

Re: Performance and Cost benchmarking

Posted by Alexey Romanenko <ar...@gmail.com>.
Thanks for raising this topic.

> On 26 Sep 2022, at 23:32, Andrew Pilloud via dev <de...@beam.apache.org> wrote:
> 
> I left some comments on your design. Your doc discusses a bunch of
> details about infrastructure such as testing frameworks, automation,
> and performance databases, but doesn't describe how it will fit in
> with our existing infrastructure (Load Tests, Nexmark, Jenkins,
> InfluxDB, Grafana). I would suspect we actually have most of the
> infrastructure already built?

Right, I second this question. We already have infrastructure ready to run a bunch of different benchmarks/load tests and to collect/present/analyse the results. Of course, there is room for improvement, but it would be great to take this into account and add details on how this benchmark can be integrated into it (to avoid duplicating the support work later).


—
Alexey

> On Mon, Sep 26, 2022 at 9:07 AM Pranav Bhandari
> <bh...@gmail.com> wrote:
>> 
>> Hello,
>> 
>> Hope this email finds you well. I have attached a link to a doc which discusses the design for a performance and cost benchmarking framework to be used by Beam IOs and Google-provided dataflow templates.
>> 
>> Please feel free to comment on the doc with any questions, concerns or ideas you might have.
>> 
>> Thank you,
>> Pranav Bhandari
>> 
>> 
>> https://docs.google.com/document/d/14GatBilwuR4jJGb-ZNpYeuB-KkVmDvEm/edit?usp=sharing&ouid=102139643796739130048&rtpof=true&sd=true


Re: Performance and Cost benchmarking

Posted by Andrew Pilloud via dev <de...@beam.apache.org>.
Hi Pranav,

I left some comments on your design. Your doc discusses a bunch of
details about infrastructure such as testing frameworks, automation,
and performance databases, but doesn't describe how it will fit in
with our existing infrastructure (Load Tests, Nexmark, Jenkins,
InfluxDB, Grafana). I would suspect we actually have most of the
infrastructure already built?

What I didn't see (and expected to see) were details on how the tests
would actually interact with IOs. Will there be a generic Schema IO
test harness, or do you plan to write one for each IO? Will you be
comparing different data types (data stored as byte[] vs more complex
structures)? What about different IO-specific optimizations (data
sharding, pushdown)?

Andrew

On Mon, Sep 26, 2022 at 9:07 AM Pranav Bhandari
<bh...@gmail.com> wrote:
>
> Hello,
>
> Hope this email finds you well. I have attached a link to a doc which discusses the design for a performance and cost benchmarking framework to be used by Beam IOs and Google-provided dataflow templates.
>
> Please feel free to comment on the doc with any questions, concerns or ideas you might have.
>
> Thank you,
> Pranav Bhandari
>
>
> https://docs.google.com/document/d/14GatBilwuR4jJGb-ZNpYeuB-KkVmDvEm/edit?usp=sharing&ouid=102139643796739130048&rtpof=true&sd=true

Re: Performance and Cost benchmarking

Posted by Yi Hu via dev <de...@beam.apache.org>.
Hi everyone,

Thanks for your attention. Since the last message in this thread, work has
been under way on the utilities for the performance and cost benchmarking
framework, currently in the DataflowTemplates repository (
https://github.com/GoogleCloudPlatform/DataflowTemplates). To use these
utilities for the IO performance tests hosted in the Beam repo, we plan to
migrate them to the Beam repository.

I have attached a link [1] to a doc that describes the migration plan.
Please feel free to comment on the doc with any questions, suggestions, or
concerns.

Best,
Yi


[1] template-it-to-beam
<https://docs.google.com/document/d/11RBh9_Escr8jq93tev2ADF7Wdw4St89PL4ZNrmHyGNI/edit?usp=drive_web>



On Mon, Sep 26, 2022 at 12:07 PM Pranav Bhandari <
bhandari.pranav22@gmail.com> wrote:

> Hello,
>
> Hope this email finds you well. I have attached a link to a doc which
> discusses the design for a performance and cost benchmarking framework to
> be used by Beam IOs and Google-provided dataflow templates.
>
> Please feel free to comment on the doc with any questions, concerns or
> ideas you might have.
>
> Thank you,
> Pranav Bhandari
>
>
>
> https://docs.google.com/document/d/14GatBilwuR4jJGb-ZNpYeuB-KkVmDvEm/edit?usp=sharing&ouid=102139643796739130048&rtpof=true&sd=true
>