Posted to dev@beam.apache.org by Łukasz Gajowy <lg...@apache.org> on 2018/09/03 10:53:18 UTC

Re: [PROPOSAL] Test performance of basic Apache Beam operations

Hi all!

I'm bumping this (in case you missed it). Any feedback and questions are
welcome!

Best regards,
Łukasz

Mon, 13 Aug 2018 at 13:51 Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:

> Hi Lukasz,
>
> Thanks for the update, and the abstract looks promising.
>
> Let me take a look on the doc.
>
> Regards
> JB
>
> On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > Hi all,
> >
> > since the Synthetic Sources API has been introduced in the Java and Python SDKs,
> > it can be used to test some basic Apache Beam operations (i.e.
> > GroupByKey, CoGroupByKey, Combine, ParDo and ParDo with SideInput) in
> > terms of performance. This, in brief, is why we'd like to share the
> > proposal below:
> >
> > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
> >
> > Let us know what you think in the document's comments. Thank you in
> > advance for all the feedback!
> >
> > Łukasz
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: [PROPOSAL] Test performance of basic Apache Beam operations

Posted by Etienne Chauchot <ec...@apache.org>.
@Alexey
We already do that with Nexmark via the graphs, but it is a visual check (like I did this morning for the release vote).
Etienne
On Tuesday, 11 September 2018 at 15:05 +0200, Alexey Romanenko wrote:
> I agree that we can benefit from having two types of performance tests (low and high level) that could complement each
> other. Can we detect a regression (if any) automatically and send a report about that? Sorry if we already do that for
> Nexmark.
> 
> > On 11 Sep 2018, at 11:29, Etienne Chauchot <ec...@apache.org> wrote:
> > 
> > Hi Lukasz,
> > Well, having low level byte[] based pure performance tests makes sense. And having a high level realistic model
> > (Nexmark auction system) also makes sense, to avoid testing unrealistic pipelines as you describe.
> > Having common code between the 2 seems difficult, as both the architecture and the model are different.
> > I'm more concerned about having two CI mechanisms to detect functional/performance regressions.
> > Best
> > Etienne
> > On Monday, 10 September 2018 at 18:33 +0200, Łukasz Gajowy wrote:
> > > In my opinion, and as far as I understand Nexmark, there are some benefits to having both types of tests. The load
> > > tests we propose can be very straightforward and clearly show what is being tested, thanks to the fact that there's
> > > no fixed model but only very "low level" KV<byte[], byte[]> collections. They are more flexible in the shapes of the
> > > pipelines they can express, e.g. fanout_64, without having to think about specific use cases.
> > > 
> > > Having both types would allow developers to decide whether they want to create a new Nexmark query for their
> > > specific case or develop a new load test (whichever is easier and fits their case better). However, there is a risk:
> > > with KV<byte[], byte[]>, a developer can overemphasize cases that can never happen in practice, so we need to be
> > > careful about the exact configurations we run.
> > > 
> > > Still, I can imagine that there will surely be code that should be common to both types of tests, and we will seek
> > > ways not to duplicate code.
> > > 
> > > WDYT? 
> > > 
> > > Regards, 
> > > Łukasz
> > > 
> > > 
> > > 
> > > Mon, 10 Sep 2018 at 16:36 Etienne Chauchot <ec...@apache.org> wrote:
> > > > Hi,
> > > > It seems that there is a notable overlap with what Nexmark already does:
> > > > Nexmark measures performance and regression by exercising all the Beam model in both batch and streaming modes
> > > > with several runners. It also computes on synthetic data. Also, Nexmark is already included as PostCommits in the
> > > > CI and dashboards.
> > > > Shall we merge the two?
> > > > Best
> > > > Etienne
> > > > On Monday, 10 September 2018 at 12:56 +0200, Łukasz Gajowy wrote:
> > > > > Hello everyone, 
> > > > > 
> > > > > thank you for all your comments on the proposal. To sum up: 
> > > > > 
> > > > > A set of performance tests exercising Core Beam Transforms (ParDo, GroupByKey, CoGroupByKey, Combine) will be
> > > > > implemented for the Java and Python SDKs. Those tests will allow us to:
> > > > > - measure performance of the transforms on various runners
> > > > > - exercise the transforms by creating stressful conditions and big loads using the Synthetic Source and
> > > > >   Synthetic Step API (delays, keeping the CPU busy or asleep, processing large keys and values, performing
> > > > >   fanout or reiteration of inputs)
> > > > > - run both in batch and streaming contexts
> > > > > - gather various metrics
> > > > > - notice regressions by comparing data from consecutive Jenkins runs
> > > > > Metrics (runtime, consumed bytes, memory usage, split/bundle count) can be gathered during test invocations.
> > > > > We will start with runtime and leverage the Metrics API to collect the other metrics in later phases of
> > > > > development.
> > > > > The tests will be fully configurable through pipeline options, and it will be possible to run any custom
> > > > > scenarios manually. However, a representative set of testing scenarios will be run periodically using Jenkins.
> > > > > 
> > > > > Regards, 
> > > > > Łukasz 
> > > > > 
> > > > > Wed, 5 Sep 2018 at 20:31 Rafael Fernandez <rf...@google.com> wrote:
> > > > > > neat! left a comment or two
> > > > > > 
> > > > > > On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <lg...@apache.org> wrote:
> > > > > > > Hi all! 
> > > > > > > 
> > > > > > > I'm bumping this (in case you missed it). Any feedback and questions are welcome!
> > > > > > > 
> > > > > > > Best regards, 
> > > > > > > Łukasz
> > > > > > > 
> > > > > > > Mon, 13 Aug 2018 at 13:51 Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> > > > > > > > Hi Lukasz,
> > > > > > > > 
> > > > > > > > Thanks for the update, and the abstract looks promising.
> > > > > > > > 
> > > > > > > > Let me take a look on the doc.
> > > > > > > > 
> > > > > > > > Regards
> > > > > > > > JB
> > > > > > > > 
> > > > > > > > On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > > > > > > > > Hi all, 
> > > > > > > > > since Synthetic Sources API has been introduced in Java and Python SDK,
> > > > > > > > > it can be used to test some basic Apache Beam operations (i.e.
> > > > > > > > > GroupByKey, CoGroupByKey Combine, ParDo and ParDo with SideInput) in
> > > > > > > > > terms of performance. This, in brief, is why we'd like to share the
> > > > > > > > > below proposal:
> > > > > > > > > 
> > > > > > > > > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
> > > > > > > > > 
> > > > > > > > > Let us know what you think in the document's comments. Thank you in
> > > > > > > > > advance for all the feedback!
> > > > > > > > > 
> > > > > > > > > Łukasz

Re: [PROPOSAL] Test performance of basic Apache Beam operations

Posted by Alexey Romanenko <ar...@gmail.com>.
I agree that we can benefit from having two types of performance tests (low and high level) that could complement each other.
Can we detect a regression (if any) automatically and send a report about that? Sorry if we already do that for Nexmark.
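The automatic check asked about here can be sketched in a few lines: keep the run times from consecutive CI runs and flag the newest one when it exceeds a trailing baseline. This is a minimal illustration only; the window size and threshold are arbitrary assumptions, not anything the Beam CI actually uses.

```python
from statistics import mean

def detect_regression(run_times_ms, window=5, threshold=1.25):
    """Flag a regression when the newest run is `threshold` times slower
    than the average of up to `window` preceding runs.

    `run_times_ms` is ordered oldest-to-newest; returns (is_regression, baseline).
    """
    if len(run_times_ms) < 2:
        return False, None  # not enough history to compare against
    *history, latest = run_times_ms
    baseline = mean(history[-window:])
    return latest > baseline * threshold, baseline

# A 60% jump against a stable ~100 ms baseline is flagged:
flagged, baseline = detect_regression([100, 102, 98, 101, 160])
```

A report step would then only need to email the dashboard link whenever `flagged` is true, instead of relying on a visual check of the graphs.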

> On 11 Sep 2018, at 11:29, Etienne Chauchot <ec...@apache.org> wrote:
> 
> Hi Lukasz,
> 
> Well, having low level byte[] based pure performance tests makes sense. And having a high level realistic model (Nexmark auction system) also makes sense, to avoid testing unrealistic pipelines as you describe.
> 
> Having common code between the 2 seems difficult, as both the architecture and the model are different.
> 
> I'm more concerned about having two CI mechanisms to detect functional/performance regressions.
> Best
> Etienne
> 
> On Monday, 10 September 2018 at 18:33 +0200, Łukasz Gajowy wrote:
>> In my opinion, and as far as I understand Nexmark, there are some benefits to having both types of tests. The load tests we propose can be very straightforward and clearly show what is being tested, thanks to the fact that there's no fixed model but only very "low level" KV<byte[], byte[]> collections. They are more flexible in the shapes of the pipelines they can express, e.g. fanout_64, without having to think about specific use cases. 
>> 
>> Having both types would allow developers to decide whether they want to create a new Nexmark query for their specific case or develop a new load test (whichever is easier and fits their case better). However, there is a risk: with KV<byte[], byte[]>, a developer can overemphasize cases that can never happen in practice, so we need to be careful about the exact configurations we run. 
>> 
>> Still, I can imagine that there will surely be code that should be common to both types of tests, and we will seek ways not to duplicate code.
>> 
>> WDYT? 
>> 
>> Regards, 
>> Łukasz
>> 
>> 
>> 
>> Mon, 10 Sep 2018 at 16:36 Etienne Chauchot <echauchot@apache.org> wrote:
>>> Hi,
>>> It seems that there is a notable overlap with what Nexmark already does:
>>> Nexmark measures performance and regression by exercising all the Beam model in both batch and streaming modes with several runners. It also computes on synthetic data. Also, Nexmark is already included as PostCommits in the CI and dashboards.
>>> 
>>> Shall we merge the two?
>>> 
>>> Best
>>> 
>>> Etienne
>>> 
>>> On Monday, 10 September 2018 at 12:56 +0200, Łukasz Gajowy wrote:
>>>> Hello everyone, 
>>>> 
>>>> thank you for all your comments on the proposal. To sum up: 
>>>> 
>>>> A set of performance tests exercising Core Beam Transforms (ParDo, GroupByKey, CoGroupByKey, Combine) will be implemented for the Java and Python SDKs. Those tests will allow us to:
>>>> - measure performance of the transforms on various runners
>>>> - exercise the transforms by creating stressful conditions and big loads using the Synthetic Source and Synthetic Step API (delays, keeping the CPU busy or asleep, processing large keys and values, performing fanout or reiteration of inputs)
>>>> - run both in batch and streaming contexts
>>>> - gather various metrics
>>>> - notice regressions by comparing data from consecutive Jenkins runs
>>>> Metrics (runtime, consumed bytes, memory usage, split/bundle count) can be gathered during test invocations. We will start with runtime and leverage the Metrics API to collect the other metrics in later phases of development. 
>>>> The tests will be fully configurable through pipeline options, and it will be possible to run any custom scenarios manually. However, a representative set of testing scenarios will be run periodically using Jenkins.
>>>> 
>>>> Regards, 
>>>> Łukasz 
>>>> 
>>>> Wed, 5 Sep 2018 at 20:31 Rafael Fernandez <rfernand@google.com> wrote:
>>>>> neat! left a comment or two
>>>>> 
>>>>> On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <lgajowy@apache.org> wrote:
>>>>>> Hi all! 
>>>>>> 
>>>>>> I'm bumping this (in case you missed it). Any feedback and questions are welcome!
>>>>>> 
>>>>>> Best regards, 
>>>>>> Łukasz
>>>>>> 
>>>>>> Mon, 13 Aug 2018 at 13:51 Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
>>>>>>> Hi Lukasz,
>>>>>>> 
>>>>>>> Thanks for the update, and the abstract looks promising.
>>>>>>> 
>>>>>>> Let me take a look on the doc.
>>>>>>> 
>>>>>>> Regards
>>>>>>> JB
>>>>>>> 
>>>>>>> On 13/08/2018 13:24, Łukasz Gajowy wrote:
>>>>>>> > Hi all, 
>>>>>>> > 
>>>>>>> > since Synthetic Sources API has been introduced in Java and Python SDK,
>>>>>>> > it can be used to test some basic Apache Beam operations (i.e.
>>>>>>> > GroupByKey, CoGroupByKey Combine, ParDo and ParDo with SideInput) in
>>>>>>> > terms of performance. This, in brief, is why we'd like to share the
>>>>>>> > below proposal:
>>>>>>> > 
>>>>>>> > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
>>>>>>> > 
>>>>>>> > Let us know what you think in the document's comments. Thank you in
>>>>>>> > advance for all the feedback!
>>>>>>> > 
>>>>>>> > Łukasz
>>>>>>> 


Re: [PROPOSAL] Test performance of basic Apache Beam operations

Posted by Pablo Estrada <pa...@google.com>.
I really like these. Happy to have them.
Best
-P.

On Fri, Mar 15, 2019 at 11:16 AM Łukasz Gajowy <lg...@apache.org> wrote:

> Hi Beamers,
>
> an update on this. Together with Kasia and Michał, and cooperating closely
> with Pablo, we have created and scheduled a cron job running 7 tests daily
> for GroupByKey batch scenarios. A description of the tests is in the proposal
> [1] and will be documented later. The dashboards for the tests:
>  - showing run times [2]
>  - showing total load size (bytes) [3]
>
> All the metrics are collected using Beam's Metrics API.
>
> Things we have on our horizon:
>  - the same set of tests for Java but in streaming mode
>  - similar jobs for Python SDK
>  - running similar suites on Flink runner
>
> We have also created a set of Dataproc bash scripts that can be used to
> set up a Flink cluster that supports portability [4]. It is ready to use
> and I've already successfully run the word count example using Python SDK
> on it. Hoping + aiming to run load tests on it soon. :)
>
> Last but not least: we also reused some code to collect metrics using the
> Metrics API in TextIOIT, and are willing to make a similar change for
> other IOITs. Dashboards for TextIOIT: [5].
>
> Thanks,
> Łukasz
>
> [1] https://s.apache.org/load-test-basic-operations
> [2]
> https://apache-beam-testing.appspot.com/explore?dashboard=5643144871804928
> [3]
> https://apache-beam-testing.appspot.com/explore?dashboard=5701325169885184
> [4]
> https://github.com/apache/beam/blob/b1ed061fd0c1ed1da562089c939d55715907769d/.test-infra/dataproc/create_flink_cluster.sh
> [5]
> https://apache-beam-testing.appspot.com/explore?dashboard=5629522644828160
>
>
>
> Wed, 12 Sep 2018 at 14:23 Etienne Chauchot <ec...@apache.org> wrote:
>
>> Let me elaborate a bit on my last sentence.
>> On Tuesday, 11 September 2018 at 11:29 +0200, Etienne Chauchot wrote:
>>
>> Hi Lukasz,
>>
>> Well, having low level byte[] based pure performance tests makes sense.
>> And having high level realistic model (Nexmark auction system) makes sense
>> also to avoid testing unrealistic pipelines as you describe.
>>
>> Having common code between the 2 seems difficult, as both the architecture
>> and the model are different.
>>
>> I'm more concerned about having two CI mechanisms to detect
>> functional/performance regressions.
>>
>>
>> Even if parts of Nexmark and the performance tests are the same, they could
>> target different objectives: raw performance tests (the new framework) and
>> user-oriented tests (Nexmark). So they might be complementary.
>>
>> We must just choose how to run them. I think we need to have only one
>> automatic regression detection tool. IMHO, the most relevant for func/perf
>> regression is Nexmark, because it represents what a real user could do (it
>> simulates an auction system). So let's keep it as post-commits. Post-commits
>> allow us to target a particular commit that introduced a regression.
>>
>> We could schedule the new performance tests.
>>
>> Best
>> Etienne
>>
>>
>> Best
>> Etienne
>>
>> On Monday, 10 September 2018 at 18:33 +0200, Łukasz Gajowy wrote:
>>
>> In my opinion, and as far as I understand Nexmark, there are some benefits
>> to having both types of tests. The load tests we propose can be very
>> straightforward and clearly show what is being tested, thanks to the fact
>> that there's no fixed model but only very "low level" KV<byte[], byte[]>
>> collections. They are more flexible in the shapes of the pipelines they
>> can express, e.g. fanout_64, without having to think about specific use
>> cases.
>>
>> Having both types would allow developers to decide whether they want to
>> create a new Nexmark query for their specific case or develop a new load
>> test (whichever is easier and fits their case better). However, there is a
>> risk: with KV<byte[], byte[]>, a developer can overemphasize cases that can
>> never happen in practice, so we need to be careful about the exact
>> configurations we run.
>>
>> Still, I can imagine that there will surely be code that should be common
>> to both types of tests, and we will seek ways not to duplicate code.
>>
>> WDYT?
>>
>> Regards,
>> Łukasz
>>
>>
>>
>> Mon, 10 Sep 2018 at 16:36 Etienne Chauchot <ec...@apache.org> wrote:
>>
>> Hi,
>> It seems that there is a notable overlap with what Nexmark already does:
>> Nexmark measures performance and regression by exercising all the Beam
>> model in both batch and streaming modes with several runners. It also
>> computes on synthetic data. Also nexmark is already included as PostCommits
>> in the CI and dashboards.
>>
>> Shall we merge the two?
>>
>> Best
>>
>> Etienne
>>
>> On Monday, 10 September 2018 at 12:56 +0200, Łukasz Gajowy wrote:
>>
>> Hello everyone,
>>
>> thank you for all your comments on the proposal. To sum up:
>>
>> A set of performance tests exercising Core Beam Transforms (ParDo,
>> GroupByKey, CoGroupByKey, Combine) will be implemented for Java and Python
>> SDKs. Those tests will allow us to:
>>
>>    - measure performance of the transforms on various runners
>>    - exercise the transforms by creating stressful conditions and big
>>    loads using Synthetic Source and Synthetic Step API (delays, keeping cpu
>>    busy or asleep, processing large keys and values, performing fanout or
>>    reiteration of inputs)
>>    - run both in batch and streaming context
>>    - gather various metrics
>>    - notice regressions by comparing data from consecutive Jenkins runs
>>
>> Metrics (runtime, consumed bytes, memory usage, split/bundle count) can
>> be gathered during test invocations. We will start with runtime and
>> leverage Metrics API to collect the other metrics in later phases of
>> development.
>> The tests will be fully configurable through pipeline options and it will
>> be possible to run any custom scenarios manually. However, a representative
>> set of testing scenarios will be run periodically using Jenkins.
>>
>> Regards,
>> Łukasz
>>
>> Wed, 5 Sep 2018 at 20:31 Rafael Fernandez <rf...@google.com> wrote:
>>
>> neat! left a comment or two
>>
>> On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <lg...@apache.org> wrote:
>>
>> Hi all!
>>
>> I'm bumping this (in case you missed it). Any feedback and questions are
>> welcome!
>>
>> Best regards,
>> Łukasz
>>
>> Mon, 13 Aug 2018 at 13:51 Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
>>
>> Hi Lukasz,
>>
>> Thanks for the update, and the abstract looks promising.
>>
>> Let me take a look on the doc.
>>
>> Regards
>> JB
>>
>> On 13/08/2018 13:24, Łukasz Gajowy wrote:
>> > Hi all,
>> >
>> > since Synthetic Sources API has been introduced in Java and Python SDK,
>> > it can be used to test some basic Apache Beam operations (i.e.
>> > GroupByKey, CoGroupByKey Combine, ParDo and ParDo with SideInput) in
>> > terms of performance. This, in brief, is why we'd like to share the
>> > below proposal:
>> >
>> > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
>> >
>> > Let us know what you think in the document's comments. Thank you in
>> > advance for all the feedback!
>> >
>> > Łukasz
>>
>>

Re: [PROPOSAL] Test performance of basic Apache Beam operations

Posted by Łukasz Gajowy <lg...@apache.org>.
Hi Beamers,

an update on this. Together with Kasia and Michał, and cooperating closely with
Pablo, we have created and scheduled a cron job running 7 tests daily for
GroupByKey batch scenarios. A description of the tests is in the proposal [1]
and will be documented later. The dashboards for the tests:
 - showing run times [2]
 - showing total load size (bytes) [3]

All the metrics are collected using Beam's Metrics API.
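The collection step described above can be thought of as querying the named metrics after a run finishes and flattening them into timestamped rows for the dashboard backend. A hypothetical sketch of that post-run step follows; the record shape, test id, and metric names are invented for illustration and are not Beam's actual classes or schema:

```python
import time

def metrics_to_rows(test_id, metric_values, now=None):
    """Flatten a {metric_name: value} mapping gathered after a test run
    into timestamped rows, one per metric, ready to load into a
    dashboard backend."""
    timestamp = int(now if now is not None else time.time())
    return [
        {"test_id": test_id, "metric": name, "value": value, "timestamp": timestamp}
        for name, value in sorted(metric_values.items())
    ]

# Illustrative values only: one run of a GroupByKey batch scenario.
rows = metrics_to_rows(
    "java_load_tests_GBK_batch_1",
    {"runtime_ms": 5321, "total_bytes": 10_000_000_000},
    now=1552668000,
)
```

Storing one row per metric per run is what makes the dashboards above, and any automated run-over-run comparison, straightforward to build on top.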

Things we have on our horizon:
 - the same set of tests for Java but in streaming mode
 - similar jobs for Python SDK
 - running similar suites on Flink runner

We have also created a set of Dataproc bash scripts that can be used to set
up a Flink cluster that supports portability [4]. It is ready to use, and
I've already successfully run the word count example on it using the Python
SDK. We are hoping and aiming to run load tests on it soon. :)

Last but not least: we also reused some code to collect metrics using the
Metrics API in TextIOIT, and are willing to make a similar change for other
IOITs. Dashboards for TextIOIT: [5].

Thanks,
Łukasz

[1] https://s.apache.org/load-test-basic-operations
[2]
https://apache-beam-testing.appspot.com/explore?dashboard=5643144871804928
[3]
https://apache-beam-testing.appspot.com/explore?dashboard=5701325169885184
[4]
https://github.com/apache/beam/blob/b1ed061fd0c1ed1da562089c939d55715907769d/.test-infra/dataproc/create_flink_cluster.sh
[5]
https://apache-beam-testing.appspot.com/explore?dashboard=5629522644828160
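As a toy model of the load knobs discussed throughout this thread (synthetic KV<byte[], byte[]> records of configurable key/value sizes, plus a fanout step that multiplies each element downstream), the following sketch is purely illustrative and is not the SDKs' actual Synthetic Source/Step code:

```python
import random

def synthetic_records(num_records, key_size, value_size, seed=None):
    """Yield (key, value) byte-string pairs of the configured sizes,
    mimicking the KV<byte[], byte[]> elements a synthetic source emits."""
    rng = random.Random(seed)  # seeded for reproducible test data
    for _ in range(num_records):
        key = bytes(rng.getrandbits(8) for _ in range(key_size))
        value = bytes(rng.getrandbits(8) for _ in range(value_size))
        yield key, value

def fanout(records, factor):
    """Mimic a fanout step: emit each input record `factor` times."""
    for record in records:
        for _ in range(factor):
            yield record

# 10 inputs with an 8-byte key and 1 KiB value, fanned out 64x:
records = list(
    fanout(synthetic_records(10, key_size=8, value_size=1024, seed=42), factor=64)
)
```

This is the sense in which the thread calls such tests "low level": the pipeline shape (element count, sizes, fanout) is dialed in directly, with no domain model in between.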


Wed, 12 Sep 2018 at 14:23 Etienne Chauchot <ec...@apache.org> wrote:

> Let me elaborate a bit on my last sentence.
> On Tuesday, 11 September 2018 at 11:29 +0200, Etienne Chauchot wrote:
>
> Hi Lukasz,
>
> Well, having low level byte[] based pure performance tests makes sense.
> And having high level realistic model (Nexmark auction system) makes sense
> also to avoid testing unrealistic pipelines as you describe.
>
> Having common code between the 2 seems difficult, as both the architecture
> and the model are different.
>
> I'm more concerned about having two CI mechanisms to detect
> functional/performance regressions.
>
>
> Even if parts of Nexmark and the performance tests are the same, they could
> target different objectives: raw performance tests (the new framework) and
> user-oriented tests (Nexmark). So they might be complementary.
>
> We must just choose how to run them. I think we need to have only one
> automatic regression detection tool. IMHO, the most relevant for func/perf
> regression is Nexmark, because it represents what a real user could do (it
> simulates an auction system). So let's keep it as post-commits. Post-commits
> allow us to target a particular commit that introduced a regression.
>
> We could schedule the new performance tests.
>
> Best
> Etienne
>
>
> Best
> Etienne
>
> On Monday, 10 September 2018 at 18:33 +0200, Łukasz Gajowy wrote:
>
> In my opinion, and as far as I understand Nexmark, there are some benefits
> to having both types of tests. The load tests we propose can be very
> straightforward and clearly show what is being tested, thanks to the fact
> that there's no fixed model but only very "low level" KV<byte[], byte[]>
> collections. They are more flexible in the shapes of the pipelines they
> can express, e.g. fanout_64, without having to think about specific use
> cases.
>
> Having both types would allow developers to decide whether they want to
> create a new Nexmark query for their specific case or develop a new load
> test (whichever is easier and fits their case better). However, there is a
> risk: with KV<byte[], byte[]>, a developer can overemphasize cases that can
> never happen in practice, so we need to be careful about the exact
> configurations we run.
>
> Still, I can imagine that there will surely be code that should be common
> to both types of tests, and we will seek ways not to duplicate code.
>
> WDYT?
>
> Regards,
> Łukasz
>
>
>
> Mon, 10 Sep 2018 at 16:36 Etienne Chauchot <ec...@apache.org> wrote:
>
> Hi,
> It seems that there is a notable overlap with what Nexmark already does:
> Nexmark measures performance and regression by exercising all the Beam
> model in both batch and streaming modes with several runners. It also
> computes on synthetic data. Also nexmark is already included as PostCommits
> in the CI and dashboards.
>
> Shall we merge the two?
>
> Best
>
> Etienne
>
> On Monday, 10 September 2018 at 12:56 +0200, Łukasz Gajowy wrote:
>
> Hello everyone,
>
> thank you for all your comments on the proposal. To sum up:
>
> A set of performance tests exercising Core Beam Transforms (ParDo,
> GroupByKey, CoGroupByKey, Combine) will be implemented for Java and Python
> SDKs. Those tests will allow us to:
>
>    - measure performance of the transforms on various runners
>    - exercise the transforms by creating stressful conditions and big
>    loads using Synthetic Source and Synthetic Step API (delays, keeping cpu
>    busy or asleep, processing large keys and values, performing fanout or
>    reiteration of inputs)
>    - run both in batch and streaming context
>    - gather various metrics
>    - notice regressions by comparing data from consecutive Jenkins runs
>
> Metrics (runtime, consumed bytes, memory usage, split/bundle count) can be
> gathered during test invocations. We will start with runtime and leverage
> Metrics API to collect the other metrics in later phases of development.
> The tests will be fully configurable through pipeline options and it will
> be possible to run any custom scenarios manually. However, a representative
> set of testing scenarios will be run periodically using Jenkins.
>
> Regards,
> Łukasz
>
> Wed, 5 Sep 2018 at 20:31 Rafael Fernandez <rf...@google.com> wrote:
>
> neat! left a comment or two
>
> On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <lg...@apache.org> wrote:
>
> Hi all!
>
> I'm bumping this (in case you missed it). Any feedback and questions are
> welcome!
>
> Best regards,
> Łukasz
>
> Mon, 13 Aug 2018 at 13:51 Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
>
> Hi Lukasz,
>
> Thanks for the update, and the abstract looks promising.
>
> Let me take a look on the doc.
>
> Regards
> JB
>
> On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > Hi all,
> >
> > since Synthetic Sources API has been introduced in Java and Python SDK,
> > it can be used to test some basic Apache Beam operations (i.e.
> > GroupByKey, CoGroupByKey Combine, ParDo and ParDo with SideInput) in
> > terms of performance. This, in brief, is why we'd like to share the
> > below proposal:
> >
> > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
> >
> > Let us know what you think in the document's comments. Thank you in
> > advance for all the feedback!
> >
> > Łukasz
>
>

Re: [PROPOSAL] Test performance of basic Apache Beam operations

Posted by Etienne Chauchot <ec...@apache.org>.
Let me elaborate a bit on my last sentence.
On Tuesday, 11 September 2018 at 11:29 +0200, Etienne Chauchot wrote:
> Hi Lukasz,
> 
> Well, having low level byte[] based pure performance tests makes sense. And having high level realistic model (Nexmark
> auction system) makes sense also to avoid testing unrealistic pipelines as you describe.
> 
> Having common code between the 2 seems difficult, as both the architecture and the model are different.
> 
> I'm more concerned about having two CI mechanisms to detect functional/performance regressions.

Even if parts of Nexmark and the performance tests are the same, they could target different objectives: raw performance
tests (the new framework) and user-oriented tests (Nexmark). So they might be complementary.
We must just choose how to run them. I think we need to have only one automatic regression detection tool. IMHO, the
most relevant for func/perf regression is Nexmark, because it represents what a real user could do (it simulates an
auction system). So let's keep it as post-commits. Post-commits allow us to target a particular commit that introduced a
regression.
We could schedule the new performance tests.
Best
Etienne

> Best
> Etienne
> On Monday, 10 September 2018 at 18:33 +0200, Łukasz Gajowy wrote:
> > In my opinion, and as far as I understand Nexmark, there are some benefits to having both types of tests. The load
> > tests we propose can be very straightforward and clearly show what is being tested, thanks to the fact that there's
> > no fixed model but only very "low level" KV<byte[], byte[]> collections. They are more flexible in the shapes of the
> > pipelines they can express, e.g. fanout_64, without having to think about specific use cases. 
> > 
> > Having both types would allow developers to decide whether they want to create a new Nexmark query for their
> > specific case or develop a new load test (whichever is easier and fits their case better). However, there is a risk:
> > with KV<byte[], byte[]>, a developer can overemphasize cases that can never happen in practice, so we need to be
> > careful about the exact configurations we run. 
> > 
> > Still, I can imagine that there will surely be code that should be common to both types of tests, and we will seek
> > ways not to duplicate code.
> > 
> > WDYT? 
> > 
> > Regards, 
> > Łukasz
> > 
> > 
> > 
> > Mon, 10 Sep 2018 at 16:36 Etienne Chauchot <ec...@apache.org> wrote:
> > > Hi,
> > > It seems that there is a notable overlap with what Nexmark already does:
> > > Nexmark measures performance and regression by exercising all the Beam model in both batch and streaming modes
> > > with several runners. It also computes on synthetic data. Also, Nexmark is already included as PostCommits in the
> > > CI and dashboards.
> > > Shall we merge the two?
> > > Best
> > > Etienne
> > > On Monday, 10 September 2018 at 12:56 +0200, Łukasz Gajowy wrote:
> > > > Hello everyone, 
> > > > 
> > > > thank you for all your comments on the proposal. To sum up: 
> > > > 
> > > > A set of performance tests exercising Core Beam Transforms (ParDo, GroupByKey, CoGroupByKey, Combine) will be
> > > > implemented for the Java and Python SDKs. Those tests will allow us to:
> > > > - measure performance of the transforms on various runners
> > > > - exercise the transforms by creating stressful conditions and big loads using the Synthetic Source and Synthetic
> > > >   Step API (delays, keeping the CPU busy or asleep, processing large keys and values, performing fanout or
> > > >   reiteration of inputs)
> > > > - run both in batch and streaming contexts
> > > > - gather various metrics
> > > > - notice regressions by comparing data from consecutive Jenkins runs
> > > > Metrics (runtime, consumed bytes, memory usage, split/bundle count) can be gathered during test invocations. We
> > > > will start with runtime and leverage the Metrics API to collect the other metrics in later phases of development.
> > > > The tests will be fully configurable through pipeline options, and it will be possible to run any custom
> > > > scenarios manually. However, a representative set of testing scenarios will be run periodically using Jenkins.
> > > > 
> > > > Regards, 
> > > > Łukasz 
> > > > 
> > > > Wed, 5 Sep 2018 at 20:31 Rafael Fernandez <rf...@google.com> wrote:
> > > > > neat! left a comment or two
> > > > > 
> > > > > On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <lg...@apache.org> wrote:
> > > > > > Hi all! 
> > > > > > 
> > > > > > I'm bumping this (in case you missed it). Any feedback and questions are welcome!
> > > > > > 
> > > > > > Best regards, 
> > > > > > Łukasz
> > > > > > 
> > > > > > Mon, 13 Aug 2018 at 13:51 Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> > > > > > > Hi Lukasz,
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Thanks for the update, and the abstract looks promising.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Let me take a look on the doc.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Regards
> > > > > > > 
> > > > > > > JB
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > > > > > > 
> > > > > > > > Hi all, 
> > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > > since Synthetic Sources API has been introduced in Java and Python SDK,
> > > > > > > 
> > > > > > > > it can be used to test some basic Apache Beam operations (i.e.
> > > > > > > 
> > > > > > > > GroupByKey, CoGroupByKey, Combine, ParDo and ParDo with SideInput) in
> > > > > > > 
> > > > > > > > terms of performance. This, in brief, is why we'd like to share the
> > > > > > > 
> > > > > > > > below proposal:
> > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > > _https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing_
> > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > > Let us know what you think in the document's comments. Thank you in
> > > > > > > 
> > > > > > > > advance for all the feedback!
> > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > > Łukasz
> > > > > > > 
> > > > > > > 
> > > > > > > 

Re: [PROPOSAL] Test performance of basic Apache Beam operations

Posted by Etienne Chauchot <ec...@apache.org>.
Hi Lukasz,
Well, having low-level byte[]-based pure performance tests makes sense. And having a high-level realistic model (Nexmark
auction system) also makes sense, to avoid testing unrealistic pipelines as you describe.
Having common code between the two seems difficult, as both the architecture and the model are different.
I'm more concerned about having two CI mechanisms to detect functional/performance regressions.
Best
Etienne
Le lundi 10 septembre 2018 à 18:33 +0200, Łukasz Gajowy a écrit :
> In my opinion and as far as I understand Nexmark, there are some benefits to having both types of tests. The load
> tests we propose can be very straightforward and clearly show what is being tested thanks to the fact that there's no
> fixed model but very "low level" KV<byte[], byte[]> collections only. They are more flexible in the shapes of
> pipelines they can express, e.g. fanout_64, without having to think about specific use cases. 
> 
> Having both types would allow developers to decide whether they want to create a new Nexmark query for their specific
> case or develop a new Load test (whichever is easier and better fits their case). However, there is a risk - with
> KV<byte[], byte[]> a developer can overemphasize cases that can never happen in practice, so we need to be careful about
> the exact configurations we run. 
> 
> Still, I can imagine that there surely will be code that should be common to both types of tests, and we will seek
> ways to avoid duplicating code.
> 
> WDYT? 
> 
> Regards, 
> Łukasz
> 
> 
> 
> pon., 10 wrz 2018 o 16:36 Etienne Chauchot <ec...@apache.org> napisał(a):
> > Hi, It seems that there is a notable overlap with what Nexmark already does: Nexmark measures performance and
> > regressions by exercising all the Beam model in both batch and streaming modes with several runners. It also
> > computes on synthetic data. Also, Nexmark is already included as a PostCommit in the CI and dashboards.
> > Shall we merge the two?
> > Best
> > Etienne
> > Le lundi 10 septembre 2018 à 12:56 +0200, Łukasz Gajowy a écrit :
> > > Hello everyone, 
> > > 
> > > thank you for all your comments on the proposal. To sum up: 
> > > 
> > > A set of performance tests exercising Core Beam Transforms (ParDo, GroupByKey, CoGroupByKey, Combine) will be
> > > implemented for Java and Python SDKs. Those tests will allow us to: 
> > > measure performance of the transforms on various runners
> > > exercise the transforms by creating stressful conditions and big loads using Synthetic Source and Synthetic Step
> > > API (delays, keeping cpu busy or asleep, processing large keys and values, performing fanout or reiteration of
> > > inputs)
> > > run both in batch and streaming context
> > > gather various metrics
> > > notice regressions by comparing data from consecutive Jenkins runs  
> > > Metrics (runtime, consumed bytes, memory usage, split/bundle count) can be gathered during test invocations. We
> > > will start with runtime and leverage Metrics API to collect the other metrics in later phases of development. 
> > > The tests will be fully configurable through pipeline options and it will be possible to run any custom scenarios
> > > manually. However, a representative set of testing scenarios will be run periodically using Jenkins.
> > > 
> > > Regards, 
> > > Łukasz 
> > > 
> > > śr., 5 wrz 2018 o 20:31 Rafael Fernandez <rf...@google.com> napisał(a):
> > > > neat! left a comment or two
> > > > 
> > > > On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <lg...@apache.org> wrote:
> > > > > Hi all! 
> > > > > 
> > > > > I'm bumping this (in case you missed it). Any feedback and questions are welcome!
> > > > > 
> > > > > Best regards, 
> > > > > Łukasz
> > > > > 
> > > > > pon., 13 sie 2018 o 13:51 Jean-Baptiste Onofré <jb...@nanthrax.net> napisał(a):
> > > > > > Hi Lukasz,
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Thanks for the update, and the abstract looks promising.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Let me take a look on the doc.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Regards
> > > > > > 
> > > > > > JB
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > > > > > 
> > > > > > > Hi all, 
> > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > > since Synthetic Sources API has been introduced in Java and Python SDK,
> > > > > > 
> > > > > > > it can be used to test some basic Apache Beam operations (i.e.
> > > > > > 
> > > > > > > GroupByKey, CoGroupByKey, Combine, ParDo and ParDo with SideInput) in
> > > > > > 
> > > > > > > terms of performance. This, in brief, is why we'd like to share the
> > > > > > 
> > > > > > > below proposal:
> > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > > _https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing_
> > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > > Let us know what you think in the document's comments. Thank you in
> > > > > > 
> > > > > > > advance for all the feedback!
> > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > > Łukasz
> > > > > > 
> > > > > > 
> > > > > > 

Re: [PROPOSAL] Test performance of basic Apache Beam operations

Posted by Łukasz Gajowy <lg...@apache.org>.
In my opinion and as far as I understand Nexmark, there are some benefits
to having both types of tests. The load tests we propose can be very
straightforward and clearly show what is being tested thanks to the fact
that there's no fixed model but very "low level" KV<byte[], byte[]>
collections only. They are more flexible in the shapes of pipelines they
can express, e.g. fanout_64, without having to think about specific use
cases.

Having both types would allow developers to decide whether they want to
create a new Nexmark query for their specific case or develop a new Load
test (whichever is easier and better fits their case). However, there is a
risk - with KV<byte[], byte[]> a developer can overemphasize cases that can
never happen in practice, so we need to be careful about the exact
configurations we run.

Still, I can imagine that there surely will be code that should be common
to both types of tests, and we will seek ways to avoid duplicating code.
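To make the KV<byte[], byte[]> flexibility concrete, a fanout step (as in the fanout_64 scenario mentioned above) can be sketched in plain Python. The `fanout` helper below is hypothetical and only illustrates the load shape; it is not the actual Synthetic Step API:

```python
import os

# Hypothetical sketch of a fanout step over low-level KV<byte[], byte[]>
# records (not the actual Beam Synthetic Step API).
def fanout(record, factor):
    """Emit `factor` copies of a (key, value) pair, each with a distinct
    one-byte key suffix, mimicking a fanout_64-style load shape."""
    key, value = record
    return [(key + bytes([i % 256]), value) for i in range(factor)]

# One synthetic input record: 8-byte random key, 16-byte random value.
record = (os.urandom(8), os.urandom(16))
expanded = fanout(record, 64)
```

Because the records carry no domain model, the same shape can stress any transform under test, which is exactly what makes such scenarios easy to overemphasize if the configurations are not chosen carefully.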

WDYT?

Regards,
Łukasz



pon., 10 wrz 2018 o 16:36 Etienne Chauchot <ec...@apache.org>
napisał(a):

> Hi,
> It seems that there is a notable overlap with what Nexmark already does:
> Nexmark measures performance and regressions by exercising all the Beam
> model in both batch and streaming modes with several runners. It also
> computes on synthetic data. Also, Nexmark is already included as a PostCommit
> in the CI and dashboards.
>
> Shall we merge the two?
>
> Best
>
> Etienne
>
> Le lundi 10 septembre 2018 à 12:56 +0200, Łukasz Gajowy a écrit :
>
> Hello everyone,
>
> thank you for all your comments on the proposal. To sum up:
>
> A set of performance tests exercising Core Beam Transforms (ParDo,
> GroupByKey, CoGroupByKey, Combine) will be implemented for Java and Python
> SDKs. Those tests will allow us to:
>
>    - measure performance of the transforms on various runners
>    - exercise the transforms by creating stressful conditions and big
>    loads using Synthetic Source and Synthetic Step API (delays, keeping cpu
>    busy or asleep, processing large keys and values, performing fanout or
>    reiteration of inputs)
>    - run both in batch and streaming context
>    - gather various metrics
>    - notice regressions by comparing data from consecutive Jenkins runs
>
> Metrics (runtime, consumed bytes, memory usage, split/bundle count) can be
> gathered during test invocations. We will start with runtime and leverage
> Metrics API to collect the other metrics in later phases of development.
> The tests will be fully configurable through pipeline options and it will
> be possible to run any custom scenarios manually. However, a representative
> set of testing scenarios will be run periodically using Jenkins.
>
> Regards,
> Łukasz
>
> śr., 5 wrz 2018 o 20:31 Rafael Fernandez <rf...@google.com> napisał(a):
>
> neat! left a comment or two
>
> On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <lg...@apache.org> wrote:
>
> Hi all!
>
> I'm bumping this (in case you missed it). Any feedback and questions are
> welcome!
>
> Best regards,
> Łukasz
>
> pon., 13 sie 2018 o 13:51 Jean-Baptiste Onofré <jb...@nanthrax.net>
> napisał(a):
>
> Hi Lukasz,
>
> Thanks for the update, and the abstract looks promising.
>
> Let me take a look on the doc.
>
> Regards
> JB
>
> On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > Hi all,
> >
> > since Synthetic Sources API has been introduced in Java and Python SDK,
> > it can be used to test some basic Apache Beam operations (i.e.
> > GroupByKey, CoGroupByKey, Combine, ParDo and ParDo with SideInput) in
> > terms of performance. This, in brief, is why we'd like to share the
> > below proposal:
> >
> > _
> https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing_
> >
> > Let us know what you think in the document's comments. Thank you in
> > advance for all the feedback!
> >
> > Łukasz
>
>

Re: [PROPOSAL] Test performance of basic Apache Beam operations

Posted by Etienne Chauchot <ec...@apache.org>.
Hi,
It seems that there is a notable overlap with what Nexmark already does: Nexmark measures performance and regressions by
exercising all the Beam model in both batch and streaming modes with several runners. It also computes on synthetic
data. Also, Nexmark is already included as a PostCommit in the CI and dashboards.
Shall we merge the two?
Best
Etienne
Le lundi 10 septembre 2018 à 12:56 +0200, Łukasz Gajowy a écrit :
> Hello everyone, 
> 
> thank you for all your comments on the proposal. To sum up: 
> 
> A set of performance tests exercising Core Beam Transforms (ParDo, GroupByKey, CoGroupByKey, Combine) will be
> implemented for Java and Python SDKs. Those tests will allow us to: 
> measure performance of the transforms on various runners
> exercise the transforms by creating stressful conditions and big loads using Synthetic Source and Synthetic Step API
> (delays, keeping cpu busy or asleep, processing large keys and values, performing fanout or reiteration of inputs)
> run both in batch and streaming context
> gather various metrics
> notice regressions by comparing data from consecutive Jenkins runs  
> Metrics (runtime, consumed bytes, memory usage, split/bundle count) can be gathered during test invocations. We will
> start with runtime and leverage Metrics API to collect the other metrics in later phases of development. 
> The tests will be fully configurable through pipeline options and it will be possible to run any custom scenarios
> manually. However, a representative set of testing scenarios will be run periodically using Jenkins.
> 
> Regards, 
> Łukasz 
> 
> śr., 5 wrz 2018 o 20:31 Rafael Fernandez <rf...@google.com> napisał(a):
> > neat! left a comment or two
> > 
> > On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <lg...@apache.org> wrote:
> > > Hi all! 
> > > 
> > > I'm bumping this (in case you missed it). Any feedback and questions are welcome!
> > > 
> > > Best regards, 
> > > Łukasz
> > > 
> > > pon., 13 sie 2018 o 13:51 Jean-Baptiste Onofré <jb...@nanthrax.net> napisał(a):
> > > > Hi Lukasz,
> > > > 
> > > > 
> > > > 
> > > > Thanks for the update, and the abstract looks promising.
> > > > 
> > > > 
> > > > 
> > > > Let me take a look on the doc.
> > > > 
> > > > 
> > > > 
> > > > Regards
> > > > 
> > > > JB
> > > > 
> > > > 
> > > > 
> > > > On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > > > 
> > > > > Hi all, 
> > > > 
> > > > > 
> > > > 
> > > > > since Synthetic Sources API has been introduced in Java and Python SDK,
> > > > 
> > > > > it can be used to test some basic Apache Beam operations (i.e.
> > > > 
> > > > > GroupByKey, CoGroupByKey, Combine, ParDo and ParDo with SideInput) in
> > > > 
> > > > > terms of performance. This, in brief, is why we'd like to share the
> > > > 
> > > > > below proposal:
> > > > 
> > > > > 
> > > > 
> > > > > _https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing_
> > > > 
> > > > > 
> > > > 
> > > > > Let us know what you think in the document's comments. Thank you in
> > > > 
> > > > > advance for all the feedback!
> > > > 
> > > > > 
> > > > 
> > > > > Łukasz
> > > > 
> > > > 
> > > > 

Re: [PROPOSAL] Test performance of basic Apache Beam operations

Posted by Łukasz Gajowy <lu...@gmail.com>.
Hello everyone,

thank you for all your comments on the proposal. To sum up:

A set of performance tests exercising Core Beam Transforms (ParDo,
GroupByKey, CoGroupByKey, Combine) will be implemented for Java and Python
SDKs. Those tests will allow us to:

   - measure performance of the transforms on various runners
   - exercise the transforms by creating stressful conditions and big loads
   using Synthetic Source and Synthetic Step API (delays, keeping cpu busy or
   asleep, processing large keys and values, performing fanout or reiteration
   of inputs)
   - run both in batch and streaming context
   - gather various metrics
   - notice regressions by comparing data from consecutive Jenkins runs

Metrics (runtime, consumed bytes, memory usage, split/bundle count) can be
gathered during test invocations. We will start with runtime and leverage
Metrics API to collect the other metrics in later phases of development.
The tests will be fully configurable through pipeline options and it will
be possible to run any custom scenarios manually. However, a representative
set of testing scenarios will be run periodically using Jenkins.
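To illustrate the kind of load described above, here is a plain-Python sketch of a synthetic source. It is hypothetical (the real Synthetic Source API lives in the Java and Python SDKs and is configured through pipeline options); it only shows how KV<byte[], byte[]> records of configurable sizes, with an optional per-record delay, can form a workload:

```python
import os
import time

# Hypothetical plain-Python sketch of a synthetic source (not the actual
# Beam Synthetic Source API): emits KV<byte[], byte[]> records of
# configurable sizes and can sleep per record to simulate a slow step.
def synthetic_records(num_records, key_size, value_size, delay_per_record=0.0):
    for _ in range(num_records):
        if delay_per_record:
            time.sleep(delay_per_record)  # keeps the worker "asleep" per record
        yield (os.urandom(key_size), os.urandom(value_size))

# 100 records with 10-byte keys and 90-byte values, no artificial delay.
records = list(synthetic_records(num_records=100, key_size=10, value_size=90))
```

Feeding such records into a GroupByKey or Combine, while varying the sizes, the delay, and the key distribution, is the essence of the proposed stress scenarios.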

Regards,
Łukasz

śr., 5 wrz 2018 o 20:31 Rafael Fernandez <rf...@google.com> napisał(a):

> neat! left a comment or two
>
> On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <lg...@apache.org> wrote:
>
>> Hi all!
>>
>> I'm bumping this (in case you missed it). Any feedback and questions are
>> welcome!
>>
>> Best regards,
>> Łukasz
>>
>> pon., 13 sie 2018 o 13:51 Jean-Baptiste Onofré <jb...@nanthrax.net>
>> napisał(a):
>>
>>> Hi Lukasz,
>>>
>>> Thanks for the update, and the abstract looks promising.
>>>
>>> Let me take a look on the doc.
>>>
>>> Regards
>>> JB
>>>
>>> On 13/08/2018 13:24, Łukasz Gajowy wrote:
>>> > Hi all,
>>> >
>>> > since Synthetic Sources API has been introduced in Java and Python SDK,
>>> > it can be used to test some basic Apache Beam operations (i.e.
>>> > GroupByKey, CoGroupByKey, Combine, ParDo and ParDo with SideInput) in
>>> > terms of performance. This, in brief, is why we'd like to share the
>>> > below proposal:
>>> >
>>> > _
>>> https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing_
>>> >
>>> > Let us know what you think in the document's comments. Thank you in
>>> > advance for all the feedback!
>>> >
>>> > Łukasz
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbonofre@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>

Re: [PROPOSAL] Test performance of basic Apache Beam operations

Posted by Rafael Fernandez <rf...@google.com>.
neat! left a comment or two

On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <lg...@apache.org> wrote:

> Hi all!
>
> I'm bumping this (in case you missed it). Any feedback and questions are
> welcome!
>
> Best regards,
> Łukasz
>
> pon., 13 sie 2018 o 13:51 Jean-Baptiste Onofré <jb...@nanthrax.net>
> napisał(a):
>
>> Hi Lukasz,
>>
>> Thanks for the update, and the abstract looks promising.
>>
>> Let me take a look on the doc.
>>
>> Regards
>> JB
>>
>> On 13/08/2018 13:24, Łukasz Gajowy wrote:
>> > Hi all,
>> >
>> > since Synthetic Sources API has been introduced in Java and Python SDK,
>> > it can be used to test some basic Apache Beam operations (i.e.
>> > GroupByKey, CoGroupByKey, Combine, ParDo and ParDo with SideInput) in
>> > terms of performance. This, in brief, is why we'd like to share the
>> > below proposal:
>> >
>> > _
>> https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing_
>> >
>> > Let us know what you think in the document's comments. Thank you in
>> > advance for all the feedback!
>> >
>> > Łukasz
>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>