Posted to dev@beam.apache.org by Pei HE <pe...@apache.org> on 2017/09/15 04:33:30 UTC

Re: [DISCUSSION] using NexMark for Beam

Could any Googlers help to run NexMark on Dataflow streaming and share the
numbers with the community?
--
Pei

On Fri, Aug 25, 2017 at 11:28 PM, Lukasz Cwik <lc...@google.com.invalid>
wrote:

> Etienne, cut some JIRAs for improvements like ValidatesRunner for the
> Nexmark suite that you think are worthy. Some of them might be good
> 'starter' tasks as well.
>
> On Fri, Aug 25, 2017 at 1:43 AM, Etienne Chauchot <ec...@gmail.com>
> wrote:
>
> > Hi guys,
> >
> > There are also some points to discuss:
> >
> > - I think some of the tests in this test suite should be generalized as
> > ValidatesRunner tests, as was done for example for custom window
> > merging
> > (https://github.com/apache/beam/blob/5181e619f17e1f69fabe8d5bdfc7a3a6a2142cde/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java#L591)
> >
> > - We have run almost no tests on Dataflow, so if someone could run the
> > test suite on Dataflow, that would be very welcome. All the needed
> > information is still in the README, but I'll move it to the website.
> >
> > - other points?
> >
> > WDYT?
> >
> > Best,
> >
> > Etienne
> >
> >
> >
> > On 24/08/2017 at 18:35, Lukasz Cwik wrote:
> >
> >> Yeah, was looking forward to this.
> >>
> >> On Thu, Aug 24, 2017 at 9:20 AM, Tyler Akidau <takidau@google.com.invalid>
> >> wrote:
> >>
> >>> Awesome news, thank you! :-D
> >>>
> >>> On Thu, Aug 24, 2017 at 12:40 AM Etienne Chauchot <echauchot@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I wanted to let you know that the Nexmark PR is merged into master. Feel
> >>>> free to use it (e.g. performance testing, release testing ...).
> >>>>
> >>>> Etienne
> >>>>
> >>>> On 12/05/2017 at 10:55, Etienne Chauchot wrote:
> >>>>
> >>>>> Hi guys,
> >>>>>
> >>>>> I wanted to let you know that I have just submitted a PR around
> >>>>> NexMark. This is a port of the NexMark queries to Beam, to be used as
> >>>>> integration tests.
> >>>>> This can also be used for A-B testing (non-regression or performance
> >>>>> comparison between 2 versions of the same engine or of the same runner).
> >>>>>
> >>>>> This is a continuation of the previous PR (#99) from Mark Shields.
> >>>>> The code has changed quite a bit: some queries have changed to use new
> >>>>> Beam APIs and there were some big refactorings. More importantly, we
> >>>>> can now run all the queries in all the runners.
> >>>>>
> >>>>> Nevertheless, there are still some open issues in Nexmark
> >>>>> (https://github.com/iemejia/beam/issues) and in Beam upstream (see
> >>>>> issue links in https://issues.apache.org/jira/browse/BEAM-160)
> >>>>>
> >>>>> I wanted to submit the PR before our (Ismaël's and my) NexMark talk at
> >>>>> ApacheCon. The PR is not perfect but it is in good enough shape to
> >>>>> share.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Etienne
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 22/03/2017 at 04:51, Kenneth Knowles wrote:
> >>>>>
> >>>>>> This is great! Having a variety of realistic-ish pipelines running on
> >>>>>> all runners complements the validation suite and IO IT work.
> >>>>>>
> >>>>>> If I recall, some of these involve heavy and esoteric uses of state, so
> >>>>>> definitely give me a ping if you hit any trouble.
> >>>>>>
> >>>>>> Kenn
> >>>>>>
> >>>>>> On Tue, Mar 21, 2017 at 9:38 AM, Etienne Chauchot <echauchot@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> Ismael and I are working on upgrading the Nexmark implementation for
> >>>>>>> Beam.
> >>>>>>> See https://github.com/iemejia/beam/tree/BEAM-160-nexmark and
> >>>>>>> https://issues.apache.org/jira/browse/BEAM-160. We are continuing the
> >>>>>>> work done by Mark Shields. See https://github.com/apache/beam/pull/366
> >>>>>>> for the original PR.
> >>>>>>>
> >>>>>>> The PR contains queries that have a wide coverage of the Beam model and
> >>>>>>> that represent realistic end-user use cases (some come from client
> >>>>>>> experience on Google Cloud Dataflow).
> >>>>>>>
> >>>>>>> So far, we have upgraded the implementation to the latest Beam
> >>>>>>> snapshot, and we are able to execute a good subset of the queries in
> >>>>>>> the different runners. We upgraded the Nexmark drivers to do so: the
> >>>>>>> direct driver (upgraded from inProcessDriver) and the Flink driver,
> >>>>>>> and we added a new one for Spark.
> >>>>>>>
> >>>>>>> There is still a good amount of work to do and we would like to know if
> >>>>>>> you think that this contribution can eventually have its place in Beam.
> >>>>>>>
> >>>>>>> The benefits of having Nexmark on Beam that we have seen so far are:
> >>>>>>>
> >>>>>>> - Rich batch/streaming test
> >>>>>>>
> >>>>>>> - A-B testing of runners or runtimes (non-regression, performance
> >>>>>>> comparison between versions ...)
> >>>>>>>
> >>>>>>> - Integration testing (sdk/runners, runner/runtime, ...)
> >>>>>>>
> >>>>>>> - Validation of the Beam capability matrix
> >>>>>>>
> >>>>>>> - It can be used as part of the ongoing PerfKit work (if there is any
> >>>>>>> interest).
> >>>>>>>
> >>>>>>> As a final note, we are tracking the issues in the same repo. If
> >>>>>>> someone is interested in contributing, or has more ideas, you are
> >>>>>>> welcome :)
> >>>>>>>
> >>>>>>> Etienne
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>
> >
>

Re: [DISCUSSION] using NexMark for Beam

Posted by Etienne Chauchot <ec...@gmail.com>.
@Reuven,

Tell me if I can help with that.

Etienne
On 15/09/2017 at 06:44, Reuven Lax wrote:
> It's being worked on. Turns out there are some modifications still needed
> to the NexMark queries.
>
> Reuven
>
> On Thu, Sep 14, 2017 at 9:33 PM, Pei HE <pe...@apache.org> wrote:
>
>> Could any Googlers help to run NexMark on Dataflow streaming and share the
>> numbers with the community?
>> --
>> Pei


Re: [DISCUSSION] using NexMark for Beam

Posted by Reuven Lax <re...@google.com.INVALID>.
It's being worked on. Turns out there are some modifications still needed
to the NexMark queries.

Reuven

On Thu, Sep 14, 2017 at 9:33 PM, Pei HE <pe...@apache.org> wrote:

> Could any Googlers help to run NexMark on Dataflow streaming and share the
> numbers with the community?
> --
> Pei