Posted to dev@beam.apache.org by Charles Chen <cc...@google.com> on 2018/02/08 01:55:43 UTC

A 15x speed-up in local Python DirectRunner execution

Local execution of Beam pipelines on the Python DirectRunner currently
suffers from performance issues, which make it hard for pipeline authors
to iterate, especially on medium-to-large datasets.  We would like to
optimize local execution and make it a better experience for Beam users.

The FnApiRunner was written as a way of leveraging the portability
framework execution code path for local portability development.  We've
found that it also provides great speedups in batch execution with no user
changes required, so we propose making it the default runner for batch
pipelines.  For example, WordCount on the Shakespeare dataset now takes 50
seconds to run on a single CPU core, compared to 12 minutes before; this is
a 15x performance improvement that users get for free, with no pipeline
changes.

The JIRA for this change is here (
https://issues.apache.org/jira/browse/BEAM-3644), and a candidate patch is
available here (https://github.com/apache/beam/pull/4634). I have been
working over the last month on making this an automatic drop-in replacement
for the current DirectRunner when applicable.  Before it becomes the
default, you can try this runner now by manually specifying
apache_beam.runners.portability.fn_api_runner.FnApiRunner as the runner.
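
To make this concrete, here is a minimal sketch of opting in today (the
runner's import path is the one given above; the input/output file names and
the word-count transforms are just illustrative placeholders):

    # Minimal sketch: run a small word-count-style pipeline on the
    # FnApiRunner instead of the default DirectRunner.  File names below
    # are hypothetical.
    import apache_beam as beam
    from apache_beam.runners.portability.fn_api_runner import FnApiRunner

    with beam.Pipeline(runner=FnApiRunner()) as p:
        (p
         | 'Read' >> beam.io.ReadFromText('shakespeare.txt')
         | 'Split' >> beam.FlatMap(lambda line: line.split())
         | 'Pair' >> beam.Map(lambda word: (word, 1))
         | 'Count' >> beam.CombinePerKey(sum)
         | 'Format' >> beam.Map(lambda kv: '%s: %d' % kv)
         | 'Write' >> beam.io.WriteToText('counts'))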

Even with this change, local Python pipeline execution can effectively use
only one core because of the Python GIL.  A natural next step for further
improving performance is to refactor the FnApiRunner to allow
multi-process execution.  This is being tracked here (
https://issues.apache.org/jira/browse/BEAM-3645).

Best,

Charles

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Robert Bradshaw <ro...@google.com>.
Yes, it does work for Java pipelines, modulo
https://github.com/apache/beam/pull/4211 . I'm actually not sure what
the performance characteristics are, but I'm sure the improvement (if
any) is not as dramatic as what we see in Python. It's great for
development, though.

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Marián Dvorský <ma...@google.com>.
Does the same runner work for Java pipelines? (I assume so, given that it
uses the portability framework.) If so, does it provide a similar speedup?

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Robert Bradshaw <ro...@google.com>.
If there are no concerns, I say let's merge this.

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Charles Chen <cc...@google.com>.
I hope those interested have had time to test this out.  I have sent out
https://github.com/apache/beam/pull/4696 to make this fast runner the
default DirectRunner for local execution.  Let me know if there are any
concerns.

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Charles Chen <cc...@google.com>.
This is now checked into master.  You can use it by setting
--runner=SwitchingDirectRunner.  Please let us know if you run into any
issues.
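
As a rough illustration, here is what selecting the new runner could look
like (the flag name is as above; the option wiring and the toy pipeline are
only a sketch):

    # Sketch: select the new runner explicitly.  From the command line of an
    # existing pipeline script this would be, e.g.:
    #
    #   python my_pipeline.py --runner=SwitchingDirectRunner
    #
    # or, equivalently, wired through PipelineOptions in code:
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(['--runner=SwitchingDirectRunner'])
    with beam.Pipeline(options=options) as p:
        (p
         | beam.Create(['to', 'be', 'or', 'not', 'to', 'be'])
         | beam.Map(lambda word: (word, 1))
         | beam.CombinePerKey(sum))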


Re: A 15x speed-up in local Python DirectRunner execution

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Very interesting! This sounds like a sane path for Beam's future, and I'm very
happy it is consistent with the current Java experience: not having to
interleave runners in the end makes the design, the code, and the user
experience much better than trying to put everything into the direct runner :).

Re: A 15x speed-up in local Python DirectRunner execution

Posted by María García Herrero <ma...@google.com>.
Amazing improvement, Charles.
Thanks for the effort!


-- 

Impact is the effect that wouldn’t have happened if you hadn’t done what you
did.

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Eugene Kirpichov <ki...@google.com>.
Sounds awesome, congratulations and thanks for making this happen!

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Raghu Angadi <ra...@google.com>.
This is terrific news! Thanks Charles.

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Henning Rohde <he...@google.com>.
Awesome! Well done, Charles.

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Ismaël Mejía <ie...@gmail.com>.
Sounds impressive, and with the extra portability stuff, great!
Worth the switch just for the user experience improvement.

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Robert Bradshaw <ro...@google.com>.
This is going to be a great improvement for our users! I'll take a
look at the pull request.

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Kenneth Knowles <kl...@google.com>.
Nice!

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Charles Chen <cc...@google.com>.
The existing DirectRunner will be needed for the foreseeable future since
it is currently the only local runner that supports streaming execution.

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Pablo Estrada <pa...@google.com>.
Very cool, Charles! Have you considered whether you'll want to remove the
direct runner code afterwards?
Best
-P.

-- 
Got feedback? go/pabloem-feedback

Re: A 15x speed-up in local Python DirectRunner execution

Posted by Lukasz Cwik <lc...@google.com>.
That is pretty awesome.
