You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@beam.apache.org by Sam Rohde <sr...@google.com> on 2020/03/18 21:38:11 UTC

[Interactive Runner] now available on master

Hi All!



I am happy to announce that an improved Interactive Runner is now available
on master. This Python runner allows for the interactive development of
Beam pipelines in a notebook (and IPython) environment.



The runner still has some bugs that need to be fixed as well as some
refactoring, but it is in a good enough shape to start using it.



Here are the new things you can do with the Interactive Runner:

   -

   Create and execute pipelines within a REPL
   -

   Visualize elements as the pipeline is running
   -

   Materialize PCollections to DataFrames
   -

   Record unbounded sources for deterministic replay
   -

   Replay cached unbounded sources including watermark advancements

The code lives in sdks/python/apache_beam/runners/interactive
<https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>
and example notebooks are in
sdks/python/apache_beam/runners/interactive/examples
<https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples>
.



To install, use `pip install -e .[interactive]` in your <project
root>/sdks/python directory.

To run, here’s a quick example:

```

import apache_beam as beam

from apache_beam.runners.interactive.interactive_runner import
InteractiveRunner

import apache_beam.runners.interactive.interactive_beam as ib



p = beam.Pipeline(InteractiveRunner())

words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be'])

counts = words | 'count' >> beam.combiners.Count.PerElement()



# Shows a dynamically updating display of the PCollection elements

ib.show(counts)



# We can now visualize the data using standard pandas operations.

df = ib.collect(counts)

print(df.info())

print(df.describe())



# Plot the top-10 counted words

df = df.sort_values(by=1, ascending=False)

df.head(n=10).plot(x=0, y=1)

```



Currently, Batch is supported on any runner. Streaming is only supported on
the DirectRunner (non-FnAPI).



I would like to thank the great work of Sindy (@sindyli) and Harsh
(@ananvay) for the initial implementation,

David Yan (@davidyan) who led the project, Ning (@ningk) and myself
(@srohde) for the implementation and design, and Ahmet (@altay), Daniel
(@millsd), Pablo (@pabloem), and Robert (@robertwb) who all contributed a
lot of their time to help with the design and code reviews.



It was a team effort and we wouldn't have been able to complete it without
the help of everyone involved.



Regards,

Sam

Re: [Interactive Runner] now available on master

Posted by Maximilian Michels <mx...@apache.org>.

Great work! This will also be super handy for demoing Beam. Looking
forward to playing around with this :)

On 19.03.20 00:52, Kenneth Knowles wrote:
> Nice!
> 
> On Wed, Mar 18, 2020 at 2:58 PM Ahmet Altay <altay@google.com
> <ma...@google.com>> wrote:
> 
>     Great to see this progress! :)
> 
>     On Wed, Mar 18, 2020 at 2:57 PM Reza Rokni <rez@google.com
>     <ma...@google.com>> wrote:
> 
>         Awesome !
> 
>         On Thu, 19 Mar 2020, 05:38 Sam Rohde, <srohde@google.com
>         <ma...@google.com>> wrote:
> 
>             Hi All!
> 
>              
> 
>             I am happy to announce that an improved Interactive Runner
>             is now available on master. This Python runner allows for
>             the interactive development of Beam pipelines in a notebook
>             (and IPython) environment.
> 
>              
> 
>             The runner still has some bugs that need to be fixed as well
>             as some refactoring, but it is in a good enough shape to
>             start using it.
> 
>              
> 
>             Here are the new things you can do with the Interactive Runner:
> 
>               *
> 
>                 Create and execute pipelines within a REPL
> 
>               *
> 
>                 Visualize elements as the pipeline is running
> 
>               *
> 
>                 Materialize PCollections to DataFrames
> 
>               *
> 
>                 Record unbounded sources for deterministic replay
> 
>               *
> 
>                 Replay cached unbounded sources including watermark
>                 advancements
> 
>             The code lives insdks/python/apache_beam/runners/interactive
>             <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>and
>             example notebooks are
>             insdks/python/apache_beam/runners/interactive/examples
>             <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples>.
> 
>              
> 
>             To install, use `pip install -e .[interactive]` in your
>             <project root>/sdks/python directory.
> 
>             To run, here’s a quick example:
> 
>             ```
> 
>             import apache_beam as beam
> 
>             from apache_beam.runners.interactive.interactive_runner
>             import InteractiveRunner
> 
>             import apache_beam.runners.interactive.interactive_beam as ib
> 
>              
> 
>             p = beam.Pipeline(InteractiveRunner())
> 
>             words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be'])
> 
>             counts = words | 'count' >> beam.combiners.Count.PerElement()
> 
>              
> 
>             # Shows a dynamically updating display of the PCollection
>             elements
> 
>             ib.show(counts)
> 
>              
> 
>             # We can now visualize the data using standard pandas
>             operations.
> 
>             df = ib.collect(counts)
> 
>             print(df.info <http://df.info>())
> 
>             print(df.describe())
> 
>              
> 
>             # Plot the top-10 counted words
> 
>             df = df.sort_values(by=1, ascending=False)
> 
>             df.head(n=10).plot(x=0, y=1)
> 
>             ```
> 
>              
> 
>             Currently, Batch is supported on any runner. Streaming is
>             only supported on the DirectRunner (non-FnAPI).
> 
>              
> 
>             I would like to thank the great work of Sindy (@sindyli) and
>             Harsh (@ananvay) for the initial implementation,
> 
>             David Yan (@davidyan) who led the project, Ning (@ningk) and
>             myself (@srohde) for the implementation and design, and
>             Ahmet (@altay), Daniel (@millsd), Pablo (@pabloem), and
>             Robert (@robertwb) who all contributed a lot of their time
>             to help with the design and code reviews.
> 
>              
> 
>             It was a team effort and we wouldn't have been able to
>             complete it without the help of everyone involved.
> 
>              
> 
>             Regards,
> 
>             Sam
> 
>

Re: [Interactive Runner] now available on master

Posted by Kenneth Knowles <ke...@apache.org>.

Nice!

On Wed, Mar 18, 2020 at 2:58 PM Ahmet Altay <al...@google.com> wrote:

> Great to see this progress! :)
>
> On Wed, Mar 18, 2020 at 2:57 PM Reza Rokni <re...@google.com> wrote:
>
>> Awesome !
>>
>> On Thu, 19 Mar 2020, 05:38 Sam Rohde, <sr...@google.com> wrote:
>>
>>> Hi All!
>>>
>>>
>>>
>>> I am happy to announce that an improved Interactive Runner is now
>>> available on master. This Python runner allows for the interactive
>>> development of Beam pipelines in a notebook (and IPython) environment.
>>>
>>>
>>>
>>> The runner still has some bugs that need to be fixed as well as some
>>> refactoring, but it is in a good enough shape to start using it.
>>>
>>>
>>>
>>> Here are the new things you can do with the Interactive Runner:
>>>
>>>    -
>>>
>>>    Create and execute pipelines within a REPL
>>>    -
>>>
>>>    Visualize elements as the pipeline is running
>>>    -
>>>
>>>    Materialize PCollections to DataFrames
>>>    -
>>>
>>>    Record unbounded sources for deterministic replay
>>>    -
>>>
>>>    Replay cached unbounded sources including watermark advancements
>>>
>>> The code lives in sdks/python/apache_beam/runners/interactive
>>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>
>>> and example notebooks are in
>>> sdks/python/apache_beam/runners/interactive/examples
>>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples>
>>> .
>>>
>>>
>>>
>>> To install, use `pip install -e .[interactive]` in your <project
>>> root>/sdks/python directory.
>>>
>>> To run, here’s a quick example:
>>>
>>> ```
>>>
>>> import apache_beam as beam
>>>
>>> from apache_beam.runners.interactive.interactive_runner import
>>> InteractiveRunner
>>>
>>> import apache_beam.runners.interactive.interactive_beam as ib
>>>
>>>
>>>
>>> p = beam.Pipeline(InteractiveRunner())
>>>
>>> words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be'])
>>>
>>> counts = words | 'count' >> beam.combiners.Count.PerElement()
>>>
>>>
>>>
>>> # Shows a dynamically updating display of the PCollection elements
>>>
>>> ib.show(counts)
>>>
>>>
>>>
>>> # We can now visualize the data using standard pandas operations.
>>>
>>> df = ib.collect(counts)
>>>
>>> print(df.info())
>>>
>>> print(df.describe())
>>>
>>>
>>>
>>> # Plot the top-10 counted words
>>>
>>> df = df.sort_values(by=1, ascending=False)
>>>
>>> df.head(n=10).plot(x=0, y=1)
>>>
>>> ```
>>>
>>>
>>>
>>> Currently, Batch is supported on any runner. Streaming is only supported
>>> on the DirectRunner (non-FnAPI).
>>>
>>>
>>>
>>> I would like to thank the great work of Sindy (@sindyli) and Harsh
>>> (@ananvay) for the initial implementation,
>>>
>>> David Yan (@davidyan) who led the project, Ning (@ningk) and myself
>>> (@srohde) for the implementation and design, and Ahmet (@altay), Daniel
>>> (@millsd), Pablo (@pabloem), and Robert (@robertwb) who all contributed a
>>> lot of their time to help with the design and code reviews.
>>>
>>>
>>>
>>> It was a team effort and we wouldn't have been able to complete it
>>> without the help of everyone involved.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Sam
>>>
>>>

Re: [Interactive Runner] now available on master

Posted by Ahmet Altay <al...@google.com>.

Great to see this progress! :)

On Wed, Mar 18, 2020 at 2:57 PM Reza Rokni <re...@google.com> wrote:

> Awesome !
>
> On Thu, 19 Mar 2020, 05:38 Sam Rohde, <sr...@google.com> wrote:
>
>> Hi All!
>>
>>
>>
>> I am happy to announce that an improved Interactive Runner is now
>> available on master. This Python runner allows for the interactive
>> development of Beam pipelines in a notebook (and IPython) environment.
>>
>>
>>
>> The runner still has some bugs that need to be fixed as well as some
>> refactoring, but it is in a good enough shape to start using it.
>>
>>
>>
>> Here are the new things you can do with the Interactive Runner:
>>
>>    -
>>
>>    Create and execute pipelines within a REPL
>>    -
>>
>>    Visualize elements as the pipeline is running
>>    -
>>
>>    Materialize PCollections to DataFrames
>>    -
>>
>>    Record unbounded sources for deterministic replay
>>    -
>>
>>    Replay cached unbounded sources including watermark advancements
>>
>> The code lives in sdks/python/apache_beam/runners/interactive
>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>
>> and example notebooks are in
>> sdks/python/apache_beam/runners/interactive/examples
>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples>
>> .
>>
>>
>>
>> To install, use `pip install -e .[interactive]` in your <project
>> root>/sdks/python directory.
>>
>> To run, here’s a quick example:
>>
>> ```
>>
>> import apache_beam as beam
>>
>> from apache_beam.runners.interactive.interactive_runner import
>> InteractiveRunner
>>
>> import apache_beam.runners.interactive.interactive_beam as ib
>>
>>
>>
>> p = beam.Pipeline(InteractiveRunner())
>>
>> words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be'])
>>
>> counts = words | 'count' >> beam.combiners.Count.PerElement()
>>
>>
>>
>> # Shows a dynamically updating display of the PCollection elements
>>
>> ib.show(counts)
>>
>>
>>
>> # We can now visualize the data using standard pandas operations.
>>
>> df = ib.collect(counts)
>>
>> print(df.info())
>>
>> print(df.describe())
>>
>>
>>
>> # Plot the top-10 counted words
>>
>> df = df.sort_values(by=1, ascending=False)
>>
>> df.head(n=10).plot(x=0, y=1)
>>
>> ```
>>
>>
>>
>> Currently, Batch is supported on any runner. Streaming is only supported
>> on the DirectRunner (non-FnAPI).
>>
>>
>>
>> I would like to thank the great work of Sindy (@sindyli) and Harsh
>> (@ananvay) for the initial implementation,
>>
>> David Yan (@davidyan) who led the project, Ning (@ningk) and myself
>> (@srohde) for the implementation and design, and Ahmet (@altay), Daniel
>> (@millsd), Pablo (@pabloem), and Robert (@robertwb) who all contributed a
>> lot of their time to help with the design and code reviews.
>>
>>
>>
>> It was a team effort and we wouldn't have been able to complete it
>> without the help of everyone involved.
>>
>>
>>
>> Regards,
>>
>> Sam
>>
>>

Re: [Interactive Runner] now available on master

Posted by Reza Rokni <re...@google.com>.

Awesome !

On Thu, 19 Mar 2020, 05:38 Sam Rohde, <sr...@google.com> wrote:

> Hi All!
>
>
>
> I am happy to announce that an improved Interactive Runner is now
> available on master. This Python runner allows for the interactive
> development of Beam pipelines in a notebook (and IPython) environment.
>
>
>
> The runner still has some bugs that need to be fixed as well as some
> refactoring, but it is in a good enough shape to start using it.
>
>
>
> Here are the new things you can do with the Interactive Runner:
>
>    -
>
>    Create and execute pipelines within a REPL
>    -
>
>    Visualize elements as the pipeline is running
>    -
>
>    Materialize PCollections to DataFrames
>    -
>
>    Record unbounded sources for deterministic replay
>    -
>
>    Replay cached unbounded sources including watermark advancements
>
> The code lives in sdks/python/apache_beam/runners/interactive
> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>
> and example notebooks are in
> sdks/python/apache_beam/runners/interactive/examples
> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples>
> .
>
>
>
> To install, use `pip install -e .[interactive]` in your <project
> root>/sdks/python directory.
>
> To run, here’s a quick example:
>
> ```
>
> import apache_beam as beam
>
> from apache_beam.runners.interactive.interactive_runner import
> InteractiveRunner
>
> import apache_beam.runners.interactive.interactive_beam as ib
>
>
>
> p = beam.Pipeline(InteractiveRunner())
>
> words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be'])
>
> counts = words | 'count' >> beam.combiners.Count.PerElement()
>
>
>
> # Shows a dynamically updating display of the PCollection elements
>
> ib.show(counts)
>
>
>
> # We can now visualize the data using standard pandas operations.
>
> df = ib.collect(counts)
>
> print(df.info())
>
> print(df.describe())
>
>
>
> # Plot the top-10 counted words
>
> df = df.sort_values(by=1, ascending=False)
>
> df.head(n=10).plot(x=0, y=1)
>
> ```
>
>
>
> Currently, Batch is supported on any runner. Streaming is only supported
> on the DirectRunner (non-FnAPI).
>
>
>
> I would like to thank the great work of Sindy (@sindyli) and Harsh
> (@ananvay) for the initial implementation,
>
> David Yan (@davidyan) who led the project, Ning (@ningk) and myself
> (@srohde) for the implementation and design, and Ahmet (@altay), Daniel
> (@millsd), Pablo (@pabloem), and Robert (@robertwb) who all contributed a
> lot of their time to help with the design and code reviews.
>
>
>
> It was a team effort and we wouldn't have been able to complete it without
> the help of everyone involved.
>
>
>
> Regards,
>
> Sam
>
>