Posted to dev@beam.apache.org by Sam Rohde <sr...@google.com> on 2020/03/18 21:38:11 UTC
[Interactive Runner] now available on master
Hi All!
I am happy to announce that an improved Interactive Runner is now available
on master. This Python runner allows for the interactive development of
Beam pipelines in a notebook (and IPython) environment.
The runner still has some bugs to fix and some refactoring to do, but it is
in good enough shape to start using.
Here are the new things you can do with the Interactive Runner:
- Create and execute pipelines within a REPL
- Visualize elements as the pipeline is running
- Materialize PCollections to DataFrames
- Record unbounded sources for deterministic replay
- Replay cached unbounded sources, including watermark advancements
The code lives in sdks/python/apache_beam/runners/interactive
<https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>
and example notebooks are in
sdks/python/apache_beam/runners/interactive/examples
<https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples>.
To install, run `pip install -e .[interactive]` from your
<project root>/sdks/python directory.
To run, here’s a quick example:
```
import apache_beam as beam
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
import apache_beam.runners.interactive.interactive_beam as ib

p = beam.Pipeline(InteractiveRunner())
words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be'])
counts = words | 'count' >> beam.combiners.Count.PerElement()

# Shows a dynamically updating display of the PCollection elements
ib.show(counts)

# We can now visualize the data using standard pandas operations.
df = ib.collect(counts)
df.info()  # info() prints its summary itself and returns None
print(df.describe())

# Plot the top-10 counted words
df = df.sort_values(by=1, ascending=False)
df.head(n=10).plot(x=0, y=1)
```
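As a sanity check on the quick example above, here is a minimal sketch in plain Python (no Beam involved) of the per-element counts and the top-N ordering one would expect for the same input. The helper names `count_per_element` and `top_n` are hypothetical and exist only for this illustration; they are not part of the Beam or Interactive Runner API.

```python
from collections import Counter


def count_per_element(elements):
    """Return (element, count) pairs for the input.

    Hypothetical helper: mirrors in spirit what Count.PerElement()
    computes in the pipeline above, but runs locally without Beam.
    """
    return list(Counter(elements).items())


def top_n(pairs, n=10):
    """Return the n pairs with the highest counts, highest first."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


words = ['To', 'be', 'or', 'not', 'to', 'be']
counts = count_per_element(words)

# 'To' and 'to' are distinct elements because the comparison is case-sensitive.
for word, count in top_n(counts, n=10):
    print(word, count)
```

Comparing this against the DataFrame returned by `ib.collect(counts)` is a quick way to check that the notebook output looks right. Note that the row order of the collected DataFrame may differ until it is sorted.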
Currently, Batch is supported on any runner. Streaming is only supported on
the DirectRunner (non-FnAPI).
I would like to thank Sindy (@sindyli) and Harsh (@ananvay) for their great
work on the initial implementation; David Yan (@davidyan), who led the
project; Ning (@ningk), who worked with me (@srohde) on the implementation
and design; and Ahmet (@altay), Daniel (@millsd), Pablo (@pabloem), and
Robert (@robertwb), who all contributed a lot of their time to help with the
design and code reviews.
It was a team effort and we wouldn't have been able to complete it without
the help of everyone involved.
Regards,
Sam
Re: [Interactive Runner] now available on master
Posted by Maximilian Michels <mx...@apache.org>.
Great work! This will also be super handy for demoing Beam. Looking
forward to playing around with this :)
On 19.03.20 00:52, Kenneth Knowles wrote:
> Nice!
>
> On Wed, Mar 18, 2020 at 2:58 PM Ahmet Altay <altay@google.com> wrote:
>
>> Great to see this progress! :)
>>
>> On Wed, Mar 18, 2020 at 2:57 PM Reza Rokni <rez@google.com> wrote:
>>
>>> Awesome !
Re: [Interactive Runner] now available on master
Posted by Kenneth Knowles <ke...@apache.org>.
Nice!
On Wed, Mar 18, 2020 at 2:58 PM Ahmet Altay <al...@google.com> wrote:
> Great to see this progress! :)
>
> On Wed, Mar 18, 2020 at 2:57 PM Reza Rokni <re...@google.com> wrote:
>
>> Awesome !
Re: [Interactive Runner] now available on master
Posted by Ahmet Altay <al...@google.com>.
Great to see this progress! :)
On Wed, Mar 18, 2020 at 2:57 PM Reza Rokni <re...@google.com> wrote:
> Awesome !
Re: [Interactive Runner] now available on master
Posted by Reza Rokni <re...@google.com>.
Awesome !