Posted to user@beam.apache.org by Ning Kang <ni...@google.com> on 2019/12/04 19:45:24 UTC

[Interactive Beam] Changes to local pipeline executions

*If you are not an Interactive Beam user, you can ignore this email.*

Hi Interactive Beam users,

We've recently changed how Interactive Beam discovers the pipelines and
PCollections defined in your notebook or code.

If you write Beam pipelines with the InteractiveRunner directly in notebook
cells, like the Interactive Beam Examples
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb>,
or define everything in "__main__", you are not affected by the changes.

If you define your pipelines in a local scope such as a function (unit
tests are a typical example) and rely on interactive features to introspect
the data of a PCollection after a pipeline run, you might see an error such
as "ValueError('PCollection not available, please run the pipeline.')".

This happens because Interactive Beam now "watches" the "__main__" scope by
default to provide its features implicitly, so pipelines and PCollections
defined in a local scope are not visible to it. To avoid the error, tell
Interactive Beam to "watch
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_beam.py#L36>"
your local scopes too.
For example:
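import apache_beam as beam
from apache_beam.runners.interactive import interactive_beam
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
...
def some_func(...):
    p = beam.Pipeline(InteractiveRunner())
    pcoll = p | 'SomeTransform' >> SomeTransform()
    ...
    # Watch this function's local scope before running the pipeline.
    interactive_beam.watch(locals())
    result = p.run()
    ...
...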
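
To illustrate why the extra call matters, here is a toy model in plain
Python. It is only a sketch of the general idea, not Interactive Beam's
actual implementation: ToyEnvironment, name_of, build_without_watch and
build_with_watch are hypothetical names made up for this example, and
snapshotting the scope is an assumption. The point is that an object created
inside a function can only be found by a watcher if that function's
locals() were handed to it.

```python
class ToyEnvironment:
    """Hypothetical stand-in for the bookkeeping an interactive tool might do."""

    def __init__(self):
        # Each watched scope is stored as a name -> object mapping.
        self._watched = []

    def watch(self, scope):
        # Assumption for this sketch: keep a snapshot of the scope.
        self._watched.append(dict(scope))

    def name_of(self, obj):
        # Look the object up by identity across every watched scope.
        for scope in self._watched:
            for name, value in scope.items():
                if value is obj:
                    return name
        return None


env = ToyEnvironment()


def build_without_watch():
    pcoll = object()  # stand-in for a PCollection
    return pcoll


def build_with_watch():
    pcoll = object()  # stand-in for a PCollection
    env.watch(locals())  # make this function's local names visible
    return pcoll


unwatched = build_without_watch()
watched = build_with_watch()
print(env.name_of(unwatched))  # None: the local scope was never watched
print(env.name_of(watched))    # pcoll
```

In this toy model the unwatched object cannot be resolved to a name, which
is loosely analogous to the "PCollection not available" error described
above.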

Thanks for using Interactive Beam!

Ning.

Re: [Interactive Beam] Changes to local pipeline executions

Posted by Maximilian Michels <mx...@apache.org>.
Thanks for the heads-up, Ning! I haven't tried out Interactive Beam, but
this puts it back on my radar :)

Cheers,
Max
