Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/09/09 16:52:00 UTC

[jira] [Work logged] (BEAM-7760) Interactive Beam Caching PCollections bound to user defined vars in notebook

     [ https://issues.apache.org/jira/browse/BEAM-7760?focusedWorklogId=309067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309067 ]

ASF GitHub Bot logged work on BEAM-7760:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Sep/19 16:51
            Start Date: 09/Sep/19 16:51
    Worklog Time Spent: 10m 
      Work Description: aaltay commented on pull request #9278: [BEAM-7760] Added Interactive Beam module
URL: https://github.com/apache/beam/pull/9278#discussion_r322344323
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/interactive_beam_test.py
 ##########
 @@ -0,0 +1,70 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for apache_beam.runners.interactive.interactive_beam."""
+
+import importlib
+import unittest
+
+from apache_beam.runners.interactive import interactive_beam as ib
+from apache_beam.runners.interactive import interactive_environment as ie
+
+# The module name is also a variable in module.
+_module_name = 'apache_beam.runners.interactive.interactive_beam_test'
+
+
+class InteractiveBeamTest(unittest.TestCase):
+
+  def setUp(self):
+    self._var_in_class_instance = 'a var in class instance, not directly used'
+    ie.new_env()
+
+  def test_watch_main_by_default(self):
+    test_env = ie.InteractiveEnvironment()
+    # Current Interactive Beam env fetched and the test env are 2 instances.
+    self.assertNotEqual(id(ie.current_env()), id(test_env))
+    self.assertEqual(ie.current_env().watching(), test_env.watching())
+
+  def test_watch_a_module_by_name(self):
+    test_env = ie.InteractiveEnvironment()
+    ib.watch(_module_name)
+    test_env.watch(_module_name)
+    self.assertEqual(ie.current_env().watching(), test_env.watching())
 
 Review comment:
   What is actually tested here? We call watch on two environments and check that they return the same result. The test does not check whether _module_name is in the watch set, as the test name implies. (The same comment applies to the other assertEqual calls in this file.)
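
One way to address this comment is to assert directly on what the environment watches instead of comparing two environments that were set up the same way. The sketch below is illustrative only: FakeEnvironment and watched_names are hypothetical stand-ins, not Beam's InteractiveEnvironment or any helper from the PR, and the sketch assumes watching() returns one list of (name, value) pairs per watched module.

```python
import importlib
import unittest


class FakeEnvironment:
  """Hypothetical stand-in for an interactive environment (not Beam's class)."""

  def __init__(self):
    self._watched_modules = set()

  def watch(self, module_name):
    # Remember the module by name; it is imported lazily in watching().
    self._watched_modules.add(module_name)

  def watching(self):
    # One list of (name, value) pairs per watched module.
    return [list(vars(importlib.import_module(name)).items())
            for name in sorted(self._watched_modules)]


def watched_names(env):
  """Hypothetical helper: flattens watching() into a set of variable names."""
  return {name for items in env.watching() for name, _ in items}


class WatchTest(unittest.TestCase):

  def test_watch_a_module_by_name(self):
    env = FakeEnvironment()
    # Before watching, the module's variables must not be visible.
    self.assertNotIn('JSONDecoder', watched_names(env))
    env.watch('json')
    # After watching, a variable defined in the module must be visible.
    self.assertIn('JSONDecoder', watched_names(env))
```

With this shape, the test fails if watch() silently does nothing, which the comparison of two identically configured environments cannot detect.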
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 309067)
    Time Spent: 3h 40m  (was: 3.5h)

> Interactive Beam Caching PCollections bound to user defined vars in notebook
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-7760
>                 URL: https://issues.apache.org/jira/browse/BEAM-7760
>             Project: Beam
>          Issue Type: New Feature
>          Components: examples-python
>            Reporter: Ning Kang
>            Assignee: Ning Kang
>            Priority: Major
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Cache only PCollections bound to user-defined variables in a pipeline when running the pipeline with the interactive runner in Jupyter notebooks.
> [Interactive Beam|https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive] has been caching and using caches of "leaf" PCollections for interactive execution in Jupyter notebooks.
> Interactive execution is currently supported so that when new transforms are appended to an existing pipeline for a new run, the already-executed part of the pipeline doesn't need to be re-executed.
> A PCollection is a "leaf" when it is never used as input to any PTransform in the pipeline.
> The problem with building caches and the pipeline to execute around "leaf" PCollections is that when a PCollection is consumed by a sink with no output, the pipeline built for execution misses the subgraph that generates and consumes that PCollection.
> For example, "ReadFromPubSub --> WriteToPubSub" results in an empty pipeline.
> Caching around PCollections bound to user-defined variables, and replacing transforms with the sources and sinks of those caches, could build the pipeline to execute properly in the interactive execution scenario. In addition, a cached PCollection can now be traced back to user code and used for data visualization if the user wants it.
> E.g.,
> {code:python}
> import apache_beam as beam
> from apache_beam.runners.interactive import interactive_runner
> # ...
> p = beam.Pipeline(interactive_runner.InteractiveRunner(),
>                   options=pipeline_options)
> messages = p | "Read" >> beam.io.ReadFromPubSub(subscription='...')
> messages | "Write" >> beam.io.WriteToPubSub(topic_path)
> result = p.run()
> # ...
> visualize(messages){code}
> The interactive runner automatically figures out that the PCollection {{messages}}, created by {{p | "Read" >> beam.io.ReadFromPubSub(subscription='...')}}, should be cached and reused if the notebook user appends more transforms.
> Once the pipeline has been executed, the user can use any visualize(PCollection) module to visualize the data statically (batch) or dynamically (streaming).
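
The difference between the two caching strategies described above can be illustrated with a small sketch. It does not use Beam: the PCollection class, the transforms mapping, and both helper functions are hypothetical, and the pipeline is modeled as a plain dictionary from transform name to its input and output labels.

```python
class PCollection:
  """Hypothetical stand-in for a Beam PCollection; holds only a label."""

  def __init__(self, label):
    self.label = label


def leaf_pcollections(transforms):
  """Returns labels of PCollections that are produced but never consumed.

  transforms maps a transform name to (input labels, output labels).
  """
  produced = set()
  consumed = set()
  for inputs, outputs in transforms.values():
    consumed.update(inputs)
    produced.update(outputs)
  return produced - consumed


def user_bound_pcollections(namespace):
  """Returns {variable name: label} for PCollections bound in a namespace."""
  return {name: value.label for name, value in namespace.items()
          if isinstance(value, PCollection)}


# "Read" produces messages; "Write" consumes it and has no output.
transforms = {
    'Read': ([], ['messages']),
    'Write': (['messages'], []),
}
leaves = leaf_pcollections(transforms)

# A notebook-like namespace where the user bound the PCollection to a name.
notebook_vars = {'messages': PCollection('messages'), 'topic_path': '...'}
bound = user_bound_pcollections(notebook_vars)
```

In this model the leaf-based strategy finds nothing to cache, because the only PCollection is consumed by the sink, while the variable-based strategy still finds messages through the name the user bound it to.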



--
This message was sent by Atlassian Jira
(v8.3.2#803003)