You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/07/23 18:59:59 UTC

[GitHub] [beam] chamikaramj commented on a change in pull request #12355: [BEAM-10559] Add some comments and clean up SQL example.

chamikaramj commented on a change in pull request #12355:
URL: https://github.com/apache/beam/pull/12355#discussion_r459662985



##########
File path: sdks/python/apache_beam/examples/wordcount_xlang_sql.py
##########
@@ -15,12 +15,16 @@
 # limitations under the License.
 #
 
-"""A cross-language word-counting workflow."""
+"""A cross-language word-counting workflow.
+
+Java and docker must be available to run this pipeline.

Review comment:
       Sounds good.

##########
File path: sdks/python/apache_beam/examples/wordcount_xlang_sql.py
##########
@@ -101,12 +89,33 @@ def main():
   # workflow rely on global context (e.g., a module imported at module level).
   pipeline_options.view_as(SetupOptions).save_main_session = True
 
-  p = beam.Pipeline(options=pipeline_options)
-  # Preemptively start due to BEAM-6666.
-  p.runner.create_job_service(pipeline_options)
+  with beam.Pipeline(options=pipeline_options) as p:
+    if isinstance(p.runner, portable_runner.PortableRunner):
+      # Preemptively start due to BEAM-6666.
+      p.runner.create_job_service(pipeline_options)

Review comment:
       Yeah, you are right.

##########
File path: sdks/python/apache_beam/examples/wordcount_xlang_sql.py
##########
@@ -101,12 +89,33 @@ def main():
   # workflow rely on global context (e.g., a module imported at module level).
   pipeline_options.view_as(SetupOptions).save_main_session = True
 
-  p = beam.Pipeline(options=pipeline_options)
-  # Preemptively start due to BEAM-6666.
-  p.runner.create_job_service(pipeline_options)
+  with beam.Pipeline(options=pipeline_options) as p:
+    if isinstance(p.runner, portable_runner.PortableRunner):
+      # Preemptively start due to BEAM-6666.
+      p.runner.create_job_service(pipeline_options)
+
+    run(p, known_args.input, known_args.output)
 
-  run(p, known_args.input, known_args.output)
 
+# Some more fun queries:
+# ------
+# SELECT
+#   word as key,
+#   COUNT(*) as `count`
+# FROM PCOLLECTION
+# GROUP BY word

Review comment:
       Let's try them again and try with Dataflow as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org