Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/04/27 23:07:33 UTC

[GitHub] [beam] TheNeuralBit commented on a diff in pull request #17481: added the blog post

TheNeuralBit commented on code in PR #17481:
URL: https://github.com/apache/beam/pull/17481#discussion_r860308582


##########
website/www/site/content/en/blog/beam-sql-with-notebooks-and-dataflow.md:
##########
@@ -0,0 +1,761 @@
+---
+title:  "Running SQL with Apache Beam Notebooks and Google Cloud Dataflow"
+date:   2022-04-26 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2022/04/26/beam-sql-with-notebooks-and-dataflow.html
+authors:
+  - ningk
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+## Intro
+
+[Beam SQL](https://beam.apache.org/documentation/dsls/sql/overview/) allows a
+Beam user to query PCollections with SQL statements.
+[Apache Beam Notebooks](https://cloud.google.com/dataflow/docs/guides/interactive-pipeline-development)
+provides a cloud-hosted [JupyterLab](https://jupyter.org/) environment and lets
+a Beam user iteratively develop pipelines, inspect pipeline graphs, and
+examine individual PCollections in a read-eval-print-loop (REPL) workflow.
+
+In this post, you will see how to use `beam_sql`, a notebook
+[magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html), to
+execute Beam SQL in notebooks and later move on to Dataflow. To follow the
+steps in this post, you need a Google Cloud Platform project with the
+[necessary APIs enabled](https://cloud.google.com/dataflow/docs/guides/interactive-pipeline-development#before_you_begin),
+and enough permissions to create a Google Cloud Storage bucket (or use an
+existing one), query a public Google Cloud BigQuery dataset, and run Dataflow
+jobs.
+
+Once you have your Google Cloud project ready, you will need to create an Apache
+Beam Notebooks instance and open the JupyterLab web interface. Please follow the
+instructions given at:
+https://cloud.google.com/dataflow/docs/guides/interactive-pipeline-development#launching_an_notebooks_instance
+
+
+## Getting familiar with the environment
+
+### Landing page
+
+After clicking the `OPEN JUPYTERLAB` link, you will land on the default launcher
+page of the notebook environment.
+
+<img class="center-block"
+     src="/images/blog/beam-sql-notebooks/image1.png"
+     alt="Beam SQL in Notebooks: landing page">

Review Comment:
   These images don't seem to be rendering in the staged version of the website: 
   ![image](https://user-images.githubusercontent.com/675055/165645255-2f07d595-db64-4c66-b04c-78056e08f9bb.png)
   
   I think because of the extra "blog" in the src?
   
   You might also look at using the markdown syntax for linking an image `![Example image](/image.png)` (see https://gohugo.io/content-management/static-files/)
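   For example, if the extra "blog" segment is indeed the problem, the first image could be written in markdown form like this (a sketch based on that hypothesis; the corrected path is not verified against the staged site):
   
   ```md
   ![Beam SQL in Notebooks: landing page](/images/beam-sql-notebooks/image1.png)
   ```
   
   With the markdown syntax, Hugo resolves the leading `/` against the site's `static/` directory, which makes a wrong path easier to spot.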



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org