You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/09/08 11:59:38 UTC

[GitHub] [beam] olehborysevych opened a new pull request, #23085: [Tour of Beam] Learning content for "Introduction" module

olehborysevych opened a new pull request, #23085:
URL: https://github.com/apache/beam/pull/23085

   fixes #22497
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] alxp1982 commented on a diff in pull request #23085: [Tour of Beam] Learning content for "Introduction" module

Posted by GitBox <gi...@apache.org>.
alxp1982 commented on code in PR #23085:
URL: https://github.com/apache/beam/pull/23085#discussion_r966541256


##########
learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/creating-collections/from-memory/example/ParDoExample.java:
##########
@@ -0,0 +1,82 @@
+/*

Review Comment:
   Please change the name of the file and task. In this learning context, we're not showing ParDo but rather PCollection creation from in-memory data



##########
learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/basic-concepts/description.md:
##########
@@ -0,0 +1,134 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide

Review Comment:
   I think we need to split this unit. It is too long and overloaded. Let's do the following:
   
   - Creating a pipeline
     - Setting PipelineOptions
     - Creating custom options



##########
learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/basic-concepts/example/Task.java:
##########
@@ -0,0 +1,69 @@
+/*

Review Comment:
   The unit itself was focused on creating and configuring the pipeline. The example needs to include learned concepts. 



##########
learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/basic-concepts/description.md:
##########
@@ -0,0 +1,134 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+The Beam Programming Guide is intended for Beam users who want to use the Beam SDKs to create data processing pipelines. This guide provides guidance for using the Beam SDK classes to build and test pipelines. The programming guide is not intended to be an exhaustive reference, but rather a language-agnostic, high-level guide to programmatically building your Beam pipeline. As the programming guide is filled out, the text will include code samples in multiple languages to help illustrate how to implement Beam concepts in your pipelines.

Review Comment:
   Tour of Beam Programing Guide section isn't needed



##########
learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/basic-concepts/description.md:
##########
@@ -0,0 +1,134 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+The Beam Programming Guide is intended for Beam users who want to use the Beam SDKs to create data processing pipelines. This guide provides guidance for using the Beam SDK classes to build and test pipelines. The programming guide is not intended to be an exhaustive reference, but rather a language-agnostic, high-level guide to programmatically building your Beam pipeline. As the programming guide is filled out, the text will include code samples in multiple languages to help illustrate how to implement Beam concepts in your pipelines.
+
+For a brief introduction to Beam’s basic concepts,take a look at the Basics of the Beam model page before reading the programming guide.
+
+### Overview
+
+To use Beam, you need to first create a driver program using the classes in one of the Beam SDKs. Your driver program defines your pipeline, including all of the inputs, transforms, and outputs; it also sets execution options for your pipeline (typically passed by using command-line options). These include the Pipeline Runner, which, in turn, determines what back-end your pipeline will run on.
+
+The Beam SDKs provide a number of abstractions that simplify the mechanics of large-scale distributed data processing. The same Beam abstractions work with both batch and streaming data sources. When you create your Beam pipeline, you can think about your data processing task in terms of these abstractions. They include:
+
+&#8594; `Pipeline`: A Pipeline encapsulates your entire data processing task, from start to finish. This includes reading input data, transforming that data, and writing output data. All Beam driver programs must create a Pipeline. When you create the Pipeline, you must also specify the execution options that tell the Pipeline where and how to run.
+
+&#8594; `PCollection`: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline.
+
+&#8594; `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as input, performs a processing function that you provide on the elements of that PCollection, and produces zero or more output PCollection objects.
+
+&#8594; `Scope`: The Go SDK has an explicit scope variable used to build a Pipeline. A Pipeline can return it’s root scope with the `Root()` method. The scope variable is passed to PTransform functions to place them in the Pipeline that owns the Scope.
+
+&#8594; `I/O transforms`: Beam comes with a number of “IOs” - library PTransforms that read or write data to various external storage systems.
+
+A typical Beam driver program works as follows:
+
+&#8594; Create a Pipeline object and set the pipeline execution options, including the Pipeline Runner.
+
+&#8594; Create an initial PCollection for pipeline data, either using the IOs to read data from an external storage system, or using a Create transform to build a PCollection from in-memory data.
+
+&#8594; Apply PTransforms to each PCollection. Transforms can change, filter, group, analyze, or otherwise process the elements in a PCollection. A transform creates a new output PCollection without modifying the input collection. A typical pipeline applies subsequent transforms to each new output PCollection in turn until processing is complete. However, note that a pipeline does not have to be a single straight line of transforms applied one after another: think of PCollections as variables and PTransforms as functions applied to these variables: the shape of the pipeline can be an arbitrarily complex processing graph.
+
+&#8594; Use IOs to write the final, transformed PCollection(s) to an external source.
+
+&#8594; Run the pipeline using the designated Pipeline Runner.
+
+When you run your Beam driver program, the Pipeline Runner that you designate constructs a workflow graph of your pipeline based on the PCollection objects you’ve created and transforms that you’ve applied. That graph is then executed using the appropriate distributed processing back-end, becoming an asynchronous “job” (or equivalent) on that back-end.
+
+### Creating a pipeline
+
+The `Pipeline` abstraction encapsulates all the data and steps in your data processing task. Your Beam driver program typically starts by constructing a `Pipeline` object, and then using that object as the basis for creating the pipeline’s data sets as `PCollection`s and its operations as `Transforms`.
+
+To use Beam, your driver program must first create an instance of the Beam SDK class `Pipeline` (typically in the main() function). When you create your `Pipeline`, you’ll also need to set some configuration options. You can set your pipeline’s configuration options programmatically, but it’s often easier to set the options ahead of time (or read them from the command line) and pass them to the `Pipeline` object when you create the object.
+
+```
+// Start by defining the options for the pipeline.
+PipelineOptions options = PipelineOptionsFactory.create();
+
+// Then create the pipeline.
+Pipeline p = Pipeline.create(options);
+```
+
+### Configuring pipeline options
+
+Use the pipeline options to configure different aspects of your pipeline, such as the pipeline runner that will execute your pipeline and any runner-specific configuration required by the chosen runner. Your pipeline options will potentially include information such as your project ID or a location for storing files.
+
+When you run the pipeline on a runner of your choice, a copy of the PipelineOptions will be available to your code. For example, if you add a PipelineOptions parameter to a DoFn’s `@ProcessElement` method, it will be populated by the system.
+
+### Setting PipelineOptions from command-line arguments
+
+While you can configure your pipeline by creating a PipelineOptions object and setting the fields directly, the Beam SDKs include a command-line parser that you can use to set fields in PipelineOptions using command-line arguments.
+
+To read options from the command-line, construct your PipelineOptions object as demonstrated in the following example code:
+
+```
+PipelineOptions options =
+    PipelineOptionsFactory.fromArgs(args).withValidation().create();
+```
+
+This interprets command-line arguments that follow the format:
+
+```
+--<option>=<value>
+```
+
+> Appending the method .withValidation will check for required command-line arguments and validate argument values.
+
+### Creating custom options
+
+You can add your own custom options in addition to the standard `PipelineOptions`.
+
+To add your own options, define an interface with getter and setter methods for each option.
+
+The following example shows how to add `input` and `output` custom options:
+
+```
+public interface MyOptions extends PipelineOptions {
+    String getInput();
+    void setInput(String input);
+
+    String getOutput();
+    void setOutput(String output);
+}
+```
+
+You can also specify a description, which appears when a user passes `--help` as a command-line argument, and a default value.
+
+You set the description and default value using annotations, as follows:
+
+```
+public interface MyOptions extends PipelineOptions {
+    @Description("Input for the pipeline")
+    @Default.String("gs://my-bucket/input")
+    String getInput();
+    void setInput(String input);
+
+    @Description("Output for the pipeline")
+    @Default.String("gs://my-bucket/output")
+    String getOutput();
+    void setOutput(String output);
+}
+```
+
+It’s recommended that you register your interface with `PipelineOptionsFactory` and then pass the interface when creating the `PipelineOptions` object. When you register your interface with `PipelineOptionsFactory`, the `--help` can find your custom options interface and add it to the output of the --help command. `PipelineOptionsFactory` will also validate that your custom options are compatible with all other registered options.
+
+The following example code shows how to register your custom options interface with `PipelineOptionsFactory`:
+
+```
+PipelineOptionsFactory.register(MyOptions.class);
+MyOptions options = PipelineOptionsFactory.fromArgs(args)
+                                                .withValidation()
+                                                .as(MyOptions.class);
+```

Review Comment:
   Please add a description of the task, an invitation to run and experiment with it as well as a few small challenges. Similarly, how it is done in other places, e.g. 'In the playground window, you can find an example of ... that you can run and experiment with. Can you modify it so that ...' 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] olehborysevych commented on a diff in pull request #23085: [Tour of Beam] Learning content for "Introduction" module

Posted by GitBox <gi...@apache.org>.
olehborysevych commented on code in PR #23085:
URL: https://github.com/apache/beam/pull/23085#discussion_r1015552904


##########
learning/tour-of-beam/learning-content/go/introduction/introduction-guide/description.md:
##########
@@ -0,0 +1,22 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+Welcome to a Tour Of Beam, a learning guide you can use to familiarize yourself with the Apache Beam.
+The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles.
+You can access the full list of modules by clicking ‘<<’ button on the left . For each module, learning progress is displayed next to it.
+Throughout the tour, you will find learning materials, examples, exercises and challenges for you to complete.
+Learning units are accompanied by code examples that you can review in the upper right pane. You can edit the code, or just run the example by clicking the ‘Run’. Output is displayed in the lower right pane.

Review Comment:
   Thanks @damccorm. Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] olehborysevych commented on pull request #23085: [Tour of Beam] Learning content for "Introduction" module

Posted by GitBox <gi...@apache.org>.
olehborysevych commented on PR #23085:
URL: https://github.com/apache/beam/pull/23085#issuecomment-1249271629

   R: @kerrydc 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] olehborysevych commented on pull request #23085: [Tour of Beam] Learning content for "Introduction" module

Posted by GitBox <gi...@apache.org>.
olehborysevych commented on PR #23085:
URL: https://github.com/apache/beam/pull/23085#issuecomment-1240621728

   R: @sirenbyte
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] olehborysevych commented on a diff in pull request #23085: [Tour of Beam] Learning content for "Introduction" module

Posted by GitBox <gi...@apache.org>.
olehborysevych commented on code in PR #23085:
URL: https://github.com/apache/beam/pull/23085#discussion_r966765687


##########
learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/basic-concepts/description.md:
##########
@@ -0,0 +1,134 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide

Review Comment:
   Alex, we discussed this with Abzal and we propose to have pipeline options and custom options in a single unit since there is no value in having two examples for those topics. The concepts of options and custom options are similar and could be illustrated with a single example



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #23085: [Tour of Beam] Learning content for "Introduction" module

Posted by GitBox <gi...@apache.org>.
kerrydc commented on code in PR #23085:
URL: https://github.com/apache/beam/pull/23085#discussion_r1008288264


##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, it is time to learn how to create an initial `PCollection` and fill it with data.
+
+There are several options on how to do that:

Review Comment:
   There are several options:



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, it is time to learn how to create an initial `PCollection` and fill it with data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class in your driver program.
+
+→ You can also read the data from a variety of external sources such as local and cloud-based files, databases, or other sources using Beam-provided I/O adapters

Review Comment:
   local or cloud-based files



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, it is time to learn how to create an initial `PCollection` and fill it with data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class in your driver program.
+
+→ You can also read the data from a variety of external sources such as local and cloud-based files, databases, or other sources using Beam-provided I/O adapters
+
+Through the tour, most of the examples use either `PCollection` created from in-memory data or data read from one of the cloud buckets: beam-examples, dataflow-samples. These buckets contain sample data sets specifically created for educational purposes.
+
+We encourage you to take a look, explore these data sets and use them while learning Apache Beam.
+
+### Creating a PCollection from in-memory data
+
+You can use the Beam-provided Create transform to create a `PCollection` from an in-memory Go Collection. You can apply Create transform directly to your Pipeline object itself.
+
+The following example code shows how to do this:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create pipline

Review Comment:
   typo: pipline



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, it is time to learn how to create an initial `PCollection` and fill it with data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class in your driver program.
+
+→ You can also read the data from a variety of external sources such as local and cloud-based files, databases, or other sources using Beam-provided I/O adapters
+
+Through the tour, most of the examples use either `PCollection` created from in-memory data or data read from one of the cloud buckets: beam-examples, dataflow-samples. These buckets contain sample data sets specifically created for educational purposes.
+
+We encourage you to take a look, explore these data sets and use them while learning Apache Beam.
+
+### Creating a PCollection from in-memory data
+
+You can use the Beam-provided Create transform to create a `PCollection` from an in-memory Go Collection. You can apply Create transform directly to your Pipeline object itself.
+
+The following example code shows how to do this:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create pipline
+    p, s := beam.NewPipelineWithRoot()
+
+    //Now create the PCollection using list of strings
+    numbers := beam.Create(s, "To", "be", "or", "not", "to", "be","that", "is", "the", "question")
+
+    //Create a numerical PCollection
+    numbers := beam.Create(s, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
+
+}
+```
+
+### Playground exercise
+
+You can find the complete code of this example in the playground window you can run and experiment with.
+
+One of the differences you will notice is that it also contains the part to output `PCollection` elements to the console. Don’t worry if you don’t quite understand it, as the concept of `ParDo` transform will be explained later in the course. Feel free, however, to use it in exercises and challenges to explore results.
+
+Do you also notice in what order elements of PCollection appear in the console? Why is that? You can also run the example several times to see if the output stays the same or changes.

Review Comment:
   Do you also notice the order elements of the PCollection appear in the console? Why is that? You can also run the example several times to see if the output stays the same or changes.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md:
##########
@@ -0,0 +1,33 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Read from csv file
+
+Data processing pipelines often work with tabular data. In many examples and challenges throughout the course, you’ll be working with one of the datasets stored as csv files in either beam-examples, dataflow-samples buckets.
+
+Loading data from csv file requires some processing and consists of two main part:
+* Loading text lines using `TextIO.Read` transform
+* Parsing lines of text into tabular format
+
+### Playground exercise
+
+Try to experiment with an example in the playground window and modify the code to process other fields from New York taxi rides dataset.

Review Comment:
   Try to experiment with an example in the playground window and modify the code to process other fields from the New York taxi rides dataset.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md:
##########
@@ -0,0 +1,33 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Read from csv file
+
+Data processing pipelines often work with tabular data. In many examples and challenges throughout the course, you’ll be working with one of the datasets stored as csv files in either beam-examples, dataflow-samples buckets.
+
+Loading data from csv file requires some processing and consists of two main part:
+* Loading text lines using `TextIO.Read` transform
+* Parsing lines of text into tabular format

Review Comment:
   Parse



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md:
##########
@@ -0,0 +1,33 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Read from csv file
+
+Data processing pipelines often work with tabular data. In many examples and challenges throughout the course, you’ll be working with one of the datasets stored as csv files in either beam-examples, dataflow-samples buckets.
+
+Loading data from csv file requires some processing and consists of two main part:
+* Loading text lines using `TextIO.Read` transform
+* Parsing lines of text into tabular format
+
+### Playground exercise
+
+Try to experiment with an example in the playground window and modify the code to process other fields from New York taxi rides dataset.
+
+Here is a list of fields and a sample record from this dataset:

Review Comment:
   Please describe the fields in more detail, and highlight which fields are important for these exercises.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/example/textIo.go:
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+/*
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+*/
+// beam-playground:
+//   name: TextIO
+//   description: TextIO example.
+//   multifile: false
+//   context_line: 46
+//   categories:
+//     - Quickstart
+//   complexity: BASIC
+//   tags:
+//     - hellobeam
+
+package main
+
+import (
+    "context"
+    "fmt"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/filter"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
+    "regexp"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/top"
+)
+
+var (
+    wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)
+)
+
+func less(a, b string) bool{
+    return len(a)>len(b)
+}
+
+func main() {
+    p, s := beam.NewPipelineWithRoot()
+
+    file := Read(s, "gs://apache-beam-samples/shakespeare/kinglear.txt")
+
+    lines := getLines(s, file)
+    fixedSizeLines := top.Largest(s,lines,10,less)
+    output(s, "Lines: ", fixedSizeLines)
+
+    words := getWords(s,lines)
+    fixedSizeWords := top.Largest(s,words,10,less)
+    output(s, "Words: ", fixedSizeWords)
+
+    err := beamx.Run(context.Background(), p)
+    if err != nil {
+        log.Exitf(context.Background(), "Failed to execute job: %v", err)
+    }
+}
+
+// Read reads from fiename(s) specified by a glob string and a returns a PCollection<string>.
+func Read(s beam.Scope, glob string) beam.PCollection {
+    return textio.Read(s, glob)
+}
+
+// Read text file content line by line. resulting PCollection contains elements, where each element contains a single line of text from the input file.

Review Comment:
   The resulting PCollection contains elements, where each element contains a single line of text from the input file.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, it is time to learn how to create an initial `PCollection` and fill it with data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class in your driver program.
+
+→ You can also read the data from a variety of external sources such as local and cloud-based files, databases, or other sources using Beam-provided I/O adapters
+
+Through the tour, most of the examples use either `PCollection` created from in-memory data or data read from one of the cloud buckets: beam-examples, dataflow-samples. These buckets contain sample data sets specifically created for educational purposes.

Review Comment:
   Through the tour, most of the examples use either a `PCollection` created from in-memory data or data read from one of the cloud buckets "beam-examples" or "dataflow-samples". These buckets contain sample data sets specifically created for educational purposes.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md:
##########
@@ -0,0 +1,43 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Overview
+
+To use Beam, you first need to first create a driver program using the classes in one of the Beam SDKs. Your driver program defines your pipeline, including all of the inputs, transforms, and outputs. It also sets execution options for your pipeline (typically passed by using command-line options). These include the Pipeline Runner, which, in turn, determines what back-end your pipeline will run on.
+
+The Beam SDKs provide several abstractions that simplify the mechanics of large-scale distributed data processing. The same Beam abstractions work with both batch and streaming data sources. When you create your Beam pipeline, you can think about your data processing task in terms of these abstractions. They include:
+
+→ `Pipeline`: A Pipeline encapsulates your entire data processing task, from start to finish. This includes reading input data, transforming that data, and writing output data. All Beam driver programs must create a Pipeline. When you create the Pipeline, you must also specify the execution options that tell the Pipeline where and how to run.
+
+→ `PCollection`: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline.
+
+→ `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as the input, performs a processing function that you provide on the elements of that PCollection, and then produces zero or more output PCollection objects.
+
+→ `Scope`: The Go SDK has an explicit scope variable used to build a `Pipeline`. A Pipeline can return it’s root scope with the `Root()` method. The scope variable is then passed to `PTransform` functions that place them in the `Pipeline` that owns the `Scope`.
+
+→ `I/O transforms`: Beam comes with a number of “IOs” - library PTransforms that read or write data to various external storage systems.
+
+A typical Beam driver program works as follows:

Review Comment:
   A typical Beam driver program works like this:



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, it is time to learn how to create an initial `PCollection` and fill it with data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class in your driver program.
+
+→ You can also read the data from a variety of external sources such as local and cloud-based files, databases, or other sources using Beam-provided I/O adapters
+
+Through the tour, most of the examples use either `PCollection` created from in-memory data or data read from one of the cloud buckets: beam-examples, dataflow-samples. These buckets contain sample data sets specifically created for educational purposes.
+
+We encourage you to take a look, explore these data sets and use them while learning Apache Beam.
+
+### Creating a PCollection from in-memory data
+
+You can use the Beam-provided Create transform to create a `PCollection` from an in-memory Go Collection. You can apply Create transform directly to your Pipeline object itself.
+
+The following example code shows how to do this:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create pipline
+    p, s := beam.NewPipelineWithRoot()
+
+    //Now create the PCollection using list of strings
+    numbers := beam.Create(s, "To", "be", "or", "not", "to", "be","that", "is", "the", "question")
+
+    //Create a numerical PCollection
+    numbers := beam.Create(s, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
+
+}
+```
+
+### Playground exercise
+
+You can find the complete code of this example in the playground window you can run and experiment with.

Review Comment:
   You can find the complete code of this example in the playground window where you can run the pipeline and experiment with it.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md:
##########
@@ -0,0 +1,43 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Overview
+
+To use Beam, you first need to first create a driver program using the classes in one of the Beam SDKs. Your driver program defines your pipeline, including all of the inputs, transforms, and outputs. It also sets execution options for your pipeline (typically passed by using command-line options). These include the Pipeline Runner, which, in turn, determines what back-end your pipeline will run on.
+
+The Beam SDKs provide several abstractions that simplify the mechanics of large-scale distributed data processing. The same Beam abstractions work with both batch and streaming data sources. When you create your Beam pipeline, you can think about your data processing task in terms of these abstractions. They include:
+
+→ `Pipeline`: A Pipeline encapsulates your entire data processing task, from start to finish. This includes reading input data, transforming that data, and writing output data. All Beam driver programs must create a Pipeline. When you create the Pipeline, you must also specify the execution options that tell the Pipeline where and how to run.
+
+→ `PCollection`: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline.

Review Comment:
   The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. A pipeline with bounded input is referred to as a Batch Pipeline, while an unbounded input is used with a Streaming Pipeline. 



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md:
##########
@@ -0,0 +1,43 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Overview
+
+To use Beam, you first need to first create a driver program using the classes in one of the Beam SDKs. Your driver program defines your pipeline, including all of the inputs, transforms, and outputs. It also sets execution options for your pipeline (typically passed by using command-line options). These include the Pipeline Runner, which, in turn, determines what back-end your pipeline will run on.
+
+The Beam SDKs provide several abstractions that simplify the mechanics of large-scale distributed data processing. The same Beam abstractions work with both batch and streaming data sources. When you create your Beam pipeline, you can think about your data processing task in terms of these abstractions. They include:
+
+→ `Pipeline`: A Pipeline encapsulates your entire data processing task, from start to finish. This includes reading input data, transforming that data, and writing output data. All Beam driver programs must create a Pipeline. When you create the Pipeline, you must also specify the execution options that tell the Pipeline where and how to run.
+
+→ `PCollection`: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline.
+
+→ `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as the input, performs a processing function that you provide on the elements of that PCollection, and then produces zero or more output PCollection objects.
+
+→ `Scope`: The Go SDK has an explicit scope variable used to build a `Pipeline`. A Pipeline can return it’s root scope with the `Root()` method. The scope variable is then passed to `PTransform` functions that place them in the `Pipeline` that owns the `Scope`.
+
+→ `I/O transforms`: Beam comes with a number of “IOs” - library PTransforms that read or write data to various external storage systems.
+
+A typical Beam driver program works as follows:
+
+→ Create a Pipeline object and set the pipeline execution options, including the Pipeline Runner.
+
+→ Create an initial `PCollection` for pipeline data, either using the IOs to read data from an external storage system, or using a Create transform to build a `PCollection` from in-memory data.

Review Comment:
   → Create an initial `PCollection` of pipeline data, either by using the IOs to read data from an external storage system, or by using a Create transform to build a `PCollection` from in-memory data.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md:
##########
@@ -0,0 +1,33 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Read from csv file
+
+Data processing pipelines often work with tabular data. In many examples and challenges throughout the course, you’ll be working with one of the datasets stored as csv files in either beam-examples, dataflow-samples buckets.
+
+Loading data from csv file requires some processing and consists of two main part:
+* Loading text lines using `TextIO.Read` transform

Review Comment:
   Loading data from csv file takes two steps:
   * Load text lines using the `TextIO.Read` transform



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, it is time to learn how to create an initial `PCollection` and fill it with data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class in your driver program.
+
+→ You can also read the data from a variety of external sources such as local and cloud-based files, databases, or other sources using Beam-provided I/O adapters
+
+Through the tour, most of the examples use either `PCollection` created from in-memory data or data read from one of the cloud buckets: beam-examples, dataflow-samples. These buckets contain sample data sets specifically created for educational purposes.
+
+We encourage you to take a look, explore these data sets and use them while learning Apache Beam.
+
+### Creating a PCollection from in-memory data
+
+You can use the Beam-provided Create transform to create a `PCollection` from an in-memory Go Collection. You can apply Create transform directly to your Pipeline object itself.
+
+The following example code shows how to do this:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create pipline
+    p, s := beam.NewPipelineWithRoot()
+
+    //Now create the PCollection using list of strings
+    numbers := beam.Create(s, "To", "be", "or", "not", "to", "be","that", "is", "the", "question")

Review Comment:
   numbers should be strings or hamlet



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/setting-pipeline/description.md:
##########
@@ -0,0 +1,63 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Configuring pipeline options
+
+Use the pipeline options to configure different aspects of your pipeline, such as the pipeline runner that will execute your pipeline and any runner-specific configuration required by the chosen runner. Your pipeline options will potentially include information such as your project ID or a location for storing files.
+
+### Setting PipelineOptions from command-line arguments
+
+Use Go flags to parse command line arguments to configure your pipeline. Flags must be parsed before `beam.Init()` is called.
+
+```
+// If beamx or Go flags are used, flags must be parsed first,
+// before beam.Init() is called.
+flag.Parse()
+```
+
+This interprets command-line arguments this follow the format:
+
+```
+--<option>=<value>
+```
+
+### Creating custom options
+
+You can add your own custom options in addition to the standard `PipelineOptions`.
+
+The following example shows how to add `input` and `output` custom options:
+
+```
+// Use standard Go flags to define pipeline options.
+var (
+  input  = flag.String("input", "gs://my-bucket/input", "Input for the pipeline")
+  output = flag.String("output", "gs://my-bucket/output", "Output for the pipeline")
+)
+```
+
+### Playground exercise
+
+You can find the full code of the above example in the playground window, which you can run and experiment with.
+
+You can transfer files of other extensions. For example, a csv file with taxi order data. And after making some transformations, you can write to a new csv file:
+```
+var (
+  input = flag.String("input", "gs://apache-beam-samples/nyc_taxi/misc/sample1000.csv", "File(s) to read.")
+
+  output = flag.String("output", "output.csv", "Output file (required).")
+)
+```
+
+Overview [file](https://storage.googleapis.com/apache-beam-samples/nyc_taxi/misc/sample1000.csv)
+
+Do you also notice in what order elements of PCollection appear in the console? Why is that? You can also run the example several times to see if the output stays the same or changes.

Review Comment:
   Do you notice the order that elements of the PCollection appear in the console?



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, it is time to learn how to create an initial `PCollection` and fill it with data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class in your driver program.
+
+→ You can also read the data from a variety of external sources such as local and cloud-based files, databases, or other sources using Beam-provided I/O adapters
+
+Through the tour, most of the examples use either `PCollection` created from in-memory data or data read from one of the cloud buckets: beam-examples, dataflow-samples. These buckets contain sample data sets specifically created for educational purposes.
+
+We encourage you to take a look, explore these data sets and use them while learning Apache Beam.
+
+### Creating a PCollection from in-memory data
+
+You can use the Beam-provided Create transform to create a `PCollection` from an in-memory Go Collection. You can apply Create transform directly to your Pipeline object itself.
+
+The following example code shows how to do this:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create pipline
+    p, s := beam.NewPipelineWithRoot()
+
+    //Now create the PCollection using list of strings
+    numbers := beam.Create(s, "To", "be", "or", "not", "to", "be","that", "is", "the", "question")
+
+    //Create a numerical PCollection
+    numbers := beam.Create(s, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
+
+}
+```
+
+### Playground exercise
+
+You can find the complete code of this example in the playground window you can run and experiment with.
+
+One of the differences you will notice is that it also contains the part to output `PCollection` elements to the console. Don’t worry if you don’t quite understand it, as the concept of `ParDo` transform will be explained later in the course. Feel free, however, to use it in exercises and challenges to explore results.

Review Comment:
   One difference you will notice is that it also contains a function to output `PCollection` elements to the console. Don’t worry if you don’t quite understand it, as the concept of `ParDo` transforms will be explained later in the course. Feel free, however, to use it in exercises and challenges to explore results.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md:
##########
@@ -0,0 +1,43 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Overview
+
+To use Beam, you first need to first create a driver program using the classes in one of the Beam SDKs. Your driver program defines your pipeline, including all of the inputs, transforms, and outputs. It also sets execution options for your pipeline (typically passed by using command-line options). These include the Pipeline Runner, which, in turn, determines what back-end your pipeline will run on.
+
+The Beam SDKs provide several abstractions that simplify the mechanics of large-scale distributed data processing. The same Beam abstractions work with both batch and streaming data sources. When you create your Beam pipeline, you can think about your data processing task in terms of these abstractions. They include:
+
+→ `Pipeline`: A Pipeline encapsulates your entire data processing task, from start to finish. This includes reading input data, transforming that data, and writing output data. All Beam driver programs must create a Pipeline. When you create the Pipeline, you must also specify the execution options that tell the Pipeline where and how to run.
+
+→ `PCollection`: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline.
+
+→ `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as the input, performs a processing function that you provide on the elements of that PCollection, and then produces zero or more output PCollection objects.
+
+→ `Scope`: The Go SDK has an explicit scope variable used to build a `Pipeline`. A Pipeline can return it’s root scope with the `Root()` method. The scope variable is then passed to `PTransform` functions that place them in the `Pipeline` that owns the `Scope`.
+
+→ `I/O transforms`: Beam comes with a number of “IOs” - library PTransforms that read or write data to various external storage systems.
+
+A typical Beam driver program works as follows:
+
+→ Create a Pipeline object and set the pipeline execution options, including the Pipeline Runner.
+
+→ Create an initial `PCollection` for pipeline data, either using the IOs to read data from an external storage system, or using a Create transform to build a `PCollection` from in-memory data.
+
+→ Apply `PTransforms` to each `PCollection`. Transforms can change, filter, group, analyze, or otherwise process the elements in a PCollection. A transform creates a new output PCollection without modifying the input collection. A typical pipeline applies subsequent transforms to each new output PCollection in turn until the processing is complete. However, note that a pipeline does not have to be a single straight line of transforms applied one after another: think of PCollections as variables and PTransforms as functions applied to these variables: the shape of the pipeline can be an arbitrarily complex processing graph.

Review Comment:
   A transform creates a new output PCollection without modifying the input collection, and the transform is always applied to every element of the input PCollection.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -0,0 +1,60 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Overview
+
+Apache Beam provides a portable API layer for building sophisticated data-parallel processing `pipelines` that may be executed across a diversity of execution engines, or `runners`. The core concepts of this layer are based upon the Beam Model (formerly referred to as the Dataflow Model), and implemented to varying degrees in each Beam `runner`.
+
+### Direct runner
+The Direct Runner executes pipelines on your machine and is designed to validate that pipelines adhere to the Apache Beam model as closely as possible. Instead of focusing on efficient pipeline execution, the Direct Runner performs additional checks to ensure that users do not rely on semantics that are not guaranteed by the model. Some of these checks include:
+
+* enforcing immutability of elements
+* enforcing encodability of elements
+* elements are processed in an arbitrary order at all points
+* serialization of user functions (DoFn, CombineFn, etc.)
+
+Using the Direct Runner for testing and development helps ensure that pipelines are robust across different Beam runners. In addition, debugging failed runs can be a non-trivial task when a pipeline executes on a remote cluster. Instead, it is often faster and simpler to perform local unit testing on your pipeline code. Unit testing your pipeline locally also allows you to use your preferred local debugging tools.In the SDK Go, the default is runner **DirectRunner**.

Review Comment:
   In the Go SDK, the default is runner **DirectRunner**.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-guide/description.md:
##########
@@ -0,0 +1,22 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+Welcome to a Tour Of Beam, a learning guide you can use to familiarize yourself with the Apache Beam.
+The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles.
+You can access list of modules by clicking ‘<<’ button on the left . For each module, learning progress is displayed next to it.

Review Comment:
   You can access the full list of modules



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -0,0 +1,60 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Overview
+
+Apache Beam provides a portable API layer for building sophisticated data-parallel processing `pipelines` that may be executed across a diversity of execution engines, or `runners`. The core concepts of this layer are based upon the Beam Model (formerly referred to as the Dataflow Model), and implemented to varying degrees in each Beam `runner`.
+
+### Direct runner
+The Direct Runner executes pipelines on your machine and is designed to validate that pipelines adhere to the Apache Beam model as closely as possible. Instead of focusing on efficient pipeline execution, the Direct Runner performs additional checks to ensure that users do not rely on semantics that are not guaranteed by the model. Some of these checks include:

Review Comment:
   I think we should use the Portable Runner for examples, since we intend to have x-lang examples eventually. If the portable Runner doesn't work for an example we can give the direct runner flags.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-guide/description.md:
##########
@@ -0,0 +1,22 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+Welcome to a Tour Of Beam, a learning guide you can use to familiarize yourself with the Apache Beam.
+The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles.
+You can access list of modules by clicking ‘<<’ button on the left . For each module, learning progress is displayed next to it.
+Throughout the tour, you will find list of learning materials, examples, exercises and challenges for you to complete.
+Learning units are accompanied by examples that you can review in the right pane, and run by clicking ‘Run’ button to see the output.

Review Comment:
   Learning units are accompanied by code examples that you can review in the upper right pane. You can edit the code, or just run the example by clicking the ‘Run’. Output is displayed in the lower right pane.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-terms/description.md:
##########
@@ -0,0 +1,38 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+**Pipeline** - A pipeline is a user-constructed graph of transformations that defines the desired data processing operations.
+
+**PCollection** - A PCollection is a data set or data stream. The data that a pipeline processes is part of a PCollection.
+
+**PTransform** - A PTransform (or transform) represents a data processing operation, or a step, in your pipeline. A transform is applied to zero or more PCollection objects, and produces zero or more PCollection objects.
+
+**Aggregation** - Aggregation is computing a value from multiple (1 or more) input elements.
+
+**User-defined function (UDF)** - Some Beam operations allow you to run user-defined code as a way to configure the transform.
+
+**Schema** - A schema is a language-independent type definition for a PCollection. The schema for a PCollection defines elements of that PCollection as an ordered list of named fields.
+
+**SDK** - A language-specific library that lets pipeline authors build transforms, construct their pipelines, and submit them to a runner.
+
+**Runner** - A runner runs a Beam pipeline using the capabilities of your chosen data processing engine.
+
+**Window** - A PCollection can be subdivided into windows based on the timestamps of the individual elements. Windows enable grouping operations over collections that grow over time by dividing the collection into windows of finite collections.
+
+**Watermark** - A watermark is a guess as to when all data in a certain window is expected to have arrived. This is needed because data isn’t always guaranteed to arrive in a pipeline in time order, or to always arrive at predictable intervals.

Review Comment:
   This is needed because data isn’t always guaranteed to arrive in a pipeline in event time order, or to always arrive at predictable intervals.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/example/csvExample.go:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// beam-playground:
+//   name: CSV
+//   description: CSV example.
+//   multifile: false
+//   context_line: 44
+//   categories:
+//     - Quickstart
+//   complexity: BASIC
+//   tags:
+//     - hellobeam
+
+package main
+
+import (
+    "context"
+    "fmt"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
+    "strconv"
+    "strings"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/top"
+
+)
+
+func less(a, b float64) bool{
+    return a>b
+}
+
+func main() {
+    p, s := beam.NewPipelineWithRoot()
+
+    file := Read(s, "gs://apache-beam-samples/nyc_taxi/misc/sample1000.csv")
+
+    cost := applyTransform(s, file)
+
+    fixedSizeElements := top.Largest(s,cost,10,less)

Review Comment:
   Please add a comment explaining this



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-guide/description.md:
##########
@@ -0,0 +1,22 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+Welcome to a Tour Of Beam, a learning guide you can use to familiarize yourself with the Apache Beam.
+The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles.
+You can access list of modules by clicking ‘<<’ button on the left . For each module, learning progress is displayed next to it.
+Throughout the tour, you will find list of learning materials, examples, exercises and challenges for you to complete.
+Learning units are accompanied by examples that you can review in the right pane, and run by clicking ‘Run’ button to see the output.
+Each module also contains a challenge based on the material learned. Try to solve as many as you can, and if you need help, just click on the ‘Hint’ button or examine the correct solution by clicking the ‘Solution’ button.
+Now let’s start the tour with learning core Beam principles.

Review Comment:
   Now let’s start the tour by learning some core Beam principles.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/description.md:
##########
@@ -0,0 +1,41 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Reading from text file
+
+You use one of the Beam-provided I/O adapters to read from an external source. The adapters vary in their exact usage, but all of them read from some external data source and return a `PCollection` whose elements represent the data records in that source.

Review Comment:
   You can use one of the Beam-provided I/O adapters to read from an external source. The adapters vary in their exact usage, but all of them read from some external data source and return a `PCollection` whose elements represent the data in that source.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md:
##########
@@ -0,0 +1,33 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Read from csv file
+
+Data processing pipelines often work with tabular data. In many examples and challenges throughout the course, you’ll be working with one of the datasets stored as csv files in either beam-examples, dataflow-samples buckets.

Review Comment:
   as csv files in the "beam-examples" or "dataflow-samples" buckets.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/creating-pipeline/description.md:
##########
@@ -0,0 +1,36 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating a pipeline
+
+The `Pipeline` abstraction encapsulates all the data and steps in your data processing task. Your Beam driver program typically starts by constructing a Pipeline object, and then using that object as the basis for creating the pipeline’s data sets as PCollections and its operations as `Transforms`.
+
+To use Beam, your driver program must first create an instance of the Beam SDK class Pipeline (typically in the main() function). When you create your `Pipeline`, you’ll also need to set some configuration options. You can set your pipeline’s configuration options programmatically, but it’s often easier to set the options ahead of time (or read them from the command line) and pass them to the Pipeline object when you create the object.
+
+```
+// beam.Init() is an initialization hook that must be called
+// near the beginning of main(), before creating a pipeline.
+beam.Init()
+
+// Create the Pipeline object and root scope.
+pipeline, scope := beam.NewPipelineWithRoot()
+```
+
+### Playground exercise
+
+You can find the full code of the above example in the playground window, which you can run and experiment with. And you can create a `pipeline`, `scope` separately, it is an alternative to `beam.NewPipelineWithRoot()`. It is convenient if manipulations are needed before creating an element.

Review Comment:
   You can find the full code of the above example in the playground window, where you can run the pipeline and experiment with it. You can create a `pipeline` and `scope` separately, as an alternative to using `beam.NewPipelineWithRoot()`. This can be convenient if manipulations are needed before creating an element.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/description.md:
##########
@@ -0,0 +1,41 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Reading from text file
+
+You use one of the Beam-provided I/O adapters to read from an external source. The adapters vary in their exact usage, but all of them read from some external data source and return a `PCollection` whose elements represent the data records in that source.
+
+Each data source adapter has a Read transform; to read, you must apply that transform to the Pipeline object itself.
+
+`TextIO.Read` , for example, reads from an external text file and returns a `PCollection` whose elements are of type String. Each String represents one line from the text file. Here’s how you would apply `TextIO.Read` to your Pipeline to create a `PCollection`:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create pipline
+    p, s := beam.NewPipelineWithRoot()
+
+    // Now create the PCollection by reading text files. Separate elements will be added for each line in the input file
+    lines :=  textio.Read(scope, 'gs://some/inputData.txt')
+
+}
+```
+
+### Playground exercise
+
+In the playground window, you can find an example that reads a king lear poem from the text file stored in the Google Storage bucket and fills PCollection with individual lines and then with individual words. Try it out and see what the output is.
+
+One of the differences you will see is that the output is much shorter than the input file itself. This is because the number of elements in the output `PCollection` is limited with the `top.Largest(s,lines,10,less)` transform. Use Sample.fixedSizeGlobally transform of is another technique you can use to troubleshoot and limit the output sent to the console for debugging purposes in case of large input datasets.

Review Comment:
   One of the differences you will see is that the output is much shorter than the input file itself. This is because the number of elements in the output `PCollection` is limited with the `top.Largest(s,lines,10,less)` transform. Another technique you can use to limit the output sent to the console for debugging purposes is the Sample.fixedSizeGlobally transform.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/description.md:
##########
@@ -0,0 +1,41 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Reading from text file
+
+You use one of the Beam-provided I/O adapters to read from an external source. The adapters vary in their exact usage, but all of them read from some external data source and return a `PCollection` whose elements represent the data records in that source.
+
+Each data source adapter has a Read transform; to read, you must apply that transform to the Pipeline object itself.
+
+`TextIO.Read` , for example, reads from an external text file and returns a `PCollection` whose elements are of type String. Each String represents one line from the text file. Here’s how you would apply `TextIO.Read` to your Pipeline to create a `PCollection`:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create pipline
+    p, s := beam.NewPipelineWithRoot()
+
+    // Now create the PCollection by reading text files. Separate elements will be added for each line in the input file
+    lines :=  textio.Read(scope, 'gs://some/inputData.txt')
+
+}
+```
+
+### Playground exercise
+
+In the playground window, you can find an example that reads a king lear poem from the text file stored in the Google Storage bucket and fills PCollection with individual lines and then with individual words. Try it out and see what the output is.

Review Comment:
   In the playground window, you can find an example that reads the Shakespeare play King Lear from the text file stored in the Google Storage bucket and fills PCollection with individual lines and then with individual words. Try it out and see what the output is.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/example/textIo.go:
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+/*
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+*/
+// beam-playground:
+//   name: TextIO
+//   description: TextIO example.
+//   multifile: false
+//   context_line: 46
+//   categories:
+//     - Quickstart
+//   complexity: BASIC
+//   tags:
+//     - hellobeam
+
+package main
+
+import (
+    "context"
+    "fmt"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/filter"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
+    "regexp"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/top"
+)
+
+var (
+    wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)
+)
+
+func less(a, b string) bool{
+    return len(a)>len(b)
+}
+
+func main() {
+    p, s := beam.NewPipelineWithRoot()
+
+    file := Read(s, "gs://apache-beam-samples/shakespeare/kinglear.txt")
+
+    lines := getLines(s, file)
+    fixedSizeLines := top.Largest(s,lines,10,less)
+    output(s, "Lines: ", fixedSizeLines)
+
+    words := getWords(s,lines)
+    fixedSizeWords := top.Largest(s,words,10,less)
+    output(s, "Words: ", fixedSizeWords)
+
+    err := beamx.Run(context.Background(), p)
+    if err != nil {
+        log.Exitf(context.Background(), "Failed to execute job: %v", err)
+    }
+}
+
+// Read reads from fiename(s) specified by a glob string and a returns a PCollection<string>.
+func Read(s beam.Scope, glob string) beam.PCollection {
+    return textio.Read(s, glob)
+}
+
+// Read text file content line by line. resulting PCollection contains elements, where each element contains a single line of text from the input file.
+func getLines(s beam.Scope, input beam.PCollection) beam.PCollection {
+    return filter.Include(s, input, func(element string) bool {
+        return element != ""
+    })
+}
+
+// getWords read text lines and split into PCollection of words.

Review Comment:
   getWords reads text lines and splits them into PCollection of words.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -0,0 +1,60 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Overview
+
+Apache Beam provides a portable API layer for building sophisticated data-parallel processing `pipelines` that may be executed across a diversity of execution engines, or `runners`. The core concepts of this layer are based upon the Beam Model (formerly referred to as the Dataflow Model), and implemented to varying degrees in each Beam `runner`.
+
+### Direct runner
+The Direct Runner executes pipelines on your machine and is designed to validate that pipelines adhere to the Apache Beam model as closely as possible. Instead of focusing on efficient pipeline execution, the Direct Runner performs additional checks to ensure that users do not rely on semantics that are not guaranteed by the model. Some of these checks include:
+
+* enforcing immutability of elements
+* enforcing encodability of elements
+* elements are processed in an arbitrary order at all points
+* serialization of user functions (DoFn, CombineFn, etc.)
+
+Using the Direct Runner for testing and development helps ensure that pipelines are robust across different Beam runners. In addition, debugging failed runs can be a non-trivial task when a pipeline executes on a remote cluster. Instead, it is often faster and simpler to perform local unit testing on your pipeline code. Unit testing your pipeline locally also allows you to use your preferred local debugging tools.In the SDK Go, the default is runner **DirectRunner**.
+
+Additionally, you can read [here](https://beam.apache.org/documentation/runners/direct/)

Review Comment:
   Additionally, you can read -> You can read more 



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md:
##########
@@ -0,0 +1,43 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Overview
+
+To use Beam, you first need to first create a driver program using the classes in one of the Beam SDKs. Your driver program defines your pipeline, including all of the inputs, transforms, and outputs. It also sets execution options for your pipeline (typically passed by using command-line options). These include the Pipeline Runner, which, in turn, determines what back-end your pipeline will run on.
+
+The Beam SDKs provide several abstractions that simplify the mechanics of large-scale distributed data processing. The same Beam abstractions work with both batch and streaming data sources. When you create your Beam pipeline, you can think about your data processing task in terms of these abstractions. They include:
+
+→ `Pipeline`: A Pipeline encapsulates your entire data processing task, from start to finish. This includes reading input data, transforming that data, and writing output data. All Beam driver programs must create a Pipeline. When you create the Pipeline, you must also specify the execution options that tell the Pipeline where and how to run.
+
+→ `PCollection`: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline.
+
+→ `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as the input, performs a processing function that you provide on the elements of that PCollection, and then produces zero or more output PCollection objects.

Review Comment:
   Every PTransform takes one or more PCollection objects as the input, performs a user provided processing function on all the elements of that PCollection, and then produces zero or more output PCollection objects. It is crucial to understand that the PTransform is applied to each element of the PCollection independently. 



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-guide/description.md:
##########
@@ -0,0 +1,22 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+Welcome to a Tour Of Beam, a learning guide you can use to familiarize yourself with the Apache Beam.
+The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles.
+You can access list of modules by clicking ‘<<’ button on the left . For each module, learning progress is displayed next to it.
+Throughout the tour, you will find list of learning materials, examples, exercises and challenges for you to complete.

Review Comment:
   Throughout the tour, you will find learning materials, examples, exercises and challenges for you to complete.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -0,0 +1,60 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Overview
+
+Apache Beam provides a portable API layer for building sophisticated data-parallel processing `pipelines` that may be executed across a diversity of execution engines, or `runners`. The core concepts of this layer are based upon the Beam Model (formerly referred to as the Dataflow Model), and implemented to varying degrees in each Beam `runner`.
+
+### Direct runner
+The Direct Runner executes pipelines on your machine and is designed to validate that pipelines adhere to the Apache Beam model as closely as possible. Instead of focusing on efficient pipeline execution, the Direct Runner performs additional checks to ensure that users do not rely on semantics that are not guaranteed by the model. Some of these checks include:
+
+* enforcing immutability of elements
+* enforcing encodability of elements
+* elements are processed in an arbitrary order at all points
+* serialization of user functions (DoFn, CombineFn, etc.)
+
+Using the Direct Runner for testing and development helps ensure that pipelines are robust across different Beam runners. In addition, debugging failed runs can be a non-trivial task when a pipeline executes on a remote cluster. Instead, it is often faster and simpler to perform local unit testing on your pipeline code. Unit testing your pipeline locally also allows you to use your preferred local debugging tools.In the SDK Go, the default is runner **DirectRunner**.
+
+Additionally, you can read [here](https://beam.apache.org/documentation/runners/direct/)
+
+#### Run example
+
+```
+$ go install github.com/apache/beam/sdks/v2/go/examples/wordcount
+$ wordcount --input <PATH_TO_INPUT_FILE> --output counts
+```
+
+### Google Cloud Dataflow runner
+
+The Google Cloud Dataflow uses the Cloud Dataflow managed service. When you run your pipeline with the Cloud Dataflow service, the runner uploads your executable code and dependencies to a Google Cloud Storage bucket and creates a Cloud Dataflow job, which executes your pipeline on managed resources in Google Cloud Platform. The Cloud Dataflow Runner and service are suitable for large scale, continuous jobs, and provide:
+* a fully managed service
+* autoscaling of the number of workers throughout the lifetime of the job
+* dynamic work rebalancing
+
+Additionally, you can read [here](https://beam.apache.org/documentation/runners/dataflow/)

Review Comment:
   You can read more



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] alxp1982 commented on a diff in pull request #23085: [Tour of Beam] Learning content for "Introduction" module

Posted by GitBox <gi...@apache.org>.
alxp1982 commented on code in PR #23085:
URL: https://github.com/apache/beam/pull/23085#discussion_r970889878


##########
learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/basic-concepts/description.md:
##########
@@ -0,0 +1,134 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide

Review Comment:
   ok



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #23085: [Tour of Beam] Learning content for "Introduction" module

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #23085:
URL: https://github.com/apache/beam/pull/23085#issuecomment-1240622860

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #23085: [Tour of Beam] Learning content for "Introduction" module

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #23085:
URL: https://github.com/apache/beam/pull/23085#discussion_r1015497019


##########
learning/tour-of-beam/learning-content/go/introduction/introduction-guide/description.md:
##########
@@ -0,0 +1,22 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+Welcome to a Tour Of Beam, a learning guide you can use to familiarize yourself with the Apache Beam.
+The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles.
+You can access the full list of modules by clicking ‘<<’ button on the left . For each module, learning progress is displayed next to it.
+Throughout the tour, you will find learning materials, examples, exercises and challenges for you to complete.
+Learning units are accompanied by code examples that you can review in the upper right pane. You can edit the code, or just run the example by clicking the ‘Run’. Output is displayed in the lower right pane.

Review Comment:
   ```suggestion
   Learning units are accompanied by code examples that you can review in the upper right pane. You can edit the code, or just run the example by clicking the ‘Run’ button. Output is displayed in the lower right pane.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm merged pull request #23085: [Tour of Beam] Learning content for "Introduction" module

Posted by GitBox <gi...@apache.org>.
damccorm merged PR #23085:
URL: https://github.com/apache/beam/pull/23085


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org