Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/19 14:35:26 UTC

[GitHub] [beam] damccorm commented on a diff in pull request #25080: Tour of beam learning materials CI/CD refactoring and templating

damccorm commented on code in PR #25080:
URL: https://github.com/apache/beam/pull/25080#discussion_r1081345275


##########
learning/tour-of-beam/learning-content/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -45,6 +58,19 @@ In java, you need to set runner to `args` when you start the program.
 ```
 --runner=DirectRunner
 ```
+{{end}}
+
+{{if (eq .Sdk "python")}}
+In the SDK Python, the default is runner **DirectRunner**.

Review Comment:
   ```suggestion
   In the Python SDK, the default runner is **DirectRunner**.
   ```
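The SDK-specific paragraphs in `description.md` are guarded by Go `text/template` actions such as `{{if (eq .Sdk "python")}}`, and `processTemplate` in `load.go` executes the file against `struct{ Sdk tob.Sdk }`. The sketch below reproduces that rendering step on its own. `Sdk` is a hypothetical stand-in for `tob.Sdk`, whose definition is not part of this diff, and the lowercase values `"python"`, `"go"` and `"java"` are assumed to match what the loader passes in.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Sdk is a hypothetical stand-in for tob.Sdk; its real definition is not in this diff.
type Sdk string

// render mirrors processTemplate: parse the source, then execute it with the SDK bound to .Sdk.
func render(source string, sdk Sdk) (string, error) {
	t, err := template.New("").Parse(source)
	if err != nil {
		return "", err
	}
	var out bytes.Buffer
	if err := t.Execute(&out, struct{ Sdk Sdk }{Sdk: sdk}); err != nil {
		return "", err
	}
	return out.String(), nil
}

// description holds two SDK-guarded sentences in the same style as description.md.
const description = `{{if (eq .Sdk "python")}}In the Python SDK, the default runner is **DirectRunner**.{{end}}` +
	`{{if (eq .Sdk "go")}}In the Go SDK, the default runner is **DirectRunner**.{{end}}`

func main() {
	for _, sdk := range []Sdk{"python", "go", "java"} {
		text, err := render(description, sdk)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %q\n", sdk, text)
	}
}
```

Only the branch whose condition matches is emitted; for `java` neither guard matches, so the rendered text is empty.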



##########
learning/tour-of-beam/learning-content/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -95,7 +136,21 @@ java -jar target/beam-examples-bundled-1.0.0.jar \
   --region=<GCP_REGION> \
   --tempLocation=gs://<YOUR_GCS_BUCKET>/temp/
 ```
+{{end}}
+{{if (eq .Sdk "python")}}
+```
+# As part of the initial setup, install Google Cloud Platform specific extra components.
+pip install apache-beam[gcp]
+python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
+                                         --output gs://YOUR_GCS_BUCKET/counts \
+                                         --runner DataflowRunner \
+                                         --project YOUR_GCP_PROJECT \
+                                         --region YOUR_GCP_REGION \
+                                         --temp_location gs://YOUR_GCS_BUCKET/tmp/
+```
+{{end}}
 
+{{if (eq .Sdk "java" "python")}}

Review Comment:
   Is there a reason this is limited to Java/Python? Flink can run pipelines for any language.



Review Comment:
   Same for Spark, Samza, etc.



##########
learning/tour-of-beam/learning-content/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -25,6 +25,19 @@ The Direct Runner executes pipelines on your machine and is designed to validate
 
 Using the Direct Runner for testing and development helps ensure that pipelines are robust across different Beam runners. In addition, debugging failed runs can be a non-trivial task when a pipeline executes on a remote cluster. Instead, it is often faster and simpler to perform local unit testing on your pipeline code. Unit testing your pipeline locally also allows you to use your preferred local debugging tools.
 
+{{if (eq .Sdk "go")}}
+In the SDK Go, the default is runner **DirectRunner**.

Review Comment:
   ```suggestion
   In the Go SDK, the default runner is **DirectRunner**.
   ```



##########
learning/tour-of-beam/learning-content/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -25,6 +25,19 @@ The Direct Runner executes pipelines on your machine and is designed to validate
 
 Using the Direct Runner for testing and development helps ensure that pipelines are robust across different Beam runners. In addition, debugging failed runs can be a non-trivial task when a pipeline executes on a remote cluster. Instead, it is often faster and simpler to perform local unit testing on your pipeline code. Unit testing your pipeline locally also allows you to use your preferred local debugging tools.
 
+{{if (eq .Sdk "go")}}
+In the SDK Go, the default is runner **DirectRunner**.
+
+Additionally, you can read [here](https://beam.apache.org/documentation/runners/direct/)

Review Comment:
   ```suggestion
   Additionally, you can read more about the Direct Runner [here](https://beam.apache.org/documentation/runners/direct/).
   ```



##########
learning/tour-of-beam/learning-content/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -95,7 +136,21 @@ java -jar target/beam-examples-bundled-1.0.0.jar \
   --region=<GCP_REGION> \
   --tempLocation=gs://<YOUR_GCS_BUCKET>/temp/
 ```
+{{end}}
+{{if (eq .Sdk "python")}}
+```
+# As part of the initial setup, install Google Cloud Platform specific extra components.
+pip install apache-beam[gcp]
+python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
+                                         --output gs://YOUR_GCS_BUCKET/counts \
+                                         --runner DataflowRunner \
+                                         --project YOUR_GCP_PROJECT \
+                                         --region YOUR_GCP_REGION \
+                                         --temp_location gs://YOUR_GCS_BUCKET/tmp/
+```
+{{end}}
 
+{{if (eq .Sdk "java" "python")}}

Review Comment:
   Is it just because we don't have instructions for running the pipeline with Go yet?
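On the question of why the runner section is limited to Java and Python: in Go's `text/template`, `eq` accepts two or more arguments and is true when the first equals any of the others, so `{{if (eq .Sdk "java" "python")}}` hides the section from Go readers, and showing it for Go only requires adding `"go"` to the list once Go run instructions exist. A minimal check, where `visibleFor` is a hypothetical helper and SDK values are assumed to be lowercase strings:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// visibleFor reports whether a section guarded by {{if cond}} renders for the given SDK.
// visibleFor is a hypothetical helper; SDK values are assumed to be lowercase strings.
func visibleFor(cond, sdk string) (bool, error) {
	t, err := template.New("").Parse("{{if " + cond + "}}shown{{end}}")
	if err != nil {
		return false, err
	}
	var out bytes.Buffer
	if err := t.Execute(&out, struct{ Sdk string }{Sdk: sdk}); err != nil {
		return false, err
	}
	return out.String() == "shown", nil
}

func main() {
	conds := []string{
		`(eq .Sdk "java" "python")`,      // guard as written in the PR
		`(eq .Sdk "java" "python" "go")`, // widened so Go readers also see the section
	}
	for _, cond := range conds {
		for _, sdk := range []string{"java", "python", "go"} {
			ok, err := visibleFor(cond, sdk)
			if err != nil {
				panic(err)
			}
			fmt.Printf("%-32s %-6s %v\n", cond, sdk, ok)
		}
	}
}
```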



##########
learning/tour-of-beam/backend/internal/fs_content/load.go:
##########
@@ -97,27 +115,57 @@ func collectUnit(infopath string, ctx *sdkContext) (unit *tob.Unit, err error) {
 	return builder.Build(), err
 }
 
+func processTemplate(source []byte, sdk tob.Sdk) ([]byte, error) {
+	t := template.New("")
+	t, err := t.Parse(string(source))
+	if err != nil {
+		return nil, err
+	}
+
+	var output bytes.Buffer
+	err = t.Execute(&output, struct{ Sdk tob.Sdk }{Sdk: sdk})
+	if err != nil {
+		return nil, err
+	}
+
+	return output.Bytes(), nil
+}
+
 func collectGroup(infopath string, ctx *sdkContext) (*tob.Group, error) {
 	info := loadLearningGroupInfo(infopath)
+
+	supported, err := isSupportedSdk(info.Sdk, ctx, infopath)
+	if err != nil {
+		return nil, err
+	}
+	if !supported {
+		log.Printf("Group %v at %v not supported in %v\n", info.Id, infopath, ctx.sdk)
+		return nil, nil
+	}
+
 	log.Printf("Found Group %v metadata at %v\n", info.Name, infopath)
 	group := tob.Group{Id: info.Id, Title: info.Name}
 	for _, item := range info.Content {
 		node, err := collectNode(filepath.Join(infopath, "..", item), ctx)
 		if err != nil {
 			return &group, err
 		}
-		group.Nodes = append(group.Nodes, node)
+		if node == nil {
+			continue
+		}
+		group.Nodes = append(group.Nodes, *node)
 	}
 
 	return &group, nil
 }
 
 // Collect node which is either a unit or a group.
-func collectNode(rootpath string, ctx *sdkContext) (node tob.Node, err error) {
+func collectNode(rootpath string, ctx *sdkContext) (*tob.Node, error) {
 	files, err := os.ReadDir(rootpath)
 	if err != nil {
-		return node, err
+		return nil, err
 	}
+	node := &tob.Node{}
 	for _, f := range files {

Review Comment:
   This isn't technically a problem with this code change, but I think this loop has a potential bug: if the first file produces an error but the second does not, `err` is reset to `nil`.
   
   We should probably return early if `err != nil` after the switch.
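The rest of the loop body is outside this hunk, so the following is only a hypothetical sketch of the pattern described: `err` is assigned inside a `switch` on every iteration, and a later successful file overwrites an earlier failure unless the loop returns early. `entry` and `loadFile` are invented stand-ins for the directory entries and per-file loaders.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// entry is a hypothetical stand-in for an os.DirEntry in a content directory.
type entry struct{ name string }

// loadFile is a hypothetical per-file loader; "broken.yaml" simulates a file that fails to load.
func loadFile(e entry) error {
	if e.name == "broken.yaml" {
		return errors.New("cannot parse " + e.name)
	}
	return nil
}

// collectBuggy reassigns err on every matching file, so a later success hides an earlier failure.
func collectBuggy(files []entry) error {
	var err error
	for _, f := range files {
		switch {
		case strings.HasSuffix(f.name, ".yaml"):
			err = loadFile(f)
		}
	}
	return err
}

// collectFixed checks err right after the switch and returns on the first failure.
func collectFixed(files []entry) error {
	for _, f := range files {
		var err error
		switch {
		case strings.HasSuffix(f.name, ".yaml"):
			err = loadFile(f)
		}
		if err != nil {
			return fmt.Errorf("%s: %w", f.name, err)
		}
	}
	return nil
}

func main() {
	files := []entry{{"broken.yaml"}, {"unit-info.yaml"}}
	fmt.Println("buggy:", collectBuggy(files))
	fmt.Println("fixed:", collectFixed(files))
}
```

Wrapping with `%w` keeps the original error available to `errors.Is`/`errors.As` while adding the file name for context.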



##########
learning/tour-of-beam/learning-content/introduction/introduction-guide/unit-info.yaml:
##########
@@ -17,6 +17,9 @@
 # under the License.
 #
 
+sdk:
+  - Java
+  - Python
+  - Go
 id: guide
 name: Tour of Beam Guide
-complexity: BASIC

Review Comment:
   Was removing this line intentional?
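This hunk also adds an `sdk:` list to `unit-info.yaml`, and in `load.go` `collectGroup` now calls `isSupportedSdk` and returns `nil, nil` for unsupported content, which the caller then skips. `isSupportedSdk` is not in this diff, so the following is a hypothetical sketch of that filter: `supports`, `collect`, the simplified `Node` and the `java-only` id are invented, and the case-insensitive match is an assumption prompted by the YAML listing `Java`, `Python` and `Go` while the templates compare against lowercase values.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified stand-in for tob.Node.
type Node struct{ ID string }

// supports reports whether sdk appears in a unit's `sdk:` list.
// Case-insensitive matching is an assumption, not confirmed by the diff.
func supports(listed []string, sdk string) bool {
	for _, s := range listed {
		if strings.EqualFold(s, sdk) {
			return true
		}
	}
	return false
}

// collect returns nil for unsupported content, mirroring collectGroup's (nil, nil) return.
func collect(id string, listed []string, sdk string) *Node {
	if !supports(listed, sdk) {
		return nil
	}
	return &Node{ID: id}
}

func main() {
	sdk := "go"
	units := []struct {
		id     string
		listed []string
	}{
		{"guide", []string{"Java", "Python", "Go"}},
		{"java-only", []string{"Java"}},
	}
	var nodes []Node
	for _, u := range units {
		node := collect(u.id, u.listed, sdk)
		if node == nil {
			continue // skip unsupported content, as the updated collectGroup loop does
		}
		nodes = append(nodes, *node)
	}
	fmt.Println(nodes)
}
```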



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
