Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/05/27 03:30:47 UTC

[GitHub] [beam] lostluck commented on a change in pull request #11803: [BEAM-9679] Add a CoGroupByKey lesson to the Core Transforms section

lostluck commented on a change in pull request #11803:
URL: https://github.com/apache/beam/pull/11803#discussion_r430512377



##########
File path: learning/katas/go/Core Transforms/CoGroupByKey/CoGroupByKey/pkg/task/task.go
##########
@@ -0,0 +1,52 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package task
+
+import (
+	"fmt"
+	"github.com/apache/beam/sdks/go/pkg/beam"
+)
+
+func ApplyTransform(s beam.Scope, fruits beam.PCollection, countries beam.PCollection) beam.PCollection {
+	fruitsKV := beam.ParDo(s, func(e string) (string, string) {
+		return string(e[0]), e
+	}, fruits)
+
+	countriesKV := beam.ParDo(s, func(e string) (string, string) {
+		return string(e[0]), e
+	}, countries)
+
+	grouped := beam.CoGroupByKey(s, fruitsKV, countriesKV)
+	return beam.ParDo(s, func(key string, f func(*string) bool, c func(*string) bool, emit func(string)) {

Review comment:
       +1 to using unambiguous names here.
   In this case it's especially important to disambiguate between the iterator parameters (currently f and c) and the emitter parameter, since all of them are function-typed parameters.
   E.g., iter1 and iter2 would be better than unrelated single-character names. Also +1 to doing a global cleanup later instead of immediately going back and updating everything written previously. That said, there's no harm in starting in *this* PR.
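
   To illustrate the naming point, here is a minimal sketch that uses only the standard library, so it runs without the Beam SDK. The function `joinWords`, the helper `newSliceIter`, and the output format are all assumptions made up for this example; only the parameter shape (a key, two `func(*string) bool` iterators, and a `func(string)` emitter) comes from the diff above.

```go
package main

import "fmt"

// newSliceIter is a hypothetical stand-in for the iterators a runner would
// pass to the DoFn. Each call writes the next value through the pointer and
// reports whether a value was available.
func newSliceIter(values []string) func(*string) bool {
	next := 0
	return func(out *string) bool {
		if next >= len(values) {
			return false
		}
		*out = values[next]
		next++
		return true
	}
}

// joinWords has the same parameter shape as the DoFn under review, but the
// names make it obvious which function parameters are iterators and which
// one is the emitter. The output format is illustrative only.
func joinWords(key string, fruitIter, countryIter func(*string) bool, emit func(string)) {
	var fruit, country string
	fruitIter(&fruit)     // read the first fruit for this key, if any
	countryIter(&country) // read the first country for this key, if any
	emit(fmt.Sprintf("%s: fruit=%s country=%s", key, fruit, country))
}

func main() {
	joinWords("a",
		newSliceIter([]string{"apple"}),
		newSliceIter([]string{"australia"}),
		func(line string) { fmt.Println(line) },
	)
}
```

   With names like these, a reader can tell at a glance that the first two function parameters are read from and the last one is written to.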

##########
File path: learning/katas/go/Core Transforms/CoGroupByKey/CoGroupByKey/task.md
##########
@@ -0,0 +1,104 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+-->
+
+# CoGroupByKey
+
+CoGroupByKey performs a relational join of two or more key/value PCollections that have the same 
+key type.
+
+**Kata:** Implement a [beam.CoGroupByKey](https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#CoGroupByKey) 
+transform that joins words by their first letter, and then produces the string representation of the 
+WordsAlphabet model.
+
+<div class="hint">
+    Refer to
+    <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#CoGroupByKey">beam.CoGroupByKey</a>
+    to solve this problem.
+</div>
+
+<div class="hint">
+  Refer to the Beam Programming Guide
+  <a href="https://beam.apache.org/documentation/programming-guide/#cogroupbykey">
+    "CoGroupByKey"</a> section for more information.
+</div>
+
+<div class="hint">
+  Think of this problem in three stages.  First, create key/value pairs of PCollections called KV
+  for fruits and countries, pairing the first character with the word.  Next, apply CoGroupByKey to the KVs
+  followed by a ParDo.
+</div>
+
+<div class="hint">
+  In the last lesson we learned how to make key/value PCollections called KV.  Now we have 
+  two to make from fruits and countries.
+  
+  To return as a KV, you can return two values from your DoFn. The first return value represents the Key, and 
+  the second return value represents the Value.  An example is shown below.
+  
+```
+func doFn(element string) (string, string) {
+    key := string(element[0])
+    value := element
+    return key, value
+}
+``` 
+</div>
+
+<div class="hint">
+  In the last lesson we learned that 
+  <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey">
+  beam.GroupByKey</a> takes a single KV.
+  <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#CoGroupByKey">beam.CoGroupByKey</a>
+  takes more than one KV.
+</div>
+
+<div class="hint">
+  Our final step in this problem requires a
+  <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo">beam.ParDo</a>
+  with a DoFn that's different than what we've seen in previous lessons.  In the previous step we should
+  have a PCollection acquired from CoGroupByKey.  A ParDo for that PCollection expects a DoFn that looks
+  like the following. 
+  
+  ```
+  func doFn(key string, aKV func(*string) bool, anotherKV func(*string) bool, emit func(string)){

Review comment:
       [GroupByKey](https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam?utm_source=backtogodoc#GroupByKey) by itself does have a single example attached to it, but [CoGBK](https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam?utm_source=backtogodoc#CoGroupByKey) does not, nor does it explain its relationship to GBK (it's a generalization of GBK that groups multiple PCollection KVs into a CoGBK<K, V1, V2..>).
   
   The Go docs could definitely be improved in this regard.
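
   As a rough illustration of that relationship, the sketch below models the grouping semantics in memory with plain maps and slices. It is not the Beam API: the `kv` type and the `groupByKey` and `coGroupByKey` functions are hypothetical names invented for this example, and a real runner does this work in a distributed way.

```go
package main

import (
	"fmt"
	"sort"
)

// kv is a hypothetical in-memory stand-in for one KV element.
type kv struct{ Key, Value string }

// coGroupByKey groups any number of keyed inputs. For each key, the result
// holds one value list per input, in input order. A key that is missing
// from an input gets an empty list for that input.
func coGroupByKey(inputs ...[]kv) map[string][][]string {
	out := make(map[string][][]string)
	for i, in := range inputs {
		for _, e := range in {
			if out[e.Key] == nil {
				out[e.Key] = make([][]string, len(inputs))
			}
			out[e.Key][i] = append(out[e.Key][i], e.Value)
		}
	}
	return out
}

// groupByKey is the single-input special case: one value list per key.
func groupByKey(in []kv) map[string][]string {
	out := make(map[string][]string)
	for key, perInput := range coGroupByKey(in) {
		out[key] = perInput[0]
	}
	return out
}

func main() {
	fruits := []kv{{"a", "apple"}, {"b", "banana"}}
	countries := []kv{{"a", "australia"}, {"c", "canada"}}

	grouped := coGroupByKey(fruits, countries)

	// Map iteration order is random, so sort the keys for stable output.
	keys := make([]string, 0, len(grouped))
	for k := range grouped {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, grouped[k][0], grouped[k][1])
	}
}
```

   Writing `groupByKey` in terms of `coGroupByKey` is just a way to make the "generalization" point concrete: with one input, the per-key result collapses to a single value list.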

Review comment:
       I did write a better guide for the Go SDK last year, but never got around to exporting it from Google internal. It's on my list of things to do before declaring that the SDK is no longer experimental.
