You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/05/18 09:25:20 UTC

[GitHub] [beam] henryken commented on a change in pull request #11734: [BEAM-9679] Add Core Transforms section / GroupByKey lesson to the Go SDK katas

henryken commented on a change in pull request #11734:
URL: https://github.com/apache/beam/pull/11734#discussion_r426455910



##########
File path: learning/katas/go/Core Transforms/GroupByKey/GroupByKey/pkg/task/task.go
##########
@@ -0,0 +1,27 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package task
+
+import "github.com/apache/beam/sdks/go/pkg/beam"
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+	kv := beam.ParDo(s, splitByFirstChar, input)

Review comment:
       I think the whole function body should be part of answer placeholder.

##########
File path: learning/katas/go/Core Transforms/GroupByKey/GroupByKey/pkg/task/task.go
##########
@@ -0,0 +1,27 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package task
+
+import "github.com/apache/beam/sdks/go/pkg/beam"
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+	kv := beam.ParDo(s, splitByFirstChar, input)
+	return beam.GroupByKey(s, kv)
+}
+
+func splitByFirstChar(element string) (uint8, string) {

Review comment:
       This function is not doing any splitting. It creates a KV.
   I suggest finding a better function name.

##########
File path: learning/katas/go/Core Transforms/GroupByKey/GroupByKey/task.md
##########
@@ -0,0 +1,50 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+-->
+
+# GroupByKey
+
+GroupByKey is a Beam transform for processing collections of key/value pairs. It’s a parallel
+reduction operation, analogous to the Shuffle phase of a Map/Shuffle/Reduce-style algorithm. The
+input to GroupByKey is a collection of key/value pairs that represents a multimap, where the
+collection contains multiple pairs that have the same key, but different values. Given such a
+collection, you use GroupByKey to collect all of the values associated with each unique key.
+
+**Kata:** Implement a 
+<a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey">
+beam.GroupByKey</a> transform that groups words by its first letter.
+<div class="hint">
+  Refer to <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey">
+  beam.GroupByKey</a> to solve this problem.
+</div>
+<div class="hint">

Review comment:
       Can we provide this hint as the 3rd so that the theory parts are hinted first?

##########
File path: learning/katas/go/Core Transforms/GroupByKey/GroupByKey/task.md
##########
@@ -0,0 +1,50 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+-->
+
+# GroupByKey
+
+GroupByKey is a Beam transform for processing collections of key/value pairs. It’s a parallel
+reduction operation, analogous to the Shuffle phase of a Map/Shuffle/Reduce-style algorithm. The
+input to GroupByKey is a collection of key/value pairs that represents a multimap, where the
+collection contains multiple pairs that have the same key, but different values. Given such a
+collection, you use GroupByKey to collect all of the values associated with each unique key.
+
+**Kata:** Implement a 
+<a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey">
+beam.GroupByKey</a> transform that groups words by its first letter.
+<div class="hint">

Review comment:
       Add newline above for better readability

##########
File path: learning/katas/go/Core Transforms/GroupByKey/GroupByKey/pkg/task/task.go
##########
@@ -0,0 +1,27 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package task
+
+import "github.com/apache/beam/sdks/go/pkg/beam"
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+	kv := beam.ParDo(s, splitByFirstChar, input)
+	return beam.GroupByKey(s, kv)
+}
+
+func splitByFirstChar(element string) (uint8, string) {

Review comment:
       If we return using rune, the output might not be as intuitive.
   I suggest that we return as KV<string, string>.
   ![image](https://user-images.githubusercontent.com/5459430/82194481-78f12580-9929-11ea-8c95-7f7db90e8e61.png)
   

##########
File path: learning/katas/go/Core Transforms/GroupByKey/GroupByKey/test/task_test.go
##########
@@ -0,0 +1,57 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package test
+
+import (
+	"github.com/apache/beam/sdks/go/pkg/beam"
+	"github.com/apache/beam/sdks/go/pkg/beam/testing/ptest"
+	"groupbykey/pkg/task"
+	"reflect"
+	"testing"
+)
+
+func TestApplyTransform(t *testing.T) {
+	p, s := beam.NewPipelineWithRoot()
+	tests := []struct {
+		input beam.PCollection
+		want map[uint8][]string
+	}{
+		{
+			input: beam.Create(s, "apple", "ball", "car", "bear", "cheetah", "ant"),
+			want: map[uint8][]string{
+				97: []string{"apple", "ant"},
+				98: []string{"ball", "bear"},
+				99: []string{"car", "cheetah"},
+			},
+		},
+	}
+	for _, tt := range tests {
+		got := task.ApplyTransform(s, tt.input)
+		beam.ParDo0(s, func(key uint8, values func(*string) bool) {
+			var got []string
+			var v string
+			for values(&v) {
+				got = append(got, v)
+			}
+			if !reflect.DeepEqual(got, tt.want[key]) {
+				t.Errorf("ApplyTransform() = %v , want %v", got, tt.want[key])

Review comment:
       How about this error message to make it easier to understand?
   ```go
   t.Errorf("ApplyTransform() key = %v, got %v , want %v", key, got, tt.want[key])
   ```
   ![image](https://user-images.githubusercontent.com/5459430/82195266-a5597180-992a-11ea-93c0-de5c89c0d9be.png)
   

##########
File path: learning/katas/go/Core Transforms/GroupByKey/GroupByKey/task.md
##########
@@ -0,0 +1,50 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+-->
+
+# GroupByKey
+
+GroupByKey is a Beam transform for processing collections of key/value pairs. It’s a parallel
+reduction operation, analogous to the Shuffle phase of a Map/Shuffle/Reduce-style algorithm. The
+input to GroupByKey is a collection of key/value pairs that represents a multimap, where the
+collection contains multiple pairs that have the same key, but different values. Given such a
+collection, you use GroupByKey to collect all of the values associated with each unique key.
+
+**Kata:** Implement a 
+<a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey">
+beam.GroupByKey</a> transform that groups words by its first letter.
+<div class="hint">
+  Refer to <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey">
+  beam.GroupByKey</a> to solve this problem.
+</div>
+<div class="hint">
+  Providing your ParDo a func with two return values, such as below, will transform a PCollection&lt;B&gt; 

Review comment:
       It was not apparent to me when reading this hint. Can help to improve further, something similar to below if it makes sense?
   `To return as KV struct, you can return two values from your DoFn. The first return value represents the Key, and the second return value represents the Value.`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org