Posted to github@beam.apache.org by "sirenbyte (via GitHub)" <gi...@apache.org> on 2023/03/10 08:02:36 UTC

[GitHub] [beam] sirenbyte commented on a diff in pull request #23577: [Tour of Beam] Learning content for "Common-transforms" module

sirenbyte commented on code in PR #23577:
URL: https://github.com/apache/beam/pull/23577#discussion_r1132057446


##########
learning/tour-of-beam/learning-content/common-transforms/aggregation/count/description.md:
##########
@@ -0,0 +1,311 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Count
+
+`Count` provides transforms for counting the elements in a `PCollection`: globally, for each key, or for each distinct element.
+
+{{if (eq .Sdk "go")}}
+Counts the number of elements within each aggregation. The Count transform has two varieties:
+
+You can count the number of elements in a `PCollection` with `CountElms()`. It returns a `PCollection` containing a single element: the count.
+
+```
+import (
+    "github.com/apache/beam/sdks/go/pkg/beam"
+    "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
+)
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+    return stats.CountElms(s, input)
+}
+```
+
+You can use `Count()` to count how many times each distinct element appears in a `PCollection`. The result is one key-value pair per distinct element, holding the element and its count.
+
+```
+import (
+    "github.com/apache/beam/sdks/go/pkg/beam"
+    "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
+)
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+    return stats.Count(s, input)
+}
+```
+{{end}}
+{{if (eq .Sdk "java")}}
+Counts the number of elements within each aggregation. The Count transform has three varieties:
+
+### Counting all elements in a PCollection
+
+`Count.globally()` counts the number of elements in the entire `PCollection`. The result is a collection with a single element.
+
+```
+PCollection<Integer> input = pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
+PCollection<Long> output = input.apply(Count.globally());
+```
+
+Output
+```
+10
+```
+
+### Counting elements for each key
+
+`Count.perKey()` counts how many elements are associated with each key. It ignores the values. The resulting collection has one output for every key in the input collection.
+
+```
+PCollection<KV<String, Integer>> input = pipeline.apply(
+    Create.of(KV.of("🥕", 3),
+              KV.of("🥕", 2),
+              KV.of("🍆", 1),
+              KV.of("🍅", 4),
+              KV.of("🍅", 5),
+              KV.of("🍅", 3)));
+PCollection<KV<String, Long>> output = input.apply(Count.perKey());
+```
+
+Output
+
+```
+KV{🥕, 2}
+KV{🍅, 3}
+KV{🍆, 1}
+```
+
+### Counting all unique elements
+
+`Count.perElement()` counts how many times each element appears in the input collection. The output collection contains one key-value pair per unique element, holding the element and the number of times it appeared in the original collection.
+
+```
+PCollection<KV<String, Integer>> input = pipeline.apply(
+    Create.of(KV.of("🥕", 3),
+              KV.of("🥕", 2),
+              KV.of("🍆", 1),
+              KV.of("🍅", 3),
+              KV.of("🍅", 5),
+              KV.of("🍅", 3)));
+PCollection<KV<KV<String, Integer>, Long>> output = input.apply(Count.perElement());
+```
+
+Output
+
+```
+KV{KV{🍅, 3}, 2}
+KV{KV{🥕, 2}, 1}
+KV{KV{🍆, 1}, 1}
+KV{KV{🥕, 3}, 1}
+KV{KV{🍅, 5}, 1}
+```
+{{end}}
+{{if (eq .Sdk "python")}}
+### Counting all elements in a PCollection
+
+You can use `Count.Globally()` to count all elements in a PCollection, even if there are duplicate elements.
+
+```
+import apache_beam as beam
+
+with beam.Pipeline() as p:
+  total_elements = (
+      p

Review Comment:
   Done



##########
learning/tour-of-beam/learning-content/common-transforms/aggregation/count/description.md:
##########
@@ -0,0 +1,311 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Count
+
+`Count` provides transforms for counting the elements in a `PCollection`: globally, for each key, or for each distinct element.
+
+{{if (eq .Sdk "go")}}
+Counts the number of elements within each aggregation. The Count transform has two varieties:
+
+You can count the number of elements in a `PCollection` with `CountElms()`. It returns a `PCollection` containing a single element: the count.
+
+```
+import (
+    "github.com/apache/beam/sdks/go/pkg/beam"
+    "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
+)
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+    return stats.CountElms(s, input)
+}
+```
+
+You can use `Count()` to count how many times each distinct element appears in a `PCollection`. The result is one key-value pair per distinct element, holding the element and its count.
+
+```
+import (
+    "github.com/apache/beam/sdks/go/pkg/beam"
+    "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
+)
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+    return stats.Count(s, input)
+}
+```
+{{end}}
+{{if (eq .Sdk "java")}}
+Counts the number of elements within each aggregation. The Count transform has three varieties:
+
+### Counting all elements in a PCollection
+
+`Count.globally()` counts the number of elements in the entire `PCollection`. The result is a collection with a single element.
+
+```
+PCollection<Integer> input = pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
+PCollection<Long> output = input.apply(Count.globally());
+```
+
+Output
+```
+10
+```
+
+### Counting elements for each key
+
+`Count.perKey()` counts how many elements are associated with each key. It ignores the values. The resulting collection has one output for every key in the input collection.
+
+```
+PCollection<KV<String, Integer>> input = pipeline.apply(
+    Create.of(KV.of("🥕", 3),
+              KV.of("🥕", 2),
+              KV.of("🍆", 1),
+              KV.of("🍅", 4),
+              KV.of("🍅", 5),
+              KV.of("🍅", 3)));
+PCollection<KV<String, Long>> output = input.apply(Count.perKey());
+```
+
+Output
+
+```
+KV{🥕, 2}
+KV{🍅, 3}
+KV{🍆, 1}
+```
+
+### Counting all unique elements
+
+`Count.perElement()` counts how many times each element appears in the input collection. The output collection contains one key-value pair per unique element, holding the element and the number of times it appeared in the original collection.
+
+```
+PCollection<KV<String, Integer>> input = pipeline.apply(
+    Create.of(KV.of("🥕", 3),
+              KV.of("🥕", 2),
+              KV.of("🍆", 1),
+              KV.of("🍅", 3),
+              KV.of("🍅", 5),
+              KV.of("🍅", 3)));
+PCollection<KV<KV<String, Integer>, Long>> output = input.apply(Count.perElement());
+```
+
+Output
+
+```
+KV{KV{🍅, 3}, 2}
+KV{KV{🥕, 2}, 1}
+KV{KV{🍆, 1}, 1}
+KV{KV{🥕, 3}, 1}
+KV{KV{🍅, 5}, 1}
+```
+{{end}}
+{{if (eq .Sdk "python")}}
+### Counting all elements in a PCollection
+
+You can use `Count.Globally()` to count all elements in a PCollection, even if there are duplicate elements.
+
+```
+import apache_beam as beam
+
+with beam.Pipeline() as p:
+  total_elements = (
+      p
+      | 'Create plants' >> beam.Create(
+          ['🍓', '🥕', '🥕', '🥕', '🍆', '🍆', '🍅', '🍅', '🍅', '🌽'])
+      | 'Count all elements' >> beam.combiners.Count.Globally()
+      | beam.Map(print))
+```
+
+Output
+
+```
+10
+```
+
+### Counting elements for each key
+
+You can use `Count.PerKey()` to count the elements for each unique key in a PCollection of key-values.
+
+```
+import apache_beam as beam
+
+with beam.Pipeline() as p:
+  total_elements_per_keys = (
+      p

Review Comment:
   Done
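
The global and per-key semantics described in the hunk above can be sketched in plain Python. This is an illustration only, not Beam code: `count_globally` and `count_per_key` are hypothetical helper names, ordinary lists stand in for a `PCollection`, and plain strings replace the emoji keys of the quoted Java example.

```python
from collections import Counter


def count_globally(elements):
    """Return the total number of elements, duplicates included."""
    return sum(1 for _ in elements)


def count_per_key(pairs):
    """Return {key: number of pairs carrying that key}; values are ignored."""
    return dict(Counter(key for key, _value in pairs))


# Same shape as the quoted Java inputs, with strings instead of emoji.
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pairs = [
    ("carrot", 3), ("carrot", 2),
    ("eggplant", 1),
    ("tomato", 4), ("tomato", 5), ("tomato", 3),
]

total = count_globally(numbers)
per_key = count_per_key(pairs)
print(total)
print(per_key)
```

The global count yields a single value, while the per-key count yields one entry per distinct key regardless of the values attached to it.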



##########
learning/tour-of-beam/learning-content/common-transforms/aggregation/count/description.md:
##########
@@ -0,0 +1,311 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Count
+
+`Count` provides transforms for counting the elements in a `PCollection`: globally, for each key, or for each distinct element.
+
+{{if (eq .Sdk "go")}}
+Counts the number of elements within each aggregation. The Count transform has two varieties:
+
+You can count the number of elements in a `PCollection` with `CountElms()`. It returns a `PCollection` containing a single element: the count.
+
+```
+import (
+    "github.com/apache/beam/sdks/go/pkg/beam"
+    "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
+)
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+    return stats.CountElms(s, input)
+}
+```
+
+You can use `Count()` to count how many times each distinct element appears in a `PCollection`. The result is one key-value pair per distinct element, holding the element and its count.
+
+```
+import (
+    "github.com/apache/beam/sdks/go/pkg/beam"
+    "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
+)
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+    return stats.Count(s, input)
+}
+```
+{{end}}
+{{if (eq .Sdk "java")}}
+Counts the number of elements within each aggregation. The Count transform has three varieties:
+
+### Counting all elements in a PCollection
+
+`Count.globally()` counts the number of elements in the entire `PCollection`. The result is a collection with a single element.
+
+```
+PCollection<Integer> input = pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
+PCollection<Long> output = input.apply(Count.globally());
+```
+
+Output
+```
+10
+```
+
+### Counting elements for each key
+
+`Count.perKey()` counts how many elements are associated with each key. It ignores the values. The resulting collection has one output for every key in the input collection.
+
+```
+PCollection<KV<String, Integer>> input = pipeline.apply(
+    Create.of(KV.of("🥕", 3),
+              KV.of("🥕", 2),
+              KV.of("🍆", 1),
+              KV.of("🍅", 4),
+              KV.of("🍅", 5),
+              KV.of("🍅", 3)));
+PCollection<KV<String, Long>> output = input.apply(Count.perKey());
+```
+
+Output
+
+```
+KV{🥕, 2}
+KV{🍅, 3}
+KV{🍆, 1}
+```
+
+### Counting all unique elements
+
+`Count.perElement()` counts how many times each element appears in the input collection. The output collection contains one key-value pair per unique element, holding the element and the number of times it appeared in the original collection.
+
+```
+PCollection<KV<String, Integer>> input = pipeline.apply(
+    Create.of(KV.of("🥕", 3),
+              KV.of("🥕", 2),
+              KV.of("🍆", 1),
+              KV.of("🍅", 3),
+              KV.of("🍅", 5),
+              KV.of("🍅", 3)));
+PCollection<KV<KV<String, Integer>, Long>> output = input.apply(Count.perElement());
+```
+
+Output
+
+```
+KV{KV{🍅, 3}, 2}
+KV{KV{🥕, 2}, 1}
+KV{KV{🍆, 1}, 1}
+KV{KV{🥕, 3}, 1}
+KV{KV{🍅, 5}, 1}
+```
+{{end}}
+{{if (eq .Sdk "python")}}
+### Counting all elements in a PCollection
+
+You can use `Count.Globally()` to count all elements in a PCollection, even if there are duplicate elements.
+
+```
+import apache_beam as beam
+
+with beam.Pipeline() as p:
+  total_elements = (
+      p
+      | 'Create plants' >> beam.Create(
+          ['🍓', '🥕', '🥕', '🥕', '🍆', '🍆', '🍅', '🍅', '🍅', '🌽'])
+      | 'Count all elements' >> beam.combiners.Count.Globally()
+      | beam.Map(print))
+```
+
+Output
+
+```
+10
+```
+
+### Counting elements for each key
+
+You can use `Count.PerKey()` to count the elements for each unique key in a PCollection of key-values.
+
+```
+import apache_beam as beam
+
+with beam.Pipeline() as p:
+  total_elements_per_keys = (
+      p
+      | 'Create plants' >> beam.Create([
+          ('spring', '🍓'),
+          ('spring', '🥕'),
+          ('summer', '🥕'),
+          ('fall', '🥕'),
+          ('spring', '🍆'),
+          ('winter', '🍆'),
+          ('spring', '🍅'),
+          ('summer', '🍅'),
+          ('fall', '🍅'),
+          ('summer', '🌽'),
+      ])
+      | 'Count elements per key' >> beam.combiners.Count.PerKey()
+      | beam.Map(print))
+```
+
+Output
+
+```
+('spring', 4)
+('summer', 3)
+('fall', 2)
+('winter', 1)
+```
+
+### Counting all unique elements
+
+You can use `Count.PerElement()` to count how many times each unique element appears in a `PCollection`.
+
+```
+import apache_beam as beam
+
+with beam.Pipeline() as p:
+  total_unique_elements = (
+      p

Review Comment:
   Done
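
The per-element semantics quoted above can be sketched in plain Python. This is an illustration under stated assumptions, not Beam code: `count_per_element` is a hypothetical helper, a list stands in for a `PCollection`, and plain strings replace the emoji. When the elements are themselves key-value pairs, the whole pair is the thing being counted, which matches the `KV{KV{🍅, 3}, 2}` lines in the quoted Java output.

```python
from collections import Counter


def count_per_element(elements):
    """Return {element: occurrences}; every element must be hashable."""
    return dict(Counter(elements))


# Same shape as the quoted Java perElement input: the pairs are the elements.
pairs = [
    ("carrot", 3), ("carrot", 2),
    ("eggplant", 1),
    ("tomato", 3), ("tomato", 5), ("tomato", 3),
]

per_element = count_per_element(pairs)
for element, occurrences in per_element.items():
    print(element, occurrences)
```

Only `("tomato", 3)` occurs twice in this input; every other pair is distinct, so its count is 1.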


