You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "sirenbyte (via GitHub)" <gi...@apache.org> on 2023/02/01 12:08:55 UTC

[GitHub] [beam] sirenbyte opened a new pull request, #25253: add core transforms

sirenbyte opened a new pull request, #25253:
URL: https://github.com/apache/beam/pull/25253

   **Please** add a meaningful description for your change here
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on pull request #25253: [Tour of Beam] Learning content for "Core Transforms" module

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on PR #25253:
URL: https://github.com/apache/beam/pull/25253#issuecomment-1536400747

   LGTM, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #25253: [Tour of Beam] Learning content for "Core Transforms" module

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #25253:
URL: https://github.com/apache/beam/pull/25253#issuecomment-1536589344

   ## [Codecov](https://app.codecov.io/gh/apache/beam/pull/25253?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#25253](https://app.codecov.io/gh/apache/beam/pull/25253?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5638a84) into [master](https://app.codecov.io/gh/apache/beam/commit/5772bf74cbe556277976e4883b27e37b7a7e87f0?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5772bf7) will **increase** coverage by `0.89%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #25253      +/-   ##
   ==========================================
   + Coverage   71.17%   72.07%   +0.89%     
   ==========================================
     Files         787      745      -42     
     Lines      103293   101192    -2101     
   ==========================================
   - Hits        73522    72937     -585     
   + Misses      28283    26795    -1488     
   + Partials     1488     1460      -28     
   ```
   
   
   [see 91 files with indirect coverage changes](https://app.codecov.io/gh/apache/beam/pull/25253/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] sirenbyte commented on a diff in pull request #25253: [Tour of Beam] Learning content for "Core Transforms" module

Posted by "sirenbyte (via GitHub)" <gi...@apache.org>.
sirenbyte commented on code in PR #25253:
URL: https://github.com/apache/beam/pull/25253#discussion_r1176090018


##########
learning/tour-of-beam/learning-content/core-transforms/additional-outputs/description.md:
##########
@@ -0,0 +1,322 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Additional outputs
+{{if (eq .Sdk "go")}}
+While `beam.ParDo` always produces an output `PCollection`, your `DoFn` can produce any number of additional output `PCollection`s, or even none at all. If you choose to have multiple outputs, your `DoFn` needs to be called with the `ParDo` function that matches the number of outputs. `beam.ParDo2` for two output `PCollection`s, `beam.ParDo3` for three and so on until `beam.ParDo7`. If you need more, you can use `beam.ParDoN` which will return a `[]beam.PCollection`.
+
+### Tags for multiple outputs
+
+The Go SDK doesn’t use output tags, and instead uses positional ordering for multiple output `PCollection`s.
+
+```
+// beam.ParDo3 returns PCollections in the same order as
+// the emit function parameters in processWords.
+below, above, marked := beam.ParDo3(s, processWords, words)
+
+// processWordsMixed uses both a standard return and an emitter function.
+// The standard return produces the first PCollection from beam.ParDo2,
+// and the emitter produces the second PCollection.
+length, mixedMarked := beam.ParDo2(s, processWordsMixed, words)
+```
+
+### Emitting to multiple outputs in your DoFn
+
+Call emitter functions as needed to produce 0 or more elements for its matching `PCollection`. The same value can be emitted with multiple emitters. As normal, do not mutate values after emitting them from any emitter.
+
+All emitters should be registered using a generic `register.EmitterX[...]` function. This optimizes runtime execution of the emitter.
+
+`DoFn`s can also return a single element via the standard return. The standard return is always the first PCollection returned from `beam.ParDo`. Other emitters output to their own `PCollection`s in their defined parameter order.
+
+```
+// processWords is a DoFn that has 3 output PCollections. The emitter functions
+// are matched in positional order to the PCollections returned by beam.ParDo3.
+func processWords(word string, emitBelowCutoff, emitAboveCutoff, emitMarked func(string)) {
+	const cutOff = 5
+	if len(word) < cutOff {
+		emitBelowCutoff(word)
+	} else {
+		emitAboveCutoff(word)
+	}
+	if isMarkedWord(word) {
+		emitMarked(word)
+	}
+}
+```
+
+### Accessing additional parameters in your DoFn
+
+In addition to the element, Beam will populate other parameters to your DoFn’s `ProcessElement` method. Any combination of these parameters can be added to your process method in a standard order.
+
+**context.Context**: To support consolidated logging and user defined metrics, a `context.Context` parameter can be requested. Per Go conventions, if present it’s required to be the first parameter of the `DoFn` method.
+
+```
+func MyDoFn(ctx context.Context, word string) string { ... }
+```
+
+**Timestamp**: To access the timestamp of an input element, add a `beam.EventTime` parameter before the element. For example:
+
+```
+func MyDoFn(ts beam.EventTime, word string) string { ... }
+```
+
+**Window**: To access the window an input element falls into, add a `beam.Window` parameter before the element. If an element falls in multiple windows (for example, this will happen when using `SlidingWindows`), then the `ProcessElement` method will be invoked multiple time for the element, once for each window. Since `beam.Window` is an interface it’s possible to type assert to the concrete implementation of the window. For example, when fixed windows are being used, the window is of type `window.IntervalWindow`.
+
+```
+func MyDoFn(w beam.Window, word string) string {
+  iw := w.(window.IntervalWindow)
+  ...
+}
+```
+
+**PaneInfo**: When triggers are used, Beam provides `beam.PaneInfo` object that contains information about the current firing. Using `beam.PaneInfo` you can determine whether this is an early or a late firing, and how many times this window has already fired for this key.
+
+```
+func extractWordsFn(pn beam.PaneInfo, line string, emitWords func(string)) {
+	if pn.Timing == typex.PaneEarly || pn.Timing == typex.PaneOnTime {
+		// ... perform operation ...
+	}
+	if pn.Timing == typex.PaneLate {
+		// ... perform operation ...
+	}
+	if pn.IsFirst {
+		// ... perform operation ...
+	}
+	if pn.IsLast {
+		// ... perform operation ...
+	}
+
+	words := strings.Split(line, " ")
+	for _, w := range words {
+		emitWords(w)
+	}
+}
+```
+{{end}}
+{{if (eq .Sdk "java")}}
+While `ParDo` always outputs the main output of `PCollection` (as a return value from apply), you can also force your `ParDo` to output any number of additional `PCollection` outputs. If you decide to have multiple outputs, your `ParDo` will return all the `PCollection` output (including the main output) combined. This will be useful when you are working with big data or a database that needs to be divided into different collections. You get a combined `PCollectionTuple`, you can use `TupleTag` to get a `PCollection`.
+
+A `PCollectionTuple` is an immutable tuple of heterogeneously typed `PCollection`, "with keys" `TupleTags`. A `PCollectionTuple` can be used as input or output for `PTransform` receiving or creating multiple `PCollection` inputs or outputs, which can be of different types, for example, `ParDo` with multiple outputs.
+
+A `TupleTag` is a typed tag used as the key of a heterogeneously typed tuple, for example `PCollectionTuple`. Its general type parameter allows you to track the static type of things stored in tuples.
+
+### Tags for multiple outputs
+
+```
+  .of(new DoFn<String, String>() {

Review Comment:
   done



##########
learning/tour-of-beam/learning-content/core-transforms/additional-outputs/description.md:
##########
@@ -0,0 +1,322 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Additional outputs
+{{if (eq .Sdk "go")}}
+While `beam.ParDo` always produces an output `PCollection`, your `DoFn` can produce any number of additional output `PCollection`s, or even none at all. If you choose to have multiple outputs, your `DoFn` needs to be called with the `ParDo` function that matches the number of outputs. `beam.ParDo2` for two output `PCollection`s, `beam.ParDo3` for three and so on until `beam.ParDo7`. If you need more, you can use `beam.ParDoN` which will return a `[]beam.PCollection`.
+
+### Tags for multiple outputs
+
+The Go SDK doesn’t use output tags, and instead uses positional ordering for multiple output `PCollection`s.
+
+```
+// beam.ParDo3 returns PCollections in the same order as
+// the emit function parameters in processWords.
+below, above, marked := beam.ParDo3(s, processWords, words)
+
+// processWordsMixed uses both a standard return and an emitter function.
+// The standard return produces the first PCollection from beam.ParDo2,
+// and the emitter produces the second PCollection.
+length, mixedMarked := beam.ParDo2(s, processWordsMixed, words)
+```

Review Comment:
   done



##########
learning/tour-of-beam/learning-content/core-transforms/additional-outputs/description.md:
##########
@@ -0,0 +1,322 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Additional outputs
+{{if (eq .Sdk "go")}}
+While `beam.ParDo` always produces an output `PCollection`, your `DoFn` can produce any number of additional output `PCollection`s, or even none at all. If you choose to have multiple outputs, your `DoFn` needs to be called with the `ParDo` function that matches the number of outputs. `beam.ParDo2` for two output `PCollection`s, `beam.ParDo3` for three and so on until `beam.ParDo7`. If you need more, you can use `beam.ParDoN` which will return a `[]beam.PCollection`.
+
+### Tags for multiple outputs
+
+The Go SDK doesn’t use output tags, and instead uses positional ordering for multiple output `PCollection`s.
+
+```
+// beam.ParDo3 returns PCollections in the same order as
+// the emit function parameters in processWords.
+below, above, marked := beam.ParDo3(s, processWords, words)
+

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] sirenbyte closed pull request #25253: [Tour of Beam] Learning content for "Core Transforms" module

Posted by "sirenbyte (via GitHub)" <gi...@apache.org>.
sirenbyte closed pull request #25253: [Tour of Beam] Learning content for "Core Transforms" module
URL: https://github.com/apache/beam/pull/25253


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #25253: [Tour of Beam] Learning content for "Core Transforms" module

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #25253:
URL: https://github.com/apache/beam/pull/25253#discussion_r1129815431


##########
learning/tour-of-beam/learning-content/core-transforms/additional-outputs/description.md:
##########
@@ -0,0 +1,322 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Additional outputs
+{{if (eq .Sdk "go")}}
+While `beam.ParDo` always produces an output `PCollection`, your `DoFn` can produce any number of additional output `PCollection`s, or even none at all. If you choose to have multiple outputs, your `DoFn` needs to be called with the `ParDo` function that matches the number of outputs. `beam.ParDo2` for two output `PCollection`s, `beam.ParDo3` for three and so on until `beam.ParDo7`. If you need more, you can use `beam.ParDoN` which will return a `[]beam.PCollection`.
+
+### Tags for multiple outputs
+
+The Go SDK doesn’t use output tags, and instead uses positional ordering for multiple output `PCollection`s.
+
+```
+// beam.ParDo3 returns PCollections in the same order as
+// the emit function parameters in processWords.
+below, above, marked := beam.ParDo3(s, processWords, words)
+

Review Comment:
   Please add a comment here for clarity
   // Now below has the output of s, above has the output of processWords, and marked has the output of words



##########
learning/tour-of-beam/learning-content/core-transforms/additional-outputs/description.md:
##########
@@ -0,0 +1,322 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Additional outputs
+{{if (eq .Sdk "go")}}
+While `beam.ParDo` always produces an output `PCollection`, your `DoFn` can produce any number of additional output `PCollection`s, or even none at all. If you choose to have multiple outputs, your `DoFn` needs to be called with the `ParDo` function that matches the number of outputs. `beam.ParDo2` for two output `PCollection`s, `beam.ParDo3` for three and so on until `beam.ParDo7`. If you need more, you can use `beam.ParDoN` which will return a `[]beam.PCollection`.
+
+### Tags for multiple outputs
+
+The Go SDK doesn’t use output tags, and instead uses positional ordering for multiple output `PCollection`s.
+
+```
+// beam.ParDo3 returns PCollections in the same order as
+// the emit function parameters in processWords.
+below, above, marked := beam.ParDo3(s, processWords, words)
+
+// processWordsMixed uses both a standard return and an emitter function.
+// The standard return produces the first PCollection from beam.ParDo2,
+// and the emitter produces the second PCollection.
+length, mixedMarked := beam.ParDo2(s, processWordsMixed, words)
+```

Review Comment:
   same here



##########
learning/tour-of-beam/learning-content/core-transforms/additional-outputs/description.md:
##########
@@ -0,0 +1,322 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Additional outputs
+{{if (eq .Sdk "go")}}
+While `beam.ParDo` always produces an output `PCollection`, your `DoFn` can produce any number of additional output `PCollection`s, or even none at all. If you choose to have multiple outputs, your `DoFn` needs to be called with the `ParDo` function that matches the number of outputs. `beam.ParDo2` for two output `PCollection`s, `beam.ParDo3` for three and so on until `beam.ParDo7`. If you need more, you can use `beam.ParDoN` which will return a `[]beam.PCollection`.
+
+### Tags for multiple outputs
+
+The Go SDK doesn’t use output tags, and instead uses positional ordering for multiple output `PCollection`s.
+
+```
+// beam.ParDo3 returns PCollections in the same order as
+// the emit function parameters in processWords.
+below, above, marked := beam.ParDo3(s, processWords, words)
+
+// processWordsMixed uses both a standard return and an emitter function.
+// The standard return produces the first PCollection from beam.ParDo2,
+// and the emitter produces the second PCollection.
+length, mixedMarked := beam.ParDo2(s, processWordsMixed, words)
+```
+
+### Emitting to multiple outputs in your DoFn
+
+Call emitter functions as needed to produce 0 or more elements for its matching `PCollection`. The same value can be emitted with multiple emitters. As normal, do not mutate values after emitting them from any emitter.
+
+All emitters should be registered using a generic `register.EmitterX[...]` function. This optimizes runtime execution of the emitter.
+
+`DoFn`s can also return a single element via the standard return. The standard return is always the first PCollection returned from `beam.ParDo`. Other emitters output to their own `PCollection`s in their defined parameter order.
+
+```
+// processWords is a DoFn that has 3 output PCollections. The emitter functions
+// are matched in positional order to the PCollections returned by beam.ParDo3.
+func processWords(word string, emitBelowCutoff, emitAboveCutoff, emitMarked func(string)) {
+	const cutOff = 5
+	if len(word) < cutOff {
+		emitBelowCutoff(word)
+	} else {
+		emitAboveCutoff(word)
+	}
+	if isMarkedWord(word) {
+		emitMarked(word)
+	}
+}
+```
+
+### Accessing additional parameters in your DoFn
+
+In addition to the element, Beam will populate other parameters to your DoFn’s `ProcessElement` method. Any combination of these parameters can be added to your process method in a standard order.
+
+**context.Context**: To support consolidated logging and user defined metrics, a `context.Context` parameter can be requested. Per Go conventions, if present it’s required to be the first parameter of the `DoFn` method.
+
+```
+func MyDoFn(ctx context.Context, word string) string { ... }
+```
+
+**Timestamp**: To access the timestamp of an input element, add a `beam.EventTime` parameter before the element. For example:
+
+```
+func MyDoFn(ts beam.EventTime, word string) string { ... }
+```
+
+**Window**: To access the window an input element falls into, add a `beam.Window` parameter before the element. If an element falls in multiple windows (for example, this will happen when using `SlidingWindows`), then the `ProcessElement` method will be invoked multiple time for the element, once for each window. Since `beam.Window` is an interface it’s possible to type assert to the concrete implementation of the window. For example, when fixed windows are being used, the window is of type `window.IntervalWindow`.
+
+```
+func MyDoFn(w beam.Window, word string) string {
+  iw := w.(window.IntervalWindow)
+  ...
+}
+```
+
+**PaneInfo**: When triggers are used, Beam provides `beam.PaneInfo` object that contains information about the current firing. Using `beam.PaneInfo` you can determine whether this is an early or a late firing, and how many times this window has already fired for this key.
+
+```
+func extractWordsFn(pn beam.PaneInfo, line string, emitWords func(string)) {
+	if pn.Timing == typex.PaneEarly || pn.Timing == typex.PaneOnTime {
+		// ... perform operation ...
+	}
+	if pn.Timing == typex.PaneLate {
+		// ... perform operation ...
+	}
+	if pn.IsFirst {
+		// ... perform operation ...
+	}
+	if pn.IsLast {
+		// ... perform operation ...
+	}
+
+	words := strings.Split(line, " ")
+	for _, w := range words {
+		emitWords(w)
+	}
+}
+```
+{{end}}
+{{if (eq .Sdk "java")}}
+While `ParDo` always outputs the main output of `PCollection` (as a return value from apply), you can also force your `ParDo` to output any number of additional `PCollection` outputs. If you decide to have multiple outputs, your `ParDo` will return all the `PCollection` output (including the main output) combined. This will be useful when you are working with big data or a database that needs to be divided into different collections. You get a combined `PCollectionTuple`, you can use `TupleTag` to get a `PCollection`.
+
+A `PCollectionTuple` is an immutable tuple of heterogeneously typed `PCollection`, "with keys" `TupleTags`. A `PCollectionTuple` can be used as input or output for `PTransform` receiving or creating multiple `PCollection` inputs or outputs, which can be of different types, for example, `ParDo` with multiple outputs.
+
+A `TupleTag` is a typed tag used as the key of a heterogeneously typed tuple, for example `PCollectionTuple`. Its general type parameter allows you to track the static type of things stored in tuples.
+
+### Tags for multiple outputs
+
+```
+  .of(new DoFn<String, String>() {

Review Comment:
   ParDo.of



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] olehborysevych commented on a diff in pull request #25253: [Tour of Beam] Learning content for "Core Transforms" module

Posted by "olehborysevych (via GitHub)" <gi...@apache.org>.
olehborysevych commented on code in PR #25253:
URL: https://github.com/apache/beam/pull/25253#discussion_r1169794761


##########
learning/tour-of-beam/learning-content/core-transforms/additional-outputs/description.md:
##########
@@ -0,0 +1,322 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Additional outputs
+{{if (eq .Sdk "go")}}
+While `beam.ParDo` always produces an output `PCollection`, your `DoFn` can produce any number of additional output `PCollection`s, or even none at all. If you choose to have multiple outputs, your `DoFn` needs to be called with the `ParDo` function that matches the number of outputs. `beam.ParDo2` for two output `PCollection`s, `beam.ParDo3` for three and so on until `beam.ParDo7`. If you need more, you can use `beam.ParDoN` which will return a `[]beam.PCollection`.
+
+### Tags for multiple outputs
+
+The Go SDK doesn’t use output tags, and instead uses positional ordering for multiple output `PCollection`s.
+
+```
+// beam.ParDo3 returns PCollections in the same order as
+// the emit function parameters in processWords.
+below, above, marked := beam.ParDo3(s, processWords, words)
+
+// processWordsMixed uses both a standard return and an emitter function.
+// The standard return produces the first PCollection from beam.ParDo2,
+// and the emitter produces the second PCollection.
+length, mixedMarked := beam.ParDo2(s, processWordsMixed, words)
+```
+
+### Emitting to multiple outputs in your DoFn
+
+Call emitter functions as needed to produce 0 or more elements for its matching `PCollection`. The same value can be emitted with multiple emitters. As normal, do not mutate values after emitting them from any emitter.
+
+All emitters should be registered using a generic `register.EmitterX[...]` function. This optimizes runtime execution of the emitter.
+
+`DoFn`s can also return a single element via the standard return. The standard return is always the first PCollection returned from `beam.ParDo`. Other emitters output to their own `PCollection`s in their defined parameter order.
+
+```
+// processWords is a DoFn that has 3 output PCollections. The emitter functions
+// are matched in positional order to the PCollections returned by beam.ParDo3.
+func processWords(word string, emitBelowCutoff, emitAboveCutoff, emitMarked func(string)) {
+	const cutOff = 5
+	if len(word) < cutOff {
+		emitBelowCutoff(word)
+	} else {
+		emitAboveCutoff(word)
+	}
+	if isMarkedWord(word) {
+		emitMarked(word)
+	}
+}
+```
+
+### Accessing additional parameters in your DoFn
+
+In addition to the element, Beam will populate other parameters to your DoFn’s `ProcessElement` method. Any combination of these parameters can be added to your process method in a standard order.
+
+**context.Context**: To support consolidated logging and user defined metrics, a `context.Context` parameter can be requested. Per Go conventions, if present it’s required to be the first parameter of the `DoFn` method.
+
+```
+func MyDoFn(ctx context.Context, word string) string { ... }
+```
+
+**Timestamp**: To access the timestamp of an input element, add a `beam.EventTime` parameter before the element. For example:
+
+```
+func MyDoFn(ts beam.EventTime, word string) string { ... }
+```
+
+**Window**: To access the window an input element falls into, add a `beam.Window` parameter before the element. If an element falls in multiple windows (for example, this will happen when using `SlidingWindows`), then the `ProcessElement` method will be invoked multiple time for the element, once for each window. Since `beam.Window` is an interface it’s possible to type assert to the concrete implementation of the window. For example, when fixed windows are being used, the window is of type `window.IntervalWindow`.
+
+```
+func MyDoFn(w beam.Window, word string) string {
+  iw := w.(window.IntervalWindow)
+  ...
+}
+```
+
+**PaneInfo**: When triggers are used, Beam provides `beam.PaneInfo` object that contains information about the current firing. Using `beam.PaneInfo` you can determine whether this is an early or a late firing, and how many times this window has already fired for this key.
+
+```
+func extractWordsFn(pn beam.PaneInfo, line string, emitWords func(string)) {
+	if pn.Timing == typex.PaneEarly || pn.Timing == typex.PaneOnTime {
+		// ... perform operation ...
+	}
+	if pn.Timing == typex.PaneLate {
+		// ... perform operation ...
+	}
+	if pn.IsFirst {
+		// ... perform operation ...
+	}
+	if pn.IsLast {
+		// ... perform operation ...
+	}
+
+	words := strings.Split(line, " ")
+	for _, w := range words {
+		emitWords(w)
+	}
+}
+```
+{{end}}
+{{if (eq .Sdk "java")}}
+While `ParDo` always outputs the main output of `PCollection` (as a return value from apply), you can also force your `ParDo` to output any number of additional `PCollection` outputs. If you decide to have multiple outputs, your `ParDo` will return all the `PCollection` output (including the main output) combined. This will be useful when you are working with big data or a database that needs to be divided into different collections. You get a combined `PCollectionTuple`, you can use `TupleTag` to get a `PCollection`.
+
+A `PCollectionTuple` is an immutable tuple of heterogeneously typed `PCollection`, "with keys" `TupleTags`. A `PCollectionTuple` can be used as input or output for `PTransform` receiving or creating multiple `PCollection` inputs or outputs, which can be of different types, for example, `ParDo` with multiple outputs.
+
+A `TupleTag` is a typed tag used as the key of a heterogeneously typed tuple, for example `PCollectionTuple`. Its general type parameter allows you to track the static type of things stored in tuples.
+
+### Tags for multiple outputs
+
+```
+  .of(new DoFn<String, String>() {

Review Comment:
   Hey @nausharipov could you please check if this markdown issue is caused by some flutter issue or we can fix this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org