Posted to github@beam.apache.org by "sirenbyte (via GitHub)" <gi...@apache.org> on 2023/02/01 14:48:03 UTC

[GitHub] [beam] sirenbyte opened a new pull request, #25257: add new format for triggers

sirenbyte opened a new pull request, #25257:
URL: https://github.com/apache/beam/pull/25257



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on pull request #25257: [Tour of Beam] Learning content for "Triggers" module

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on PR #25257:
URL: https://github.com/apache/beam/pull/25257#issuecomment-1536401843

   LGTM, thanks




[GitHub] [beam] alxp1982 commented on a diff in pull request #25257: [Tour of Beam] Learning content for "Triggers" module

Posted by "alxp1982 (via GitHub)" <gi...@apache.org>.
alxp1982 commented on code in PR #25257:
URL: https://github.com/apache/beam/pull/25257#discussion_r1107859379


##########
learning/tour-of-beam/learning-content/triggers/concept/description.md:
##########
@@ -0,0 +1,147 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Trigger

Review Comment:
   ### Triggers



##########
learning/tour-of-beam/learning-content/triggers/concept/description.md:
##########
@@ -0,0 +1,147 @@
+
+### Trigger
+
+When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window (referred to as a pane). If you use Beam’s default windowing configuration and default trigger, Beam outputs the aggregated result when it estimates all data has arrived, and discards all subsequent data for that window.

Review Comment:
   When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window (referred to as a pane). If you use Beam’s default windowing configuration and default trigger, Beam outputs the aggregated result when it estimates all data has arrived and discards all subsequent data for that window.



##########
learning/tour-of-beam/learning-content/triggers/concept/description.md:
##########
@@ -0,0 +1,147 @@
+
+### Trigger
+
+When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window (referred to as a pane). If you use Beam’s default windowing configuration and default trigger, Beam outputs the aggregated result when it estimates all data has arrived, and discards all subsequent data for that window.
+
+You can set triggers for your PCollections to change this default behavior. Beam provides a number of pre-built triggers that you can set:

Review Comment:
   You can set triggers for your PCollections to change this default behavior. Beam provides several pre-built triggers that you can set:



##########
learning/tour-of-beam/learning-content/triggers/concept/description.md:
##########
@@ -0,0 +1,147 @@
+
+### Trigger
+
+When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window (referred to as a pane). If you use Beam’s default windowing configuration and default trigger, Beam outputs the aggregated result when it estimates all data has arrived, and discards all subsequent data for that window.
+
+You can set triggers for your PCollections to change this default behavior. Beam provides a number of pre-built triggers that you can set:
+
+Event time triggers. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam’s default trigger is event time-based.

Review Comment:
   - ***Event time triggers***. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam’s default trigger is event time-based.
   - ***Processing time triggers***. These triggers operate on the processing time – the time when the data element is processed at any given stage in the pipeline.
   - ***Data-driven triggers***. These triggers operate by examining the data as it arrives in each window and firing when that data meets a certain property. Currently, data-driven triggers only support firing after a certain number of data elements.
   - ***Composite triggers***. These triggers combine multiple triggers in various ways.



##########
learning/tour-of-beam/learning-content/triggers/concept/description.md:
##########
@@ -0,0 +1,147 @@
+
+### Trigger
+
+When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window (referred to as a pane). If you use Beam’s default windowing configuration and default trigger, Beam outputs the aggregated result when it estimates all data has arrived, and discards all subsequent data for that window.
+
+You can set triggers for your PCollections to change this default behavior. Beam provides a number of pre-built triggers that you can set:
+
+Event time triggers. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam’s default trigger is event time-based.
+Processing time triggers. These triggers operate on the processing time – the time when the data element is processed at any given stage in the pipeline.
+Data-driven triggers. These triggers operate by examining the data as it arrives in each window, and firing when that data meets a certain property. Currently, data-driven triggers only support firing after a certain number of data elements.
+Composite triggers. These triggers combine multiple triggers in various ways.
+
+### Handling late data
+
+If you want your pipeline to process data that arrives after the watermark passes the end of the window, you can apply an allowed lateness when you set your windowing configuration. This gives your trigger the opportunity to react to the late data. If allowed lateness is set, the default trigger will emit new results immediately whenever late data arrives.
+
+You set the allowed lateness by using `.withAllowedLateness()` when you set your windowing function:
+
+{{if (eq .Sdk "java")}}
+```
+PCollection<String> pc = ...;
+pc.apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
+                              .triggering(AfterProcessingTime.pastFirstElementInPane()
+                                                             .plusDelayOf(Duration.standardMinutes(1)))
+                              .withAllowedLateness(Duration.standardMinutes(30)));
+```
+{{end}}
+{{if (eq .Sdk "go")}}
+```
+allowedToBeLateItems := beam.WindowInto(s,
+	window.NewFixedWindows(1*time.Minute), pcollection,
+	beam.Trigger(trigger.AfterProcessingTime().
+		PlusDelay(1*time.Minute)),
+	beam.AllowedLateness(30*time.Minute),
+)
+```
+{{end}}
+{{if (eq .Sdk "python")}}
+```
+pc = [Initial PCollection]
+pc | beam.WindowInto(
+            FixedWindows(60),
+            trigger=AfterProcessingTime(60),
+            allowed_lateness=1800) # 30 minutes
+     | ...
+```
+{{end}}
+
+
+* AfterAll - an example of a composite trigger. It fires when all sub-triggers defined through the of(List<Trigger> triggers) method are ready.
+* AfterEach - the sub-triggers are defined in the inOrder(List<Trigger> triggers) method. The sub-triggers are executed in order, one by one.
+* AfterFirst - fires when at least one of the defined sub-triggers fires. Like AfterAll, AfterFirst defines its sub-triggers in the of(...) method.
+* AfterPane - an example of a data-driven trigger. It uses the elementCountAtLeast(int countElems) method to define the minimum number of accumulated items before the trigger fires. Note that even if this threshold is never reached, the trigger can fire with a lower number of elements.
+* AfterProcessingTime - as the name indicates, an example of a processing time-based trigger. It defines two methods to control trigger firing. The first, plusDelayOf(final Duration delay), defines the interval of time during which new elements are accumulated. The second, alignedTo(final Duration period, final Instant offset), does the same, but in addition it adds the specified period to the defined offset. Thus, if we define the offset at 2017-01-01 10:00 and allow a period of 4 minutes, it will accept the data between 10:00 and 10:04.
+* AfterWatermark - its pastEndOfWindow() method creates a trigger that fires the pane after the end of the window. It also offers finer-grained control because it allows the definition of early results (withEarlyFirings(OnceTrigger earlyFirings), produced before the end of the window) and late results (withLateFirings(OnceTrigger lateFirings), produced after the end of the window and before the end of watermark).
+* DefaultTrigger - the class used by default; it is equivalent to repeated execution of the AfterWatermark trigger.
+* NeverTrigger - the pane is fired only after the end of the window plus the allowed lateness delay.
+* OrFinallyTrigger - a special kind of trigger constructed through Trigger's orFinally(OnceTrigger until) method. With a call to this method, the main trigger executes until the until trigger fires.
+* Repeatedly - executes a given trigger repeatedly. The sub-trigger is defined in the forever(Trigger repeated) method. That said, even if we define AfterPane.elementCountAtLeast(2) as a repeatable sub-trigger, it won't stop after the first 2 elements in the pane but will continue to fire for every 2 new items.
+
+### Window accumulation
+
+When you specify a trigger, you must also set the window’s accumulation mode. When a trigger fires, it emits the current contents of the window as a pane. Since a trigger can fire multiple times, the accumulation mode determines whether the system accumulates the window panes as the trigger fires, or discards them.
+
+To set a window to accumulate the panes that are produced when the trigger fires, invoke .accumulatingFiredPanes() when you set the trigger. To set a window to discard fired panes, invoke .discardingFiredPanes().
+
+Let’s look at an example that uses a PCollection with fixed-time windowing and a data-based trigger. This is something you might do if, for example, each window represented a ten-minute running average, but you wanted to display the current value of the average in a UI more frequently than every ten minutes. We’ll assume the following conditions:
+
+→ The PCollection uses 10-minute fixed-time windows.
+→ The PCollection has a repeating trigger that fires every time 3 elements arrive.
+
+The following diagram shows data events for key X as they arrive in the PCollection and are assigned to windows. To keep the diagram a bit simpler, we’ll assume that the events all arrive in the pipeline in order.

Review Comment:
   We can't put diagrams, so remove this section or come up with a description instead
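
One possible text-only stand-in for the diagram, offered as a minimal sketch rather than a definitive replacement: the plain-Python simulation below does not use the Beam API. The helper `fire_panes` and the element values are hypothetical. It only illustrates what a repeating "every 3 elements" trigger would emit for a single window under each accumulation mode, assuming the events arrive in order.

```python
from typing import Iterable, List


def fire_panes(values: Iterable[int], every: int, accumulating: bool) -> List[List[int]]:
    """Simulate a repeating element-count trigger on one window (not Beam code)."""
    panes = []
    buffered = []  # elements currently held by the window
    pending = 0    # elements seen since the last firing
    for value in values:
        buffered.append(value)
        pending += 1
        if pending == every:
            # The trigger fires: emit the current window contents as a pane.
            panes.append(list(buffered))
            pending = 0
            if not accumulating:
                # Discarding mode drops the elements that were just emitted.
                buffered.clear()
    return panes


# Hypothetical values for key X inside one 10-minute window.
window_values = [5, 8, 3, 15, 19, 23, 9, 13, 10]

accumulating_panes = fire_panes(window_values, every=3, accumulating=True)
discarding_panes = fire_panes(window_values, every=3, accumulating=False)

print(accumulating_panes)  # [[5, 8, 3], [5, 8, 3, 15, 19, 23], [5, 8, 3, 15, 19, 23, 9, 13, 10]]
print(discarding_panes)    # [[5, 8, 3], [15, 19, 23], [9, 13, 10]]
```

In accumulating mode each pane repeats everything seen so far, while in discarding mode each pane carries only the three newest elements. The unit could state that difference in a short table or two bullet lists.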



##########
learning/tour-of-beam/learning-content/triggers/data-driven-trigger/description.md:
##########
@@ -0,0 +1,21 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Data driven triggers
+
+A data-driven trigger is a type of trigger in Apache Beam that fires when specific conditions on the data being processed are met. Unlike processing time triggers, which fire at regular intervals based on the current system time, data-driven triggers are based on the data itself.
+
+For example, a data-driven trigger might fire when a certain number of elements have been processed, or when the values of the elements reach a specific threshold. Data-driven triggers are particularly useful when processing time-sensitive data where it's important to respond to changes in the data as they happen.

Review Comment:
   For example, a data-driven trigger might fire when a certain number of elements have been processed or when the values of the elements reach a specific threshold. Data-driven triggers are particularly useful when processing time-sensitive data, where it's important to respond to changes in the data as they happen.	



##########
learning/tour-of-beam/learning-content/triggers/motivating-challenge/description.md:
##########
@@ -0,0 +1,25 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Common Transforms motivating challenge
+
+You are provided with a "PCollection" from the array of the taxi order price in a csv file. Your task is to set the `trigger` to 10 elements. And another `trigger` for every minute.

Review Comment:
   Can we come up with something to cover several trigger types? 
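
As one possible shape for such an exercise, here is a rough sketch in plain Python (not Beam code) of the behaviour a composite trigger could target: fire when 10 elements are buffered or when one minute has passed since the first buffered element, whichever comes first. The function `composite_firings`, the sample prices, and the arrival times are all hypothetical. It assumes discarding panes and, as a simplification, only checks the time condition when the next element arrives.

```python
from typing import List, Tuple


def composite_firings(
    arrivals: List[Tuple[float, float]], max_count: int, max_delay: float
) -> List[List[float]]:
    """Emit a pane on max_count elements or max_delay seconds, whichever is first."""
    panes = []
    pane = []
    first_seen = 0.0  # arrival time of the first element in the current pane
    for arrival_time, price in arrivals:
        # Time condition: the delay ran out before this element arrived.
        if pane and arrival_time - first_seen >= max_delay:
            panes.append(pane)
            pane = []
        if not pane:
            first_seen = arrival_time
        pane.append(price)
        # Count condition: enough elements are buffered.
        if len(pane) >= max_count:
            panes.append(pane)
            pane = []
    if pane:
        panes.append(pane)  # flush the remainder when the input ends
    return panes


# Hypothetical (arrival time in seconds, taxi order price) pairs.
orders = [(0, 4.0), (10, 7.5), (75, 3.0), (80, 12.0), (85, 6.5), (90, 9.0)]

# A small count of 3 keeps the example short; the exercise itself would use 10.
result = composite_firings(orders, max_count=3, max_delay=60)
print(result)  # [[4.0, 7.5], [3.0, 12.0, 6.5], [9.0]]
```

The first pane fires on the time condition, the second on the count condition, and the last is the flush at end of input, so a single exercise along these lines would touch data-driven, processing-time, and composite triggers.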



##########
learning/tour-of-beam/learning-content/triggers/motivating-challenge/description.md:
##########
@@ -0,0 +1,25 @@
+
+### Common Transforms motivating challenge

Review Comment:
   It isn't a Common Transforms module, but a Triggers module. Also please replace 'challenge' with 'exercise' 



##########
learning/tour-of-beam/learning-content/triggers/processing-trigger/description.md:
##########
@@ -0,0 +1,25 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Processing time trigger
+
+Processing time trigger is a trigger in Apache Beam that fires based on the current processing time of the pipeline. Unlike the event time trigger that fires based on the timestamps of the elements, the processing time trigger fires based on the actual time that the elements are processed by the pipeline.
+
+Processing time triggers are used in windowing operations to specify when the window should be closed and its elements should be emitted. The processing time trigger can be set to fire after a fixed interval, after a set of elements have been processed, or when a processing time timer fires.
+
+The following accumulation modes are available with processing time triggers:
+
+`Discarding`: after the trigger fires, the elements in the fired pane are discarded, so each later pane contains only the elements that arrived since the previous firing.
+
+`Accumulating`: fired elements are retained, so each later pane contains every element that has arrived in the window so far.

Review Comment:
   Please add example description and challenge



##########
learning/tour-of-beam/learning-content/triggers/concept/description.md:
##########
@@ -0,0 +1,147 @@
+
+### Trigger
+
+When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window (referred to as a pane). If you use Beam’s default windowing configuration and default trigger, Beam outputs the aggregated result when it estimates all data has arrived, and discards all subsequent data for that window.
+
+You can set triggers for your PCollections to change this default behavior. Beam provides a number of pre-built triggers that you can set:
+
+Event time triggers. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam’s default trigger is event time-based.
+Processing time triggers. These triggers operate on the processing time – the time when the data element is processed at any given stage in the pipeline.
+Data-driven triggers. These triggers operate by examining the data as it arrives in each window, and firing when that data meets a certain property. Currently, data-driven triggers only support firing after a certain number of data elements.
+Composite triggers. These triggers combine multiple triggers in various ways.
+
+### Handling late data
+
+If you want your pipeline to process data that arrives after the watermark passes the end of the window, you can apply an allowed lateness when you set your windowing configuration. This gives your trigger the opportunity to react to the late data. If allowed lateness is set, the default trigger will emit new results immediately whenever late data arrives.
+
+You set the allowed lateness by using `.withAllowedLateness()` when you set your windowing function:
+
+{{if (eq .Sdk "java")}}
+```
+PCollection<String> pc = ...;
+pc.apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
+                              .triggering(AfterProcessingTime.pastFirstElementInPane()
+                                                             .plusDelayOf(Duration.standardMinutes(1)))
+                              .withAllowedLateness(Duration.standardMinutes(30)));
+```
+{{end}}
+{{if (eq .Sdk "go")}}
+```
+allowedToBeLateItems := beam.WindowInto(s,
+	window.NewFixedWindows(1*time.Minute), pcollection,
+	beam.Trigger(trigger.AfterProcessingTime().
+		PlusDelay(1*time.Minute)),
+	beam.AllowedLateness(30*time.Minute),
+)
+```
+{{end}}
+{{if (eq .Sdk "python")}}
+```
+pc = [Initial PCollection]
+pc | beam.WindowInto(
+            FixedWindows(60),
+            trigger=AfterProcessingTime(60),
+            allowed_lateness=1800) # 30 minutes
+     | ...
+```
+{{end}}
+
+

Review Comment:
   What is this list below? Built-in triggers? If so, let's say - Beam SDK provides various built-in triggers: 



##########
learning/tour-of-beam/learning-content/triggers/concept/description.md:
##########
@@ -0,0 +1,147 @@
+
+### Trigger
+
+When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window (referred to as a pane). If you use Beam’s default windowing configuration and default trigger, Beam outputs the aggregated result when it estimates all data has arrived, and discards all subsequent data for that window.
+
+You can set triggers for your PCollections to change this default behavior. Beam provides a number of pre-built triggers that you can set:
+
+Event time triggers. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam’s default trigger is event time-based.
+Processing time triggers. These triggers operate on the processing time – the time when the data element is processed at any given stage in the pipeline.
+Data-driven triggers. These triggers operate by examining the data as it arrives in each window, and firing when that data meets a certain property. Currently, data-driven triggers only support firing after a certain number of data elements.
+Composite triggers. These triggers combine multiple triggers in various ways.
+
+### Handling late data
+
+If you want your pipeline to process data that arrives after the watermark passes the end of the window, you can apply an allowed lateness when you set your windowing configuration. This gives your trigger the opportunity to react to the late data. If allowed lateness is set, the default trigger will emit new results immediately whenever late data arrives.
+
+You set the allowed lateness by using `.withAllowedLateness()` when you set your windowing function:
+
+{{if (eq .Sdk "java")}}
+```
+PCollection<String> pc = ...;
+pc.apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
+                              .triggering(AfterProcessingTime.pastFirstElementInPane()
+                                                             .plusDelayOf(Duration.standardMinutes(1)))
+                              .withAllowedLateness(Duration.standardMinutes(30)));
+```
+{{end}}
+{{if (eq .Sdk "go")}}
+```
+allowedToBeLateItems := beam.WindowInto(s,
+	window.NewFixedWindows(1*time.Minute), pcollection,
+	beam.Trigger(trigger.AfterProcessingTime().
+		PlusDelay(1*time.Minute)),
+	beam.AllowedLateness(30*time.Minute),
+)
+```
+{{end}}
+{{if (eq .Sdk "python")}}
+```
+pc = [Initial PCollection]
+pc | beam.WindowInto(
+            FixedWindows(60),
+            trigger=AfterProcessingTime(60),
+            allowed_lateness=1800) # 30 minutes
+     | ...
+```
+{{end}}
+
+
+* AfterAll - an example of a composite trigger. It fires when all sub-triggers defined through the of(List<Trigger> triggers) method are ready.
+* AfterEach - the sub-triggers are defined in the inOrder(List<Trigger> triggers) method. The sub-triggers are executed in order, one by one.
+* AfterFirst - fires when at least one of the defined sub-triggers fires. Like AfterAll, AfterFirst defines its sub-triggers in the of(...) method.
+* AfterPane - an example of a data-driven trigger. It uses the elementCountAtLeast(int countElems) method to define the minimum number of accumulated items before the trigger fires. Note that even if this threshold is never reached, the trigger can fire with a lower number of elements.
+* AfterProcessingTime - as the name indicates, an example of a processing time-based trigger. It defines two methods to control trigger firing. The first, plusDelayOf(final Duration delay), defines the interval of time during which new elements are accumulated. The second, alignedTo(final Duration period, final Instant offset), does the same, but in addition it adds the specified period to the defined offset. Thus, if we define the offset at 2017-01-01 10:00 and allow a period of 4 minutes, it will accept the data between 10:00 and 10:04.
+* AfterWatermark - its pastEndOfWindow() method creates a trigger that fires the pane after the end of the window. It also offers finer-grained control because it allows the definition of early results (withEarlyFirings(OnceTrigger earlyFirings), produced before the end of the window) and late results (withLateFirings(OnceTrigger lateFirings), produced after the end of the window and before the end of watermark).
+* DefaultTrigger - the class used by default; it is equivalent to repeated execution of the AfterWatermark trigger.
+* NeverTrigger - the pane is fired only after the end of the window plus the allowed lateness delay.
+* OrFinallyTrigger - a special kind of trigger constructed through Trigger's orFinally(OnceTrigger until) method. With a call to this method, the main trigger executes until the until trigger fires.
+* Repeatedly - executes a given trigger repeatedly. The sub-trigger is defined in the forever(Trigger repeated) method. That said, even if we define AfterPane.elementCountAtLeast(2) as a repeatable sub-trigger, it won't stop after the first 2 elements in the pane but will continue to fire for every 2 new items.
+
+### Window accumulation

Review Comment:
   Consider moving to a separate unit - this one becomes too big



##########
learning/tour-of-beam/learning-content/triggers/concept/description.md:
##########
@@ -0,0 +1,147 @@
+
+### Trigger
+
+When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window (referred to as a pane). If you use Beam’s default windowing configuration and default trigger, Beam outputs the aggregated result when it estimates all data has arrived, and discards all subsequent data for that window.
+
+You can set triggers for your PCollections to change this default behavior. Beam provides a number of pre-built triggers that you can set:
+
+* Event time triggers. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam’s default trigger is event time-based.
+* Processing time triggers. These triggers operate on the processing time, that is, the time when the data element is processed at any given stage in the pipeline.
+* Data-driven triggers. These triggers operate by examining the data as it arrives in each window, and firing when that data meets a certain property. Currently, data-driven triggers only support firing after a certain number of data elements.
+* Composite triggers. These triggers combine multiple triggers in various ways.
+
+### Handling late data
+
+If you want your pipeline to process data that arrives after the watermark passes the end of the window, you can apply an allowed lateness when you set your windowing configuration. This gives your trigger the opportunity to react to the late data. If allowed lateness is set, the default trigger will emit new results immediately whenever late data arrives.
+
+You set the allowed lateness by using `.withAllowedLateness()` when you set your windowing function:
+
+{{if (eq .Sdk "java")}}
+```
+PCollection<String> pc = ...;
+pc.apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
+                              .triggering(AfterProcessingTime.pastFirstElementInPane()
+                                                             .plusDelayOf(Duration.standardMinutes(1)))
+                              // A non-default trigger requires an explicit accumulation mode.
+                              .discardingFiredPanes()
+                              .withAllowedLateness(Duration.standardMinutes(30)));
+```
+{{end}}
+{{if (eq .Sdk "go")}}
+```
+allowedToBeLateItems := beam.WindowInto(s,
+	window.NewFixedWindows(1*time.Minute), pcollection,
+	beam.Trigger(trigger.AfterProcessingTime().
+		PlusDelay(1*time.Minute)),
+	beam.AllowedLateness(30*time.Minute),
+)
+```
+{{end}}
+{{if (eq .Sdk "python")}}
+```
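+import apache_beam as beam
+from apache_beam.transforms.trigger import AccumulationMode, AfterProcessingTime
+from apache_beam.transforms.window import FixedWindows
+
+pc = ...  # initial PCollection
+(pc | beam.WindowInto(
+            FixedWindows(60),  # 1-minute windows
+            trigger=AfterProcessingTime(60),
+            # A non-default trigger requires an explicit accumulation mode.
+            accumulation_mode=AccumulationMode.DISCARDING,
+            allowed_lateness=1800)  # 30 minutes
+    | ...)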
+```
+{{end}}
+
+
+* AfterAll - an example of a composite trigger. It fires when all sub-triggers defined through the of(List<Trigger> triggers) method are ready.
+* AfterEach - the sub-triggers are defined in the inOrder(List<Trigger> triggers) method. The sub-triggers are executed in order, one by one.
+* AfterFirst - fires when at least one of the defined sub-triggers fires. Like AfterAll, AfterFirst defines its sub-triggers in the of(...) method.
+* AfterPane - an example of a data-driven trigger. It uses the elementCountAtLeast(int countElems) method to define the minimal number of accumulated elements before the trigger fires. Note that even if this threshold is never reached, a pane can still be emitted with fewer elements, for example when the window closes.
+* AfterProcessingTime - as the name indicates, an example of a processing time-based trigger. It defines 2 methods to control trigger firing. The first one, plusDelayOf(final Duration delay), defines the interval of time during which new elements are accumulated. The second one, alignedTo(final Duration period, final Instant offset), aligns the firing time to the next multiple of the period counted from the offset. Thus, with an offset of 2017-01-01 10:00 and a period of 4 minutes, data arriving between 10:00 and 10:04 is accumulated and the trigger fires at 10:04.
+* AfterWatermark - its method pastEndOfWindow() creates a trigger that fires the pane once the watermark passes the end of the window. It also gives more fine-grained control because it allows the definition of early results (withEarlyFirings(OnceTrigger earlyFirings), produced before the end of the window) and late results (withLateFirings(OnceTrigger lateFirings), produced after the end of the window and before the allowed lateness expires).
+* DefaultTrigger - the trigger used by default. It is equivalent to repeatedly firing the AfterWatermark.pastEndOfWindow() trigger.
+* NeverTrigger - the pane is fired only once the window has ended and the allowed lateness has passed.
+* OrFinallyTrigger - a special kind of trigger constructed through Trigger's orFinally(OnceTrigger until) method. The main trigger executes until the until trigger fires, at which point the combined trigger fires and finishes.
+* Repeatedly - executes a given trigger repeatedly. The sub-trigger is defined in the forever(Trigger repeated) method. For example, if AfterPane.elementCountAtLeast(2) is the repeated sub-trigger, it doesn't stop after the first 2 elements in the pane but fires again for every 2 new elements.
+
+### Window accumulation
+
+When you specify a trigger, you must also set the window’s accumulation mode. When a trigger fires, it emits the current contents of the window as a pane. Since a trigger can fire multiple times, the accumulation mode determines whether the system accumulates the window panes as the trigger fires, or discards them.
+
+To set a window to accumulate the panes that are produced when the trigger fires, invoke `.accumulatingFiredPanes()` when you set the trigger. To set a window to discard fired panes, invoke `.discardingFiredPanes()`.
+
+Let’s look an example that uses a PCollection with fixed-time windowing and a data-based trigger. This is something you might do if, for example, each window represented a ten-minute running average, but you wanted to display the current value of the average in a UI more frequently than every ten minutes. We’ll assume the following conditions:

Review Comment:
   Let’s look at an example that uses a PCollection with fixed-time windowing and a data-based trigger. This is something you might do if, for example, each window represented a ten-minute running average, but you wanted to display the current value of the average in a UI more frequently than every ten minutes. We’ll assume the following conditions:
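
   The difference between the two accumulation modes in this scenario could be illustrated with a small simulation. This is only a sketch in plain Python, not Beam code: the function `fire_panes`, the sample values, and the firing count of 3 are all hypothetical and chosen for illustration.

```python
def fire_panes(elements, count, accumulating):
    """Simulate a trigger that fires every `count` elements within one window."""
    panes = []
    buffer = []          # elements currently held for the window
    new_since_fire = 0   # elements that arrived since the last firing
    for element in elements:
        buffer.append(element)
        new_since_fire += 1
        if new_since_fire == count:
            panes.append(list(buffer))  # emit the current contents as a pane
            new_since_fire = 0
            if not accumulating:
                buffer.clear()  # discarding mode drops what was already emitted
    return panes


values = [5, 8, 3, 15, 19, 23]  # hypothetical elements of a single window
discarding = fire_panes(values, 3, accumulating=False)
accumulating = fire_panes(values, 3, accumulating=True)

print(discarding)    # [[5, 8, 3], [15, 19, 23]]
print(accumulating)  # [[5, 8, 3], [5, 8, 3, 15, 19, 23]]
```

   For a running average shown in a UI, accumulating mode is the natural fit because each pane carries everything seen so far in the window; in discarding mode each pane holds only the new elements.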



##########
learning/tour-of-beam/learning-content/triggers/data-driven-trigger/description.md:
##########
@@ -0,0 +1,21 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Data driven triggers
+
+A data-driven trigger is a type of trigger in Apache Beam that fires when the data collected in a window meets a specific condition. Unlike processing time triggers, which fire based on the current system time, data-driven triggers are based on the data itself.
+
+For example, a data-driven trigger fires when a certain number of elements has arrived in the window. Data-driven triggers are particularly useful for time-sensitive data where it's important to respond to changes in the data as they happen.
+
+Currently, Beam's built-in data-driven triggers only support firing after a certain number of elements. By combining an element count trigger with processing time or event time triggers in a composite trigger, you can implement complex processing logic that takes into account both time and the state of the data.

Review Comment:
   Please add example description and challenge 
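
   One possible shape for such an example is sketched below. It is a plain-Python simulation of a repeated element-count trigger on a single window in discarding mode, not Beam code. The function name `element_count_panes` and the sample input are hypothetical, and the sketch assumes that elements left over below the threshold are emitted as a final pane when the window closes.

```python
def element_count_panes(elements, threshold):
    """Split one window's elements into panes of `threshold` elements each.

    Assumption: any remainder is emitted as a last, smaller pane when the
    window closes.
    """
    panes = []
    pending = []
    for element in elements:
        pending.append(element)
        if len(pending) >= threshold:
            panes.append(pending)  # the count condition is met: fire
            pending = []
    if pending:
        panes.append(pending)      # window closes: emit what is left
    return panes


panes = element_count_panes(["a", "b", "c", "d", "e"], threshold=2)
print(panes)  # [['a', 'b'], ['c', 'd'], ['e']]
```

   A challenge could then ask the learner to change the threshold and predict the resulting panes.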



##########
learning/tour-of-beam/learning-content/triggers/event-time-trigger/description.md:
##########
@@ -0,0 +1,25 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Event time triggers
+
+An event time trigger is a trigger in Apache Beam that fires based on the timestamps of the elements in a pipeline, as opposed to the current processing time. Event time triggers are used in windowing operations to specify when the results of a window should be emitted.
+
+For example, consider a pipeline that ingests timestamped elements from a streaming source. An event time trigger fires when the watermark, Beam's estimate of how far event time has progressed, passes a given point such as the end of the window. The results are therefore emitted according to when the events actually occurred, not when the pipeline happened to process them.
+
+As with any trigger, the following accumulation modes are available with event time triggers:
+
+`Discarding`: after the trigger fires, the elements already emitted are discarded, so each pane contains only the data that arrived since the previous firing.
+
+`Accumulating`: elements are kept after the trigger fires, so each pane contains all the data that has arrived in the window so far, including late data.

Review Comment:
   Please add example description and challenge
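
   As a starting point for such an example, the sketch below simulates in plain Python how an event time trigger reacts to the watermark. It is not Beam code: the 60-second window size, the event tuples, and the function names are hypothetical, and the sketch assumes zero allowed lateness, so elements arriving behind the watermark are dropped.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; hypothetical fixed-window size


def window_start(timestamp):
    """Return the start of the fixed window that contains `timestamp`."""
    return timestamp - timestamp % WINDOW_SIZE


def run(events):
    """Process events in arrival order.

    Each event is ("element", event_time, value) or ("watermark", time).
    Returns the fired panes and the late values that were dropped.
    """
    open_windows = defaultdict(list)
    watermark = float("-inf")
    fired, dropped = [], []
    for event in events:
        if event[0] == "watermark":
            watermark = max(watermark, event[1])
            # Fire every window whose end the watermark has passed.
            for start in sorted(open_windows):
                if start + WINDOW_SIZE <= watermark:
                    fired.append((start, open_windows.pop(start)))
        else:
            _, timestamp, value = event
            start = window_start(timestamp)
            if start + WINDOW_SIZE <= watermark:
                dropped.append(value)  # late: its window has already fired
            else:
                open_windows[start].append(value)
    return fired, dropped


events = [
    ("element", 10, "a"),
    ("element", 70, "b"),
    ("element", 50, "c"),   # out of order, but still on time
    ("watermark", 60),
    ("element", 20, "d"),   # arrives after its window has fired
    ("watermark", 120),
]
fired, dropped = run(events)
print(fired)    # [(0, ['a', 'c']), (60, ['b'])]
print(dropped)  # ['d']
```

   Element "c" arrives out of order yet lands in the first window because grouping follows its timestamp, while "d" is late because the watermark had already passed the end of its window.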



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm merged pull request #25257: [Tour of Beam] Learning content for "Triggers" module

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm merged PR #25257:
URL: https://github.com/apache/beam/pull/25257

