You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by al...@apache.org on 2019/09/20 20:36:46 UTC

[beam] branch master updated: Document creating and implementing custom windows

This is an automated email from the ASF dual-hosted git repository.

altay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new 49c3978  Document creating and implementing custom windows
     new b19c5b5  Merge pull request #9406 from soyrice/custom-window-pattern
49c3978 is described below

commit 49c39785b7ab14eb8699e3ebe9628838a0e6d1c5
Author: Cyrus Maden <cm...@google.com>
AuthorDate: Thu Aug 22 15:30:28 2019 -0400

    Document creating and implementing custom windows
    
    Note merging windows Python support
    
    Fix typos
    
    Update region tag
    
    Custom windows copy edit
    
    Fix merge conflicts
    
    Update pattern links
    
    Fix ToC
    
    Resolve comments
    
    Remove gapAttribute field and constructor
---
 .../apache/beam/examples/snippets/Snippets.java    |   2 +-
 .../src/_includes/section-menu/documentation.html  |   9 +-
 .../{custom-io-patterns.md => custom-io.md}        |   2 +-
 .../src/documentation/patterns/custom-windows.md   | 114 +++++++++++++++++++++
 ...e-processing-patterns.md => file-processing.md} |   2 +-
 website/src/documentation/patterns/overview.md     |  15 +--
 ...line-option-patterns.md => pipeline-options.md} |   2 +-
 .../{side-input-patterns.md => side-inputs.md}     |   2 +-
 .../src/images/standard-vs-dynamic-sessions.png    | Bin 0 -> 26026 bytes
 9 files changed, 133 insertions(+), 15 deletions(-)

diff --git a/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java b/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java
index af30620..e100166 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java
@@ -725,7 +725,7 @@ public class Snippets {
       return new DynamicSessions(gapDuration);
     }
 
-    // [START CustomSessionWindow4]
+    // [END CustomSessionWindow4]
 
     @Override
     public void mergeWindows(MergeContext c) throws Exception {}
diff --git a/website/src/_includes/section-menu/documentation.html b/website/src/_includes/section-menu/documentation.html
index a496ff3..0a56d8c 100644
--- a/website/src/_includes/section-menu/documentation.html
+++ b/website/src/_includes/section-menu/documentation.html
@@ -247,10 +247,11 @@
 
   <ul class="section-nav-list">
     <li><a href="{{ site.baseurl }}/documentation/patterns/overview/">Overview</a></li>
-    <li><a href="{{ site.baseurl }}/documentation/patterns/file-processing-patterns/">File processing patterns</a></li>
-    <li><a href="{{ site.baseurl }}/documentation/patterns/side-input-patterns/">Side input patterns</a></li>
-    <li><a href="{{ site.baseurl }}/documentation/patterns/pipeline-option-patterns/">Pipeline option patterns</a></li>
-    <li><a href="{{ site.baseurl }}/documentation/patterns/custom-io-patterns/">Custom I/O patterns</a></li>
+    <li><a href="{{ site.baseurl }}/documentation/patterns/file-processing/">File processing</a></li>
+    <li><a href="{{ site.baseurl }}/documentation/patterns/side-inputs/">Side inputs</a></li>
+    <li><a href="{{ site.baseurl }}/documentation/patterns/pipeline-options/">Pipeline options</a></li>
+    <li><a href="{{ site.baseurl }}/documentation/patterns/custom-io/">Custom I/O</a></li>
+    <li><a href="{{ site.baseurl }}/documentation/patterns/custom-windows/">Custom windows</a></li>
   </ul>
 </li>
 
diff --git a/website/src/documentation/patterns/custom-io-patterns.md b/website/src/documentation/patterns/custom-io.md
similarity index 97%
rename from website/src/documentation/patterns/custom-io-patterns.md
rename to website/src/documentation/patterns/custom-io.md
index 98825f8..d816010 100644
--- a/website/src/documentation/patterns/custom-io-patterns.md
+++ b/website/src/documentation/patterns/custom-io.md
@@ -2,7 +2,7 @@
 layout: section
 title: "Custom I/O patterns"
 section_menu: section-menu/documentation.html
-permalink: /documentation/patterns/custom-io-patterns/
+permalink: /documentation/patterns/custom-io/
 ---
 <!--
 Licensed under the Apache License, Version 2.0 (the "License");
diff --git a/website/src/documentation/patterns/custom-windows.md b/website/src/documentation/patterns/custom-windows.md
new file mode 100644
index 0000000..c3ef84a
--- /dev/null
+++ b/website/src/documentation/patterns/custom-windows.md
@@ -0,0 +1,114 @@
+---
+layout: section
+title: "Custom window patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/custom-windows/
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Custom window patterns
+The samples on this page demonstrate common custom window patterns. You can create custom windows with [`WindowFn` functions]({{ site.baseurl }}/documentation/programming-guide/#provided-windowing-functions). For more information, see the [programming guide section on windowing]({{ site.baseurl }}/documentation/programming-guide/#windowing).
+
+**Note**: Custom merging windows isn't supported in Python (with fnapi).
+
+## Using data to dynamically set session window gaps
+
+You can modify the [`assignWindows`](https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/windowing/SlidingWindows.html) function to use data-driven gaps, then window incoming data into sessions.
+
+Access the `assignWindows` function through `WindowFn.AssignContext.element()`. The original, fixed-duration `assignWindows` function is:
+
+```java
+{% github_sample /apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java tag:CustomSessionWindow1
+%}
+```
+
+### Creating data-driven gaps
+To create data-driven gaps, add the following snippets to the `assignWindows` function:
+- A default value for when the custom gap is not present in the data 
+- A way to set the attribute from the main pipeline as a method of the custom windows
+
+For example, the following function assigns each element to a window between the timestamp and `gapDuration`:
+
+```java
+{% github_sample /apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java tag:CustomSessionWindow3
+%}
+```
+
+Then, set the `gapDuration` field in a windowing function:
+
+```java
+{% github_sample /apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java tag:CustomSessionWindow2
+%}
+```
+
+### Windowing messages into sessions
+After creating data-driven gaps, you can window incoming data into the new, custom sessions.
+
+First, set the session length to the gap duration:
+
+```java
+{% github_sample /apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java tag:CustomSessionWindow4
+%}
+```
+
+Lastly, window data into sessions in your pipeline:
+
+```java
+{% github_sample /apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java tag:CustomSessionWindow6
+%}
+```
+
+### Example data and windows
+The following test data tallies two users' scores with and without the `gap` attribute:
+
+```
+.apply("Create data", Create.timestamped(
+            TimestampedValue.of("{\"user\":\"user-1\",\"score\":\"12\",\"gap\":\"5\"}", new Instant()),
+            TimestampedValue.of("{\"user\":\"user-2\",\"score\":\"4\"}", new Instant()),
+            TimestampedValue.of("{\"user\":\"user-1\",\"score\":\"-3\",\"gap\":\"5\"}", new Instant().plus(2000)),
+            TimestampedValue.of("{\"user\":\"user-1\",\"score\":\"2\",\"gap\":\"5\"}", new Instant().plus(9000)),
+            TimestampedValue.of("{\"user\":\"user-1\",\"score\":\"7\",\"gap\":\"5\"}", new Instant().plus(12000)),
+            TimestampedValue.of("{\"user\":\"user-2\",\"score\":\"10\"}", new Instant().plus(12000)))
+        .withCoder(StringUtf8Coder.of()))
+```
+
+The diagram below visualizes the test data:
+
+![Two sets of data and the standard and dynamic sessions with which the data is windowed.]( {{ "/images/standard-vs-dynamic-sessions.png" | prepend: site.baseurl }})
+
+#### Standard sessions
+
+Standard sessions use the following windows and scores:
+```
+user=user-2, score=4, window=[2019-05-26T13:28:49.122Z..2019-05-26T13:28:59.122Z)
+user=user-1, score=18, window=[2019-05-26T13:28:48.582Z..2019-05-26T13:29:12.774Z)
+user=user-2, score=10, window=[2019-05-26T13:29:03.367Z..2019-05-26T13:29:13.367Z)
+```
+
+User #1 sees two events separated by 12 seconds. With standard sessions, the gap defaults to 10 seconds; both scores are in different sessions, so the scores aren't added.
+
+User #2 sees four events, seperated by two, seven, and three seconds, respectively. Since none of the gaps are greater than the default, the four events are in the same standard session and added together (18 points).
+
+#### Dynamic sessions
+The dynamic sessions specify a five-second gap, so they use the following windows and scores:
+
+```
+user=user-2, score=4, window=[2019-05-26T14:30:22.969Z..2019-05-26T14:30:32.969Z)
+user=user-1, score=9, window=[2019-05-26T14:30:22.429Z..2019-05-26T14:30:30.553Z)
+user=user-1, score=9, window=[2019-05-26T14:30:33.276Z..2019-05-26T14:30:41.849Z)
+user=user-2, score=10, window=[2019-05-26T14:30:37.357Z..2019-05-26T14:30:47.357Z)
+```
+
+With dynamic sessions, User #2 gets different scores. The third messages arrives seven seconds after the second message, so it's grouped into a different session. The large, 18-point session is split into two 9-point sessions.
\ No newline at end of file
diff --git a/website/src/documentation/patterns/file-processing-patterns.md b/website/src/documentation/patterns/file-processing.md
similarity index 98%
rename from website/src/documentation/patterns/file-processing-patterns.md
rename to website/src/documentation/patterns/file-processing.md
index b579db8..592a58b 100644
--- a/website/src/documentation/patterns/file-processing-patterns.md
+++ b/website/src/documentation/patterns/file-processing.md
@@ -2,7 +2,7 @@
 layout: section
 title: "File processing patterns"
 section_menu: section-menu/documentation.html
-permalink: /documentation/patterns/file-processing-patterns/
+permalink: /documentation/patterns/file-processing/
 ---
 <!--
 Licensed under the Apache License, Version 2.0 (the "License");
diff --git a/website/src/documentation/patterns/overview.md b/website/src/documentation/patterns/overview.md
index d676b2e..8ecca0b 100644
--- a/website/src/documentation/patterns/overview.md
+++ b/website/src/documentation/patterns/overview.md
@@ -23,17 +23,20 @@ limitations under the License.
 Pipeline patterns demonstrate common Beam use cases. Pipeline patterns are based on real-world Beam deployments. Each pattern has a description, examples, and a solution or psuedocode.
 
 **File processing patterns** - Patterns for reading from and writing to files
-* [Processing files as they arrive]({{ site.baseurl }}/documentation/patterns/file-processing-patterns/#processing-files-as-they-arrive)
-* [Accessing filenames]({{ site.baseurl }}/documentation/patterns/file-processing-patterns/#accessing-filenames)
+* [Processing files as they arrive]({{ site.baseurl }}/documentation/patterns/file-processing/#processing-files-as-they-arrive)
+* [Accessing filenames]({{ site.baseurl }}/documentation/patterns/file-processing/#accessing-filenames)
 
 **Side input patterns** - Patterns for processing supplementary data
-* [Slowly updating global window side inputs]({{ site.baseurl }}/documentation/patterns/side-input-patterns/#slowly-updating-global-window-side-inputs)
+* [Slowly updating global window side inputs]({{ site.baseurl }}/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs)
 
 **Pipeline option patterns** - Patterns for configuring pipelines
-* [Retroactively logging runtime parameters]({{ site.baseurl }}/documentation/patterns/pipeline-option-patterns/#retroactively-logging-runtime-parameters)
+* [Retroactively logging runtime parameters]({{ site.baseurl }}/documentation/patterns/pipeline-options/#retroactively-logging-runtime-parameters)
 
-**Custom I/O patterns**
-* [Choosing between built-in and custom connectors]({{ site.baseurl }}/documentation/patterns/custom-io-patterns/#choosing-between-built-in-and-custom-connectors)
+**Custom I/O patterns** - Patterns for pipeline I/O
+* [Choosing between built-in and custom connectors]({{ site.baseurl }}/documentation/patterns/custom-io/#choosing-between-built-in-and-custom-connectors)
+
+**Custom window patterns** - Patterns for windowing functions
+* [Using data to dynamically set session window gaps]({{ site.baseurl }}/documentation/patterns/custom-windows/#using-data-to-dynamically-set-session-window-gaps)
 
 ## Contributing a pattern
 
diff --git a/website/src/documentation/patterns/pipeline-option-patterns.md b/website/src/documentation/patterns/pipeline-options.md
similarity index 96%
rename from website/src/documentation/patterns/pipeline-option-patterns.md
rename to website/src/documentation/patterns/pipeline-options.md
index 71d24f6..84d1bf5 100644
--- a/website/src/documentation/patterns/pipeline-option-patterns.md
+++ b/website/src/documentation/patterns/pipeline-options.md
@@ -2,7 +2,7 @@
 layout: section
 title: "Pipeline option patterns"
 section_menu: section-menu/documentation.html
-permalink: /documentation/patterns/pipeline-option-patterns/
+permalink: /documentation/patterns/pipeline-options/
 ---
 <!--
 Licensed under the Apache License, Version 2.0 (the "License");
diff --git a/website/src/documentation/patterns/side-input-patterns.md b/website/src/documentation/patterns/side-inputs.md
similarity index 97%
rename from website/src/documentation/patterns/side-input-patterns.md
rename to website/src/documentation/patterns/side-inputs.md
index dc58fd1..854c276 100644
--- a/website/src/documentation/patterns/side-input-patterns.md
+++ b/website/src/documentation/patterns/side-inputs.md
@@ -2,7 +2,7 @@
 layout: section
 title: "Side input patterns"
 section_menu: section-menu/documentation.html
-permalink: /documentation/patterns/side-input-patterns/
+permalink: /documentation/patterns/side-inputs/
 ---
 <!--
 Licensed under the Apache License, Version 2.0 (the "License");
diff --git a/website/src/images/standard-vs-dynamic-sessions.png b/website/src/images/standard-vs-dynamic-sessions.png
new file mode 100644
index 0000000..832a181
Binary files /dev/null and b/website/src/images/standard-vs-dynamic-sessions.png differ