You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/01/23 09:36:26 UTC
[jira] [Commented] (FLINK-5529) Improve / extends windowing
documentation
[ https://issues.apache.org/jira/browse/FLINK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834133#comment-15834133 ]
ASF GitHub Bot commented on FLINK-5529:
---------------------------------------
GitHub user kl0u opened a pull request:
https://github.com/apache/flink/pull/3191
[FLINK-5529] [FLINK-4752] [docs] Improve / extends windowing documentation
This PR is for both the issues in the title.
It refactors/improves/extends the documentation of the windowing logic in Flink 1.2.
R @aljoscha
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kl0u/flink window-docs
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3191.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3191
----
commit 7ee13d40983831c4ea5db0459ef691c8b974bc6e
Author: kl0u <kk...@gmail.com>
Date: 2017-01-17T15:51:09Z
[FLINK-5529] [docs] Improve / extends windowing documentation
commit 6e840fdc2d0e92715488a14e48031c44206254c9
Author: Fabian Hueske <fh...@apache.org>
Date: 2017-01-18T17:57:23Z
[FLINK-4752] [docs] Improve window assigner documentation.
----
> Improve / extends windowing documentation
> -----------------------------------------
>
> Key: FLINK-5529
> URL: https://issues.apache.org/jira/browse/FLINK-5529
> Project: Flink
> Issue Type: Sub-task
> Components: Documentation
> Reporter: Stephan Ewen
> Assignee: Kostas Kloudas
> Fix For: 1.2.0, 1.3.0
>
>
> Suggested Outline:
> {code}
> Windows
> (0) Outline: The anatomy of a window operation
> stream
> [.keyBy(...)] <- keyed versus non-keyed windows
> .window(...) <- required: "assigner"
> [.trigger(...)] <- optional: "trigger" (else default trigger)
> [.evictor(...)] <- optional: "evictor" (else no evictor)
> [.allowedLateness()] <- optional, else zero
> .reduce/fold/apply() <- required: "function"
> (1) Types of windows
> - tumble
> - slide
> - session
> - global
> (2) Pre-defined windows
> timeWindow() (tumble, slide)
> countWindow() (tumble, slide)
> - mention that count windows are inherently
> resource leaky unless limited key space
> (3) Window Functions
> - apply: most basic, iterates over elements in window
>
> - aggregating: reduce and fold, can be used with "apply()" which will get one element
>
> - forward reference to state size section
> (4) Advanced Windows
> - assigner
> - simple
> - merging
> - trigger
> - registering timers (processing time, event time)
> - state in triggers
> - life cycle of a window
> - create
> - state
> - cleanup
> - when is window contents purged
> - when is state dropped
> - when is metadata (like merging set) dropped
> (5) Late data
> - picture
> - fire vs fire_and_purge: late accumulates vs late resurrects (cf discarding mode)
>
> (6) Evictors
> - TDB
>
> (7) State size: How large will the state be?
> Basic rule: Each element has one copy per window it is assigned to
> --> num windows * num elements in window
> --> example: tumbline is one copy, sliding(n,m) is n/m copies
> --> per key
> Pre-aggregation:
> - if reduce or fold is set -> one element per window (rather than num elements in window)
> - evictor voids pre-aggregation from the perspective of state
> Special rules:
> - fold cannot pre-aggregate on session windows (and other merging windows)
> (8) Non-keyed windows
> - all elements through the same windows
> - currently not parallel
> - possible parallel in the future when having pre-aggregation functions
> - inherently (by definition) produce a result stream with parallelism one
> - state similar to one key of keyed windows
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)