Posted to commits@spark.apache.org by td...@apache.org on 2015/07/10 04:55:01 UTC
spark git commit: [DOCS] Added important updateStateByKey details
Repository: spark
Updated Branches:
refs/heads/master 1903641e6 -> d538919cc
[DOCS] Added important updateStateByKey details
The update function runs for *all* existing keys, and returning "None" removes the key-value pair.
Author: Michael Vogiatzis <mi...@gmail.com>
Closes #7229 from mvogiatzis/patch-1 and squashes the following commits:
e7a2946 [Michael Vogiatzis] Updated updateStateByKey text
00283ed [Michael Vogiatzis] Removed space
c2656f9 [Michael Vogiatzis] Moved description farther up
0a42551 [Michael Vogiatzis] Added important updateStateByKey details
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d538919c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d538919c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d538919c
Branch: refs/heads/master
Commit: d538919cc4fd3ab940d478c62dce1bae0270cfeb
Parents: 1903641
Author: Michael Vogiatzis <mi...@gmail.com>
Authored: Thu Jul 9 19:53:23 2015 -0700
Committer: Tathagata Das <ta...@gmail.com>
Committed: Thu Jul 9 19:54:21 2015 -0700
----------------------------------------------------------------------
docs/streaming-programming-guide.md | 2 ++
1 file changed, 2 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/d538919c/docs/streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index e72d558..2f3013b 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -854,6 +854,8 @@ it with new information. To use this, you will have to do two steps.
1. Define the state update function - Specify with a function how to update the state using the
previous state and the new values from an input stream.
+In every batch, Spark will apply the state update function for all existing keys, regardless of whether they have new data in a batch or not. If the update function returns `None` then the key-value pair will be eliminated.
+
Let's illustrate this with an example. Say you want to maintain a running count of each word
seen in a text data stream. Here, the running count is the state and it is an integer. We
define the update function as:
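
The hunk ends before the guide's own code. As a rough illustration of the behavior the added sentence describes, here is a minimal plain-Python sketch that does not use Spark at all. The names `apply_batch`, `running_count`, and `count_or_expire` are hypothetical and exist only for this example; the sketch assumes the update function receives the list of new values for a key plus the previous state (or `None` when the key has no state yet), as in the Python variant of `updateStateByKey`.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple


def apply_batch(
    state: Dict[Hashable, int],
    batch: Iterable[Tuple[Hashable, int]],
    update_func: Callable[[List[int], Optional[int]], Optional[int]],
) -> Dict[Hashable, int]:
    """Model one batch: call update_func for every existing key and every new key."""
    # Group this batch's values by key.
    grouped: Dict[Hashable, List[int]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state: Dict[Hashable, int] = {}
    # Existing keys are visited even when the batch holds no data for them.
    for key in set(state) | set(grouped):
        result = update_func(grouped.get(key, []), state.get(key))
        # A None result drops the key-value pair from the state.
        if result is not None:
            new_state[key] = result
    return new_state


def running_count(new_values: List[int], previous: Optional[int]) -> Optional[int]:
    # Running word count: add this batch's values to the previous count.
    return sum(new_values) + (previous or 0)


def count_or_expire(new_values: List[int], previous: Optional[int]) -> Optional[int]:
    # Hypothetical variant: expire any key that received no data in this batch.
    if not new_values:
        return None
    return sum(new_values) + (previous or 0)


state = apply_batch({}, [("a", 1), ("b", 1), ("a", 1)], running_count)
# state is now {"a": 2, "b": 1}
kept = apply_batch(state, [("a", 1)], running_count)
# "b" had no new data but was still updated and kept: {"a": 3, "b": 1}
expired = apply_batch(state, [("a", 1)], count_or_expire)
# "b" returned None and was removed: {"a": 3}
```

In this model, a key with no new data still reaches the update function with an empty list, so the function alone decides whether the key survives the batch.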