Posted to commits@spark.apache.org by td...@apache.org on 2015/07/10 04:55:01 UTC
spark git commit: [DOCS] Added important updateStateByKey details

Repository: spark
Updated Branches:
  refs/heads/master 1903641e6 -> d538919cc


[DOCS] Added important updateStateByKey details

The update function runs for *all* existing keys, and returning "None" removes the key-value pair.

Author: Michael Vogiatzis <mi...@gmail.com>

Closes #7229 from mvogiatzis/patch-1 and squashes the following commits:

e7a2946 [Michael Vogiatzis] Updated updateStateByKey text
00283ed [Michael Vogiatzis] Removed space
c2656f9 [Michael Vogiatzis] Moved description farther up
0a42551 [Michael Vogiatzis] Added important updateStateByKey details


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d538919c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d538919c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d538919c

Branch: refs/heads/master
Commit: d538919cc4fd3ab940d478c62dce1bae0270cfeb
Parents: 1903641
Author: Michael Vogiatzis <mi...@gmail.com>
Authored: Thu Jul 9 19:53:23 2015 -0700
Committer: Tathagata Das <ta...@gmail.com>
Committed: Thu Jul 9 19:54:21 2015 -0700

----------------------------------------------------------------------
 docs/streaming-programming-guide.md | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/d538919c/docs/streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index e72d558..2f3013b 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -854,6 +854,8 @@ it with new information. To use this, you will have to do two steps.
 1. Define the state update function - Specify with a function how to update the state using the
 previous state and the new values from an input stream.
 
+In every batch, Spark will apply the state update function for all existing keys, regardless of whether they have new data in a batch or not. If the update function returns `None` then the key-value pair will be eliminated.
+
 Let's illustrate this with an example. Say you want to maintain a running count of each word
 seen in a text data stream. Here, the running count is the state and it is an integer. We
 define the update function as:

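The behavior this patch documents can be illustrated with a small simulation in plain Python. This is only a sketch of the semantics described in the added paragraph, not Spark code: `apply_batch` is a hypothetical helper that stands in for what happens on each batch, and `update_count` is an illustrative update function that takes the new values and the previous state, in that order. It assumes the caller wants to drop any key that received no new data.

```python
from typing import Callable, Dict, List, Optional, Tuple


def update_count(new_values: List[int], running_count: Optional[int]) -> Optional[int]:
    """Illustrative update function: (new values, previous state) -> new state."""
    if not new_values:
        # Called even though this key has no new data in the batch.
        # Returning None asks for the key-value pair to be removed.
        return None
    return sum(new_values) + (running_count or 0)


def apply_batch(
    state: Dict[str, int],
    batch: List[Tuple[str, int]],
    update_func: Callable[[List[int], Optional[int]], Optional[int]],
) -> Dict[str, int]:
    """Hypothetical stand-in for one batch: run update_func for every
    existing key and every key in the batch, dropping keys that yield None."""
    grouped: Dict[str, List[int]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state: Dict[str, int] = {}
    for key in sorted(set(state) | set(grouped)):
        result = update_func(grouped.get(key, []), state.get(key))
        if result is not None:
            new_state[key] = result
    return new_state


# Batch 1: both keys are new.
state_after_batch1 = apply_batch({}, [("a", 1), ("b", 1), ("a", 1)], update_count)
print(state_after_batch1)  # {'a': 2, 'b': 1}

# Batch 2: only "a" has new data, but update_count still runs for "b",
# receives an empty list, returns None, and "b" is removed.
state_after_batch2 = apply_batch(state_after_batch1, [("a", 1)], update_count)
print(state_after_batch2)  # {'a': 3}
```

An update function that should keep idle keys would instead return the previous state unchanged when `new_values` is empty.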
