Posted to user@spark.apache.org by Ricky Ho <ri...@yahoo.com> on 2014/01/23 21:32:31 UTC
Time window size in Spark Streaming
I find the time window operators a bit confusing. Can someone clarify whether
the following are equivalent?
Case 1
======
dstreamB = dstreamA.reduceByKeyAndWindow(func, 10, 2)
Is that the same as the following?
dsTmp = dstreamA.window(10, 2)
dstreamB = dsTmp.reduceByKey(func)
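To make Case 1 concrete, here is a minimal plain-Python sketch of the two pipelines. It does not use Spark at all: batches are modelled as lists of (key, value) pairs, window length and slide are counted in batches, and the helper names (reduce_by_key, windows, window_then_reduce, reduce_by_key_and_window) are made up for illustration. It assumes func is associative and commutative, and that reduceByKeyAndWindow pre-reduces each batch before combining across the window, which is my understanding of the version without an inverse function.

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, func):
    """Combine the values of each key with func (models reduceByKey on one batch)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def windows(batches, length, slide):
    """Yield the batches covered by each window; length and slide are in batches."""
    for end in range(slide, len(batches) + 1, slide):
        yield batches[max(0, end - length):end]


def window_then_reduce(batches, func, length, slide):
    # Models dstreamA.window(length, slide).reduceByKey(func)
    return [reduce_by_key([pair for batch in window for pair in batch], func)
            for window in windows(batches, length, slide)]


def reduce_by_key_and_window(batches, func, length, slide):
    # Models reduceByKeyAndWindow(func, length, slide): reduce inside each batch
    # first, then reduce the per-batch results across the window (assumption).
    pre_reduced = [list(reduce_by_key(batch, func).items()) for batch in batches]
    return [reduce_by_key([pair for batch in window for pair in batch], func)
            for window in windows(pre_reduced, length, slide)]


# Hypothetical input: 12 one-second batches, window of 10 batches sliding by 2.
batches = [[("a", i), ("b", 1), ("a", 1)] for i in range(12)]
add = lambda x, y: x + y
print(window_then_reduce(batches, add, 10, 2) ==
      reduce_by_key_and_window(batches, add, 10, 2))
```

In this model the two forms give the same per-window results. The difference would be in how much data is carried around: the second form shrinks each batch to one value per key before windowing.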
Case 2
======
ssc = new StreamingContext(..., 10)
dstreamA = ssc.socketStream(...)
Is that the same as the following?
ssc = new StreamingContext(..., 1)
dsTmp = ssc.socketStream(...)
dstreamA = dsTmp.window(10, 10)
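Case 2 can be modelled the same way. The sketch below is plain Python, not Spark: records are (timestamp in whole seconds, value) pairs, and batch and tumbling_window are hypothetical helpers. It only compares which records end up together in each emitted batch, assuming both contexts start at the same instant. It says nothing about scheduling overhead, latency, or how many partitions each batch has, which may well differ in a real job.

```python
def batch(records, interval):
    """Group (timestamp, value) records into consecutive batches of `interval` seconds."""
    if not records:
        return []
    count = max(t for t, _ in records) // interval + 1
    batches = [[] for _ in range(count)]
    for t, value in records:
        batches[t // interval].append(value)
    return batches


def tumbling_window(batches, length):
    """Concatenate every `length` consecutive batches (window length == slide)."""
    return [[value for b in batches[i:i + length] for value in b]
            for i in range(0, len(batches), length)]


# Hypothetical input: one record per second for 25 seconds.
records = [(t, "r%d" % t) for t in range(25)]

ten_second_batches = batch(records, 10)                    # StreamingContext(..., 10)
windowed_one_second = tumbling_window(batch(records, 1), 10)  # batch 1, window(10, 10)
print(ten_second_batches == windowed_one_second)
```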
Also, what is the difference between map() and mapPartitions()? Do they both
process data within each partition, without any data shuffling?
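As I understand it, the distinction is in how often the user function is called: map() calls it once per element, while mapPartitions() (the name used in the Spark API) calls it once per partition and hands it an iterator over that partition's elements. Neither moves data between partitions. A minimal plain-Python sketch, with partitions modelled as a list of lists and with hypothetical helper names map_elements and map_partitions:

```python
def map_elements(partitions, func):
    """Models map(): func is called once per element; partitioning is unchanged."""
    return [[func(x) for x in part] for part in partitions]


def map_partitions(partitions, func):
    """Models mapPartitions(): func is called once per partition with an iterator."""
    return [list(func(iter(part))) for part in partitions]


setup_calls = []


def add_offset(iterator):
    # Runs once per partition: a stand-in for expensive setup such as
    # opening a connection or loading a lookup table.
    setup_calls.append("setup")
    offset = 100
    for x in iterator:
        yield x + offset


partitions = [[1, 2, 3], [4, 5], [6]]
per_element = map_elements(partitions, lambda x: x + 100)
per_partition = map_partitions(partitions, add_offset)
print(per_element == per_partition, len(setup_calls))
```

Here both calls produce the same values in the same partitions, but the setup inside add_offset runs once for each of the three partitions rather than once for each of the six elements.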
Rgds,
Ricky
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Time-window-size-in-Spark-Streaming-tp836.html