Posted to user@spark.apache.org by Ricky Ho <ri...@yahoo.com> on 2014/01/23 21:32:31 UTC

Time window size in Spark Streaming

I find the time window operators a bit confusing. Can someone clarify whether
the following are equivalent?

Case 1
======
val dstreamB = dstreamA.reduceByKeyAndWindow(func, Seconds(10), Seconds(2))

Is it the same as the following?

val dsTmp = dstreamA.window(Seconds(10), Seconds(2))
val dstreamB = dsTmp.reduceByKey(func)
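One Spark-free way to reason about Case 1 is to model a stream as a list of batches. The sketch below is a simplified model, not Spark's API: `WindowModel`, `window`, `reduceByKey` and `reduceByKeyAndWindow` are hypothetical helpers, durations are counted in batch intervals, and it assumes (not verified here) that the non-incremental reduceByKeyAndWindow reduces inside each batch, windows the partial results, then reduces again. Under that model the two forms agree whenever func is associative and commutative. It needs Scala 2.13+ for `groupMapReduce`.

```scala
// Simplified, Spark-free model of windowed DStream operations.
// All names here are hypothetical illustrations, not Spark's API.
// Requires Scala 2.13+ (groupMapReduce).
object WindowModel {
  // One batch interval's worth of (key, value) records.
  type Batch = Seq[(String, Int)]

  // Model of window(windowLen, slide), both counted in batch intervals:
  // every `slide` batches, emit all records from the last `windowLen` batches.
  def window(batches: Seq[Batch], windowLen: Int, slide: Int): Seq[Batch] =
    (slide - 1 until batches.length by slide).map { end =>
      val start = math.max(0, end - windowLen + 1)
      batches.slice(start, end + 1).flatten
    }

  // Model of reduceByKey applied independently to every batch of a stream.
  def reduceByKey(batches: Seq[Batch], f: (Int, Int) => Int): Seq[Map[String, Int]] =
    batches.map(_.groupMapReduce(_._1)(_._2)(f))

  // Assumed shape of the non-incremental reduceByKeyAndWindow:
  // reduce inside each batch, window the partial results, reduce again.
  def reduceByKeyAndWindow(batches: Seq[Batch], f: (Int, Int) => Int,
                           windowLen: Int, slide: Int): Seq[Map[String, Int]] = {
    val preReduced: Seq[Batch] = reduceByKey(batches, f).map(_.toSeq)
    reduceByKey(window(preReduced, windowLen, slide), f)
  }
}
```

Feeding it the batches [(a,1),(b,2)], [(a,3)], [(b,4),(a,5)], [(a,1)] with a 3-batch window, a 2-batch slide and addition as func yields Map(a -> 4, b -> 2) followed by Map(a -> 9, b -> 4) from either path. With a non-associative func the pre-reduction step could change the result, which is why that property matters.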


Case 2
======
val ssc = new StreamingContext(..., Seconds(10))
val dstreamA = ssc.socketStream(...)

Is it the same as the following?

val ssc = new StreamingContext(..., Seconds(1))
val dsTmp = ssc.socketStream(...)
val dstreamA = dsTmp.window(Seconds(10), Seconds(10))
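For Case 2, the sketch below models only which records end up grouped together. `BatchModel`, `toBatches` and `tumblingWindow` are hypothetical names, batch k is assumed to cover [k * interval, (k + 1) * interval), and the 10-second window with a 10-second slide on a 1-second stream is modelled as a tumbling window over 10 consecutive batches. In this model both setups produce the same groups; real Spark may still differ in how often jobs are scheduled and how the windowed data is partitioned, which the sketch does not capture.

```scala
// Simplified model of Case 2: which records end up grouped together.
// Names are hypothetical; this is not Spark's API.
object BatchModel {
  // Assign each (timestampMs, value) record to the batch interval it falls in:
  // batch k covers [k * intervalMs, (k + 1) * intervalMs).
  def toBatches(records: Seq[(Long, String)], intervalMs: Long, totalMs: Long): Seq[Seq[String]] =
    (0 until (totalMs / intervalMs).toInt).map { k =>
      records.collect { case (ts, v) if ts / intervalMs == k => v }
    }

  // Tumbling window (window length == slide), counted in batches:
  // concatenate each consecutive group of `size` batches, dropping a trailing partial group.
  def tumblingWindow[A](batches: Seq[Seq[A]], size: Int): Seq[Seq[A]] =
    batches.grouped(size).filter(_.length == size).map(_.flatten).toList
}
```

With records at 500 ms, 9999 ms, 10000 ms and 15000 ms, both the 10-second batches and the 1-second batches windowed in groups of 10 give [x, y] and then [z, w].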



Also, what is the difference between map() and mapPartitions()? Do they both
process data within each partition, without shuffling data?
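For the map()/mapPartitions() question, a model with partitions as nested lists shows the difference. `PartitionModel` is a hypothetical illustration, not Spark's RDD API: map calls its function once per element, while mapPartitions calls its function once per partition with an iterator, which is where per-partition setup (for example opening a connection once) would go. Neither moves elements between partitions, so neither implies a shuffle in this model.

```scala
// Simplified model contrasting map and mapPartitions.
// A dataset is a list of partitions; names are hypothetical, not Spark's RDD API.
object PartitionModel {
  // Calls f once per element; each element stays in its partition.
  def map[A, B](parts: Seq[Seq[A]])(f: A => B): Seq[Seq[B]] =
    parts.map(_.map(f))

  // Calls f once per partition with an iterator over that partition's elements,
  // so any setup done inside f runs once per partition, not once per element.
  def mapPartitions[A, B](parts: Seq[Seq[A]])(f: Iterator[A] => Iterator[B]): Seq[Seq[B]] =
    parts.map(p => f(p.iterator).toList)
}
```

With partitions [1, 2, 3] and [4, 5], a setup counter placed in the function runs 5 times under map but only 2 times under mapPartitions, while both produce [10, 20, 30] and [40, 50] when multiplying by 10.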


Rgds,
Ricky



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Time-window-size-in-Spark-Streaming-tp836.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.