Posted to user@spark.apache.org by Pablo Medina <pa...@gmail.com> on 2014/09/24 18:08:50 UTC

WindowedDStreams and hierarchies

Hi everyone, 

I'm trying to understand how windowed operations work. What I want to
achieve is the following:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(1))

    // The input lines are numbers; parse them so the sums below type-check.
    val lines = ssc.socketTextStream("localhost", 9999).map(_.toLong)

    // Basically the sum of all the numbers in a sliding window of
    // 5 seconds, every 5 seconds.
    val window5 = lines.window(Seconds(5), Seconds(5))
      .reduce((time1: Long, time2: Long) => time1 + time2)

    // Here I want to reuse the RDDs already calculated by window5,
    // so I'm expecting 2 RDDs from window5.
    val window10 = window5.window(Seconds(10), Seconds(10))
      .reduce((time1: Long, time2: Long) => time1 + time2)

    // The same as for window10, but in this case I expect 2 RDDs
    // from window10.
    val window20 = window10.window(Seconds(20), Seconds(20))
      .reduce((time1: Long, time2: Long) => time1 + time2)

The problem is that sometimes window10 does not receive any RDDs. Looking
at the logs and the code, it seems that the time in the interval becomes
invalid. In the code I see that WindowedDStream computes the interval as
follows:

    val currentWindow = new Interval(
      validTime - windowDuration + parent.slideDuration, validTime)
    val rddsInWindow = parent.slice(currentWindow)

In the case of window10, the resulting slice goes from validTime - 10
seconds + 5 seconds to validTime. I don't understand why the parent's
slideDuration is taken into account in this calculation. Could you please
help me understand this logic?
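
To make my confusion concrete, here is a tiny arithmetic sketch (plain
Scala, not Spark code; the batch times are example numbers I made up) of
which parent batch times that slice would cover for window10:

    // Plain Scala sketch, not Spark code: enumerate the parent batch
    // times covered by the slice for window10. Example numbers only.
    val windowDuration = 10L // window10's window length, in seconds
    val parentSlide    = 5L  // window5 emits one RDD every 5 seconds
    val validTime      = 20L // a hypothetical batch time for window10

    val from = validTime - windowDuration + parentSlide // 20 - 10 + 5 = 15
    val batchTimes = from to validTime by parentSlide   // 15, 20
    println(batchTimes.toList) // List(15, 20) -> the 2 RDDs I expect

If the batch times line up like this, the slice picks exactly the 2 RDDs
produced by window5, which is why the empty windows surprise me.
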
Do you see something wrong in the code? Is there another way to achieve the
same thing? 

Thanks! 

Pablo. 


