Posted to user@spark.apache.org by Diego <di...@gmail.com> on 2014/09/25 13:45:46 UTC

Windowed Operations

Hi everyone, 

I'm trying to understand how windowed operations work. What I want to
achieve is the following: 

    val ssc = new StreamingContext(sc, Seconds(1))

    // each line is expected to be a number, so parse it before summing
    val nums = ssc.socketTextStream("localhost", 9999).map(_.toLong)

    // the sum of all the numbers in a 5-second window, every 5 seconds
    val window5 = nums.window(Seconds(5), Seconds(5)).reduce(_ + _)

    // here I want to reuse the RDDs already computed by window5, so I
    // expect each window10 batch to see 2 RDDs from window5
    val window10 = window5.window(Seconds(10), Seconds(10)).reduce(_ + _)

    // the same as window10, but here I expect 2 RDDs from window10
    val window20 = window10.window(Seconds(20), Seconds(20)).reduce(_ + _)
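
In plain Scala terms, the reuse I'm expecting looks like this (a toy
sketch with made-up batch sums, not actual Spark code):

```scala
// hypothetical sums produced by window5, one value every 5 seconds
val per5s = List(3L, 7L, 2L, 8L, 5L, 1L, 4L, 6L)

// window10 should just add consecutive pairs of window5 results...
val per10s = per5s.grouped(2).map(_.sum).toList  // List(10, 10, 6, 10)

// ...and window20 consecutive pairs of window10 results
val per20s = per10s.grouped(2).map(_.sum).toList // List(20, 16)
```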

The thing is that sometimes window10 does not receive any RDDs. Looking at
the logs and the code, it seems that the time in the interval becomes
invalid. In the code I see that WindowedDStream computes the interval as
follows: 

  val currentWindow = new Interval(validTime - windowDuration + parent.slideDuration, validTime)
  val rddsInWindow = parent.slice(currentWindow)
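
Plugging in window10's numbers as I read the code (a sketch in plain
seconds rather than Spark's Time/Duration types, with an assumed
validTime of 20 s):

```scala
val validTime      = 20L // assumed batch time, in seconds
val windowDuration = 10L // window10's window length
val parentSlide    = 5L  // window5's slideDuration

// start of the interval that WindowedDStream slices from the parent
val windowStart = validTime - windowDuration + parentSlide // 20 - 10 + 5 = 15

println(s"slice covers [${windowStart}s, ${validTime}s]") // [15s, 20s]
```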

In the case of window10, the resulting slice runs from validTime - 10 seconds
+ 5 seconds to validTime. I don't understand why the parent's slideDuration is
taken into account in this calculation. Could you please help me understand
this logic?
Do you see something wrong in my code? Is there another way to achieve the
same thing? 

Thanks! 

Diego.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Windowed-Operations-tp15132.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Windowed Operations

Posted by DMiner <ms...@outlook.com>.
I'm hitting the same issue. Any updates on this?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Windowed-Operations-tp15133p23094.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Windowed Operations

Posted by julyfire <he...@gmail.com>.
hi Diego,

I have the same problem.

// reduce by key in the first window
val w1 = one.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
w1.count().print()

// reduce by key in the second window, based on the results of the
// first window
val w2 = w1.reduceByKeyAndWindow(_ + _, Seconds(120), Seconds(30))
w2.print()

w1 works, but w2 never prints anything.
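
For what it's worth, plugging my numbers into the Interval formula Diego
quoted (a sketch in plain seconds, with an assumed validTime of 120 s):

```scala
val validTime      = 120L // assumed batch time, in seconds
val windowDuration = 120L // w2's window length
val parentSlide    = 10L  // w1 slides every 10 seconds

// start of the interval that w2's WindowedDStream slices from w1
val windowStart = validTime - windowDuration + parentSlide // 120 - 120 + 10 = 10

println(s"slice covers [${windowStart}s, ${validTime}s]") // [10s, 120s]
```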

Do you have any update for this issue?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Windowed-Operations-tp15133p16128.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
