Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:23:31 UTC
[jira] [Updated] (SPARK-13650) Usage of the window() function on DStream
[ https://issues.apache.org/jira/browse/SPARK-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-13650:
---------------------------------
Labels: bulk-closed (was: )
> Usage of the window() function on DStream
> -----------------------------------------
>
> Key: SPARK-13650
> URL: https://issues.apache.org/jira/browse/SPARK-13650
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 1.5.2, 1.6.0, 2.0.0
> Reporter: Mario Briggs
> Priority: Minor
> Labels: bulk-closed
>
> Is there any guidance on the usage of the window() function on DStream? Here is my academic use case, for which it fails.
> Standard word count
> val ssc = new StreamingContext(sparkConf, Seconds(6))
> val messages = KafkaUtils.createDirectStream(...)
> val words = messages.map(_._2).flatMap(_.split(" "))
> val window = words.window(Seconds(12), Seconds(6))
> window.count().print()
> For the first batch interval it prints the count, and then it hangs (inside the UnionRDD).
> I say the above use case is academic since one can achieve similar functionality with the more compact API
> words.countByWindow(Seconds(12), Seconds(6))
> which works fine.
> Is the first approach above not the intended way of using the .window() API?
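For reference, the sliding-window semantics the report relies on can be sketched without Spark. This is a minimal plain-Python simulation (not Spark's implementation): with a 6-second batch interval, window(Seconds(12), Seconds(6)) should emit, every 6 seconds, the union of the last 12/6 = 2 micro-batches, and count() should count the elements in that union. The function name windowed_counts is hypothetical, chosen for illustration.

```python
# Sketch of DStream.window(windowDuration, slideDuration) semantics
# for windowDuration = 2 batches and slideDuration = 1 batch.
from collections import deque

def windowed_counts(batches, window_batches=2):
    """Yield the element count of a sliding window over micro-batches."""
    buf = deque(maxlen=window_batches)  # retains only the batches inside the window
    for batch in batches:
        buf.append(batch)               # slide the window forward by one batch
        yield sum(len(b) for b in buf)  # count() over the union of buffered batches

# Words arriving in three consecutive 6-second batches:
batches = [["a", "b"], ["c"], ["d", "e", "f"]]
print(list(windowed_counts(batches)))  # [2, 3, 4]
```

Under these semantics the expected output is one count per slide: 2 (first batch alone), then 3 and 4 (unions of two consecutive batches), which is what words.window(Seconds(12), Seconds(6)) followed by count() should produce if it did not hang.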
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org