Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:23:31 UTC
[jira] [Updated] (SPARK-13650) Usage of the window() function on DStream
[ https://issues.apache.org/jira/browse/SPARK-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-13650:
---------------------------------
Labels: bulk-closed (was: )
> Usage of the window() function on DStream
> -----------------------------------------
>
> Key: SPARK-13650
> URL: https://issues.apache.org/jira/browse/SPARK-13650
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 1.5.2, 1.6.0, 2.0.0
> Reporter: Mario Briggs
> Priority: Minor
> Labels: bulk-closed
>
> Is there any guidance on the usage of the window() function on DStream? Here is my academic use case, for which it fails.
> Standard word count
> val ssc = new StreamingContext(sparkConf, Seconds(6))
> val messages = KafkaUtils.createDirectStream(...)
> val words = messages.map(_._2).flatMap(_.split(" "))
> val window = words.window(Seconds(12), Seconds(6))
> window.count().print()
> For the first batch interval it prints the count, and then it hangs (inside the UnionRDD).
> I say the above use case is academic since one can achieve similar functionality with the more compact API
> words.countByWindow(Seconds(12), Seconds(6))
> which works fine.
> Is the first approach above not the intended way of using the .window() API?
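For reference, the sliding-window semantics the report relies on can be sketched without Spark. This is a minimal plain-Python simulation (not Spark's implementation): with a 6-second batch interval, window(Seconds(12), Seconds(6)) should emit, every 6 seconds, the union of the last 12/6 = 2 micro-batches, and count() should count the elements in that union. The function name windowed_counts is hypothetical, chosen for illustration.

```python
# Sketch of DStream.window(windowDuration, slideDuration) semantics
# for windowDuration = 2 batches and slideDuration = 1 batch.
from collections import deque

def windowed_counts(batches, window_batches=2):
    """Yield the element count of a sliding window over micro-batches."""
    buf = deque(maxlen=window_batches)  # retains only the batches inside the window
    for batch in batches:
        buf.append(batch)               # slide the window forward by one batch
        yield sum(len(b) for b in buf)  # count() over the union of buffered batches

# Words arriving in three consecutive 6-second batches:
batches = [["a", "b"], ["c"], ["d", "e", "f"]]
print(list(windowed_counts(batches)))  # [2, 3, 4]
```

Under these semantics the expected output is one count per slide: 2 (first batch alone), then 3 and 4 (unions of two consecutive batches), which is what words.window(Seconds(12), Seconds(6)) followed by count() should produce if it did not hang.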
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org