Posted to user@spark.apache.org by ZmeiGorynych <eg...@gmail.com> on 2018/04/23 11:05:19 UTC

flatMapGroupsWithState equivalent in PySpark

I need to write PySpark logic equivalent to what flatMapGroupsWithState does
in Scala/Java. To be precise, I need to take an incoming stream of records,
group them by an arbitrary attribute, feed each group's records one at a
time to a separate instance of a user-defined (so 'black-box') Python
callable, and stream out its output.
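
To make the shape of the problem concrete, here is a minimal sketch of the
kind of per-group callable I have in mind (the class name, the state it
keeps, and the output format are all made up for illustration, not anything
Spark provides):

# Hypothetical per-group processor: the engine would create one instance per
# group key, feed it records one at a time, and collect whatever it yields
# as output rows of the result stream.
class GroupProcessor:
    def __init__(self, key):
        self.key = key
        self.count = 0  # arbitrary per-group state carried between records

    def process(self, record):
        self.count += 1
        # emit zero or more output records per input record
        yield {"key": self.key, "records_seen": self.count, "payload": record}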

1. Is there a way to do that in Python using Structured Streaming?

2. How can I find out when flatMapGroupsWithState is coming to the Python
Spark API?

3. Or is my only option to use updateStateByKey(), which would force me back
to the DStreams API instead of the Structured Streaming API (which I'd like
to avoid)? A rough sketch of that fallback is below.

Thanks a lot!


