Posted to user@spark.apache.org by "Yuval.Itzchakov" <yu...@gmail.com> on 2016/11/19 13:46:48 UTC

Stateful aggregations with Structured Streaming

I've been using `DStream.mapWithState` and was looking forward to trying out
Structured Streaming. The thing I can't understand is, does Structured Streaming
in its current state support stateful aggregations?

Looking at the StateStore design document
(https://docs.google.com/document/d/1-ncawFx8JS5Zyfq1HAEGBx56RDet9wfVp_hDM8ZL254/edit#heading=h.2h7zw4ru3nw7),
and then doing a bit of digging around in the Spark codebase, I've seen
`mapPartitionsWithStateStore` as the only viable way of doing something with
a store, but the API requires an `UnsafeRow` for key and value, which makes
me question whether this is a real public API one should be using.

Does anyone know what the current state of things is with regard to an
equivalent of `mapWithState` in Structured Streaming?

Thanks,
Yuval.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Stateful-aggregations-with-Structured-Streaming-tp28108.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Stateful aggregations with Structured Streaming

Posted by Michael Armbrust <mi...@databricks.com>.
We are planning on adding mapWithState or something similar in a future
release. In the meantime, standard DataFrame aggregations should work
(count, sum, etc.). If you are looking to do something custom, I'd suggest
looking at Aggregators
<https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/expressions/Aggregator.html>.
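
For readers unfamiliar with the Aggregator API mentioned above, a custom typed
aggregation might look roughly like the sketch below. The `Event` and
`SumCount` case classes, the `MeanAmount` object, and the sample data are all
hypothetical names made up for illustration; they do not come from this
thread. The sketch runs against a small in-memory Dataset, and it is only an
assumption that the same `groupByKey(...).agg(...)` expression behaves the
same way on a streaming Dataset in a given Spark release.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical record types, used only for illustration.
case class Event(user: String, amount: Double)
case class SumCount(sum: Double, count: Long)

// A typed aggregator that computes the mean amount per group.
// Type parameters: input = Event, intermediate buffer = SumCount, output = Double.
object MeanAmount extends Aggregator[Event, SumCount, Double] {
  // Initial (empty) buffer.
  def zero: SumCount = SumCount(0.0, 0L)

  // Fold one input record into the buffer.
  def reduce(buffer: SumCount, event: Event): SumCount =
    SumCount(buffer.sum + event.amount, buffer.count + 1)

  // Combine two partial buffers, e.g. from different partitions.
  def merge(a: SumCount, b: SumCount): SumCount =
    SumCount(a.sum + b.sum, a.count + b.count)

  // Turn the final buffer into the output value; an empty group yields 0.0.
  def finish(buffer: SumCount): Double =
    if (buffer.count == 0L) 0.0 else buffer.sum / buffer.count

  def bufferEncoder: Encoder[SumCount] = Encoders.product[SumCount]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object MeanAmountExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("aggregator-sketch")
      .getOrCreate()
    import spark.implicits._

    // Small in-memory Dataset standing in for real input.
    val events = Seq(Event("a", 1.0), Event("a", 3.0), Event("b", 10.0)).toDS()

    // Group by user and apply the custom aggregator as a typed column.
    val means = events
      .groupByKey(_.user)
      .agg(MeanAmount.toColumn.name("mean_amount"))

    means.show()
    spark.stop()
  }
}
```

The buffer carries both the running sum and the count so that `merge` can
combine partial results correctly; merging two already-computed means would
not be correct in general.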
