Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2018/04/14 20:52:28 UTC
Subject: When can we expect multiple aggregations to be supported in Spark Structured Streaming?
Hi All,
When can we expect multiple aggregations to be supported in Spark
Structured Streaming?
For example,
id | amount | my_timestamp
---+--------+--------------------------
 1 |      5 | 2018-04-01T01:00:00.000Z
 1 |     10 | 2018-04-01T01:10:00.000Z
 2 |     20 | 2018-04-01T01:20:00.000Z
 2 |     30 | 2018-04-01T01:25:00.000Z
 2 |     40 | 2018-04-01T01:30:00.000Z
I want to run a query like the one below, entirely in streaming fashion:

select sum(amount)
from (
  select amount, max(my_timestamp)
  from table
  group by id, window("my_timestamp", "1 hours")
)
I just want the output to be:

sum(amount)
-----------
50
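To make the intended semantics concrete, here is the same computation sketched over an in-memory collection in plain Scala (not Spark, and not my actual streaming code); the Event case class and field names are just for illustration. The idea is: keep the latest row per id, then sum those amounts.

```scala
// Plain-Scala sketch of the intended result (illustrative names only).
case class Event(id: Int, amount: Long, timestamp: String)

val events = Seq(
  Event(1, 5,  "2018-04-01T01:00:00.000Z"),
  Event(1, 10, "2018-04-01T01:10:00.000Z"),
  Event(2, 20, "2018-04-01T01:20:00.000Z"),
  Event(2, 30, "2018-04-01T01:25:00.000Z"),
  Event(2, 40, "2018-04-01T01:30:00.000Z")
)

// ISO-8601 timestamps compare correctly as strings, so maxBy on the
// string picks the latest event per id.
val latestPerId = events.groupBy(_.id).values.map(_.maxBy(_.timestamp))

// Sum the latest amounts: 10 (id 1) + 40 (id 2) = 50.
val total = latestPerId.map(_.amount).sum
```

The difficulty is that doing this on a stream needs a max-per-group aggregation followed by a sum, i.e. two aggregations in one streaming query.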
I am trying to find a solution without using flatMapGroupsWithState or
order by. I am using Spark 2.3.1 (custom built from master), and I had
already tried a self-join solution, but there I again run into
"multiple aggregations are not supported".
Thanks!