Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2018/04/14 20:52:28 UTC

When can we expect multiple aggregations to be supported in Spark Structured Streaming?

Hi All,

When can we expect multiple aggregations to be supported in Spark
Structured Streaming?

For example,

id | amount | my_timestamp
---+--------+--------------------------
 1 |      5 | 2018-04-01T01:00:00.000Z
 1 |     10 | 2018-04-01T01:10:00.000Z
 2 |     20 | 2018-04-01T01:20:00.000Z
 2 |     30 | 2018-04-01T01:25:00.000Z
 2 |     40 | 2018-04-01T01:30:00.000Z


I want to run a query like the one below in a fully streaming fashion.
The inner query should pick, per id and 1-hour window, the amount that
goes with the latest timestamp:

select sum(amount)
from (select amount, max(my_timestamp)
      from table
      group by id, window(my_timestamp, '1 hour'))

I just want the output to be:

sum(amount)
-----------
50
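
In DataFrame terms, the plan I keep hitting looks roughly like the
sketch below. The rate source and the column parsing are only
placeholders, and the max-of-struct trick stands in for "amount at the
latest timestamp":

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder.appName("multi-agg-sketch").getOrCreate()
    import spark.implicits._

    // Stand-in streaming source: pretend `events` has columns
    // (id: Int, amount: Long, my_timestamp: Timestamp).
    val events = spark.readStream.format("rate").load()
      .selectExpr(
        "CAST(value % 2 + 1 AS INT) AS id",
        "(value + 1) * 5 AS amount",
        "timestamp AS my_timestamp")

    // First aggregation: per (id, 1-hour window), keep the amount that
    // goes with the latest timestamp (max over a struct compares the
    // timestamp field first).
    val latestPerId = events
      .withWatermark("my_timestamp", "1 hour")
      .groupBy($"id", window($"my_timestamp", "1 hour"))
      .agg(max(struct($"my_timestamp", $"amount")).as("latest"))
      .select($"latest.amount".as("amount"))

    // Second aggregation on top of the first: on a streaming Dataset
    // this is rejected at analysis time with "Multiple streaming
    // aggregations are not supported with streaming DataFrames/Datasets".
    val total = latestPerId.agg(sum($"amount"))

The first groupBy/agg alone runs fine; it is the final agg on the
already-aggregated stream that the analyzer rejects.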

I am trying to find a solution without using flatMapGroupsWithState or
order by. I am using Spark 2.3.1 (custom built from master), and I have
already tried a self-join (sketched below), but again I run into
"multiple aggregations are not supported".
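
The self-join attempt was along these lines, building on latestPerId
from the sketch above (the join key is only illustrative):

    // Joining a streaming aggregation back to itself and then summing
    // is still an aggregation downstream of an aggregation, so the same
    // unsupported-operation check fires in 2.3.
    val totalViaJoin = latestPerId.as("a")
      .join(latestPerId.as("b"), $"a.amount" === $"b.amount")
      .agg(sum($"a.amount"))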

Thanks!