You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2018/04/15 07:45:38 UTC

How to select the max row for every group in spark structured streaming 2.3.0 without using order by or mapGroupWithState?

How to select the max row for every group in spark structured streaming
2.3.0 without using order by or mapGroupWithState?

*Input:*

id | amount     | my_timestamp
-------------------------------------------
1  |      5     |  2018-04-01T01:00:00.000Z
1  |     10     |  2018-04-01T01:10:00.000Z
2  |     20     |  2018-04-01T01:20:00.000Z
2  |     30     |  2018-04-01T01:25:00.000Z
2  |     40     |  2018-04-01T01:30:00.000Z

*Expected Output:*

id | amount     | my_timestamp
-------------------------------------------
1  |     10     |  2018-04-01T01:10:00.000Z
2  |     40     |  2018-04-01T01:30:00.000Z

Looking for a streaming solution using either raw sql like
sparkSession.sql("sql
query") or similar to raw sql but not something like mapGroupWithState