Posted to user@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2018/05/15 12:20:43 UTC

Spark structured streaming aggregation within microbatch

I have a streaming dataframe where I insert a uuid in every row, then join
with a static dataframe (after which the uuid column is no longer unique),
then group by uuid and do a simple aggregation.
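For concreteness, here is a minimal sketch of that pipeline (the rate
source, the join key, the lookup table, and the aggregation are made-up
placeholders, just to show the shape of it):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("uuid-agg-sketch").getOrCreate()
import spark.implicits._

// placeholder streaming source; any streaming source behaves the same
val stream = spark.readStream.format("rate")
  .option("rowsPerSecond", "10").load()

// tag every incoming row with a uuid (unique per row at this point)
// and a hypothetical join key
val withId = stream
  .withColumn("uuid", expr("uuid()"))
  .withColumn("key", col("value") % 3)

// static lookup table; the join fans rows out, so after it the uuid
// column is no longer unique
val lookup = Seq((0L, "a"), (0L, "b"), (1L, "c"), (2L, "d")).toDF("key", "attr")
val joined = withId.join(lookup, Seq("key"))

// group back by uuid and aggregate; written like this it is a stateful
// streaming aggregation that accumulates state across micro-batches,
// which is not what I want
val aggregated = joined.groupBy("uuid").agg(count("attr").as("n"))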

So I know that all rows with the same uuid are guaranteed to be in the same
micro-batch, correct? How do I express that in structured streaming? I
don't need an aggregation across batches.
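To make the intent concrete: if I understand correctly, the foreachBatch
sink (added in Spark 2.4) hands each micro-batch over as a plain static
dataframe, so an aggregation done inside it is per-batch by construction.
A sketch, continuing from the snippet above, with a made-up parquet sink:

import org.apache.spark.sql.{Dataset, Row}

// each invocation of the function sees exactly one micro-batch as a
// static dataframe, so the groupBy cannot carry state across batches
val query = joined.writeStream
  .foreachBatch { (batch: Dataset[Row], batchId: Long) =>
    batch.groupBy("uuid")
      .agg(count("attr").as("n"))
      .write.mode("append")
      .parquet(s"/tmp/out/batch=$batchId")  // hypothetical output path
  }
  .start()

query.awaitTermination()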

Thanks!