Posted to user@spark.apache.org by Ruijing Li <li...@gmail.com> on 2020/03/10 22:46:11 UTC

ForEachBatch collecting batch to driver

Hi all,

I'm curious about how foreachBatch works in Spark Structured Streaming.
Since it takes in a micro-batch DataFrame, does that mean the code in
foreachBatch executes on the Spark driver? And does this mean that for
large batches you could run into OOM issues from collecting each
partition onto the driver?
-- 
Cheers,
Ruijing Li

Re: ForEachBatch collecting batch to driver

Posted by Burak Yavuz <br...@gmail.com>.
foreachBatch gives you the micro-batch as a DataFrame, which is
distributed. If you don't call collect() on that DataFrame, it shouldn't
have any memory implications on the driver.
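
To illustrate the distinction, here is a minimal PySpark sketch (not from the original thread). The function passed to foreachBatch is invoked on the driver once per micro-batch, but the DataFrame it receives is still distributed, so a write runs on the executors, while collect() copies every row to the driver. The function names, the /tmp paths, and the use of the built-in "rate" test source are assumptions made for illustration only.

```python
OUTPUT_PATH = "/tmp/foreach_batch_demo/out"              # hypothetical path
CHECKPOINT_PATH = "/tmp/foreach_batch_demo/checkpoint"   # hypothetical path


def write_batch(batch_df, batch_id):
    """Called on the driver once per micro-batch; the write runs on executors."""
    # batch_df is a distributed DataFrame, so no rows are pulled to the driver.
    batch_df.write.mode("append").parquet(OUTPUT_PATH)


def collect_batch(batch_df, batch_id):
    """Anti-pattern for large batches: collect() copies every row to the driver."""
    rows = batch_df.collect()
    return len(rows)


def run_demo(seconds=30):
    # Imported here so the two functions above can be read and exercised
    # without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("foreach-batch-sketch").getOrCreate()
    # The built-in "rate" source generates rows for testing.
    stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()
    query = (
        stream.writeStream
        .foreachBatch(write_batch)
        .option("checkpointLocation", CHECKPOINT_PATH)
        .start()
    )
    query.awaitTermination(seconds)  # wait up to `seconds`, then shut down
    query.stop()
    spark.stop()
```

Calling run_demo() on a machine with Spark installed should stream for about 30 seconds using write_batch. Swapping in collect_batch would work for small batches, but driver memory use would then grow with the batch size, which is the OOM risk raised in the question.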
