Posted to users@apex.apache.org by Mateusz Zakarczemny <m....@gmail.com> on 2018/06/20 23:41:02 UTC
Buffer server overflow
Hi,
I'm reading the Apex documentation on buffer servers. I'm wondering what
happens if the buffers between operators overflow (let's assume a
non-partitioned operator).
I read somewhere that the data is spilled to disk. But what happens next? What
if the disk space is exhausted?
Regards,
Mateusz Zakarczemny
Re: Buffer server overflow
Posted by Vlad Rozov <vr...@apache.org>.
When spilling to disk is enabled, an upstream operator is blocked from
emitting more tuples to the corresponding output port when the size of
the buffer (in bytes) exceeds a limit (see the documentation on how to
configure the limit). This is the back pressure mechanism that Pramod
refers to. There are two ways in which tuples can be removed from the
buffer to free space and unblock the upstream operator: they can be
either spooled to a local disk or purged from the buffer completely.
The purge happens only after the window (more precisely, the earliest
checkpoint window after the window that the tuple belongs to) has been
completely processed by the application/DAG. If there is not enough disk
space for spooling, the buffer server fails the container that it
belongs to. A few JIRAs have been filed to improve the current behavior
(for example, to limit the amount of disk space that the buffer server
can use for spilling).
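
To make the sequence above concrete, here is a rough toy model in Python. It is only a sketch of the behavior described in this message, not the Apex implementation: the class ToyBuffer, the exception DiskFull, and all method and parameter names are made up for illustration, and real buffer servers work on serialized blocks rather than per-tuple lists.

```python
class DiskFull(Exception):
    """Raised by the toy model when no spool space is left (hypothetical)."""


class ToyBuffer:
    """Toy model of one output-port buffer; not the Apex implementation."""

    def __init__(self, memory_limit, disk_limit):
        self.memory_limit = memory_limit  # in-memory byte limit (assumed name)
        self.disk_limit = disk_limit      # spool byte limit (assumed name)
        self.in_memory = []               # (window_id, size) pairs, oldest first
        self.on_disk = []                 # spooled (window_id, size) pairs

    def memory_used(self):
        return sum(size for _, size in self.in_memory)

    def disk_used(self):
        return sum(size for _, size in self.on_disk)

    def can_emit(self):
        """False means the upstream operator is blocked (back pressure)."""
        return self.memory_used() <= self.memory_limit

    def emit(self, window_id, size):
        """Accept one tuple of `size` bytes belonging to `window_id`."""
        if not self.can_emit():
            raise RuntimeError("upstream is blocked; free space first")
        self.in_memory.append((window_id, size))

    def spool_oldest(self):
        """Move the oldest in-memory tuple to disk to free memory."""
        window_id, size = self.in_memory[0]
        if self.disk_used() + size > self.disk_limit:
            # In the description above, this is where the container fails.
            raise DiskFull("not enough disk space for spooling")
        self.in_memory.pop(0)
        self.on_disk.append((window_id, size))

    def purge(self, committed_window_id):
        """Drop tuples of windows that the whole DAG has fully processed."""
        self.in_memory = [e for e in self.in_memory if e[0] > committed_window_id]
        self.on_disk = [e for e in self.on_disk if e[0] > committed_window_id]
```

In this model, emitting past memory_limit makes can_emit() return False, spool_oldest() or purge() makes it return True again, and spool_oldest() raises DiskFull once disk_limit would be exceeded, which stands in for the container failure.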
Thank you,
Vlad
On 6/20/18 17:24, Pramod Immaneni wrote:
> When back pressure is enabled (the default), the upstream operators are
> blocked until space is freed up by downstream operators consuming data.
>
> Since the buffer server also provides fault recovery functionality, it
> cannot immediately clear out data when it is consumed by the
> downstream operators; it needs to keep the data around until the next
> checkpoints throughout the DAG. The spillover to disk comes into play
> if the amount of data between checkpoints is greater than the
> in-memory buffer capacity.
>
> Thanks
> On Wed, Jun 20, 2018 at 4:41 PM Mateusz Zakarczemny
> <m....@gmail.com> wrote:
>
> Hi,
> I'm reading the Apex documentation on buffer servers. I'm
> wondering what happens if the buffers between operators
> overflow (let's assume a non-partitioned operator).
> I read somewhere that the data is spilled to disk. But what happens
> next? What if the disk space is exhausted?
>
> Regards,
> Mateusz Zakarczemny
>
Re: Buffer server overflow
Posted by Pramod Immaneni <pr...@gmail.com>.
When back pressure is enabled (the default), the upstream operators are
blocked until space is freed up by downstream operators consuming data.
Since the buffer server also provides fault recovery functionality, it cannot
immediately clear out data when it is consumed by the downstream
operators; it needs to keep the data around until the next checkpoints
throughout the DAG. The spillover to disk comes into play if the amount of
data between checkpoints is greater than the in-memory buffer capacity.
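
As a back-of-the-envelope check of that last condition, the amount of data a buffer must retain between checkpoints can be estimated from the emit rate, the tuple size, and the checkpoint interval. The sketch below is only an approximation: the function names and all numbers are made up for illustration and are not Apex defaults, and the estimate ignores any extra lag before downstream operators finish processing the checkpoint window.

```python
def bytes_between_checkpoints(tuples_per_second, bytes_per_tuple,
                              checkpoint_interval_seconds):
    """Rough amount of data emitted on one port between two checkpoints."""
    return tuples_per_second * bytes_per_tuple * checkpoint_interval_seconds


def spills_to_disk(retained_bytes, memory_buffer_bytes):
    """True if the retained data no longer fits in the in-memory buffer."""
    return retained_bytes > memory_buffer_bytes


# Hypothetical workload: 10,000 tuples/s of 1,000 bytes each, 30 s apart.
retained = bytes_between_checkpoints(10_000, 1_000, 30)

# Hypothetical in-memory buffer of 256 MiB.
needs_spill = spills_to_disk(retained, 256 * 1024 * 1024)
```

With these made-up numbers the port retains about 300 MB between checkpoints, which is more than a 256 MiB buffer holds, so spillover would be expected; doubling the buffer to 512 MiB would avoid it.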
Thanks
On Wed, Jun 20, 2018 at 4:41 PM Mateusz Zakarczemny <m....@gmail.com>
wrote:
> Hi,
> I'm reading the Apex documentation on buffer servers. I'm wondering
> what happens if the buffers between operators overflow (let's assume
> a non-partitioned operator).
> I read somewhere that the data is spilled to disk. But what happens next?
> What if the disk space is exhausted?
>
> Regards,
> Mateusz Zakarczemny
>