Posted to users@apex.apache.org by Mateusz Zakarczemny <m....@gmail.com> on 2018/06/20 23:41:02 UTC

Buffer server overflow

Hi,
I'm reading the Apex documentation on buffer servers. I'm wondering what
will happen if the buffers between operators overflow (let's assume a
non-partitioned operator).
I read somewhere that data is spilled to disk. But what happens next? What
if disk space is exhausted?

Regards,
Mateusz Zakarczemny

Re: Buffer server overflow

Posted by Vlad Rozov <vr...@apache.org>.
When spilling to disk is enabled, an upstream operator is blocked from
emitting more tuples to the corresponding output port once the size of
the buffer (in bytes) exceeds a limit (see the documentation on how to
configure the limit). This is the back pressure mechanism that Pramod
refers to. There are two ways to remove tuples from the buffer, free up
space, and unblock the upstream operator: tuples can either be spooled
to a local disk or purged from the buffer completely. A purge happens
only after the window (more precisely, the earliest checkpoint window
after the window that the tuple belongs to) has been completely
processed by the application/DAG. If there is not enough disk space for
spooling, the buffer server fails the container that it belongs to.
There are a few JIRAs filed to improve the current behavior (for
example, to limit the amount of disk space that the buffer server can
use for spilling).
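
The behavior described above can be sketched as a small single-threaded
toy model. This is not Apex code: the class, its method names, and the
disk_limit parameter (a stand-in for the free space on the local disk)
are all hypothetical, and the real buffer server works on blocks of
serialized data and runs asynchronously.

```python
class DiskFullError(Exception):
    """Stands in for the buffer server failing its container."""


class ToyBuffer:
    """Toy model of a byte-limited buffer with spooling and purging."""

    def __init__(self, memory_limit, disk_limit):
        self.memory_limit = memory_limit  # bytes allowed in memory
        self.disk_limit = disk_limit      # assumed free disk space, in bytes
        self.memory = []                  # (window_id, size) pairs, oldest first
        self.disk = []                    # spooled (window_id, size) pairs

    @staticmethod
    def _used(store):
        return sum(size for _, size in store)

    def publish(self, window_id, size):
        """Return True if accepted, False if the upstream operator must wait."""
        if self._used(self.memory) + size > self.memory_limit:
            return False  # back pressure: the publisher stays blocked
        self.memory.append((window_id, size))
        return True

    def spool_oldest(self):
        """Move the oldest in-memory tuple to disk to free memory."""
        if not self.memory:
            return
        _, size = self.memory[0]
        if self._used(self.disk) + size > self.disk_limit:
            raise DiskFullError("not enough disk space for spooling")
        self.disk.append(self.memory.pop(0))

    def purge(self, checkpoint_window):
        """Drop tuples from windows up to a fully processed checkpoint window."""
        self.memory = [t for t in self.memory if t[0] > checkpoint_window]
        self.disk = [t for t in self.disk if t[0] > checkpoint_window]
```

In this model a rejected publish() succeeds again only after
spool_oldest() or purge() has freed memory, and spool_oldest() raises
DiskFullError once the assumed disk space runs out, mirroring the
container failure described above.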

Thank you,

Vlad

On 6/20/18 17:24, Pramod Immaneni wrote:
> When back pressure is enabled (the default), the upstream operators are
> blocked until space is freed up by downstream operators consuming data.
>
> Since the buffer server also provides fault recovery functionality, it
> cannot clear out data as soon as the downstream operators consume it;
> it needs to keep the data around until the next checkpoint throughout
> the DAG. Spillover to disk can come into play if the amount of data
> between checkpoints is greater than the in-memory buffer capacity.
>
> Thanks
> On Wed, Jun 20, 2018 at 4:41 PM Mateusz Zakarczemny 
> <m.zakarczemny@gmail.com> wrote:
>
>     Hi,
>     I'm reading the Apex documentation on buffer servers. I'm
>     wondering what will happen if the buffers between operators
>     overflow (let's assume a non-partitioned operator).
>     I read somewhere that data is spilled to disk. But what happens
>     next? What if disk space is exhausted?
>
>     Regards,
>     Mateusz Zakarczemny
>


Re: Buffer server overflow

Posted by Pramod Immaneni <pr...@gmail.com>.
When back pressure is enabled (the default), the upstream operators are
blocked until space is freed up by downstream operators consuming data.

Since the buffer server also provides fault recovery functionality, it
cannot clear out data as soon as the downstream operators consume it; it
needs to keep the data around until the next checkpoint throughout the
DAG. Spillover to disk can come into play if the amount of data between
checkpoints is greater than the in-memory buffer capacity.
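
As a rough back-of-the-envelope check of that last condition, the sketch
below estimates how much data would have to spill. The function name and
all figures are illustrative assumptions, not Apex defaults, and the
estimate is a lower bound, since data is kept until the checkpoint has
been processed by the whole DAG.

```python
def expected_spill_bytes(rate_bytes_per_s, checkpoint_interval_s,
                         memory_capacity_bytes):
    """Estimate the bytes that exceed the in-memory buffer between checkpoints."""
    # Data that must be retained until the next checkpoint.
    retained = rate_bytes_per_s * checkpoint_interval_s
    # Only the part that does not fit in memory has to go to disk.
    return max(0, retained - memory_capacity_bytes)


# Assumed figures: 512 MiB of buffer memory, a checkpoint every 30 seconds.
capacity = 512 * 1024 * 1024
low = expected_spill_bytes(10_000_000, 30, capacity)   # 300 MB retained, fits
high = expected_spill_bytes(50_000_000, 30, capacity)  # 1.5 GB retained, spills
```

With these assumed numbers, the first stream never touches the disk,
while the second has to spool roughly 0.96 GB per checkpoint interval.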

Thanks
On Wed, Jun 20, 2018 at 4:41 PM Mateusz Zakarczemny <m....@gmail.com>
wrote:

> Hi,
> I'm reading the Apex documentation on buffer servers. I'm wondering
> what will happen if the buffers between operators overflow (let's
> assume a non-partitioned operator).
> I read somewhere that data is spilled to disk. But what happens next?
> What if disk space is exhausted?
>
> Regards,
> Mateusz Zakarczemny
>