Posted to user@flink.apache.org by Shaosu Liu <sh...@uber.com> on 2016/09/02 20:45:55 UTC

Memory Management in Streaming?

Hi,

I have had issues when processing large amounts of data (large windows
where I could not do incremental updates): Flink slowed down significantly.
Increasing the amount of memory and using off-heap allocation helped, but
it only delayed the onset of the problem without solving it.
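
The difference between a window that can be updated incrementally and one that cannot is the root of the memory growth described above. The sketch below is a toy Python model, not Flink code, and the class names `BufferingWindow` and `IncrementalWindow` are hypothetical. It assumes a buffering window keeps every element until it fires, while an incremental window folds each element into a small running aggregate:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BufferingWindow:
    """Hypothetical model of a non-incremental window: every element is kept
    until the window fires, so memory grows with the number of elements."""
    elements: List[float] = field(default_factory=list)

    def add(self, value: float) -> None:
        self.elements.append(value)

    def fire(self, fn: Callable[[List[float]], float]) -> float:
        # The window function sees all buffered elements at once.
        result = fn(self.elements)
        self.elements = []  # buffer is released only when the window fires
        return result


@dataclass
class IncrementalWindow:
    """Hypothetical model of an incremental (reduce-style) window: each element
    is folded into a running aggregate, so memory per window stays constant."""
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> None:
        self.total += value
        self.count += 1

    def fire(self) -> float:
        mean = self.total / self.count if self.count else 0.0
        self.total, self.count = 0.0, 0
        return mean
```

A mean fits the incremental model with two numbers of state, whereas something like an exact median or percentile generally needs the buffering model, which is why such windows keep all of their elements in memory until they fire.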

Could someone give me some hints on how Flink manages window buffers and how
streaming manages its memory? I see this page on batch API memory
management and wonder what the equivalent is for streaming:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525

-- 
Cheers,
Shaosu

Re: Memory Management in Streaming?

Posted by Jamie Grier <ja...@data-artisans.com>.
Hi Shaosu,

Do you have an estimate of the total size of state you are keeping for the
windows? How many messages/sec, how large a window, message size, etc.
would be good details to include.
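
Those numbers combine into a back-of-the-envelope estimate. The helper below is a hypothetical sketch, not part of any Flink API. It assumes every element stays buffered until its window fires, that with sliding windows each element is stored once per overlapping window (ceil(size / slide) times), and that `overhead_factor` is a guessed multiplier from serialized size to in-memory size:

```python
import math


def estimate_window_state_bytes(
    events_per_sec: float,
    window_size_sec: float,
    slide_sec: float,
    bytes_per_event: float,
    overhead_factor: float = 1.0,
) -> float:
    """Rough upper bound on bytes held by a non-incremental window operator.

    Assumptions (illustrative, not measured Flink behaviour):
      * all events of a window are buffered until the window fires;
      * each event is stored once per overlapping window, ceil(size / slide) times;
      * overhead_factor scales raw record size to in-memory size (1.0 = none).
    """
    copies = math.ceil(window_size_sec / slide_sec)  # 1 for tumbling windows
    events_per_window = events_per_sec * window_size_sec
    return events_per_window * copies * bytes_per_event * overhead_factor
```

For example, 10,000 events/s of 500 bytes in a 1-hour tumbling window gives 18,000,000,000 bytes (about 18 GB) before any object overhead, and the same window sliding every 5 minutes gives twelve times that. A heap-based backend may share object references between overlapping windows, so treat the sliding-window figure as an upper bound.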

Also, which state backend are you using? Have you considered using the
RocksDB state backend? This backend will spill Flink state to disk if it's
larger than available RAM. You'll also probably want to use "fully async"
mode for the RocksDBStateBackend.
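
To make the spill-to-disk idea concrete, here is a toy Python analogy using the standard `dbm` module. It is a swapped-in illustration only: `SpillingListState` is a hypothetical name, and this is not how RocksDB or Flink's RocksDBStateBackend works internally. It assumes a fixed in-memory record budget, after which buffered records are moved to an on-disk key-value file:

```python
import dbm
import os
import pickle
import tempfile


class SpillingListState:
    """Toy keyed list state (hypothetical): holds at most `max_in_memory`
    records in RAM and moves them to an on-disk dbm file once that budget
    is exceeded. Illustration only; NOT RocksDB's or Flink's design."""

    def __init__(self, path: str, max_in_memory: int = 1000) -> None:
        self._db = dbm.open(path, "n")  # "n": always create a new, empty file
        self._mem = {}                  # key -> records not yet spilled
        self._mem_count = 0
        self._max = max_in_memory

    def _load(self, key: str) -> list:
        raw = self._db.get(key.encode())
        return pickle.loads(raw) if raw is not None else []

    def add(self, key: str, record) -> None:
        self._mem.setdefault(key, []).append(record)
        self._mem_count += 1
        if self._mem_count > self._max:
            self._spill()

    def _spill(self) -> None:
        # Append each in-memory list to its on-disk copy, then free the RAM.
        for key, records in self._mem.items():
            self._db[key.encode()] = pickle.dumps(self._load(key) + records)
        self._mem.clear()
        self._mem_count = 0

    def get(self, key: str) -> list:
        # Spilled (older) records first, then the ones still in memory.
        return self._load(key) + self._mem.get(key, [])

    def in_memory_count(self) -> int:
        return self._mem_count

    def close(self) -> None:
        self._db.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "window-state")
    state = SpillingListState(path, max_in_memory=2)
    for i in range(5):
        state.add("user-1", i)
    print(state.get("user-1"), state.in_memory_count())  # prints [0, 1, 2, 3, 4] 2
    state.close()
```

The trade-off is the one Jamie points at: RAM usage stays bounded, at the cost of serialization and disk I/O on every access to spilled state.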

-Jamie



-- 

Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier <https://twitter.com/jamiegrier>
jamie@data-artisans.com

Re: Memory Management in Streaming?

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

The memory management described in this wiki page only applies to the batch API. The streaming API currently uses the Java heap, but we are strongly considering introducing managed memory for streaming as well.

Best,
Stefan
