Posted to user@flink.apache.org by Vijay Bhaskar <bh...@gmail.com> on 2020/07/17 11:52:55 UTC

Flink FsStatebackend is giving better performance than RocksDB

Hi
While doing scale testing we observed that the FsStateBackend is
outperforming RocksDB.
When using RocksDB, off-heap memory keeps growing over time, and after a
day the pod was terminated with an OOM.

Whereas with the same data pattern, the FsStateBackend has been running
for days without any memory spike or OOM.

Based on the documentation, this is what I understood:

RocksDB is preferred when the state is too large to keep in memory. That
indirectly means RocksDB is preferred when the FsStateBackend performs
poorly as state size grows, right?

Another question: we have been running production on RocksDB for quite
some time; if we switch to the FsStateBackend, what are the consequences?
Here is what I can think of:
The very first time, I'll lose the state.
After that, savepoint and checkpoint behavior is the same (I know there
is no incremental checkpointing, but that is only a performance matter).

Other than that, are there any issues I should expect?

Regards
Bhaskar

Re: Flink FsStatebackend is giving better performance than RocksDB

Posted by Yun Tang <my...@live.com>.
Hi Vijay,

I think David has provided enough background on the difference between these two state backends.
Regarding the OOM problem with RocksDB: since Flink 1.10, RocksDB is bounded by managed memory. However, RocksDB will use as much of the managed memory as possible, and the remaining native memory it allocates is limited to the JVM overhead space [1]; consider increasing the memory of that part.


[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/memory/mem_setup.html#capped-fractionated-components
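The capped components mentioned above are tuned in flink-conf.yaml. As an illustration only (the option keys are Flink 1.10+ configuration options; the sizes and fractions below are example values, not recommendations):

```yaml
# flink-conf.yaml -- example values, tune for your workload

# Total memory for the TaskManager process (the container/pod budget).
taskmanager.memory.process.size: 4g

# Fraction of Flink memory reserved as managed memory; with the RocksDB
# state backend, RocksDB is bounded by this budget.
taskmanager.memory.managed.fraction: 0.4

# JVM overhead is a capped, fractionated component: native allocations
# outside managed memory must fit here. If the pod is OOM-killed while
# managed memory is respected, raising this cap is the usual remedy.
taskmanager.memory.jvm-overhead.fraction: 0.1
taskmanager.memory.jvm-overhead.min: 192m
taskmanager.memory.jvm-overhead.max: 1g
```

With `process.size` fixed, raising the JVM overhead cap shrinks what is left for the other components, so the two have to be adjusted together.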

Best,
Yun Tang
________________________________
From: David Anderson <da...@alpinegizmo.com>
Sent: Saturday, July 18, 2020 19:34
To: Vijay Bhaskar <bh...@gmail.com>
Cc: user <us...@flink.apache.org>
Subject: Re: Flink FsStatebackend is giving better performance than RocksDB

You should be able to tune your setup to avoid the OOM problem you have run into with RocksDB. It will grow to use all of the memory available to it, but shouldn't leak. Perhaps something is misconfigured.

As for performance, with the FSStateBackend you can expect:

* much better throughput and average latency
* possibly worse worst-case latency, due to GC pauses

Large, multi-slot TMs can be more of a problem with the FSStateBackend because of the increased scope for GC. You may want to run with more TMs, each with fewer slots.

You'll also be giving up the possibility of taking advantage of quick, local recovery [1] that comes for free when using incremental checkpoints with RocksDB. Local recovery can still be used with the FSStateBackend, but it comes at more of a cost.
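For reference, both the backend choice and the trade-offs above are configuration-level changes; a sketch, assuming a Flink 1.10/1.11-era setup (the paths are placeholders):

```yaml
# flink-conf.yaml -- paths are placeholders
state.backend: filesystem                # FsStateBackend; 'rocksdb' for RocksDB
state.checkpoints.dir: hdfs:///flink/checkpoints
state.savepoints.dir: hdfs:///flink/savepoints

# Task-local recovery still works with the heap-based backend, but it
# costs an extra on-disk copy of the full state at checkpoint time
# (no incremental checkpoints to piggyback on).
state.backend.local-recovery: true

# Fewer slots per TaskManager keeps each JVM heap smaller, which
# limits the scope for long GC pauses with the FsStateBackend.
taskmanager.numberOfTaskSlots: 2
```

Running more TaskManagers with fewer slots each trades a little coordination overhead for shorter worst-case pauses.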

As for migrating your state, you may be able to use the State Processor API [2] to rewrite a savepoint taken from RocksDB so that it can be loaded by the FSStateBackend, so long as you aren't using any windows or ListCheckpointed state.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#task-local-recovery
[2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html

Best,
David

On Fri, Jul 17, 2020 at 1:53 PM Vijay Bhaskar <bh...@gmail.com> wrote:
Hi
While doing scale testing we observed that the FsStateBackend is outperforming RocksDB.
When using RocksDB, off-heap memory keeps growing over time, and after a day the pod was terminated with an OOM.

Whereas with the same data pattern, the FsStateBackend has been running for days without any memory spike or OOM.

Based on the documentation, this is what I understood:

RocksDB is preferred when the state is too large to keep in memory. That indirectly means RocksDB is preferred when the FsStateBackend performs poorly as state size grows, right?

Another question: we have been running production on RocksDB for quite some time; if we switch to the FsStateBackend, what are the consequences?
Here is what I can think of:
The very first time, I'll lose the state.
After that, savepoint and checkpoint behavior is the same (I know there is no incremental checkpointing, but that is only a performance matter).

Other than that, are there any issues I should expect?

Regards
Bhaskar


Re: Flink FsStatebackend is giving better performance than RocksDB

Posted by David Anderson <da...@alpinegizmo.com>.
You should be able to tune your setup to avoid the OOM problem you have run
into with RocksDB. It will grow to use all of the memory available to it,
but shouldn't leak. Perhaps something is misconfigured.

As for performance, with the FSStateBackend you can expect:

* much better throughput and average latency
* possibly worse worst-case latency, due to GC pauses

Large, multi-slot TMs can be more of a problem with the FSStateBackend
because of the increased scope for GC. You may want to run with more TMs,
each with fewer slots.

You'll also be giving up the possibility of taking advantage of quick,
local recovery [1] that comes for free when using incremental checkpoints
with RocksDB. Local recovery can still be used with the FSStateBackend, but
it comes at more of a cost.

As for migrating your state, you may be able to use the State Processor API
[2] to rewrite a savepoint taken from RocksDB so that it can be loaded by
the FSStateBackend, so long as you aren't using any windows or
ListCheckpointed state.

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#task-local-recovery
[2]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html

Best,
David

On Fri, Jul 17, 2020 at 1:53 PM Vijay Bhaskar <bh...@gmail.com>
wrote:

> Hi
> While doing scale testing we observed that the FsStateBackend is
> outperforming RocksDB.
> When using RocksDB, off-heap memory keeps growing over time, and after a
> day the pod was terminated with an OOM.
>
> Whereas with the same data pattern, the FsStateBackend has been running
> for days without any memory spike or OOM.
>
> Based on the documentation, this is what I understood:
>
> RocksDB is preferred when the state is too large to keep in memory. That
> indirectly means RocksDB is preferred when the FsStateBackend performs
> poorly as state size grows, right?
>
> Another question: we have been running production on RocksDB for quite
> some time; if we switch to the FsStateBackend, what are the consequences?
> Here is what I can think of:
> The very first time, I'll lose the state.
> After that, savepoint and checkpoint behavior is the same (I know there
> is no incremental checkpointing, but that is only a performance matter).
>
> Other than that, are there any issues I should expect?
>
> Regards
> Bhaskar
>
>