Posted to user@flink.apache.org by vtygoss <vt...@126.com> on 2022/07/21 11:11:16 UTC

Using RocksDBStateBackend and SSD to store states, application runs slower..

Hi, community!


I am doing some performance tests based on my scenario.


1. Environment
- Flink: 1.13.5
- StateBackend: RocksDB, incremental
- use case: a complex SQL job containing 7 joins and 2 aggregations; input is 30,000,000 records and output is 60,000,000 records, about 80 GB.
- resources: Flink on YARN. JM 2 GB, one TM 24 GB (8 GB on-heap, 16 GB off-heap), 3 slots per TM
- only difference: the config 'state.backend.rocksdb.localdir' points to either one SATA disk or one SSD disk.


2. random-write performance difference between SATA and SSD
   4.8 MB/s is achieved using SATA, while 48.2 MB/s using SSD.
   ```
   fio -direct=1 -iodepth=64 -thread -rw=randwrite -ioengine=sync -fsync=1 -runtime=300 -group_reporting -name=xxx -size=100G -allow_mounted_write=1 -bs=8k -numjobs=64 -filename=/mnt/disk11/xx
   ``` 


3. In my use case, the Flink SQL application finished in 41 minutes using SATA, but 45 minutes using SSD.
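As a rough sanity check, Amdahl's law can be applied to the numbers above: if making random writes roughly 10x faster does not shorten the overall runtime at all, the fraction of wall-clock time actually spent waiting on state-backend disk I/O must be small. A minimal sketch (the helper function is illustrative, plugging in the fio and runtime figures from this thread):

```python
def amdahl_fraction(speedup_component, speedup_overall):
    """Solve Amdahl's law for the fraction f of runtime spent in the
    accelerated component: S_overall = 1 / ((1 - f) + f / S_component)."""
    s, o = speedup_component, speedup_overall
    return (1 - 1 / o) / (1 - 1 / s)

# Disk random-write speedup measured with fio: 48.2 / 4.8, about 10x
disk_speedup = 48.2 / 4.8
# Observed overall "speedup": 41 min (SATA) vs 45 min (SSD), i.e. below 1
overall = 41 / 45

f = amdahl_fraction(disk_speedup, overall)
# A non-positive fraction means disk write speed cannot explain the
# runtime at all: some other resource dominates.
print(f"implied I/O-bound fraction of runtime: {f:.2f}")
```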


Does this comparison suggest that using an SSD is not an effective way to improve RocksDB performance?
The direct downstream of the backpressured operator is the HDFS sink; does that mean the best target for improving application performance is HDFS?


Thanks for any replies or suggestions.


Best Regards!

Re: Using RocksDBStateBackend and SSD to store states, application runs slower..

Posted by "Teoh, Hong" <li...@amazon.co.uk>.
Hi,

I’d say it seems you are trying to identify bottlenecks in your job, and are currently looking at RocksDB disk I/O as one of them. However, there are other possible bottlenecks (e.g. CPU/memory/network/sink throttling), and from what you described, it’s possible that the HDFS sink is the bottleneck. Are you using Flink >= 1.13? If so, you can use flame graphs on the Flink dashboard to see what the busy operator is doing.
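For reference, the flame graph view is disabled by default in Flink 1.13 and can be switched on via the documented `rest.flamegraph.enabled` option:

```yaml
# flink-conf.yaml: enable the (experimental in 1.13) flame graph view
# in the web UI, used to see where a busy operator spends its time
rest.flamegraph.enabled: true
```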

Regards,
Hong




Re: Using RocksDBStateBackend and SSD to store states, application runs slower..

Posted by Jing Ge <ji...@ververica.com>.
Hi,

using FLASH_SSD_OPTIMIZED already sets the number of threads to 4. This
optimization can improve the source throughput and reduce the delayed write
rate.

If this optimization didn't fix the back pressure, could you share more
information about your job? Could you check the metrics of the
backpressured operator, e.g. to see whether it is caused by write-heavy or
read-heavy tasks? You could try tuning rocksdb.writebuffer for write-heavy
tasks.
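For reference, the write-buffer knobs mentioned above map to the following Flink options. The values below are illustrative starting points, not recommendations; note also that when managed memory is used for RocksDB (the default), Flink may override some of these sizes.

```yaml
# flink-conf.yaml: RocksDB write-buffer (memtable) tuning for write-heavy state.
# Values are examples only; tune against your own workload.
state.backend.rocksdb.writebuffer.size: 128mb        # size of a single memtable
state.backend.rocksdb.writebuffer.count: 4           # max memtables per column family
state.backend.rocksdb.writebuffer.number-to-merge: 2 # memtables merged before flush
```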


Re: Using RocksDBStateBackend and SSD to store states, application runs slower..

Posted by Yaroslav Tkachenko <ya...@goldsky.io>.
Hi!

I'd try re-running the SSD test with the following config options:

state.backend.rocksdb.thread.num: 4
state.backend.rocksdb.predefined-options: FLASH_SSD_OPTIMIZED

