Posted to issues@flink.apache.org by "Donatien (Jira)" <ji...@apache.org> on 2022/10/03 08:04:00 UTC

[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB

    [ https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612208#comment-17612208 ] 

Donatien commented on FLINK-29402:
----------------------------------

Thanks for your comment! Considering that Facebook uses DirectIO for reads and writes when benchmarking RocksDB (https://www.usenix.org/system/files/fast20-cao_zhichao.pdf), I would say it is best practice to also enable DirectIO for Flink benchmarks that use RocksDB. Disabling DirectIO can make experimental results unpredictable, because they then depend on (1) the container memory limit and (2) the amount of free heap memory used by the Page Cache. Again, I understand that this is only for research purposes, and I agree that it could be done programmatically.

> Add USE_DIRECT_READ configuration parameter for RocksDB
> -------------------------------------------------------
>
>                 Key: FLINK-29402
>                 URL: https://issues.apache.org/jira/browse/FLINK-29402
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.16.0
>            Reporter: Donatien
>            Priority: Not a Priority
>              Labels: Enhancement, pull-request-available, rocksdb
>             Fix For: 1.17.0
>
>         Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux Page Cache. To understand the impact of the Linux Page Cache on performance, one can run a heavy workload on a single-tasked Task Manager with a container memory limit identical to the TM process memory. Running the same workload on a TM with no container memory limit results in better performance, but host memory usage then exceeds what the TM requires.
> The Linux Page Cache is of course useful, but it can give misleading results when benchmarking the Managed Memory used by RocksDB. DirectIO is typically enabled for benchmarks on working set estimation (Zwaenepoel et al., https://arxiv.org/abs/1702.04323).
> I propose adding a configuration key that lets users enable DirectIO for reads through the RocksDB API. This option would be disabled by default.
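
For reference, until such a key exists the same effect can probably be obtained programmatically with a custom options factory. The following is only a minimal sketch, assuming Flink's RocksDBOptionsFactory interface and the RocksDB Java DBOptions.setUseDirectReads setter; the class name DirectReadOptionsFactory is hypothetical and not part of the proposed change.

```java
import java.util.Collection;

import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

/** Hypothetical options factory that enables direct I/O for RocksDB reads. */
public class DirectReadOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(
            DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Read SST files with direct I/O so that reads bypass the OS page cache.
        return currentOptions.setUseDirectReads(true);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Column family options are left unchanged in this sketch.
        return currentOptions;
    }
}
```

Such a factory could then be registered through the existing state.backend.rocksdb.options-factory setting or via EmbeddedRocksDBStateBackend#setRocksDBOptions. Direct I/O also requires support from the underlying file system, so this should be checked on the target environment before benchmarking.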


