Posted to jira@kafka.apache.org by "A. Sophie Blee-Goldman (Jira)" <ji...@apache.org> on 2021/05/04 23:18:00 UTC

[jira] [Updated] (KAFKA-12748) Explore new RocksDB options to consider enabling by default

     [ https://issues.apache.org/jira/browse/KAFKA-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

A. Sophie Blee-Goldman updated KAFKA-12748:
-------------------------------------------
    Description: 
With the RocksDB version bump come many new options, some of which look interesting enough to explore for use in Streams. We should try enabling these by default and run the benchmarks to check for any performance gain (or regression). See the javadocs for all Options [here|https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Options.html]


Options.setAvoidUnnecessaryBlockingIO:
    - As the name suggests, avoids blocking/long-latency tasks by scheduling background jobs to do them

Options.setSkipCheckingSstFileSizesOnDbOpen:
    - Speeds up startup time if there are many SST files, which could mean less overhead from things like rebalancing, where tasks are migrated between clients or threads. Not sure how many SST files count as "many"; may be less useful now that we've disabled bulk loading

Options.setBestEffortsRecovery:
    - Interesting feature that allows recovering from missing files without the use of the WAL. Could be useful if the on-disk state is corrupted (e.g. a user deletes a file), since it would avoid rebuilding state from scratch. I'd want to dig in further to understand exactly what it does and does not do. Not a performance improvement, but we should run the benchmarks to make sure it doesn't make performance worse.

Options.setWriteDbidToManifest:
    - Should be set to true if/when we ever need to rely on the DB id, e.g. for backups. Also not a performance improvement, but we should still benchmark this.



Options.optimizeForSmallDb:
    - This one is definitely not something we should set by default, as "small" here means under 1GB. But it's probably worth at least calling out in the docs for users who know their data set size (per store) is under 1GB
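
For anyone who wants to try these options before any defaults change, a minimal sketch of a custom config setter follows. It assumes a Kafka Streams version whose bundled rocksdbjni already exposes these setters (they arrived with the version bump mentioned above); the class name ExperimentalRocksDBConfigSetter is hypothetical, and turning every flag on at once is only for illustration. Whether each one is safe or beneficial is exactly what this ticket proposes to benchmark.

```java
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

// Hypothetical class name; this is an experiment harness, not part of Kafka Streams.
public class ExperimentalRocksDBConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        // Defer long-latency work to background jobs where RocksDB supports it.
        options.setAvoidUnnecessaryBlockingIO(true);

        // Skip the SST file size check when opening the DB; may help startup
        // time when a store has many SST files.
        options.setSkipCheckingSstFileSizesOnDbOpen(true);

        // Try to recover from missing files without the WAL. The exact semantics
        // still need investigation, so treat this as experimental.
        options.setBestEffortsRecovery(true);

        // Record the DB id in the MANIFEST, e.g. for future backup support.
        options.setWriteDbidToManifest(true);

        // Not suitable as a default: only for stores known to stay under ~1GB.
        // options.optimizeForSmallDb();
    }

    @Override
    public void close(final String storeName, final Options options) {
        // No RocksDB objects (caches, filters, ...) were created in setConfig,
        // so there is nothing to release here.
    }
}
```

The setter would be registered through the rocksdb.config.setter config (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG). For benchmarking, enabling one option at a time against a baseline run should make it easier to attribute any change in performance.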

  was:
With the RocksDB version bump come many new options, some of which look interesting enough to explore for use in Streams. We should try enabling these by default and run the benchmarks to check for any performance gain (or regression). See the javadocs for all Options [here|https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Options.html]


Options.setAvoidUnnecessaryBlockingIO:
    - As the name suggests, avoids blocking/long-latency tasks by scheduling background jobs to do them

Options.setBestEffortsRecovery:
    - Interesting feature that allows recovering from missing files without the use of the WAL. Could be useful if the on-disk state is corrupted (e.g. a user deletes a file), since it would avoid rebuilding state from scratch. I'd want to dig in further to understand exactly what it does and does not do. Not a performance improvement, but we should run the benchmarks to make sure it doesn't make performance worse.

Options.setWriteDbidToManifest:
    - Should be set to true if/when we ever need to rely on the DB id, e.g. for backups. Also not a performance improvement, but we should still benchmark this.


Options.optimizeForSmallDb:
    - This one is definitely not something we should set by default, as "small" here means under 1GB. But it's probably worth at least calling out in the docs for users who know their data set size (per store) is under 1GB


> Explore new RocksDB options to consider enabling by default
> -----------------------------------------------------------
>
>                 Key: KAFKA-12748
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12748
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major


