Posted to dev@flink.apache.org by Jun Qin <qi...@gmail.com> on 2021/02/07 18:22:54 UTC

Activate bloom filter in RocksDB State Backend via Flink configuration

Hi, 

Activating a bloom filter in the RocksDB state backend can improve read performance. Currently, a bloom filter can only be activated by implementing a custom ConfigurableRocksDBOptionsFactory. I think we should provide an option to activate it via the Flink configuration. What do you think? If you agree, what about the following configuration?

state.backend.rocksdb.bloom-filter.enabled: false (default)
state.backend.rocksdb.bloom-filter.bits-per-key: 10 (default)
state.backend.rocksdb.bloom-filter.block-based: true (default) 


Thanks
Jun

Re: Activate bloom filter in RocksDB State Backend via Flink configuration

Posted by Jun Qin <qi...@gmail.com>.
Thanks Till and Yun Tang. 

I’ve created https://issues.apache.org/jira/browse/FLINK-21336 and I will work on it.

Thanks
Jun

> On Feb 9, 2021, at 7:52 AM, Yun Tang <my...@live.com> wrote:
> 
> Hi Jun,
> 
> Some predefined options also activate bloom filters, e.g. PredefinedOptions#SPINNING_DISK_OPTIMIZED_HIGH_MEM, but I think offering a configurable option is a good idea. +1 for this.
> 
> Regarding the bloom filter default value, I slightly prefer the full format [1] over the old block-based format. This is related to FLINK-20496 [2], which tries to add an option to enable partitioned index & filter.
> 
> [1] https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#full-filters-new-format
> [2] https://issues.apache.org/jira/browse/FLINK-20496
> 
> Best
> Yun Tang


Re: Activate bloom filter in RocksDB State Backend via Flink configuration

Posted by Yun Tang <my...@live.com>.
Hi Jun,

Some predefined options also activate bloom filters, e.g. PredefinedOptions#SPINNING_DISK_OPTIMIZED_HIGH_MEM, but I think offering a configurable option is a good idea. +1 for this.

Regarding the bloom filter default value, I slightly prefer the full format [1] over the old block-based format. This is related to FLINK-20496 [2], which tries to add an option to enable partitioned index & filter.

[1] https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#full-filters-new-format
[2] https://issues.apache.org/jira/browse/FLINK-20496

Best
Yun Tang
________________________________
From: Till Rohrmann <tr...@apache.org>
Sent: Monday, February 8, 2021 17:06
To: dev <de...@flink.apache.org>
Subject: Re: Activate bloom filter in RocksDB State Backend via Flink configuration

Hi Jun,

Making things easier to use and configure is a good idea. Hence, +1 for
this proposal. Maybe create a JIRA ticket for it.

For the concrete default values it would be nice to hear the opinion of a
RocksDB expert.

Cheers,
Till


Re: Activate bloom filter in RocksDB State Backend via Flink configuration

Posted by Till Rohrmann <tr...@apache.org>.
Hi Jun,

Making things easier to use and configure is a good idea. Hence, +1 for
this proposal. Maybe create a JIRA ticket for it.

For the concrete default values it would be nice to hear the opinion of a
RocksDB expert.

Cheers,
Till


Re: Activate bloom filter in RocksDB State Backend via Flink configuration

Posted by maver1ck <ma...@brynski.pl>.
Hi Jun Qin,
Do you have an example OptionsFactory for the bloom filter?

I ran an experiment and changed the predefined options from FLASH_SSD_OPTIMIZED to
SPINNING_DISK_OPTIMIZED_HIGH_MEM.
This gave me 2x better performance on an NVMe disk.
I think the reason is that I'm doing a lot of reads and
SPINNING_DISK_OPTIMIZED_HIGH_MEM is the only one with a bloom filter enabled by
default.
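
As a rough illustration of why a bloom filter can help a read-heavy workload, the toy filter below answers "definitely absent" for most keys that were never added, so a store could skip the disk read for those lookups. This is a simplified sketch with made-up names (ToyBloomFilter, mightContain) and a simple double-hashing scheme; it is not RocksDB's implementation and not a Flink API.

```java
import java.util.BitSet;

public final class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public ToyBloomFilter(int expectedKeys, int bitsPerKey) {
        this.numBits = Math.max(64, expectedKeys * bitsPerKey);
        this.bits = new BitSet(numBits);
        // Number of probes per key, close to the theoretical optimum bitsPerKey * ln 2.
        this.numHashes = Math.max(1, (int) Math.round(bitsPerKey * Math.log(2.0)));
    }

    /** Records a key by setting one bit per probe. */
    public void add(String key) {
        int h1 = mix(key.hashCode());
        int h2 = mix(h1 ^ 0x9e3779b9) | 1;
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, numBits));
        }
    }

    /** Returns false only if the key was definitely never added. */
    public boolean mightContain(String key) {
        int h1 = mix(key.hashCode());
        int h2 = mix(h1 ^ 0x9e3779b9) | 1;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false; // one unset bit proves absence, so the read can be skipped
            }
        }
        return true; // possibly present: the caller still has to do the real lookup
    }

    /** Integer mixing step (the 32-bit finalizer used by MurmurHash3). */
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        ToyBloomFilter filter = new ToyBloomFilter(1000, 10);
        for (int i = 0; i < 1000; i++) {
            filter.add("key-" + i);
        }
        int falsePositives = 0;
        for (int i = 0; i < 1000; i++) {
            if (filter.mightContain("absent-" + i)) {
                falsePositives++;
            }
        }
        System.out.println("false positives among 1000 absent keys: " + falsePositives);
    }
}
```

A key that was added always reports true, so the filter never hides existing data; only lookups for missing keys get cheaper.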

Regards,
Maciek


