Posted to users@nifi.apache.org by "Hesselmann, Brian" <br...@cgi.com> on 2020/03/31 20:23:12 UTC

Performance of adding many keys to redis with PutDistributedMapCache

Hi,

We currently run a flow that puts about 700,000 entries/flowfiles into Redis every 5 minutes. I'm looking for ways to improve performance.

So far we have been increasing the number of concurrent tasks and the run duration of the PutDistributedMapCache processor so that it can process everything. I know Redis supports setting multiple keys at once using MSET (https://redis.io/commands/mset), but this command is not available through NiFi.

Short of simply upgrading the system we run NiFi/Redis on, do you have any suggestions for improving the performance of PutDistributedMapCache?

Best,
Brian
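
MSET helps because it costs one network round trip per batch of keys, where individual puts cost one round trip per key. The following is a minimal Python sketch. It assumes the entries are already available as (key, value) string pairs and uses a batch size of 1,000 purely for illustration. With those assumptions, 700,000 entries collapse into 700 MSET commands. The function name mset_batches is hypothetical. Sending the commands is left to whichever Redis client is used.

```python
from typing import Iterable, Iterator, List, Tuple


def mset_batches(pairs: Iterable[Tuple[str, str]], batch_size: int) -> Iterator[List[str]]:
    """Group (key, value) pairs into MSET argument lists of at most batch_size pairs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = ["MSET"]
    for key, value in pairs:
        batch.extend((key, value))
        # Every pair adds two arguments after the leading "MSET".
        if (len(batch) - 1) // 2 == batch_size:
            yield batch
            batch = ["MSET"]
    # Flush the final, partially filled batch (an MSET with no pairs would be rejected by Redis).
    if len(batch) > 1:
        yield batch


# Usage: 700,000 synthetic entries in batches of 1,000.
entries = ((f"key:{i}", f"value-{i}") for i in range(700_000))
commands = list(mset_batches(entries, 1_000))
print(len(commands))  # 700
```

Larger batches mean fewer round trips, but each command also takes longer for Redis to execute. The batch size is therefore a tuning choice, not a fixed answer.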

RE: Performance of adding many keys to redis with PutDistributedMapCache

Posted by Otto Fowler <ot...@gmail.com>.
Maybe something that used records and a record query on top of mset would
be the most efficient.
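
Otto's suggestion could work roughly like this: take a batch of records, let a query pick the key and value for each one, and send the matches as a single MSET. In this sketch, Python's built-in sqlite3 module stands in for the record query; NiFi's QueryRecord processor has its own SQL engine, not SQLite. The table name flowfile, its columns id, status and payload, and the device: key prefix are all hypothetical.

```python
import json
import sqlite3
from typing import Any, Dict, List


def records_to_mset(records: List[Dict[str, Any]], query: str) -> List[str]:
    """Run a SQL query over the records and turn each (key, value) row into MSET arguments.

    The query must return exactly two columns: the key and the value to store.
    Returns an empty list when nothing matches, since MSET needs at least one pair.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE flowfile (id TEXT, status TEXT, payload TEXT)")
        conn.executemany(
            "INSERT INTO flowfile VALUES (?, ?, ?)",
            # Nested payloads are stored as JSON text so they can be cached as-is.
            [(r["id"], r["status"], json.dumps(r["payload"])) for r in records],
        )
        args = ["MSET"]
        for key, value in conn.execute(query):
            args.extend((str(key), str(value)))
        return args if len(args) > 1 else []
    finally:
        conn.close()


records = [
    {"id": "1", "status": "active", "payload": {"temp": 21}},
    {"id": "2", "status": "inactive", "payload": {"temp": 19}},
    {"id": "3", "status": "active", "payload": {"temp": 23}},
]
query = "SELECT 'device:' || id, payload FROM flowfile WHERE status = 'active' ORDER BY id"
print(records_to_mset(records, query))
# ['MSET', 'device:1', '{"temp": 21}', 'device:3', '{"temp": 23}']
```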

On April 2, 2020 at 06:27:53, Hesselmann, Brian (brian.hesselmann@cgi.com)
wrote:

Hi Bryan and Mike,

Thanks for the responses. For now we have introduced an ExecuteStreamCommand
processor that calls redis-cli and runs the different commands directly. It
seems to improve performance, but we will have to look into introducing a
new processor or a different DB if necessary.

Thanks,
Brian

------------------------------
*From:* Mike Thomsen [mikerthomsen@gmail.com]
*Sent:* Wednesday, April 1, 2020 0:08
*To:* users@nifi.apache.org
*Subject:* Re: Performance of adding many keys to redis with
PutDistributedMapCache

Might be worth experimenting with KeyDB to see if that helps. It's a
multi-threaded fork of Redis that's supposedly about as fast on a single
node as a Redis cluster of the same size, when the number of cluster nodes
is compared to the KeyDB thread pool size.

https://keydb.dev/

On Tue, Mar 31, 2020 at 4:49 PM Bryan Bende <bb...@gmail.com> wrote:

> Hi Brian,
>
> I'm not sure what can really be done with the existing processor besides
> what you have already done. Have you configured your overall Timer Driven
> thread pool appropriately?
>
> Most likely there would need to be a new PutRedis processor that didn't
> have to adhere to the DistributedMapCacheClient interface and could use MSET or
> whatever specific Redis functionality was needed.
>
> Another option might be a record-based variation of PutDistributedMapCache
> where you could keep thousands of records together and stream them to the
> cache. It would take a record-path to specify the key for each record and
> serialize the record as the value (assuming your data fits into one of the
> record formats like JSON, Avro, CSV).
>
> -Bryan

RE: Performance of adding many keys to redis with PutDistributedMapCache

Posted by "Hesselmann, Brian" <br...@cgi.com>.
Hi Bryan and Mike,

Thanks for the responses. For now we have introduced an ExecuteStreamCommand processor that calls redis-cli and runs the different commands directly. It seems to improve performance, but we will have to look into introducing a new processor or a different DB if necessary.

Thanks,
Brian
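
Brian doesn't say which redis-cli commands the ExecuteStreamCommand step runs. One way to load many keys with redis-cli is its pipe mode (redis-cli --pipe). Pipe mode reads commands already encoded in the Redis protocol (RESP) from stdin and sends them without waiting for each reply. The sketch below makes a hypothetical assumption about the input: the flowfile content is one tab-separated key/value pair per line. It converts that content into RESP SET commands on stdout. The names resp_command, lines_to_resp and main are illustrative.

```python
import sys
from typing import Iterable


def resp_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # The bulk-string length is the UTF-8 byte length, not the character count.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def lines_to_resp(lines: Iterable[str]) -> bytes:
    """Turn 'key<TAB>value' lines into SET commands, skipping blank lines."""
    out = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # A non-blank line without a tab raises ValueError here.
        key, value = line.split("\t", 1)
        out.append(resp_command("SET", key, value))
    return b"".join(out)


def main() -> None:
    """Read the flowfile content from stdin and write RESP to stdout."""
    sys.stdout.buffer.write(lines_to_resp(sys.stdin))
```

If main() is called at the bottom of the saved script, the script could be chained as python3 to_resp.py | redis-cli --pipe, for example through sh -c in ExecuteStreamCommand. The file name to_resp.py is a placeholder. Pipe mode already pipelines the commands, so plain SET per key is used here; grouping pairs into MSET would also work.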

Re: Performance of adding many keys to redis with PutDistributedMapCache

Posted by Mike Thomsen <mi...@gmail.com>.
Might be worth experimenting with KeyDB to see if that helps. It's a
multi-threaded fork of Redis that's supposedly about as fast on a single
node as a Redis cluster of the same size, when the number of cluster nodes
is compared to the KeyDB thread pool size.

https://keydb.dev/

Re: Performance of adding many keys to redis with PutDistributedMapCache

Posted by Bryan Bende <bb...@gmail.com>.
Hi Brian,

I'm not sure what can really be done with the existing processor besides
what you have already done. Have you configured your overall Timer Driven
thread pool appropriately?

Most likely there would need to be a new PutRedis processor that didn't
have to adhere to the DistributedMapCacheClient interface and could use MSET or
whatever specific Redis functionality was needed.

Another option might be a record-based variation of PutDistributedMapCache
where you could keep thousands of records together and stream them to the
cache. It would take a record-path to specify the key for each record and
serialize the record as the value (assuming your data fits into one of the
record formats like JSON, Avro, CSV).

-Bryan
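
For each record, the record-based variation Bryan describes could evaluate a path to find the key and serialize the whole record as the value. The sketch below is not NiFi's RecordPath implementation: eval_simple_path understands only plain /parent/child steps, and JSON is assumed as the value format. Both function names are hypothetical. The resulting (key, value) pairs could then be grouped into MSET commands.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def eval_simple_path(record: Dict[str, Any], path: str) -> Any:
    """Resolve a '/parent/child' style path against nested dicts.

    Only plain child steps are supported; this is a toy subset, not NiFi's RecordPath.
    Returns None when any step is missing.
    """
    if not path.startswith("/"):
        raise ValueError(f"path must start with '/': {path!r}")
    current: Any = record
    for step in path.strip("/").split("/"):
        if not isinstance(current, dict) or step not in current:
            return None
        current = current[step]
    return current


def records_to_cache_entries(
    records: Iterable[Dict[str, Any]], key_path: str
) -> Iterator[Tuple[str, str]]:
    """Yield (key, serialized record) pairs, skipping records that have no key."""
    for record in records:
        key = eval_simple_path(record, key_path)
        if key is None:
            # Hypothetical choice: skip; a real processor might route these to failure.
            continue
        # Serialize the whole record as the cached value (compact JSON with sorted keys).
        yield str(key), json.dumps(record, separators=(",", ":"), sort_keys=True)
```

For example, with the key path /device/id, the record {"device": {"id": "d1"}, "temp": 21} becomes the pair ("d1", '{"device":{"id":"d1"},"temp":21}'). A record without a device field is skipped.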