Posted to users@nifi.apache.org by Nick Carenza <ni...@thecontrolgroup.com> on 2017/02/09 02:24:37 UTC

ControlRate across cluster

I have been running a standalone instance of NiFi and am preparing to move
to a cluster configuration. One aspect I am curious about is how
ControlRate is going to operate with n nodes. I am using ControlRate to
satisfy rate-limit requirements for external services.

My flow looks something like:

...
> ControlRate count 5000/sec
> ControlRate data   5MB/sec
> PutKinesisFirehose batch 500, buffer 4MB


I am trying to figure out how to throttle when I add a second node, which
will be running the same flow.

ControlRate might already run on the primary node only. I noticed in the
code that it has the @TriggerSerially annotation, which it shares with
ListS3 and ListSFTP, both isolated processors that only run on the primary
node. I don't know exactly what defines a processor as isolated, though. If
ControlRate is not isolated, one option would be to make it (optionally)
so. The description doesn't explicitly say whether it is or not, and I
couldn't find anything related to isolated processors in the developer
guide. Only the admin guide seems to use that terminology. Does anyone have
some insight on this?

I could divide the count and data rates on each ControlRate by the node
count (rate-limit/node-count). With batching, though, I think the nodes
might still be able to exceed the rate limit of a given stream unless I
also divided the batch sizes. This option doesn't seem great because I
don't want to have to update properties when adding/removing nodes.
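
To make the arithmetic of that option concrete, here is a minimal sketch in Python. It is only an illustration of the division, not anything NiFi does itself: the function name per_node_limits is hypothetical, and it assumes 1 MB = 1024 * 1024 bytes and an even split that rounds down so the per-node limits never add up to more than the cluster-wide limit.

```python
MB = 1024 * 1024  # assumption: 1 MB is 1024 * 1024 bytes


def per_node_limits(cluster_count_per_sec, cluster_bytes_per_sec, node_count):
    """Split cluster-wide limits evenly across nodes (hypothetical helper).

    Integer division rounds down, so the sum of the per-node limits
    never exceeds the cluster-wide limit.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return (cluster_count_per_sec // node_count,
            cluster_bytes_per_sec // node_count)


# The limits from the flow above (5000/sec and 5MB/sec), split over 2 nodes.
count_per_node, bytes_per_node = per_node_limits(5000, 5 * MB, 2)
print(count_per_node, bytes_per_node)
```

With two nodes this gives 2500 records/sec and 2621440 bytes/sec (2.5 MB/sec) per node. The values would still have to be entered on each ControlRate by hand and recomputed whenever the node count changes, which is the drawback described above.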

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#clustering
http://docs.aws.amazon.com/firehose/latest/dev/limits.html

Thanks,
Nick