Posted to users@nifi.apache.org by David Handermann <ex...@apache.org> on 2022/03/09 14:50:36 UTC

Re: Performance implications of RPGs for loadbalancing

Hi Isha,

Thanks for following up.  Regarding thread usage, one additional benefit of
load-balanced connections is that they have a separate thread pool for
handling sending and receiving FlowFiles.  Increasing the number of
concurrent tasks on any given component will not necessarily help, and may
actually have a negative impact on overall behavior. Changing the S2S
communication from HTTP to RAW may help, as it avoids using threads from
the web server pool, but in any case, switching to load-balanced
connections should provide better overall behavior.
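
For reference, the load-balanced connection framework sizes that thread pool
from nifi.properties. A minimal sketch, assuming roughly the current
defaults (the values below are illustrative only; please verify them against
the Administration Guide for your version):

    # Load-balanced connection settings (separate thread pool, not shared
    # with the Jetty web server pool or the timer-driven processor pool)
    nifi.cluster.load.balance.host=
    nifi.cluster.load.balance.port=6342
    nifi.cluster.load.balance.connections.per.node=1
    nifi.cluster.load.balance.max.thread.count=8
    nifi.cluster.load.balance.comms.timeout=30 sec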

Regards,
David Handermann

On Fri, Feb 25, 2022 at 3:06 AM Isha Lamboo <is...@virtualsciences.nl>
wrote:

> Hi David,
>
>
>
> Thanks for your reply.
>
>
>
> I’m confident I’ve identified the root cause now, having been able to
> reproduce the symptoms on a test cluster.
>
>
>
> The issue appears to be not just having many RPGs, but many of them
> pointing to the same input port (that leads to the generic audit logging
> flow).
>
> Even under low to moderate load, a point is eventually reached where the
> input port is not scheduled often enough, causing a chain reaction.
>
> Every RPG that gets scheduled then spends 30 seconds holding its thread
> until the connection to the input port times out. Because the RPGs and
> input ports are on the same cluster, and even on the same node for 1/3 of
> the files, this starves the input port (and frankly the whole NiFi
> instance) of threads, which increases the timeout issues, disconnects
> nodes, causes UI issues, etc.
>
>
>
> The solution is still migrating to load-balanced connections and removing
> the RPGs, though I worry about the same chain reaction if we simply replace
> the RPGs with local output ports all pointing to the same input port. That
> takes time to implement though, so I’m looking at tweaking settings to keep
> the system running for now.
>
>
>
> So my question this time around: What is supposed to happen if I set the
> concurrent tasks for an input port to a high number (say 10-20)? Will the
> port get scheduled with exactly that number of threads, or with as many as
> are available? If 10 threads are not available, will the port get scheduled
> at all?
>
>
>
> Regards,
>
>
>
> Isha
>
>
>
>
>
>
>
> *From:* David Handermann <ex...@apache.org>
> *Sent:* Wednesday, February 23, 2022 19:57
> *To:* users@nifi.apache.org
> *Subject:* Re: Performance implications of RPGs for loadbalancing
>
>
>
> Hi Isha,
>
>
>
> Thanks for providing some background on the configuration and related
> issues. Based on the issues you highlighted, it sounds like you are running
> into several known problems.  There are some potential workarounds, but
> refactoring the flow configuration to use standard connection load
> balancing is the best solution. Upgrading to NiFi 1.15.3 addresses a number
> of security and performance issues, including some of the items you
> mentioned.
>
>
>
> Related to the first problem, using RAW socket communication should be
> preferred for RPG communication.  RAW socket communication is not subject
> to the Denial-of-Service filter timeout, and also has less overhead than
> HTTP request processing.  Ensuring that all Remote Process Groups use RAW
> socket communication should help.  When HTTP requests exceed the DoS filter
> timeout, Jetty terminates the connection, which can produce any number of
> errors, such as the End-of-File and Connection Closed issues you have
> observed.
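>
> For reference, RAW transport also requires the Site-to-Site socket port to
> be configured on the receiving nodes in nifi.properties. A minimal sketch
> (the host name and port below are examples only, not your actual values):
>
>     # RAW Site-to-Site input socket, per node
>     nifi.remote.input.host=nifi-node1.example.com
>     nifi.remote.input.secure=true
>     nifi.remote.input.socket.port=10443
>     # HTTP Site-to-Site may remain enabled, but the RPGs should use RAW
>     nifi.remote.input.http.enabled=true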
>
>
>
> Using HTTP communication also uses threads from the Jetty server, which
> can impact user interface performance. This might also be part of the
> explanation for cluster nodes getting out of sync, but there could be other
> factors involved.
>
>
>
> NiFi 1.12.1 is affected by several known issues related to the
> Denial-of-Service filter and Site-to-Site communication, which have been
> addressed in more recent releases.  Here are a few worth noting:
>
> - https://issues.apache.org/jira/browse/NIFI-7912
> Added new nifi.web.request properties that can be used to change the
> default 30 second timeout and exclude IP addresses from filtering (see the
> example after this list)
>
> - https://issues.apache.org/jira/browse/NIFI-9448
> Resolved potential IllegalStateException for S2S client communication
>
> - https://issues.apache.org/jira/browse/NIFI-9481
> Exclude HTTP Site-to-Site Communication from DoS Filter
>
>
>
> The last issue is not yet part of a released version, but the other two
> are resolved in NiFi 1.15.3.
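>
> As an example of the NIFI-7912 properties noted above, the relevant entries
> go in nifi.properties. This is only a sketch from memory of the
> Administration Guide; please verify the exact property names and defaults
> against the release you upgrade to:
>
>     # Request timeout enforced by the web request filter
>     # (replaces the fixed 30 second DoS filter timeout)
>     nifi.web.request.timeout=60 secs
>     # Comma-separated list of IP addresses excluded from request filtering
>     nifi.web.request.ip.whitelist=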
>
>
>
> Although upgrading and migrating to connection load balancing will take
> some work, it is the best path forward to address the issues you observed.
>
>
>
> Regards,
>
> David Handermann
>
>
>
> On Wed, Feb 23, 2022 at 11:55 AM Isha Lamboo <
> isha.lamboo@virtualsciences.nl> wrote:
>
> Hi all,
>
>
>
> I’m hoping to get some perspective from people who run NiFi with a large
> number of Remote Process Groups.
>
>
>
> I’m supporting a NiFi 1.12.1 (yes, I know) cluster of 3 nodes that has
> about 5k processors, with load-balancing still done the pre-1.8 way: RPGs
> looping back to the local cluster. There are 500+ RPGs, with only about 30
> actually going to other NiFi clusters.
>
>
>
> We’re having several problems:
>
> ·         Input ports getting stuck when the RPG is set to the HTTP
> protocol and connections get killed by the Jetty DoS filter after 30 secs.
> The standard is RAW, but sometimes an HTTP RPG still gets deployed.
>
> ·         Intermittent errors like EOF, connection closed, etc. on HTTP
> connections
>
> ·         The cluster being unable to sync changes made to the flow,
> resulting in disconnected nodes and sometimes uninheritable flow exceptions.
>
>
>
> My idea is that the RPGs should be replaced by load-balanced connections
> and/or local ports, but developer resources are scarce, so I want to either
> make a business case or tune NiFi performance if 500 RPGs should not
> normally cause problems.
>
>
>
> So is this a known issue or particular to my case? How can I
> identify/solve performance bottlenecks with RPGs?
>
>
>
> Kind regards,
>
>
>
> Isha Lamboo
>
>
>
>