Posted to users@nifi.apache.org by Eric Secules <es...@gmail.com> on 2022/04/19 19:34:47 UTC

NiFi web UI not responsive under load

Hello,

When my NiFi system comes under high load, the web UI becomes unresponsive
until the load comes back down. Is there a way I can see what's going on
(processor status summary, queued count, active thread count) while the UI
is unresponsive?

The logs are not showing any errors, and the various repositories are all
mounted to separate volumes with elastic capacity, so I am sure that none of
them ran out of space. Our monitoring shows bursts of CPU usage and
lower-than-normal memory use.

The logs show that the StandardProcessScheduler stops processors and then
starts them again, but I never see any logs indicating that the UI is ready
to serve requests. This happens about once an hour. Based on log activity
and the databases, I can see that the flow is slowly making progress.

How can I see what's going on when the web UI is not responding?
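
For concreteness, what I'd like is something I can query directly even when
the canvas is slow, for example by hitting the REST API the UI itself uses.
A rough sketch of the sort of thing I mean (assuming an unsecured instance on
localhost:8080; a secured one would also need an Authorization header, and if
Jetty itself is stuck these will stall just like the browser):

  # Controller-level status: active thread count, total queued count and size
  curl -s http://localhost:8080/nifi-api/flow/status
  # Per-component status summary for the root process group
  curl -s http://localhost:8080/nifi-api/flow/process-groups/root/status
  # Heap, GC, and repository usage
  curl -s http://localhost:8080/nifi-api/system-diagnostics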

Thanks,
Eric

Re: NiFi web UI not responsive under load

Posted by Eric Secules <es...@gmail.com>.
Hi Mark,
Here are the results from top:
[image: image.png]

Are there other actions that the system performs that could prevent the
HTTP server from accepting connections?

We did a number of things, including lowering the max JVM heap allocation,
changing the long-running task monitor NiFi property, and recreating the
whole container from scratch. One of those fixed my issue, and we can now
see the UI and files flowing through again.
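
For anyone who lands on this thread with the same symptoms, a sketch of where
those two knobs live. The heap numbers below are illustrative only, not the
exact values we settled on, and the paths assume the stock Docker image
layout (/opt/nifi/nifi-current):

  # conf/bootstrap.conf: the JVM heap settings we lowered (example values; ours had been -Xms12g / -Xmx14g)
  #   java.arg.2=-Xms4g
  #   java.arg.3=-Xmx8g
  grep -nE 'java\.arg\.(2|3)=' /opt/nifi/nifi-current/conf/bootstrap.conf

  # conf/nifi.properties: the long-running task monitor property Mark mentions further down the thread
  grep -n 'nifi.monitor.long.running.task' /opt/nifi/nifi-current/conf/nifi.properties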

Thanks Mark and Joe!

-Eric

On Tue, Apr 19, 2022 at 1:52 PM Mark Payne <ma...@hotmail.com> wrote:

> Eric,
>
> How much memory is available on the VM/system? Curious if maybe oom_killer
> or some process like that is killing nifi. Might want to check system logs
> like dmesg to see if there’s anything there.
>
> Thanks
> -Mark
>
> On Apr 19, 2022, at 4:28 PM, Eric Secules <es...@gmail.com> wrote:
>
> Hi Joe & Mark,
>
> I'll work on getting those JVM stats. But the instance is locked down
> pretty tight so getting GC, and Thread dump will be difficult.
>
> We renamed the flow.xml.gz file and restarted with the hope of seeing some
> UI. The overall CPU usage is lower, but that is not helping the UI.
>
> Bootstrap log says:
>
>> /opt/nifi/nifi-current/logs$ tail nifi-bootstrap.log
>> 2022-04-19 19:43:03,104 INFO [main] o.a.n.b.NotificationServiceManager
>> Successfully loaded the following 0 services: []
>> 2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi
>> Registered no Notification Services for Notification Type NIFI_STARTED
>> 2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi
>> Registered no Notification Services for Notification Type NIFI_STOPPED
>> 2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi
>> Registered no Notification Services for Notification Type NIFI_DIED
>> 2022-04-19 19:43:03,143 INFO [main] org.apache.nifi.bootstrap.RunNiFi
>> Runtime Java version: 1.8.0_265
>> 2022-04-19 19:43:03,154 INFO [main] org.apache.nifi.bootstrap.Command
>> Starting Apache NiFi...
>> 2022-04-19 19:43:03,155 INFO [main] org.apache.nifi.bootstrap.Command
>> Working Directory: /opt/nifi/nifi-current
>> 2022-04-19 19:43:03,155 INFO [main] org.apache.nifi.bootstrap.Command
>> Command: /usr/local/openjdk-8/bin/java -classpath
>> /opt/nifi/nifi-current/./conf:/opt/nifi/nifi-current/./lib/nifi-nar-utils-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-runtime-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-framework-api-1.12.1.jar:/opt/nifi/nifi-current/./lib/logback-classic-1.2.3.jar:/opt/nifi/nifi-current/./lib/javax.servlet-api-3.1.0.jar:/opt/nifi/nifi-current/./lib/slf4j-api-1.7.30.jar:/opt/nifi/nifi-current/./lib/log4j-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/jul-to-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-properties-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-api-1.12.1.jar:/opt/nifi/nifi-current/./lib/jcl-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/logback-core-1.2.3.jar:/opt/nifi/nifi-current/./lib/jetty-schemas-3.1.jar
>> -Dorg.apache.jasper.compiler.disablejsr199=true -Xmx14g -Xms12g
>> -Dcurator-log-only-first-connection-issue-as-error-level=true
>> -Djavax.security.auth.useSubjectCredsOnly=true
>> -Djava.security.egd=file:/dev/urandom -Dzookeeper.admin.enableServer=false
>> -Dsun.net.http.allowRestrictedHeaders=true -Djava.net.preferIPv4Stack=true
>> -Djava.awt.headless=true -Djava.protocol.handler.pkgs=sun.net.www.protocol
>> -Dnifi.properties.file.path=/opt/nifi/nifi-current/./conf/nifi.properties
>> -Dnifi.bootstrap.listen.port=34211 -Dapp=NiFi
>> -Dorg.apache.nifi.bootstrap.config.log.dir=/opt/nifi/nifi-current/logs
>> org.apache.nifi.NiFi
>> 2022-04-19 19:43:08,517 INFO [main] org.apache.nifi.bootstrap.Command
>> Launched Apache NiFi with Process ID 123
>> 2022-04-19 19:43:13,834 INFO [NiFi Bootstrap Command Listener]
>> org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for
>> Bootstrap requests on port 44417
>
>
> Mark, our nifi.properties has no value set for that property, so we are
> using whatever the default is i guess. Since it's 1.14.0 we will take your
> suggestion and set it to a long time.
>
> CPU taken by nifi is 37% (3.5 CPU cores) and 22.2% memory the jvm was
> started with -Xmx14g -Xms12g
> Disk IO is happening, but not at a high rate.
>
> App log contains these messages before the Jetty shutdown message
>
>> {"level":"INFO","thread":"main","message":"o.a.n.r.v.FileBasedVariableRegistry
>>  Loaded 113 properties from system properties and environment
>> variables"}
>>
>> {"level":"INFO","thread":"main","message":"o.a.n.r.v.FileBasedVariableRegistry
>>  Loaded a total of 113 properties.  Including precedence overrides
>> effective accessible registry key size is 113"}
>>
>> {"level":"INFO","thread":"main","message":"o.a.n.p.store.WriteAheadStorePartition
>>  After recovering ./provenance_repository, next Event ID to be generated
>>  will be 13089684"}
>>
>> {"level":"INFO","thread":"main","message":"o.a.n.p.index.lucene.LuceneEventIndex
>>  Will avoid re-indexing Provenance Events because the newest index is
>> defunct, so it will be re-indexed in the background"}
>>
>> {"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.index.lucene.LuceneEventIndex
>>  Determined that Max Event ID indexed for Partition default is
>> approximately 13082369 based on index
>> ./provenance_repository/lucene-8-index-1649528440360"}
>>
>> {"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.store.WriteAheadStorePartition
>>  The last Provenance Event indexed for partition default is 13072369,
>> but the last event written to partition has ID 13089683. Re-indexing up
>> to the last 17314 events for ./provenance_repository to ensure that the
>> Event Index is accurate and up-to-date"}
>>
>> {"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.store.WriteAheadStorePartition
>>  Finished re-indexing 17315 events across 2 files for
>> ./provenance_repository in 9.713 seconds"}
>>
>> {"level":"INFO","thread":"main","message":"o.a.n.c.repository.FileSystemRepository
>>  Maximum Threshold for Container default set to 2858730232217 bytes; if
>> volume exceeds this size, archived data will be deleted until it no
>> longer exceeds this size"}
>>
>> {"level":"INFO","thread":"main","message":"o.a.n.c.repository.FileSystemRepository
>>  Initializing FileSystemRepository with 'Always Sync' set to false"}
>>
>> {"level":"INFO","thread":"Thread-1","message":"org.apache.nifi.NiFi Initiating shutdown of Jetty web server..."}
>>
>>
> But there are no system error or warning messages.
>
> Despite seeing logs that the webserver is listening, we always get
> "connection refused" when trying to communicate with it.
>
> Thanks,
> Eric
>
>
> On Tue, Apr 19, 2022 at 12:57 PM Mark Payne <ma...@hotmail.com> wrote:
>
>> Eric,
>>
>> I certainly agree with what Joe said. I would also recommend checking in
>> nifi.properties if you have a value set for:
>>
>> nifi.monitor.long.running.task.schedule
>>
>> I recommend setting that to “9999 hours”
>>
>> In 1.14.0, we introduced the notion of a Long-Running Task Monitor. It’s
>> generally very fast. Typically runs in 10s of milliseconds on my macbook.
>> But it relies on JVM-specific code, and we’ve seen in some environments
>> that can cause the UI responsiveness to be very adversely affected. We
>> disabled the task monitor by default in 1.15, I believe, because of this.
>>
>> Thanks
>> -Mark
>>
>>
>> On Apr 19, 2022, at 3:44 PM, Joe Witt <jo...@gmail.com> wrote:
>>
>> Eric
>>
>> When the UI isn't responsive it would be great to have a snapshot of:
>> - CPU usage at that time
>> - GC behavior/logging at/around that time.
>> - IO Utilization around that time
>> - NiFi Thread dump precisely during it and ideally also one after it
>> responds again
>>
>> NiFi Restarting itself is very interesting of course.  There should be
>> more in the app log and bootstrap that will help illuminate the issue
>> then.
>>
>> Thanks
>>
>>
>> On Tue, Apr 19, 2022 at 12:42 PM Eric Secules <es...@gmail.com> wrote:
>>
>>> By the way, I am running NiFi 1.14.0 and it looks like it keeps
>>> restarting itself. I am seeing this in the logs about once an hour.
>>>
>>> {"level":"INFO","thread":"main","message":"org.apache.nifi.NiFi
>>>> Controller initialization took 4737354582168 nanoseconds (4737 seconds)
>>>> ."}
>>>>
>>>
>>> On Tue, Apr 19, 2022 at 12:34 PM Eric Secules <es...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> When my nifi system goes under high load the web UI becomes
>>>> unresponsive until load comes down. Is there a way I can see what's going
>>>> on (processor status summary, queued count, active thread count) when the
>>>> UI is unresponsive?
>>>>
>>>> The logs are not showing any errors and the various repositories are
>>>> all mounted to separate volumes with elastic capacity so I am sure that
>>>> none of them ran out of space. Our monitoring shows bursts of CPU usage and
>>>> memory use lower than normal.
>>>>
>>>> The logs show that the StandardProcessScheduler stops processors
>>>> followed by starting them, but I never see logs related to the UI being
>>>> ready to serve. It does this about once an hour. I see that the flow is
>>>> slowly processing based on log activity and databases.
>>>>
>>>> How can I see what's going on when the web UI is not responding?
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>
>>
>

Re: NiFi web UI not responsive under load

Posted by Mark Payne <ma...@hotmail.com>.
Eric,

How much memory is available on the VM/system? Curious if maybe oom_killer or some process like that is killing NiFi. You might want to check system logs like dmesg to see if there’s anything there.
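
A quick sketch of that check, assuming shell access on the host and that the
usual tools are present:

  # Kernel ring buffer: look for OOM-killer activity around the time NiFi went away
  dmesg -T | grep -iE 'out of memory|oom|killed process'
  # On systemd hosts the same information usually shows up in the journal
  journalctl -k --since "2 hours ago" | grep -iE 'oom|killed process'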

Thanks
-Mark

On Apr 19, 2022, at 4:28 PM, Eric Secules <es...@gmail.com> wrote:

Hi Joe & Mark,

I'll work on getting those JVM stats. But the instance is locked down pretty tight so getting GC, and Thread dump will be difficult.

We renamed the flow.xml.gz file and restarted with the hope of seeing some UI. The overall CPU usage is lower, but that is not helping the UI.

Bootstrap log says:
/opt/nifi/nifi-current/logs$ tail nifi-bootstrap.log
2022-04-19 19:43:03,104 INFO [main] o.a.n.b.NotificationServiceManager Successfully loaded the following 0 services: []
2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi Registered no Notification Services for Notification Type NIFI_STARTED
2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi Registered no Notification Services for Notification Type NIFI_STOPPED
2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi Registered no Notification Services for Notification Type NIFI_DIED
2022-04-19 19:43:03,143 INFO [main] org.apache.nifi.bootstrap.RunNiFi Runtime Java version: 1.8.0_265
2022-04-19 19:43:03,154 INFO [main] org.apache.nifi.bootstrap.Command Starting Apache NiFi...
2022-04-19 19:43:03,155 INFO [main] org.apache.nifi.bootstrap.Command Working Directory: /opt/nifi/nifi-current
2022-04-19 19:43:03,155 INFO [main] org.apache.nifi.bootstrap.Command Command: /usr/local/openjdk-8/bin/java -classpath /opt/nifi/nifi-current/./conf:/opt/nifi/nifi-current/./lib/nifi-nar-utils-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-runtime-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-framework-api-1.12.1.jar:/opt/nifi/nifi-current/./lib/logback-classic-1.2.3.jar:/opt/nifi/nifi-current/./lib/javax.servlet-api-3.1.0.jar:/opt/nifi/nifi-current/./lib/slf4j-api-1.7.30.jar:/opt/nifi/nifi-current/./lib/log4j-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/jul-to-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-properties-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-api-1.12.1.jar:/opt/nifi/nifi-current/./lib/jcl-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/logback-core-1.2.3.jar:/opt/nifi/nifi-current/./lib/jetty-schemas-3.1.jar -Dorg.apache.jasper.compiler.disablejsr199=true -Xmx14g -Xms12g -Dcurator-log-only-first-connection-issue-as-error-level=true -Djavax.security.auth.useSubjectCredsOnly=true -Djava.security.egd=file:/dev/urandom -Dzookeeper.admin.enableServer=false -Dsun.net.http.allowRestrictedHeaders=true -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Djava.protocol.handler.pkgs=sun.net.www.protocol -Dnifi.properties.file.path=/opt/nifi/nifi-current/./conf/nifi.properties -Dnifi.bootstrap.listen.port=34211 -Dapp=NiFi -Dorg.apache.nifi.bootstrap.config.log.dir=/opt/nifi/nifi-current/logs org.apache.nifi.NiFi
2022-04-19 19:43:08,517 INFO [main] org.apache.nifi.bootstrap.Command Launched Apache NiFi with Process ID 123
2022-04-19 19:43:13,834 INFO [NiFi Bootstrap Command Listener] org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for Bootstrap requests on port 44417

Mark, our nifi.properties has no value set for that property, so we are using whatever the default is i guess. Since it's 1.14.0 we will take your suggestion and set it to a long time.

CPU taken by nifi is 37% (3.5 CPU cores) and 22.2% memory the jvm was started with -Xmx14g -Xms12g
Disk IO is happening, but not at a high rate.

App log contains these messages before the Jetty shutdown message

{"level":"INFO","thread":"main","message":"o.a.n.r.v.FileBasedVariableRegistry
 Loaded 113 properties from system properties and environment
variables"}

{"level":"INFO","thread":"main","message":"o.a.n.r.v.FileBasedVariableRegistry
 Loaded a total of 113 properties.  Including precedence overrides
effective accessible registry key size is 113"}

{"level":"INFO","thread":"main","message":"o.a.n.p.store.WriteAheadStorePartition
 After recovering ./provenance_repository, next Event ID to be generated
 will be 13089684"}

{"level":"INFO","thread":"main","message":"o.a.n.p.index.lucene.LuceneEventIndex
 Will avoid re-indexing Provenance Events because the newest index is
defunct, so it will be re-indexed in the background"}

{"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.index.lucene.LuceneEventIndex
 Determined that Max Event ID indexed for Partition default is
approximately 13082369 based on index
./provenance_repository/lucene-8-index-1649528440360"}

{"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.store.WriteAheadStorePartition
 The last Provenance Event indexed for partition default is 13072369,
but the last event written to partition has ID 13089683. Re-indexing up
to the last 17314 events for ./provenance_repository to ensure that the
Event Index is accurate and up-to-date"}

{"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.store.WriteAheadStorePartition
 Finished re-indexing 17315 events across 2 files for
./provenance_repository in 9.713 seconds"}

{"level":"INFO","thread":"main","message":"o.a.n.c.repository.FileSystemRepository
 Maximum Threshold for Container default set to 2858730232217 bytes; if
volume exceeds this size, archived data will be deleted until it no
longer exceeds this size"}

{"level":"INFO","thread":"main","message":"o.a.n.c.repository.FileSystemRepository
 Initializing FileSystemRepository with 'Always Sync' set to false"}

{"level":"INFO","thread":"Thread-1","message":"org.apache.nifi.NiFi Initiating shutdown of Jetty web server..."}

But there are no system error or warning messages.

Despite seeing logs that the webserver is listening, we always get "connection refused" when trying to communicate with it.

Thanks,
Eric


On Tue, Apr 19, 2022 at 12:57 PM Mark Payne <ma...@hotmail.com> wrote:
Eric,

I certainly agree with what Joe said. I would also recommend checking in nifi.properties if you have a value set for:


nifi.monitor.long.running.task.schedule

I recommend setting that to “9999 hours”

In 1.14.0, we introduced the notion of a Long-Running Task Monitor. It’s generally very fast. Typically runs in 10s of milliseconds on my macbook. But it relies on JVM-specific code, and we’ve seen in some environments that can cause the UI responsiveness to be very adversely affected. We disabled the task monitor by default in 1.15, I believe, because of this.

Thanks
-Mark


On Apr 19, 2022, at 3:44 PM, Joe Witt <jo...@gmail.com> wrote:

Eric

When the UI isn't responsive it would be great to have a snapshot of:
- CPU usage at that time
- GC behavior/logging at/around that time.
- IO Utilization around that time
- NiFi Thread dump precisely during it and ideally also one after it responds again

NiFi Restarting itself is very interesting of course.  There should be more in the app log and bootstrap that will help illuminate the issue then.

Thanks


On Tue, Apr 19, 2022 at 12:42 PM Eric Secules <es...@gmail.com> wrote:
By the way, I am running NiFi 1.14.0 and it looks like it keeps restarting itself. I am seeing this in the logs about once an hour.

{"level":"INFO","thread":"main","message":"org.apache.nifi.NiFi Controller initialization took 4737354582168 nanoseconds (4737 seconds)."}

On Tue, Apr 19, 2022 at 12:34 PM Eric Secules <es...@gmail.com> wrote:
Hello,

When my nifi system goes under high load the web UI becomes unresponsive until load comes down. Is there a way I can see what's going on (processor status summary, queued count, active thread count) when the UI is unresponsive?

The logs are not showing any errors and the various repositories are all mounted to separate volumes with elastic capacity so I am sure that none of them ran out of space. Our monitoring shows bursts of CPU usage and memory use lower than normal.

The logs show that the StandardProcessScheduler stops processors followed by starting them, but I never see logs related to the UI being ready to serve. It does this about once an hour. I see that the flow is slowly processing based on log activity and databases.

How can I see what's going on when the web UI is not responding?

Thanks,
Eric



Re: NiFi web UI not responsive under load

Posted by Eric Secules <es...@gmail.com>.
Hi Joe & Mark,

I'll work on getting those JVM stats, but the instance is locked down
pretty tight, so getting GC logs and a thread dump will be difficult.
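
Once we do get shell access, this is roughly what we plan to capture. A
sketch that assumes a docker-style container named "nifi" and that the image
ships the full JDK tools (the stock image may only carry a JRE, in which case
only the nifi.sh line applies). PID 123 is the process ID reported in the
bootstrap log below:

  # Thread dump written by the bootstrap into the logs directory
  docker exec nifi /opt/nifi/nifi-current/bin/nifi.sh dump /opt/nifi/nifi-current/logs/thread-dump.txt
  # GC utilization, sampled once a second for 30 samples
  docker exec nifi jstat -gcutil 123 1000 30
  # Thread dump straight from the JVM, as an alternative
  docker exec nifi jcmd 123 Thread.print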

We renamed the flow.xml.gz file and restarted in the hope of getting the UI
back. The overall CPU usage is lower, but that has not helped the UI.

Bootstrap log says:

> /opt/nifi/nifi-current/logs$ tail nifi-bootstrap.log
> 2022-04-19 19:43:03,104 INFO [main] o.a.n.b.NotificationServiceManager
> Successfully loaded the following 0 services: []
> 2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi
> Registered no Notification Services for Notification Type NIFI_STARTED
> 2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi
> Registered no Notification Services for Notification Type NIFI_STOPPED
> 2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi
> Registered no Notification Services for Notification Type NIFI_DIED
> 2022-04-19 19:43:03,143 INFO [main] org.apache.nifi.bootstrap.RunNiFi
> Runtime Java version: 1.8.0_265
> 2022-04-19 19:43:03,154 INFO [main] org.apache.nifi.bootstrap.Command
> Starting Apache NiFi...
> 2022-04-19 19:43:03,155 INFO [main] org.apache.nifi.bootstrap.Command
> Working Directory: /opt/nifi/nifi-current
> 2022-04-19 19:43:03,155 INFO [main] org.apache.nifi.bootstrap.Command
> Command: /usr/local/openjdk-8/bin/java -classpath
> /opt/nifi/nifi-current/./conf:/opt/nifi/nifi-current/./lib/nifi-nar-utils-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-runtime-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-framework-api-1.12.1.jar:/opt/nifi/nifi-current/./lib/logback-classic-1.2.3.jar:/opt/nifi/nifi-current/./lib/javax.servlet-api-3.1.0.jar:/opt/nifi/nifi-current/./lib/slf4j-api-1.7.30.jar:/opt/nifi/nifi-current/./lib/log4j-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/jul-to-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-properties-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-api-1.12.1.jar:/opt/nifi/nifi-current/./lib/jcl-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/logback-core-1.2.3.jar:/opt/nifi/nifi-current/./lib/jetty-schemas-3.1.jar
> -Dorg.apache.jasper.compiler.disablejsr199=true -Xmx14g -Xms12g
> -Dcurator-log-only-first-connection-issue-as-error-level=true
> -Djavax.security.auth.useSubjectCredsOnly=true
> -Djava.security.egd=file:/dev/urandom -Dzookeeper.admin.enableServer=false
> -Dsun.net.http.allowRestrictedHeaders=true -Djava.net.preferIPv4Stack=true
> -Djava.awt.headless=true -Djava.protocol.handler.pkgs=sun.net.www.protocol
> -Dnifi.properties.file.path=/opt/nifi/nifi-current/./conf/nifi.properties
> -Dnifi.bootstrap.listen.port=34211 -Dapp=NiFi
> -Dorg.apache.nifi.bootstrap.config.log.dir=/opt/nifi/nifi-current/logs
> org.apache.nifi.NiFi
> 2022-04-19 19:43:08,517 INFO [main] org.apache.nifi.bootstrap.Command
> Launched Apache NiFi with Process ID 123
> 2022-04-19 19:43:13,834 INFO [NiFi Bootstrap Command Listener]
> org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for
> Bootstrap requests on port 44417


Mark, our nifi.properties has no value set for that property, so I guess we
are using whatever the default is. Since we're on 1.14.0, we will take your
suggestion and set it to a long interval.

CPU used by NiFi is 37% (3.5 CPU cores) and memory is 22.2%; the JVM was
started with -Xmx14g -Xms12g.
Disk IO is happening, but not at a high rate.

The app log contains these messages before the Jetty shutdown message:

> {"level":"INFO","thread":"main","message":"o.a.n.r.v.FileBasedVariableRegistry
>  Loaded 113 properties from system properties and environment
> variables"}
>
> {"level":"INFO","thread":"main","message":"o.a.n.r.v.FileBasedVariableRegistry
>  Loaded a total of 113 properties.  Including precedence overrides
> effective accessible registry key size is 113"}
>
> {"level":"INFO","thread":"main","message":"o.a.n.p.store.WriteAheadStorePartition
>  After recovering ./provenance_repository, next Event ID to be generated
>  will be 13089684"}
>
> {"level":"INFO","thread":"main","message":"o.a.n.p.index.lucene.LuceneEventIndex
>  Will avoid re-indexing Provenance Events because the newest index is
> defunct, so it will be re-indexed in the background"}
>
> {"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.index.lucene.LuceneEventIndex
>  Determined that Max Event ID indexed for Partition default is
> approximately 13082369 based on index
> ./provenance_repository/lucene-8-index-1649528440360"}
>
> {"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.store.WriteAheadStorePartition
>  The last Provenance Event indexed for partition default is 13072369,
> but the last event written to partition has ID 13089683. Re-indexing up
> to the last 17314 events for ./provenance_repository to ensure that the
> Event Index is accurate and up-to-date"}
>
> {"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.store.WriteAheadStorePartition
>  Finished re-indexing 17315 events across 2 files for
> ./provenance_repository in 9.713 seconds"}
>
> {"level":"INFO","thread":"main","message":"o.a.n.c.repository.FileSystemRepository
>  Maximum Threshold for Container default set to 2858730232217 bytes; if
> volume exceeds this size, archived data will be deleted until it no
> longer exceeds this size"}
>
> {"level":"INFO","thread":"main","message":"o.a.n.c.repository.FileSystemRepository
>  Initializing FileSystemRepository with 'Always Sync' set to false"}
>
> {"level":"INFO","thread":"Thread-1","message":"org.apache.nifi.NiFi Initiating shutdown of Jetty web server..."}
>
>
But there are no system error or warning messages.

Despite the logs saying that the web server is listening, we always get
"connection refused" when trying to communicate with it.

Thanks,
Eric


On Tue, Apr 19, 2022 at 12:57 PM Mark Payne <ma...@hotmail.com> wrote:

> Eric,
>
> I certainly agree with what Joe said. I would also recommend checking in
> nifi.properties if you have a value set for:
>
> nifi.monitor.long.running.task.schedule
>
> I recommend setting that to “9999 hours”
>
> In 1.14.0, we introduced the notion of a Long-Running Task Monitor. It’s
> generally very fast. Typically runs in 10s of milliseconds on my macbook.
> But it relies on JVM-specific code, and we’ve seen in some environments
> that can cause the UI responsiveness to be very adversely affected. We
> disabled the task monitor by default in 1.15, I believe, because of this.
>
> Thanks
> -Mark
>
>
> On Apr 19, 2022, at 3:44 PM, Joe Witt <jo...@gmail.com> wrote:
>
> Eric
>
> When the UI isn't responsive it would be great to have a snapshot of:
> - CPU usage at that time
> - GC behavior/logging at/around that time.
> - IO Utilization around that time
> - NiFi Thread dump precisely during it and ideally also one after it
> responds again
>
> NiFi Restarting itself is very interesting of course.  There should be
> more in the app log and bootstrap that will help illuminate the issue
> then.
>
> Thanks
>
>
> On Tue, Apr 19, 2022 at 12:42 PM Eric Secules <es...@gmail.com> wrote:
>
>> By the way, I am running NiFi 1.14.0 and it looks like it keeps
>> restarting itself. I am seeing this in the logs about once an hour.
>>
>> {"level":"INFO","thread":"main","message":"org.apache.nifi.NiFi
>>> Controller initialization took 4737354582168 nanoseconds (4737 seconds).
>>> "}
>>>
>>
>> On Tue, Apr 19, 2022 at 12:34 PM Eric Secules <es...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> When my nifi system goes under high load the web UI becomes unresponsive
>>> until load comes down. Is there a way I can see what's going on (processor
>>> status summary, queued count, active thread count) when the UI is
>>> unresponsive?
>>>
>>> The logs are not showing any errors and the various repositories are all
>>> mounted to separate volumes with elastic capacity so I am sure that none of
>>> them ran out of space. Our monitoring shows bursts of CPU usage and memory
>>> use lower than normal.
>>>
>>> The logs show that the StandardProcessScheduler stops processors
>>> followed by starting them, but I never see logs related to the UI being
>>> ready to serve. It does this about once an hour. I see that the flow is
>>> slowly processing based on log activity and databases.
>>>
>>> How can I see what's going on when the web UI is not responding?
>>>
>>> Thanks,
>>> Eric
>>>
>>
>

Re: NiFi web UI not responsive under load

Posted by Mark Payne <ma...@hotmail.com>.
Eric,

I certainly agree with what Joe said. I would also recommend checking in nifi.properties whether you have a value set for:

nifi.monitor.long.running.task.schedule

I recommend setting that to “9999 hours”
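
Concretely, that looks something like the following; the path assumes the
default /opt/nifi/nifi-current layout, and NiFi needs a restart afterwards
for the change to take effect, as far as I recall:

  # Check whether the property is already set...
  grep -n 'nifi.monitor.long.running.task.schedule' /opt/nifi/nifi-current/conf/nifi.properties
  # ...and if it is absent or empty, the line to set is:
  #   nifi.monitor.long.running.task.schedule=9999 hours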

In 1.14.0, we introduced the notion of a Long-Running Task Monitor. It’s generally very fast; it typically runs in tens of milliseconds on my MacBook. But it relies on JVM-specific code, and we’ve seen that in some environments it can very adversely affect UI responsiveness. We disabled the task monitor by default in 1.15, I believe, because of this.

Thanks
-Mark


On Apr 19, 2022, at 3:44 PM, Joe Witt <jo...@gmail.com> wrote:

Eric

When the UI isn't responsive it would be great to have a snapshot of:
- CPU usage at that time
- GC behavior/logging at/around that time.
- IO Utilization around that time
- NiFi Thread dump precisely during it and ideally also one after it responds again

NiFi Restarting itself is very interesting of course.  There should be more in the app log and bootstrap that will help illuminate the issue then.

Thanks


On Tue, Apr 19, 2022 at 12:42 PM Eric Secules <es...@gmail.com> wrote:
By the way, I am running NiFi 1.14.0 and it looks like it keeps restarting itself. I am seeing this in the logs about once an hour.

{"level":"INFO","thread":"main","message":"org.apache.nifi.NiFi Controller initialization took 4737354582168 nanoseconds (4737 seconds)."}

On Tue, Apr 19, 2022 at 12:34 PM Eric Secules <es...@gmail.com> wrote:
Hello,

When my nifi system goes under high load the web UI becomes unresponsive until load comes down. Is there a way I can see what's going on (processor status summary, queued count, active thread count) when the UI is unresponsive?

The logs are not showing any errors and the various repositories are all mounted to separate volumes with elastic capacity so I am sure that none of them ran out of space. Our monitoring shows bursts of CPU usage and memory use lower than normal.

The logs show that the StandardProcessScheduler stops processors followed by starting them, but I never see logs related to the UI being ready to serve. It does this about once an hour. I see that the flow is slowly processing based on log activity and databases.

How can I see what's going on when the web UI is not responding?

Thanks,
Eric


Re: NiFi web UI not responsive under load

Posted by Joe Witt <jo...@gmail.com>.
Eric

When the UI isn't responsive, it would be great to have a snapshot of:
- CPU usage at that time
- GC behavior/logging at/around that time
- IO utilization around that time
- A NiFi thread dump precisely during it, and ideally also one after it
responds again
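
A sketch of one way to grab all four in the moment; it assumes shell access,
the JDK tools, the sysstat package for iostat, and the default
/opt/nifi/nifi-current install path:

  NIFI_PID=$(pgrep -f org.apache.nifi.NiFi | head -n 1)             # NiFi's Java process
  top -b -H -n 1 -p "$NIFI_PID" > cpu-snapshot.txt                  # per-thread CPU right now
  jstat -gcutil "$NIFI_PID" 1000 10 > gc-snapshot.txt               # GC utilization, ten 1-second samples
  iostat -x 1 10 > io-snapshot.txt                                  # disk utilization
  /opt/nifi/nifi-current/bin/nifi.sh dump thread-dump-during.txt    # thread dump while it's unresponsive
  # ...and a second "nifi.sh dump thread-dump-after.txt" once the UI responds again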

NiFi restarting itself is very interesting, of course. There should be more
in the app log and the bootstrap log around that time that will help
illuminate the issue.

Thanks


On Tue, Apr 19, 2022 at 12:42 PM Eric Secules <es...@gmail.com> wrote:

> By the way, I am running NiFi 1.14.0 and it looks like it keeps restarting
> itself. I am seeing this in the logs about once an hour.
>
> {"level":"INFO","thread":"main","message":"org.apache.nifi.NiFi Controller
>> initialization took 4737354582168 nanoseconds (4737 seconds)."}
>>
>
> On Tue, Apr 19, 2022 at 12:34 PM Eric Secules <es...@gmail.com> wrote:
>
>> Hello,
>>
>> When my nifi system goes under high load the web UI becomes unresponsive
>> until load comes down. Is there a way I can see what's going on (processor
>> status summary, queued count, active thread count) when the UI is
>> unresponsive?
>>
>> The logs are not showing any errors and the various repositories are all
>> mounted to separate volumes with elastic capacity so I am sure that none of
>> them ran out of space. Our monitoring shows bursts of CPU usage and memory
>> use lower than normal.
>>
>> The logs show that the StandardProcessScheduler stops processors followed
>> by starting them, but I never see logs related to the UI being ready to
>> serve. It does this about once an hour. I see that the flow is slowly
>> processing based on log activity and databases.
>>
>> How can I see what's going on when the web UI is not responding?
>>
>> Thanks,
>> Eric
>>
>

Re: NiFi web UI not responsive under load

Posted by Eric Secules <es...@gmail.com>.
By the way, I am running NiFi 1.14.0, and it looks like it keeps restarting
itself. I am seeing this in the logs about once an hour.

{"level":"INFO","thread":"main","message":"org.apache.nifi.NiFi Controller
> initialization took 4737354582168 nanoseconds (4737 seconds)."}
>
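
For scale, 4,737,354,582,168 ns is about 4,737 seconds, i.e. roughly 79
minutes. If controller initialization alone is taking that long each time, a
restart showing up in the logs about once an hour would mean NiFi is spending
essentially the whole cycle just starting up.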

On Tue, Apr 19, 2022 at 12:34 PM Eric Secules <es...@gmail.com> wrote:

> Hello,
>
> When my nifi system goes under high load the web UI becomes unresponsive
> until load comes down. Is there a way I can see what's going on (processor
> status summary, queued count, active thread count) when the UI is
> unresponsive?
>
> The logs are not showing any errors and the various repositories are all
> mounted to separate volumes with elastic capacity so I am sure that none of
> them ran out of space. Our monitoring shows bursts of CPU usage and memory
> use lower than normal.
>
> The logs show that the StandardProcessScheduler stops processors followed
> by starting them, but I never see logs related to the UI being ready to
> serve. It does this about once an hour. I see that the flow is slowly
> processing based on log activity and databases.
>
> How can I see what's going on when the web UI is not responding?
>
> Thanks,
> Eric
>