Posted to users@nifi.apache.org by Jean-Sebastien Vachon <js...@brizodata.com> on 2019/10/03 01:25:27 UTC

Weird behaviour

Hi all,

First things first... sorry for the long post, but my NiFi instance (1.9.1) started to behave weirdly (in my experience at least) this afternoon and I'm struggling to determine the cause of the problem (and fix it).
All processors are stopped and I have about 205k FlowFiles queued in a single queue, as illustrated below.


[inline image: screenshot of the queued FlowFiles]

The process "SaveToDB" seems to be causing some issues. It actually does a few more things than saving to the DB so this module was implemented as an ExecuteStreamCommand processor using Python3.
This module hasn't changed in quite some time and this is the first time I'm having issues around it. Whenever I start it (even with only 1 thread), Nifi start to show the following message...

[inline image: screenshot of the warning message]
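(For context, the SaveToDB module follows the usual ExecuteStreamCommand contract: the FlowFile content comes in on stdin and the outgoing content is whatever the command writes to stdout. A minimal, purely hypothetical sketch -- not the actual module -- looks something like this:)

    #!/usr/bin/env python3
    # Hypothetical sketch only (not the actual SaveToDB module). ExecuteStreamCommand
    # streams the incoming FlowFile content to the command's stdin and captures the
    # command's stdout as the outgoing FlowFile content; the exit code is reported
    # back on the FlowFile in the execution.status attribute.
    import sys

    def main():
        payload = sys.stdin.buffer.read()      # incoming FlowFile content
        # ... parse the payload, write to the DB, do the extra bookkeeping here ...
        sys.stdout.buffer.write(payload)       # outgoing FlowFile content
        return 0

    if __name__ == "__main__":
        sys.exit(main())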

Another weird issue is that if I try to open "View Status History" on this processor, NiFi says: "Insufficient history, please try again later".
I tried many things, like restarting NiFi, and noticed that the CPU runs at 150% on startup with no indication that it will ever stop. I read about these warning/error messages, tuned a few settings to increase the number of indexing threads, and restarted NiFi, but
the CPU is still over 100% when no processor is actually running in NiFi. Looking at /nifi-api/system-diagnostics shows the following:

{"systemDiagnostics": {"aggregateSnapshot": {"totalNonHeap": "235.4 MB","totalNonHeapBytes": 246833152,"usedNonHeap": "222.25 MB","usedNonHeapBytes": 233046120,"freeNonHeap": "13.15 MB","freeNonHeapBytes": 13787032,"maxNonHeap": "-1 bytes","maxNonHeapBytes": -1,"totalHeap": "12 GB","totalHeapBytes": 12884901888,"usedHeap": "7.14 GB","usedHeapBytes": 7670910608,"freeHeap": "4.86 GB","freeHeapBytes": 5213991280,"maxHeap": "12 GB","maxHeapBytes": 12884901888,"heapUtilization": "60.0%","availableProcessors": 72,"processorLoadAverage": 1.06,"totalThreads": 916,"daemonThreads": 53,"uptime": "00:58:48.996",
I was wondering whether the high number of threads is expected or not... 916 threads seems like a lot to me, especially since my max thread count is set to 400 right now.
The machine has 72 CPUs and is quite capable, but the CPU has been running at over 100% for hours now.
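(For anyone wanting to watch just those numbers, a minimal sketch against the same endpoint -- assuming the REST API is reachable anonymously on localhost:8080; adjust the URL/auth for your setup:)

    #!/usr/bin/env python3
    # Minimal sketch: read NiFi's system diagnostics and print the thread counts.
    # Assumes anonymous access to the REST API on localhost:8080 (adjust URL/auth as needed).
    import json
    import urllib.request

    url = "http://localhost:8080/nifi-api/system-diagnostics"
    with urllib.request.urlopen(url) as resp:
        snap = json.load(resp)["systemDiagnostics"]["aggregateSnapshot"]

    print("totalThreads :", snap["totalThreads"])
    print("daemonThreads:", snap["daemonThreads"])
    print("loadAverage  :", snap["processorLoadAverage"])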

Do you have any recommendations?

thanks

Re: Weird behaviour

Posted by Jean-Sebastien Vachon <js...@brizodata.com>.
Hi Joe,

I read a thread about this new provenance repo last night (https://www.mail-archive.com/users@nifi.apache.org/msg07179.html) and tried it.
It made a very big difference and I was able to process my entire queue during the night.
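For anyone else hitting this: the switch boils down to the provenance implementation property in conf/nifi.properties, e.g.:

    # conf/nifi.properties -- use the write-ahead provenance repository
    nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository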

I've never had any issues with G1GC so far. Which GC engine do you recommend? CMS?

thanks

Re: Weird behaviour

Posted by Joe Witt <jo...@gmail.com>.
Jean

I'd recommend switching to the new provenance repo called WriteAheadProvenanceRepository. Look at a new NiFi download's nifi.properties, as it has been the default for a while. This will help the provenance stuff. You may also want to stop using G1GC if on Java 8.
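For example, GC is controlled by the java.arg.* lines in conf/bootstrap.conf; roughly (the exact arg number varies by version, and CMS is shown only as one possible alternative on Java 8):

    # conf/bootstrap.conf -- disable the G1 collector and (optionally) fall back to CMS on Java 8
    #java.arg.13=-XX:+UseG1GC
    java.arg.13=-XX:+UseConcMarkSweepGC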

I can't explain the status history finding.

Thanks


Re: Weird behaviour

Posted by Jean-Sebastien Vachon <js...@brizodata.com>.
Just did a thread dump using bin/nifi.sh dump and saw that a lot of threads are waiting on a lock.
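(In case it's useful to others: the dump can also be written straight to a file for easier searching -- assuming a standard 1.x layout:)

    bin/nifi.sh dump thread-dump.txt
    grep -c 'WAITING' thread-dump.txt    # rough count of parked/waiting threads in the dump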



"Provenance Maintenance Thread-3" Id=99 RUNNABLE
        at java.io.UnixFileSystem.getLength(Native Method)
        at java.io.File.length(File.java:974)
        at org.apache.nifi.provenance.IndexConfiguration.getSize(IndexConfiguration.java:341)
        at org.apache.nifi.provenance.IndexConfiguration.getIndexSize(IndexConfiguration.java:355)
        at org.apache.nifi.provenance.PersistentProvenanceRepository.getSize(PersistentProvenanceRepository.java:892)
        at org.apache.nifi.provenance.PersistentProvenanceRepository.purgeOldEvents(PersistentProvenanceRepository.java:913)
        - waiting on org.apache.nifi.provenance.PersistentProvenanceRepository@3cee277e
        at org.apache.nifi.provenance.PersistentProvenanceRepository.rollover(PersistentProvenanceRepository.java:1416)
        at org.apache.nifi.provenance.PersistentProvenanceRepository.access$300(PersistentProvenanceRepository.java:131)
        at org.apache.nifi.provenance.PersistentProvenanceRepository$1.run(PersistentProvenanceRepository.java:302)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
        Number of Locked Synchronizers: 3
        - java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@46683a80
        - java.util.concurrent.locks.ReentrantLock$NonfairSync@3973f842
        - java.util.concurrent.ThreadPoolExecutor$Worker@3d2af8d7

"Provenance Query Thread-1" Id=1185 WAITING  on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@232695c5
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
