Posted to users@nifi.apache.org by sanjeet rath <ra...@gmail.com> on 2021/01/06 14:10:06 UTC

Nifi 1.12.1 cluster is getting hung after few days(15 days)

Hi All,

Happy New Year :)

I upgraded our cluster from 1.8 to 1.12.1 a few days ago, and everything
was working fine at first. I have since observed that NiFi hangs after
running for a while (roughly 15 days after the NiFi service starts). The
symptom is that, after login, the browser keeps loading. When I checked
bootstrap.log I saw this message: "*Apache nifi is running at PID () but not
responding to ping requests*".
This happened to only one node of a 3-node cluster.

This issue has happened *3 times, on different clusters and on different nodes.*

*Every time, the issue was fixed by restarting the NiFi service.*

During the hung state I checked the resource utilisation:

 -> top -n 1 -H -p 943785 (NiFi process ID)


top - 08:26:36 up 40 days, 3:48, 2 users, load average: 5.28, 5.38, 5.43
Threads: 239 total, 4 running, 235 sleeping, 0 stopped, 0 zombie
%Cpu(s): 98.7 us, 1.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 15829.5 total, 610.8 free, 10823.7 used, 4395.0 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 4456.1 avail Mem

   PID USER PR NI  VIRT  RES   SHR S %CPU %MEM    TIME+ COMMAND
943806 root 20  0 12.5g 9.4g 18692 R 88.9 60.7 12698:50 GC Thread#1
943807 root 20  0 12.5g 9.4g 18692 R 88.9 60.7 12698:48 GC Thread#2
943808 root 20  0 12.5g 9.4g 18692 R 88.9 60.7 12698:58 GC Thread#3
943787 root 20  0 12.5g 9.4g 18692 R 83.3 60.7 12698:51 GC Thread#0
943785 root 20  0 12.5g 9.4g 18692 S  0.0 60.7  0:00.00 java
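
One way to connect the thread IDs in the top -H output above to a thread dump: HotSpot-style dumps (for example from jstack) typically label each thread's native ID as a hexadecimal nid=0x... value, so the decimal IDs can be converted and searched for. A minimal sketch, assuming that dump format; the helper name tid_to_nid is illustrative and not part of NiFi or the JDK:

```python
# Thread IDs of the four busy GC threads, copied from the top -H output above.
GC_THREAD_IDS = [943806, 943807, 943808, 943787]


def tid_to_nid(tid: int) -> str:
    """Return a decimal Linux thread ID as the hex string used in nid=0x... fields."""
    return hex(tid)


if __name__ == "__main__":
    for tid in GC_THREAD_IDS:
        # Search the thread dump for the printed value, e.g. "nid=0xe66be".
        print(f"{tid} -> nid={tid_to_nid(tid)}")
```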


We have a 4-core CPU, and all *4 GC threads* stay in this state, consuming
most of the CPU. *The cluster was in the hung state for 2 days.* After those
2 days I saw that these threads were no longer busy and NiFi came out of the
hung state on this node, but another node in the same cluster then went into
the hung state in the same way: 4 threads busy in GC and consuming most of
the CPU.
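
The TIME+ column in the top output can be turned into a rough CPU-time figure for the GC threads. A small sketch, assuming the values shown are in top's minutes:seconds form (so 12698:50 is read as 12698 minutes 50 seconds); the function name is illustrative:

```python
def top_time_to_seconds(value: str) -> float:
    """Convert a top TIME+ value of the form minutes:seconds[.hundredths] to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


if __name__ == "__main__":
    # TIME+ values for GC Thread#0 to GC Thread#3 as posted.
    gc_times = ["12698:51", "12698:50", "12698:48", "12698:58"]
    for t in gc_times:
        secs = top_time_to_seconds(t)
        print(f"{t} -> {secs / 86400:.1f} days of CPU time")
```

Under that reading, each GC thread has accumulated roughly 8.8 days of CPU time, which is more than the 2 days of the hang itself. That suggests, but does not prove, that heavy garbage collection began well before the node stopped responding; GC logs would be needed to confirm it.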


Could you please help me identify the possible reason?

Details:

Nifi 1.12.1

Jdk 11

Zookeeper 3.5.8

16g memory



Thanks,
-- 
Sanjeet Kumar Rath,
mob- +91 8777577470

Re: Nifi 1.12.1 cluster is getting hung after few days(15 days)

Posted by Joe Witt <jo...@gmail.com>.

Hello

Please capture and share a full thread dump by running bin/nifi.sh dump,
and please post the output in a form that is easier to read than this email
system allows.
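
Once a dump has been captured to a file, a quick summary of thread states can make it easier to share and scan. The sketch below assumes each thread header line looks like "name" Id=<number> <STATE>, which is the general shape of Java's ThreadInfo output; the exact layout written by bin/nifi.sh dump may differ, and the sample text and function name are hypothetical:

```python
import re
from collections import Counter

# Assumed header shape: "thread name" ... Id=<number> <STATE>
HEADER = re.compile(r'^"(?P<name>.+)".*\bId=\d+\s+(?P<state>[A-Z_]+)')


def count_thread_states(dump_text: str) -> Counter:
    """Count threads per state from header lines matching the assumed shape."""
    states = Counter()
    for line in dump_text.splitlines():
        match = HEADER.match(line)
        if match:
            states[match.group("state")] += 1
    return states


# Hypothetical sample, not taken from the original thread.
SAMPLE = '''"Timer-Driven Process Thread-1" Id=71 WAITING
    at java.base/jdk.internal.misc.Unsafe.park(Native Method)
"NiFi Web Server-20" Id=20 RUNNABLE
"Timer-Driven Process Thread-2" Id=72 WAITING
'''

if __name__ == "__main__":
    print(count_thread_states(SAMPLE))
```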

Thanks

On Thu, Jan 7, 2021 at 5:22 AM sanjeet rath <ra...@gmail.com> wrote:

> Hi All,
>
> Could someone please share thoughts on the issue described in my earlier
> mail, so I can continue my analysis?
>
> Regards,
> Sanjeet

Re: Nifi 1.12.1 cluster is getting hung after few days(15 days)

Posted by sanjeet rath <ra...@gmail.com>.

Hi All,

Could someone please share thoughts on the issue described in my earlier
mail, so I can continue my analysis?

Regards,
Sanjeet
