Posted to users@nifi.apache.org by James McMahon <js...@gmail.com> on 2020/06/25 01:05:02 UTC

Indications in the UI of which cluster node hosts a “stuck” thread?

Our production NiFi cluster is exhibiting repeated problems with threads
that do not end. It is happening with processors that have complex
configurations and dependencies (ConsumeAMQP), and - more troubling - it is
also occurring periodically for simple processors like ControlRate. I’ll
have a ControlRate processor sitting in a running state with no active
running thread. I select Stop on that processor and get a thread that I
presume is responsible for stopping it, and that thread never ends. This
leaves my processor in a useless state: not stopped, not really running,
and not accessible for reconfiguration.

I read a blog post by Pierre Villard on using nifi.sh to take thread
dumps. I’ll dig into that. My questions:

1. In a cluster, is there anything I can use in the UI to tell me which
cluster node hosts the bad thread? Digging through thread dumps from
multiple cluster nodes seems impractical, and I’m hoping there’s a way to
zero in on a node.

2. What NiFi system resources in my configuration influence the management
and well-being of these threads?

3. Has anyone debugged such a thread issue in a clustered NiFi environment,
and if so, can you offer any tips based on your experience?

Thanks in advance for any help.
Jim

Re: Indications in the UI of which cluster node hosts a “stuck” thread?

Posted by James McMahon <js...@gmail.com>.

This does help, thank you Matt. And I like your suggestion. It would be
even more at our fingertips if, when we hover over the thread count on a
processor, the distribution across all cluster nodes were presented in a
popup. I wonder whether the project leads would consider this a helpful
improvement.

I can now see that my hanging threads are on just two of my cluster nodes.
This is very helpful - thanks again. It reduces the amount of thread
dumping review I will be doing today.
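Once the Summary page breakdown has narrowed the problem to a couple of nodes, the dumps from those nodes still have to be searched. Below is a minimal Python sketch for that step. It assumes each dump (for example one written with `bin/nifi.sh dump <file>`) follows the usual JVM layout. In that layout, every thread is a blank-line-separated stanza whose first line begins with the quoted thread name. The sketch keeps only the stanzas that mention a given class name such as `ControlRate`. The function names `threads_mentioning` and `scan_dumps` are hypothetical, and the stanza layout is an assumption rather than something NiFi guarantees.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List

# Assumption: each thread in the dump is a stanza that begins with the quoted
# thread name (e.g. "Timer-Driven Process Thread-3" ...) and ends at a blank line.
_BLANK_LINE = re.compile(r"\n\s*\n")


def threads_mentioning(dump_text: str, needle: str) -> List[str]:
    """Return every thread stanza in `dump_text` whose text contains `needle`."""
    stanzas = (s.strip() for s in _BLANK_LINE.split(dump_text))
    # Stanzas that do not start with a quote (headers, footers) are skipped.
    return [s for s in stanzas if s.startswith('"') and needle in s]


def scan_dumps(paths: Iterable[str], needle: str) -> Dict[str, List[str]]:
    """Map each dump file path to the header lines of its matching threads."""
    result: Dict[str, List[str]] = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        # The first line of a stanza carries the thread name and its state.
        result[path] = [s.splitlines()[0] for s in threads_mentioning(text, needle)]
    return result
```

Running `scan_dumps(["node1-dump.txt", "node2-dump.txt"], "ControlRate")` (hypothetical file names) would list, per node, the threads whose stack frames reference the stuck processor's class. Searching for the class name rather than the thread name is deliberate, since NiFi's worker threads are pooled and not named after processors.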

Jim

On Wed, Jun 24, 2020 at 9:53 PM Matt Gilman <ma...@gmail.com> wrote:

> Hi Jim,
>
> If you open the Summary page from the global menu you should see the
> active threads in parentheses next to the scheduled state. Find the row in
> question and click the cluster icon from the actions column. This will open
> a dialog with a node-wise breakdown. I believe that the thread count is one
> of the metrics that is broken down per node.
>
> Hope this helps! Adding this breakdown to the main canvas would be a great
> addition. Maybe these breakdowns could be offered in a tooltip for each
> metric.
>
> Matt
>
> Sent from my iPhone

Re: Indications in the UI of which cluster node hosts a “stuck” thread?

Posted by Matt Gilman <ma...@gmail.com>.

Hi Jim,

If you open the Summary page from the global menu you should see the active threads in parentheses next to the scheduled state. Find the row in question and click the cluster icon from the actions column. This will open a dialog with a node-wise breakdown. I believe that the thread count is one of the metrics that is broken down per node.
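The same node-wise numbers should also be reachable without the UI. Below is a minimal Python sketch for fetching them. It assumes the processor status endpoint `GET /nifi-api/flow/processors/{id}/status?nodewise=true`. It also assumes a response in which `processorStatus.nodeSnapshots[]` holds an `address`, an `apiPort`, and a `statusSnapshot.activeThreadCount` for each node. Please check those names against the REST API documentation for your NiFi version. Authentication is deliberately left out, so as written it would only work against an unsecured cluster, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List, Tuple


def status_url(base_url: str, processor_id: str) -> str:
    """Build the (assumed) node-wise processor status URL."""
    return f"{base_url.rstrip('/')}/nifi-api/flow/processors/{processor_id}/status?nodewise=true"


def active_threads_by_node(status_entity: dict) -> Dict[str, int]:
    """Map "address:apiPort" to activeThreadCount, using the assumed response shape."""
    snapshots = status_entity.get("processorStatus", {}).get("nodeSnapshots") or []
    counts: Dict[str, int] = {}
    for node in snapshots:
        key = f"{node.get('address')}:{node.get('apiPort')}"
        counts[key] = node.get("statusSnapshot", {}).get("activeThreadCount", 0)
    return counts


def nodes_with_threads(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return the nodes that still report active threads, busiest first."""
    busy = [(node, n) for node, n in counts.items() if n > 0]
    return sorted(busy, key=lambda item: item[1], reverse=True)


def fetch_status(base_url: str, processor_id: str, timeout: float = 10.0) -> dict:
    """GET the status entity; no token or client certificate is sent (assumption:
    an unsecured cluster)."""
    with urllib.request.urlopen(status_url(base_url, processor_id), timeout=timeout) as resp:
        return json.load(resp)
```

Passing the result of `fetch_status("http://nifi-host:8080", "<processor-id>")` (placeholder host and id) through `active_threads_by_node` and then `nodes_with_threads` would give the short list of nodes whose thread dumps are worth reading first. The parsing is kept separate from the HTTP call so it can be checked against a saved response.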

Hope this helps! Adding this breakdown to the main canvas would be a great addition. Maybe these breakdowns could be offered in a tooltip for each metric.

Matt
