Posted to users@nifi.apache.org by Arne Degenring <ar...@luxonit.com> on 2017/10/23 10:46:48 UTC

Back pressure deadlock

Hi,
 
We came across a situation where we experience a kind of “back pressure deadlock”.

In our setup, this occurs around PublishJMS when the target JMS queue is full. Please find attached a screenshot of the relevant flow.

We route the failure relationship to a logging component, and then back to PublishJMS for retry. Sooner or later, the failure and retry queues become full and produce back pressure towards the main input (which is good). The problem is that the same back pressure is also applied to the retry queue.

In this situation, PublishJMS is no longer called at all. Even when the JMS problem resolves, the whole flow stays deadlocked.

Is there a recommended way to avoid such a situation?

Obviously, an admin can temporarily increase the back pressure threshold of the failure connection once the JMS problem is resolved. But it would be nicer if the problem could resolve itself automatically, i.e. PublishJMS should somehow keep retrying.

Any hints?
 
Thanks,
Arne
 
 
 


Re: Back pressure deadlock

Posted by Matt Burgess <ma...@apache.org>.
Perhaps a quick(ish) win would be to implement a
DeadlockDetectionReportingTask, where you could specify processor IDs
(or names, but that can get dicey) and it would monitor those
processors for the condition that all of their incoming connections
have back pressure applied and the processor has not run for X amount
of time. Then you might be able to build a subflow that stops one of
the source processors for a while, or something along those lines.
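
For illustration only, a rough sketch of what such a reporting task
might look like. The class itself, the watched-ID handling, and the
exact status getters used here (e.g. getBackPressureObjectThreshold(),
getInvocations()) are assumptions based on the standard nifi-api
status classes, not an existing component:

package org.example.nifi.reporting;

import java.util.HashSet;
import java.util.Set;

import org.apache.nifi.controller.status.ConnectionStatus;
import org.apache.nifi.controller.status.ProcessGroupStatus;
import org.apache.nifi.controller.status.ProcessorStatus;
import org.apache.nifi.reporting.AbstractReportingTask;
import org.apache.nifi.reporting.ReportingContext;

public class DeadlockDetectionReportingTask extends AbstractReportingTask {

    // Hypothetical: in a real task this would come from a PropertyDescriptor.
    private final Set<String> watchedProcessorIds = new HashSet<>();

    @Override
    public void onTrigger(final ReportingContext context) {
        check(context.getEventAccess().getControllerStatus());
    }

    private void check(final ProcessGroupStatus group) {
        for (final ProcessorStatus proc : group.getProcessorStatus()) {
            if (!watchedProcessorIds.contains(proc.getId())) {
                continue;
            }
            boolean hasIncoming = false;
            boolean allIncomingFull = true;
            for (final ConnectionStatus conn : group.getConnectionStatus()) {
                if (!proc.getId().equals(conn.getDestinationId())) {
                    continue;
                }
                hasIncoming = true;
                // Object-count back pressure applies once the queue reaches the
                // configured threshold (0 means no threshold configured).
                final long threshold = conn.getBackPressureObjectThreshold();
                if (threshold <= 0 || conn.getQueuedCount() < threshold) {
                    allIncomingFull = false;
                }
            }
            // getInvocations() reports onTrigger calls in the last status window,
            // so 0 approximates "has not run recently".
            if (hasIncoming && allIncomingFull && proc.getInvocations() == 0) {
                getLogger().warn("Possible back pressure deadlock around {} ({}): all "
                        + "incoming connections are at their back pressure threshold "
                        + "and the processor has not run recently",
                        new Object[]{proc.getName(), proc.getId()});
            }
        }
        // Walk nested process groups as well.
        for (final ProcessGroupStatus child : group.getProcessGroupStatus()) {
            check(child);
        }
    }
}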

Regards,
Matt


Re: Back pressure deadlock

Posted by Arne Degenring <ar...@luxonit.com>.
Hi Andrew,

My question was generally about the back pressure deadlock problem. The specific PublishJMS scenario from our test environment was just meant as an example.

That said, ANY resource can become full or unavailable. JMS queues can have maximum queue depths configured, which can easily be reached if the consumer is down.

Thanks
Arne



Re: Back pressure deadlock

Posted by Andrew Grande <ap...@gmail.com>.
I wonder which JMS broker you are using. The situation where a JMS
destination is full is absurd; the whole point was to decouple
publishers and consumers. I would additionally look into what JMS
broker settings are available to address the situation.

Andrew


Re: Back pressure deadlock

Posted by Arne Degenring <ar...@luxonit.com>.
Hi Mark,

Don’t get me wrong, NiFi is great! It is much appreciated that it is constantly being improved. It would be great if better support for looping connections were one of those improvements in the future :-) In the meantime, we can live with one of the solutions you suggested. Thanks for describing the options!

Keep up the good work!
Arne



Re: Back pressure deadlock

Posted by Mark Payne <ma...@hotmail.com>.
Arne,

Fair enough. NiFi could perhaps be smarter about looping connections instead of stopping at self-loops.

Another approach to this situation, which I have used, would be the
following: rather than having a flow that loops as you laid out
(PublishJMS -> LogAttribute -> back to PublishJMS), you could instead
connect the 'failure' relationship to PublishJMS as a self-loop and
also connect it to the LogAttribute (or alerting processor, or
whatever you have), and then set an age-off on that connection. In
this setup, even if the log/alerting processor were having trouble,
back pressure would not be applied to PublishJMS, because of the
age-off. Typically when sending data to some sort of alerting/status-
publishing path, age-off is appropriate (though granted it may not be
100% of the time).

Another useful approach to consider in such a case may actually be to
have Reporting Tasks [1] that monitor the flow for large queues, etc.
While you can build such monitoring capabilities into the flow itself,
I am personally a fan of 'pulling up' this logic out of the flow,
because it tends to result in much cleaner, easier-to-understand, and
easier-to-implement flows.
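
As a small illustration of pulling the monitoring out of the flow, a
sketch of a reporting task that warns about large queues is below. The
class name and the hard-coded limit are hypothetical (a real task
would expose the limit as a property), and it assumes the standard
nifi-api status classes:

package org.example.nifi.reporting;

import org.apache.nifi.controller.status.ConnectionStatus;
import org.apache.nifi.controller.status.ProcessGroupStatus;
import org.apache.nifi.reporting.AbstractReportingTask;
import org.apache.nifi.reporting.ReportingContext;

public class LargeQueueReportingTask extends AbstractReportingTask {

    // Hypothetical hard-coded limit; a real task would make this configurable.
    private static final int QUEUE_WARN_COUNT = 10_000;

    @Override
    public void onTrigger(final ReportingContext context) {
        report(context.getEventAccess().getControllerStatus());
    }

    private void report(final ProcessGroupStatus group) {
        for (final ConnectionStatus conn : group.getConnectionStatus()) {
            if (conn.getQueuedCount() > QUEUE_WARN_COUNT) {
                getLogger().warn("Connection {} ({} -> {}) has {} FlowFiles queued",
                        new Object[]{conn.getName(), conn.getSourceName(),
                                conn.getDestinationName(), conn.getQueuedCount()});
            }
        }
        // Walk nested process groups as well.
        for (final ProcessGroupStatus child : group.getProcessGroupStatus()) {
            report(child);
        }
    }
}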

So I'm certainly not saying that what NiFi does is correct and perfect and can't be improved upon - any solution can probably be improved upon,
and NiFi is certainly rapidly improving each day. But I wanted to point out some ways that you can think about attacking the concerns that you
have with the current implementation.

Thanks!
-Mark


[1] http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Reporting_Tasks





Re: Back pressure deadlock

Posted by Arne Degenring <ar...@luxonit.com>.
Hi Mark,

Thanks for clarifying that self-looping connections will still be processed in back pressure situations.

For this specific case, we can probably live without the additional routing to the logging component and back.

I think, however, that there are cases where such ping-pong routing in failure cases can be very useful, e.g. for actively alerting someone, publishing some information on a status page, etc.

Therefore I feel it would be great if NiFi could be extended to avoid such back pressure deadlock situations, maybe through some kind of automatic deadlock detection, or by marking certain incoming relations as not relevant for back pressure (the same as self-looping connections).

Thanks,
Arne



Re: Back pressure deadlock

Posted by Mark Payne <ma...@hotmail.com>.
Hi Arne,

Generally, the approach used in such a situation would be to route
failure back to the PublishJMS processor itself (without diverting
first to a LogAttribute processor). The PublishJMS processor itself
should be logging an error with the FlowFile's identity. Then,
troubleshooting can be done by inspecting the queue (right-click, List
Queue) or via Data Provenance [1]. When a processor encounters back
pressure, it will still continue to process data that comes in on
self-looping connections, so the failure relationship would still get
processed.

Does this help?

Thanks
-Mark



[1] http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data_provenance


