Posted to users@nifi.apache.org by "Greene (US), Geoffrey N" <ge...@boeing.com> on 2021/04/22 18:08:49 UTC

Some retry flowfile questions

We have a REST endpoint that is "unreliable". It works sometimes.
When it doesn't work, the solution seems to be to sleep for a while, then try again.

So I put in a retry processor:

http processor    <-  Retry
   |      \             ^
Success  Failure  -----|

So far, so good; that loop works. But how do I handle the slow down?
Does the penalty / yield go on the retry? Or on the http? What's the difference? How do I know if I should YIELD or impose a penalty? I'm not sure I understand the differences here.

Thanks
Geoff





RE: Some retry flowfile questions

Posted by Tomislav Novosel <to...@clearpeaks.com>.
Hi Harald, Mark,

I asked about RetryFlowFile and its potential danger the other day, but have had no answer yet.
My question was not really about penalty and yield, but it is something worth considering here.

@Harald, if the Retry in your diagram is the RetryFlowFile processor, sooner or later there is
a potential for deadlock if a lot of flowfiles pass through this point in your flow.

Imagine there is a large number of flowfiles and the “unreliable” endpoint you mentioned is down
for a while. All the flowfiles go to the failure relationship, and after some time (depending on how you configured
the number of retries in the RetryFlowFile processor) they go to the retry relationship to try the endpoint again.

If both of those connections are full up to the backpressure threshold, there will be a deadlock,
and even if the endpoint wakes up, NiFi will not try it.
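
To make the deadlock concrete, here is a minimal toy model in Python. It is not NiFi code: the `Connection` class, the `threshold` value and the rule "a processor may run only if none of its outgoing connections is full" are simplifying assumptions made for illustration, and the http processor's success output and upstream input are ignored.

```python
from collections import deque


class Connection:
    """A bounded queue between two processors (toy model, not NiFi's API)."""

    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold  # hypothetical object-count backpressure threshold
        self.queue = deque()

    def is_full(self):
        return len(self.queue) >= self.threshold


def can_run(inputs, outputs):
    """Runnable only with queued input and no full outgoing connection."""
    has_input = any(len(c.queue) > 0 for c in inputs)
    blocked = any(c.is_full() for c in outputs)
    return has_input and not blocked


def is_deadlocked(failure, retry):
    """True when work is queued in the loop but neither processor may run."""
    work_pending = len(failure.queue) > 0 or len(retry.queue) > 0
    # http processor: reads from the retry connection, writes to failure.
    http_can_run = can_run(inputs=[retry], outputs=[failure])
    # Retry processor: reads from the failure connection, writes to retry.
    retry_can_run = can_run(inputs=[failure], outputs=[retry])
    return work_pending and not (http_can_run or retry_can_run)


failure = Connection("failure", threshold=3)
retry = Connection("retry", threshold=3)
failure.queue.extend(["f1", "f2", "f3"])
retry.queue.extend(["r1", "r2", "r3"])
print(is_deadlocked(failure, retry))
```

In this model, as soon as one of the two connections has free space, the processor feeding it can run again, so the loop is only stuck when both are full at the same time.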

Regarding your “slow down” question: RetryFlowFile has an option to penalize flowfiles before sending
them to the retry relationship.
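
As a rough sketch of what "count retries, then penalize before retrying" means, here is a small Python function. It is only an illustration, not the processor's implementation: the attribute name `flowfile.retries`, the relationship names, the limit of 3 and the 30-second penalty are assumptions chosen for the example, so check them against your own processor configuration.

```python
def route_retry(attributes, now, max_retries=3, penalize=True, penalty=30.0):
    """Decide where a failed item goes next (toy model).

    Returns (relationship, updated_attributes, penalized_until).
    penalized_until is None when no penalty is applied.
    """
    count = int(attributes.get("flowfile.retries", 0))
    if count >= max_retries:
        # Retry budget used up: hand the item to a separate path.
        return "retries_exceeded", attributes, None

    # Record one more attempt on a copy of the attributes.
    updated = dict(attributes)
    updated["flowfile.retries"] = str(count + 1)

    # A penalized item is skipped by the next processor until this time.
    penalized_until = now + penalty if penalize else None
    return "retry", updated, penalized_until
```

The penalty is what produces the slow down: the item sits in the retry connection until `penalized_until` has passed, while other items can still be processed.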

Thanks,
Regards,
Tom


AW: Some retry flowfile questions

Posted by "Dobbernack, Harald (Key-Work)" <ha...@key-work.de>.
Mark, thank you so much for this great explanation!
Harald


Harald Dobbernack

Key-Work Consulting GmbH | Kriegsstr. 100 | 76133 | Karlsruhe | Germany | www.key-work.de | Privacy policy: https://www.key-work.de/de/footer/datenschutz.html
Phone: +49-721-78203-264 | E-Mail: harald.dobbernack@key-work.de

Key-Work Consulting GmbH, Karlsruhe, HRB 108695, HRG Mannheim
Managing Directors: Andreas Stappert, Tobin Wotring

Re: Some retry flowfile questions

Posted by Mark Payne <ma...@hotmail.com>.
Geoff,

The difference between penalization and yielding is whether the failure is data-dependent or not.

So, an easy way to think about this is to consider a scenario where you have a simple flow: GetFTP -> PutFTP.
Something else is picking up data from the FTP server that you’re putting to.

You know that sometimes the data will already exist with the same name, but you don’t want to overwrite it because it’s likely to actually be different data with a conflicting filename.
So you want to wait a while and try to push that file again. In the meantime, you want to continue pushing other files to the FTP server.
In this case, the processor would penalize that FlowFile so that it can continue working on other data.

On the other hand, if PutFTP were to get a connection failure and could not even connect to that FTP server, it would not make sense to penalize that FlowFile, move on to the next one, and try to push it. It can’t connect, so it can’t make progress regardless of what data it has.
In this case, the processor should yield.

Note, however, that it is up to the processor developer to tell the processor to yield or to penalize the FlowFile. It’s not up to the creator of the data flow.
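
The distinction above can be sketched as a toy model in Python. This is not NiFi's API: `ToyProcessor`, `FlowFile`, the two exception types and the duration values are hypothetical names and numbers chosen for illustration. In a real Java processor the corresponding calls are, as far as I know, `session.penalize(flowFile)` and `context.yield()`.

```python
from dataclasses import dataclass


@dataclass
class FlowFile:
    name: str
    penalized_until: float = 0.0  # time before which this item is skipped


class NameConflict(Exception):
    """Data-dependent failure: this particular file name already exists."""


class ConnectionFailure(Exception):
    """Data-independent failure: the server cannot be reached at all."""


class ToyProcessor:
    def __init__(self, penalty=30.0, yield_duration=1.0):
        self.penalty = penalty                # illustrative value, in seconds
        self.yield_duration = yield_duration  # illustrative value, in seconds
        self.yielded_until = 0.0              # time before which nothing runs

    def on_trigger(self, flowfile, send, now):
        """Try to send one item; return 'success', 'penalized' or 'yielded'."""
        try:
            send(flowfile)
            return "success"
        except NameConflict:
            # Only this item is delayed; other items can still be processed.
            flowfile.penalized_until = now + self.penalty
            return "penalized"
        except ConnectionFailure:
            # No item can make progress, so the whole processor pauses.
            self.yielded_until = now + self.yield_duration
            return "yielded"
```

The point of the sketch is that the choice lives inside `on_trigger`, which is the processor developer's code, not in the flow that uses the processor.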

Does that help?

Thanks
-Mark
