
Posted to dev@nifi.apache.org by David Caldwell <dd...@yahoo.com.INVALID> on 2019/10/31 18:28:51 UTC

invokeHttp routing of exceptions like ConnectException and IOException to failure instead of retry

Hi,
While testing InvokeHTTP retry logic with the destination endpoint offline, I learned that the InvokeHTTP processor routes exceptions caused by the offline endpoint to the failure relationship instead of the retry relationship.
That surprised me, since those types of errors are exactly what I would normally like to retry.  What is the reasoning for routing them to failure instead of retry?
Furthermore, when I routed the failure relationship back into InvokeHTTP to retry, I found that NiFi CPU usage stays around 150-160% until the remote endpoint comes back online.
Additional digging showed that InvokeHTTP penalizes in the retry, no_retry and failure scenarios, but yields only for retry and no_retry.  In other words, the failure scenario doesn't yield.

I don't really have a problem with failure not yielding.  That's just what I suspect may be causing the excessive CPU utilization.  My real problem is that I think recoverable communications exceptions should be routed to retry instead of failure.  Not only would that avoid developer surprise, but it would also include a yield, which I hope would prevent the high CPU utilization.
Reading the topic https://nifi.apache.org/docs/nifi-docs/components/nifi-docs/html/developer-guide.html#penalization-vs-yielding, the following points reinforced my thinking:
   - Yield when processor won't be able to perform any useful function for some period of time
   - This tells framework, don't waste resources triggering the processor to run, because there's nothing it can do for a while
   - The topic actually uses a processor communicating with remote resource as the example for yield
Thoughts?
David

Re: invokeHttp routing of exceptions like ConnectException and IOException to failure instead of retry

Posted by Adam Taft <ad...@adamtaft.com>.

Hi David,

> "What is the reasoning for routing them to failure instead of retry?"

Good question ... HTTP status codes give good hints as to what a client
should do for retry/no-retry operations.  Generally 4xx error codes do not
get retried, 5xx codes get retried, etc.  They don't, however, give any
indication of what a client should do when it cannot connect or has host
lookup problems down at the TCP level.

The "failure" relationship in NiFi is something of a common "catch all"
relationship, with a significant number of processors having both a
"success" and "failure" relationship pair.  InvokeHTTP uses that precedent
to capture TCP-oriented failures, and additionally provides relationships
for cases where the HTTP protocol can provide more context.

In short, the "failure" relationship captures TCP related problems.  The
"retry" / "no-retry" relationships capture HTTP related problems.

There's really no ability to tell, at the TCP level, whether a host will
come back online in the future or not.  HTTP 5xx service codes give a
pretty good hint that the request can be retried again in the future, but
TCP ConnectException or UnknownHostException don't really give any
indication for that.
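
As a rough illustration of the split described above, the routing rule can
be sketched in plain Java.  This is a hypothetical sketch and not the
InvokeHTTP source: the RoutingSketch class, the Route enum and both method
names are invented for the example, and treating every status outside 2xx
and 5xx as no-retry is an assumption.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.UnknownHostException;

// Hypothetical sketch of the routing rule; not the actual InvokeHTTP code.
final class RoutingSketch {

    enum Route { RESPONSE, RETRY, NO_RETRY, FAILURE }

    /** Picks a route from an HTTP status code, using the 4xx/5xx hints. */
    static Route routeForStatus(int statusCode) {
        if (statusCode >= 500 && statusCode <= 599) {
            return Route.RETRY;     // server-side error: may succeed later
        }
        if (statusCode >= 200 && statusCode <= 299) {
            return Route.RESPONSE;  // assumption: success has its own route
        }
        return Route.NO_RETRY;      // e.g. 4xx: resending is unlikely to help
    }

    /** No HTTP response arrived, so there is no status code to inspect. */
    static Route routeForException(IOException e) {
        return Route.FAILURE;       // catch-all for connection-level problems
    }

    public static void main(String[] args) {
        System.out.println(routeForStatus(503));  // RETRY
        System.out.println(routeForStatus(404));  // NO_RETRY
        System.out.println(routeForException(new ConnectException("refused")));        // FAILURE
        System.out.println(routeForException(new UnknownHostException("no.such.host"))); // FAILURE
    }
}
```

The point of the sketch is only that the exception path has nothing
comparable to a status code to branch on, which is why it falls through to
the catch-all route.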

On your other comment, "Yield vs. Penalize" ...

The "yield" function in NiFi is a mechanism that is used in the context of
the Processor.  It's basically a way for a processor to evaluate whether
there is any "work to do" and signal the framework that it can relinquish
its resources.  If a FlowFile is queued above a Processor, somewhat by
definition, the Processor indeed has work to do and therefore shouldn't
yield.  The yield function is applied in the context of the Processor
itself.

Whereas, the "penalize" function in NiFi is oriented to a FlowFile itself.
The Processor might notice a problem with a FlowFile as it is working on
it.  The processor can then apply a penalty to the FlowFile, which is
effectively a signal back to the downstream queues.  A NiFi Queue that
handles a FlowFile which has been penalized will effectively "hide" that
FlowFile until the penalty duration has expired, regardless of where that
FlowFile is being routed.

So as a developer creating a custom processor, deciding when to "yield" is
a function of determining whether the Processor has work to do.  Whereas
deciding when to "penalize" is a function of determining whether there was
a problem with the FlowFile being processed.
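
To make the difference in scope concrete, here is a small model in plain
Java.  It is a hypothetical sketch, not NiFi code: Item, FlowQueue and
Processor are invented stand-ins, and time is passed in explicitly as
milliseconds so the behaviour is easy to follow.  A penalty hides one item
in the queue, while a yield pauses the whole processor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two mechanisms; none of these are NiFi classes.
final class YieldVsPenalizeSketch {

    /** A queued item carrying its own penalty expiry, in milliseconds. */
    static final class Item {
        final String name;
        long penalizedUntil = 0;
        Item(String name) { this.name = name; }
    }

    /** A queue that hides penalized items until their penalty expires. */
    static final class FlowQueue {
        private final List<Item> items = new ArrayList<>();

        void add(Item item) { items.add(item); }

        /** Returns the first item not penalized at time 'now', or null. */
        Item poll(long now) {
            for (int i = 0; i < items.size(); i++) {
                if (items.get(i).penalizedUntil <= now) {
                    return items.remove(i);
                }
            }
            return null;
        }
    }

    /** A processor with a single, processor-wide yield expiry. */
    static final class Processor {
        long yieldedUntil = 0;
        boolean isSchedulable(long now) { return yieldedUntil <= now; }
    }

    public static void main(String[] args) {
        FlowQueue queue = new FlowQueue();
        Item a = new Item("a");
        Item b = new Item("b");
        a.penalizedUntil = 30_000;   // penalize only 'a' for 30 seconds
        queue.add(a);
        queue.add(b);

        System.out.println(queue.poll(0).name);       // b ('a' is hidden)
        System.out.println(queue.poll(0));            // null (only 'a' is left)
        System.out.println(queue.poll(30_000).name);  // a (penalty expired)

        Processor processor = new Processor();
        processor.yieldedUntil = 1_000;               // pause everything for 1 second
        System.out.println(processor.isSchedulable(500));    // false
        System.out.println(processor.isSchedulable(1_000));  // true
    }
}
```

In this model a penalty on 'a' never delays 'b', whereas a yield stops the
processor from picking up anything at all, whichever item is next.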

Now, InvokeHTTP is a complicated beast.  So it's having to make
determinations as to whether it should yield based on whether it's
considering itself a "source processor".  Because of its complexity, you
are really seeing multiple design patterns being played out inside the
code.  But fundamentally, the InvokeHTTP processor shouldn't be making a
decision to "yield" based on a FlowFile that had previously failed to
connect.

Because of the way InvokeHTTP is designed, it's not necessarily configured
to just connect with one host.  The URL parameter can be read in from
FlowFile attributes (via Expression Language), allowing it to potentially
make requests to any number of hosts.  So we can't universally predict when
to yield, penalize, retry or fail.

Maybe there's some room for improvement.  But I hope that gives some of the
background that you were asking for.
