Posted to issues@nifi.apache.org by "Reinhard Sell (Jira)" <ji...@apache.org> on 2021/06/28 15:54:00 UTC
[jira] [Created] (NIFI-8746) ListenRELP does not reliably recover from errors

Reinhard Sell created NIFI-8746:
-----------------------------------

             Summary: ListenRELP does not reliably recover from errors 
                 Key: NIFI-8746
                 URL: https://issues.apache.org/jira/browse/NIFI-8746
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 1.13.2
            Reporter: Reinhard Sell


The ListenRELP processor sometimes does not recover from errors (e.g. RELPFrameException). A manual stop and start of the processor is then necessary to re-establish communication with the client (rsyslog), in particular if such an error occurs more than once.

h3. How to reproduce:

* Enable DEBUG logging for {{org.apache.nifi.processors.standard.ListenRELP}}
* Create a simple flow with a ListenRELP processor and set a valid port (e.g. 12345). Leave all other values at their defaults, esp. *Max Number of TCP Connections = 2.*
* Connect ListenRELP's output to a funnel and start it.
* Install the tool {{nc}} (netcat).
* Use {{nc}} to provide some correct and also some invalid data as follows:

Start RELP session on command line:
{{$ nc 127.0.0.1 12345}}

Enter the following to open the connection:
{{1 open 0}}

Expect the following response:
{{1 rsp 7 200 OK}}
{{}}

Enter the following to submit a valid line:
{{2 syslog 3 abc}}

Expect the following response:
{{2 rsp 6 200 OK}}

Now enter an invalid line:
{{3 syslog -1}}

Expect RELPFrameException in the logs and *no* response in the {{nc}} session.
NiFi will not respond on this connection anymore, even for valid lines, which is OK
according to the RELP spec.

Press Ctrl-C to end the {{nc}} session.

Open a new {{nc}} session and repeat the same commands.
It should work a second time, as we may have two TCP connections.

However, it will not work a third or fourth time: at some point ListenRELP will not respond at all, even on a completely new connection. The only way to recover from this state seems to be to stop and start the processor.
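
For convenience, the manual {{nc}} steps above could be scripted roughly as follows. This is only a sketch: the helper names {{relp_frame}} and {{run_session}} are made up for illustration, the 2-second timeout is an assumption, and host and port are the example values used above. The frame layout (transaction number, command, data length, optional data, line feed) is the one implied by the lines typed into {{nc}}.

```python
import socket


def relp_frame(txnr: int, command: str, data: bytes = b"") -> bytes:
    """Build one RELP frame: TXNR SP COMMAND SP DATALEN [SP DATA] LF."""
    header = f"{txnr} {command} {len(data)}".encode("ascii")
    body = b" " + data if data else b""
    return header + body + b"\n"


# The deliberately malformed frame from the report (negative data length).
INVALID_FRAME = b"3 syslog -1\n"


def run_session(host: str = "127.0.0.1", port: int = 12345,
                timeout: float = 2.0) -> list:
    """Replay one nc session and return the raw reply to each frame.

    A reply of None means nothing arrived before the (assumed) timeout.
    """
    frames = (
        relp_frame(1, "open"),           # "1 open 0"
        relp_frame(2, "syslog", b"abc"),  # "2 syslog 3 abc"
        INVALID_FRAME,                    # "3 syslog -1"
    )
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for frame in frames:
            sock.sendall(frame)
            try:
                replies.append(sock.recv(4096))
            except socket.timeout:
                replies.append(None)
    # Leaving the with-block closes the socket, like Ctrl-C in nc.
    return replies
```

Calling {{run_session()}} several times in a row against a running ListenRELP should mirror the manual procedure. If the behaviour described above occurs, a later call would be expected to get no reply even to the initial {{open}} frame.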

Also: At some point in time (after all connections have been used up?) the following DEBUG message is printed *very* often (several times per ms!):

{{o.a.nifi.processors.standard.ListenRELP ListenRELP[id=<uuid>] No more data available, returning for selection}}

This behaviour is a problem for our production setup: even though it does not happen very often, it does happen, and data might be lost if this state is not detected and resolved fast enough.

Disclaimer: Sending an invalid RELP frame is *not* what happens in our production environment. It's just a simple way to get ListenRELP into this state.
We are not sure about the root cause of the communication interruption, perhaps a network/firewall issue. But the result looks very much like what is described here.
