Posted to issues@nifi.apache.org by "Reinhard Sell (Jira)" <ji...@apache.org> on 2021/06/28 15:55:00 UTC
[jira] [Updated] (NIFI-8746) ListenRELP does not reliably recover from errors
[ https://issues.apache.org/jira/browse/NIFI-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reinhard Sell updated NIFI-8746:
--------------------------------
Description:
The ListenRELP processor sometimes does not recover from errors (e.g. RELPFrameException), in particular if such an error occurs more than once. A manual stop and start of the processor is then necessary to re-establish communication with the client (rsyslog).
h3. How to reproduce:
* Enable DEBUG logging for {{org.apache.nifi.processors.standard.ListenRELP}}
* Create a simple flow with a ListenRELP processor and set a valid port (e.g. 12345). Leave all other values at their defaults, especially *Max Number of TCP Connections = 2.*
* Connect ListenRELP's output to a funnel and start it.
* Install the tool {{nc}} (netcat).
* Use {{nc}} to provide some correct and also some invalid data as follows:
Start RELP session on command line:
{{$ nc 127.0.0.1 12345}}
Enter the following to open the connection:
{{1 open 0}}
Expect the following response:
{{1 rsp 7 200 OK}}
{{ }}
Enter the following to submit a valid line:
{{2 syslog 3 abc}}
Expect the following response:
{{2 rsp 6 200 OK}}
Now enter an invalid line:
{{3 syslog -1}}
Expect RELPFrameException in the logs and *no* response in the {{nc}} session.
NiFi will no longer respond on this connection, even to valid lines, which is acceptable according to the RELP spec.
Press Ctrl-C to end the {{nc}} session.
Open a new {{nc}} session and repeat the same commands.
It should work a second time, because two TCP connections are allowed.
However, it will not work a third or fourth time: at some point ListenRELP will not respond at all, even on a completely new connection. The only way to recover from this state seems to be stopping and starting the processor.
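The manual {{nc}} steps above could also be scripted. The following is only a minimal sketch, not a tested reproducer: it assumes the frame layout implied by the examples above (transaction number, command, data length, optional data, terminated by a line feed), and the function names {{build_frame}}, {{read_reply}}, {{run_session}} and {{reproduce}} as well as the timeouts are hypothetical choices made for illustration.

```python
import socket


def build_frame(txnr: int, command: str, data: bytes = b"") -> bytes:
    """Build a frame shaped like the examples: TXNR SP COMMAND SP DATALEN [SP DATA] LF."""
    header = f"{txnr} {command} {len(data)}".encode("ascii")
    body = b" " + data if data else b""
    return header + body + b"\n"


def read_reply(sock: socket.socket, timeout: float = 2.0) -> bytes:
    """Return whatever arrives within `timeout` seconds, or b"" if nothing does."""
    sock.settimeout(timeout)
    try:
        return sock.recv(4096)
    except (socket.timeout, ConnectionError):
        return b""


def run_session(host: str, port: int) -> dict:
    """Replay one nc session: open, one valid syslog frame, one invalid frame."""
    with socket.create_connection((host, port), timeout=5.0) as sock:
        sock.sendall(build_frame(1, "open"))
        open_reply = read_reply(sock)
        sock.sendall(build_frame(2, "syslog", b"abc"))
        syslog_reply = read_reply(sock)
        # Deliberately invalid frame (negative length), sent exactly as in the report.
        sock.sendall(b"3 syslog -1\n")
        invalid_reply = read_reply(sock)
    return {"open": open_reply, "syslog": syslog_reply, "invalid": invalid_reply}


def reproduce(host: str = "127.0.0.1", port: int = 12345, attempts: int = 4) -> None:
    """Run several sessions in a row; later sessions are where the hang was observed."""
    for attempt in range(1, attempts + 1):
        print(attempt, run_session(host, port))
```

Calling {{reproduce()}} against a running ListenRELP on port 12345 would replay the session four times. An empty {{open}} reply in a later attempt would correspond to the unresponsive state described above.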
Also: At some point in time (after all connections have been used up?) the following DEBUG message is printed *very* often (several times per ms!):
{{o.a.nifi.processors.standard.ListenRELP ListenRELP[id=<uuid>] No more data available, returning for selection}}
This behaviour is a problem for our production setup: it does not happen often, but it does happen, and data may be lost if this state is not detected and resolved quickly enough.
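One possible way to detect this state is to watch for the flood of DEBUG messages in the application log. The sketch below is an illustration only: it assumes each log line starts with a timestamp of the form {{YYYY-MM-DD HH:MM:SS}}, and the helper names and the threshold of 1000 messages per second are hypothetical values chosen to match "several times per ms".

```python
import re
from collections import Counter

# Message text taken from the DEBUG line quoted in this issue.
MARKER = "No more data available, returning for selection"
# Assumption: log lines begin with "YYYY-MM-DD HH:MM:SS".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def count_marker_per_second(lines):
    """Count how often the marker message appears in each one-second bucket."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def busy_seconds(lines, threshold=1000):
    """Return the seconds in which the marker appeared at least `threshold` times."""
    counts = count_marker_per_second(lines)
    return sorted(second for second, n in counts.items() if n >= threshold)
```

Feeding the lines of the application log into {{busy_seconds}} would list the seconds in which the message rate exceeded the threshold, which could then trigger an alert or a restart of the processor.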
Disclaimer: Sending an invalid RELP frame is *not* what happens in our production environment. It's just a simple way to get ListenRELP into this state.
We are not sure about the root cause of the communication interruption; it may be a network/firewall issue. But the result looks very much like what is described here.
> ListenRELP does not reliably recover from errors
> -------------------------------------------------
>
> Key: NIFI-8746
> URL: https://issues.apache.org/jira/browse/NIFI-8746
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.13.2
> Reporter: Reinhard Sell
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)