Posted to issues@nifi.apache.org by "Reinhard Sell (Jira)" <ji...@apache.org> on 2021/06/28 15:55:00 UTC

[jira] [Updated] (NIFI-8746) ListenRELP does not reliably recover from errors

     [ https://issues.apache.org/jira/browse/NIFI-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reinhard Sell updated NIFI-8746:
--------------------------------
    Description: 
The ListenRELP processor sometimes does not recover from errors (e.g. RELPFrameException), in particular when such an error occurs more than once. A manual stop and start of the processor is then necessary to re-establish communication with the client (rsyslog).

h3. How to reproduce:

* Enable DEBUG logging for {{org.apache.nifi.processors.standard.ListenRELP}}
* Create a simple flow with a ListenRELP processor and set a valid port (e.g. 12345). Leave all other properties at their defaults, especially *Max Number of TCP Connections = 2.*
* Connect ListenRELP's output to a funnel and start the processor.
* Install the tool {{nc}} (netcat).
* Use {{nc}} to send some valid and some invalid data as follows:

Start a RELP session on the command line:
{{$ nc 127.0.0.1 12345}}

Enter the following to open the connection:
{{1 open 0}}

Expect the following response:
{{1 rsp 7 200 OK}}

Enter the following to submit a valid line:
{{2 syslog 3 abc}}

Expect the following response:
{{2 rsp 6 200 OK}}

Now enter an invalid line:
{{3 syslog -1}}

Expect a RELPFrameException in the logs and *no* response in the {{nc}} session.
NiFi will not respond via this connection anymore, even for valid lines, which is acceptable
according to the RELP spec.
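
For reference, a RELP frame is roughly {{TXNR SP COMMAND SP DATALEN [SP DATA] LF}}; the lines typed above therefore correspond to the following byte sequences (an illustration only):

{code:python}
# RELP frame layout: TXNR SP COMMAND SP DATALEN [SP DATA] LF
valid_open   = b"1 open 0\n"        # no payload, DATALEN = 0
valid_syslog = b"2 syslog 3 abc\n"  # DATALEN = len("abc") = 3
invalid      = b"3 syslog -1\n"     # negative DATALEN cannot be parsed -> RELPFrameException
{code}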

Press Ctrl-C to end the {{nc}} session.

Open a new {{nc}} session and repeat the same commands.
It should work a second time, since up to two TCP connections are allowed.

However, it will not work a third or fourth time: at some point ListenRELP will not respond at all, even on a completely new connection. The only way to recover from this state seems to be stopping and starting the processor.
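
For repeated testing, the manual {{nc}} steps above can be automated with a small script along these lines (a minimal sketch, assuming ListenRELP listens on 127.0.0.1:12345 with the defaults described above; adjust host/port as needed):

{code:python}
#!/usr/bin/env python3
# Sketch: replays the manual nc steps (open, valid syslog, malformed frame)
# over several fresh TCP connections. Per the behaviour described above,
# after the first one or two sessions ListenRELP no longer responds even
# to the well-formed open/syslog frames of a brand-new connection.
import socket

HOST, PORT = "127.0.0.1", 12345  # adjust to your ListenRELP configuration

def frame(txnr, command, data=b""):
    # RELP frame: TXNR SP COMMAND SP DATALEN [SP DATA] LF
    header = f"{txnr} {command} {len(data)}".encode()
    return header + (b" " + data if data else b"") + b"\n"

for i in range(1, 6):
    try:
        with socket.create_connection((HOST, PORT), timeout=5) as s:
            s.sendall(frame(1, "open"))            # "1 open 0"
            print(f"session {i} open   ->", s.recv(1024))
            s.sendall(frame(2, "syslog", b"abc"))  # "2 syslog 3 abc"
            print(f"session {i} syslog ->", s.recv(1024))
            s.sendall(b"3 syslog -1\n")            # malformed: negative DATALEN
            s.settimeout(2)
            try:
                print(f"session {i} bad    ->", s.recv(1024))
            except socket.timeout:
                print(f"session {i} bad    -> no response (expected)")
    except Exception as e:
        print(f"session {i} failed: {e}")
{code}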

Also, at some point (after all connections have been used up?) the following DEBUG message is printed *very* often (several times per millisecond!):

{{o.a.nifi.processors.standard.ListenRELP ListenRELP[id=<uuid>] No more data available, returning for selection}}

This behaviour is a problem for our production setup: even though it does not happen very often, it does happen, and data might be lost if this state is not detected and resolved quickly enough.

Disclaimer: Sending an invalid RELP frame is *not* what happens in our production environment; it is just a simple way to get ListenRELP into this state.
We are not sure about the root cause of the communication interruption, perhaps a network/firewall issue, but the result looks very much like what is described here.




> ListenRELP does not reliably recover from errors 
> -------------------------------------------------
>
>                 Key: NIFI-8746
>                 URL: https://issues.apache.org/jira/browse/NIFI-8746
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.13.2
>            Reporter: Reinhard Sell
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)