Posted to issues@nifi.apache.org by "Koji Kawamura (JIRA)" <ji...@apache.org> on 2017/12/14 08:23:00 UTC

[jira] [Commented] (NIFI-3377) NiFi RPG errors when switching between site-to-site transport protocols

    [ https://issues.apache.org/jira/browse/NIFI-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290529#comment-16290529 ] 

Koji Kawamura commented on NIFI-3377:
-------------------------------------

I was able to debug the root cause of this issue.

RemoteGroupPort (the S2S client) persists remote NiFi node endpoints (hostname, port, isSecure) to a local file (conf/state/port-id.peers by default). This file was designed for the old NiFi cluster management model used before NiFi 1.0.0, which relied on an NCM (NiFi Cluster Manager) node: even if the NCM node went down, S2S client NiFi instances could be restarted and restore the remote node endpoints from the persisted file.
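
To make the persistence step concrete, here is a minimal sketch of writing peer endpoints to a file and reading them back on start. The PeerEndpoint and PeersFileStore names and the one-peer-per-line "hostname port secure" format are hypothetical and chosen for illustration only; NiFi's actual peers file format and classes may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical endpoint model: hostname, port and isSecure, as listed in the comment.
record PeerEndpoint(String hostname, int port, boolean secure) {
    // Assumed line format "hostname port secure"; NiFi's real file format may differ.
    String toLine() {
        return hostname + " " + port + " " + secure;
    }

    static PeerEndpoint fromLine(String line) {
        String[] parts = line.trim().split("\\s+");
        return new PeerEndpoint(parts[0], Integer.parseInt(parts[1]), Boolean.parseBoolean(parts[2]));
    }
}

final class PeersFileStore {
    // Persist every known peer so a restarted client can still reach the remote cluster
    // even when the node that normally reports the peer list is unavailable.
    static void persist(Path file, List<PeerEndpoint> peers) {
        List<String> lines = new ArrayList<>();
        for (PeerEndpoint peer : peers) {
            lines.add(peer.toLine());
        }
        try {
            Files.write(file, lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restore peers on start; a missing file simply means nothing was persisted yet.
    static List<PeerEndpoint> restore(Path file) {
        List<PeerEndpoint> peers = new ArrayList<>();
        if (!Files.exists(file)) {
            return peers;
        }
        try {
            for (String line : Files.readAllLines(file)) {
                if (!line.isBlank()) {
                    peers.add(PeerEndpoint.fromLine(line));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return peers;
    }
}
```

Note that nothing in this sketch records which transport protocol the endpoints belong to, which mirrors the gap described next.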

Currently, the peers file is read when a RemoteGroupPort is started. Because the file is named after the RemoteGroupPort GUID, its name does not take the S2S transport protocol into account. This causes the reported issue: if a RemoteGroupPort is configured to use RAW, it persists RAW endpoints (e.g. remote1:8081, remote2:8081); if its transmission is then stopped and it is reconfigured to use HTTP, it restores the RAW endpoints when it is restarted. As a result, it sends HTTP requests to the RAW port, and vice versa, which is why we see strange network-layer errors.
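
The collision can be reproduced with a small in-memory simulation, sketched below under stated assumptions: a HashMap stands in for the conf/state directory, StalePeersDemo and startPort are hypothetical names, and the HTTP port 8080 is a made-up example value. The point it shows is that a file name derived only from the port GUID lets an HTTP start pick up endpoints persisted during a RAW run.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the stale-peers bug; the map stands in for conf/state.
final class StalePeersDemo {
    // File name -> endpoints persisted into that file.
    private final Map<String, List<String>> stateDir = new HashMap<>();

    // Naming as described in the comment: only the RemoteGroupPort GUID is used.
    static String peersFileName(String portId) {
        return portId + ".peers";
    }

    // On start, restore persisted endpoints if the file exists; otherwise persist and use
    // the endpoints discovered for the current protocol. The protocol is not part of the key.
    List<String> startPort(String portId, List<String> discoveredForProtocol) {
        return stateDir.computeIfAbsent(peersFileName(portId), name -> discoveredForProtocol);
    }
}
```

Starting the same port first with RAW endpoints and then with HTTP endpoints returns the RAW list both times, which matches the "HTTP requests to the RAW port" symptom.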

Once this happens, a RemoteGroupPort does not refresh its remote endpoints until either all calculated request endpoints have been consumed or 60 seconds have passed. That is why it does not recover from the situation immediately.
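
The refresh rule described above can be expressed as a single predicate. This is a hypothetical sketch (PeerRefreshPolicy and shouldRefresh are illustrative names, not NiFi API) that treats "passes 60 seconds" as at least 60 seconds having elapsed since the last refresh.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the refresh condition: keep using the cached endpoints until
// every calculated endpoint has been consumed or the refresh interval has elapsed.
final class PeerRefreshPolicy {
    static final Duration REFRESH_INTERVAL = Duration.ofSeconds(60);

    static boolean shouldRefresh(int endpointsRemaining, Instant lastRefresh, Instant now) {
        boolean exhausted = endpointsRemaining == 0;
        boolean expired = !now.isBefore(lastRefresh.plus(REFRESH_INTERVAL));
        return exhausted || expired;
    }
}
```

With stale endpoints still in the list and less than a minute elapsed, the predicate stays false, so the port keeps retrying the wrong endpoints for a while before it recovers.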

I was able to fix the issue by adding the transport protocol to the persistence file name. I will submit a PR shortly.
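
A minimal sketch of the fix direction, assuming a naming scheme of GUID, lowercase protocol, then ".peers"; the exact file name the actual patch uses may differ, and PeersFileNaming and TransportProtocol here are hypothetical names. Keying the file by both values gives RAW and HTTP their own persisted endpoint lists.

```java
import java.util.Locale;

// Hypothetical sketch: include the transport protocol in the peers file name so that
// endpoints persisted for RAW are never restored for HTTP, and vice versa.
final class PeersFileNaming {
    enum TransportProtocol { RAW, HTTP }

    static String peersFileName(String portId, TransportProtocol protocol) {
        return portId + "." + protocol.name().toLowerCase(Locale.ROOT) + ".peers";
    }
}
```

Because the two protocols now map to different files, switching protocols starts from an empty (or protocol-correct) peer list instead of stale endpoints.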

> NiFi RPG errors when switching between site-to-site transport protocols
> -----------------------------------------------------------------------
>
>                 Key: NIFI-3377
>                 URL: https://issues.apache.org/jira/browse/NIFI-3377
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.1.0
>            Reporter: Matthew Clarke
>            Assignee: Koji Kawamura
>            Priority: Minor
>
> If I have an RPG configured to use the RAW transport protocol and then switch it to use the HTTP transport protocol, it will throw the following error message twice before finally correcting itself:
> 2017-01-19 22:10:32,363 ERROR [I/O dispatcher 841] o.a.n.r.util.SiteToSiteRestApiClient Failed to create transaction for http://<hostname>.openstacklocal:8055/nifi-api/data-transfer/input-ports/b76c293d-0159-1000-0000-00003f85f297/transactions
> org.apache.http.ConnectionClosedException: Connection closed unexpectedly
> 	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.closed(HttpAsyncRequestExecutor.java:140) [httpcore-nio-4.4.5.jar:4.4.5]
> 	at org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:71) [httpasyncclient-4.1.2.jar:4.1.2]
> 	at org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:39) [httpasyncclient-4.1.2.jar:4.1.2]
> 	at org.apache.http.impl.nio.reactor.AbstractIODispatch.disconnected(AbstractIODispatch.java:100) [httpcore-nio-4.4.5.jar:4.4.5]
> 	at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionClosed(BaseIOReactor.java:279) [httpcore-nio-4.4.5.jar:4.4.5]
> 	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processClosedSessions(AbstractIOReactor.java:440) [httpcore-nio-4.4.5.jar:4.4.5]
> 	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:283) [httpcore-nio-4.4.5.jar:4.4.5]
> 	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) [httpcore-nio-4.4.5.jar:4.4.5]
> 	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) [httpcore-nio-4.4.5.jar:4.4.5]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> Similarly, the following ERROR message will be thrown many times when switching from HTTP to the RAW transport protocol:
> 2017-01-19 22:13:15,916 ERROR [Timer-Driven Process Thread-10] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=outport2,targets=http://<hostname>:9090/nifi/] failed to communicate with http://<hostname>:9090/nifi/ due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
> In both scenarios, the RPG will eventually correct itself and start working again. Users may be hesitant to wait once they start seeing these ERRORs and may instead stop the RPG, since the self-correction does not happen quickly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)