Posted to users@nifi.apache.org by Shawn Weeks <sw...@weeksconsulting.us> on 2021/09/15 18:12:29 UTC

NiFi Fails to Reconnect to Zookeeper After an Outage

Had a Zookeeper cluster go down, and after it came back up NiFi seemed stuck and never reestablished the cluster. The following warnings were repeated on the node that wouldn't rejoin. Googling the first message turns up a bug in the Curator library that causes it to never reconnect to Zookeeper after an issue. See https://issues.apache.org/jira/browse/CURATOR-405 for an example. This is on NiFi 1.14.0 against Zookeeper 3.6.3.

2021-09-15 18:03:20,644 WARN [Curator-ConnectionStateManager-0] o.a.c.f.state.ConnectionStateManager Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 10000. Adjusted session timeout ms: 10000
2021-09-15 18:03:25,201 WARN [Clustering Tasks Thread-2] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Cannot send heartbeat because there is no Cluster Coordinator currently elected
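
To check whether a node is stuck in this state, a small script can count how often the two warnings above show up in the NiFi application log and when they were first and last seen. This is only a rough sketch: the function name scan_log and the labels in PATTERNS are made up for illustration, and it assumes log lines start with a timestamp in the same layout as the lines quoted above.

```python
import re
from collections import Counter

# Substrings copied from the two WARN lines quoted above.
# The dictionary keys are illustrative labels, not NiFi or Curator names.
PATTERNS = {
    "curator_session_expired": "Session timeout has elapsed while SUSPENDED",
    "no_cluster_coordinator": "there is no Cluster Coordinator currently elected",
}

# Leading timestamp of a log line, e.g. "2021-09-15 18:03:20,644 ".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ")


def scan_log(lines):
    """Count each symptom and record the first and last time it was seen."""
    counts = Counter()
    first_seen = {}
    last_seen = {}
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # no leading timestamp (e.g. a stack trace line); skip
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
                first_seen.setdefault(name, match.group(1))
                last_seen[name] = match.group(1)
    return counts, first_seen, last_seen
```

Passing an open log file (for example the node's nifi-app.log, if that is where your install writes its application log) to scan_log returns the counts along with the first and last timestamps. If both counts keep growing long after Zookeeper is healthy again, the node is likely in the state described here.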

RE: NiFi Fails to Reconnect to Zookeeper After an Outage

Posted by Shawn Weeks <sw...@weeksconsulting.us>.

Supposedly it was fixed in Curator 4.1, and it looks like NiFi 1.14.0 is already on 4.2, so my issue might be unrelated. On the nodes that drop out of the cluster, that WARN message from Curator gets logged continuously, and the nodes never seem to pick up the new cluster coordinator.

Thanks
Shawn

From: Pierre Villard <pi...@gmail.com>
Sent: Tuesday, November 30, 2021 9:12 AM
To: users@nifi.apache.org
Subject: Re: NiFi Fails to Reconnect to Zookeeper After an Outage

Hey Shawn,

I think you're looking for https://github.com/apache/nifi/pull/5503 but it's not part of NiFi 1.15.

Pierre

On Tue, Nov 30, 2021 at 15:58, Shawn Weeks <sw...@weeksconsulting.us> wrote:
Does anyone know if the patch to the Curator library ever made it into NiFi? Still seeing this issue where once NiFi has lost its connection to Zookeeper it will never recover it and thus never reconnect to the cluster.

Thanks
Shawn


Re: NiFi Fails to Reconnect to Zookeeper After an Outage

Posted by Pierre Villard <pi...@gmail.com>.

Hey Shawn,

I think you're looking for https://github.com/apache/nifi/pull/5503 but
it's not part of NiFi 1.15.

Pierre

On Tue, Nov 30, 2021 at 15:58, Shawn Weeks <sw...@weeksconsulting.us> wrote:

> Does anyone know if the patch to the Curator library ever made it into
> NiFi? Still seeing this issue where once NiFi has lost its connection to
> Zookeeper it will never recover it and thus never reconnect to the cluster.
>
>
>
> Thanks
>
> Shawn

Re: NiFi Fails to Reconnect to Zookeeper After an Outage

Posted by Joe Witt <jo...@gmail.com>.

Shawn,

I'm not aware of any specific action.  Can you please file a JIRA with
as much detail as possible?

Thanks

On Tue, Nov 30, 2021 at 7:58 AM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>
> Does anyone know if the patch to the Curator library ever made it into NiFi? Still seeing this issue where once NiFi has lost its connection to Zookeeper it will never recover it and thus never reconnect to the cluster.
>
>
>
> Thanks
>
> Shawn

RE: NiFi Fails to Reconnect to Zookeeper After an Outage

Posted by Shawn Weeks <sw...@weeksconsulting.us>.

Does anyone know if the patch to the Curator library ever made it into NiFi? Still seeing this issue where once NiFi has lost its connection to Zookeeper it will never recover it and thus never reconnect to the cluster.

Thanks
Shawn
