Posted to users@nifi.apache.org by Ryan H <ry...@gmail.com> on 2017/03/06 14:17:05 UTC

NiFi 1.1.1 AWS EC2 Secure Cluster Zookeeper Connection Loss Error

Hi All,

I am running into another issue setting up a secure NiFi cluster across
two EC2 instances in AWS. Shortly after the two nodes start up,
nifi-app.log on each node is flooded with the following error messages:

2017-03-06 13:48:06,029 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_121]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
2017-03-06 13:48:06,029 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:838) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_121]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]


The error looks very similar to the following post:
http://apache-nifi-developer-list.39713.n7.nabble.com/Zookeeper-error-td13915.html

However, the fix suggested there (clearing state and flow from the NiFi
nodes) did not resolve the issue; a sketch of exactly what I cleared
follows the shutdown message below. As an aside, FWIW, when stopping the
nodes with ./bin/nifi.sh stop I get the following shutdown message:

ERROR [main] org.apache.nifi.bootstrap.Command Failed to send shutdown command to port 32993 due to java.net.SocketTimeoutException: Read timed out. Will kill the NiFi Process with PID 5984.
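
For reference, "clearing state and flow" on each node amounted to roughly
the following; the paths assume a default NiFi 1.1.1 install, so adjust as
needed (the myid note further down applies if the whole ZooKeeper state
directory is removed):

# stop the node first
./bin/nifi.sh stop
# remove local component state and the embedded ZooKeeper's data
rm -rf ./state/local ./state/zookeeper/version-2
# remove the current flow so the node rejoins with an empty flow
rm -f ./conf/flow.xml.gz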

This issue also looks very close to the following bug in the Apache JIRA:
https://issues.apache.org/jira/browse/CURATOR-209

For context, here is what I have previously set up successfully during
development:

   1. A single standalone unsecured NiFi instance.
   2. Multiple unsecured (clustered) nodes on a single EC2 instance.
   3. Multiple unsecured nodes clustered across multiple EC2 instances.
   4. A single standalone secured NiFi instance.
   5. Multiple secured (clustered) nodes on a single EC2 instance.

Below are the relevant config files for my 2 nodes. Any help is greatly
appreciated!


Cheers,

Ryan H.


-----------------------------------------

EC2 Instance 1

-----------------------------------------

nifi.properties

nifi.state.management.embedded.zookeeper.start=true

# Site to Site properties

nifi.remote.input.host=my-host-name-1
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec

# web properties
#

nifi.web.war.directory=./lib
nifi.web.http.host=
nifi.web.http.port=
nifi.web.https.host=my-host-name-1
nifi.web.https.port=443
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200

# cluster common properties (all nodes must have same values)
#

nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=true

# cluster node properties (only configure for cluster nodes)
#

nifi.cluster.is.node=true
nifi.cluster.node.address=my-host-name-1
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 mins
nifi.cluster.flow.election.max.candidates=

# zookeeper properties, used for cluster management
#

nifi.zookeeper.connect.string=my-host-name-1:2181,my-host-name-2:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
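
In case it matters, the ZooKeeper ports can be sanity-checked between the
instances with netcat, e.g. from instance 1 (hostnames are the same
placeholders used above; 2888/3888 are the quorum and leader-election
ports from zookeeper.properties below):

# client port from the connect string
nc -vz my-host-name-2 2181
# quorum and leader-election ports
nc -vz my-host-name-2 2888
nc -vz my-host-name-2 3888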

-----------------------------------------

state-management.xml

<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String">my-host-name-1:2181,my-host-name-2:2181</property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">10 seconds</property>
    <property name="Access Control">Open</property>
</cluster-provider>

-----------------------------------------

zookeeper.properties

clientPort=2181
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30

server.1=my-host-name-1:2888:3888
server.2=my-host-name-2:2888:3888
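
The server.1/server.2 entries assume the standard ZooKeeper myid file
under dataDir on each instance; for completeness, that was created
roughly like this:

mkdir -p ./state/zookeeper
echo 1 > ./state/zookeeper/myid   # on instance 1 (use 2 on instance 2)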

-----------------------------------------

authorizers.xml

<authorizer>
    <identifier>file-provider</identifier>
    <class>org.apache.nifi.authorization.FileAuthorizer</class>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Users File">./conf/users.xml</property>
    <property name="Initial Admin Identity">CN=admin, OU=NIFI</property>
    <property name="Legacy Authorized Users File"></property>
    <property name="Node Identity 1">CN=my-host-name-1, OU=NIFI</property>
    <property name="Node Identity 2">CN=my-host-name-2, OU=NIFI</property>
</authorizer>
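
Since the Node Identity values must exactly match the DNs in the node
certificates (including spacing), here is a quick way to double-check a
node's keystore DN; the keystore path and password placeholder are
assumptions from my setup:

keytool -list -v -keystore ./conf/keystore.jks -storepass <password> | grep "Owner:"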

-----------------------------------------

EC2 Instance 2

-----------------------------------------

nifi.properties

nifi.state.management.embedded.zookeeper.start=true

# Site to Site properties

nifi.remote.input.host=my-host-name-2
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec

# web properties
#

nifi.web.war.directory=./lib
nifi.web.http.host=
nifi.web.http.port=
nifi.web.https.host=my-host-name-2
nifi.web.https.port=443
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200

# cluster common properties (all nodes must have same values)
#

nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=true

# cluster node properties (only configure for cluster nodes)
#

nifi.cluster.is.node=true
nifi.cluster.node.address=my-host-name-2
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 mins
nifi.cluster.flow.election.max.candidates=

# zookeeper properties, used for cluster management
#

nifi.zookeeper.connect.string=my-host-name-1:2181,my-host-name-2:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi

-----------------------------------------

state-management.xml

<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String">my-host-name-1:2181,my-host-name-2:2181</property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">10 seconds</property>
    <property name="Access Control">Open</property>
</cluster-provider>

-----------------------------------------

zookeeper.properties

clientPort=2181
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30

server.1=my-host-name-1:2888:3888
server.2=my-host-name-2:2888:3888

-----------------------------------------

authorizers.xml

<authorizer>
    <identifier>file-provider</identifier>
    <class>org.apache.nifi.authorization.FileAuthorizer</class>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Users File">./conf/users.xml</property>
    <property name="Initial Admin Identity">CN=admin, OU=NIFI</property>
    <property name="Legacy Authorized Users File"></property>
    <property name="Node Identity 1">CN=my-host-name-1, OU=NIFI</property>
    <property name="Node Identity 2">CN=my-host-name-2, OU=NIFI</property>
</authorizer>