Posted to dev@nifi.apache.org by Phil H <gi...@gmail.com> on 2018/10/01 03:07:42 UTC

Zookeeper - help!

Hi guys,

Pulling my hair out trying to solve my Zookeeper problems.  I have two 1.6.0 servers that I am trying to cluster.

Here is the excerpt from the properties files – all other properties are at their defaults and omitted for clarity. The servers are set up to run HTTPS, and the interface works via the browser, so I believe the certificates are correctly installed.

Server nifi1.domain:
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi1.domain
nifi.cluster.node.protocol.port=10000

nifi.zookeeper.connect.string=nifi2.domain:10000,nifi1.domain:10000
nifi.zookeeper.root.node=/nifi

Server nifi2.domain:
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi2.domain
nifi.cluster.node.protocol.port=10000

nifi.zookeeper.connect.string=nifi1.domain:10000,nifi2.domain:10000
nifi.zookeeper.root.node=/nifi

I am getting these errors (this is from server 2, but I'm seeing the same on server 1 apart from a different address, of course):

2018-10-01 20:54:16,332 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 10000
2018-10-01 20:54:16,381 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
2018-10-01 20:54:16,435 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: nifi2.domain:443
2018-10-01 20:54:16,769 ERROR [Process Cluster Protocol Request-1] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
2018-10-01 20:54:16,771 WARN [Process Cluster Protocol Request-1] o.a.n.c.p.impl.SocketProtocolListener Failed processing protocol message from nifi2 due to org.apache.nifi.cluster.protocol.ProtocolException: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
org.apache.nifi.cluster.protocol.ProtocolException: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
        at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.getRequestorDN(SocketProtocolListener.java:225)
        at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.dispatchRequest(SocketProtocolListener.java:131)
        at org.apache.nifi.io.socket.SocketListener$2$1.run(SocketListener.java:136)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
        at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromClientSSLSocket(CertificateUtils.java:314)
        at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromSSLSocket(CertificateUtils.java:269)
        at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.getRequestorDN(SocketProtocolListener.java:223)
        ... 5 common frames omitted
Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
        at sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:440)
        at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromClientSSLSocket(CertificateUtils.java:299)
        ... 7 common frames omitted



2018-10-01 20:54:32,249 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2018-10-01 20:54:32,250 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


Re: Zookeeper - help!

Posted by Nathan Gough <th...@gmail.com>.
And I forgot the connect.string config:

nifi1/conf/nifi.properties:nifi.zookeeper.connect.string=nifi1.com:2180,nifi1.com:2181
nifi2/conf/nifi.properties:nifi.zookeeper.connect.string=nifi2.com:2180,nifi2.com:2181



Note that the configuration I've given is for dev purposes. In a production environment Zookeeper needs to run on an odd number of nodes: http://www.corejavaguru.com/blog/bigdata/why-zookeeper-on-odd-number-nodes.php. If you're still having issues, you could run a single Zookeeper node on nifi1 only:

nifi1/conf/nifi.properties:nifi.zookeeper.connect.string=nifi1.com:2180
nifi1/conf/nifi.properties:nifi.state.management.embedded.zookeeper.start=true
nifi1/conf/zookeeper.properties
- clientPort=2180
...
- server.1=nifi1.com:2888:3888

nifi2/conf/nifi.properties:nifi.zookeeper.connect.string=nifi1.com:2180  // Connect to the node1 zookeeper
nifi2/conf/nifi.properties:nifi.state.management.embedded.zookeeper.start=false
nifi2/conf/zookeeper.properties not required


Nathan



On 10/2/18, 11:25 AM, "Nathan Gough" <th...@gmail.com> wrote:

    Check your configs on nifi2. I don't believe that NiFi is starting two instances of Zookeeper; rather, the configured ports unintentionally overlap, i.e. ports are used twice in different configs where they should be different.
    
    It may be that your zookeeper.properties has:
    
    clientPort=2180
    ...
    server.1=nifi1.com:2180:3888
    server.2=nifi2.com:2180:3888
    
    where it should be:
    
    clientPort=2180
    ...
    server.1=nifi1.com:2888:3888
    server.2=nifi2.com:2888:3888
    
    noticing that the server.1 and server.2 ranges don't overlap with the client port.
    
    
    Not sure if this helps, but the following is the relevant config that I have for my NiFi cluster nodes that run on the SAME machine where nifi1.com and nifi2.com are configured in /etc/hosts:
    
    nifi1/conf
    zookeeper.properties
    - clientPort=2180
    - server.1=nifi1.com:2888:3888
    - server.2=nifi2.com:2888:3888
    
    nifi.properties
    - nifi.remote.input.host=nifi1.com
    - nifi.remote.input.socket.port=10440
    - nifi.web.http.host=nifi1.com
    - nifi.web.http.port=9550
    - nifi.cluster.node.address=nifi1.com
    - nifi.cluster.node.protocol.port=11440
    
    nifi1/state/zookeeper
    /myid (file contents = "1")
    /state-management.xml (no changes required)
    /version-2/
    
    
    nifi2/conf
    zookeeper.properties
    - clientPort=2181
    - server.1=nifi1.com:2888:3888
    - server.2=nifi2.com:2888:3888
    
    nifi.properties
    - nifi.remote.input.host=nifi2.com
    - nifi.remote.input.socket.port=10441
    - nifi.web.http.host=nifi2.com
    - nifi.web.http.port=9551
    - nifi.cluster.node.address=nifi2.com
    - nifi.cluster.node.protocol.port=11441
    
    nifi2/state/zookeeper
    /myid (file contents = "2")
    /state-management.xml (no changes required)
    /version-2/
    
    
    Nathan
    
    
    
    On 10/2/18, 2:07 AM, "Phil H" <gi...@gmail.com> wrote:
    
        Hi Andy,
        
        Thanks for the additional info.  I think I saw a link to that while searching but was wary since it was such an old version.
        
        I have two VMs (nifi1, and nifi2) both running NiFi with identical configs, and trying to use the inbuilt ZK to cluster them.
        
        If I only mention a single machine within the config (e.g. if nifi1 doesn’t refer to nifi2, or vice versa) I don’t get any start-up errors.
        
        Phil
        
        From: Andy LoPresto
        Sent: Tuesday, 2 October 2018 1:00 PM
        To: dev@nifi.apache.org
        Subject: Re: Zookeeper - help!
        
        Hi Phil, 
        
        Nathan’s advice is correct but I think he was assuming all other configurations are correct as well. Are you trying to run both NiFi nodes and ZK instances on the same machine? In that case you will have to ensure that the ports in use are different for each service so they don’t conflict. Setting them all to the same value only works if each service is running on an independent physical machine, virtual machine, or container. 
        
        I find Pierre’s guide [1] to be a helpful step-by-step instruction list as well as a good explanation of how the clustering concepts work in practice. When you get that working, and you’re ready to set up a secure cluster, he has a follow-on guide for that as well [2]. Even as someone who has set up many clustered instances of NiFi, I use his guides regularly to ensure I haven’t forgotten a step. 
        
        They were originally written for versions 1.0.0 and 1.1.0, but the only thing that has changed is the authorizer configuration for the secure instances (you’ll need to put the Initial Admin Identity and Node Identities in two locations in the authorizers.xml file instead of just once). 
        
        Hopefully this helps you get a working cluster up and running so you can experiment. Good luck. 
        
        [1] https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/
        [2] https://pierrevillard.com/2016/11/29/apache-nifi-1-1-0-secured-cluster-setup/
        
        
        Andy LoPresto
        alopresto@apache.org
        alopresto.apache@gmail.com
        PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
        
        On Oct 1, 2018, at 2:45 PM, Phil H <gi...@gmail.com> wrote:
        
        Thanks Nathan,
        
        I changed the protocol.port to 10002 on both servers.
        
        On server 1, I now just see endless copies of the second error from my original message (“KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss”) – I don’t know if that’s normal when there’s only a single member of a cluster alive and running?  Seems like the logs will fill up very quickly if it is!
        
        On server 2, I get a bind exception on the Zookeeper client port.  It doesn’t matter what I set it to (in this example, I changed it to 10500), I always get the same result.  If I run netstat when NiFi isn’t running, there’s nothing listening on the port.  It’s like NiFi is starting two Zookeeper instances?!  There’s no repeat of this in the start-up sequence though.  Both servers are running completely vanilla 1.6.0 – I don’t even have any flow defined yet, this is purely for teaching myself clustering config – so I don’t know why one is behaving differently to the other.
        
        2018-10-02 17:36:31,610 INFO [QuorumPeer[myid=2]/0.0.0.0:10500] o.a.zookeeper.server.ZooKeeperServer Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir ./state/zookeeper/version-2 snapdir ./state/zookeeper/version-2
        2018-10-02 17:36:31,612 ERROR [QuorumPeer[myid=2]/0.0.0.0:10500] o.apache.zookeeper.server.quorum.Leader Couldn't bind to nifi2.domain/192.168.10.102:10500
        java.net.BindException: Address already in use (Bind failed)
        	at java.net.PlainSocketImpl.socketBind(Native Method)
        	at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
        	at java.net.ServerSocket.bind(ServerSocket.java:375)
        	at java.net.ServerSocket.bind(ServerSocket.java:329)
        	at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:193)
        	at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:605)
        	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:798)
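        (A quick way to check what is actually holding the port while NiFi is up, assuming a Linux host and substituting whatever port you configured, is something like:)
        
        # 10500 is the port from the log above
        sudo netstat -tlnp | grep 10500
        # or
        sudo lsof -i :10500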
        
        
        
        
        From: Nathan Gough
        Sent: Tuesday, 2 October 2018 2:22 AM
        To: dev@nifi.apache.org
        Subject: Re: Zookeeper - help!
        
        Hi Phil,
        
        One thing I notice with your config is that nifi.cluster.node.protocol.port and the zookeeper ports are the same - these should not be the same. The node protocol port is used by the NiFi cluster to communicate between nodes, while the zookeeper.connect.string port should be the port the ZooKeeper service is listening on. The zookeeper port is configured by the clientPort property in the zookeeper.properties file. This would make your connect string: 'nifi.zookeeper.connect.string=nifi1.domain:2180,nifi2.domain:2180', where 2180 is whatever clientPort is configured.
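        As a rough sketch of how those ports end up split (the exact values here are placeholders to adapt, and it assumes the embedded ZooKeeper is in use):
        
        # conf/zookeeper.properties
        clientPort=2180
        server.1=nifi1.domain:2888:3888
        server.2=nifi2.domain:2888:3888
        
        # conf/nifi.properties
        nifi.state.management.embedded.zookeeper.start=true
        # node-to-node cluster protocol port, kept distinct from the ZooKeeper ports
        nifi.cluster.node.protocol.port=10001
        # points at clientPort on each ZooKeeper instance
        nifi.zookeeper.connect.string=nifi1.domain:2180,nifi2.domain:2180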
        
        You can read more about how NiFi uses Zookeeper and how to configure it here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management.
        
        Let us know what happens once these properties are configured correctly.
        
        Nathan
        
        


Re: Zookeeper - help!

Posted by Pierre Villard <pi...@gmail.com>.
Hi Phil,

Happy to improve my post about it. Which one are you referring to?

Also, you mention that "servers' certificates need to be installed in each
server's keystore". That should not be the case: a keystore should only
contain the identity of the server where the keystore is installed. If
you're talking about the truststore, then you should not be in this
situation if you're using a CA to sign the server certificates.
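
For example (a sketch only, with placeholder file names and password): with a common CA, each node keeps just its own key pair in its keystore and only needs the CA certificate imported into its truststore, e.g.

keytool -importcert -alias nifi-ca -file ca.pem \
        -keystore ./conf/truststore.jks -storepass changeit -noprompt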

Thanks,
Pierre


RE: Zookeeper - help!

Posted by Phil H <gi...@gmail.com>.
Okay,

I have got this working now, albeit with only a single ZK instance (at this stage).

The missing piece of the puzzle that wasn’t in the guides from Pierre was that cluster servers’ certificates need to be installed in each server’s keystore, and all the cluster server DNs need to be added as Initial User Identities in authorizers.xml.
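
For reference, a rough sketch of what that looks like with the default file-based providers in authorizers.xml (the DNs below are placeholders for the actual certificate DNs):

<userGroupProvider>
    <identifier>file-user-group-provider</identifier>
    <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
    <property name="Users File">./conf/users.xml</property>
    <property name="Initial User Identity 1">CN=nifi1.domain, OU=NIFI</property>
    <property name="Initial User Identity 2">CN=nifi2.domain, OU=NIFI</property>
</userGroupProvider>
<accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">file-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Initial Admin Identity">CN=admin, OU=NIFI</property>
    <property name="Node Identity 1">CN=nifi1.domain, OU=NIFI</property>
    <property name="Node Identity 2">CN=nifi2.domain, OU=NIFI</property>
</accessPolicyProvider>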

Thanks again for all the assistance.

Sent from Mail for Windows 10





Re: Zookeeper - help!

Posted by Nathan Gough <th...@gmail.com>.
I think you are correct on that; I assumed it was a range of some kind, but it looks like it's not: http://zookeeper.apache.org/doc/r3.4.3/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
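
For reference, the two ports in server.N=host:portA:portB are two separate listeners rather than a range; roughly, per the ZooKeeper docs linked above:

# zookeeper.properties
clientPort=2181
# server.N = <host>:<peer port>:<leader election port>
# first port: followers connect to the leader on it
# second port: used during leader election
server.1=nifi1.com:2888:3888
server.2=nifi2.com:2888:3888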


On 10/2/18, 5:17 PM, "Phil H" <gi...@gmail.com> wrote:

    The second port in the zookeeper server config has been a mystery to me.  I thought it was a second port used for elections, not the upper bound in a range.  Why is the range so large?
    
    Sent from Mail for Windows 10
    
    From: Nathan Gough
    Sent: Wednesday, 3 October 2018 1:26 AM
    To: dev@nifi.apache.org
    Subject: Re: Zookeeper - help!
    
    Check your configs on nifi2. I don't believe that NiFi is starting two instances of Zookeeper but the ports configured are unintentionally configured to overlap ie. Ports used twice in different configs where they should be different.
    
    It may be that your zookeeper.properties has:
    
    clientPort=2180
    ...
    server.1=nifi1.com:2180:3888
    server.2=nifi2.com:2180:3888
    
    where it should be:
    
    clientPort=2180
    ...
    server.1=nifi1.com:2888:3888
    server.2=nifi2.com:2888:3888
    
    noticing that the server.1 and server.2 ranges don't overlap with the client port.
    
    
    Not sure if this helps, but the following is the relevant config that I have for my NiFi cluster nodes that run on the SAME machine where nifi1.com and nifi2.com are configured in /etc/hosts:
    
    nifi1/conf
    zookeeper.properties
    - clientPort=2180
    - server.1=nifi1.com:2888:3888
    - server.2=nifi2.com:2888:3888
    
    nifi.properties
    - nifi.remote.input.host=nifi1.com
    - nifi.remote.input.socket.port=10440
    - nifi.web.http.host=nifi1.com
    - nifi.web.http.port=9550
    - nifi.cluster.node.address=nifi1.com
    - nifi.cluster.node.protocol.port=11440
    
    nifi1/state/zookeeper
    /myid (file contents = "1")
    /state-management.xml (no changes required)
    /version-2/
    
    
    nifi2/conf
    zookeeper.properties
    - clientPort=2181
    - server.1=nifi1.com:2888:3888
    - server.2=nifi2.com:2888:3888
    
    nifi.properties
    - nifi.remote.input.host=nifi2.com
    - nifi.remote.input.socket.port=10441
    - nifi.web.http.host=nifi2.com
    - nifi.web.http.port=9551
    - nifi.cluster.node.address=nifi2.com
    - nifi.cluster.node.protocol.port=11441
    
    nifi2/state/zookeeper
    /myid (file contents = "2")
    /state-management.xml (no changes required)
    /version-2/
    
    
    Nathan
    
    
    
    On 10/2/18, 2:07 AM, "Phil H" <gi...@gmail.com> wrote:
    
        Hi Andy,
        
        Thanks for the additional info.  I think I saw a link to that while searching but was wary since it was such an old version.
        
        I have two VMs (nifi1, and nifi2) both running NiFi with identical configs, and trying to use the inbuilt ZK to cluster them.
        
        If I only mention a single machine within the config (eg: if nifi1 doesn’t refer to nifi2, or visa versa) I don’t get any start up errors.
        
        Phil
        
        From: Andy LoPresto
        Sent: Tuesday, 2 October 2018 1:00 PM
        To: dev@nifi.apache.org
        Subject: Re: Zookeeper - help!
        
        Hi Phil, 
        
        Nathan’s advice is correct but I think he was assuming all other configurations are correct as well. Are you trying to run both NiFi nodes and ZK instances on the same machine? In that case you will have to ensure that the ports in use are different for each service so they don’t conflict. Setting them all to the same value only works if each service is running on an independent physical machine, virtual machine, or container. 
        
        I find Pierre’s guide [1] to be a helpful step-by-step instruction list as well as a good explanation of how the clustering concepts work in practice. When you get that working, and you’re ready to set up a secure cluster, he has a follow-on guide for that as well [2]. Even as someone who has set up many clustered instances of NiFi, I use his guides regularly to ensure I haven’t forgotten a step. 
        
        They were originally written for versions 1.0.0 and 1.1.0, but the only thing that has changed is the authorizer configuration for the secure instances (you’ll need to put the Initial Admin Identity and Node Identities in two locations in the authorizers.xml file instead of just once). 
        
        Hopefully this helps you get a working cluster up and running so you can experiment. Good luck. 
        
        [1] https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/
        [2] https://pierrevillard.com/2016/11/29/apache-nifi-1-1-0-secured-cluster-setup/
        
        
        Andy LoPresto
        alopresto@apache.org
        alopresto.apache@gmail.com
        PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
        


RE: Zookeeper - help!

Posted by Phil H <gi...@gmail.com>.
The second port in the zookeeper server config has been a mystery to me.  I thought it was a second port used for elections, not the upper bound in a range.  Why is the range so large?

Sent from Mail for Windows 10





Re: Zookeeper - help!

Posted by Nathan Gough <th...@gmail.com>.
Check your configs on nifi2. I don't believe NiFi is starting two instances of ZooKeeper; more likely the ports are unintentionally configured to overlap, i.e. the same port is used in two different configs where the values should be different.

It may be that your zookeeper.properties has:

clientPort=2180
...
server.1=nifi1.com:2180:3888
server.2=nifi2.com:2180:3888

where it should be:

clientPort=2180
...
server.1=nifi1.com:2888:3888
server.2=nifi2.com:2888:3888

noticing that the server.1 and server.2 ranges don't overlap with the client port.
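
For what it's worth, a minimal sketch of what the two server.N values mean, per the ZooKeeper admin guide - they are two separate ports rather than a range, and 2888/3888 are just the conventional defaults:

# server.<id>=<host>:<quorum port>:<election port>
# - the first port (2888 above) is used by followers to connect to the leader
# - the second port (3888 above) is used for leader election
# - clientPort (2180 above) is a third, independent port, and it is the one the
#   NiFi connect string should point at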


Not sure if this helps, but the following is the relevant config that I have for my NiFi cluster nodes, which run on the SAME machine, where nifi1.com and nifi2.com are configured in /etc/hosts:

nifi1/conf
zookeeper.properties
- clientPort=2180
- server.1=nifi1.com:2888:3888
- server.2=nifi2.com:2888:3888

nifi.properties
- nifi.remote.input.host=nifi1.com
- nifi.remote.input.socket.port=10440
- nifi.web.http.host=nifi1.com
- nifi.web.http.port=9550
- nifi.cluster.node.address=nifi1.com
- nifi.cluster.node.protocol.port=11440

nifi1/state/zookeeper
/myid (file contents = "1")
/state-management.xml (no changes required)
/version-2/


nifi2/conf
zookeeper.properties
- clientPort=2181
- server.1=nifi1.com:2888:3888
- server.2=nifi2.com:2888:3888

nifi.properties
- nifi.remote.input.host=nifi2.com
- nifi.remote.input.socket.port=10441
- nifi.web.http.host=nifi2.com
- nifi.web.http.port=9551
- nifi.cluster.node.address=nifi2.com
- nifi.cluster.node.protocol.port=11441

nifi2/state/zookeeper
/myid (file contents = "2")
/state-management.xml (no changes required)
/version-2/
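
To tie those together, a minimal sketch of the matching ZooKeeper-related entries in nifi.properties for this layout (assuming the clientPort values above, 2180 and 2181, and assuming both instances run the embedded ZooKeeper):

nifi.properties (both nifi1/conf and nifi2/conf)
- nifi.state.management.embedded.zookeeper.start=true
- nifi.zookeeper.connect.string=nifi1.com:2180,nifi2.com:2181
- nifi.zookeeper.root.node=/nifi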


Nathan






RE: Zookeeper - help!

Posted by Phil H <gi...@gmail.com>.
Hi Andy,

Thanks for the additional info.  I think I saw a link to that while searching but was wary since it was such an old version.

I have two VMs (nifi1, and nifi2) both running NiFi with identical configs, and trying to use the inbuilt ZK to cluster them.

If I only mention a single machine within the config (e.g. if nifi1 doesn’t refer to nifi2, or vice versa) I don’t get any startup errors.

Phil








Re: Zookeeper - help!

Posted by Andy LoPresto <al...@apache.org>.
Hi Phil,

Nathan’s advice is correct but I think he was assuming all other configurations are correct as well. Are you trying to run both NiFi nodes and ZK instances on the same machine? In that case you will have to ensure that the ports in use are different for each service so they don’t conflict. Setting them all to the same value only works if each service is running on an independent physical machine, virtual machine, or container.

I find Pierre’s guide [1] to be a helpful step-by-step instruction list as well as a good explanation of how the clustering concepts work in practice. When you get that working, and you’re ready to set up a secure cluster, he has a follow-on guide for that as well [2]. Even as someone who has set up many clustered instances of NiFi, I use his guides regularly to ensure I haven’t forgotten a step.

They were originally written for versions 1.0.0 and 1.1.0, but the only thing that has changed is the authorizer configuration for the secure instances (you’ll need to put the Initial Admin Identity and Node Identities in two locations in the authorizers.xml file instead of just once).

Hopefully this helps you get a working cluster up and running so you can experiment. Good luck.

[1] https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/
[2] https://pierrevillard.com/2016/11/29/apache-nifi-1-1-0-secured-cluster-setup/


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69



RE: Zookeeper - help!

Posted by Phil H <gi...@gmail.com>.
Thanks Nathan,

I changed the protocol.port to 10002 on both servers.

On server 1, I now just see endless copies of the second error from my original message (“KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss”) – I don’t know if that’s normal when only a single member of the cluster is alive and running?  It seems like the logs will fill up very quickly if it is!

On server 2, I get a bind exception on the Zookeeper client port.  It doesn’t matter what I set it to (in this example, I changed it to 10500), I always get the same result.  If I run netstat when NiFi isn’t running, there’s nothing listening on the port.  It’s like NiFi is starting two Zookeeper instances?!  There’s no repeat of this in the startup sequence though.  Both servers are running completely vanilla 1.6.0 – I don’t even have any flow defined yet, as this is purely for teaching myself clustering config – so I don’t know why one is behaving differently to the other.

2018-10-02 17:36:31,610 INFO [QuorumPeer[myid=2]/0.0.0.0:10500] o.a.zookeeper.server.ZooKeeperServer Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir ./state/zookeeper/version-2 snapdir ./state/zookeeper/version-2
2018-10-02 17:36:31,612 ERROR [QuorumPeer[myid=2]/0.0.0.0:10500] o.apache.zookeeper.server.quorum.Leader Couldn't bind to nifi2.domain/192.168.10.102:10500
java.net.BindException: Address already in use (Bind failed)
	at java.net.PlainSocketImpl.socketBind(Native Method)
	at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
	at java.net.ServerSocket.bind(ServerSocket.java:375)
	at java.net.ServerSocket.bind(ServerSocket.java:329)
	at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:193)
	at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:605)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:798)








Re: Zookeeper - help!

Posted by Nathan Gough <th...@gmail.com>.
Hi Phil,

One thing I notice with your config is that nifi.cluster.node.protocol.port and the ports in your ZooKeeper connect string are the same - they should not be. nifi.cluster.node.protocol.port is used by the NiFi cluster to communicate between nodes, while the port in nifi.zookeeper.connect.string should be the port the ZooKeeper service is listening on. The ZooKeeper port is configured by the clientPort property in the zookeeper.properties file. This would make your connect string 'nifi.zookeeper.connect.string=nifi1.domain:2180,nifi2.domain:2180', where 2180 is whatever clientPort is configured to.
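
For example, assuming clientPort is set to 2180 (an example value - substitute whatever you actually configure), nifi1.domain's nifi.properties would keep the cluster protocol port at 10000 but point the connect string at the ZooKeeper port, roughly like this:

# node-to-node cluster protocol port, used by NiFi itself
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi1.domain
nifi.cluster.node.protocol.port=10000

# ZooKeeper clientPort, not the cluster protocol port
nifi.zookeeper.connect.string=nifi1.domain:2180,nifi2.domain:2180
nifi.zookeeper.root.node=/nifi

nifi2.domain would look the same apart from nifi.cluster.node.address=nifi2.domain.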

You can read more about how NiFi uses Zookeeper and how to configure it here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management.
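
If you're running the ZooKeeper that ships embedded in NiFi, the state management section linked above covers the details; as a rough sketch (the ports below are examples, not values taken from your setup), conf/zookeeper.properties on each node would contain something like:

clientPort=2180
# quorum and leader-election ports used between the ensemble members
server.1=nifi1.domain:2888:3888
server.2=nifi2.domain:2888:3888

along with nifi.state.management.embedded.zookeeper.start=true in nifi.properties and a myid file (containing 1 or 2 respectively) in each node's ZooKeeper dataDir. Keep in mind that ZooKeeper needs a majority to form a quorum, so a two-node ensemble can't survive either node going down - three nodes (or an external ZooKeeper) is the usual recommendation.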

Let us know what happens once these properties are configured correctly.

Nathan


On 9/30/18, 11:07 PM, "Phil H" <gi...@gmail.com> wrote:

    Hi guys,
    
    Pulling my hair out trying to solve my Zookeeper problems.  I have two 1.6.0 servers that I am trying to cluster.
    
    Here is the excerpt from the properties files – all other properties are default, so omitted for clarity. The servers are set up to run HTTPS, and the interface works via the browser, so I believe the certificates are correctly installed.
    
    Server nifi1.domain:
    nifi.cluster.is.node=true
    nifi.cluster.node.address=nifi1.domain
    nifi.cluster.node.protocol.port=10000
    
    nifi.zookeeper.connect.string=nifi2.domain:10000,nifi1.domain:10000
    nifi.zookeeper.root.node=/nifi
    
    Server nifi2.domain:
    nifi.cluster.is.node=true
    nifi.cluster.node.address=nifi2.domain
    nifi.cluster.node.protocol.port=10000
    
    nifi.zookeeper.connect.string=nifi1.domain:10000,nifi2.domain:10000
    nifi.zookeeper.root.node=/nifi
    
    I am getting these errors (this is from server 2, but seeing the same on server 1 apart from a different address, of course):
    
    2018-10-01 20:54:16,332 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 10000
    2018-10-01 20:54:16,381 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
    2018-10-01 20:54:16,435 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: nifi2.domain:443
    2018-10-01 20:54:16,769 ERROR [Process Cluster Protocol Request-1] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
    2018-10-01 20:54:16,771 WARN [Process Cluster Protocol Request-1] o.a.n.c.p.impl.SocketProtocolListener Failed processing protocol message from nifi2 due to org.apache.nifi.cluster.protocol.ProtocolException: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
    org.apache.nifi.cluster.protocol.ProtocolException: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
            at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.getRequestorDN(SocketProtocolListener.java:225)
            at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.dispatchRequest(SocketProtocolListener.java:131)
            at org.apache.nifi.io.socket.SocketListener$2$1.run(SocketListener.java:136)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    Caused by: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
            at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromClientSSLSocket(CertificateUtils.java:314)
            at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromSSLSocket(CertificateUtils.java:269)
            at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.getRequestorDN(SocketProtocolListener.java:223)
            ... 5 common frames omitted
    Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
            at sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:440)
            at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromClientSSLSocket(CertificateUtils.java:299)
            ... 7 common frames omitted
    
    
    
    2018-10-01 20:54:32,249 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
    2018-10-01 20:54:32,250 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
    org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
            at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
            at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
            at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
            at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
            at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
            at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)