Posted to issues@nifi.apache.org by markap14 <gi...@git.apache.org> on 2016/07/22 13:21:27 UTC

[GitHub] nifi pull request #705: NIFI-2360: Leave ZooKeeper running when a node is di...

GitHub user markap14 opened a pull request:

    https://github.com/apache/nifi/pull/705

    NIFI-2360: Leave ZooKeeper running when a node is disconnected. Do not allow the last node in the cluster to be disconnected. Change ClusterProtocoLHeartbeater to use RetryNTime retry strategy instead of RetryForever because web requests could block on this

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/markap14/nifi NIFI-2360

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #705
    
----
commit 8ad138d02201d2b593ebb579b51b0c853829b568
Author: Mark Payne <ma...@hotmail.com>
Date:   2016-07-22T13:21:12Z

    NIFI-2360: Leave ZooKeeper running when a node is disconnected. Do not allow the last node in the cluster to be disconnected. Change ClusterProtocoLHeartbeater to use RetryNTime retry strategy instead of RetryForever because web requests could block on this

----
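
The commit message above gives the reason for replacing a retry-forever strategy with a bounded one: a caller such as a web request could otherwise block indefinitely. The following is a minimal sketch of that idea in plain Java, not the code from this pull request. The class BoundedRetrySketch, the method callWithRetries, and the exception RetriesExhaustedException are hypothetical names introduced only for illustration. Apache Curator ships retry policies named RetryNTimes and RetryForever, but this sketch does not use Curator.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of a bounded retry; not NiFi's or Curator's code.
final class BoundedRetrySketch {

    /** Thrown once the retry budget is used up, so the caller regains control. */
    static final class RetriesExhaustedException extends Exception {
        RetriesExhaustedException(int attempts, Throwable cause) {
            super("gave up after " + attempts + " attempts", cause);
        }
    }

    /**
     * Runs the action once, then retries it up to maxRetries more times,
     * sleeping sleepMillis between attempts. A retry-forever strategy would
     * loop without the upper bound and could block the calling thread.
     */
    static <T> T callWithRetries(Callable<T> action, int maxRetries, long sleepMillis)
            throws RetriesExhaustedException, InterruptedException {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw new RetriesExhaustedException(maxRetries + 1, last);
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for an operation that never succeeds, e.g. an unreachable peer.
        Callable<String> alwaysFails = () -> {
            calls.incrementAndGet();
            throw new IllegalStateException("coordinator unreachable");
        };
        try {
            callWithRetries(alwaysFails, 3, 10L);
        } catch (RetriesExhaustedException e) {
            // prints: gave up after 4 attempts (calls made: 4)
            System.out.println(e.getMessage() + " (calls made: " + calls.get() + ")");
        }
    }
}
```

With a bound in place, the worst-case blocking time is roughly (maxRetries + 1) attempts plus maxRetries sleeps, so a thread serving a web request can fail fast and report an error instead of hanging.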


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #705: NIFI-2360: Leave ZooKeeper running when a node is disconnec...

Posted by JPercivall <gi...@git.apache.org>.
Github user JPercivall commented on the issue:

    https://github.com/apache/nifi/pull/705
  
    +1
    
    Visually verified the code and ran a contrib check. Ran a 3-node secure cluster and tested various combinations of disconnecting, reconnecting, and restarting nodes to find corner cases; it performs as expected. Thanks @markap14, I will merge it in.



[GitHub] nifi issue #705: NIFI-2360: Leave ZooKeeper running when a node is disconnec...

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/705
  
    @JPercivall - interestingly enough, I didn't see this in any of my testing, but when I switched back to the branch I saw it immediately when I started up the cluster. A restart fixed it, but after restarting the cluster another 10 times or so, I was able to replicate the issue again, so it seems a bit sporadic. I was able to address the issue, though, and have updated the PR.



[GitHub] nifi pull request #705: NIFI-2360: Leave ZooKeeper running when a node is di...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/705



[GitHub] nifi issue #705: NIFI-2360: Leave ZooKeeper running when a node is disconnec...

Posted by JPercivall <gi...@git.apache.org>.
Github user JPercivall commented on the issue:

    https://github.com/apache/nifi/pull/705
  
    Reproducible error:
    Start a 3-node cluster.
    Stop 2 of the nodes.
    Restart all three nodes of the cluster.
    None of them can start up; each hits "Unexpected Error".
    
    In the nifi-app logs I see a lot of:
    2016-07-22 18:13:37,643 WARN [Replicate Request Thread-8] o.a.n.c.c.h.r.ThreadPoolRequestReplicator
    com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
            at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.api.client.Client.handle(Client.java:652) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509) ~[jersey-client-1.19.jar:1.19]
            at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:493) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
            at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:687) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_74]
            at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_74]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_74]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_74]
            at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
    Caused by: java.net.SocketTimeoutException: Read timed out
            at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.8.0_74]
            at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[na:1.8.0_74]
            at java.net.SocketInputStream.read(SocketInputStream.java:170) ~[na:1.8.0_74]
            at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_74]
            at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) ~[na:1.8.0_74]
            at sun.security.ssl.InputRecord.read(InputRecord.java:503) ~[na:1.8.0_74]
            at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) ~[na:1.8.0_74]
            at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) ~[na:1.8.0_74]
            at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) ~[na:1.8.0_74]
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[na:1.8.0_74]
            at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[na:1.8.0_74]
            at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_74]
            at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) ~[na:1.8.0_74]
            at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) ~[na:1.8.0_74]
            at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) ~[na:1.8.0_74]
            at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) ~[na:1.8.0_74]
            at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[na:1.8.0_74]
            at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338) ~[na:1.8.0_74]
            at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153) ~[jersey-client-1.19.jar:1.19]
            ... 12 common frames omitted
    2016-07-22 18:13:37,652 WARN [Replicate Request Thread-7] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET /nifi-api/flow/controller/bulletins to localhost:8481 due to {}
    com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
            at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.api.client.Client.handle(Client.java:652) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509) ~[jersey-client-1.19.jar:1.19]
            at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:493) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
            at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:687) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_74]
            at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_74]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_74]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_74]
            at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
    Caused by: java.net.SocketTimeoutException: Read timed out
            at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.8.0_74]
            at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[na:1.8.0_74]
            at java.net.SocketInputStream.read(SocketInputStream.java:170) ~[na:1.8.0_74]
            at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_74]
            at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) ~[na:1.8.0_74]
            at sun.security.ssl.InputRecord.read(InputRecord.java:503) ~[na:1.8.0_74]
            at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) ~[na:1.8.0_74]
            at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) ~[na:1.8.0_74]
            at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) ~[na:1.8.0_74]
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[na:1.8.0_74]
            at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[na:1.8.0_74]
            at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_74]
            at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) ~[na:1.8.0_74]
            at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) ~[na:1.8.0_74]
            at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) ~[na:1.8.0_74]
            at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) ~[na:1.8.0_74]
            at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[na:1.8.0_74]
            at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338) ~[na:1.8.0_74]
            at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253) ~[jersey-client-1.19.jar:1.19]
            at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153) ~[jersey-client-1.19.jar:1.19]
            ... 12 common frames omitted


